Shared Assemblies and Strongly Named Assemblies (CLR via C#)

The CLR supports two kinds of assemblies: weakly named assemblies and strongly named assemblies.

The real difference between weakly named and strongly named assemblies is that a strongly named assembly is signed with a publisher’s public/private key pair that uniquely identifies the assembly’s publisher.

An assembly can be deployed in two ways: privately or globally. A privately deployed assembly is an assembly that is deployed in the application’s base directory or one of its subdirectories. A weakly named assembly can be deployed only privately.

A strongly named assembly consists of four attributes that uniquely identify the assembly: a file name (without an extension), a version number, a culture identity, and a public key. Since public keys are very large numbers, we frequently use a small hash value derived from a public key. This hash value is called a public key token.

The following figure shows how PE is signed

Because public keys are such large numbers, and a single assembly might reference many assemblies, a large percentage of the resulting file’s total size would be occupied with public key information. To conserve storage space, Microsoft hashes the public key and takes the last8 bytes of the hashed value. These reduced public key values—known as public key tokens—are what are actually stored in an AssemblyRef table. In general, developers and end users will see public key token values much more frequently than full public key values. Note, however, that the CLR never uses public key tokens when making security or trust decisions because it is possible that several public keys could hash to a single public key token.

The Global Assembly Cash (GAC):

If an assembly is to be accessed by multiple applications, the assembly must be placed into a well-known directory, and the CLR must know to look in this directory automatically when a reference to the assembly is detected. This well-known location is called the global assembly cache (GAC), which can usually be found in the following directory (assuming that Windows is installed in the C:\Windows directory):


The GAC directory is structured: It contains many subdirectories, and an algorithm is used to generate the names of these subdirectories. You should never manually copy assembly files into the GAC; instead, you should use tools to accomplish this task. These tools know the GAC’s internal structure and how to generate the proper subdirectory names.

The most common tool for installing strongly named assemblies into the GAC is GACUtil.exe

What is the purpose of “registering” an assembly in the GAC? Well, say two companies each produce an OurLibrary assembly consisting of one file: OurLibrary.dll. Obviously, both of these files can’t go in the same directory because the last one installed would overwrite the first one, surely breaking some application. When you install an assembly into the GAC, dedicated subdirectories are created under the C:\Windows\Assembly directory, and the assembly files are copied into one of these subdirectories.

Consider using delayed signing if you want to install your assemblies to the CAG in development environment.

Figure below illustrates how CLR resolves a referenced type

The above figure is not correct case if the references type is in the .NET Framework assemblies. In this case, CLR loads the file that matches CLR version.

Type Forwarding:

The CLR supports the ability to move a type (class, structure, enum, interface, or delegate) from one assembly to another. For example, in .NET 3.5, the System.TimeZoneInfo class is defined in the System.Core.dll assembly. But in .NET 4.0, Microsoft moved this class to the MSCorLib.dll assembly. Normally, moving a type from one assembly to another would break applications. However, the CLR offers a System.Runtime.CompilerServices.TypeForwardedToAttribute attribute, which can be applied to the original assembly (such asSystem.Core.dll). The parameter that you pass to this attribute’s constructor is of type System.Type and it indicates the new type (that is now defined in MSCorLib.dll) that applications should now use. The CLR’s binder uses this information. Since the TypeForwardedToAttribute’s constructor takes a Type, the assembly containing this attribute will be dependent on the new assembly defining the type. If you take advantage of this feature, then you should also apply the System.Runtime.CompilerServices.TypeForwardedFromAttribute attribute to the type in the new assembly and pass to this attribute’s constructor a string with the full name of the assembly that used to define the type. This attribute typically is used for tools, utilities, and serialization. Since the TypeForwardedFromAttribute’s constructor takes a String, the assembly containing this attribute is not dependent on the assembly that used to define the type.

Publisher Control Policy:

Microsoft offers an XML config file that is used to ease the versioning of any assembly. Simply you (as a publisher for the assembly) can port the new version of your assembly with config file which will tell CLR to load the new assembly (say version 2.0) instead of the previous version (1.0). This is done automatically without any end user interaction.

Further if the end user wants to use the previous version for some reasons and ignores the publishers control policy, he can edit his application configuration file to disable the publisher control policy. Doing this for each application you’ve is not practicl so the solution is to edit the Machine.Config file to apply these changes.

The Semantics of Constructors

Default Constructor Construction:

There are four characteristics of a class under which the compiler needs to synthesize a default constructor for classes that declare no constructor at all. The Standard refers to these as implicit, nontrivial default constructors. The synthesized constructor fulfills only an implementation need. It does this by

  1. Invoking member object default constructor or,
  2. Base class default constructors or,
  3. Initializing the virtual function or,
  4. Virtual base class mechanism for each object.

Classes that do not exhibit these characteristics and that declare no constructor at all are said to have implicit, trivial default constructors. In practice, these trivial default constructors are not synthesized.

Within the synthesized default constructor, only the base class subobjects and member class objects are initialized. All other nonstatic data members, such as integers, pointers to integers, arrays of integers, and so on, are not initialized. These initializations are needs of the program, not of the implementation. If there is a program need for a default constructor, such as initializing a pointer to 0, it is the programmer’s responsibility to provide it in the course of the class implementation.

Programmers new to C++ often have two common misunderstandings:

  1. That a default constructor is synthesized for every class that does not define one
  2. That the compiler-synthesized default constructor provides explicit default initializers for each data member declared within the class

As you have seen, neither of these is true

In the case of having virtual function in your class, the following two class "augmentations" occur during compilation:

  1. A virtual function table (referred to as the class vtbl in the original cfront implementation) is generated and populated with the addresses of the active virtual functions for that class.
  2. Within each class object, an additional pointer member (the vptr) is synthesized to hold the address of the associated class vtbl.

Copy Constructor Construction:

When are bitwise copy semantics not exhibited by a class? There are four instances:

  1. When the class contains a member object of a class for which a copy constructor exists (either explicitly declared by the class designer, or synthesized by the compiler)
  2. When the class is derived from a base class for which a copy constructor exists (again, either explicitly declared or synthesized)
  3. When the class declares one or more virtual functions
  4. When the class is derived from an inheritance chain in which one or more base classes are virtual

Program Transformation Semantics:

Such a requirement would levy a possibly severe performance penalty on a great many programs. For example, although the following three initializations are semantically equivalent:

X xx0( 1024 );
X xx1 = X( 1024 );
X xx2 = ( X ) 1024;

In the second and third instances, the syntax explicitly provides for a two-step initialization:

  1. Initialize a temporary object with 1024.
  2. Copy construct the explicit object with the temporary object.

That is, whereas xx0 is initialized by a single constructor invocation

// Pseudo C++ Code
xx0.X::X( 1024 );

a strict implementation of either xx1 or xx2 results in two constructor invocations, a temporary object, and a call to the destructor of class X on that temporary object:

// Pseudo C++ Code
X __temp0;
__temp0.X::X( 1024 );
xx1.X::X( __temp0 );

The simplest method of implementing the copy constructor is as follows:

Point3d::Point3d( const Point3d &rhs )


   _x = rhs._x;

   _y = rhs._y;

   _z = rhs._z;


This is okay, but use of the C library memcpy() function would be more efficient:

Point3d::Point3d( const Point3d &rhs )


   memcpy( this, &rhs, sizeof( Point3d );


Use of both memcpy() and memset(), however, works only if the classes do not contain any compiler-generated internal members. If the Point3d class declares one or more virtual functions or contains a virtual base class, use of either of these functions will result in overwriting the values the compiler set for these members.

As you can see, correct use of the memset() and memcpy() functions requires some knowledge of the C++ Object Model semantics!

Member Initialization List:

You must use the member initialization list in the following cases in order for your program to compile:

  1. When initializing a reference member.
  2. When initializing a const member.
  3. When invoking a base or member class constructor with a set of arguments.

The order in which the list entries are set down is determined by the declaration order of the members within the class declaration, not the order within the initialization list.

In summary, the compiler iterates over and possibly reorders the initialization list to reflect the declaration order of the members. It inserts the code within the body of the constructor prior to any explicit user code.

Object Lessons

There are three general flavors of transformations required by any object model component:

  1. Implementation-dependent transformations. These are implementation-specific aspects and vary across compilers.
  2. Language semantics transformations. These include constructor/destructor synthesis and augmentation, memberwise initialization and memberwise copy support, and the insertion within program code of conversion operators, temporaries, and constructor/destructor calls.
  3. Code and object model transformations. These include support for virtual functions, virtual base classes and inheritance in general, operators new and delete, arrays of class objects, local static class instances, and the static initialization of global objects with nonconstant expressions.

There are two aspects to the C++ Object Model:

  1. Direct support for object-oriented programming provided within the language.
  2. The underlying mechanisms by which this support is implemented.

Determining when to provide a copy constructor, and when not, is not something one should guess at or have adjudicated by some language guru. It should come from an understanding of the Object Model.

Differences between C Macro and Inline Funcion:

  • Macro invocations do not perform type checking, or even check that arguments are well-formed, whereas function calls usually do.
  • You cannot make the macro return something which is not the result of the last expression invoked inside it
  • Since C macros use mere textual substitution, this may result in unintended side-effects and inefficiency due to re-evaluation of arguments and order of operations.
  • Compiler errors within macros are often difficult to understand, because they refer to the expanded code, rather than the code the programmer typed.
  • Debugging information for inlined code is usually more helpful than that of macro-expanded code.

As you will see, the primary layout and access-time overheads within C++ are associated with the virtuals, that is,

  • The virtual function mechanism in its support of an efficient run-time binding.
  • Virtual base class in its support of a single, shared instance of a base class occurring multiple times within an inheritance hierarchy.

The C++ Object Model

Say we’ve the following class

class Point



    Point( float xval );

    virtual ~Point();

    float x() const;

int PointCount();


    virtual ostream& print( ostream &os ) const;

    float _x;

    static int _point_count;

A Simple Object Model:

In a simple model, the members themselves are not placed within the object. Only pointers addressing the members are placed within the object. Doing this avoids problems from members’ being quite different types and requiring different amounts (and sometimes different types of) storage. Members within an object are addressed by their slot’s index.

Although this model is not used in practice, this simple concept of an index or slot number is the one that has been developed into the C++ pointer-to-member concept.

A Table-driven Object Model:

For an implementation to maintain a uniform representation for the objects of all classes, an alternative object model might factor out all member specific information, placing it in a data member and member function pair of tables. The class object contains the pointers to the two member tables. The member function table is a sequence of slots, with each slot addressing a member. The data member table directly holds the data.

Although this model is not used in practice within C++, the concept of a member function table has been the traditional implementation supporting efficient runtime resolution of virtual functions

The C++ Object Model:

Stroustrup’s original (and still prevailing) C++ Object Model is derived from the simple object model by optimizing for space and access time. Nonstatic data members are allocated directly within each class object. Static data members are stored outside the individual class object. Static and nonstatic function members are also hoisted outside the class object. Virtual functions are supported in two steps:

  1. A table of pointers to virtual functions is generated for each class (this is called the virtual table).
  2. A single pointer to the associated virtual table is inserted within each class object (traditionally, this has been called the vptr). The setting, resetting, and not setting of the vptr is handled automatically through code generated within each class constructor, destructor, and copy assignment operator. The type_info object associated with each class in support of runtime type identification (RTTI) is also addressed within the virtual table, usually within the table’s first slot.

Figure below illustrates the general C++ Object Model for our Point class. The primary strength of the C++ Object Model is its space and runtime efficiency. Its primary drawback is the need to recompile unmodified code that makes use of an object of a class for which there has been an addition, removal, or modification of the nonstatic class data members. (The two table model, for example, offers more flexibility by providing an additional level of indirection. But it does this at the cost of space and runtime efficiency.)

Adding Inheritance

In the case of virtual inheritance, only a single occurrence of the base class is maintained (called a subobject) regardless of how many times the class is derived from within the inheritance chain. iostream, for example, contains only a single instance of the virtual ios base class.

How might a derived class internally model its base class instance? In a simple base class object model, each base class might be assigned a slot within the derived class object. Each slot holds the address of the base class subobject. The primary drawback to this scheme is the space and access-time overhead of the indirection. A benefit is that the size of the class object is unaffected by changes in the size of its associated base classes.

Alternatively, one can imagine a base table model. Here, a base class table is generated for which each slot contains the address of an associated base class, much as the virtual table holds the address of each virtual function. Each class object contains a bptr initialized to address its base class table. The primary drawback to this strategy, of course, is both the space and access-time overhead of the indirection. One benefit is a uniform representation of inheritance within each class object. Each class object would contain a base table pointer at some fixed location regardless of the size or number of its base classes. A second benefit would be the ability to grow, shrink, or otherwise modify the base class table without changing the size of the class objects themselves.

Understanding C++ object model let programmer knows what the final code after C++ code transformations.

A C program’s trick is sometimes a C++ program’s trap. One example of this is the use of a one-element array at the end of a struct to allow individual struct objects to address variable-sized arrays:

struct mumble {

    /* stuff */

    char pc[ 1 ];


// grab a string from file or standard input

// allocate memory both for struct & string

struct mumble *pmumb1 = ( struct mumble* ) malloc(sizeof(struct mumble)+strlen(string)+1);

strcpy( &mumble.pc, string );

This may or may not translate well when placed within a class declaration that

  • Specifies multiple access sections containing data,
  • Derives from another class or is itself the object of derivation, or
  • Defines one or more virtual functions.

If a programmer absolutely needs a data portion of an arbitrarily complex C++ class to have the look and feel of an equivalent C declaration, that portion is best factored out into an independent struct declaration. The original idiom for combining this C portion with its C++ part was to derive the C++ part from the C struct:

struct C_point { … };

class Point : public C_point { … };

Thus supporting both the C and C++ usage:

extern void draw_line( Point, Point );

extern "C" void draw_rect ( C_point, C_Point );

draw_line( Point( 0, 0 ), Point( 100, 100 ));

draw_rect( Point( 0, 0 ), Point( 100, 100 ));

This idiom is no longer recommended, however, because of changes to the class inheritance layout in some compilers (for example, the Microsoft C++ compiler) in support of the virtual function mechanism. Composition, rather than inheritance, is the only portable method of combining C and C++ portions of a class (the conversion operator provides a handy extraction method):

struct C_point { … };

class Point {


    operator C_point() { return _c_point; }

    // …


    C_point _c_point;

    // …


One reasonable use of the C struct in C++, then, is when you want to pass all or part of a complex class object to a C function. This struct declaration serves to encapsulate that data and guarantees a compatible C storage layout. This guarantee, however, is maintained only under composition. Under inheritance, the compiler decides whether additional data members are inserted within the base struct subobject.

The memory requirements to represent a class object in general are the following:

  • The accumulated size of its nonstatic data members.
  • Plus any padding (between members or on the aggregate boundary itself) due to alignment constraints (or simple efficiency)
  • Plus any internally generated overhead to support the virtuals.

The memory requirement to represent a pointer, however, is a fixed size regardless of the type it addresses.

But how, then, does a pointer to a ZooAnimal differ from, say, a pointer to an integer or a pointer to a template Array instantiated with a String?

ZooAnimal *px;

int *pi

Array< String > *pta;

In terms of memory requirements, there is generally no difference: all three need to be allocated sufficient memory to hold a machine address (usually a machine word). So the difference between pointers to different types rests neither in the representation of the pointer nor in the values (addresses) the pointers may hold. The difference lies in the type of object being addressed. That is, the type of a pointer instructs the compiler as to how to interpret the memory found at a particular address and also just how much memory that interpretation should span.

When a base class object is directly initialized or assigned with a derived class object, the derived object is sliced to fit into the available memory resources of the base type. There is nothing of the derived type remaining. Polymorphism is not present, and an observant compiler can resolve an invocation of a virtual function through the object at compile time, thus by-passing the virtual mechanism. This can be a significant performance win if the virtual function is defined as inline.

C++ also supports a concrete ADT style of programming now called object-based (OB)—nonpolymorphic data types, such as a String class. A String class exhibits a nonpolymorphic form of encapsulation; it provides a public interface and private implementation (both of state and algorithm) but does not support type extension. An OB design can be faster and more compact than an equivalent OO design. Faster because all function invocations are resolved at compile time and object construction need not set up the virtual mechanism, and more compact because each class object need not carry the additional overhead traditionally associated with the support of the virtual mechanism. However, an OB design also is less flexible

Network Programming with Windows Sockets

Named pipes and mailslots are suitable for interprocess communication between processes on the same computer or processes on Windows computers connected by a local or wide area network.

Named pipes and mailslots (both simply referred to here as “named pipes” unless the distinction is important) have the distinct drawback, however, of not being an industry standard. This is the case even though named pipes are protocolindependent and can run over industry-standard protocols such as TCP/IP.

Windows provides interoperability by supporting Windows Sockets, which are nearly the same as, and interoperable with, Berkeley Sockets, a de facto industry standard.

Winsock, because of conformance to industry standards, has naming conventions and programming characteristics somewhat different from the Windows functions described so far. The Winsock API is not strictly a part of the Windows API. Winsock also provides additional functions that are not part of the standard; these functions are used only as absolutely required. Among other advantages, programs will be more portable to other operating systems.

The Winsock API was developed as an extension of the Berkeley Sockets API into the Windows environment, and all Windows versions support Winsock. Winsock’s benefits include the following:

  • Porting code already written for Berkeley Sockets is straightforward.
  • Windows machines easily integrate into TCP/IP networks, both IPv4 and IPv6. IPv6, among other features, allows for longer IP addresses, overcoming the 4-byte address limit of IPv4.
  • Sockets can be used with Windows overlapped I/O, which, among other things, allows servers to scale when there is a large number of active clients.
  • Windows provides non-portable extensions.
  • Sockets supports protocols other than TCP/IP particularly Asynchronous Transfer Mode (ATM)

Differences between Winsock and Named Pipes (General):

  • Named pipes can be message-oriented, which can simplify programs.
  • Named pipes require ReadFile and WriteFile, whereas sockets can also use send and recv.
  • Sockets, unlike named pipes, are flexible so that a user can select the protocol to use with a socket, such as TCP or UDP. The user can also select protocols based on quality of service and other factors.
  • Sockets are based on an industry standard, allowing interoperability with non-Windows machines.
  • Named pipes do not have explicit port numbers and are distinguished by name.

Differences between Winsock and Named Pipes (Server Side):

  • When using sockets, call accept repetitively to connect to multiple clients. Each call will return a different connected socket.
  • Named requires you to create each named pipe instance with CreateNamedPipe. accept creates socket instances.
  • There is no upper bound on the number of socket clients, but there can be a limit on the number of named pipe instances, depending on the first call to CreateNampedPipe.
  • There are no WinSock convenience functions comparable to TransactNamedPipe.
  • Named pipes do not have explicit port numbers and are distinguished by name.

A named pipe server requires two function calls (CreateNamedPipe and ConnectNamedPipe), whereas socket servers require four function calls (socket, bind, listen and accept)

Differences between Winsock and Named Pipes (Client Side):

  • Named pipes use WaitNamedPipe following by CreateFile. The socket sequence is in the opposite order because the function socket can be regarded as the creation function, while connect is the blocking function.
  • An additional distinction is connect that is a socket client function, while the similarly named ConnectNamedPipe is a server function.

A thread-persistence problem may occur when you’ve multiple clients and DLL which includes the WinSock processing code (accept and recv). There are two proposed solutions:

  • Using Thread Local Storage (TLS). This solution dictates that each thread must handle a specific accept request at a time.
  • Encapsulate each accept request into a structure and then pass this structure to the thread.


Datagrams are similar to mailslots and are used in similar circumstances. There is no connection between the sender and receiver, and there can be multiple receivers. Delivery to the receiver is not ensured with either mailslots or datagrams, and successive messages will not necessarily be received in the order they were sent.

Windows Pipes and Mailslots

In this article:

Anonymous Pipes.

Named Pipes.



Two primary Windows mechanisms for IPC are the anonymous pipe and the named pipe, both of which are accessed with the familiar WriteFile and ReadFIle functions. Simple anonymous pipes are character-based and half-duplex. As such, they are well suited for redirecting the output of one program to the input of another, as is common with communicating Linux and UNIX programs.

Named pipes are much more powerful than anonymous pipes. They are fullduplex and message-oriented, and they allow networked communication. Furthermore, there can be multiple open handles on the same pipe. These capabilities, coupled with convenient transaction-oriented named pipe functions, make named pipes appropriate for creating client/server systems.

Mailslots, which allow for one-to-many message broadcasting and are also filelike, are used to help clients locate servers.

Anonymous Pipes are useful for simple byte-based communication programs inside the within the same machine. Whereas Named Pipes have the following features:

Named pipes are message-oriented, so the reading process can read varyinglength messages precisely as sent by the writing process

Named pipes are bidirectional, so two processes can exchange messages over the same pipe.

There can be multiple, independent instances of pipes with the same name. For example, several clients can communicate concurrently with a single server using distinct instances of a named pipe. Each client can have its own named pipe instance, and the server can respond to a client using the same instance.

Networked clients can access the pipe by name. Named pipe communication is the same whether the two processes are on the same machine or on different machines.

Several convenience and connection functions simplify named pipe request/ response interaction and client/server connection.

Figure above shows an illustrative client/server relationship, and the pseudocode shows one scheme for using named pipes. Notice that the server creates multiple instances of the same pipe, each of which can support a client. The server also creates a thread for each named pipe instance, so that each client has a dedicated thread and named pipe instance.

When creating a named pipe, the name must be in the form “\\.\pipe\pipename” the period (.) stands for the local machine; thus, you can’t create a pipe on a remote machine. The pipename is case-insensitive, can be up to 256 characters long, and can contain any character other than backslash.

When client is about initializing an instance from a named pipe:

If the server on the same machine use: “\\.\pipe\pipename

If the server on remote machine use: “\\servername\pipe\pipename



A Windows mailslot, like a named pipe, has a name that unrelated processes can use for communication. Mailslots are a broadcast mechanism, similar to datagrams, and behave differently from named pipes, making them useful in some important but limited situations. Here are the significant mailslot characteristics:

A mailslot is one-directional

A mailslot can have multiple writers and multiple readers, but frequently it will be one-to-many of one form or the other

A writer (client) does not know for certain that all, some, or any readers (servers) actually received the message

Mailslots can be located over a network domain

Message lengths are limited

Last but not least, client can locate mailslot performing CreateFile using the name “\\*\mailslot\mailslotname“. In this way, the * acts as wildcard and the client can locate every server on the domain, a networked group of systems assigned a common name by the network administrator.

These tables summarize pipes and mailslots

Windows pipes and mailslots, which are accessed with file I/O operations, provide stream-oriented interprocess and networked communication.

Structured Exception Handling

C++, C#, and other languages have very similar mechanisms, however, and these mechanisms build on the SEH facilities presented here.

Console control handlers, allow a program to detect external signals such as Ctrl-C a from the console or the user logging off or shutting down the system. These signals also provide a limited form of process-to-process signaling.

Vectored exception handling allows the user to specify functions to be executed directly when an exception occurs, and the functions are executed before SEH is invoked.

Exception handlers can respond to a variety of asynchronous events, but they do not detect situations such as the user logging off or entering a Ctrl-C from the keyboard to stop a program. Use console control handlers to detect such events.

There is important distinction between exceptions and signals. A signal applies to the entire process, whereas an exception applies only to the thread executing the code where the exception occurs.

Exception handling functions can be directly associated with exceptions, just as console control handlers can be associated with console control events. When an exception occurs, the vectored exception handlers are called first, before the system unwinds the stack to look for structured exception handlers.

Console control handlers can respond to external events that do not generate exceptions. VEH is a newer feature that allows functions to be executed before SEH processing occurs. VEH is similar to conventional interrupt handling.

DLL Injection and API Hooking

Situations that require breaking through process boundary walls to access another process’ address space include the following:

  • When you want to subclass a window created by another process
  • When you need debugging aids—for example, when you need to determine which dynamic-link libraries (DLLs) another process is using
  • When you want to hook other processes

In computer programming, DLL injection is a technique used to run code within the address space of another process by forcing it to load a dynamic-link library. DLL injection is often used by third-party developers to influence the behavior of a program in a way its authors did not anticipate or intend. For example, the injected code could trap system function calls, or read the contents of password textboxes, which cannot be done the usual way.

Common DLL injection approaches:

  • Using the registry
  • Using windows hooks
  • Using remote threads
  • Using Trojans
  • As a debugger
  • Injecting code with CreateProcess

Of all the methods for injecting a DLL, using the regsitry by far the easiest. All you do is add two values to an already existing registry key. But this technique also has some disadvantages:

  • Your DLL is mapped only into processes that use User32.dll. All GUI-based applications use User32.dll, but most CUI-based applications do not. So if you need to inject your DLL into a compiler or linker, this method won’t work.
  • Your DLL is mapped into every GUI-based application, but you probably need to inject your library into only one or a few processes. The more processes your DLL is mapped into, the greater the chance of crashing the “container” processes. After all, threads running in these processes are executing your code. If your code enters an infinite loop or accesses memory incorrectly, you affect the behavior and robustness of the processes in which your code runs. Therefore, it is best to inject your library into as few processes as possible.
  • Your DLL is mapped into every GUI-based application for its entire lifetime. This is similar to the previous problem. Ideally, your DLL should be mapped into just the processes you need, and it should be mapped into those processes for the minimum amount of time. Suppose that when the user invokes your application, you want to subclass WordPad’s main window. Your DLL doesn’t have to be mapped into WordPad’s address space until the user invokes your application. If the user later decides to terminate your application, you’ll want to unsubclass WordPad’s main window. In this case, your DLL no longer needs to be injected into WordPad’s address space. It’s best to keep your DLL injected only when necessary

Following are steps to inject a DLL using Remote Threads:

  1. Use the VirtualAllocEx function to allocate memory in the remote process’ address space.
  2. Use the WriteProcessMemory function to copy the DLL’s pathname to the memory allocated in step 1.
  3. Use the GetProcAddress function to get the real address (inside Kernel32.dll) of the LoadLibraryW or LoadLibraryA function.
  4. Use the CreateRemoteThread function to create a thread in the remote process that calls the proper LoadLibrary function, passing it the address of the memory allocated in step 1. At this point, the DLL has been injected into the remote process’ address space, and the DLL’s DllMain function receives a DLL_PROCESS_ATTACH notification and can execute the desired code. When DllMain returns, the remote thread returns from its call to Load-LibraryW/A back to the BaseThreadStart function. BaseThreadStart then calls ExitThread, causing the remote thread to die.

    Now the remote process has the block of storage allocated in step 1 and the DLL still stuck in its address space. To clean this stuff up, we need to execute the following steps after the remote thread exists:

  5. Use the VirtualFreeEx function to free the memory allocated in step 1.
  6. Use the GetProcAddress function to get the real address (inside Kernel32.dll) of the FreeLibrary function.
  7. Use the CreateRemoteThread function to create a thread in the remote process that calls the FreeLibrary function, passing the remote DLL’s HMODULE.

Another way to inject a DLL is to replace a DLL that you know a process will load. For example, if you know that a process will load Xyz.dll, you can create your own DLL and give it the same filename. Of course, you must rename the original Xyz.dll to something else.

Inside your Xyz.dll, you must export all the same symbols that the original Xyz.dll exported. You can do this easily using function forwarders, which make it trivially simple to hook certain functions, but you should avoid using this technique because it is not version-resilient. If you replace a system DLL, for example, and Microsoft adds new functions in the future, your DLL will not have function forwarders for them. Applications that reference these new functions will be unable to load and execute.

If you have just a single application in which you want to use this technique, you can give your DLL a unique name and change the import section of the application’s .exe module. More specifically, the import section contains the names of the DLLs required by a module. You can rummage through this import section in the file and alter it so that the loader loads your own DLL. This technique is not too bad, but you have to be pretty familiar with the .exe and DLL file formats

API Hooking

Jeff mentioned an example that is very suitable for using API Hooking, he said:

“For example, I know of a company that produced a DLL that was loaded by a database product. The DLL’s job was to enhance and extend the capabilities of the database product. When the database product was terminated, the DLL received a DLL_PROCESS_DETACH notification and only then executed all of its cleanup code. The DLL would call functions in other DLLs to close socket connections, files, and other resources, but by the time it received the DLL_PROCESS_DETACH notification, other DLLs in the process’ address space had already gotten their DLL_PROCESS_DETACH notifications. So when the company’s DLL tried to clean up, many of the functions it called would fail because the other DLLs had already uninitialized.

The company hired me to help them solve this problem, and I suggested that we hook the ExitProcess function. As you know, calling ExitProcess causes the system to notify the DLLs with DLL_PROCESS_DETACH notifications. By hooking the ExitProcess function, we ensured that the company’s DLL was notified when ExitProcess was called. This notification would come in before any DLLs got a DLL_PROCESS_DETACH notification; therefore, all the DLLs in the process were still initialized and functioning properly. At this point, the company’s DLL would know that the process was about to terminate and could perform all of its cleanup successfully. Then the operating system’s ExitProcess function would be called, causing all the DLLs to receive their DLL_PROCESS_DETACH notifications and clean up. The company’s DLL would have no special cleanup to perform when it received this notification because it had already done what it needed to do”

Hook function is the function which replaces the original process function.

API Hooking Methodologies:

  • API Hooking by overwriting code.
  • API Hooking by Manipulating a Module’s Import Section

API Hooking by overwriting code steps:

  1. Save the first few bytes of this function in some memory of your own.
  2. You overwrite the first few bytes of this function with a JUMP CPU instruction that jumps to the memory address of your replacement function. Of course, your replacement function must have exactly the same signature as the function you’re hooking: all the parameters must be the same, the return value must be the same, and the calling convention must be the same.
  3. Now, when a thread calls the hooked function, the JUMP instruction will actually jump to your replacement function. At this point, you can execute whatever code you’d like.
  4. You unhook the function by taking the saved bytes (from step 2) and placing them back at the beginning of the hooked function.
  5. You call the hooked function (which is no longer hooked), and the function performs its normal processing.
  6. When the original function returns, you execute steps 2 and 3 again so that your replacement function will be called in the future.

This method was heavily used by 16-bit Windows programmers and worked just fine in that environment. Today, this method has several serious shortcomings, and I strongly discourage its use. First, it is CPU-dependent: JUMP instructions on x86, x64, IA-64, and other CPUs are different, and you must use hand-coded machine instructions to get this technique to work. Second, this method doesn’t work at all in a preemptive, multithreaded environment. It takes time for a thread to overwrite the code at the beginning of a function. While the code is being overwritten, another thread might attempt to call the same function. The results are disastrous! So this method works only if you know that no more than one thread will attempt to call a particular function at any given time.

API Hooking by Manipulating a Module’s Import Section

As it turns out, another API hooking technique solves both of the problems I’ve mentioned. This technique is easy to implement and is quite robust. But to understand it, you must understand how dynamic linking works. In particular, you must understand what’s contained in a module’s import section.

As you know, a module’s import section contains the set of DLLs that the module requires in order to run. In addition, it contains the list of symbols that the module imports from each of the DLLs. When the module places a call to an imported function, the thread actually grabs the address of the desired imported function from the module’s import section and then jumps to that address.

So, to hook a particular function, all you do is change the address in the module’s import section. That’s it. No CPU-dependent stuff. And because you’re not modifying the function’s code in any way, you don’t need to worry about any thread synchronization issues.

You should take into consideration delay loaded DLLs, in particular you can hook LoadLibrary* to first Load the DLL then hook its desired functions.