Network Programming with Windows Sockets

Named pipes and mailslots are suitable for interprocess communication between processes on the same computer or processes on Windows computers connected by a local or wide area network.

Named pipes and mailslots (both simply referred to here as “named pipes” unless the distinction is important) have the distinct drawback, however, of not being an industry standard. This is the case even though named pipes are protocolindependent and can run over industry-standard protocols such as TCP/IP.

Windows provides interoperability by supporting Windows Sockets, which are nearly the same as, and interoperable with, Berkeley Sockets, a de facto industry standard.

Winsock, because of conformance to industry standards, has naming conventions and programming characteristics somewhat different from the Windows functions described so far. The Winsock API is not strictly a part of the Windows API. Winsock also provides additional functions that are not part of the standard; these functions are used only as absolutely required. Among other advantages, programs will be more portable to other operating systems.

The Winsock API was developed as an extension of the Berkeley Sockets API into the Windows environment, and all Windows versions support Winsock. Winsock’s benefits include the following:

  • Porting code already written for Berkeley Sockets is straightforward.
  • Windows machines easily integrate into TCP/IP networks, both IPv4 and IPv6. IPv6, among other features, allows for longer IP addresses, overcoming the 4-byte address limit of IPv4.
  • Sockets can be used with Windows overlapped I/O, which, among other things, allows servers to scale when there is a large number of active clients.
  • Windows provides non-portable extensions.
  • Sockets supports protocols other than TCP/IP particularly Asynchronous Transfer Mode (ATM)

Differences between Winsock and Named Pipes (General):

  • Named pipes can be message-oriented, which can simplify programs.
  • Named pipes require ReadFile and WriteFile, whereas sockets can also use send and recv.
  • Sockets, unlike named pipes, are flexible so that a user can select the protocol to use with a socket, such as TCP or UDP. The user can also select protocols based on quality of service and other factors.
  • Sockets are based on an industry standard, allowing interoperability with non-Windows machines.
  • Named pipes do not have explicit port numbers and are distinguished by name.

Differences between Winsock and Named Pipes (Server Side):

  • When using sockets, call accept repetitively to connect to multiple clients. Each call will return a different connected socket.
  • Named requires you to create each named pipe instance with CreateNamedPipe. accept creates socket instances.
  • There is no upper bound on the number of socket clients, but there can be a limit on the number of named pipe instances, depending on the first call to CreateNampedPipe.
  • There are no WinSock convenience functions comparable to TransactNamedPipe.
  • Named pipes do not have explicit port numbers and are distinguished by name.

A named pipe server requires two function calls (CreateNamedPipe and ConnectNamedPipe), whereas socket servers require four function calls (socket, bind, listen and accept)

Differences between Winsock and Named Pipes (Client Side):

  • Named pipes use WaitNamedPipe following by CreateFile. The socket sequence is in the opposite order because the function socket can be regarded as the creation function, while connect is the blocking function.
  • An additional distinction is connect that is a socket client function, while the similarly named ConnectNamedPipe is a server function.

A thread-persistence problem may occur when you’ve multiple clients and DLL which includes the WinSock processing code (accept and recv). There are two proposed solutions:

  • Using Thread Local Storage (TLS). This solution dictates that each thread must handle a specific accept request at a time.
  • Encapsulate each accept request into a structure and then pass this structure to the thread.

Datagrams

Datagrams are similar to mailslots and are used in similar circumstances. There is no connection between the sender and receiver, and there can be multiple receivers. Delivery to the receiver is not ensured with either mailslots or datagrams, and successive messages will not necessarily be received in the order they were sent.

Windows Pipes and Mailslots

In this article:

Anonymous Pipes.

Named Pipes.

Mailslots.

Pipes:

Two primary Windows mechanisms for IPC are the anonymous pipe and the named pipe, both of which are accessed with the familiar WriteFile and ReadFIle functions. Simple anonymous pipes are character-based and half-duplex. As such, they are well suited for redirecting the output of one program to the input of another, as is common with communicating Linux and UNIX programs.

Named pipes are much more powerful than anonymous pipes. They are fullduplex and message-oriented, and they allow networked communication. Furthermore, there can be multiple open handles on the same pipe. These capabilities, coupled with convenient transaction-oriented named pipe functions, make named pipes appropriate for creating client/server systems.

Mailslots, which allow for one-to-many message broadcasting and are also filelike, are used to help clients locate servers.

Anonymous Pipes are useful for simple byte-based communication programs inside the within the same machine. Whereas Named Pipes have the following features:

Named pipes are message-oriented, so the reading process can read varyinglength messages precisely as sent by the writing process

Named pipes are bidirectional, so two processes can exchange messages over the same pipe.

There can be multiple, independent instances of pipes with the same name. For example, several clients can communicate concurrently with a single server using distinct instances of a named pipe. Each client can have its own named pipe instance, and the server can respond to a client using the same instance.

Networked clients can access the pipe by name. Named pipe communication is the same whether the two processes are on the same machine or on different machines.

Several convenience and connection functions simplify named pipe request/ response interaction and client/server connection.

Figure above shows an illustrative client/server relationship, and the pseudocode shows one scheme for using named pipes. Notice that the server creates multiple instances of the same pipe, each of which can support a client. The server also creates a thread for each named pipe instance, so that each client has a dedicated thread and named pipe instance.

When creating a named pipe, the name must be in the form “\\.\pipe\pipename” the period (.) stands for the local machine; thus, you can’t create a pipe on a remote machine. The pipename is case-insensitive, can be up to 256 characters long, and can contain any character other than backslash.

When client is about initializing an instance from a named pipe:

If the server on the same machine use: “\\.\pipe\pipename

If the server on remote machine use: “\\servername\pipe\pipename

 

Mailslots:

A Windows mailslot, like a named pipe, has a name that unrelated processes can use for communication. Mailslots are a broadcast mechanism, similar to datagrams, and behave differently from named pipes, making them useful in some important but limited situations. Here are the significant mailslot characteristics:

A mailslot is one-directional

A mailslot can have multiple writers and multiple readers, but frequently it will be one-to-many of one form or the other

A writer (client) does not know for certain that all, some, or any readers (servers) actually received the message

Mailslots can be located over a network domain

Message lengths are limited

Last but not least, client can locate mailslot performing CreateFile using the name “\\*\mailslot\mailslotname“. In this way, the * acts as wildcard and the client can locate every server on the domain, a networked group of systems assigned a common name by the network administrator.

These tables summarize pipes and mailslots

Windows pipes and mailslots, which are accessed with file I/O operations, provide stream-oriented interprocess and networked communication.

Structured Exception Handling

C++, C#, and other languages have very similar mechanisms, however, and these mechanisms build on the SEH facilities presented here.

Console control handlers, allow a program to detect external signals such as Ctrl-C a from the console or the user logging off or shutting down the system. These signals also provide a limited form of process-to-process signaling.

Vectored exception handling allows the user to specify functions to be executed directly when an exception occurs, and the functions are executed before SEH is invoked.

Exception handlers can respond to a variety of asynchronous events, but they do not detect situations such as the user logging off or entering a Ctrl-C from the keyboard to stop a program. Use console control handlers to detect such events.

There is important distinction between exceptions and signals. A signal applies to the entire process, whereas an exception applies only to the thread executing the code where the exception occurs.

Exception handling functions can be directly associated with exceptions, just as console control handlers can be associated with console control events. When an exception occurs, the vectored exception handlers are called first, before the system unwinds the stack to look for structured exception handlers.

Console control handlers can respond to external events that do not generate exceptions. VEH is a newer feature that allows functions to be executed before SEH processing occurs. VEH is similar to conventional interrupt handling.

DLL Injection and API Hooking

Situations that require breaking through process boundary walls to access another process’ address space include the following:

  • When you want to subclass a window created by another process
  • When you need debugging aids—for example, when you need to determine which dynamic-link libraries (DLLs) another process is using
  • When you want to hook other processes

In computer programming, DLL injection is a technique used to run code within the address space of another process by forcing it to load a dynamic-link library. DLL injection is often used by third-party developers to influence the behavior of a program in a way its authors did not anticipate or intend. For example, the injected code could trap system function calls, or read the contents of password textboxes, which cannot be done the usual way.

Common DLL injection approaches:

  • Using the registry
  • Using windows hooks
  • Using remote threads
  • Using Trojans
  • As a debugger
  • Injecting code with CreateProcess

Of all the methods for injecting a DLL, using the regsitry by far the easiest. All you do is add two values to an already existing registry key. But this technique also has some disadvantages:

  • Your DLL is mapped only into processes that use User32.dll. All GUI-based applications use User32.dll, but most CUI-based applications do not. So if you need to inject your DLL into a compiler or linker, this method won’t work.
  • Your DLL is mapped into every GUI-based application, but you probably need to inject your library into only one or a few processes. The more processes your DLL is mapped into, the greater the chance of crashing the “container” processes. After all, threads running in these processes are executing your code. If your code enters an infinite loop or accesses memory incorrectly, you affect the behavior and robustness of the processes in which your code runs. Therefore, it is best to inject your library into as few processes as possible.
  • Your DLL is mapped into every GUI-based application for its entire lifetime. This is similar to the previous problem. Ideally, your DLL should be mapped into just the processes you need, and it should be mapped into those processes for the minimum amount of time. Suppose that when the user invokes your application, you want to subclass WordPad’s main window. Your DLL doesn’t have to be mapped into WordPad’s address space until the user invokes your application. If the user later decides to terminate your application, you’ll want to unsubclass WordPad’s main window. In this case, your DLL no longer needs to be injected into WordPad’s address space. It’s best to keep your DLL injected only when necessary

Following are steps to inject a DLL using Remote Threads:

  1. Use the VirtualAllocEx function to allocate memory in the remote process’ address space.
  2. Use the WriteProcessMemory function to copy the DLL’s pathname to the memory allocated in step 1.
  3. Use the GetProcAddress function to get the real address (inside Kernel32.dll) of the LoadLibraryW or LoadLibraryA function.
  4. Use the CreateRemoteThread function to create a thread in the remote process that calls the proper LoadLibrary function, passing it the address of the memory allocated in step 1. At this point, the DLL has been injected into the remote process’ address space, and the DLL’s DllMain function receives a DLL_PROCESS_ATTACH notification and can execute the desired code. When DllMain returns, the remote thread returns from its call to Load-LibraryW/A back to the BaseThreadStart function. BaseThreadStart then calls ExitThread, causing the remote thread to die.

    Now the remote process has the block of storage allocated in step 1 and the DLL still stuck in its address space. To clean this stuff up, we need to execute the following steps after the remote thread exists:

  5. Use the VirtualFreeEx function to free the memory allocated in step 1.
  6. Use the GetProcAddress function to get the real address (inside Kernel32.dll) of the FreeLibrary function.
  7. Use the CreateRemoteThread function to create a thread in the remote process that calls the FreeLibrary function, passing the remote DLL’s HMODULE.

Another way to inject a DLL is to replace a DLL that you know a process will load. For example, if you know that a process will load Xyz.dll, you can create your own DLL and give it the same filename. Of course, you must rename the original Xyz.dll to something else.

Inside your Xyz.dll, you must export all the same symbols that the original Xyz.dll exported. You can do this easily using function forwarders, which make it trivially simple to hook certain functions, but you should avoid using this technique because it is not version-resilient. If you replace a system DLL, for example, and Microsoft adds new functions in the future, your DLL will not have function forwarders for them. Applications that reference these new functions will be unable to load and execute.

If you have just a single application in which you want to use this technique, you can give your DLL a unique name and change the import section of the application’s .exe module. More specifically, the import section contains the names of the DLLs required by a module. You can rummage through this import section in the file and alter it so that the loader loads your own DLL. This technique is not too bad, but you have to be pretty familiar with the .exe and DLL file formats

API Hooking

Jeff mentioned an example that is very suitable for using API Hooking, he said:

“For example, I know of a company that produced a DLL that was loaded by a database product. The DLL’s job was to enhance and extend the capabilities of the database product. When the database product was terminated, the DLL received a DLL_PROCESS_DETACH notification and only then executed all of its cleanup code. The DLL would call functions in other DLLs to close socket connections, files, and other resources, but by the time it received the DLL_PROCESS_DETACH notification, other DLLs in the process’ address space had already gotten their DLL_PROCESS_DETACH notifications. So when the company’s DLL tried to clean up, many of the functions it called would fail because the other DLLs had already uninitialized.

The company hired me to help them solve this problem, and I suggested that we hook the ExitProcess function. As you know, calling ExitProcess causes the system to notify the DLLs with DLL_PROCESS_DETACH notifications. By hooking the ExitProcess function, we ensured that the company’s DLL was notified when ExitProcess was called. This notification would come in before any DLLs got a DLL_PROCESS_DETACH notification; therefore, all the DLLs in the process were still initialized and functioning properly. At this point, the company’s DLL would know that the process was about to terminate and could perform all of its cleanup successfully. Then the operating system’s ExitProcess function would be called, causing all the DLLs to receive their DLL_PROCESS_DETACH notifications and clean up. The company’s DLL would have no special cleanup to perform when it received this notification because it had already done what it needed to do”

Hook function is the function which replaces the original process function.

API Hooking Methodologies:

  • API Hooking by overwriting code.
  • API Hooking by Manipulating a Module’s Import Section

API Hooking by overwriting code steps:

  1. Save the first few bytes of this function in some memory of your own.
  2. You overwrite the first few bytes of this function with a JUMP CPU instruction that jumps to the memory address of your replacement function. Of course, your replacement function must have exactly the same signature as the function you’re hooking: all the parameters must be the same, the return value must be the same, and the calling convention must be the same.
  3. Now, when a thread calls the hooked function, the JUMP instruction will actually jump to your replacement function. At this point, you can execute whatever code you’d like.
  4. You unhook the function by taking the saved bytes (from step 2) and placing them back at the beginning of the hooked function.
  5. You call the hooked function (which is no longer hooked), and the function performs its normal processing.
  6. When the original function returns, you execute steps 2 and 3 again so that your replacement function will be called in the future.

This method was heavily used by 16-bit Windows programmers and worked just fine in that environment. Today, this method has several serious shortcomings, and I strongly discourage its use. First, it is CPU-dependent: JUMP instructions on x86, x64, IA-64, and other CPUs are different, and you must use hand-coded machine instructions to get this technique to work. Second, this method doesn’t work at all in a preemptive, multithreaded environment. It takes time for a thread to overwrite the code at the beginning of a function. While the code is being overwritten, another thread might attempt to call the same function. The results are disastrous! So this method works only if you know that no more than one thread will attempt to call a particular function at any given time.

API Hooking by Manipulating a Module’s Import Section

As it turns out, another API hooking technique solves both of the problems I’ve mentioned. This technique is easy to implement and is quite robust. But to understand it, you must understand how dynamic linking works. In particular, you must understand what’s contained in a module’s import section.

As you know, a module’s import section contains the set of DLLs that the module requires in order to run. In addition, it contains the list of symbols that the module imports from each of the DLLs. When the module places a call to an imported function, the thread actually grabs the address of the desired imported function from the module’s import section and then jumps to that address.

So, to hook a particular function, all you do is change the address in the module’s import section. That’s it. No CPU-dependent stuff. And because you’re not modifying the function’s code in any way, you don’t need to worry about any thread synchronization issues.

You should take into consideration delay loaded DLLs, in particular you can hook LoadLibrary* to first Load the DLL then hook its desired functions.

Thread-Local Storage

All threads of a process share its virtual address space. The local variables of a function are unique to each thread that runs the function. However, the static and global variables are shared by all threads in the process. With thread local storage (TLS), you can provide unique data for each thread that the process can access using a global index. One thread allocates the index, which can be used by the other threads to retrieve the unique data associated with the index.

There are two types of Thread-Local Storage (TLS):

  • Dynamic TLS.
  • Static TLS.

When the compiler compiles your program, it puts all the TLS variables into their own section, which is named, unsurprisingly enough, .tls. The linker combines all the .tls sections from all the object modules to produce one big .tls section in the resulting executable or DLL file.

For static TLS to work, the operating system must get involved. When your application is loaded into memory, the system looks for the .tls section in your executable file and dynamically allocates a block of memory large enough to hold all the static TLS variables. Every time the code in your application refers to one of these variables, the reference resolves to a memory location contained in the allocated block of memory. As a result, the compiler must generate additional code to reference the static TLS variables, which makes your application both larger in size and slower to execute. On an x86 CPU, three additional machine instructions are generated for every reference to a static TLS variable.

If another thread is created in your process, the system traps it and automatically allocates another block of memory to contain the new thread’s static TLS variables. The new thread has access only to its own static TLS variables and cannot access the TLS variables belonging to any other thread.

That’s basically how static TLS works. Now let’s add DLLs to the story. It’s likely that your application will use static TLS variables and that you’ll link to a DLL that also wants to use static TLS variables. When the system loads your application, it first determines the size of your application’s .tls section and adds the value to the size of any .tls sections in any DLLs to which your application links. When threads are created in your process, the system automatically allocates a block of memory large enough to hold all the TLS variables required by your application and all the implicitly linked DLLs. This is pretty cool.

DLL Advanced Techniques

For a thread to call a function in a DLL module, the DLL’s file image must be mapped into the address space of the calling thread’s process. You can accomplish this in two ways. The first way is to have your application’s source code simply reference symbols contained in the DLL. This causes the loader to implicitly load (and link) the required DLL when the application is invoked.

The second way is for the application to explicitly load the required DLL and explicitly link to the desired exported symbol while the application is running. In other words, while the application is running, a thread within it can decide that it wants to call a function within a DLL. That thread can explicitly load the DLL into the process’ address space, get the virtual memory address of a function contained within the DLL, and then call the function using this memory address. The beauty of this technique is that everything is done while the application is running.

Figure below shows how an application explicitly loads a DLL and links to a symbol within it

If you pass NULL to GetModuleHandle, the handle of the application executable is returned.

You can explicitly load a DLL through the WinAPI LoadLibrary(Ex)

You can explicitly unload a DLL through WInAPI FreeLibrary or FreeLibraryAndExitThread

You can explicitly link to an exported symbol from a DLL using WinAPI GetProcAddress

The Platform SDK documentation states that your DllMain function should perform only simple initialization, such as setting up thread-local storage, creating kernel objects, and opening files. You must also avoid calls to User, Shell, ODBC, COM, RPC, and socket functions (or functions that call these functions) because their DLLs might not have initialized yet or the functions might call LoadLibrary(Ex) internally, again creating a dependency loop.

If a process terminates because some thread in the system calls TerminateProcess, the system does not call the DLL’s DllMain function with a value of DLL_PROCESS_DETACH. This means that any DLLs mapped into the process’ address space do not have a chance to perform any cleanup before the process terminates. This can result in the loss of data. You should use the TerminateProcess function only as a last resort.

Figure below explains what happens when a thread call LoadLibrary

And this figure when FreeLibrary is called

Normally, you don’t even think about this DllMain serialization. The reason I’m making a big deal out of it is that you can have a bug in your code caused by DllMain serialization. This code looked something like the following code:

BOOL WINAPI DllMain(HINSTANCE hInstDll, DWORD fdwReason, PVOID fImpLoad) {
   HANDLE hThread;
   DWORD dwThreadId;

   switch (fdwReason) {

   case DLL_PROCESS_ATTACH:

      // The DLL is being mapped into the process' address space.

      // Create a thread to do some stuff.

      hThread = CreateThread(NULL, 0, SomeFunction, NULL, 0, &dwThreadId);

      // Suspend our thread until the new thread terminates.
      WaitForSingleObject(hThread, INFINITE);
      // We no longer need access to the new thread.
      CloseHandle(hThread);
      break;
   case DLL_THREAD_ATTACH:
      // A thread is being created.
      break;
   case DLL_THREAD_DETACH:
      // A thread is exiting cleanly.
      break;
   case DLL_PROCESS_DETACH:
      // The DLL is being unmapped from the process' address space.
      break;
   }

   return(TRUE);

}

 

It may took several hours to discover the problem with the code. Can you see it? When DllMain receives a DLL_PROCESS_ATTACH notification, a new thread is created. The system must call DllMain again with a value of DLL_THREAD_ATTACH. However, the new thread is suspended because the thread that caused the DLL_PROCESS_ATTACH notification to be sent to DllMain has not finished processing. The problem is the call to WaitForSingleObject. This function suspends the currently executing thread until the new thread terminates. However, the new thread never gets a chance to run, let alone terminate, because it is suspended—waiting for the current thread to exit the DllMain function. What we have here is a deadlock situation. Both threads are suspended forever!

Delay Loading a DLL

Microsoft Visual C++ offers a fantastic feature to make working with DLLs easier: delay-load DLLs. A delay-load DLL is a DLL that is implicitly linked but not actually loaded until your code attempts to reference a symbol contained within the DLL. Delay-load DLLs are helpful in these situations:

  • If your application uses several DLLs, its initialization time might be slow because the loader maps all the required DLLs into the process’ address space. One way to alleviate this problem is to spread out the loading of the DLLs as the process executes. Delay-load DLLs let you accomplish this easily.
  • If you call a new function in your code and then try to run your application on an older version of the system in which the function does not exist, the loader reports an error and does not allow the application to run. You need a way to allow your application to run and then, if you detect (at run time) that the application is running on an older system, you don’t call the missing function. For example, let’s say that an application wants to use the new Thread Pool functions when running on Windows Vista and the old functions when running on older versions of Windows. When the application initializes, it calls GetVersionEx to determine the host operating system and properly calls the appropriate functions. Attempting to run this application on versions of Windows older than Windows Vista causes the loader to display an error message because the new Thread Pool functions don’t exist on these operating systems. Again, delay-load DLLs let you solve this problem easily.

However, a couple of limitations are worth mentioning:

  • It is not possible to delay-load a DLL that exports fields.
  • The Kernel32.dll module cannot be delay-loaded because it must be loaded for LoadLibrary and GetProcAddress to be called.
  • You should not call a delay-load function in a DllMain entry point because the process might crash.

Let’s start with the easy stuff: getting delay-load DLLs to work. First, you create a DLL just as you normally would. You also create an executable as you normally would, but you do have to change a couple of linker switches and relink the executable. Here are the two linker switches you need to add:

  • /Lib:DelayImp.lib
  • /DelayLoad:MyDll.dll

To unload a delay-loaded DLL, you must do two things. First, you must specify an additional linker switch (/Delay:unload) when you build your executable file. Second, you must modify your source code and place a call to the __FUnloadDelayLoadedDLL2 function at the point where you want the DLL to be unloaded:

BOOL __FUnloadDelayLoadedDLL2(PCSTR szDll);

You can take advantage of function forwarders in your DLL module as well. The easiest way to do this is by using a pragma directive, as shown here:

// Function forwarders to functions in DllWork

#pragma comment(linker, "/export:SomeFunc=DllWork.SomeOtherFunc")

This pragma tells the linker that the DLL being compiled should export a function called SomeFunc. But the actual implementation of SomeFunc is in another function called SomeOtherFunc, which is contained in a module called DllWork.dll. You must create separate pragma lines for each function you want to forward.

Applications can depend on a specific version of a shared DLL and start to fail if another application is installed with a newer or older version of the same DLL. There are two ways to ensure that your application uses the correct DLL: DLL redirection and side-by-side components. Developers and administrators should use DLL redirection for existing applications, because it does not require any changes to the application. If you are creating a new application or updating an application and want to isolate your application from potential problems, create a side-by-side component.

Rebasing Modules

 

When this executable module is invoked, the operating system loader creates a virtual address for the new process. Then the loader maps the executable 
module at memory address 0x00400000 and the DLL module at 0x10000000. Why is this preferred base address so important? Let's look at this code:
int g_x;

 

void Func() {

   g_x = 5; // This is the important line.

}

When the compiler processes the Func function, the compiler and linker produce machine code that looks something like this:

MOV   [0x00414540], 5

In other words, the compiler and linker have created machine code that is actually hard-coded in the address of the g_x variable: 0x00414540. This address is in the machine code and absolutely identifies the location of the g_x variable in the process’ address space. But, of course, this memory address is correct if and only if the executable module loads at its preferred base address: 0x00400000.

What if we had the exact same code in a DLL module? In that case, the compiler and linker would generate machine code that looks something like this:

MOV   [0x10014540], 5

Again, notice that the virtual memory address for the DLL's g_x variable is hard-coded in the DLL file's image on the disk drive. And again, this memory address is absolutely correct as long as the DLL does in fact load at its preferred base address.

Relocating an executable (or DLL) module is an absolutely horrible process, and you should take measures to avoid it. Let's see why. 
Suppose that the loader relocates the second DLL to address 0x20000000. In that case, the code that changes the g_x variable to 5 should be

MOV   [0x20014540], 5

But the code in the file’s image looks like this:

 

MOV   [0x10014540], 5

 

If the code from the file’s image is allowed to execute, some 4-byte value in the first DLL module will be overwritten with the value 5. This can’t possibly be allowed. The loader must somehow fix this code. When the linker builds your module, it embeds a relocation section in the resulting file. This section contains a list of byte offsets. Each byte offset identifies a memory address used by a machine code instruction. If the loader can map a module at its preferred base address, the module’s relocation section is never accessed by the system. This is certainly what we want—you never want the relocation section to be used.

If, on the other hand, the module cannot be mapped at its preferred base address, the loader opens the module’s relocation section and iterates though all the entries. For each entry found, the loader goes to the page of storage that contains the machine code instruction to be modified. It then grabs the memory address that the machine instruction is currently using and adds to the address the difference between the module’s preferred base address and the address where the module actually got mapped.

So, in the preceding example, the second DLL was mapped at 0x20000000 but its preferred base address is 0x10000000. This yields a difference of 0x10000000, which is then added to the address in the machine code instruction, giving us this:

MOV   [0x20014540], 5

Now this code in the second DLL will reference its g_x variable correctly.

There are two major drawbacks when a module cannot load at its preferred base address:

  • The loader has to iterate through the relocation section and modify a lot of the module’s code. This produces a major performance hit and can really hurt an application’s initialization time.
  • As the loader writes to the module’s code pages, the system’s copy-on-write mechanism forces these pages to be backed by the system’s paging file.

By the way, you can create an executable or DLL module that doesn’t have a relocation section in it. You do this by passing the /FIXED switch to the linker when you build the module. Using this switch makes the module smaller in bytes, but it means that the module cannot be relocated. If the module cannot load at its preferred base address, it cannot load at all. If the loader must relocate a module but no relocation section exists for the module, the loader kills the entire process and displays an "Abnormal Process Termination" message to the user.

Preferred base addresses must always start on an allocation-granularity boundary.

Visual Studio ships a tool called rebase utility this utility do the following when you call ReBaseImage:

When you execute Rebase, passing it a set of image filenames, it does the following:

  1. It simulates creating a process’ address space.
  2. It opens all the modules that would normally be loaded into this address space. It thus gets the preferred base address and size of each module.
  3. It simulates relocating the modules in the simulated address space so that none of the modules overlap.
  4. For the relocated modules, it parses that module’s relocation section and modifies the code in the module file on disk.
  5. It updates the header of each relocated module to reflect the new preferred base address.

Binding Modules

Rebasing is very important and greatly improves the performance of the entire system. However, you can do even more to improve performance. Let’s say that you have properly rebased all your application’s modules. The loader writes the symbol’s virtual address into the executable module’s import section. This allows references to the imported symbols to actually get to the correct memory location.

Let’s think about this for a second. If the loader is writing the virtual addresses of the imported symbols into the .exe module’s import section, the pages that back the import section are written to. Because these pages are copy-on-write, the pages are backed by the paging file. So we have a problem that is similar to the rebasing problem: portions of the image file are swapped to and from the system’s paging file instead of being discarded and reread from the file’s disk image when necessary. Also, the loader has to resolve the addresses of all the imported symbols (for all modules), which can be time-consuming.

You can use the technique of binding a module so that your application can initialize faster and use less storage. Binding a module prepares that module’s import section with the virtual addresses of all the imported symbols. To improve initialization time and to use less storage, you must do this before loading the module, of course.

Visual Studio ships a bind.exe utility that do the following when you execute Bind passing it an image name:

  1. It opens the specified image file’s import section.
  2. For every DLL listed in the import section, it opens the DLL file and looks in its header to determine its preferred base address.
  3. It looks up each imported symbol in the DLL’s export section.
  4. It takes the RVA of the symbol and adds to it the module’s preferred base address. It writes the resulting expected virtual address of the imported symbol to the image file’s import section.
  5. It adds some additional information to the image file’s import section. This information includes the name of all DLL modules that the image is bound to and the time stamp of those modules.

OK, so now you know that you should bind all the modules that you ship with your application. But when should you perform the bind? If you bind your modules at your company, you would bind them to the system DLLs that you’ve installed, which are unlikely to be what the user has installed. Because you don’t know if your user is running Windows XP, Windows 2003, or Windows Vista, or whether these have service packs installed, you should perform binding as part of your application’s setup.