Working with files is something almost every application must do, and it’s always a hassle. Should your application open the file, read it, and close the file, or should it open the file and use a buffering algorithm to read from and write to different portions of the file? Microsoft Windows offers the best of both worlds: memory-mapped files.
Like virtual memory, memory-mapped files allow you to reserve a region of address space and commit physical storage to the region. The difference is that the physical storage comes from a file that is already on the disk instead of the system’s paging file. Once the file has been mapped, you can access it as if the whole file were loaded in memory.
Memory-mapped files are used for three different purposes:
- The system uses memory-mapped files to load and execute .exe and dynamic-link library (DLL) files. This greatly conserves both paging file space and the time required for an application to begin executing.
- You can use memory-mapped files to access a data file on disk. This shelters you from performing file I/O operations on the file and from buffering the file’s contents.
- You can use memory-mapped files to allow multiple processes running on the same machine to share data with each other. Windows does offer other methods for communicating data among processes—but these other methods are implemented using memory-mapped files, making memory-mapped files the most efficient way for multiple processes on a single machine to communicate with one another.
When a thread calls CreateProcess, the system performs the following steps:
1. The system locates the .exe file specified in the call to CreateProcess. If the .exe file cannot be found, the process is not created and CreateProcess returns FALSE.
2. The system creates a new process kernel object.
3. The system creates a private address space for this new process.
4. The system reserves a region of address space large enough to contain the .exe file. The desired location of this region is specified inside the .exe file itself. By default, an .exe file’s base address is 0x00400000. (This address might be different for a 64-bit application running on 64-bit Windows.) However, you can override this when you create your application’s .exe file by using the linker’s /BASE option when you link your application.
5. The system notes that the physical storage backing the reserved region is in the .exe file on disk instead of in the system’s paging file.
After the .exe file has been mapped into the process’ address space, the system accesses a section of the .exe file that lists the DLLs containing functions that the code in the .exe calls. The system then calls LoadLibrary for each of these DLLs, and if any of the DLLs require additional DLLs, the system calls LoadLibrary to load those DLLs as well. Every time LoadLibrary is called to load a DLL, the system performs steps similar to steps 4 and 5 just listed:
- The system reserves a region of address space large enough to contain the DLL file. The desired location of this region is specified inside the DLL file itself. By default, Microsoft’s linker sets the DLL’s base address to 0x10000000 for an x86 DLL and 0x00400000 for an x64 DLL. However, you can override this when you build your DLL by using the linker’s /BASE option. All the standard system DLLs that ship with Windows have different base addresses so that they don’t overlap if loaded into a single address space.
- If the system is unable to reserve a region at the DLL’s preferred base address, either because the region is occupied by another DLL or .exe or because the region just isn’t big enough, the system will then try to find another region of address space to reserve for the DLL. It is unfortunate when a DLL cannot load at its preferred base address for two reasons. First, the system might not be able to load the DLL if it does not have relocation information. (You can remove relocation information from a DLL when it is created by using the linker’s /FIXED switch. This makes the DLL file smaller, but it also means that the DLL must load at its preferred address or it can’t load at all.) Second, the system must perform some relocations within the DLL. These relocations require additional storage from the system’s paging file; they also increase the amount of time needed to load the DLL.
- The system notes that the physical storage backing the reserved region is in the DLL file on disk instead of in the system’s paging file. If Windows has to perform relocations because the DLL could not load at its preferred base address, the system also notes that some of the physical storage for the DLL is mapped to the paging file.
If for some reason the system is unable to map the .exe and all the required DLLs, the system displays a message box to the user and frees the process’ address space and the process object. CreateProcess will return FALSE to its caller; the caller can call GetLastError to get a better idea of why the process could not be created.
After all the .exe and DLL files have been mapped into the process’ address space, the system can begin executing the .exe file’s startup code. After the .exe file has been mapped, the system takes care of all the paging, buffering, and caching. For example, if code in the .exe causes it to jump to the address of an instruction that isn’t loaded into memory, a fault will occur. The system detects the fault and automatically loads the page of code from the file’s image into a page of RAM. Then the system maps the page of RAM to the proper location in the process’ address space and allows the thread to continue executing as though the page of code were loaded all along. Of course, all this is invisible to the application. This process is repeated each time any thread in the process attempts to access code or data that is not loaded into RAM.
Note: Initially, static data is not shared by multiple instances of an executable or DLL.
When you create a new process for an application that is already running, the system simply opens another memory-mapped view of the file-mapping object that identifies the executable file’s image and creates a new process object and a new thread object (for the primary thread). The system also assigns new process and thread IDs to these objects. By using memory-mapped files, multiple running instances of the same application can share the same code and data in RAM.
Note one small problem here. Processes use a flat address space. When you compile and link your program, all the code and data are thrown together as one large entity. The data is separated from the code but only to the extent that it follows the code in the .exe file. (See the following note for more detail.) The following illustration shows a simplified view of how the code and data for an application are loaded into virtual memory and then mapped into an application’s address space.
As an example, let’s say that a second instance of an application is run. The system simply maps the pages of virtual memory containing the file’s code and data into the second application’s address space, as shown next.
Now say that a thread in one instance writes to a global variable in a data page. Because the data pages are protected copy-on-write, the system allocates a new page of virtual memory (labeled as “New page” in the image above) and copies the contents of data page 2 into it. The first instance’s address space is changed so that the new data page is mapped into the address space at the same location as the original data page. Now the system can let the process alter the global variable without fear of altering the data for another instance of the same application.
A similar sequence of events occurs when an application is being debugged. Let’s say that you’re running multiple instances of an application and want to debug only one instance. You access your debugger and set a breakpoint in a line of source code. The debugger modifies your code by changing one of your assembly language instructions to an instruction that causes the debugger to activate itself. So you have the same problem again. When the debugger modifies the code, it causes all instances of the application to activate the debugger when the changed assembly instruction is executed. To fix this situation, the system again uses copy-on-write memory. When the system senses that the debugger is attempting to change the code, it allocates a new block of memory, copies the page containing the instruction into the new page, and allows the debugger to modify the code in the page copy.
Sharing Static Data across Multiple Instances of an Executable or DLL
The fact that global and static data is not shared by multiple mappings of the same .exe or DLL is a safe default. However, on some occasions it is useful and convenient for multiple mappings of an .exe to share a single instance of a variable. For example, Windows offers no easy way to determine whether the user is running multiple instances of an application. But if you could get all the instances to share a single global variable, this global variable could reflect the number of instances running. When the user invoked an instance of the application, the new instance’s thread could simply check the value of the global variable (which had been updated by another instance), and if the count were greater than 1, the second instance could notify the user that only one instance of the application is allowed to run and the second instance would terminate.
Every .exe or DLL file image is composed of a collection of sections. By convention, each standard section name begins with a period. For example, when you compile your program, the compiler places all the code in a section called .text. The compiler also places all the uninitialized data in a .bss section and all the initialized data in a .data section.
Executable Common Sections

Common sections you will find in an executable image include:
- .text: the application’s or DLL’s code
- .data: initialized global and static data
- .bss: uninitialized global and static data
- .rdata: read-only data, such as string literals and constants
- .rsrc: the module’s resources
- .reloc: base relocation information
In addition to using the standard sections created by the compiler and the linker, you can create your own sections when you compile using the following directive:

#pragma data_seg("sectionname")

So, for example, I can create a section called “Shared” that contains a single LONG value, as follows:

#pragma data_seg("Shared")
LONG g_lInstanceCount = 0;
#pragma data_seg()

When the compiler compiles this code, it creates a new section called Shared and places all the initialized data variables that it sees after the pragma in this new section. In the preceding example, the variable is placed in the Shared section. Following the variable, the #pragma data_seg() line tells the compiler to stop putting initialized variables in the Shared section and to start putting them back in the default data section. It is extremely important to remember that the compiler will store only initialized variables in the new section.
The Microsoft Visual C++ compiler offers an allocate declaration specifier, however, that does allow you to place uninitialized data in any section you desire. Take a look at the following code:

// Create Shared section & have compiler place initialized data in it.
#pragma data_seg("Shared")

// Initialized, in Shared section
int a = 0;

// Uninitialized, not in Shared section
int b;

// Have compiler stop placing initialized data in Shared section.
#pragma data_seg()

// Initialized, in Shared section
__declspec(allocate("Shared")) int c = 0;

// Uninitialized, in Shared section
__declspec(allocate("Shared")) int d;

// Initialized, not in Shared section
int e = 0;

// Uninitialized, not in Shared section
int f;
Simply telling the compiler to place certain variables in their own section is not enough to share those variables. You must also tell the linker that the variables in a particular section are to be shared. You can do this by using the /SECTION switch on the linker's command line:

/SECTION:name,attributes

Following the colon, type the name of the section for which you want to alter attributes. In our example, we want to change the attributes of the Shared section, so we’d construct our linker switch as follows:

/SECTION:Shared,RWS

After the comma, we specify the desired attributes: use R for READ, W for WRITE, E for EXECUTE, and S for SHARED. The switch shown indicates that the data in the Shared section is readable, writable, and shared. If you want to change the attributes of more than one section, you must specify the /SECTION switch multiple times—once for each section for which you want to change attributes.
You can also embed linker switches right inside your source code using this syntax:
#pragma comment(linker, "/SECTION:Shared,RWS")
This line tells the compiler to embed the preceding string inside a special section of the generated .obj file named “.drectve”. When the linker combines all the .obj modules together, the linker examines each .obj module’s “.drectve” section and pretends that all the strings were passed to the linker as command-line arguments. This technique is so convenient that it is worth using all the time: if you move a source code file into a new project, you don’t have to remember to set linker switches in the Visual C++ Project Properties dialog box.
Although you can create shared sections, Microsoft discourages the use of shared sections for two reasons. First, sharing memory in this way can potentially violate security. Second, sharing variables means that an error in one application can affect the operation of another application because there is no way to protect a block of data from being randomly written to by an application.
Memory-Mapped Data Files
The operating system makes it possible to memory-map a data file into your process’ address space, which makes it very convenient to manipulate large streams of data. To understand the power of using memory-mapped files this way, let’s look at four possible methods of implementing a program to reverse the order of all the bytes in a file.
Method 1: One file, one buffer.
The first and theoretically simplest method involves allocating a block of memory large enough to hold the entire file. The file is opened, its contents are read into the memory block, and the file is closed. With the contents in memory, we can now reverse all the bytes by swapping the first byte with the last, the second byte with the second-to-last, and so on. This swapping continues until you reach the middle of the file. After all the bytes have been swapped, you reopen the file and overwrite its contents with the contents of the memory block.
This method is pretty easy to implement but has two major drawbacks. First, a memory block the size of the file must be allocated. This might not be so bad if the file is small, but what if the file is huge—say, 2 GB? A 32-bit system will not allow the application to commit a block of physical storage that large. Large files require a different method.
Second, if the process is interrupted in the middle—while the reversed bytes are being written back out to the file—the contents of the file will be corrupted. The simplest way to guard against this is to make a copy of the original file before reversing its contents. If the whole process succeeds, you can delete the copy of the file. Unfortunately, this safeguard requires additional disk space.
Method 2: Two files, one buffer.
In the second method, you open the existing file and create a new file of 0 length on the disk. Then you allocate a small internal buffer—say, 8 KB. You seek to the end of the original file minus 8 KB, read the last 8 KB into the buffer, reverse the bytes, and write the buffer’s contents to the newly created file. The process of seeking, reading, reversing, and writing repeats until you reach the beginning of the original file. Some special—but not extensive—handling is required if the file’s length is not an exact multiple of 8 KB. After the original file is fully processed, both files are closed and the original file is deleted.
This method is a bit more complicated to implement than the first one. It uses memory much more efficiently because only an 8-KB chunk is ever allocated, but it has two big problems. First, the processing is slower than in the first method because on each iteration you must perform a seek on the original file before performing a read. Second, this method can potentially use a large amount of hard disk space. If the original file is 1 GB, the new file will grow to be 1 GB as the process continues. Just before the original file is deleted, the two files will occupy 2 GB of disk space. This is 1 GB more than should be required—a disadvantage that leads us to the next method.
Method 3: One file, two buffers.
For this method, let’s say the program initializes by allocating two separate 8-KB buffers. The program reads the first 8 KB of the file into one buffer and the last 8 KB of the file into the other buffer. The process then reverses the contents of both buffers and writes the contents of the first buffer back to the end of the file and the contents of the second buffer back to the beginning of the same file. Each iteration continues by moving blocks from the front and back of the file in 8-KB chunks. Some special handling is required if the file’s length is not an exact multiple of 16 KB and the two 8-KB chunks overlap. This special handling is more complex than the special handling in the previous method, but it’s nothing that should scare off a seasoned programmer.
Compared with the previous two methods, this method is better at conserving hard disk space. Because everything is read from and written to the same file, no additional disk space is required. As for memory use, this method is also not too bad, using only 16 KB. Of course, this method is probably the most difficult to implement. Like the first method, this method can result in corruption of the data file if the process is somehow interrupted.
Now let’s take a look at how this process might be accomplished using memory-mapped files.
Method 4: One file, zero buffers.
When using memory-mapped files to reverse the contents of a file, you open the file and then tell the system to reserve a region of virtual address space. You tell the system to map the first byte of the file to the first byte of this reserved region. You can then access the region of virtual memory as though it actually contained the file. In fact, if there were a single 0 byte at the end of a text file, you could simply call the C run-time function _tcsrev to reverse the data in the file because it is usable simply as an in-memory text string.
This method’s great advantage is that the system manages all the file caching for you. You don’t have to allocate any memory, load file data into memory, write data back to the file, or free any memory blocks. Unfortunately, the possibility that an interruption such as a power failure could corrupt data still exists with memory-mapped files.
A portion of a file that is mapped into your process’ address space is called a view, which explains how MapViewOfFile got its name.
Using Memory-Mapped Files to Share Data among Processes
Windows has always excelled at offering mechanisms that allow applications to share data and information quickly and easily. These mechanisms include RPC, COM, OLE, DDE, window messages (especially WM_COPYDATA), the Clipboard, mailslots, pipes, sockets, and so on. In Windows, the lowest-level mechanism for sharing data on a single machine is the memory-mapped file. That’s right, all of the mechanisms I mention ultimately use memory-mapped files to do their dirty work if all the processes communicating are on the same machine. If you require high performance with low overhead, the memory-mapped file is the hands-down best mechanism to use.
Here is an interesting problem that has caught unsuspecting programmers by surprise. Can you guess what is wrong with the following code fragment?
HANDLE hFile = CreateFile(...);
HANDLE hMap = CreateFileMapping(hFile, ...);
if (hMap == NULL)
   return(GetLastError());
If the call shown to CreateFile fails, it returns INVALID_HANDLE_VALUE. However, the unsuspecting programmer who wrote this code didn’t test to check whether the file was created successfully. When CreateFileMapping is called, INVALID_HANDLE_VALUE is passed in the hFile parameter, which causes the system to create a file mapping using storage from the paging file instead of the intended disk file. Any additional code that uses the memory-mapped file will work correctly. However, when the file-mapping object is destroyed, all the data that was written to the file-mapping storage (the paging file) will be destroyed by the system. At this point, the developer sits and scratches his or her head, wondering what went wrong! You must always check CreateFile’s return value to see if an error occurred because CreateFile can fail for so many reasons!
Sparsely Committed Memory-Mapped Files