The Importance of Data Alignment

Data alignment is not so much a part of the operating system’s memory architecture as it is a part of the CPU’s architecture.

CPUs operate most efficiently when they access properly aligned data. Data is aligned when the memory address of the data modulo the data’s size is 0. For example, a WORD value should always start on an address that is evenly divisible by 2, a DWORD value should always start on an address that is evenly divisible by 4, and so on. When the CPU attempts to read a data value that is not properly aligned, it does one of two things: it either raises an exception or performs multiple aligned memory accesses to read the full misaligned data value.
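The alignment rule above can be expressed as a one-line check. The following is a minimal sketch in portable C; the name IsAligned is hypothetical and is not part of any Windows header:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: returns true when the address is a multiple of
// size, which is the alignment rule described in the text.
bool IsAligned(const void *address, size_t size) {

   // A size of zero would make the modulo operation meaningless
   assert(size != 0);

   return ((uintptr_t) address % size) == 0;
}
```

With this helper, IsAligned(p, sizeof(uint32_t)) tells you whether p is a suitable starting address for a 4-byte value such as a DWORD.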

Here is some code that accesses misaligned data:

VOID SomeFunc(PVOID pvDataBuffer) {

   // The first byte in the buffer is some byte of information
   char c = * (PBYTE) pvDataBuffer;

   // Increment past the first byte in the buffer
   pvDataBuffer = (PVOID)((PBYTE) pvDataBuffer + 1);

   // Bytes 2-5 contain a double-word value
   DWORD dw = * (DWORD *) pvDataBuffer;

   // The line above raises a data misalignment exception on some CPUs
...

Obviously, if the CPU performs multiple memory accesses, the performance of your application is hampered. At best, it will take the system twice as long to access a misaligned value as it will to access an aligned value, and the access time could be even worse. To get the best performance for your application, you’ll want to write your code so that the data is properly aligned.
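When you cannot control the layout of a buffer, one portable way to avoid dereferencing a misaligned pointer is to copy the bytes into a properly aligned local variable with memcpy. This is a different technique from the ones discussed in this section, shown only as a sketch; the function name is hypothetical, and uint32_t stands in for DWORD so that the code does not depend on the Windows headers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: reads the 32-bit value stored in bytes 2-5
// of the buffer without dereferencing a misaligned pointer.
uint32_t ReadDwordAfterFirstByte(const void *pvDataBuffer) {

   assert(pvDataBuffer != NULL);

   // Skip past the first byte in the buffer
   const unsigned char *pb = (const unsigned char *) pvDataBuffer + 1;

   // Copy the four bytes into an aligned local variable
   uint32_t dw;
   memcpy(&dw, pb, sizeof(dw));

   return dw;
}
```

Compilers commonly turn a small fixed-size memcpy like this into whatever load sequence is appropriate for the target CPU, but that is an optimization detail you should verify for your own compiler.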

Let’s take a closer look at how the x86 CPU handles data alignment. The x86 CPU contains a special bit flag in its EFLAGS register called the AC (alignment check) flag. By default, this flag is set to zero when the CPU first receives power. When this flag is zero, the CPU automatically does whatever it has to in order to successfully access misaligned data values. However, if this flag is set to 1, the CPU raises an alignment-check exception (interrupt vector 17, which is 11H) whenever there is an attempt to access misaligned data. The x86 version of Windows never alters this CPU flag bit. Therefore, you will never see a data misalignment exception occur in an application when it is running on an x86 processor. The same is true on an AMD x86-64 CPU, where, by default, the hardware takes care of misalignment fixup.

Now let’s turn our attention to the IA-64 CPU. The IA-64 CPU cannot automatically fix up misaligned data accesses. Instead, when a misaligned data access occurs, the CPU notifies the operating system. Windows then decides whether to raise a data misalignment exception or to execute additional instructions that silently correct the problem and allow your code to continue executing. By default, when you install Windows on an IA-64 machine, the operating system automatically transforms a misalignment fault into an EXCEPTION_DATATYPE_MISALIGNMENT exception. However, you can alter this behavior. You can tell the system to silently correct misaligned data accesses for all threads in your process by having one of your process’ threads call the SetErrorMode function:

UINT SetErrorMode(UINT fuErrorMode);

For our discussion, the flag in question is the SEM_NOALIGNMENTFAULTEXCEPT flag. When this flag is set, the system automatically corrects for misaligned data accesses. When this flag is not set, the system does not correct for misaligned data accesses but instead raises data misalignment exceptions. Once you set this flag, you can’t clear it again during the process’ lifetime.

Note that changing this flag affects all threads contained within the process that owns the thread that makes the call. In other words, changing this flag will not affect any threads contained in any other processes. You should also note that a process’ error mode flags are inherited by any child processes. Therefore, you might want to temporarily reset this flag before calling the CreateProcess function (although you usually don’t do this for the SEM_NOALIGNMENTFAULTEXCEPT flag because it can’t be reset once set).
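A call that turns on the fixups might look like the following sketch. The helper name RequestAlignmentFixups is hypothetical. The code assumes that GetErrorMode is available (Windows Vista and later) so that the other error-mode bits are preserved, and the #ifdef guard is there only so that the file still compiles on other platforms:

```c
#include <stdbool.h>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: asks Windows to silently fix up misaligned
// data accesses for every thread in the calling process.
// Returns true if the request was issued, false on non-Windows builds.
bool RequestAlignmentFixups(void) {
#ifdef _WIN32
   // Keep the error-mode bits that are already set and add ours
   SetErrorMode(GetErrorMode() | SEM_NOALIGNMENTFAULTEXCEPT);
   return true;
#else
   // There is no equivalent process error mode on this platform
   return false;
#endif
}
```

Because the flag cannot be cleared once it is set, a call like this is normally made once, early in the process’ startup code.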

Of course, you can call SetErrorMode, passing the SEM_NOALIGNMENTFAULTEXCEPT flag, regardless of which CPU platform you are running on. However, the results are not always the same. For x86 and x64 systems, this flag is always on and cannot be turned off. You can use the Windows Reliability and Performance Monitor to see how many alignment fixups per second the system is performing. The following figure shows what the Add Counters dialog box looks like just before you add this counter to the chart:

[Figure: the Add Counters dialog box with the Alignment Fixups/sec counter selected]

What this counter really shows is the number of times per second the CPU notifies the operating system of misaligned data accesses. If you monitor this counter on an x86 machine, you’ll see that it always reports zero fixups per second. This is because the x86 CPU itself is performing the fixups and doesn’t notify the operating system. Because the x86 CPU performs the fixup instead of the operating system, accessing misaligned data on an x86 machine costs far less than it does on CPUs that require software (the Windows operating system code) to do the fixup. As you can see, simply calling SetErrorMode is enough to make your application work correctly. But this solution is definitely not the most efficient.

Microsoft’s C/C++ compiler for the IA-64 supports a special keyword called __unaligned. You use the __unaligned modifier just as you would use the const or volatile modifiers, except that the __unaligned modifier is meaningful only when applied to pointer variables. When you access data via an unaligned pointer, the compiler generates code that assumes that the data is not aligned properly and adds the additional CPU instructions necessary to access the data. The code shown here is a modified version of the code shown earlier. This new version takes advantage of the __unaligned keyword:

VOID SomeFunc(PVOID pvDataBuffer) {

   // The first byte in the buffer is some byte of information
   char c = * (PBYTE) pvDataBuffer;

   // Increment past the first byte in the buffer
   pvDataBuffer = (PVOID)((PBYTE) pvDataBuffer + 1);

   // Bytes 2-5 contain a double-word value
   DWORD dw = * (__unaligned DWORD *) pvDataBuffer;

   // The line above causes the compiler to generate additional
   // instructions so that several aligned data accesses are performed
   // to read the DWORD.
   // Note that a data misalignment exception is not raised.
...

The instructions added by the compiler are still much more efficient than letting the CPU trap the misaligned data access and having the operating system correct the problem. In fact, if you monitor the Alignment Fixups/sec counter, you’ll see that accesses via unaligned pointers have no effect on the chart. Note that the compiler generates the additional instructions even when the data is actually aligned, which makes the code less efficient in that case.
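To get a feel for the kind of work those additional instructions do, here is a rough sketch in portable C that assembles a 32-bit value from four single-byte loads. This is not the code the compiler actually emits for __unaligned. It assumes little-endian byte order, the function name is hypothetical, and uint32_t stands in for DWORD:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: builds a 32-bit little-endian value from four
// single-byte loads. A byte load is always aligned, so the CPU never
// sees a misaligned access, whatever the starting address is.
uint32_t LoadDwordBytewise(const void *pvAddress) {

   assert(pvAddress != NULL);

   const unsigned char *pb = (const unsigned char *) pvAddress;

   // Combine the bytes, lowest address first (little-endian order)
   return  (uint32_t) pb[0]
        | ((uint32_t) pb[1] << 8)
        | ((uint32_t) pb[2] << 16)
        | ((uint32_t) pb[3] << 24);
}
```

The sketch performs four loads plus shifts and ORs even when the address happens to be aligned, which illustrates why unaligned-safe code is slower than a single aligned load.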

Finally, the __unaligned keyword is not supported by the x86 version of the Microsoft Visual C/C++ compiler. I assume that Microsoft felt that this wasn’t necessary because of the speed at which the CPU itself can perform the fixups. However, this also means that the x86 compiler will generate errors when it encounters the __unaligned keyword. So if you are trying to create a single source code base for your application, you’ll want to use the UNALIGNED and UNALIGNED64 macros instead of the __unaligned keyword. The UNALIGNED* macros are defined in WinNT.h as follows:

#if defined(_M_MRX000) || defined(_M_ALPHA) || defined(_M_PPC) || \
    defined(_M_IA64) || defined(_M_AMD64)
   #define ALIGNMENT_MACHINE
   #define UNALIGNED __unaligned
   #if defined(_WIN64)
      #define UNALIGNED64 __unaligned
   #else
      #define UNALIGNED64
   #endif
#else
   #undef ALIGNMENT_MACHINE
   #define UNALIGNED
   #define UNALIGNED64
#endif
