Kernel Objects

Each kernel object is simply a memory block allocated by the kernel and is accessible only by the kernel. This memory block is a data structure whose members maintain information about the object.

Because the kernel object data structures are accessible only by the kernel, it is impossible for an application to locate these data structures in memory and directly alter their contents.

If we cannot alter these structures directly, how do our applications manipulate these kernel objects? The answer is that Windows offers a set of functions that manipulate these structures in well-defined ways. These kernel objects are always accessible via these functions. When you call a function that creates a kernel object, the function returns a handle that identifies the object. Think of this handle as an opaque value that can be used by any thread in your process. A handle is a 32-bit value in a 32-bit Windows process and a 64-bit value in a 64-bit Windows process.

To make the operating system robust, these handle values are process-relative. So if you were to pass this handle value to a thread in another process (using some form of interprocess communication), the calls that the other process makes using your process’ handle value might fail or, even worse, might reference a totally different kernel object at the same index in that process’ handle table.

Kernel objects are owned by the kernel, not by a process. In other words, if your process calls a function that creates a kernel object and then your process terminates, the kernel object is not necessarily destroyed. Under most circumstances, the object will be destroyed; but if another process is using the kernel object your process created, the kernel knows not to destroy the object until the other process has stopped using it.

The kernel knows how many processes are using a particular kernel object because each object contains a usage count. The usage count is one of the data members common to all kernel object types.

Kernel objects can be protected with a security descriptor. A security descriptor describes who owns the object (usually its creator), which groups and users can gain access to or use the object, and which groups and users are denied access to the object. Security descriptors are usually used when writing server applications.

Neglecting proper security access flags is one of the biggest mistakes that developers make. Using the correct flags will certainly make it much easier to port an application between Windows versions. However, you also need to realize that each new version of Windows brings a new set of constraints that did not exist in the previous versions. For example, in Windows Vista, you need to take care of the User Account Control (UAC) feature. By default, UAC forces applications to run in a restricted context for security safety even though the current user is part of the Administrators group.

When you first start programming for Windows, you might be confused when you try to differentiate a User object or a GDI object from a kernel object. For example, is an icon a User object or a kernel object? The easiest way to determine whether an object is a kernel object is to examine the function that creates the object. Almost all functions that create kernel objects have a parameter that allows you to specify security attribute information.

None of the functions that create User or GDI objects have a PSECURITY_ATTRIBUTES parameter. For example, take a look at the CreateIcon function:

HICON CreateIcon(HINSTANCE hinst, int nWidth, int nHeight, BYTE cPlanes, BYTE cBitsPixel, CONST BYTE *pbANDbits, CONST BYTE *pbXORbits);

When a process is initialized, the system allocates a handle table for it. This handle table is used only for kernel objects, not for User objects or GDI objects.

When a process first initializes, its handle table is empty. When a thread in the process calls a function that creates a kernel object, such as CreateFileMapping, the kernel allocates a block of memory for the object and initializes it. The kernel then scans the process’ handle table for an empty entry.

All functions that create kernel objects return process-relative handles that can be used successfully by any and all threads that are running in the same process. This handle value should actually be divided by 4 (or shifted right two bits to ignore the last two bits that are used internally by Windows) to obtain the real index into the process’ handle table that identifies where the kernel object’s information is stored.
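The index arithmetic described above can be sketched in a few lines. This is only an illustration of the shift: HandleValueToIndex is a hypothetical helper, not a Windows function, and the handle-to-index mapping is an undocumented detail that may change between Windows versions.

```cpp
#include <cstdint>

// Hypothetical helper: maps a handle value to its handle-table index by
// discarding the two low bits that Windows uses internally.
// Assumption: the layout described in the text; it is not a documented contract.
constexpr std::uintptr_t HandleValueToIndex(std::uintptr_t handleValue) {
    return handleValue >> 2;  // same as dividing by 4
}
```

With this mapping, the first valid handle value, 4, corresponds to index 1.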

If you call a function to create a kernel object and the call fails, the handle value returned is usually 0 (NULL), and this is why the first valid handle value is 4. The system would have to be very low on memory or encountering a security problem for this to happen. Unfortunately, a few functions return a handle value of -1 (INVALID_HANDLE_VALUE defined in WinBase.h) when they fail. For example, if CreateFile fails to open the specified file, it returns INVALID_HANDLE_VALUE instead of NULL.

Regardless of how you create a kernel object, you indicate to the system that you are done manipulating the object by calling CloseHandle:

BOOL CloseHandle(HANDLE hobject);

Usually, when you create a kernel object, you store the corresponding handle in a variable. After you call CloseHandle with this variable as a parameter, you should also reset the variable to NULL. If, by mistake, you reuse this variable to call a Win32 function, two unexpected situations might occur. In the first, the handle table slot referenced by the variable has been cleared, so Windows receives an invalid parameter and you get an error. The second situation is harder to debug. When you create a new kernel object, Windows looks for a free slot in the handle table. So, if new kernel objects have been created since the handle was closed, the slot referenced by the variable may well contain one of these new kernel objects. The call might then target a kernel object of the wrong type or, even worse, a kernel object of the same type as the closed one. Your application state then becomes corrupted without any chance to recover.

Let’s say that you forget to call CloseHandle—will there be an object leak? Well, yes and no. It is possible for a process to leak resources (such as kernel objects) while the process runs. However, when the process terminates, the operating system ensures that all resources used by the process are freed—this is guaranteed. For kernel objects, the system performs the following actions:

  1. When your process terminates, the system automatically scans the process’ handle table.
  2. If the table has any valid entries (objects that you didn’t close before terminating), the system closes these object handles for you.
  3. If the usage count of any of these objects goes to zero, the kernel destroys the object.

Because kernel object handles are process-relative, sharing kernel objects across process boundaries is difficult. However, Microsoft had several good reasons for designing the handles to be process-relative.

  • The most important reason was robustness. If kernel object handles were system wide values, one process could easily obtain the handle to an object that another process was using and wreak havoc on that process.
  • Another reason for process-relative handles is security. Kernel objects are protected with security, and a process must request permission to manipulate an object before attempting to manipulate it. The creator of the object can prevent an unauthorized user from touching the object simply by denying access to it.

There are three different ways to allow processes to share kernel objects:

  1. Using object handle inheritance.
  2. Naming objects.
  3. Duplicating object handles.

Using Object Handle Inheritance

Object handle inheritance can be used only when processes have a parent-child relationship. In this scenario, one or more kernel object handles are available to the parent process, and the parent decides to spawn a child process, giving the child access to the parent’s kernel objects. For this type of inheritance to work, the parent process must perform several steps.

First, when the parent process creates a kernel object, the parent must indicate to the system that it wants the object’s handle to be inheritable. Sometimes I hear people use the term object inheritance. However, there is no such thing as object inheritance; Windows supports object handle inheritance. In other words, it is the handles that are inheritable, not the objects themselves.

To create an inheritable handle, the parent process must allocate and initialize a SECURITY_ATTRIBUTES structure and pass the structure’s address to the specific Create function. The following code creates a Mutex object and returns an inheritable handle to it:


SECURITY_ATTRIBUTES sa;
sa.nLength = sizeof(sa);
sa.lpSecurityDescriptor = NULL;
sa.bInheritHandle = TRUE; // Make the returned handle inheritable.

HANDLE hMutex = CreateMutex(&sa, FALSE, NULL);

The next step to perform when using object handle inheritance is for the parent process to spawn the child process. This is done using the CreateProcess function:

BOOL CreateProcess(
   PCTSTR pszApplicationName,
   PTSTR pszCommandLine,
   PSECURITY_ATTRIBUTES psaProcess,
   PSECURITY_ATTRIBUTES psaThread,
   BOOL bInheritHandles,
   DWORD dwCreationFlags,
   PVOID pvEnvironment,
   PCTSTR pszCurrentDirectory,
   LPSTARTUPINFO pStartupInfo,
   PPROCESS_INFORMATION pProcessInformation);

Usually, when you spawn a process, you pass FALSE for the bInheritHandles parameter. This value tells the system that you do not want the child process to inherit the inheritable handles that are in the parent process’ handle table. If you pass TRUE for this parameter, however, the child inherits the parent’s inheritable handle values.

The content of kernel objects is stored in the kernel address space that is shared by all processes running on the system. For 32-bit systems, this is the memory between addresses 0x80000000 and 0xFFFFFFFF. For 64-bit systems, this is the memory between addresses 0x00000400’00000000 and 0xFFFFFFFF’FFFFFFFF.

Be aware that object handle inheritance applies only at the time the child process is spawned. If the parent process were to create any new kernel objects with inheritable handles, an already-running child process would not inherit these new handles.

Object handle inheritance has one very strange characteristic: when you use it, the child has no idea that it has inherited any handles. Kernel object handle inheritance is useful only when the child process documents the fact that it expects to be given access to a kernel object when spawned from another process. Usually, the parent and child applications are written by the same company; however, a different company can write the child application if that company documents what the child application expects.

By far, the most common way for a child process to determine the handle value of the kernel object that it’s expecting is to have the handle value passed as a command-line argument to the child process. The child process’ initialization code parses the command line (usually by calling _stscanf_s) and extracts the handle value. Once the child has the handle value, it has the same access to the object as its parent. Note that the only reason handle inheritance works is because the handle value of the shared kernel object is identical in both the parent process and the child process. This is why the parent process is able to pass the handle value as a command-line argument.
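Here is a minimal sketch of the command-line technique, written in portable C++ so that it stays self-contained. FormatHandleArg and ParseHandleArg are hypothetical helpers, and the handle is treated as a plain integer; in real Windows code the parent would cast its HANDLE to an integer before formatting, and the child would cast the parsed value back to HANDLE.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <string>

// Parent side (hypothetical helper): render the handle value as decimal text
// so it can be appended to the child's command line.
std::string FormatHandleArg(std::uintptr_t handleValue) {
    char text[32];
    std::snprintf(text, sizeof(text), "%llu",
                  static_cast<unsigned long long>(handleValue));
    return text;
}

// Child side (hypothetical helper): recover the handle value from the argument.
// Returns false if the text is empty or has trailing characters.
bool ParseHandleArg(const char* arg, std::uintptr_t* handleValue) {
    char* end = nullptr;
    unsigned long long parsed = std::strtoull(arg, &end, 10);
    if (end == arg || *end != '\0') return false;
    *handleValue = static_cast<std::uintptr_t>(parsed);
    return true;
}
```

The round trip works only because the inherited handle has the same value in both processes; the text on the command line carries nothing but that number.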

Another technique is for the parent process to add an environment variable to its environment block. The variable’s name would be something that the child process knows to look for, and the variable’s value would be the handle value of the kernel object to be inherited. Then when the parent spawns the child process, the child process inherits the parent’s environment variables and can easily call GetEnvironmentVariable to obtain the inherited object’s handle value. This approach is excellent if the child process is going to spawn another child process, because the environment variables can be inherited again.

For the sake of completeness, I’ll also mention the GetHandleInformation function:

BOOL GetHandleInformation(HANDLE hObject, PDWORD pdwFlags);

This function returns the current flag settings for the specified handle in the DWORD pointed to by pdwFlags. To see if a handle is inheritable, do the following:

DWORD dwFlags;
GetHandleInformation(hObj, &dwFlags);
BOOL fHandleIsInheritable = (0 != (dwFlags & HANDLE_FLAG_INHERIT));

Naming Objects

Most functions that create kernel objects share a common last parameter, pszName. When you pass NULL for this parameter, you are indicating to the system that you want to create an unnamed (anonymous) kernel object. When you create an unnamed object, you can share the object across processes by using either inheritance or DuplicateHandle. To share an object by name, you must give the object a name.

An alternative method exists for sharing objects by name. Instead of calling a Create* function, a process can call one of the Open* functions, such as OpenMutex:

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, PCTSTR pszName);

The last parameter, pszName, indicates the name of a kernel object. You cannot pass NULL for this parameter; you must pass the address of a zero-terminated string. These functions search the single namespace of kernel objects attempting to find a match. If no kernel object with the specified name exists, the functions return NULL and GetLastError returns 2 (ERROR_FILE_NOT_FOUND). However, if a kernel object with the specified name does exist, but it has a different type, the functions return NULL and GetLastError returns 6 (ERROR_INVALID_HANDLE). And if it is the same type of object, the system then checks to see whether the requested access (via the dwDesiredAccess parameter) is allowed. If it is, the calling process’ handle table is updated and the object’s usage count is incremented. The returned handle will be inheritable if you pass TRUE for the bInheritHandle parameter.

The main difference between calling a Create* function versus calling an Open* function is that if the object doesn’t already exist, the Create* function will create it, whereas the Open* function will simply fail.

Named objects are commonly used to prevent multiple instances of an application from running. To do this, simply call a Create* function in your _tmain or _tWinMain function to create a named object. (It doesn’t matter what type of object you create.) When the Create* function returns, call GetLastError. If GetLastError returns ERROR_ALREADY_EXISTS, another instance of your application is running and the new instance can exit. Here’s some code that illustrates this:


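int WINAPI _tWinMain(HINSTANCE hInstExe, HINSTANCE, PTSTR pszCmdLine,
   int nCmdShow) {

   // "MyAppSingleInstance" is a placeholder; use a name unique to your application.
   HANDLE h = CreateMutex(NULL, FALSE, TEXT("MyAppSingleInstance"));

   if (GetLastError() == ERROR_ALREADY_EXISTS) {
      // There is already an instance of this application running.
      // Close the object and immediately return.
      CloseHandle(h);
      return(0);
   }

   // This is the first instance of this application running.

   // Before exiting, close the object.
   CloseHandle(h);
   return(0);
}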

A service’s named kernel objects always go in the global namespace. By default, in Terminal Services, an application’s named kernel object goes in the session’s namespace. However, it is possible to force the named object to go into the global namespace by prefixing the name with “Global\”, as in the following example:

HANDLE h = CreateEvent(NULL, FALSE, FALSE, TEXT("Global\\MyName"));

You can also explicitly state that you want a kernel object to go in the current session’s namespace by prefixing the name with “Local\”, as in the following example:

HANDLE h = CreateEvent(NULL, FALSE, FALSE, TEXT("Local\\MyName"));


When you create a kernel object, you can protect the access to it by passing a pointer to a SECURITY_ATTRIBUTES structure. However, prior to the release of Windows Vista, it was not possible to protect the name of a shared object against hijacking. Any process, even with the lowest privileges, is able to create an object with a given name. If you take the previous example where an application is using a named mutex to detect whether or not it is already started, you could very easily write another application that creates a kernel object with the same name. If it gets started before the singleton application, this application becomes a “none-gleton” because it will start and then always immediately exit, thinking that another instance of itself is already running. This is the base mechanism behind a couple of attacks known as Denial of Service (DoS) attacks. Notice that unnamed kernel objects are not subject to DoS attacks, and it is quite common for an application to use unnamed objects, even though they can’t be shared between processes.

Duplicating Object Handles

The last technique for sharing kernel objects across process boundaries requires the use of the DuplicateHandle function:

BOOL DuplicateHandle( HANDLE hSourceProcessHandle, HANDLE hSourceHandle, HANDLE hTargetProcessHandle, PHANDLE phTargetHandle, DWORD dwDesiredAccess, BOOL bInheritHandle, DWORD dwOptions);

Simply stated, this function takes an entry in one process’ handle table and makes a copy of the entry into another process’ handle table. DuplicateHandle takes several parameters but is actually quite straightforward. The most general usage of the DuplicateHandle function could involve three different processes that are running in the system.

Working with Characters and Strings

Some languages and writing systems (Japanese kanji being a classic example) have so many symbols in their character sets that a single byte, which offers no more than 256 different symbols at best, is just not enough. So double-byte character sets (DBCSs) were created to support these languages and writing systems. In a double-byte character set, each character in a string consists of either 1 or 2 bytes. With kanji, for example, if the first byte is between 0x81 and 0x9F or between 0xE0 and 0xFC, you must look at the next byte to determine the full character in the string. Working with double-byte character sets is a programmer’s nightmare because some characters are 1 byte wide and some are 2 bytes wide. Fortunately, you can forget about DBCS and take advantage of the Unicode support offered by Windows functions and the C run-time library functions.

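Unicode encodings:

  • UTF-16 encodes most characters as 2 bytes (16 bits); characters outside that range are written as surrogate pairs of 4 bytes. It is the most widely used encoding on Windows, whose Unicode functions expect UTF-16 strings.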
  • UTF-8 encodes some characters as 1 byte, some characters as 2 bytes, some characters as 3 bytes, and some characters as 4 bytes. Characters with a value below 0x0080 are compressed to 1 byte, which works very well for characters used in the United States. Characters between 0x0080 and 0x07FF are converted to 2 bytes, which works well for European and Middle Eastern languages. Characters of 0x0800 and above are converted to 3 bytes, which works well for East Asian languages. Finally, surrogate pairs are written out as 4 bytes. UTF-8 is an extremely popular encoding format, but it’s less efficient than UTF-16 if you encode many characters with values of 0x0800 or above.
  • UTF-32 encodes every character as 4 bytes. This encoding is useful when you want to write a simple algorithm to traverse characters (used in any language) and you don’t want to have to deal with characters taking a variable number of bytes. For example, with UTF-32, you do not need to think about surrogates because every character is 4 bytes. Obviously, UTF-32 is not an efficient encoding format in terms of memory usage. Therefore, it’s rarely used for saving or transmitting strings to a file or network. This encoding format is typically used inside the program itself.
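The size rules in this list can be checked with a small sketch. Utf8EncodedLength and Utf16CodeUnits are hypothetical helper names; they only compute sizes from a code point value and do not perform any actual encoding.

```cpp
#include <cstdint>

// Number of bytes UTF-8 needs for one Unicode code point, following the
// ranges listed above. Returns 0 for values that are not valid code points.
int Utf8EncodedLength(std::uint32_t codePoint) {
    if (codePoint < 0x80) return 1;
    if (codePoint < 0x800) return 2;
    if (codePoint >= 0xD800 && codePoint <= 0xDFFF) return 0;  // lone surrogate
    if (codePoint < 0x10000) return 3;
    if (codePoint <= 0x10FFFF) return 4;
    return 0;
}

// Number of 16-bit code units UTF-16 needs for the same code point.
// Code points of 0x10000 and above take a surrogate pair (2 units).
int Utf16CodeUnits(std::uint32_t codePoint) {
    if (codePoint >= 0xD800 && codePoint <= 0xDFFF) return 0;  // lone surrogate
    if (codePoint > 0x10FFFF) return 0;
    return codePoint < 0x10000 ? 1 : 2;
}
```

For a code point such as 0x3042 (an East Asian character), UTF-8 needs 3 bytes while UTF-16 needs one 2-byte unit, which is the efficiency difference the list mentions.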

Microsoft’s C/C++ compiler defines a built-in data type, wchar_t, which represents a 16-bit Unicode (UTF-16) character.

This is how to define a wchar_t string:

wchar_t szBuffer[100] = L"A String";

An uppercase L before a literal string informs the compiler that the string should be compiled as a Unicode string. When the compiler places the string in the program’s data section, it encodes each character using UTF-16, interspersing zero bytes between every ASCII character in this simple case.

Header annotations give the compiler the ability to analyze your code and check whether the annotated functions are used properly.


Under Windows Vista, Microsoft’s source code for CreateWindowExA is simply a translation layer that allocates memory to convert ANSI strings to Unicode strings; the code then calls CreateWindowExW, passing the converted strings. When CreateWindowExW returns, CreateWindowExA frees its memory buffers and returns the window handle to you. So, for functions that fill buffers with strings, the system must convert from Unicode to non-Unicode equivalents before your application can process the string. Because the system must perform all these conversions, your application requires more memory and runs slower. You can make your application perform more efficiently by developing your application using Unicode from the start. Also, Windows has been known to have some bugs in these translation functions, so avoiding them also eliminates some potential bugs.

Certain functions in the Windows API, such as WinExec and OpenFile, exist solely for backward compatibility with 16-bit Windows programs that supported only ANSI strings. These methods should be avoided by today’s programs. You should replace any calls to WinExec and OpenFile with calls to the CreateProcess and CreateFile functions. Internally, the old functions call the new functions anyway. The big problem with the old functions is that they don’t accept Unicode strings and they typically offer fewer features. When you call these functions, you must pass ANSI strings. On Windows Vista, most non-obsolete functions have both Unicode and ANSI versions. However, Microsoft has started to get into the habit of producing some functions offering only Unicode versions—for example, ReadDirectoryChangesW and CreateProcessWithLogonW.

It is possible to write your source code so that it can be compiled using ANSI or Unicode characters and strings. In the WinNT.h header file, the following types and macros are defined:

#ifdef UNICODE

typedef WCHAR TCHAR, *PTCHAR, *PTSTR;

#define __TEXT(quote) L##quote // r_winnt

#else

typedef CHAR TCHAR, *PTCHAR, *PTSTR;

#define __TEXT(quote) quote

#endif

#define TEXT(quote) __TEXT(quote)

When Microsoft was porting COM from 16-bit Windows to Win32, an executive decision was made that all COM interface methods requiring a string would accept only Unicode strings. This was a great decision because COM is typically used to allow different components to talk to each other and Unicode is the richest way to pass strings around. Using Unicode throughout your application makes interacting with COM easier too.

Finally, when the resource compiler compiles all your resources, the output file is a binary representation of the resources. String values in your resources (string tables, dialog box templates, menus, and so on) are always written as Unicode strings. Under Windows Vista, the system performs internal conversions if your application doesn’t define the UNICODE macro. For example, if UNICODE is not defined when you compile your source module, a call to LoadString will actually call the LoadStringA function. LoadStringA will then read the Unicode string from your resources and convert the string to ANSI. The ANSI representation of the string will be returned from the function to your application.

Any function that modifies a string exposes a potential danger: if the destination string buffer is not large enough to contain the resulting string, memory corruption occurs. Here is an example:

// The following puts 4 characters in a
// 3-character buffer, resulting in memory corruption
WCHAR szBuffer[3] = L"";
wcscpy(szBuffer, L"abc"); // The terminating 0 is a character too!

The problem with the strcpy and wcscpy functions (and most other string manipulation functions) is that they do not accept an argument specifying the maximum size of the buffer, and therefore, the function doesn’t know that it is corrupting memory.

To secure your code, use the secure string functions with the _s suffix (such as wcscpy_s) or the StringCch* functions declared in StrSafe.h, rather than the usual string functions.
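The idea behind the safer functions is that the caller also passes the size of the destination buffer, so the function can refuse to write past its end. The sketch below illustrates that idea in portable C++. CopyStringChecked is a hypothetical helper, not a Windows or C run-time function, and its failure behavior (returning false and leaving an empty string) is an assumption made for this example.

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical helper: copies src into dest only if it fits, counting the
// terminating zero character. destCount is the buffer size in characters.
bool CopyStringChecked(wchar_t* dest, std::size_t destCount, const wchar_t* src) {
    if (dest == nullptr || src == nullptr || destCount == 0) return false;
    std::size_t needed = std::wcslen(src) + 1;  // +1 for the terminating 0
    if (needed > destCount) {
        dest[0] = L'\0';  // leave a valid empty string instead of overflowing
        return false;
    }
    std::wmemcpy(dest, src, needed);
    return true;
}
```

With a 3-character buffer, copying L"abc" is rejected because it needs 4 characters, while L"ab" fits exactly.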

The C run time actually allows you to provide a function of your own, which it will call when it detects an invalid parameter. Then, in this function, you can log the failure, attach a debugger, or do whatever you like. To enable this, you must first define a function that matches the following prototype:

void InvalidParameterHandler(PCTSTR expression, PCTSTR function, PCTSTR file, unsigned int line, uintptr_t /*pReserved*/);

To install the handler, pass your function to the C run-time function _set_invalid_parameter_handler when your application starts.

Why you should use Unicode:

  1. Unicode makes it easy for you to localize your application to world markets.
  2. Unicode allows you to distribute a single binary (.exe or DLL) file that supports all languages.
  3. Unicode improves the efficiency of your application because the code performs faster and uses less memory. Windows internally does everything with Unicode characters and strings, so when you pass an ANSI character or string, Windows must allocate memory and convert the ANSI character or string to its Unicode equivalent.
  4. Using Unicode ensures that your application can easily call all nondeprecated Windows functions, as some Windows functions offer versions that operate only on Unicode characters and strings.
  5. Using Unicode ensures that your code easily integrates with COM (which requires the use of Unicode characters and strings).
  6. Using Unicode ensures that your code easily integrates with the .NET Framework (which also requires the use of Unicode characters and strings).
  7. Using Unicode ensures that your code easily manipulates your own resources (where strings are always persisted as Unicode).

Tips to keep in mind while coding:

String manipulation tips:

You use the Windows function MultiByteToWideChar to convert multibyte-character strings to wide-character strings.

For many applications that open text files and process them, such as compilers, it would be convenient if, after opening a file, the application could determine whether the text file contained ANSI characters or Unicode characters. The IsTextUnicode function exported by AdvApi32.dll and declared in WinBase.h can help make this distinction:

BOOL IsTextUnicode(CONST PVOID pvBuffer, int cb, PINT pResult);

Introduction and Error Handling

This book focuses on 64-bit architecture. Here is a quick look at what you need to know about 64-bit Windows:

  • The 64-bit Windows kernel is a port of the 32-bit Windows kernel. This means that all the details and intricacies that you’ve learned about 32-bit Windows still apply in the 64-bit world. In fact, Microsoft has modified the 32-bit Windows source code so that it can be compiled to produce a 32-bit or a 64-bit system. They have just one source-code base, so new features and bug fixes are simultaneously applied to both systems.
  • Because the kernels use the same code and underlying concepts, the Windows API is identical on both platforms. This means that you do not have to redesign or reimplement your application to work on 64-bit Windows. You can simply make slight modifications to your source code and then rebuild.
  • For backward compatibility, 64-bit Windows can execute 32-bit applications. However, your application’s performance will improve if the application is built as a true 64-bit application.
  • Because it is so easy to port 32-bit code, there are already device drivers, tools, and applications available for 64-bit Windows. Unfortunately, Visual Studio is a native 32-bit application and Microsoft seems to be in no hurry to port it to be a native 64-bit application. However, the good news is that 32-bit Visual Studio does run quite well on 64-bit Windows; it just has a limited address space for its own data structures. And Visual Studio does allow you to debug a 64-bit application.

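Always distinguish between an error’s message ID (the symbolic name, such as ERROR_FILE_NOT_FOUND), its numeric error code, and its message text. To find out which error occurred, compare the value returned by GetLastError against the error code, not against the message text.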

While debugging, it’s extremely useful to monitor the thread’s last error code. In Microsoft Visual Studio, Microsoft’s debugger supports a useful feature—you can configure the Watch window to always show you the thread’s last error code number and the text description of the error. This is done by selecting a row in the Watch window and typing $err,hr.

The Watch window then shows the error code and its description as you step through your code. Visual Studio also ships with a small utility called Error Lookup. You can use Error Lookup to convert an error code number into its textual description.

If you detect an error in an application you’ve written, you might want to show the text description to the user. Windows offers a function that converts an error code into its text description. This function is called FormatMessage.

To indicate failure, simply set the thread’s last error code and then have your function return FALSE, INVALID_HANDLE_VALUE, NULL, or whatever is appropriate. To set the thread’s last error code, you simply call

VOID SetLastError(DWORD dwErrCode);

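passing into the function whatever 32-bit number you think is appropriate. I try to use codes that already exist in WinError.h, as long as the code maps well to the error I’m trying to report. If you don’t think that any of the codes in WinError.h accurately reflect the error, you can create your own code. The error code is a 32-bit number divided into the following fields: bits 31-30 hold the severity (0 = success, 1 = informational, 2 = warning, 3 = error), bit 29 distinguishes Microsoft-defined codes (0) from customer-defined codes (1), bit 28 is reserved and must be 0, bits 27-16 hold the facility code, and bits 15-0 hold the exception code.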
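As a rough sketch, a custom code can be assembled with shifts and masks, using the field positions documented in WinError.h (severity in bits 31-30, customer flag in bit 29, facility in bits 27-16, code in bits 15-0). MakeCustomErrorCode is a hypothetical helper, not part of the Windows API.

```cpp
#include <cstdint>

// Hypothetical helper: builds a customer-defined error code.
// Bit 29 is set to mark the code as customer-defined and
// bit 28 (reserved) is left at 0.
constexpr std::uint32_t MakeCustomErrorCode(std::uint32_t severity,
                                            std::uint32_t facility,
                                            std::uint32_t code) {
    return ((severity & 0x3u) << 30) |    // bits 31-30: severity
           (1u << 29) |                   // bit 29: customer-defined
           ((facility & 0xFFFu) << 16) |  // bits 27-16: facility
           (code & 0xFFFFu);              // bits 15-0: code
}
```

The resulting value is what you would pass to SetLastError; keeping bit 29 set avoids collisions with codes that Microsoft defines.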


The likelihood of writing efficient code is very small unless you understand the origins of temporary objects, their cost, and how to eliminate them when you can.

Only the first form of initialization is guaranteed, across compiler implementations, not to generate a temporary object. If you use forms 2 or 3, you may end up with a temporary, depending on the compiler implementation.
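The three forms are not shown above, so here is a sketch of what they typically look like, assuming a hypothetical Rational class. The wrapper functions exist only to keep the example self-contained. Note that since C++17 the language guarantees that forms 2 and 3 also construct the object directly, so the temporary discussed here is a concern mainly for older compilers and language modes.

```cpp
// Hypothetical Rational class used only to illustrate the three forms.
class Rational {
public:
    Rational(int numerator = 0, int denominator = 1)
        : num_(numerator), den_(denominator) {}
    int numerator() const { return num_; }
    int denominator() const { return den_; }
private:
    int num_;
    int den_;
};

// Form 1: direct initialization, never needs a temporary.
Rational MakeForm1() { Rational r(100); return r; }

// Form 2: an explicit temporary that is then copied (unless elided).
Rational MakeForm2() { Rational r = Rational(100); return r; }

// Form 3: implicit conversion from int, then a copy (unless elided).
Rational MakeForm3() { Rational r = 100; return r; }
```

All three forms produce the same value; they differ only in how much work an unoptimizing compiler might do to get there.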

For example, take the form:

This leads the compiler to generate the following code:

The overall cost here is two constructors and one destructor!

If you have a function with the following definition:

An invocation of g("message") will trigger the creation of a temporary string object unless you overload g() to accept a char * as an argument:
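The effect of the overload can be made visible by counting conversions. The sketch below uses a hypothetical CountedString class and hypothetical function names in place of the original string class and g(), which are not shown here.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical string wrapper that counts conversions from const char*.
struct CountedString {
    static int conversions;
    std::string text;
    CountedString(const char* s) : text(s) { ++conversions; }
};
int CountedString::conversions = 0;

// Accepts only CountedString, so a literal argument forces a temporary.
std::size_t LengthViaObject(const CountedString& s) { return s.text.size(); }

// Overload set: the const char* version handles literals without a temporary.
std::size_t LengthOverloaded(const CountedString& s) { return s.text.size(); }
std::size_t LengthOverloaded(const char* s) { return std::strlen(s); }
```

Calling LengthViaObject("message") bumps the counter once, while LengthOverloaded("message") leaves it untouched, because overload resolution prefers the exact const char* match over a user-defined conversion.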

In the following code fragment the operator+() expects two Complex objects as arguments. A temporary Complex object gets generated to represent the constant 1.0:

The problem is that this temporary is generated over and over every iteration through the loop. Lifting constant expressions out of a loop is a trivial and well-known optimization. The temporary generation in a = b + 1.0; is a computation whose value is constant from one iteration to the next. In that case, why should we do it over and over? Let’s do it once and for all:
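A sketch of both variants, assuming a hypothetical Complex class that counts how often a double is converted to a Complex. The function names are made up for this example.

```cpp
// Hypothetical Complex class that counts double -> Complex conversions.
struct Complex {
    static int conversions;
    double re = 0.0;
    double im = 0.0;
    Complex() = default;
    Complex(double r, double i) : re(r), im(i) {}
    Complex(double r) : re(r), im(0.0) { ++conversions; }  // implicit conversion
};
int Complex::conversions = 0;

Complex operator+(const Complex& a, const Complex& b) {
    return Complex(a.re + b.re, a.im + b.im);
}

// Converts 1.0 to a temporary Complex on every iteration.
Complex AddInLoop(const Complex& b, int iterations) {
    Complex a;
    for (int i = 0; i < iterations; ++i) a = b + 1.0;
    return a;
}

// Builds the constant once, outside the loop.
Complex AddHoisted(const Complex& b, int iterations) {
    Complex a;
    const Complex one(1.0);
    for (int i = 0; i < iterations; ++i) a = b + one;
    return a;
}
```

Both functions compute the same result, but the hoisted version performs one conversion in total instead of one per iteration.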

Passing an object by value constructs temporary objects, so passing by reference improves performance. Passing by value incurs the following overheads:

  1. A call to the class constructor and to the constructors of all class data members.
  2. A call to the class copy constructor for the temporary that is created.
  3. After the function returns, a call to the class destructor and to the destructors of all class data members.

The above chunk of code results in 6 function calls and creation of temporary variables. The below code acts as a solution:

As pointed out

But on a performance-critical path you need to forgo elegance in favor of raw performance. The second, “ugly” form is much more efficient. It creates zero temporaries.
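
The two forms being compared are presumably along the lines of the sketch below, shown here with std::string. The function names are illustrative.

```cpp
#include <string>

// "Elegant" form: s1 + s2 produces an intermediate string object
// before s3 is appended to it.
std::string concat_elegant(const std::string& s1,
                           const std::string& s2,
                           const std::string& s3) {
    std::string s = s1 + s2 + s3;
    return s;
}

// "Ugly" form: builds the result step by step in a single object.
std::string concat_in_place(const std::string& s1,
                            const std::string& s2,
                            const std::string& s3) {
    std::string s = s1;
    s += s2;
    s += s3;
    return s;
}
```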

Key Points:

  • A temporary object could penalize performance twice in the form of constructor and destructor computations.
  • Declaring a constructor explicit will prevent the compiler from using it for type conversion behind your back.
  • A temporary object is often created by the compiler to fix a type mismatch. You can avoid it by function overloading.
  • Avoid object copy if you can. Pass and return objects by reference.
  • You can eliminate temporaries by using <op>= operators where <op> may be +, -, *, or /.
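
The point about explicit can be checked at compile time. In this sketch, ImplicitRational and ExplicitRational are hypothetical classes that differ only in the explicit keyword.

```cpp
#include <type_traits>

struct ImplicitRational {
    ImplicitRational(int n) : num(n) {}  // usable for silent int conversions
    int num;
};

struct ExplicitRational {
    explicit ExplicitRational(int n) : num(n) {}  // conversion must be spelled out
    int num;
};

// An int converts silently to ImplicitRational but not to ExplicitRational.
static_assert(std::is_convertible<int, ImplicitRational>::value,
              "implicit conversion is allowed");
static_assert(!std::is_convertible<int, ExplicitRational>::value,
              "explicit blocks implicit conversion");
```

With the explicit version, a call such as f(100) to a function expecting an ExplicitRational fails to compile, so the conversion (and its temporary) has to be written out as f(ExplicitRational(100)).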

The Return Value Optimization

When a function returns an object by value, the compiler may create a temporary for the return value and then copy it into the original object (for example, in c = a + b, c is the original object that receives the result).

Suppose we develop two versions, as follows:


The first version, without RVO, executed in 1.89 seconds. The second version, with RVO applied, was much faster: 1.30 seconds.

We speculated that the difference may lie in the fact that Version 1 used a named variable (retVal) as a return value whereas Version 2 used an unnamed variable. Version 2 used a constructor call in the return statement but never named it. It may be the case that this particular compiler implementation chose to avoid optimizing away named variables.
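
The two versions are not reproduced in this text. Based on the description above (a named retVal in Version 1, an unnamed object in Version 2), they presumably resemble the sketch below. The Complex class, the function names, and the copy counter are assumptions made for illustration.

```cpp
int g_complex_copies = 0;  // illustration-only counter

struct Complex {
    double re, im;
    Complex(double r = 0.0, double i = 0.0) : re(r), im(i) {}
    Complex(const Complex& o) : re(o.re), im(o.im) { ++g_complex_copies; }
};

// Version 1: returns a named local. Eliding the copy is permitted
// but left to the compiler.
Complex add_named(const Complex& a, const Complex& b) {
    Complex retVal;
    retVal.re = a.re + b.re;
    retVal.im = a.im + b.im;
    return retVal;
}

// Version 2: returns an unnamed object built in the return statement.
// Since C++17 this is required to be constructed directly in the caller.
Complex add_unnamed(const Complex& a, const Complex& b) {
    return Complex(a.re + b.re, a.im + b.im);
}
```

Current mainstream compilers typically elide the copy in both versions, so the timing gap reported above may not reproduce on a modern toolchain.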

In addition, you must also define a copy constructor to “turn on” the Return Value Optimization. If the class involved does not have a copy constructor defined, the RVO is quietly turned off.

If the compiler does not apply RVO on its own, you can encourage it with a computational constructor:

You can now (after declaring the preceding computational constructor) use the following operator overloading guaranteeing RVO:
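
A computational constructor takes the operands and computes the result as part of construction. The sketch below is one way it might look for a hypothetical Complex class; the original listing is not reproduced here.

```cpp
struct Complex {
    double re, im;
    Complex(double r = 0.0, double i = 0.0) : re(r), im(i) {}

    // Computational constructor: builds the sum of a and b directly.
    Complex(const Complex& a, const Complex& b)
        : re(a.re + b.re), im(a.im + b.im) {}
};

// The return statement constructs an unnamed object, which is the form
// compilers optimize most readily (and must elide since C++17).
Complex operator+(const Complex& a, const Complex& b) {
    return Complex(a, b);
}
```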

If you wanted to apply the same idea to the other arithmetic operators, you would have to add a third argument to distinguish the signatures of the computational constructors for addition, subtraction, multiplication, and division. This is the criticism against the computational constructor: It bends over backwards for the sake of efficiency and introduces “unnatural” constructors.

Key Points:

  • If you must return an object by value, the Return Value Optimization will help performance by eliminating the need for creation and destruction of a local object.
  • The application of the RVO is up to the discretion of the compiler implementation. You need to consult your compiler documentation or experiment to find if and when RVO is applied.
  • You will have a better shot at RVO by deploying the computational constructor.

Virtual Functions

Virtual functions seem to inflict a performance cost in several ways:

  • The vptr (virtual table pointer) must be initialized in the constructor.
  • A virtual function is invoked via pointer indirection. We must fetch the pointer to the function table and then access the correct function offset.
  • Inlining is a compile-time decision. The compiler cannot inline virtual functions whose resolution takes place at run-time.

The inability to inline a virtual function is its biggest performance penalty.

Key Points:

  • The cost of a virtual function stems from the inability to inline calls that are dynamically bound at run-time. The only potential efficiency issue is the speed gained from inlining if there is any. Inlining efficiency is not an issue in the case of functions whose cost is not dominated by call and return overhead.
  • Templates are more performance-friendly than inheritance hierarchies. They push type resolution to compile-time, which we consider to be free.
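
The contrast between run-time and compile-time resolution can be sketched as follows. Shape, Square, PlainSquare, and the two area functions are hypothetical names chosen for illustration.

```cpp
// Run-time binding: the call goes through the vtable.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};

double area_dynamic(const Shape& s) {
    return s.area();  // target chosen at run time
}

// Compile-time binding: T is known when the template is instantiated,
// so the call can be resolved statically and is a candidate for inlining.
struct PlainSquare {
    double side;
    double area() const { return side * side; }
};

template <typename T>
double area_static(const T& s) {
    return s.area();  // target chosen at compile time
}
```

Compilers can sometimes resolve a virtual call statically when the dynamic type is provable, so the gap depends on the call site.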

Constructors and Destructors

Always be mindful of the overhead that results from invoking an object’s constructors and destructors.

If a constructor (or destructor) is called frequently, it’s recommended to inline it.

This is not to say that inheritance is fundamentally a performance obstacle. We must make a distinction between the overall computational cost, required cost, and computational penalty. The overall computational cost is the set of all instructions executed in a computation. The required cost is that subset of instructions whose results are necessary. This part of the computation is mandatory; computational penalty is the rest. This is the part of the computation that could have been eliminated by an alternative design or implementation.

Holding data members as pointers to objects lets you initialize them whenever you want, and it lets you instantiate only some of them. On the other hand, allocating objects at run time costs performance, whereas a member object held by value needs no separate allocation: its storage is part of the enclosing object (for example, on the stack), with its size fixed at compile time. It’s a trade-off; pick whichever is more suitable for you.

The habit of automatically defining all objects up front can be wasteful: you may construct objects that you end up not using. So, define an object only at the point where you are sure you will need it. As an example, observe the packet variable in the two code fragments below:

After optimization:

This is called lazy construction.
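
The two fragments are not reproduced in this text. A sketch of the before and after, using a hypothetical Packet class and a counter to show when construction happens, might look like this:

```cpp
#include <cstddef>
#include <string>

int g_packets_built = 0;  // illustration-only counter

// Hypothetical class standing in for an object that is costly to build.
struct Packet {
    std::string payload;
    explicit Packet(const std::string& p) : payload(p) { ++g_packets_built; }
};

// Before: packet is constructed up front, even on the early-exit path.
std::size_t send_eager(const std::string& data, bool connected) {
    Packet packet(data);
    if (!connected) return 0;
    return packet.payload.size();
}

// After: packet is constructed only once we know it will be used.
std::size_t send_lazy(const std::string& data, bool connected) {
    if (!connected) return 0;
    Packet packet(data);
    return packet.payload.size();
}
```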

When initializing a data member in a class, use the initialization list rather than assigning a value to the data member in the constructor body. For string data members, this change reclaimed about 50 ms.
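
The difference can be made visible with an instrumented member type. In the sketch below, TrackedString, PersonAssigned, PersonInitialized, and the counters are hypothetical names used only to count which special member functions run.

```cpp
#include <string>

// Illustration-only counters.
int g_default_ctors = 0;
int g_copy_ctors = 0;
int g_assignments = 0;

struct TrackedString {
    std::string text;
    TrackedString() { ++g_default_ctors; }
    TrackedString(const TrackedString& o) : text(o.text) { ++g_copy_ctors; }
    TrackedString& operator=(const TrackedString& o) {
        text = o.text;
        ++g_assignments;
        return *this;
    }
};

// Assignment in the body: name is default-constructed first, then assigned.
struct PersonAssigned {
    TrackedString name;
    explicit PersonAssigned(const TrackedString& n) { name = n; }
};

// Initialization list: name is copy-constructed once, with no assignment.
struct PersonInitialized {
    TrackedString name;
    explicit PersonInitialized(const TrackedString& n) : name(n) {}
};
```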

Key Points:

  • Constructors and destructors may be as efficient as hand-crafted C code. In practice, however, they often contain overhead in the form of superfluous computations.
  • The construction (destruction) of an object triggers recursive construction (destruction) of parent and member objects. Watch out for the combinatorial explosion of objects in complex hierarchies. They make construction and destruction more expensive.
  • Make sure that your code actually uses all the objects that it creates and the computations that they perform.
  • Don’t create an object unless you are going to use it.
  • Compilers must initialize contained member objects prior to entering the constructor body. You ought to use the initialization phase to complete the member object creation. This will save the overhead of calling the assignment operator later in the constructor body. In some cases, it will also avoid the generation of temporary objects.