As with any form of troubleshooting, the more you understand the underlying system being debugged, the greater success you will have at identifying the root cause. In the .NET world, this translates to understanding how the runtime itself functions. Knowing how the garbage collector works will enable you to more efficiently debug memory “leak” issues. Knowing how the interoperability layer works will enable you to more efficiently debug COM problems. Knowing how synchronization works will enable you to more efficiently debug hangs. And the list goes on and on. Venturing outside of the comfort zone of your own application and digging deep into the runtime will greatly enhance your debugging success. Problems that may have otherwise taken weeks to debug through traditional means can now be solved in a relatively short time span.
In this article, we will take a guided tour of the .NET runtime, focusing on the core runtime components and concepts that are useful when debugging.
At a high level, .NET is a virtual runtime environment that consists of a virtual execution engine, the Common Language Runtime (CLR), and a set of associated framework libraries. Applications written for .NET, at compile time, do not translate into machine code but instead use an intermediary representation that the execution engine translates at runtime (depending on architecture). Although this may seem as if the CLR acts as an interpreter (interpreting the intermediate language), the primary difference between the CLR and an interpreter is that the CLR does not retranslate the intermediate code each and every time. Rather, it takes a one-time hit of translating a chunk of intermediate code into machine code and then reuses the translated machine code in all subsequent invocations.
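The translate-once, reuse-afterwards behavior described above can be illustrated with a toy model. This is only a sketch of the caching idea, not how the CLR is implemented: the `ToyJit` class, the tuple-based "intermediate code", and the `translations` counter are all hypothetical names invented for this example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for intermediate code: a list of (operation, operand) pairs.
IntermediateCode = List[Tuple[str, int]]


class ToyJit:
    """Toy model: translate a method body once, then reuse the result."""

    def __init__(self) -> None:
        self._translated: Dict[str, Callable[[int], int]] = {}
        self.translations = 0  # counts how often translation actually ran

    def _translate(self, body: IntermediateCode) -> Callable[[int], int]:
        # Stand-in for the expensive intermediate-code-to-machine-code step.
        self.translations += 1
        operations = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
        steps = [(operations[op], operand) for op, operand in body]

        def translated(value: int) -> int:
            for function, operand in steps:
                value = function(value, operand)
            return value

        return translated

    def invoke(self, method_name: str, body: IntermediateCode, argument: int) -> int:
        # Pay the translation cost only on the first invocation of a method.
        code = self._translated.get(method_name)
        if code is None:
            code = self._translate(body)
            self._translated[method_name] = code
        return code(argument)
```

Calling `invoke` repeatedly for the same method name runs the translation step only once; an interpreter, by contrast, would redo that work on every call.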
To better understand what components .NET consists of, the figure below illustrates the 50,000-foot overview of the different entities involved in the .NET world. At the core of .NET, there is an ECMA standard that states what implementations of the .NET runtime need to adhere to in order to be compliant. This standards document is commonly referred to as the Common Language Infrastructure (CLI). The CLI doesn’t just dictate rules for the runtime itself but also includes a set of library classes that are considered crucial and common enough to warrant inclusion. This set of class libraries is called the Base Class Libraries (BCL). The next layer in the figure is the Common Language Runtime (CLR). This is an actual component and represents Microsoft’s implementation of the CLI. When a .NET redistributable package is installed on a machine, it includes the CLR. On top of the CLR sits the .NET framework. These are all the libraries that are available to developers when creating .NET applications. The .NET framework can be considered a superset of the BCL and includes frameworks such as the Windows Communication Foundation (WCF), Windows Presentation Foundation (WPF), and much more. The libraries that are part of the .NET framework but not the BCL are considered outside of the standards realm, and any applications that make use of them may or may not work on other CLI implementations besides the CLR. At the top level, we have the .NET applications, which run within the confines of the CLR.
Are there other CLI-compliant implementations?
Is Microsoft’s CLR the only implementation of the CLI out there? Not quite. Because the CLI has become increasingly popular, there are a number of companies/organizations that have produced their own CLI-compliant runtimes. A great example of such an implementation is the Mono project (sponsored by Novell). In addition to being an open source project, the Mono CLI implementation can run on Windows, Linux, Solaris, and Mac OS X.
Additionally, Microsoft has released the Shared Source Common Language Infrastructure (2.0), also known as the Rotor project, which includes a CLI-compliant implementation of the standard. Because the code is released as shared source, this project provides great insights into how a functional implementation works.
Because the CLR is responsible for all aspects of .NET application execution, what does the general execution flow look like? The figure below illustrates a high-level overview of the execution model, starting with the application’s source code.
In .NET, the net outcome of a compilation is known as an assembly. The notion of an assembly is at the heart of .NET and will be discussed in more detail later in the article. For now, you can view the assembly as a self-contained entity that encapsulates everything that needs to be known about the application (including the code, or MSIL, for the application). When the .NET assembly is run, the CLR is automatically loaded and begins executing the MSIL. MSIL is executed by first translating it to instructions native to the platform that the code is executing on. This translation is done at runtime by a component in the CLR known as the Just-In-Time (JIT) compiler.
CLR and Windows Loader:
The Windows loader knows how to execute native code, but the code in a .NET assembly is MSIL rather than native code. So how does Windows execute such programs? The answer lies in the portable executable (PE) file format. The figure below illustrates, at a high level, the general structure of a PE image file.
To support execution of PE images, the PE header includes a field called AddressOfEntryPoint. This field indicates the location of the entry point for the PE file. In the case of a .NET assembly, it points to a small piece of stub code located in the .text section. The next field of importance is in the data directories. When any given .NET compiler produces an assembly, it adds a data directory entry to the PE file. More specifically, it is the 15th data directory entry (zero-based index 14, named IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR in the Windows headers) and contains the location and size of the CLR header. The CLR header itself is located in the next part of interest in the PE file, namely the .text section. The CLR header consists of a structure named IMAGE_COR20_HEADER. This structure contains information such as the managed code application entry point, the major and minor version of the target CLR, and the strong name signature of the assembly. You can view this data structure as containing the information needed to know which CLR to load and the most basic data about the assembly itself. Other parts of the .text section include the assembly metadata tables, the MSIL, and the unmanaged startup stub. The unmanaged startup stub simply contains the code that will be executed by the Windows loader to bootstrap the execution of the PE file.
In the next few sections, we will take a look at how the Windows loader loads both native images and .NET assemblies.
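To make the header fields above concrete, here is a minimal Python sketch that reads AddressOfEntryPoint and the CLR header data directory from the raw bytes of a PE image. The function name `read_clr_directory` and the keys of the returned dictionary are made up for this example. The offsets follow the published PE layout (the `e_lfanew` field at 0x3C, a 20-byte COFF file header, and data directories starting 96 bytes into a PE32 optional header or 112 bytes into a PE32+ one), and the CLR entry is read from zero-based index 14, the 15th data directory.

```python
import struct

CLR_DIRECTORY_INDEX = 14  # zero-based; IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR


def read_clr_directory(image: bytes) -> dict:
    """Return the entry point RVA and the CLR header location of a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # e_lfanew (offset 0x3C in the DOS header) holds the offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")

    optional = pe_offset + 4 + 20  # skip the signature and the COFF file header
    (magic,) = struct.unpack_from("<H", image, optional)
    if magic == 0x10B:      # PE32
        directories = optional + 96
    elif magic == 0x20B:    # PE32+
        directories = optional + 112
    else:
        raise ValueError("unknown optional header magic")

    (entry_point,) = struct.unpack_from("<I", image, optional + 16)
    # NumberOfRvaAndSizes sits immediately before the data directory array.
    (directory_count,) = struct.unpack_from("<I", image, directories - 4)

    rva, size = 0, 0
    if directory_count > CLR_DIRECTORY_INDEX:
        rva, size = struct.unpack_from(
            "<II", image, directories + CLR_DIRECTORY_INDEX * 8)

    return {
        "entry_point_rva": entry_point,
        "clr_header_rva": rva,
        "clr_header_size": size,
        "is_managed": rva != 0,  # a native image leaves this entry zeroed
    }
```

A typical use would be `read_clr_directory(open(path, "rb").read())`, where `path` is whatever image you want to inspect. A native executable such as notepad.exe should report `is_managed` as false.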
Loading Native Images:
To better understand the loading of .NET assemblies, we’ll start by looking at how the Windows loader loads native PE images. Let’s use good old notepad.exe as the example executable (running on Windows Vista Enterprise). Please note that when dealing with PE files there are two important terms used:
File offset: This is the offset within the PE file of any given location.
Relative Virtual Address (RVA): This value is applicable only when the PE image has been loaded and is the relative address within the virtual address space of the process. For example, an RVA of 0x200 means 0x200 bytes from the image base address once loaded into memory.
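The relationship between the two terms can be sketched in a few lines of Python. The `Section` tuple and both helper functions are hypothetical names introduced for illustration; the arithmetic itself is the usual section-table mapping, in which a section's starting RVA is subtracted and its raw file offset is added. The sketch assumes the RVA falls inside a section whose bytes are actually present in the file.

```python
from typing import NamedTuple, Sequence


class Section(NamedTuple):
    name: str
    virtual_address: int      # RVA at which the section starts once loaded
    virtual_size: int         # size of the section in memory
    pointer_to_raw_data: int  # file offset at which the section's bytes start


def rva_to_virtual_address(rva: int, image_base: int) -> int:
    """An RVA is simply an offset from the address the image was loaded at."""
    return image_base + rva


def rva_to_file_offset(rva: int, sections: Sequence[Section]) -> int:
    """Map an RVA back to the position of the same byte in the file on disk."""
    for section in sections:
        start = section.virtual_address
        if start <= rva < start + section.virtual_size:
            return rva - start + section.pointer_to_raw_data
    raise ValueError(f"RVA {rva:#x} is not inside any section")
```

With an image base of 0x400000, an RVA of 0x200 corresponds to the virtual address 0x400200, which matches the example given in the definition above.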
Loading .NET Assemblies:
- The user executes a .NET assembly.
- The Windows loader looks at the AddressOfEntryPoint field and references the .text section of the PE image file.
- The bytes located at the AddressOfEntryPoint location are simply a JMP instruction to an imported function in mscoree.dll.
- Control is transferred to the _CorExeMain function in mscoree.dll to bootstrap the CLR and transfer execution to the assembly’s entry point.
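The third step can be checked by looking at the bytes at the entry point. The sketch below assumes a 32-bit x86 image, where an indirect JMP through an import slot is encoded as the opcode bytes FF 25 followed by the 4-byte address of that slot. The function name `decode_startup_stub` is hypothetical, and resolving the slot to the name of the imported function (by walking the import table) is left out.

```python
import struct
from typing import Optional


def decode_startup_stub(entry_bytes: bytes) -> Optional[int]:
    """Return the address of the import slot the stub jumps through, or None.

    Assumes 32-bit x86 code: FF 25 <addr32> is "jmp dword ptr [addr32]".
    """
    if len(entry_bytes) >= 6 and entry_bytes[:2] == b"\xFF\x25":
        (slot_address,) = struct.unpack_from("<I", entry_bytes, 2)
        return slot_address
    return None
```

If the returned address falls inside the image's import address table, and the slot belongs to mscoree.dll, the entry point is the startup stub described in the steps above.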
At a high level, an assembly is the primary building block and deployment unit of .NET applications and can be viewed as a self-describing logical container for other components. When I say self-describing, I mean that the assembly contains all the information necessary to uniquely identify and describe itself.
There are two different categories of assemblies:
- Shared assemblies: assemblies that are intended to be used across different .NET applications. Framework assemblies are good examples of shared assemblies.
- Private assemblies: assemblies that are used as part of an application/component but are not suitable for use by other applications/components.
Because an assembly is the fundamental building block of .NET applications and is entirely self-describing, where is the descriptive content stored? The answer lies in the metadata section of an assembly, also known as the assembly manifest. An assembly manifest is typically embedded in the assembly PE file but is not required to be.
Below is an example of single-file and multi-file assemblies.
An assembly manifest typically contains the following pieces of information:
- List of dependent native code modules
- List of dependent assemblies
- Version of the assembly
- Public key token of the assembly (if assigned)
- Assembly resources
- Assembly flags such as stack reserve, subsystem, and so on
The best way to view the manifest for a given assembly is to use a tool called ILDasm. It is installed as part of the .NET 2.0 SDK and can display very rich assembly information. To view the manifest of an assembly, launch ildasm.exe with the name of the assembly from the command line.
Each object instance located on the managed heap consists of the following pieces of information (see the figure below):
- The sync block is a bit mask of auxiliary information or an index into a table maintained by the CLR and contains auxiliary information about the object itself.
- The type handle is the fundamental unit of the type system in the CLR. It serves as the starting point for fully describing the type located on the managed heap.
- The object instance comes after the sync block index and the type handle and is the actual object data.
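The layout in the list above can be sketched by reading the three parts out of a raw memory snapshot. This is an illustration under stated assumptions, not a description of any particular debugger API. It assumes a 32-bit process (4-byte fields) and that an object reference points at the type handle, with the sync block word sitting immediately before it. `ManagedObjectView` and `read_object` are hypothetical names.

```python
import struct
from typing import NamedTuple

POINTER_SIZE = 4  # assumption: 32-bit process


class ManagedObjectView(NamedTuple):
    sync_block: int       # bit mask or index into the CLR's sync block table
    type_handle: int      # starting point for describing the object's type
    instance_data: bytes  # the actual object data


def read_object(heap: bytes, reference: int, data_size: int) -> ManagedObjectView:
    """Split the bytes around an object reference into its three parts."""
    # Assumption: the sync block word sits just before the referenced address.
    (sync_block,) = struct.unpack_from("<I", heap, reference - POINTER_SIZE)
    (type_handle,) = struct.unpack_from("<I", heap, reference)
    start = reference + POINTER_SIZE
    return ManagedObjectView(sync_block, type_handle, heap[start:start + data_size])
```

Here `heap` stands for a captured block of memory and `reference` for the object's offset within that block. In a live debugging session the same three reads would be done against process memory instead.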
The method table contains metadata that fully describes the particular type. The figure below illustrates the overall layout of the method table.
The very first category of data that the type handle points to contains some miscellaneous information about the type itself. The table below lists the fields in this category.
A method descriptor contains detailed information about a method such as the textual representation of the method, the module it is contained within, the token, and the code address of the code behind the method.
Previously, we explained that an assembly can be viewed as a logical container for one or more code modules. A module then can be viewed as containing the actual code and/or resources for a given component. When traversing various kinds of CLR data structures (such as method tables, method descriptors, etc.), they all typically contain a pointer to the module where they are defined.
At a high level, a metadata token is represented by 4 bytes, as illustrated in the figure below.
The high-order byte represents the table that the token is referencing, and the remaining three bytes identify the row within that table. The table below outlines the different tables available.
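Decoding a token is a matter of two bit operations, as the following sketch shows. The function name `decode_token` is made up for this example, and the name lookup covers only a handful of the table identifiers defined by the CLI metadata format; any other value is reported as "unknown". The low three bytes are treated as the row identifier within the table.

```python
from typing import Tuple

# A few well-known metadata table identifiers (not the complete list).
TABLE_NAMES = {
    0x00: "Module",
    0x01: "TypeRef",
    0x02: "TypeDef",
    0x04: "FieldDef",
    0x06: "MethodDef",
    0x0A: "MemberRef",
    0x20: "Assembly",
    0x23: "AssemblyRef",
}


def decode_token(token: int) -> Tuple[int, int, str]:
    """Split a 4-byte metadata token into (table id, row id, table name)."""
    if not 0 <= token <= 0xFFFFFFFF:
        raise ValueError("a metadata token must fit in 4 bytes")
    table = token >> 24           # high-order byte selects the table
    row = token & 0x00FFFFFF      # low three bytes select the row
    return table, row, TABLE_NAMES.get(table, "unknown")
```

For example, the token 0x06000001 decodes to table 0x06 (MethodDef), row 1, which is the first method definition in the module.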
The EEClass data structure is best viewed as the logical equivalent of the method table, and as such can be described as a mechanism to enable the self-descriptive nature of the CLR type system. Internally, the EEClass and method table are two distinct constructs, but logically they represent the same concept, which raises the question of why the separation was introduced to begin with. The separation is based on how frequently type fields are used by the CLR. Fields that are used quite frequently are stored in the method table, whereas fields that are used less frequently are stored in the EEClass data structure.
The figure below provides an overview of the key elements of the EEClass data structure.
The hierarchical nature of object-oriented languages such as C# is replicated in the EEClass structure. When the CLR loads types, it creates a similar hierarchy of EEClass nodes with parent and sibling pointers, enabling it to traverse the hierarchy in an efficient manner. For the most part, the fields in the EEClass data structure are straightforward. One field of importance is the MethodDesc Chunk field that contains a pointer to the first chunk of method descriptors in the type. This enables you to traverse the method descriptors that are part of any given type. Each chunk also contains a pointer to the next chunk in the chain.
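The chunk traversal described above is essentially a walk over a singly linked list. The sketch below models it with two small Python classes. The class and field names (`EEClass`, `MethodDescChunk`, `first_chunk`, `next_chunk`) are simplified stand-ins chosen for this example and do not reflect the real in-memory layout, and method descriptors are represented by plain method names.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class MethodDescChunk:
    method_descs: List[str]                       # stand-ins for method descriptors
    next_chunk: Optional["MethodDescChunk"] = None


@dataclass
class EEClass:
    name: str
    first_chunk: Optional[MethodDescChunk] = None


def iter_method_descs(ee_class: EEClass) -> Iterator[str]:
    """Yield every method descriptor of a type by following the chunk chain."""
    chunk = ee_class.first_chunk
    while chunk is not None:
        yield from chunk.method_descs
        chunk = chunk.next_chunk
```

Starting from the first chunk and following each next-chunk pointer until it is empty visits every method descriptor of the type exactly once.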