Shared Assemblies and Strongly Named Assemblies (CLR via C#)

The CLR supports two kinds of assemblies: weakly named assemblies and strongly named assemblies.

The real difference between weakly named and strongly named assemblies is that a strongly named assembly is signed with a publisher’s public/private key pair that uniquely identifies the assembly’s publisher.

An assembly can be deployed in two ways: privately or globally. A privately deployed assembly is an assembly that is deployed in the application’s base directory or one of its subdirectories. A weakly named assembly can be deployed only privately.

A strongly named assembly has four attributes that uniquely identify it: a file name (without an extension), a version number, a culture identity, and a public key. Because public keys are very large numbers, a small hash value derived from the public key is frequently used instead. This hash value is called a public key token.
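As a minimal sketch of these four attributes, the System.Reflection.AssemblyName class can parse an assembly display name and expose each part separately. The name OurLibrary, the version, and the token value below are made up for illustration.

```csharp
using System;
using System.Reflection;

public static class IdentitySketch
{
    // Formats a byte array as lowercase hex, the way tokens are usually displayed.
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
    }

    public static void Main()
    {
        // Hypothetical identity: every value in this string is illustrative only.
        var name = new AssemblyName(
            "OurLibrary, Version=1.0.0.0, Culture=neutral, PublicKeyToken=32ab4ba45e0a69a1");

        Console.WriteLine("Name:    " + name.Name);
        Console.WriteLine("Version: " + name.Version);
        // An empty culture name denotes a culture-neutral assembly.
        Console.WriteLine("Culture: " +
            (string.IsNullOrEmpty(name.CultureName) ? "neutral" : name.CultureName));
        Console.WriteLine("Token:   " + ToHex(name.GetPublicKeyToken()));
    }
}
```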

The following figure shows how a PE file is signed.

Because public keys are such large numbers, and a single assembly might reference many assemblies, a large percentage of the resulting file’s total size would be occupied with public key information. To conserve storage space, Microsoft hashes the public key and takes the last 8 bytes of the hashed value. These reduced public key values, known as public key tokens, are what are actually stored in an AssemblyRef table. In general, developers and end users will see public key token values much more frequently than full public key values. Note, however, that the CLR never uses public key tokens when making security or trust decisions because it is possible that several public keys could hash to a single public key token.
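The derivation can be sketched in a few lines. This sketch assumes the hash is SHA-1 and that the 8 trailing bytes are reversed before display, which is how tokens are conventionally shown; the key bytes used in Main are a placeholder, not a real public key.

```csharp
using System;
using System.Security.Cryptography;

public static class PublicKeyTokenSketch
{
    // Hashes the full public key, keeps the last 8 bytes of the hash,
    // and reverses them to get the conventional token byte order.
    public static byte[] ComputeToken(byte[] publicKey)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(publicKey);   // SHA-1 yields 20 bytes
            byte[] token = new byte[8];
            Array.Copy(hash, hash.Length - 8, token, 0, 8);
            Array.Reverse(token);
            return token;
        }
    }

    public static void Main()
    {
        // Placeholder bytes standing in for a real public key blob.
        byte[] fakePublicKey = { 0x01, 0x02, 0x03, 0x04 };

        byte[] token = ComputeToken(fakePublicKey);
        Console.WriteLine(BitConverter.ToString(token).Replace("-", "").ToLowerInvariant());
    }
}
```

Because 8 bytes cannot represent every possible key, two different keys can share a token, which is why the token is unsuitable for trust decisions.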

The Global Assembly Cache (GAC):

If an assembly is to be accessed by multiple applications, the assembly must be placed into a well-known directory, and the CLR must know to look in this directory automatically when a reference to the assembly is detected. This well-known location is called the global assembly cache (GAC), which can usually be found in the following directory (assuming that Windows is installed in the C:\Windows directory):

C:\Windows\Assembly

The GAC directory is structured: It contains many subdirectories, and an algorithm is used to generate the names of these subdirectories. You should never manually copy assembly files into the GAC; instead, you should use tools to accomplish this task. These tools know the GAC’s internal structure and how to generate the proper subdirectory names.

The most common tool for installing strongly named assemblies into the GAC is GACUtil.exe.

What is the purpose of “registering” an assembly in the GAC? Well, say two companies each produce an OurLibrary assembly consisting of one file: OurLibrary.dll. Obviously, both of these files can’t go in the same directory because the last one installed would overwrite the first one, surely breaking some application. When you install an assembly into the GAC, dedicated subdirectories are created under the C:\Windows\Assembly directory, and the assembly files are copied into one of these subdirectories.

Consider using delayed signing if you want to install your assemblies into the GAC in a development environment.

The figure below illustrates how the CLR resolves a referenced type.

The figure above does not apply when the referenced type is defined in one of the .NET Framework assemblies. In that case, the CLR loads the version of the file that matches the CLR version.

Type Forwarding:

The CLR supports the ability to move a type (class, structure, enum, interface, or delegate) from one assembly to another. For example, in .NET 3.5, the System.TimeZoneInfo class is defined in the System.Core.dll assembly. But in .NET 4.0, Microsoft moved this class to the MSCorLib.dll assembly. Normally, moving a type from one assembly to another would break applications. However, the CLR offers a System.Runtime.CompilerServices.TypeForwardedToAttribute attribute, which can be applied to the original assembly (such as System.Core.dll). The parameter that you pass to this attribute’s constructor is of type System.Type and it indicates the new type (that is now defined in MSCorLib.dll) that applications should now use. The CLR’s binder uses this information. Since the TypeForwardedToAttribute’s constructor takes a Type, the assembly containing this attribute will be dependent on the new assembly defining the type.

If you take advantage of this feature, then you should also apply the System.Runtime.CompilerServices.TypeForwardedFromAttribute attribute to the type in the new assembly and pass to this attribute’s constructor a string with the full name of the assembly that used to define the type. This attribute typically is used for tools, utilities, and serialization. Since the TypeForwardedFromAttribute’s constructor takes a String, the assembly containing this attribute is not dependent on the assembly that used to define the type.
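A minimal sketch of the two attributes follows. The names OldLib, Contoso, and Widget are hypothetical. Only the new assembly’s side is compiled here; the attribute that belongs in the original assembly is shown as a comment because it needs a second project that references the new assembly.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

// In the ORIGINAL assembly (hypothetical OldLib.dll), which must reference the
// new assembly, one assembly-level attribute redirects the CLR's binder:
//
//     [assembly: TypeForwardedTo(typeof(Contoso.Widget))]

namespace Contoso
{
    // In the NEW assembly: record where the type used to live.
    // The argument is a plain string, so no reference to OldLib is required.
    [TypeForwardedFrom("OldLib, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null")]
    public sealed class Widget { }

    public static class Program
    {
        // Reads the former assembly name back, as a tool or serializer might.
        public static string FormerAssemblyOf(Type type)
        {
            var attribute = type.GetCustomAttribute<TypeForwardedFromAttribute>();
            return attribute == null ? null : attribute.AssemblyFullName;
        }

        public static void Main()
        {
            Console.WriteLine(FormerAssemblyOf(typeof(Widget)));
        }
    }
}
```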

Publisher Control Policy:

A publisher policy is an XML configuration file that eases the versioning of an assembly. As the publisher, you can ship the new version of your assembly together with a policy file that tells the CLR to load the new version (say, 2.0) instead of the previous one (1.0). The redirection happens automatically, without any end-user interaction.

If an end user needs to stay on the previous version for some reason, the publisher policy can be ignored by disabling it in the application’s configuration file. Doing this for every application on a machine is not practical, so the alternative is to apply the change in the Machine.config file.

Building, Packaging, Deploying, and Administering Applications and Types (CLR via C#)

A managed PE file has four main parts:

  1. PE32(+) header
    The PE32(+) header is the standard information that Windows expects.
  2. CLR header
    The CLR header is a small block of information that is specific to modules that require the CLR (managed modules). The header includes the major and minor version number of the CLR that the module was built for, some flags, a MethodDef token indicating the module’s entry point method if this module is a CUI or GUI executable, and an optional strong-name digital signature. Finally, the header contains the size and offsets of certain metadata tables contained within the module. You can see the exact format of the CLR header by examining the IMAGE_COR20_HEADER defined in the CorHdr.h header file.
  3. Metadata
    The metadata is a block of binary data that consists of several tables. There are three categories of tables: definition tables, reference tables, and manifest tables. The table below describes some of the more common definition tables that exist in a module’s metadata block.

    Common Reference Metadata Tables

    An assembly is a unit of reuse, versioning, and security. It allows you to partition your types and resources into separate files so that you, and consumers of your assembly, get to determine which files to package together and deploy. Once the CLR loads the file containing the manifest, it can determine which of the assembly’s other files contain the types and resources the application is referencing. Anyone consuming the assembly is required to know only the name of the file containing the manifest; the file partitioning is then abstracted away from the consumer and can change in the future without breaking the application’s behavior.
    The manifest metadata tables are listed below.

    To make your own assemblies appear in the .NET tab’s list, add the following subkey to the registry:
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\AssemblyFolders\MyLibName
    MyLibName is a unique name that you create—Visual Studio doesn’t display this name. After creating the subkey, change its default string value so that it refers to a directory path (such as C:\Program Files\MyLibPath) containing your assembly’s files. Using HKEY_LOCAL_MACHINE adds the assemblies for all users on a machine; use HKEY_CURRENT_USER instead to add the assemblies for a specific user.
  4. IL.

Culture

If you’re designing an application that has some culture-specific resources to it, Microsoft highly recommends that you create one assembly that contains your code and your application’s default (or fallback) resources. When building this assembly, don’t specify a culture. This is the assembly that other assemblies will reference when they create and manipulate types it publicly exposes.

Now you can create one or more separate assemblies that contain only culture-specific resources—no code at all. Assemblies that are marked with a culture are called satellite assemblies. For these satellite assemblies, assign a culture that accurately reflects the culture of the resources placed in the assembly. You should create one satellite assembly for each culture you intend to support.

The CLR’s Execution Model

Compiling Source Code into Managed Modules:

The common language runtime (CLR) is just what its name says it is: a runtime that is usable by different and varied programming languages. The core features of the CLR (such as memory management, assembly loading, security, exception handling, and thread synchronization) are available to any and all programming languages that target it.

In fact, at runtime, the CLR has no idea which programming language the developer used for the source code! By the way, managed assemblies always take advantage of Data Execution Prevention (DEP) and Address Space Layout Randomization (ASLR) in Windows; these two features improve the security of your whole system.

The figure below describes the CLR compilation process.

The table below describes the parts of a managed module.

Combining Managed Modules into Assemblies:

The CLR doesn’t actually work with modules; it works with assemblies. An assembly is an abstract concept that can be difficult to grasp initially. First, an assembly is a logical grouping of one or more modules or resource files. Second, an assembly is the smallest unit of reuse, security, and versioning. Depending on the choices you make with your compilers or tools, you can produce a single-file or a multi-file assembly. In the CLR world, an assembly is what we would call a component.

The figure below should help explain what assemblies are about. In this figure, some managed modules and resource (or data) files are being processed by a tool. This tool produces a single PE32(+) file that represents the logical grouping of files. This PE32(+) file contains a block of data called the manifest. The manifest is simply another set of metadata tables. These tables describe the files that make up the assembly, the publicly exported types implemented by the files in the assembly, and the resource or data files that are associated with the assembly.

Assemblies offer the following benefits:

  • They allow you to decouple the logical and physical notions of a reusable, securable, versionable component.
  • They are self-describing, so their deployment is very easy.
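Because an assembly describes itself, much of the manifest information discussed above can be read back through reflection. The following is a rough sketch that inspects its own assembly; the output depends entirely on how and where it is compiled.

```csharp
using System;
using System.Reflection;

public static class ManifestSketch
{
    public static void Main()
    {
        // The assembly that contains this very class.
        Assembly assembly = typeof(ManifestSketch).Assembly;

        // Identity: name, version, culture, and public key token.
        Console.WriteLine("Identity: " + assembly.FullName);

        // Modules that make up the assembly (one for a single-file assembly).
        foreach (Module module in assembly.GetModules())
            Console.WriteLine("Module: " + module.Name);

        // Publicly exported types.
        foreach (Type type in assembly.GetExportedTypes())
            Console.WriteLine("Exported type: " + type.FullName);

        // Resources embedded in or linked to the assembly.
        foreach (string resource in assembly.GetManifestResourceNames())
            Console.WriteLine("Resource: " + resource);

        // Other assemblies that this one references.
        foreach (AssemblyName reference in assembly.GetReferencedAssemblies())
            Console.WriteLine("Reference: " + reference.FullName);
    }
}
```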

Loading the Common Language Runtime:

Occasionally, developers want to write code that works only on a specific version of Windows, for example when using unsafe code or when interoperating with unmanaged code that is targeted to a specific CPU architecture. To aid these developers, the C# compiler offers a /platform command-line switch. This switch allows you to specify whether the resulting assembly can run on x86 machines running 32-bit Windows versions only, x64 machines running 64-bit Windows only, or Intel Itanium machines running 64-bit Windows only. If you don’t specify a platform, the default is anycpu, which indicates that the resulting assembly can run on any version of Windows.

64-bit versions of Windows offer a technology that allows 32-bit Windows applications to run. This technology is called WoW64 (for Windows on Windows64). It even allows 32-bit applications with x86 native code in them to run on an Itanium machine, because WoW64 can emulate the x86 instruction set, albeit with a significant performance cost.
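To see which kind of process an assembly actually ended up in, a small probe like the one below can help. It is only a sketch: the same source reports different values depending on the /platform setting it was compiled with and on the Windows version it runs on.

```csharp
using System;

public static class BitnessProbe
{
    // Pointer size in bytes expected for the current process type.
    public static int ExpectedPointerSize()
    {
        return Environment.Is64BitProcess ? 8 : 4;
    }

    public static void Main()
    {
        // True on 64-bit Windows, even when this process is 32-bit.
        Console.WriteLine("64-bit OS:      " + Environment.Is64BitOperatingSystem);

        // False for a 32-bit process, including one hosted by WoW64.
        Console.WriteLine("64-bit process: " + Environment.Is64BitProcess);

        // 4 in a 32-bit process, 8 in a 64-bit process.
        Console.WriteLine("Pointer size:   " + IntPtr.Size + " bytes");
    }
}
```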

The loading process works as follows:

  1. Windows examines the EXE file’s header to determine whether the application requires a 32-bit or 64-bit address space.
  2. Windows also checks the CPU architecture information embedded inside the header to ensure that it matches the CPU type in the computer.
  3. After Windows has examined the EXE file’s header to determine whether to create a 32-bit process, a 64-bit process, or a WoW64 process, Windows loads the x86, x64, or IA64 version of MSCorEE.dll into the process’s address space.
  4. Then, the process’s primary thread calls a method defined inside MSCorEE.dll. This method initializes the CLR, loads the EXE assembly, and then calls its entry point method (Main).

Executing your Assembly Code:

To execute a method, its IL must first be converted to native CPU instructions. This is the job of the CLR’s JIT (just-in-time) compiler. The figure below shows what happens when WriteLine is called for the first time.

The figure below shows what the process looks like when WriteLine is called the second time.

A performance hit is incurred only the first time a method is called. All subsequent calls to the method execute at the full speed of the native code because verification and compilation to native code don’t need to be performed again.
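The first-call cost can be observed informally with a stopwatch. This is only a rough sketch: the method SumTo is a made-up example, the timings are noisy, and the gap between the two calls varies with the runtime version and its settings, so no particular numbers should be expected.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public static class JitTimingSketch
{
    // NoInlining keeps this a real method call, so it is compiled on first use.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int SumTo(int n)
    {
        int sum = 0;
        for (int i = 1; i <= n; i++) sum += i;
        return sum;
    }

    public static void Main()
    {
        Stopwatch watch = Stopwatch.StartNew();
        SumTo(100);                              // first call: IL is compiled, then run
        long firstTicks = watch.ElapsedTicks;

        watch.Restart();
        SumTo(100);                              // second call: native code already exists
        long secondTicks = watch.ElapsedTicks;

        Console.WriteLine("First call:  " + firstTicks + " ticks");
        Console.WriteLine("Second call: " + secondTicks + " ticks");
    }
}
```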

Benefits of CLR and managed code over unmanaged code:

  • A JIT compiler can produce native code that is optimized for the current machine architecture.
  • A JIT compiler can determine when a certain test is always false on the machine that it is running on. For example, consider a method that contains the following code:
    if (numberOfCPUs > 1) { … }
    This code could cause the JIT compiler to not generate any CPU instructions if the host machine has only one CPU. In this case, the native code would be fine-tuned for the host machine; the resulting code is smaller and executes faster.
  • The CLR could profile the code’s execution and recompile the IL into native code while the application runs. The recompiled code could be reorganized to reduce incorrect branch predictions depending on the observed execution patterns. Current versions of the CLR do not do this, but future versions might.
  • Verification:
    • The CLR verifies the code and makes sure that there are no security problems.
    • Verified code makes it possible to run multiple managed applications in a single Windows virtual address space, which saves a lot of OS resources.

The Native Code Generator Tool: NGen.exe:

The NGen.exe tool that ships with the .NET Framework can be used to compile IL code to native code when an application is installed on a user’s machine. Since the code is compiled at install time, the CLR’s JIT compiler does not have to compile the IL code at runtime, and this can improve the application’s performance. The NGen.exe tool is interesting in two scenarios:

  1. Improving an application’s startup time.
  2. Reducing an application’s working set.

Files compiled by NGen.exe are stored under the following directory:

C:\Windows\Assembly\NativeImages_v4.0.#####_64

The directory name includes the version of the CLR and information denoting whether the native code is compiled for x86 (32-bit version of Windows), x64, or Itanium (the latter two for 64-bit versions of Windows).

Now, whenever the CLR loads an assembly file, the CLR looks to see if a corresponding NGen’d native file exists. If a native file cannot be found, the CLR JIT compiles the IL code as usual.

There are several potential problems with respect to NGen’d files:

  1. No intellectual property protection (especially for client-side applications).
  2. NGen’d files can get out of sync:
    When the CLR loads an NGen’d file, it compares a number of characteristics about the previously compiled code and the current execution environment. If any of the characteristics don’t match, the NGen’d file cannot be used, and the normal JIT compiler process is used instead. Here is a partial list of characteristics that must match:
    1. CLR version: this changes with patches or service packs.
    2. CPU type: this changes if you upgrade your processor hardware.
    3. Windows OS version: this changes with a new service pack update.
    4. Assembly’s identity module version ID (MVID): this changes when recompiling.
    5. Referenced assembly’s version IDs: this changes when you recompile a referenced assembly.
    6. Security: this changes when you revoke permissions (such as declarative inheritance, declarative link-time, SkipVerification, or UnmanagedCode permissions) that were once granted.
  3. Inferior execution-time performance

Using NGen.exe for server-side applications makes little sense, because only the first request pays the JIT compilation cost.

Interoperability with Unmanaged Code:

The CLR supports these interoperability scenarios:

  1. Managed code can call an unmanaged function in a DLL by using the P/Invoke mechanism.
  2. Managed code can use an existing COM component (server).
  3. Unmanaged code can use a managed type (server).
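The first scenario can be sketched as follows. This example assumes it runs on Windows, since it calls GetCurrentProcessId from kernel32.dll; the choice of that function is arbitrary and only meant to show the shape of a P/Invoke declaration.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

public static class PInvokeSketch
{
    // Declares the unmanaged function; the CLR locates it in kernel32.dll
    // and marshals the call when the method is first invoked.
    [DllImport("kernel32.dll")]
    private static extern uint GetCurrentProcessId();

    public static void Main()
    {
        uint nativeId = GetCurrentProcessId();               // unmanaged call
        int managedId = Process.GetCurrentProcess().Id;      // managed equivalent

        Console.WriteLine("Native process ID:  " + nativeId);
        Console.WriteLine("Managed process ID: " + managedId);
    }
}
```

Both lines should report the same process, which makes this a convenient smoke test that the native call went through.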