Thread Scheduling, Priorities, and Affinities

Every 20 milliseconds or so (as returned by the second parameter of the GetSystemTimeAdjustment function), Windows looks at all the thread kernel objects currently in existence. Of these objects, only some are considered schedulable. Windows selects one of the schedulable thread kernel objects and loads the CPU’s registers with the values that were last saved in the thread’s context. This action is called a context switch.

Windows is called a preemptive multithreaded operating system because a thread can be stopped at any time and another thread can be scheduled.

Developers frequently ask Jeff how they can guarantee that their thread will start running within some time period of some event. For example, how can you ensure that a particular thread will start running within 1 millisecond of data coming from the serial port? He has an easy answer: You can’t.

Real-time operating systems can make these promises, but Windows is not a real-time operating system. A real-time operating system requires intimate knowledge of the hardware it is running on so that it knows the latency associated with its hard disk controllers, keyboards, and so on.

In real life, an application must be careful when it calls SuspendThread because you have no idea what the thread might be doing when you attempt to suspend it. If the thread is attempting to allocate memory from a heap, for example, the thread will have a lock on the heap. As other threads attempt to access the heap, their execution will be halted until the first thread is resumed. SuspendThread is safe only if you know exactly what the target thread is (or might be) doing and you take extreme measures to avoid problems or deadlocks caused by suspending the thread.

Windows doesn’t offer any other way to suspend all threads in a process because of race conditions. For example, while the threads are suspended, a new thread might be created. Somehow the system must suspend any new threads during this window of time. Microsoft has integrated this functionality into the debugging mechanism of the system.

A race condition occurs when multiple processes access and manipulate the same data concurrently, and the outcome of the execution depends on the particular order in which the access takes place. A race condition is of interest to a hacker when the race condition can be utilized to gain privileged system access. Consider the following code snippet which illustrates a race condition:

if (access("/tmp/datafile", R_OK) == 0) {

fd = open("/tmp/datafile", O_RDONLY);

process(fd);

close(fd);

}

This code checks that the user may read /tmp/datafile and then opens it. The potential race condition occurs between the call to access() and the call to open(). If an attacker can replace /tmp/datafile between the two calls, for example by swapping in a link to a different file, the program ends up processing a file that was never checked. This is the race.
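One common way to narrow the window between checking a file and using it is to drop the separate check, open the file first, and then inspect the descriptor that was actually opened. The sketch below assumes a POSIX system that supports the O_NOFOLLOW flag; the helper name OpenRegularFileForRead is hypothetical. It does not reproduce the real-user-ID check that access() performs, so a privileged program would still need to handle that separately.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: opens `path` read-only and returns a descriptor,
// or -1 if the path cannot be opened or is not a regular file.
int OpenRegularFileForRead(const char* path) {
    assert(path != nullptr);

    // Open first. O_NOFOLLOW makes open() fail if the final path
    // component is a symbolic link.
    int fd = open(path, O_RDONLY | O_NOFOLLOW);
    if (fd < 0) {
        return -1;
    }

    // Check the object that was actually opened, not the name. fstat()
    // works on the descriptor, so replacing the path afterwards cannot
    // change what is being examined.
    struct stat st;
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode)) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Because every later operation goes through the returned descriptor, there is no second lookup of the name for an attacker to win.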

You probably understand why suspending a process this way does not work 100 percent of the time: while the set of threads is being enumerated, new threads can be created and existing ones destroyed. So after you take a snapshot of the threads currently in the system, a new thread might appear in the target process, which the SuspendProcess function will not suspend. Later, when you call SuspendProcess to resume the threads, it will resume a thread that it never suspended. Even worse, while the thread IDs are being enumerated, an existing thread might be destroyed and a new thread might be created, and both of these threads might have the same ID. This would cause the function to suspend some arbitrary thread (probably in a process other than the target process).

There are a few important things to notice about the Sleep function:

  • Calling Sleep allows the thread to voluntarily give up the remainder of its time slice.
  • The system makes the thread not schedulable for approximately the number of milliseconds specified. That’s right—if you tell the system you want to sleep for 100 milliseconds, you will sleep approximately that long, but possibly several seconds or minutes more. Remember that Windows is not a real-time operating system. Your thread will probably wake up at the right time, but whether it does depends on what else is going on in the system.
  • You can call Sleep and pass INFINITE for the dwMilliseconds parameter. This tells the system to never schedule the thread. This is not a useful thing to do. It is much better to have the thread exit and to recover its stack and kernel object.
  • You can pass 0 to Sleep. This tells the system that the calling thread relinquishes the remainder of its time slice, and it forces the system to schedule another thread. However, the system can reschedule the thread that just called Sleep. This will happen if there are no more schedulable threads at the same priority or higher.
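The "approximately" in the list above is easy to observe by timing a sleep. The following is a minimal sketch that uses std::this_thread::sleep_for as a portable stand-in for the Win32 Sleep call; the function name MeasureSleepMs is hypothetical, and the numbers it returns depend entirely on the machine and its load.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Sleeps for `requested_ms` and returns how long the thread was actually
// away, in milliseconds, as measured by a monotonic clock.
double MeasureSleepMs(int requested_ms) {
    assert(requested_ms >= 0);
    using Clock = std::chrono::steady_clock;

    const Clock::time_point start = Clock::now();
    std::this_thread::sleep_for(std::chrono::milliseconds(requested_ms));
    const std::chrono::duration<double, std::milli> elapsed =
        Clock::now() - start;

    return elapsed.count();
}
```

Calling MeasureSleepMs(100) on an idle machine usually returns a value a little above 100. On a busy machine the overshoot can be much larger, because waking up only makes the thread schedulable; it does not put it on a CPU.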

Calling SwitchToThread is similar to calling Sleep and passing it a timeout of 0 milliseconds. The difference is that SwitchToThread allows lower-priority threads to execute, whereas Sleep with a timeout of 0 reschedules the calling thread immediately even if lower-priority threads are being starved.

Hyper-threading is a technology available on some Xeon, Pentium 4, and later CPUs. A hyper-threaded processor chip has multiple “logical” CPUs, and each can run a thread. Each thread has its own architectural state (set of registers), but all threads share main execution resources such as the CPU cache. When one thread is paused, the CPU automatically executes another thread; this happens without operating system intervention. A pause can be caused by a cache miss, a branch misprediction, waiting for the results of a previous instruction, and so on. Intel reports that hyper-threaded CPUs improve throughput by somewhere between 10 percent and 30 percent, depending on the application and how it is using memory.

Sometimes you want to time how long it takes a thread to perform a particular task. What many people do is write code similar to the following, taking advantage of the new GetTickCount64 function:

// Get the current time (start time).

ULONGLONG qwStartTime = GetTickCount64();

// Perform complex algorithm here.

// Subtract start time from current time to get duration.

ULONGLONG qwElapsedTime = GetTickCount64() - qwStartTime;

This code makes a simple assumption: it won’t be interrupted. However, in a preemptive operating system, you never know when your thread will be scheduled CPU time. When CPU time is taken away from your thread, it becomes more difficult to time how long it takes your thread to perform various tasks. What we need is a function that returns the amount of CPU time that the thread has received.
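On Windows, GetThreadTimes reports the kernel and user CPU time a thread has consumed, which is the kind of function the paragraph above asks for. The sketch below is one possible way to compare wall-clock time with CPU time. The names ThreadCpuMs, Timing, and TimeTask are hypothetical, and the non-Windows branch (clock_gettime with CLOCK_THREAD_CPUTIME_ID) is an assumption added only so the example builds on POSIX systems too.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

#if defined(_WIN32)
#include <windows.h>
#else
#include <time.h>
#endif

// CPU time (kernel + user) consumed so far by the calling thread, in ms.
double ThreadCpuMs() {
#if defined(_WIN32)
    FILETIME creation, exit_time, kernel, user;
    BOOL ok = GetThreadTimes(GetCurrentThread(),
                             &creation, &exit_time, &kernel, &user);
    assert(ok);
    (void)ok;
    ULARGE_INTEGER k, u;
    k.LowPart = kernel.dwLowDateTime;
    k.HighPart = kernel.dwHighDateTime;
    u.LowPart = user.dwLowDateTime;
    u.HighPart = user.dwHighDateTime;
    // FILETIME counts 100-nanosecond intervals; 10,000 of them make 1 ms.
    return static_cast<double>(k.QuadPart + u.QuadPart) / 10000.0;
#else
    timespec ts;
    int rc = clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    assert(rc == 0);
    (void)rc;
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
#endif
}

struct Timing {
    double wall_ms;  // elapsed time on the clock
    double cpu_ms;   // time this thread actually spent on a CPU
};

// Runs `task` on the calling thread and reports both measurements.
template <typename Task>
Timing TimeTask(Task task) {
    const auto wall_start = std::chrono::steady_clock::now();
    const double cpu_start = ThreadCpuMs();
    task();
    const double cpu_end = ThreadCpuMs();
    const std::chrono::duration<double, std::milli> wall =
        std::chrono::steady_clock::now() - wall_start;
    return Timing{wall.count(), cpu_end - cpu_start};
}
```

If the task is preempted or blocks, wall_ms grows while cpu_ms does not, so the gap between the two is a rough measure of how much of the elapsed time the thread spent not running.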

A CONTEXT structure contains processor-specific register data. The system uses CONTEXT structures to perform various internal operations. Refer to the header file WinNT.h for definitions of these structures.

Starvation occurs when higher-priority threads use so much CPU time that they prevent lower-priority threads from executing.

By the way, when the system boots, it creates a special thread called the zero page thread. This thread is assigned priority 0 and is the only thread in the entire system that runs at priority 0. The zero page thread is responsible for zeroing any free pages of RAM in the system when there are no other threads that need to perform work.

Windows supports six priority classes: idle, below normal, normal, above normal, high, and real-time. Of course, normal is the most common priority class and is used by 99 percent of the applications out there.

A process cannot run in the real-time priority class unless the user has the Increase Scheduling Priority privilege. Any user designated as an administrator or a power user has this privilege by default.

Windows supports seven relative thread priorities: idle, lowest, below normal, normal, above normal, highest, and time-critical.

The concept of a process priority class confuses some people. They think that this somehow means that processes are scheduled. Processes are never scheduled; only threads are scheduled. The process priority class is an abstract concept that Microsoft created to help isolate you from the internal workings of the scheduler; it serves no other purpose.

In general, a thread with a high priority level should not be schedulable most of the time. When the thread has something to do, it quickly gets CPU time. At this point, the thread should execute as few CPU instructions as possible and go back to sleep, waiting to be schedulable again. In contrast, a thread with a low priority level can remain schedulable and execute a lot of CPU instructions to do its work. If you follow these rules, the entire operating system will be responsive to its users.

It might seem odd that the process that creates a child process chooses the priority class at which the child process runs. Let’s consider Windows Explorer as an example. When you use Windows Explorer to run an application, the new process runs at normal priority. Windows Explorer has no idea what the process does or how often its threads need to be scheduled. However, once the child process is running, it can change its own priority class by calling SetPriorityClass.

The system determines the thread’s priority level by combining a thread’s relative priority with the priority class of the thread’s process. This is sometimes referred to as the thread’s base priority level. Occasionally, the system boosts the priority level of a thread—usually in response to some I/O event such as a window message or a disk read.

For example, a thread with a normal thread priority in a high-priority class process has a base priority level of 13. If the user presses a key, the system places a WM_KEYDOWN message in the thread’s queue. Because a message has appeared in the thread’s queue, the thread is schedulable. In addition, the keyboard device driver can tell the system to temporarily boost the thread’s level. So the thread might be boosted by 2 and have a current priority level of 15.
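The combination of priority class and relative thread priority can be pictured as a table lookup. The sketch below encodes the mapping as Microsoft documents it for these six classes and seven relative priorities; the enum and function names are hypothetical, and the exact values are an implementation detail that could differ between Windows versions.

```cpp
#include <cassert>

// Hypothetical names for the six process priority classes.
enum PriorityClass {
    kIdleClass, kBelowNormalClass, kNormalClass,
    kAboveNormalClass, kHighClass, kRealTimeClass
};

// Hypothetical names for the seven relative thread priorities.
enum RelativePriority {
    kIdle, kLowest, kBelowNormal, kNormal,
    kAboveNormal, kHighest, kTimeCritical
};

// Returns the base priority level (1 through 31) for a thread.
int BasePriorityLevel(PriorityClass cls, RelativePriority rel) {
    assert(cls >= kIdleClass && cls <= kRealTimeClass);
    assert(rel >= kIdle && rel <= kTimeCritical);

    // Rows: relative thread priority. Columns: process priority class,
    // in the order idle, below normal, normal, above normal, high, real-time.
    static const int kLevels[7][6] = {
        /* idle          */ { 1,  1,  1,  1,  1, 16},
        /* lowest        */ { 2,  4,  6,  8, 11, 22},
        /* below normal  */ { 3,  5,  7,  9, 12, 23},
        /* normal        */ { 4,  6,  8, 10, 13, 24},
        /* above normal  */ { 5,  7,  9, 11, 14, 25},
        /* highest       */ { 6,  8, 10, 12, 15, 26},
        /* time-critical */ {15, 15, 15, 15, 15, 31},
    };
    return kLevels[rel][cls];
}
```

BasePriorityLevel(kHighClass, kNormal) yields 13, matching the example in the text.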

The system boosts only threads that have a base priority level between 1 and 15. In fact, this is why this range is referred to as the dynamic priority range. In addition, the system never boosts a thread into the real-time range (above 15). Because threads in the real-time range perform most operating system functions, enforcing a cap on the boost prevents an application from interfering with the operating system. Also, the system never dynamically boosts threads that are already in the real-time range (16 through 31).
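The boost rules described here reduce to a small piece of arithmetic. The following is a toy model only, not the scheduler's actual code: the function name BoostedPriorityLevel is hypothetical, and the size of the boost is simply passed in, because the text says the device driver decides it.

```cpp
#include <algorithm>
#include <cassert>

// Toy model of a dynamic priority boost.
//  - Only base levels 1 through 15 (the dynamic range) are boosted.
//  - A boost never lifts a thread above level 15.
int BoostedPriorityLevel(int base_level, int boost) {
    assert(base_level >= 0 && base_level <= 31);
    assert(boost >= 0);

    const bool in_dynamic_range = base_level >= 1 && base_level <= 15;
    if (!in_dynamic_range) {
        return base_level;  // level 0 and real-time levels are left alone
    }
    return std::min(base_level + boost, 15);
}
```

With a base level of 13 and a boost of 2, the result is 15. A base level of 14 with the same boost is also capped at 15, and a real-time thread at level 16 stays at 16.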

Another situation causes the system to dynamically boost a thread’s priority level. Imagine a priority 4 thread that is ready to run but cannot because a priority 8 thread is constantly schedulable. In this scenario, the priority 4 thread is being starved of CPU time. When the system detects that a thread has been starved of CPU time for about three to four seconds, it dynamically boosts the starving thread’s priority to 15 and allows that thread to run for twice its time quantum. When the double time quantum expires, the thread’s priority immediately returns to its base priority.

When the user works with windows of a process, that process is said to be the foreground process and all other processes are background processes. Certainly, a user would prefer the process that he or she is using to behave more responsively than the background processes. To improve the responsiveness of the foreground process, Windows tweaks the scheduling algorithm for threads in the foreground process. The system gives foreground process threads a larger time quantum than they would usually receive. This tweak is performed only if the foreground process is of the normal priority class. If it is of any other priority class, no tweaking is performed.

By default, Windows Vista uses soft affinity when assigning threads to processors. This means that if all other factors are equal, it tries to run the thread on the processor it ran on last. Having a thread stay on a single processor helps reuse data that is still in the processor’s memory cache.

You can also control which CPUs can run certain threads. This is called hard affinity.

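In most environments, altering thread affinities interferes with the scheduler’s ability to migrate threads across CPUs in a way that makes the most efficient use of CPU time. As an example, consider three threads on a two-CPU machine: Thread A (priority 4) can run on CPU 0, Thread B (priority 8) can run on CPU 0 or CPU 1, and Thread C (priority 6) can run only on CPU 1.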

When Thread A wakes, the scheduler sees that the thread can run on CPU 0 and it is assigned to CPU 0. Thread B then wakes, and the scheduler sees that the thread can be assigned to CPU 0 or 1, but because CPU 0 is in use, the scheduler assigns it to CPU 1. So far, so good.

Now Thread C wakes, and the scheduler sees that it can run only on CPU 1. But CPU 1 is in use by Thread B, a priority 8 thread. Because Thread C is a priority 6 thread, it can’t preempt Thread B. Thread C can preempt Thread A, a priority 4 thread, but the scheduler will not preempt Thread A because Thread C can’t run on CPU 0.

This demonstrates how setting hard affinities for threads can interfere with the scheduler’s priority scheme.
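The scenario can be replayed with a deliberately simplified model of the placement decision. This is only a sketch of the reasoning in the walkthrough, not how the Windows scheduler is implemented; ToyThread and PlaceInWakeOrder are hypothetical names, and the mask for Thread A (CPU 0 only) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyThread {
    int priority;
    unsigned affinity_mask;  // bit n set => thread may run on CPU n
};

// Wakes the threads in order and returns, for each one, the CPU it ends
// up on, or -1 if it is left waiting (or was preempted).
std::vector<int> PlaceInWakeOrder(const std::vector<ToyThread>& threads,
                                  int cpu_count) {
    assert(cpu_count > 0 && cpu_count <= 32);
    std::vector<int> running(cpu_count, -1);  // thread index on each CPU
    std::vector<int> placement(threads.size(), -1);

    for (std::size_t t = 0; t < threads.size(); ++t) {
        int target = -1;
        for (int cpu = 0; cpu < cpu_count; ++cpu) {
            if (!(threads[t].affinity_mask & (1u << cpu))) continue;
            if (running[cpu] == -1) {  // an allowed idle CPU always wins
                target = cpu;
                break;
            }
            // Otherwise remember the allowed CPU whose current thread has
            // the lowest priority, provided it is lower than ours.
            const int victim = threads[running[cpu]].priority;
            if (victim < threads[t].priority &&
                (target == -1 ||
                 victim < threads[running[target]].priority)) {
                target = cpu;
            }
        }
        if (target == -1) continue;  // nothing usable: the thread waits
        if (running[target] != -1) placement[running[target]] = -1;
        running[target] = static_cast<int>(t);
        placement[t] = target;
    }
    return placement;
}
```

With priorities 4, 8, and 6 and masks 0x1, 0x3, and 0x2, the model puts Thread A on CPU 0, Thread B on CPU 1, and leaves Thread C waiting. If all three masks are 0x3 instead, Thread C preempts Thread A on CPU 0, which is the outcome the priority scheme alone would produce.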

You can also set processor affinity in the header of an executable file. Oddly, there doesn’t seem to be a linker switch for this, but you can use code similar to this that takes advantage of functions declared in ImageHlp.h:

// Load the EXE into memory.

PLOADED_IMAGE pLoadedImage = ImageLoad(szExeName, NULL);

// Get the current load configuration information for the EXE.

IMAGE_LOAD_CONFIG_DIRECTORY ilcd;

GetImageConfigInformation(pLoadedImage, &ilcd);

// Change the processor affinity mask.

ilcd.ProcessAffinityMask = 0x00000003; // I desire CPUs 0 and 1

// Save the new load configuration information.

SetImageConfigInformation(pLoadedImage, &ilcd);

// Unload the EXE from memory

ImageUnload(pLoadedImage);
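An affinity mask such as the 0x00000003 used above is a bit field in which bit n stands for CPU n. As a small illustration, a mask can be built from a list of CPU numbers instead of being written as a hexadecimal literal. The helper name AffinityMaskFor is hypothetical, and the 64-bit return type is an assumption; the width of the real mask depends on the platform.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Builds an affinity mask with one bit set per listed CPU number.
std::uint64_t AffinityMaskFor(std::initializer_list<unsigned> cpus) {
    std::uint64_t mask = 0;
    for (unsigned cpu : cpus) {
        assert(cpu < 64);  // only 64 bits are available in this sketch
        mask |= std::uint64_t{1} << cpu;
    }
    return mask;
}
```

AffinityMaskFor({0, 1}) produces 0x3, the same value that the snippet assigns to ProcessAffinityMask.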
