Synchronous and Asynchronous Device IO

Common Windows I/O devices include files, directories, logical disk drives, serial and parallel ports, named and anonymous pipes, mailslots, sockets, and consoles.

This article discusses how an application's threads communicate with these devices without waiting for the devices to respond.

The usual way to get a handle to an I/O device is to call the CreateFile function.
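For example, here is a minimal sketch of opening an existing file for reading; the path and flag choices are illustrative only:

HANDLE hFile = CreateFile(
    TEXT("C:\\Data\\Sample.dat"),   // hypothetical file name
    GENERIC_READ,                   // desired access
    FILE_SHARE_READ,                // share mode
    NULL,                           // default security attributes
    OPEN_EXISTING,                  // the file must already exist
    FILE_ATTRIBUTE_NORMAL,          // no special flags: synchronous I/O
    NULL);                          // no template file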

Once you have a handle to a device, you can find out what type of device it is by calling GetFileType:

DWORD GetFileType(HANDLE hDevice);

All you do is pass the handle of a device to the GetFileType function, and the function returns one of the following values:

  • FILE_TYPE_UNKNOWN: Either the type of the device is unknown or the function failed.
  • FILE_TYPE_DISK: The device is a disk-based file.
  • FILE_TYPE_CHAR: The device is a character device, such as a console or a printer.
  • FILE_TYPE_PIPE: The device is either a named pipe or an anonymous pipe.

To manage a file, the cache manager must maintain some internal data structures for the file—the larger the file, the more data structures required. When working with extremely large files, the cache manager might not be able to allocate the internal data structures it requires and will fail to open the file. To access extremely large files, you must open the file using the FILE_FLAG_NO_BUFFERING flag.
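As a sketch (the path is hypothetical), opening an extremely large file unbuffered looks like this. Keep in mind that FILE_FLAG_NO_BUFFERING also requires you to read and write in multiples of the volume's sector size, at sector-aligned file offsets, into sector-aligned buffers:

HANDLE hHugeFile = CreateFile(
    TEXT("D:\\HugeFile.dat"),       // hypothetical path to an extremely large file
    GENERIC_READ,
    FILE_SHARE_READ,
    NULL,
    OPEN_EXISTING,
    FILE_FLAG_NO_BUFFERING,         // bypass the cache manager entirely
    NULL);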

Because device I/O is slow when compared with most other operations, you might want to consider communicating with some devices asynchronously. Here’s how it works: Basically, you call a function to tell the operating system to read or write data, but instead of waiting for the I/O to complete, your call returns immediately, and the operating system completes the I/O on your behalf using its own threads. When the operating system has finished performing your requested I/O, you can be notified. Asynchronous I/O is the key to creating high-performance, scalable, responsive, and robust applications.

Most Windows functions that return a handle return NULL when the function fails. However, CreateFile returns INVALID_HANDLE_VALUE (defined as –1) instead. You may see code like the following, which is incorrect:

HANDLE hFile = CreateFile(…);
if (hFile == NULL) {
    // We'll never get in here
} else {
    // File might or might not be created OK
}

Here’s the correct way to check for an invalid file handle:

HANDLE hFile = CreateFile(…);
if (hFile == INVALID_HANDLE_VALUE) {
    // File not created
} else {
    // File created OK
}

The first issue you must be aware of is that Windows was designed to work with extremely large files. Instead of representing a file’s size using 32-bit values, the original Microsoft designers chose to use 64-bit values. This means that theoretically a file can reach a size of 16 EB (exabytes).

Dealing with 64-bit values in a 32-bit operating system makes working with files a little unpleasant because a lot of Windows functions require you to pass a 64-bit value as two separate 32-bit values. But as you’ll see, working with the values is not too difficult and, in normal day-to-day operations, you probably won’t need to work with a file greater than 4 GB. This means that the high 32 bits of the file’s 64-bit size will frequently be 0 anyway.

The way Windows lets applications work with these 64-bit values is straightforward. Imagine a union like this:

typedef union _ULARGE_INTEGER {
    struct {
        DWORD LowPart;      // Low 32-bit unsigned value
        DWORD HighPart;     // High 32-bit unsigned value
    };
    ULONGLONG QuadPart;     // Full 64-bit unsigned value
} ULARGE_INTEGER, *PULARGE_INTEGER;

So, when you work with a file smaller than 4 GB, the HighPart member is simply 0 and LowPart holds the entire size.
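For example, here is a minimal sketch of retrieving a file's 64-bit size with GetFileSizeEx, which fills in a LARGE_INTEGER (the signed counterpart of ULARGE_INTEGER):

LARGE_INTEGER liFileSize;
if (GetFileSizeEx(hFile, &liFileSize)) {
    ULONGLONG ullSize = liFileSize.QuadPart;  // full 64-bit size
    DWORD dwSizeLow   = liFileSize.LowPart;   // entire size when the file is under 4 GB
    LONG  lSizeHigh   = liFileSize.HighPart;  // 0 when the file is under 4 GB
}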

If you open the same file twice, each open operation gets its own file pointer.

Windows does not offer a GetFilePointerEx function, but you can use SetFilePointerEx to move the pointer by 0 bytes to get the desired effect, as shown in the following code snippet:

LARGE_INTEGER liCurrentPosition = { 0 };
SetFilePointerEx(hFile, liCurrentPosition, &liCurrentPosition, FILE_CURRENT);

Functions that do synchronous I/O are easy to use, but they block any other operations from occurring on the thread that issued the I/O until the request is completed. A great example of this is a CreateFile operation. When a user performs mouse and keyboard input, window messages are inserted into a queue that is associated with the thread that created the window that the input is destined for. If that thread is stuck inside a call to CreateFile, waiting for CreateFile to return, the window messages are not processed and all the windows created by the thread are frozen. The most common reason applications hang is that their threads are stuck waiting for synchronous I/O operations to complete!

To build a responsive application, you should try to perform asynchronous I/O operations as much as possible. This typically also allows you to use very few threads in your application, thereby saving resources (such as thread kernel objects and stacks). Also, it is usually easy to offer your users the ability to cancel an operation when you initiate it asynchronously. For example, Internet Explorer allows the user to cancel (via a red X button or the Esc key) a Web request if it is taking too long and the user is impatient.

Basics of Asynchronous Device I/O

Compared to most other operations carried out by a computer, device I/O is one of the slowest and most unpredictable. The CPU performs arithmetic operations and even paints the screen much faster than it reads data from or writes data to a file or across a network. However, using asynchronous device I/O enables you to better use resources and thus create more efficient applications.

Consider a thread that issues an asynchronous I/O request to a device. This I/O request is passed to a device driver, which assumes the responsibility of actually performing the I/O. While the device driver waits for the device to respond, the application's thread is not suspended as it waits for the I/O request to complete. Instead, this thread continues executing and performs other useful tasks.

You should be aware of a couple of issues when performing asynchronous I/O. First, the device driver doesn’t have to process queued I/O requests in a first-in first-out (FIFO) fashion. For example, if a thread executes the following code, the device driver will quite possibly write to the file and then read from the file:

OVERLAPPED o1 = { 0 };   // Offset/OffsetHigh are 0, so both requests target the start of the file
OVERLAPPED o2 = { 0 };
BYTE bBuffer[100];

// hFile must have been opened with FILE_FLAG_OVERLAPPED for asynchronous I/O
ReadFile(hFile, bBuffer, 100, NULL, &o1);
WriteFile(hFile, bBuffer, 100, NULL, &o2);

A device driver typically executes I/O requests out of order if doing so helps performance. For example, to reduce head movement and seek times, a file system driver might scan the queued I/O request list looking for requests that are near the same physical location on the hard drive.

The second issue you should be aware of is the proper way to perform error checking. Most Windows functions return FALSE to indicate failure or nonzero to indicate success. However, the ReadFile and WriteFile functions behave a little differently. An example might help to explain.

When attempting to queue an asynchronous I/O request, the device driver might choose to process the request synchronously. This can occur if you're reading from a file and the system checks whether the data you want is already in the system's cache. If the data is available, your I/O request is not queued to the device driver; instead, the system copies the data from the cache to your buffer, and the I/O operation is complete. Certain operations, such as NTFS file compression, extending the length of a file, or appending information to a file, are always performed synchronously by the driver; consult the Windows documentation for the full list of such operations.
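The documented pattern is that an asynchronous ReadFile or WriteFile call returns FALSE, with GetLastError reporting ERROR_IO_PENDING, when the request has merely been queued successfully. A sketch of the proper check:

BOOL bReadDone = ReadFile(hFile, bBuffer, 100, NULL, &o);
DWORD dwError = GetLastError();

if (bReadDone) {
    // The driver processed the request synchronously; the data is ready now.
} else if (dwError == ERROR_IO_PENDING) {
    // The request was queued successfully and will complete later.
} else {
    // The request could not be queued: a genuine failure.
}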

Let's look at one of the most common bugs developers introduce when implementing an asynchronous device I/O architecture. Here's an example of what not to do:

VOID ReadData(HANDLE hFile) {
    OVERLAPPED o = { 0 };
    BYTE b[100];
    ReadFile(hFile, b, 100, NULL, &o);
}

This code looks fairly harmless, and the call to ReadFile is perfect. The only problem is that the function returns after queuing the asynchronous I/O request. Returning from the function essentially frees the buffer and the OVERLAPPED structure from the thread’s stack, but the device driver is not aware that ReadData returned. The device driver still has two memory addresses that point to the thread’s stack. When the I/O completes, the device driver is going to modify memory on the thread’s stack, corrupting whatever happens to be occupying that spot in memory at the time. This bug is particularly difficult to find because the memory modification occurs asynchronously. Sometimes the device driver might perform I/O synchronously, in which case you won’t see the bug. Sometimes the I/O might complete right after the function returns, or it might complete over an hour later, and who knows what the stack is being used for then.
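One common fix (a sketch of one possible design, not the only one) is to allocate the buffer and the OVERLAPPED structure from the heap so that they outlive the function; the READ_CONTEXT type below is hypothetical:

typedef struct _READ_CONTEXT {
    OVERLAPPED o;       // must stay valid until the I/O completes
    BYTE       b[100];  // so must the buffer
} READ_CONTEXT;

VOID ReadData(HANDLE hFile) {
    READ_CONTEXT* pCtx = (READ_CONTEXT*) HeapAlloc(
        GetProcessHeap(), HEAP_ZERO_MEMORY, sizeof(READ_CONTEXT));
    if (pCtx != NULL) {
        // Whatever code handles the completion notification frees pCtx.
        ReadFile(hFile, pCtx->b, 100, NULL, &pCtx->o);
    }
}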

Receiving Completed I/O Request Notifications

Windows offers four different methods for receiving I/O completion notifications: signaling a device kernel object, signaling an event kernel object, alertable I/O, and I/O completion ports. The methods are listed here in order of complexity, from the easiest to understand and implement (signaling a device kernel object) to the hardest (I/O completion ports); the remainder of this article focuses on the last two.

Whenever a thread is created, the system also creates a queue that is associated with the thread. This queue is called the asynchronous procedure call (APC) queue. When issuing an I/O request, you can tell the device driver to append an entry to the calling thread's APC queue. To have completed I/O notifications queued there, you issue your requests with the ReadFileEx and WriteFileEx functions, which accept the address of a completion routine.

The APC queue is maintained internally by the system. Note that the system can execute your queued I/O requests in any order: the I/O requests that you issue last might complete first, and vice versa. Each entry in your thread's APC queue contains the address of a callback function and a value that is passed to the function.
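Here is a minimal sketch of this mechanism, assuming hFile was opened with FILE_FLAG_OVERLAPPED (the function names are illustrative). ReadFileEx queues the request along with a completion routine, and SleepEx places the thread in an alertable state so the queued APC entry can execute:

VOID CALLBACK ReadCompleted(DWORD dwError, DWORD dwNumBytes, LPOVERLAPPED po) {
    // Runs as an APC on the issuing thread when the read finishes.
}

VOID AlertableRead(HANDLE hFile) {
    BYTE bBuffer[100];
    OVERLAPPED o = { 0 };           // Offset/OffsetHigh select the file position

    if (ReadFileEx(hFile, bBuffer, 100, &o, ReadCompleted)) {
        // Keeping the buffer on the stack is safe here only because we
        // wait for the completion before returning.
        SleepEx(INFINITE, TRUE);    // alertable wait: the APC executes here
    }
}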

I/O Completion Ports

Windows is designed to be a secure, robust operating system running applications that service literally thousands of users. Historically, you’ve been able to architect a service application by following one of two models:

  • Serial model: A single thread waits for a client to make a request (usually over the network). When the request comes in, the thread wakes and handles the client's request.
  • Concurrent model: A single thread waits for a client request and then creates a new thread to handle the request. While the new thread is handling the client's request, the original thread loops back around and waits for another client request. When the client's request has been completely processed, the handling thread dies.

The problem with the serial model is that it does not handle multiple, simultaneous requests well. If two clients make requests at the same time, only one can be processed at a time; the second request must wait for the first request to finish processing. A service that is designed using the serial approach cannot take advantage of multiprocessor machines. Obviously, the serial model is good only for the simplest of server applications, in which few client requests are made and requests can be handled very quickly. A Ping server is a good example of a serial server.

Early service applications using the concurrent model were implemented on Windows, and the Windows team noticed that application performance was not as high as desired. In particular, the team noticed that handling many simultaneous client requests meant that many threads were running in the system concurrently. Because all these threads were runnable (not suspended and waiting for something to happen), Microsoft realized that the Windows kernel spent too much time context switching between the running threads, and the threads were not getting as much CPU time to do their work. To make Windows an awesome server environment, Microsoft needed to address this problem. The result is the I/O completion port kernel object.

As you would expect, entries are removed from the I/O completion queue in a first-in first-out fashion. However, as you might not expect, threads that call GetQueuedCompletionStatus are awakened in a last-in first-out (LIFO) fashion. The reason for this is again to improve performance. For example, say that four threads are waiting in the waiting thread queue. If a single completed I/O entry appears, the last thread to call GetQueuedCompletionStatus wakes up to process the entry. When this last thread is finished processing the entry, the thread again calls GetQueuedCompletionStatus to enter the waiting thread queue. Now if another I/O completion entry appears, the same thread that processed the first entry is awakened to process the new entry.

As long as I/O requests complete so slowly that a single thread can handle them, the system just keeps waking the one thread, and the other three threads continue to sleep. By using this LIFO algorithm, threads that don't get scheduled can have their memory resources (such as stack space) swapped out to the disk and flushed from a processor's cache. This means having many threads waiting on a completion port isn't bad. If you do have several threads waiting but few I/O requests completing, the extra threads have most of their resources swapped out of the system anyway.

Now it's time to discuss why I/O completion ports are so useful. First, when you create the I/O completion port, you specify the number of threads that can run concurrently. As mentioned, you usually set this value to the number of CPUs on the host machine. As completed I/O entries are queued, the I/O completion port wants to wake up waiting threads. However, the completion port wakes up only as many threads as you have specified. So if four I/O requests complete and four threads are waiting in a call to GetQueuedCompletionStatus on a port created with a concurrency value of two, the I/O completion port will allow only two threads to wake up; the other two threads continue to sleep. As each thread processes a completed I/O entry, the thread again calls GetQueuedCompletionStatus. The system sees that more entries are queued and wakes the same threads to process the remaining entries.
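A hedged sketch of the core calls (SetupPort and WorkerThread are hypothetical names; the device must be opened with FILE_FLAG_OVERLAPPED to be associated with a port):

DWORD WINAPI WorkerThread(PVOID pv) {
    HANDLE hIOCP = (HANDLE) pv;
    DWORD dwNumBytes;
    ULONG_PTR completionKey;
    OVERLAPPED* po;

    // Each pool thread loops here, waiting for completed I/O entries.
    while (GetQueuedCompletionStatus(hIOCP, &dwNumBytes,
                                     &completionKey, &po, INFINITE)) {
        // Process the completed request identified by completionKey and po.
    }
    return 0;
}

VOID SetupPort(HANDLE hDevice) {
    // Create a port that allows at most 2 threads to run concurrently.
    HANDLE hIOCP = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 2);

    // Associate the device with the port; the completion key (an arbitrary
    // value, here 1) is returned with each completed request from it.
    CreateIoCompletionPort(hDevice, hIOCP, 1, 0);

    // Create twice as many pool threads as the concurrency value.
    for (int i = 0; i < 4; i++)
        CreateThread(NULL, 0, WorkerThread, hIOCP, 0, NULL);
}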

If you’re thinking about this carefully, you should notice that something just doesn’t make a lot of sense: if the completion port only ever allows the specified number of threads to wake up concurrently, why have more threads waiting in the thread pool? For example, suppose I’m running on a machine with two CPUs and I create the I/O completion port, telling it to allow no more than two threads to process entries concurrently. But I create four threads (twice the number of CPUs) in the thread pool. It seems as though I am creating two additional threads that will never be awakened to process anything.

But I/O completion ports are very smart. When a completion port wakes a thread, the completion port places the thread's ID in the fourth data structure associated with the completion port, a released thread list. This allows the completion port to remember which threads it awakened and to monitor the execution of these threads. If a released thread calls any function that places the thread in a wait state, the completion port detects this and updates its internal data structures by moving the thread's ID from the released thread list to the paused thread list (the fifth and final data structure that is part of an I/O completion port).

Let’s tie all of this together now. Say that we are again running on a machine with two CPUs. We create a completion port that allows no more than two threads to wake concurrently, and we create four threads that are waiting for completed I/O requests. If three completed I/O requests get queued to the port, only two threads are awakened to process the requests, reducing the number of runnable threads and saving context-switching time. Now if one of the running threads calls Sleep, WaitForSingleObject, WaitForMultipleObjects, SignalObjectAndWait, a synchronous I/O call, or any function that would cause the thread not to be runnable, the I/O completion port would detect this and wake a third thread immediately. The goal of the completion port is to keep the CPUs saturated with work.

Eventually, the first thread will become runnable again. When this happens, the number of runnable threads will be higher than the number of CPUs in the system. However, the completion port again is aware of this and will not allow any additional threads to wake up until the number of threads drops below the number of CPUs. The I/O completion port architecture presumes that the number of runnable threads will stay above the maximum for only a short time and will die down quickly as the threads loop around and again call GetQueuedCompletionStatus. This explains why the thread pool should contain more threads than the concurrent thread count set in the completion port.
