Friday, August 19, 2011

Asynchronous (overlapped/nonblocking) IO


Using a Win32 feature called overlapped IO, it is possible to set up all the IO operations to run concurrently, and the program is notified as each operation completes. The implementation of overlapped IO uses threads inside the kernel to do the work. There are two situations where overlapped IO will always be performed synchronously:
    -   A write operation that causes a file to be extended.
    -   Reading or writing a compressed file.
·   Asynchronous I/O allows several I/O calls to be pending at the same time.
·   Usually an asynchronous I/O call returns immediately, leaving the I/O call itself pending.
·   How do applications obtain the result of the I/O call? There are four ways of using asynchronous I/O:
    -   Using events
    -   Using the GetOverlappedResult function
    -   Using asynchronous procedure calls (APCs)
    -   Using I/O completion ports. These are particularly important because they are the only mechanism suited to high-volume servers that must maintain many simultaneous connections. A completion port uses one active thread per processor.

IOCompletionPort – IOCP
·   I/O completion ports provide an efficient threading model for processing multiple asynchronous I/O requests on a multiprocessor system.
·   An IOCP is a special kind of kernel object that coordinates how a pool of threads services overlapped requests, even across multiple processors.
·   An IOCP decouples the thread that starts an overlapped request from the thread that services it.
·   When a process creates an I/O completion port, the system creates an associated queue object for threads whose sole purpose is to service these requests.
·   IOCP allows an application to use a pool of threads that are created to process asynchronous I/O requests. This saves the application from creating one thread per client, which can cause severe performance problems.
·    The overlapped I/O mechanism in Win32 allows an application to initiate an operation and receive notification of its completion later. The thread that initiates the overlapped operation is then free to do other things while the overlapped request completes behind the scenes.
·   The only I/O model that provides true scalability on Windows NT and Windows 2000 is overlapped I/O using completion ports for notification.
·    A completion port is a queue into which the operating system puts notifications of completed overlapped I/O requests.

How I/O Completion Ports Work
·   The CreateIoCompletionPort function creates an I/O completion port and associates one or more file handles with that port.
File handle: a system abstraction representing an overlapped I/O endpoint, e.g. a file on disk, a network endpoint, a TCP socket, a named pipe, or a mailslot.
·   When an asynchronous I/O operation on one of these file handles completes, an I/O completion packet is queued in first-in-first-out (FIFO) order to the associated I/O completion port.
·   Once the completion port has been created and sockets have been associated with it, one or more threads are needed to process the completion notifications.
·   Unlike some other operating systems, the Windows NT and Windows 2000 transport protocols do not have a sockets-style interface which applications can use to talk to them directly. Instead, they implement a much more general API called the Transport Driver Interface (TDI).
·   The Winsock kernel mode driver provides the sockets emulation (currently implemented in AFD.SYS).

Overview of Operation
  •    To use an IO completion port, an application creates a set of threads that all wait on the IO completion port.  These threads become the "pool" of threads that can take care of completed overlapped IO requests.
  • A thread implicitly becomes part of the pool by waiting on the IO completion port.
  •  Every time a new file is opened for overlapped IO, you associate its file handle with the IO completion port.
  • Once this association is established, any file operation that completes successfully will cause an IO completion packet to be sent to the completion port.
  •  This happens inside the operating system and is transparent to the program.
  • In response to the IO completion packet, the completion port releases one of the waiting threads in the pool.
  • The completion port does not create new threads if no threads are currently waiting.
  • The released thread is given enough information to be able to identify the context of the completed overlapped IO operation.
  •  The thread can then go off and handle the request as necessary, but it remains in the pool of threads that is assigned to the completion port. The difference is that it is now an active thread rather than a waiting thread.
  • When the thread is done handling the overlapped I/O request, it should wait on the IO completion port again.


Resource constraints that an application encounters when using Winsock
·   Bandwidth of the network on which the application is sending data. Bandwidth management methods are application-dependent.
·   Virtual memory used by the application. An application can call the SetProcessWorkingSetSize Win32 API to increase the amount of physical memory the operating system will let it use.
·   Locked page limit. Whenever an application posts a send or receive, and AFD.SYS's buffering is disabled, all pages in the buffer are locked into physical memory. They need to be locked because the memory will be accessed by kernel-mode drivers and cannot be paged out for the duration of the access. The limit exists to prevent an ill-behaved application from locking up all of the physical RAM and bringing down the system. This means that your application must be conscious of hitting a system-defined limit on the number of pages locked in memory.
·   System non-paged pool limit.

Handling the resource constraints is complicated by the fact that no special error code is returned when either of the last two limits is encountered.

Advantages:
·   IOCP with Winsock is a very useful, robust, and scalable mechanism.
·   Completion ports and Windows Sockets 2.0 can be used to design applications that will scale to thousands of connections.
·   Mechanisms like the WSAAsyncSelect and select functions are provided for easier porting from Windows 3.1 and Unix respectively, but are not designed to scale.
·   The completion port mechanism is optimized for the operating system's internal workings.
·   There is no limit to the number of handles that can be used with an IO completion port.
·   IO completion ports allow one thread to queue a request and another thread to service it.

