https://windows-internals.com/i-o-rings-when-one-i-o-operation-is-not-enough/ Skip to content Winsider Seminars & Solutions Inc. Windows Internals Training & Consulting * Home * About * Seminars * Testimonials * F.A.Q. * Internals Blog * Sitemap * Contact * + Back I/O Rings - When One I/O Operation is Not Enough Posted byYarden Shafir May 24, 2021May 24, 2021 Introduction I usually write about security features or techniques on Windows. But today's blog is not directly related to any security topics, other than the usual added risk that any new system call introduces. However, it's an interesting addition to the I/O world in Windows that could be useful for developers and I thought it would be interesting to look into and write about. All this is to say - if you're looking for a new exploit or EDR bypass technique, you should save yourselves the time and look at the other posts on this website instead. For the three of you who are still reading, let's talk about I/O rings! I/O ring is a new feature on Windows preview builds. This is the Windows implementation of a ring buffer - a circular buffer, in this case used to queue multiple I/O operations simultaneously, to allow user-mode applications performing a lot of I/O operations to do so in one action instead of transitioning from user to kernel and back for every individual request. This new feature adds a lot of new functions and internal data structures, so to avoid constantly breaking the flow of the blog with new data structures I will not put them as part of the post, but their definitions exist in the code sample at the end. I will only show a few internal data structures that aren't used in the code sample. I/O Ring Usage The current implementation of I/O rings only supports read operations and allows queuing up to 0x10000 operations at a time. For every operation the caller will need to supply a handle to the target file, an output buffer, an offset into the file and amount of memory to be read. This is all done in multiple new data structures that will be discussed later. But first, the caller needs to initialize its I/O ring. Create and Initialize an I/O Ring To do that, the system supplies a new system call - NtCreateIoRing. This function creates an instance of a new IoRing object type, described here as IORING_OBJECT: typedef struct _IORING_OBJECT { USHORT Type; USHORT Size; IORING_INFO Info; PSECTION SectionObject; PVOID KernelMappedBase; PMDL Mdl; PVOID MdlMappedBase; ULONG_PTR ViewSize; ULONG SubmitInProgress; PVOID IoRingEntryLock; PVOID EntriesCompleted; PVOID EntriesSubmitted; KEVENT RingEvent; PVOID EntriesPending; ULONG BuffersRegistered; PIORING_BUFFER_INFO BufferArray; ULONG FilesRegistered; PHANDLE FileHandleArray; } IORING_OBJECT, *PIORING_OBJECT; NtCreateIoRing receives one new structure as an input argument - IO_RING_STRUCTV1. This structure contains information about current version, which currently can only be 1, required and advisory flags (both don't currently support any values other than 0) and the requested size for the submission queue and completion queue. The function receives this information and does the following things: 1. Validates all the input and output arguments - their addresses, size alignment, etc. 2. Checks the requested submission queue size and calculate the amount of memory needed for the submission queue based on the requested number of entries. 1. If SubmissionQueueSize is over 0x10000 a new error status 0xC0460004 gets returned. This, and the other error codes added for I/O rings, were not added to the SDK yet so we'll call this one STATUS_SUB_QUEUE_SIZE_TOO_LARGE. 3. Checks the completions queue size and calculates the amount of memory needed for it. 1. The completion queue is limited to 0x20000 entries and error code 0xC0460005 (named here STATUS_COMP_QUEUE_SIZE_TOO_LARGE) is returned if a larger number is requested. 4. Creates a new object of type IoRingObjectType and initializes all fields that can be initialized at this point - flags, submission queue size and mask, event, etc. 5. Creates a section for the queues, maps it in system space and creates an MDL to back it. Then maps the same section in user-space. This section will contain the submission space and completion space and will be used by the application to communicate the parameters for all requested I/O operations with the kernel and receive the status codes. 6. Initializes the output structure with the submission queue address and other data to be returned to the caller. After NtCreateIoRing returns successfully, the caller can write its data into the supplied submission queue. The queue will have a queue head, followed by an array of NT_IORING_SQE structures, each representing one requested I/O operation. The header describes which entries should be processed at this time: [ioring_submission_queue] The queue header describes which entries should be processes using the QueueIndex and QueueCount fields. QueueIndex specifies the index of the first entry to be processes, and QueueCount specifies the amount of entries. QueueIndex + QueueCount has to be lower that total number of entries. Each queue entry contains data about the requested operation: file handle, file offset, output buffer base, offset and amount of data to be read. It also contains an OpCode field to specify the requested operation. I/O Ring Operation Codes There are four possible operation types that can be requested by the caller: 1. IORING_OP_READ: requests that the system reads data from a file into an output buffer. The file handle will be read from the FileRef field in the submission queue entry. This will either be interpreted as a file handle or as an index into a pre-registered array of file handles, depending on whether the IORING_SQE_PREREGISTERED_FILE flag (1) is set in the queue entry Flags field. The output will be written into an output buffer supplied in the Buffer field of the entry. Similar to FileRef, this field can instead contain an index into a pre-registered array of output buffers if the IORING_SQE_PREREGISTERED_BUFFER flag (2) is set. 2. IORING_OP_REGISTERED_FILES: requests pre-registration of file handles to be processed later. In this case the Buffer field of the queue entry points to an array of file handles. The requested file handles will get duplicated and placed in a new array, in the FileHandleArray field of the queue entry. The FilesRegistered field will contain the number of file handles. 3. IORING_OP_REGISTERED_BUFFERS: requests pre-registration of output buffers for file data to be read into. In this case, the Buffer field in the entry should contain an array of IORING_BUFFER_INFO structures, describing addresses and sizes of buffers into which file data will be read: typedef struct _IORING_BUFFER_INFO { PVOID Address; ULONG Length; } IORING_BUFFER_INFO, *PIORING_BUFFER_INFO; The buffers' addresses and sizes will be copied into a new array and placed in the BufferArray field of the submission queue. The BuffersRegistered field will contain the number of buffers. 4. IORING_OP_CANCEL: requests the cancellation of a pending operation for a file. Just like the in IORING_OP_READ, the FileRef can be a handle or an index into the file handle array depending on the flags. In this case the Buffer field points to the IO_STATUS_BLOCK to be canceled for the file. All these options can be a bit confusing so here are illustrations for the 4 different reading scenarios, based on the requested flags: Flags are 0, using the FileRef field as a file handle and the Buffer field as a pointer to the output buffer: [ioring_opcode_0] Flag IORING_SQE_PREREGISTERED_FILE (1) is requested, so FileRef is treated as an index into an array of pre-registered file handles and Buffer is a pointer to the output buffer: [ioring_opcode_1] Flag IORING_SQE_PREREGISTERED_BUFFER (2) is requested, so FileRef is a handle to a file and Buffer is treated as an index into an array of pre-registered output buffers: [ioring_opcode_2] Both IORING_SQE_PREREGISTERED_FILE and IORING_SQE_PREREGISTERED_BUFFER flags are set, so FileRef is treated as an index into a pre-registered file handle array and Buffer is treated as index into a pre-registered buffers array: [ioring_opcode_3] Submitting and Processing I/O Ring Once the caller set up all its submission queue entries, it can call NtSubmitIoRing to submit its requests to the kernel to get processed according to the requested parameters. Internally, NtSubmitIoRing iterates over all the entries and calls IopProcessIoRingEntry, sending the IoRing object and the current queue entry. The entry gets processed according to the specified OpCode and then calls IopIoRingDispatchComplete to fill in the completion queue. The completion queue, much like the submission queue, begins with a header, containing a queue index and count, followed by an array of entries. Each entry is an IORING_CQE structure - it has the UserData value from the submission queue entry and the Status and Information from the IO_STATUS_BLOCK for the operation: typedef struct _IORING_CQE { UINT_PTR UserData; HRESULT ResultCode; ULONG_PTR Information; } IORING_CQE, *PIORING_CQE; Once all requested entries are completed the system sets the event in IoRingObject->RingEvent. As long as not all entries are complete the system will wait on the event using the Timeout received from the caller and wake up when all requests are completed, causing the event to be signaled, or when the timeout expires. Since multiple entries can be processed, the status returned to the caller will either be an error status indicating a failure to process the entries or the return value of KeWaitForSingleObject. Status codes for individual operations can be found in the completion queue - so don't confuse receiving a STATUS_SUCCESS code from NtSubmitIoRing with successful read operations! Using I/O Ring - The Official Way Like other system calls, those new IoRing functions are not documented and not meant to be used directly. Instead, KernelBase.dll offers convenient wrapper functions that receive easy-to-use arguments and internally handle all the undocumented functions and data structures that need to be sent to the kernel. There are functions to create, query, submit and close the IoRing, as well as helper functions to build queue entries for the four different operations, which were discussed earlier. CreateIoRing CreateIoRing receives information about flags and queue sizes, and internally calls NtCreateIoRing and returns a handle to an IoRing instance: HRESULT CreateIoRing ( ULONG IoRingVersion, IORING_CREATE_FLAGS Flags, ULONG SubmissionQueueSize, ULONG CompletionQueueSize, PHANDLE Handle ); This handle is actually a pointer to an undocumented structure named um_io_ring containing the structure returned from NtCreateIoRing and other data needed to manage this IoRing instance: struct ioring_impl::um_io_ring { _In_ ULONG SqePending; _In_ ULONG SqeCount; _In_ HANDLE handle; _In_ IORING_INFO Info; _Out_ ULONG IoRingKernelAcceptedVersion; }; All the other IoRing functions will receive this handle as their first argument. After creating an IoRing instance, the application needs to build queue entries for all the requested I/O operations. Since the internal structure of the queues and the queue entry structures are not documented, KernelBase.dll exports helper functions to build those using input data supplied by the caller. There are four functions for this purpose: 1. BuildIoRingReadFile 2. BuildIoRingRegisterBuffers 3. BuildIoRingRegisterFileHandles 4. BuildIoRingCancelRequest Each function create adds a new queue entry to the submission queue with the required opcode and data. Their names make their purposes pretty obvious but lets go over them one by one just for clarity: BuildIoRingReadFile HRESULT BuildIoRingReadFile ( _In_ HANDLE IoRingHandle, _In_ PIORING_REQUEST_DATA IoringRequestDataFile, _In_ PIORING_REQUEST_DATA IoringRequestDataBuffer, _In_ ULONG NumberOfBytesToRead, _In_ ULONG64 FileOffset, _In_ ULONG_PTR UserData, _In_ IORING_SQE_FLAGS Flags ); The function receives the handle returned by CreateIoRing followed by two pointers to a new data structure which I chose to call IORING_REQUEST_DATA: typedef struct _IORING_REQUEST_DATA { ULONG Preregistered; PVOID Value; } IORING_REQUEST_DATA, *PIORING_REQUEST_DATA; The first field indicates whether the Preregistered field should be treated as a pointer or an index into a pre-registered array. The Value field is the FileRef/FileIndex if this argument supplies file data, or Buffer pointer/index if it supplies buffer data. Information from these two arguments as well as the size, offset, UserData and Flags in the next arguments are used to build a matching queue entry in the submission queue. BuildIoRingRegisterFileHandles and BuildIoRingRegisterBuffers HRESULT BuildIoRingRegisterFileHandles ( _In_ HANDLE IoRingHandle, _In_ ULONG FileHandlesCount, _In_ PVOID Handles, _In_ PVOID UserData ); HRESULT BuildIoRingRegisterBuffers ( _In_ HANDLE IoRingHandle, _In_ ULONG BuffersCount, _In_ PVOID Buffers, _In_ PVOID UserData ); These two functions create submission queue entries to pre-register file handles and output buffers. Both receive the handle returned from CreateIoRing, the count of pre-registered files/buffers in the array, an array of the handles or buffers to register and UserData. In BuildIoRingRegisterFileHandles, Handles is a pointer to an array of file handles and in BuildIoRingRegisterBuffers, Buffers is a pointer to an array of IORING_BUFFER_INFO structures containing Buffer base and size. BuildIoRingCancelRequest HRESULT BuildIoRingCancelRequest ( _In_ HANDLE IoRingHandle, _In_ PIORING_REQUEST_DATA IoringRequestData, _In_ PVOID Buffer, _In_ PVOID UserData ); Just like the other functions, BuildIoRingCancelRequest receives as its first argument the handle that was returned from CreateIoRing. The second argument is again a pointer to an IORING_REQUEST_DATA structure that contains the handle (or index in the file handles array) to the file whose operation should be canceled. The third and fourth arguments are the output buffer and UserData to be placed in the queue entry. After all queue entries were built with those functions, the queue can be submitted: SubmitIoRing HRESULT SubmitIoRing ( _In_ HANDLE IoRingHandle, _In_ ULONG EntryCount, _In_ ULONG Milliseconds, _Out_ PULONG SubmittedEntries ); The function receives the same handle as the first argument that was used to initialize the IoRing and submission queue. Then it receives the amount of entries to submit, time in milliseconds to wait on the completion of the operations, and a pointer to an output parameter that will receive the number of entries that were submitted. GetIoRingInfo HRESULT GetIoRingInfo ( _In_ HANDLE IoRingHandle, _Out_ PIORING_BASIC_INFO IoRingBasicInfo ); This API returns information about the current state of the IoRing with a new output structure: typedef struct _IORING_BASIC_INFO { ULONG Version; ULONG RequiredFlags; ULONG AdvisoryFlags; ULONG SubmissionQueueSize; ULONG CompletionQueueSize; } IORING_BASIC_INFO, *PIORING_BASIC_INFO; This contains the version and flags of the IoRing as well as the current size of the submission and completion queues. Once all operations on the IoRing are done, it needs be closed using CloseIoRing which receives the handle as its only argument and closes the handle to the IoRing object and frees the memory used for the structure. So far I couldn't find anything on the system that makes use of this feature, but once 21H2 is released I'd expect to start seeing I/ O-heavy Windows applications start using it, probably mostly in server and azure environments. Conclusion So far, no public documentation exists for this new addition to the I /O world in Windows, but hopefully when 21H2 is released later this year we will see all of this officially documented and used by both Windows and 3^rd party applications. If used wisely, this could lead to significant performance improvements for applications that have frequent read operations. Like every new feature and system call this could also have unexpected security effects. One bug was already found by hFiref0x, who was the first to publicly mention this feature and managed to crash the system by sending an incorrect parameter to NtCreateIoRing - a bug that was fixed since then. Looking more closely into these functions will likely lead to more such discoveries and interesting side effects of this new mechanism. Code Here's a small PoC showing two ways to use I/O rings - either through the official KernelBase API, or through the internal ntdll API. For the code to compile properly make sure to link it against onecoreuap.lib (for the KernelBase functions) or ntdll.lib (for the ntdll functions): #include #include typedef struct _IO_RING_STRUCTV1 { ULONG IoRingVersion; ULONG SubmissionQueueSize; ULONG CompletionQueueSize; ULONG RequiredFlags; ULONG AdvisoryFlags; } IO_RING_STRUCTV1, *PIO_RING_STRUCTV1; enum IORING_VERSION { IORING_VERSION_INVALID = 0x0, IORING_VERSION_1 = 0x1, }; enum IORING_CREATE_REQUIRED_FLAGS { IORING_CREATE_REQUIRED_FLAGS_NONE = 0x0, }; enum IORING_CREATE_ADVISORY_FLAGS { IORING_CREATE_ADVISORY_FLAGS_NONE = 0x0, }; typedef struct _IORING_CREATE_FLAGS { IORING_CREATE_REQUIRED_FLAGS Required; IORING_CREATE_ADVISORY_FLAGS Advisory; } IORING_CREATE_FLAGS, *PIORING_CREATE_FLAGS; typedef struct _IORING_QUEUE_HEAD { ULONG QueueIndex; ULONG QueueCount; ULONG64 Aligment; } IORING_QUEUE_HEAD, *PIORING_QUEUE_HEAD; typedef struct _IORING_INFO { ULONG Version; IORING_CREATE_FLAGS Flags; ULONG SubmissionQueueSize; ULONG SubQueueSizeMask; ULONG CompletionQueueSize; ULONG CompQueueSizeMask; PIORING_QUEUE_HEAD SubQueueBase; PVOID CompQueueBase; } IORING_INFO, *PIORING_INFO; typedef struct _NT_IORING_SQE { ULONG Opcode; ULONG Flags; HANDLE FileRef; LARGE_INTEGER FileOffset; PVOID Buffer; ULONG BufferSize; DWORD BufferOffset; ULONG Key; PVOID Unknown; PVOID UserData; PVOID stuff1; PVOID stuff2; PVOID stuff3; PVOID stuff4; } NT_IORING_SQE, *PNT_IORING_SQE; typedef struct _IORING_REQUEST_DATA { ULONG Preregistered; PVOID Value; } IORING_REQUEST_DATA, *PIORING_REQUEST_DATA; enum IORING_SQE_FLAGS { IOSQE_FLAGS_NONE = 0x0, }; EXTERN_C_START NTSTATUS NtSubmitIoRing ( _In_ HANDLE Handle, _In_ IORING_CREATE_REQUIRED_FLAGS Flags, _In_ ULONG EntryCount, _In_ PLARGE_INTEGER Timeout ); NTSTATUS NtCreateIoRing ( _Out_ PHANDLE pIoRingHandle, _In_ ULONG CreateParametersSize, _In_ PIO_RING_STRUCTV1 CreateParameters, _In_ ULONG OutputParametersSize, _In_ PIORING_INFO pRingInfo ); NTSTATUS NtClose ( _In_ HANDLE Handle ); HRESULT CreateIoRing ( _In_ ULONG IoRingVersion, _In_ IORING_CREATE_FLAGS Flags, _In_ ULONG SubmissionQueueSize, _In_ ULONG CompletionQueueSize, _Out_ PHANDLE Handle ); HRESULT BuildIoRingReadFile ( _In_ HANDLE Handle, _In_ PIORING_REQUEST_DATA IoringRequestDataFile, _In_ PIORING_REQUEST_DATA IoringRequestDataBuffer, _In_ ULONG NumberOfBytesToRead, _In_ ULONG64 FileOffset, _In_ ULONG_PTR UserData, _In_ IORING_SQE_FLAGS Flags ); HRESULT SubmitIoRing ( _In_ HANDLE Handle, _In_ ULONG EntryCount, _In_ ULONG Milliseconds, _Out_ PULONG SubmittedEntries ); HRESULT CloseIoRing ( _In_ HANDLE Handle ); EXTERN_C_END void IoRingNt () { NTSTATUS status; IO_RING_STRUCTV1 ioringStruct; IORING_INFO ioringInfo; HANDLE handle; PNT_IORING_SQE sqe; LARGE_INTEGER timeout; HANDLE hFile = NULL; ioringStruct.IoRingVersion = 1; ioringStruct.SubmissionQueueSize = 1; ioringStruct.CompletionQueueSize = 1; ioringStruct.AdvisoryFlags = IORING_CREATE_ADVISORY_FLAGS_NONE; ioringStruct.RequiredFlags = IORING_CREATE_REQUIRED_FLAGS_NONE; status = NtCreateIoRing(&handle, sizeof(ioringStruct), &ioringStruct, sizeof(ioringInfo), &ioringInfo); if (status != 0) { printf("Failed creating IO ring handle: 0x%x\n", status); goto Exit; } ioringInfo.SubQueueBase->QueueCount = 1; ioringInfo.SubQueueBase->QueueIndex = 0; ioringInfo.SubQueueBase->Aligment = 0; hFile = CreateFile(L"C:\\Windows\\System32\\notepad.exe", GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL); if (hFile == (HANDLE)-1) { printf("Failed opening file handle: 0x%x\n", GetLastError()); goto Exit; } sqe = (PNT_IORING_SQE)((ULONG64)ioringInfo.SubQueueBase + sizeof( IORING_QUEUE_HEAD)); sqe->Opcode = 1; sqe->Flags = 0; sqe->FileRef = hFile; sqe->FileOffset.QuadPart = 0; sqe->Buffer = VirtualAlloc(NULL, 0x200, MEM_COMMIT, PAGE_READWRITE); sqe->BufferOffset = 0; sqe->BufferSize = 0x200; sqe->Key = 1234; sqe->UserData = nullptr; timeout.QuadPart = 10000; status = NtSubmitIoRing(handle, IORING_CREATE_REQUIRED_FLAGS_NONE , 1, &timeout); if (status != 0) { printf("Failed submitting IO ring: 0x%x\n", status); goto Exit; } Exit: if (handle) { NtClose(handle); } if (hFile) { NtClose(hFile); } } void IoRingKernelBase () { HRESULT result; HANDLE handle; IORING_CREATE_FLAGS flags; IORING_REQUEST_DATA requestDataFile; IORING_REQUEST_DATA requestDataBuffer; ULONG submittedEntries; HANDLE hFile = NULL; flags.Required = IORING_CREATE_REQUIRED_FLAGS_NONE; flags.Advisory = IORING_CREATE_ADVISORY_FLAGS_NONE; result = CreateIoRing(1, flags, 1, 1, &handle); if (!SUCCEEDED(result)) { printf("Failed creating IO ring handle: 0x%x\n", result); goto Exit; } hFile = CreateFile(L"C:\\Windows\\System32\\notepad.exe", GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL); if (hFile == (HANDLE)-1) { printf("Failed opening file handle: 0x%x\n", GetLastError()); goto Exit; } requestDataFile.Preregistered = FALSE; requestDataFile.Value = hFile; requestDataBuffer.Preregistered = FALSE; requestDataBuffer.Value = VirtualAlloc(NULL, 0x200, MEM_COMMIT, PAGE_READWRITE); result = BuildIoRingReadFile(handle, &requestDataFile, &requestDataBuffer, 0x200, 0, NULL, IOSQE_FLAGS_NONE); if (!SUCCEEDED(result)) { printf("Failed building IO ring read file structure: 0x%x\n", result); goto Exit; } result = SubmitIoRing(handle, 1, 10000, &submittedEntries); if (!SUCCEEDED(result)) { printf("Failed submitting IO ring: 0x%x\n", result); goto Exit; } Exit: if (handle != 0) { CloseIoRing(handle); } if (hFile) { NtClose(hFile); } } int main () { IoRingKernelBase(); IoRingNt(); ExitProcess(0); } Posted byYarden ShafirMay 24, 2021May 24, 2021Posted inWindows Internals Post navigation Previous Post Previous post: Thread and Process State Change Winsider Seminars & Solutions Inc., Proudly powered by WordPress.