[HN Gopher] High-Performance DBMSs with io_uring: When and How t...
       ___________________________________________________________________
        
       High-Performance DBMSs with io_uring: When and How to use it
        
       Author : matt_d
       Score  : 65 points
       Date   : 2026-01-06 19:29 UTC (3 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | melhindi wrote:
       | Hi, I am one of the authors. Happy to take questions.
        
       | anassar163 wrote:
        | This is one of the easiest-to-follow papers on io_uring and its
        | benefits. Good work!
        
         | melhindi wrote:
         | Thank you for the feedback, glad to hear that!
        
       | to_ziegler wrote:
       | We also wrote up a very concise, high-level summary here, if you
       | want the short version:
       | https://toziegler.github.io/2025-12-08-io-uring/
        
         | scott_w wrote:
         | Thanks! This explained to me very simply what the benefits are
         | in a way no article I've read before has.
        
           | to_ziegler wrote:
           | That's great to hear! We are happy it helped.
        
         | topspin wrote:
          | In your high-level "You might _not_ want to use it if" points,
          | you mention Docker but not why, and that's odd. I happen to
         | know why: io_uring syscalls are blocked by default in Docker,
         | because io_uring is a large surface area for attacks, and this
         | has proven to be a real problem in practice. Others won't know
         | this, however. They also won't know that io_uring is similarly
         | blocked in widely used cloud sandboxes, Android, and elsewhere.
         | Seems like a fine place to point this stuff out: anyone
         | considering io_uring would want to know about these issues.
        
           | melhindi wrote:
            | Very good point! You're absolutely right: the fact that
           | io_uring is blocked by default in Docker and other sandboxes
           | due to security concerns is important context, and we should
           | have mentioned it explicitly there. We'll update the post,
           | and happy to incorporate any other caveats you think are
           | worth calling out.
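For readers running into this in practice: Docker's default seccomp profile does not allow the io_uring syscalls (io_uring_setup, io_uring_enter, io_uring_register), so they fail inside containers unless a custom profile is supplied. An illustrative entry re-allowing them, in the format of Docker's seccomp profile JSON, looks like:

```json
{
  "names": ["io_uring_setup", "io_uring_enter", "io_uring_register"],
  "action": "SCMP_ACT_ALLOW"
}
```

Such an entry belongs in the `syscalls` array of a full profile passed via `docker run --security-opt seccomp=profile.json`. Note that re-enabling io_uring reopens exactly the attack surface the default profile blocks, so the trade-off should be weighed deliberately.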
        
       | lukeh wrote:
       | Small nitpick: malloc is not a system call.
        
         | to_ziegler wrote:
          | Good catch! We will fix this in the next version and change it
          | to brk/sbrk or mmap.
        
       | eliasdejong wrote:
        | Really excellent research and well written, congrats. It shows
        | that io_uring really brings extra performance when properly
        | used, rather than as a simple drop-in replacement.
       | 
       | > With IOPOLL, completion events are polled directly from the
       | NVMe device queue, either by the application or by the kernel
       | SQPOLL thread (cf. Section 2), replacing interrupt-based
       | signaling. This removes interrupt setup and handling overhead but
       | disables non-polled I/O, such as sockets, within the same ring.
       | 
       | > Treating io_uring as a drop-in replacement in a traditional
       | I/O-worker design is inadequate. Instead, io_uring requires a
       | ring-per-thread design that overlaps computation and I/O within
       | the same thread.
       | 
       | 1) So does this mean that if you want to take advantage of
       | IOPOLL, you should use two rings per thread: one for network and
       | one for storage?
       | 
       | 2) SQPoll is shown in the graph as outperforming IOPoll. I assume
       | both polling options are mutually exclusive?
       | 
       | 3) I'd be interested in what the considerations are (if any) for
       | using IOPoll over SQPoll.
       | 
       | 4) Additional question: I assume for a modern DBMS you would want
       | to run this as thread-per core?
        
         | mjasny wrote:
         | Thanks a lot for the kind words, we really appreciate it!
         | 
         | Regarding your questions:
         | 
         | 1) Yes. If you want to take advantage of IOPOLL while still
         | handling network I/O, you typically need two rings per thread:
         | an IOPOLL-enabled ring for storage and a regular ring for
         | sockets and other non-polled I/O.
         | 
         | 2) They are not mutually exclusive. SQPOLL was enabled in
         | addition to IOPOLL in the experiments (+SQPoll). SQPOLL affects
         | submission, while IOPOLL changes how completions are retrieved.
         | 
          | 3) The main trade-off is CPU usage vs. latency. SQPOLL spawns
          | an additional kernel thread that busy-spins to issue I/O
          | requests from the ring. With IOPOLL, interrupts are not used;
          | instead, the device queues are polled (this does not
          | necessarily result in 100% CPU usage on the core).
         | 
         | 4) Yes. For a modern DBMS, a thread-per-core model is the
         | natural fit. Rings should not be shared between threads; each
         | thread should have its own io_uring instance(s) to avoid
         | synchronization and for locality.
        
       ___________________________________________________________________
       (page generated 2026-01-06 23:02 UTC)