[HN Gopher] High-Performance DBMSs with io_uring: When and How t...
___________________________________________________________________
High-Performance DBMSs with io_uring: When and How to use it
Author : matt_d
Score : 65 points
Date : 2026-01-06 19:29 UTC (3 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| melhindi wrote:
| Hi, I am one of the authors. Happy to take questions.
| anassar163 wrote:
| This is one of the easiest-to-follow papers on io_uring and its
| benefits. Good work!
| melhindi wrote:
| Thank you for the feedback, glad to hear that!
| to_ziegler wrote:
| We also wrote up a very concise, high-level summary here, if you
| want the short version:
| https://toziegler.github.io/2025-12-08-io-uring/
| scott_w wrote:
| Thanks! This explained to me very simply what the benefits are
| in a way no article I've read before has.
| to_ziegler wrote:
| That's great to hear! We are happy it helped.
| topspin wrote:
| In your high-level "You might _not_ want to use it if" points,
| you mention Docker but not why, and that's odd. I happen to
| know why: io_uring syscalls are blocked by default in Docker,
| because io_uring is a large surface area for attacks, and this
| has proven to be a real problem in practice. Others won't know
| this, however. They also won't know that io_uring is similarly
| blocked in widely used cloud sandboxes, Android, and elsewhere.
| Seems like a fine place to point this stuff out: anyone
| considering io_uring would want to know about these issues.
| melhindi wrote:
| Very good point! You're absolutely right: the fact that
| io_uring is blocked by default in Docker and other sandboxes
| due to security concerns is important context, and we should
| have mentioned it explicitly there. We'll update the post,
| and happy to incorporate any other caveats you think are
| worth calling out.
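| For anyone who wants to check this at runtime: a minimal probe
| (a sketch using liburing, not code from the paper) that reports
| whether io_uring can be set up in the current environment:
|
|     /* probe.c - build with: gcc probe.c -luring */
|     #include <liburing.h>
|     #include <stdio.h>
|     #include <string.h>
|
|     int main(void)
|     {
|         struct io_uring ring;
|         int ret = io_uring_queue_init(8, &ring, 0);
|         if (ret == 0) {
|             printf("io_uring is available\n");
|             io_uring_queue_exit(&ring);
|             return 0;
|         }
|         /* Sandboxes that block the io_uring syscalls
|          * typically surface here as -EPERM or -ENOSYS. */
|         printf("io_uring unavailable: %s\n", strerror(-ret));
|         return 1;
|     }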
| lukeh wrote:
| Small nitpick: malloc is not a system call.
| to_ziegler wrote:
| Good catch! We will fix this in the next version and change it
| to brk/sbrk or mmap.
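| For context, malloc is a library function that obtains memory
| from the kernel via those syscalls; a minimal sketch (ours, not
| from the post) of the mmap path glibc uses for large requests:
|
|     #include <stdio.h>
|     #include <sys/mman.h>
|
|     int main(void)
|     {
|         /* glibc's malloc services large allocations with an
|          * anonymous private mapping like this one. */
|         size_t len = 1 << 20; /* 1 MiB */
|         void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
|                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
|         if (p == MAP_FAILED) {
|             perror("mmap");
|             return 1;
|         }
|         printf("mapped %zu bytes at %p\n", len, p);
|         munmap(p, len);
|         return 0;
|     }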
| eliasdejong wrote:
| Really excellent research and well written, congrats. It shows
| that io_uring really brings extra performance when properly
| used rather than as a simple drop-in replacement.
|
| > With IOPOLL, completion events are polled directly from the
| NVMe device queue, either by the application or by the kernel
| SQPOLL thread (cf. Section 2), replacing interrupt-based
| signaling. This removes interrupt setup and handling overhead but
| disables non-polled I/O, such as sockets, within the same ring.
|
| > Treating io_uring as a drop-in replacement in a traditional
| I/O-worker design is inadequate. Instead, io_uring requires a
| ring-per-thread design that overlaps computation and I/O within
| the same thread.
|
| 1) So does this mean that if you want to take advantage of
| IOPOLL, you should use two rings per thread: one for network and
| one for storage?
|
| 2) SQPoll is shown in the graph as outperforming IOPoll. I assume
| both polling options are mutually exclusive?
|
| 3) I'd be interested in what the considerations are (if any) for
| using IOPoll over SQPoll.
|
| 4) Additional question: I assume for a modern DBMS you would want
| to run this as thread-per-core?
| mjasny wrote:
| Thanks a lot for the kind words, we really appreciate it!
|
| Regarding your questions:
|
| 1) Yes. If you want to take advantage of IOPOLL while still
| handling network I/O, you typically need two rings per thread:
| an IOPOLL-enabled ring for storage and a regular ring for
| sockets and other non-polled I/O (see the first sketch after
| these answers).
|
| 2) They are not mutually exclusive. SQPOLL was enabled in
| addition to IOPOLL in the experiments (+SQPoll). SQPOLL affects
| submission, while IOPOLL changes how completions are retrieved
| (the second sketch below combines both flags).
|
| 3) The main trade-off is CPU usage vs. latency. SQPOLL spawns
| an additional kernel thread that busy-spins to issue I/O
| requests from the ring. With IOPOLL, interrupts are not used;
| instead, the device queues are polled (which does not
| necessarily result in 100% CPU usage on the core).
|
| 4) Yes. For a modern DBMS, a thread-per-core model is the
| natural fit. Rings should not be shared between threads; each
| thread should have its own io_uring instance(s) to avoid
| synchronization overhead and to preserve locality.
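| Putting answer 1 into code, a minimal sketch (assuming
| liburing; QD and the error handling are illustrative):
|
|     #include <liburing.h>
|
|     #define QD 64
|
|     /* One thread, two rings: an IOPOLL ring for storage
|      * (requires O_DIRECT files) and a regular ring for
|      * sockets and other non-polled I/O. */
|     int setup_rings(struct io_uring *storage, struct io_uring *net)
|     {
|         int ret = io_uring_queue_init(QD, storage,
|                                       IORING_SETUP_IOPOLL);
|         if (ret < 0)
|             return ret;
|
|         ret = io_uring_queue_init(QD, net, 0);
|         if (ret < 0)
|             io_uring_queue_exit(storage);
|         return ret;
|     }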
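| And for answers 2 and 3, a sketch of combining both flags
| (again liburing; the idle timeout and CPU pinning values are
| illustrative, not tuned recommendations):
|
|     #include <liburing.h>
|
|     int setup_polled_ring(struct io_uring *ring, unsigned depth)
|     {
|         struct io_uring_params p = { 0 };
|
|         /* SQPOLL: a kernel thread busy-spins on the submission
|          * queue, so submitting needs no syscall while it is
|          * awake. IOPOLL: completions are reaped by polling the
|          * device queues instead of via interrupts. */
|         p.flags = IORING_SETUP_IOPOLL | IORING_SETUP_SQPOLL;
|         p.sq_thread_idle = 2000; /* ms before the SQPOLL
|                                     thread goes to sleep */
|         /* Optionally pin the SQPOLL thread to a core:
|          * p.flags |= IORING_SETUP_SQ_AFF;
|          * p.sq_thread_cpu = 3; */
|
|         return io_uring_queue_init_params(depth, ring, &p);
|     }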
___________________________________________________________________
(page generated 2026-01-06 23:02 UTC)