[HN Gopher] io_uring
       ___________________________________________________________________
        
       io_uring
        
       Author : sohkamyung
       Score  : 216 points
       Date   : 2023-05-22 13:07 UTC (9 hours ago)
        
 (HTM) web link (nick-black.com)
 (TXT) w3m dump (nick-black.com)
        
       | iamwil wrote:
       | One interesting thing is that io_uring can operate in different
        | modes. One of them enables kernel-side polling: when you put a
        | submission into the ring buffer, the kernel pulls it out itself
        | and performs the IO. That means that, from the application
        | side, you can perform IO without any system calls.
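        | 
        | A minimal sketch of enabling that mode with liburing
        | (IORING_SETUP_SQPOLL; the queue depth and idle timeout here
        | are arbitrary, error handling is elided, and note that before
        | roughly kernel 5.11 this mode needed elevated privileges):
        | 
        |     #include <liburing.h>
        | 
        |     struct io_uring ring;
        |     struct io_uring_params params = {0};
        |     params.flags = IORING_SETUP_SQPOLL;
        |     params.sq_thread_idle = 2000; /* ms before the kernel */
        |                                   /* poller goes to sleep  */
        |     io_uring_queue_init_params(8, &ring, &params);
        | 
        |     /* From here on, io_uring_submit() mostly just updates
        |        the shared ring; while the kernel thread is awake,
        |        no syscall is made. */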
       | 
        | Our general take was also that it has a lot of potential, but
        | it's low-level enough that most mainstream programmers aren't
        | going to pay attention to it. Hence, it'll be a while before it
        | permeates through various ecosystems.
       | 
       | For those of you that like to listen on the way to work, we cover
       | io_uring on our podcast, The Technium.
       | 
       | https://www.youtube.com/watch?v=Ebpnd7rPpdI
       | 
       | https://open.spotify.com/episode/3MG2FmpE3NP7AK7zqQFArE?si=s...
        
         | BeeOnRope wrote:
         | Can you use the kernel polling mode w/o running as root?
        
         | blibble wrote:
         | > Hence, it'll be a while before it permeates through various
         | ecosystems.
         | 
         | this may take a while as it's a completely different IO model
         | 
          | it took us 30-odd years to get from select/epoll to
          | async/coroutines being popular
        
           | fanf2 wrote:
           | I am looking forward to io_uring support in libuv
        
           | mrcode007 wrote:
            | Windows has been using this mechanism for well over a
            | decade now. It's called IOCP (IO completion ports).
        
             | riceart wrote:
             | > well over a decade now.
             | 
             | 3 decades.
        
             | hawk_ wrote:
              | IOCP is syscall-free only on one side, i.e. completion;
              | io_uring can be syscall-free on the submission side as
              | well.
        
             | blibble wrote:
             | and Linux has had POSIX async IO since 2003
             | 
             | (which no-one uses either because the API doesn't compose
             | well onto existing application structures)
        
               | mananaysiempre wrote:
               | If you had mentioned Solaris, I'd have agreed.
               | 
               | But POSIX async I/O (the aio_* functions) in Linux is
               | basically worthless performance-wise AFAIU, because Glibc
               | implements it in userspace by spawning threads to do
                | standard sync I/O. Now Linux also has _non_-POSIX async
               | I/O (the io_* functions), but it's very situational
               | because it works only if you bypass the cache (O_DIRECT)
               | and can still randomly block on metadata operations (so
               | can Win32, to be fair). There's select/poll/epoll with
               | O_NONBLOCK of course, which is what people normally use,
               | but those do not really work with files on disk (neither
               | do their WinSock equivalents). Hell, signal-driven IO
               | (O_ASYNC) exists, I've used it to make a single-threaded
               | emulator (CPU-bound unlike a network server) interact
               | with the terminal. But asynchronous I/O of normal, cached
               | files is only possible on Linux through the use of
               | io_uring, as far as I've been able to figure out.
               | 
               | That said, I've read people here saying[1] that
               | overlapped I/O on Windows also works by scheduling
               | operations on a thread pool, even referencing KB
               | articles[2]. This does not mesh with everything I've read
               | about I/O in the NT kernel, which is supposed to be
               | natively async to the point where the I/O request
                | data structure (the IRP) has what's essentially an
               | emulated call stack inside of it, in order to allow the
               | I/O subsystem to juggle continuations. What am I missing?
               | Does the Win32 subsystem need to dumb things down that
               | much even inside its own implementation?
               | 
               | (Windows 8 also introduced a ringbuffer-based, no-
               | syscalls thing called Registered I/O that looks very much
               | like io_uring.)
               | 
               | [1] https://news.ycombinator.com/item?id=11867351
               | 
               | [2] https://support.microsoft.com/kb/156932
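                | 
                | To make that last point concrete, here's a minimal
                | sketch of an asynchronous buffered (page-cache) read
                | with liburing - no O_DIRECT needed ("data.txt" is a
                | made-up file, error handling elided):
                | 
                |     #include <fcntl.h>
                |     #include <liburing.h>
                | 
                |     int main(void) {
                |         struct io_uring ring;
                |         char buf[4096];
                | 
                |         io_uring_queue_init(8, &ring, 0);
                |         int fd = open("data.txt", O_RDONLY);
                | 
                |         struct io_uring_sqe *sqe =
                |             io_uring_get_sqe(&ring);
                |         io_uring_prep_read(sqe, fd, buf,
                |                            sizeof(buf), 0);
                |         io_uring_submit(&ring);
                | 
                |         struct io_uring_cqe *cqe;
                |         io_uring_wait_cqe(&ring, &cqe);
                |         /* cqe->res is bytes read, or -errno */
                |         io_uring_cqe_seen(&ring, cqe);
                |         io_uring_queue_exit(&ring);
                |         return 0;
                |     }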
        
               | cyberax wrote:
               | > That said, I've read people here saying[1] that
               | overlapped I/O on Windows also works by scheduling
               | operations on a thread pool
               | 
               | The _kernel_ thread pool. Eventually, most work has to be
               | done in an actual thread, after all.
               | 
               | > [2] https://support.microsoft.com/kb/156932
               | 
               | It's a bit misleading. What they mean is that some
               | operations can act as barriers for further operations.
               | E.g. async calls to ReadFile won't run until the call to
               | WriteFile finishes (if it's writing past the end of the
               | file).
        
               | loeg wrote:
               | Yeah, but everything on Windows uses IOCP.
        
         | throwaway894345 wrote:
         | Does this imply that the caller has to poll the result buffer
         | to know when the kernel has processed the initial data?
        
           | sophacles wrote:
            | Sort of. In the strictest sense, yes - the cq is a ring
            | buffer (implemented with fancy atomics), so you have to
            | check whether there is a completion on the queue before you
            | read the entry. However, this polling doesn't need a
            | syscall: if more completions come in while you're
            | processing, they will simply be available to you.
           | 
            | There's also a syscall (io_uring_enter) that will do a
            | context switch and wake you up when completions are
            | available (it's a complicated syscall that has a lot of
            | knobs, switches, and levers - just be ready for a LOT of
            | information if you go read the man page).
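            | 
            | A sketch of that pattern with liburing, assuming a ring
            | that's already set up (io_uring_peek_cqe only reads the
            | shared memory - no syscall; handle_completion is a made-up
            | handler):
            | 
            |     struct io_uring_cqe *cqe;
            | 
            |     /* drain what's already in the completion ring */
            |     while (io_uring_peek_cqe(&ring, &cqe) == 0) {
            |         handle_completion(cqe);
            |         io_uring_cqe_seen(&ring, cqe);
            |     }
            | 
            |     /* nothing left: block in io_uring_enter until
            |        something arrives */
            |     io_uring_wait_cqe(&ring, &cqe);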
        
       | lloydatkinson wrote:
       | I can't help but read this as IO urine every time.
        
       | [deleted]
        
       | sylware wrote:
        | Modern hardware already works like that.
        | 
        | Look at AMD gpu command buffers, and xHCI (and NVMe).
        | 
        | But among those "ring buffers", which one is best (from a
        | consumer/producer concurrent-access standpoint on a modern
        | CPU, ofc)?
       | 
       | If the commands are not convoluted, the programming is soooo much
       | simpler and cleaner.
        
         | phone8675309 wrote:
         | The article mentions the names for the queues are based on the
         | NVMe standard.
        
           | sylware wrote:
            | huh, ofc I only have experience with ring buffers from AMD
            | gpus and USB xHCI controllers, and have only heard that
            | NVMe has ring buffers.
            | 
            | It seems USB xHCI is the most concurrency-friendly and
            | hardware-friendly (IOMMU and non-IOMMU), as it supports
            | "sparse ring buffers" (non-contiguous in bus address
            | space). AMD gpu ring buffers are atomic read/write pointers
            | (command ring buffers and "interrupt"/completion ring
            | buffers) with doorbells to notify pointer updates.
            | 
            | I should have a look at linux io_uring to see how far they
            | went and which model they chose.
            | 
            | I wonder how modern hardware could use anything other than
            | command ring buffers.
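            | 
            | All of these designs (io_uring's SQ/CQ included) boil down
            | to the same single-producer/single-consumer pattern: the
            | producer owns the tail, the consumer owns the head, and
            | each only ever stores to its own cursor. A rough sketch in
            | C11 atomics (power-of-two capacity; real rings add cache-
            | line padding and doorbells/wakeup flags):
            | 
            |     #include <stdatomic.h>
            |     #include <stdint.h>
            | 
            |     #define CAP 256  /* power of two, so masking wraps */
            | 
            |     struct ring {
            |         _Atomic uint32_t head, tail; /* free-running */
            |         uint64_t slots[CAP];
            |     };
            | 
            |     int push(struct ring *r, uint64_t v) {
            |         uint32_t t = atomic_load_explicit(&r->tail,
            |             memory_order_relaxed);
            |         uint32_t h = atomic_load_explicit(&r->head,
            |             memory_order_acquire);
            |         if (t - h == CAP) return 0;  /* full */
            |         r->slots[t & (CAP - 1)] = v;
            |         /* release: slot write visible before new tail */
            |         atomic_store_explicit(&r->tail, t + 1,
            |             memory_order_release);
            |         return 1;
            |     }
            | 
            |     int pop(struct ring *r, uint64_t *v) {
            |         uint32_t h = atomic_load_explicit(&r->head,
            |             memory_order_relaxed);
            |         uint32_t t = atomic_load_explicit(&r->tail,
            |             memory_order_acquire);
            |         if (h == t) return 0;  /* empty */
            |         *v = r->slots[h & (CAP - 1)];
            |         atomic_store_explicit(&r->head, h + 1,
            |             memory_order_release);
            |         return 1;
            |     }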
        
       | PeterCorless wrote:
       | Testing at ScyllaDB showed that io_uring could be >3x the
        | throughput of posix-aio, and 12% faster than linux-aio. That
        | was back in 2020, so I am sure there's been a lot of work done
        | since then.
       | 
       | https://www.scylladb.com/2020/05/05/how-io_uring-and-ebpf-wi...
        
       | thinkharderdev wrote:
        | I've tinkered around with io_uring on and off for the last
        | couple of years. But I think it's really becoming quite cool
        | (not that it
       | wasn't cool before... :)). This was a really interesting post on
       | what's new https://github.com/axboe/liburing/wiki/io_uring-and-
       | networki.... The combination of ring-mapped buffers and multi-
       | shot operations has some really interesting applications for
       | high-performance networking. Hoping over the next year or two we
       | can start to see really bleeding edge networking perf without
       | having to resort to using DPDK :)
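        | 
        | To give a flavor of multi-shot: a single multishot-accept SQE
        | keeps producing one CQE per incoming connection until it's
        | cancelled, instead of being re-armed after every accept. A
        | sketch with liburing (an initialized ring and a listening
        | socket listen_fd are assumed; error handling elided; needs a
        | 5.19+ kernel, if I remember right):
        | 
        |     struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        |     /* one submission; the kernel re-arms it internally */
        |     io_uring_prep_multishot_accept(sqe, listen_fd,
        |                                    NULL, NULL, 0);
        |     io_uring_submit(&ring);
        | 
        |     for (;;) {
        |         struct io_uring_cqe *cqe;
        |         io_uring_wait_cqe(&ring, &cqe);
        |         int conn_fd = cqe->res; /* new fd, or -errno; hand */
        |                                 /* it to your handler      */
        |         /* if IORING_CQE_F_MORE is clear, the accept has
        |            stopped and must be re-submitted */
        |         if (!(cqe->flags & IORING_CQE_F_MORE)) {
        |             /* re-arm here */
        |         }
        |         io_uring_cqe_seen(&ring, cqe);
        |     }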
        
       | gigatexal wrote:
       | io_uring is the single coolest thing to come out of facebook
        
       | rain1 wrote:
       | This is the future of linux syscalls. Get on board with this or
       | get left behind.
        
         | PhilipRoman wrote:
         | I'm not sure if you're being sarcastic or not. This is a
         | specialized case for improving performance. 99% of applications
         | you use every day will not benefit from io_uring in any
         | measurable way.
        
       | porsager wrote:
       | The already insanely fast (probably the fastest) http/websocket
       | server uWebSockets is seeing a 40% improvement with io_uring in
       | initial tests!
       | 
       | https://github.com/uNetworking/uWebSockets/issues/1603#issue...
        
         | Taywee wrote:
          | While impressive, that project is almost perfectly suited to
          | gain the most from this kind of optimization, and the fact
          | that it was already incredibly fast means there's less low-
          | hanging fruit left for performance; other projects are less
          | likely to see wins as big (they're more likely to already
          | have bigger bottlenecks elsewhere).
        
         | klabb3 wrote:
         | That's really promising! uSockets/uWebSockets is an almost
         | perfect use case for io_uring. That library is also insanely
         | performant and well written. I ran some benchmarks mostly for
         | low memory overhead and it's much better than even Go std,
         | nginx+nchan etc etc.
        
       | obiefernandez wrote:
        | Got a huge "is that who I think it is?" moment when I saw the
        | HN homepage this morning. The author is a well-known legend in
        | the Atlanta tech scene, especially amongst Georgia Tech alumni
        | in their 30s and 40s. His stories and rants on Facebook are
        | legendary.
        
         | craigcitro wrote:
         | can't believe we're this far into a thread about Nick without
         | someone mentioning Tab.
        
         | mananaysiempre wrote:
         | The broader world probably knows him best for the terminal
         | handling library Notcurses[1] and _a lot_ of telling terminal
         | emulator authors to get their shit together.
         | 
         | I've had his grad-school project libtorque[2] (HotPar '10), an
         | event-handling and scheduling library, on my to-read list for
         | years, but I can't seem to figure out _how_ it accomplishes the
         | interesting things it does.
         | 
         | [1] https://nick-black.com/dankwiki/index.php/Notcurses,
         | https://github.com/dankamongmen/notcurses/
         | 
         | [2] https://nick-black.com/dankwiki/index.php/Libtorque
        
         | gred wrote:
         | This made me chuckle, as part of the cohort described above. I
         | read your comment, thought to myself "what on earth is he
         | talking about? there's nobody on my radar who might fit that
         | bill", checked the name, and immediately thought "oh yeah, that
         | guy." :-) I'm not on Facebook, so I haven't seen any of those
         | stories though.
        
         | [deleted]
        
         | jart wrote:
         | He's also famous for smoking Newports outside the NYC Google
         | office.
        
         | dpeck wrote:
         | Quite a formidable trivia player as well.
        
         | obiefernandez wrote:
         | Lots of other entertaining entries on that wiki if you look for
         | them.
         | 
         | "this is the wiki of nick black (aka dank), located at
          | 33°46'44.4"N, 84°23'2.4"W (33.779, 85.384) in the heart of
         | midtown atlanta. dankwiki's rollin' wit' you, though I make no
         | guarantees of its correctness, relevance, nor timeliness. track
         | changes using the recent changes page. I've revived DANKBLOG,
         | this wiki and grad school having not satisfied ye olde furor
         | scribendi.
         | 
         | hack the planet! don't mistake my kindness for weakness.
         | 
         | i primarily write to force my own understanding, and remember
         | things (a few entries are actually semi-authoritative). i'm
         | just a disreputable Mariner on your way to the Wedding. if you
         | derive use from this wiki, consider yourself lucky, and please
         | get confirmation before relying on my writeups to perform
         | surgery, design planes, determine whether a graph G is an
         | Aanderaa-Rosenberg scorpion, or feed your pet rhinoceros. do
         | not proceed if allergic to linux, postmodern literature,
         | nuclear physics, or cartoonish supervillainy. "
        
         | thrwwy646b97b8 wrote:
         | [flagged]
        
         | coldcalled wrote:
         | Once upon a time, I was sitting in my shack in Home Park when I
         | got a random call from an independent tech recruiter. During
         | the discussion, I mentioned my involvement in the Linux Users
         | Group at Georgia Tech, at which point suddenly we were
         | discussing Nick Black's life story, which was kind of weird (I
         | guess I was mostly listening and not talking). I don't know why
         | the conversation took that turn, I didn't initiate the topic
         | change. We never actually talked about the job the recruiter
         | was calling me about, and I never heard from the recruiter
         | again after that. To this day I think I still don't really
         | understand what happened.
        
           | tbrake wrote:
           | Ah Home Park. Part of the circle of life for tech students.
           | 
           | 1. enroll.
           | 
           | 2. get tired of dorm life.
           | 
           | 3. move to home park because it's likely all you can afford.
           | 
           | 4. get broken into 5-10 times.
           | 
           | 5. move away
        
             | coldcalled wrote:
             | [dead]
        
             | ethbr0 wrote:
             | The trick was to have a broken window pane that you patched
             | with duct tape and cardboard.
             | 
              | Not one break-in!
             | 
             | Figured no one thought we were worth it. Although we were
             | two houses across the street from Kool Korner, so maybe
             | that was the "nicer" part of Home Park?
        
               | tbrake wrote:
               | Maybe it's cleaned up since the early 00's. We were over
               | near Rocky Mountain.
               | 
                | I should have been more precise re: break-ins; I meant
                | car break-ins. No idea what the home rate was, but
               | every other week there were 1-2 cars missing windows
               | around Lynch.
        
               | ethbr0 wrote:
               | Oof. Hemphill was pretty rough, pre-Antico metastasis
               | cleaning up the far end.
               | 
               | The west & north sides of GT are unrecognizable to me now
               | though -- condos and yuppie shops.
        
       | davidw wrote:
       | That's sort of a wall of text.
       | 
       | I'd be curious to see
       | 
       | 1. A Hello World, absolute minimum example
       | 
       | and
       | 
       | 2. Doing something it's designed to do well, cut down to as small
       | an example as possible.
       | 
       | Sheez, downvoters, I'm curious about it, and want to see an
       | example. You don't learn to drive a car by reading the engine
       | specs, either.
        
         | thinkharderdev wrote:
         | You can find some hello world level examples here
         | https://github.com/axboe/liburing/tree/master/examples.
         | 
         | Right now, what it can do really well is non-blocking file IO.
          | My (limited) understanding is that as of now, the benefits of
          | io_uring over epoll for network IO are a bit more ambiguous.
          | That said, io_uring is adding new features (already available
          | in Linux 6.x kernels) that are really promising. See
         | https://github.com/axboe/liburing/wiki/io_uring-and-
         | networki....
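          | 
          | For an absolute-minimum hello world, something like this
          | (liburing; it pushes a single write to stdout through the
          | ring instead of calling write() directly; build with
          | -luring, error handling elided):
          | 
          |     #include <liburing.h>
          |     #include <string.h>
          | 
          |     int main(void) {
          |         struct io_uring ring;
          |         io_uring_queue_init(4, &ring, 0);
          | 
          |         const char msg[] = "hello, io_uring!\n";
          |         struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
          |         io_uring_prep_write(sqe, 1, msg, strlen(msg), 0);
          |         io_uring_submit(&ring);
          | 
          |         struct io_uring_cqe *cqe;
          |         io_uring_wait_cqe(&ring, &cqe); /* res = bytes */
          |         io_uring_cqe_seen(&ring, cqe);
          |         io_uring_queue_exit(&ring);
          |         return 0;
          |     }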
        
         | gavinray wrote:
         | You can find some "hello-world" style examples here, along with
         | fantastic tutorial materials:
         | 
         | https://unixism.net/loti/tutorial/index.html
        
       | martinjacobd wrote:
       | I recently tinkered around with io_uring for a presentation. Here
       | is a toy echo server I made with it:
       | https://gist.github.com/martinjacobd/be50a93744b94749339afe8...
       | (non io_uring example for comparison:
       | https://gist.github.com/martinjacobd/feea261d2fafe5e7332e37d...)
       | 
        | A big todo I have for this is to modify it to use
        | multishot_accept, but I haven't been able to get a recent
        | enough kernel configured properly to do it. (Only >6.0 kernels
        | support multishot accept.)
       | 
       | (edit) you need to s/multishot_accept/recv_multishot/g above :)
        
         | znpy wrote:
          | Could you also do a UDP-based server? It'd be interesting.
        
         | thinkharderdev wrote:
         | I did an echo server with multi-shot here (using the io_uring
         | rust bindings) https://github.com/thinkharderdev/io-
         | uring/blob/ring-mapped-.... My biggest issue with io_uring is
         | figuring out what is available in which kernel version :)
        
           | martinjacobd wrote:
           | I meant recv_multishot not multishot_accept. The one I linked
           | does use multishot_accept. Just a think-o. Looks like your
           | version uses both though, so that's cool.
        
         | samsquire wrote:
          | I might have to use your code again; I used your just-in-time
          | compilation code to execute generated machine code.
         | 
         | Thank you so much Martin Jacob!
        
       | jupp0r wrote:
        | Would be great if this were integrated into runtimes like
        | tokio. I can totally see graphs of promise chains being
        | translated into io_uring calls representing the underlying
        | data dependencies, saving lots of context-switch overhead.
       | 
       | Pretty cool to see how far IO APIs have come with this.
        
         | Tanjreeve wrote:
          | There already is a tokio_uring. I'm still a bit dubious about
          | how much help it is for application-level stuff outside of
          | normal file operations. E.g. the data-dependency stuff is a
          | very granular scheduling operation to generalise in an easy-
          | to-use way. But it's definitely real and worth watching.
        
       ___________________________________________________________________
       (page generated 2023-05-22 23:00 UTC)