[HN Gopher] io_uring
___________________________________________________________________
io_uring
Author : sohkamyung
Score : 216 points
Date : 2023-05-22 13:07 UTC (9 hours ago)
(HTM) web link (nick-black.com)
(TXT) w3m dump (nick-black.com)
| iamwil wrote:
| One interesting thing is that io_uring can operate in different
| modes. One of the modes enables kernel-side polling: when you
| put a request into the submission buffer, the kernel pulls it
| out itself and does the IO. That means that, from the
| application side, you can perform IO without any system calls.
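|
| A minimal sketch of that mode with liburing -- just setting the
| IORING_SETUP_SQPOLL flag; fd/buf setup and error handling
| elided:
|
|     struct io_uring ring;
|     struct io_uring_params p = {0};
|     p.flags = IORING_SETUP_SQPOLL;
|     p.sq_thread_idle = 2000;   /* ms before the kernel thread naps */
|     io_uring_queue_init_params(64, &ring, &p);
|
|     struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
|     io_uring_prep_read(sqe, fd, buf, sizeof buf, 0);
|     io_uring_submit(&ring);    /* no syscall while the thread is awake */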
|
| Our general take was also that it has a lot of potential, but
| it's low-level enough that most mainstream programmers aren't
| going to pay attention to it. Hence, it'll be a while before it
| permeates through various ecosystems.
|
| For those of you that like to listen on the way to work, we cover
| io_uring on our podcast, The Technium.
|
| https://www.youtube.com/watch?v=Ebpnd7rPpdI
|
| https://open.spotify.com/episode/3MG2FmpE3NP7AK7zqQFArE?si=s...
| BeeOnRope wrote:
| Can you use the kernel polling mode w/o running as root?
| blibble wrote:
| > Hence, it'll be a while before it permeates through various
| ecosystems.
|
| this may take a while as it's a completely different IO model
|
| it took us 30-odd years to get from select/epoll to
| async/coroutines being popular
| fanf2 wrote:
| I am looking forward to io_uring support in libuv
| mrcode007 wrote:
| Windows has been using this mechanism for well over a decade
| now. It's called IOCP (IO completion ports)
| riceart wrote:
| > well over a decade now.
|
| 3 decades.
| hawk_ wrote:
| IOCP is syscall-free only on one side, i.e. completion.
| io_uring can be syscall-free on the submission side as well.
| blibble wrote:
| and Linux has had POSIX async IO since 2003
|
| (which no-one uses either because the API doesn't compose
| well onto existing application structures)
| mananaysiempre wrote:
| If you had mentioned Solaris, I'd have agreed.
|
| But POSIX async I/O (the aio_* functions) in Linux is
| basically worthless performance-wise AFAIU, because Glibc
| implements it in userspace by spawning threads to do
| standard sync I/O. Now Linux also has _non_-POSIX async
| I/O (the io_* functions), but it's very situational
| because it works only if you bypass the cache (O_DIRECT)
| and can still randomly block on metadata operations (so
| can Win32, to be fair). There's select/poll/epoll with
| O_NONBLOCK of course, which is what people normally use,
| but those do not really work with files on disk (neither
| do their WinSock equivalents). Hell, signal-driven IO
| (O_ASYNC) exists; I've used it to make a single-threaded
| emulator (CPU-bound unlike a network server) interact
| with the terminal. But asynchronous I/O of normal, cached
| files is only possible on Linux through the use of
| io_uring, as far as I've been able to figure out.
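|
| For reference, the aio_* calls look roughly like this (a
| sketch with fd and buf assumed; glibc services it with a
| user-space thread pool, which is why it buys you so little):
|
|     #include <aio.h>   /* link with -lrt */
|
|     struct aiocb cb = { .aio_fildes = fd, .aio_buf = buf,
|                         .aio_nbytes = sizeof buf };
|     aio_read(&cb);                        /* returns immediately */
|     while (aio_error(&cb) == EINPROGRESS)
|         ;                                 /* poll, or request a signal */
|     ssize_t n = aio_return(&cb);          /* final byte count */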
|
| That said, I've read people here saying[1] that
| overlapped I/O on Windows also works by scheduling
| operations on a thread pool, even referencing KB
| articles[2]. This does not mesh with everything I've read
| about I/O in the NT kernel, which is supposed to be
| natively async to the point where the I/O request
| data structure (the IRP) has what's essentially an
| emulated call stack inside of it, in order to allow the
| I/O subsystem to juggle continuations. What am I missing?
| Does the Win32 subsystem need to dumb things down that
| much even inside its own implementation?
|
| (Windows 8 also introduced a ringbuffer-based, no-
| syscalls thing called Registered I/O that looks very much
| like io_uring.)
|
| [1] https://news.ycombinator.com/item?id=11867351
|
| [2] https://support.microsoft.com/kb/156932
| cyberax wrote:
| > That said, I've read people here saying[1] that
| overlapped I/O on Windows also works by scheduling
| operations on a thread pool
|
| The _kernel_ thread pool. Eventually, most work has to be
| done in an actual thread, after all.
|
| > [2] https://support.microsoft.com/kb/156932
|
| It's a bit misleading. What they mean is that some
| operations can act as barriers for further operations.
| E.g. async calls to ReadFile won't run until the call to
| WriteFile finishes (if it's writing past the end of the
| file).
| loeg wrote:
| Yeah, but everything on Windows uses IOCP.
| throwaway894345 wrote:
| Does this imply that the caller has to poll the result buffer
| to know when the kernel has processed the initial data?
| sophacles wrote:
| Sort of. In the strictest sense, yes - the cq is a ring
| buffer (implemented with fancy atomic stuff), so you have to
| check if there is a completion on the queue before you read
| the entry. However, this polling doesn't need a syscall; if
| more completions come in while you're processing, they will be
| available to you.
|
| There's also a syscall (io_uring_enter) that will do a
| context switch and wake you up when completions are available
| (it's a complicated syscall that has a lot of knobs and
| switches and levers - just be ready for a LOT of information
| if you go read the man page).
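|
| In liburing terms, the syscall-free check is just a peek at
| the ring (rough sketch):
|
|     struct io_uring_cqe *cqe;
|     /* 0 if a completion is ready, -EAGAIN if not -- no
|        syscall either way */
|     while (io_uring_peek_cqe(&ring, &cqe) == -EAGAIN)
|         ;                          /* spin, or go do other work */
|     handle_completion(cqe->res);   /* hypothetical handler */
|     io_uring_cqe_seen(&ring, cqe);
|
|     /* or yield the CPU and let the kernel wake you instead: */
|     io_uring_wait_cqe(&ring, &cqe);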
| lloydatkinson wrote:
| I can't help but read this as IO urine every time.
| [deleted]
| sylware wrote:
| Modern hardware already works like that.
|
| Look at AMD gpu command buffers, and xHCI (and NVMe).
|
| But among those "ring buffers", which design is best (from the
| standpoint of concurrent producer/consumer access on a modern
| CPU, ofc)?
|
| If the commands are not convoluted, the programming is soooo much
| simpler and cleaner.
| phone8675309 wrote:
| The article mentions the names for the queues are based on the
| NVMe standard.
| sylware wrote:
| huh, ofc I only have experience with ring buffers from AMD
| gpus and USB xHCI controllers, and only heard that NVMe has
| ring buffers.
|
| It seems USB xHCI is the most "concurrency-friendly" and
| hardware-friendly (IOMMU and non-IOMMU), as it supports
| "sparse ring buffers" (non-contiguous in bus address space).
| AMD gpu ring buffers are atomic read/write pointers (command
| ring buffers and "interrupt"/completion ring buffers) with
| doorbells to notify pointer updates.
|
| I should have a look at Linux io_uring to see how far they
| went and which model they chose.
|
| I wonder how modern hardware could use anything other than
| command ring buffers.
| PeterCorless wrote:
| Testing at ScyllaDB showed that io_uring could be >3x the
| throughput of posix-aio, and 12% faster than linux-aio. This
| was back in 2020, so I'm sure a lot of work has been done
| since then.
|
| https://www.scylladb.com/2020/05/05/how-io_uring-and-ebpf-wi...
| thinkharderdev wrote:
| I've tinkered around with io_uring on and off for the last couple
| of years. But I think it's really becoming quite cool (not that it
| wasn't cool before... :)). This was a really interesting post on
| what's new https://github.com/axboe/liburing/wiki/io_uring-and-
| networki.... The combination of ring-mapped buffers and multi-
| shot operations has some really interesting applications for
| high-performance networking. Hoping over the next year or two we
| can start to see really bleeding edge networking perf without
| having to resort to using DPDK :)
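|
| For a taste, registering a group of ring-mapped buffers looks
| roughly like this (a sketch assuming a recent liburing with
| the io_uring_setup_buf_ring helper and a bufs[] array
| allocated elsewhere):
|
|     int err;
|     struct io_uring_buf_ring *br =
|         io_uring_setup_buf_ring(&ring, 64, 0 /* group id */, 0, &err);
|     for (int i = 0; i < 64; i++)
|         io_uring_buf_ring_add(br, bufs[i], BUF_SIZE, i,
|                               io_uring_buf_ring_mask(64), i);
|     io_uring_buf_ring_advance(br, 64);  /* hand all 64 to the kernel */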
| gigatexal wrote:
| io_uring is the single coolest thing to come out of facebook
| rain1 wrote:
| This is the future of linux syscalls. Get on board with this or
| get left behind.
| PhilipRoman wrote:
| I'm not sure if you're being sarcastic or not. This is a
| specialized case for improving performance. 99% of applications
| you use every day will not benefit from io_uring in any
| measurable way.
| porsager wrote:
| The already insanely fast (probably the fastest) http/websocket
| server uWebSockets is seeing a 40% improvement with io_uring in
| initial tests!
|
| https://github.com/uNetworking/uWebSockets/issues/1603#issue...
| Taywee wrote:
| While impressive, that is a project almost perfectly suited to
| gain the most from this kind of optimization. The fact that it
| was already incredibly fast means there's less low-hanging
| fruit left for performance, and other projects are less likely
| to see wins that big (they're more likely to have bigger
| bottlenecks elsewhere).
| klabb3 wrote:
| That's really promising! uSockets/uWebSockets is an almost
| perfect use case for io_uring. That library is also insanely
| performant and well written. I ran some benchmarks mostly for
| low memory overhead and it's much better than even Go std,
| nginx+nchan etc etc.
| obiefernandez wrote:
| Got a huge "is that who I think it is?" when I saw the HN
| homepage this morning. The author is a well-known legend in the
| Atlanta tech scene, especially amongst Georgia Tech alumni in
| their 30s and 40s. His stories and rants on Facebook are
| legendary.
| craigcitro wrote:
| can't believe we're this far into a thread about Nick without
| someone mentioning Tab.
| mananaysiempre wrote:
| The broader world probably knows him best for the terminal
| handling library Notcurses[1] and _a lot_ of telling terminal
| emulator authors to get their shit together.
|
| I've had his grad-school project libtorque[2] (HotPar '10), an
| event-handling and scheduling library, on my to-read list for
| years, but I can't seem to figure out _how_ it accomplishes the
| interesting things it does.
|
| [1] https://nick-black.com/dankwiki/index.php/Notcurses,
| https://github.com/dankamongmen/notcurses/
|
| [2] https://nick-black.com/dankwiki/index.php/Libtorque
| gred wrote:
| This made me chuckle, as part of the cohort described above. I
| read your comment, thought to myself "what on earth is he
| talking about? there's nobody on my radar who might fit that
| bill", checked the name, and immediately thought "oh yeah, that
| guy." :-) I'm not on Facebook, so I haven't seen any of those
| stories though.
| [deleted]
| jart wrote:
| He's also famous for smoking Newports outside the NYC Google
| office.
| dpeck wrote:
| Quite a formidable trivia player as well.
| obiefernandez wrote:
| Lots of other entertaining entries on that wiki if you look for
| them.
|
| "this is the wiki of nick black (aka dank), located at
| 33°46'44.4"N, 84°23'2.4"W (33.779, 85.384) in the heart of
| midtown atlanta. dankwiki's rollin' wit' you, though I make no
| guarantees of its correctness, relevance, nor timeliness. track
| changes using the recent changes page. I've revived DANKBLOG,
| this wiki and grad school having not satisfied ye olde furor
| scribendi.
|
| hack the planet! don't mistake my kindness for weakness.
|
| i primarily write to force my own understanding, and remember
| things (a few entries are actually semi-authoritative). i'm
| just a disreputable Mariner on your way to the Wedding. if you
| derive use from this wiki, consider yourself lucky, and please
| get confirmation before relying on my writeups to perform
| surgery, design planes, determine whether a graph G is an
| Aanderaa-Rosenberg scorpion, or feed your pet rhinoceros. do
| not proceed if allergic to linux, postmodern literature,
| nuclear physics, or cartoonish supervillainy. "
| thrwwy646b97b8 wrote:
| [flagged]
| coldcalled wrote:
| Once upon a time, I was sitting in my shack in Home Park when I
| got a random call from an independent tech recruiter. During
| the discussion, I mentioned my involvement in the Linux Users
| Group at Georgia Tech, at which point suddenly we were
| discussing Nick Black's life story, which was kind of weird (I
| guess I was mostly listening and not talking). I don't know why
| the conversation took that turn; I didn't initiate the topic
| change. We never actually talked about the job the recruiter
| was calling me about, and I never heard from the recruiter
| again after that. To this day I think I still don't really
| understand what happened.
| tbrake wrote:
| Ah Home Park. Part of the circle of life for tech students.
|
| 1. enroll.
|
| 2. get tired of dorm life.
|
| 3. move to home park because it's likely all you can afford.
|
| 4. get broken into 5-10 times.
|
| 5. move away
| coldcalled wrote:
| [dead]
| ethbr0 wrote:
| The trick was to have a broken window pane that you patched
| with duct tape and cardboard.
|
| Not one break-in!
|
| Figured no one thought we were worth it. Although we were
| two houses across the street from Kool Korner, so maybe
| that was the "nicer" part of Home Park?
| tbrake wrote:
| Maybe it's cleaned up since the early 00's. We were over
| near Rocky Mountain.
|
| I should have been more precise re: break-ins, meaning
| more car break-ins. No idea what the home rate was, but
| every other week there were 1-2 cars missing windows
| around Lynch.
| ethbr0 wrote:
| Oof. Hemphill was pretty rough, pre-Antico metastasis
| cleaning up the far end.
|
| The west & north sides of GT are unrecognizable to me now
| though -- condos and yuppie shops.
| davidw wrote:
| That's sort of a wall of text.
|
| I'd be curious to see
|
| 1. A Hello World, absolute minimum example
|
| and
|
| 2. Doing something it's designed to do well, cut down to as small
| an example as possible.
|
| Sheez, downvoters, I'm curious about it, and want to see an
| example. You don't learn to drive a car by reading the engine
| specs, either.
| thinkharderdev wrote:
| You can find some hello-world-level examples here
| https://github.com/axboe/liburing/tree/master/examples.
|
| Right now, what it can do really well is non-blocking file IO.
| My (limited) understanding is that, as of now, the benefits of
| io_uring over epoll for network IO are a bit more ambiguous.
| That said, io_uring is adding new features (already available
| in Linux 6.x kernels) that are really promising. See
| https://github.com/axboe/liburing/wiki/io_uring-and-
| networki....
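|
| As an absolute-minimum example, a one-shot read with liburing
| looks something like this (error checks elided; build with
| cc hello_uring.c -luring):
|
|     #include <liburing.h>
|     #include <fcntl.h>
|     #include <stdio.h>
|
|     int main(void) {
|         struct io_uring ring;
|         io_uring_queue_init(8, &ring, 0);   /* 8-entry rings */
|
|         int fd = open("/etc/hostname", O_RDONLY);
|         char buf[256];
|
|         struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
|         io_uring_prep_read(sqe, fd, buf, sizeof buf, 0);
|         io_uring_submit(&ring);             /* one syscall */
|
|         struct io_uring_cqe *cqe;
|         io_uring_wait_cqe(&ring, &cqe);     /* block for the result */
|         printf("read %d bytes\n", cqe->res);
|         io_uring_cqe_seen(&ring, cqe);
|
|         io_uring_queue_exit(&ring);
|         return 0;
|     }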
| gavinray wrote:
| You can find some "hello-world" style examples here, along with
| fantastic tutorial materials:
|
| https://unixism.net/loti/tutorial/index.html
| martinjacobd wrote:
| I recently tinkered around with io_uring for a presentation. Here
| is a toy echo server I made with it:
| https://gist.github.com/martinjacobd/be50a93744b94749339afe8...
| (non io_uring example for comparison:
| https://gist.github.com/martinjacobd/feea261d2fafe5e7332e37d...)
|
| A big todo I have for this is to modify it to use
| multishot_accept, but I haven't been able to get a recent
| enough kernel configured properly to do it. (Only >6.0 kernels
| support multishot accept.)
|
| (edit) you need to s/multishot_accept/recv_multishot/g above :)
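|
| For the curious, arming recv_multishot looks roughly like this
| (a sketch; assumes a registered buffer group 0 and a connected
| client_fd):
|
|     struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
|     io_uring_prep_recv_multishot(sqe, client_fd, NULL, 0, 0);
|     sqe->flags |= IOSQE_BUFFER_SELECT;  /* kernel picks a buffer */
|     sqe->buf_group = 0;
|     io_uring_submit(&ring);
|
|     /* each arriving chunk then posts its own CQE: */
|     struct io_uring_cqe *cqe;
|     io_uring_wait_cqe(&ring, &cqe);
|     unsigned bid = cqe->flags >> IORING_CQE_BUFFER_SHIFT;
|     int rearm = !(cqe->flags & IORING_CQE_F_MORE);  /* resubmit if true */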
| znpy wrote:
| Could you also do a UDP-based server? It'd be interesting.
| thinkharderdev wrote:
| I did an echo server with multi-shot here (using the io_uring
| rust bindings) https://github.com/thinkharderdev/io-
| uring/blob/ring-mapped-.... My biggest issue with io_uring is
| figuring out what is available in which kernel version :)
| martinjacobd wrote:
| I meant recv_multishot not multishot_accept. The one I linked
| does use multishot_accept. Just a think-o. Looks like your
| version uses both though, so that's cool.
| samsquire wrote:
| I might have to use your code again; I used your just-in-time
| compilation code to execute generated machine code.
|
| Thank you so much Martin Jacob!
| jupp0r wrote:
| Would be great if this was integrated into runtimes like tokio. I
| can totally see graphs of promise chains being translated to
| calls to io_uring representing the underlying data dependencies
| and thus saving a lot of context-switch overhead.
|
| Pretty cool to see how far IO APIs have come with this.
| Tanjreeve wrote:
| There already is a tokio_uring. I'm still a bit dubious about
| how much help it is for application-level stuff outside of
| normal file operations. E.g. the data-dependency stuff is a
| very granular scheduling operation to generalise in an
| easy-to-use way. But it's definitely real and worth watching.
___________________________________________________________________
(page generated 2023-05-22 23:00 UTC)