[HN Gopher] Put an io_uring on it: Exploiting the Linux kernel
___________________________________________________________________
Put an io_uring on it: Exploiting the Linux kernel
Author : blopeur
Score : 80 points
Date : 2022-03-08 19:35 UTC (3 hours ago)
(HTM) web link (www.graplsecurity.com)
(TXT) w3m dump (www.graplsecurity.com)
| BigComrade wrote:
| tptacek wrote:
| This is one of the all-time great LPE writeups.
|
| A summary:
|
| 1. io_uring includes a feature that asks the kernel to manage
| groups of buffers for SQEs (the objects userland submits to tell
| uring what to do). If you enable this feature, the kernel
| overloads a field normally used to track a userland pointer with
| a kernel pointer.
|
| 2. The special-case code that handles I/O operations for files-
| that-are-not-files, like in procfs, missed the check for this
| "overloaded pointer" hack, and so can be tricked into advancing a
| kernel pointer arbitrarily, because it thinks it's working with a
| userland pointer.
|
| 3. The pointer you manipulate thusly is eventually freed, which
| lets you free kernel objects within a range of possible pointers.
|
| 4. io_uring allows you to control the CPU affinity of the kernel
| threads it generates on your behalf, because of course it does,
| so you can get your userland process and all your related
| io_uring kthreads onto the same CPU, and thus into the same SLUB
| cache area, which gives you enough control to target specific
| kernel objects (of a size bounded I think by the SQE?) reliably.
|
| 5. There's a well-known LPE trick for exploiting UAFs: the
| setxattr(2) syscall copies arbitrary extended attributes for
| files from userland to kernel buffers (that's its job), and the
| userfaultfd(2) syscall lets you defer page faults to userland;
| you can chain setxattr and userfaultfd to allocate and populate a
| kernel buffer of arbitrary size and contents and then block,
| keeping the object in memory.
|
| 6. Since that's a popular exploit technique, there's a default-
| yes setting in most distros to require root to use userfaultfd(2)
| --- but you can do the same thing with FUSE, where deferring I/O
| operations to userland is kind of the whole premise of the
| interface.
|
| 7. setxattr/userfaultfd can be transformed from a UAF primitive
| to an arbitrary kernel leak: if you have an arbitrary-free
| vulnerability (see step 3), you can do the setxattr-then-block
| thing, then trigger the free from another thread and target the
| xattr buffer, so setxattr's buffer is reclaimed out from under
| it, then trigger the allocation of a kernel structure you want to
| leak that is of the same size, which setxattr will copy into
| (another UAF); now you have a kernel structure that the kernel is
| treating like a file's extended attributes, which you can read
| back with getxattr. Neat!
|
| 8. At this point you can go hunting for kernel structures to
| whack, because you can use the arbitrary leak primitive to leak
| structs that in turn embed the (secret) addresses of other kernel
| structures.
|
| 9. Find a pointer to a socket's BPF filter and use the UAF to
| inject a BPF filter directly, bypassing the verifier, then
| trigger the BPF filter and do whatever you want, I guess.
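|
| Steps 1-3 can be pictured with a toy model. This is only a sketch
| of the idea, not the kernel code: the Request class, the
| BUFFER_SELECTED flag, and the dict standing in for the heap are
| all made up for illustration.

```python
from dataclasses import dataclass

SLOT = 64              # toy object size in bytes (assumption)
BUFFER_SELECTED = 0x1  # hypothetical flag: "addr holds a kernel pointer"


@dataclass
class Request:
    addr: int          # userland pointer, or kernel pointer if flagged
    flags: int = 0


def advance_checked(req: Request, nbytes: int) -> None:
    """Regular path: only advance addr when it is a userland cursor."""
    if not (req.flags & BUFFER_SELECTED):
        req.addr += nbytes


def advance_unchecked(req: Request, nbytes: int) -> None:
    """Special-case path with the check missing: always advances addr."""
    req.addr += nbytes


def free(heap: dict, addr: int) -> str:
    """Toy kfree: release whatever object lives at addr."""
    return heap.pop(addr)


def run(advance) -> str:
    base = 0x1000
    heap = {base: "selected buffer", base + SLOT: "neighbouring object"}
    req = Request(addr=base, flags=BUFFER_SELECTED)
    advance(req, SLOT)            # length is chosen by the caller
    return free(heap, req.addr)   # the pointer is eventually freed


freed_ok = run(advance_checked)
freed_bad = run(advance_unchecked)
print(freed_ok)
print(freed_bad)
```

| With the check in place the free hits the buffer it was meant
| for; without it, the caller-chosen length decides which nearby
| object gets freed.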
|
| I'm sure I got a bunch of this wrong; corrections welcome. Again:
| really spectacular writeup: a good bug, some neat tricks, and a
| decent survey of Linux kernel LPE techniques.
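|
| The reuse trick in steps 5-7 leans on same-size allocations
| landing in a slot that was just freed. A toy simulation of that
| behaviour, assuming a simple LIFO freelist (the SlabCache class
| and its methods are invented for this sketch and are not the
| real SLUB interfaces):

```python
class SlabCache:
    """Toy fixed-size cache with a LIFO freelist (last freed, first reused)."""

    def __init__(self, nslots: int):
        self.slots = [b""] * nslots
        # Reversed so that pop() hands out slot 0 first.
        self.freelist = list(range(nslots - 1, -1, -1))

    def alloc(self, data: bytes) -> int:
        slot = self.freelist.pop()
        self.slots[slot] = data
        return slot

    def free(self, slot: int) -> None:
        # Contents are not cleared; the slot just becomes reusable.
        self.freelist.append(slot)

    def read(self, slot: int) -> bytes:
        return self.slots[slot]


def leak_demo() -> bytes:
    cache = SlabCache(nslots=4)
    # Step 5: allocate an attacker-filled buffer and keep a handle to it.
    xattr_buf = cache.alloc(b"attacker-chosen xattr value")
    # Step 3: the arbitrary free releases it out from under its owner.
    cache.free(xattr_buf)
    # Step 7: a same-size kernel object is allocated into the same slot.
    victim = cache.alloc(b"kernel struct with pointers")
    assert victim == xattr_buf
    # The stale handle now reads the victim's bytes.
    return cache.read(xattr_buf)


leaked = leak_demo()
print(leaked)
```

| The stale handle plays the role of the blocked setxattr buffer:
| whatever gets allocated into that slot next is what reads back.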
| junon wrote:
| Yes, unfortunately I figured this might happen. People have been
| warning of some major issues with its design for a while now wrt
| security. Paired with the fact it's not much faster in practice
| than epoll in a large majority of usecases, I really worry it's
| going to footgun some people.
| FridgeSeal wrote:
| I'm confused by this, isn't one of the main points of uring that
| it's faster?
| frevib wrote:
| For disk IO it's faster, there are many benchmarks on the
| internet.
|
| For network IO, it depends. Only two things make it
| theoretically faster than epoll: io_uring supports batching of
| requests, and you can save one sys call compared to epoll in an
| event loop. There are some other things that could make it faster
| like SQPOLL, but this could also hurt performance.
|
| Network IO discussion:
| https://github.com/axboe/liburing/issues/536
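|
| To make the "one sys call" point concrete, here is a minimal
| sketch of the epoll side only (Linux-only; Python's standard
| library has no io_uring binding, so that half is not shown). The
| function name and the counter are made up for illustration, and
| the counter tracks only the reader's wait and receive calls.

```python
import select
import socket


def epoll_read_loop(messages):
    """Read each message via epoll and count reader-side syscalls."""
    a, b = socket.socketpair()
    ep = select.epoll()
    ep.register(b.fileno(), select.EPOLLIN)
    received = []
    syscalls = 0
    try:
        for msg in messages:
            a.send(msg)                 # writer side, not counted
            events = ep.poll(1.0)       # syscall 1: wait for readiness
            syscalls += 1
            for _fd, mask in events:
                if mask & select.EPOLLIN:
                    received.append(b.recv(4096))  # syscall 2: the read
                    syscalls += 1
    finally:
        ep.close()
        a.close()
        b.close()
    return received, syscalls


received, syscalls = epoll_read_loop([b"ping", b"pong"])
print(received, syscalls)
```

| Each message costs the reader a wait plus a receive. With
| io_uring the receive itself is what gets submitted, and several
| completions can be collected together, which is where the saved
| call and the batching come from.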
| dralley wrote:
| > Paired with the fact it's not much faster in practice than
| epoll in a large majority of usecases, I really worry it's
| going to footgun some people.
|
| "it's not faster than epoll" is somewhat dependent on your
| hardware and kernel. For one thing, Jens Axboe has been working
| on a lot of io-uring optimizations lately, but you probably
| won't see them unless you're using a kernel from the last few
| months. And by "a lot" I really mean 3x to 4x faster in the
| last year on the benchmarks he has been using.
|
| So if all your comparisons are on an enterprisey linux distro,
| you probably aren't getting a complete picture of epoll vs io-
| uring performance. epoll has been around a while, it's had more
| hours poured into optimization and probably regresses less
| frequently.
| egberts1 wrote:
| Whoa!
|
| One frickin' GIANT driver coherency setting, I/O Ring, that is.
___________________________________________________________________
(page generated 2022-03-08 23:00 UTC)