[HN Gopher] How we found and fixed an eBPF Linux kernel vulnerab...
___________________________________________________________________
How we found and fixed an eBPF Linux kernel vulnerability
Author : xxmarkuski
Score : 213 points
Date : 2024-08-08 10:39 UTC (12 hours ago)
(HTM) web link (bughunters.google.com)
(TXT) w3m dump (bughunters.google.com)
| katzinsky wrote:
| The one time I tried to use eBPF it wasn't expressive enough for
| what I needed.
|
| Does the limited flexibility it provides really justify the added
| kernel space complexity? I can understand it for packet filtering
| but some of the other stuff it's used for like sandboxing just
| isn't convincing.
| knorker wrote:
| There are other technologies for this, such as DTrace. The
| kernel's choice isn't eBPF or nothing, it's eBPF or something
| else like it.
|
| You may not use it much, but some people use it all day. I
| think FAANG engineers have said that they run tens (hundreds?)
| of these things on all servers, all the time. And that's
| excluding one-offs. And FAANG has full time kernel coders on
| staff, so they're also funding this complexity that they use.
|
| But also yes, I've solved problems by using eBPF. Problems that
| are basically unsolvable by non-kernel-gurus without eBPF. I
| rarely need it. But when I need it, there's nothing else that
| does the trick.
|
| In some cases, even for kernel gurus, it's a choice between
| eBPF or maintaining a custom kernel patch forever.
| katzinsky wrote:
| I'm not sure "Google engineers use it" is a very good counter
| argument. They have a very high tolerance for complexity and
| like most large corporations what actually gets built and
| used tends to be driven more by internal politics than
| technical merit.
| eggnet wrote:
| Google would maintain a kernel patch or upstream a patch if
| that was the right choice for a given problem.
| katzinsky wrote:
| That's really begging the question. I don't believe they
| would as they have consistently over engineered solutions
| in the past.
| knorker wrote:
| I don't mean it as a counter argument, or I don't think the
| way you mean it, at least.
|
| You may not use it at your smaller scale. But there are
| millions of machines out there that do use it, and the
| alternative for the same functionality is much worse.
|
| I bet you never use SCTP sockets either. eBPF is used much
| more than SCTP.
|
| And its users "fund" its development, so it's not a burden
| to those who don't use it.
|
| But are you sure your systems don't use it? Run "bpftool
| prog" to see. Whatever you see there someone thought was
| better than the alternative.
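For readers who want to script the `bpftool prog` check described above, here is a minimal Python sketch. It assumes `bpftool` is installed, that the caller has enough privilege to list programs (usually root), and that `bpftool -j prog show` prints a JSON array whose entries carry a `type` key. The helper names `summarize_programs` and `list_loaded_programs` are hypothetical, not part of any tool.

```python
import json
import subprocess
from collections import Counter


def summarize_programs(bpftool_json: str) -> Counter:
    """Count loaded eBPF programs by type from `bpftool -j prog show` output."""
    programs = json.loads(bpftool_json)
    # Entries without a "type" key are grouped under "unknown" rather than dropped.
    return Counter(prog.get("type", "unknown") for prog in programs)


def list_loaded_programs() -> Counter:
    """Run bpftool (assumed to be on PATH, usually needs root) and summarize."""
    result = subprocess.run(
        ["bpftool", "-j", "prog", "show"],
        check=True,
        capture_output=True,
        text=True,
    )
    return summarize_programs(result.stdout)
```

Calling `list_loaded_programs().most_common()` on a machine with bpftool available should give a quick per-type tally of whatever is currently loaded.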
| lynxmachine wrote:
| > I've solved problems by using eBPF. Problems that are
| basically unsolvable by non-kernel-gurus without eBPF. I
| rarely need it.
|
| Would you mind giving some examples? I recently started
| learning about eBPF from Liz Rice's book and am curious
| about what makes eBPF the correct choice in a particular
| scenario.
| znpy wrote:
| > There are other technologies for this, such as DTrace. The
| kernel's choice isn't eBPF or nothing, it's eBPF or something
| else like it.
|
| To add on this point: I successfully used SystemTap a few
| years ago to debug an issue I was having.
|
| Before going further: keep in mind that my point of view (at
| the time) was the one of somebody working as a devops
| engineer, debugging some annoyances with containers (managed
| by Kubernetes) going OOM. I'm no kernel developer and I have
| a basic-to-good understanding of the C language based on a
| first-year university course and geekiness/nerdiness. So in this
| context I'm a glorified hobbyist.
|
| Learning SystemTap is easier in my opinion. I followed a
| tutorial by Red Hat to get the hang of the manual parts, but
| after that I remember it being fairly easy:
|
| 1. Try to reproduce the issue you're having (fairly easy for
| me)
|
| 2. Skim the part of the Linux kernel source code that you
| think might be relevant (for me it was the OOM killer)
|
| 3. Add probes in there, see if they fire when you reproduce
| the issue
|
| 4. Look back at the source code of the kernel and see what
| chain of data structures and fields you can follow to reach
| the piece of information you need
|
| 5. Improve your probes
|
| 6. If successful, you're done
|
| 7. Goto 4
|
| I think it took like one or two days between following the
| tutorial and getting a working probe.
|
| It was a pleasant couple of days.
| fch42 wrote:
| DTrace and eBPF are "not so different" in the sense that
| dtrace programs / hooks are also a form of low-level code /
| instruction set that the kernel (dtrace driver) validates at
| load. It's an "internal" artifact of dtrace though,
| https://github.com/illumos/illumos-gate/blob/master/usr/src/...
| and to my knowledge, nothing
| like a clang/gcc "dtrace target" exists to translate more-or-
| less arbitrary higher-level language "to low-level dtrace".
|
| The additional flexibility eBPF gets from this is really
| amazing. DTrace, by contrast, is a more targeted (and, for its
| intended use cases, in some situations still superior to eBPF)
| but also less general tool.
|
| (citrus vs. stone fruit ...)
| cryptonector wrote:
| DTrace's bytecode machine is also very very limited. eBPF's
| is much less limited. Limiting the scope of what a probe
| can do is very important.
| bcantrill wrote:
| Yes, thank you. Long before eBPF existed, we spent a ton
| of time on the safety of DTrace[0][1] -- there's a bunch
| of subtlety to it. The proof is in the pudding, however:
| thanks to our strict adherence to the safety constraint,
| we have absolute confidence in using DTrace in
| production.
|
| [0] https://bcantrill.dtrace.org/2005/07/19/dtrace-safety/
|
| [1] https://www.usenix.org/legacy/publications/library/proceedin..., §3.3
| saagarjha wrote:
| I'm curious which part of these tenets you feel would
| have prevented the bug demonstrated, besides "oh we tried
| harder"? I don't see any of those that seem unique to
| DTrace other than limiting where probes can be placed.
| cryptonector wrote:
| The DTrace bytecode VM is simply more limited:
| - it cannot branch backwards (this is also true of eBPF)
| - it can only do ternary operator branches
| - it cannot define functions
| - functions it can call are limited to some builtin ones
| - it can only scribble on the one pre-allocated probe buffer
| - it can only access the probe's defined parameters
| tptacek wrote:
| eBPF programs can absolutely branch backwards. You may be
| thinking of cBPF.
| cryptonector wrote:
| I was thinking of the original BPF. I didn't realize that
| eBPF added back branching.
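The forward-only rule attributed above to the original BPF can be illustrated with a toy check. This is only a sketch: the `Insn` tuple is a hypothetical encoding invented for the example, not the real BPF instruction format. It shows why the rule is attractive: if every jump target lies strictly after the jump, the program counter only ever increases, so the program must terminate.

```python
from typing import List, NamedTuple


class Insn(NamedTuple):
    """Hypothetical toy instruction: an opcode plus a relative jump offset."""
    op: str
    offset: int = 0  # only meaningful for "jmp"; relative to the next instruction


def has_only_forward_jumps(program: List[Insn]) -> bool:
    """Return True if every jump lands after itself and inside the program."""
    for pc, insn in enumerate(program):
        if insn.op != "jmp":
            continue
        target = pc + 1 + insn.offset
        # A negative offset moves control backwards and could form a loop;
        # a target past the last instruction would leave the program.
        if insn.offset < 0 or target >= len(program):
            return False
    return True
```

A program such as `[Insn("ld"), Insn("jmp", -2), Insn("ret")]` is rejected by this check, while the same program with a non-negative in-range offset is accepted.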
| tptacek wrote:
| If the verifier can prove to itself that a loop is
| bounded, it'll accept it. A good starting place for eBPF
| itself: if a normal ARM program could do it, eBPF can do
| it. It's a fully functional ISA.
| cryptonector wrote:
| I'm w/ the DTrace guys on this. A turing complete VM is a
| bad idea for this purpose.
| tptacek wrote:
| It depends on what you're using it for. If you want to
| expose this to untrusted code, yes, but I wouldn't be
| comfortable doing that with DTrace either.
| cryptonector wrote:
| There's two untrusted code cases here: untrusted DTrace
| scripts / users, and untrusted targets for inspection.
| The latter has to be possible to examine, so the
| observability tools (like DTrace) have to be secure for
| that purpose. This means you want to make it difficult to
| overflow buffers in the observability tools.
|
| There's also a need to make sure that even trusted users
| don't accidentally cause too much observability load.
| That's why DTrace has a circular probe buffer pool, it's
| why it drops probes under load, it's why it pre-allocates
| each probe's buffer by computing how much the probe's
| actions will write to it, it's why it doesn't allow
| looping (since that would make the probe's effect less
| predictable), etc.
|
| Bryan, Adam, and Mike designed it this way two plus
| decades ago, and Linux still hasn't caught up.
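The "pre-allocate and drop under load" idea from the comment above can be sketched in a few lines of Python. This is an illustration of the general technique only, not DTrace's actual implementation; the class name `ProbeBuffer` and its methods are hypothetical.

```python
class ProbeBuffer:
    """Fixed-capacity buffer that drops records instead of growing under load."""

    def __init__(self, capacity_bytes: int) -> None:
        self._buf = bytearray(capacity_bytes)  # allocated once, never resized
        self._used = 0
        self.drops = 0

    def record(self, payload: bytes) -> bool:
        """Store payload if it fits; otherwise count a drop and return False."""
        if self._used + len(payload) > len(self._buf):
            self.drops += 1
            return False
        self._buf[self._used:self._used + len(payload)] = payload
        self._used += len(payload)
        return True

    def drain(self) -> bytes:
        """Hand the recorded bytes to the consumer and make the space reusable."""
        data = bytes(self._buf[:self._used])
        self._used = 0
        return data
```

The design choice is that the cost of tracing is bounded up front: a burst of events can lose data (visible in `drops`), but it cannot make the tracer allocate more memory or stall the traced system.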
| tptacek wrote:
| Linux has a different design than DTrace; eBPF is more
| capable as a trusted tool, and less capable for untrusted
| tools. It doesn't make sense to say one approach has
| "caught up" to the other, unless you really believe the
| verifier will reach a state where nobody's going to find
| verifier bugs --- at which point eBPF will be strictly
| superior. Beyond that, it's a matter of taste. What seems
| clearly to be true is that eBPF is wildly more popular.
| cryptonector wrote:
| And I should say that DTrace probe actions _can
| dereference pointers_ , but NULL dereferences do not
| cause crashes, and rich type data is generally available.
| bcantrill wrote:
| Well, we didn't merely "try harder" -- we treated safety
| as a constraint which informed every aspect of the
| design. And yes, treating safety as a constraint rather
| than merely an objective results in different
| implementation decisions. From the article:
|
| _This working model significantly increases the attack
| surface of the kernel, since it allows executing
| arbitrary code at a high privilege level. Because of this
| risk, programs have to be verified before they can be
| loaded. This ensures that all eBPF security assumptions
| are met. The verifier, which consists of complex code, is
| responsible for this task._
|
| _Given how difficult the task of validating that a
| program is safe to execute is, there have been many
| vulnerabilities found within the eBPF verifier. When one
| of these vulnerabilities is exploited, the result is
| usually a local privilege escalation exploit (or
| container escape in containerized environments). While
| the verifier's code has been audited extensively, this
| task also becomes harder as new features are added to
| eBPF and the complexity of the verifier grows_
|
| DTrace was developed over 20 years ago; there have not
| been "many vulnerabilities" found in the verifier -- and
| we have not grown the complexity of the verifier over
| time. You can dismiss these as implementation details,
| but these details reflect different views of the problem
| and its constraints.
| saagarjha wrote:
| No, like, the bug that was demonstrated seems to be
| fairly fundamental to running any sort of bytecode in the
| kernel: they need to verify all branches, and this is
| potentially slow, so they optimize it (which is where the
| bug is). What are you doing differently? It seems to me
| that you're either not going to optimize this or you are?
| tptacek wrote:
| The DTrace instruction set is more limited than that of
| the eBPF VM; eBPF is essentially a fully functional ISA,
| where DTrace was (if I'm remembering this right) designed
| around the D script language. An eBPF program is often
| just a clang C program, and you're trusting the kernel
| verifier to reject it if it can't be proven safe.
| Further: eBPF programs are JIT'd to actual machine code;
| once you've loaded and verified an eBPF program, it has
| conceptually all the same power as, say, shellcode you
| managed to load into the kernel via an LPE.
|
| That's not to say that security researchers couldn't find
| DTrace vulnerabilities if they, for instance, built
| DIF/DOF fuzzers of 2023 levels of sophistication for
| them. I don't know that anyone's doing that, because
| DTrace is more or less a dead letter.
| ssahoo wrote:
| Wouldn't even the classic loadable kernel mode driver be a
| better choice than a patch and eBPF? I know they are unsafe, but
| people who deal with them know that power comes with
| responsibility.
| tptacek wrote:
| No? SREs roll eBPF programs on the fly just in the process of
| debugging problems; if you tried to do that with an LKM,
| you'd almost certainly blow up your system. People who write
| Linux kernel code routinely crash their systems in the
| process of development.
| techwiz137 wrote:
| In my country we have a saying. "Porcupine in the pants". Sounds
| like for all the good it can do, it isn't written safely and
| carefully.
| deskr wrote:
| With experience you'll realise that despite things being done
| safely and carefully, mistakes can and do pop up.
| tptacek wrote:
| A reminder that on the platforms eBPF is most commonly used,
| verifier bugs don't matter much, because unprivileged code isn't
| allowed to load eBPF programs to begin with. Bugs like this are
| thus root -> ring0 vulnerabilities. That's not nothing, but for
| serverside work it's usually worth the tradeoff, especially
| because eBPF's track record for kernel LPEs is actually pretty
| strong compared to the kernel as a whole.
|
| In the setting eBPF is used today, most of the value of the
| verifier is that it's hard to _accidentally_ crash your kernel
| with a bad eBPF program. That is comically untrue about an
| ordinary LKM.
| chc4 wrote:
| The PoC uses eBPF maps as their out-of-bounds pointer, but it
| sounds like it would also be exploitable via non-extended BPF
| programs loadable via seccomp since it's just improper scalar
| value range tracking, which doesn't require any privileges on
| most platforms.
|
| And, of course, root -> ring0 is less of a problem with
| unprivileged user namespaces where you can make yourself
| "root", as we've seen in every eBPF bug PoC since distros
| started turning that on (and have since turned it off again,
| mostly)
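To make "scalar value range tracking" concrete, here is a toy Python sketch of the general idea. It assumes a drastically simplified model: one inclusive integer interval per register. It is not the kernel verifier's actual logic and does not reproduce the bug from the article; `Range` and `access_is_safe` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Inclusive bounds a toy verifier believes a scalar register can hold."""
    lo: int
    hi: int

    def add(self, other: "Range") -> "Range":
        """Bounds of the sum of two tracked scalars."""
        return Range(self.lo + other.lo, self.hi + other.hi)

    def clamp_below(self, limit: int) -> "Range":
        """Bounds on the branch where the program has checked `value < limit`."""
        return Range(self.lo, min(self.hi, limit - 1))


def access_is_safe(index: Range, array_len: int) -> bool:
    """Accept an array access only if every possible index is in bounds."""
    return 0 <= index.lo and index.hi < array_len
```

With this model, `Range(0, 255)` is rejected for a 16-element array, `Range(0, 255).clamp_below(16)` is accepted, and adding `Range(0, 1)` afterwards is rejected again. The safety of the whole scheme rests on the tracked interval never being narrower than what the register can really hold: if the tracker wrongly believes `Range(0, 15)`, `access_is_safe` approves an access that can go out of bounds at run time.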
| 10000truths wrote:
| Verifier bugs matter because resolving them is a prerequisite
| for secure unprivileged use of eBPF.
| tptacek wrote:
| Put it this way: verifier bugs matter, but people probably
| don't do unscheduled fleetwide updates to fix them.
| dumpling777 wrote:
| Let's not forget also that we can give CAP_BPF to containers.
| With things like Cilium on the rise, the attack vector of
| landing in container environment that has cap_bpf is more and
| more realistic
| tptacek wrote:
| I don't believe shared-kernel container systems are real
| security boundaries to begin with, so, to me, a container
| running with CAP_BPF isn't much different than any other
| program a machine owner might opt to run; the point is that
| you trust the workload, and so the verifier is more of a
| safety net than a vault door.
| mrbluecoat wrote:
| > "Uno no es ninguno" (One is none)
|
| I believe that translates to "One is not none"
|
| https://bughunters.google.com/blog/6303226026131456/a-deep-d...
| DanielVZ wrote:
| That's the direct translation, but for some reason in Spanish our
| double negations are usually just negations.
| kmarc wrote:
| It doesn't; it translates to "One is none". This is the infamous
| double negation many foreign speakers (including me) struggle
| with.
|
| https://spanish.stackexchange.com/questions/26777/how-does-d...
| samatman wrote:
| Perhaps we should translate this as "one ain't nothin'".
| TacticalCoder wrote:
| > "Uno no es ninguno" (One is none)
|
| Literally "One not is none", aka "One is _not_ none ".
| jolmg wrote:
| In Spanish, it's common for double negatives to not actually be
| double negatives. For example, if you wanted to say "there's
| nothing here", you'd say "no hay nada aqui", which word-for-
| word means "there's not nothing here".
|
| Checking out the Royal Spanish Academy, here's what they say
| about it:
|
| https://www.rae.es/espanol-al-dia/doble-negacion-no-vino-nad...
|
| > The so-called "double negation" is due to the obligatory
| negative agreement that must be established in Spanish, and
| other Romance languages, in certain circumstances (see New
| Grammar, § 48.3d), which results in the joint presence in the
| statement of the adverb _no_ and other elements that also have
| a negative meaning.
|
| > The concurrence of these two "negations" does not annul the
| negative meaning of the statement.
___________________________________________________________________
(page generated 2024-08-08 23:00 UTC)