[HN Gopher] How we found and fixed an eBPF Linux kernel vulnerab...
___________________________________________________________________
How we found and fixed an eBPF Linux kernel vulnerability
Author : xxmarkuski
Score : 257 points
Date : 2024-08-08 10:39 UTC (1 days ago)
(HTM) web link (bughunters.google.com)
(TXT) w3m dump (bughunters.google.com)
| katzinsky wrote:
| The one time I tried to use eBPF it wasn't expressive enough for
| what I needed.
|
| Does the limited flexibility it provides really justify the added
| kernel space complexity? I can understand it for packet filtering
| but some of the other stuff it's used for like sandboxing just
| isn't convincing.
| knorker wrote:
| There are other technologies for this, such as DTrace. The
| kernel's choice isn't eBPF or nothing, it's eBPF or something
| else like it.
|
| You may not use it much, but some people use it all day. I
| think FAANG engineers have said that they run tens (hundreds?)
| of these things on all servers, all the time. And that's
| excluding one-offs. And FAANG has full time kernel coders on
| staff, so they're also funding this complexity that they use.
|
| But also yes, I've solved problems by using eBPF. Problems that
| are basically unsolvable by non-kernel-gurus without eBPF. I
| rarely need it. But when I need it, there's nothing else that
| does the trick.
|
| In some cases, even for kernel gurus, it's a choice between
| eBPF or maintaining a custom kernel patch forever.
| katzinsky wrote:
| I'm not sure "Google engineers use it" is a very good counter
| argument. They have a very high tolerance for complexity and
| like most large corporations what actually gets built and
| used tends to be driven more by internal politics than
| technical merit.
| eggnet wrote:
| Google would maintain a kernel patch or upstream a patch if
| that was the right choice for a given problem.
| katzinsky wrote:
| That's really begging the question. I don't believe they
| would as they have consistently over engineered solutions
| in the past.
| DaiPlusPlus wrote:
| > Google would maintain a kernel patch
|
| I look forward to seeing that patch on Google Graveyard
| in a couple years' time.
| knorker wrote:
| I don't mean it as a counter argument, or I don't think the
| way you mean it, at least.
|
| You may not use it at your smaller scale. But there are
| millions of machines out there that do use it, and the
| alternative for the same functionality is much worse.
|
| I bet you never use SCTP sockets either. eBPF is used much
| more than SCTP.
|
| And its users "fund" its development, so it's not a burden
| to those who don't use it.
|
| But are you sure your systems don't use it? Run "bpftool
| prog" to see. Whatever you see there someone thought was
| better than the alternative.
| lynxmachine wrote:
| > I've solved problems by using eBPF. Problems that are
| basically unsolvable by non-kernel-gurus without eBPF. I
| rarely need it.
|
| Would you mind giving some examples? I recently started
| learning about eBPF from Liz Rice's book and am curious
| about what makes eBPF the correct choice in a particular
| scenario.
| znpy wrote:
| > There are other technologies for this, such as DTrace. The
| kernel's choice isn't eBPF or nothing, it's eBPF or something
| else like it.
|
| To add on this point: I successfully used SystemTap a few
| years ago to debug an issue I was having.
|
| Before going further: keep in mind that my point of view (at
| the time) was the one of somebody working as a devops
| engineer, debugging some annoyances with containers (managed
| by Kubernetes) going OOM. I'm no kernel developer and I have
| a basic-to-good understanding of the C language based on a
| first-year university course and geekiness/nerdiness. So in
| this context I'm a glorified hobbyist.
|
| Learning SystemTap is easier in my opinion. I followed a
| tutorial by Red Hat to get the hang of the manual parts but
| after that I remember it being fairly easy:
|
| 1. Try to reproduce the issue you're having (fairly easy for
| me)
|
| 2. Skim the part of the Linux source code that you think
| might be relevant (for me it was the OOM killer)
|
| 3. Add probes in there, see if they fire when you reproduce
| the issue
|
| 4. Look back at the source code of the kernel and see what
| chain of data structures and fields you can follow to reach
| the piece of information you need
|
| 5. Improve your probes
|
| 6. If successful, you're done
|
| 7. Goto 4
|
| I think it took like one or two days between following the
| tutorial and getting a working probe.
|
| It was a pleasant couple of days.
| fch42 wrote:
| DTrace and eBPF are "not so different" in the sense that
| dtrace programs / hooks are also a form of low-level code /
| instruction set that the kernel (dtrace driver) validates at
| load. It's an "internal" artifact of dtrace though,
| https://github.com/illumos/illumos-gate/blob/master/usr/src/...
| and to my knowledge, nothing
| like a clang/gcc "dtrace target" exists to translate more-or-
| less arbitrary higher-level language "to low-level dtrace".
|
| The additional flexibility eBPF gets from this is amazing
| really. DTrace, by contrast, is a more-targeted (and for its
| intended use cases, in some situations still superior to eBPF)
| but also less-general tool.
|
| (citrus vs. stone fruit ...)
| cryptonector wrote:
| DTrace's bytecode machine is also very very limited. eBPF's
| is much less limited. Limiting the scope of what a probe
| can do is very important.
| bcantrill wrote:
| Yes, thank you. Long before eBPF existed, we spent a ton
| of time on the safety of DTrace[0][1] -- there's a bunch
| of subtlety to it. The proof is in the pudding, however:
| thanks to our strict adherence to the safety constraint,
| we have absolute confidence in using DTrace in
| production.
|
| [0] https://bcantrill.dtrace.org/2005/07/19/dtrace-safety/
|
| [1] https://www.usenix.org/legacy/publications/library/proceedin..., §3.3
| saagarjha wrote:
| I'm curious which of these tenets you feel would have
| prevented the bug demonstrated, besides "oh we tried
| harder"? I don't see any of those that seem unique to
| DTrace other than limiting where probes can be placed.
| cryptonector wrote:
| The DTrace bytecode VM is simply more limited:
| - it cannot branch backwards (this is also true of eBPF)
| - it can only do ternary operator branches
| - it cannot define functions
| - functions it can call are limited to some builtin ones
| - it can only scribble on the one pre-allocated probe buffer
| - it can only access the probe's defined parameters
| tptacek wrote:
| eBPF programs can absolutely branch backwards. You may be
| thinking of cBPF.
| cryptonector wrote:
| I was thinking of the original BPF. I didn't realize that
| eBPF added back branching.
| tptacek wrote:
| If the verifier can prove to itself that a loop is
| bounded, it'll accept it. A good starting place for eBPF
| itself: if a normal ARM program could do it, eBPF can do
| it. It's a fully functional ISA.
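|
| To make the bounded-loop point concrete, here is a minimal sketch
| of the idea: walk the loop one iteration at a time and accept it
| only if it exits within a fixed budget. This is a toy model on
| concrete values, not the kernel verifier's algorithm (which works
| on abstract register states); `loop_is_bounded` and its `budget`
| parameter are hypothetical names chosen for illustration.

```python
def loop_is_bounded(start: int, limit: int, step: int, budget: int = 10_000) -> bool:
    """Toy check for `for (i = start; i < limit; i += step)`.

    Walks the loop one iteration at a time and accepts it only if the
    exit condition is reached before the (hypothetical) budget runs out.
    """
    i = start
    for _ in range(budget):
        if i >= limit:
            return True   # loop exit reached: the loop is bounded
        i += step
    return False          # budget exhausted: reject the program


if __name__ == "__main__":
    print(loop_is_bounded(0, 8, 1))   # counter advances toward the limit
    print(loop_is_bounded(0, 8, 0))   # counter never advances
```

| The design choice worth noticing is that rejection is the default:
| a loop the checker cannot show to be finite within its budget is
| refused, even if it would in fact terminate.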
| cryptonector wrote:
| I'm w/ the DTrace guys on this. A turing complete VM is a
| bad idea for this purpose.
| tptacek wrote:
| It depends on what you're using it for. If you want to
| expose this to untrusted code, yes, but I wouldn't be
| comfortable doing that with DTrace either.
| cryptonector wrote:
| There's two untrusted code cases here: untrusted DTrace
| scripts / users, and untrusted targets for inspection.
| The latter has to be possible to examine, so the
| observability tools (like DTrace) have to be secure for
| that purpose. This means you want to make it difficult to
| overflow buffers in the observability tools.
|
| There's also a need to make sure that even trusted users
| don't accidentally cause too much observability load.
| That's why DTrace has a circular probe buffer pool, it's
| why it drops probes under load, it's why it pre-allocates
| each probe's buffer by computing how much the probe's
| actions will write to it, it's why it doesn't allow
| looping (since that would make the probe's effect less
| predictable), etc.
|
| Bryan, Adam, and Mike designed it this way two plus
| decades ago, and Linux still hasn't caught up.
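|
| The load-shedding behaviour described above (a buffer reserved up
| front, a record size known before the probe is enabled, and drops
| instead of growth) can be sketched roughly as follows. This is an
| illustrative model only, not DTrace's implementation; `ProbeBuffer`,
| `fire`, and `consume` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ProbeBuffer:
    """Fixed-capacity buffer that drops records instead of growing."""

    capacity: int      # total bytes reserved before the probe is enabled
    record_size: int   # bytes one probe firing writes, computed up front
    used: int = 0
    drops: int = 0
    records: list = field(default_factory=list)

    def fire(self, payload: bytes) -> bool:
        """Record one probe firing; return False if it had to be dropped."""
        if len(payload) != self.record_size:
            raise ValueError("payload must match the precomputed record size")
        if self.used + self.record_size > self.capacity:
            self.drops += 1   # shed load rather than block or allocate
            return False
        self.records.append(payload)
        self.used += self.record_size
        return True

    def consume(self) -> list:
        """Drain the buffer, making room for later firings."""
        drained, self.records, self.used = self.records, [], 0
        return drained


if __name__ == "__main__":
    buf = ProbeBuffer(capacity=32, record_size=8)
    outcomes = [buf.fire(bytes(8)) for _ in range(6)]
    print(outcomes, "drops:", buf.drops)
```

| Because the cost of a firing is fixed and the buffer never grows,
| the worst case for the traced host is lost records, not unbounded
| memory or time spent in the probe.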
| tptacek wrote:
| Linux has a different design than DTrace; eBPF is more
| capable as a trusted tool, and less capable for untrusted
| tools. It doesn't make sense to say one approach has
| "caught up" to the other, unless you really believe the
| verifier will reach a state where nobody's going to find
| verifier bugs --- at which point eBPF will be strictly
| superior. Beyond that, it's a matter of taste. What seems
| clearly to be true is that eBPF is wildly more popular.
| cryptonector wrote:
| It's really hard to bring a host to its knees using
| DTrace, yet it's quite powerful for observability. In my
| opinion it is better to start with that then add extra
| power where it's needed.
| tptacek wrote:
| I understand the argument, but it's clear which one
| succeeded in the market. Meanwhile: we take pretty good
| advantage of the extra power eBPF gives us over what
| DTrace would, so I'm happy to be on the golden path for
| the platform here. Like I said, though: this is a matter
| of taste.
| umanwizard wrote:
| eBPF isn't Turing complete because it has to be able to
| prove that loops are bounded.
| cryptonector wrote:
| And I should say that DTrace probe actions _can
| dereference pointers_, but NULL dereferences do not
| cause crashes, and rich type data is generally available.
| bcantrill wrote:
| Well, we didn't merely "try harder" -- we treated safety
| as a constraint which informed every aspect of the
| design. And yes, treating safety as a constraint rather
| than merely an objective results in different
| implementation decisions. From the article:
|
| _This working model significantly increases the attack
| surface of the kernel, since it allows executing
| arbitrary code at a high privilege level. Because of this
| risk, programs have to be verified before they can be
| loaded. This ensures that all eBPF security assumptions
| are met. The verifier, which consists of complex code, is
| responsible for this task._
|
| _Given how difficult the task of validating that a
| program is safe to execute is, there have been many
| vulnerabilities found within the eBPF verifier. When one
| of these vulnerabilities is exploited, the result is
| usually a local privilege escalation exploit (or
| container escape in containerized environments). While
| the verifier's code has been audited extensively, this
| task also becomes harder as new features are added to
| eBPF and the complexity of the verifier grows_
|
| DTrace was developed over 20 years ago; there have not
| been "many vulnerabilities" found in the verifier -- and
| we have not grown the complexity of the verifier over
| time. You can dismiss these as implementation details,
| but these details reflect different views of the problem
| and its constraints.
| saagarjha wrote:
| No, like, the bug that was demonstrated seems to be
| fairly fundamental to running any sort of bytecode in the
| kernel: they need to verify all branches, and this is
| potentially slow, so they optimize it (which is where the
| bug is). What are you doing differently? It seems to me
| that you're either not going to optimize this or you are?
| tptacek wrote:
| The DTrace instruction set is more limited than that of
| the eBPF VM; eBPF is essentially a fully functional ISA,
| where DTrace was (if I'm remembering this right) designed
| around the D script language. An eBPF program is often
| just a clang C program, and you're trusting the kernel
| verifier to reject it if it can't be proven safe.
| Further: eBPF programs are JIT'd to actual machine code;
| once you've loaded and verified an eBPF program, it has
| conceptually all the same power as, say, shellcode you
| managed to load into the kernel via an LPE.
|
| That's not to say that security researchers couldn't find
| DTrace vulnerabilities if they, for instance, built
| DIF/DOF fuzzers of 2023 levels of sophistication for
| them. I don't know that anyone's doing that, because
| DTrace is more or less a dead letter.
| solarengineer wrote:
| For those who read this thread - DTrace is in use in
| Solaris and in Illumos, and various of us who use Illumos
| for our production use cases (like Oxide does) still very
| much use DTrace.
|
| I appreciate the rest of tptacek's comment which is
| informative. I also acknowledge that there may not be
| fuzzers written that have been disclosed.
| tptacek wrote:
| Oh, sorry, totally fair call-out. There's like a huge
| implicit "on Linux" thing in my brain about all this
| stuff.
|
| I'd also be open to an argument that the code quality in
| DTrace is higher! I spent a week trying to unwind the
| verifier so I could port a facsimile of it to userland.
| It is a lot. My point about fuzzers and stuff isn't that
| I'm concerned DTrace is full of bugs; I'd be surprised if
| it was. My thing is just that everything written in
| memory unsafe kernel code falls against Google Project
| Zero-grade vulnerability research, at some point.
|
| That's true of the rest of the kernel, too! So from a
| threat perspective, maybe it doesn't matter. I think my
| bias here --- that's all it is --- is that neither of
| these instrumentation schemes are things I'd want to
| expose to a shared-kernel cotenant.
|
| Thanks for helping me clarify this.
| ssahoo wrote:
| Wouldn't even the classic loadable kernel mode driver be a
| better choice than a patch and eBPF? I know they are unsafe, but
| people who deal with them know that the power comes with
| responsibility.
| tptacek wrote:
| No? SREs roll eBPF programs on the fly just in the process of
| debugging problems; if you tried to do that with an LKM,
| you'd almost certainly blow up your system. People who write
| Linux kernel code routinely crash their systems in the
| process of development.
| techwiz137 wrote:
| In my country we have a saying. "Porcupine in the pants". Sounds
| like for all the good it can do, it isn't written safely and
| carefully.
| deskr wrote:
| With experience you'll realise that despite things being done
| safely and carefully, mistakes can and do pop up.
| bugtodiffer wrote:
| True. There are some nasty bugs in some very well written
| code.
| tptacek wrote:
| A reminder that on the platforms eBPF is most commonly used,
| verifier bugs don't matter much, because unprivileged code isn't
| allowed to load eBPF programs to begin with. Bugs like this are
| thus root -> ring0 vulnerabilities. That's not nothing, but for
| serverside work it's usually worth the tradeoff, especially
| because eBPF's track record for kernel LPEs is actually pretty
| strong compared to the kernel as a whole.
|
| In the setting eBPF is used today, most of the value of the
| verifier is that it's hard to _accidentally_ crash your kernel
| with a bad eBPF program. That is comically untrue about an
| ordinary LKM.
| chc4 wrote:
| The PoC uses eBPF maps as their out-of-bounds pointer, but it
| sounds like it would also be exploitable via non-extended BPF
| programs loadable via seccomp since it's just improper scalar
| value range tracking, which doesn't require any privileges on
| most platforms.
|
| And, of course, root -> ring0 is less of a problem with
| unprivileged user namespaces where you can make yourself
| "root", as we've seen in every eBPF bug PoC since distros
| started turning that on (and have since turned it off again,
| mostly)
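|
| For readers unfamiliar with scalar range tracking, a minimal
| sketch of the idea: the verifier keeps a [lo, hi] interval for
| each value, narrows it at conditional branches, and accepts a
| memory access only if every value in the interval is in bounds.
| This is a simplified model, not the kernel's data structures;
| `Range`, `refine_less_than`, and `access_is_safe` are hypothetical
| names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Inclusive interval of values a scalar may hold at some program point."""

    lo: int
    hi: int


def refine_less_than(r: Range, bound: int) -> Range:
    """Interval of the value on the taken side of `if value < bound`."""
    return Range(r.lo, min(r.hi, bound - 1))


def access_is_safe(index: Range, buf_len: int) -> bool:
    """Accept an indexed access only if every possible index is in bounds."""
    return 0 <= index.lo and index.hi < buf_len


if __name__ == "__main__":
    unknown = Range(0, 255)                  # e.g. an unchecked byte from user input
    print(access_is_safe(unknown, 16))       # unchecked index into a 16-byte buffer
    checked = refine_less_than(unknown, 16)  # after `if idx < 16`
    print(checked, access_is_safe(checked, 16))
```

| The safety of the whole scheme rests on the tracked interval never
| being narrower than the values that can really occur at runtime;
| if it is, the check passes for an access that can land out of
| bounds.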
| tptacek wrote:
| I just want to say that this is a hell of a nerd snipe.
| chc4 wrote:
| LMAO
|
| Ok that's fair. check_seccomp_filter actually has a more
| restrictive list than just "BPF with no backwards jumps",
| and in particular doesn't allow BPF_IND in the BPF_LDX, so
| you can't read out of bounds because you can't use a
| dynamic displacement...but BPF_STX _is_ allowed, so you can
| probably write out of bounds? BPF_W is the seccomp_data
| address and the control flow diagram they show to compute
| incorrect scalar ranges doesn't require any backwards
| jumps...
| tptacek wrote:
| I feel like I just played the Uno Reverse card on the
| nerd snipe.
| 10000truths wrote:
| Verifier bugs matter because resolving them is a prerequisite
| for secure unprivileged use of eBPF.
| tptacek wrote:
| Put it this way: verifier bugs matter, but people probably
| don't do unscheduled fleetwide updates to fix them.
| mort96 wrote:
| Verifier bugs matter _for the kernel, which wants eBPF to be
| secure even for unprivileged accounts_.
|
| Verifier bugs don't matter _that much, for most Linux users,
| right now, because unprivileged accounts can't use eBPF._
| dumpling777 wrote:
| Let's not forget also that we can give CAP_BPF to containers.
| With things like Cilium on the rise, the attack vector of
| landing in container environment that has cap_bpf is more and
| more realistic
| tptacek wrote:
| I don't believe shared-kernel container systems are real
| security boundaries to begin with, so, to me, a container
| running with CAP_BPF isn't much different than any other
| program a machine owner might opt to run; the point is that
| you trust the workload, and so the verifier is more of a
| safety net than a vault door.
| kortilla wrote:
| That pessimistic view is not shared by everyone who is
| working on namespaces, cgroups, etc so I think that's a
| pretty unproductive comment in this context.
|
| It reminds me of early days in hypervisors when someone
| would get an exploit to break out of the isolation and
| someone would dismiss it because "virtual machines aren't
| real isolation anyway".
|
| Look, I get it and I frankly agree with you in the current
| state of the world, but this is the time to shut up and get
| out of the way of people trying to make forward progress.
| Breakouts of containers are a big deal for people pushing
| the boundary there.
| tptacek wrote:
| I don't know who you're really talking to (it's not me),
| but all I'm saying is that CAP_BPF doesn't bother me
| much, because it's problematic only for a security
| boundary that is already problematic with a much lower
| degree of difficulty for attackers than the eBPF
| verifier.
| mrbluecoat wrote:
| > "Uno no es ninguno" (One is none)
|
| I believe that translates to "One is not none"
|
| https://bughunters.google.com/blog/6303226026131456/a-deep-d...
| DanielVZ wrote:
| That's the direct translation but for some reason in Spanish our
| double negations are usually just negations.
| kmarc wrote:
| It doesn't; it translates to "One is none". This is the infamous
| double negation many foreign speakers (including me) struggle
| with.
|
| https://spanish.stackexchange.com/questions/26777/how-does-d...
| samatman wrote:
| Perhaps we should translate this as "one ain't nothin'".
| TacticalCoder wrote:
| > "Uno no es ninguno" (One is none)
|
| Literally "One not is none", aka "One is _not_ none".
| jolmg wrote:
| In Spanish, it's common for double negatives to not actually be
| double negatives. For example, if you wanted to say "there's
| nothing here", you'd say "no hay nada aqui", which word-for-
| word means "there's not nothing here".
|
| Checking out the Royal Spanish Academy, here's what they say
| about it:
|
| https://www.rae.es/espanol-al-dia/doble-negacion-no-vino-nad...
|
| > The so-called "double negation" is due to the obligatory
| negative agreement that must be established in Spanish, and
| other Romance languages, in certain circumstances (see New
| Grammar, § 48.3d), which results in the joint presence in the
| statement of the adverb _no_ and other elements that also have
| a negative meaning.
|
| > The concurrence of these two "negations" does not annul the
| negative meaning of the statement.
| stirfish wrote:
| I like to think of it as additive negatives, as opposed to
| multiplicative negatives.
| cassepipe wrote:
| It's true, but I don't think this would apply to such a
| simple statement as in this case; otherwise, how would you say
| "One is _not_ none" in Spanish?
| dgb23 wrote:
| My guess is you wouldn't use negation.
| mejutoco wrote:
| Uno no es ninguno or uno no es cero or uno es diferente de
| cero all communicate this correctly IMO.
| cassepipe wrote:
| But "Uno no es ninguno" is the original phrase that's
| given for "One is none"
| b0afc375b5 wrote:
| I guess this is similar to English: "I ain't no snitch",
| which is a double negative but is equivalent to its single
| negative counterpart.
| mejutoco wrote:
| Same in French: "Je ne sais pas" means I do not know, not I
| do not not know (aka I know).
|
| In any case, the meaning of the sentence above: "uno no es
| ninguno" in Spanish is clearly one is not zero, or one is not
| none, or one is different than none.
|
| "Uno no es nada" could be "one is nothing", and "one is not
| nothing". It all depends on the frame of reference (in this
| case English), but for this sentence, the "one is not none"
| is correct IMO. I would never even do a second pass on that
| sentence, as a native Spanish speaker (appeal to authority, I
| know)
___________________________________________________________________
(page generated 2024-08-09 23:02 UTC)