[HN Gopher] Why Xen Wasn't Hit by RETBleed on Intel CPUs
       ___________________________________________________________________
        
       Why Xen Wasn't Hit by RETBleed on Intel CPUs
        
       Author : plam503711
       Score  : 116 points
       Date   : 2022-08-26 10:06 UTC (12 hours ago)
        
 (HTM) web link (xcp-ng.org)
 (TXT) w3m dump (xcp-ng.org)
        
       | api wrote:
       | > Security is hard
       | 
       | Unrelated but I am getting so sick of this cliche.
       | 
       | Thinking of more insightful ways to explain why something is
       | challenging is hard.
        
         | effie wrote:
         | Also: hardware is hard, cryptography is hard. Doing anything
         | new and important is hard.
        
       | kramerger wrote:
       | So basically, when people were fixing Spectre v2, Xen developers
       | enabled an Intel feature that restricts speculation in indirect
       | branches, while Linux opted not to use it.
       | 
       | What is the performance penalty for this thing? Was performance
       | the reason for Linux's refusal?
        
         | plam503711 wrote:
         | You can read the original conversation from 2018 here:
         | https://lkml.org/lkml/2018/1/21/192
        
           | Sakos wrote:
           | Previous discussion (2018):
           | https://news.ycombinator.com/item?id=16202205
           | 
           | I get the sense that IBRS wasn't used because Linus 1) didn't
           | want to take on the responsibility of fixing something that
           | needs to be fixed in hardware by Intel and 2) didn't want to
           | help Intel avoid responsibility for shipping insecure
           | hardware. The same way he's been antagonistic towards nVidia
           | for being bad participants.
        
             | nousermane wrote:
             | > Linus (...) didn't want to help Intel avoid
             | responsibility for shipping insecure hardware.
             | 
             | Whatever the saying about good intentions is, you can argue
             | he ended up doing Intel a favour here, by shipping a patch
             | that took away only a couple % of performance. Surely, it
             | would've hurt them more, reputation-wise, if the default
             | workaround took a larger chunk of CPU speed.
        
               | Sakos wrote:
               | Possibly.
               | 
               | It also seems like the changes would've added an amount
               | of technical debt that wasn't acceptable to him ("garbage
               | MSR writes" in and out of the kernel). I could understand
               | why he'd want to avoid a bad solution (that would also
               | have to be maintained long-term) just because Intel is
               | unwilling to fix it on their end.
        
             | generalizations wrote:
             | Sounds like Linux/Linus figured they had enough clout to
             | call bs on the patch, while Xen just focused on doing the
             | best with what they had. I honestly don't blame either.
        
       | dontlaugh wrote:
       | This absolutely vindicates Xen's approach vs Linux's. Performance
       | gained by speculative execution sadly must be given up for safety
       | as a default, to only be re-gained in specific cases where safety
       | is certain.
       | 
       | It also vindicates the naming of "Spectre", it most definitely
       | keeps showing up.
       | 
       | The article reads a bit too much like "I told you so", but
       | ultimately the author is correct to say it.
        
         | ajross wrote:
         | > This absolutely vindicates Xen's approach vs Linux's
         | 
         | Meh. It likely just points out that Xen is a much more limited
         | software environment with much less dependence on indirect
         | branching[1]. There are environments where IBRS has high cost
         | and ones where it doesn't. Linux is in the former category.
         | 
         | Xen also has the advantage of being a hypervisor, meaning that
         | if all they do is expose IBRS to the guest, they can (somewhat
         | cleverly) claim that any resulting vulnerabilities are the
         | fault of the guest software not implementing them. Linux
         | exposes a Unix userspace, and no one told userspace apps they
         | need to use speculation barriers.
         | 
         | Really this article is mostly just marketing. It's a win for
         | Xen, sure, and they should crow about it. But we should
         | recognize crowing vs. genuine security analysis, too.
         | 
         | [1] Vs. say, Linux, which has an extremely robust polymorphic
         | device/bus/probe model where all the methods are function
         | pointers.
        
           | gizmo686 wrote:
           | It is easy to be fast if you are willing to ship broken
           | software. The Linux solution was broken. People knew the
           | Linux solution was broken when it was shipped. The Linux
           | developers knew what a non broken solution would be because
           | the CPU manufacturer told them. [0] Linux decided to go with
           | the broken solution. This attitude is not specific to Linux,
           | it is pervasive throughout the entire industry. It is the
           | reason that few people take a security vulnerability report
           | seriously until someone turns it into a full exploit.
           | 
           | Frankly, Xen's work here was not at all impressive; they just
           | applied a fix that Intel told everyone to apply. The fact that
           | this is a differentiating thing for them to market with is an
           | indictment of everyone who didn't apply it, and the industry
           | conditions that led to them.
           | 
           | [0] In fairness, the reason we are in this mess is that said
           | CPU manufacturer has been releasing broken products in the
           | name of speed themselves.
        
             | [deleted]
        
             | ajross wrote:
             | > The Linux solution was broken. People knew the Linux
             | solution was broken when it was shipped.
             | 
             | That is not a fair characterization. There are endless
             | mountains of theoretical vulnerabilities[1], and no one
             | (certainly not including Xen) tries to mitigate them all
             | blindly. The dwm post linked in the article explicitly says
             | he's not losing sleep over the issue. Everyone (yes, likely
             | including Xen) believed in good faith that this was not
             | practically exploitable.
             | 
             | > Frankly, Xen's work here was not at all impressive; they
             | just applied a fix that Intel told everyone to apply.
             | 
             | And this seems like a misunderstanding too. My gathering
             | from the linked article is that Xen virtualized the barrier
             | mechanism such that the job could be farmed out to guest
             | OSes. Someone running an unpatched Linux under Xen (which
             | is, what, 90+% of the worldwide cloud?) is still
             | vulnerable. But "Xen" is not, which seems maybe less
             | impactful than the marketing being presented would have you
             | believe.
             | 
             | [1] Rowhammer says hi.
        
               | paulmd wrote:
               | > There are endless mountains of theoretical
               | vulnerabilities[1], and no one (certainly not including
               | Xen) tries to mitigate them all blindly.
               | 
               | I mean, not nobody. That's sort of the _raison d'etre_
               | of OpenBSD.
               | 
               | We're talking about a distro that wasn't affected by the
               | latest round of speculation vulnerabilities in AMD's SMT
               | implementation because as soon as they heard about
               | Spectre/Meltdown they _immediately realized that SMT was
               | gonna be a giant pile of sidechannels and disabled it on
               | all processors, even the ones that were believed safe at
               | the time_. They take "defensive engineering" extremely
               | seriously and will mitigate anything that seems
               | plausible.
               | 
               | That was controversial at the time (extreme performance
               | cost! and AMD isn't affected so why do they have to
               | suffer!?) and they ended up being right, there were more
               | vulnerabilities to come based on SMT leaking data to the
               | other thread.
               | 
               | Nobody mitigates implausible/theoretical ideas that don't
               | seem likely to work, but, a good software engineer
               | certainly should be mitigating things that seem like
               | _reasonably feasible extensions of existing attacks_,
               | and hardening their environments in general to mitigate
               | the impact if something should pop up. That's not
               | extraordinary foresight, that's just part of the job.
               | 
               | Linus's decision did not follow good engineering
               | practices, and there _are_ examples of other OSs and
               | distros that _did_ do it properly. Xen may or may not
               | be one of them, it's certainly possible to accidentally
               | fall into a safe path (as AMD likely did on Meltdown,
               | given the broad multi-vendor scope of the vuln), or the
               | "right path" could simply have been easy for them to
               | take, but nobody should be defending Linus on the basis
               | of "nobody could have known". The decisions he made were
               | unsafe and incompatible with a defensive-engineering
               | mindset, and he was told this at the time.
               | 
               | Linus's "why are we doing all this over a handful of
               | broken intel processors" mindset is exactly the trap that
               | OpenBSD avoided falling into. They knew it wasn't
               | going to be just a handful of broken intel processors,
               | SMT is fundamentally a shared resource and once they saw
               | the basis of Spectre-style sidechannels they knew SMT was
               | gonna be a steady drip-drip-drip of vulnerabilities
               | _across all architectures_. That was very foreseeable,
               | when I saw the OpenBSD thing at the time it was like
               | "yeah, probably gonna end up being a good call...".
               | 
               | For another "yeah, probably gonna be a problem down the
               | road": KPTI really needs to be enabled-by-default on AMD
               | processors. The Prefetch+TLB attack is still un-mitigated
               | in hardware and AMD relies on KPTI for protection, but
               | still recommends it be disabled by default for
               | performance reasons. The data bleed rate is faster than
               | Meltdown and it's really past time to turn it on by
               | default regardless of what it does to AMD's benchmark
               | numbers. It should have been on-by-default in the first
               | place, and now it's actually got demonstrated exploits
               | leaking kernel memory. Another risky, non-defensive call
               | from the Linux tech-leads.
        
               | ajross wrote:
               | Uh... OpenBSD _is susceptible to Retbleed_, which
               | doesn't involve SMT behavior, only branch prediction
               | state on a single CPU. The very subject under discussion
               | seems to invalidate your point. OpenBSD, like everyone
               | else, made a call not to patch this vulnerability
               | proactively because it didn't seem exploitable. And like
               | the rest of us, they were wrong (a little -- Retbleed
               | remains a _very_ slow channel, but it's real).
        
               | bonzini wrote:
               | > Xen virtualized the barrier mechanism such that the job
               | could be farmed out to guest OSes.
               | 
               | All hypervisors do that, including KVM. The difference is
               | that _because_ Xen has to let the guest control the
               | speculation control MSRs, it has to read and write the
               | MSR anyway on every guest<->host context switch. Using
               | IBRS in Xen comes essentially for free.
               | 
               | Linux on the other hand does _not_ have to access the
               | speculation control MSR on every userspace<->kernel
               | context switch, and doing so would have had a bigger
               | performance impact than retpolines. Therefore it took a
               | different approach.
               | 
               | Now the performance impact wasn't that bad on Skylake and
               | it probably would have been good to use IBRS on those
               | processors. FWIW very old versions of RHEL (6 and 7) in
               | fact did use IBRS instead of retpoline because we had
               | little time (there were less than two months from the
               | time the team was put together to the time we had to have
               | something ready to be shipped to customers) and it even
               | took days to read people in on the issue because of how
               | secret it was. So we didn't want to put the compiler
               | update on the critical path.
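
The cost argument in the comment above can be put into a toy model. This is only a sketch under stated assumptions: the function name `extra_msr_writes`, its parameters, and the "two writes per round trip" figure are hypothetical and are not taken from the Xen or Linux source. It counts how many additional speculation-control MSR writes IBRS would add, depending on whether the MSR is already being switched at each boundary crossing.

```python
def extra_msr_writes(transitions: int, msr_already_switched: bool) -> int:
    """Extra speculation-control MSR writes that IBRS adds, in a toy model.

    Assumption: IBRS is set on every entry to privileged code and cleared
    on every exit, i.e. two writes per round trip.
    """
    if msr_already_switched:
        # The MSR is already saved/restored at each boundary (the Xen case
        # described above), so IBRS adds no further writes in this model.
        return 0
    return 2 * transitions


# Hypothetical workload: one million boundary crossings.
crossings = 1_000_000
print(extra_msr_writes(crossings, msr_already_switched=True))   # 0
print(extra_msr_writes(crossings, msr_already_switched=False))  # 2000000
```

The model ignores everything except the write count (for example, how expensive each write is on a given CPU), so it only illustrates why the same mitigation can be nearly free in one design and costly in another.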
        
             | leoc wrote:
             | > [0] In fairness, the reason we are in this mess is that
             | said CPU manufacturer has been releasing broken products in
             | the name of speed themselves.
             | 
             | As bad as Intel's record there was, it's hard to really
             | single it out either. It certainly seems as if the whole
             | industry--CPU manufacturers, integrators, academics, kernel
             | devs, the lot--simply agreed not to notice that this
             | category of vulnerability existed until the moment it was
             | fully impossible to ignore.
        
         | plam503711 wrote:
         | The real golden-"I told you so" (that triggered the idea to
         | write this very blog post) comes from a tweet of David
         | Woodhouse last July:
         | https://twitter.com/dwmw2/status/1549042968320811008
        
           | eru wrote:
           | Direct link: https://lkml.org/lkml/2018/1/22/598
        
         | [deleted]
        
       | Lind5 wrote:
       | More info here from the discoverers at ETH Zurich:
       | https://comsec.ethz.ch/research/microarch/retbleed/ and here is
       | the actual technical paper:
       | https://comsec.ethz.ch/wp-content/files/retbleed_sec22.pdf
        
       | bityard wrote:
       | Has there ever been a practical speculative execution attack
       | found in the wild?
        
       | adultSwim wrote:
       | David Woodhouse was right all along
        
       | effie wrote:
       | > Mostly because people didn't believe that it was possible to
       | exploit the retpoline limitations versus the performance penalty
       | to mitigate them.
       | 
       | I'm not a native speaker, but is this acceptable written
       | English? The part starting with "versus" seems out of place.
        
       | nano9 wrote:
       | Dear author, if you're the one sharing the article: you
       | misspelled "retpoline" in your very first usage of the word.
        
         | karamanolev wrote:
         | Must've been a rowhammer attack flipping some bits...
        
           | doubled112 wrote:
           | Haven't there been solar storms this week?
           | 
           | Maybe it wasn't malicious, maybe it was cosmic rays.
        
         | [deleted]
        
         | plam503711 wrote:
         | A spectre silently fixed the spelling, thanks for the feedback
         | ;)
        
           | ClassyJacket wrote:
           | I see you fixed rewrite as well, just as I was getting ready
           | to point it out :P
        
             | plam503711 wrote:
             | Yes, sorry for that, I was really more focused on getting
             | the story details than the spelling (plus I'm not a native
             | speaker as you probably guessed).
        
               | mmastrac wrote:
               | It wasn't obvious at all when reading - your written
               | English is excellent. Only after revisiting and
               | re-reading more carefully did I notice you used the
               | construction "So ...." slightly more than a native
               | speaker would have.
               | 
               | Thanks for the great article.
        
               | plam503711 wrote:
               | Thank you, both for being kind and for providing
               | constructive feedback. "So" is a very common trap for
               | French speakers :D
        
       | Thaxll wrote:
       | I forgot about Xen. Is it still popular? I thought everything
       | moved to KVM in the last 10 years.
        
         | sofixa wrote:
         | Not really. VMware vSphere and KVM based virtualisations are by
         | far the most popular. Xen's last holdout was AWS' custom
         | version of it, but that has been replaced since a few
         | generations ago with the KVM-based Nitro.
         | 
         | So it's basically a niche thing, mostly used by those who
         | already had it/know it (VMware vSphere is going down that road
         | as well, it's basically legacy today).
        
         | eixiepia wrote:
         | Yes it's still popular in some places, and lots of new
         | development is going on. Xen is superior to KVM in my opinion.
        
           | AshamedCaptain wrote:
           | I'd actually _like_ to use Xen, but as far as I can see it is
           | just dead in all but name.
           | 
           | To name one example, nested virtualization support is not
           | only hopelessly broken, it's MORE broken in recent releases
           | than it was a decade ago. You can see right here how the
           | feature kept getting broken by every other release until
           | nothing worked anymore:
           | https://wiki.xenproject.org/wiki/Nested_Virtualization_in_Xe....
           | 
           | And Xen is literally the only virtualizer out there that does
           | not support nested virtualization, which is a rather critical
           | feature since many (dev) stacks assume one has hardware
           | virtualization, and Windows is going to require it sooner
           | rather than later.
        
             | naasking wrote:
             | If nested virtualization isn't used by their main customers
             | I'm not sure it's that critical to them
        
             | plam503711 wrote:
             | Xen Project is far from being dead (there's a lot of
             | activity on the mailing list, and now, thanks to new
             | contributors like Vates/XCP-ng, there are also more
             | initiatives to have decent project tracking, see
             | https://gitlab.com/groups/xen-project/-/epics?state=opened&p...
             | for example).
             | 
             | Regarding nested virt, you are mostly right: it's only
             | "working-ish" for basic things, but indeed, it's broken
             | when you start to use anything heavy in your nested VM. The
             | main reason nobody fixed it is that it's not really used:
             | as with any other open source project, you find what you
             | need if you contribute. Obviously, as soon as someone needs
             | this and is willing to contribute, it will change :)
        
         | gwd wrote:
         | Xen and KVM are different beasts.
         | 
         | Xen can implement things like a CPU scheduler exclusively
         | focused on VMs, while KVM has to deal with the normal Linux
         | scheduler for processes.
         | 
         | Xen can do advanced defense-in-depth techniques like driver
         | domains -- something impossible to implement on KVM.
         | 
         | Xen has a mature security response process; if you're a cloud
         | provider, or ship anything with Xen inside of it, you can be
         | notified of security issues typically two weeks before the
         | public disclosure; and we're quite thorough about what we issue
         | security alerts for. For KVM, you just have to hope that your
         | functionality is worth issuing a CVE about, and unless you're a
         | distro, you're only going to be told after it's been made
         | public.
         | 
         | Xen is a microkernel, so you can run it on tiny embedded
         | devices for which Linux / KVM would be too big. Xen is small
         | enough that it's actually feasible to do Functional Safety
         | Certification on it.
         | 
         | That's why Xen is still used by QubesOS, the NSA, and various
         | defense contractors; why a number of cloud providers (including
         | say, Gandi.net) use Xen; why Xilinx has their own Xen
         | distribution; and why Xen is in the reference implementation
         | for ARM's automotive stack -- in addition to being officially
         | supported by SUSE, and being the driving engine behind Citrix
         | Hypervisor and XCP-ng.
        
         | plam503711 wrote:
         | "The reports of my death are greatly exaggerated"
         | 
         | Regards,
         | 
         | Xen.
        
           | Joker_vD wrote:
           | It must be Delphi's brother then: not quite dead, but calling
           | them alive too would be a stretch. What a wonderful world
           | full of undead technologies we live in.
        
         | robcohen wrote:
         | QubesOS uses Xen
        
         | schainks wrote:
         | Spin up an EC2 node recently?
        
           | monocasa wrote:
           | I was under the impression that Amazon has a custom,
           | proprietary hypervisor these days that's simply compatible
           | with Xen's hypervisor<->guest interfaces.
        
             | bonzini wrote:
             | These days Amazon emulates the Xen hypercall interface on
             | top of KVM. It's pretty much their sole substantial
             | contribution to upstream KVM (by David Woodhouse that's
             | mentioned elsewhere in the comments in fact).
             | 
             | The implementation is split between KVM and their
             | proprietary equivalent of QEMU.
        
             | schainks wrote:
             | The Xen project members list suggests otherwise:
             | https://xenproject.org/about-us/project-members/
             | 
             | Even if it is a custom hypervisor, AWS likely derived it
             | from Xen and sponsors the project to continue doing so.
        
               | [deleted]
        
               | plam503711 wrote:
               | It's not a custom Xen: it *is* Xen (AFAIK, with
               | possibly their own patch queue for some specific needs on
               | top of it). What's custom is the toolstack around it :)
        
               | schainks wrote:
               | Yeah that's what I thought, but I couldn't find
               | definitive proof besides old posts from the early days.
        
               | Thev00d00 wrote:
               | nitro is KVM based
        
       | saagarjha wrote:
       | Question: what's the performance difference for enabling this in
       | Linux versus in Xen? Naively I might expect that a system
       | probably spends more time in Linux code, so the overall impact
       | might be higher...
        
         | bonzini wrote:
         | See https://news.ycombinator.com/item?id=32607709; in addition
         | to what you say, enabling IBRS basically comes for free in Xen
         | because its "userspace" is actually guest kernel code that can
         | itself control processor speculation.
        
           | monocasa wrote:
           | Whatever comment you linked to seems to be deleted. Can you
           | summarize?
        
             | glandium wrote:
             | The link is good, but a semi-colon got stuck to it. If you
             | remove the semi-colon, you'll get to the comment.
        
       ___________________________________________________________________
       (page generated 2022-08-26 23:01 UTC)