[HN Gopher] Why Xen Wasn't Hit by RETBleed on Intel CPUs
___________________________________________________________________
Author : plam503711
Score : 116 points
Date : 2022-08-26 10:06 UTC (12 hours ago)
(HTM) web link (xcp-ng.org)
| api wrote:
| > Security is hard
|
| Unrelated, but I am getting so sick of this cliché.
|
| Thinking of more insightful ways to explain why something is
| challenging is hard.
| effie wrote:
| Also: hardware is hard, cryptography is hard. Doing anything
| new and important is hard.
| kramerger wrote:
| So basically, when people were fixing Spectre v2, Xen developers
| enabled an Intel feature (IBRS) that restricts speculation in indirect
| branches, while Linux opted not to use it.
|
| What is the performance penalty for this thing? Was performance
| the reason for Linux's refusal?
| plam503711 wrote:
| You can read the original conversation from 2018 here:
| https://lkml.org/lkml/2018/1/21/192
| Sakos wrote:
| Previous discussion (2018):
| https://news.ycombinator.com/item?id=16202205
|
| I get the sense that IBRS wasn't used because Linus 1) didn't
| want to take on the responsibility of fixing something that
| needs to be fixed in hardware by Intel and 2) didn't want to
| help Intel avoid responsibility for shipping insecure
| hardware, in the same way he's been antagonistic towards Nvidia
| for being a bad participant.
| nousermane wrote:
| > Linus (...) didn't want to help Intel avoid
| responsibility for shipping insecure hardware.
|
| Whatever the saying about good intentions is, you could argue
| he ended up doing Intel a favour here by shipping a patch
| that cost only a couple of percent of performance. Surely it
| would have hurt them more, reputation-wise, if the default
| workaround had taken a larger chunk of CPU speed.
| Sakos wrote:
| Possibly.
|
| It also seems like the changes would've added an amount
| of technical debt that wasn't acceptable to him ("garbage
| MSR writes" in and out of the kernel). I could understand
| why he'd want to avoid a bad solution (that would also
| have to be maintained long-term) just because Intel is
| unwilling to fix it on their end.
| generalizations wrote:
| Sounds like Linux/Linus figured they had enough clout to
| call bs on the patch, while Xen just focused on doing the
| best with what they had. I honestly don't blame either.
| dontlaugh wrote:
| This absolutely vindicates Xen's approach vs Linux's. Performance
| gained by speculative execution sadly must be given up for safety
| as a default, to be regained only in specific cases where safety
| is certain.
|
| It also vindicates the naming of "Spectre"; it most definitely
| keeps showing up.
|
| The article reads a bit too much like "I told you so", but
| ultimately the author is correct to say it.
| ajross wrote:
| > This absolutely vindicates Xen's approach vs Linux's
|
| Meh. It likely just points out that Xen is a much more limited
| software environment with much less dependence on indirect
| branching[1]. There are environments where IBRS has high cost
| and ones where it doesn't. Linux is in the former category.
|
| Xen also has the advantage of being a hypervisor, meaning that
| if all they do is expose IBRS to the guest, they can (somewhat
| cleverly) claim that any resulting vulnerabilities are the
| fault of the guest software not implementing them. Linux
| exposes a Unix userspace, and no one told userspace apps they
| need to use speculation barriers.
|
| Really this article is mostly just marketing. It's a win for
| Xen, sure, and they should crow about it. But we should
| recognize crowing vs. genuine security analysis, too.
|
| [1] Vs., say, Linux, which has an extremely robust polymorphic
| device/bus/probe model where all the methods are function
| pointers.
| gizmo686 wrote:
| It is easy to be fast if you are willing to ship broken
| software. The Linux solution was broken. People knew the
| Linux solution was broken when it was shipped. The Linux
| developers knew what a non-broken solution would be because
| the CPU manufacturer told them. [0] Linux decided to go with
| the broken solution. This attitude is not specific to Linux,
| it is pervasive throughout the entire industry. It is the
| reason that few people take a security vulnerability report
| seriously until someone turns it into a full exploit.
|
| Frankly, Xen's work here was not at all impressive; they just
| applied a fix that Intel told everyone to apply. The fact that
| this is a differentiating thing for them to market with is an
| indictment of everyone who didn't apply it, and the industry
| conditions that led to them.
|
| [0] In fairness, the reason we are in this mess is that said
| CPU manufacturer has been releasing broken products in the
| name of speed themselves.
| ajross wrote:
| > The Linux solution was broken. People knew the Linux
| solution was broken when it was shipped.
|
| That is not a fair characterization. There are endless
| mountains of theoretical vulnerabilities[1], and no one
| (certainly not including Xen) tries to mitigate them all
| blindly. The dwmw2 (David Woodhouse) post linked in the article explicitly says
| he's not losing sleep over the issue. Everyone (yes, likely
| including Xen) believed in good faith that this was not
| practically exploitable.
|
| > Frankly, Xen's work here was not at all impressive; they
| just applied a fix that Intel told everyone to apply.
|
| And this seems like a misunderstanding too. What I gather
| from the linked article is that Xen virtualized the barrier
| mechanism such that the job could be farmed out to guest
| OSes. Someone running an unpatched Linux under Xen (which
| is, what, 90+% of the worldwide cloud?) is still
| vulnerable. But "Xen" is not, which seems maybe less
| impactful than the marketing being presented would have you
| believe.
|
| [1] Rowhammer says hi.
| paulmd wrote:
| > There are endless mountains of theoretical
| vulnerabilities[1], and no one (certainly not including
| Xen) tries to mitigate them all blindly.
|
| I mean, not nobody. That's sort of the _raison d'être_
| of OpenBSD.
|
| We're talking about a distro that wasn't affected by the
| latest round of speculation vulnerabilities in AMD's SMT
| implementation because as soon as they heard about
| Spectre/Meltdown they _immediately realized that SMT was
| gonna be a giant pile of sidechannels and disabled it on
| all processors, even the ones that were believed safe at
| the time_. They take "defensive engineering" extremely
| seriously and will mitigate anything that seems
| plausible.
|
| That was controversial at the time (extreme performance
| cost! and AMD isn't affected so why do they have to
| suffer!?) and they ended up being right, there were more
| vulnerabilities to come based on SMT leaking data to the
| other thread.
|
| Nobody mitigates implausible/theoretical ideas that don't
| seem likely to work, but, a good software engineer
| certainly should be mitigating things that seem like
| _reasonably feasible extensions of existing attacks_,
| and hardening their environments in general to mitigate
| the impact if something should pop up. That's not
| extraordinary foresight, that's just part of the job.
|
| Linus's decision did not follow good engineering
| practices, and there _are_ examples of other OSs and
| distros that _did_ do it properly. Xen may or may not
| be one of them; it's certainly possible to accidentally
| fall into a safe path (as AMD likely did on Meltdown,
| given the broad multi-vendor scope of the vuln), or the
| "right path" could simply have been easy for them to
| take, but nobody should be defending Linus on the basis
| of "nobody could have known". The decisions he made were
| unsafe and incompatible with a defensive-engineering
| mindset, and he was told this at the time.
|
| Linus's "why are we doing all this over a handful of
| broken intel processors" mindset is exactly the trap that
| OpenBSD avoided falling into. They knew it wasn't just
| going to be a handful of broken Intel processors,
| SMT is fundamentally a shared resource and once they saw
| the basis of Spectre-style sidechannels they knew SMT was
| gonna be a steady drip-drip-drip of vulnerabilities
| _across all architectures_. That was very foreseeable,
| when I saw the OpenBSD thing at the time it was like
| "yeah, probably gonna end up being a good call...".
|
| For another "yeah, probably gonna be a problem down the
| road": KPTI really needs to be enabled-by-default on AMD
| processors. The Prefetch+TLB attack is still un-mitigated
| in hardware and AMD relies on KPTI for protection, but
| still recommends it be disabled by default for
| performance reasons. The data bleed rate is faster than
| Meltdown and it's really past time to turn it on by
| default regardless of what it does to AMD's benchmark
| numbers. It should have been on-by-default in the first
| place, and now it's actually got demonstrated exploits
| leaking kernel memory. Another risky, non-defensive call
| from the Linux tech-leads.
| ajross wrote:
| Uh... OpenBSD _is susceptible to Retbleed_, which
| doesn't involve SMT behavior, only branch prediction
| state on a single CPU. The very subject under discussion
| seems to invalidate your point. OpenBSD, like everyone
| else, made a call not to patch this vulnerability
| proactively because it didn't seem exploitable. And like
| the rest of us, they were wrong (a little -- Retbleed
| remains a _very_ slow channel, but it's real).
| bonzini wrote:
| > Xen virtualized the barrier mechanism such that the job
| could be farmed out to guest OSes.
|
| All hypervisors do that, including KVM. The difference is
| that _because_ Xen has to let the guest control the
| speculation control MSRs, it has to read and write the
| MSR anyway on every guest<->host context switch. Using
| IBRS in Xen comes essentially for free.
|
| Linux on the other hand does _not_ have to access the
| speculation control MSR on every userspace<->kernel
| context switch, and doing so would have had a bigger
| performance impact than retpolines. Therefore it took a
| different approach.
|
| Now the performance impact wasn't that bad on Skylake and
| it probably would have been good to use IBRS on those
| processors. FWIW very old versions of RHEL (6 and 7) in
| fact did use IBRS instead of retpoline because we had
| little time (there were less than two months from the
| time the team was put together to the time we had to have
| something ready to be shipped to customers) and it even
| took days to read people in on the issue because of how
| secret it was. So we didn't want to put the compiler
| update on the critical path.
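For anyone wondering which of these approaches their own Linux machine ended up with: the kernel reports its chosen mitigation per issue as one-line text files under /sys/devices/system/cpu/vulnerabilities/ (spectre_v2 has been there since the early-2018 fixes; retbleed appears on kernels carrying the Retbleed patches). Below is a minimal Python sketch that dumps them. The helper names are made up for illustration, and the exact status wording varies between kernel versions, so the keyword check is only a rough heuristic.

```python
from __future__ import annotations

from pathlib import Path

# Linux reports per-vulnerability mitigation status here, one small text
# file per issue (e.g. "spectre_v2", "retbleed").
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_vuln_status(base: Path = VULN_DIR) -> dict[str, str]:
    """Return {vulnerability name: status line}; empty dict if unavailable."""
    if not base.is_dir():
        return {}
    status = {}
    for entry in sorted(base.iterdir()):
        if entry.is_file():
            status[entry.name] = entry.read_text().strip()
    return status


def mitigation_uses(status: str, keyword: str) -> bool:
    """Hypothetical helper: case-insensitive search for a keyword such as
    'IBRS' or 'Retpoline'. Exact wording differs across kernel versions."""
    return keyword.lower() in status.lower()


if __name__ == "__main__":
    for name, line in read_vuln_status().items():
        print(f"{name:25} {line}")
```

Run as a script, it prints one line per vulnerability file; on a non-Linux system, or one without that directory, it prints nothing.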
| leoc wrote:
| > [0] In fairness, the reason we are in this mess is that
| said CPU manufacturer has been releasing broken products in
| the name of speed themselves.
|
| As bad as Intel's record there was, it's hard to really
| single it out either. It certainly seems as if the whole
| industry--CPU manufacturers, integrators, academics, kernel
| devs, the lot--simply agreed not to notice that this
| category of vulnerability existed until the moment it was
| fully impossible to ignore.
| plam503711 wrote:
| The real golden "I told you so" (the one that triggered the idea to
| write this very blog post) comes from a tweet of David
| Woodhouse last July:
| https://twitter.com/dwmw2/status/1549042968320811008
| eru wrote:
| Direct link: https://lkml.org/lkml/2018/1/22/598
| Lind5 wrote:
| More info here from the discoverers at ETH Zurich:
| https://comsec.ethz.ch/research/microarch/retbleed/ and here is
| the actual technical paper:
| https://comsec.ethz.ch/wp-content/files/retbleed_sec22.pdf
| bityard wrote:
| Has there ever been a practical speculative execution attack
| found in the wild?
| adultSwim wrote:
| David Woodhouse was right all along
| effie wrote:
| > Mostly because people didn't believe that it was possible to
| exploit the retpoline limitations versus the performance penalty
| to mitigate them.
|
| I'm not a native speaker, but is this acceptable written
| English? The part starting with "versus" seems out of place.
| nano9 wrote:
| Dear author, if you're the one sharing the article: you
| misspelled "retpoline" in your very first usage of the word.
| karamanolev wrote:
| Must've been a rowhammer attack flipping some bits...
| doubled112 wrote:
| Haven't there been solar storms this week?
|
| Maybe it wasn't malicious, maybe it was cosmic rays.
| plam503711 wrote:
| A spectre silently fixed the spelling, thanks for the feedback
| ;)
| ClassyJacket wrote:
| I see you fixed rewrite as well, just as I was getting ready
| to point it out :P
| plam503711 wrote:
| Yes, sorry for that, I was really more focused on getting
| the story details than the spelling (plus I'm not a native
| speaker as you probably guessed).
| mmastrac wrote:
| It wasn't obvious at all when reading - your written
| English is excellent. Only after revisiting and
| re-reading more carefully did I notice you used the
| construction "So ...." slightly more than a native
| speaker would have.
|
| Thanks for the great article.
| plam503711 wrote:
| Thank you, both for being kind and for providing
| constructive feedback. "So" is a very common trap for
| French speakers :D
| Thaxll wrote:
| I forgot about Xen. Is it still popular? I thought everything
| moved to KVM in the last 10 years.
| sofixa wrote:
| Not really. VMware vSphere and KVM-based virtualisations are by
| far the most popular. Xen's last holdout was AWS' custom
| version of it, but that was replaced a few
| generations ago by the KVM-based Nitro.
|
| So it's basically a niche thing, mostly used by those who
| already had it/know it (VMware vSphere is going down that road
| as well, it's basically legacy today).
| eixiepia wrote:
| Yes it's still popular in some places, and lots of new
| development is going on. Xen is superior to KVM in my opinion.
| AshamedCaptain wrote:
| I'd actually _like_ to use Xen, but as far as I can see it's
| just dead in all but name.
|
| To name one example, nested virtualization support is not
| only hopelessly broken, it's MORE broken in recent releases
| than it was a decade ago. You can see right here how the
| feature kept getting broken by every other release until
| nothing worked anymore:
| https://wiki.xenproject.org/wiki/Nested_Virtualization_in_Xe....
|
| And Xen is literally the only virtualizer out there that does
| not support nested virtualization, which is a rather critical
| feature since many (dev) stacks assume one has hardware
| virtualization, and Windows is going to require it sooner
| than later.
| naasking wrote:
| If nested virtualization isn't used by their main customers,
| I'm not sure it's that critical to them.
| plam503711 wrote:
| Xen Project is far from being dead (there's a lot of activity on
| the mailing list, and now, thanks to new contributors like
| Vates/XCP-ng, there are also more initiatives toward
| decent project tracking; see
| https://gitlab.com/groups/xen-project/-/epics?state=opened&p...
| for example).
|
| Regarding nested virt, you are mostly right: it's only
| "working-ish" for basic things, but indeed, it's broken
| when you start to use anything heavy in your nested VM. The
| main reason nobody has fixed it is that it's not really
| used: as with any other open source project, you find what you
| need if you contribute. Obviously, as soon as someone
| needs this and is willing to contribute, it will change :)
| gwd wrote:
| Xen and KVM are different beasts.
|
| Xen can implement things like a CPU scheduler exclusively
| focused on VMs, while KVM has to deal with the normal Linux
| scheduler for processes.
|
| Xen can do advanced defense-in-depth techniques like driver
| domains -- something impossible to implement on KVM.
|
| Xen has a mature security response process; if you're a cloud
| provider, or ship anything with Xen inside of it, you can be
| notified of security issues typically two weeks before the
| public disclosure; and we're quite thorough about what we issue
| security alerts for. For KVM, you just have to hope that your
| functionality is worth issuing a CVE about, and unless you're a
| distro, you're only going to be told after it's been made
| public.
|
| Xen is a microkernel, so you can run it on tiny embedded
| devices for which Linux / KVM would be too big. Xen is small
| enough that it's actually feasible to do Functional Safety
| Certification on it.
|
| That's why Xen is still used by QubesOS, the NSA, and various
| defense contractors; why a number of cloud providers (including
| say, Gandi.net) use Xen; why Xilinx has their own Xen
| distribution; and why Xen is in the reference implementation
| for ARM's automotive stack -- in addition to being officially
| supported by SUSE, and being the driving engine behind Citrix
| Hypervisor and XCP-ng.
| plam503711 wrote:
| "The reports of my death are greatly exaggerated"
|
| Regards,
|
| Xen.
| Joker_vD wrote:
| It must be Delphi's brother then: not quite dead, but calling
| them alive too would be a stretch. What a wonderful world
| full of undead technologies we live in.
| robcohen wrote:
| QubesOS uses Xen
| schainks wrote:
| Spin up an EC2 node recently?
| monocasa wrote:
| I was under the impression that Amazon has a custom,
| proprietary hypervisor these days that's simply compatible
| with Xen's hypervisor<->guest interfaces.
| bonzini wrote:
| These days Amazon emulates the Xen hypercall interface on
| top of KVM. It's pretty much their sole substantial
| contribution to upstream KVM (by David Woodhouse, who is
| mentioned elsewhere in the comments, in fact).
|
| The implementation is split between KVM and their
| proprietary equivalent of QEMU.
| schainks wrote:
| The Xen project members list suggests otherwise:
| https://xenproject.org/about-us/project-members/
|
| Even if it is a custom hypervisor, AWS likely derived it
| from Xen and sponsors the project to continue doing so.
| plam503711 wrote:
| It's not a custom Xen: it *is* Xen (AFAIK, with
| possibly their own patch queue for some specific needs on
| top of it). What's custom is the toolstack around it :)
| schainks wrote:
| Yeah that's what I thought, but I couldn't find
| definitive proof besides old posts from the early days.
| Thev00d00 wrote:
| Nitro is KVM-based.
| saagarjha wrote:
| Question: what's the performance difference for enabling this in
| Linux versus in Xen? Naively I might expect that a system
| probably spends more time in Linux code, so the overall impact
| might be higher...
| bonzini wrote:
| See https://news.ycombinator.com/item?id=32607709; in addition
| to what you say, enabling IBRS basically comes for free in Xen
| because its "userspace" is actually guest kernel code that can
| itself control processor speculation.
| monocasa wrote:
| Whatever comment you linked to seems to be deleted. Can you
| summarize?
| glandium wrote:
| The link is good, but a semi-colon got stuck to it. If you
| remove the semi-colon, you'll get to the comment.
___________________________________________________________________
(page generated 2022-08-26 23:01 UTC)