[HN Gopher] Zenbleed
       ___________________________________________________________________
        
       Zenbleed
        
       Author : loeg
       Score  : 743 points
       Date   : 2023-07-24 14:34 UTC (8 hours ago)
        
 (HTM) web link (lock.cmpxchg8b.com)
 (TXT) w3m dump (lock.cmpxchg8b.com)
        
       | ComputerGuru wrote:
       | No details on the performance impact of the microcode update.
       | _Presumably_ it disables speculative execution of vzeroupper?
        
         | hinkley wrote:
         | Or adds a guard.
         | 
         | They mention perf issues for the workaround but they're notably
         | absent from the microcode commentary.
         | 
         | I wonder what this is going to do to the new AMD hardware AWS
         | is trying to roll out, which is supposed to be a substantial
         | performance bump over the previous generation.
        
           | jeffffff wrote:
           | shouldn't have any effect, the new amd hardware is zen 4 and
           | this only affects zen 2
        
           | infinityio wrote:
           | It looks like this is a Zen 2-only exploit, so it shouldn't
           | have any impact - AWS are likely already running hardware
           | that isn't vulnerable to this
        
             | hinkley wrote:
             | The way Spectre and Meltdown played out, you'll have to
             | excuse me if I stand outside the blast radius while we
             | figure out if there's a chapter 2, 3 or 4 to this story.
             | 
             | They've proven Zen 2 has this problem. They haven't proven
             | no other AMD processors have it. A bunch of people looking
             | to make names for themselves are probably busily testing
             | every other AMD processor for a similar exploit.
        
               | heywhatupboys wrote:
               | > The way Spectre and Meltdown played out, you'll have to
               | excuse me if I stand outside the blast radius while we
               | figure out if there's a chapter 2, 3 or 4 to this story.
               | 
               | I am OOTL on this one, do you have some information you
               | could share?
        
               | loeg wrote:
               | There has been a long trickle of similar bugs to
               | Spectre/Meltdown coming out long after the initial bugs
               | and "fixes" were published. (The early fixes were all, in
               | some sense, incomplete.)
        
               | kzrdude wrote:
               | There was a list of vulnerabilities in this comment up
               | top: https://news.ycombinator.com/item?id=36849914
        
       | darkclouds wrote:
       | Nice catch!
       | 
       | > If you can't apply the update for some reason, there is a
       | software workaround: you can set the chicken bit DE_CFG[9].
       | 
       | It reminds me of the compiler switches which can alter the way
       | code at different levels (global, procedure, routine) can access
       | variables declared at different levels and the change in scope
       | that ensues.
       | 
       | Maybe some of this HW caching should be left to the coders.
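
The quoted workaround amounts to a read-modify-write of one bit in a model-specific register. The sketch below shows only the bit arithmetic, not the privileged register access itself. The MSR address is taken from the original write-up rather than from this thread, so treat it as an assumption to verify, and the function names are hypothetical.

```python
# MSR address as given in the original write-up (an assumption here; verify it).
DE_CFG_MSR = 0xC0011029
CHICKEN_BIT = 9  # DE_CFG[9], from the quoted workaround


def with_chicken_bit(de_cfg: int) -> int:
    """Return the DE_CFG value with bit 9 set and all other bits unchanged."""
    return de_cfg | (1 << CHICKEN_BIT)


def chicken_bit_is_set(de_cfg: int) -> bool:
    """Report whether bit 9 is already set in a DE_CFG value."""
    return bool(de_cfg & (1 << CHICKEN_BIT))
```

Reading the current value first and OR-ing in the bit matters because the register holds other configuration bits that should be left alone.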
        
       | codedokode wrote:
       | I don't understand how a microcode update could fix this. I
       | assume microcode is used for slow operations like trigonometric
       | functions, and doesn't affect how registers are allocated or
       | renamed. Or does the update simply disable some optimizations
       | using "chicken bits"? And by the way, is there a list of such
       | bits?
        
         | sebzim4500 wrote:
         | Everything a modern CPU runs is microcode. There are a few x86
         | instructions that translate to a single microcode instruction,
         | but most are translated to several.
        
         | wolf550e wrote:
         | The designers leave themselves an ability to override any
         | instruction using the microcode so they can patch any
         | instruction. They don't use the microcode only to implement
         | complex instructions that require loops.
        
       | mrpippy wrote:
       | It feels like not-a-coincidence that OpenBSD added AMD microcode
       | loading in the last 3 days.
       | 
       | https://news.ycombinator.com/item?id=36838511
        
         | dralley wrote:
         | This may or may not also be relevant (I actually have no idea):
         | https://www.phoronix.com/news/Fedora-Server-Alert-FW-Updates
        
         | hammock wrote:
         | Explain that like I'm 5?
        
           | laverya wrote:
           | The patch for this exploit is to load AMD's updated
           | microcode.
        
             | dumdumchan wrote:
             | Is apt update && apt upgrade enough for pop-os users?
        
               | CameronNemo wrote:
               | Probably eventually yes, but if you are really concerned
               | you need to discuss it with your distro maintainers.
        
               | gabereiser wrote:
               | This. Not everyone is as quick as say Arch or Fedora in
               | updating/patching. Please reach out to your maintainers
               | of the distro you use.
        
               | [deleted]
        
               | vladvasiliu wrote:
               | Even Arch seems out of date as of 24 jul 2023 17:55 UTC.
               | 
               | The latest amd firmware version is 20230625.
        
               | LtdJorge wrote:
               | Gentoo already has it, however the latest ebuild is still
               | masked, so one would need to put
               | "sys-kernel/linux-firmware ~amd64" inside a file in
               | /etc/portage/package.accept_keywords, or better yet,
               | always run the git version, using * instead of ~amd64.
               | 
               | Apart from that, it's necessary to "sudo emaint sync -A
               | && sudo emerge -av sys-kernel/linux-firmware", while
               | checking that the correct files are included in the
               | savedconfig file if using it. After that, rebuild the
               | kernel or the initramfs and reboot.
        
               | kzrdude wrote:
               | I think you'll need to reboot for the microcode to be
               | updated
        
             | jahsome wrote:
             | I'm not sure five year olds know what microcode is. I'm 35,
             | been in tech nearly 20 years and don't recall having heard
             | that specific term before today.
        
               | eindiran wrote:
               | The whole "explain like I'm 5" thing is ridiculous. A
               | huge percentage of topics simply cannot be broken down to
               | an average 5 year old in a way that makes the
               | conversation worth having at all. The 5 year old has no
               | context about why in recent years there has been a huge
               | push towards running your own code on other people's
               | computers using various isolation techniques, or why
               | people are trying to exploit that. The 5 year old has no
               | context for what the exploits actually are, or how to
               | mitigate them. Even if you break all of those things down
               | into 5 year old bitesized chunks, you end up with boring
               | word soup completely disconnected from the meaningful
               | parts of the conversation.
               | 
               | Really what ELI5 is, is a technique to allow the asker to
               | not have to look anything up. From the parent comment,
               | you can look up "patch", "AMD", "microcode"; or you can
               | demand "ELI5!" and have someone else type up long,
               | careful definitions that don't reference context or words
               | that a 5 year old doesn't know.
               | 
               | Regarding what microcode is, here is a good explanation
               | of the differences between microcode and firmware:
               | 
               | https://superuser.com/questions/1283788/what-exactly-is-
               | micr...
        
               | jahsome wrote:
               | Sure, I can look it up (and I did) but this is a
               | _discussion_ section, so why not prompt a discussion by
               | asking for a simple explanation?
               | 
               | Appreciate the link! I'm not OP but that's exactly what I
               | was looking for.
        
               | byvirtueof wrote:
               | I agree that many topics are hard to explain to a five
               | year old, but ELI5 can be very helpful in forcing people
               | to simplify their writing. Many people explain things in
               | an unnecessarily complex way, and ELI5 at least makes
               | them think about the target audience.
        
               | wolf550e wrote:
               | A Grandchild's Guide to Using Grandpa's Computer a.k.a.
               | "If Dr. Seuss were a Technical Writer" was written in
               | 1994 and mentions microcode.
               | 
               | Microcode updates are always discussed when talking about
               | microarchitectural security vulnerabilities (and other
               | scary CPU errata like
               | https://lkml.org/lkml/2023/3/8/976).
               | 
               | Microcode is always mentioned when discussing CPU design
               | evolution.
        
               | jahsome wrote:
               | It's funny that it's "always" mentioned, yet it's not
               | familiar to me. Also curious the Wikipedia article for
               | CPU design doesn't mention it, since it's "always"
               | referenced.
               | 
               | Just because something is familiar to you, or even large
               | swaths of a given population, doesn't mean everyone
               | should be expected to know it.
               | 
               | I love learning new things. I love discovering topics I
               | know nothing about, and I love picking the brains of
               | those passionate about them. But the condescension from a
               | certain type of tech nerd sucks all the fun out of
               | learning. I've certainly been guilty of this in the past.
        
               | heywhatupboys wrote:
               | > I'm not sure five year olds know what microcode is
               | 
               | Sounds like cope being outprogrammed by a kindergartner in
               | Roblox
        
               | enedil wrote:
               | But well educated five year olds from good schools would
               | know it.
        
       | akyuu wrote:
       | https://www.amd.com/en/resources/product-security/bulletin/a...
       | 
       | According to AMD's security bulletin, firmware updates for non-
       | EPYC CPUs won't be released until the end of the year. What
       | should users do until then, set the chicken bit and take the
       | performance hit?
        
         | stefan_ wrote:
         | Are they out of their mind? This is not a "medium".
        
           | qhwudbebd wrote:
           | Presumably classified as severity 'medium' in an attempt to
           | look marginally less negligent when announcing that they
           | can't be bothered to issue microcode updates for most CPU
           | models until Nov or Dec.
        
       | ItsTotallyOn wrote:
       | What does this allow the attacker to do? Steal data? The post
       | isn't very clear.
        
         | timmaxw wrote:
         | It allows the attacker to eavesdrop on the data going through
         | operations like strcmp(), memcpy(), and strlen(). (These are
         | the standard functions in C for working with strings; and many
         | higher-level languages use them under the hood.) It works on
         | any function that uses the XMM/YMM/ZMM registers.
         | 
         | It's stochastic; the attacker randomly gets data from whatever
         | happens to be using the XMM/YMM/ZMM registers at the time. So
         | if the attacker could eavesdrop in the background constantly,
         | they might eventually see a password. Or they might be able to
         | trigger some system code that processes your password, then
         | eavesdrop for the next few milliseconds.
         | 
         | The attacker needs to run code on your machine. Unclear if
         | running code in a web browser is sufficient or not. It requires
         | an unusual sequence of machine instructions, which isn't
         | necessarily possible in JS/WASM, but 'sounds' says they did it:
         | https://news.ycombinator.com/item?id=36849767
        
         | sounds wrote:
         | Huh. The very first line seems pretty clear:
         | 
         | > If you remove the first word from the string "hello world",
         | > what should the result be? This is the story of how we
         | > discovered that the answer could be your root password!
         | 
         | Can you please expand on your question?
        
           | bananapub wrote:
           | I assume they meant "what does this do in normal
           | vulnerability discussion terms", I don't know why tavis
           | didn't just say "arbitrary memory read across processes" or
           | whatever.
        
           | ItsTotallyOn wrote:
           | does it require physical access to the machine?
        
             | xmodem wrote:
             | No, only the ability to execute arbitrary code in an
             | unprivileged context. Would probably have to be arbitrary
             | x86_64 instructions - Javascript wouldn't cut it for this
             | one.
        
             | sounds wrote:
             | I was able to reproduce the vulnerability using javascript
             | on a webpage. Therefore, no.
        
               | Sohcahtoa82 wrote:
               | PoC || GTFO
        
               | IggleSniggle wrote:
               | [flagged]
        
               | Y_Y wrote:
               | Not even an xor? Harsh.
        
               | sounds wrote:
               | OP here hadn't even bothered to read the article. That's
               | the context of my reply. No PoCs going online so close to
               | the disclosure, sorry.
        
               | pests wrote:
               | It's okay to admit you are wrong or don't have a working
               | POC.
        
               | LtdJorge wrote:
               | What? The researcher that found it and wrote the article
               | already posted a PoC that can be used to farm data from
               | VMs in any VPS provider.
        
               | pests wrote:
               | Why is everyone claiming this is impossible in
               | JavaScript? If you have a POC you should post it so
               | others can learn of the danger.
               | 
               | You've even been quoted elsewhere in this thread about
               | this topic.
        
               | 0xbadcafebee wrote:
               | Some people think you need "the ability to execute
               | arbitrary code in an unprivileged context" to perform
               | this exploit. Which is of course a false assumption. The
               | bug class in this case is basically a use-after-free,
               | for a function which keeps its state per-cpu-core, for a
               | function that is (for almost all intents and purposes)
               | unprivileged.
               | 
               | From the article:
               | 
               | > We now know that basic operations like strlen, memcpy
               | > and strcmp will use the vector registers - so we can
               | > effectively spy on those operations happening anywhere
               | > on the system! It doesn't matter if they're happening in
               | > other virtual machines, sandboxes, containers,
               | > processes, whatever!
               | 
               | All you need to do is write some JavaScript that will
               | _"trigger something called the XMM Register Merge
               | Optimization, followed by a register rename and a
               | mispredicted vzeroupper"_. It's up to the hacker to
               | determine how to do this explicitly in JS, but it's
               | theoretically possible by literally any application at
               | any time on any operating system. Even if some language
               | or interpreter claims to prevent it, it's possible to
               | find an exploit in that particular
               | language/interpreter/etc to get it to happen.
               | 
               | This is how exploit development works; if you can't go
               | straight ahead, go sideways. I guarantee you that someone
               | will find a way, if they haven't yet.
        
               | _flux wrote:
               | What javascript was that, or did you create your own? I
               | did not find any from this post.
        
               | KomoD wrote:
               | I'll take this as bullshit until there's a POC
        
               | crtasm wrote:
               | Might you post a screen recording?
        
               | CyberDildonics wrote:
               | Might you explain how that would prove anything?
        
               | heywhatupboys wrote:
               | effort to lie on a text comment << effort to lie with a
               | video
        
               | CyberDildonics wrote:
               | Might you think that source code would be much better
               | proof and easier to send out?
        
               | evandale wrote:
               | We are on a tech site with highly intelligent individuals
               | who have been programming computers since we've been in
               | diapers.
               | 
               | If you don't believe the text then how would you believe
               | the video? Anything can be done in devtools beforehand
               | and I can think of a million different ways to fake the
               | video.
               | 
               | Personally, if I didn't trust the text then an easily
               | faked video wouldn't placate me either.
        
             | kzrdude wrote:
             | No, it requires unprivileged arbitrary code execution
        
             | kristopolous wrote:
             | Beyond what everyone else said, these types of exploits can
             | break out of VMs. Unless I'm misreading it you could log
             | into your $5 linode/digitalocean/aws machine and start
             | reading other people's data on the host machine.
             | 
             | There's tons of million dollar/month businesses on
             | ~$20/month accounts on shared machines.
        
         | rkrzr wrote:
         | It allows the attacker to steal data like e.g. your (root)
         | password.
        
           | tremon wrote:
           | Only while it's stored unencrypted in memory, right?
        
             | saagarjha wrote:
             | As is the case whenever you type it in, yes
        
             | taneliv wrote:
             | My reading of the article was that memory is not directly
             | compromised, but CPU registers. So loaded unencrypted in
             | one of the affected registers.
        
         | beebmam wrote:
         | It is very clear, you just didn't read it.
         | 
         | >We now know that basic operations like strlen, memcpy and
         | strcmp will use the vector registers - so we can effectively
         | spy on those operations happening anywhere on the system! It
         | doesn't matter if they're happening in other virtual machines,
         | sandboxes, containers, processes, whatever!
         | 
         | >This works because the register file is shared by everything
         | on the same physical core. In fact, two hyperthreads even share
         | the same physical register file.
         | 
         | >It turns out that mispredicting on purpose is difficult to
         | optimize! It took a bit of work, but I found a variant that can
         | leak about 30 kb per core, per second.
         | 
         | >This is fast enough to monitor encryption keys and passwords
         | as users login!
        
           | hinkley wrote:
           | Literally the intro says it might contain the root password.
           | 
           | TLDR: The vector registers this bug affects are used for
           | string functions like strcmp, so anything could get loaded
           | into them, including passwords.
        
       | kristjank wrote:
       | At least it's fixed in microcode, unlike some recent exploits
       | (Spectre and Meltdown come to mind)
        
       | sounds wrote:
       | The site is getting hugged to death.
       | https://web.archive.org/web/20230724143835/https://lock.cmpx...
        
         | ksec wrote:
         | It is a simple static HTML page, how is it possible in 2023 a
         | static site could be hugged to death. In most cases HN traffic
         | barely hits 100 page views per second.
        
           | jedberg wrote:
           | It's a security writeup so it's probably run by a security
           | expert who is not an expert at running high traffic websites.
           | Most likely there is something on the page that causes a
           | database hit. Possibly the page content itself.
        
             | [deleted]
        
           | taviso wrote:
           | welp, that's unfortunate indeed.
           | 
           | It's a single-core 128 MB VPS, which seemed fine for my
           | boring static html articles. I guess I underestimated the
           | interest.
        
             | yakubin wrote:
             | FWIW, enabling gzip/zstd compression in your HTTP server
             | could help.
        
               | ransackdev wrote:
               | A single core machine already overloaded is going to get
               | even worse introducing the cpu overhead of gzipping
               | response bodies (assuming it's cpu bound and not IO
               | bound)
               | 
               | Cache control headers will help with return traffic
               | 
               | More cpu cores
               | 
               | If using nginx ensure sendfile is enabled and workers are
               | set to auto or tuned for your setup
               | 
               | Check ulimit file handle limits
               | 
               | Offload static assets to cdn
               | 
               | Since it's a static html site, you could even host on s3,
               | netlify, etc
        
               | jwilk wrote:
               | It's a static file. You need to compress it only once,
               | not for every response.
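
The "compress it only once" idea can be sketched as a one-off build step that writes a `.gz` file next to the static page, which the server can then send as-is. This is a minimal illustration using Python's standard `gzip` module. The function name and the `.gz` naming convention are assumptions, and how a given server is configured to pick up the precompressed file is outside the sketch.

```python
import gzip
from pathlib import Path


def precompress(path: str) -> Path:
    """Write '<name>.gz' next to a static file and return the new path."""
    source = Path(path)
    target = source.with_name(source.name + ".gz")
    # Compress once, at the highest level, since this cost is paid only
    # at publish time and not on every request.
    target.write_bytes(gzip.compress(source.read_bytes(), compresslevel=9))
    return target
```

Running this once per published page moves the CPU cost out of the request path, which is the point being made about a single-core machine.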
        
               | ptx wrote:
               | ...and here's how to do it in Apache:
               | https://httpd.apache.org/docs/2.4/mod/mod_deflate.html#preco...
        
               | brazzledazzle wrote:
               | Could even host on github pages with a cname.
        
               | wolf550e wrote:
               | Only with something like mod_asis
               | (https://httpd.apache.org/docs/2.4/mod/mod_asis.html) to
               | serve already compressed content. Actually running zlib
               | on every request will only make it worse.
        
             | javajosh wrote:
             | As an aside, I'd be curious to know how your VPS failed.
             | Memory? Bandwidth?
        
             | account42 wrote:
             | Interesting, do you mind sharing what software you use to
             | serve the static html and what kind of traffic it's getting?
        
               | loeg wrote:
               | HTTP/1.1 200 OK
               | Date: Mon, 24 Jul 2023 17:05:06 GMT
               | Server: Apache
        
               | brazzledazzle wrote:
               | I do not miss performance tuning apache.
        
               | cesarb wrote:
               | In my personal experience, the first step in tuning
               | Apache was "put a nginx server in front of it". Running
               | out of workers (either processes in the prefork model, or
               | threads otherwise) was in my experience way too easy,
               | especially when keepalive is enabled (even a couple of
               | seconds of keepalive can be painful). The async model
               | used by nginx can handle a lot more connections before
               | running out of resources.
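
The keepalive point can be made concrete with a back-of-the-envelope estimate based on Little's law: the number of workers tied up at once is roughly the arrival rate times how long each connection holds a worker. The sketch assumes one worker per connection and one new connection per request, which is a simplification, and the function name and example figures are illustrative only.

```python
def busy_workers(requests_per_second: int, service_ms: int, keepalive_ms: int) -> int:
    """Estimate workers held at once when each connection pins one worker.

    Little's law: concurrency = arrival rate * time each connection is held.
    """
    held_ms = service_ms + keepalive_ms
    # Integer ceiling division avoids floating-point rounding surprises.
    return -((-requests_per_second * held_ms) // 1000)


# 100 requests/s, 10 ms to serve a static page, 2 s of keepalive
print(busy_workers(100, 10, 2000))
# Same load with keepalive disabled
print(busy_workers(100, 10, 0))
```

Under these assumptions the idle keepalive time, not the work of serving the page, accounts for nearly all of the occupied workers.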
        
               | zokier wrote:
               | Apache has been defaulting to event mpm for over a
               | decade.
        
             | tamimio wrote:
             | Doesn't matter, great article!
        
           | marcus0x62 wrote:
           | I imagine they are also getting traffic from sources other
           | than HN.
        
           | winrid wrote:
           | 100rps for most articles. I bet this is at least double that,
           | and he's using apache which by default I think is still
           | thread per connection.
        
         | ComputerGuru wrote:
         | Faster link: https://archive.is/QAwvQ
        
         | AdmiralAsshat wrote:
         | And now we've hugged the archive to death. Nice job!
        
         | loeg wrote:
         | The original still loads (eventually) for me. YMMV.
        
           | nevi-me wrote:
           | XMMV or ZMMV could also apply
        
             | account42 wrote:
             | [flagged]
        
       | artisanspam wrote:
       | Why does disabling SMT not fully prevent this? I don't know the
       | details of Zen 2 architecture, but register files are usually
       | implemented as SRAM on the CPU-die itself. So unless the core is
       | running SMT, I don't understand how another thread could be
       | accessing the register file to write a secret.
        
         | adrian_b wrote:
         | Because unless you pin the threads to certain CPU cores (e.g.
         | in Linux by using the taskset command, or in Windows by using
         | the Set Affinity command in Task Manager), they are migrated
         | very frequently between cores.
         | 
         | So even with SMT disabled, each core will execute sequentially
         | many threads, switching every few milliseconds from one thread
         | to another, and each context switch does not modify the hidden
         | registers, it just restores the architecturally visible
         | registers.
        
           | dontlaugh wrote:
           | Pinning doesn't help either, since there will always be more
           | threads than cores. Scheduling all those threads and even
           | blocking on IO will cause context switches.
        
         | wbl wrote:
         | Because the context switch only affects architectural state not
         | microarchitectural state.
        
           | artisanspam wrote:
           | Yes I understand that but I was struggling to think of a
           | sequence of instructions that would cause this secret leaking
           | on a single thread.
           | 
           | But a simple example is `vzeroupper` followed by anything
           | that writes a secret to the same register file entry would be
           | leaked on a subsequent flush.
        
             | wbl wrote:
             | That's not quite right. The attacker does the vzeroupper
             | rollback. Any registers in the physical file that haven't
             | been overwritten can be exposed as a result, regardless of
             | what the victim did.
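
The distinction between architectural and microarchitectural state in this subthread can be illustrated with a toy model. It is not how Zen 2 works internally. It only shows that when a context switch saves and restores the named registers, the physical entries that were freed still hold whatever was last written to them. All class and method names are hypothetical.

```python
class ToyCore:
    """Toy register renaming: names map to entries in a physical file."""

    def __init__(self, num_physical: int = 8):
        self.physical = [0] * num_physical
        self.free = list(range(num_physical))
        self.rename = {}  # architectural name -> physical index

    def write(self, name: str, value: int) -> None:
        # Every write takes a fresh physical entry; the old one is freed
        # but its contents are NOT cleared.
        if name in self.rename:
            self.free.append(self.rename[name])
        index = self.free.pop(0)
        self.physical[index] = value
        self.rename[name] = index

    def read(self, name: str) -> int:
        return self.physical[self.rename[name]]

    def save_context(self) -> dict:
        # Only architecturally visible values are saved.
        return {name: self.read(name) for name in self.rename}

    def restore_context(self, context: dict) -> None:
        # Old mappings are released, then the new thread's values are written.
        self.free.extend(self.rename.values())
        self.rename = {}
        for name, value in context.items():
            self.write(name, value)

    def unmapped_values(self) -> list:
        # What sits in entries no architectural register points at.
        return [self.physical[i] for i in self.free]


core = ToyCore()
core.write("ymm0", 0x5EC2E7)            # victim thread handles a secret
victim_context = core.save_context()
core.restore_context({"ymm0": 0})       # switch to another thread
print(core.read("ymm0"))                # the new thread sees its own value
print(0x5EC2E7 in core.unmapped_values())
```

In this model the second thread can never read the leftover entry through normal register access. The bug under discussion is a case where real hardware ended up exposing such an entry anyway.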
        
       | [deleted]
        
       | wzdd wrote:
       | Really lovely writeup. I liked the discussion of how you can
       | tell whether a randomly generated program performed correctly.
       | The obvious approach is to just run it on an "oracle" -- another
       | processor or simulator -- and see if it behaves the same way. But
       | if you're checking for microarchitectural effects with tight
       | timing windows you can also write the same program with various
       | stalls, fences, nops and so on -- things which shouldn't affect
       | the output (for single-threaded code) but which will result in
       | the CPU doing significantly different things
       | microarchitecturally. That way the CPU can be its own oracle.
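
The "CPU as its own oracle" idea can be sketched with a toy interpreter: run a program, run the same program padded with instructions that should not change the result, and flag any difference. Everything here is hypothetical, including the three-instruction machine and the deliberately planted forwarding bug. It only illustrates the comparison strategy, not the actual fuzzer.

```python
NUM_REGS = 4


def run(program, buggy=False):
    """Execute a toy program and return the final register values."""
    regs = [0] * NUM_REGS
    stale = None  # (register, previous value) written by the last instruction
    for instruction in program:
        op = instruction[0]
        new_stale = None
        if op == "mov":                      # ("mov", dst, immediate)
            _, dst, imm = instruction
            new_stale = (dst, regs[dst])
            regs[dst] = imm
        elif op == "add":                    # ("add", dst, src)
            _, dst, src = instruction
            value = regs[src]
            # Planted bug: a source written by the immediately preceding
            # instruction is read with its old value.
            if buggy and stale is not None and stale[0] == src:
                value = stale[1]
            new_stale = (dst, regs[dst])
            regs[dst] = regs[dst] + value
        # "nop" does nothing, but it separates neighbouring instructions.
        stale = new_stale
    return regs


def padded(program):
    """Insert a nop after every instruction; results should not change."""
    result = []
    for instruction in program:
        result.append(instruction)
        result.append(("nop",))
    return result


def self_consistent(program, buggy=False):
    """Compare a program against its padded twin on the same machine."""
    return run(program, buggy) == run(padded(program), buggy)


program = [("mov", 0, 1), ("mov", 1, 5), ("add", 0, 1)]
print(self_consistent(program))              # correct machine
print(self_consistent(program, buggy=True))  # machine with the planted bug
```

No second, trusted machine is needed: the padded run changes the timing enough to hide the planted bug, so the mismatch itself is the signal.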
        
         | weinzierl wrote:
         | This part was super interesting, especially the differences
         | between fuzzing software and hardware. I also liked the
         | _chicken bit_.
        
       | Shazshe wrote:
       | [flagged]
        
       | gavinhoward wrote:
       | Off-topic question, but can some experts tell me why it is safe
       | for `strlen()` and friends to use vector instructions when they
       | can technically read out of bounds?
        
         | loeg wrote:
         | Essentially because memory mappings and RAM work at page
         | granularity, rather than bytes. If a read from in-bounds in a
         | page isn't going to fault, a read later in the same page isn't
         | going to fault either (even if it is past the end of the
         | particular object).
         | 
         | You can see this in glibc's implementation, which checks for
         | crossing page boundaries:
         | https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86...
         | (line ~68)
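
The page-boundary check described above comes down to a little address arithmetic. The sketch below mirrors the idea rather than glibc's actual code: a fixed-width vector load starting at some address stays within one page unless the offset inside the page is too close to the end. The 4096-byte page and 32-byte vector are assumed values, and the function name is hypothetical.

```python
PAGE_SIZE = 4096  # typical x86-64 page size (assumed)
VEC_SIZE = 32     # one 256-bit vector load (assumed)


def load_may_cross_page(address: int,
                        vec_size: int = VEC_SIZE,
                        page_size: int = PAGE_SIZE) -> bool:
    """True if reading vec_size bytes at address touches the next page."""
    offset = address & (page_size - 1)   # position within the page
    return offset > page_size - vec_size


print(load_may_cross_page(0x1000))          # start of a page
print(load_may_cross_page(0x1000 + 4064))   # last position that still fits
print(load_may_cross_page(0x1000 + 4065))   # one byte too far
```

When the check is false, the whole load lands in a page the string already occupies, so it cannot fault even if it reads past the terminator. When it is true, the implementation has to fall back to a more careful path.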
        
           | gavinhoward wrote:
           | Ah, so _that's_ why there is special code in Valgrind to
           | handle glibc and friends!
        
             | loeg wrote:
             | I think capability-pointer machines like CHERI might need
             | in-bounds-only variants of these functions, too.
        
               | saagarjha wrote:
               | Generally CHERI tracks things for 16-byte regions
        
               | loeg wrote:
               | Implementations using 32- or 64-byte (256 or 512 bit)
               | vector extensions would run afoul of 16-byte granularity.
               | While it is not common yet, ARM SVE allows vector sizes
               | larger than 128 bits -- e.g., Graviton3 has 256-bit SVE
               | and Fujitsu A64FX has 512-bit.
        
               | Liquid_Fire wrote:
               | I think you might be confusing the tracking of validity
               | of capabilities themselves (which could indeed be at a 16
               | byte granularity for an otherwise 64-bit system) with the
               | bounds of a capability, which can be as small as 1 byte.
        
       | dtx1 wrote:
       | > AMD have released an microcode update for affected processors.
       | Your BIOS or Operating System vendor may already have an update
       | available that includes it.
       | 
       | Yes, I love flashing BIOS...
       | 
       |  _edit_ nvm, Microcode can get updated via system updates.
        
         | Night_Thastus wrote:
         | To be fair, flashing the bios isn't nearly as bad on most
         | modern systems.
         | 
         | Put the file on a USB drive, plug it in, restart and go into
         | the bios, look for the flashing utility, select the file, done.
         | As long as the machine is on a UPS in case of disaster,
         | everything's accounted for.
        
           | dtx1 wrote:
           | From my experience: Better have a > 32gig USB Flash drive,
           | everything else doesn't work (MSI) and I don't have a UPS so
           | it's always quite an exciting experience. Especially since
           | motherboard manufacturers save almost a whole dollar by not
           | having a display outputting anything. So it's blinkenlights
           | and hope for the best
        
             | Night_Thastus wrote:
             | Get a UPS!!
             | 
             | Not just for convenience, but safety. You don't want to be
             | caught out when something goes wrong, even without flashing
             | the bios.
             | 
             | A lot of boards these days have 7-segment displays. They're
             | not great, but they're a good step up. Don't need to spend
             | a lot, I think they show up on $300-ish boards. Mine
             | definitely does.
        
               | dtx1 wrote:
               | I see no need for it. Living in Germany, any kind of
               | power outages are exceptionally rare. I remember one in
               | the last 10 years for a few hours and that was very
               | local. If I am in a situation where a power outage
               | occurs, I'll listen to my battery radio for a while and
               | be fine. I work on nothing and rely on nothing that would
               | actually require a UPS.
        
               | Night_Thastus wrote:
               | It's like not having a backup drive. Everything is fine
               | until one day it isn't.
               | 
               | A good UPS does more than just protect from outages. It
               | also protects from surges and low-voltage situations that
               | can both damage the equipment severely.
               | 
               | A UPS doesn't cost much and will last many years. Buying
               | a new motherboard and GPU because they got fried is much
               | more expensive.
        
           | baq wrote:
           | Sometimes there's even a backup BIOS die available, so yeah,
           | bricking is now much harder than in the past.
        
           | ls612 wrote:
           | My new computer takes a while to POST (z690 with ddr5 smh) so
           | it's basically been continuously either on or in sleep since
           | I built it 18 months ago and I've had an unexpected shutdown
           | due to power loss once in that time according to the Event
           | log. I think the risk of losing power while flashing the bios
           | is very small in real life unless you are stuck in a place
           | with third world electricity infrastructure.
        
             | formerly_proven wrote:
             | If POST takes a long time it's often memory training,
             | backing off on the timings just slightly might make it go a
             | lot quicker. Bios updates also often twiddle knobs in this
             | area.
        
               | ls612 wrote:
               | It isn't this, it takes about a minute to train after a
               | bios update or when I enable XMP but never trains after
               | that. It just takes like 20-30 seconds to get all the way
               | to the bios splash screen and only 5 seconds to return
               | from sleep so I just use sleep instead of turning it off.
               | Then the only time I need to wait through a boot is for
               | windows updates.
        
               | Night_Thastus wrote:
               | Doesn't that only happen on the first boot with new
               | memory? As well, I thought it was more of a concern on
               | AMD, and less on Intel. (Z690 is Intel)
        
           | yyyk wrote:
           | Often the BIOS will allow reading the update file from EFI
           | partition, so there's no need for the USB drive.
        
           | astrange wrote:
           | Mine loses its settings when you update the BIOS, so your fan
           | curves go away.
        
         | deaddodo wrote:
         | Microcode updates haven't been managed in-BIOS for over a
         | decade now. If you use Linux, you'll usually see them released
         | as some package like "intel-microcode" or "amd-microcode".
         | 
         | Even EFI updates rarely are very intrusive or dangerous, and
         | can also be handled by the Operating System via an update.
        
           | mrpippy wrote:
           | They are managed both ways. I think updating in BIOS is
           | preferable to ensure no CPU parameters change while (some
           | part) of the kernel has already initialized.
           | 
           | But of course BIOS updates have many downsides and often stop
           | after a few years.
        
       | dTP90pN wrote:
       | > AMD have released an microcode update for affected processors.
       | 
       | I don't think that is correct. AMD has released a microcode
       | update[0] for family 17h models 0x31 and 0xa0, which corresponds
       | to Rome, Castle Peak and Mendocino as per WikiChip [1].
       | 
       | So far, there seems to be no microcode update for Renoir, Grey
       | Hawk, Lucienne, Matisse and Van Gogh. Fortunately, the newly
       | released kernels can and do simply set the chicken bit for those.
       | [2]
       | 
       | [0]
       | https://git.kernel.org/pub/scm/linux/kernel/git/firmware/lin...
       | 
       | [1] https://en.wikichip.org/wiki/amd/cpuid#Family_23_.2817h.29
       | 
       | [2]
       | https://github.com/torvalds/linux/commit/522b1d69219d8f08317...
        
         | dTP90pN wrote:
         | More details:
         | 
         | `good_revs` as per the kernel:
         | https://github.com/torvalds/linux/commit/522b1d69219d8f08317...
         | 
         | Currently published revs ("Patch") (git HEAD):
         | 
         | https://git.kernel.org/pub/scm/linux/kernel/git/firmware/lin...
         | 
         | As of this writing, only two of the five `good_rev`s have been
         | published.
        
       | cratermoon wrote:
       | This link seems hugged to death, so here's an alternate source:
       | AMD 'Zenbleed' Bug Allows Data Theft From Zen 2 Processors,
       | Patches Coming:
       | <https://www.tomshardware.com/news/zenbleed-bug-allows-data-t...>
        
         | ItsTotallyOn wrote:
         | This story has comments from AMD, too.
        
       | [deleted]
        
       | nemetroid wrote:
       | The README in the tar file with the exploit (linked at "If you
       | want to test the exploit, the code is available here") contains
       | some more details, including a timeline:
       | 
       | - `2023-05-09` A component of our CPU validation pipeline
       | generates an anomalous result.
       | 
       | - `2023-05-12` We successfully isolate and reproduce the issue.
       | Investigation continues.
       | 
       | - `2023-05-14` We are now aware of the scope and severity of the
       | issue.
       | 
       | - `2023-05-15` We draft a brief status report and share our
       | findings with AMD PSIRT.
       | 
       | - `2023-05-17` AMD acknowledge our report and confirm they can
       | reproduce the issue.
       | 
       | - `2023-05-17` We complete development of a reliable PoC and
       | share it with AMD.
       | 
       | - `2023-05-19` We begin to notify major kernel and hypervisor
       | vendors.
       | 
       | - `2023-05-23` We receive a beta microcode update for Rome from
       | AMD.
       | 
       | - `2023-05-24` We confirm the update fixes the issue and notify
       | AMD.
       | 
       | - `2023-05-30` AMD inform us they have sent a SN (security
       | notice) to partners.
       | 
       | - `2023-06-12` Meeting with AMD to discuss status and details.
       | 
       | - `2023-07-20` AMD unexpectedly publish patches, earlier than an
       | agreed embargo date.
       | 
       | - `2023-07-21` As the fix is now public, we propose privately
       | notifying major distributions that they should begin preparing
       | updated firmware packages.
       | 
       | - `2023-07-24` Public disclosure.
        
         | sedatk wrote:
         | > AMD unexpectedly publish patches, earlier than an agreed
         | embargo date.
         | 
         | > As the fix is now public, we propose privately notifying
         | major distributions that they should begin preparing updated
         | firmware packages.
         | 
         | AMD had to drop the ball somewhere didn't it.
        
           | klyrs wrote:
             | It's _good_ that they published patches early, isn't it?
        
             | robryk wrote:
             | You'd want the delay between first publication of X and the
             | microcode update making its way into releases of OSes to be
             | as small as possible, for various values of X (mention of a
             | vulnerability, microcode patch, description of
             | vulnerability, PoC). Telling OS releasers in advance that a
             | microcode patch fixing a vulnerability will be published on
             | a given date shrinks that delay for most values of X.
        
             | taviso wrote:
             | Yes. It was unexpected, but good. Not a complaint.
        
               | sedatk wrote:
               | Uh, okay. I thought the embargo date was set so you could
               | have enough time to inform the distros. Not the case,
               | then.
        
             | [deleted]
        
       | LtdJorge wrote:
       | This is both as cool as it is scary. I managed to "exfiltrate"
       | pieces of my Bitwarden password (could easily be reconstructed),
       | ssh login password, and bank credentials in a minute of running
       | from a 10MB sample.
        
       | causi wrote:
       | _AMD Ryzen 5000 Series Processors with Radeon Graphics_
       | 
       | Does this mean Ryzen CPUs without integrated graphics are fine?
        
         | formerly_proven wrote:
         | No this means AMD's numbering scheme is intentionally obtuse.
         | This has nothing to do with graphics, but with the CPU core,
         | Zen 2.
        
         | lgl wrote:
         | The only series 5000 cpu's that are still using Zen2
         | architecture are apparently the 5300U, 5500U and 5700U, which
         | all use socket FP6 (mobile/embedded).
         | 
         | So I'm guessing it shouldn't affect any of the more recent and
         | very popular Zen3 cpus like the 5600, 5700 etc. I personally
         | own a 5600, which are a great bang for buck.
        
           | paulmd wrote:
           | Lucienne (5700U/5500U/5300U) are the only Zen2s in the 5000
           | series at present (afaik), but AMD continues to re-use the
           | Zen2 architecture in the 7000 series (7520U, etc), as well as
           | many semicustom products like Steam Deck.
           | 
           | It's in rather a sweet-spot as far as performance-power-area,
           | so this isn't entirely a bad thing. Zen3's main innovation
           | was unifying the CCXs/caches, but if you only have a 4C, or
           | you want to be able to power-gate a CCX (and its attendant IF
           | links/caches) down entirely, Zen2 does that better, and it's
           | slightly smaller. We'll be seeing Zen2 products for years to
           | come, most likely.
        
         | gruez wrote:
         | No, it's all Zen 2 CPUs, which include desktop CPUs (with or
         | without integrated graphics), laptop CPUs, and server CPUs.
         | The reason why the product list is so confusing is that AMD
         | reuses architectures across generations. You'd think that all
         | ryzen 5000 series CPUs have the same microarchitecture, but
         | they don't. It's much easier to consult this list instead:
         | https://en.wikipedia.org/wiki/Zen_2#Products
        
           | paulmd wrote:
           | FYI this list isn't exhaustive. And I went to recommend the
           | wikichips link and it's not exhaustive either.
           | 
           | https://en.wikichip.org/wiki/amd/microarchitectures/zen_2#Al...
           | 
           | Both of them are missing the newer 7000-family products with
           | Zen2 like 7520U etc.
           | 
           | https://www.amd.com/en/products/apu/amd-ryzen-5-7520u
           | 
           | https://www.amd.com/en/products/apu/amd-ryzen-3-7320u
           | 
           | https://www.amd.com/en/products/apu/amd-athlon-gold-7220u
        
             | tremon wrote:
             | _products/apu/amd-athlon_
             | 
             | Wait... now there's also APU's under the AMD Athlon brand?
             | I know that people are happy when AMD's product offerings
             | are on-par or outperforming Intel, but they didn't have to
             | outdo Intel in the consumer confusion arena as well.
        
               | paulmd wrote:
               | Has been for a while.
               | 
               | https://www.techpowerup.com/cpu-specs/athlon-200ge.c2073
               | 
               | Intel also used the Pentium branding for low-end
               | processors (below i3 and in the Atom lineup), and
               | followed it up with the rather perplexing move of using
               | their company name as the sole branding for their worst
               | products ("Intel Processor").
        
             | neogodless wrote:
             | The 7520U and 7530U are listed on the linked Wikipedia
             | page. Look under "Ultra-mobile APUs".
             | 
             | The Athlon is missing, though.
        
       | lopkeny12ko wrote:
       | Relevant snippet:
       | 
       | This technique is CVE-2023-20593 and it works on all Zen 2 class
       | processors, which includes at least the following products:
       | 
       |   AMD Ryzen 3000 Series Processors
       |   AMD Ryzen PRO 3000 Series Processors
       |   AMD Ryzen Threadripper 3000 Series Processors
       |   AMD Ryzen 4000 Series Processors with Radeon Graphics
       |   AMD Ryzen PRO 4000 Series Processors
       |   AMD Ryzen 5000 Series Processors with Radeon Graphics
       |   AMD Ryzen 7020 Series Processors with Radeon Graphics
       |   AMD EPYC "Rome" Processors
        
         | kevin_thibedeau wrote:
         | FYI, Ryzen 3000 APUs aren't Zen 2.
        
           | neogodless wrote:
           | > AMD Ryzen 3000 Series Processors
           | 
           | The above are desktop. If they meant APUs, it would list
           | "Ryzen 3000 Series Processors with Radeon Graphics."
        
           | timw4mail wrote:
           | They are Zen+, aren't they?
        
         | justinclift wrote:
         | Whew, my 5600X looks like it avoided this one too. :)
        
         | tremon wrote:
         | Do they mean "only confirmed on Zen2", or is the problem
         | definitely confined to only this architecture?
         | 
         | Is it likely that this same technique (or similar) also works
         | on earlier (Zen/Zen+) or later (Zen3) cores, but they just
         | haven't been able to demonstrate it yet?
        
           | Arnavion wrote:
           | Doesn't repro on 2920x (Zen+).
        
           | rincebrain wrote:
           | At least the stock exploit code he provided said "nope I
           | can't get shit to leak" on my 5900X.
        
           | zacmps wrote:
           | I tested on a Zen 3 Epyc and wasn't able to get the POC to
           | work, so I think it probably is just Zen 2.
        
           | paulmd wrote:
           | It's Tavis Ormandy, and he reported it to AMD, so _one would
           | assume_ they tried it on related hardware and it's not
           | working.
        
         | ye-olde-sysrq wrote:
         | So are Ryzen 5000's without Radeon not vulnerable? I guess said
         | processors are zen 3?
         | 
         | I have an "AMD Ryzen 9 5950x Desktop Processor" which appears
         | to be Zen 3. I think I'm good?
         | 
         | (Not that I'm running untrusted workloads, but yknow, fortune
         | favors the prepared)
        
           | Tuna-Fish wrote:
           | You are likely frequently running untrusted workloads. As
           | javascript in a browser. I don't know about this one, but at
           | least meltdown was fully exploitable from js.
           | 
           | But yes, you are fine, 5950x is Zen3.
        
             | anarazel wrote:
             | I wish Firefox would use PR_SCHED_CORE to reduce the
             | likelihood of such leakage...
        
             | CameronNemo wrote:
             | I was under the impression that 5600g and 5600u were Zen3,
             | but being the APU models they have Radeon graphics.
             | 
             | Anecdotally, I tried to reproduce on my 5600g but couldn't.
             | Which is surprising because they claim it works on 5700u...
             | 
             | Edit: just discovered that while my 5600g is Zen3, the
             | 5700u is Zen2. Lol.
        
         | eugene3306 wrote:
         | and how about playstation 5 ?
         | 
         | and also xbox and that thing from valve?
        
           | javajosh wrote:
           | I mean, the PS5 is running a Zen 2 processor [0] so I would
           | assume it's vulnerable. In general I would assume that AAA
           | games are safe. Websites and smaller games made by
           | malefactors will be the issue. (Note that AAA game makers
           | have little interest in antagonizing the audience, OTOH they
           | also will push limits to install anti-cheat mechanisms. On
           | balance I'd trust them.)
           | 
           | 0 - https://blog.playstation.com/2020/03/18/unveiling-new-detail...
        
             | darkwater wrote:
             | I think the interesting point here might be one could be
             | able to extract some secret from memory of a PS5, like to
             | break some kind of encryption
        
               | tracker1 wrote:
               | Interesting, could well be a path to jailbreaking the
               | PS5... although, not sure if that has or hasn't already
               | happened. For XBox Series, you can just use dev mode in
               | the first place.
        
               | FirmwareBurner wrote:
               | What valuable secrets do people have on their PS5/Xbox?
               | You also need a way to deploy the malicious payload on
               | those platforms which, due to their closed nature, is
               | very difficult to do.
        
               | kmeisthax wrote:
               | The valuable secret here would be the keys that let you
               | decrypt and copy games. The threat models of locked-down
               | platforms are incredibly strange.
        
               | FirmwareBurner wrote:
               | That's a good point but I can't believe that every
               | console doesn't have its own unique set of keys so that
               | if you compromise one before SW patches land, it won't be
               | much use in the ecosystem.
        
               | kmeisthax wrote:
               | It depends. I'm going to speak in general terms, since I
               | obviously don't know how every single system works, but
               | per-console keys are used for pairing system storage to
               | the motherboard and _maybe_ keeping save data from being
               | copied from user to user. Most CDNs don't really provide
               | the option for on-the-fly per user encryption, so instead
               | you serve up games encrypted with title keys and then
               | issue each console a title key that's encrypted with a
               | per-console key. Disc games need to be encrypted with
               | keys that every system already has, otherwise you can't
               | actually use the disc to play the game.
               | 
               | As for the value of being able to do 'hero attacks' on
               | game consoles, let me point out that once you have a
               | cleartext dump of a game, you've already done most of the
               | work. The Xbox 360 was actually very well secured, to the
               | point where it was easier to hack a disc drive to inject
               | fake authentication data into a normal DVD-R than to
               | actually hack a 360's CPU to run copied games. That's why
               | we didn't have widely-accessible homebrew on that
               | platform for the longest time. Furthermore, you can make
               | emulators that just don't care about authenticating media
               | (because why would they) and run cleartext games on
               | those.
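
A toy sketch of the title-key scheme described above: the game is encrypted once with a title key, and only that small key is wrapped separately for each console. The XOR keystream cipher, the console names, and the `play` helper are all hypothetical stand-ins for illustration; real platforms use proper ciphers and hardware key storage.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR data with a SHA-256 counter keystream.

    NOT secure. It only stands in for a real cipher such as AES.
    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


# Publisher side: one title key, one encrypted blob served to everyone.
title_key = secrets.token_bytes(32)
game = b"pretend this is a game image"
encrypted_game = keystream_xor(title_key, game)

# Per console: only the title key is wrapped with that console's own key.
console_keys = {
    "console-a": secrets.token_bytes(32),
    "console-b": secrets.token_bytes(32),
}
tickets = {name: keystream_xor(key, title_key)
           for name, key in console_keys.items()}


def play(name: str) -> bytes:
    """Unwrap the title key with the console key, then decrypt the game."""
    unwrapped_title_key = keystream_xor(console_keys[name], tickets[name])
    return keystream_xor(unwrapped_title_key, encrypted_game)


print(play("console-a") == game)  # True
```

The sketch also shows why a single compromised console matters in such a scheme: whoever unwraps one ticket holds the title key, and the encrypted blob is the same for everybody.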
        
               | javajosh wrote:
               | Oh, I can imagine lots of uses for a bevy of PS5's,
               | assuming you can gain remote control. What do you do with
               | a botnet? What do you do with a botnet with a pretty good
               | GPU? What do you do with an always-on microphone in
               | people's living rooms?
        
               | AdmiralAsshat wrote:
               | At least with the PS3, I seem to recall that I couldn't
               | extract any of my games' save data from the hard-drive of
               | my PS3 unit that went dead due to RROD (or was it YLOD?)
               | because the hard-drive was encrypted using the PS3's
               | serial key as part of the encryption.
               | 
               | I don't know if that mechanism persists into the PS4/PS5.
        
         | winrid wrote:
         | Looks like my 2700x narrowly misses this one, assuming 7020
         | series is affected and not 7000 series.
        
           | loeg wrote:
           | Yeah -- Ryzen 2700x is Zen+, not Zen 2. Current understanding
           | is that Zen+ is not affected.
        
           | _flux wrote:
           | The wording "at least" suggests the list might not be
           | exhaustive.
        
       | blinkingled wrote:
       | On my Zen2 / Renoir based system the PoC exploit continues to
       | work albeit slowly even after updating the microcode (linked from
       | TFA) that has the fix for this issue. The wrmsr stops it fully in
       | its tracks.
       | 
       | Edit: just realized it must have been that the initramfs image is
       | not updated with the manually updated firmware in /lib/firmware.
       | 
       | Edit2: Updated the initramfs and even if the benchmark.sh fails,
       | ./zenbleed -v2 still picks out and prints strings which doesn't
       | happen with the wrmsr solution.
        
         | johnp_ wrote:
         | linux-firmware does not carry any microcode update for Renoir
         | (yet). Or what do you mean by "TFA"?
         | 
         | The fixed Renoir microcode should have revision >= 0x0860010b
         | as per the kernel:
         | https://github.com/torvalds/linux/commit/522b1d69219d8f08317...
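
A minimal sketch of the revision comparison mentioned above, assuming the `microcode : 0x...` lines that x86 Linux prints in /proc/cpuinfo. The threshold is the Renoir value quoted in the comment; the function names and the sample text are made up for illustration, and the sketch does not check that the CPU really is Renoir.

```python
import re

# Minimum fixed revision for Renoir, as quoted in the comment above.
RENOIR_FIXED_REV = 0x0860010B

_MICROCODE_LINE = re.compile(
    r"^microcode\s*:\s*(0x[0-9a-fA-F]+)\s*$", re.MULTILINE
)


def microcode_revisions(cpuinfo_text: str) -> list[int]:
    """Extract every 'microcode : 0x...' value from cpuinfo-style text."""
    return [int(m.group(1), 16) for m in _MICROCODE_LINE.finditer(cpuinfo_text)]


def all_cores_fixed(cpuinfo_text: str, minimum: int) -> bool:
    """True only if at least one revision was found and all are >= minimum."""
    revisions = microcode_revisions(cpuinfo_text)
    return bool(revisions) and all(rev >= minimum for rev in revisions)


# Hypothetical excerpt, not a reading from a real machine.
sample = "processor\t: 0\nmicrocode\t: 0x8600106\n\nprocessor\t: 1\nmicrocode\t: 0x8600106\n"
print(all_cores_fixed(sample, RENOIR_FIXED_REV))  # False
```

On a Linux box the same functions could be fed the contents of /proc/cpuinfo instead of the sample string.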
        
       | href wrote:
       | Can anyone explain the `wrmsr -a 0xc0011029 $(($(rdmsr -c
       | 0xc0011029) | (1<<9)))`? It seems to help on my system, but I
       | don't understand what it does, and I don't know how to unset it.
        
         | taviso wrote:
         | An msr is a "model specific register", a chicken bit can
         | configure cpu features.
         | 
         | They don't persist across a reboot, so you can't break
         | anything. You can undo what you just did without a reboot, just
         | use `... & ~(1 << 9)` instead (unset the bit instead of set
         | it).
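
A minimal sketch of the bit arithmetic behind the two commands above, on plain integers only. It never touches a real MSR; the helper names and the starting value are hypothetical.

```python
# MSRs are 64 bits wide, so results are masked to 64 bits.
MSR_MASK = (1 << 64) - 1


def set_bit(value: int, bit: int) -> int:
    """Return value with the given bit set (the `| (1<<9)` step)."""
    return (value | (1 << bit)) & MSR_MASK


def clear_bit(value: int, bit: int) -> int:
    """Return value with the given bit cleared (the `& ~(1 << 9)` step)."""
    return value & ~(1 << bit) & MSR_MASK


before = 0x1000                   # made-up register contents
after = set_bit(before, 9)
print(hex(after))                 # 0x1200
print(hex(clear_bit(after, 9)))   # 0x1000
```

Both operations leave every other bit alone, which is why the original command reads the register with rdmsr before writing it back.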
        
         | mmastrac wrote:
         | This sets the chicken bit:
         | https://www.phoronix.com/news/Linux-AMD-Spectral-Chicken
        
         | mike_hearn wrote:
         | CPU designers know that some features are risky. Much like how
         | web apps may often have "feature flags" that can be flipped on
         | and off by operators in case a feature goes wrong, CPUs have
         | "chicken bits" that control various performance enhancing
         | tricks and exotic instructions. By flipping that bit you
         | disable the optimization.
        
       | HideousKojima wrote:
       | [flagged]
        
       | jrmg wrote:
       | _AMD have released an microcode update for affected processors.
       | Your BIOS or Operating System vendor may already have an update
       | available that includes it._
       | 
       | I don't really understand how CPU microcode updates work. If I'm
       | keeping Ubuntu up to date, will this just happen automatically?
        
         | naikrovek wrote:
         | no.
         | 
         | microcode changes are provided to the CPU at boot time and are
         | only valid early in the boot process. the machine UEFI/BIOS
         | must apply them.
        
           | tremon wrote:
           | Linux can (and does) apply microcode patches during kernel
           | boot.
        
             | kzrdude wrote:
             | for example use journalctl -k -g microcode to see log
             | messages related to this: (intel cpu, so revision does not
             | relate to anything AMD)
             | 
             | > microcode: microcode updated early to revision 0xa6, date
             | = 2022-06-28
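
A small sketch for pulling the revision and date out of a log line like the one quoted above. It assumes exactly that wording; other kernel versions and AMD systems may phrase the message differently, and the function name is made up.

```python
import re
from typing import Optional

_LOG_RE = re.compile(
    r"microcode updated early to revision (0x[0-9a-fA-F]+), "
    r"date = (\d{4}-\d{2}-\d{2})"
)


def parse_microcode_log(line: str) -> Optional[tuple[int, str]]:
    """Return (revision, date) from the log line, or None if it does not match."""
    match = _LOG_RE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), match.group(2)


line = "microcode: microcode updated early to revision 0xa6, date = 2022-06-28"
print(parse_microcode_log(line))  # (166, '2022-06-28')
```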
        
         | sdht0 wrote:
         | https://www.cyberciti.biz/faq/install-update-intel-microcode...
        
         | tremon wrote:
         | If you already have the package amd64-microcode installed
         | (highly likely), then yes it will be updated automatically.
         | 
         | https://packages.ubuntu.com/search?keywords=amd64-microcode
        
           | jrmg wrote:
           | Great, thanks.
           | 
           | Sort of weirds me out that my OS can just silently update my
           | CPU - I didn't realize I was giving it that level of
           | control... I guess it's good vs the alternative of no-one
             | actually updating for exploits like this though.
        
             | Thaxll wrote:
               | It does not upgrade your cpu, it loads up the firmware
             | when you boot Linux.
        
               | jrmg wrote:
               | That's reassuring, thanks (not sure why you're getting
               | downvoted!)
        
             | sp332 wrote:
             | _Active microcode updates are stored in volatile memory and
             | thus have to be applied during each system boot._
             | 
             | https://wiki.gentoo.org/wiki/Microcode
        
             | loeg wrote:
             | As opposed to updating any other piece of software in the
             | system directly? The OS has always had full control.
        
       | eric__cartman wrote:
       | This is incredibly scary. On my Zen 2 box (Ryzen 3600) logging
       | the output of the exploit running as an unprivileged user while
       | copying and pasting a string into a text editor in the background
       | (I used Kate), resulted in pieces of the string being logged into
       | the output of zenbleed. And this is after a few seconds of
       | runtime mind you, not even a full minute.
       | 
       | Thankfully the exploit is highly dependent on a specific asm
       | routine so exploiting it from JS or WASM in a browser should be
       | extremely difficult. Otherwise a nefarious tab left open for
       | hours in the background could exfiltrate without an issue.
       | 
       | I'm eagerly waiting for Fedora maintainers to push the new
       | microcode so the kernel can update it during the boot process.
        
         | zekica wrote:
         | I tried on my zen 2 box, and the same thing works even when
         | the exploit is run in a KVM.
        
         | loeg wrote:
         | > Thankfully the exploit is highly dependent on a specific asm
         | routine so exploiting it from JS or WASM in a browser should be
         | extremely difficult. Otherwise a nefarious tab left open for
         | hours in the background could exfiltrate without an issue.
         | 
         | At least one commenter here claims to be able to reproduce this
         | with javascript: https://news.ycombinator.com/item?id=36849767
        
           | IshKebab wrote:
           | A very bold claim with zero evidence.
        
             | saagarjha wrote:
             | What about it is very bold? The instruction sequence
             | mentioned seems pretty reasonable and not at all out of the
             | question for a JavaScript JIT to generate.
        
         | kludge41 wrote:
         | How do you build the POC? I get "No such file or directory" and
         | error 127 on Ubuntu.
        
           | eric__cartman wrote:
           | I had to run make on the uncompressed folder. Perhaps the
           | build-essential package doesn't come with NASM in Ubuntu?
           | I'll need a bit more info on the error if you want me to try
           | and help you :)
        
             | kludge41 wrote:
             | After extracting the POC and installing build-essential, I
             | still get this:
             | 
             |   nasm -O0 -felf64 -o zenleak.o zenleak.asm
             |   make: nasm: No such file or directory
             |   make: *** [Makefile:11: zenleak.o] Error 127
        
               | eric__cartman wrote:
               | Install the nasm package. It's probably not included in
               | build-essential.
        
               | kludge41 wrote:
               | Thank you. I guess I should've read the error better, but
               | I thought nasm was the thing complaining.
        
       | hprotagonist wrote:
       | ah, not the color theme. Hamming distance strikes again!
       | 
       | https://kippura.org/zenburnpage
        
         | heywhatupboys wrote:
         | I knew there were color schemes for the color blind.
         | 
         | Schemes for the blind are news to me though
        
       | sedatk wrote:
       | I didn't expect it to as it's Zen3, but still tried: doesn't
       | repro on my 5950X.
        
       | [deleted]
        
       | 0xbadcafebee wrote:
       | This is super cool. This exploit will be one of the canonical
       | examples that just running something in a VM does not mean it's
       | safe. We've always known about VM breakout, but this is a no-
       | breakout massive exploit that is simple to execute and gives big
       | payoffs.
       | 
       | Remember: just because this one bug gets fixed in microcode
       | doesn't mean there's not another one of these waiting to be
       | discovered. Many (most?) 0-days are known about by black-hats-
       | for-hire well before they're made public.
       | 
       | CPU vulnerabilities found in the past few years:
       | 
       | https://en.wikipedia.org/wiki/Meltdown_(security_vulnerability)
       | https://en.wikipedia.org/wiki/Spectre_(security_vulnerability)
       | https://aepicleak.com/
       | https://en.wikipedia.org/wiki/Software_Guard_Extensions#SGAxe
       | https://en.wikipedia.org/wiki/Software_Guard_Extensions#LVI
       | https://en.wikipedia.org/wiki/Software_Guard_Extensions#Plundervolt
       | https://en.wikipedia.org/wiki/Software_Guard_Extensions#MicroScope_replay_attack
       | https://en.wikipedia.org/wiki/Software_Guard_Extensions#Enclave_attack
       | https://en.wikipedia.org/wiki/Software_Guard_Extensions#Prime+Probe_attack
       | https://www.vusec.net/projects/crosstalk/
       | https://en.wikipedia.org/wiki/Hertzbleed
       | https://www.securityweek.com/amd-processors-expose-sensitive-data-new-squip-attack/
        
         | zamadatix wrote:
         | In the case of the VM won't registers be wiped when
         | entering/exiting the VM?
        
           | loeg wrote:
           | The problem is the freed entries in the register file. A VM
           | can, at least, use this bug to read registers from a non-VM
           | thread running on the adjacent SMT/HT of a single physical
           | core. I suspect a VM could also read registers from other
           | processes scheduled on the same SMT/HT.
        
             | astrange wrote:
             | Are people running multiple untrusted VMs without turning
             | SMT off? Even letting them share caches seems like asking
             | for trouble.
        
               | bbojan wrote:
               | The fine article states that simply turning off SMT
               | doesn't help with this particular exploit.
        
               | zamadatix wrote:
               | In the context of this conversation, SMT on/off is
               | relevant to the scope the vulnerability has with VMs,
               | beyond the article's claim that the issue is in some
               | way present inside VMs.
        
               | Astronaut3315 wrote:
               | This specific CVE still applies even if SMT is off, per
               | the article.
        
               | zamadatix wrote:
               | In the context of this conversation, SMT on/off is
               | relevant to the scope the vulnerability has with VMs,
               | beyond the article's claim that the issue is in some
               | way present inside VMs.
        
               | jeroenhd wrote:
               | Not only do people do this, it's generally how VPS
               | providers work. Most machines barely use the CPU most of
               | the time (web servers etc.) so reserving a full CPU core
               | for a VPS is horribly inefficient. It doesn't matter
               | anyway, because SMT isn't relevant for this particular
               | bug.
               | 
               | With SMT allowing twice the cores on a CPU for most
               | workloads, disabling it would double the cost for most
               | providers!
               | 
               | There are VPS providers that will let you rent dedicated
               | CPU cores, but they often cost 4-5x more than a normal
               | virtual CPU. Overprovisioning is how virtual servers are
               | available for cheap!
        
               | zamadatix wrote:
               | SMT is relevant in the VM case of this bug because it
               | determines whether this bug is restricted to data outside
               | the VM or not.
               | 
               | Providers usually won't disable SMT completely; they'd
               | run a scheduler which only allows one VM to use both SMT
               | threads of a core. Ultra-cheap VPS providers may still
               | find that not worth the pennies, though: if most of what
               | you sell is single-core VPSes, most of your SMT threads
               | are still unavailable even with the scheduler approach.
               | 
               | Fully dedicated cores aren't necessarily required because
               | in the timesliced case the registers are unloaded and
               | reloaded when different VMs are shuffled on and off the
               | core. That said, they definitely prevent the cross-vm-
               | data-leak case of this bug.
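               | 
               | A minimal sketch of that kind of placement policy,
               | assuming two SMT threads per core. `assign_cores` and
               | the VM names are made up for illustration and are not
               | any hypervisor's API.

```python
def assign_cores(vm_vcpus, threads_per_core=2):
    """Place each VM's vCPUs so that no physical core is shared between VMs.

    vm_vcpus maps a (hypothetical) VM name to its vCPU count.
    Returns one (vm, threads_used) entry per physical core consumed.
    """
    placement = []
    for vm, vcpus in vm_vcpus.items():
        while vcpus > 0:
            # Fill a core with this VM's vCPUs only; never mix two VMs.
            used = min(vcpus, threads_per_core)
            placement.append((vm, used))
            vcpus -= used
    return placement


# Two single-vCPU VMs and one 4-vCPU VM.
placement = assign_cores({"vm-a": 1, "vm-b": 1, "vm-c": 4})

# SMT threads left idle because their sibling belongs to a single-vCPU VM.
idle_threads = sum(2 - used for _, used in placement)
```

               | With mostly single-vCPU VMs, nearly every core ends up
               | with an idle sibling thread, which is the cost described
               | above.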
        
               | toast0 wrote:
               | > Fully dedicated cores aren't necessarily required
               | because in the timesliced case the registers are unloaded
               | and reloaded when different VMs are shuffled on and off
               | the core. That said, they definitely prevent the cross-
               | vm-data-leak case of this bug.
               | 
               | Registers are unloaded and reloaded when different
               | processes / threads are scheduled within a running VM
               | too. That _should_ protect the register contents, but
               | because of this issue, it doesn't, so I don't see why it
               | would if it's a hypervisor switching VMs instead of an OS
               | switching processes. If you're running a vulnerable
               | processor on a vulnerable microcode, it seems like you
               | can potentially read things put into the vulnerable
               | registers by anything else running on the same physical
               | core, regardless of context.
        
               | KeplerBoy wrote:
               | Well you don't have to reserve any CPU Cores per VM.
               | There's no law saying you can't have more VMs than
               | logical cores. They're just processes after all and we
               | can have thousands of them.
        
               | jeroenhd wrote:
               | Of course not, but the vulnerability works by exploiting
               | the shared register file so to mitigate this entire class
               | of vulnerabilities, you'd need to dedicate a CPU core and
               | as much of its associated cache as possible to a single
               | VM.
        
               | loeg wrote:
               | Someone, somewhere is, of course. I don't know if the
               | hyperscalers do, or not.
        
             | zamadatix wrote:
             | Ah, this is a good point for those still using hypervisor
             | schedulers which allow mapping different VMs to the same
             | core.
        
           | crote wrote:
           | The problem is that the _logical_ registers don't have a 1:1
           | relation to the _physical_ registers.
           | 
           | For example, let's imagine a toy architecture with two
           | registers: r0 and r1. We can create a little assembly snippet
           | using them: "r0 = load(addr1); r1 = load(addr2); r0 = r0 +
           | r1; store(addr3, r0)". Pretty simple.
           | 
           | Now, what happens if we want to do that _twice_? Well, we get
           | something like "r0 = load(addr1); r1 = load(addr2); r0 = r0
           | + r1; store(addr3, r0); r0 = load(addr4); r1 = load(addr5);
           | r0 = r0 + r1; store(addr6, r0)". Because there is no overlap
           | between the accessed memory sections, they are completely
           | independent. In theory they could even execute at the same
           | time - but that is impossible because they use the same
           | registers.
           | 
           | This can be solved by adding more physical registers to the
           | CPU, let's call them R0-R6. During execution the CPU can now
           | analyze and rewrite the original assembly into "R1 =
           | load(addr1); R4 = load(addr4); R2 = load(addr2); R5 =
           | load(addr5); R3 = R1 + R2; R6 = R4 + R5; store(addr3, R3);
           | store(addr6, R6)". This means we can now start the loads for
           | the second addition before the first addition is done, which
           | means we have to wait less time for the data to arrive when
           | we finally want to actually do the second addition. To the
           | user nothing has changed and the results are identical!
           | 
           | The issue here is that when entering/exiting a VM you can
           | definitely clear the logical registers r0&r1, but there is no
           | guarantee that you are _actually_ clearing the physical
           | registers. On a hardware level, "clearing a register" now
           | means "mark logical register as empty". The CPU makes sure
           | that any future use of that _logical_ register results in it
           | behaving _as if_ it has been cleared, but there is no need to
           | touch the content of the _physical_ register. It just gets
           | marked as "free for use". The only way that physical
           | register becomes available again is after a write, after all,
           | and that write would by definition overwrite the stale
           | content - so clearing it would be pointless. Unless your CPU
           | misbehaves and you run into this new bug, of course.
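           | 
           | A minimal sketch of the renaming idea above, as a toy model
           | rather than how any real CPU is built. The `RenamingCPU`
           | class and its method names are made up for illustration; it
           | assumes two logical registers and seven physical ones.

```python
class RenamingCPU:
    """Toy core: logical registers (r0, r1) renamed onto physical ones."""

    def __init__(self, num_physical=7):
        self.physical = [0] * num_physical     # physical register file
        self.free = list(range(num_physical))  # physical registers not mapped
        self.rename = {}                       # logical name -> physical index

    def write(self, logical, value):
        # Every write takes a fresh physical register and releases the old one.
        old = self.rename.get(logical)
        new = self.free.pop(0)
        self.physical[new] = value
        self.rename[logical] = new
        if old is not None:
            self.free.append(old)

    def read(self, logical):
        # An unmapped logical register behaves as if it holds zero.
        idx = self.rename.get(logical)
        return 0 if idx is None else self.physical[idx]

    def clear(self, logical):
        # "Clearing" only drops the mapping; the physical content is untouched.
        old = self.rename.pop(logical, None)
        if old is not None:
            self.free.append(old)


cpu = RenamingCPU()
cpu.write("r0", 0xDEADBEEF)   # a secret lands in some physical register
cpu.clear("r0")               # e.g. on VM entry/exit

architectural_value = cpu.read("r0")
stale_values = [cpu.physical[i] for i in cpu.free if cpu.physical[i] != 0]
```

           | Here `architectural_value` is 0, so software sees a cleared
           | register, while `stale_values` still holds the secret in a
           | physical register marked as free. A correct CPU never lets
           | anyone read that entry before it is overwritten.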
        
         | cmrdporcupine wrote:
         | In the end, I'm thinking _most_ of these are related to branch
         | prediction?
         | 
         | It strikes me that it's either that branch prediction is so
         | inherently complex that it's always going to be vulnerable to
         | this _and/or_ it just so defies the way most of us intuitively
         | think about code paths / instruction execution that it's hard
         | to conceive of the edge cases until too late?
         | 
         | At what point does the complexity of CPU architectures become
         | so difficult to reason about that we just accept the
         | performance penalty of keeping it simpler?
        
           | c7DJTLrn wrote:
           | We demanded more performance and we got what we demanded. I
           | doubt manufacturers are going to walk back on branch
           | prediction no matter how flawed it is. They'll add some more
           | mitigations and features which will be broken-on-arrival.
        
           | Tuna-Fish wrote:
           | > At what point does the complexity of CPU architectures
           | become so difficult to reason about that we just accept the
           | performance penalty of keeping it simpler?
           | 
           | Never for branch prediction. It just gets you too much
           | performance. If it becomes too much of a problem, the
           | solution is greater isolation of workloads.
        
             | hedgehog wrote:
             | In certain cases isolation and simplicity overlap, I
             | suspect for example that the dangers of SMT implementation
             | complexity are part of why Apple didn't implement it for
             | their respective CPUs. Likely we'll see this elsewhere too,
             | for example Amazon may not ever push to have SMT in their
             | Graviton chips (the early generations are off the shelf
             | cores from ARM where they didn't have a readily available
             | choice).
        
           | loeg wrote:
           | Speculative execution, not branch prediction.
        
           | rcxdude wrote:
           | > At what point does the complexity of CPU architectures
           | become so difficult to reason about that we just accept the
           | performance penalty of keeping it simpler?
           | 
           | Basically never for anything that's at all CPU-bound: that
           | growth in complexity is really the only thing that's been
           | powering single-threaded CPU performance improvements since
           | Dennard scaling stopped in about 2006 (and by that time they
           | were already plenty complex: by the late 90s and early 2000s
           | x86 CPUs were firmly superscalar, out-of-order, branch-
           | predicting and speculatively executing devices). If your
           | workload can be made fast without needing that stuff (i.e. no
           | branches and easily parallelised), you're probably using a
           | GPU instead nowadays.
        
           | paulmd wrote:
           | More generally, most of them are related to speculative
           | execution, where branch mis-prediction is a common gadget to
           | induce speculative mis-execution.
           | 
           | Speculation is hard, it's sort of akin to the idea of
           | introducing multithreading into a program, you are explicitly
           | choosing to tilt at the windmill of pure technical
           | correctness because in a highly concurrent application every
           | error will occur fairly routinely. Speculation is great too,
           | in combination with out-of-order execution it's a
           | multithreading-like boon to overall performance, because now
           | you can resolve several chunks of code in parallel instead of
           | one at a time. It's just also a minefield of correctness
           | issues, but the alternative would be losing something like
           | the equivalent of 10 years of performance gains (going back
           | to like ARM A53 performance).
           | 
           | The recent thing is that "observably correct" needs to
           | include timings. If you can just guess at what the data might
           | be, and the program runs faster if you're correct, that's
           | basically the same thing as reading the data by another
           | means. It's a timing oracle attack.
           | 
           | (in this case AMD just fucked up though, there's no timing
           | attack, this is just implemented wrong and this instruction
           | can speculate against changes that haven't propagated to
           | other parts of the pipeline yet)
           | 
           | The cache is the other problem, modern processors are built
           | with every tenant sharing this single big L3 cache and it
           | turns out that it also needs to be proof against timing
           | attacks for data present in the cache too.
        
           | 0cf8612b2e1e wrote:
           | If you pin the VM to a different core/CPU, would that do
           | anything to mitigate? Or are the OS affinity guarantees not
           | that strong?
        
             | saagarjha wrote:
             | In this case, it would avoid the exploit, because it
             | requires a shared register file.
        
         | c7DJTLrn wrote:
         | Running untrusted code whether in a sandbox, container, or VM,
         | has not been safe since at least Rowhammer, maybe before. I
         | believe a lot of these exploits are down to software and
         | hardware people not talking. Software people make assumptions
         | about the isolation guarantees, hardware people don't speak up
         | when said assumptions are made.
        
           | saagarjha wrote:
           | That is not true in this case. It's just a CPU bug; not even
           | a side channel.
        
         | Bluecobra wrote:
         | Yup! I worked at a few companies that would co-mingle Internet
         | facing/DMZ VMs with internal VMs. When pointing this out and
         | recommending we should airgap these VMs to their own dedicated
         | hypervisor it always fell on deaf ears. Joke's on them I guess.
        
           | Kwpolska wrote:
           | I'm pretty sure AWS/Azure/GCP don't assign separate boxes to
           | every customer, and somehow they're fine.
        
             | Bluecobra wrote:
             | Good point, I should have clarified that I was talking
             | about on-prem VMs e.g. VMWare.
        
             | yencabulator wrote:
             | You can pay AWS a premium to make sure you're the only
             | tenant on the physical machine. You can also split your own
             | stuff into multiple tenants, and keep those separate too.
        
               | nicolas_17 wrote:
               | At which point you don't really need the flexibility of
               | AWS and you might as well get a Dedicated Server
               | elsewhere?
        
               | yencabulator wrote:
               | It'll still let you do the elastic scaling stuff, billing
               | for actual usage instead of racked hardware.
        
         | phendrenad2 wrote:
         | The problem is, VMs aren't really "Virtual Machines" anymore.
         | You're not parsing opcodes in a big switch statement, you're
         | running instructions on the actual CPU, with a few hardware
         | flags that the CPU says will guarantee no data or instruction
         | overlap. It promises! But that's a hard promise to make in
         | reality.
        
           | msla wrote:
           | This is because VM means two different things and has for a
           | long time:
           | 
           | IBM's VM was and is a hypervisor. It dates to the mid 1960s,
           | in the form of CP-40, and it didn't run opcodes in software,
           | but in hardware.
           | 
           | https://en.wikipedia.org/wiki/IBM_CP-40
           | 
           | p-code machines, which interpret bytecode, date back almost
           | as far, such as the O-code machine for BCPL.
           | 
           | https://en.wikipedia.org/wiki/BCPL
           | 
           | Getting people to distinguish between these concepts is
           | probably a lost cause.
        
             | Joker_vD wrote:
             | Looking at IBM's tech from the sixties is somehow
             | weirdly depressing: it's unbelievable how much of the
             | architectural stuff they had already invented by 1970.
        
               | meepmorp wrote:
               | I remember seeing VMware for the first time and thinking
               | that the PC world had finally entered the 1970s.
        
               | nine_k wrote:
               | Not depressing, but inspiring. So many great
               | architectural ideas can be made accessible to millions of
               | consumers, not limited to a few thousand megacorps.
        
           | MuffinFlavored wrote:
           | > you're running instructions on the actual CPU
           | 
           | Just how many times is the average operating system workload
           | (with or without a virtual machine also running a second
           | average operating system workload) context switching a
           | second?
           | 
           | Like... unless I'm wrong... the kernel is the main process,
           | and then it slices up processes/threads, and each time those
           | run, they have their own EAX/EBX/ECX/ESP/EBP/EIP/etc. (I know
           | it's RAX, etc. for 64-bit now)
           | 
           | How many cycles is a thread/process given before it context
           | switches to the next one? How is it managing all of the
           | pushfd/popfd, etc. between them? Is this not how modern
           | operating systems work, am I misunderstanding?
        
             | toast0 wrote:
             | > How many cycles is a thread/process given before it
             | context switches to the next one?
             | 
             | Depends on a lot of things. If it's a compute heavy task,
             | and there's no I/O interrupts, the task gets one
             | "timeslice", timeslices vary, but typical times are
             | somewhere in the neighborhood of 1 ms to 100 ms. If it's an
             | I/O heavy task, chances are the task returns from a syscall
             | with new data to read (or because a write finished), does a
             | little bit of work, then does another syscall with I/O.
             | Lots of context switches in network heavy code (io_uring
             | seems promising).
             | 
             | > How is it managing all of the pushfd/popfd, etc. between
             | them?
             | 
             | The basic plan is when the kernel takes an interrupt (or
             | gets a syscall, which is an interrupt on some systems and
             | other mechanisms on others), the kernel (or the cpu) loads
             | the kernel stack pointer for the current thread, then it
             | pushes all the (relevant) cpu registers onto the stack,
             | then the kernel business is taken care of, the scheduler
             | decides which userspace thread to return to (which might be
             | the same one that was interrupted or not), the destination
             | thread's kernel stack is switched to, registers are popped,
             | then the thread's userspace stack is switched to, then
             | userspace execution resumes.
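             | 
             | A rough sketch of that save/restore sequence, as a toy
             | model only. The `Thread`, `CPU` and `context_switch`
             | names and the register values are made up for
             | illustration; real kernels do this in assembly.

```python
class Thread:
    def __init__(self, name):
        self.name = name
        self.kernel_stack = []  # saved register frames for this thread


class CPU:
    def __init__(self):
        self.regs = {"rax": 0, "rbx": 0, "rsp": 0, "rip": 0}


def context_switch(cpu, current, nxt):
    # Push the interrupted thread's registers onto its kernel stack.
    current.kernel_stack.append(dict(cpu.regs))
    # Switch to the destination thread's kernel stack and pop its registers.
    cpu.regs = nxt.kernel_stack.pop()


cpu = CPU()
a, b = Thread("a"), Thread("b")

# Thread b was interrupted earlier, so it already has a saved frame.
b.kernel_stack.append({"rax": 7, "rbx": 0, "rsp": 0x2000, "rip": 0x400})

# Thread a is running with its own register contents.
cpu.regs.update(rax=1, rsp=0x1000, rip=0x100)

context_switch(cpu, a, b)   # scheduler picks b
rax_seen_by_b = cpu.regs["rax"]
cpu.regs["rax"] = 99        # b does some work

context_switch(cpu, b, a)   # scheduler picks a again
rax_seen_by_a = cpu.regs["rax"]
```

             | Each thread gets back exactly the architectural register
             | values it was interrupted with, regardless of what the
             | other thread did in between.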
        
             | saagarjha wrote:
             | Usually a few hundred to a few thousand times a second.
        
         | trebligdivad wrote:
         | The comparisons to Meltdown/Spectre are a bit misleading though
         | - they were a whole new form of attack based on timing where
         | the CPU did exactly what it should have done; this zenbleed
         | case is a good old fashioned bug though - data in a register
         | that shouldn't be.
        
         | stcredzero wrote:
         | _this is a no-breakout massive exploit that is simple to
         | execute and gives big payoffs_
         | 
         | Wouldn't we be able to avoid the "big payoffs" of no-breakout
         | exploits if we had specialized hardware handle the secrets?
        
       ___________________________________________________________________
       (page generated 2023-07-24 23:00 UTC)