[HN Gopher] Reptar
___________________________________________________________________
Reptar
Author : abhi9u
Score : 571 points
Date : 2023-11-14 17:49 UTC (1 day ago)
(HTM) web link (lock.cmpxchg8b.com)
(TXT) w3m dump (lock.cmpxchg8b.com)
| saagarjha wrote:
| See also Intel's advisory, which has a description of impact:
| https://www.intel.com/content/www/us/en/security-center/advi...
|
| > Sequence of processor instructions leads to unexpected behavior
| for some Intel(R) Processors may allow an authenticated user to
| potentially enable escalation of privilege and/or information
| disclosure and/or denial of service via local access.
| yborg wrote:
| 'Some' appears to be almost any Intel x86 CPU made in the last
| 6 years.
| tedunangst wrote:
| Their diagnosis reminds me of what happened when qemu ran into
| repz ret. https://repzret.org/p/repzret/
| Lammy wrote:
| > the processor would begin to report machine check exceptions
| and halt.
|
| I get it https://www.youtube.com/watch?v=dXekDCcw2FE
| shadowgovt wrote:
| ... it literally took me all Goddamn day. Well done.
|
| Credit where credit is due: Google has some of the best
| codenames.
| doublerabbit wrote:
| Any reason why it's named after the dinosaur from the cartoon
| Rugrats? Or was that just what was on TV at the time?
|
| Maybe I should start hacking while watching Teenage Mutant Ninja
| Turtles.
| 2OEH8eoCRo0 wrote:
| rep is an assembly instruction prefix
| Blackthorn wrote:
| I think from the memey line "Halt! I am Reptar!" Plus the rep
| prefix
| AdmiralAsshat wrote:
| If you discover a major processor vulnerability and wanna name
| it Shredder/Krang/Bebop/Rocksteady, I feel like you will have
| earned that right!
| xyst wrote:
| Reading this makes me realize how little I know of the hardware
| that runs my software
|
| > Prefixes allow you to change how instructions behave by
| enabling or disabling features
|
| Why do we need "prefixes" to disable or enable features? Is this
| for dynamically toggling features so you don't have to go into
| the BIOS?
| jeffbee wrote:
| It's just because x86 as an ISA has accreted over the course of
| 40+ years, and has variable-length instructions. Every time
| they extend the ISA they carve out part of the opcode space to
| squeeze in a new prefix. This will only continue, considering
| that Intel has proposed another new scheme this year.
| shenberg wrote:
| Prefixes are modifiers to specific instructions executed by the
| processor, e.g. to control the size of the operands or enable
| locking for concurrency.
| Tuna-Fish wrote:
| x86 was designed in 78, basically for the purpose of running a
| primitive laser printer (or other similar workloads). The big
| problem with this is that the encoding space for instructions
| was "efficiently utilized". When new instructions, or worse,
| additional registers were later added, you had to fit the new
| instruction variants in somehow, and you did this by tacking on
| prefixes.
| mschuster91 wrote:
| Nah, x86's heritage goes back even earlier - it was,
| effectively, a bolt-on to Intel's far older designs, as a
| huge part of the 8086 was being ASM source-compatible with
| the older 8xxx chips, even as the instruction set itself
| changed [1]. What utterly amazes me is that the original 8086
| was mostly designed _by hand_ by a team of not even two dozen
| people - and today we have hundreds if not thousands of
| people working on designing ASICs...
|
| [1] https://en.wikipedia.org/wiki/Intel_8086#The_first_x86_de
| sig...
| hulitu wrote:
| It is because testing plays a bigger part today than back
| then. The complexity has also increased (people do not
| design at transistor level anymore).
| irdc wrote:
| Acckkghtually, if you go back far enough you end up at the
| Datapoint 2200. If you want to understand where some of the
| crazier parts of the 8086 originate from, Ken Shirriff has
| a nice read: http://www.righto.com/2023/08/datapoint-
| to-8086.html
| thaumasiotes wrote:
| > x86 was designed in 78, basically for the purpose of
| running a primitive laser printer
|
| It's interesting that ASCII is transparently just a bunch of
| control codes for a physical printer/typewriter, combining
| things like "advance the paper one line", "advance the paper
| one inch", "reset the carriage position", and "strike an F at
| the carriage position", all of which are different mechanical
| actions that you might want a typewriter to do.
|
| But now we have Unicode, which is dedicated to the purpose of
| assigning ID numbers to visual glyphs, and ASCII has been
| interpreted as a bunch of glyph references instead of a bunch
| of machine instructions, and there are the control codes with
| no visual representation, sitting in Unicode, being
| inappropriate in every possible way.
|
| It's kind of like if Unicode were to incorporate "start
| microwave" as part of a set with "1", "2", "3", etc.
| rswail wrote:
| ASCII was used by teletypes, not typewriters. They were
| "cylinder" heads, as compared to IBM's golfball
| typewriters.
|
| The endless CR/LF/CRLF line ending problem would have been
| solved if the RS (Record Separator) ASCII code was used
| instead of the physical CR = carriage return, ie move print
| head back to start of line, and LF = line feed, ie rotate
| paper up one line.
|
| But Unix decided on LF, Apple used CR, Windows used CRLF,
| and even today, I had to get a guy to stop setting his
| system to "Windows" because he was screwing up a git repo
| with extraneous CRs.
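The CR/LF/CRLF zoo described above can be normalized mechanically; a
minimal Python sketch (the helper name is illustrative):

```python
# Normalize the three historical line-ending conventions to LF.
# CR (0x0D) = carriage return, LF (0x0A) = line feed; ASCII also
# defines RS (0x1E, Record Separator), which never caught on for
# marking line/record boundaries.
def normalize_newlines(text: str) -> str:
    # Order matters: collapse CRLF first so the lone-CR rule
    # doesn't split a Windows ending into two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")

# Windows, classic Mac, and Unix endings all collapse to LF:
assert normalize_newlines("a\r\nb\rc\nd") == "a\nb\nc\nd"
```

This is essentially what git's `autocrlf`/`eol` machinery does on
checkout and commit, which is why a single misconfigured client can
sprinkle CRs through a repo.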
| db48x wrote:
| Read
| https://wiki.osdev.org/X86-64_Instruction_Encoding#Legacy_Pr...
|
| The REP prefixes are the most common; they just let you perform
| the same instruction a variable number of times. It looks in
| the CX register for the count. This makes many common loops
| really, really short, especially for moving objects around in
| memory. The memcpy function is often inlined as a single REP
| MOVS instruction, possibly with an instruction to copy the
| count into CX if it isn't already there.
|
| I suppose the REX (operand size) prefix is pretty common too,
| since 64-bit programs will want to operate on 64-bit values and
| addresses pretty frequently.
|
| None of the prefixes toggle things that can be set globally, by
| the BIOS or otherwise. They all just specify things that the
| next instruction needs to do.
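A toy sketch of what the REP prefix adds to MOVSB. The opcode byte
values (0xF3 for REP, 0xA4 for MOVSB) are the real x86 encodings; the
mini-emulator itself is a simplification (real MOVSB also honors the
direction flag, segment overrides, etc.):

```python
# MOVSB (opcode 0xA4) copies one byte from [SI] to [DI]; prefixing
# it with REP (0xF3) repeats it CX times. This is why memcpy can
# often be inlined as a single "rep movsb".
MOVSB, REP = 0xA4, 0xF3

def run(code, mem, si, di, cx):
    i = 0
    while i < len(code):
        rep = code[i] == REP          # consume an optional REP prefix
        if rep:
            i += 1
        assert code[i] == MOVSB       # toy decoder: only MOVSB here
        for _ in range(cx if rep else 1):
            mem[di] = mem[si]
            si, di = si + 1, di + 1
        i += 1
    return mem

# "rep movsb" with CX=3 copies three bytes in one instruction:
mem = bytearray(b"abc\0\0\0")
run(bytes([REP, MOVSB]), mem, si=0, di=3, cx=3)
assert bytes(mem) == b"abcabc"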
| pclmulqdq wrote:
| The ModR/M and SIB prefixes are probably the most common
| prefixes in instructions. They are so common that assemblers
| elide their existence when you read code. REX is in the same
| boat: so common that it's usually elided. The VEX prefix is
| also really common (all of the V* AVX instructions, like
| VMOVDQ), and then the LOCK prefix (all atomics).
|
| After all of those, REP is not that uncommon of a prefix to
| run into, although many people prefer SIMD memcpy/memset to
| REP MOVSB/REP STOSB. It is slightly unusual.
| bonzini wrote:
| ModRM and SIB are not prefixes; they're part of the opcode
| (the second and third bytes after all the prefixes and the
| 0Fh/0F38h/0F3Ah opcode map selectors).
| EarlKing wrote:
| More specifically, they're affixed to _certain_ opcodes
| that require them. There are a number of byte-sized
| opcodes that do not require a ModRM or SIB byte (although
| a number of those got gobbled up to make the REX prefix,
| but that's another story).
|
| TL;DR Weeee! Intel machine language is crazy!
| EarlKing wrote:
| There's a good reason for using vector instructions over
| REP: Until relatively recently that was how you got maximum
| performance in small, tight loops. REP is making a comeback
| precisely because of ERMS and FSRM, so unfortunately this
| will become a bigger problem going forward.
| epcoa wrote:
| This isn't correct. ModR/M and SIB are _not_ prefixes. They
| are suffixes and essentially part of the core instruction
| encoding for certain memory- and register-access
| instructions. They are the primary means of encoding the
| myriad addressing modes of the x86. And their existence is
| not elided in any meaningful way, their value is explicitly
| derived from the instruction operands (SIB is scale, index,
| base), so when you see an instruction like:
|
| mov BYTE PTR [rdi+rbx*4],0x4
|
| SIB is determined by the register indices of rdi, rbx, and
| 4, all right there in the instruction. Likewise, Mod R/M
| encodes the addressing mode, which is clear from the
| operands in the assembler listing. Though x86 is such a
| mess that there are cases where you can encode the same
| instruction in either a Mod R/M form or a shorter form, eg
| PUSH/POP.
|
| REX is a prefix, but it is a bit special as it must be the
| last one, and repeats are undefined. It is not elided
| because of commonality but because its presence and value
| are usually implied from the operands; it is therefore
| redundant to list it.
|
| For instance, PUSH R12 must use a REX prefix (REX.B with
| the one byte encoding).
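The field layouts in the example above can be sketched directly. The
bit positions and register numbers below are the real x86 encodings;
the helper names are just illustrative:

```python
# How the ModRM and SIB bytes of "mov BYTE PTR [rdi+rbx*4],0x4"
# are derived from the operands. mod=00 means "no displacement",
# rm=100 means "a SIB byte follows".
def modrm(mod, reg, rm):
    return (mod << 6) | (reg << 3) | rm

def sib(scale, index, base):      # SIB = scale, index, base
    return (scale << 6) | (index << 3) | base

RBX, RDI = 3, 7                   # register numbers
SCALE_4 = 2                       # scale field 0..3 encodes *1/*2/*4/*8

insn = bytes([0xC6,                     # opcode: MOV r/m8, imm8
              modrm(0b00, 0, 0b100),    # /0, rm=100 -> SIB follows
              sib(SCALE_4, RBX, RDI),   # [rdi + rbx*4]
              0x04])                    # the immediate
assert insn == bytes([0xC6, 0x04, 0x9F, 0x04])
```

So the SIB byte 0x9F is literally the three operand fields packed
together, which is why assemblers never show it as a separate entity.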
| epcoa wrote:
| That's a very poor summary of what prefixes are. My advice,
| just skip the original article which isn't very good or
| interesting and read taviso's blog that is linked in the top
| comment (it gives a few concrete examples of these prefixes).
| They are modifiers that are part of the CPU instruction.
| ajross wrote:
| "Prefixes" in this case mostly expand the instruction encoding
| space.
|
| So rarely-used addressing modes get a "segment prefix" that
| causes them to use a segment other than DS. Or x86_64 added a
| "REX" prefix that added more bits to the register fields
| allowing for 16 GPRs. Likewise the "LOCK" prefix (though poorly
| specified originally) causes (some!) memory operations to be
| atomic with respect to the rest of the system (c.f. "LOCK
| CMPXCHG" to effect a compare-and-set).
|
| All these things are operations other CPU architectures
| represent too, though they tend to pack them into the existing
| instruction space, requiring more bits to represent every
| instruction.
|
| Notably the "REP" prefix in question turns out to be the one
| exception. This is a microcoded repeat prefix left over from
| the ancient days. But it represents operations (c.f.
| memset/memmove) that are performance-sensitive even today, so
| it's worthwhile for CPU vendors to continue to optimize them.
| Which is how the bug in question seems to have happened.
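The prefixes listed above map to concrete byte values. The byte
assignments below are the real x86 encodings; the lookup table and
helper are only an illustrative reference:

```python
# Legacy one-byte prefixes placed before an opcode, each modifying
# the instruction that follows.
PREFIXES = {
    0x26: "ES segment override",
    0x2E: "CS segment override",
    0x36: "SS segment override",
    0x3E: "DS segment override",
    0x64: "FS segment override",
    0x65: "GS segment override",
    0x66: "operand-size override",
    0x67: "address-size override",
    0xF0: "LOCK (atomic RMW, e.g. LOCK CMPXCHG)",
    0xF2: "REPNE",
    0xF3: "REP/REPE",
}

# REX is a *range* of prefixes in 64-bit mode, 0x40-0x4F: the low
# four bits (W, R, X, B) extend operand size and register fields,
# which is how x86_64 reaches 16 GPRs.
def is_rex(byte: int) -> bool:
    return 0x40 <= byte <= 0x4F

assert is_rex(0x48)        # REX.W, common on 64-bit arithmetic
assert 0xF0 in PREFIXES    # LOCK
```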
| jasonwatkinspdx wrote:
| You got some great answers already, but to your first point
| check out Hennessey and Patterson's books, namely Computer
| Architecture and Computer Organization and Design.
|
| The latter is probably more suited to you unless you wanna go
| on a dive into computer architecture itself. There are older
| editions available for free (authorized by the authors) on the
| web.
|
| I first read the 3rd edition of Computer Architecture and,
| besides being one of the clearest textbooks I've ever read, it
| vastly improved my understanding of what's going on in there in
| relation to OoO speculative execution, etc.
| rvba wrote:
| It looks like Intel was cutting corners to be faster than AMD and
| now all those things come out. How much slower will all those
| processors be after multiple errata? 10%? 30%? 50%?
|
| In a duopoly market there seems to be no real competition. And
| yes I know that some (not all) bugs also happen for AMD.
| mschuster91 wrote:
| > And yes I know that some (not all) bugs also happen for AMD.
|
| Some of these novel side-channel attacks actually even apply in
| completely unrelated architectures such as ARM [1] or RISC-V
| [2].
|
| I think the problem is not (just) a lack of competition
| (although you're right that the duopoly in desktop/laptop/non-
| cloud x86 servers brings its own serious issues, which I've
| written and ranted about more often than I can count [3]);
| rather, it is that modern CPUs and SoCs have simply become so
| utterly
| complex and loaded with decades worth of backwards-
| compatibility baggage that it is impossible for any single
| human, even a small team of the best experts you can bring
| together, to fully grasp every tiny bit of them.
|
| [1] https://www.zdnet.com/article/arm-cpus-impacted-by-rare-
| side...
|
| [2]
| https://www.sciencedirect.com/science/article/pii/S004579062...
|
| [3]
| https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
| bobim wrote:
| So no saving grace from the ISA... humans just lost ground on
| CPU design, and I suspect the situation will worsen when AI
| will enter the picture.
| mschuster91 wrote:
| > and I suspect the situation will worsen when AI will
| enter the picture.
|
| For now, AI lacks the contextual depth - but an AI that can
| actually _design_ a CPU from scratch (and not just
| rehashing prior-art VHDL it has ... learned? somehow), if
| that happens we'll be at a Cambrian Explosion-style event
| anyway, and all we can do is stand on the sides, munch
| popcorn and remember this tiny quote from Star Wars [1].
|
| [1] https://www.youtube.com/watch?v=Xr9s6-tuppI
| nwmcsween wrote:
| Once AI can create itself, we will most likely be
| redundant.
| snvzz wrote:
| >Some of these novel side-channel attacks actually even apply
| in completely unrelated architectures such as ARM [1] or
| RISC-V [2].
|
| Possible? Yes. But far less likely.
|
| Complexity carries over and breeds bugs. RISC-V is an order
| of magnitude simpler than ARM64, which in turn is an order of
| magnitude simpler than x86.
|
| And it is so w/o disadvantage[0], positioning itself as the
| better ISA.
|
| 0. https://news.ycombinator.com/item?id=38272318
| arp242 wrote:
| It's not clear to me this fix will have any performance impact.
| I strongly suspect it will be negligible or zero.
|
| This seems like a "simple" bug of the type that people write
| every day, not deep architectural problems like Spectre and the
| like, which also affected AMD (in roughly equal measure if I
| recall correctly).
| kmeisthax wrote:
| Parent commenter might be thinking of Meltdown, a related
| architectural bug that only bit Intel and IBM PPC. Everything
| with speculative execution has Spectre[0], but you only have
| Meltdown if you speculate across _security boundaries_.
|
| The reason why Meltdown has a more dramatic name than
| Spectre, despite being the same vulnerability, is that
| hardware privilege boundaries are the only defensible
| boundary against timing attacks. We already expect context
| switches to be expensive, so we're allowed to make them a
| little _more_ expensive. It'd be prohibitively expensive to
| avoid leaking timing from, say, one executable library to a
| block of JIT compiled JavaScript code within the same browser
| content process.
|
| [0] https://randomascii.wordpress.com/2018/01/07/finding-a-
| cpu-d...
| akoboldfrying wrote:
| Not sure what other errata you're referring to, but this looks
| like an off-by-one in the microcode. I would expect the fix to
| have zero or minimal penalty.
| varispeed wrote:
| It's going to be a pain for cloud and shared hosting.
|
| Most likely dedicated resources on demand will be the future.
| Some companies already offer it.
| kevincox wrote:
| GCP and AWS both offer non-shared hardware. If people want the
| extra isolation they just need to pay for it.
| Flow wrote:
| Would it be possible to describe a modern CPU in something
| like TLA+ to find all non-electrical problems like these?
| boxfire wrote:
| There are still bit-flipping tricks like rowhammer for RAM; I
| wouldn't be surprised if there are such vulnerabilities in some
| CPUs.
| sterlind wrote:
| Rowhammer is an electrical vulnerability though. PP specified
| non-electrical vulns.
| sterlind wrote:
| I've heard Intel does use TLA+ extensively for specifying their
| designs and verifying their specs. But TLA+ specs are extremely
| high-level, so they don't capture implementation details that
| can lead to bugs. And model checking isn't a formal proof, only
| (tractably small) finite state spaces can be checked with TLC.
| And even there, you're only checking the invariants you
| specified.
|
| That said, I'm sure there's some verification framework like
| SPARK for VHDL, and this feels like exactly the kind of thing
| it should catch.
| dboreham wrote:
| Formal methods have been used in CPU design for nearly 40 years
| [1] but not yet for everything, and the methods tend to not
| have "round-trip-engineering" properties (e.g. TLA+ is not
| actually proving validity of the code you will run in
| production, just your description of its behavior and your idea
| of exhaustive test cases).
|
| [1] https://www.academia.edu/60937699/The_IMS_T_800_Transputer
| foobiekr wrote:
| CPU designers are so professional about verification and
| specification that they _dwarf_ software. There's just no
| comparison.
| bobim wrote:
| Is it even possible to design a CPU with out-of-order and
| speculative execution that has no security issues? Is the
| future a swarm of disconnected A55 cores, each running a
| single application?
| SmoothBrain12 wrote:
| Yes, but they won't clock as fast because they'll be waiting
| for RAM.
| bobim wrote:
| We need to keep programs small so they fit in the cache.
| moffkalast wrote:
| We need 2 GBs of L1 cache, thus solving the cache miss
| problem once and for all.
| rep_lodsb wrote:
| 640K should be enough for anyone ;)
| Tuna-Fish wrote:
| This vulnerability was not caused by OoO or speculative
| execution. It was caused by the fact that x86 was designed 45
| years ago, and has had feature after feature piled on the same
| base, which has never been adequately rebuilt.
|
| The more proximate cause is that some instructions with
| multiple redundant prefixes (which is legal, but pointless)
| have their length miscalculated by some Intel CPUs, which
| results in wrong outcomes.
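The length-calculation hazard is easy to see in a toy decoder. Legacy
prefixes may appear repeatedly before an opcode (a full x86
instruction is capped at 15 bytes), so computing an instruction's
length means walking an arbitrary run of them. A simplified sketch in
Python (real decoders also enforce ordering rules, e.g. that REX must
immediately precede the opcode; the exact trigger for this bug is in
taviso's write-up, not reproduced here):

```python
# One-byte legacy prefixes, plus the REX range 0x40-0x4F.
LEGACY_PREFIXES = {0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65,
                   0x66, 0x67, 0xF0, 0xF2, 0xF3}

def count_prefixes(code: bytes) -> int:
    """Count leading prefix bytes before the opcode proper."""
    n = 0
    while n < len(code) and (code[n] in LEGACY_PREFIXES
                             or 0x40 <= code[n] <= 0x4F):
        n += 1
    return n

# "rep rep rep movsb": three F3 prefixes, redundant but legal.
assert count_prefixes(bytes([0xF3, 0xF3, 0xF3, 0xA4])) == 3
```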
| epcoa wrote:
| Not entirely pointless: redundant prefixes are occasionally
| a useful method for alignment.
| TheCoreh wrote:
| A more sensible approach for that use-case would be IMO to
| have well-defined specialized prefixes for padding, instead
| of relying on the case-by-case behavior of redundant
| prefixes. (However I understand that there's almost
| certainly a good historical reason why this was not the way
| it was done)
| bobim wrote:
| Are new ISAs solving this? Time to move to RISC-V?
| epcoa wrote:
| N/A and No.
| dontlaugh wrote:
| RISC V is not great at this either, with the compression
| extension being common and variable length.
|
| ARM 64 gets this right, with fixed length 32 bit
| instructions.
| snvzz wrote:
| >ARM 64 gets this right, with fixed length 32 bit
| instructions.
|
| At the expense of code density, yet RISC-V is easy to
| decode, with implementations going up to 12-way decode
| (Veyron V2) despite variable length.
|
| ARM64 hardly "gets it right".
| camel-cdr wrote:
| I wouldn't say ARM64 gets it wrong either, I think both
| are viable approaches.
| snvzz wrote:
| Both approaches are viable, but RISC-V's approach is
| better, as it provides higher code density without
| imposing a significant increase in complexity in
| exchange.
|
| Higher code density is valuable. E.g.:
|
| - The decoders can see more code within a window of the
| same size, or we can use a narrower window.
|
| - We can use less cache, saving area and power. A smaller
| cache can also be clocked higher, lowering latency.
|
| - Smaller binaries and ROM images.
|
| Soon to be available (2024) large, high performance
| implementations will demonstrate RISC-V advantages well.
| kccqzy wrote:
| The easiest way of doing padding is to add a bunch of
| `nop` instructions which are one byte each.
|
| If you read the manual, Intel encourages minor variations
| of the `nop` instructions that can be lengthened into
| different number of bytes (like `nop dword ptr [eax]` or
| `nop dword ptr [eax + eax*1 + 00000000h]`).
|
| It is never recommended anywhere in my knowledge to rely
| on redundant prefixes of random non-nop instructions.
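The multi-byte NOP variants mentioned above are tabulated in Intel's
manual; listing them makes the padding trick concrete (the dict is
just an illustrative reference, the byte sequences are Intel's
recommended encodings):

```python
# Intel's recommended multi-byte NOPs: long forms of NOP (0F 1F /0)
# whose ModRM/SIB/displacement bytes pad out the length with no
# architectural side effects.
RECOMMENDED_NOPS = {
    1: bytes([0x90]),
    2: bytes([0x66, 0x90]),
    3: bytes([0x0F, 0x1F, 0x00]),
    4: bytes([0x0F, 0x1F, 0x40, 0x00]),
    5: bytes([0x0F, 0x1F, 0x44, 0x00, 0x00]),
    6: bytes([0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00]),
    7: bytes([0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00]),
    8: bytes([0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00]),
    9: bytes([0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00]),
}
# Sanity check: each entry really is the advertised length.
assert all(len(enc) == n for n, enc in RECOMMENDED_NOPS.items())
```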
| epcoa wrote:
| NOPs are not generally free.
|
| It's a pretty old and well known technique:
|
| https://stackoverflow.com/questions/48046814/what-
| methods-ca...
|
| Note that this technique is really only legitimate where
| the used prefix already has defined behavior with the
| given instruction ("Use of repeat prefixes and/or
| undefined opcodes with other Intel 64 or IA-32
| instructions is reserved; such use may cause
| unpredictable behavior."), and of course the REX prefix
| has special limitations. The key is redundant, not
| spurious. It is not a good idea to be doing rep add for
| example. But otherwise, there is no issue.
| epcoa wrote:
| The prefixes are _redundant_, so it's not really case-by-
| case behavior. You're just repeating the prefix you would
| be using anyway in that location.
|
| Using specialized prefixes wastes encoding space for no
| real gain. You realize on most common processors NOP
| itself is a pseudo-instruction? Even the apparently meme-
| worthy (see sibling comment) RISC-V, it's ADDI x0, x0, 0.
| tedunangst wrote:
| And then there are CPUs that retcon behavioral changes
| onto nops.
|
| > Moving a register to itself is functionally a nop, but
| the processor overloads it to signal information about
| priority.
|
| https://devblogs.microsoft.com/oldnewthing/20180809-00/?p
| =99...
| _a_a_a_ wrote:
| > A program can voluntarily set itself to low priority if
| it is waiting for a spin lock
|
| What does this even mean? How can a program do this when
| thread priority is an OS thing? It seems just weird.
| epcoa wrote:
| Hardware threads as in SMT means thread priority is also
| a hardware thing.
| tedunangst wrote:
| It's an SMT CPU that dynamically assigns decode,
| registers, etc. https://course.ece.cmu.edu/~ece740/f13/li
| b/exe/fetch.php?med...
| shadowgovt wrote:
| Usually, the historical reason is that adding the logic
| to do something well-defined when unexpected prefixes are
| used is going to cost ten more transistors per chip,
| which is going to add to cost to handle a corner case
| that almost nobody will try to be in anyway. Far better
| to let whatever the implementation does happen as long as
| what happens doesn't break the system.
|
| The issue here is their verification of possible internal
| CPU states didn't account for this one.
|
| (There is, perhaps, an argument to be made that the x86
| architecture has become _so_ complex that the emulator
| between its embarrassingly stupid PDP-11-style single-
| thread codeflow and the embarrassingly parallel
| computation it does under the hood to give the user more
| performance than a really fast PDP-11 _cannot_ be
| reliably tested to exhaustion, so perhaps something needs
| to give on the design or the cost of the chips).
| iforgotpassword wrote:
| Because they cost no (or fewer) cycles compared to NOPs?
| tedunangst wrote:
| See http://repzret.org/p/repzret/
| gumby wrote:
| > It was caused by the fact that x86 was designed 45 years
| ago, and has had feature after feature piled on the same
| base, _which has never been adequately rebuilt_.
|
| Itanic would like to object! Unfortunately it can't get
| through the door.
| nextaccountic wrote:
| I think formal methods could help design such a machine, if
| you can write a mathematical statement that amounts to "there
| is no side channel between A and B"
|
| Or at least put a practical bound on how many bits per second
| at most you can get from any such side channel (the reasoning
| being, if you can get at most a bit for each million years, you
| probably don't have an attack)
|
| Then you verify if a given design meets this constraint
| mgaunard wrote:
| A program is itself a formal specification of what an
| algorithm does.
| bobim wrote:
| What would be the typical size of such a constraint-based
| problem, and do we have the compute power to translate the
| rules into an implementation? And what if one forgot a rule
| somewhere... Deeply interesting subject.
| less_less wrote:
| I think you'd want it to be a theorem (in Lean, Coq,
| Isabelle/HOL or whatever) instead of a constraint problem.
| So it would be more limited by developer effort than by
| computational power.
|
| Theoretically you can do this from software down to
| (idealized) gates, but in practice the effort is so great
| that it's only been done in extremely limited systems.
| tsimionescu wrote:
| Formal methods are widely used in processor design. It is
| hard, though, to write specs asserting that bugs we haven't
| thought about don't exist - at least while also preserving
| the property of being a Turing machine.
| nextaccountic wrote:
| I know. I mean applying formal methods to this specific
| problem of proving side channels don't exist (which seems a
| very hard thing to do and might even require to modify the
| whole design to be amenable to this analysis)
| less_less wrote:
| As a tidbit, this was part of how one of the teams
| involved in the original Spectre paper found some of the
| vulnerabilities. Basically the idea was to design a small
| CPU that could be formally shown to be free of certain
| timing attacks. In the process they found a bunch of
| things that would have to change for the analysis to
| work... maybe in a small system those wouldn't _actually_
| lead to vulnerabilities, but they couldn't prove it (or
| it would require lots of careful analysis). And in big
| systems, those features do lead to vulnerabilities.
| nextaccountic wrote:
| That's amazing!
|
| Do you have some link about the designed CPU?
| akoboldfrying wrote:
| Well, the bug in this specific case (based on the article by
| Tavis O. linked elsewhere in comments) looks to be the regular
| kind -- probably an off-by-one in a microcode edge case. That
| is, here it's _not_ the case that the CPU functions correctly
| but leaves behind traces of things that should be private in
| timing side channels, as was the case for Spectre.
| trebligdivad wrote:
| Yeh just a fun bug rather than anything too fundamental.
| Still, it is a fun bug.
| JohnBooty wrote:
| Is the future a swarm of disconnected A55 cores,
| each running a single application?
|
| don't you dare tease me like that
| bobim wrote:
| And programmed in... Forth!
| lmm wrote:
| > Is it even possible to design a cpu with out-of-order and
| speculative execution that would have no security issue?
|
| Yes, of course. But we'd have to put actual effort in, and
| realistically people wouldn't pay enough extra to make it
| worthwhile.
| tasty_freeze wrote:
| Benchmarking is always problematic -- what is a good
| representative workload? All the same, I'd be curious if the
| ucode update that plugs this bug has affected CPU performance,
| eg, it diverts the "fast short rep move" path to just use the
| "bad for short moves but great for long moves" version.
| akoboldfrying wrote:
| In the article by Tavis O. linked elsewhere in comments, he
| suggests disabling the FSRM CPU feature _only as an expensive
| workaround_, to be taken if the microcode can't be updated
| for some reason. That suggests to me that he, at least, expects
| the update to do better.
| ReactiveJelly wrote:
| That would be the conservative thing to do. If there's no limit
| on microcode updates and I were Intel, I'd consider doing that
| first and then speeding it up again later. Based on the
| 5-second guess that people who update everything regularly will
| care that we did the right thing for security, and people who
| hate updates won't be happy anyway, so at least the first
| update will be secure if they never get the next one.
|
| (I think there is a limit on microcode updates; they seem
| conservative about releasing new ones - I don't remember the
| details)
| kevincox wrote:
| It's a shame that Google didn't publish numbers. They have very
| good profiling across all of their servers and probably have
| incredibly high confidence numbers for the real-world impact on
| this. (Assuming that your world is lots of copying protocol
| buffers in C++ and Java)
| writeslowly wrote:
| I noticed the Intel advisory [1] says the following
|
| Intel would like to thank Intel employees:[...] for finding this
| issue internally.
|
| Intel would like to thank Google Employees: [...] for also
| reporting this issue.
|
| [1] https://www.intel.com/content/www/us/en/security-
| center/advi...
| narinxas wrote:
| I wonder how much sooner than Google the Intel employees found
| this issue
| narinxas wrote:
| but what I am really wondering about is how much money (if
| any) the vulnerability was worth up to the moment when Google
| also discovered it?
| ajross wrote:
| As described it's just a CPU crash exploit that requires
| local binary execution. Getting to a vulnerability would
| require understanding exactly how the corrupted microcode
| state works, and that seems extremely difficult outside of
| Intel.
|
| So as described, this isn't a "valuable" bug.
| derefr wrote:
| This assumes that either 1. partners and interested
| state-sponsored actors aren't kept abreast of Intel's
| microcode backend architecture, or 2. that there hasn't
| been at least one leak of this information from one of
| these partners into the hands of interested APT
| developers. I wouldn't put strong faith in either of
| these assumptions.
| ajross wrote:
| It does, but the same is true for virtually any such
| crash vulnerability. The question was whether this was a
| "valuable exploit", not whether it might theoretically be
| worse.
|
| The space of theoretically-very-bad attacks is much
| larger than practical ones people will pay for, c.f.
| rowhammer.
| ethbr1 wrote:
| >> _Getting to a vulnerability would require
| understanding exactly how the corrupted microcode state
| works, and that seems extremely difficult outside of
| Intel._
|
| Intel knows exactly how their ROB works.
|
| Therefore Intel knows the possible consequences of this
| bug and how to trigger them.
|
| _If_ there is a privilege escalation path from this,
| Intel knows. And anyone Intel chose to share it with
| knew.
|
| Thankfully, since it's public now, the value of that
| decreases and customers can begin to mitigate.
| ajross wrote:
| > If there is a privilege escalation path from this, Intel
| knows. And anyone Intel chose to share it with knew.
|
| No, or at least not yet. I mean, I've written plenty of
| bugs. More than I can count. How many of them were
| genuine security vulnerabilities if properly exploited?
| Probably not zero. But... I don't know. And I wrote the
| code!
| saagarjha wrote:
| Intel said it can be used for escalation if that answers
| your question.
| lmm wrote:
| Did they confirm that it can definitely be used for
| escalation? The description I saw was "may allow an
| authenticated user to potentially enable escalation of
| privilege and/or information disclosure and/or denial of
| service via local access" which sounds like they're
| covering all their bases and may not actually know what
| is and isn't possible.
| dgacmu wrote:
| It's not super-valuable yet, but it would let you mount
| a really nasty DoS on cloud providers by triggering hard
| resets of the physical machines. Some people would
| probably pay for that, though it's obviously more
| interesting to push on privilege or exfiltration.
|
| Particularly since the MCEs triggered could prevent an
| automatic reboot. Would depend on what the hardware
| management system did - do machines presenting MCEs get
| pulled?
| toast0 wrote:
| If I'm a cloud provider and somebody's workflow is hard
| resetting lots of my physical machines, I'm going to give
| them free access to single tenant machines at the very
| minimum. If they keep crashing the machines that only
| they run on, I guess that's ok.
| dgacmu wrote:
| You can exploit this from a single core shared instance.
|
| So you go and find yourself a thousand cheap / free tier
| accounts, spin up an instance in a few regions each, and
| boom, you've taken out 10k physical hosts. And run it in
| a lambda at the same time, and see how well the security
| mechanisms identify and isolate you.
|
| Causing a near simultaneous reboot of enough hosts is
| likely to take other parts of the infrastructure down.
| ajross wrote:
| I'm curious what part of this scheme involves "not ending
| up in jail"? Needless to say you can't do this without
| identifying yourself. To make this an exploitable DoS
| attack you need to be able to run arbitrary binaries on a
| few thousand cloud hosts _that you didn't lease
| yourself_.
| blibble wrote:
| there exist people outside of your jurisdiction
|
| e.g. the GRU
| TeMPOraL wrote:
| So Replit, Godbolt, and whatever other cloud-hosted
| compilers are there?
| mschuster91 wrote:
| > I'm curious what part of this scheme involves "not
| ending up in jail"? Needless to say you can't do this
| without identifying yourself.
|
| Stolen credit cards are a dime a dozen, and nation state
| actors can just use their domestic banks or agents in the
| banks of other countries in a pinch to deflect blame or
| lay false trails.
|
| If I were Russia or China, I'd invest _a lot_ of money
| into researching all kinds of avenues on how to take out
| the large three public cloud providers if need be: take
| out AWS, Google, Microsoft and on the CDN side Cloudflare
| and Akamai and suddenly the entire Western economy grinds
| to a halt.
|
| The only ones who will not be affected are the US
| government cloud services in AWS, as this runs separate
| from other AWS regions - that is, unless the attacker
| gets access to credentials that allow them executions on
| the GovCloud regions...
| vbezhenar wrote:
| If clouds use shared servers to run their management
| workloads and if very important companies use shared
| servers to run their workloads, they would deserve it.
|
| But I don't believe it. People are not that stupid.
| mschuster91 wrote:
| > If clouds use shared servers to run their management
| workloads and if very important companies use shared
| servers to run their workloads, they would deserve it.
|
| Why target the management plane? Fire off payloads to
| take down the physical VM hosts and suddenly any cloud
| provider has a serious issue because the entire compute
| capacity drops.
| ajross wrote:
| > If I were Russia or China, I'd invest a lot of money
| into researching all kinds of avenues on how to take out
| the large three public cloud providers
|
| This subthread started with "is this issue a valuable
| exploit". Needless to say, if you need to invoke
| superpower-scale cyber warfare to find an application,
| the answer is "no". Russia and China have plenty of
| options to "take out" western infrastructure if they're
| willing to blow things up[1] at that scale.
|
| [1] Figuratively and literally
| dgacmu wrote:
| Countries have proven far more reluctant to use kinetic
| options vs. cyberattacks. Or, put differently, we're all
| hacking each other left and right and the responses have
| thus far mostly remained in the digital realm.
|
| See, e.g., https://madsciblog.tradoc.army.mil/156-what-
| is-the-threshold...
|
| > responses are usually proportional to and in the same
| domain as the provocation
| mschuster91 wrote:
| > Or, put differently, we're all hacking each other left
| and right and the responses have thus far mostly remained
| in the digital realm.
|
| Which is both good and bad at the same time. Cyber
| warfare has been significantly impacting our economies
| and our citizens - anything from scam callcenters over
| ransomware to industrial espionage - to the tune of many
| dozens of billions of dollars a year. And yet, no Western
| government has ever held the bad actors publicly
| accountable, which means that they will continue to be a
| drain on our resources at best and a threat to national
| security at worst (e.g. the Chinese F-35 hack).
|
| I mean, I'm not calling for nuking Beijing, that would be
| disproportionate - but even after all that's happened,
| Russia and China are still connected to the global
| Internet, no sanctions, nothing.
| blibble wrote:
| it's not superpower-scale
|
| some bored kid with a couple of hundred stolen credit
| cards can bring down a significant chunk of AWS/GCP/...
| dgacmu wrote:
| I mean, you kinda can. There's a depressingly thriving
| market for stolen cards and things like compromised
| accounts. A card is a couple of dollars. There are many
| jurisdictions that turn a blind eye to hacking US
| companies. Look at how hard it's been to rein in the
| ransomware gangs and even 'booter' (ddos-for-rent)
| services.
|
| DoS isn't as lucrative as other things; I assume that
| most state actors would far prefer to find a way to turn
| this into a privilege escalation. But being able to
| possibly take out a cloud provider for a while is still
| monetizable.
| sweetjuly wrote:
| The blogpost describes that unrelated sibling SMT threads
| can become corrupted and branch erratically. If you can
| get a hypervisor thread executing as your SMT sibling and
| you can figure out how to control it (this is not an if
| so much as a when), that's a VM escape. The Intel
| advisory acknowledges this too when they say it can lead
| to privilege escalation. This is hardly a useless bug, in
| fact it's awfully powerful!
| jefc1111 wrote:
| This was a lot more fun than the Google puff piece.
| frontalier wrote:
| The date on the article is for tomorrow?
| bitwize wrote:
| Cereal Killer: Check this out, it's a memo about how they're
| gonna deal with those oil spills on the 14th.
|
| Acid Burn: What oil spills?
|
| Lord Nikon: Yo, brain dead, today's the 13th.
|
| Cereal Killer: Whoa, this hasn't happened yet!
| quietpain wrote:
| ...our validation pipeline produced an interesting assertion...
|
| What is a validation pipeline?
| tonfa wrote:
| The blog has a link to
| https://lock.cmpxchg8b.com/zenbleed.html#discovery which
| presents the concept.
| ForkMeOnTinder wrote:
| It's described one paragraph earlier.
|
| > I've written previously about a processor validation
| technique called Oracle Serialization that we've been using.
| The idea is to generate two forms of the same randomly
| generated program and verify their final state is identical.
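| To make the idea concrete, here is a toy sketch of that kind
| of differential testing in Python: generate a random program,
| run two forms of it that must agree, and compare final state.
| (Names and structure here are illustrative only, not Google's
| actual silifuzz tooling.)

```python
import random

# Toy sketch of the "Oracle Serialization" idea: two forms of the
# same randomly generated program must end in identical state.
OPS = {
    "add": lambda a, b: (a + b) & 0xFFFFFFFF,  # 32-bit add
    "xor": lambda a, b: a ^ b,
    "sub": lambda a, b: (a - b) & 0xFFFFFFFF,  # 32-bit sub
}

def gen_program(rng, length=32, nregs=4):
    """A random straight-line program over nregs registers."""
    return [(rng.choice(sorted(OPS)), rng.randrange(nregs),
             rng.randrange(nregs)) for _ in range(length)]

def run(program, nregs=4):
    """Form 1: execute the instructions back to back."""
    regs = list(range(1, nregs + 1))
    for op, dst, src in program:
        regs[dst] = OPS[op](regs[dst], regs[src])
    return regs

def run_serialized(program, nregs=4):
    """Form 2: the same program with a 'serialization point' after
    each instruction (on hardware this would be a real fence that
    drains the pipeline; it must never change the result)."""
    regs = list(range(1, nregs + 1))
    for op, dst, src in program:
        regs[dst] = OPS[op](regs[dst], regs[src])
        pass  # serialization point: no architectural effect
    return regs

prog = gen_program(random.Random(0))
# The oracle: any divergence between the two forms flags a bug.
assert run(prog) == run_serialized(prog)
```

| On real silicon the interesting part is that a divergence can
| only come from the CPU itself, which is how an errant machine
| reveals itself without anyone knowing what to look for.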
| 1f60c wrote:
| Sounds like the real story should be that Google solved the
| halting problem. :-P
| kadoban wrote:
| You're free to solve the halting problem for restricted
| sets of programs, that doesn't break any rules of the
| universe.
|
| They could also just be discarding any program that runs
| longer than X time, or a bunch of other possibilities.
| tgv wrote:
| They might be generating programs that they know will
| halt. Like: applications with finite loops and such.
| There are not enough details.
| mike_d wrote:
| The most awesome part:
|
| > This bug was independently discovered by multiple research
| teams within Google, including the silifuzz team and Google
| Information Security Engineering.
| yodon wrote:
| Dupe: https://news.ycombinator.com/item?id=38268043
|
| (As of this writing, this post has more votes, the other has more
| comments)
| dang wrote:
| We'll merge that one hither. Please stand by!
| blauditore wrote:
| Can someone give a TL;DR for non-CPU experts? All technical
| articles seem pretty long and/or complex.
| Arnavion wrote:
| Some x86 instructions can have prefixes that modify their
| behavior in a meaningful way. Such a prefix can be applied
| generally to any instruction, but it's expected to have no
| effect when applied to an instruction it doesn't make sense
| with. But it turns out the CPU actually misbehaves in some
| cases when this is done. Intel released a CPU firmware update
| to fix it.
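| To make that concrete, here are a few encodings written out as
| raw bytes (values from the Intel opcode map; the constant names
| are mine):

```python
# The same 0xF3 prefix byte can be meaningful, repurposed, or redundant:
MOVSB     = bytes([0xA4])              # movsb: copy one byte
REP_MOVSB = bytes([0xF3, 0xA4])        # rep movsb: F3 turns it into a loop
PAUSE     = bytes([0xF3, 0x90])        # F3 + nop (0x90) decodes as pause
ADD       = bytes([0x01, 0xC0])        # add eax, eax
REP_ADD   = bytes([0xF3, 0x01, 0xC0])  # F3 has no defined meaning here and
                                       # is expected to be ignored, which is
                                       # the expectation this bug violates
```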
| kmeisthax wrote:
| x86 has a builtin memory copy instruction, provided by the
| combination of the movsb instruction and a rep _prefix byte_,
| which says you want the instruction to run in a loop until it
| runs out of data to copy. This is "rep movsb". This instruction
| is fairly old, meaning a lot of code still uses it, even though
| there are faster ways to copy memory in x86.
|
| Intel added two features to modern x86 chips that detect rep
| movsb and accelerate it to be as fast as those other ways.
| However, those features have a bug. You see, because rep is a
| prefix byte, you can just keep adding more prefix bytes to the
| instruction (up to the 15-byte instruction length limit). x86
| has other prefix bytes too, such as rex (used to access
| registers r8-r15), vex, evex, etc. The part of the processor
| that recognizes a rep movsb does NOT account for these other
| prefix bytes, which makes the processor get confused in ways
| that are difficult to
| understand. The processor can start executing garbage, take the
| wrong branch in if statements, and so on.
|
| Most disturbingly, when multiple physical cores are executing
| these "rep rep rep rep movsb" instructions at the same time,
| they will start generating machine check exceptions, which can
| at worst force a physical machine reboot. This is very bad for
| Google because they rent out compute time to different
| companies and they all need to be able to share the same
| machine. They don't want some prankster running these
| instructions and killing someone else's compute jobs. We call
| this a "Denial of Service" vulnerability because, while I can't
| read someone else's computations or change them, I _can_ keep
| them from completing, which is just as bad.
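| The "rep rep rep rep movsb" shape is easy to write out in
| bytes, for example (prefix values from the Intel manual,
| variable names mine):

```python
REP, REX_W, MOVSB = 0xF3, 0x48, 0xA4   # 0x40-0x4F are the rex prefixes

# Stacking redundant rep prefixes, as in the comment above:
stacked = bytes([REP] * 4 + [MOVSB])   # "rep rep rep rep movsb"
assert len(stacked) <= 15              # x86 caps an instruction at 15 bytes

# The shape described as confusing the fast-copy detector: a
# redundant rex prefix sitting between rep and movsb.
mixed = bytes([REP, REX_W, MOVSB])
```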
| BlueTemplar wrote:
| > they all need to be able to share the same machine
|
| Do they ? As these issues keep piling up, it just seems that
| it's not worth the hassle, and they should instead never do
| sharing like this...
| jrockway wrote:
| To some extent, anyone with a web browser is sharing their
| machine with other people. That's Javascript.
|
| If you ever download untrustworthy code and run it in a VM
| to protect your main set of data, that's another case.
|
| The success of cloud computing is from the idea that
| multiple people can share the same computer. You only need
| one core, but CPUs come with 128; with the cloud you
| can buy just that one core and share 1/128th of the power
| supply, rack space, motherboard, ethernet cable, sysadmin
| time, etc. and that reduces your costs. That assumption is
| all based on virtualization working, though; nobody wants
| 1/128th of someone else's computer, they want their own
| computer that's 1/128th as fast. Bugs like these
| demonstrate that you're just sharing a computer with
| someone, which is bad for the business of cloud providers.
| BlueTemplar wrote:
| My point is that for a sufficiently large user, you can
| probably use enough of the 128 cores by yourself alone,
| that it's more worthwhile to do that and turn off these
| mitigations : both because it removes a whole class of
| threats, and also because the mitigations tend to have a
| non-negligible performance impact, especially when first
| discovered, on chips that haven't been designed to
| protect against them.
| jrockway wrote:
| I very much agree with that. The reality is that cloud
| providers can replace entire machines with only a small
| latency blip in your application (or at least GCP can),
| so if you are doing things like buying 2 core VMs 64
| times to avoid losing more than 1% capacity when a
| machine dies, you probably don't actually need to do
| that. You could get a 128 core dedicated machine, and
| then not share it with anyone, and your availability time
| in that region/AZ probably wouldn't change much.
|
| That said, machines are really monstrously huge these
| days, and it can be hard to put them to good use. You
| also miss out on cost savings like burstable instances,
| which rely on someone else using the capacity for the 16
| hours a day when you don't need it. It's a balance, but
| I'd say "just buy a computer" would be my starting point
| for most application deployments.
| kevincox wrote:
| If you don't want to share GCP and AWS both offer ways to
| rent machines that aren't shared with other users. But for
| most people the cost isn't worth it because shared machines
| work well enough and provide much better resource
| utilization.
| kmeisthax wrote:
| So your argument is that everyone who wants to run a
| WordPress blog should be paying $320/mo[0] to rent a whole
| machine just so we can avoid one _specific_ kind of
| security problem?
|
| [0] Based on the cost to rent an EC2 Dedicated Host (a1
| family). See https://aws.amazon.com/ec2/dedicated-
| hosts/pricing/
| rep_lodsb wrote:
| The REX prefix is redundant for 'movsb', but not 'movsd'/'movsq'
| (moving either 32- or 64-bit words, depending on the prefix).
| That may have something to do with the bug, if there is any
| shared microcode between those instructions?
| ZoomerCretin wrote:
| Intel is a known partner of the NSA. If Intel was intentionally
| creating backdoors at the behest of the NSA, how would they look
| different from this vulnerability and the many other discovered
| vulnerabilities before it?
| rep_lodsb wrote:
| My guess is that it would be something that could be exploited
| via JavaScript. And no JIT would emit an instruction like the
| one that causes this bug.
| thelittleone wrote:
| But so is Google. It would be some very crafty theatrics if
| it's all coordinated.
| ZoomerCretin wrote:
| Only the people inserting the backdoor or using it would need
| to be bound by a National Security Letter's gag order. I
| doubt anyone at Google (including those subject to NSL gag
| orders) was made aware of this specific vulnerability.
|
| # Google's commitment to collaboration and hardware security
|
| ## As Reptar, Zenbleed, and Downfall suggest, computing
| hardware and processors remain susceptible to these types of
| vulnerabilities. This trend will only continue as hardware
| becomes increasingly complex. This is why Google continues to
| invest heavily in CPU and vulnerability research. Work like
| this, done in close collaboration with our industry partners,
| allows us to keep users safe and is critical to finding and
| mitigating vulnerabilities before they can be exploited.
|
| There's a tension between the NSA wanting backdoors and
| service providers (CPU designers + Cloud hosting) wanting
| secure platforms. It's possible that by employing CPU and
| security researchers, Google can tip the scales a bit further
| in their favor.
| gosub100 wrote:
| the backdoor would just be an encrypted stream of "random" data
| flowing right out the RNG. there's some maxim of crypto that
| encrypted data is indistinguishable from random bytes.
| tedunangst wrote:
| How would you distinguish this backdoor from one inserted by an
| unknown partner of the NSA?
| dang wrote:
| Related: https://cloud.google.com/blog/products/identity-
| security/goo...
|
| (via https://news.ycombinator.com/item?id=38268043, but we merged
| the comments hither)
| quotemstr wrote:
| If the problem really is that the processor is confused about
| instruction length, I'm impressed that this problem can be fixed
| in microcode without a huge performance hit: my intuition (which
| could be totally wrong) is that computing the length of an
| instruction would be something synthesized directly to logic
| gates.
|
| Actually, come to think of it, my hunch is that the uOP decoder
| (presumably in hardware) is actually fine and that the microcoded
| optimized copy routine is trying to infer things about the uOP
| stream that just aren't true --- "Oh, this is a rep mov, so of
| course I need to go backward two uOPs to loop" or something.
|
| I expect Intel's CPU team isn't going to divulge the details
| though. :-)
| ShadowBanThis01 wrote:
| Is what? Another useless title.
| eigenform wrote:
| I wonder which MCEs are being taken when this is triggered?
| malkia wrote:
| Konrad Magnusson from Paradox Interactive (Victoria 3) team found
| something related to that and mimalloc ->
| https://github.com/microsoft/mimalloc/issues/807
|
| Not sure if fully related, but possibly.
| saagarjha wrote:
| Seems unlikely unless they somehow emitted redundant prefixes
| lights0123 wrote:
| The article mentions
|
| > This fact is sometimes useful; compilers can use redundant
| prefixes to pad a single instruction to a desirable alignment
| boundary.
|
| so I imagine that could happen under the right optimization
| mode.
| ithkuil wrote:
| Why would a compiler prefer a redundant prefix over a nop
| for alignment?
| Vecr wrote:
| It can be faster (at runtime).
| ithkuil wrote:
| so basically you're saying that the cpu frontend missed
| the opportunity to ignore the 0x90 because it was an
| actual instruction which would be converted into an
| actual nop uop?
|
| Is this still the case or modern intel CPUs optimize out
| the nop in the frontend decoder?
| Vecr wrote:
| Some compiler writers thought that was the case, if [0]
| is related to OP. I don't have a "modern" (after 6th gen)
| Intel CPU to test it on, but note that most programs are
| compiled for a relatively generic CPU.
|
| [0]: https://github.com/microsoft/mimalloc/issues/807
| rasz wrote:
| tedunangst down in the comments linked
| https://repzret.org/p/repzret/ :
|
| "Looking in the old AMD optimisation guide for the then-
| current K8 processor microarchitecture (the first
| implementation of 64bit x86!), there is effectively
| mention of a "Two-Byte Near-Return ret Instruction".
|
| The text goes on to explain in advice 6.2 that "A two-
| byte ret has a rep instruction inserted before the ret,
| which produces the functional equivalent of the single-
| byte near-return ret instruction".
|
| It says that this form is preferred to the simple ret
| either when it is the target of any kind of branch,
| conditional (jne/je/...) or unconditional (jmp/call/...),
| or when it directly follows a conditional branch.
|
| Basically, when the next instruction after a branch is a
| ret, whether the branch was taken or not, it should have
| a rep prefix.
|
| Why? Because "The processor is unable to apply a branch
| prediction to the single-byte near-return form (opcode
| C3h) of the ret instruction." Thus, "Use of a two-byte
| near-return can improve performance", because it is not
| affected by this shortcoming."
|
| ...
|
| " If a ret is at an odd offset and follows another
| branch, they will share a branch selector and will
| therefore be mispredicted (only when the branch was taken
| at least once, else it would not take up any branch
| indicator + selector). Otherwise, if it is the target
| of a branch, and if it is at an even offset but not
| 16-byte aligned, as all branch indicators are at odd
| offsets except at byte 0, it will have no branch
| indicator, thus no branch selector, and will be
| mispredicted.
|
| Looking back at the gcc mailing list message introducing
| repz ret, we understand that previously, gcc generated:
| nop, ret
|
| But decoding two instructions is more expensive than the
| equivalent repz ret.
|
| The optimization guide for the following AMD CPU
| generation, the K10, has an interesting modification in
| the advice 6.2: instead of the two byte repz ret, the
| three-byte ret 0 is recommended
|
| Continuing in the following generation of AMD CPUs,
| Bulldozer, we see that any advice regarding ret has
| disappeared from the optimization guide."
|
| TLDR: Blame AMD K8! First x64 CPU. This GCC optimization
| is outdated and should only be used when specifically
| optimizing for K8.
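| For reference, the forms discussed above encode as follows
| (bytes per the x86 opcode map):

```python
RET      = bytes([0xC3])              # 1-byte near return: K8 could not
                                      # attach a branch predictor entry
RET_REPZ = bytes([0xF3, 0xC3])        # "repz ret": 2 bytes, ONE instruction
RET_0    = bytes([0xC2, 0x00, 0x00])  # "ret 0": ret imm16 popping 0 bytes,
                                      # K10's recommended 3-byte form
NOP_RET  = bytes([0x90, 0xC3])        # gcc's older "nop; ret": also 2
                                      # bytes, but TWO instructions to decode
```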
| farhanhubble wrote:
| This is such an interesting read, right in the league of
| "Smashing the stack" and "row hammer". As someone with very
| little knowledge of security I wonder if CPU designers do any
| kind of formal verification of the microcode architecture?
| saagarjha wrote:
| Yes.
| asylteltine wrote:
| Interesting write up. The submission needs a better and more
| accurate title though
| atesti wrote:
| I don't understand "ERMS" and "FSRM" and there seems to be
| nothing good on google about it.
|
| Are these just CPUID flags that tell you that you can use a rep
| movsb for maximum performance instead of optimized SSE memcpy
| implementations? Or is it a special encoding/prefix for rep movsb
| to make it faster? In case of the later, why would that be
| necessary? How does one make use of fsrm?
| tommiegannert wrote:
| Found this [1], which also links to the Intel Optimization
| Manual [2].
|
| Seems like ERMS was a cheaper replacement for AVX and FSRM was
| a better version, for shorter blocks.
|
| > Cheapest versions of later processors - Kaby Lake Celeron and
| Pentium, released in 2017, don't have AVX that could have been
| used for fast memory copy, but still have the Enhanced REP
| MOVSB. And some of Intel's mobile and low-power architectures
| released in 2018 and onwards, which were not based on SkyLake,
| copy about twice more bytes per CPU cycle with REP MOVSB than
| previous generations of microarchitectures.
|
| > Enhanced REP MOVSB (ERMSB) before the Ice Lake
| microarchitecture with Fast Short REP MOV (FSRM) was only
| faster than AVX copy or general-use register copy if the block
| size is at least 256 bytes. For the blocks below 64 bytes, it
| was much slower, because there is a high internal startup in
| ERMSB - about 35 cycles. The FSRM feature is intended to make
| blocks below 128 bytes quick as well.
|
| [1] https://stackoverflow.com/a/43837564
|
| [2]
| http://www.intel.com/content/dam/www/public/us/en/documents/...
| ithkuil wrote:
| FSRM is just the name of a cpu optimization that affects
| existing code.
|
| Choosing an optimal instruction sequence and schedule can be
| done statically at compile time or dynamically (via choosing
| one of several library functions at runtime, or jitting).
|
| In order to be able to detect which is the optimal instruction
| scheduling at runtime you need to know the actual CPU. You
| could have a table of all cpu models or you could just ask your
| OS whether the CPU you run on has that optimization
| implemented.
|
| Linux had to be patched so that it can _report_ that a CPU does
| implement that optimization.
|
| https://www.phoronix.com/news/Intel-5.6-FSRM-Memmove
| rwmj wrote:
| The flags just tell you that, on this CPU, rep movsb is fast so
| you don't need to use an SSE/AVX-optimized implementation.
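| On Linux these CPUID bits are exposed as "erms" and "fsrm" in
| the flags line of /proc/cpuinfo, so a runtime check can be a
| simple flag lookup, e.g. (a sketch; the helper names and
| parsing are mine):

```python
def cpu_flags(cpuinfo_text):
    """Extract the flag set from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

def fast_rep_movsb(flags):
    # erms: rep movsb is fast for largish blocks
    # fsrm: fast for short blocks too
    return "erms" in flags, "fsrm" in flags

sample = "flags\t\t: fpu mmx sse sse2 avx2 erms fsrm"
erms, fsrm = fast_rep_movsb(cpu_flags(sample))
# On a live system: fast_rep_movsb(cpu_flags(open("/proc/cpuinfo").read()))
```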
| tommiegannert wrote:
| Nice find. That indeed sounds terrible for anyone executing
| external code in what they believe to be sandboxes. Good thing it
| can be patched (and AFAICT, it seems to be a good fix, rather
| than a performance-affecting workaround).
| tazjin wrote:
| Can we get a better title for this? "Reptar - new CPU
| vulnerability" or something. I thought it was some random startup
| ad until I picked up the name somewhere else.
| weinzierl wrote:
| If it is changed to what you suggested a question mark would be
| warranted, because it is not yet clear what can be done with
| this _" glitch"_ (as the article calls it).
| Thorrez wrote:
| Intel says
|
| >A potential security vulnerability in some Intel(r)
| Processors may allow escalation of privilege and/or
| information disclosure and/or denial of service via local
| access.
|
| https://www.intel.com/content/www/us/en/security-
| center/advi...
| Borg3 wrote:
| Uhm.. why not pad using NOP? That looks much safer than
| slapping on random prefixes.
| muricula wrote:
| On modern Intel CPUs, I am led to believe, issuing nops is
| actually slower than adding prefixes. I think there is work in
| the backend updating retired instruction counters and other
| state which still occurs for nops, but decoding prefixes
| happens entirely in the front end.
|
| When a nop truly is necessary you will see compilers and
| performance engineers add prefixes to the nop to make it the
| desired size.
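| Concretely, the Intel optimization manual recommends single
| long nop encodings built from prefixes and dummy memory
| operands instead of runs of 0x90; a few of them:

```python
# Intel's recommended multi-byte NOPs: one instruction per padding size.
NOPS = {
    1: bytes([0x90]),                          # nop
    2: bytes([0x66, 0x90]),                    # operand-size prefix + nop
    3: bytes([0x0F, 0x1F, 0x00]),              # nop dword ptr [eax]
    4: bytes([0x0F, 0x1F, 0x40, 0x00]),        # nop dword ptr [eax+0]
    5: bytes([0x0F, 0x1F, 0x44, 0x00, 0x00]),  # nop dword ptr [eax+eax+0]
}
for size, enc in NOPS.items():
    assert len(enc) == size  # each pads 'size' bytes as ONE instruction
```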
| krylon wrote:
| This is very well written. I know little about assembly
| programming and Intel's ISA, let alone their microarchitectures,
| but I could follow the explanation and feel like I have a rough
| understanding of what is going on here.
|
| Does anyone know if AMD CPUs are affected?
| purpleidea wrote:
| In this new Intel microcode bug, Tavis writes:
|
| "We know something strange is happening, but how microcode works
| in modern systems is a closely guarded secret."
|
| My question: How likely is it that this is an intentional bug
| door that was added into the microcode by Intel and its
| government partners?
|
| I don't know enough about microcode and CPU's to be able to
| answer this myself, so backed-up opinions welcome!
| jsnell wrote:
| 0%.
|
| This isn't how anyone would backdoor a CPU. An actual backdoor
| would be done via some instruction sequence that is basically
| impossible to trigger by accident and hard to detect even when
| triggered.
| fsflover wrote:
| Can you give an example of such sequence? Is it really so
| easy to hide it given that the microcode can be decoded in
| principle, https://news.ycombinator.com/item?id=32145324? Why
| is hiding it in a "bug" a worse solution? Why can't you do
| both?
| jsnell wrote:
| Here's a couple of plausible ways.
|
| One is to make the condition for the backdoor trigger based
| on multiple (unlikely) instructions in sequence. This bug
| was triggered by a single instruction, so it would have
| been a pretty easy case for fuzzing. If you need a sequence
| of 10 specific instructions in a specific sequence, with no
| kind of observable side-effects for getting just the first
| 9 right so that nobody can do a guided search? That's not
| going to be found just by random chance. It doesn't matter
| _what_ those instructions are, as long as they're not
| something that would get generated by real compilers on
| real programs.
|
| The other is to make it dependent on the data rather than
| just the static instructions. Like, what if you had the
| SHA1 acceleration instructions trigger a backdoor iff the
| output of the hash is a certain value? You could probably
| even arrange for the backdoor to get triggered from managed
| and sandboxed runtimes like Javascript, rather than needing
| to get the victim to run native code. And somebody
| triggering this by accident would be equivalent to a SHA1
| preimage collision.
___________________________________________________________________
(page generated 2023-11-15 23:01 UTC)