[HN Gopher] Speculating the Entire x86-64 Instruction Set in Sec...
___________________________________________________________________
Speculating the Entire x86-64 Instruction Set in Seconds with One
Weird Trick
Author : muricula
Score : 78 points
Date : 2021-03-25 19:07 UTC (3 hours ago)
(HTM) web link (blog.can.ac)
(TXT) w3m dump (blog.can.ac)
| blight wrote:
| be sure to check out the data set extracted by this research over
| at https://haruspex.can.ac/
| alpb wrote:
| Off-topic: This was posted yesterday, but got no attention. (I
| tried to re-post yesterday, but got redirected to the existing
| post.) https://news.ycombinator.com/item?id=26576032 I wonder
| what makes HN disallow reposting of the same URL within a short
| period of time but allow it in the long term.
| woodruffw wrote:
| This is a really clever technique! I was impressed by
| sandsifter[1] when it originally came out, and this seems an
| awful lot faster and less prone to false negatives (since it's
| purely speculative and doesn't require sandsifter's `#PF` hack).
|
| At the risk of unwarranted self-promotion: the other side of this
| equation is fidelity in _software_ instruction set decoders. x86's
| massive size and layers of historical complexity make it among
| the most difficult instruction formats to accurately decode; I've
| spent a good part of the last two years working on a fuzzer
| that's discovered thousands of bugs in various popular x86
| decoders[2][3].
|
| [1]: https://github.com/xoreaxeaxeax/sandsifter
|
| [2]: https://github.com/trailofbits/mishegos
|
| [3]: https://ww.easychair.org/publications/preprint_download/1LHr
| [deleted]
| TrainedMonkey wrote:
| This just cinched to me that we need to sunset hardware x86 and
| run code that cannot be recompiled on emulators. x86 had a good
| run, but it is becoming increasingly obvious that maintaining
| backward compatibility in modern high performance parts is
| incredibly expensive and bug ridden.
|
| At this point sunsetting x86 is not even a pipe dream: most
| people carry ARM powered computers in their pockets and Apple
| recently demonstrated that it can be quite successful in high
| performance devices.
| matthewmacleod wrote:
| This is such a weird meme to me. In what way is the thing you
| described at all "increasingly obvious"? We see this kind of
| statement all the time, but almost _never_ accompanied by any
| sort of actual rationale for it.
| mhh__ wrote:
| > but it is becoming increasingly obvious that maintaining
| backward compatibility in modern high performance parts is
| incredibly expensive and bug ridden
|
| Moving everything to ARM will not be cheap. As for bugs, that is
| entirely dependent on the company making the chip, which ones
| do you have in mind? (Also recall that M1 is vulnerable to
| Spectre too).
|
| I kind of hope x86's days are numbered as well, but I'm not
| looking forward to an ARM monoculture.
| [deleted]
| als0 wrote:
| > Moving everything to ARM will not be cheap
|
| I thought the same thing until I saw how well Apple's Rosetta
| 2 works. Now that we've seen what's possible, I'm hoping that
| the x64 emulation in Windows/ARM will rise to the challenge
| set by Apple.
| monocasa wrote:
| I imagine Apple has a patent on the cute optional TSO
| memory model that makes it work.
| mhh__ wrote:
| And to be able to use Rosetta 2 I only have to spend the
| combined value of all my computers again? Double that if I
| want the same amount of RAM and storage as I have now
| als0 wrote:
| You don't have to use Rosetta 2 - that was an example of
| a good implementation. I did mention the announced x64
| emulation on Windows, but that's only a preview release
| at the moment. If you're a Linux user the only thing I'm
| aware of is QEMU TCG but there might be faster projects
| out there.
| monocasa wrote:
| While some of the specifics are obviously x86-specific, I'm not
| sure that the underlying issues here are. From undocumented
| instructions to lack of documentation about speculation
| barriers, most of the root issues you see here are equally
| applicable to the cores in an M1.
| als0 wrote:
| > to lack of documentation about speculation barriers
|
| Can you elaborate on this? I was having a look at the M1
| instructions here [1] and it seems that they implement at
| least one of the barriers, CSDB, which is actually documented
| by ARM[2].
|
| [1] https://dougallj.github.io/applecpu/firestorm-int.html
|
| [2] https://developer.arm.com/documentation/ddi0596/2020-12/Base...
| monocasa wrote:
| There are explicit speculation barriers that they document,
| but they don't tell you what _other_ instructions are
| speculation barriers simply due to microarchitectural
| compromises like what you're seeing in this article.
| als0 wrote:
| OK thanks for clarifying. If there are explicit
| instructions for blocking speculation then why are you
| concerned about implicit barriers?
| phkahler wrote:
| Seems like implicit undocumented barriers could be used
| in critical code to provide an unfair performance advantage.
| Maybe. Might be a bit of a stretch.
| monocasa wrote:
| I'm not. My point is that this article was being unfairly
| interpreted by the parent as some black mark against x86.
| There's plenty to hold against it, but not really
| anything that's documented here. I probably should have
| made that clearer.
| mhh__ wrote:
| I'm kind of curious whether Apple aren't documenting M1
| because they can't be bothered, can't (not organized enough
| yet), or because it's their toy not ours.
| monocasa wrote:
| D) Scared that disclosing microarchitectural details will
| open them up to more patent fights that they have to pay
| off without a lot of benefit to end users or developers.
|
| E) All of the above
| retrac wrote:
| VME support was broken on Ryzen for a while before a microcode
| patch came out. The VME instructions are a relatively obscure,
| and originally Intel-proprietary, extension to i386's virtual
| 8086 mode. Introduced in the 90s to speed up DOS virtual
| machines under OS/2 and NT, I think.
|
| I don't know much about how stuff is implemented these days,
| but virtual 8086 mode involves some page table muckery and
| similar. Surely implementing it creates a larger exposure area,
| from a security standpoint.
| sesuximo wrote:
| A lot of code doesn't really care about speculative execution.
| It would be a shame to throw out decades of development and
| thousands of cpus just for one use case that didn't totally fit
| it.
| muricula wrote:
| Many years ago I worked with a greybeard AMD hardware designer.
| He told me that they commissioned a study about whether it made
| sense to ditch backwards compatibility, and realized that the
| parts of the CPU needed for the backwards compatibility they
| were willing to ditch accounted for less than 1% of die area,
| and of course were all already designed and battle tested.
|
| Unfortunately I don't have a better source for this than an
| anecdote.
| mhh__ wrote:
| That probably depends on whether you mean _old instructions
| now cause a fault_ or moving x86 to a new format entirely
| that doesn't need a decoder from hell
| jeffbee wrote:
| You could lose the entire x86-specific part of the die in a
| corner of a 512x512 FMAC unit. Die cost due to x86 complexity
| seemed like something worth attacking in 1990, when RISC was
| gaining mindshare, but over the years that complexity
| has not expanded very much while the other stuff on the die
| has gotten much larger.
___________________________________________________________________
(page generated 2021-03-25 23:00 UTC)