[HN Gopher] Speculating the Entire x86-64 Instruction Set in Sec...
       ___________________________________________________________________
        
       Speculating the Entire x86-64 Instruction Set in Seconds with One
       Weird Trick
        
       Author : muricula
       Score  : 78 points
       Date   : 2021-03-25 19:07 UTC (3 hours ago)
        
 (HTM) web link (blog.can.ac)
 (TXT) w3m dump (blog.can.ac)
        
       | blight wrote:
       | be sure to check out the data set extracted by this research over
       | at https://haruspex.can.ac/
        
       | alpb wrote:
       | Off-topic: This was posted yesterday, but got no attention. (I
       | tried to re-post yesterday, but got redirected to the existing
       | post.) https://news.ycombinator.com/item?id=26576032 I wonder
       | what makes HN disallow reposting the same URL within a short
       | period of time but allow it in the long term.
        
       | woodruffw wrote:
       | This is a really clever technique! I was impressed by
       | sandsifter[1] when it originally came out, and this seems an
       | awful lot faster and less prone to false negatives (since it's
       | purely speculative and doesn't require sandsifter's `#PF` hack).
       | 
       | At the risk of unwarranted self-promotion: the other side of this
       | equation is fidelity in _software_ instruction set decoders.
       | x86's massive size and layers of historical complexity make it among
       | the most difficult instruction formats to accurately decode; I've
       | spent a good part of the last two years working on a fuzzer
       | that's discovered thousands of bugs in various popular x86
       | decoders[2][3].
       | 
       | [1]: https://github.com/xoreaxeaxeax/sandsifter
       | 
       | [2]: https://github.com/trailofbits/mishegos
       | 
       | [3]: https://ww.easychair.org/publications/preprint_download/1LHr
        
       | [deleted]
        
       | TrainedMonkey wrote:
       | This just cinched it for me: we need to sunset hardware x86 and
       | run code that cannot be recompiled on emulators. x86 had a good
       | run, but it is becoming increasingly obvious that maintaining
       | backward compatibility in modern high-performance parts is
       | incredibly expensive and bug-ridden.
       | 
       | At this point sunsetting x86 is not even a pipe dream: most
       | people carry ARM-powered computers in their pockets, and Apple
       | recently demonstrated that ARM can be quite successful in
       | high-performance devices.
        
         | matthewmacleod wrote:
         | This is such a weird meme to me. In what way is the thing you
         | described at all "increasingly obvious"? We see this kind of
         | statement all the time, but almost _never_ accompanied by any
         | sort of actual rationale for it.
        
         | mhh__ wrote:
         | > but it is becoming increasingly obvious that maintaining
         | backward compatibility in modern high-performance parts is
         | incredibly expensive and bug-ridden
         | 
         | Moving everything to ARM will not be cheap. As for bugs, that
         | depends entirely on the company making the chip; which ones do
         | you have in mind? (Also recall that the M1 is vulnerable to
         | Spectre too.)
         | 
         | I kind of hope x86's days are numbered as well, but I'm not
         | looking forward to an ARM monoculture.
        
           | [deleted]
        
           | als0 wrote:
           | > Moving everything to ARM will not be cheap
           | 
           | I thought the same thing until I saw how well Apple's Rosetta
           | 2 works. Now that we've seen what's possible, I'm hoping that
           | the x64 emulation in Windows/ARM will rise to the challenge
           | set by Apple.
        
             | monocasa wrote:
             | I imagine Apple has a patent on the cute optional TSO
             | memory model that makes it work.
        
             | mhh__ wrote:
             | And to be able to use Rosetta 2 I only have to spend the
             | combined value of all my computers again? Double that if I
             | want the same amount of RAM and storage as I have now.
        
               | als0 wrote:
               | You don't have to use Rosetta 2 - that was an example of
               | a good implementation. I did mention the announced x64
               | emulation on Windows, but that's only a preview release
               | at the moment. If you're a Linux user, the only thing I'm
               | aware of is QEMU TCG, but there might be faster projects
               | out there.
        
         | monocasa wrote:
         | While some of the specifics are obviously x86-specific, I'm not
         | sure that the underlying issues here are. From undocumented
         | instructions to lack of documentation about speculation
         | barriers, most of the root issues you see here are equally
         | applicable to the cores in an M1.
        
           | als0 wrote:
           | > to lack of documentation about speculation barriers
           | 
           | Can you elaborate on this? I was having a look at the M1
           | instructions here [1] and it seems that they implement at
           | least one of the barriers, CSDB, which is actually documented
           | by ARM[2].
           | 
           | [1] https://dougallj.github.io/applecpu/firestorm-int.html
           | 
           | [2] https://developer.arm.com/documentation/ddi0596/2020-12/Base...
        
             | monocasa wrote:
             | There are explicit speculation barriers that they document,
             | but they don't tell you what _other_ instructions are
             | speculation barriers simply due to microarchitectural
             | compromises like what you're seeing in this article.
        
               | als0 wrote:
               | OK, thanks for clarifying. If there are explicit
               | instructions for blocking speculation then why are you
               | concerned about implicit barriers?
        
               | phkahler wrote:
               | Seems like implicit undocumented barriers could be used
               | in critical code to provide an unfair performance
               | advantage. Maybe. Might be a bit of a stretch.
        
               | monocasa wrote:
               | I'm not. My point is that this article was being unfairly
               | interpreted by the parent as some black mark against x86.
               | There's plenty to hold against it, but not really
               | anything that's documented here. I probably should have
               | made that clearer.
        
           | mhh__ wrote:
           | I'm kind of curious whether Apple aren't documenting M1
           | because they can't be bothered, can't (not organized enough
           | yet), or because it's their toy not ours.
        
             | monocasa wrote:
             | D) Scared that disclosing microarchitectural details will
             | open them up to more patent fights that they have to pay
             | off without a lot of benefit to end users or developers.
             | 
             | E) All of the above
        
         | retrac wrote:
         | VME support was broken on Ryzen for a while before a microcode
         | patch came out. The VME instructions are a relatively obscure
         | and originally Intel-proprietary extension to i386's virtual
         | 8086 mode, introduced in the 90s to speed up DOS virtual
         | machines under OS/2 and NT, I think.
         | 
         | I don't know much about how stuff is implemented these days,
         | but virtual 8086 mode involves some page table muckery and
         | similar. Surely implementing it creates a larger exposure area,
         | from a security standpoint.
        
         | sesuximo wrote:
         | A lot of code doesn't really care about speculative execution.
         | It would be a shame to throw out decades of development and
         | thousands of CPUs just for one use case that didn't totally fit
         | it.
        
         | muricula wrote:
         | Many years ago I worked with a greybeard AMD hardware designer.
         | He told me that they commissioned a study about whether it made
         | sense to ditch backwards compatibility, and realized that the
         | backwards-compatibility parts of the CPU they were willing to
         | ditch accounted for less than 1% of die area, and of course
         | were all already designed and battle-tested.
         | 
         | Unfortunately I don't have a better source for this than an
         | anecdote.
        
           | mhh__ wrote:
           | That probably depends on whether you mean _old instructions
           | now cause a fault_ or moving x86 to a new format entirely
           | that doesn't need a decoder from hell
        
           | jeffbee wrote:
           | You could lose the entire x86-specific part of the die in a
           | corner of a 512x512 FMAC unit. Die cost due to x86 complexity
           | seemed like something worth attacking in 1990, when RISC was
           | gaining mindshare, but over the years that complexity has not
           | expanded very much while the other stuff on the die has
           | gotten much larger.
        
       ___________________________________________________________________
       (page generated 2021-03-25 23:00 UTC)