[HN Gopher] The Pentium FDIV bug, reverse-engineered
       ___________________________________________________________________
        
       The Pentium FDIV bug, reverse-engineered
        
       Author : croes
       Score  : 191 points
       Date   : 2024-12-11 18:33 UTC (3 days ago)
        
 (HTM) web link (oldbytes.space)
 (TXT) w3m dump (oldbytes.space)
        
       | perdomon wrote:
       | This dude pulled out a microscope and said "there's your
       | problem." Super impressive work. Really great micro-read.
        
         | layer8 wrote:
         | To be fair, he knew what the problem was (errors in the lookup
         | table) beforehand.
        
           | pests wrote:
           | It's not like the microscope is going to show "a lookup
           | table" though. You need to know how it was implemented in
            | silicon, how transistors are integrated into the silicon,
            | etc., to even start identifying the actual physical mistake.
        
         | sgerenser wrote:
          | I wonder at what node generation doing this type of thing
          | becomes impossible (due to features being too small to be
          | visible with an optical microscope). I would have guessed
          | sometime before the first Pentium, but obviously not.
        
           | poizan42 wrote:
           | I think the limit for discernible features when using an
           | optical microscope is somewhere around 200nm. This would put
           | the limit somewhere around the 250nm node size, which was
           | used around 1993-1998.
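            | 
            | As a rough back-of-the-envelope check (my own numbers,
            | assuming green light at ~550 nm and an oil-immersion
            | objective with NA ~1.4, not anything from the thread):
            | 
            |   # Abbe diffraction limit: d = wavelength / (2 * NA)
            |   wavelength_nm = 550        # green light
            |   numerical_aperture = 1.4   # good oil-immersion objective
            |   d_nm = wavelength_nm / (2 * numerical_aperture)
            |   print(f"~{d_nm:.0f} nm")   # prints ~196 nm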
        
       | Cumpiler69 wrote:
        | This is probably one of the reasons Intel went to a microcode
        | architecture afterwards.
        | 
        | I wonder how many yet-to-be-discovered silicone bugs are out
        | there on modern chips?
        
         | KingLancelot wrote:
          | Silicone is plastic; silicon is the element.
        
           | jghn wrote:
           | Not according to General Beringer of NORAD! :) [1]
           | 
           | [1] https://youtu.be/iRsycWRQrc8?t=82
        
         | wmf wrote:
         | They always used microcode:
         | https://www.righto.com/2022/11/how-8086-processors-microcode...
         | 
         | I'm not sure when Intel started supporting microcode updates
         | but I think it was much later.
        
           | LukeShu wrote:
           | Microcode updates came the very next generation with the
           | Pentium Pro.
        
         | Lammy wrote:
         | Older Intel CPUs were already using microcode. Intel went after
         | NEC with a copyright case over 8086 microcode, and after AMD
         | with a copyright case over 287/386/486 microcode:
         | 
         | - https://thechipletter.substack.com/p/intel-vs-nec-the-
         | case-o...
         | 
         | - https://www.upi.com/Archives/1994/03/10/Jury-backs-AMD-in-
         | di...
         | 
          | I would totally believe the FDIV bug is why Intel went to a
          | _patchable_ microcode architecture, however. See "Intel P6
         | Microcode Can Be Patched -- Intel Discloses Details of Download
         | Mechanism for Fixing CPU Bugs (1997)"
         | https://news.ycombinator.com/item?id=35934367
        
         | kens wrote:
         | Intel used microcode starting with the 8086. However, patchable
         | microcode wasn't introduced until the Pentium Pro. The original
         | purpose was for testing, being able to run special test
         | microcode routines. But after the Pentium, Intel realized that
         | being able to patch microcode was also good for fixing bugs in
         | the field.
        
           | peterfirefly wrote:
            | Being able to patch the microcode only solves some of the
            | possible problems a CPU can have.
           | 
           | My guess -- and I hope you can confirm it at some point in
           | the future -- is that more modern CPUs can patch other data
           | structures as well. Perhaps the TLB walker state machine,
           | perhaps some tables involved in computation (like the FDIV
           | table), almost certainly some of the decoder machinery.
           | 
            | How one makes a patchable parallel multi-stage decoder is
            | what I'd really like to know!
        
             | Tuna-Fish wrote:
              | Mostly, you can turn off parts of the CPU (so-called
              | chicken bits). They are invaluable for validation, but they
              | have also frequently been used for fixing broken CPUs. Most
              | recently, AMD turned off the loop buffer in Zen 4:
              | https://chipsandcheese.com/p/amd-disables-zen-4s-loop-buffer
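              | 
              | As a very rough sketch of the mechanism (the register
              | number and bit are made up; real chicken bits are model-
              | specific, mostly undocumented, and usually flipped by
              | firmware or microcode rather than from userspace):
              | 
              |   import os, struct
              | 
              |   MSR_ADDR = 0x1234       # hypothetical chicken-bit MSR
              |   CHICKEN_BIT = 1 << 9    # hypothetical feature-disable bit
              | 
              |   # /dev/cpu/N/msr needs the Linux msr module and root
              |   fd = os.open("/dev/cpu/0/msr", os.O_RDWR)
              |   val, = struct.unpack("<Q", os.pread(fd, 8, MSR_ADDR))
              |   os.pwrite(fd, struct.pack("<Q", val | CHICKEN_BIT),
              |             MSR_ADDR)
              |   os.close(fd)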
        
         | userbinator wrote:
         | Look at how long the public errata lists are, and use that as a
         | lower bound.
         | 
         | Related article: https://news.ycombinator.com/item?id=16058920
        
       | molticrystal wrote:
       | The story was posted a couple days ago and ken left a couple
       | comments there: https://news.ycombinator.com/item?id=42388455
       | 
       | I look forward to the promised proper write up that should be out
       | soon.
        
         | pests wrote:
          | There is a certain theme of posts on HN where I am just certain
          | the author is gonna be Ken, and once again I'm not
          | disappointed.
        
       | Thaxll wrote:
        | So nowadays this table could have been fixed with a microcode
        | update, right?
        
         | jeffbee wrote:
         | With a microcode update that ruins FDIV performance, sure. Even
         | at that time there were CPUs still using microcoded division,
         | like the AMD K5.
        
           | Netch wrote:
            | This division, using an SRT loop with 2 bits of output per
            | iteration, perhaps would already have been microcoded - but
            | using the lookup table as an accelerator. An alternative
            | could use a simpler approach (e.g. 1-bit-per-iteration "non-
            | restoring" division). Longer, but still within the normal
            | range.
            | 
            | But if they had understood the possible aftermath of an
            | untested block, they would have implemented two blocks and
            | switched to the older one if misbehaviour was detected.
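            | 
            | For the curious, a minimal integer sketch of such a
            | 1-bit-per-iteration non-restoring scheme (an illustration
            | only, not anyone's actual hardware):
            | 
            |   def nonrestoring_div(n, d, bits):
            |       """Unsigned n / d, one quotient bit per iteration."""
            |       q, r = 0, 0
            |       for i in range(bits - 1, -1, -1):
            |           bit = (n >> i) & 1
            |           # Add or subtract the divisor depending on the
            |           # sign of the partial remainder; never "restore"
            |           # inside the loop.
            |           r = (r << 1) + bit + (d if r < 0 else -d)
            |           q = (q << 1) | (1 if r >= 0 else 0)
            |       if r < 0:   # single final correction step
            |           r += d
            |       return q, r
            | 
            |   print(nonrestoring_div(13, 5, 4))   # (2, 3)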
        
         | phire wrote:
          | The table couldn't be fixed, but it could be bypassed.
         | 
         | The microcode update would need to disable the entire FDIV
         | instruction and re-implement it without using any floating
          | point hardware at all, at least for the problematic divisors.
         | It would be as slow as the software workarounds for the FDIV
         | bug (average penalty for random divisors was apparently 50
         | cycles).
         | 
         | The main advantage of a microcode update is that all FDIVs are
         | automatically intercepted system-wide, while the software
         | workarounds needed to somehow find and replace all FDIVs in the
         | target software. Some did it by recompiling, others scanned for
          | FDIV instructions in machine code and replaced them; both
          | approaches were problematic, and self-modifying code would be
         | hard to catch.
         | 
          | A microcode update _might_ have allowed Intel to argue their
          | way out of an extensive recall. But 50 cycles on average is a
          | massive performance hit; FDIV takes 19 cycles for single
          | precision. BTW, this microcode update would have killed
          | performance in Quake, which famously depended on floating-
          | point instructions (especially the expensive FDIV) running in
          | parallel with integer instructions.
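          | 
          | To illustrate the flavour of those software workarounds, a
          | sketch (the at-risk test below is a placeholder, not the exact
          | published check; the classic trick was reportedly to scale
          | both operands by 15/16 when the divisor looked risky):
          | 
          |   def divisor_at_risk(d):
          |       # Placeholder: the real check inspected a handful of
          |       # leading mantissa bit patterns of the divisor. Always
          |       # returning True shows the worst-case cost.
          |       return True
          | 
          |   def safe_fdiv(x, y):
          |       if divisor_at_risk(y):
          |           # Scaling both operands by 15/16 leaves the quotient
          |           # unchanged but moves the divisor's mantissa away
          |           # from the buggy lookup-table entries.
          |           return (x * (15.0 / 16.0)) / (y * (15.0 / 16.0))
          |       return x / y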
        
           | hakfoo wrote:
           | It's interesting that there's no "trap instruction/sequence"
           | feature built into the CPU architecture. That would
           | presumably be valuable for auditing and debugging, or to
           | create a custom instruction by trapping an otherwise unused
           | bit sequence.
        
             | j16sdiz wrote:
              | > create a custom instruction by trapping an otherwise
              | unused bit sequence ...
              | 
              | ... until a new CPU supports an instruction extension that
              | uses the same bit sequence.
        
               | immibis wrote:
               | That's why architectures - including x86! - have opcodes
               | they promise will always be undefined.
        
             | Tuna-Fish wrote:
             | There is today, for all the reasons you state. Transistor
             | budgets were tighter back then.
        
       | mega_dingus wrote:
        | Oh, to remember mid-90s humor:
       | 
       | How many Intel engineers does it take to change a light bulb?
       | 0.99999999
        
         | zoky wrote:
         | Why didn't Intel call the Pentium the 586? Because they added
         | 486+100 on the first one they made and got 585.999999987.
        
           | Dalewyn wrote:
            | Amusing joke, but it effectively is called the 586: the
            | internal name is P5, and the "penta" from which Pentium is
            | derived means five. [1]
           | 
           | Incidentally, Pentium M to Intel Core through 16th gen
           | Lunarrow Lake all identify as P6 ("Family 6") for 686 because
           | they are all based off of the Pentium 3.
           | 
           | [1]: https://en.wikipedia.org/wiki/Pentium_(original)
        
             | nayuki wrote:
             | Also, as per the page:
             | 
             | > Intel used the Pentium name instead of 586, because in
             | 1991, it had lost a trademark dispute over the "386"
             | trademark, when a judge ruled that the number was generic.
        
             | xattt wrote:
             | You're missing the fact that Intel wanted to differentiate
             | itself from the growing IA-32 clone chips from AMD and
             | Cyrix. 586 couldn't be trademarked, but Pentium could.
        
             | zusammen wrote:
             | The main reason is that it's impossible to trademark a
             | model number.
             | 
             | Also, Weird Al would not have been able to make a song,
             | "It's All About the 586es." It just doesn't scan.
        
       | jghn wrote:
       | An anecdote regarding this bug that always cracks me up. My
       | college roommate showed up with a shiny new pentium machine that
       | year, and kept bragging about how awesome it was. We used some
       | math software called Maple that was pretty intensive for PCs at
       | the time, and he thought he was cool because he could do his
       | homework on his PC instead of on one of the unix machines in the
       | lab.
       | 
       | Except that he kept getting wrong answers on his homework.
       | 
       | And then he realized that when he did it on one of the unix
       | machines, he got correct answers.
       | 
       | And then a few months later he realized why ....
        
         | mleo wrote:
          | The mention of Maple brings back vivid memories of freshman
          | year of college, when the math department decided to use the
         | software as part of instruction and no one understood how to
         | use it. There was a near revolt by the students.
        
           | kens wrote:
           | Coincidentally, I was a developer on Maple for a summer,
           | working on definite integration and other things.
        
           | bee_rider wrote:
            | And you know, a week before the semester, the TAs were given
            | a copy and told to figure it out.
        
             | WalterBright wrote:
              | > were given a copy and told to figure it out
             | 
             | Welcome to engineering. That's what we do. It's all we do.
        
               | Moru wrote:
                | But you get paid to do it; teachers are expected to do
                | this in their own time. :-)
               | 
               | "You know so much about computers, it's easy for you to
               | figure it out!"
        
               | WalterBright wrote:
               | I.e. the teachers are expected to learn the tools they
               | want to teach the students how to use? The horror!
        
               | bee_rider wrote:
                | Because of the way you did your quote, we've switched
                | from talking about the TAs to the teachers themselves.
               | This is sort of different, the teaching assistants don't
               | really have teaching goals and they usually aren't part
               | of the decision making process for picking what gets
               | taught, at least where I've been.
               | 
               | Anyway, as far as "learn the tools they want to teach the
               | students how to use" goes, I dunno, hard to say. I
               | wouldn't be that surprised to hear that some department
               | head got the wrong-headed idea that students needed to
               | learn more practical tools and skills, shit rolled
               | downhill, and some professor got stuck with the job.
               | 
               | Usually professors aren't expert tool users after all,
               | they are there for theory stuff. Hopefully nobody is
                | expecting to become a VS Code expert by listening to some
                | professor who uses Emacs or vim like a proper grey-beard.
        
           | rikthevik wrote:
           | I've never used Maple for work, only for survival. We had a
           | Calculus for Engineers textbook that was so riddled with
           | errors I had to use a combination of the non-Engineering
           | textbook and Maple to figure out wtf was going on. Bad
           | textbook, bad prof, bad TAs and six other classes. What a
           | mess. Those poor undergrads...
        
         | Timwi wrote:
         | "some math software called Maple". I still use a version of
         | Maple that I bought in 1998. I found subsequent versions of it
          | much harder to use, and I've never found open-source software
          | that could do what it can do. I don't need anything fancy, just
         | occasionally solve an equation or a system of equations, or
         | maybe plot a simple function. That old copy of Maple continues
         | to serve me extremely well.
        
           | zozbot234 wrote:
           | I assume that you're familiar with Maxima? It's perhaps the
            | most commonly used open-source CAS - there are some emerging
            | alternatives like SymPy.
           | 
           | That aside, yes it's interesting that an old program from
           | 1998 can still serve us quite well.
        
       ___________________________________________________________________
       (page generated 2024-12-14 23:01 UTC)