[HN Gopher] Intel's $475M error: the silicon behind the Pentium ...
       ___________________________________________________________________
        
       Intel's $475M error: the silicon behind the Pentium division bug
        
       Author : gslin
       Score  : 330 points
       Date   : 2024-12-28 21:48 UTC (1 day ago)
        
 (HTM) web link (www.righto.com)
 (TXT) w3m dump (www.righto.com)
        
       | kens wrote:
       | Author here if anyone has Pentium questions :-)
       | 
       | My Mastodon thread about the bug was on HN a few weeks ago, so
       | this might seem familiar, but now I've finished a detailed blog
       | post. The previous HN post has a bunch of comments:
       | https://news.ycombinator.com/item?id=42391079
        
         | mras0 wrote:
          | Great article and analysis as always, thanks! Somewhat crazy to
          | remember that a (as you argue) minor CPU erratum made worldwide
          | headlines. There are so many worse ones out there, from Intel
          | (as you mention) and others, that are completely forgotten.
         | 
         | For the Pentium, I'm curious about the FPU value stack (or
         | whatever the correct term is) rework they did. It's been a long
          | time, but didn't they do some kind of early "register renaming"
          | thing that you had to manage manually with careful fxchg's?
        
           | lallysingh wrote:
           | AFAIK, the FPU was a stack calculator. So you pushed things
           | on and ran calculations on the stack.
           | https://en.wikibooks.org/wiki/X86_Assembly/Floating_Point
        
             | Sesse__ wrote:
             | It's only a stack machine in front, really. Behind-the-
             | scenes, it's probably just eight registers (the stack is a
             | fixed size, it doesn't spill to memory or anything).
        
           | Sesse__ wrote:
           | Yes, internally fxch is a register rename--_and_ fxch can go
           | in the V-pipe and takes only one cycle (Pentium has two
           | pipes, U and V).
           | 
           | IIRC fadd and fmul were both 3/1 (three cycles latency, one
           | cycle throughput), so you'd start an operation, use the free
           | fxch to get something else to the top, and then do two other
           | operations while you were waiting for the operation to
           | finish. That way, you could get long strings of FPU
           | operations at effectively 1 op/cycle if you planned things
           | well.
           | 
           | IIRC, MSVC did a pretty good job of it, too. GCC didn't,
           | really (and thus Pentium GCC was born).
        
             | ack_complete wrote:
             | FMUL could only be issued every other cycle, which made
             | scheduling even more annoying. Doing something like a
             | matrix-vector multiplication was a messy game of
             | FADD/FMUL/FXCH hot potato since for every operation one of
             | the arguments had to be the top of the stack, so the TOS
             | was constantly being replaced.
             | 
             | Compilers got pretty good at optimizing straight line math
             | but were not as good at cases where variables needed to be
             | kept in the stack during a loop, like a running sum. You
             | had to get the order of exchanges just right to preserve
             | stack order across loop iterations. The compilers at the
             | time often had to spill to memory or use multiple FXCHs at
             | the end of the loop.
        
               | Sesse__ wrote:
               | > FMUL could only be issued every other cycle, which made
               | scheduling even more annoying.
               | 
               | Huh, are you sure? Do you have any documentation that
               | clarifies the rules for this? I was under the impression
               | that something like `FMUL st, st(2) ; FXCH st(1), FMUL
               | st, st(2)` would kick off two muls in two cycles, with no
               | stall.
        
               | Tuna-Fish wrote:
               | Agner Fog's manuals are clear on this. Only the last of
               | FMUL's 3 cycles can overlap with another FMUL.
               | 
               | You can immediately overlap with a FADD.
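The latency-hiding trick in this subthread can be put into a rough throughput model. This is only a sketch using the 3-cycle latency / 1-cycle throughput numbers quoted above; it ignores the U/V pairing rules and the FMUL issue restriction just discussed:

```python
# Rough model: `chains` independent dependency chains of FP ops, each op
# having 3-cycle latency and 1-op/cycle issue (the numbers quoted above).
# Steady-state throughput is min(1, chains/latency) ops per cycle; a few
# extra cycles are added at the end to let the pipeline drain.
def approx_cycles(num_ops, chains, latency=3):
    throughput = min(1.0, chains / latency)   # ops retired per cycle
    return num_ops / throughput + latency

serial = approx_cycles(300, chains=1)       # one dependent chain: 903.0
interleaved = approx_cycles(300, chains=3)  # three chains via FXCH: 303.0
```

With three independent chains rotated to the top of stack by free FXCHs, throughput approaches 1 op/cycle, matching the "long strings of FPU operations at effectively 1 op/cycle" claim above.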
        
         | icehawk wrote:
         | I was about to ask if the explanation of floating point numbers
         | was using Avogadro's number on purpose, but then I realized the
         | other number was Planck's constant.
        
           | kens wrote:
           | Yes, I wanted to use meaningful floating point examples
           | instead of random numbers. You get a gold star for noticing
           | :-)
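For anyone curious, the significand/exponent split of those two constants is easy to inspect; `math.frexp` is the standard way to decompose a double:

```python
import math

# math.frexp(x) returns (m, e) with x == m * 2**e and 0.5 <= |m| < 1,
# exposing the significand and binary exponent of an IEEE-754 double.
avogadro = 6.02214076e23   # Avogadro's number
planck = 6.62607015e-34    # Planck's constant

m_a, e_a = math.frexp(avogadro)  # e_a == 79
m_p, e_p = math.frexp(planck)    # e_p == -110
```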
        
         | skissane wrote:
         | > The bug is presumably in the Pentium's voluminous microcode.
         | The microcode is too complex for me to analyze, so don't expect
         | a detailed blog post on this subject.
         | 
         | How hard is it to "dump" the microcode into a bitstream? Could
          | it be done programmatically from high-resolution die
         | photographs? Of course, I appreciate that's probably the easy
         | part in comparison to reverse engineering what the bitstream
         | means.
         | 
         | > By carefully examining the PLA under a microscope
         | 
         | Do you do this stuff at home? What kind of equipment do you
         | have in your lab? How did you develop the skills to do all
         | this?
        
           | kens wrote:
           | Dumping the microcode into a bitstream can be done in an
           | automated way if you have clear, high-resolution die photos.
            | There are programs to generate ROM bitstreams from photos.
           | Part of the problem is removing all the layers of metal to
           | expose the transistors. My process isn't great, so the
           | pictures aren't as clear as I'd like. But yes, the hard part
           | is figuring out what the microcode bitstream means. Intel's
           | patents explained a lot about the 8086 microcode structure,
           | but Intel revealed much less about later processors.
           | 
           | I do this stuff at home. I have an AmScope metallurgical
           | microscope; a metallurgical microscope shines light down
           | through the lens, rather than shining the light from
           | underneath like a biological microscope. Thus, the
           | metallurgical microscope works for opaque chips. The Pentium
           | is reaching the limits of my microscope, since the feature
           | size is about the wavelength of light. I don't have any
           | training in this; I learned through reading and
           | experimentation.
        
             | dekhn wrote:
             | One tidbit to add about scopes: some biological scopes do
             | use "epi" illumination like metallurgical scopes. It's
             | commonly used on high end scopes, in combination with laser
             | illumination and fluorescence. They are much more
             | complicated and require much better alignment than a
             | regular trans illumination scope.
             | 
             | I suppose you might be able to get slightly better
             | resolution using a shorter wavelength, but at that point,
             | it requires a lot of technical skill and environmental
              | conditions and time and money. Just getting to the point
             | you've reached (and knowing what the limitations are) can
             | be satisfying in itself.
        
         | ernst_mulder wrote:
         | Thank you very much for this detailed article.
         | 
         | I never realised this is how floating point division can be
         | implemented. Actually funny how I didn't realise that multiple
         | integer division steps are required to implement floating point
         | division :-)
         | 
         | In hindsight one could wonder why the unused parts of the
         | lookup table were not filled with 2 and -2 in the first place.
        
         | ksec wrote:
         | In my view, this $475M was perhaps the best marketing spend for
         | Intel. Because of the bug and recall, everyone including those
         | not in tech knew about Intel. Coming from the 486 when people
         | were expecting 586 or 686 but then suddenly "Pentium", this bug
         | and recall built a reputation and good will that carried on
         | later with Pentium MMX.
        
           | wmf wrote:
           | Nah, Intel already did a big Pentium marketing blitz with the
           | bunny people before this bug.
        
             | xattt wrote:
             | Bunny people were part of the MMX and PII marketing.
        
       | fourseventy wrote:
       | Didn't Intel have floating point division issues more recently as
       | well?
        
         | hinkley wrote:
         | Inflation adjusted that's over 1 billion today. And they do
         | more mitigations with microcode these days.
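The inflation math checks out roughly. This back-of-the-envelope uses approximate CPI-U index values (about 148 for 1994 and 314 for 2024); those index numbers are my assumption, not from the thread:

```python
# Approximate CPI-U adjustment of the $475M 1994 recall charge.
recall_1994 = 475e6
cpi_1994, cpi_2024 = 148.0, 314.0   # approximate annual index values
adjusted = recall_1994 * cpi_2024 / cpi_1994
# adjusted comes out around 1.0e9, i.e. "over 1 billion today"
```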
        
           | xattt wrote:
           | Some irony if internal calculations of financial damage
           | estimates were under or over-estimated because they were done
           | on a defective chip.
        
           | fuzztester wrote:
           | yes.
           | 
           | their "1 in a billion" (excuse) became $1 billion (cost to
           | them).
           | 
           | of course, the CEOs not only go scot-free, but get to bail
           | out with their golden parachutes, while the shareholders and
           | public take the hit.
           | 
           | https://en.m.wikipedia.org/wiki/Golden_parachute
        
             | bolognafairy wrote:
             | Those evil CEOs pulling the wool over the eyes of the poor
             | shareholders!
             | 
             | This is the employment contract that was negotiated and
             | agreed to by the board / shareholders.
        
               | fuzztester wrote:
               | Where yuh bin livin all yore laafe, pilgrim? Under a
               | Boulder in Colorado, mebbe? Dontcha know dat contracts
               | can be gamed, and hev bin fer yeers, if not deecades? Dis
               | here ain't Aahffel, ya know?
               | 
               | Come eat some chili widdus.
               | 
               | Id'll shore put some hair on yore chest, and grey cells
               | in yore coconut.
               | 
               | # sorry, in a punny mood and too many spaghetti western
               | movies
        
             | KennyBlanken wrote:
             | Oh, it gets even better. US taxpayers are giving them
             | billions for "national security" reasons.
             | 
             | Nothing like giving piles of cash to a grossly incompetent
             | company (the Pentium math bug, Puma cablemodem issues,
             | their shitty 4G cellular radios, extensive issues with
             | gigabit and 2.5G network interfaces, and now the whole
             | 13th/14th gen processor self-destruction mess.)
        
         | kens wrote:
         | There's an FSIN trig inaccuracy, but I don't know of other
         | division issues:
         | https://randomascii.wordpress.com/2014/10/09/intel-underesti...
        
       | hinkley wrote:
       | > Intel's whitepaper claimed that a typical user would encounter
       | a problem once every 27,000 years, insignificant compared to
       | other sources of error such as DRAM bit flips.
       | 
       | > However, IBM performed their own analysis,29 suggesting that
       | the problem could hit customers every few days.
       | 
       | I bet these aren't as far off as they seem. Intel seems to be
       | considering a single user, while I suspect IBM is thinking in
       | terms of support calls.
       | 
        | This is a problem I've had at work. When you process 100
        | million requests a day, the one-in-a-billion problem is hitting
        | you a few times a month. If it's something a customer or, worse,
        | a manager notices, they ignore the denominator and suspect you
        | all of incompetence. Four times a month can translate into "all
        | the time" in the manner humans bias their experiences. If you
        | get two statistical clusters of three in a week, someone will
        | lose their shit.
        
         | kens wrote:
         | No, IBM's estimate is for a single user. IBM figures that a
         | typical spreadsheet user does 5000 divides per second when
         | recalculating and does 15 minutes of recalculating a day. IBM
         | also figures that the numbers people use are 90 times as likely
         | to cause an error as Intel's uniformly-distributed numbers. The
         | result is one user will have an error every 24 days.
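Those assumptions can be multiplied out directly. The 1-in-9-billion base rate is Intel's published estimate for random operands; the 90x factor and the usage figures are IBM's, as summarized above:

```python
# IBM's back-of-the-envelope, per the summary above.
divides_per_day = 5000 * 60 * 15            # 5000/sec, 15 min of recalc/day
intel_error_rate = 1 / 9e9                  # Intel: 1 in 9 billion divides
ibm_error_rate = 90 * intel_error_rate      # IBM: real operands ~90x worse
days_per_error = 1 / (ibm_error_rate * divides_per_day)
# days_per_error is about 22, in the ballpark of IBM's every-24-days figure
```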
        
           | hinkley wrote:
           | Ah.
           | 
           | The other failure mode that occurred to me is that if a
            | spreadsheet is involved you could keep running the same calc
           | on a bad input for months or even years when aggregating
           | intermediate values over units of time. A problem that
           | happens every time you run a calculation is very different
           | from one that happens at random. Better in some ways and
           | worse in others.
        
           | jiggawatts wrote:
           | That's also a clearly flawed analysis, because the numbers
           | mostly don't change between re-computations of the
           | spreadsheet cell values!
           | 
           | E.g.: Adding a row doesn't invalidate calculations for
           | previous rows in typical spreadsheet usage. The bug is
           | deterministic, so repeating successful calculations over and
           | over with the same numbers won't ever trigger the bug.
        
             | kens wrote:
             | Yes, the book "Inside Intel" makes the same argument about
             | spreadsheets (p364). My opinion is that Intel's analysis is
             | mostly objective, while IBM's analysis is kind of a scam.
        
               | cornholio wrote:
               | IBM's result is correct if we interpret "one user
               | experiences the problem every few days" as "one in a
               | million users will experience the problem 5000 times a
               | second, for 15 minutes every day they use the spreadsheet
               | with certain values". It's an average that makes no
               | sense.
        
               | wat10000 wrote:
               | Spreadsheets Georg....
        
       | dboreham wrote:
       | Another great article from Ken. I remember this particularly
       | because the first PC that I bought with my own money had an
       | affected CPU. Prior to this era I hadn't been much interested in
       | PCs because they couldn't run "real" software. But Windows NT
        | changed that (thank you Mr. Cutler), and Taiwanese-sourced
        | low-cost motherboards made it practical to build your own
        | machine, as many people still do today. Ken touched on the fact
        | that it was
       | easy for users to check if their CPU was affected. I remember
       | that this was as easy as typing a division expression with the
       | magic numbers into Excel. If MS had released a version of Excel
       | that worked around the bug, I suspect fewer users would have
       | claimed their replacement device!
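The magic-number check being referred to uses the widely circulated FDIV test values; on a correct FPU the residue below is exactly zero, while a flawed Pentium famously returned 256:

```python
# The classic FDIV test: 4195835/3145727. On IEEE-754-correct hardware
# the rounded quotient multiplies back to the numerator exactly, so the
# residue is 0.0. A flawed Pentium computed the quotient as ~1.33374
# instead of ~1.33382 and produced a residue of 256.
x, y = 4195835.0, 3145727.0
residue = x - (x / y) * y   # 0.0 on a correct FPU
```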
        
         | ryao wrote:
         | Couldn't these PCs run 386BSD?
        
           | wmf wrote:
           | Yeah, there was BSD, Coherent, SCO, Xenix, etc. Arguably OS/2
           | was also a "real" operating system.
        
       | urbandw311er wrote:
       | What an interesting and utterly dedicated analysis. Thank you so
       | much for all your work analysing the silicon and sharing your
       | findings. I particularly like how you're able to call out Intel
        | on the actual root cause, which their PR made sound like
        | something analogous to a trivial omission but which was, in
        | fact, less forgivable and more blameworthy: they stuffed up
        | their table generation algorithm.
        
       | ThrowawayTestr wrote:
       | >Smith posted the email on a Compuserve forum, a 1990s version of
       | social media.
       | 
       | I hate how this sentence makes me feel.
        
         | lizzas wrote:
         | My initial feeling is: that data is probably mostly unmined and
         | lost. Lucky bastards!
        
         | lotsofpulp wrote:
         | I like to use the 1900s instead of the 1990s.
        
           | lazide wrote:
           | Does it help, or make it worse, if you say it as 'late
           | 1900s'?
        
       | tgma wrote:
       | Intel $475B error: not building a decent GPU
        
         | lizzas wrote:
         | Lack of clairvoyance? Missing out on mobile was more obvious
         | tho.
        
           | phire wrote:
           | More explicitly. In 2006, Apple asked Intel to make a SoC for
           | their upcoming product... the iPhone.
           | 
            | At the time, Intel was one of the leading ARM SoC providers;
            | their custom XScale ARM cores were faster than anything from
            | ARM Inc themselves. It was the perfect line of chips for
            | smartphones.
           | 
            | The MBA types at Intel ran some sales projections and decided
            | that such a chip wasn't likely to be profitable. There was
            | apparently debate within Intel: the engineering types wanted
            | to develop the product line anyway, and others wanted to win
            | good-will from Apple. But the MBA types won. Not only did
            | they reject Apple's request for an iPhone SoC, but they
            | immediately sold off their entire XScale division to Marvell
            | (who did nothing with it) so they wouldn't even be able to
            | change their mind later even if they wanted to.
           | 
            | With hindsight, I think we can safely say Intel's projections
            | for iPhone sales were very wrong. They would have easily made
            | their money back on just the sales from the first-gen iPhone,
            | and Apple would probably have gone back to Intel for at least
            | a few generations. Even if Apple dumped them, Intel would have
            | had a great product to sell to the rapidly growing market of
            | Android smartphones in the early 2010s.
           | 
           | -----------
           | 
           | But I think it's actually far worse than just Intel missing
           | out on the mobile market.
           | 
           | In 2008, Apple acquired P.A. Semi, and started work on their
           | own custom ARM processors (and ARM SoCs). The ARM processors
            | which Apple eventually used to replace Intel as supplier in
           | laptops and desktops too.
           | 
           | Maybe Apple would have gone down that path anyway, but I
           | really suspect Intel's reluctance to work with Apple to
           | produce the chips Apple wanted (especially the iPhone chip)
           | was a huge motivating factor that drove Apple down the path
           | of developing their own CPUs.
           | 
            | Remember, this is 2006. Apple had only just switched to Intel
            | in January because IBM had continually failed to deliver the
            | laptop-class PowerPC chips Apple needed _[1]_. And
           | while at that time, Intel had a good roadmap for laptop-class
           | chips, it would have looked to Apple as if history was at
           | risk of repeating itself, especially as they moved into the
           | mobile market where low power consumption was even more
           | important.
           | 
           |  _[1]_ _TBH, IBM were failing to provide desktop-class CPUs
           | too. But the laptop cpus were the more pressing issue. Fun
           | fact: IBM actually tried to sell the PowerPC core they were
           | developing for the xbox 360 and PS3 to Apple as a low-power
            | laptop core. It was sold to Microsoft/Sony as a low-power
           | core too, but if you look at the launch versions of both
           | consoles, they run extremely hot, even when paired with
           | comically large (for the era) cooling solutions._
        
             | scarface_74 wrote:
             | > More explicitly. In 2006, Apple asked Intel to make a SoC
             | for their upcoming product... the iPhone.
             | 
              | This isn't strictly true. Tony Fadell - the creator of the
              | iPod and considered co-creator of the iPhone - said in an
              | interview with Ben Thompson (Stratechery) that Intel was
              | never seriously in the running for iPhone chips.
             | 
             | Jobs wanted it. But the technical people at Apple pushed
             | back.
             | 
              | Besides, in 2006, less than a year before the iPhone was
              | introduced, chip decisions had already been made.
        
           | cylemons wrote:
            | Was it really? x86 is more performance-oriented than
            | efficiency-oriented. Its variable-length encoding just makes
            | it really hard to have a low-power CPU that isn't too slow.
        
             | tgma wrote:
             | I think the impact of ISA is way overblown. The instruction
             | decode pipeline is worse but doesn't consume that many
             | transistors in the end relative to the total size of the
             | system. I think it has much more to do with the attitude of
             | Intel defining the x86 market as desktop and servers and
             | not focused on super low power parts; plus their monopoly
             | which led to a long stagnation because they didn't have to
             | innovate as much.
             | 
              | You can see it today with modern Ryzen laptop chips,
              | which aren't that much worse on perf/watt than ARM chips
              | fabbed on the same node.
        
               | adrian_b wrote:
               | For applications where the performance is determined by
               | array operations, which can leverage AVX-512
               | instructions, an AMD Zen 5 core has better performance
               | per area and per power than any ARM-based core, with the
               | possible exception of the Fujitsu custom cores.
               | 
               | The Apple cores themselves do not have great performance
               | for array operations, but when considering the CPU cores
               | together with the shared SME/AMX accelerator, the
               | aggregate might have a good performance per area and per
               | power consumption, but that cannot be known with
               | certainty, because Apple does not provide information
               | usable for comparison purposes.
               | 
               | The comparison is easy only with the cores designed by
               | Arm Holdings. For array operations, the best performance
               | among the Arm-designed cores is obtained by Cortex-X4
               | a.k.a. Neoverse V3. Cortex-A720 and Cortex-A725 have half
               | of the number of SIMD pipelines but more than half of the
               | area, while Cortex-X925 has only 50% more SIMD pipelines
                | but a double area. Intel's Skymont a.k.a. Darkmont has
                | the same area and the same number of SIMD pipelines as
                | Cortex-X4, so like Cortex-X4 it is also more efficient
                | than the much bigger Lion Cove core, which is faster on
                | average for non-optimized programs but has the same
                | maximum throughput for optimized programs.
               | 
               | When compared with Cortex-X4/Neoverse V3, a Zen 5 compact
               | core has a throughput for array operations that can be up
               | to double, while the area of a Zen 5 compact core is less
               | than double the area of an Arm Cortex-X4. A high-clock
               | frequency Zen 5 core has more than double the area of a
               | Cortex-X4, but due to the high clock frequency it still
               | has a better performance per area, even if it no longer
               | has also a better performance per power consumption, like
               | the Zen 5 compact cores.
               | 
               | So the advantage in ISA of Aarch64, which results in a
               | simpler and smaller CPU core frontend, is not enough to
               | ensure better performance per area and per power
               | consumption when the backend, i.e. the execution units,
               | does not have itself a good enough performance per area
               | and per power consumption.
               | 
               | The area of Arm Cortex-X4 and of the very similar Intel
               | Skymont core is about 1.7 square mm in a "3 nm" TSMC
               | process (both including 1 MB of L2 cache memory). The
               | area of a Zen 5 compact core in a "4 nm" TSMC process
               | (with 1 MB of L2) is about 3 square mm (in Strix Point).
               | The area of a Zen 5 compact core with full SIMD pipelines
               | must be greater, but not by much, perhaps by 10%, and if
                | it were done in the same "3 nm" process as Cortex-X4
                | and Skymont, the area would shrink, perhaps by 20% to
               | 25% (depending on the fraction of the area occupied by
               | SRAM). In any case there is little doubt that the area in
               | the same fabrication process of a Zen 5 compact with full
               | 512-bit SIMD pipelines would be less than 3.4 square mm
               | (= double Cortex-X4), leading to a better performance per
               | area and per power consumption than for either Cortex-X4
               | or Skymont (this considers only the maximum throughput
               | for optimized programs, but for non-optimized programs
               | the advantage could be even greater for Zen 5, which has
               | a higher IPC on average).
               | 
               | Cores like Arm Cortex-X4/Neoverse V3 (also Intel
               | Skymont/Darkmont) are optimal from the POV of performance
               | per area and power consumption only for applications that
               | are dominated by irregular integer and pointer
               | operations, which cannot be accelerated using array
               | operations (e.g. for the compilation of software
               | projects). Until now, with the exception of the Fujitsu
               | custom cores, which are inaccessible for most computer
               | users, no Arm-based CPU core has been suitable for
               | scientific/technical computing, because none has had
               | enough performance per area and per power consumption,
               | when performing array operations. For a given socket,
               | both the total die area inside the package and the total
               | power consumption are limited, so the performance per
               | area and per power consumption of a CPU core determines
               | the performance per socket that can be achieved.
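The area arithmetic above can be reduced to one line. All inputs below are the commenter's estimates, taken at face value:

```python
# Commenter's estimates: Cortex-X4 ~1.7 mm^2 ("3 nm", with 1 MB L2);
# a Zen 5 compact core with full 512-bit SIMD on the same node bounded
# below ~3.4 mm^2, with roughly 2x the array-operation throughput.
cortex_x4_area = 1.7
zen5c_area_upper_bound = 3.4
throughput_ratio = 2.0
perf_per_area_ratio = throughput_ratio / (zen5c_area_upper_bound / cortex_x4_area)
# ~1.0 even at the area upper bound; any smaller actual area means
# strictly better performance per area, as the comment argues
```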
        
               | scarface_74 wrote:
               | Innovate on _what_ though? There was no market for
               | performant very low power chips before the iPhone and
               | then Android took off.
               | 
               | I am sure if IBM had more of a market than the minuscule
               | Mac market for laptop class PPC chips back in 2005, they
               | could have poured money into making that work.
               | 
               | Even today, I doubt it would be worth Apple's money to
               | design and manufacture its own M class desktop chips just
               | for around 25 million Macs + iPads if they weren't
                | reusing a lot of the R&D.
        
               | tgma wrote:
               | In 2010s, Intel pretty much sold the same Haswell design
               | for more than half a decade and lipsticked the pig. It is
               | not just low power that they missed. They had time to
               | improve the performance/watt for server use, add core
               | counts, do big-little, improve the iGPU, etc.
               | 
               | They just sat on it, their marketing dept made fancy
               | boxes for high end CPUs and their HR department innovated
               | DEI strategies.
        
               | JustExAWS wrote:
               | Yes I'm sure that Intel fell behind because a for profit
               | company was more concerned with hiring minorities than
               | hiring the best employees they could find.
               | 
               | It's amazing that the "take responsibility", "pull
               | yourself up by your bootstraps crowd" has now become the
               | "we can't get ahead because of minorities crowd"
        
               | tgma wrote:
               | Huh, it's not clear what you are suggesting. Who's "we"
               | and who's not taking responsibility?
               | 
               | The best people were clearly not staying at Intel and
               | they have been winning hard at AMD, Tesla, NVIDIA, Apple,
               | Qualcomm, and TSMC, in case you have not been paying
               | attention. They could not stop winning and getting ahead
               | in the past 5-10 years, in fact. So much semiconductor
               | innovation happened.
               | 
               | Yes, if you start promoting the wrong people, very
               | quickly the best ones leave. No one likes to report to
               | their stupid peer who just got promoted or the idiot they
               | hire from the outside when there are more qualified
               | people they could promote from within.
               | 
               | --
               | 
               | And re marketing boxes, just check out where Intel chose
               | to innovate:
               | 
               | https://www.reddit.com/r/intel/comments/15dx55m/which_i9_
               | box...
        
               | JustExAWS wrote:
               | The problem with Intel weren't the technical people. It
               | started with the board laying off people, borrowing money
               | to pay dividends to investors, bad strategy, not building
               | relationships with customers who didn't want to work with
                | them for fabs, etc., and then firing the CEO who had a
                | strategy that they knew was going to take years to
                | implement.
               | 
               | It wasn't because of "DI&E" initiatives and a refusal to
               | hire white people
        
             | phire wrote:
             | Intel had a leading line of ARM SoCs from 2002-2006. Some
             | of the best on the market for PDAs and smartphones. Their
             | XScale SoCs were very popular.
             | 
             | But Intel gave up and sold it off, right as smartphones
             | were reaching mainstream.
        
               | tgma wrote:
               | They sold XScale to Marvell which ironically has a higher
               | market cap than Intel.
        
         | bee_rider wrote:
         | Their iGPUs are good enough for day-to-day (non gaming)
         | computer use and rock-solid in Linux.
        
           | tgma wrote:
           | Good enough? Maybe better today, but they have been god awful
           | compared to AMD and absolute garbage compared to something
           | like M1 iGPU. They are responsible for more than half of the
           | pain inflicted on users in Vista days.
           | 
           | Ironically, they have lost the driver advantage in Linux with
           | their latest Arc stuff.
           | 
           | I trust they could have done a lot better, a lot earlier, if
           | they cared to invest in iGPU. Feels like deliberately
           | neglected.
        
             | bronson wrote:
             | The same way missing mobile feels so nuts that it's gotta
             | be deliberate.
        
           | DannyBee wrote:
           | ???
           | 
           | The lunar lake Xe (IE the generation before the current one)
           | is not rock solid on linux - i can get it to crash the gpu
           | consistently just by loading enough things that use GL. Not
           | like 100, like 5.
           | 
           | If i start chrome and signal and something else, it often
           | crashes the gpu after a few minutes.
           | 
           | I've tried latest kernel and firmware and mesa and ....
           | 
           | The GPU should not crash, period.
        
       | Sniffnoy wrote:
       | Given that the fixed table is a much simpler one (by letting out-
       | of-bounds just return 2, rather than adding circuitry to make it
       | return 0), I wonder why they didn't just do it that way in the
       | first place?
        
         | kens wrote:
         | Returning 0 for undefined table entries is the obvious thing to
         | do. Setting these entries to 2 is a bit of a conceptual leap,
         | even though it would have prevented the FDIV error and it makes
         | the PLA simpler. So I can't fault Intel for this.
        
           | Sniffnoy wrote:
           | It's not really a conceptual leap if you've ever had to work
           | with "don't care" cases before...
        
           | mjevans wrote:
           | It's a NULL / 'do not care' issue. 0 isn't a reserved out of
           | band value, it's payload data and anything beyond the bounds
           | should have been DNC.
           | 
           | It's possible some other result, likely aligned to an easy
           | binary multiple would still produce a square block of 2, and
           | that allowing the far edges to float to some other value
           | could yield a slightly more compact logic array. Back-filling
           | the entire side to the clamped upper value doesn't cost that
            | much more though, and is known to solve the issue. As pointed
            | out elsewhere, that sort of solution would also be faster in
            | engineering time, fit within the planned space budget, and,
            | best of all, reduce cognitive load. It's obviously correct
            | when looking at the bug.
        
         | ajross wrote:
         | "Make it work first before you make it work fast".
         | Fundamentally this is a software problem solved with software
         | techniques. And like most software there's some optimization
         | left on the table just because no one thought of it in time.
         | And you can't patch a CPU of this era.
        
         | lizzas wrote:
         | That must have been such a satisfying fix for the engineers
         | though!
        
         | jandrese wrote:
         | More engineering time resulted in a more efficient solution.
        
         | phire wrote:
         | It feels like the kind of optimization that gets missed because
         | the task was split between multiple people, and nobody had
         | complete knowledge of the problem.
         | 
         | The person generating the table didn't realize filling the out-
         | of-bounds with two would make for a simpler PLA. And the person
         | squishing the table into the PLA didn't realize the zeros were
         | "don't care" and assumed they needed to be preserved.
         | 
         | It's also possible they simply stopped optimizing as soon as
         | they felt the PLA was small enough for their needs. If they had
         | already done the floorplanning, making the PLA even smaller
         | wasn't going to make the chip any smaller, and their
         | engineering time would be better spent elsewhere.
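To make the table discussion above concrete, here is a toy radix-4 SRT division loop in Python. It is a sketch only, not Intel's implementation: the real chip picks quotient digits from a PLA lookup table indexed by truncated remainder and divisor bits, while this sketch picks them arithmetically. The "buggy" variant returns 0 for every cell that should hold 2, so the failure is easy to see; on the real chip only five rarely hit cells were wrong, which is why the bug surfaced so rarely.

```python
from fractions import Fraction

def srt_divide(a, b, steps=12, buggy=False):
    """Toy radix-4 SRT division of a/b with redundant digit set {-2..2}.

    Sketch only, not Intel's circuitry: digits are chosen arithmetically
    here rather than by PLA lookup. Uses Fraction for exact arithmetic.
    """
    p = a                     # partial remainder
    q = Fraction(0)           # accumulated quotient
    weight = Fraction(1)      # 4**(-i), the weight of digit i
    for _ in range(steps):
        # Correct selection: nearest digit to p/b, clamped to [-2, 2].
        digit = max(-2, min(2, round(p / b)))
        if buggy and digit == 2:
            # Toy version of the FDIV flaw: a table cell that should
            # hold 2 reads 0 instead. The digit set's redundancy absorbs
            # small selection imprecision, but not a whole missing digit:
            # the remainder blows up and the error never heals.
            digit = 0
        p = 4 * (p - digit * b)   # new remainder, shifted up by the radix
        q += digit * weight
        weight /= 4
    return q

a, b = Fraction(7, 4), Fraction(1)
print(float(srt_divide(a, b)))              # accurate: 1.75
print(float(srt_divide(a, b, buggy=True)))  # wrong, and it stays wrong
```

The key property on display: each correct step keeps the remainder bounded, so the quotient error shrinks by 4x per digit; once a digit is flat-out wrong, the remainder grows faster than later digits can cancel it.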
        
       | evanmoran wrote:
       | The bug is super fun, but I also find the Intel response to be
       | fascinating on its own. They apparently didn't replace the
       | faulty processor for everyone who wanted a replacement,
       | resulting in a ton of bad press.
       | 
       | To contrast, I've been thinking a lot about the Amazon Colorsoft
       | launch, which had a yellow-band graphics issue on some devices
       | (mine included). Amazon waited a bit before acknowledging it
       | (maybe a day or two, presumably to get the facts right). Then
       | they simply, quietly replaced all of them. No recall. They just
       | send you a new one if you ask for it (my replacement comes
       | Friday; hopefully it will fix the issue). My takeaway is that
       | having an incredibly robust return/support apparatus has a lot
       | of benefits when launches don't go quite right. Certainly more
       | than you'd expect from analysis alone.
       | 
       | Similarly, I haven't seen too many recent reports about the
       | Apple AirPods Pro crackling issue from a couple of years ago (my
       | AirPods had to be replaced twice). Apple also just quietly
       | replaced them, and that kind of quiet support competence is
       | something powerful that isn't always noticed.
       | 
       | Colorsoft: https://www.tomsguide.com/tablets/e-readers/amazon-
       | kindle-co...
       | 
       | AirPods Pro: https://support.apple.com/airpods-pro-service-
       | program-sound-...
        
         | lizzas wrote:
         | That is default Amazon - you can return stuff no hassle for
         | almost any reason.
        
           | WalterBright wrote:
           | Only up to a point. If one is abusing it, expect getting
           | locked out. I buy enough stuff from Amazon that they don't
           | mind me returning something once in a while.
        
         | dan-robertson wrote:
         | I thought the response from intel was to invest a lot in
         | correctness for a while and then deciding that AMD were not
         | being punished for their higher defect rate and so, more
         | recently, investing in other things to try to compete with AMD
         | on other metrics than how buggy the cpu is.
        
           | ryao wrote:
           | I read a claim that they had gutted their verification team
           | several years ago in response to Zen since they claimed that
           | they needed to develop faster and verification was slowing
           | them down. Then not that long ago we started hearing about
           | the raptor lake issues.
        
             | userbinator wrote:
             | This article? https://news.ycombinator.com/item?id=16058920
        
         | donio wrote:
         | The Kindle and AirPod cases are not really comparable since
         | those are relatively minor products for the respective
         | companies.
         | 
         | On the Apple side the iPhone 4 antennagate is a better
         | comparison since the equivalent fix there would have involved
         | free replacements for a flagship and revenue-critical product
         | which Apple did _not_ offer.
         | 
         | Intel on the other hand _did_ eventually offer free
         | replacements for anybody who asked and took a major financial
         | hit.
        
           | wruza wrote:
           | Antennagate didn't affect _everyone_ though, only those 90s
           | businessman nokia-in-fist style holders.
           | 
           | Anecdata ofc, but everyone I know already held phones in
           | fingers back then, rather than hugging it as a brick.
        
             | donio wrote:
             | Maybe but by that argument 99% of the affected Pentium
             | users could have happily used their computers until they
             | became obsolete. The bug went completely unnoticed for over
             | a year with millions of units in use.
             | 
             | The media coverage and the fact that "computer can't
             | divide" is something that the public could wrap their heads
             | around is what made the recall unavoidable.
             | 
             | Intel's own marketing hype around the Pentium has played
             | into it too. It would have been a smaller deal during the
             | 486 era.
        
               | scarface_74 wrote:
               | There were even (bad) jokes about it in newspapers at
               | the time.
               | 
               | https://www.latimes.com/archives/la-
               | xpm-1994-12-14-ls-8729-s...
               | 
               | > Why didn't Intel call the Pentium the 586? Because they
               | added 486 and 100 on the first Pentium and got
               | 585.999983605 ."
        
             | scarface_74 wrote:
             | And Apple sold the same GSM iPhone 4 without making any
             | changes to it for 3 years and the uproar died down.
             | 
             | Before anyone well actually's me, yes they did come out
             | with a separate CDMA iPhone 4 for Verizon where they
             | changed the antenna design
        
         | mikepurvis wrote:
         | I had the first gen white MacBook with the magnetic closure
         | that resulted in chipped, discoloured topcases. I had it
         | replaced for free like three or four times over the lifespan of
         | that computer, including past the three year AppleCare expiry.
         | 
         | I really respected Apple's commitment to standing behind their
         | product in that way.
        
           | colechristensen wrote:
           | I thought I remembered at least some of those replacements
           | were class-action settlements and not Apple's good will.
        
         | flomo wrote:
         | For the most part, this wasn't an individual problem.
         | Corporations purchased these pretty expensive Pentium computers
         | through a distributor, and just got them replaced by the
         | vendor, per their support contract.
         | 
         | I've been in some consumer Apple "shadow warranty" situations,
         | so I know what you are talking about, but IMO very different
         | than the "IT crisis" that intel was facing. "IBM said so" had a
         | ton of IT weight back then.
        
       | coin wrote:
       | > He called Intel tech support but was brushed off
       | 
       | I laughed when I read this. It's hard enough to get support for
       | basic issues, good luck explaining a hardware bug.
        
       | WalterBright wrote:
       | > It appears that only one person (Professor Nicely) noticed the
       | bug in actual use.
       | 
       | I recall a study done years ago where students were supplied
       | calculators for their math class. The calculators had been
       | doctored to produce incorrect results. The researchers wanted to
       | know how wrong the calculators had to be before the students
       | noticed something was amiss.
       | 
       | It was a factor of 2.
       | 
       | Noticing the error, and being affected by the error, are two
       | entirely different things.
       | 
       | I.e. how many people check to see if the computer's output is
       | correct? I'd say very, very, very few. Not me, either, except in
       | one case - when I was doing engineering computations at Boeing,
       | I'd run the equations backwards to verify the outputs matched the
       | inputs.
        
         | kllrnohj wrote:
         | > Noticing the error, and being affected by the error, are two
         | entirely different things.
         | 
         | Only somewhat true. Take any consumer usage here for example.
         | If you're playing a game and it hits this incorrect output but
         | you don't notice anything as a result, were you actually
         | affected?
         | 
         | How much usage of FDIV on a Pentium was for numerically
         | significant output instead of just multimedia?
        
           | WalterBright wrote:
           | If your game has some artifacts in the display, nobody cares.
           | 
           | But if you're doing financial work, scientific work, or
           | engineering work, the results matter. An awful lot of people
           | used Excel.
           | 
           | BTW, telling a customer that a bug doesn't matter doesn't
           | work out very well.
        
         | wat10000 wrote:
         | I used to tutor physics in college. My students would show a
         | problem they worked and ask for feedback, and I'd tell them
         | that they definitely went wrong _somewhere_ since they
         | calculated that the rollercoaster was 23,000 miles tall.
         | 
         | Which is to say, it will depend a lot on the context and the
         | understanding of the person doing the calculation.
        
           | WalterBright wrote:
           | It is institute policy at Caltech (at least when I attended)
           | that obviously wrong answers would get you zero credit, even
           | if the result came from a minor error. However, if you
           | concluded after solving the problem that the answer was
           | absurd, but you didn't know where the calculation went wrong,
           | you'd get partial credit.
        
       | WalterBright wrote:
       | I remember that bug. Because I could not control what CPU my
       | customers were running on, I had to add special code in the
       | library to detect the bad FPU and execute workaround code (this
       | code was supplied by Intel).
       | 
       | I.e. Intel's problem became my problem, grrrr
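For the curious, the runtime check that libraries of the era performed used an operand pair known to hit the bad table entries; 4195835/3145727 is the pair most commonly quoted. Intel's actual supplied workaround code is not reproduced here, and a real startup check would be done in x87 code; this Python sketch just shows the arithmetic such checks relied on.

```python
def fdiv_bug_present():
    """Classic FDIV detection arithmetic (illustrative sketch).

    On a correct FPU the residual below is essentially 0; a flawed
    Pentium computed 4195835/3145727 with an error around 6e-5, so the
    residual came out near 256 instead.
    """
    x, y = 4195835.0, 3145727.0
    residual = x - (x / y) * y
    # Generous threshold: distinguishes "0-ish" from "roughly 256"
    # without depending on exact rounding behavior.
    return abs(residual) > 1.0

if fdiv_bug_present():
    print("flawed FDIV detected: use software division fallback")
else:
    print("FPU divides correctly")
```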
        
       | stickfigure wrote:
       | Reminds me of a joke floating around at the time that captures a
       | couple different 90s themes:                   I AM PENTIUM OF
       | BORG.         DIVISION IS FUTILE.         YOU WILL BE
       | APPROXIMATED.
        
       | fortran77 wrote:
       | "At Intel, Quality is job 0.9999999999999999762"
        
       | keshavmr wrote:
       | At the 2012 Turing Award conference in San Francisco, Prof.
       | William Kahan mentioned that he had a newer test suite available
       | in 1993 that would have caught Intel's bug. Still, Intel did not
       | run it. Prof. Kahan was actively involved in the bug's analysis
       | and further testing. (I'm stating this just from memory.)
        
       | hyperman1 wrote:
       | How did IDIV work on the Pentium? Was it also optimized, somehow
       | connected to FDIV, or just the old slow algorithm?
        
       | pieterr wrote:
       | Reminds me of part 2 of day24. Some wrong wirings. ;-)
       | 
       | https://adventofcode.com/2024/day/24
        
       | Unearned5161 wrote:
       | From someone who had to mentally let go once you started talking
       | about planes crossing each other, thank you for such an amazingly
       | detailed writeup. It's not every day that you learn a cool new
       | way to divide numbers!
        
       | ijustlovemath wrote:
       | > Curiously, the adder is an 8-bit adder but only 7 bits are
       | used; perhaps the 8-bit adder was a standard logic block at
       | Intel.
       | 
       | I believe this is because for any adder you always want 1 bit
       | extra to detect overflow! This is why 9 bit adders are a common
       | component in MCUs
        
         | kens wrote:
         | The weird thing is that I traced out the circuitry and the
         | bottom bit of the adder is discarded, not the top bit where
         | overflow would happen. (Note that you won't get overflow for
         | this addition because the partial remainder is in range, just
         | split into the sum and carry parts.)
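For readers unfamiliar with the "sum and carry parts": SRT dividers typically keep the partial remainder in carry-save form, where a 3:2 compressor (carry-save adder) reduces three operands to two without propagating carries. A minimal sketch of the bit logic (illustrative only, not traced from the die):

```python
def carry_save_add(a, b, c):
    """3:2 compressor: reduce a + b + c to a (sum, carry) pair without
    propagating carries. The identity a + b + c == s + carry holds
    exactly for non-negative integers."""
    s = a ^ b ^ c                                  # per-bit sum, no carries
    carry = ((a & b) | (a & c) | (b & c)) << 1     # per-bit majority, shifted up
    return s, carry

# A carry-propagating add is only needed when an actual numeric value is
# required -- e.g. a short adder can combine just the top bits of the
# sum and carry parts to feed the digit-selection table.
s, c = carry_save_add(100, 57, 23)
print(s + c)   # equals 100 + 57 + 23
```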
        
       | chiph wrote:
       | I'm surprised they took the risk of extending the lookup table to
       | have all 2's in the undefined region. A safer route would have
       | been to just fix the 5 entries. Someone was pretty confident!
        
         | justsid wrote:
         | It actually seems like it becomes much easier to reason about
         | because you remove a ton of (literal in the diagram) edge
         | cases.
        
       | CaliforniaKarl wrote:
       | > The explanation is that Intel didn't just fill in the five
       | missing table entries with the correct value of 2. Instead, Intel
       | filled all the unused table entries with 2.
       | 
       | I wonder why they didn't do this in the first place.
        
         | Panzer04 wrote:
         | Implementation detail. Someone overspecified it and didn't
         | realise that it didn't matter.
         | 
         | Look at it again later, someone asks why not just fill
         | everything in instead and everyone feels a bit silly XD.
        
       ___________________________________________________________________
       (page generated 2024-12-29 23:00 UTC)