[HN Gopher] The legend of "x86 CPUs decode instructions into RIS...
       ___________________________________________________________________
        
       The legend of "x86 CPUs decode instructions into RISC form
       internally" (2020)
        
       Author : segfaultbuserr
       Score  : 137 points
       Date   : 2023-06-18 13:33 UTC (9 hours ago)
        
 (HTM) web link (fanael.github.io)
 (TXT) w3m dump (fanael.github.io)
        
       | fooblaster wrote:
       | I'm not sure how you could write something like this without
        | considering something like the micro-op cache, which is present
        | in all modern x86 and some ARM processors. The micro-op cache on
        | x86 is effectively the only way an x86 processor can get full
        | IPC performance, and that's because it contains pre-decoded
       | instructions. We don't know the formats here, but we can
        | guarantee that they are fixed-length instructions and that they
        | have branch instructions annotated. Yeah sure, these instructions
        | have more complicated semantics than true RISC instructions, but
       | they have the most important part - fixed length. This makes it
       | possible for 8-10 of them to be dispatched to the backend per
       | cycle. In my mind, this definitely is the "legend" manifested.
        
         | jabl wrote:
         | Do we know they are fixed length? They could e.g. use a table
         | with offsets to instruction boundaries in the cache?
        
         | gpderetta wrote:
         | We know that they are over 100 bits (which is not very RISCy)
         | and not fixed length as some constants cause the instructions
          | to take more than one cache slot. IIRC they are also not
          | necessarily load/store.
        
       | mikequinlan wrote:
       | >Final verdict
       | 
       | >There is some truth to the story that x86 processors decode
       | instructions into RISC-like form internally. This was, in fact,
       | pretty much how P6 worked, later improvements however made the
       | correspondence tortuous at best. Some microarchitecture families,
       | on the other hand, never did anything of the sort, meaning it was
       | never anywhere near a true statement for them.
        
         | p_l wrote:
         | P6 was normal microcoded architecture, very much not RISC-like,
         | though.
         | 
         | Especially with its 143 bit long instructions and memory-memory
         | operands.
         | 
          | Now, K5 was a different beast, with its "microcode" being a
          | frontend that converted x86 to AMD29k.
        
       | ajross wrote:
       | "RISC" architectures are doing something effectively identical to
       | uop fusion though. The real myth is the idea of a CISC/RISC
       | dichotomy in the first place when frankly that notion only ever
       | applied to the ISA specifications and not (except for the very
       | earliest cores) CPU designs.
       | 
       | In point of fact beyond the instruction decode stage all modern
       | cores look more or less identical.
        
         | gumby wrote:
         | > The real myth is the idea of a CISC/RISC dichotomy in the
         | first place
         | 
         | The divergence was one of philosophy, and had unexpected
         | implications.
         | 
         | CISC was a "business as usual" evolution of the 1960s view
         | (exception: Seymour Cray) that you should make it easy to write
         | assembly code so have lots of addressing modes and subroutines
         | (string ops, BCD, etc) in the instruction set.
         | 
         | RISC was realizing that software was good enough that compilers
         | could do the heavy lifting and without all that junk hardware
         | designers could spend their transistor budget more usefully.
         | 
         | That's all well and good (I was convinced at the time, anyway)
         | but the results have been amusing. For example some RISC
         | experiments turn out to have painted their designs into dead
         | ends (delay slots, visible register windows, etc) while the
         | looseness of the CISC approach allowed more optimization to be
         | done in the micromachine. I did not see that coming!
         | 
         | Agree on the point that the cores themselves have found a
         | common local maximum.
        
           | phire wrote:
           | But there wasn't ever a divergence in philosophy. It was a
           | straight switch.
           | 
           | In the 70s, everyone designing an ISA was doing CISC. Then in
           | the 80s, everyone suddenly switched to designing RISC ISAs,
            | more or less overnight. There weren't any holdouts; nobody
            | ever designed a new CISC ISA again.
           | 
           | The only reason why it might seem like there was a divergence
           | is because some CPU microarchitecture designers were allowed
           | to design new ISAs to meet their needs, while others were
           | stuck having to design new microarchitecture for legacy CISC
           | ISAs which were too entrenched to replace.
           | 
           |  _> For example some RISC experiments turn out to have
           | painted their designs into dead ends_
           | 
           | Which is kind of obvious in hindsight. The RISC philosophy
           | somewhat encouraged exposing pipeline implementation details
           | to the ISA, which is a great idea if you can design a fresh
           | new ISA for each new CPU microarchitecture.
           | 
           | But those RISC ISAs became entrenched, and CPU
            | microarchitects found themselves having to design for what
           | are now legacy RISC ISAs and work around implementation
           | details that don't make sense anymore.
           | 
           | Really the divergence was fresh ISAs vs legacy ISAs.
           | 
           |  _> while the looseness of the CISC approach allowed more
            | optimization to be done in the micromachine._
           | 
           | I don't think this is actually an inherent advantage of CISC.
            | It's simply a result of the sheer amount of R&D that AMD,
           | Intel, and others poured into the problem of making fast
           | microarchitectures for x86 CPUs.
           | 
           | If you threw the same amount of resources at any other legacy
           | RISC ISA, you would probably get the same result.
        
         | jeffbee wrote:
         | That's because there are only a few ISAs left standing. The ISA
         | does have consequences for core design. This becomes apparent
         | for ISAs that were unintentionally on the wrong side of the
         | development of superscalar. The dead ISAs assumed in-order,
         | single-issue, fixed-length pipelines and as soon as the state
         | of the art shifted those ISAs became hard to implement. MIPS
         | and SPARC are both basically incompatible with modern high-
         | performance CPU design techniques.
        
           | ajross wrote:
           | > MIPS and SPARC are both basically incompatible with modern
           | high-performance CPU design techniques.
           | 
           | I don't see how that follows at all? MIPS in fact is about as
           | pure a "RISC" implementation as is possible to conceive[1],
           | and it shares all its core ideas with RISC-V. You absolutely
           | could make a deeply pipelined superscalar multi-core MIPS
           | chip. SPARC has the hardware stack engine to worry about, but
           | then modern CPUs have all moved to behind-the-scenes stack
           | virtualization anyway.
           | 
           | No, CPUs are CPUs. Instruction set architectures are a
           | vanishingly tiny subset of the design of these things. They
           | just don't matter. They only seem to matter to programmers
           | like us, because it's the only part we see.
           | 
           | [1] Branch delay slots and integer multiply instruction
           | notwithstanding I guess.
        
             | hinkley wrote:
             | We had to touch MIPS in school. Having to deal with branch
             | delay slots was cruel. It broke some of my classmates. We
              | were on the cusp of needing every programmer we could get,
              | and they were torturing students. I hope those teachers lost
             | sleep over that.
             | 
             | Am I correct in recalling they removed branch delay slots
             | in a later iteration of the chips?
        
               | IAmLiterallyAB wrote:
                | IIRC they made new branch instructions without delay
               | slots, but the normal branch instructions still have
               | delay slots.
               | 
               | Had to write MIPS assembly by hand recently, incredibly
                | counterintuitive.
        
             | jeffbee wrote:
             | The design of RISC-V starts with an entire chapter
             | excoriating MIPS for being useless and impossible to reduce
             | to transistors.
        
               | allenrb wrote:
               | Some of the same people were involved, no? They must have
               | had a good time writing that. "Things we have learned the
               | hard way".
        
               | mumbel wrote:
               | And now MIPS, the company, makes RISC-V
        
         | ant6n wrote:
          | Decoding variable-width instructions is one of the bottlenecks
          | for IPC, though. It's hard to imagine, for example, a 10-way
          | decode on x86.
          | 
          | An internal fixed-width encoding inside the instruction cache
          | may work.
        
           | ajross wrote:
            | > It's hard to imagine, for example, a 10-way decode on x86.
            | 
            | Uh, why? Instructions start at byte boundaries. Say you want
            | to decode a whole 64-byte cache line at once (probably 10-14
            | instructions on average). That's... 64 parallel decode units.
           | Seems extremely doable to me, given that "instruction decode"
           | isn't even visible as a block on the enormous die shots
           | anyway.
           | 
           | Obviously there's some cost there in terms of pipeline depth,
           | but then again Sandy Bridge showed how to do caching at the
           | uOp level to avoid that. Again, totally doable and a long-
           | solved problem. The real reason that Intel doesn't do 10-wide
           | decode is that it doesn't have 10 execution units to fill
           | (itself because typical compiler-generated code can't exploit
           | that kind of parallelism anyway).
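            | 
            | To make the brute-force idea concrete, here's a toy C sketch
            | (my own illustration, not any vendor's actual design): decode
            | speculatively at every byte offset, then walk the chain of
            | real instruction starts. insn_length() is a hypothetical
            | length decoder, stubbed out so the sketch runs; instructions
            | straddling the line are ignored for simplicity.
            | 
            |     #include <stdint.h>
            | 
            |     /* hypothetical length decoder, stubbed to 1 byte */
            |     static int insn_length(const uint8_t *bytes) {
            |         (void)bytes;
            |         return 1;
            |     }
            | 
            |     static void decode_line(const uint8_t line[64],
            |                             int starts[64], int *n) {
            |         int len[64];
            |         /* in hardware: 64 decoders working in parallel */
            |         for (int i = 0; i < 64; i++)
            |             len[i] = insn_length(&line[i]);
            |         /* then pick out the chain of valid starts */
            |         *n = 0;
            |         for (int i = 0; i < 64; i += len[i])
            |             starts[(*n)++] = i;
            |     }
            | 
            | The chain walk is written serially here; hardware can fold
            | that selection into parallel logic too, at some cost in
            | depth.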
        
             | formerly_proven wrote:
             | That's a Zen 2 core: https://static.wixstatic.com/media/501
             | 7e5_bbd1e91507434d40bc...
             | 
             | Overlaid on a higher resolution die shot https://static.wix
             | static.com/media/5017e5_982e0e47d7c04dd693...
             | 
             | Here's a POWER8 floorplan: https://cdn.wccftech.com/wp-
             | content/uploads/2013/08/IBM-Powe... (The decoder is
             | subsumed under IFU)
             | 
             | And POWER9 https://www.nextplatform.com/wp-
             | content/uploads/2016/08/ibm-...
             | 
             | Didn't find any recent, reliable sources for Intel cores.
        
               | hinkley wrote:
               | Why does the middle of the Zen FPU look like cache lines?
        
               | exmadscientist wrote:
               | I believe those are the utterly massive register files
               | needed to feed a modern vector FPU.
        
               | hinkley wrote:
               | That would have been my guess. But I don't think I've
               | ever seen a register file big enough that I could spot it
               | without a label. I'm almost surprised they are so tall
               | and not wide. Or is that because I am looking at it
               | sideways, and each register is top to bottom?
        
               | exmadscientist wrote:
               | This is about Skylake rather than Zen2 but it's
               | fascinating, if the subject of what's-really-in-a-
               | register-file is fascinating to you:
               | https://travisdowns.github.io/blog/2020/05/26/kreg2.html
        
               | hinkley wrote:
               | When I still read architecture docs like novels, there
               | was a group experimenting with processor-in-memory
                | architectures, where the demo was memory chips doing
                | vector processing of data in parallel.
               | 
               | I wonder how wide SIMD has to get before you treat it
               | like a CPU embedded into cache memory.
               | 
               | Though I guess we are already looking at SIMD
               | instructions wider than a cache line...
        
       | cachvico wrote:
       | Saved this great article from a couple of years ago,
       | https://medium.com/swlh/what-does-risc-and-cisc-mean-in-2020...
        
       | adrianmonk wrote:
       | If someone says x86 decodes to RISC internally, they might be
       | getting at one of two different ideas:
       | 
       | (1) RISC really is the fastest/best way for CPUs to operate
       | internally.
       | 
       | (2) x86 performance isn't held back (much) by its outmoded
       | instruction set.
       | 
        | x86 architectures were, for a while, translating into effectively
        | RISC, but stopped doing it. Now internally they are less RISC-
        | like. This suggests #1 is false and #2 is true.
       | 
        | They could if they wanted to (because they have before), but they
        | don't want to anymore. Presumably because it's not the best way to
        | do it. Although I guess it could be slightly better, just not worth
        | the cost of translating.
        
         | pclmulqdq wrote:
         | I think if you want to analyze this to the same level of
         | pedantry as the original blog post, the thing people actually
         | mean is:
         | 
         | "x86 instructions decompose into operations that feel somewhat
         | RISC-like"
         | 
         | And this is pretty much true.
         | 
         | The author of this piece somewhat fixates on RISC-V as the
         | anointed version of "RISC-ness" when it really is just one
         | expression of the idea. The whole RISC vs CISC distinction is
         | pretty silly anyway, because there isn't really a clear
         | criterion or dividing line (see ARM's "RISC" instruction set)
         | that separates "RISC" from "CISC." They're basically just
         | marketing terms for "my instructions are reasonably simple"
         | (RISC) and "my instructions are reasonably expressive" (CISC).
        
       | SinePost wrote:
       | The "Final Verdict" is very plain and is hardly enhanced by
       | reading the body of the article. It would make more sense if it
        | were put at the opening of the article, creating a complete
       | abstract.
        
       | rollcat wrote:
       | Somewhat related: http://danluu.com/new-cpu-features/ discussion:
       | https://news.ycombinator.com/item?id=31093430
        
       | stncls wrote:
       | Needs (2020). It explains why, for example, Zen 2 & 3 are not
       | discussed.
        
         | mhh__ wrote:
         | They don't really add that much to the picture. Zen is a pretty
         | boring (in a good way) architecture in the big picture.
        
       | bjourne wrote:
       | RISC just means that the instruction set is _reduced_ (compared
       | to what was the norm in the early 1980s). It does not say whether
       | the architecture is register-memory or load-store (though most
        | RISC ISAs are load-store). As long as an x86 CPU does not
        | decode to more than, say, two dozen microcode _types_, it uses
        | RISC "internally".
        
       | kens wrote:
       | What I find most interesting is the "social history" of RISC vs
       | CISC: how did a computer architecture issue from the 1980s turn
       | into something that people vigorously debate 40 years later?
       | 
       | I have several theories:
       | 
       | 1. x86 refused to die as expected, so the RISC vs CISC debate
       | doesn't have a clear winner. There are reasonable arguments that
       | RISC won, CISC won, or it no longer matters.
       | 
       | 2. RISC vs CISC has clear teams: now Apple vs Intel, previously
       | DEC, Sun, etc vs Intel. So you can tie the debate into your
       | "personal identity" more than most topics. The debate also has an
       | underdog vs entrenched monopoly vibe that makes it more
       | interesting.
       | 
        | 3. RISC vs CISC is a simple enough topic for everyone to have an
       | opinion (unlike, say, caching policies). But on the other hand,
       | it's vague enough that nobody can agree on anything.
       | 
       | 4. RISC exists on three levels: First, a design philosophy /
       | ideology. Second, a style of instruction set architecture that
       | results from this philosophy. Finally, a hardware implementation
       | style (deep pipelines, etc) that results. With three levels for
       | discussion, there's lots of room for debate.
       | 
       | 5. RISC vs CISC has a large real-world impact, not just for
       | computer architects but for developers and users. E.g. Apple
       | switching to ARM affects the user but changing the internal bus
       | architecture does not.
       | 
       | (I've been trying to make a blog post on this subject, but it
       | keeps spiraling off in random directions.)
        
         | phire wrote:
         | _> 4. RISC exists on three levels... Finally, a hardware
         | implementation style (deep pipelines, etc) that results._
         | 
          | I agree with the philosophy and ISA, but I don't think RISC
          | actually counts as a hardware architecture.
         | 
         | Yes, there is a certain style of architecture strongly
         | associated with RISC, the "classic RISC pipeline" that a lot of
         | early RISC implementations share. But RISC can't claim
          | ownership over the concept of pipelined CPUs in general, and
          | designers following the RISC philosophy almost immediately
          | branched out into other hardware architecture directions like
         | superscalar and out-of-order execution (some also branched into
         | VLIW).
         | 
         | Today, the "class RISC pipeline" is almost entirely abandoned
         | outside of very low-power and low-gate count embedded cores.
         | 
         | The primary advantage of the RISC philosophy was that it
         | allowed them to experiment with new hardware architecture ideas
          | several years earlier than those competitors who were stuck
         | supporting legacy CISC instruction sets. Especially when they
         | could just dump their previous ISA and create a new one hyper-
         | specialised for that exact hardware architecture.
         | 
         | Those CISC designers also followed the same path in the 80s and
         | 90s, implementing pipelined architectures, and then superscalar
         | and then out-of-order, but their designs always had to dedicate
         | more gates to adapting their legacy ISAs to an appropriate
         | internal representation.
         | 
         | ----
         | 
         | But eventually silicon processes got dense enough for this
         | inherent advantage of the RISC philosophy to fade away. The
         | overhead of supporting those legacy CISC ISAs got smaller and
         | smaller.
         | 
         | All high-performance CPUs these days seem to have settled on a
         | common hardware architecture, doesn't matter if they use CISC
         | or RISC ISAs, the diagrams all seem to look more or less the
         | same. This architecture doesn't really have a name (which might
         | be part of the reason why everyone is stuck arguing RISC vs
         | CISC), but it's the out-of-order beast with absolutely massive
          | reorder buffers, wide decoders, physical register files, long
          | pipelines, good branch predictors, lots of execution units, and
         | (often) an uOP cache.
         | 
          | Intel's Sandy Bridge is the first example of this exact hardware
          | architecture (though that design lineage starts all the way back
          | with the Pentium Pro, and you also have AMD examples that get
         | close), but Apple quickly follows up with Cyclone and then AMD
         | with Zen. Finally ARM starts rapidly catching up from about the
         | Cortex A76 onwards.
        
       | kens wrote:
       | A question that maybe HN can help me answer: are there _any_ new
       | instruction set architectures since, say, 1985 that are CISC?
       | (Excluding, of course, ISAs that are extensions of previous CISC
       | ISAs.)
        
         | exmadscientist wrote:
         | I'd guess there are a few hiding in the corners of the
         | processor world. Think ultra-low-power (TI MSP430?) or DSP (TI
         | C2000?).
         | 
         | (I've used both of those two examples, but it's been a while,
         | and I don't have any particular desire to crack open their
         | architecture manuals to see how CISCy or RISCy they are. It's
         | kind of an academic distinction anyway at this point.)
        
           | exmadscientist wrote:
           | I thought of one!
           | 
           | Recent ESP32 processors have a "ULP Coprocessor" for ultra-
           | low-power operations. Its instruction set is... well... not
           | very RISCy: https://docs.espressif.com/projects/esp-
           | idf/en/v4.2/esp32/ap...
           | 
           | (Spoiler alert for the lazy: it has _single instructions_ for
            | ADC reads and I2C transactions and such. I don't think it
           | gets more CISC than that!)
        
             | phire wrote:
              | Eh... It's a load/store architecture with fixed-size
              | instructions, and all instructions execute in 1 cycle (unless
              | they stall due to IO).
             | 
             | That makes it quite RISCy. The overpowered instructions are
              | just because it's an IO processor; they don't do much work
             | on the CPU core itself, just trigger an IO component.
        
         | phire wrote:
         | I think Intel's iAPX 432 was probably the last in ~1982.
         | 
         | It's not just that the RISC philosophy became popular, but
         | suddenly it didn't make much sense to design a CISC ISA
         | anymore.
         | 
         | CISC was a great idea when you had really slow and narrow RAM.
         | It made sense to try and make each instruction as short and
         | powerful as possible, usually using microcoded routines. But
         | RAM got cheaper, buses got wider and caches started being a
         | thing. It didn't make any sense to waste transistors on
         | microcode, just put them all in RAM.
        
         | NotYourLawyer wrote:
         | RISC-V with extensions probably qualifies.
        
           | snvzz wrote:
           | >RISC-V with extensions probably qualifies (as CISC)
           | 
           | With RVA22, RISC-V has already caught up with ARM and x86
           | functionality.
           | 
            | Yet it does not have their complexity. It is not even close.
        
         | neerajsi wrote:
          | Renesas RX is the newest CISC I've seen.
         | 
         | https://www.renesas.com/us/en/products/microcontrollers-micr...
        
       | sobkas wrote:
       | There are some similarities with Transmeta.
        
         | CalChris wrote:
          | Transmeta Crusoe first interpreted x86 (Linus Torvalds worked
          | on the interpreter), then JIT'd hotspots into a 128-bit-wide
          | VLIW. There's no way that VLIW could be confused with RISC.
         | 
         | https://www.zdnet.com/article/transmetas-crusoe-how-it-works...
        
       | jylam wrote:
       | "(the code is 32-bit just to allow us to discuss very old x86
       | processors)"
       | 
       | fsck, that hurts.
        
         | hinkley wrote:
         | Damn kids. I was an early adopter of 32 bit coding. When I was
         | in a big hurry to get my career started there was still plenty
          | of 16-bit code around; even Netscape ran on the 16-bit version
         | of the Windows API. I ended up tapping the brakes and changing
         | gears to make sure I didn't have to deal with that bullshit.
         | Most of my CS classes had been taught on 32 bit Unix boxes so
         | it just felt like sticks and rocks.
         | 
          | The jump from 32 bit to 64 was not so dramatic. I wonder if
         | I'll be around for 128 bit. I suspect the big disruption there
         | will be changing the CPU cache line size, which has been stuck
         | at 64 bytes for ages. I can't imagine 4 words per cache line
         | will be efficient.
        
       | cptskippy wrote:
       | I grew up in the 80s and 90s, and what I gathered from listening
       | to the grey beards talk was that RISC based designs were more
       | elegant, easier to understand, and more efficient. When I first
        | started hearing about then-modern CISC CPUs decoding to RISC, it
       | was pushed as a justification that RISC was fundamentally
       | superior.
       | 
       | This was around the time IBM was pushing Power and everyone
       | thought it was poised to dominate the industry.
        
       | CalChris wrote:
       | I've never liked this idea that _x86 CPUs decode instructions
       | into RISC form internally_. Before there was RISC, before there
       | was even x86, there were microcoded instruction sets [1]. They
       | were first implemented in Wilkes ' 1958 EDSAC 2. Indeed the
       | Patterson Ditzel paper even comments on this:
       | Microprogrammed control allows the implementation of complex
       | architectures more cost-effectively than hardwired control. [2]
       | 
       | These horizontally microprogrammed instructions interpreted the
       | architectural instruction set. The VAX 11/750 microcode control
       | program had an interpreter loop. There could be more than 100
       | bits in these horizontal instructions with 30+ fields.
       | Horizontally microprogrammed instructions were not in any way
       | _reduced_. Indeed, reduction would mean paying the decode tax
       | twice.
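        | 
        | To give a feel for the shape of such a word, here is a sketch of
        | a horizontal microword as C bitfields -- field names and widths
        | invented for illustration, not any real machine's format:
        | 
        |     #include <stdint.h>
        | 
        |     struct microword {
        |         uint32_t alu_op      : 4;  /* ALU function this cycle  */
        |         uint32_t a_bus_src   : 5;  /* register onto the A bus  */
        |         uint32_t b_bus_src   : 5;  /* register onto the B bus  */
        |         uint32_t dest_reg    : 5;  /* latch result into here   */
        |         uint32_t mem_read    : 1;  /* start a load this cycle  */
        |         uint32_t mem_write   : 1;  /* start a store this cycle */
        |         uint32_t branch_cond : 3;  /* microbranch condition    */
        |         uint32_t next_uaddr  : 12; /* sequencer: next address  */
        |         /* real machines: 100+ bits and 30+ such fields */
        |     };
        | 
        | Every field directly drives a block of the datapath each cycle;
        | nothing about that is "reduced".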
       | 
       | There was another form, vertical microprogramming, which was
       | closer to RISC. But there was no translation from complex to
       | vertical.
       | 
       | [1]
       | https://www.cs.princeton.edu/courses/archive/fall10/cos375/B...
       | 
       | [2] https://inst.eecs.berkeley.edu/~n252/paper/RISC-
       | patterson.pd...
        
         | fulafel wrote:
          | Yep, and microcomputer processors were also microcoded. See e.g.
          | the 8086 here: https://www.righto.com/2022/11/how-8086-processors-
         | microcode...
         | 
          | And current x86 CPUs still implement some instructions via
          | microcode. Some are even performance-sensitive (e.g. rep movsb).
        
           | p_l wrote:
            | P6 architecture was always microcoded - it's even a somewhat
            | archetypal example of superscalar microcoded CISC, with
            | quite horizontal microcode.
        
         | Sparkyte wrote:
         | I don't think I could articulate my concern as well as you.
         | There is always some form of overhead when trying to translate
         | instructions.
        
         | mort96 wrote:
         | The "CISC CPUs just decode instruction into RISC internally"
         | thing is getting at something I think is important: RISCs and
         | CISCs aren't necessarily that different internally. "CISC CPUs
         | and RISC CPUs both just decode instructions into microcode and
         | then execute that" is probably a more accurate but less
         | "memeable" expression of that idea.
         | 
         | What exactly we mean by "RISC" and "CISC" becomes important
         | here. If, by RISC, we mean an architecture without complex
         | addressing modes, then "CISCs are just RISCs with a fancy
         | decoder" is wrong. But if we expand our definition of "RISC" to
         | allow for complex addressing modes and stuff, but keep a vague
         | upper limit on the "amount of stuff" the instruction can do,
         | it's sort of more appropriate; the "CISCy" REPNZ SCASB
         | instruction (basically a strlen-instruction) is certainly
         | decoded into a loop of less complex instructions.
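          | 
          | As a toy illustration of how much work that one instruction
          | does, here is strlen() via REPNE SCASB in GNU C inline asm for
          | x86-64 -- a sketch, not how any libc actually does it:
          | 
          |     #include <stddef.h>
          | 
          |     static size_t strlen_scasb(const char *s) {
          |         const char *p = s;
          |         size_t cnt = (size_t)-1;  /* RCX: effectively no limit */
          |         __asm__ volatile (
          |             "repne scasb"         /* scan [RDI] for AL, RCX-- */
          |             : "+D"(p), "+c"(cnt)
          |             : "a"((unsigned char)0)  /* AL = 0, byte to find */
          |             : "cc", "memory");
          |         return (size_t)-1 - cnt - 1;  /* bytes before the NUL */
          |     }
          | 
          | One instruction, but internally it's a fetch-compare-decrement
          | loop -- exactly the kind of thing that gets broken into simpler
          | uOps.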
         | 
         | I think the main issue with most of these discussions is in the
         | very idea that there is such a thing as "RISC" and "CISC", when
         | there's no such distinction. They are, at best, abstract
         | philosophies around ISA design which describe a general
         | preference for complex instructions vs simpler instructions,
         | where even the terms "complex" and "simple" have very muddy
         | definitions.
         | 
         | TL;DR: I agree
        
       | moomin wrote:
       | One woman's RISC is another man's CISC. The "perform operation
       | and branch on flags" operation described here might not be part
       | of RISC-V, but it 100% was part of ARM 1 when ARM was at the
       | forefront of the movement.
        
         | snvzz wrote:
         | >but it 100% was part of ARM 1 when ARM was at the forefront of
         | the movement.
         | 
         | ARM1 is what it is. They didn't have the time to do it
         | properly, or the hindsight we have now.
         | 
         | They had to get their product out.
        
       | compressedgas wrote:
        | No mention of AMD's RISC86, which was the patented internal
        | decoding of x86 instructions into a RISC instruction set.
       | 
       | https://patents.google.com/patent/US5926642A/en (1996)
        
         | cwzwarich wrote:
         | Even though AMD filed the patent a few years later, this was
          | actually from NexGen, which AMD acquired:
         | 
         | https://en.wikipedia.org/wiki/NexGen
         | 
         | Here's an old BYTE article from 1994:
         | 
         | https://halfhill.com/byte/1994-6_cover-nexgen.html
        
           | boramalper wrote:
           | Out of context: The level of detail and the quality of
           | writing in older _consumer_ magazines always amaze me.
        
           | p_l wrote:
            | Before NexGen, AMD built an instruction decoder from x86 to
           | AMD29k (which was Berkeley RISC style) and used it in K5.
        
             | gumby wrote:
             | Wow, really? Cool!
             | 
             | The 29K was a really cool architecture and I'm sorry it
             | didn't make it. AMD's pathetic marketing of the time
             | couldn't even beat MIPS' terrible marketing, plus MIPS had
             | SGI (and later SGI had MIPS).
        
               | p_l wrote:
               | Funnily enough, new AMD29050 are still being made (maybe
               | even with further development) by Honeywell - they form
               | the basis of their range of avionics computers like
               | Flight Management Systems etc.
        
             | mumbel wrote:
             | Got interested in amd29k for about a week before finding
             | something else to mess with. Quick attempt at ghidra
              | support, but never really RE'd with it, so no clue how it
              | does on larger projects.
             | 
             | https://github.com/mumbel/ghidra_a29k
        
             | fulafel wrote:
             | Are there references about it using the actual 29k
              | instruction set internally? Some (non-primary) sources from
              | a cursory web search seem to say it used custom micro-ops,
              | which it did call "RISC ops", and had other implementation
              | pieces carried over from an abandoned 29k chip project.
        
               | p_l wrote:
               | There are persistent mentions of being able to switch off
               | the decoder and use plain AMD29K instructions, but I
                | never found any proper docs - I don't have a K5 to test
                | against either.
        
       | phendrenad2 wrote:
       | No mention of RISC86[1] and the hype[2] surrounding it.
       | 
       | [1] https://patents.google.com/patent/US6336178B1/en
       | 
       | [2] https://halfhill.com/byte/1996-1_amd-k6.html
        
       | rany_ wrote:
        | I'm not sure how true this is or if it's a legend, but I remember
        | reading that this originated from Intel marketing in response
        | to the rise in popularity of RISC in the 1990s.
        | 
        | In essence, it was intended to give the impression that there was
        | no need for a RISC architecture because x86 was already a RISC
        | behind the scenes. So you got the best of both worlds.
        
         | Sharlin wrote:
         | Probably apocryphal, since the 8086 was already microcode-
         | based: http://www.righto.com/2022/11/how-8086-processors-
         | microcode-...
        
         | weinzierl wrote:
          | That is how I remember it, and I believe it was at the time
          | when Apple made a big marketing drama about their top-of-the-
          | line RISC machine being so fast that it fell under US export
          | control. Due to that, there was a spike in the popularity of
          | RISC, and Intel marketing was like "We are also RISC!". At
          | least according to my memory.
        
           | jeffbee wrote:
           | That was already a misleading campaign at the time and
           | certainly did not age well. When the G4 Mac launched at best
           | 450MHz (delayed 6 months and derated from advertised 500MHz)
           | it was going head-to-head with the 733MHz Pentium III
           | "Coppermine" that was available, cheap, and faster from most
           | points of view. By the time you could actually buy a 500MHz
           | G4, you could also buy a 1000MHz AMD Athlon. To make it seem
           | like the whole PowerPC thing had been a good idea you had to
           | cherry-pick the best Mac and an old PC as a baseline.
        
             | p_l wrote:
             | I do remember those ads, and they started back with G3 and
             | iMac.
             | 
             | I recall seeing them in Polish computer mags with G3 iMac
             | 333 MHz or so.
        
             | asveikau wrote:
             | Also worth noting that in that era, the PowerPC Macs were
             | running cooperative multitasking with no address space
             | isolation.
        
       | peter_d_sherman wrote:
       | Related:
       | 
       | https://news.ycombinator.com/item?id=27334855
       | 
       | https://www.google.com/search?q=%22christopher+domas%22+x86+...
       | 
       | https://en.wikipedia.org/wiki/Alternate_Instruction_Set
       | 
       | >"In 2018 Christopher Domas discovered that some Samuel 2
       | processors came with the Alternate Instruction Set enabled by
       | default and that by executing AIS instructions from user space,
       | it was possible to gain privilege escalation from Ring 3 to Ring
       | 0.[5] Domas had partially reverse engineered the AIS instruction
       | set using automated fuzzing against a cluster of seven thin
       | clients.[12] Domas used the terms "deeply embedded core" (DEC)
       | plus "deeply embedded instruction set" (DEIS) for the RISC
       | instruction set, "launch instruction" for JMPAI, "bridge
       | instruction" for the x86 prefix wrapper, "global configuration
       | register" for the Feature Control Register (FCR), and documented
       | the privilege escalation with the name "Rosenbridge".[5]"
       | 
        | Also -- I should point out that the debate over _whether_ x86
        | (CISC) CPUs contain RISC cores is largely academic.
        | 
        | Both RISC and CISC CPUs contain ALUs -- so our only debate,
        | really, if we have one, is _how_ exactly the data that the ALU
        | is going to process winds up at the ALU...
       | 
       | It is well known in the x86 community that the x86 instructions
       | are an abstraction, a level of abstraction which runs on top of
        | a lower level of abstraction, the x86 microcode layer...
       | 
       | Historically, intentionally or unintentionally, most x86 vendors
       | have done everything they can to hide, obfuscate, and obscure
       | this layer... There (to the best of my knowledge, at this point
       | in time) is no official documentation of this layer, how it works
        | (etc., etc.) from any major x86 vendor.
       | 
       | x86 microcode update blobs -- are binary "black boxes" and
       | encrypted.
       | 
       | Most of our (limited) knowledge in this area comes from various
       | others who have attempted to understand the internal workings of
       | x86 microcode:
       | 
       | https://www.google.com/search?q=%22reverse+engineering+x86+p...
       | 
       | https://github.com/RUB-SysSec/Microcode
       | 
       | https://twitter.com/_markel___/status/1262697756805795841
       | 
       | https://www.youtube.com/watch?v=lY5kucyhKFc
       | 
       | It should be pointed out that even if a complete understanding of
       | x86 microcode were to be had for one generation of CPU -- there
       | would always be successive generations where that implementation
       | might change -- leaving anyone who would wish to fully understand
       | it, back at square one...
       | 
       | To (mis)quote Douglas Adams:
       | 
       |  _" There is a theory which states that if ever anyone discovers
       | exactly what the x86 microcode layer is for and why it is here,
       | it will instantly disappear and be replaced by something even
       | more bizarre and inexplicable."
       | 
       | There is another theory which states that this has already
       | happened."_ :-) <g>
        
         | bri3d wrote:
         | It's also worth noting that "microcode" in a modern x86 CPU is
         | not really the same thing as "microcode" in an older,
         | "microcoded" CPU. What do I mean by this?
         | 
         | Some older CPUs were truly "microcoded." The heart of the CPU
         | was an interpreter loop which took instructions, then invoked a
         | ROM / microcode routine corresponding to the implementation for
         | that instruction. Each instruction began with the runtime
         | selecting an instruction and ended with the microcode routine
         | returning to a loop. A good example of this is the 8086:
         | http://www.righto.com/2022/11/how-8086-processors-microcode-...
         | . These CPUs worked like a traditional "emulator."
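          | 
          | In C terms, that older model is literally a fetch/dispatch
          | loop -- a sketch with invented types, not the 8086's actual
          | microengine:
          | 
          |     #include <stdint.h>
          | 
          |     typedef struct {
          |         uint16_t pc;
          |         uint8_t mem[65536];
          |     } cpu_t;
          | 
          |     typedef void (*micro_routine_t)(cpu_t *);
          | 
          |     static void ur_nop(cpu_t *c) { (void)c; }  /* stand-in */
          | 
          |     /* opcode -> microcode routine; unfilled slots are NULL */
          |     static micro_routine_t ucode_rom[256] = { ur_nop };
          | 
          |     static void run(cpu_t *c) {
          |         for (;;) {
          |             uint8_t op = c->mem[c->pc++];        /* fetch   */
          |             micro_routine_t ur = ucode_rom[op];  /* decode  */
          |             if (!ur) return;       /* unimplemented opcode  */
          |             ur(c);                 /* "execute", then loop  */
          |         }
          |     }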
         | 
         | That is NOT how a modern x86 CPU works. A modern CPU works a
         | lot more like a JIT. In a modern x86 CPU, "microcode" runs
         | _alongside_ the main CPU core execution. The microcode runtime
         | can be thought of as a co-processor of sorts: some complex
         | instructions or instructions with errata are redirected into
         | the microcode co-processor, and it's responsible for breaking
         | those instructions down and emitting their lower-level uOps
         | back into the execution scheduler. However, most instructions
         | never touch microcode at all: they are decoded purely in
         | hardware and issued into the scheduler directly.
         | 
         | This is important because when people start talking about the
         | x86 "microcode layer," they quickly get confused between the
         | "microcode" (which is running _alongside_ the processor) and
         | uOps, which are the lower level instructions issued into the
         | processor's execution scheduler.
        
           | allenrb wrote:
           | Thanks, I'd been wondering about this for years. I couldn't
           | imagine how microcode would be directly involved with so many
           | pipeline stages and execution units. But if it's reserved for
           | the "highly unusual", it all fits together. And the explosion
           | of transistor budgets since 8086 means we can afford to
           | implement all but the wackiest stuff in hardware.
        
           | peter_d_sherman wrote:
           | Some excellent points!
           | 
           | uOps and how they function should be included in any serious
           | study of x86 hardware.
           | 
           | In an ideal world, they would be well documented by vendors.
           | 
           | (Random Idea: I wonder if it is possible to use some feature
           | of some newer x86 chip to issue an x86 instruction -- and
           | then to have it retrieve the uOp structure for that
           | instruction as data... sort of like a uOp proxy or debug
           | facility... If newer x86 chips don't have that function --
            | then a future x86 chip (which doesn't necessarily have to
            | be Intel) should...)
           | 
           | >In a modern x86 CPU, "microcode" runs _alongside_ the main
           | CPU core execution.
           | 
           | Indeed!
           | 
           | The broader picture of a CPU, any CPU, is that there's a
           | _lot_ that is happening at the same time. A lot of signals
           | (and data!) moving around, and changing around for various
            | purposes! (In other words, modern CPUs don't work linearly,
            | like a computer program -- unless perhaps we talk about the
            | earliest, oldest CPUs...)
           | 
           | But the study of uOps should be undertaken in any serious
           | study of the x86...
           | 
            | It should be pointed out that the number of uOps should be
            | far less than the number of x86 instructions, since
            | each x86 instruction is implemented in one to several uOps,
            | and these uOps frequently repeat across instructions...
            | 
            | Which again brings us to the academic question of "if there are
           | way fewer uOps than x86 instructions -- then can the set of
           | uOps if taken by themselves be considered a RISC?" Why or why
           | not?
           | 
           | Which brings us to Christopher Domas' discovery that some VIA
           | x86 Samuel 2 processors had an "Alternate Instruction Set" --
           | what is the Alternate Instruction Set's relation to uOps? Is
            | the AIS direct uOps, or a 1:1 mapping? Or was it implemented by
           | microcode that translated each AIS instruction into multiple
           | uOps, and if so, how was that microcode implemented?
           | 
           | As an addendum, I found the following resource for uops
           | online:
           | 
           | https://uops.info/index.html
           | 
           | Anyway, some excellent points in your post!
        
           | p_l wrote:
           | Umm, uOps are _exactly_ like the old horizontal microcode,
           | down to the ability to jump out to patched out
           | instructions...
           | 
           | The simplest horizontally microcoded CPUs had their "decoder"
            | be essentially a mapping into microcode ROM, where said
            | microcode contained instructions that directly drove various
            | components of the CPU, coupled with some flags regarding the
            | next micro-instruction to execute.
           | 
            | Many instructions could in fact decode to a single "wide"
            | instruction, with sequencer-controlling bits telling it to
            | load the next instruction and dispatch it. Others would jump
            | to a microcode "dispatch" routine, which usually was uOP(s)
            | that advanced the program counter, triggered the load of the
            | next instruction into the appropriate register, then jumped
            | based on that.
           | 
           | Usually multiple wide instructions per macroinstruction
           | happened either for complex designs (there were microcoded
           | CPUs whose microcode was _multitasking with priority levels!_
            | ) or when the decoded instruction mapped into a more complex
            | sequence of operations (take for example the difference between
           | an "ADD" between two registers and an "ADD" that does complex
           | offset calculation for all operands - or PDP-10 style
           | indirect addressing, where you'd loop memory load until
           | hitting the final, non-indirect address).
           | 
           | To make it simpler, it was common to make microcode ROMs have
            | multiple empty slots between addresses generated by the
            | sequencer, so that you avoided jumps unless you got a really
            | hairy instruction (the largest possible VAX instruction was
            | longer than its page size).
        
       | 0xr0kk3r wrote:
       | It is fascinating that semantic confusion over RISC vs CISC
        | has persisted since I was in college in the '80s. It is largely
       | meaningless.
       | 
       | The naive idea behind RISC is essentially to reduce the ISA to
       | near-register-level operations: load, store, add, subtract,
       | compare, branch. This is great for two things: being the first
       | person to invent an ISA, and teaching computer engineering.
       | 
       | Look at the evolution of RISC-V. The intent was to build an open
       | source ISA from a 100% clean slate, using the world's academic
       | computer engineering brains (and corporations that wanted to be
       | free of Arm licensing) ... and a lot of the subtext was initially
       | around ISA purity.
       | 
       | Look at the ISA today, specifically the RISC-V extensions that
       | have been ratified. It has a soup of wacky opcodes to optimize
       | corner cases, and obscure vendor specific extensions that are
       | absolutely CISC-y (examine T-Head's additions if you don't
       | believe me!).
       | 
       | Ultimately the combination of ISA, implementation (the CPU), and
        | compiler struggles to provide optimal solutions for the majority
       | of applications. This inevitably leads to a complex instruction
       | set computer. Put enough engineers on the optimization problem
       | and that's what happens. It is not a good or bad thing, it just
       | IS.
        
         | codedokode wrote:
          | RISC-V is copying wrong decisions made decades ago. Against
          | all common sense, it doesn't trap on overflow in arithmetic
          | operations, and silently wraps the number around, producing an
          | incorrect result. Furthermore, it does not provide an overflow
          | flag (or any alternative), so it is difficult to implement
          | addition of 256-bit numbers, for example.
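          | 
          | Without a carry flag you have to recover each carry by
          | comparison. A minimal C sketch of a 256-bit add as four 64-bit
          | limbs -- roughly the dance a compiler has to emit on a flagless
          | ISA:
          | 
          |     #include <stdint.h>
          | 
          |     void add256(uint64_t r[4], const uint64_t a[4],
          |                 const uint64_t b[4]) {
          |         uint64_t carry = 0;
          |         for (int i = 0; i < 4; i++) {
          |             uint64_t t = a[i] + carry;
          |             uint64_t c1 = t < carry;  /* wrapped on +carry? */
          |             r[i] = t + b[i];
          |             uint64_t c2 = r[i] < t;   /* wrapped on +b[i]?  */
          |             carry = c1 | c2;          /* at most one is set */
          |         }
          |     }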
        
           | snvzz wrote:
            | I love how each and every criticism of RISC-V's decisions
            | ignores the rationale behind them.
            | 
            | Yes, that idea was evaluated, weighed, and discarded as
           | harmful, and the details are referenced in the spec itself.
        
             | codedokode wrote:
             | I tried searching the spec [1] for "overflow" and here is
             | what it says at page 17:
             | 
             | > We did not include special instruction-set support for
             | overflow checks on integer arithmetic operations in the
             | base instruction set, as many overflow checks can be
             | cheaply implemented using RISC-V branches.
             | 
             | > For general signed addition, three additional
             | instructions after the addition are required
             | 
             | Is this "cheap", replacing 1 instruction with four?
              | According to some old mainframe-era research (cannot find
              | the link now), addition is one of the most frequently used
              | instructions, and they suggest that we should replace each
              | instruction with four?
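              | 
              | For reference, the check the spec has in mind boils down to
              | a sign test plus a compare -- a minimal C sketch of its
              | observation that the wrapped sum is below one operand iff
              | the other operand is negative:
              | 
              |     #include <stdbool.h>
              |     #include <stdint.h>
              | 
              |     /* signed overflow iff (rs2 < 0) != (rd < rs1) */
              |     bool add_overflows(int64_t rs1, int64_t rs2,
              |                        int64_t *rd) {
              |         /* wrapping add, as the hardware does it */
              |         *rd = (int64_t)((uint64_t)rs1 + (uint64_t)rs2);
              |         return (rs2 < 0) != (*rd < rs1);
              |     }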
             | 
             | Their "rationale" is not rational at all. It doesn't make
             | sense.
             | 
             | Overflow check should be free (no additional instructions
             | required), otherwise we will see the same story we have
              | seen for the last 50 years: compiler writers do not want to
             | implement checks because they are expensive; language
             | designers do not want to use proper arithmetic because it
             | is expensive. And CPU designers do not want to implement
             | traps because no language needs them. As a result, there
             | will be errors and vulnerabilities. A vicious circle.
             | 
              | What also surprises me is that they added a fused multiply-
              | add instruction, which can easily be replaced by 2
              | separate instructions, is not really needed in most
              | applications (like a web browser), and is difficult to
              | implement (if I am not mistaken, you need to read 3
              | registers instead of 2, which might require additional
              | ports in the register file only for this useless
              | instruction).
             | 
             | [1] https://github.com/riscv/riscv-isa-
             | manual/releases/download/...
        
           | wbl wrote:
           | It doesn't trap because trapping means you need to track the
           | possibility of a branch at every single arithmetic operation.
              | It doesn't have a flag, so flag renaming isn't needed: you can
              | get the overflow from a CMP instruction, and macro-op fusion
           | should just work.
        
             | codedokode wrote:
             | > you need to track the possibility of a branch at every
             | single arithmetic operation
             | 
             | Every memory access can cause a trap, but CPUs seem to have
              | no problem with it. The branch is very unlikely and can
             | always be predicted as "not taken".
        
         | CalChris wrote:
         | To be fair, RISC-V has a small base, RV64I in the 64-bit case.
         | These bases are small, reduced and frozen. But after that, yes,
          | the extensions get wacky. L is Decimal Floating Point, still
         | marked Open. I'm not sure what's reduced about that. But
         | extensions are optional.
         | 
         | About the history of RISC, the basic idea dates to Seymour
         | Cray's 1964 CDC 6600. I don't think Berkeley gives Cray enough
         | credit.
        
           | dvwobuq wrote:
            | Patterson and Waterman detail exactly what they were
            | thinking during the design of RISC-V in The RISC-V Reader, and
           | Cray is mentioned in multiple places.
           | 
           | https://www.goodreads.com/en/book/show/36604301
        
         | dehrmann wrote:
         | It's the story of every framework. It starts out clean and
            | minimal, then gets features added on as users demand more,
            | for more and more specific uses.
        
           | codedokode wrote:
           | This "feature hell" is often seen in open source projects,
           | when users add dubious features that nobody except them
           | needs, and as a result after many years the program has
           | hundreds of CLI flags and settings and becomes too complex.
           | 
           | See openvpn as an example.
        
             | 0xr0kk3r wrote:
             | It is NOT feature hell. That is an absolutist/purist
             | standpoint that only gets in the way in my experience.
             | Products evolve to fit their market, which is literally why
             | products are made.
             | 
             | Complexity needs to be managed, not labelled and shunned
             | because it is "too hard" or "ugly". That is life. Learn
             | that early and it will help.
        
         | snvzz wrote:
         | The usual RISC-V FUD points. It gets boring.
         | 
         | >It has a soup of wacky opcodes to optimize corner cases
         | 
         | OK, go ahead and name one (1) such opcode. I'll wait.
         | 
         | >obscure vendor specific extensions that are absolutely CISC-y
         | (examine T-Head's additions if you don't believe me!).
         | 
         | Yes, these extensions are harmful, and that's why they're
         | obscure and vendor-specific.
         | 
         | RISC-V considers pros and cons, evaluates across use cases, and
          | weighs everything when considering whether to accept something
         | into the standard specs.
         | 
         | Simplicity itself is valuable; that is at the core of RISC. So
         | the default is to reject. A strong argument needs to be made to
         | justify adding anything.
        
           | codedokode wrote:
            | The RISC-V ISA is very inconsistent. For example, for addition
            | with checked overflow the spec says that there is no need for
            | such an instruction, as it can be implemented "cheaply" in four
           | instructions. But at the same time they have fused multiply-
           | add which is only needed for matrix multiplication (i.e. only
           | for scientific software), which is difficult to implement (it
           | needs to read 3 registers at once), and which can be easily
           | replaced with two separate instructions.
        
           | [deleted]
        
           | 0xr0kk3r wrote:
           | Your argument is to admit that the extensions are harmful and
           | then challenge me to name one example of something harmful.
           | *Chef's kiss.*
        
         | ip26 wrote:
          | There is an iron triangle operating on ISA design. I would
          | propose the vertices are complexity, performance, and memory
         | model. The ideal ISA has high performance, a strong memory
         | model, and a simple instruction set, but it cannot exist.
        
           | thesz wrote:
           | Define high performance.
           | 
           | Also, define strong memory model. This is the first time I
            | hear that a memory model can be strong.
           | 
           | And, finally, define what is "simple" in instruction set.
        
             | sweetjuly wrote:
              | Strong and weak as terms to describe memory models are very
              | common; the standard RISC-V memory model is called "weak
             | memory ordering" after all :)
        
               | [deleted]
        
               | snvzz wrote:
               | Strong memory ordering is convenient for the programmer.
               | 
               | But it is a no-go for SMP scalability.
               | 
               | That's why most architectures today use weak ordering.
               | 
               | x86 is alone and a dinosaur.
        
               | ip26 wrote:
                | I'm curious to hear what problems you are thinking of in
                | particular that make it a no-go? A strong model has
               | challenges, but I am not aware of any total showstoppers.
               | 
               | x86 has also illustrated the triangle, garnering some
                | weakly ordered benefits with examples like AVX-512 and
               | enhanced rep movsb.
               | 
               | The interesting thing is _both_ solutions (weak ordering,
               | special instructions) have been largely left to the
               | compiler to manage, so it could become a question of
               | which the compiler is better able to leverage. For
               | example, if people are comfortable programming MP code in
               | C on a strong memory model but reach for python on a weak
               | memory model, things could shake out differently than
               | expected.
        
       ___________________________________________________________________
       (page generated 2023-06-18 23:01 UTC)