[HN Gopher] The legend of "x86 CPUs decode instructions into RIS...
___________________________________________________________________
The legend of "x86 CPUs decode instructions into RISC form
internally" (2020)
Author : segfaultbuserr
Score : 137 points
Date : 2023-06-18 13:33 UTC (9 hours ago)
(HTM) web link (fanael.github.io)
(TXT) w3m dump (fanael.github.io)
| fooblaster wrote:
| I'm not sure how you could write something like this without
| considering something like the micro-op cache, which is present
| in all modern x86 and some ARM processors. The micro-op cache on
| x86 is effectively the only way an x86 processor can get full
| IPC performance, and that's because it contains pre-decoded
| instructions. We don't know the formats here, but we can
| guarantee that they are fixed-length instructions and that they
| have branch instructions annotated. Yeah sure, these instructions
| have more complicated semantics than true RISC instructions, but
| they have the most important part - fixed length. This makes it
| possible for 8-10 of them to be dispatched to the backend per
| cycle. In my mind, this definitely is the "legend" manifested.
| jabl wrote:
| Do we know they are fixed length? They could e.g. use a table
| with offsets to instruction boundaries in the cache?
| gpderetta wrote:
| We know that they are over 100 bits (which is not very RISCy)
| and not fixed length, as some constants cause the instructions
| to take more than one cache slot. IIRC they are also not
| necessarily load-store.
| mikequinlan wrote:
| >Final verdict
|
| >There is some truth to the story that x86 processors decode
| instructions into RISC-like form internally. This was, in fact,
| pretty much how P6 worked, later improvements however made the
| correspondence tortuous at best. Some microarchitecture families,
| on the other hand, never did anything of the sort, meaning it was
| never anywhere near a true statement for them.
| p_l wrote:
| P6 was a normal microcoded architecture, very much not
| RISC-like, though.
|
| Especially with its 143 bit long instructions and memory-memory
| operands.
|
| Now, K5 was a different beast, with its "microcode" being a
| frontend that converted x86 to AMD29k.
| ajross wrote:
| "RISC" architectures are doing something effectively identical to
| uop fusion though. The real myth is the idea of a CISC/RISC
| dichotomy in the first place when frankly that notion only ever
| applied to the ISA specifications and not (except for the very
| earliest cores) CPU designs.
|
| In point of fact beyond the instruction decode stage all modern
| cores look more or less identical.
| gumby wrote:
| > The real myth is the idea of a CISC/RISC dichotomy in the
| first place
|
| The divergence was one of philosophy, and had unexpected
| implications.
|
| CISC was a "business as usual" evolution of the 1960s view
| (exception: Seymour Cray) that you should make it easy to write
| assembly code so have lots of addressing modes and subroutines
| (string ops, BCD, etc) in the instruction set.
|
| RISC was realizing that software was good enough that compilers
| could do the heavy lifting and without all that junk hardware
| designers could spend their transistor budget more usefully.
|
| That's all well and good (I was convinced at the time, anyway)
| but the results have been amusing. For example some RISC
| experiments turn out to have painted their designs into dead
| ends (delay slots, visible register windows, etc) while the
| looseness of the CISC approach allowed more optimization to be
| done in the micromachine. I did not see that coming!
|
| Agree on the point that the cores themselves have found a
| common local maximum.
| phire wrote:
| But there wasn't ever a divergence in philosophy. It was a
| straight switch.
|
| In the 70s, everyone designing an ISA was doing CISC. Then in
| the 80s, everyone suddenly switched to designing RISC ISAs,
| more or less overnight. There weren't any holdouts, nobody
| ever designed a new CISC ISA again.
|
| The only reason why it might seem like there was a divergence
| is because some CPU microarchitecture designers were allowed
| to design new ISAs to meet their needs, while others were
| stuck having to design new microarchitecture for legacy CISC
| ISAs which were too entrenched to replace.
|
| _> For example some RISC experiments turn out to have
| painted their designs into dead ends_
|
| Which is kind of obvious in hindsight. The RISC philosophy
| somewhat encouraged exposing pipeline implementation details
| to the ISA, which is a great idea if you can design a fresh
| new ISA for each new CPU microarchitecture.
|
| But those RISC ISAs became entrenched, and CPU
| microarchitects found themselves having to design for what
| are now legacy RISC ISAs and work around implementation
| details that don't make sense anymore.
|
| Really the divergence was fresh ISAs vs legacy ISAs.
|
| _> while the looseness of the CISC approach allowed more
| optimisation to be done in the micromachine._
|
| I don't think this is actually an inherent advantage of CISC.
| It's simply a result of the sheer amount of R&D that AMD,
| Intel, and others poured into the problem of making fast
| microarchitectures for x86 CPUs.
|
| If you threw the same amount of resources at any other legacy
| RISC ISA, you would probably get the same result.
| jeffbee wrote:
| That's because there are only a few ISAs left standing. The ISA
| does have consequences for core design. This becomes apparent
| for ISAs that were unintentionally on the wrong side of the
| development of superscalar. The dead ISAs assumed in-order,
| single-issue, fixed-length pipelines and as soon as the state
| of the art shifted those ISAs became hard to implement. MIPS
| and SPARC are both basically incompatible with modern high-
| performance CPU design techniques.
| ajross wrote:
| > MIPS and SPARC are both basically incompatible with modern
| high-performance CPU design techniques.
|
| I don't see how that follows at all? MIPS in fact is about as
| pure a "RISC" implementation as is possible to conceive[1],
| and it shares all its core ideas with RISC-V. You absolutely
| could make a deeply pipelined superscalar multi-core MIPS
| chip. SPARC has the hardware stack engine to worry about, but
| then modern CPUs have all moved to behind-the-scenes stack
| virtualization anyway.
|
| No, CPUs are CPUs. Instruction set architectures are a
| vanishingly tiny subset of the design of these things. They
| just don't matter. They only seem to matter to programmers
| like us, because it's the only part we see.
|
| [1] Branch delay slots and integer multiply instruction
| notwithstanding I guess.
| hinkley wrote:
| We had to touch MIPS in school. Having to deal with branch
| delay slots was cruel. It broke some of my classmates. We
| were on the cusp of needing every programmer we could get and
| they were torturing students. I hope those teachers lost
| sleep over that.
|
| Am I correct in recalling they removed branch delay slots
| in a later iteration of the chips?
| IAmLiterallyAB wrote:
| IIRC they made new branch instructions without delay
| slots, but the normal branch instructions still have
| delay slots.
|
| Had to write MIPS assembly by hand recently; incredibly
| counterintuitive.
| jeffbee wrote:
| The design of RISC-V starts with an entire chapter
| excoriating MIPS for being useless and impossible to reduce
| to transistors.
| allenrb wrote:
| Some of the same people were involved, no? They must have
| had a good time writing that. "Things we have learned the
| hard way".
| mumbel wrote:
| And now MIPS, the company, makes RISC-V
| ant6n wrote:
| Decoding variable-width instructions is one of the
| bottlenecks for IPC though. It's hard to imagine, for example, a
| 10-way decode on x86.
|
| An internal fixed-width encoding inside the instruction cache
| may work.
| ajross wrote:
| > Its hard to imagine, for example, a 10-way decode on x86.
|
| Uh, why? Instructions start at byte boundaries. Say you want
| to decode a whole 64-byte cache line at once (probably 10-14
| instructions on average). That's... 64 parallel decode units.
| Seems extremely doable to me, given that "instruction decode"
| isn't even visible as a block on the enormous die shots
| anyway.
|
| Obviously there's some cost there in terms of pipeline depth,
| but then again Sandy Bridge showed how to do caching at the
| uOp level to avoid that. Again, totally doable and a long-
| solved problem. The real reason that Intel doesn't do 10-wide
| decode is that it doesn't have 10 execution units to fill
| (itself because typical compiler-generated code can't exploit
| that kind of parallelism anyway).
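|
| To make that concrete, here's a toy software model of the
| brute-force scheme (insn_length() is a hypothetical stand-in
| for the per-offset length decoders; real hardware would run
| all 64 in parallel, and this sketch only models the final
| selection step):
|
|     #include <stddef.h>
|
|     size_t insn_length(const unsigned char *p); /* hypothetical */
|
|     size_t find_starts(const unsigned char line[64],
|                        size_t entry, size_t starts[64])
|     {
|         size_t len[64];
|         for (size_t i = 0; i < 64; i++)   /* one "decoder" per
|                                              byte offset */
|             len[i] = insn_length(&line[i]);
|
|         /* walk the length chain from the known entry point
|            to pick out the real instruction starts */
|         size_t n = 0;
|         for (size_t pc = entry; pc < 64; pc += len[pc])
|             starts[n++] = pc;
|         return n;
|     }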
| formerly_proven wrote:
| That's a Zen 2 core: https://static.wixstatic.com/media/501
| 7e5_bbd1e91507434d40bc...
|
| Overlaid on a higher resolution die shot https://static.wix
| static.com/media/5017e5_982e0e47d7c04dd693...
|
| Here's a POWER8 floorplan: https://cdn.wccftech.com/wp-
| content/uploads/2013/08/IBM-Powe... (The decoder is
| subsumed under IFU)
|
| And POWER9 https://www.nextplatform.com/wp-
| content/uploads/2016/08/ibm-...
|
| Didn't find any recent, reliable sources for Intel cores.
| hinkley wrote:
| Why does the middle of the Zen FPU look like cache lines?
| exmadscientist wrote:
| I believe those are the utterly massive register files
| needed to feed a modern vector FPU.
| hinkley wrote:
| That would have been my guess. But I don't think I've
| ever seen a register file big enough that I could spot it
| without a label. I'm almost surprised they are so tall
| and not wide. Or is that because I am looking at it
| sideways, and each register is top to bottom?
| exmadscientist wrote:
| This is about Skylake rather than Zen2 but it's
| fascinating, if the subject of what's-really-in-a-
| register-file is fascinating to you:
| https://travisdowns.github.io/blog/2020/05/26/kreg2.html
| hinkley wrote:
| When I still read architecture docs like novels, there
| was a group experimenting with processor-in-memory
| architectures, where the demo was memory chips doing
| vector processing of data in parallel.
|
| I wonder how wide SIMD has to get before you treat it
| like a CPU embedded into cache memory.
|
| Though I guess we are already looking at SIMD
| instructions wider than a cache line...
| cachvico wrote:
| Saved this great article from a couple of years ago,
| https://medium.com/swlh/what-does-risc-and-cisc-mean-in-2020...
| adrianmonk wrote:
| If someone says x86 decodes to RISC internally, they might be
| getting at one of two different ideas:
|
| (1) RISC really is the fastest/best way for CPUs to operate
| internally.
|
| (2) x86 performance isn't held back (much) by its outmoded
| instruction set.
|
| x86 architectures were for a while translating into effectively
| RISC but stopped doing it. Now internally they are less RISC-
| like. This suggests #1 is false and #2 is true.
|
| They could do it if they wanted to (because they have before),
| but they don't want to anymore, presumably because it's not the
| best way to do it. Although I guess it could be slightly better
| but not worth the cost of translating.
| pclmulqdq wrote:
| I think if you want to analyze this to the same level of
| pedantry as the original blog post, the thing people actually
| mean is:
|
| "x86 instructions decompose into operations that feel somewhat
| RISC-like"
|
| And this is pretty much true.
|
| The author of this piece somewhat fixates on RISC-V as the
| anointed version of "RISC-ness" when it really is just one
| expression of the idea. The whole RISC vs CISC distinction is
| pretty silly anyway, because there isn't really a clear
| criterion or dividing line (see ARM's "RISC" instruction set)
| that separates "RISC" from "CISC." They're basically just
| marketing terms for "my instructions are reasonably simple"
| (RISC) and "my instructions are reasonably expressive" (CISC).
| SinePost wrote:
| The "Final Verdict" is very plain and is hardly enhanced by
| reading the body of the article. It would make more sense if it
| was put in the opening of the article, creating a complete
| abstract.
| rollcat wrote:
| Somewhat related: http://danluu.com/new-cpu-features/ discussion:
| https://news.ycombinator.com/item?id=31093430
| stncls wrote:
| Needs (2020). It explains why, for example, Zen 2 & 3 are not
| discussed.
| mhh__ wrote:
| They don't really add that much to the picture. Zen is a pretty
| boring (in a good way) architecture in the big picture.
| bjourne wrote:
| RISC just means that the instruction set is _reduced_ (compared
| to what was the norm in the early 1980s). It does not say whether
| the architecture is register-memory or load-store (though most
| RISC ISAs are load-store). As long as an x86 CPU does not
| decode to more than, say, two dozen microcode _types_, it uses
| RISC "internally".
| kens wrote:
| What I find most interesting is the "social history" of RISC vs
| CISC: how did a computer architecture issue from the 1980s turn
| into something that people vigorously debate 40 years later?
|
| I have several theories:
|
| 1. x86 refused to die as expected, so the RISC vs CISC debate
| doesn't have a clear winner. There are reasonable arguments that
| RISC won, CISC won, or it no longer matters.
|
| 2. RISC vs CISC has clear teams: now Apple vs Intel, previously
| DEC, Sun, etc vs Intel. So you can tie the debate into your
| "personal identity" more than most topics. The debate also has an
| underdog vs entrenched monopoly vibe that makes it more
| interesting.
|
| 3. RISC vs CISC is a simple enough topic for everyone to have an
| opinion (unlike, say, caching policies). But on the other hand,
| it's vague enough that nobody can agree on anything.
|
| 4. RISC exists on three levels: First, a design philosophy /
| ideology. Second, a style of instruction set architecture that
| results from this philosophy. Finally, a hardware implementation
| style (deep pipelines, etc) that results. With three levels for
| discussion, there's lots of room for debate.
|
| 5. RISC vs CISC has a large real-world impact, not just for
| computer architects but for developers and users. E.g. Apple
| switching to ARM affects the user but changing the internal bus
| architecture does not.
|
| (I've been trying to make a blog post on this subject, but it
| keeps spiraling off in random directions.)
| phire wrote:
| _> 4. RISC exists on three levels... Finally, a hardware
| implementation style (deep pipelines, etc) that results._
|
| I agree with the philosophy and ISA, but I don't think RISC
| actually counts as a hardware architecture.
|
| Yes, there is a certain style of architecture strongly
| associated with RISC, the "classic RISC pipeline" that a lot of
| early RISC implementations share. But RISC can't claim
| ownership over the concept of pipelined CPUs in general, and
| designers following the RISC philosophy almost immediately
| branched out into other hardware architecture directions like
| superscalar and out-of-order execution (some also branched into
| VLIW).
|
| Today, the "class RISC pipeline" is almost entirely abandoned
| outside of very low-power and low-gate count embedded cores.
|
| The primary advantage of the RISC philosophy was that it
| allowed designers to experiment with new hardware architecture
| ideas several years earlier than competitors who were stuck
| supporting legacy CISC instruction sets, especially when they
| could just dump their previous ISA and create a new one hyper-
| specialised for that exact hardware architecture.
|
| Those CISC designers also followed the same path in the 80s and
| 90s, implementing pipelined architectures, and then superscalar
| and then out-of-order, but their designs always had to dedicate
| more gates to adapting their legacy ISAs to an appropriate
| internal representation.
|
| ----
|
| But eventually silicon processes got dense enough for this
| inherent advantage of the RISC philosophy to fade away. The
| overhead of supporting those legacy CISC ISAs got smaller and
| smaller.
|
| All high-performance CPUs these days seem to have settled on a
| common hardware architecture; it doesn't matter if they use CISC
| or RISC ISAs, the diagrams all seem to look more or less the
| same. This architecture doesn't really have a name (which might
| be part of the reason why everyone is stuck arguing RISC vs
| CISC), but it's the out-of-order beast with absolutely massive
| reorder buffers, wide decoders, physical register files, long
| pipelines, good branch predictors, lots of execution units and
| (often) a uOP cache.
|
| Intel's Sandy Bridge is the first example of this exact hardware
| architecture (though that design lineage starts all the way back
| with the Pentium Pro, and you also have AMD examples that get
| close), but Apple quickly follows up with Cyclone and then AMD
| with Zen. Finally, ARM starts rapidly catching up from about the
| Cortex-A76 onwards.
| kens wrote:
| A question that maybe HN can help me answer: are there _any_ new
| instruction set architectures since, say, 1985 that are CISC?
| (Excluding, of course, ISAs that are extensions of previous CISC
| ISAs.)
| exmadscientist wrote:
| I'd guess there are a few hiding in the corners of the
| processor world. Think ultra-low-power (TI MSP430?) or DSP (TI
| C2000?).
|
| (I've used both of those two examples, but it's been a while,
| and I don't have any particular desire to crack open their
| architecture manuals to see how CISCy or RISCy they are. It's
| kind of an academic distinction anyway at this point.)
| exmadscientist wrote:
| I thought of one!
|
| Recent ESP32 processors have a "ULP Coprocessor" for ultra-
| low-power operations. Its instruction set is... well... not
| very RISCy: https://docs.espressif.com/projects/esp-
| idf/en/v4.2/esp32/ap...
|
| (Spoiler alert for the lazy: it has _single instructions_ for
| ADC reads and I2C transactions and such. I don't think it
| gets more CISC than that!)
| phire wrote:
| Eh... It's a load/store architecture with fixed-size
| instructions where every instruction executes in 1 cycle
| (unless it stalls due to IO).
|
| That makes it quite RISCy. The overpowered instructions are
| just because it's an IO processor; they don't do much work
| on the CPU core itself, just trigger an IO component.
| phire wrote:
| I think Intel's iAPX 432 was probably the last one, in ~1982.
|
| It's not just that the RISC philosophy became popular; suddenly
| it didn't make much sense to design a CISC ISA anymore.
|
| CISC was a great idea when you had really slow and narrow RAM.
| It made sense to try to make each instruction as short and
| powerful as possible, usually using microcoded routines. But
| RAM got cheaper, buses got wider, and caches started being a
| thing. It no longer made sense to waste transistors on
| microcode routines; just put that code in RAM.
| NotYourLawyer wrote:
| RISC-V with extensions probably qualifies.
| snvzz wrote:
| >RISC-V with extensions probably qualifies (as CISC)
|
| With RVA22, RISC-V has already caught up with ARM and x86
| functionality.
|
| Yet it does not have their complexity. It is not even close.
| neerajsi wrote:
| Renesas RX is the newest CISC I've seen.
|
| https://www.renesas.com/us/en/products/microcontrollers-micr...
| sobkas wrote:
| There are some similarities with Transmeta.
| CalChris wrote:
| The Transmeta Crusoe first interpreted x86 (Linus Torvalds
| worked on the interpreter) and then JIT'd hotspots into a
| 128-bit-wide VLIW. There's no way that VLIW could be confused
| with RISC.
|
| https://www.zdnet.com/article/transmetas-crusoe-how-it-works...
| jylam wrote:
| "(the code is 32-bit just to allow us to discuss very old x86
| processors)"
|
| fsck, that hurts.
| hinkley wrote:
| Damn kids. I was an early adopter of 32-bit coding. When I was
| in a big hurry to get my career started there was still plenty
| of 16-bit code around; even Netscape ran on the 16-bit version
| of the Windows API. I ended up tapping the brakes and changing
| gears to make sure I didn't have to deal with that bullshit.
| Most of my CS classes had been taught on 32-bit Unix boxes so
| it just felt like sticks and rocks.
|
| The jump from 32 bit to 64 was not so dramatic. I wonder if
| I'll be around for 128 bit. I suspect the big disruption there
| will be changing the CPU cache line size, which has been stuck
| at 64 bytes for ages. I can't imagine 4 words per cache line
| will be efficient.
| cptskippy wrote:
| I grew up in the 80s and 90s, and what I gathered from listening
| to the greybeards talk was that RISC-based designs were more
| elegant, easier to understand, and more efficient. When I first
| started hearing about then-modern CISC CPUs decoding to RISC, it
| was pushed as a justification that RISC was fundamentally
| superior.
|
| This was around the time IBM was pushing Power and everyone
| thought it was poised to dominate the industry.
| CalChris wrote:
| I've never liked this idea that _x86 CPUs decode instructions
| into RISC form internally_. Before there was RISC, before there
| was even x86, there were microcoded instruction sets [1]. They
| were first implemented in Wilkes' 1958 EDSAC 2. Indeed the
| Patterson-Ditzel paper even comments on this:
| Microprogrammed control allows the implementation of complex
| architectures more cost-effectively than hardwired control. [2]
|
| These horizontally microprogrammed instructions interpreted the
| architectural instruction set. The VAX 11/750 microcode control
| program had an interpreter loop. There could be more than 100
| bits in these horizontal instructions with 30+ fields.
| Horizontally microprogrammed instructions were not in any way
| _reduced_. Indeed, reduction would mean paying the decode tax
| twice.
|
| There was another form, vertical microprogramming, which was
| closer to RISC. But there was no translation from complex to
| vertical.
|
| [1]
| https://www.cs.princeton.edu/courses/archive/fall10/cos375/B...
|
| [2] https://inst.eecs.berkeley.edu/~n252/paper/RISC-
| patterson.pd...
| fulafel wrote:
| Yep, and microcomputer processors were also microcoded. See e.g.
| the 8086 here: https://www.righto.com/2022/11/how-8086-processors-
| microcode...
|
| And current x86 CPUs still implement some instructions via
| microcode. Some are even performance-sensitive (e.g. rep movsb).
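|
| As an illustration of why a microcoded string instruction can
| still be performance-sensitive, here is the classic memcpy-
| via-rep-movsb pattern (a minimal sketch in GNU C inline asm,
| x86-64 only):
|
|     #include <stddef.h>
|
|     void *memcpy_repmovsb(void *dst, const void *src, size_t n)
|     {
|         void *d = dst;
|         /* rep movsb copies RCX bytes from [RSI] to [RDI];
|            modern CPUs run it as an optimized microcode
|            routine ("fast string ops" / ERMSB) */
|         asm volatile("rep movsb"
|                      : "+D"(d), "+S"(src), "+c"(n)
|                      :
|                      : "memory");
|         return dst;
|     }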
| p_l wrote:
| The P6 architecture was always microcoded - it's even a somewhat
| archetypical example of superscalar microcoded CISC, with
| quite horizontal microcode.
| Sparkyte wrote:
| I don't think I could articulate my concern as well as you.
| There is always some form of overhead when trying to translate
| instructions.
| mort96 wrote:
| The "CISC CPUs just decode instruction into RISC internally"
| thing is getting at something I think is important: RISCs and
| CISCs aren't necessarily that different internally. "CISC CPUs
| and RISC CPUs both just decode instructions into microcode and
| then execute that" is probably a more accurate but less
| "memeable" expression of that idea.
|
| What exactly we mean by "RISC" and "CISC" becomes important
| here. If, by RISC, we mean an architecture without complex
| addressing modes, then "CISCs are just RISCs with a fancy
| decoder" is wrong. But if we expand our definition of "RISC" to
| allow for complex addressing modes and stuff, but keep a vague
| upper limit on the "amount of stuff" the instruction can do,
| it's sort of more appropriate; the "CISCy" REPNZ SCASB
| instruction (basically a strlen-instruction) is certainly
| decoded into a loop of less complex instructions.
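|
| (For the curious, here's roughly what that idiom computes,
| written out as the loop of simpler operations it effectively
| expands into - a C sketch of the classic REPNE SCASB strlen,
| with the corresponding x86 registers noted in comments:)
|
|     #include <stddef.h>
|
|     size_t strlen_scasb_style(const char *s)
|     {
|         size_t count = (size_t)-1;  /* RCX = -1 */
|         const char *p = s;          /* RDI = s  */
|         do {
|             count--;                /* each scan step decrements
|                                        RCX ...                  */
|         } while (*p++ != 0);        /* ... until [RDI] == AL,
|                                        with AL = 0              */
|         return (size_t)-2 - count;  /* length = -RCX - 2        */
|     }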
|
| I think the main issue with most of these discussions is in the
| very idea that there is such a thing as "RISC" and "CISC", when
| there's no such distinction. They are, at best, abstract
| philosophies around ISA design which describe a general
| preference for complex instructions vs simpler instructions,
| where even the terms "complex" and "simple" have very muddy
| definitions.
|
| TL;DR: I agree
| moomin wrote:
| One woman's RISC is another man's CISC. The "perform operation
| and branch on flags" operation described here might not be part
| of RISC-V, but it 100% was part of ARM 1 when ARM was at the
| forefront of the movement.
| snvzz wrote:
| >but it 100% was part of ARM 1 when ARM was at the forefront of
| the movement.
|
| ARM1 is what it is. They didn't have the time to do it
| properly, or the hindsight we have now.
|
| They had to get their product out.
| compressedgas wrote:
| No mention of AMD's RISC86, which was the patented internal
| decoding of x86 instructions into a RISC instruction set.
|
| https://patents.google.com/patent/US5926642A/en (1996)
| cwzwarich wrote:
| Even though AMD filed the patent a few years later, this was
| actually from NexGen, which AMD acquired:
|
| https://en.wikipedia.org/wiki/NexGen
|
| Here's an old BYTE article from 1994:
|
| https://halfhill.com/byte/1994-6_cover-nexgen.html
| boramalper wrote:
| Out of context: The level of detail and the quality of
| writing in older _consumer_ magazines always amaze me.
| p_l wrote:
| Before NexGen, AMD built an instruction decoder from x86 to
| AMD29k (which was Berkeley RISC style) and used it in K5.
| gumby wrote:
| Wow, really? Cool!
|
| The 29K was a really cool architecture and I'm sorry it
| didn't make it. AMD's pathetic marketing of the time
| couldn't even beat MIPS' terrible marketing, plus MIPS had
| SGI (and later SGI had MIPS).
| p_l wrote:
| Funnily enough, new AMD29050 are still being made (maybe
| even with further development) by Honeywell - they form
| the basis of their range of avionics computers like
| Flight Management Systems etc.
| mumbel wrote:
| Got interested in amd29k for about a week before finding
| something else to mess with. Made a quick attempt at Ghidra
| support, but never really RE'd with it, so no clue how it does
| on larger projects.
|
| https://github.com/mumbel/ghidra_a29k
| fulafel wrote:
| Are there references about it using the actual 29k
| instruction set internally? Some (non-primary) sources from a
| cursory web search seem to say it used custom micro-ops,
| which it did call "RISC ops", and had other implementation
| pieces carried over from an abandoned 29k chip project.
| p_l wrote:
| There are persistent mentions of being able to switch off
| the decoder and use plain AMD29K instructions, but I
| never found any proper docs - don't have K5 to test
| against either.
| phendrenad2 wrote:
| No mention of RISC86[1] and the hype[2] surrounding it.
|
| [1] https://patents.google.com/patent/US6336178B1/en
|
| [2] https://halfhill.com/byte/1996-1_amd-k6.html
| rany_ wrote:
| I'm not sure how true this is or if it's a legend, but I remember
| reading that this originated from Intel marketing in response
| to the rise in popularity of RISC in the 1990s.
|
| In essence it was intended to give the impression that there was
| no need for a RISC architecture because x86 was already a RISC
| behind the scenes. So you got the best of both worlds.
| Sharlin wrote:
| Probably apocryphal, since the 8086 was already microcode-
| based: http://www.righto.com/2022/11/how-8086-processors-
| microcode-...
| weinzierl wrote:
| That is how I remember it, and I believe it was at the time when
| Apple made a big marketing drama that their top-of-the-line
| RISC machine was so fast that it fell under US export control.
| Due to that, there was a spike in popularity of RISC, and Intel
| marketing was like "We are also RISC!". At least according to
| my memory.
| jeffbee wrote:
| That was already a misleading campaign at the time and
| certainly did not age well. When the G4 Mac launched at best
| 450MHz (delayed 6 months and derated from the advertised
| 500MHz), it was going head-to-head with the 733MHz Pentium III
| "Coppermine" that was available, cheap, and faster from most
| points of view. By the time you could actually buy a 500MHz
| G4, you could also buy a 1000MHz AMD Athlon. To make it seem
| like the whole PowerPC thing had been a good idea you had to
| cherry-pick the best Mac and an old PC as a baseline.
| p_l wrote:
| I do remember those ads, and they started back with G3 and
| iMac.
|
| I recall seeing them in Polish computer mags with G3 iMac
| 333 MHz or so.
| asveikau wrote:
| Also worth noting that in that era, the PowerPC Macs were
| running cooperative multitasking with no address space
| isolation.
| peter_d_sherman wrote:
| Related:
|
| https://news.ycombinator.com/item?id=27334855
|
| https://www.google.com/search?q=%22christopher+domas%22+x86+...
|
| https://en.wikipedia.org/wiki/Alternate_Instruction_Set
|
| >"In 2018 Christopher Domas discovered that some Samuel 2
| processors came with the Alternate Instruction Set enabled by
| default and that by executing AIS instructions from user space,
| it was possible to gain privilege escalation from Ring 3 to Ring
| 0.[5] Domas had partially reverse engineered the AIS instruction
| set using automated fuzzing against a cluster of seven thin
| clients.[12] Domas used the terms "deeply embedded core" (DEC)
| plus "deeply embedded instruction set" (DEIS) for the RISC
| instruction set, "launch instruction" for JMPAI, "bridge
| instruction" for the x86 prefix wrapper, "global configuration
| register" for the Feature Control Register (FCR), and documented
| the privilege escalation with the name "Rosenbridge".[5]"
|
| Also -- I should point out that the debate over _whether_ x86
| (CISC) CPUs contain RISC cores -- is largely academic.
|
| Both RISC and CISC CPUs contain ALUs -- so our only debate,
| really, if we have one, is _how_ exactly the data that the ALU
| is going to process winds up at the ALU...
|
| It is well known in the x86 community that x86 instructions
| are an abstraction, a level of abstraction which runs on top of
| a lower level of abstraction, the x86 microcode layer...
|
| Historically, intentionally or unintentionally, most x86 vendors
| have done everything they can to hide, obfuscate, and obscure
| this layer... There is (to the best of my knowledge, at this
| point in time) no official documentation of this layer or how it
| works from any major x86 vendor.
|
| x86 microcode update blobs are encrypted binary "black
| boxes".
|
| Most of our (limited) knowledge in this area comes from various
| others who have attempted to understand the internal workings of
| x86 microcode:
|
| https://www.google.com/search?q=%22reverse+engineering+x86+p...
|
| https://github.com/RUB-SysSec/Microcode
|
| https://twitter.com/_markel___/status/1262697756805795841
|
| https://www.youtube.com/watch?v=lY5kucyhKFc
|
| It should be pointed out that even if a complete understanding of
| x86 microcode were to be had for one generation of CPU -- there
| would always be successive generations where that implementation
| might change -- leaving anyone who would wish to fully understand
| it, back at square one...
|
| To (mis)quote Douglas Adams:
|
| _" There is a theory which states that if ever anyone discovers
| exactly what the x86 microcode layer is for and why it is here,
| it will instantly disappear and be replaced by something even
| more bizarre and inexplicable."
|
| There is another theory which states that this has already
| happened."_ :-) <g>
| bri3d wrote:
| It's also worth noting that "microcode" in a modern x86 CPU is
| not really the same thing as "microcode" in an older,
| "microcoded" CPU. What do I mean by this?
|
| Some older CPUs were truly "microcoded." The heart of the CPU
| was an interpreter loop which took instructions, then invoked a
| ROM / microcode routine corresponding to the implementation for
| that instruction. Each instruction began with the runtime
| selecting an instruction and ended with the microcode routine
| returning to a loop. A good example of this is the 8086:
| http://www.righto.com/2022/11/how-8086-processors-microcode-...
| . These CPUs worked like a traditional "emulator."
|
| That is NOT how a modern x86 CPU works. A modern CPU works a
| lot more like a JIT. In a modern x86 CPU, "microcode" runs
| _alongside_ the main CPU core execution. The microcode runtime
| can be thought of as a co-processor of sorts: some complex
| instructions or instructions with errata are redirected into
| the microcode co-processor, and it's responsible for breaking
| those instructions down and emitting their lower-level uOps
| back into the execution scheduler. However, most instructions
| never touch microcode at all: they are decoded purely in
| hardware and issued into the scheduler directly.
|
| This is important because when people start talking about the
| x86 "microcode layer," they quickly get confused between the
| "microcode" (which is running _alongside_ the processor) and
| uOps, which are the lower level instructions issued into the
| processor's execution scheduler.
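|
| (A toy software model of that split; every name here is
| hypothetical and purely illustrative, not any vendor's actual
| interface:)
|
|     #include <stdbool.h>
|
|     typedef struct { unsigned char bytes[15]; } insn_t; /* stub */
|     typedef struct { int op, src, dst; } uop_t;         /* stub */
|
|     /* hypothetical stand-ins for hardware blocks: */
|     bool is_complex(const insn_t *i);
|     bool has_patch(const insn_t *i);
|     int  hw_decode(const insn_t *i, uop_t out[]);
|     void issue(uop_t *u, int n);            /* to the scheduler */
|     void microcode_expand(const insn_t *i); /* MS-ROM path      */
|
|     void decode(const insn_t *insn)
|     {
|         if (!is_complex(insn) && !has_patch(insn)) {
|             /* fast path: hardware decoders feed the scheduler
|                directly; most instructions take this route */
|             uop_t u[4];
|             int n = hw_decode(insn, u);
|             issue(u, n);
|         } else {
|             /* slow path: divert to the microcode sequencer,
|                which emits uOps into the same scheduler */
|             microcode_expand(insn);
|         }
|     }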
| allenrb wrote:
| Thanks, I'd been wondering about this for years. I couldn't
| imagine how microcode would be directly involved with so many
| pipeline stages and execution units. But if it's reserved for
| the "highly unusual", it all fits together. And the explosion
| of transistor budgets since 8086 means we can afford to
| implement all but the wackiest stuff in hardware.
| peter_d_sherman wrote:
| Some excellent points!
|
| uOps and how they function should be included in any serious
| study of x86 hardware.
|
| In an ideal world, they would be well documented by vendors.
|
| (Random Idea: I wonder if it is possible to use some feature
| of some newer x86 chip to issue an x86 instruction -- and
| then to have it retrieve the uOp structure for that
| instruction as data... sort of like a uOp proxy or debug
| facility... If newer x86 chips don't have that function --
| then a future x86 chip (which doesn't necessarily have to
| be Intel) should...)
|
| >In a modern x86 CPU, "microcode" runs _alongside_ the main
| CPU core execution.
|
| Indeed!
|
| The broader picture of a CPU, any CPU, is that there's a
| _lot_ happening at the same time. A lot of signals
| (and data!) moving around, and changing around, for various
| purposes! (In other words, modern CPUs don't work linearly,
| like a computer program -- unless perhaps we talk about the
| earliest CPUs...)
|
| But the study of uOps should be undertaken in any serious
| study of the x86...
|
| It should be pointed out that the number of distinct uOps
| should be far smaller than the number of x86 instructions,
| since each x86 instruction is implemented in one to several
| uOps, and these uOps frequently repeat across instructions...
|
| Which again brings us to the academic question of "if there
| are way fewer uOps than x86 instructions -- then can the set
| of uOps, taken by themselves, be considered a RISC?" Why or
| why not?
|
| Which brings us to Christopher Domas' discovery that some VIA
| x86 Samuel 2 processors had an "Alternate Instruction Set" --
| what is the Alternate Instruction Set's relation to uOps? Is
| the AIS direct uOps, or a 1:1 mapping? Or was it implemented
| by microcode that translated each AIS instruction into
| multiple uOps, and if so, how was that microcode implemented?
|
| As an addendum, I found the following resource for uops
| online:
|
| https://uops.info/index.html
|
| Anyway, some excellent points in your post!
| p_l wrote:
| Umm, uOps are _exactly_ like the old horizontal microcode,
| down to the ability to jump out to patched out
| instructions...
|
| The simplest horizontally microcoded CPUs had their "decoder"
| be essentially a mapping into microcode ROM, where said
| microcode contained instructions that directly drove various
| components of the CPU, coupled with some flags regarding the
| next micro-instruction to execute.
|
| Many instructions could in fact decode to a single "wide"
| instruction, with sequencer-controlling bits telling it to
| load the next instruction and dispatch it. Others would jump
| to a microcode "dispatch" routine, usually uOp(s) that
| advanced the program counter, triggered the load of the next
| instruction into the appropriate register, and then jumped
| based on that.
|
| Usually multiple wide instructions per macroinstruction
| happened either in complex designs (there were microcoded
| CPUs whose microcode was _multitasking with priority levels!_)
| or when the decoded instruction mapped onto a more complex
| sequence of operations (take for example the difference
| between an "ADD" between two registers and an "ADD" that does
| complex offset calculation for all operands - or PDP-10 style
| indirect addressing, where you'd loop memory loads until
| hitting the final, non-indirect address).
|
| To make it simpler, it was common to make microcode ROMs have
| multiple empty slots between addresses generated by the
| sequencer, so that you avoided jumps unless you got a really
| hairy instruction (the largest possible VAX instruction was
| longer than its page size).
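|
| (To illustrate what such a "wide" micro-instruction looks
| like, here is a purely hypothetical horizontal microcode word
| as a C bitfield - the field names and widths are invented,
| not any real machine's format:)
|
|     #include <stdint.h>
|
|     struct uinsn {
|         uint32_t alu_op    : 5;  /* operation for the ALU      */
|         uint32_t src_a_sel : 4;  /* what feeds ALU port A      */
|         uint32_t src_b_sel : 4;  /* what feeds ALU port B      */
|         uint32_t dst_sel   : 4;  /* where the result goes      */
|         uint32_t mem_read  : 1;  /* strobe a memory read       */
|         uint32_t mem_write : 1;  /* strobe a memory write      */
|         uint32_t cond_sel  : 3;  /* condition to test          */
|         uint32_t next_ctl  : 2;  /* next / branch / dispatch   */
|         uint32_t next_addr : 12; /* micro-address if branching */
|     }; /* 36 bits here; real horizontal formats ran to 100+   */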
| 0xr0kk3r wrote:
| It is fascinating that the semantic confusion over RISC vs CISC
| has persisted since I was in college in the '80s. It is largely
| meaningless.
|
| The naive idea behind RISC is essentially to reduce the ISA to
| near-register-level operations: load, store, add, subtract,
| compare, branch. This is great for two things: being the first
| person to invent an ISA, and teaching computer engineering.
|
| Look at the evolution of RISC-V. The intent was to build an open
| source ISA from a 100% clean slate, using the world's academic
| computer engineering brains (and corporations that wanted to be
| free of Arm licensing) ... and a lot of the subtext was initially
| around ISA purity.
|
| Look at the ISA today, specifically the RISC-V extensions that
| have been ratified. It has a soup of wacky opcodes to optimize
| corner cases, and obscure vendor-specific extensions that are
| absolutely CISC-y (examine T-Head's additions if you don't
| believe me!).
|
| Ultimately the combination of ISA, implementation (the CPU), and
| compiler struggles to provide optimal solutions for the majority
| of applications. This inevitably leads to a complex instruction
| set computer. Put enough engineers on the optimization problem
| and that's what happens. It is not a good or bad thing, it just
| IS.
| codedokode wrote:
| RISC-V is copying wrong decisions made decades ago. Against
| any common sense it doesn't trap on overflow in arithmetic
| operations, and silently wraps the number around, producing an
| incorrect result. Furthermore, it does not provide an overflow
| flag (or any alternative), so it is difficult to implement
| addition of 256-bit numbers, for example.
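|
| (To illustrate: without a carry flag, every 64-bit limb of a
| 256-bit addition needs explicit carry-out comparisons, where
| x86 would use one ADD followed by three ADCs. A minimal C
| sketch:)
|
|     #include <stdint.h>
|
|     void add256(uint64_t r[4], const uint64_t a[4],
|                 const uint64_t b[4])
|     {
|         uint64_t carry = 0;
|         for (int i = 0; i < 4; i++) {
|             uint64_t t = a[i] + carry;
|             uint64_t c1 = t < carry;   /* wrapped on carry-in  */
|             r[i] = t + b[i];
|             uint64_t c2 = r[i] < b[i]; /* wrapped on b's limb  */
|             carry = c1 | c2;           /* at most one is set   */
|         }
|     }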
| snvzz wrote:
| I love how each and every criticism on RISC-V's decisions
| ignores the rationale behind them.
|
| Yes, that idea was evaluated, weighed, and discarded as
| harmful, and the details are referenced in the spec itself.
| codedokode wrote:
| I tried searching the spec [1] for "overflow", and here is
| what it says on page 17:
|
| > We did not include special instruction-set support for
| overflow checks on integer arithmetic operations in the
| base instruction set, as many overflow checks can be
| cheaply implemented using RISC-V branches.
|
| > For general signed addition, three additional
| instructions after the addition are required
|
| Is this "cheap", replacing 1 instruction with four?
| According to some old mainframe era research (cannot find
| link now), addition is one of the most often used
| instructions and they suggest that we should replace each
| instruction with four?
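|
| (Concretely, the spec's suggested branch-based check maps to
| C like this - a minimal sketch, with the spec's RISC-V
| instructions noted in the comments:)
|
|     #include <stdint.h>
|     #include <stdbool.h>
|
|     bool add_overflows(int64_t a, int64_t b, int64_t *sum)
|     {
|         /* add t0, t1, t2 (wrapping two's-complement add) */
|         int64_t s = (int64_t)((uint64_t)a + (uint64_t)b);
|         bool b_neg  = b < 0;    /* slti t3, t2, 0       */
|         bool s_less = s < a;    /* slt  t4, t0, t1      */
|         *sum = s;
|         return b_neg != s_less; /* bne t3, t4, overflow */
|     }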
|
| Their "rationale" is not rational at all. It doesn't make
| sense.
|
| Overflow checks should be free (no additional instructions
| required); otherwise we will see the same story we have
| seen for the last 50 years: compiler writers do not want to
| implement checks because they are expensive; language
| designers do not want to use proper arithmetic because it
| is expensive. And CPU designers do not want to implement
| traps because no language needs them. As a result, there
| will be errors and vulnerabilities. A vicious circle.
|
| What also surprises me is that they added a fused multiply-
| add instruction which can easily be replaced by 2 separate
| instructions, is not really needed in most applications
| (like a web browser), and is difficult to implement (if I am
| not mistaken, you need to read 3 registers instead of 2,
| which might require additional ports in the register file
| only for this useless instruction).
|
| [1] https://github.com/riscv/riscv-isa-
| manual/releases/download/...
| wbl wrote:
| It doesn't trap because trapping means you need to track the
| possibility of a branch at every single arithmetic operation.
| It doesn't have a flag so flag renaming isn't needed: you can
| get the overflow from a CMP instruction, and macro-op fusion
| should just work.
| codedokode wrote:
| > you need to track the possibility of a branch at every
| single arithmetic operation
|
| Every memory access can cause a trap, but CPUs seem to have
| no problem with that. The branch is very unlikely and can
| always be predicted as "not taken".
| CalChris wrote:
| To be fair, RISC-V has a small base, RV64I in the 64-bit case.
| These bases are small, reduced and frozen. But after that, yes,
| the extensions get whacky. L is Decimal Floating Point, still
| marked Open. I'm not sure what's reduced about that. But
| extensions are optional.
|
| About the history of RISC, the basic idea dates to Seymour
| Cray's 1964 CDC 6600. I don't think Berkeley gives Cray enough
| credit.
| dvwobuq wrote:
| Patterson and Waterman detail exactly what they were
| thinking during the design of RISC-V in The RISC-V Reader,
| and Cray is mentioned in multiple places.
|
| https://www.goodreads.com/en/book/show/36604301
| dehrmann wrote:
| It's the story of every framework. It starts out clean and
| minimal, then gets features added on as users demand more, for
| more and more specific uses.
| codedokode wrote:
| This "feature hell" is often seen in open source projects,
| when users add dubious features that nobody except them
| needs, and as a result after many years the program has
| hundreds of CLI flags and settings and becomes too complex.
|
| See openvpn as an example.
| 0xr0kk3r wrote:
| It is NOT feature hell. That is an absolutist/purist
| standpoint that only gets in the way in my experience.
| Products evolve to fit their market, which is literally why
| products are made.
|
| Complexity needs to be managed, not labelled and shunned
| because it is "too hard" or "ugly". That is life. Learn
| that early and it will help.
| snvzz wrote:
| The usual RISC-V FUD points. It gets boring.
|
| >It has a soup of wacky opcodes to optimize corner cases
|
| OK, go ahead and name one (1) such opcode. I'll wait.
|
| >obscure vendor specific extensions that are absolutely CISC-y
| (examine T-Head's additions if you don't believe me!).
|
| Yes, these extensions are harmful, and that's why they're
| obscure and vendor-specific.
|
| RISC-V considers pros and cons, evaluates across use cases, and
| weighs everything when considering whether to accept something
| into the standard specs.
|
| Simplicity itself is valuable; that is at the core of RISC. So
| the default is to reject. A strong argument needs to be made to
| justify adding anything.
| codedokode wrote:
| The RISC-V ISA is very inconsistent. For example, for addition
| with checked overflow the spec says that there is no need for
| such an instruction, as it can be implemented "cheaply" in four
| instructions. But at the same time they have fused multiply-
| add, which is only needed for matrix multiplication (i.e. only
| for scientific software), which is difficult to implement (it
| needs to read 3 registers at once), and which can be easily
| replaced with two separate instructions.
| [deleted]
| 0xr0kk3r wrote:
| Your argument is to admit that the extensions are harmful and
| then challenge me to name one example of something harmful.
| *Chef's kiss.*
| ip26 wrote:
| There is an iron triangle operating on ISA design. I would
| propose the vertices are complexity, performance, and memory
| model. The ideal ISA has high performance, a strong memory
| model, and a simple instruction set, but it cannot exist.
| thesz wrote:
| Define high performance.
|
| Also, define strong memory model. This is the first time I
| hear that a memory model can be strong.
|
| And, finally, define what is "simple" in instruction set.
| sweetjuly wrote:
| Strong and weak as terms to describe memory models are very
| common; the standard RISC-V memory model is called "weak
| memory ordering", after all :)
| [deleted]
| snvzz wrote:
| Strong memory ordering is convenient for the programmer.
|
| But it is a no-go for SMP scalability.
|
| That's why most architectures today use weak ordering.
|
| x86 is alone and a dinosaur.
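|
| (The classic message-passing litmus test shows the practical
| difference - a minimal C11 sketch: on strongly ordered x86
| even this relaxed version can never print "flag=1 data=0",
| while on weakly ordered ARM or RISC-V it can, unless the
| relaxed orderings are upgraded to release/acquire:)
|
|     #include <stdatomic.h>
|     #include <pthread.h>
|     #include <stdio.h>
|
|     atomic_int data, flag;
|
|     void *producer(void *arg) {
|         (void)arg;
|         atomic_store_explicit(&data, 1, memory_order_relaxed);
|         /* weak hardware may make these two stores visible
|            to other cores in either order */
|         atomic_store_explicit(&flag, 1, memory_order_relaxed);
|         return NULL;
|     }
|
|     void *consumer(void *arg) {
|         (void)arg;
|         int f = atomic_load_explicit(&flag, memory_order_relaxed);
|         int d = atomic_load_explicit(&data, memory_order_relaxed);
|         printf("flag=%d data=%d\n", f, d);
|         return NULL;
|     }
|
|     int main(void) {
|         pthread_t a, b;
|         pthread_create(&a, NULL, producer, NULL);
|         pthread_create(&b, NULL, consumer, NULL);
|         pthread_join(a, NULL);
|         pthread_join(b, NULL);
|     }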
| ip26 wrote:
| I'm curious to hear what problems you are thinking of in
| particular that make it a no-go. The strong model has
| challenges, but I am not aware of any total showstoppers.
|
| x86 has also illustrated the triangle, garnering some
| weakly-ordered benefits with examples like AVX-512 and
| enhanced rep movsb.
|
| The interesting thing is _both_ solutions (weak ordering,
| special instructions) have been largely left to the
| compiler to manage, so it could become a question of
| which the compiler is better able to leverage. For
| example, if people are comfortable programming MP code in
| C on a strong memory model but reach for Python on a weak
| memory model, things could shake out differently than
| expected.
___________________________________________________________________
(page generated 2023-06-18 23:01 UTC)