[HN Gopher] iAPX432: Gordon Moore, Risk and Intel's Super-CISC F...
___________________________________________________________________
iAPX432: Gordon Moore, Risk and Intel's Super-CISC Failure
Author : klelatti
Score : 74 points
Date : 2023-04-02 16:33 UTC (6 hours ago)
(HTM) web link (thechipletter.substack.com)
(TXT) w3m dump (thechipletter.substack.com)
| nickdothutton wrote:
| I think the iAPX432 team went on to do the i960 (another
| interesting architecture that didn't really find the success that
| was hoped for) and then finally moved on to the Pentium Pro, where
| they found more success.
| convolvatron wrote:
| it wasn't a total failure. the 860 and 960 were decent little
| engines that found a home in high-performance computing and in
| embedded applications that needed a little oomph. I worked on
| some 860 array products, and certainly remember finding 960s in
| printers and other gear.
| speedbird wrote:
| Worked on an i860 Stratus machine in the early 90s - provided
| a key part of our distributed infra due to its FT
| capabilities.
| trzy wrote:
| Trivia: The i960 was used to power Sega's highly successful
| Model 2 arcade board, which truly ushered in the era of 3D
| gaming* (with Daytona USA), and was used in the F-22 Raptor
| until it was later replaced with other CPUs.
|
| * Certainly not the first 3D arcade hardware but arguably
| this along with Namco's MIPS-based System 22 (Ridge Racer,
| released a little before Daytona), was the inflection point
| that made 2D effectively obsolete.
| panick21_ wrote:
| It's really sad the i960 didn't take off. Intel wasn't in the
| Unix workstation market much, and the high end was owned by
| Digital and IBM. Intel was working on the i860 and i960.
|
| If Intel had acted quicker, they could have cooperated with a
| Unix workstation maker and potentially done really well.
|
| Sun was definitely looking around for a chip partner at the time,
| but none of the American companies were interested, so they went
| to Japan. So the timing didn't really work out. A Sun-Intel
| alliance would have been a scary prospect, and beneficial for
| both companies.
| twoodfin wrote:
| What's most interesting to me about the i432 is the rich array of
| object types essentially embedded into its ISA. The JVM "knows" a
| little bit about virtual dispatch tables, monitors, arrays, but
| even that pales in comparison to the i432's user-facing model of
| the CPU state.
|
| Is there anything comparable surviving today?
| userbinator wrote:
| There were some attempts in the Java direction:
| https://en.wikipedia.org/wiki/Java_processor
|
| But ultimately it seems that the idea of language-specific CPUs
| just didn't survive because people want to be able to use any
| programming language with them.
| panick21_ wrote:
| The Java-Everything trip Sun went on was truly horrific, both
| in terms of technical and business results.
| gumby wrote:
| I don't think so, except at the margins.
|
| I started out as a Lisp hacker on machines designed for it
| (PDP-10 and CADR, later D-machines) so I was very much in the
| camp you describe. They had hardware / microcode support for
| tagging, unboxing, fundamental Lisp opcodes, and for the Lispms
| specifically, things like a GC barrier and transporter support.
| When I looked at implementations like VAXLisp, the extra cycles
| needed to implement these things seemed like a burden to me.
|
| Of course those machines did lots of other things as well, and
| so were subject to a lot of evolutionary pressure the research
| machines were not subject to.
|
| The shocker that changed my mind was the idea of using the TLB
| to implement the write barrier. Yes, doing all that extra work
| cost cycles, but you were doing it on a machine that had evolved
| lots of extra capabilities that could ameliorate some of the
| burden. Plus the underlying hardware just kept getting faster,
| faster (i.e. the second derivative was higher).
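|
| For concreteness, here's a minimal POSIX-flavored sketch of that
| trick (illustrative only; not any particular collector's actual
| code, the names are mine, and page-fault details vary by OS):
| old-space pages are mapped read-only, the first store to one
| faults, and the handler logs the dirty page before re-enabling
| writes, so later stores to the same page run at full speed.
|
|     /* page-protection ("TLB") write barrier sketch */
|     #include <signal.h>
|     #include <stdint.h>
|     #include <stdio.h>
|     #include <sys/mman.h>
|     #include <unistd.h>
|
|     #define HEAP_PAGES 16
|     static uint8_t *heap;
|     static long pagesz;
|     static int dirty[HEAP_PAGES];   /* remembered set, per page */
|
|     static void barrier_hit(int sig, siginfo_t *si, void *ctx) {
|         (void)sig; (void)ctx;
|         uint8_t *fault = (uint8_t *)si->si_addr;
|         if (fault < heap || fault >= heap + HEAP_PAGES * pagesz)
|             _exit(1);               /* a real crash, not the barrier */
|         long page = (fault - heap) / pagesz;
|         dirty[page] = 1;            /* log the mutated page */
|         mprotect(heap + page * pagesz, pagesz,
|                  PROT_READ | PROT_WRITE);
|     }                               /* faulting store is retried */
|
|     int main(void) {
|         pagesz = sysconf(_SC_PAGESIZE);
|         heap = mmap(NULL, HEAP_PAGES * pagesz, PROT_READ,
|                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
|         struct sigaction sa = {0};
|         sa.sa_sigaction = barrier_hit;
|         sa.sa_flags = SA_SIGINFO;
|         sigaction(SIGSEGV, &sa, NULL);
|         heap[3 * pagesz] = 42;      /* first store to page 3 faults */
|         heap[3 * pagesz + 1] = 43;  /* this one runs at full speed */
|         printf("page 3 dirty? %d\n", dirty[3]);
|         return 0;
|     }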
|
| Meanwhile, the more dedicated architectures were burning
| valuable real estate on these features and couldn't keep up
| elsewhere. You saw this in the article when the author wrote
| about gates that could have been used elsewhere.
|
| Finally, some decisions box you in -- the 64kb object size
| limitation being an example in the 432. Sure, you can work
| around it, but then the support for these objects becomes a
| deadweight (part of the RISC argument).
|
| You see this also in the use of GPUs as huge parallel machines,
| even though the original programming abstraction was triangles.
|
| Going back to my first sentence about "at the margins":
| optimize at the end. Apple famously added a "jvm" instruction
| -- must have been the fruit of a lot of metering! Note that
| they didn't have to do this for Objective-C: some extremely
| clever programming made dispatch cheap.
|
| Tagging/unboxing can be supported in a variety of (relatively)
| inexpensive ways, by using ALU circuitry otherwise idle during
| address calculation or (more likely these days) by implementing
| a couple of in-demand ops; either way it's pretty cheap.
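|
| A hedged sketch of the all-software flavor of this (illustrative,
| not any particular runtime's scheme; the names are mine): low-bit
| tagging, where the fixnum test is one AND, boxing/unboxing is a
| shift, and addition barely needs to unbox at all.
|
|     #include <assert.h>
|     #include <stdint.h>
|     #include <stdio.h>
|
|     typedef uintptr_t value;     /* one tagged machine word */
|     #define TAG_FIXNUM 1u        /* low bit set => immediate int */
|
|     static value box_fixnum(intptr_t n) {
|         return ((uintptr_t)n << 1) | TAG_FIXNUM;
|     }
|     static intptr_t unbox_fixnum(value v) {
|         return (intptr_t)v >> 1; /* arithmetic shift in practice */
|     }
|     static int is_fixnum(value v) { return v & TAG_FIXNUM; }
|
|     /* add without fully unboxing: (2a+1)+(2b+1)-1 == 2(a+b)+1 */
|     static value fixnum_add(value x, value y) {
|         assert(is_fixnum(x) && is_fixnum(y));
|         return x + y - TAG_FIXNUM;
|     }
|
|     int main(void) {
|         value a = box_fixnum(20), b = box_fixnum(22);
|         printf("%ld\n",
|                (long)unbox_fixnum(fixnum_add(a, b))); /* 42 */
|         return 0;
|     }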
|
| Finally, we do have a return to, and flourishing of, separate,
| specialized functional units (image processors, "learning"
| units and such, like, say, the database hardware of old). They
| aren't generally fully programmable (even if they have
| processors embedded in them), but the key factor is that they
| don't interfere (except via some DMA) with the core processing
| operations.
| aardvark179 wrote:
| "Going back to my first sentence about "at the margins":
| optimize at the end. Apple famously added a "jvm" instruction
| -- must have been the fruit of a lot of metering! Note that
| they didn't have to do this for Objective-C: some extremely
| clever programming made dispatch cheap."
|
| I'm struggling to think of what you are referring to here.
| ARM added op codes for running JVM byte code on the processor
| itself, but I think those instructions were dropped a long
| time ago. ARM also added an instruction (floating point
| convert to fixed point rounding towards zero) as it became
| such a common operation in JS code. There have also been
| various GC related instructions and features added to POWER,
| but I think all that was well after Apple had abandoned the
| architecture.
|
| I may be forgetting something; could you clarify?
| panick21_ wrote:
| Not adding tagging is basically a negligence crime. That
| feature isn't that expensive, and it could have saved us from
| most of the security issues of the last 20+ years.
| Lammy wrote:
| NSA et al. probably like it better that way, so they have
| easier access to the "'intel' inside" my PC.
| yourapostasy wrote:
| _> Is there anything comparable surviving today?_
|
| I'm not aware of such Super CISC instruction sets in popular
| use today, but with VMs and statistically-based AI proliferating
| now, I wonder whether we might revisit such architectures in the
| future. Could continuously collected VM statistics inform
| compiler and JIT compiler design, collapsing expensive, common
| complex operations that we can't find patterns for with current
| methods into Super CISC instructions that substantially speed up
| patterns we didn't know previously existed? Or are our current
| methods of analyzing and implementing compilers and JITs good
| enough, and what's mostly holding them back these days is other
| factors like memory and cache access speed and pipeline stalls?
| rodgerd wrote:
| > Is there anything comparable surviving today?
|
| Surviving? No. The most recent was arguably Sun's Rock
| processor, one of the final nails in their coffin, and quite an
| i432 redux. It promised all sorts of hardware support for
| transactions and other features that Sun thought would make it
| a killer chip, was rumoured at tape-out to require 1 kW of
| power for mediocre performance, and Oracle killed it when they
| saw how dysfunctional it was.
| monocasa wrote:
| If you squint hard enough, the underlying concept of an object
| capability system as the privilege boundary does still live on.
|
| In hardware, the 432 went on to inspire the 16- and 32-bit
| protected modes on x86. There it was the inspiration for just
| about anything involving the GDT and the LDT, including fine-
| grained memory segments, hardware task switching of Task State
| Segments, and virtual dispatch through Task Gates.
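|
| To make that concrete, here's a sketch of the 8-byte descriptor
| format that GDT/LDT entries share (field layout per the Intel
| manuals; the packed attribute assumes GCC/Clang, and the field
| names are mine):
|
|     #include <stdint.h>
|
|     struct __attribute__((packed)) seg_descriptor {
|         uint16_t limit_lo;       /* limit bits 0-15              */
|         uint16_t base_lo;        /* base bits 0-15               */
|         uint8_t  base_mid;       /* base bits 16-23              */
|         uint8_t  access;         /* present bit, DPL, and type:
|                                     code/data/TSS/gate           */
|         uint8_t  limit_hi_flags; /* limit bits 16-19 plus
|                                     granularity/size flags       */
|         uint8_t  base_hi;        /* base bits 24-31              */
|     };
|     /* The same slot encodes data/code segments, TSSs, call
|        gates, and task gates; the 'access' type field selects
|        which, much like a 432 object's type tag. */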
|
| But a large point of the RISC revolution was that these kinds
| of abstractions in microcode don't make sense anymore when you
| have ubiquitous I$s. Rather than a fixed-length blob of vendor
| code that's hard to update, let end users create whatever
| abstractions they feel make sense in regular (albeit
| privileged) code. Towards that end, the 90s and 2000s had an
| explosion of supervisor-mode-enforced object capability
| systems. These days the most famous is probably seL4; there are
| a lot of parallels between seL4's syscall layer and the
| iAPX432's object capability interface between user code and
| microcode. In a lot of ways, the most charitable way to look at
| the iAPX432 microcode is as a very early microkernel in ROM.
| markhahn wrote:
| timing is interesting: ia432 listed as "late 1981"
| (wikipedia), and 286 (protected mode 16:16 segmentation) in
| Feb 1982. of course, the 432 had been going on for some
| time...
| CalChris wrote:
| The ambitious 432 was also late, quite late. So Intel needed a
| simple stopgap product which was an iteration of the 8088, the
| 8086.
| wtallis wrote:
| The 8088 (1979) was a low-cost (reduced bus width) follow-up to
| the 8086 (1978). You may be thinking of the 8080 (1974) or 8085
| (1976).
| B1FF_PSUVM wrote:
| """
|
| The key new features included:
|
| Ada : The architecture would be programmed using the Ada
| programming language, which at the time was seen as the 'next big
| thing' in languages.
|
| """
|
| This was it - the next big thing. It missed and went down, but
| there's always the doubt whether they were right, just too
| early ...
|
| Seems the performance just wasn't there:
|
| """
|
| Gordon admitted that "to a significant extent," he was personally
| responsible. "It was a very aggressive shot at a new
| microprocessor, but we were so aggressive that the performance
| was way below par."
|
| """
|
| It was kind of uncanny, having it shouted from the rooftops one
| year, then dead silence a few years later about anything having
| ever happened.
| markhahn wrote:
| Ada was the CISC of languages - high-concept the same way. And
| it lost to C, surely the RISC of languages.
|
| Never bet against low-tech.
| markhahn wrote:
| 432 was a pretty interesting flop, but surely the ia64 has to
| rank up there.
|
| it would be interesting to try to chart some of the features that
| show up in various chips. for instance, 64k segments in both 432
| and 286+, or VLIW in 860 and ia64.
| ghaff wrote:
| One difference is that, according to the article, Intel
| actually learned quite a bit technically from the 432 even
| though it was a commercial flop. It's hard to see much of a
| silver lining in IA64/Itanium for either Intel or HP--or,
| indeed, for all the other companies that wasted resources on
| Itanium if only because they felt they had to cover their
| bases.
| fpoling wrote:
| Itanic was a flop due to AMD releasing a 64-bit CPU. And I
| still think Intel learned a lot from its failure, if not from
| the technology then business-wise: just stick to improving the
| existing architecture while keeping backward compatibility.
| markhahn wrote:
| VLIW was really marooned in time: driven by overconfidence
| in the compiler (which had shown that you could actually
| expose pipeline hazards), and underestimates of the coming
| abundance of transistors (which made superscalar OoO really
| take off, along with giant on-chip caches). well, and
| multicore to sop up even more available transistors.
| PAPPPmAc wrote:
| IMO, Itanic was a doomed design from the start; the lesson
| to be learned is that "You can't statically schedule
| dynamic behavior." The VLIW/EPIC-type designs like Itanium
| require you to have a _very clever_ compiler to schedule
| well enough to extract even a tiny fraction of theoretical
| performance, for both instruction packing and memory
| scheduling reasons. That turns out to be extremely
| difficult in the best case, and in a dynamic environment
| (with things like interrupts, a multitasking OS, bus
| contention, DRAM refresh timing, etc.) it's basically
| impossible. Doing much of the micro-scheduling dynamically
| in the instruction decoder (see: all modern x86 parts that
| decompose x86 instructions into whatever it is they run
| internally in that vendor generation) nearly always wins
| in practice.
|
| Intel spent decades trying to clean-room a user-visible
| high-end architecture (iAPX432, then i860, then Itanium),
| while the x86 world found a cheat code for microprocessors
| in the dynamic translation of a standard ISA into whatever
| fancy modern core you run internally (microcode-on-top-of-
| a-RISC? Dynamic microcode? JIT instruction decoder? I don't
| think we really have a comprehensive name for it).
| Arguably, NexGen were the first to pull off the trick, in
| 1994, with their Nx586 design that later evolved into the
| AMD K6, but Intel's P6 - from which most i686 designs
| descend - is an even better implementation of the same
| trick less than a year later, and almost all subsequent
| designs work that way.
| wtallis wrote:
| Based on https://en.wikipedia.org/wiki/File:Itanium_Sales_F
| orecasts_e... it's clear that Itanium was delayed and sales
| projections were drastically reduced multiple times before
| AMD even announced their 64-bit alternative, let alone
| actually shipping Opteron. (For reference, AMD announced
| AMD64 in October 1999, published the spec August 2000,
| shipped hardware in April 2003. Intel didn't publicly
| confirm their plans to adopt x86-64 until February 2004,
| and shipped hardware in June 2004.)
| l1k wrote:
| A lot of RISC CPU arches that were popular in the 1990s
| declined because their promulgators stopped investing and
| bet on switching to IA64 instead. Around the year 2000, VLIW
| was seen as the future, and all the CISC and RISC
| architectures were considered obsolete.
|
| That strategic failure by competitors allowed x86 to grow
| market share at the high end, which benefited Intel more than
| the money lost on Itanium.
| ghaff wrote:
| It's more complicated than that.
|
| Sun didn't slow down on UltraSPARC or make an Itanium side
| bet. IBM did (and continues to) place their big hardware
| bet on Power--Itanium was mostly a cover your bases thing.
| I don't know what HP would have done--presumably either
| gone their own way with VLIW or kept PA-RISC going.
|
| Pretty much all the other RISC/Unix players had to go to a
| standard processor; some were already on x86. Intel mostly
| recovered from Itanium specifically but it didn't do them
| any favors.
| sliken wrote:
| Actually, they did. Intel promised an aggressive delivery
| schedule, a performance ramp, and performance. The industry
| took it hook, line, and sinker, while AMD decided not to
| limit 64-bit to the high end and brought out x86-64.
|
| Sun did an IA64 port of Solaris, which is definitely an
| Itanium side bet.
|
| HP was involved in the IA64 effort and definitely was
| planning on the replacement of pa-risc from day 1.
| davidgay wrote:
| > HP was involved in the IA64 effort and definitely was
| planning on the replacement of pa-risc from day 1.
|
| As I remember, and https://en.wikipedia.org/wiki/Itanium
| agrees, Itanium originated at HP. So yes, a replacement for
| pa-risc from day 1, but even more so...
| rodgerd wrote:
| Another way to look at the Itanic is that HP somehow
| conned Intel into betting the farm on building HP-PA3 for
| HP. Which is pretty impressive.
| foobiekr wrote:
| This isn't really true. IBM/Motorola need to own the
| failure of POWER and PowerPC, and MIPS straight up died on
| the performance side. Sun continued with UltraSPARC.
|
| It wasn't that IA64 killed them; it's that they were
| getting shaky and IA64 appealed _because_ of that. Plus
| the lack of a 64-bit x86.
| userbinator wrote:
| _Plus the lack of a 64-bit x86._
|
| If you look at the definitions of various structures and
| opcodes in x86, you'll notice gaps that would've been
| ideal for a 64-bit expansion, so I think they had a plan
| besides IA64, but AMD beat them to it (and IMHO with a
| far more inelegant extension).
| panick21_ wrote:
| It's simply economics: Intel had the volume. Sun and SGI
| simply didn't have the economics to invest the same
| amount, and they were also not chip companies; they both
| either didn't invest enough in chip design or invested it
| wrongly.
|
| Sun spent an unbelievable amount of money on dumb-ass
| processor projects.
|
| Towards the end of the 90s all of them realized their
| business model would not do well against Intel, so pretty
| much all of them were looking for an exit, and the IA64
| hype basically killed most of them. Sun stuck it out with
| SPARC, with mixed results. IBM POWER continues, but in a
| thin slice of the market.
|
| Ironically, there was a section of Digital and Intel who
| thought that Alpha should be the basis of 64-bit x86.
| That would have made Intel pretty dominant: Alpha (maybe
| a TSO version) with a 32-bit x86 compatibility mode.
| PAPPPmAc wrote:
| Look closely at AMD designs (and staff) of the very late
| 90s and early 2000s, and/or at all modern x86 parts, and
| you'll see that, more or less, that's what happened, just
| not with an Alpha mode.
|
| Dirk Meyer (co-architect of the DEC Alpha 21064 and
| 21264) led the K7 (Athlon) project, and K7s run on a
| licensed EV6 bus borrowed from the Alpha.
|
| Jim Keller (co-architect of the DEC Alpha 21164 and
| 21264) led the K8 (first-gen x86-64) project, and there
| are a number of design decisions in the K8 evocative of
| the later Alpha designs.
|
| The vast majority of x86 parts since the (NexGen Nx686
| which became the) AMD K6 and the Pentium Pro (P6) have
| been internal RISC-ish cores with decoders that ingest
| x86 instructions and chunk them up to be scheduled on an
| internal RISC architecture.
|
| It has turned out to be a sort of better-than-both-worlds
| thing, almost by accident. A major part of what did in
| the VLIW-ish designs was that "You can't statically
| schedule dynamic behavior," and a major problem for the
| RISC designs was that exposing architectural innovations
| on a RISC requires you to change the ISA and/or memory
| behavior in visible ways from generation to generation,
| interfering with compatibility. So the RISC-
| behind-x86-decoder designs get to follow the state of the
| art, changing whatever they need to behind the decoder
| without breaking compatibility, AND get to have the
| decoder do the micro-scheduling dynamically.
| Dalewyn wrote:
| >That strategic failure by competitors allowed x86 to grow
| market share at the high end, which benefited Intel more
| than the money lost on Itanium.
|
| In that sense, Itanium was a resounding success for Intel
| (and AMD).
| panick21_ wrote:
| Itanium was a success right up until they actually made a
| chip.
|
| What they should have done is hype Itanium, and then the
| day it came out say: yeah, that was a joke; what we
| actually did is buy Alpha from Compaq, and it's literally
| just Alpha with an x86 compatibility mode.
|
| Then they would have dominated.
| jimmaswell wrote:
| If I recall correctly that shuttling instructions around fast
| enough is the main bottleneck right now, why do people want to
| return to RISC?
___________________________________________________________________
(page generated 2023-04-02 23:00 UTC)