[HN Gopher] The First RISC: John Cocke and the IBM 801
___________________________________________________________________
The First RISC: John Cocke and the IBM 801
Author : klelatti
Score : 58 points
Date : 2022-10-02 12:33 UTC (10 hours ago)
(HTM) web link (thechipletter.substack.com)
(TXT) w3m dump (thechipletter.substack.com)
| dosman33 wrote:
| When I worked at IBM in the early 2000's I occasionally remember
| hearing people say "IBM invented RISC!". As a young pup I knew to
| keep my mouth shut, but in my head I was like "best keep that
| news to yourself pop, the rest of the world only knows about the
| history of RISC-V" which is the only RISC we learned about in
| technical school in the late 90's. I didn't doubt IBM did
| something similar early on though, but it just came across to me
| as another funny IBM community knowledge thing that was a little
| off-base which wasn't too uncommon anyway. Obviously the IBM old
| timers were not off-base though.
|
| But it is nice to see history recognizing earlier pioneers in the
| field like John Cocke. And this article also demonstrates
| something I found perplexing as a young pup: IBM's marketing
| department dictated what engineering did. Obviously the point of
| the company is to make solutions that customers want to pay money
| for. Just because engineering has a better solution doesn't mean
| it's the best fit for customers (or the most profitable solution
| to the problem). But with my head stuck in FOSS philosophy of the
| day it was a good education in real life.
| klelatti wrote:
| Funnily enough, I worked for IBM marketing (my first job, for a
| short while) in 1984.
|
| One of our jobs was to demo System/36 - one of IBM's minis -
| which might have been replaced by an 801 system had they taken
| it up.
|
| One of my colleagues went through the demo with a room of
| customers. At the end he asked for questions. Straight away a
| hand went up.
|
| "If it's this slow with one user, how slow is it with five?"
|
| Sometimes engineering does know best!
| adrian_b wrote:
| Actually in this case IBM really has the right to brag about
| it, because there is no doubt that IBM 801 included the first
| RISC CPU, even if the name RISC was invented only several years
| later.
|
| Before the IBM 801 there were many simple CPUs which, if they
| were launched today, would be called RISC CPUs because of
| their simplicity.
|
| However for those old CPUs the simplicity was not intentional,
| but it was determined by the limitations of their manufacturing
| technology.
|
| On the other hand, the IBM 801 project started in 1975,
| published significant documentation about its principles of
| operation during 1976, and had a working prototype in 1978. Its
| target was to intentionally reverse the trend towards more and
| more complex CPUs and to investigate how greater simplicity
| could in fact be a method for reaching higher performance.
|
| All the principles for obtaining high performance with a
| simpler CPU were discussed in the IBM 801 documentation 4 to 5
| years before the RISC project at Berkeley (1980 to 1984) and
| the MIPS project at Stanford (1981 to 1984).
|
| Moreover, in March 1982 there has been a very important
| symposium where all the teams working at that time on RISC CPUs
| have presented their work, including the Berkeley RISC,
| Stanford MIPS and IBM 801 projects.
|
| The presentations from that symposium have determined the
| designers of the future ARM CPU to change their architecture
| from some kind of 6502 extension to a RISC architecture. They
| have been particularly impressed by the IBM 801 presentation,
| so instead of using weak addressing modes, like in the RISC and
| MIPS projects, they have included in ARM the much more powerful
| IBM 801 addressing modes (i.e. including modes with base
| register auto update).
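|
| A minimal sketch of what "base register auto update" means,
| assuming a toy machine in which `regs` is a dict of registers
| and `mem` maps word addresses to values. The function and
| register names are hypothetical and do not reflect real 801 or
| ARM encodings.

```python
def load_plain(regs, mem, rd, rb, offset):
    """rd = mem[rb + offset]; the base register rb is left unchanged."""
    regs[rd] = mem[regs[rb] + offset]


def load_update(regs, mem, rd, rb, offset):
    """rd = mem[rb + offset], then rb = rb + offset (base auto-update)."""
    addr = regs[rb] + offset
    regs[rd] = mem[addr]
    regs[rb] = addr  # write the effective address back to the base


def sum_array(mem, base, n):
    """Sum n words starting at `base`, one load-with-update per element."""
    # Start one word before the array so the first update lands on `base`.
    regs = {"ptr": base - 1, "tmp": 0, "acc": 0}
    for _ in range(n):
        load_update(regs, mem, "tmp", "ptr", 1)  # load and advance pointer
        regs["acc"] += regs["tmp"]
    return regs["acc"]
```

| With `load_plain` the loop would need a separate add to advance
| the pointer; the update form folds that into the load.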
|
| While John Cocke has not invented the word RISC when designing
| IBM 801, either he or another member of his team (which worked
| at the project that eventually became IBM POWER) has invented
| the word "superscalar", a few years later, in 1987, when the
| design target for CPUs had moved from being able to execute one
| instruction in a single clock cycle, like in the early RISC
| CPUs, to being able to execute multiple instructions per clock
| cycle.
| cachvico wrote:
| > Moreover, in March 1982 there has been a very important
| symposium where all the teams working at that time on RISC
| CPUs have presented their work, including the Berkeley RISC,
| Stanford MIPS and IBM 801 projects.
|
| > The presentations from that symposium have determined the
| designers of the future ARM CPU to change their architecture
| from some kind of 6502 extension to a RISC architecture.
|
| That is absolutely fascinating. Are the papers from that
| conference available online?
| kragen wrote:
| Which pre-RISC simple CPUs are you thinking about? I think
| the key design features of RISC are no microcode, pipelined
| execution, usually one instruction per clock, a load/store
| architecture (no memory operands to ALU instructions), fixed-
| length instructions, and a large orthogonal general-purpose
| register set to compensate for the lack of memory operands.
|
| Maybe the CDC 6600 sort of fits, but it only had 8 60-bit
| operand registers, and they weren't orthogonal; you could
| load into 6 of them and store from 2 of them, and you
| couldn't use them for addressing (addressing was done with 16
| other 18-bit registers). It was only barely pipelined, though
| it was sort of superscalar. I do think it fulfills the "no
| microcode", "fixed-length instructions", and "usually one
| instruction per clock" desiderata -- but you can hardly say
| it was a "simple CPU", its design objective being to build
| "the largest computer in the world", and sporting 60-bit
| registers, hardware floating point (single and double
| precision), and a heavily multibanked memory system.
| Taniwha wrote:
| I'd include the PDP-8 and successors/cloners like the DG
| NOVA
| kragen wrote:
| The PDP-8 is certainly simple, but it has microcode,
| doesn't have pipelined execution, never approaches one
| instruction per clock, has memory operands to virtually
| every instruction (including an indirection bit, a mortal
| sin in RISC), and has a tiny register set. You could say
| its zero page is like a "large orthogonal general-purpose
| register set" but (a) all your ALU instructions have to
| involve a register which isn't in that set (the single
| accumulator) and (b) the current PC page is just as
| accessible as the zero page. All it has in common with
| RISC is the fixed-width instruction set and a (very)
| small number of instruction formats.
|
| So I would say the PDP-8 is the opposite extreme from
| RISC.
|
| If you like the PDP-8 you should check out the 32-bit
| version: https://github.com/johnwcowan/pdp8x
|
| The NOVA is maybe a _little_ more plausible, since it at
| least has a load-store architecture and _four_
| accumulators instead of one, but it still doesn't feel
| very RISCy to me. I mean it's maybe less RISCy than the
| 80386?
| rjsw wrote:
| The RISC-II thesis from 1983 references the 801, but it looks
| like IBM only started talking about it at around this time. The
| RISC-V project didn't start until 2010.
| klelatti wrote:
| There is a funny quote from David Patterson in his Oral
| History
|
| > There was the 801 project was secretive. There is something
| called the 801, "What is it?" It was very hard to know even
| what they were doing. It was supposed to be very exciting,
| very breakthrough, but unless John Cocke came and told you
| what they were doing, you couldn't figure out anything about
| it. So, there was that kind of the mystery of the 801.
|
| I think word did get out in the late 1970s but not
| officially!
| rjsw wrote:
| One of the references in the Katevenis thesis is "J. Cocke:
| Informal discussion on the IBM 801 mini-computer,
| U.C.Berkeley Campus, June 1983".
| klelatti wrote:
| From memory I think Cocke lectured at Berkeley (and maybe
| at Stanford too?)
| abudabi123 wrote:
| "SC stood for Seymour Cray in RISC" went one joke in his memory,
| which you can find on YT.
|
| https://en.wikipedia.org/wiki/Seymour_Cray
| panick21_ wrote:
| The early RISC people gave a lot of credit to Cray.
| kragen wrote:
| It's surprising that you learned about RISC-V in technical
| school in the late 90s, since the RISC-V project began in 02010
| at Berkeley. You should see if you can get in touch with your
| instructor and help him fix his time machine.
| dboreham wrote:
| Perhaps parent meant MIPS.
| kragen wrote:
| Perhaps, but it would be a lot more interesting if it
| turned out his instructor _did_ know about RISC-V in the
| 90s!
| classichasclass wrote:
| > the rest of the world only knows about the history of RISC-V"
| which is the only RISC we learned about in technical school in
| the late 90's.
|
| This surprises me, especially since RISC-V wasn't really a
| thing until around 2010. Not MIPS, SPARC, PowerPC, PA-RISC,
| ARM, all extant by this period and having a longer history, but
| RISC-V? What school was this?
| cmrdporcupine wrote:
| I suspect the comment author typo'd and meant to write either
| MIPS or (more likely) Berkeley RISC.
| Zenst wrote:
| Interesting: the IBM ROMP https://en.wikipedia.org/wiki/IBM_ROMP
| was a spin-off from the 801 project and was the CPU used in the
| IBM RT range and saw the birth of AIX.
| wrs wrote:
| For those who haven't heard of the RT, Bitsavers has a large
| hoard of docs. [0]
|
| I was at CMU when the Andrew project got early-access PC RTs
| and it was crazy secretive due to IBM's requirements. Special
| badge access, NDAs, machines chained to desks...very weird in a
| university environment. So it's not surprising there isn't a
| ton of public documentation of the 801.
|
| [0] http://www.bitsavers.org/pdf/ibm/pc/rt/
| jscipione wrote:
| Tangentially related to the article: what happened to the riscv
| beagle board, is that still going to be available for purchase?
| drmpeg wrote:
| The chip it was going to use (the JH7100) never went into
| production and they had to cancel the project.
|
| There's a Kickstarter for the newer JH7110 chip.
|
| https://www.kickstarter.com/projects/starfive/visionfive-2
| monocasa wrote:
| Additionally Pine64 is putting out a board with the same
| JH7110 chip too.
| djmips wrote:
| As an aside, the America's Cup anecdote is related incorrectly.
| "On August 22 [1851], the [U.S.-built schooner] America joined 14
| British ships for a regatta around the Isle of Wight." It won by
| a good margin and the Queen Victoria quote was at this race not
| at a later first America's Cup race. (named after the schooner
| and not the country).
|
| -> https://www.history.com/this-day-in-history/u-s-wins-
| first-a....
| klelatti wrote:
| Thanks! The version in the article is based on Cocke's own
| comments but I'll add a note to clarify.
| sennight wrote:
| A few years ago I dedicated a couple of days to finding any
| evidence of documentation for the 801 beyond a small number of
| well known academic papers, but found none. It really bothers me,
| probably to an unreasonable degree, how common it is for
| corporations to neglect such an easy win. IBM is really bad about
| it, especially considering how long of a history they have - just
| look at how anemic their public facing corporate archives page
| is. Yes, they had a long running tradition of publishing very low
| level papers in publicly accessible journals, a huge number of
| very important papers... but when they killed off their internal
| journals that served as the fountain head for the effort - their
| chosen steward quickly locked it all behind a paywall. Young
| engineers WANT to revere their elders while bringing forward the
| state of the art, and there have been some good efforts made in
| the collection of narratives, but engineers need work product
| (not English lit) in order to do the ancients justice. There
| aren't enough articles like this, so my thanks to the author -
| well done... but his job could easily be made so much less
| difficult by corporations taking advantage of something with such
| a long tail of public good will: preserving and then releasing
| obsolete work product.
| epc wrote:
| Much of IBM's internal informal history was lost in the
| transition from the RSCS based TOOLSRUN forums to a mix of
| Lotus Notes/Domino/web forums around 1999-2000 (note that this
| was a business choice, all internal fora were already available
| via nntp). IBM Research might have additional documentation on
| the 801 and related products but it looks like you'd have to
| contact their research library directly, nothing on line.
| sennight wrote:
| I wouldn't even dare to dream for version controlled
| source... I was over the moon when roughly dated personal
| backups of research edition Unix started popping up. From
| what I've seen in IBM documentation and a small number of
| code leaks - it would actually be very difficult to
| faithfully present their historical code simply due to the
| Rube Goldberg SCM they've employed. It was so wild that it
| permeated into their xcoff executable format.
| kragen wrote:
| From IBM management's point of view, I think, the easy win was
| that even though they discovered RISC in 01974 and shipped it
| in ROMP in 01981, their competition didn't ship RISC until
| 01985, in the form of the janky MIPS R2000; ARM 2 and Fujitsu's
| SPARC MB86900 didn't ship until 01986.
|
| Frank T. Cary and John R. Opel, who led IBM at the time,
| couldn't care less who young engineers in 02022 revere. Not
| only are they both dead, but also even at the time their
| objective was making a lot of money, not fostering engineering
| excellence, or garnering admiration for their minions.
| sennight wrote:
| I won't pretend to have enough familiarity with the goings on
| in long past IBM management to be able to state with any
| authority what they may or may not have thought, but I'd be
| surprised if your single-minded characterization was close to
| accurate. If you read through the very long list of IBM
| published papers - you'll see that a lot of them covered
| novel solutions to real problems of the day that IBM was
| selling solutions for, in enough detail to save potential
| competitors a lot of time. That is a calculated risk that
| they've made for a long time (until pretty recently). It
| isn't difficult making the business case for what I've
| proposed... rumor has it that even Intel has recognized how
| badly they screwed themselves over by treating senior
| engineers as poorly as they did in the late 90s.
|
| The whole imperative to recruit motivated talent isn't a new
| thing that started with Google's masseuses and sous-chefs,
| R&D heavy industries have been mindful of the advertising
| value in publishing for a long time.
| kragen wrote:
| Oh, I absolutely believe that IBM _researchers_ wanted to
| share their findings, and that IBM's management was to
| some extent willing to tolerate them doing so. But I don't
| think IBM's management (post-Watson at least) saw that as a
| _benefit_ of having researchers, but rather a necessary
| cost, one they successfully curtailed during the 01970s.
| And I don't think IBM's management cared one way or the
| other what engineers outside IBM thought of it.
|
| But none of them were my personal friends, so I might be
| inferring wrongly from their behavior. Their behavior was
| pretty egregious, though!
| klelatti wrote:
| Rereading this again I think I may have omitted one interesting
| point. That the 801 was, I think, one of the first computers with
| a split L1 cache.
|
| Will add a footnote in the absence of evidence to the contrary!
| twoodfin wrote:
| The basic architecture diagram for the 801 is fascinating. In
| one sense it's boring, because to a first approximation every
| modern computer is similar. On the other hand, it was the
| prototype for at least the subsequent 50 years of general
| purpose computing.
|
| From a high enough altitude, the only novelty in modern
| platforms is the GPU.
| klelatti wrote:
| Absolutely. The familiarity makes it hard to appreciate the
| novelty.
|
| I've looked at the architecture of some of IBM's minis of the
| era (and some of the more ambitious microprocessors - eg
| iAPX432) and they just look so strange in comparison.
| retrac wrote:
| I think it may have been the first microprocessor with an L1
| cache of any kind. I can't find any earlier chips with on-die
| cache, anyway. Even the instruction prefetch queues of the 8086
| and 68000 were still in the future when the IBM 801 went into
| manufacture.
|
| Another consequence of the RISC philosophy. After discarding
| all that die area for control logic, there's still some space
| even in a tight transistor budget to implement a cache. And
| since it's a clean, new architecture, you can impose
| restrictions on things like self-modifying code that would
| otherwise have made splitting the I/D caches more difficult.
| Cache coherency is for the compiler to worry about.
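|
| A minimal sketch of why split I/D caches interact badly with
| self-modifying code, assuming a toy machine whose instruction
| cache is not kept coherent with stores. The class and method
| names are hypothetical, and the data side is modelled as
| write-through for simplicity.

```python
class SplitCacheMachine:
    """Toy machine: stores go to memory, fetches go through an icache."""

    def __init__(self, memory):
        self.memory = list(memory)
        self.icache = {}  # address -> cached instruction word

    def fetch(self, addr):
        """Instruction fetch: fill the icache on a miss, then read from it."""
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def store(self, addr, value):
        """Data store: updates memory but never touches the icache."""
        self.memory[addr] = value

    def invalidate_icache(self):
        """Explicit software step that makes new code visible to fetch."""
        self.icache.clear()
```

| After a `store` over an already-fetched address, `fetch` keeps
| returning the stale instruction until software calls
| `invalidate_icache`, which is the burden pushed onto the
| compiler or runtime.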
| monocasa wrote:
| To be fair, the 801 wasn't a microprocessor, and its L1
| wasn't on die (because its CPU wasn't a single die to begin
| with). It was implemented in MECL-10K which is at the sameish
| level of die integration as TTL, just a decent amount faster.
| kragen wrote:
| And there were earlier processors that had cache; WP cites
| the IBM 360/85 (01969) and the Atlas 2 (01964-01966) as the
| first computers with CPU cache.
|
| I'm curious when the first microprocessor with cache was,
| since the 801 wasn't a microprocessor; the article mentions
| the 68040 from 01990 with a split on-die L1 cache (4K+4K),
| but the 68030 in 01987 had 256B+256B, and the 68020 in
| 01984 had 256B icache. The 68010 had a 6-byte instruction
| "cache" which allowed it to execute two-instruction loops
| without fetching instructions, but I don't think that
| counts.
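|
| A rough sketch of the saving from such a loop buffer, assuming
| a simplified cost model in which a loop that fits entirely in
| the buffer is fetched from memory once and then replayed. The
| function name and the model are illustrative, not a description
| of the 68010's actual timing.

```python
def instruction_fetches(loop_words, iterations, buffer_words=0):
    """Count instruction words fetched from memory while running a loop.

    Assumption: if the whole loop fits in the buffer it is fetched once;
    otherwise every iteration refetches every word.
    """
    if iterations == 0:
        return 0
    if loop_words <= buffer_words:
        return loop_words
    return loop_words * iterations
```

| For a 3-word (6-byte) loop run 1000 times this model gives 3000
| fetches without a buffer and 3 with a 3-word buffer.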
|
| I think maybe the RISC II (01983) or the RISC I (01982)
| might have had a cache before the 68020. Katevenis's
| dissertation is at
| https://archive.org/details/reducedinstructi0000kate but it
| isn't open access and I haven't read it. It had the
| overlapping register windows later popularized by SPARC,
| though, and we often think of a register window stack as a
| simpler (but less effective) alternative to a dcache.
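|
| A minimal sketch of overlapping register windows, assuming a
| toy layout in which a caller's "out" registers are physically
| the same cells as the callee's "in" registers. Window sizes and
| names are illustrative only, and window overflow (spilling to
| memory) is deliberately not modelled.

```python
class WindowedRegisters:
    """Toy overlapping register windows over one flat register file."""

    def __init__(self, windows=4, ins=2, local_count=2):
        self.ins = ins
        self.stride = ins + local_count  # registers unique to each window
        # One extra block of `ins` cells so the last window has outs too.
        self.regs = [0] * (windows * self.stride + ins)
        self.cwp = 0  # current window pointer

    def _base(self):
        return self.cwp * self.stride

    def set_out(self, i, value):
        """Write outgoing argument i of the current window."""
        self.regs[self._base() + self.stride + i] = value

    def get_in(self, i):
        """Read incoming argument i of the current window."""
        return self.regs[self._base() + i]

    def call(self):
        """Enter a subroutine: slide the window, no registers are copied."""
        self.cwp += 1

    def ret(self):
        """Return from a subroutine: slide the window back."""
        self.cwp -= 1
```

| Passing an argument is then `set_out(0, x)` followed by
| `call()`, after which the callee sees `x` via `get_in(0)`
| without any memory traffic.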
|
| According to https://en.wikipedia.org/wiki/Transistor_count
| neither the ARM 2 (01986) nor the RISC-II-descended Fujitsu
| SPARC MB86900 (01986) had a cache.
| rjsw wrote:
| The Katevenis thesis describes another group at Berkeley
| designing an external I-cache chip to work with the RISC-
| II CPU, there are also a few pages analyzing how much
| difference a D-cache would make, doesn't look to me like
| RISC-II had either.
| kragen wrote:
| Thanks!
| fanf2 wrote:
| The first ARM with a cache was ARM3 (1989, 4kiB)
| kragen wrote:
| Thanks, Tony! That's what I thought.
|
| Given Sophie Wilson's concern at the time over the
| importance of memory bandwidth as a limiting factor in
| system efficiency, I wonder why they didn't at least
| include a 68010-like "loop mode", so you could eliminate
| the cost of instruction fetch in inner loops? My
| familiarity with ARM is not very good and so I'm not even
| sure if you can use LDM/STM to optimize memcpy but I'm
| sure it doesn't help with things like the FFT or string
| search.
| fanf2 wrote:
| Yes, LDM / STM were great for memcpy-like things (eg
| graphics). For other loops the usual technique was to use
| predicated instructions to avoid pipeline flushes, and
| loop unrolling.
|
| You would have to ask Sophie Wilson or Steve Furber why
| there is no loop cache; it's an interesting question. I
| guess there simply wasn't the transistor or complexity
| budget for it.
| AnimalMuppet wrote:
| Off topic: I've downvoted you about five times today, and
| I really ought to explain why. You have good content, and
| it pains me to have to downvote.
|
| I'm downvoting you because of the absurd thing you do
| with your dates.
|
| What's absurd about it?
|
| 1. It's not the standard way of writing dates.
| (Communication is about finding things understood the
| same way between speaker and listener; non-standard usage
| messes with that.)
|
| 2. Because of #1, it's harder to read. It _looks_ wrong.
| We have to take a half-second to think about it _every
| single time_. Multiply that by several posts a day, and
| also by hundreds of people reading each post, and it
| starts to add up to enough time to matter. This
| particular post is worse, because of all the CPU numbers
| that are five digits, and your year numbers start
| matching the pattern in our heads for CPU numbers.
|
| 3. It's trying to grind a Long Now axe, in a post that
| has nothing to do with anything related (other than
| having a year number in it). It's like, would you find it
| annoying if there was a zealous Christian on here, who
| had to work a reference to Jesus into every single post
| that mentioned philosophy or sociology? You might even
| find it annoying enough to start downvoting every time
| they did it.
|
| So, please. Stop. If you're talking only to Long Now
| people, use the Long Now date format. If you're trying to
| explain the Long Now to everyone else, explain the dates
| as part of that. But if you're talking to everyone else
| about the history of RISC, _just use everyone else's
| date format._
| kragen wrote:
| > _Off topic: I've downvoted you about five times today,
| and I really ought to explain why. You have good content_
|
| You don't, so I don't see the relevance of your opinion
| about how I ought to write good content.
|
| Write things your way, and I'll write them my way.
|
| At its best, this site is for writing good content, not
| spelling flames and personal attacks on nonconformists
| for being weird. Nonconformists being weird is why we
| have computers and the internet in the first place.
| [deleted]
| monocasa wrote:
| Huh, that's a great question. I'm certainly struggling to
| find something earlier than the 68020 as well.
|
| > We decided that RISC I should not be burdened with the
| design of a full-blown on-chip cache, but an instruction
| cache would definitely be a good idea for the next-
| generation RISC.[0]
|
| So it seems RISC-I didn't.
|
| Additionally a die shot[1], and understanding of the
| architecture of RISC-II (doubling down on the big window
| register file as a potential substitute for on die D
| cache) appears to imply that it doesn't have any on die
| cache, I or D. I certainly don't see any arrays of SRAM
| other than the register file.
|
| [0] - https://people.eecs.berkeley.edu/~kubitron/courses/
| cs252-F00...
|
| [1] - https://people.eecs.berkeley.edu/~pattrsn/Arch/RISC2.jpg
| kragen wrote:
| Thank you! I wasn't confident in understanding the die
| shots.
|
| With 40 years of hindsight I think we can probably
| conclude that caches are a better use of excess on-die
| transistors than register windows are? Even though
| register windows theoretically reduce the cost of
| subroutine calls and thus of good factoring.
|
| I feel like even a fairly small instruction cache, or
| explicitly-used instruction scratch memory, could have
| produced big payoffs in reducing instruction fetch cost,
| especially for RISC designs without hardware indirection?
| Like, if your compiler emitted explicit code at the
| beginning of every short leaf function to load it into
| instruction scratch memory, then invoke it, instead
| of running it from RAM? At least if it contained a
| backward branch?
___________________________________________________________________
(page generated 2022-10-02 23:01 UTC)