[HN Gopher] The First RISC: John Cocke and the IBM 801
       ___________________________________________________________________
        
       The First RISC: John Cocke and the IBM 801
        
       Author : klelatti
       Score  : 58 points
       Date   : 2022-10-02 12:33 UTC (10 hours ago)
        
 (HTM) web link (thechipletter.substack.com)
 (TXT) w3m dump (thechipletter.substack.com)
        
       | dosman33 wrote:
       | When I worked at IBM in the early 2000's I occasionally remember
       | hearing people say "IBM invented RISC!". As a young pup I knew to
       | keep my mouth shut, but in my head I was like "best keep that
       | news to yourself pop, the rest of the world only knows about the
       | history of RISC-V" which is the only RISC we learned about in
       | technical school in the late 90's. I didn't doubt IBM did
       | something similar early on though, but it just came across to me
       | as another funny IBM community knowledge thing that was a little
       | off-base which wasn't too uncommon anyway. Obviously the IBM old
       | timers were not off-base though.
       | 
       | But it is nice to see history recognizing earlier pioneers in the
       | field like John Cocke. And this article also demonstrates
       | something I found perplexing as a young pup: IBM's marketing
       | department dictated what engineering did. Obviously the point of
       | the company is to make solutions that customers want to pay money
       | for. Just because engineering has a better solution doesn't mean
       | it's the best fit for customers (or the most profitable solution
       | to the problem). But with my head stuck in FOSS philosophy of the
       | day it was a good education in real life.
        
         | klelatti wrote:
         | Funnily enough, I worked for IBM marketing (my first job, for a
         | short while) in 1984.
         | 
         | One of our jobs was to demo System/36 - one of IBM's minis -
         | which might have been replaced by an 801 system had they taken
         | it up.
         | 
         | One of my colleagues went through the demo with a room of
         | customers. At the end he asked for questions. Straight away a
         | hand went up.
         | 
         | "If it's this slow with one user, how slow is it with five?"
         | 
         | Sometimes engineering does know best!
        
         | adrian_b wrote:
         | Actually in this case IBM really has the right to brag about
         | it, because there is no doubt that IBM 801 included the first
         | RISC CPU, even if the name RISC was invented only several years
         | later.
         | 
         | Before the IBM 801 there were many simple CPUs which, if they
         | were launched today, would be called RISC CPUs because of
         | their simplicity.
         | 
         | However for those old CPUs the simplicity was not intentional,
         | but it was determined by the limitations of their manufacturing
         | technology.
         | 
         | The IBM 801 project, on the other hand, started in 1975,
         | published significant documentation about its principles of
         | operation during 1976, and had a working prototype in 1978.
         | Its target was to intentionally reverse the trend towards
         | ever more complex CPUs and to investigate how greater
         | simplicity might in fact be a method for reaching higher
         | performance.
         | 
         | All the principles for obtaining high performance with a
         | simpler CPU were discussed in the IBM 801 documentation 4 to 5
         | years before the RISC project at Berkeley (1980 to 1984) and
         | the MIPS project at Stanford (1981 to 1984).
         | 
         | Moreover, in March 1982 there has been a very important
         | symposium where all the teams working at that time on RISC CPUs
         | have presented their work, including the Berkeley RISC,
         | Stanford MIPS and IBM 801 projects.
         | 
         | The presentations from that symposium have determined the
         | designers of the future ARM CPU to change their architecture
         | from some kind of 6502 extension to a RISC architecture. They
         | were particularly impressed by the IBM 801 presentation, so
         | instead of using weak addressing modes, like those in the RISC
         | and MIPS projects, they included in ARM the much more powerful
         | IBM 801 addressing modes (i.e. including modes with base
         | register auto update).
         | 
         | While John Cocke did not invent the word RISC when designing
         | the IBM 801, either he or another member of his team (which
         | worked on the project that eventually became IBM POWER)
         | coined the word "superscalar" a few years later, in 1987, when
         | the design target for CPUs had moved from being able to
         | execute one instruction in a single clock cycle, like in the
         | early RISC CPUs, to being able to execute multiple
         | instructions per clock cycle.
        
           | cachvico wrote:
           | > Moreover, in March 1982 there has been a very important
           | symposium where all the teams working at that time on RISC
           | CPUs have presented their work, including the Berkeley RISC,
           | Stanford MIPS and IBM 801 projects.
           | 
           | > The presentations from that symposium have determined the
           | designers of the future ARM CPU to change their architecture
           | from some kind of 6502 extension to a RISC architecture.
           | 
           | That is absolutely fascinating. Are the papers from that
           | conference available online?
        
           | kragen wrote:
           | Which pre-RISC simple CPUs are you thinking about? I think
           | the key design features of RISC are no microcode, pipelined
           | execution, usually one instruction per clock, a load/store
           | architecture (no memory operands to ALU instructions), fixed-
           | length instructions, and a large orthogonal general-purpose
           | register set to compensate for the lack of memory operands.
           | 
           | Maybe the CDC 6600 sort of fits, but it only had 8 60-bit
           | operand registers, and they weren't orthogonal; you could
           | load into 6 of them and store from 2 of them, and you
           | couldn't use them for addressing (addressing was done with 16
           | other 18-bit registers). It was only barely pipelined, though
           | it was sort of superscalar. I do think it fulfills the "no
           | microcode", "fixed-length instructions", and "usually one
           | instruction per clock" desiderata -- but you can hardly say
           | it was a "simple CPU", its design objective being to build
           | "the largest computer in the world", and sporting 60-bit
           | registers, hardware floating point (single and double
           | precision), and a heavily multibanked memory system.
        
             | Taniwha wrote:
             | I'd include the PDP-8 and successors/cloners like the DG
             | NOVA
        
               | kragen wrote:
               | The PDP-8 is certainly simple, but it has microcode,
               | doesn't have pipelined execution, never approaches one
               | instruction per clock, has memory operands to virtually
               | every instruction (including an indirection bit, a mortal
               | sin in RISC), and has a tiny register set. You could say
               | its zero page is like a "large orthogonal general-purpose
               | register set" but (a) all your ALU instructions have to
               | involve a register which isn't in that set (the single
               | accumulator) and (b) the current PC page is just as
               | accessible as the zero page. All it has in common with
               | RISC is the fixed-width instruction set and a (very)
               | small number of instruction formats.
               | 
               | So I would say the PDP-8 is the opposite extreme from
               | RISC.
               | 
               | If you like the PDP-8 you should check out the 32-bit
               | version: https://github.com/johnwcowan/pdp8x
               | 
               | The NOVA is maybe a _little_ more plausible, since it at
               | least has a load-store architecture and _four_
               | accumulators instead of one, but it still doesn't feel
               | very RISCy to me. I mean it's maybe less RISCy than the
               | 80386?
        
         | rjsw wrote:
         | The RISC-II thesis from 1983 references the 801, but it looks
         | like IBM only started talking about it at around this time. The
         | RISC-V project didn't start until 2010.
        
           | klelatti wrote:
           | There is a funny quote from David Patterson in his Oral
           | History
           | 
           | > There was the 801 project was secretive. There is something
           | called the 801, "What is it?" It was very hard to know even
           | what they were doing. It was supposed to be very exciting,
           | very breakthrough, but unless John Cocke came and told you
           | what they were doing, you couldn't figure out anything about
           | it. So, there was that kind of the mystery of the 801.
           | 
           | I think word did get out in the late 1970s but not
           | officially!
        
             | rjsw wrote:
             | One of the references in the Katevenis thesis is "J. Cocke:
             | Informal discussion on the IBM 801 mini-computer,
             | U.C.Berkeley Campus, June 1983".
        
               | klelatti wrote:
               | From memory I think Cocke lectured at Berkeley (and maybe
               | at Stanford too?)
        
           | abudabi123 wrote:
           | One joke in his memory, which you can find on YT, went that
           | the SC in RISC stood for Seymour Cray.
           | 
           | https://en.wikipedia.org/wiki/Seymour_Cray
        
         | panick21_ wrote:
         | The early RISC people gave a lot of credit to Cray.
        
         | kragen wrote:
         | It's surprising that you learned about RISC-V in technical
         | school in the late 90s, since the RISC-V project began in 02010
         | at Berkeley. You should see if you can get in touch with your
         | instructor and help him fix his time machine.
        
           | dboreham wrote:
           | Perhaps parent meant MIPS.
        
             | kragen wrote:
             | Perhaps, but it would be a lot more interesting if it
             | turned out his instructor _did_ know about RISC-V in the
             | 90s!
        
         | classichasclass wrote:
         | > the rest of the world only knows about the history of RISC-V"
         | which is the only RISC we learned about in technical school in
         | the late 90's.
         | 
         | This surprises me, especially since RISC-V wasn't really a
         | thing until around 2010. Not MIPS, SPARC, PowerPC, PA-RISC,
         | ARM, all extant by this period and having a longer history, but
         | RISC-V? What school was this?
        
           | cmrdporcupine wrote:
           | I suspect the comment author typo'd and meant to write either
           | MIPS or (more likely) Berkeley RISC.
        
       | Zenst wrote:
       | Interestingly, the IBM ROMP
       | https://en.wikipedia.org/wiki/IBM_ROMP was a spin-off from the
       | 801 project and was the CPU used in the IBM RT range and saw the
       | birth of AIX.
        
         | wrs wrote:
         | For those who haven't heard of the RT, Bitsavers has a large
         | hoard of docs. [0]
         | 
         | I was at CMU when the Andrew project got early-access PC RTs
         | and it was crazy secretive due to IBM's requirements. Special
         | badge access, NDAs, machines chained to desks...very weird in a
         | university environment. So it's not surprising there isn't a
         | ton of public documentation of the 801.
         | 
         | [0] http://www.bitsavers.org/pdf/ibm/pc/rt/
        
       | jscipione wrote:
       | Tangentially related to the article: what happened to the RISC-V
       | Beagle board? Is it still going to be available for purchase?
        
         | drmpeg wrote:
         | The chip it was going to use (the JH7100) never went into
         | production and they had to cancel the project.
         | 
         | There's a Kickstarter for the newer JH7110 chip.
         | 
         | https://www.kickstarter.com/projects/starfive/visionfive-2
        
           | monocasa wrote:
           | Additionally Pine64 is putting out a board with the same
           | JH7110 chip too.
        
       | djmips wrote:
       | As an aside, the America's Cup anecdote is related incorrectly.
       | "On August 22 [1851], the [U.S.-built schooner] America joined 14
       | British ships for a regatta around the Isle of Wight." It won by
       | a good margin, and the Queen Victoria quote was at this race, not
       | at a later first America's Cup race (named after the schooner
       | and not the country).
       | 
       | -> https://www.history.com/this-day-in-history/u-s-wins-
       | first-a....
        
         | klelatti wrote:
         | Thanks! The version in the article is based on Cocke's own
         | comments but I'll add a note to clarify.
        
       | sennight wrote:
       | A few years ago I dedicated a couple of days to finding any
       | evidence of documentation for the 801 beyond a small number of
       | well known academic papers, but found none. It really bothers me,
       | probably to an unreasonable degree, how common it is for
       | corporations to neglect such an easy win. IBM is really bad about
       | it, especially considering how long of a history they have - just
       | look at how anemic their public facing corporate archives page
       | is. Yes, they had a long running tradition of publishing very low
       | level papers in publicly accessible journals, a huge number of
       | very important papers... but when they killed off their internal
       | journals that served as the fountainhead for the effort - their
       | chosen steward quickly locked it all behind a paywall. Young
       | engineers WANT to revere their elders while bringing forward the
       | state of the art, and there have been some good efforts made in
       | the collection of narratives, but engineers need work product
       | (not English lit) in order to do the ancients justice. There
       | aren't enough articles like this, so my thanks to the author -
       | well done... but his job could easily be made so much less
       | difficult by corporations taking advantage of something with such
       | a long tail of public good will: preserving and then releasing
       | obsolete work product.
        
         | epc wrote:
         | Much of IBM's internal informal history was lost in the
         | transition from the RSCS based TOOLSRUN forums to a mix of
         | Lotus Notes/Domino/web forums around 1999-2000 (note that this
         | was a business choice, all internal fora were already available
         | via nntp). IBM Research might have additional documentation on
         | the 801 and related products but it looks like you'd have to
         | contact their research library directly, nothing on line.
        
           | sennight wrote:
           | I wouldn't even dare to dream for version controlled
           | source... I was over the moon when roughly dated personal
           | backups of research edition Unix started popping up. From
           | what I've seen in IBM documentation and a small number of
           | code leaks - it would actually be very difficult to
           | faithfully present their historical code simply due to the
           | Rube Goldberg SCM they've employed. It was so wild that it
           | permeated into their xcoff executable format.
        
         | kragen wrote:
         | From IBM management's point of view, I think, the easy win was
         | that even though they discovered RISC in 01974 and shipped it
         | in ROMP in 01981, their competition didn't ship RISC until
         | 01985, in the form of the janky MIPS R2000; ARM 2 and Fujitsu's
         | SPARC MB86900 didn't ship until 01986.
         | 
         | Frank T. Cary and John R. Opel, who led IBM at the time,
         | couldn't care less who young engineers in 02022 revere. Not
         | only are they both dead, but also even at the time their
         | objective was making a lot of money, not fostering engineering
         | excellence, or garnering admiration for their minions.
        
           | sennight wrote:
           | I won't pretend to have enough familiarity with the goings on
           | in long past IBM management to be able to state with any
           | authority what they may or may not have thought, but I'd be
           | surprised if your single-minded characterization was close to
           | accurate. If you read through the very long list of IBM
           | published papers - you'll see that a lot of them covered
           | novel solutions to real problems of the day that IBM was
           | selling solutions for, in enough detail to save potential
           | competitors a lot of time. That is a calculated risk that
           | they've made for a long time (until pretty recently). It
           | isn't difficult making the business case for what I've
           | proposed... rumor has it that even Intel has recognized how
           | badly they screwed themselves over by treating senior
           | engineers as poorly as they did in the late 90s.
           | 
           | The whole imperative to recruit motivated talent isn't a new
           | thing that started with Google's masseuses and sous-chefs,
           | R&D heavy industries have been mindful of the advertising
           | value in publishing for a long time.
        
             | kragen wrote:
             | Oh, I absolutely believe that IBM _researchers_ wanted to
             | share their findings, and that IBM's management was to
             | some extent willing to tolerate them doing so. But I don't
             | think IBM's management (post-Watson at least) saw that as a
             | _benefit_ of having researchers, but rather a necessary
             | cost, one they successfully curtailed during the 01970s.
             | And I don't think IBM's management cared one way or the
             | other what engineers outside IBM thought of it.
             | 
             | But none of them were my personal friends, so I might be
             | inferring wrongly from their behavior. Their behavior was
             | pretty egregious, though!
        
       | klelatti wrote:
       | Rereading this, I think I may have omitted one interesting
       | point: that the 801 was, I think, one of the first computers with
       | a split L1 cache.
       | 
       | Will add a footnote in the absence of evidence to the contrary!
        
         | twoodfin wrote:
         | The basic architecture diagram for the 801 is fascinating. In
         | one sense it's boring, because to a first approximation every
         | modern computer is similar. On the other hand, it was the
         | prototype for at least the subsequent 50 years of general
         | purpose computing.
         | 
         | From a high enough altitude, the only novelty in modern
         | platforms is the GPU.
        
           | klelatti wrote:
           | Absolutely. The familiarity makes it hard to appreciate the
           | novelty.
           | 
           | I've looked at the architecture of some of IBM's minis of the
           | era (and some of the more ambitious microprocessors - eg
           | iAPX432) and they just look so strange in comparison.
        
         | retrac wrote:
         | I think it may have been the first microprocessor with an L1
         | cache of any kind. I can't find any earlier chips with on-die
         | cache, anyway. Even the instruction prefetch queues of the 8086
         | and 68000 were still in the future when the IBM 801 went into
         | manufacture.
         | 
         | Another consequence of the RISC philosophy. After discarding
         | all that die area for control logic, there's still some space
         | even in a tight transistor budget to implement a cache. And
         | since it's a clean, new architecture, you can impose
         | restrictions on things like self-modifying code that would
         | otherwise have made splitting the I/D caches more difficult.
         | Cache coherency is for the compiler to worry about.
        
           | monocasa wrote:
           | To be fair, the 801 wasn't a microprocessor, and its L1
           | wasn't on die (because its CPU wasn't a single die to begin
           | with). It was implemented in MECL-10K which is at the sameish
           | level of die integration as TTL, just a decent amount faster.
        
             | kragen wrote:
             | And there were earlier processors that had cache; WP cites
             | the IBM 360/85 (01969) and the Atlas 2 (01964-01966) as the
             | first computers with CPU cache.
             | 
             | I'm curious when the first microprocessor with cache was,
             | since the 801 wasn't a microprocessor; the article mentions
             | the 68040 from 01990 with a split on-die L1 cache (4K+4K),
             | but the 68030 in 01987 had 256B+256B, and the 68020 in
             | 01984 had 256B icache. The 68010 had a 6-byte instruction
             | "cache" which allowed it to execute two-instruction loops
             | without fetching instructions, but I don't think that
             | counts.
             | 
             | I think maybe the RISC II (01983) or the RISC I (01982)
             | might have had a cache before the 68020. Katevenis's
             | dissertation is at
             | https://archive.org/details/reducedinstructi0000kate but it
             | isn't open access and I haven't read it. It had the
             | overlapping register windows later popularized by SPARC,
             | though, and we often think of a register window stack as a
             | simpler (but less effective) alternative to a dcache.
             | 
             | According to https://en.wikipedia.org/wiki/Transistor_count
             | neither the ARM 2 (01986) nor the RISC-II-descended Fujitsu
             | SPARC MB86900 (01986) had a cache.
        
               | rjsw wrote:
               | The Katevenis thesis describes another group at Berkeley
               | designing an external I-cache chip to work with the RISC-
               | II CPU, there are also a few pages analyzing how much
               | difference a D-cache would make, doesn't look to me like
               | RISC-II had either.
        
               | kragen wrote:
               | Thanks!
        
               | fanf2 wrote:
               | The first ARM with a cache was ARM3 (1989, 4kiB)
        
               | kragen wrote:
               | Thanks, Tony! That's what I thought.
               | 
               | Given Sophie Wilson's concern at the time over the
               | importance of memory bandwidth as a limiting factor in
               | system efficiency, I wonder why they didn't at least
               | include a 68010-like "loop mode", so you could eliminate
               | the cost of instruction fetch in inner loops? My
               | familiarity with ARM is not very good and so I'm not even
               | sure if you can use LDM/STM to optimize memcpy but I'm
               | sure it doesn't help with things like the FFT or string
               | search.
        
               | fanf2 wrote:
               | Yes, LDM / STM were great for memcpy-like things (eg
               | graphics). For other loops the usual technique was to use
               | predicated instructions to avoid pipeline flushes, and
               | loop unrolling.
               | 
               | You would have to ask Sophie Wilson or Steve Furber why
               | there is no loop cache; it's an interesting question. I
               | guess there simply wasn't the transistor or complexity
               | budget for it.
        
               | AnimalMuppet wrote:
               | Off topic: I've downvoted you about five times today, and
               | I really ought to explain why. You have good content, and
               | it pains me to have to downvote.
               | 
               | I'm downvoting you because of the absurd thing you do
               | with your dates.
               | 
               | What's absurd about it?
               | 
               | 1. It's not the standard way of writing dates.
               | (Communication is about finding things understood the
               | same way between speaker and listener; non-standard usage
               | messes with that.)
               | 
               | 2. Because of #1, it's harder to read. It _looks_ wrong.
               | We have to take a half-second to think about it _every
               | single time_. Multiply that by several posts a day, and
               | also by hundreds of people reading each post, and it
               | starts to add up to enough time to matter. This
               | particular post is worse, because of all the CPU numbers
               | that are five digits, and your year numbers start
               | matching the pattern in our heads for CPU numbers.
               | 
               | 3. It's trying to grind a Long Now axe, in a post that
               | has nothing to do with anything related (other than
               | having a year number in it). It's like, would you find it
               | annoying if there was a zealous Christian on here, who
               | had to work a reference to Jesus into every single post
               | that mentioned philosophy or sociology? You might even
               | find it annoying enough to start downvoting every time
               | they did it.
               | 
               | So, please. Stop. If you're talking only to Long Now
               | people, use the Long Now date format. If you're trying to
               | explain the Long Now to everyone else, explain the dates
               | as part of that. But if you're talking to everyone else
               | about the history of RISC, _just use everyone else's
               | date format._
        
               | kragen wrote:
               | > _Off topic: I've downvoted you about five times today,
               | and I really ought to explain why. You have good content_
               | 
               | You don't, so I don't see the relevance of your opinion
               | about how I ought to write good content.
               | 
               | Write things your way, and I'll write them my way.
               | 
               | At its best, this site is for writing good content, not
               | spelling flames and personal attacks on nonconformists
               | for being weird. Nonconformists being weird is why we
               | have computers and the internet in the first place.
        
               | monocasa wrote:
               | Huh, that's a great question. I'm certainly struggling to
               | find something earlier than the 68020 as well.
               | 
               | > We decided that RISC I should not be burdened with the
               | design of a full-blown on-chip cache, but an instruction
               | cache would definitely be a good idea for the next-
               | generation RISC.[0]
               | 
               | So it seems RISC-I didn't.
               | 
               | Additionally a die shot[1], and understanding of the
               | architecture of RISC-II (doubling down on the big window
               | register file as a potential substitute for on die D
               | cache) appears to imply that it doesn't have any on die
               | cache, I or D. I certainly don't see any arrays of SRAM
               | other than the register file.
               | 
               | [0] - https://people.eecs.berkeley.edu/~kubitron/courses/
               | cs252-F00...
               | 
               | [1] -
               | https://people.eecs.berkeley.edu/~pattrsn/Arch/RISC2.jpg
        
               | kragen wrote:
               | Thank you! I wasn't confident in understanding the die
               | shots.
               | 
               | With 40 years of hindsight I think we can probably
               | conclude that caches are a better use of excess on-die
               | transistors than register windows are? Even though
               | register windows theoretically reduce the cost of
               | subroutine calls and thus of good factoring.
               | 
               | I feel like even a fairly small instruction cache, or
               | explicitly-used instruction scratch memory, could have
               | produced big payoffs in reducing instruction fetch cost,
               | especially for RISC designs without hardware indirection?
               | Like, if your compiler emitted explicit code at the
               | beginning of every short leaf function to load it into
               | instruction scratch memory, then invoke it, instead
               | of running it from RAM? At least if it contained a
               | backward branch?
        
       ___________________________________________________________________
       (page generated 2022-10-02 23:01 UTC)