[HN Gopher] Show HN: Minimax - A Compressed-First, Microcoded RI...
___________________________________________________________________
Show HN: Minimax - A Compressed-First, Microcoded RISC-V CPU
RISC-V's compressed instruction (RVC) extension is intended as an
add-on to the regular, 32-bit instruction set, not a replacement or
competitor. Its designers intended RVC instructions to be expanded
into regular 32-bit RV32I equivalents via a pre-decoder. What
happens if we explicitly architect a RISC-V CPU to execute RVC
instructions, and "mop up" any RV32I instructions that aren't
convenient via a microcode layer? What architectural optimizations
are unlocked as a result? "Minimax" is an experimental RISC-V
implementation intended to establish if an RVC-optimized CPU is, in
practice, any simpler than an ordinary RV32I core with pre-decoder.
While it passes a modest test suite, you should not use it without
caution. (There are a large number of excellent, open source,
"little" RISC-V implementations you should probably reach for
first.)
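
A minimal sketch of the conventional pre-decoder approach described above, in Python rather than HDL. It expands four RVC instructions (c.addi, c.li, c.mv, c.add) into their RV32I equivalents. The function names are hypothetical and are not taken from Minimax; hint and reserved encodings are not handled.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = 1 << (bits - 1)
    return (value ^ mask) - mask


def encode_addi(rd, rs1, imm):
    """RV32I I-type encoding: addi rd, rs1, imm."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (rd << 7) | 0x13


def encode_add(rd, rs1, rs2):
    """RV32I R-type encoding: add rd, rs1, rs2."""
    return (rs2 << 20) | (rs1 << 15) | (rd << 7) | 0x33


def expand_rvc(insn):
    """Expand a 16-bit RVC instruction into a 32-bit RV32I instruction.

    Illustrative subset only; anything else raises NotImplementedError.
    """
    op = insn & 0x3
    funct3 = (insn >> 13) & 0x7
    bit12 = (insn >> 12) & 0x1
    rd = (insn >> 7) & 0x1F
    rs2 = (insn >> 2) & 0x1F
    imm6 = sign_extend((bit12 << 5) | rs2, 6)  # imm[5] is bit 12, imm[4:0] is bits 6:2

    if op == 0b01 and funct3 == 0b000:   # c.addi rd, imm -> addi rd, rd, imm
        return encode_addi(rd, rd, imm6)
    if op == 0b01 and funct3 == 0b010:   # c.li rd, imm   -> addi rd, x0, imm
        return encode_addi(rd, 0, imm6)
    if op == 0b10 and funct3 == 0b100 and rd != 0 and rs2 != 0:
        if bit12 == 0:                   # c.mv rd, rs2   -> add rd, x0, rs2
            return encode_add(rd, 0, rs2)
        return encode_add(rd, rd, rs2)   # c.add rd, rs2  -> add rd, rd, rs2
    raise NotImplementedError(f"no expansion for {insn:#06x} in this sketch")
```

For example, expand_rvc(0x4515) (c.li a0, 5) returns 0x00500513 (addi a0, x0, 5).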
Author : gsmecher
Score : 103 points
Date : 2022-11-01 15:41 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| thrtythreeforty wrote:
| This is very impressive, especially the performance per LUT! Did
| I overlook a frequency spec on a given target or did you not
| specify?
|
| Will the execute stage pipeline effectively to reach higher
| f_max? (Of course there will be a small logic penalty, and a
| larger FF penalty, but the core is small enough that it would
| probably be tolerable.) Or is the core's whole architecture
| predicated on a two stage design?
| gsmecher wrote:
| This core is targeted at "smaller-is-better" applications with
| few actual instruction-throughput requirements. If it reaches
| 200 MHz on a Xilinx KU060, I will be delighted. (That specific
| clock frequency on that specific part carries heavy hints about
| what this core is intended for.)
|
| With that in mind: the single instruction-per-clock design is
| for simplicity's sake, not performance's sake. If the execution
| stage were pipelined, it'd be a different core. If performance
| is the goal, I'd start by ripping out some of the details that
| distinguish this core from other (excellent) RISC-V cores.
| varispeed wrote:
| KU060 costs a nice sum of £4,529.10 on Mouser (out of stock
| of course)
| Teknoman117 wrote:
| > out of stock of course
|
| I picked probably the worst time imaginable to get into
| FPGAs. All of my "higher" end stuff is repurposed mining
| hardware...
| thatcherc wrote:
| > 200 MHz on a Xilinx KU060
|
| > (That specific clock frequency on that specific part
| carries heavy hints about what this core is intended for.)
|
| Fun clue! Looks like the Xilinx KU060 is a rad-hard FPGA for
| space applications. Does anyone know what 200 MHz might
| imply? Comms maybe?
| gaudat wrote:
| Poor man's Tile64?
| cmrdporcupine wrote:
| This is very nice. A couple years ago I was playing around with a
| hobby project I was dubbing "Retro-V" which was to be a RISC-V
| core tied to a 1980s-style display processor and keyboard/mouse
| input on a small FPGA and 512k or 1MB or so of SRAM. I was using
| PicoRV32 for that, but this would have been far better.
| drh wrote:
| Sounds interesting! What were you using for the display
| processor?
| cmrdporcupine wrote:
| I was hand-rolling my own. I had it doing a basic 640x480
| buffer with some basic character generation and sprite
| support & HDMI/DVI output
|
| These days I'd probably consider forking my friend Randy's
| C64 VICII implementation (VIC-II Kawari) and just expand
| framebuffer size, sprites, colours, etc, since he put so much
| work into it.
|
| It was a lot of fun, but I got stalled on the SD card
| interface. That was more complexity than I felt like dealing
| with at that point. And I was working at Google at the time and so
| they owned all my thoughts and deeds and going through the
| open sourcing process for it would have been a hassle. If I
| wasn't hunting for work and needing to make $$ right now, I'd
| pick it up again maybe? Was more of a Verilog learning
| process.
| gsmecher wrote:
| PicoRV32 and FemtoRV32 are both excellent, conventional RISC-V
| implementations, and are more complete and proven than Minimax.
| Relative to the size of any 7-series or newer Xilinx FPGA, the
| difference in LUT cost between any of the three is pretty
| minor. I think you made a perfectly defensible decision. (I
| love me some SERV, too, and if you are willing to spend
| orthodoxy to save gates, it's an excellent choice too.)
| cmrdporcupine wrote:
| Yes, PicoRV32 is very nice. However for what I was building,
| with limited RAM, compressed instructions would have made a
| lot of sense. I started porting a BASIC to my system (in C),
| and it quite easily would have filled almost the whole 512kB
| SRAM.
|
| And the thought of handwriting one in RISC-V assembly
| convinced me that maybe RISC-V wasn't as "retro friendly" as
| I would have liked.
| gsmecher wrote:
| Understood. Maybe this landed after your project - but both
| PicoRV32 and SERV now support compressed extensions, at
| some additional resource cost. FemtoRV32 Quark doesn't -
| which is not a knock, since it's a beautifully simple
| implementation and that's the point.
|
| The retrocomputing scene looks like a ton of fun and I'd be
| delighted if any of my work is used there.
| cmrdporcupine wrote:
| Ah, yes, this was 2018/19, in the Before Times, and I
| don't recall if PicoRV32 had compressed yet but I don't
| think it did.
|
| SERV always looked intriguing, too. Though I recall maybe
| its build process was a hassle.
|
| Anyways, this is neat, keep on keeping on! I'm just a
| software guy, so I remain amazed by the world at the gate
| level and what it can do. Entirely different kind of
| abstraction building.
| tomcam wrote:
| > RISC-V's compressed instruction (RVC) extension is intended as
| an add-on
|
| Doesn't it make this... an IISC? Increased instruction set?
| Asking for a friend
| znwu wrote:
| RISC no longer has the clear border as it had 30 years ago.
| Nowadays RISC just means an ISA has most of the following
| points: 1. Load/Store architecture 2. Fixed-length instructions
| or few length variations. 3. Highly uniform instruction
| encoding. 4. Mostly single-operation instructions.
|
| These four points all have direct benefits on hardware design.
| And compressed ISAs like RVC and Thumb check them all.
|
| On the contrary, "fewer instruction types", "orthogonal
| instructions" never had any real benefit beyond perceptual
| aesthetics, so as a result they are long abandoned.
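|
| A minimal sketch of what "few length variations" buys a RISC-V
| decoder: with only the 16-bit and 32-bit encodings in play, the
| two lowest bits of the first halfword give the instruction
| length. The function name is hypothetical, and the longer
| encodings the spec reserves are ignored here.

```python
def insn_length(low_halfword):
    """Return the instruction length in bytes (16/32-bit encodings only).

    In RISC-V, a first halfword whose two lowest bits are 0b11 starts a
    32-bit instruction; any other value marks a 16-bit RVC instruction.
    """
    return 4 if (low_halfword & 0b11) == 0b11 else 2
```

| For example, insn_length(0x4515) is 2 (c.li a0, 5) and
| insn_length(0x0513) is 4 (the low half of addi a0, x0, 5).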
| [deleted]
| sterlind wrote:
| the actual Verilog source is incredibly small. I would have
| thought that implementing a CPU, even a toy one, would take more
| than 500 lines. is this normal for hardware?
| nine_k wrote:
| I suspect some heavier lifting is done here:
| use ieee.std_logic_1164.all; use ieee.numeric_std.all;
|
| It looks that the VHDL source is about instruction decoding,
| registers, etc, but does not include things like ALU logic. (I
| don't know VHDL actually.)
| robinsonb5 wrote:
| Those two lines are just the VHDL equivalent of #include
| <stdio.h> - i.e. boilerplate that you'll see in almost every
| source file.
|
| But it's true that you don't have to describe the ALU down to
| the bit level - thanks to those two lines you can say "q <=
| d1 + d2" instead of having to build an adder at the gate
| level. (Though you can, of course, do that if you really want
| to!)
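|
| A minimal sketch of the same contrast, in Python rather than
| VHDL: a ripple-carry adder built from single-bit gate
| operations next to the one-line "+" form. The function names
| and the 32-bit default width are assumptions for illustration.

```python
def full_adder(a, b, cin):
    """One-bit full adder expressed with AND/OR/XOR gates."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(d1, d2, width=32):
    """Add two unsigned values bit by bit, discarding the final carry."""
    q = 0
    carry = 0
    for i in range(width):
        s, carry = full_adder((d1 >> i) & 1, (d2 >> i) & 1, carry)
        q |= s << i
    return q


def behavioural_add(d1, d2, width=32):
    """The 'q <= d1 + d2' style: let the tools build the adder."""
    return (d1 + d2) & ((1 << width) - 1)
```

| Both functions return the same result for any pair of unsigned
| inputs that fit in the chosen width.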
| gsmecher wrote:
| What you see is all there is.
|
| At a certain scale, it's conventional for hardware designs to
| become complex enough that it's necessary to structure them in
| hierarchies, just to maintain control. This design is small
| enough that none of the extra structure is essential.
|
| It's possible to be incredibly expressive in Verilog and VHDL.
| This implementation is written in VHDL, which has an outdated
| reputation for being long-winded.
|
| Also worth a look: FemtoRV32 Quark [0], which is written in
| Verilog.
|
| [0]: https://github.com/BrunoLevy/learn-fpga/blob/master/FemtoRV/...
| robinsonb5 wrote:
| Have you seen the OPC series of CPUs? (One Page Computing -
| the challenge being to keep the code small enough to be
| printed onto a single sheet of line printer paper!)
| gsmecher wrote:
| Yup! Thanks for pointing OPC [0] out. These CPUs were a
| huge eye-opener - and a huge lesson about the value of
| using a standardized instruction set.
|
| Building a custom CPU commits you to writing an assembler
| and listing generator - which is a good hobby-project job
| for one person who's handy with Python. After stumbling
| through those foothills, though, I found myself at the base
| of some very steep, scary GCC/binutils cliffs wondering how
| I could have gotten so lost, so far from home.
|
| Even if all RISC-V does is offer a bunch of arbitrary
| answers to arbitrary design questions, I consider it a
| massive win.
|
| [0]: https://revaldinho.github.io/opc/
| robinsonb5 wrote:
| That is very cool. I'm particularly interested in the
| compressed-first approach because I have some projects where
| minimising BRAM usage is paramount so code density really
| matters. The use of microcode to emulate 32-bit instructions
| reminds me a lot of ZPU (I still have a soft spot for that
| architecture) - was that an influence?
| downvotetruth wrote:
| Can the address and/or data also be 16 bit or would that violate
| RISC-V spec?
| snvzz wrote:
| AIUI the registers and operations with them should be 32bit for
| RV32I.
|
| The bus is up to you... should you want a 8bit data bus and 16
| bit address bus, I don't think the spec cares.
|
| This is akin to 68020 (32bit ISA) vs 68000 (still 32bit ISA) or
| 68008 (still 32bit ISA).
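|
| A minimal sketch of that split, assuming a hypothetical core
| with an 8-bit data bus and a 16-bit address bus: a 32-bit
| register load simply takes four bus beats, assembled
| little-endian as RISC-V expects. The function name and the
| bytearray memory model are illustrative only.

```python
def load_word_over_byte_bus(memory, address):
    """Fetch a 32-bit little-endian word using four 8-bit bus transfers.

    `memory` is assumed to be a 64 KiB bytearray, matching the 16-bit
    address bus; addresses wrap around at 0x10000.
    """
    word = 0
    for beat in range(4):
        byte = memory[(address + beat) & 0xFFFF]  # one transfer per beat
        word |= byte << (8 * beat)                # lowest address, lowest byte
    return word
```

| The architectural register is still 32 bits wide; only the
| number of transfers per access changes.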
| gsmecher wrote:
| I don't think the RISC-V spec cares, either, since it
| specifies an execution environment but not interfaces.
|
| A narrower data bus would allow a 2-cycle execution path, and
| would likely split the longest combinatorial path in the
| current design (which certainly goes through the adder tree.)
| This could be either a 0.5 instruction-per-clock (IPC)
| design, or a pipelined design that maintains 1 IPC at the
| expense of extra pipeline hazards and corresponding bubbles.
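|
| A minimal sketch of the non-pipelined, 0.5 IPC variant,
| assuming a 16-bit datapath and a hypothetical function name:
| a 32-bit add is done in two phases, with the carry out of the
| low half held over into the high half. This is an
| illustration, not how Minimax is built.

```python
def add32_two_phase(a, b):
    """32-bit add performed as two 16-bit adds linked by a carry bit."""
    # Phase 1: low halves; the 17th bit of the sum is the carry.
    lo = (a & 0xFFFF) + (b & 0xFFFF)
    carry = lo >> 16
    # Phase 2: high halves plus the saved carry; overflow is discarded.
    hi = ((a >> 16) & 0xFFFF) + ((b >> 16) & 0xFFFF) + carry
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)
```

| The carry bit is the extra state that has to live in a
| flip-flop between the two phases.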
|
| A narrower address seems like it's only helpful as a knock-on
| to a split data bus.
|
| Gut feeling: I doubt that splitting the data or address buses
| into additional phases would actually save resources. You
| would certainly need more flip-flops to maintain state, and
| more LUTs to manage combinational paths across the two
| execution stages. While you can sometimes add complexity and
| "win back" gates, it's an approach with limits. If you
| compare SERV's resource usage to FemtoRV32-Quark's, it's
| notable how much additional state (flip-flops) SERV "spends"
| to reduce its combinatorial logic (LUT) footprint.
___________________________________________________________________
(page generated 2022-11-01 23:01 UTC)