hngopher.com

       ___________________________________________________________________
        
       Glacial - microcoded RISC-V core designed for low FPGA resource
       utilization
        
       Author : peter_d_sherman
       Score  : 39 points
       Date   : 2021-03-20 19:05 UTC (3 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | sprash wrote:
       | I never understood the hype around RISC-V. It is an ISA on the
       | level of a mediocre early 90s design and does not address any of
       | the problems we have today such as the memory latency bottleneck
       | and the resulting topology challenges. Several completely open
       | source designs are available that are vastly superior and real
       | world battle tested such as OpenSPARC-T2.
       | 
       | So why do we need RISC-V? Is it another case of NIH
       | (not-invented-here) syndrome?
        
         | duskwuff wrote:
         | Off the top of my head:
         | 
         | 1. RISC-V is _completely_ unencumbered from an IP perspective.
         | There is no possibility of a rightsholder reasserting rights on
         | IP they had previously released (like what happened with MIPS
         | in 2019).
         | 
         | 2. RISC-V is legacy-free. It's an extremely "clean" design,
         | free of weird quirks like the MIPS branch delay slot or SPARC
         | register windows.
         | 
         | 3. There are subsets of the RISC-V architecture defined for
         | different sizes of systems, e.g. 32/64 bit versions, an
         | embedded subset with fewer registers, etc. They all share an
         | instruction set and a general architecture, and most compilers
         | can target any subset. Some of the smaller subsets are well
         | within the realm of what a single student can be taught to
         | implement within a semester.
         | 
         | 4. Numerous real implementations of RISC-V exist, both as
         | hardware and as HDL. They are being maintained, and the
         | hardware is available on the open market.
        
       | retrac wrote:
       | Along the same lines of minimizing the amount of logic used at
       | the cost of cycles, there's SERV which uses a bit-serial
       | implementation with a 1-bit data path:
       | https://github.com/olofk/serv
       | 
       | From time to time, I have been tempted to design a RISC-V
       | implementation out of discrete 74xx components. Sure, there are
       | plenty of projects out there to build your own processor from
       | scratch like that, but most of them aren't LLVM targets!
       | 
       | The 32-bit datapaths and need for so many registers make it a
       | bit daunting to approach directly. That approach would probably
       | end up similar in scale to a MIPS implementation I once saw done
       | like that. (Can't find the link, but it was about half a dozen
       | A4-sized PCBs).
       | 
       | Retreating to an 8-bit microcoded approach and lifting all the
       | registers and complexity into RAM and software is a very
       | attractive idea. Might even fit on a single Eurocard. It's not
       | like a discrete TTL RISC-V implementation would ever be a speed
       | demon, either way.
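
The bit-serial idea mentioned above (a 1-bit data path that spends cycles to save logic) can be sketched in software. This is a minimal illustration, not SERV's actual implementation: the function name `bit_serial_add` and the 32-cycle loop are assumptions made for the example. It models one full adder plus a single carry flip-flop, processing one bit of each operand per cycle.

```python
def bit_serial_add(a: int, b: int, width: int = 32) -> int:
    """Add two unsigned integers one bit per 'cycle', as a 1-bit data path would."""
    carry = 0   # models the single carry flip-flop
    result = 0  # models the shift register collecting the sum
    for cycle in range(width):
        bit_a = (a >> cycle) & 1
        bit_b = (b >> cycle) & 1
        # One full adder: sum bit and carry-out from three input bits.
        sum_bit = bit_a ^ bit_b ^ carry
        carry = (bit_a & bit_b) | (carry & (bit_a ^ bit_b))
        result |= sum_bit << cycle
    return result  # the final carry is dropped, so the sum wraps modulo 2**width


print(hex(bit_serial_add(0xFFFFFFFF, 1)))  # 0x0
print(bit_serial_add(1234, 5678))          # 6912
```

The trade-off is visible in the loop: a 32-bit add costs 32 iterations, but the only "hardware" inside the loop is a handful of gates and one bit of state.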
        
         | klelatti wrote:
         | Very interesting. Any sense of how many transistors / gates
         | would be a likely minimum needed for a RISC-V implementation
         | like this?
        
           | retrac wrote:
           | My last rough sketch came out to about 400 gates, excluding
           | the flip-flops.
           | 
           | If one cuts even more corners, that number could come down
           | much further. For example, an adder isn't actually necessary
           | and can be replaced with lookup tables and bit-twiddling,
           | again at the cost of cycles and more microcode.
           | 
           | See https://hackaday.io/project/161251-1-square-inch-ttl-cpu
           | -- while not RISC-V it demonstrates some of these principles
           | in action, taken to the extreme.
           | 
           | That design consists of: one 4 bit counter, four 8-bit flip-
           | flops, one quad OR gate, one dual 2-to-4 demux, and one 128
           | KB Flash ROM.
           | 
           | Including the flip-flops (and obviously excluding the Flash
           | memory) that comes out to about 200 or so gates by my count,
           | and the microcode/emulation program implements a fairly
           | typical CISC 16-bit processor. It's not even all that
           | inefficient, with under 100 cycles per instruction on
           | average.
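
The remark above that an adder can be replaced with lookup tables can be illustrated with a small sketch. The table layout here (`ADD_ROM`, indexed by carry-in and two 4-bit nibbles) is a hypothetical choice for the example, not the layout used by the linked TTL design. The `+` operator appears only while filling the ROM, which stands in for programming the Flash ahead of time; the add itself uses only table lookups, shifts and masks.

```python
# Hypothetical ROM layout: index = (carry_in << 8) | (a_nibble << 4) | b_nibble
# Each entry holds (carry_out << 4) | sum_nibble, i.e. a 5-bit value.
ADD_ROM = [0] * 512
for carry_in in range(2):
    for a in range(16):
        for b in range(16):
            # a + b + carry_in is at most 31, so bit 4 is the carry-out
            # and bits 3:0 are the sum nibble.
            ADD_ROM[(carry_in << 8) | (a << 4) | b] = a + b + carry_in


def lut_add32(x: int, y: int) -> int:
    """32-bit wrapping add built from eight 4-bit table lookups."""
    result = 0
    carry = 0
    for i in range(8):  # one nibble per step, least significant first
        shift = 4 * i
        a = (x >> shift) & 0xF
        b = (y >> shift) & 0xF
        entry = ADD_ROM[(carry << 8) | (a << 4) | b]
        result |= (entry & 0xF) << shift
        carry = entry >> 4
    return result  # final carry is discarded, as in a 32-bit register


print(hex(lut_add32(0x12345678, 0x11111111)))  # 0x23456789
```

In hardware the shifts and masks are just wiring, so the cost moves from gates into ROM contents and extra cycles, which is the trade described in the comment.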
        
             | klelatti wrote:
             | Thanks. As you say having LLVM be able to target a CPU
             | built with so few gates is pretty remarkable!
        
         | monocasa wrote:
         | If you don't know of it already, you might like the book Bit-
         | Slice Microprocessor Design written by Mick and Brick. It's
         | written to be very Am2900 specific, but a lot of the techniques
         | would apply to microcoded TTL processors with just a little
         | more work on your end. And it really does a good job of
         | exploring the space of microcoded minicomputer design in an
         | interesting way.
        
       | mechagodzilla wrote:
       | But what's the resource utilization??
        
         | monocasa wrote:
         | Reading through it, it's making the right tradeoffs for good
         | utilization. Basically trading LUTs for BROM, which is what
         | you'd want at this level.
         | 
         | I haven't synthesized it though, so I can't say for sure.
        
       | kiwidrew wrote:
       | I'm surprised that there aren't any specialised instructions or
       | hardware resources to handle the RISC-V instruction
       | decoding/dispatching. [1]
       | 
       | Like, sure, it's not meant to be a fast implementation, but even
       | just a "mask byte with 0x7C and set PC to that value times 8"
       | instruction (which in an FPGA implementation is just rearranging
       | the wires) could save 5-6 cycles per instruction.
       | 
       | Is it really "microcoded" when all you're doing is writing a
       | RISC-V emulator that runs on what looks to be a fairly standard
       | 8-bit CPU?
       | 
       | [1]
       | https://github.com/brouhaha/glacial/blob/master/ucode/ucode....
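
The dispatch instruction proposed above ("mask byte with 0x7C and set PC to that value times 8") can be sketched as follows. This assumes the byte in question is the lowest byte of a 32-bit RISC-V instruction, whose bits 6:2 select the major opcode, and that handlers sit 32 microcode addresses apart, which is what "times 8" of the masked value implies. The function names are made up for illustration and do not reflect Glacial's actual microcode layout.

```python
def dispatch_target(first_byte: int) -> int:
    """Literal form of the comment: mask with 0x7C, then multiply by 8."""
    return (first_byte & 0x7C) * 8


def dispatch_target_wired(first_byte: int) -> int:
    """Same result seen as wiring: opcode bits 6:2 become address bits 9:5."""
    opcode_field = (first_byte >> 2) & 0x1F  # 5-bit major opcode field
    return opcode_field << 5                 # 32 handler slots, 32 addresses apart


# Standard RISC-V major opcodes (low byte of the instruction).
for name, low_byte in [("LOAD", 0x03), ("OP-IMM", 0x13), ("OP", 0x33), ("JAL", 0x6F)]:
    print(name, dispatch_target(low_byte))
# LOAD 0
# OP-IMM 128
# OP 384
# JAL 864
```

Because the two forms agree for every byte value, the operation needs no logic at all in an FPGA: it is a fixed rearrangement of five bits, which is the point the comment makes.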
        
       ___________________________________________________________________
       (page generated 2021-03-20 23:00 UTC)