[HN Gopher] Show HN: RISC-V core written in 600 lines of C89
___________________________________________________________________
Show HN: RISC-V core written in 600 lines of C89
Author : mnurzia
Score : 145 points
Date : 2023-06-10 13:08 UTC (9 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| aportnoy wrote:
| How about a RISC-V disassembler in 200 lines of C99?
|
| https://github.com/andportnoy/riscv-disassembler/blob/master...
| mnurzia wrote:
| This is really cool, thanks for sharing! Something like this
| would be a great tool to distribute with my emulator.
| garganzol wrote:
| It would be nice if you could put a link to that project to
| your README file. Both projects are very impressive,
| especially when seen in conjunction with each other.
| aportnoy wrote:
| I mean, his simulator already has a disassembler contained
| within it, would just need to replace comments with print
| statements.
| charcircuit wrote:
| This isn't a RISC-V core. It is a RISC-V emulator library.
| nevi-me wrote:
| Question: does the implementation of a single instruction compile to
| a single instruction when targeting RISC-V with optimisations
| enabled? That would be really awesome if compilers realise what
| your code is doing and replace the implementations of
| instructions with those instructions.
| mnurzia wrote:
| Not really, my implementation isn't smart enough to guide
| compilers to the right solution. Trivial instructions, like
| xor, are of course recognized, but for example the 32x32 mul
| implementation isn't. Maybe compilers will be smart enough one
| day...
|
| https://godbolt.org/z/WEcTzKf7M
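To show why a 32x32 multiply is hard for a compiler to recognise: without a 64-bit integer type, as in strict C89, the high half of an unsigned product (what RISC-V MULHU returns) has to be assembled from 16-bit partial products. This is a minimal sketch, assuming only C89's guarantee that `unsigned long` is at least 32 bits. The name `mulhu32` is hypothetical and not taken from the project.

```c
/* Hypothetical helper: upper 32 bits of the unsigned 64-bit product a*b
   (RISC-V MULHU semantics), using only arithmetic that fits in 32 bits. */
unsigned long mulhu32(unsigned long a, unsigned long b)
{
    unsigned long a_lo, a_hi, b_lo, b_hi;
    unsigned long lo_lo, hi_lo, lo_hi, hi_hi, cross;

    a &= 0xFFFFFFFFUL;
    b &= 0xFFFFFFFFUL;
    a_lo = a & 0xFFFFUL;
    a_hi = a >> 16;
    b_lo = b & 0xFFFFUL;
    b_hi = b >> 16;

    /* Four 16x16 -> 32-bit partial products; none can overflow. */
    lo_lo = a_lo * b_lo;
    hi_lo = a_hi * b_lo;
    lo_hi = a_lo * b_hi;
    hi_hi = a_hi * b_hi;

    /* Sum the middle column; this sum is at most 2^32 - 1, and its
       carry-out (cross >> 16) belongs to the high word. */
    cross = (lo_lo >> 16) + (hi_lo & 0xFFFFUL) + lo_hi;
    return (hi_hi + (hi_lo >> 16) + (cross >> 16)) & 0xFFFFFFFFUL;
}
```

For example, `mulhu32(0xFFFFFFFFUL, 0xFFFFFFFFUL)` yields `0xFFFFFFFE`. A compiler has to see through all of these shifts and masks to emit one MULHU, which is the kind of pattern it may not recognise.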
| dbcurtis wrote:
| Yeah, well, the rock that breaks your pick in that scenario is
| copying all the processor state back and forth to/from the
| emulation model, including flag register bits, and also
| correctly handling exceptions and faults. Emulating the
| instruction's happy path is just scratching the surface.
| duskwuff wrote:
| > including flag register bits
|
| RISC-V doesn't have those. Compare+branch is a single
| instruction.
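To illustrate the flag-free branch: an emulator can evaluate a conditional branch by comparing the two register values and jumping in one step, with nothing to copy back. Below is a sketch of the B-type offset decoding and the six RV32I branch conditions, assuming register values are held in `unsigned long` and masked to 32 bits. All names are hypothetical.

```c
/* Hypothetical helper: sign-extended 13-bit offset of a B-type branch. */
long btype_offset(unsigned long inst)
{
    unsigned long imm;

    imm = (((inst >> 31) & 0x1UL) << 12)   /* imm[12]   from inst[31]    */
        | (((inst >> 7) & 0x1UL) << 11)    /* imm[11]   from inst[7]     */
        | (((inst >> 25) & 0x3FUL) << 5)   /* imm[10:5] from inst[30:25] */
        | (((inst >> 8) & 0xFUL) << 1);    /* imm[4:1]  from inst[11:8]  */
    if (imm & 0x1000UL)                    /* sign bit set: negative     */
        return (long)imm - 0x2000L;
    return (long)imm;
}

/* Reinterpret the low 32 bits of x as a two's-complement value. */
static long as_signed32(unsigned long x)
{
    x &= 0xFFFFFFFFUL;
    if (x & 0x80000000UL)
        return -(long)(~x & 0x7FFFFFFFUL) - 1;
    return (long)x;
}

/* Next pc after a conditional branch: compare and jump in one step, so no
   flags register is involved. funct3 values 2 and 3 are not branches; a
   real emulator would raise an illegal-instruction exception there, while
   this sketch simply treats them as not taken. */
unsigned long branch_next_pc(unsigned long inst, unsigned long pc,
                             unsigned long rs1_val, unsigned long rs2_val)
{
    int taken;

    rs1_val &= 0xFFFFFFFFUL;
    rs2_val &= 0xFFFFFFFFUL;
    switch ((inst >> 12) & 0x7UL) {
    case 0: taken = rs1_val == rs2_val; break;                           /* BEQ  */
    case 1: taken = rs1_val != rs2_val; break;                           /* BNE  */
    case 4: taken = as_signed32(rs1_val) < as_signed32(rs2_val); break;  /* BLT  */
    case 5: taken = as_signed32(rs1_val) >= as_signed32(rs2_val); break; /* BGE  */
    case 6: taken = rs1_val < rs2_val; break;                            /* BLTU */
    case 7: taken = rs1_val >= rs2_val; break;                           /* BGEU */
    default: taken = 0; break;
    }
    if (!taken)
        return (pc + 4) & 0xFFFFFFFFUL;
    return (pc + (unsigned long)btype_offset(inst)) & 0xFFFFFFFFUL;
}
```

For instance, `0xFE000EE3` encodes `beq zero, zero, -4`, so `branch_next_pc(0xFE000EE3UL, 0x100UL, 0, 0)` returns `0xFC`.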
| sitkack wrote:
| In the guest, you trap on reading emulation state, so that
| the source of truth is the hardware. Rather than use
| something like KVM, I wonder if you could run another child
| process and use ptrace?
| [deleted]
| sutterbutter wrote:
| As a total newb to programming, what am I looking at here?
| detrites wrote:
| There are several different types of CPUs, in two main
| classes, CISC and RISC. The difference is summarised by the
| first letter - "Complex" vs. "Reduced" - Instruction Set
| Computer. Or, what size "vocabulary" a CPU decodes.
|
| RISC-V is a type of CPU architecture (a set of plans for how to
| build one, not an actual CPU itself), that also happens to be
| open source. Anyone can build a RISC-V CPU without having to
| buy the rights to do so. (Many are.)
|
| This project is an _emulation_ of a RISC-V CPU. A kind of
| virtual "reference" CPU in software. It can be used to run
| code compiled for a RISC-V type CPU, and to help understand
| what's happening inside the CPU when it runs.
|
| It's written in C, which is and was a very fundamental
| programming language that's influenced the design of many other
| languages. It is a language that is very close to the fundamental
| language CPUs natively decode and process.
|
| CPU's natively use a language referred to as "Assembly", but
| which actually has many varieties particular to each CPU
| design. Regardless of the variety of CPU, assembly is usually
| about as "close to the metal" as it reasonably gets.
|
| It's literally communicating with the CPU directly in its own
| language. This makes it extremely fast to run, but laborious to
| code, and also somewhat "dangerous" in that with such low-level
| control, it's easy to mess things up.
|
| This project takes a RISC-V program (machine code, produced by
| assembling or compiling source code) and pretends to be a
| RISC-V CPU with that program loaded into it and running on it.
| Useful for understanding, prototyping and building a RISC-V
| program.
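The "pretends to be a CPU" part boils down to a fetch, decode, execute loop over an array of registers. Here is a minimal sketch supporting only ADDI and ADD. The `struct mini_cpu` layout, the word-array program memory, and the convention that an all-zero word stops execution are assumptions for illustration and are not the project's API.

```c
#define NREGS 32

/* Hypothetical minimal core state (not the project's actual struct). */
struct mini_cpu {
    unsigned long x[NREGS]; /* x0..x31; x0 must always read as zero */
    unsigned long pc;       /* byte address of the next instruction */
};

/* Runs instructions from prog[] (one 32-bit word per element) until an
   all-zero word, an unsupported instruction, or the end of the array.
   Returns the number of instructions executed. */
int mini_run(struct mini_cpu *cpu, const unsigned long *prog, unsigned long nwords)
{
    int count = 0;

    for (;;) {
        unsigned long inst, opcode, rd, funct3, rs1, rs2, funct7;
        long imm;

        if (cpu->pc / 4 >= nwords)
            break;
        inst = prog[cpu->pc / 4];              /* fetch */
        opcode = inst & 0x7FUL;                /* decode the fixed fields */
        rd = (inst >> 7) & 0x1FUL;
        funct3 = (inst >> 12) & 0x7UL;
        rs1 = (inst >> 15) & 0x1FUL;
        rs2 = (inst >> 20) & 0x1FUL;
        funct7 = (inst >> 25) & 0x7FUL;

        if (opcode == 0x13 && funct3 == 0) {                 /* ADDI */
            imm = (long)((inst >> 20) & 0xFFFUL);
            if (imm & 0x800L)
                imm -= 0x1000L;                              /* sign-extend */
            cpu->x[rd] = (cpu->x[rs1] + (unsigned long)imm) & 0xFFFFFFFFUL;
        } else if (opcode == 0x33 && funct3 == 0 && funct7 == 0) { /* ADD */
            cpu->x[rd] = (cpu->x[rs1] + cpu->x[rs2]) & 0xFFFFFFFFUL;
        } else {
            break;       /* stop marker or unsupported instruction */
        }
        cpu->x[0] = 0;   /* writes to x0 are discarded */
        cpu->pc += 4;
        count++;
    }
    return count;
}
```

Running the words `0x00500513` (`addi a0, zero, 5`), `0x00700593` (`addi a1, zero, 7`) and `0x00B50633` (`add a2, a0, a1`), followed by a zero word, leaves 12 in `x[12]` (a2).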
|
| CPUs are designed, rather, to run machine code that already
| "works", having been generated programmatically (typically by a
| compiler) from a higher-level language that isn't going to give
| it things that make no sense (hopefully).
|
| So there is not usually a lot of provisioning done in the
| design of the CPU to make it easy to watch it and its state
| carefully at a low level and examine how your assembly program
| is working, or not working. Emulation eases this.
| dragonwriter wrote:
| > CPU's natively use a language referred to as "Assembly", b
|
| Strictly, CPUs use machine code. Assembly targeting a
| particular CPU is a _very_ thin more-human-readable
| abstraction around the underlying machine code, but it is
| not, itself, what the CPU executes. That's why "assemblers"
| exist - they are compilers from assembly language to machine
| code (though, because assembly is a very thin abstraction,
| they are much simpler than most other compilers.)
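The thinness of that abstraction is easy to see for RV32I I-type instructions: assembling `addi a0, a0, 1` is just packing fields into one 32-bit word (imm[11:0] | rs1 | funct3 | rd | opcode). This is a sketch with a hypothetical helper name. The register numbers (a0 = x10) and the ADDI opcode/funct3 (0x13/0) come from the RISC-V spec.

```c
/* Hypothetical helper: pack an RV32I I-type instruction word, the core of
   what an assembler does for instructions like ADDI. */
unsigned long encode_itype(long imm, unsigned rs1, unsigned funct3,
                           unsigned rd, unsigned opcode)
{
    return (((unsigned long)imm & 0xFFFUL) << 20)  /* imm[11:0], two's complement */
         | ((unsigned long)(rs1 & 0x1FU) << 15)
         | ((unsigned long)(funct3 & 0x7U) << 12)
         | ((unsigned long)(rd & 0x1FU) << 7)
         | (unsigned long)(opcode & 0x7FU);
}
```

`encode_itype(1, 10, 0, 10, 0x13)` gives `0x00150513`, the machine code a RISC-V assembler emits for `addi a0, a0, 1`. With an immediate of -1 it gives `0xFFF50513`.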
| tester756 wrote:
| Would calling "Assembly" a CPU's frontend language be
| correct?
|
| The same way as it is in compilers
| detrites wrote:
| Agree. And deeper than that may be microcode, which we
| rarely see or reason about and which, while very much there,
| is rarely of practical use. (I.e., when learning, the
| distinctions may be somewhat of an impediment without payoff.)
| bjourne wrote:
| Why stick with C89? I can't think of any compilers that don't
| support C99 nowadays. The major benefit is that you can use
| uint8_t and friends directly and don't need to define your own
| wrapper types.
| flohofwoe wrote:
| One "advantage" (if one wants to call that) is that the code
| would also compile as C++, while C99 has diverged enough from
| the common C/C++ subset that one cannot use all C99 features in
| C++ mode.
| mnurzia wrote:
| I totally missed this, good point.
|
| Slightly unrelated, but just thought I should mention: the
| sokol libraries are awesome!
| contrarian1234 wrote:
| Did Visual Studio finally make the jump?! (you could always
| just compile it as C++ code though)
| bjourne wrote:
| Nope, stdint.h has been in MSVC for over 10 years. Other C99
| features may not be supported, though.
| flohofwoe wrote:
| Except for VLAs (which are optional post-C99 anyway), MSVC
| actually has pretty good support for recent C versions, and
| since 2020 they're basically back on the "modern C" train:
| https://devblogs.microsoft.com/cppblog/c11-and-c17-standard
| -...
| jpfr wrote:
| MSVC did a big rewrite of the C frontend around MSVC2013. I
| haven't encountered C99 idioms that don't work nowadays.
| Granted, I might not use every feature in my typical coding
| style...
| arp242 wrote:
| It's been "fully" C99 (and C11, C17) compliant for about
| 2 or 3 years. The only missing C99 features before that
| were relatively rarely used ones like _Pragma.
| mort96 wrote:
| Hasn't the main issue with MS been VLAs? I seem to recall
| that VLAs are the main reason MSVC won't ever support C99,
| and that MSVC is one of the main reasons why VLAs were made
| optional. It seems like MSVC supports C11 and C17 now,
| thanks to the removal of mandatory VLAs.
| zabzonk wrote:
| vehement opposition from ms may be one of the reasons for
| them being optional (and thus worthless) but the main one
| is that that they are impossible to use correctly. what
| happens if you make one too big?
| mort96 wrote:
| I think they could potentially have some very limited
| valid use cases, but I agree that a fixed length array
| and/or heap allocation is usually much better than VLAs.
|
| I was mainly just pointing out that MS's lack of C99
| support isn't really a part of keeping C89 alive,
| especially now that they officially support C11.
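As an illustration of the "fixed length array and/or heap allocation" alternative, and of why it answers "what happens if you make one too big?": the buffer below lives on the stack for small inputs, switches to `malloc` for large ones, and reports allocation failure, whereas an oversized VLA gives no such signal. This is a sketch. The `SMALL_N` cutoff and the median use case are arbitrary choices for illustration. The cast on `malloc` keeps it compiling as C++ as well.

```c
#include <stdlib.h>
#include <string.h>

#define SMALL_N 64 /* hypothetical cutoff for the on-stack buffer */

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Writes the lower median of v[0..n-1] to *out without modifying v.
   Returns 0 on success, -1 if n == 0 or the scratch buffer can't be had. */
int median_int(const int *v, size_t n, int *out)
{
    int small[SMALL_N];
    int *scratch = small;

    if (n == 0)
        return -1;
    if (n > SMALL_N) {
        if (n > (size_t)-1 / sizeof *scratch)
            return -1;                      /* size computation would overflow */
        scratch = (int *)malloc(n * sizeof *scratch);
        if (scratch == NULL)
            return -1;                      /* failure is detectable, unlike a VLA */
    }
    memcpy(scratch, v, n * sizeof *scratch);
    qsort(scratch, n, sizeof *scratch, cmp_int);
    *out = scratch[(n - 1) / 2];
    if (scratch != small)
        free(scratch);
    return 0;
}
```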
| boricj wrote:
| Funnily enough, the file rv.h does use stdint.h if available
| and contains the following comment:
|
| > All I want for Christmas is C89 with stdint.h
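For readers wondering what "uses stdint.h if available" typically looks like, here is a common pattern. It is a hypothetical shim, not the contents of rv.h. It picks `uint32_t` when the compiler reports C99 or later and otherwise falls back to `unsigned long`, which C89 guarantees to be at least 32 bits. Because that fallback may be 64 bits wide, 32-bit wraparound has to be done explicitly.

```c
#include <limits.h>

/* Hypothetical portability shim: exact-width type when available. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <stdint.h>
typedef uint32_t my_u32;
#else
typedef unsigned long my_u32; /* C89: at least 32 bits, possibly more */
#endif

/* C89-compatible compile-time check: array size is -1 if the type is too small. */
typedef char my_u32_is_at_least_32_bits[(sizeof(my_u32) * CHAR_BIT >= 32) ? 1 : -1];

/* Emulated 32-bit addition: masking keeps results correct even when
   my_u32 is a 64-bit unsigned long. */
my_u32 add32(my_u32 a, my_u32 b)
{
    return (my_u32)((a + b) & 0xFFFFFFFFUL);
}
```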
| dezgeg wrote:
| I've met several people that seriously think that C89 is the
| peak of programming languages and that C99 just brings
| misfeatures (like allowing variable declarations in the middle
| of basic blocks, according to them).
| mnurzia wrote:
| It's more of a fun exercise, I guess. But I do have experience
| with at least one compiler that doesn't support C99: Zilog's
| ez80 C compiler. Back in the day I used to program my TI-84+ CE
| for fun[0], and the only C solution was a pretty bespoke
| C89-only compiler[1] distributed with a community toolchain[2],
| which has since switched to clang. It's somewhat irrational,
| but in the back of my mind it bugs me if the software I write
| can't run on platforms like that.
|
| [0] https://github.com/mnurzia/chip8-ce
|
| [1] http://www.zilog.com/docs/appnotes/pb0098.pdf
|
| [2] https://ce-programming.github.io/toolchain/
| freecodyx wrote:
| This proves that, at the core, the things we rely on to achieve
| great software and life-impacting technologies are extremely
| simple. The complexity is in how to make them.
| numpad0 wrote:
| The complexity is in how to distribute dev workload and how to
| make it financially viable. No one pays for beautiful works of
| art unless it's somehow anchored, tangled and aligned into
| their interests.
| arcticbull wrote:
| The core concepts are generally very straightforward, however
| it's always the optimization that adds complexity. That's how
| you get the orders of magnitude improvement. This C89 core
| definitely doesn't do macro op fusion for instance.
| rowanG077 wrote:
| The Readme doesn't answer it but I struggle to see why you want a
| c implementation of an ISA.
| detrites wrote:
| Not sure if this was intended, but coming to this as someone
| vaguely aware of RISC-V, it's looking like a fantastic form of
| documentation for the ISA, that both describes and gives a way
| to play with it, but in an intuitive, even fun manner.
|
| Obviously this works best for someone who already knows C,
| but the fact that it's C89 mitigates this somewhat.
| rowanG077 wrote:
| A reference implementation would be in Verilog or VHDL.
| Farmadupe wrote:
| Considering it's allocation-free, maybe it's an ultralight
| simulator for checking large quantities of compiler output?
| (i.e. no VM to create and destroy for every testcase)
|
| Or the same but for testing some Verilog/VHDL CPU implementation
| in a simulator?
|
| Or since it's only 500SLOC, maybe it's just for fun!
| rowanG077 wrote:
| Then I would expect a comparison with verilator.
| mnurzia wrote:
| This is an excellent idea. One limitation of a testing
| library of mine, `mptest`, is its inability to sandbox tests.
| I may take this idea and develop a more robust (and
| potentially parallel) testing framework around it.
| nly wrote:
| So you can compile and run it on any platform with a C compiler
| rowanG077 wrote:
| That is just something you can do with C code. That is not a
| goal in itself. Why would you want to run a C ISA instead of
| just using a standard simulator? Why not use verilator + any
| of the open source RISC-V cores?
| LoganDark wrote:
| Because those are slower, more complex, and more difficult
| to understand?
| rowanG077 wrote:
| I doubt verilator is much slower. The speed of it is
| insane. They are indeed more complex and difficult to
| understand. But I fail to see how that is a criterion. I
| would much rather include an industry-standard library than
| something homegrown.
| srgpqt wrote:
| Perhaps this could be used to run sandboxed code. Game engines
| could safely run mods using something like this, à la QuakeC.
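What makes an emulator usable as a sandbox is that every guest memory access goes through host code that can refuse it. Here is a sketch of a bounds-checked, little-endian (as RISC-V is) 32-bit load and store over a fixed guest RAM array. The `guest_mem` struct, its 4 KiB size, and the return-code convention are assumptions for illustration. Alignment is not checked here.

```c
#define GUEST_MEM_SIZE 4096UL /* hypothetical guest RAM size */

struct guest_mem {
    unsigned char bytes[GUEST_MEM_SIZE];
};

/* Little-endian 32-bit load. Returns 0 on success, -1 on an access fault.
   Comparing against SIZE - 4 avoids overflow in addr + 4. */
int guest_load32(const struct guest_mem *m, unsigned long addr, unsigned long *out)
{
    if (addr > GUEST_MEM_SIZE - 4)
        return -1;
    *out = (unsigned long)m->bytes[addr]
         | ((unsigned long)m->bytes[addr + 1] << 8)
         | ((unsigned long)m->bytes[addr + 2] << 16)
         | ((unsigned long)m->bytes[addr + 3] << 24);
    return 0;
}

/* Little-endian 32-bit store with the same bounds check. */
int guest_store32(struct guest_mem *m, unsigned long addr, unsigned long val)
{
    if (addr > GUEST_MEM_SIZE - 4)
        return -1;
    m->bytes[addr]     = (unsigned char)(val & 0xFFUL);
    m->bytes[addr + 1] = (unsigned char)((val >> 8) & 0xFFUL);
    m->bytes[addr + 2] = (unsigned char)((val >> 16) & 0xFFUL);
    m->bytes[addr + 3] = (unsigned char)((val >> 24) & 0xFFUL);
    return 0;
}
```

A mod that computes a wild pointer then gets an error the host can turn into a guest exception, instead of reading or writing host memory.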
| mnurzia wrote:
| Definitely. My motivation for writing this was to have a
| simple CPU for a virtual game console-like project. I decided
| to release it on its own, though.
| mcraiha wrote:
| For modern game engine you most likely want WebAssembly
| support. e.g. Flight Simulator does that
| https://flightsimulator.zendesk.com/hc/en-
| us/articles/766290...
| srgpqt wrote:
| Sure, I'd love to see your 600 line webassembly
| interpreter.
| sitkack wrote:
| Run wasm on this core.
| bitwize wrote:
| I feel myself descending into old-fartitude more and more with
| every year. My wife and I were recently involved in a car
| accident (no one was hurt). While I was being checked out I
| overheard a 20-year-old firefighter exchange Facebook information
| with an 18-year-old EMT. I was like, "wait a minute, you guys
| seem really young and you still use Facebook? I thought Facebook
| was for your grandparents and all the kids now use Snapchat or
| TikTok?"
|
| I get that same feeling now. This kid is 20 and still using C89?
| Shouldn't people his age have been reared entirely in the
| crystal-spires-and-togas utopia of Rust, with raw pointers and
| buffer overruns being mere legends of their people's benighted
| past told to them by their elders?
|
| It's kind of comforting to see young programmers embracing the
| old ways, even if it's for hack value only.
| sitkack wrote:
| I think there's a risk of kids seeing old people romantically
| reenacting their 8-bit micro days and thinking that it's
| something besides nostalgia.
|
| I was kind of the opposite as a kid: if it wasn't crazy
| futuristic, I didn't want it. So even in the 80s I wanted
| FPGA accelerators in every machine.
| mnurzia wrote:
| Admittedly, C89 has very little utility, especially among
| people my age. For example, my university progresses from
| Racket to Java to C++, and has a systems course that partially
| teaches C11. Although good for teaching, I don't think those
| languages artificially constrained me in the ways that C89
| does. I felt that my programming skills improved the most when
| I forced myself to work in such an under-powered language.
|
| I also like the idea of being able to run my code anywhere,
| kind of like Doom.
| peterfirefly wrote:
| 'switch' is a really, really nice language construct that was
| fully implemented long before C89. Using lots of nested 'if's
| instead is not a good idea.
| hgs3 wrote:
| 'switch' is good, but for VM's computed goto is better.
| KerrAvon wrote:
| depends on the compiler implementation. modern compilers may
| be able to treat equivalent switch statements, gotos, and
| if/else statements pretty much the same
| nsajko wrote:
| Only in trivial cases.
| sylware wrote:
| Nested "ifs" are optimized out by compilers. Moreover, in the
| latest horrible gcc extensions you have the case statement
| using a _not_ compile-time constant expression (you can find
| usage of such a horrible gcc extension in linux net code).
| mnurzia wrote:
| This was one of my main justifications for making this
| design choice, in addition to the (in my opinion)
| overwhelming amount of break statements that would result
| from using switches. But more importantly, many of the "if"
| statements have non-constant or more complex expressions in
| them that aren't supported in switch statements in ANSI C.
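To make the trade-off concrete: in RV32I register-register ALU ops (opcode 0x33), `funct3` is a small constant field that suits `switch`, but ADD/SUB and SRL/SRA share a `funct3` and differ in `funct7`, so an `if` is still needed inside the cases. This is a hypothetical fragment showing the two combined, not the project's decoder (which uses `if` chains throughout).

```c
/* Hypothetical decoder fragment for RV32I R-type ALU ops (opcode 0x33).
   Returns 0 and writes the result to *out, or -1 for an encoding this
   sketch does not handle (e.g. the M extension's funct7 = 0x01). */
int alu_rtype(unsigned long funct3, unsigned long funct7,
              unsigned long a, unsigned long b, unsigned long *out)
{
    unsigned long sh;
    unsigned long r;

    a &= 0xFFFFFFFFUL;
    b &= 0xFFFFFFFFUL;
    sh = b & 0x1FUL; /* RV32 shifts use only the low 5 bits of rs2 */
    if (funct7 != 0x00 && funct7 != 0x20)
        return -1;
    if (funct7 == 0x20 && funct3 != 0 && funct3 != 5)
        return -1;   /* only SUB and SRA use funct7 = 0x20 */

    switch (funct3) {
    case 0: r = (funct7 == 0x20) ? a - b : a + b; break;  /* SUB / ADD */
    case 1: r = a << sh; break;                           /* SLL */
    case 2: r = (a ^ 0x80000000UL) < (b ^ 0x80000000UL);  /* SLT: flipping the  */
            break;                                        /* sign bit makes the */
                                                          /* compare signed     */
    case 3: r = a < b; break;                             /* SLTU */
    case 4: r = a ^ b; break;                             /* XOR */
    case 5: r = a >> sh;                                  /* SRL */
            if (funct7 == 0x20 && (a & 0x80000000UL) && sh != 0)
                r |= 0xFFFFFFFFUL << (32 - sh);           /* SRA: fill sign bits */
            break;
    case 6: r = a | b; break;                             /* OR */
    case 7: r = a & b; break;                             /* AND */
    default: return -1;
    }
    *out = r & 0xFFFFFFFFUL;
    return 0;
}
```

Every `case` ends in a `break`, which is the boilerplate mnurzia mentions. Whether a compiler turns this or an equivalent `if` chain into a jump table is up to the optimizer.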
| sylware wrote:
| Yep.
|
| And as you stated, it is important to stay as much as
| possible close to c89, because ISO is literally doing
| planned obsolescence, but on a long time cycle (5-10
| years).
|
| Hopefully risc-v will be a success, and all system
| components and interpreters of very-high-level languages
| will be rewritten in risc-v assembly and it will become
| actually very hard to do planned-obsolescence.
| sylware wrote:
| A bigger implementation, but has 64bits support:
|
| https://bellard.org/tinyemu/
| garganzol wrote:
| Seeing the RISC-V instructions implemented in the emulator like
| that, it comes to my mind that RISC-V is really a reduced
| instruction set CPU.
|
| When compared to the AVR RISC instruction set (16-bit instruction
| words), RISC-V looks so much simpler. (You may be indirectly
| familiar with the AVR architecture through the household name
| "Arduino".)
|
| The intriguing part is that AVR is just a microcontroller, while
| RISC-V is intended to be a full-blown CPU.
| opencl wrote:
| The base instruction set is tiny but there are quite a few
| extensions and pretty much every practical implementation
| includes at least a few of them.
|
| e.g. the GD32V microcontrollers implement RV32IMAC, Allwinner
| D1 which is a "full-blown" CPU meant to run Linux implements
| RV64IMAFDCVU.
|
| RV32I/RV64I are the base 32/64 bit integer instruction sets and
| every letter after that is a different extension. Most of the
| extensions are relatively small and simple, but the C
| (compressed instructions) extension introduces some decoder
| complexity and the V (vector) extension adds several hundred
| instructions.
|
| Though even with all the extensions it is still a very
| small/simple ISA by modern standards.
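The letter-per-extension naming maps directly onto a bitmask. The Extensions field of the RISC-V `misa` CSR uses bit (letter - 'A') for each single-letter extension. Below is a sketch of a parser producing that layout from strings like "RV32IMAC". The function name is hypothetical. It assumes an ASCII-like character set where 'A'..'Z' are contiguous. It does not expand "G" or handle multi-letter extensions such as "_Zicsr" (parsing stops at the first '_').

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical parser: sets bit (letter - 'A') in *mask for each
   single-letter extension after the "RV32"/"RV64" prefix.
   Returns the XLEN (32 or 64), or 0 for an unrecognised string. */
int parse_isa(const char *s, unsigned long *mask)
{
    int xlen;

    *mask = 0;
    if (strncmp(s, "RV32", 4) == 0)
        xlen = 32;
    else if (strncmp(s, "RV64", 4) == 0)
        xlen = 64;
    else
        return 0;
    for (s += 4; *s != '\0' && *s != '_'; s++) {
        int c = toupper((unsigned char)*s);
        if (c < 'A' || c > 'Z')
            return 0;
        *mask |= 1UL << (c - 'A');
    }
    return xlen;
}
```

`parse_isa("RV32IMAC", &m)` returns 32 with `m == 0x1105` (bits A=0, C=2, I=8, M=12).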
| RobotToaster wrote:
| Is this designed to be used with some kind of C to VHDL/verilog
| transpiler?
| RealityVoid wrote:
| Not really, think of it like a... CPU emulator? Ish? You have
| registers as variables in the program. If you have register a1
| and you are at an instruction adding 1 to it, it will add 1 to
| the variable representing a1. So on and so forth.
|
| This works because, well, memory operations are mostly (all?)
| what a CPU does, so this "core" takes the program and does the
| same kind of memory operations the silicon would do, only in SW.
___________________________________________________________________
(page generated 2023-06-10 23:01 UTC)