[HN Gopher] Building a Minimalistic Virtual Machine
       ___________________________________________________________________
        
       Building a Minimalistic Virtual Machine
        
       Author : ingve
       Score  : 95 points
       Date   : 2023-02-25 14:00 UTC (9 hours ago)
        
 (HTM) web link (pointersgonewild.com)
 (TXT) w3m dump (pointersgonewild.com)
        
       | rhn_mk1 wrote:
       | I wonder how this compares to LLVM. That project had building a
       | directly interpretable bytecode and a virtual machine as an
       | initial goal.
        
         | maxime_cb wrote:
         | LLVM is more heavyweight. Has a lot of analysis and
         | optimization passes for static compilation. UVM is currently
         | very lightweight, will be JIT compiled. Crucially UVM will
         | provide graphics, audio and networking primitives.
        
           | rhn_mk1 wrote:
           | That's what LLVM is today, but IIRC their goals were the
           | same. It might be worth it to look at what made it turn away
           | from the minimal idea and into the heavyweight that it is
           | now.
        
             | maxime_cb wrote:
             | If I had to guess, I would say it probably comes down to
             | wanting to go with static, ahead of time compilation rather
             | than JIT. There are real and valid advantages to AOT
             | compilation. There are downsides with the LLVM approach too.
             | For example, LLVM makes it explicit that they provide zero
             | guarantees when it comes to the stability of their bitcode
             | format. That somewhat limits what you can do with it.
        
       | kaba0 wrote:
       | Hi! I can't help but see a big similarity to the JVM, both in
       | terms of design goals and the semantics of the byte codes - only
       | "not having dynamic linking" being an exception.
       | 
       | Could you expand on what you don't find sufficient in JVM byte
       | code, when it is arguably one of the best platforms for
       | backwards compatibility, has easy support for bringing up a
       | canvas and starting to paint, etc.?
        
         | maxime_cb wrote:
         | For one the JVM is a huge piece of software. Large enough that
         | only a large corporation could realistically reimplement or
         | maintain it. It also exposes many APIs with a large surface
         | area. Then there's the issue of Oracle and how you feel about
         | them as a company.
         | 
         | UVM has obviously nowhere near the ecosystem, but you can draw
         | pixels to a frame buffer with two function calls, and your UI
         | will be guaranteed to look the same everywhere.
        
           | ternaryoperator wrote:
           | The folks at the Jacobin JVM [0] project (a JVM written in
           | Go) are working on the issue of size and the ability to have
           | a fully functional JVM maintained by a small group of
           | developers. Right now, per the latest post [1], they can run
           | simple classes and expect to complete the interpreter in the
           | next few months.
           | 
           | [0] jacobin.org
           | [1] http://binstock.blogspot.com/2023/02/jacobin-jvm-at-18-month...
        
           | kaba0 wrote:
           | JVM is a specification which can be implemented (and has been
           | plenty of times, completely independently of each other) by a
           | single developer in like half a year, tops. It is a simple
           | stack-machine with 100+ basic instructions, a simple
           | exception mechanism and a heap. A GC is not even necessary if
           | we are talking about minimalistic approaches, but a basic
           | tracing GC is also not hard.
           | 
           | It is only a (huge) plus that it can be run everywhere with
           | top of the line performance thanks to OpenJDK (which is
           | mostly developed by Oracle, but is big enough that an insane
           | amount of companies critically depend on it and several of
           | them could single-handedly finance the future of the platform
           | if anything were to happen, which won't because it has the
           | same license as Linux).
        
             | ternaryoperator wrote:
             | It's not close to as easy as you represent. To start with,
             | there are 204 instructions, some of them far more complex
             | than what you term "basic," such as invokedynamic. The
             | exception mechanism is also far from "simple," -- it's
             | simple conceptually but extremely difficult to get exactly
             | right when it involves finally clauses both in the
             | exception handler and the original excepting code. There
             | are many subtleties that can lead to completely wrong
             | results if not designed very, very carefully. It's far from
             | "simple."
        
               | kaba0 wrote:
               | Out of those 204 plenty are the exact same functionality
               | for different types though.
               | 
               | Sure, there is invokedynamic/static/virtual that is a bit
               | more complicated (they basically do runtime linking at
               | first run), but I have implemented them and it is not
               | harder than other pieces of a runtime.
               | 
               | Every method has an exception handler description, which
               | is basically a series of instruction address ranges -- if
               | the thrown exception came from there, it jumps to the
               | handler specified by the first match. If not, it
               | propagates up. "Finally" clause is syntactic sugar only.
               | 
               | Sure, these are hard to get right, but that is inherent
               | in the domain to a degree. You need many many
               | "integration" tests - I wrote a test runner that runs the
               | same program with OpenJDK and my implementation and
               | compares their outputs.
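
The exception-table lookup kaba0 describes can be sketched roughly as follows. This is a minimal Python illustration, not code from any real JVM: `Handler`, `find_handler`, and the sample table are hypothetical, the subclass check is simplified to a precomputed list of ancestor class names, and a `catch_type` of `None` stands for a catch-all entry (the kind a compiler emits for `finally`).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Handler:
    start_pc: int              # first covered instruction address (inclusive)
    end_pc: int                # end of the covered range (exclusive)
    handler_pc: int            # where execution resumes if this entry matches
    catch_type: Optional[str]  # class name to catch; None means "catch anything"


def find_handler(table: Sequence[Handler], pc: int,
                 thrown_ancestry: Sequence[str]) -> Optional[int]:
    """Return the handler address of the first matching entry, or None.

    `thrown_ancestry` lists the thrown exception's class followed by its
    superclasses (a stand-in for a real subclass check).
    """
    for entry in table:  # entries are searched in order; first match wins
        in_range = entry.start_pc <= pc < entry.end_pc
        type_ok = entry.catch_type is None or entry.catch_type in thrown_ancestry
        if in_range and type_ok:
            return entry.handler_pc
    return None  # no match: the caller pops the frame and retries one level up


# Hypothetical method with a try block at [0, 10) and a catch-all cleanup entry.
table = [
    Handler(0, 10, 20, "java/io/IOException"),
    Handler(0, 10, 30, None),
]
io_error = ["java/io/FileNotFoundException", "java/io/IOException",
            "java/lang/Exception", "java/lang/Throwable"]
npe = ["java/lang/NullPointerException", "java/lang/RuntimeException",
       "java/lang/Exception", "java/lang/Throwable"]
```

When `find_handler` returns `None`, an interpreter would discard the current frame and repeat the lookup in the caller, using the address of the call instruction as `pc`.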
        
             | maxime_cb wrote:
             | Implementing a good GC is incredibly hard. The JVM may have
             | just "100+ basic instructions", but it also has classes,
             | objects, arrays, and a whole set of APIs it provides. Your
             | JVM is kind of useless if it doesn't ship with all of the
             | user interface primitives (and other APIs/classes) people
             | expect, for instance. Otherwise what you have is not what
             | people expect to find in a JVM.
             | 
             | I'm also under the impression that building a good JIT for
             | a JVM would be a massive undertaking. It literally took
             | over a decade for the Sun/Oracle JVM's JIT to become mature
             | enough.
             | 
             | I've designed UVM in a way that I believe it will be
             | possible to design a good JIT with relatively little effort.
        
               | kaba0 wrote:
               | > Implementing a good GC is incredibly hard
               | 
               | If you are interpreting instructions, a simple one will be
               | more than enough. Classes are its primitives, you just
               | create a basic runtime representation for them with name,
               | superclass, implemented interfaces and the methods'
               | bytecodes. Then an object can be as simple as a header
               | containing a pointer to the class's representation and
               | then a listing of its fields, which can all be 64 bits.
               | And an array can have the exact same representation as
               | well, with the first element being its size.
               | 
               | All APIs are just classes with some methods that may be
               | "native", which are linked to a native implementation
               | (basically just a function pointer). This is how file
               | access and the like becomes possible.
               | 
               | > Otherwise what you have is not what people expect to
               | find in a JVM
               | 
               | You can just say that it is a partial implementation that
               | doesn't support the whole of the Java standard lib. It's
               | not unheard of (e.g. Java ME is a subset that runs on
               | every SIM and bank card).
               | 
               | > I'm also under the impression that building a good JIT
               | for a JVM would be a massive undertaking
               | 
               | Well, then just go with an okayish JIT. With all due
               | respect, you ain't going to beat the JVM with your UVM's
               | JIT compiler, not even close. Why do you think creating a
               | similarly good JIT compiler to a very similar design
               | would be any easier in case of UVM?
               | 
               | But don't get me wrong, I just ask these questions
               | because I dislike NIH syndrome and I believe there are
               | useful lessons to be learned from the past. But you
               | should be able to answer why the thing you do is any
               | different (unless it is for learning). And the JVM spec
               | is a surprisingly good read, and you can sure take great
               | ideas from that.
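
The runtime representation described in this comment (a class record, objects as a class pointer plus fixed-width slots, arrays with the length stored first, and native methods as function pointers) might look something like this in Python. All names here (`ClassInfo`, `new_object`, `new_array`, `NATIVES`, the `Point` class) are made up for illustration, and masking every slot to 64 bits is an assumption that mirrors the "can all be 64 bits" remark.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

MASK64 = (1 << 64) - 1  # every slot is treated as a raw 64-bit word


@dataclass
class ClassInfo:
    name: str
    superclass: Optional["ClassInfo"] = None
    interfaces: List[str] = field(default_factory=list)
    field_names: List[str] = field(default_factory=list)
    methods: Dict[str, bytes] = field(default_factory=dict)  # name -> bytecode

    def all_fields(self) -> List[str]:
        # Inherited fields come first so a subclass keeps its parent's slot layout.
        inherited = self.superclass.all_fields() if self.superclass else []
        return inherited + self.field_names


def new_object(cls: ClassInfo) -> list:
    # Slot 0 is the header (a reference to the class); the rest are zeroed fields.
    return [cls] + [0] * len(cls.all_fields())


def set_field(obj: list, name: str, value: int) -> None:
    obj[1 + obj[0].all_fields().index(name)] = value & MASK64


def get_field(obj: list, name: str) -> int:
    return obj[1 + obj[0].all_fields().index(name)]


def new_array(cls: ClassInfo, length: int) -> list:
    # Same shape as an object, with the length stored as the first element.
    return [cls, length] + [0] * length


# "Native" methods: a table from method name to a host function.
NATIVES: Dict[str, Callable[..., int]] = {
    "Math.abs": lambda x: abs(x),
}

base = ClassInfo("Object")
point = ClassInfo("Point", superclass=base, field_names=["x", "y"])
p = new_object(point)
set_field(p, "y", 7)
arr = new_array(ClassInfo("int[]"), 3)
```

A real implementation would resolve field names to slot indices once at link time rather than searching the list on every access; the lookup is kept naive here for brevity.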
        
               | maxime_cb wrote:
               | > With all due respect, you ain't going to beat the JVM
               | with your UVM's JIT compiler, not even close.
               | 
               | I think I may be able to get very close to native
               | performance. I don't want to sound like an asshole by
               | appealing to authority, but you aren't talking to a
               | teenager writing an interpreter from their parents'
               | basement. I have 21 years of programming experience, a
               | PhD in compiler design and multiple published papers. I
               | have some idea what I'm talking about.
               | 
               | > Why do you think creating a similarly good JIT compiler
               | to a very similar design would be any easier in case of
               | UVM?
               | 
               | The design is superficially similar to the JVM but it's
               | also quite different. UVM's bytecode is untyped. It maps
               | fairly directly to the x86-64 and ARMv8 instruction sets.
               | If you want an idea of how a simple JIT compiler for a
               | bytecode like that can perform, you should look at the
               | performance of Apple's Rosetta. But, I actually think I
               | can build something that yields better performance than
               | that :)
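
To illustrate what "untyped" can mean for a stack-machine bytecode, here is a small Python sketch. It is not UVM's actual instruction set: the opcode names (`push`, `add_u64`, `lt_u64`, `lt_i64`, `jnz`, `exit`) and the `run` function are invented for this example. The point it tries to show is that stack slots are plain 64-bit words, and each instruction, not the value, decides how the bits are interpreted, which is roughly how machine instructions behave too.

```python
MASK64 = (1 << 64) - 1


def to_signed(word: int) -> int:
    """Reinterpret a 64-bit word as a two's-complement signed integer."""
    return word - (1 << 64) if word & (1 << 63) else word


def run(program):
    """Execute a list of (opcode, operand) pairs and return the final stack."""
    stack = []
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "push":
            stack.append(arg & MASK64)       # values carry no type tag
        elif op == "add_u64":
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & MASK64)   # wraps around like hardware
        elif op == "lt_u64":
            b, a = stack.pop(), stack.pop()
            stack.append(int(a < b))         # unsigned comparison
        elif op == "lt_i64":
            b, a = stack.pop(), stack.pop()
            stack.append(int(to_signed(a) < to_signed(b)))  # signed comparison
        elif op == "jnz":
            if stack.pop() != 0:
                pc = arg                     # jump to an absolute index
        elif op == "exit":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


# The same bit pattern (-1 as a 64-bit word) compared against 1 two ways.
unsigned_result = run([("push", -1), ("push", 1), ("lt_u64", None)])
signed_result = run([("push", -1), ("push", 1), ("lt_i64", None)])
```

Because no instruction has to inspect a type tag at run time, each opcode in a design like this can in principle be translated to a short, fixed sequence of machine instructions.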
        
               | kaba0 wrote:
               | I eagerly await your results, and didn't want to sound
               | condescending at all, sorry if it came across like that.
               | I was just genuinely interested in a - to me - more
               | understandable difference.
               | 
               | Also, what does "native performance" even mean here? Only
               | removing the interpreter overhead?
        
               | maxime_cb wrote:
               | Thanks for clarifying. Tone is sometimes ambiguous via
               | text.
               | 
               | At the moment I'm in no rush to actually write the JIT
               | compiler because I think it's faster to iterate with an
               | interpreter. I want to flesh out the VM and its APIs,
               | test the hell out of everything and develop the system a
               | bit more first.
               | 
               | The interpreter runs at something like 400 million
               | instructions per second on my laptop, which is probably
               | close to the performance of an old school Pentium 2 chip,
               | so it's actually fast enough to run a lot of non-trivial
               | software. With even a really basic JIT I should be able
               | to hit 10x that throughput. I've benchmarked code out of
               | GCC and it runs about 27 times faster (on a
               | microbenchmark).
        
       | fdaryfdyfgd wrote:
       | i use qemu for that. bypass all speed and virtualization
       | improvements via flags and you have a rather isolated system
        
       | l_theanine wrote:
       | Arguably more interesting is UXN: https://100r.co/site/uxn.html
       | 
       | A small personal computing stack, with a plethora of examples
       | ready to go. Built by two they/them hackers who live on a boat
       | and basically have bootstrapped everything about their vessel,
       | their computing, their engineering, etc. It's very, VERY old-
       | school hacker-y.
       | 
       | They would've made excellent phreaks back in the good ol' days.
        
         | maxime_cb wrote:
         | Author here. The creator of UXN is a friend of mine and we chat
         | semi-regularly about our VMs.
         | 
         | I have a lot of respect for uxn and credit it as an
         | inspiration, but the goals of each project are different. UXN
         | is a 16-bit system with 64KB of RAM accessible. It will also
         | probably always remain interpreted. These design restrictions
         | are seen as tools to foster creativity.
         | 
         | UVM is a 32/64-bit VM. It's currently interpreted, but I've
         | designed the instruction set with JIT compilation in mind. I
         | have a PhD in compiler design and I'm fairly confident that I
         | can make a fast JIT for UVM in a relatively short amount of
         | time, when I feel the design is mature/stable enough.
         | 
         | At the moment, UVM is relatively immature, but I want it to be
         | a small/minimalistic VM that you can still build "real" or
         | modern software in that takes good advantage of the
         | capabilities and performance of your machine.
         | 
         | Another difference is that IMO, UVM is more approachable. UXN's
         | assembly language is fairly esoteric. It doesn't look like
         | any other assembly language I've ever seen. That doesn't make
         | it bad, but it does potentially make it harder to learn and
         | harder to leverage an existing base of programming skills.
         | UVM's assembly is designed to not be surprising if you've ever
         | programmed in assembly and know the basic ideas about how a
         | stack machine works. I also have a WIP C compiler that's
         | already usable to write simple programs. See my little snake
         | game for a fun toy example:
         | https://github.com/maximecb/uvm/blob/main/ncc/examples/snake...
         | 
         | Assembly syntax example:
         | https://github.com/maximecb/uvm/blob/main/vm/examples/factor...
         | 
         | I'll point to the fact that there is almost no boilerplate
         | necessary to start drawing some pixels on a 2D canvas, which
         | IMO makes it a fun platform to develop for. Like I said, it's
         | immature, but I'll iron out all the bugs I can find and keep
         | making it better.
        
           | n4ture wrote:
           | Awesome project! I've been on the lookout for such projects
           | ever since discovering uxn, I'll definitely have a look and
           | keep an eye on uvm.
           | 
           | >The creator of UXN is a friend of mine and we chat semi-
           | regularly about our VMs.
           | 
           | Does the discussion happen in a public place? If yes I'd be
           | extremely happy to join in since I also got started with
           | making my own system around a month ago, and it feels a bit
           | lonely going on such an endeavor at times.
           | 
           | It's extremely early and I haven't really shared it anywhere
           | yet, but I feel there is already the possibility to play
           | around with the custom editor I made, try to make little
           | graphical programs, etc. If you manage to build it, that is (I
           | develop mostly on OpenBSD and also try to make it build under
           | Ubuntu with gcc sometimes).
           | 
           | The source is hosted here for now:
           | https://git.blazebone.com/pochi/
           | 
           | The README (in the about tab) should give a rough explanation
           | of what it is, I also have a bit of documentation already.
           | 
           | >Another difference is that IMO, UVM is more approachable.
           | 
           | Very interesting choice, I did away with such assumptions and
           | ran the other way, my system might feel quite alien/esoteric
           | since I went for something that draws a lot of inspiration
           | from Chuck Moore's work with ColorForth as well as his F18
           | chip.
        
             | maxime_cb wrote:
             | > Does the discussion happen in a public place? If yes I'd
             | be extremely happy to join in since I also got started with
             | making my own system around a month ago, and it feels a bit
             | lonely going on such an endeavor at times.
             | 
             | I'm happy to discuss anything in the GitHub discussions for
             | UVM: https://github.com/maximecb/uvm/discussions
             | 
             | > Very interesting choice, I did away with such assumptions
             | and ran the other way, my system might feel quite
             | alien/esoteric since I went for something that draws a lot
             | of inspiration from Chuck Moore's work with ColorForth as
             | well as his F18 chip.
             | 
             | If you're building a system for fun, or to explore new
             | ideas, then it seems fine to make it as esoteric as you
             | want. However, in my experience, making esoteric choices
             | when designing programming languages for instance, can
             | really alienate potential users. Especially if you could
             | have obviously gone with some more traditional and familiar
             | choices but you went with something more esoteric that
             | doesn't have any clear value added.
             | 
             | IMO it's a bit like when it comes to terminology. If
             | there's a commonly accepted way to refer to something, use
             | it. Don't make up your own nomenclature, you'll just create
             | extra confusion for no reason.
        
           | samsquire wrote:
           | Could you explain what you did to design it with JIT in mind?
           | 
           | I feel WASM has the opportunity to create a truly audiovisual
           | API for interacting with computers.
        
             | maxime_cb wrote:
             | I go into some of the design decisions I made to make JIT
             | optimizations easier here:
             | https://github.com/maximecb/uvm/blob/main/doc/design.md
        
               | samsquire wrote:
               | You mention parallel computation and being open for
               | discussion.
               | 
               | My favourite area of computing is parallel computing and
               | multithreading.
               | 
               | My toy multithreaded interpreter in Java can communicate
               | integers between threads with message passing. I never
               | got around to communicating complicated objects because
               | I'm not sure how to solve the garbage collection problem
               | with compound data structures/object graphs AND sending
               | objects between threads. I'm currently relying on Java
               | garbage collection at this time but I have played with a
               | garbage collector written by Matthew Plant
               | (http://maplant.com/gc.html), so if I were to implement
               | my language in C I could also be inspired by Pony's
               | reference capabilities.
               | 
               | Since my interpreter is a simple imaginary assembly
               | interpreter I have instructions for "sendcode"
               | "receivecode" which tell a thread to do a remote jump.
               | There is also a "send" and "receive" instruction for
               | sending data between threads in a thread safe manner.
               | This uses actor style mailboxes behind the scenes.
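
The "send"/"receive" pair backed by actor-style mailboxes could be sketched as below. This is an illustration under stated assumptions rather than samsquire's interpreter: each thread's mailbox is modelled with `queue.Queue` from the Python standard library, and the `send`, `receive`, and `worker` names, as well as the use of a negative value as a stop signal, are hypothetical.

```python
import queue
import threading

# One mailbox per interpreter thread, keyed by a thread id chosen by the VM.
mailboxes = {0: queue.Queue(), 1: queue.Queue()}


def send(target: int, value: int) -> None:
    """The 'send' instruction: enqueue a value in another thread's mailbox."""
    mailboxes[target].put(value)  # Queue does its own locking


def receive(me: int) -> int:
    """The 'receive' instruction: block until a message arrives."""
    return mailboxes[me].get()


def worker() -> None:
    # Thread 1 doubles each integer it receives and replies to thread 0.
    while True:
        value = receive(1)
        if value < 0:  # negative values are used as a stop signal here
            break
        send(0, value * 2)


t = threading.Thread(target=worker)
t.start()
for n in (1, 2, 3):
    send(1, n)
results = [receive(0) for _ in range(3)]
send(1, -1)  # ask the worker to stop
t.join()
```

Passing only integers sidesteps the garbage-collection question raised above, since no object graph is ever shared between threads.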
        
         | throwaway290 wrote:
         | I was about to mention uxn. It's almost as if people with
         | enough motivation to build and use this are also people who
         | will never agree to use and contribute to one another's work.
         | Not to say any of this work is a waste, of course.
        
           | entaloneralie wrote:
           | Creator of Uxn here, Maxime is a dear friend of mine and
           | we've jammed extensively on the design of UVM together.
        
           | deckard1 wrote:
           | welcome to the world of Forth/Lisp
           | 
           | It is somewhat sadly ironic that to preserve software one has
           | to actually create software that other people care about
           | preserving. And, as we've seen with emulators, if you have
           | software people care about then there are no roadblocks that
           | will stand in the way of someone figuring out how to get your
           | ancient code to run. I take the whole 100R thing as more an
           | interesting art project than a serious goal. More of a "what
           | if".
           | 
           | With that said, I feel most devs are the same. How often do
           | you come across a lead or senior dev that becomes passive
           | aggressive over people modifying "their" code? Quite often,
           | in my experience. Any code that I didn't write sucks and any
           | code I have to touch that I didn't write is the worst code
           | ever written. Code that I didn't write is old obsolete dog
           | turds and code I wrote is "modern". Every dev slaps "modern"
           | on their open source project today.
        
       | donnowhy wrote:
       | I have never studied POSIX _academically_ (_curricularly_),
       | but:
       | 
       | > _a small set of minimalistic APIs that remain stable over time_
       | 
       | From what I've gathered (over the years, by 'osmosis'), isn't
       | this the essential idea behind POSIX?
        
         | maxime_cb wrote:
         | Yes, but POSIX doesn't provide APIs for graphics or audio, for
         | example, even though these things are necessary to build a lot
         | of end-user software.
        
       | samsquire wrote:
       | This is awesome.
       | 
       | Thank you for sharing.
       | 
       | My ancient ruby code and nodejs code all broke because I didn't
       | pin dependencies. As a result I've got software that is
       | unrunnable.
       | 
       | More software will become unrunnable as time goes on. I don't
       | know of many trends that prevent software from becoming
       | unbuildable and unrunnable due to change, except maybe repeatable
       | builds and hermetic builds.
       | 
       | Given the complexity of platform toolchains, libc versions, and
       | static vs. dynamic linking, I suspect preserving software is very
       | difficult.
       | 
       | It seems that doing + - x / on numbers is not the difficult part
       | of computing; the difficult part is arranging information into
       | the right places in order to do it.
       | 
       | Logistics and package management are difficult to get right.
       | 
       | I think Java got something right. Bytecode is long-lasting. You
       | can rewrite the JVM for new platforms and architectures.
       | 
       | I am writing my own language and it is implemented as an assembly
       | interpreter and a compiler for that interpreter. This lets me
       | develop faster.
       | 
       | The hard thing, which I think is more interesting than bytecode
       | or virtual machines, is INTEROP.
       | 
       | Examples are the AMD64 System V ABI, which assigns registers for
       | the C calling convention, and the Linux system call interface.
       | 
       | Mozilla abandoned XPCOM extensions, part of the reason was
       | performance of the interop between JavaScript and C++.
       | 
       | If I could run a virtual machine and interop with modern code
       | that would mean the software was useable for longer.
        
         | whartung wrote:
         | The JVM is an interesting use case.
         | 
         | Indeed, at a compiled level, byte code is mostly long lasting.
         | 
         | I haven't fired up a 10-year-old jar file recently, but it
         | would not surprise me if it Just Worked.
         | 
         | The success of that is twofold. First is simply that whatever
         | changes are being made to the JVM, they're mostly forward
         | looking and don't deprecate running code.
         | 
         | The other is that the conventional packaging mechanic is,
         | essentially, a static binary. A "fat jar" is the term of art,
         | with all of the dependencies bundled in.
         | 
         | But there are still potential problems. They've been removing
         | large subsystems from the JDK as of late. XML, web services,
         | Java FX are poster examples. So legacy binaries depending on
         | those will fail outright.
         | 
         | These can be added back to the Java runtime, but still "one
         | more thing".
         | 
         | Of course from the source code side, Java suffers dependency
         | hell and code rot along with the best of them. Network based
         | dependencies up and vanish. Long standing projects may not
         | publish 10 year old jar files any more.
         | 
         | Also, Java has had other clods dropped into its churn. Oracle
         | shutting down the java.net website was a huge sudden black hole
         | in the community consciousness of Java. Overnight thousands of
         | articles, blog posts, forum entries, and other artifacts
         | vanished like Keyser Soze. Leaving behind a debris field of
         | dead links across the internet.
         | 
         | So, to be fair, the JVM is a boon. I really like Java, and it's
         | still going strong. The VM architecture and the comprehensive
         | nature of the Java runtime has made the moving of running code
         | across systems much easier. As someone with an enterprise Java
         | background, used to deploying WAR and EAR files, I got to
         | mostly avoid the entire Docker and other such family of
         | infrastructure. Install a JDK, install an App Server, all
         | fairly trivial to isolate, and the system is ready to go.
         | 
         | But in the large, it takes more than a VM to get things
         | accomplished. There's always an eco-system at play.
         | 
         | And one primary characteristic of eco-systems is they evolve.
         | Time marches on, and waits for no one.
        
           | samsquire wrote:
           | Thank you for your thoughtful and interesting comment.
           | 
           | I too like Java a lot and think it's a great technology.
           | 
           | I can see how a runtime for a compiler (such as the JVM)
           | doesn't need to change much over time: if the compiler
           | produces the right code in 2000, then the code is probably
           | still right in 2020, it just could use more features of the
           | ISA that were introduced since then.
           | 
           | On the other hand, the design of platform, library code and
           | framework code is an ecosystem that is desperate to transform
           | over time as better approaches to solving technical and
           | business problems are found.
        
         | tambourine_man wrote:
         | That's one of the many advantages of building for the web. It
         | won't ever be unsupported (for some reasonable definition of
         | never).
         | 
         | You get text rendering, canvas, audio, webgl... it's a pretty
         | wide platform.
        
           | maxime_cb wrote:
           | Except that's just not true. It's already broken/incompatible
           | in many places across browsers. You may not notice if you're
           | just doing basic HTML/CSS, but if you do anything slightly
           | more dynamic, you're going to notice. An ever-expanding set
           | of complex APIs also makes it more and more likely that bugs
           | will go unseen and unfixed. See two examples I detailed in
           | the blog post.
        
             | tambourine_man wrote:
             | Sorry, I was reading the comments before reading your
             | article.
             | 
             | Getting around pointer events inconsistencies is a lot
             | easier than building your own cross-platform VM, of course.
             | But the project looks awesome and seems like a great
             | initiative.
             | 
             | I imagine there will also be differences in the way macOS,
             | Linux and Windows handle graphics, IO, audio, etc, that
             | will eventually leak to UVM, it's just the nature of the
             | challenge.
        
         | throwaway78941 wrote:
         | > My ancient ruby code and nodejs code all broke because I
         | didn't pin dependencies. As a result I've got software that is
         | unrunnable.
         | 
         | You can simply remove the ^ symbol from the versions listed in
         | package.json and it will use exactly the version you
         | originally added.
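
As a rough illustration of that suggestion, the Python sketch below strips a leading `^` (and, as an extra assumption, `~`) range prefix from each entry in a package.json-style dependency map. The `pin_versions` helper and the sample package names are hypothetical. Note that this only pins the direct dependencies listed in the manifest, and it leaves other range forms (such as `>=1.0.0` or `*`) untouched.

```python
import json


def pin_versions(manifest: dict) -> dict:
    """Return a copy of the manifest with ^ and ~ prefixes removed."""
    pinned = dict(manifest)
    for section in ("dependencies", "devDependencies"):
        if section in manifest:
            pinned[section] = {
                name: spec.lstrip("^~")
                for name, spec in manifest[section].items()
            }
    return pinned


# Hypothetical manifest contents for illustration.
package_json = json.loads("""
{
  "name": "example-app",
  "dependencies": {"some-lib": "^1.3.0", "other-lib": "~4.18.2"},
  "devDependencies": {"test-tool": "10.2.0"}
}
""")
pinned = pin_versions(package_json)
print(json.dumps(pinned, indent=2))
```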
        
           | maxime_cb wrote:
           | Besides the issue of whether you pin or don't pin your
           | dependencies, the problem is that node packages can depend on
           | external native code. You can have several more layers of
           | dependencies in there. If, for any reason, those native
           | packages won't install/run on your machine, your dependencies
           | can still break under you, even if you pin them. Python and
           | Ruby have the same vulnerability when it comes to
           | dependencies breaking.
        
       | euclaise wrote:
       | Reminds me of the Dis VM and Inferno OS
        
       | jart wrote:
       | > or that I should base my system on an existing processor
       | architecture and work on something like Justine Tunney's blink
       | instead.
       | 
       | If she did then she'd be very welcome as a Blink developer.
        
         | maxime_cb wrote:
         | You seem to be doing just fine without me :)
         | 
         | Blink is a very impressive project. Mad props.
        
       | narag wrote:
       | Very interesting for embedded or kiosk. Or in general, if the VM
       | is run as a server inside a host: encryption in a box, instead of
       | a library. Curious about what kind of multitasking will be added.
        
         | maxime_cb wrote:
         | I could actually use some feedback when it comes to the design
         | of the parallelism model for UVM. I have a few ideas but it's
         | not my area of expertise, so I would welcome feedback and
         | suggestions.
        
           | narag wrote:
           | I have experience as a user; implementation is way over my
           | head. Depending on the minimum host supported, you could only
           | use lightweight threads or write a scheduler. IIRC, the JVM
           | started out using its own thread implementation and later used
           | the one provided by the host OS, when available.
           | 
           | Best of luck!
        
             | maxime_cb wrote:
             | At the moment I have an event-driven system where you can
             | set up callbacks and timers. It's not green threads but it
             | makes it easy to have multiple different update events
             | running at different rates, for example:
             | https://github.com/maximecb/uvm/blob/main/ncc/examples/attac...
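
An event-driven model with callbacks firing at different rates can be sketched like this. The Python below is not UVM's API: `EventLoop`, `set_interval`, and `run_until` are invented names, and it uses a simulated clock in milliseconds instead of real time so the behaviour is deterministic.

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class EventLoop:
    def __init__(self) -> None:
        self.now_ms = 0
        self._seq = itertools.count()  # tie-breaker so callbacks are never compared
        self._timers: List[Tuple[int, int, int, Callable[[int], None]]] = []

    def set_interval(self, period_ms: int, callback: Callable[[int], None]) -> None:
        """Schedule `callback(now_ms)` to run every `period_ms` milliseconds."""
        heapq.heappush(self._timers,
                       (self.now_ms + period_ms, next(self._seq), period_ms, callback))

    def run_until(self, end_ms: int) -> None:
        """Advance simulated time, firing due callbacks in timestamp order."""
        while self._timers and self._timers[0][0] <= end_ms:
            due, _, period_ms, callback = heapq.heappop(self._timers)
            self.now_ms = due
            callback(due)
            # Re-arm the timer relative to its due time to avoid drift.
            heapq.heappush(self._timers,
                           (due + period_ms, next(self._seq), period_ms, callback))
        self.now_ms = end_ms


log = []
loop = EventLoop()
loop.set_interval(10, lambda t: log.append(("physics", t)))  # fast update
loop.set_interval(25, lambda t: log.append(("render", t)))   # slower update
loop.run_until(50)
```

Everything runs on one thread, so callbacks never race with each other; the trade-off is that a slow callback delays every other timer.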
        
           | chc4 wrote:
           | For _parallelism_ I'd expect something like Cilk (or Rayon,
           | which was inspired by it), where there are some syscalls for
           | spawning and then joining on tasks, but the VM handles
           | starting and managing the thread pool.
           | 
           | For _concurrency_ it's a lot harder to make a good interface.
       | efficax wrote:
       | the jvm already exists and you can run ancient compiled class
       | files with it.
        
         | maxime_cb wrote:
         | The level of cynicism on HN is sometimes really depressing.
         | 
         | > the jvm already exists and you can run [some] ancient
         | compiled class files with it [but many won't work correctly].
         | 
         | FTFY.
        
         | packetlost wrote:
         | ok, but not everything wants the baggage that comes with the
         | JVM.
        
       ___________________________________________________________________
       (page generated 2023-02-25 23:01 UTC)