[HN Gopher] Building a Minimalistic Virtual Machine
___________________________________________________________________
Building a Minimalistic Virtual Machine
Author : ingve
Score : 95 points
Date : 2023-02-25 14:00 UTC (9 hours ago)
(HTM) web link (pointersgonewild.com)
(TXT) w3m dump (pointersgonewild.com)
| rhn_mk1 wrote:
| I wonder how this compares to LLVM. That project had building a
| directly interpretable bytecode and a virtual machine as an
| initial goal.
| maxime_cb wrote:
| LLVM is more heavyweight. Has a lot of analysis and
| optimization passes for static compilation. UVM is currently
| very lightweight, will be JIT compiled. Crucially UVM will
| provide graphics, audio and networking primitives.
| rhn_mk1 wrote:
| That's what LLVM is today, but IIRC their goals were the
| same. It might be worth it to look at what made it turn away
| from the minimal idea and into the heavyweight that it is
| now.
| maxime_cb wrote:
| If I had to guess, I would say it probably comes down to
| wanting to go with static, ahead of time compilation rather
| than JIT. There are real and valid advantages to AOT
| compilation. There are downsides to the LLVM approach too.
| For example, LLVM makes it explicit that they provide zero
| guarantees when it comes to the stability of their bitcode
| format. That somewhat limits what you can do with it.
| kaba0 wrote:
| Hi! I can't help but see a big similarity to the JVM, both in
| terms of design goals and the semantics of the byte codes - only
| "not having dynamic linking" being an exception.
|
| Could you expand on what you don't find sufficient in JVM byte
| code, when it is arguably one of the best platforms for backwards
| compatibility, has easy support for bringing up a canvas and
| starting to paint, etc.?
| maxime_cb wrote:
| For one the JVM is a huge piece of software. Large enough that
| only a large corporation could realistically reimplement or
| maintain it. It also exposes many APIs with a large surface
| area. Then there's the issue of Oracle and how you feel about
| them as a company.
|
| UVM has obviously nowhere near the ecosystem, but you can draw
| pixels to a frame buffer with two function calls, and your UI
| will be guaranteed to look the same everywhere.
| ternaryoperator wrote:
| The folks at the Jacobin JVM [0] project (a JVM written in
| Go) are working on the issue of size and the ability to have
| a fully functional JVM maintained by a small group of
| developers. Right now, per the latest post [1], they can run
| simple classes and expect to complete the interpreter in the
| next few months.
|
| [0] jacobin.org
| [1] http://binstock.blogspot.com/2023/02/jacobin-jvm-at-18-month...
| kaba0 wrote:
| JVM is a specification which can be implemented (and has been
| plenty of times, completely independently of each other) by a
| single developer in about half a year, tops. It is a simple
| stack-machine with 100+ basic instructions, a simple
| exception mechanism and a heap. A GC is not even necessary if
| we are talking about minimalistic approaches, but a basic
| tracing GC is also not hard.
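To make the "simple stack machine" idea concrete, here is a minimal sketch of an interpreter loop. The opcodes (`PUSH`, `ADD`, `MUL`, `JMP_IF_ZERO`, `HALT`) and the `run` function are hypothetical names chosen for illustration; they are not real JVM or UVM opcodes, and a real VM would also need calls, locals, and a heap.

```python
from typing import List, Tuple

# Hypothetical opcodes for illustration only (not JVM or UVM opcodes).
PUSH, ADD, MUL, JMP_IF_ZERO, HALT = range(5)

Instr = Tuple[int, int]  # (opcode, operand); the operand is ignored when unused


def run(program: List[Instr]) -> List[int]:
    """Interpret a program on an operand stack and return the final stack."""
    stack: List[int] = []
    pc = 0  # index of the next instruction to execute
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == PUSH:
            stack.append(arg)
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == JMP_IF_ZERO:
            # Pop the condition and jump to the operand address if it is zero.
            if stack.pop() == 0:
                pc = arg
        elif op == HALT:
            break
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack


# Computes (2 + 3) * 4 and leaves the result on the stack.
program = [(PUSH, 2), (PUSH, 3), (ADD, 0), (PUSH, 4), (MUL, 0), (HALT, 0)]
result = run(program)
```

Most of the work in a real implementation lies outside this loop (class loading, linking, exceptions), which is what the replies below argue about.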
|
| It is only a (huge) plus that it can be run everywhere with
| top-of-the-line performance thanks to OpenJDK (which is
| mostly developed by Oracle, but is big enough that an insane
| number of companies critically depend on it, and several of
| them could single-handedly finance the future of the platform
| if anything were to happen, which won't because it has the
| same license as Linux).
| ternaryoperator wrote:
| It's not close to as easy as you represent. To start with,
| there are 204 instructions, some of them far more complex
| than what you term "basic," such as invokedynamic. The
| exception mechanism is also far from "simple," -- it's
| simple conceptually but extremely difficult to get exactly
| right when it involves finally clauses both in the
| exception handler and the original excepting code. There
| are many subtleties that can lead to completely wrong
| results if not designed very, very carefully. It's far from
| "simple."
| kaba0 wrote:
| Out of those 204 plenty are the exact same functionality
| for different types though.
|
| Sure, there is invokedynamic/static/virtual that is a bit
| more complicated (they basically do runtime linking at
| first run), but I have implemented them and it is not
| harder than other pieces of a runtime.
|
| Every method has an exception handler description, which
| is basically a series of instruction address ranges -- if
| the thrown exception came from there, it jumps to the
| handler specified by the first match. If not, it
| propagates up. "Finally" clause is syntactic sugar only.
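The lookup described in the paragraph above can be sketched as follows. This is an illustrative model, not code from any JVM: `Handler`, `Frame`, `find_handler`, and `unwind` are hypothetical names, and the type check is simplified to an exact name match, whereas a real JVM also matches subclasses of the catch type.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Handler:
    start_pc: int               # first covered instruction address (inclusive)
    end_pc: int                 # end of the covered range (exclusive)
    handler_pc: int             # where execution resumes if this entry matches
    catch_type: Optional[str]   # None means "catch anything"


@dataclass
class Frame:
    method: str
    pc: int                     # address of the throwing instruction or call site
    table: List[Handler]


def find_handler(frame: Frame, exc_type: str) -> Optional[int]:
    """Return the handler address of the first matching table entry, if any."""
    for h in frame.table:
        in_range = h.start_pc <= frame.pc < h.end_pc
        # Simplification: exact type-name match only (no subclass check).
        if in_range and h.catch_type in (None, exc_type):
            return h.handler_pc
    return None


def unwind(call_stack: List[Frame], exc_type: str) -> Optional[Tuple[str, int]]:
    """Pop frames until one handles the exception; None means it is uncaught."""
    while call_stack:
        frame = call_stack[-1]
        target = find_handler(frame, exc_type)
        if target is not None:
            return frame.method, target
        call_stack.pop()  # no match here: propagate to the caller
    return None


# "helper" threw at pc=5, outside its only covered range, so "main" handles it.
stack = [
    Frame("main", pc=12, table=[Handler(10, 20, 40, "IOError")]),
    Frame("helper", pc=5, table=[Handler(0, 4, 30, None)]),
]
outcome = unwind(stack, "IOError")
```

The order of table entries matters because the first match wins, which is one place where the subtleties mentioned earlier in the thread can appear.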
|
| Sure, these are hard to get right, but that is inherent
| in the domain to a degree. You need many many
| "integration" tests - I wrote a test runner that runs the
| same program with OpenJDK and my implementation and
| compared their outputs.
| maxime_cb wrote:
| Implementing a good GC is incredibly hard. The JVM may have
| just "100+ basic instructions", but it also has classes,
| objects, arrays, and a whole set of APIs it provides. Your
| JVM is kind of useless if it doesn't ship with all of the
| user interface primitives (and other APIs/classes) people
| expect, for instance. Otherwise what you have is not what
| people expect to find in a JVM.
|
| I'm also under the impression that building a good JIT for
| a JVM would be a massive undertaking. It literally took
| over a decade for the Sun/Oracle JVM's JIT to become mature
| enough.
|
| I've designed UVM in a way that I believe it will be
| possible to design a good JIT with relatively little effort.
| kaba0 wrote:
| > Implementing a good GC is incredibly hard
|
| If you are interpreting instructions, a simple one will be
| more than enough. Classes are its primitives, you just
| create a basic runtime representation for them with name,
| superclass, implemented interfaces and the methods'
| bytecodes. Then an object can be as simple as a header
| containing a pointer to the class's representation and
| then a listing of its fields, which can all be 64bits.
| And an array can have the exact same representation as
| well, with the first element being its size.
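A rough sketch of the layout described above: a class record, an object that is a header plus 64-bit slots, and an array that reuses the same shape with its length in the first slot. All names here (`ClassInfo`, `Obj`, `new_object`, `new_array`, and so on) are hypothetical, and inherited fields are ignored to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MASK64 = (1 << 64) - 1  # every slot is treated as a raw 64-bit value


@dataclass
class ClassInfo:
    name: str
    superclass: Optional["ClassInfo"]
    interfaces: List[str]
    field_names: List[str]
    methods: Dict[str, bytes] = field(default_factory=dict)  # name -> bytecode


class Obj:
    """Header (pointer to the class record) followed by 64-bit slots."""

    def __init__(self, cls: ClassInfo, nslots: int) -> None:
        self.cls = cls
        self.slots = [0] * nslots


def new_object(cls: ClassInfo) -> Obj:
    # Simplification: only the class's own fields, not inherited ones.
    return Obj(cls, len(cls.field_names))


def set_field(obj: Obj, name: str, value: int) -> None:
    obj.slots[obj.cls.field_names.index(name)] = value & MASK64


def get_field(obj: Obj, name: str) -> int:
    return obj.slots[obj.cls.field_names.index(name)]


def new_array(cls: ClassInfo, length: int) -> Obj:
    arr = Obj(cls, length + 1)
    arr.slots[0] = length  # slot 0 holds the size, elements follow
    return arr


def array_store(arr: Obj, index: int, value: int) -> None:
    if not 0 <= index < arr.slots[0]:
        raise IndexError(index)
    arr.slots[index + 1] = value & MASK64


def array_load(arr: Obj, index: int) -> int:
    if not 0 <= index < arr.slots[0]:
        raise IndexError(index)
    return arr.slots[index + 1]
```

With this shape, a simple tracing GC only needs the class record to know which slots hold references, which is part of why the layout is attractive for a small interpreter.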
|
| All APIs are just classes with some methods that may be
| "native", which are linked to a native implementation
| (basically just a function pointer). This is how file
| access and the like becomes possible.
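The "native method is basically a function pointer" idea can be sketched with a lookup table from (class, method) to a host function. The table, the `register_native` and `invoke_native` helpers, and the class name `demo/Files` are all hypothetical; this is not how any particular JVM names or resolves its natives.

```python
import os
from typing import Callable, Dict, Tuple

NativeFn = Callable[..., object]

# Maps (class name, method name) to the host function implementing it.
NATIVE_TABLE: Dict[Tuple[str, str], NativeFn] = {}


def register_native(class_name: str, method_name: str, fn: NativeFn) -> None:
    """Link a method marked 'native' to its host implementation."""
    NATIVE_TABLE[(class_name, method_name)] = fn


def invoke_native(class_name: str, method_name: str, *args: object) -> object:
    """Called by the interpreter when it hits a native method."""
    try:
        fn = NATIVE_TABLE[(class_name, method_name)]
    except KeyError:
        raise LookupError(
            f"unsatisfied native link: {class_name}.{method_name}"
        ) from None
    return fn(*args)


# File access becomes possible by binding a method to a host OS call.
register_native("demo/Files", "exists", os.path.exists)
found = invoke_native("demo/Files", "exists", os.getcwd())
```

A partial implementation, as suggested below, would simply leave most of this table empty and fail with a link error for anything it does not provide.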
|
| > Otherwise what you have is not what people expect to
| find in a JVM
|
| You can just say that it is a partial implementation that
| doesn't support the whole of the Java standard lib. It's
| not unheard of (e.g. Java ME is a subset that runs on
| every SIM and bank card).
|
| > I'm also under the impression that building a good JIT
| for a JVM would be a massive undertaking
|
| Well, then just go with an okayish JIT. With all due
| respect, you ain't going to beat the JVM with your UVM's
| JIT compiler, not even close. Why do you think creating a
| similarly good JIT compiler to a very similar design
| would be any easier in case of UVM?
|
| But don't get me wrong, I just ask these questions
| because I dislike NIH syndrome and I believe there are
| useful lessons to be learned from the past. But you
| should be able to answer why the thing you do is any
| different (unless it is for learning). And the JVM spec
| is a surprisingly good read, and you can sure take great
| ideas from that.
| maxime_cb wrote:
| > With all due respect, you ain't going to beat the JVM
| with your UVM's JIT compiler, not even close.
|
| I think I may be able to get very close to native
| performance. I don't want to sound like an asshole by
| appealing to authority, but you aren't talking to a
| teenager writing an interpreter from their parent's
| basement. I have 21 years of programming experience, a
| PhD in compiler design and multiple published papers. I
| have some idea what I'm talking about.
|
| > Why do you think creating a similarly good JIT compiler
| to a very similar design would be any easier in case of
| UVM?
|
| The design is superficially similar to the JVM but it's
| also quite different. UVM's bytecode is untyped. It maps
| fairly directly to the x86-64 and ARMv8 instruction sets.
| If you want an idea of how a simple JIT compiler for a
| bytecode like that can perform, you should look at the
| performance of Apple's Rosetta. But, I actually think I
| can build something that yields better performance than
| that :)
| kaba0 wrote:
| I eagerly await your results, and didn't want to sound
| condescending at all, sorry if it came across like that.
| I was just genuinely interested in a - to me - more
| understandable difference.
|
| Also, what does "native performance" even mean here? Only
| removing the interpreter overhead?
| maxime_cb wrote:
| Thanks for clarifying. Tone is sometimes ambiguous via
| text.
|
| At the moment I'm in no rush to actually write the JIT
| compiler because I think it's faster to iterate with an
| interpreter. I want to flesh out the VM and its APIs,
| test the hell out of everything and develop the system a
| bit more first.
|
| The interpreter runs at something like 400 million
| instructions per second on my laptop, which is probably
| close to the performance of an old school Pentium 2 chip,
| so it's actually fast enough to run a lot of non-trivial
| software. With even a really basic JIT I should be able
| to hit 10x that throughput. I've benchmarked code out of
| GCC and it runs about 27 times faster (on a
| microbenchmark).
| fdaryfdyfgd wrote:
| i use qemu for that. bypass all speed and virtualization
| improvements via flags and you have a rather isolated system
| l_theanine wrote:
| Arguably more interesting is UXN: https://100r.co/site/uxn.html
|
| A small personal computing stack, with a plethora of examples
| ready to go. Built by two they/them hackers who live on a boat
| and basically have bootstrapped everything about their vessel,
| their computing, their engineering, etc. It's very, VERY old-
| school hacker-y.
|
| They would've made excellent phreaks back in the good ol' days.
| maxime_cb wrote:
| Author here. The creator of UXN is a friend of mine and we chat
| semi-regularly about our VMs.
|
| I have a lot of respect for uxn and credit it as an
| inspiration, but the goals of each project are different. UXN
| is a 16-bit system with 64KB of RAM accessible. It will also
| probably always remain interpreted. These design restrictions
| are seen as tools to foster creativity.
|
| UVM is a 32/64-bit VM. It's currently interpreted, but I've
| designed the instruction set with JIT compilation in mind. I
| have a PhD in compiler design and I'm fairly confident that I
| can make a fast JIT for UVM in a relatively short amount of
| time, when I feel the design is mature/stable enough.
|
| At the moment, UVM is relatively immature, but I want it to be
| a small/minimalistic VM that you can still build "real" or
| modern software in that takes good advantage of the
| capabilities and performance of your machine.
|
| Another difference is that IMO, UVM is more approachable. UXN's
| assembly language is fairly esoteric IMO. It doesn't look like
| any other assembly language I've ever seen. That doesn't make
| it bad, but it does potentially make it harder to learn and
| harder to leverage an existing base of programming skills.
| UVM's assembly is designed to not be surprising if you've ever
| programmed in assembly and know the basic ideas about how a
| stack machine works. I also have a WIP C compiler that's
| already usable to write simple programs. See my little snake
| game for a fun toy example:
| https://github.com/maximecb/uvm/blob/main/ncc/examples/snake...
|
| Assembly syntax example:
| https://github.com/maximecb/uvm/blob/main/vm/examples/factor...
|
| I'll point to the fact that there is almost no boilerplate
| necessary to start drawing some pixels on a 2D canvas, which
| IMO makes it a fun platform to develop for. Like I said, it's
| immature, but I'll iron out all the bugs I can find and keep
| making it better.
| n4ture wrote:
| Awesome project! I've been on the lookout for such projects
| ever since discovering uxn, I'll definitely have a look and
| keep an eye on uvm.
|
| >The creator of UXN is a friend of mine and we chat semi-
| regularly about our VMs.
|
| Does the discussion happen in a public place? If yes I'd be
| extremely happy to join in since I also got started with
| making my own system around a month ago, and it feels a bit
| lonely going on such an endeavor at times.
|
| It's extremely early and I haven't really shared it anywhere
| yet, but I feel there is already the possibility to play
| around with the custom editor I made, try to make little
| graphical programs, etc. If you manage to build it, that is (I
| develop mostly on OpenBSD and also try to make it build under
| Ubuntu with gcc sometimes).
|
| The source is hosted here for now:
| https://git.blazebone.com/pochi/
|
| The README (in the about tab) should give a rough explanation
| of what it is, I also have a bit of documentation already.
|
| >Another difference is that IMO, UVM is more approachable.
|
| Very interesting choice, I did away with such assumptions and
| ran the other way, my system might feel quite alien/esoteric
| since I went for something that draws a lot of inspiration
| from Chuck Moore's work with ColorForth as well as his F18
| chip.
| maxime_cb wrote:
| > Does the discussion happen in a public place? If yes I'd
| be extremely happy to join in since I also got started with
| making my own system around a month ago, and it feels a bit
| lonely going on such an endeavor at times.
|
| I'm happy to discuss anything in the GitHub discussions for
| UVM: https://github.com/maximecb/uvm/discussions
|
| > Very interesting choice, I did away with such assumptions
| and ran the other way, my system might feel quite
| alien/esoteric since I went for something that draws a lot
| of inspiration from Chuck Moore's work with ColorForth as
| well as his F18 chip.
|
| If you're building a system for fun, or to explore new
| ideas, then it seems fine to make it as esoteric as you
| want. However, in my experience, making esoteric choices
| when designing programming languages for instance, can
| really alienate potential users. Especially if you could
| have obviously gone with some more traditional and familiar
| choices but you went with something more esoteric that
| doesn't have any clear value added.
|
| IMO it's a bit like when it comes to terminology. If
| there's a commonly accepted way to refer to something, use
| it. Don't make up your own nomenclature, you'll just create
| extra confusion for no reason.
| samsquire wrote:
| Could you explain what you did to design it with JIT in mind?
|
| I feel WASM has the opportunity to create a truly audiovisual
| API for interacting with computers.
| maxime_cb wrote:
| I go into some of the design decisions I made to make JIT
| optimizations easier here:
| https://github.com/maximecb/uvm/blob/main/doc/design.md
| samsquire wrote:
| You mention parallel computation and being open for
| discussion.
|
| My favourite area of computing is parallel computing and
| multithreading.
|
| My toy multithreaded interpreter in Java can communicate
| integers between threads with message passing. I never
| got around to communicating complicated objects because
| I'm not sure how to solve the garbage collection problem
| with compound data structures/object graphs AND sending
| objects between threads. I'm currently relying on Java
| garbage collection at this time but I have played with a
| garbage collector written by Matthew Plant
| (http://maplant.com/gc.html), so if I were to implement
| my language in C I could also be inspired by Pony's
| reference capabilities.
|
| Since my interpreter is a simple imaginary assembly
| interpreter I have instructions for "sendcode"
| "receivecode" which tell a thread to do a remote jump.
| There is also a "send" and "receive" instruction for
| sending data between threads in a thread safe manner.
| This uses actor style mailboxes behind the scenes.
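A minimal sketch of the actor-style mailbox pattern mentioned above, using Python's thread-safe `queue.Queue` as the mailbox. The names `doubler`, `double_all`, and the `STOP` sentinel are assumptions made for this example; the commenter's interpreter is written in Java and its actual `send`/`receive` instructions are not shown in the thread.

```python
import queue
import threading
from typing import List

STOP = None  # sentinel message telling the receiving thread to shut down


def doubler(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Receiving thread: take integers from its mailbox and reply with 2 * n."""
    while True:
        msg = inbox.get()       # "receive": blocks until a message arrives
        if msg is STOP:
            break
        outbox.put(msg * 2)     # "send" the reply to the other mailbox


def double_all(values: List[int]) -> List[int]:
    """Send each value to a worker thread and collect the replies in order."""
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    thread = threading.Thread(target=doubler, args=(inbox, outbox))
    thread.start()
    for v in values:
        inbox.put(v)            # "send": safe to call from any thread
    inbox.put(STOP)
    thread.join()
    return [outbox.get() for _ in values]
```

Sending plain integers sidesteps the garbage collection question raised above, because no object graph is ever shared between the two threads.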
| throwaway290 wrote:
| I was about to mention uxn. It's almost as if people with
| enough motivation to build and use this are also people who
| will never agree to use and contribute to one another's work.
| Not to say any of this work is a waste, of course.
| entaloneralie wrote:
| Creator of Uxn here, Maxime is a dear friend of mine and
| we've jammed extensively on the design of UVM together.
| deckard1 wrote:
| welcome to the world of Forth/Lisp
|
| It is somewhat sadly ironic that to preserve software one has
| to actually create software that other people care about
| preserving. And, as we've seen with emulators, if you have
| software people care about then there are no roadblocks that
| will stand in the way of someone figuring out how to get your
| ancient code to run. I take the whole 100R thing as more an
| interesting art project than a serious goal. More of a "what
| if".
|
| With that said, I feel most devs are the same. How often do
| you come across a lead or senior dev that becomes passive
| aggressive over people modifying "their" code? Quite often,
| in my experience. Any code that I didn't write sucks and any
| code I have to touch that I didn't write is the worst code
| ever written. Code that I didn't write is old obsolete dog
| turds and code I wrote is "modern". Every dev slaps "modern"
| on their open source project today.
| donnowhy wrote:
| I have never studied POSIX _academically_ (_curricularly_),
| but:
|
| > _a small set of minimalistic APIs that remain stable over time_
|
| From what I've gathered (over the years, by 'osmosis'), isn't
| this the essential idea behind POSIX?
| maxime_cb wrote:
| Yes, but POSIX doesn't provide APIs for graphics or audio, for
| example, even though these things are necessary to build a lot
| of end-user software.
| samsquire wrote:
| This is awesome.
|
| Thank you for sharing.
|
| My ancient ruby code and nodejs code all broke because I didn't
| pin dependencies. As a result I've got software that is
| unrunnable.
|
| More software shall be unrunnable as time goes on, I don't know
| many trends that prevent software from being unbuildable and
| unrunnable due to change except maybe repeatable builds and
| hermetic builds.
|
| Given platform toolchains complexity and libc versions and
| complexity of static Vs dynamic linking, I suspect preserving
| software is very difficult.
|
| It seems doing + - x/ on numbers is not the difficult part of
| computers but arranging information into the right places in
| order to do it.
|
| Logistics and package management are difficult to get right.
|
| I think Java got something right. Bytecode is long-lasting. You
| can rewrite the JVM for new platforms and architectures.
|
| I am writing my own language and it is implemented as an assembly
| interpreter and a compiler for that interpreter. This speeds up
| my development.
|
| The hard thing, which I think is more interesting than bytecode
| or virtual machines, is INTEROP.
|
| For example, the AMD64 System V ABI, which defines the registers
| used by the C calling convention, and the Linux system call
| interface.
|
| Mozilla abandoned XPCOM extensions, part of the reason was
| performance of the interop between JavaScript and C++.
|
| If I could run a virtual machine and interop with modern code
| that would mean the software was useable for longer.
| whartung wrote:
| The JVM is an interesting use case.
|
| Indeed, at a compiled level, byte code is mostly long lasting.
|
| I haven't fired up a 10-year-old jar file recently but it would not
| surprise me if it Just Worked.
|
| The success of that is twofold. First is simply that whatever
| changes are being made to the JVM, they're mostly forward
| looking and don't deprecate running code.
|
| The other is that the conventional packaging mechanic is,
| essentially, a static binary. A "fat jar" is the term of art,
| with all of the dependencies bundled in.
|
| But there are still potential problems. They've been removing
| large subsystems from the JDK as of late. XML, web services,
| Java FX are poster examples. So legacy binaries depending on
| those will fail outright.
|
| These can be added back to the Java runtime, but still "one
| more thing".
|
| Of course from the source code side, Java suffers dependency
| hell and code rot along with the best of them. Network based
| dependencies up and vanish. Long standing projects may not
| publish 10 year old jar files any more.
|
| Also, Java has had other clods dropped into its churn. Oracle
| shutting down the java.net website was a huge sudden black hole
| in the community consciousness of Java. Overnight thousands of
| articles, blog posts, forum entries, and other artifacts
| vanished like Keyser Soze. Leaving behind a debris field of
| dead links across the internet.
|
| So, to be fair, the JVM is a boon. I really like Java, and it's
| still going strong. The VM architecture and the comprehensive
| nature of the Java runtime has made the moving of running code
| across systems much easier. As someone with an enterprise Java
| background, used to deploying WAR and EAR files, I got to
| mostly avoid the entire Docker and other such family of
| infrastructure. Install a JDK, install an App Server, all
| fairly trivial to isolate, and the system is ready to go.
|
| But in the large, it takes more than a VM to get things
| accomplished. There's always an eco-system at play.
|
| And one primary characteristic of eco-systems is they evolve.
| Time marches on, and waits for no one.
| samsquire wrote:
| Thank you for your thoughtful and interesting comment.
|
| I too like Java a lot and think it's a great technology.
|
| I can see how a runtime for a compiler (such as the JVM)
| doesn't need to change much over time: if the compiler
| produces the right code in 2000, then the code is probably
| still right in 2020, it just could use more features of the
| ISA that were introduced since then.
|
| On the other hand, the design of platform, library code and
| framework code is an ecosystem that is desperate to transform
| over time as better approaches to solving technical and
| business problems are found.
| tambourine_man wrote:
| That's one of the many advantages of building for the web. It
| won't ever be unsupported (for some reasonable definition of
| never).
|
| You get text rendering, canvas, audio, webgl... it's a pretty
| wide platform.
| maxime_cb wrote:
| Except that's just not true. It's already broken/incompatible
| in many places across browsers. You may not notice if you're
| just doing basic HTML/CSS, but if you do anything slightly
| more dynamic, you're going to notice. An ever-expanding set
| of complex APIs also makes it more and more likely that bugs
| will go unseen and unfixed. See two examples I detailed in
| the blog post.
| tambourine_man wrote:
| Sorry, I was reading the comments before reading your
| article.
|
| Getting around pointer events inconsistencies is a lot
| easier than building your own cross-platform VM, of course.
| But the project looks awesome and seems like a great
| initiative.
|
| I imagine there will also be differences in the way macOS,
| Linux and Windows handle graphics, IO, audio, etc, that
| will eventually leak to UVM, it's just the nature of the
| challenge.
| throwaway78941 wrote:
| > My ancient ruby code and nodejs code all broke because I
| didn't pin dependencies. As a result I've got software that is
| unrunnable.
|
| You can simply remove the ^ symbol from your versions listed in
| package.json and it will use exactly that version you
| originally added.
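The suggestion above can be sketched as a small script that strips the leading range operator from each version in a `package.json`. The function name `pin_versions` is hypothetical, and handling `~` as well as `^` is an addition not mentioned in the comment. Note that this pins to the version written in the file, which may differ from the one that was actually installed, and it does nothing for transitive dependencies.

```python
import json


def pin_versions(package_json_text: str) -> str:
    """Return package.json text with leading ^ or ~ removed from versions."""
    pkg = json.loads(package_json_text)
    for section in ("dependencies", "devDependencies"):
        deps = pkg.get(section, {})
        for name, spec in list(deps.items()):
            # "^1.3.0" -> "1.3.0"; specs without a range prefix are unchanged.
            deps[name] = spec.lstrip("^~")
    return json.dumps(pkg, indent=2)


example = """
{
  "name": "old-app",
  "dependencies": {"left-pad": "^1.3.0", "express": "~4.18.2"},
  "devDependencies": {"mocha": "10.2.0"}
}
"""
pinned = json.loads(pin_versions(example))
```

As the reply below points out, exact versions still do not help if a package depends on native code that no longer builds or installs on the current machine.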
| maxime_cb wrote:
| Besides the issue of whether you pin or don't pin your
| dependencies, the problem is that node packages can depend on
| external native code. You can have several more layers of
| dependencies in there. If, for any reason, those native
| packages won't install/run on your machine, your dependencies
| can still break under you, even if you pin them. Python and
| Ruby have the same vulnerability when it comes to
| dependencies breaking.
| euclaise wrote:
| Reminds me of the Dis VM and Inferno OS
| jart wrote:
| > or that I should base my system on an existing processor
| architecture and work on something like Justine Tunney's blink
| instead.
|
| If she did then she'd be very welcome as a Blink developer.
| maxime_cb wrote:
| You seem to be doing just fine without me :)
|
| Blink is a very impressive project. Mad props.
| narag wrote:
| Very interesting for embedded or kiosk. Or in general, if the VM
| is run as a server inside a host: encryption in a box, instead of
| a library. Curious about what kind of multitask will be added.
| maxime_cb wrote:
| I could actually use some feedback when it comes to the design
| of the parallelism model for UVM. I have a few ideas but it's
| not my area of expertise, so I would welcome feedback and
| suggestions.
| narag wrote:
| I have experience as a user; implementation is way over my
| head. Depending on the minimum host supported, you could only
| use lightweight threads or write a scheduler. IIRC, JVM
| started using its own thread implementation and later used
| the one provided by the host OS, when available.
|
| Best of luck!
| maxime_cb wrote:
| At the moment I have an event-driven system where you can
| set up callbacks and timers. It's not green threads but it
| makes it easy to have multiple different update events
| running at different rates, for example:
| https://github.com/maximecb/uvm/blob/main/ncc/examples/attac...
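To illustrate the general idea of callbacks and timers firing at different rates, here is a small sketch of an event loop driven by a virtual clock. It is not UVM's API: `EventLoop`, `set_timer`, `run_until`, and `every` are hypothetical names, and time is simulated so the behavior is deterministic.

```python
import heapq
from typing import Callable, List, Tuple


class EventLoop:
    """Single-threaded loop that fires timer callbacks in time order."""

    def __init__(self) -> None:
        self.now_ms = 0
        self._seq = 0  # tie-breaker so callbacks themselves are never compared
        self._queue: List[Tuple[int, int, Callable[[], None]]] = []

    def set_timer(self, delay_ms: int, callback: Callable[[], None]) -> None:
        heapq.heappush(self._queue, (self.now_ms + delay_ms, self._seq, callback))
        self._seq += 1

    def run_until(self, end_ms: int) -> None:
        """Advance the virtual clock, running every timer due by end_ms."""
        while self._queue and self._queue[0][0] <= end_ms:
            when, _, callback = heapq.heappop(self._queue)
            self.now_ms = when
            callback()
        self.now_ms = end_ms


def every(loop: EventLoop, period_ms: int, fn: Callable[[], None]) -> None:
    """Run fn periodically by having the callback re-arm its own timer."""
    def tick() -> None:
        fn()
        loop.set_timer(period_ms, tick)
    loop.set_timer(period_ms, tick)


log: List[Tuple[str, int]] = []
loop = EventLoop()
every(loop, 10, lambda: log.append(("physics", loop.now_ms)))
every(loop, 25, lambda: log.append(("render", loop.now_ms)))
loop.run_until(50)
```

Because every callback runs to completion on one thread, there is no shared-state locking to get right, which is the usual appeal of this model over threads.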
| chc4 wrote:
| For _parallelism_ I'd expect something like Cilk (or Rayon,
| which was inspired by it), where there's some syscalls for
| spawning and then joining on tasks, but the VM handles
| starting and managing the thread pool.
|
| For _concurrency_ it's a lot harder to make a good interface.
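The spawn/join model from the parallelism paragraph above might look roughly like this from the guest program's point of view. `TinyVM`, `spawn`, `join`, and `parallel_sum` are hypothetical names, and a plain `ThreadPoolExecutor` stands in for the VM-managed pool. Cilk and Rayon use work-stealing schedulers that handle nested spawns, which this flat sketch does not attempt, and in CPython the global interpreter lock means this shows the interface shape, not a CPU speedup.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List


class TinyVM:
    """Owns the thread pool; guest code only ever sees spawn and join."""

    def __init__(self, workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def spawn(self, fn: Callable[..., int], *args: object) -> Future:
        """'Syscall' that starts a task and returns a handle to it."""
        return self._pool.submit(fn, *args)

    def join(self, task: Future) -> int:
        """'Syscall' that blocks until the task finishes and returns its result."""
        return task.result()

    def shutdown(self) -> None:
        self._pool.shutdown()


def parallel_sum(vm: TinyVM, data: List[int], chunk: int = 1000) -> int:
    # Fork: one task per chunk. Join: wait for each task and combine results.
    tasks = [vm.spawn(sum, data[i:i + chunk]) for i in range(0, len(data), chunk)]
    return sum(vm.join(t) for t in tasks)


vm = TinyVM()
total = parallel_sum(vm, list(range(10000)))
vm.shutdown()
```

Keeping thread creation inside the VM means the guest program never depends on how many cores or which threading primitives the host provides.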
| efficax wrote:
| the jvm already exists and you can run ancient compiled class
| files with it.
| maxime_cb wrote:
| The level of cynicism on HN is sometimes really depressing.
|
| > the jvm already exists and you can run [some] ancient
| compiled class files with it [but many won't work correctly].
|
| FTFY.
| packetlost wrote:
| ok, but not everything wants the baggage that comes with the
| JVM.
___________________________________________________________________
(page generated 2023-02-25 23:01 UTC)