hngopher.com

       [HN Gopher] I Wrote a WebAssembly VM in C
       ___________________________________________________________________
        
       I Wrote a WebAssembly VM in C
        
       Author : irreducible
       Score  : 217 points
       Date   : 2025-02-03 14:30 UTC (8 hours ago)
        
 (HTM) web link (irreducible.io)
 (TXT) w3m dump (irreducible.io)
        
       | pdubroy wrote:
       | This is great! The WebAssembly Core Specification is actually
       | quite readable, although some of the language can be a bit
       | intimidating if you're not used to reading programming language
       | papers.
       | 
       | If anyone is looking for a slightly more accessible way to learn
       | WebAssembly, you might enjoy WebAssembly from the Ground Up:
       | https://wasmgroundup.com
       | 
       | (Disclaimer: I'm one of the authors)
        
         | MuffinFlavored wrote:
         | I know one of WebAssembly's biggest features by design is
         | security / "sandbox".
         | 
         | But I've always gotten confused with... it is secure because by
         | default it can't do much.
         | 
         | I don't quite understand how to view WebAssembly. You write in
         | one language, it compiles things like basic math (nothing with
         | network or filesystem) to another and it runs in an
         | interpreter.
         | 
         | I feel like I have a severe lack/misunderstanding. There's a
         | ton of hype for years, lots of investment... but it isn't like
         | any case where you want to add Lua to an app you can add
         | WebAssembly/vice versa?
        
           | jeroenhd wrote:
           | WebAssembly can communicate through buffers. WebAssembly can
           | also import foreign functions (JavaScript functions in the
           | browser).
           | 
           | You can get output by reading the buffer at the end of
           | execution/when receiving callbacks. So, for instance, you
           | pass a few frames' worth of buffers to WASM, WASM renders
           | pixels into the buffers, calls a callback, and the JavaScript
           | reads data from the buffer (sending it to a <canvas> or
           | similar).
           | 
           | The benefit of WASM is that it can't be very malicious by
           | itself. It requires the runtime to provide it with host
           | functions and callbacks to do any file I/O, network I/O, or
           | spawning new tasks. Lua and similar tools can go deep into
           | the runtime they exist in, altering system state and messing
           | with system memory if they want to, while WASM can only
           | interact with the specific API surface you provide it.
           | 
           | That makes WASM less powerful, but more predictable, and in
           | my opinion better for building integrations with, as there is
           | no risk of internal APIs being accessed (which you would be
           | blamed for if they broke in an update).
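To make the buffer-and-callback flow concrete, here is a minimal C model of it under stated assumptions: the instance's linear memory is a plain host-owned byte array, the guest "renders" by writing into it, and a single hypothetical host import, `frame_ready(ptr, len)`, is the only way it can signal the host. The names `Instance`, `FrameReady`, `guest_render`, and `host_frame_ready` are illustrative, not any runtime's API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 65536u /* one 64 KiB Wasm page */

typedef struct Instance Instance;

/* Signature of the single host import this toy guest is given. */
typedef int (*FrameReady)(Instance *inst, uint32_t ptr, uint32_t len);

struct Instance {
    uint8_t memory[MEM_SIZE]; /* guest linear memory, owned by the host */
    FrameReady frame_ready;   /* import supplied by the embedder */
};

/* Hypothetical guest code: writes a gradient into its own memory, then
 * tells the host where the frame is through the imported callback. */
static int guest_render(Instance *inst, uint32_t ptr, uint32_t len) {
    if ((uint64_t)ptr + len > MEM_SIZE)
        return -1; /* a real engine would trap on the out-of-bounds store */
    for (uint32_t i = 0; i < len; i++)
        inst->memory[ptr + i] = (uint8_t)(i & 0xff);
    return inst->frame_ready(inst, ptr, len);
}

static uint8_t canvas[256]; /* stands in for a <canvas> on the host side */
static uint32_t canvas_len;

/* Host side of the callback: the host can see all of linear memory, but it
 * still validates the guest-supplied range before reading pixels out. */
static int host_frame_ready(Instance *inst, uint32_t ptr, uint32_t len) {
    if ((uint64_t)ptr + len > MEM_SIZE || len > sizeof canvas)
        return -1;
    memcpy(canvas, inst->memory + ptr, len);
    canvas_len = len;
    return 0;
}
```

The guest never receives a host pointer; it only names offsets inside its own memory, and the host decides what a "frame" is allowed to be.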
        
             | panic wrote:
             | I don't believe it is currently possible for a WebAssembly
             | instance to access any buffer other than its own memory.
             | You have to copy data in and out.
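A rough sketch of what "copy data in and out" means for an embedder, assuming one linear memory modeled as a host-owned byte array: the guest can only address offsets inside it, so any host data has to be copied across, with a bounds check on every transfer. `LinearMemory`, `mem_copy_in`, and `mem_copy_out` are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t *bytes; /* backing store, owned by the host */
    uint32_t size;  /* current size in bytes (a multiple of 64 KiB in real Wasm) */
} LinearMemory;

/* True if [offset, offset + len) lies inside memory; the 64-bit sum
 * avoids wrap-around when offset + len overflows 32 bits. */
static bool in_bounds(const LinearMemory *m, uint32_t offset, uint32_t len) {
    return (uint64_t)offset + len <= m->size;
}

/* Copy host data into guest memory at `offset`. */
static bool mem_copy_in(LinearMemory *m, uint32_t offset,
                        const void *src, uint32_t len) {
    if (!in_bounds(m, offset, len)) return false;
    memcpy(m->bytes + offset, src, len);
    return true;
}

/* Copy guest data at `offset` out to a host buffer. */
static bool mem_copy_out(const LinearMemory *m, uint32_t offset,
                         void *dst, uint32_t len) {
    if (!in_bounds(m, offset, len)) return false;
    memcpy(dst, m->bytes + offset, len);
    return true;
}
```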
        
               | deathanatos wrote:
               | The embedder could hand the module functions for
               | manipulating external buffers via externrefs. (I'm not
               | sure if that's a good idea, or not, just that it
               | _could_.)
               | 
               | But if the module wants to compute on the values in the
               | buffer, at some level it would have to copy the data
               | in/out.
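One way an embedder might do what is described above, sketched in C under a simplifying assumption: an externref is modeled as an opaque index into a host-side handle table. (In a real engine an externref is an opaque reference value the guest cannot forge or inspect.) The guest can pass the handle around, but it can only touch the buffer through host functions it was given, and any byte it wants to compute on is still read out into a guest value. `HandleTable`, `handle_new`, `host_buffer_get`, and `host_buffer_set` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HANDLES 16

/* Host-side table: the guest only ever sees an index (the "externref"). */
typedef struct {
    uint8_t *buf[MAX_HANDLES];
    size_t len[MAX_HANDLES];
    uint32_t count;
} HandleTable;

/* Register an external buffer and return an opaque handle, or -1 if full. */
static int32_t handle_new(HandleTable *t, uint8_t *buf, size_t len) {
    if (t->count >= MAX_HANDLES) return -1;
    t->buf[t->count] = buf;
    t->len[t->count] = len;
    return (int32_t)t->count++;
}

/* Host import: read one byte through a handle; -1 on a bad handle or index. */
static int32_t host_buffer_get(const HandleTable *t, int32_t h, uint32_t i) {
    if (h < 0 || (uint32_t)h >= t->count || i >= t->len[h]) return -1;
    return t->buf[h][i];
}

/* Host import: write one byte through a handle; 0 on success, -1 on error. */
static int32_t host_buffer_set(HandleTable *t, int32_t h, uint32_t i, uint8_t v) {
    if (h < 0 || (uint32_t)h >= t->count || i >= t->len[h]) return -1;
    t->buf[h][i] = v;
    return 0;
}
```

Every access is a host call, which is why computing over a large external buffer this way tends to end up copying it into linear memory anyway.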
        
               | davexunit wrote:
               | Use the GC instructions and you can freely share heap
               | references amongst other modules and the host.
        
               | panic wrote:
               | How do you access the contents of a heap reference from
               | JavaScript in order to "send it to a <canvas> or
               | similar"?
        
               | davexunit wrote:
               | Assuming you're talking about reading binary data like
               | (array i8), the GC MVP doesn't have a great answer right
               | now. Have to call back into wasm to read the bytes.
               | Something for the group to address in future proposals.
               | Sharing between wasm modules is better right now.
        
             | brabel wrote:
             | > Lua and similar tools can go deep into the runtime they
             | exist in, altering system state and messing with system
             | memory if they want to
             | 
             | That's not correct: when you embed Lua, you choose which
             | APIs are available. To make the full stdlib available, you
             | must explicitly call `luaL_openlibs` [1].
             | 
             | [1]
             | https://www.lua.org/manual/5.3/manual.html#luaL_openlibs
        
           | Karellen wrote:
           | > You write in one language
           | 
           | Not quite. WebAssembly isn't a source language, it's a
           | compiler target. So you should be able to write in C, Rust,
           | Fortran, or Lua and compile any of those to WebAssembly.
           | 
           | On top of that, WebAssembly is a cross-platform assembly
           | language/machine code which is very similar to the native
           | machine code of many/most contemporary CPUs. This means a
           | WebAssembly interpreter can be very straightforward, and
           | could often translate one WebAssembly instruction to one
           | native CPU instruction. Or rather, it can _compile_ a stream
           | of WebAssembly instructions almost one-to-one to native CPU
           | instructions, which it can then execute directly.
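To illustrate how directly Wasm's numeric instructions map to machine operations, here is a minimal C interpreter for a tiny subset of the binary format: `i32.const` (opcode 0x41, followed by a signed LEB128 immediate), `i32.add` (0x6a), `i32.sub` (0x6b), `i32.mul` (0x6c), and `end` (0x0b). It is a sketch that assumes already-validated, straight-line code; a real engine also needs control flow, locals, memory, and traps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a signed LEB128 value (at most 5 bytes for i32); advances *pc. */
static int32_t read_sleb32(const uint8_t **pc) {
    uint32_t result = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
        byte = *(*pc)++;
        result |= (uint32_t)(byte & 0x7f) << shift;
        shift += 7;
    } while ((byte & 0x80) && shift < 35);
    if (shift < 32 && (byte & 0x40))
        result |= ~0u << shift; /* sign-extend from the last byte's sign bit */
    return (int32_t)result;
}

enum { OP_END = 0x0b, OP_I32_CONST = 0x41, OP_I32_ADD = 0x6a,
       OP_I32_SUB = 0x6b, OP_I32_MUL = 0x6c };

/* Runs straight-line code and returns the value on top of the stack.
 * Assumes validated input: no stack underflow, at most 64 values. */
static int32_t run(const uint8_t *code) {
    uint32_t stack[64]; /* i32 arithmetic wraps, so compute in uint32_t */
    size_t sp = 0;
    const uint8_t *pc = code;
    for (;;) {
        switch (*pc++) {
        case OP_I32_CONST: stack[sp++] = (uint32_t)read_sleb32(&pc); break;
        /* each arithmetic opcode is one C operation on the top two slots */
        case OP_I32_ADD: sp--; stack[sp - 1] += stack[sp]; break;
        case OP_I32_SUB: sp--; stack[sp - 1] -= stack[sp]; break;
        case OP_I32_MUL: sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_END: return (int32_t)stack[sp - 1];
        default: assert(!"unsupported opcode"); return 0;
        }
    }
}
```

For example, the bytes `41 02 41 03 6c 41 7f 6a 0b` encode (2 * 3) + (-1), and `run` returns 5. A JIT can often turn each `i32.add` or `i32.mul` in such a sequence into a single native add or multiply once the stack slots are mapped to registers.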
        
             | whizzter wrote:
             | A JIT should be able to translate most arithmetic and
             | binary instructions to single opcodes; however, anything
             | involving memory and function calls needs safety checks
             | that become multi-instruction. Branches could mostly be
             | direct _unless_ the runtime has any kind of metering (it
             | should) to stop eternal loops (if it wants to be
             | crash-safe and not just exploit-safe).
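To show where the extra instructions on a memory access come from in a straightforward implementation, here is a sketch of `i32.load` in an interpreter. The effective address is the 32-bit operand plus the instruction's static offset, summed in 64 bits so it cannot wrap, and an access past the end of memory traps. Wasm memory is little-endian, so the bytes are assembled explicitly. The `Trap` enum and `i32_load` signature are assumptions for this example; the paging trick kouteiheika describes in the reply is a way to drop the explicit compare.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { TRAP_NONE = 0, TRAP_OUT_OF_BOUNDS } Trap;

/* i32.load: effective address = dynamic base + static offset (memarg).
 * Both are 32-bit; summing in 64 bits means base + offset can't wrap. */
static Trap i32_load(const uint8_t *mem, uint64_t mem_size,
                     uint32_t base, uint32_t offset, int32_t *out) {
    uint64_t ea = (uint64_t)base + offset;
    if (ea + 4 > mem_size)
        return TRAP_OUT_OF_BOUNDS; /* the extra compare-and-branch */
    /* Wasm linear memory is little-endian regardless of the host. */
    uint32_t v = (uint32_t)mem[ea] | (uint32_t)mem[ea + 1] << 8 |
                 (uint32_t)mem[ea + 2] << 16 | (uint32_t)mem[ea + 3] << 24;
    *out = (int32_t)v;
    return TRAP_NONE;
}
```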
        
               | kouteiheika wrote:
               | > anything involving memory [..] needs safety checks that
               | becomes multi-instruction
               | 
               | Not necessarily; on AMD64 you can do memory accesses in a
               | single instruction relatively easily by using the CPU's
               | paging machinery for safety checks plus some clever use
               | of address space.
               | 
               | > branches could mostly be direct _unless_ the runtime
               | has any kind of metering (it should) to stop eternal
               | loops
               | 
               | Even with metering the branches would be direct, you'd
               | just insert the metering code at the start of each basic
               | block (so that's two extra instructions at the start of
               | each basic block). Or did you mean something else?
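A minimal sketch of the per-basic-block metering described above, written as an interpreter-level model: a fuel counter is charged at the start of each block (here, each loop iteration) and execution stops with a "trap" when the budget runs out. The `Fuel` type, `charge_block`, and the `guest_sum` guest function are hypothetical; real engines differ in what they count and where they put the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int64_t remaining; /* fuel left; the host chooses the initial budget */
} Fuel;

/* Charged at the top of every basic block. In JIT-compiled code this is
 * roughly a subtract plus a conditional branch to a trap stub. */
static inline bool charge_block(Fuel *f, int64_t cost) {
    f->remaining -= cost;
    return f->remaining >= 0;
}

/* Hypothetical guest function: sums 0..n-1. Each iteration starts a new
 * basic block, so it is charged once per iteration. Returns false (a
 * "trap") if fuel runs out before the loop finishes. */
static bool guest_sum(Fuel *f, uint32_t n, uint64_t *out) {
    uint64_t acc = 0;
    for (uint32_t i = 0; i < n; i++) {
        if (!charge_block(f, 1))
            return false; /* out of fuel: stop instead of running forever */
        acc += i;
    }
    *out = acc;
    return true;
}
```

The loop's back-edge stays a direct jump; only the check at the block entry is added.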
        
               | charleslmunger wrote:
               | Metering also doesn't require a branch if you implement
               | it with page faults. See "Implicit suspend checks" in
               | https://android-developers.googleblog.com/2023/11/the-
               | secret...
        
               | kouteiheika wrote:
               | Yep. That's a nice trick; unfortunately it's non-
               | deterministic.
        
             | beardyw wrote:
             | Yes, interpretation on the fly was never its intention. The
             | intention was to provide interpreted languages with a way
             | to implement fast compiled functions.
        
           | pdubroy wrote:
           | You should check out the book :-)
           | 
           | We have a chapter called "What Makes WebAssembly Safe?" which
           | covers the details. You can get a sneak peek here:
           | https://bsky.app/profile/wasmgroundup.com/post/3lh2e4eiwnm2p
        
           | coliveira wrote:
           | I think the biggest advantage of wasm in terms of security is
           | that it doesn't accept machine code for the target machine,
           | only code in this artificial machine language. This means
           | that it cannot encode arbitrary code that could be executed
           | by the host machine. Everything it runs has to go through
           | the wasm interpreter.
        
             | hb-robo wrote:
             | That's quite interesting. This is way outside of my
             | wheelhouse - has this kind of approach been tried in other
             | security contexts before? What would you even call that,
             | virtualization?
        
               | tubs wrote:
               | Java.
        
               | dmitrygr wrote:
               | The word is "bytecode" and the idea is as old as
               | computing.
        
             | wyldfire wrote:
             | > This means that it cannot encode arbitrary code that
             | could be executed by the host machine.
             | 
             | But the host machine still can, so it's not as big an
             | advantage in that regard. If you could somehow deliver a
             | payload of native code and jump to it, it'd work just fine.
             | But the security you get is the fact that it's really hard
             | to do that because there are no wasm instructions to jump to
             | arbitrary memory locations (even if all the host ISAs do
             | have those). Having a VM alone doesn't provide security
             | against attacks.
             | 
             | It's often the case that VMs are used with memory-safe
             | languages and those languages' runtime bounds checks and
             | other features are what give them safety more so than their
             | VM. In fact, most bytecode languages provide a JIT
             | (including some wasm deployments) so you're actually
             | running native code regardless.
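The "no instruction to jump to arbitrary memory" point can be made concrete with `call_indirect`, the way Wasm code calls through a computed value: the operand is an index into a table the embedder controls, and the engine checks the bounds, the slot, and the callee's signature before calling. The C model below uses a function-pointer table; `TableEntry`, the integer type ids, `call_indirect_i32`, and the demo functions are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*AnyFn)(void);          /* type-erased function pointer */
typedef int32_t (*FnI32I32)(int32_t); /* signature (i32) -> i32 */

enum { TYPE_I32_TO_I32 = 0, TYPE_VOID_TO_VOID = 1 };

typedef struct {
    uint32_t type_id; /* index of the function's signature */
    AnyFn fn;         /* NULL for an uninitialized table slot */
} TableEntry;

typedef enum { CALL_OK = 0, TRAP_TABLE_OOB, TRAP_NULL_ENTRY, TRAP_SIG_MISMATCH } Result;

/* call_indirect with expected type (i32) -> i32. The guest supplies only
 * an index; it can never name a raw code address. */
static Result call_indirect_i32(const TableEntry *table, uint32_t table_len,
                                uint32_t index, int32_t arg, int32_t *ret) {
    if (index >= table_len) return TRAP_TABLE_OOB;
    const TableEntry *e = &table[index];
    if (e->fn == NULL) return TRAP_NULL_ENTRY;
    if (e->type_id != TYPE_I32_TO_I32) return TRAP_SIG_MISMATCH;
    *ret = ((FnI32I32)e->fn)(arg); /* cast back to the checked signature */
    return CALL_OK;
}

/* Demo table contents (hypothetical guest functions). */
static int32_t double_it(int32_t x) { return x * 2; }
static void noop(void) {}

static const TableEntry demo_table[] = {
    { TYPE_I32_TO_I32, (AnyFn)double_it },
    { TYPE_VOID_TO_VOID, noop },
    { TYPE_I32_TO_I32, NULL }, /* empty slot */
};
```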
        
           | Gupta2 wrote:
           | Speaking of WebAssembly security, is it vulnerable to
           | Spectre-style CPU attacks like JavaScript is? (WASM without
           | imported JS functions)
        
             | saagarjha wrote:
             | Yes.
        
             | jeffparsons wrote:
             | Yes, if you give the Wasm instance access to timers.
        
           | pizlonator wrote:
           | > But I've always gotten confused with... it is secure
           | because by default it can't do much.
           | 
           | Yes. That's a super accurate description. You're not
           | confused.
           | 
           | > I don't quite understand how to view WebAssembly. You write
           | in one language, it compiles things like basic math (nothing
           | with network or filesystem) to another and it runs in an
           | interpreter.
           | 
           | Almost. Wasm is cheap to JIT compile and the resulting code
           | is usually super efficient, sometimes reaching parity with
           | native execution.
           | 
           | > I feel like I have a severe lack/misunderstanding. There's
           | a ton of hype for years, lots of investment... but it isn't
           | like any case where you want to add Lua to an app you can add
           | WebAssembly/vice versa?
           | 
           | It's definitely a case where the investment:utility ratio is
           | high. ;-)
           | 
           | Here's the trade-off between embedding Lua and embedding
           | Wasm:
           | 
           | - Both have the problem that they are only as secure as the
           | API you expose to the guest program. If you expose `rm -rf /`
           | to either Lua or Wasm, you'll have a bad time. And it's
           | surprisingly difficult to convince yourself that you didn't
           | accidentally do that. Security is hard.
           | 
           | - Wasm is faster than Lua.
           | 
           | - Lua is a language for humans, no need for another language
           | and compiler. That makes Lua a more natural choice for
           | embedded scripting.
           | 
           | - Lua is object oriented, garbage collected, and has a very
           | principled story for how that gets exposed to the host in a
           | safe way. Wasm source languages are usually not GC'd. That
           | means that if you want to expose object oriented API to the
           | guest program, then it'll feel more natural to do that with
           | Lua.
           | 
           | - The wasm security model is dead simple and doesn't
           | (necessarily) rely on anything like GC, making it easier to
           | convince yourself that the wasm implementation is free of
           | security vulnerabilities. If you want a sandboxed execution
           | environment then Wasm is better for that reason.
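To illustrate the first and last bullets, here is a sketch of how an embedder's import resolution makes the API surface explicit: every import a module declares must match an entry the host chose to expose, and instantiation fails otherwise. The `HostFunc` and `ImportDecl` types, the `env` module name, and the host function names are assumptions for this example; real runtimes also check each import's type signature.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *module; /* import module name, e.g. "env" */
    const char *name;   /* import field name */
    void (*fn)(void);   /* host implementation (type-erased here) */
} HostFunc;

typedef struct { const char *module, *name; } ImportDecl;

static void host_log(void) {}
static void host_get_time(void) {}

/* The entire API surface exposed to the guest. Anything not listed,
 * such as file or network access, simply does not exist for it. */
static const HostFunc allowed[] = {
    { "env", "log", host_log },
    { "env", "get_time", host_get_time },
};

/* Resolve one declared import; returns NULL if the host does not provide it. */
static const HostFunc *resolve_import(const char *module, const char *name) {
    for (size_t i = 0; i < sizeof allowed / sizeof allowed[0]; i++)
        if (strcmp(allowed[i].module, module) == 0 &&
            strcmp(allowed[i].name, name) == 0)
            return &allowed[i];
    return NULL;
}

/* Instantiation succeeds only if every declared import resolves. */
static int instantiate_imports(const ImportDecl *imports, size_t count) {
    for (size_t i = 0; i < count; i++)
        if (resolve_import(imports[i].module, imports[i].name) == NULL)
            return -1; /* unresolved import: refuse to instantiate */
    return 0;
}
```

The security review then reduces to auditing the `allowed` table, which is exactly the "only as secure as the API you expose" point above.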
        
         | veltas wrote:
         | > actually quite readable, although some of the language can be
         | a bit intimidating if you're not used to reading programming
         | language papers
         | 
         | You're more generous than me; I think it's rubbish.
         | 
         | Would have been easier to read if they had written it more like
         | an ISA manual.
        
           | amw-zero wrote:
           | This is an opportunity to learn. The way WebAssembly is
           | defined is the standard way PL semantics are defined.
        
           | mananaysiempre wrote:
           | You can understand the WASM spec in your sleep if you've ever
           | worked through a type-system paper from the last two decades
           | (or a logic paper from even earlier I guess).
           | 
           | Granted, not many people have, but there's a reason why it
           | makes sense for it to be written in that style: they want it
           | to be very clear that the verification (typechecking, really)
           | algorithm doesn't have any holes, and for that it's
           | reasonable to speak the language of the people who prove that
           | type of thing for a living.
           | 
           | The WASM spec is also the ultimate authoritative reference
           | for both programmers and implementers. That's different from
           | the goals of an ISA manual, which usually only targets
           | programmers and just says "don't do that" for certain dark
           | corners of the (sole) implementation. (The RISC-V manual is
           | atypical in this respect; still, I challenge you to describe
           | e.g. which PC value the handler will see if the user code
           | traps on a base RV32IMA system.)
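To give a flavor of why validation is "typechecking, really", here is a sketch of the operand-type stack a validator keeps for straight-line code, covering only `i32.const` (0x41), `i64.const` (0x42), `i32.add` (0x6a), `i64.add` (0x7c), and `drop` (0x1a). Each instruction pops the types it expects and pushes its result type; any mismatch rejects the code before it runs. The simplified encoding (opcodes only, const immediates omitted) and the fixed stack depth are assumptions, and the spec's handling of unreachable code is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { T_I32, T_I64 } ValType;

enum { OP_DROP = 0x1a, OP_I32_CONST = 0x41, OP_I64_CONST = 0x42,
       OP_I32_ADD = 0x6a, OP_I64_ADD = 0x7c };

typedef struct {
    ValType types[32];
    size_t height;
} TypeStack;

static bool push(TypeStack *s, ValType t) {
    if (s->height == 32) return false;
    s->types[s->height++] = t;
    return true;
}

/* Pop one operand and require it to have type `want`. */
static bool pop_expect(TypeStack *s, ValType want) {
    if (s->height == 0) return false; /* stack underflow */
    return s->types[--s->height] == want;
}

/* Validate an opcode sequence whose result type must be `result`. */
static bool validate(const uint8_t *ops, size_t n, ValType result) {
    TypeStack s = { .height = 0 };
    for (size_t i = 0; i < n; i++) {
        bool ok;
        switch (ops[i]) {
        case OP_I32_CONST: ok = push(&s, T_I32); break;
        case OP_I64_CONST: ok = push(&s, T_I64); break;
        /* t.add : [t t] -> [t] */
        case OP_I32_ADD:
            ok = pop_expect(&s, T_I32) && pop_expect(&s, T_I32) && push(&s, T_I32);
            break;
        case OP_I64_ADD:
            ok = pop_expect(&s, T_I64) && pop_expect(&s, T_I64) && push(&s, T_I64);
            break;
        case OP_DROP: /* drop : [t] -> [] for any t */
            ok = s.height > 0;
            if (ok) s.height--;
            break;
        default: ok = false; /* unknown opcode */
        }
        if (!ok) return false;
    }
    /* The sequence must leave exactly its declared result on the stack. */
    return s.height == 1 && s.types[0] == result;
}
```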
        
         | amw-zero wrote:
         | I think it's much better to just learn how to read inference
         | rules. They're actually quite simple, and are used ubiquitously
         | to define PL semantics.
         | 
         | Ruling this out as "not an option" is a big waste of
         | time - learning this will open up all of the literature written
         | on the subject.
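As a small taste of the notation, here are the two rules for a binary numeric instruction such as `i32.add`, simplified from the Core Specification (the spec also covers the case where the operator is undefined, e.g. integer division by zero, by reducing to a trap). The first is a typing rule with no premises: in any context C, `t.binop` is valid with type [t t] -> [t], i.e. it pops two values of type t and pushes one. The second is a reduction rule: two constants followed by the instruction step to a single constant.

```latex
\frac{}{C \vdash t.\mathsf{binop} : [t\ t] \to [t]}
\qquad
(t.\mathsf{const}\ c_1)\ (t.\mathsf{const}\ c_2)\ t.\mathsf{binop}
\;\hookrightarrow\;
(t.\mathsf{const}\ c)
\quad \text{if } c = \mathit{binop}_t(c_1, c_2)
```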
        
           | shpongled wrote:
           | The WASM spec is so well defined presumably because Andreas
           | Rossberg is the editor - and he did a bunch of PL research on
           | extensions to Standard ML, which is famous for its
           | specification!
        
       | deivid wrote:
       | This is a really nice write up! It's giving me motivation to go
       | back to my WASM implementation.
        
       | syrusakbary wrote:
       | This is an interesting approach, great work!
       | 
       | For anyone who wants to see where the meat is, it's mostly in
       | this file:
       | https://github.com/irrio/semblance/blob/main/src/wrun.c
       | 
       | Thinking out loud, I think it would have been a great idea to
       | conform to the Wasm-C-API
       | (https://github.com/WebAssembly/wasm-c-api) as a standard
       | interface for the project (most Wasm runtimes, such as Wasmer,
       | V8, and wasmi, have adopted it), as the API is already in C and
       | it would make it easier to try for developers familiar with
       | that API.
       | 
       | Note for the author: if you feel familiar enough with Wasm and
       | you would like to contribute to Wasmer, we would also welcome
       | any patches or improvements. Keep up the good work!
        
         | oguz-ismail wrote:
         | > Wasmer
         | 
         | > Installed-Size: 266 MB
         | 
         | What the hell
        
           | syrusakbary wrote:
           | Indeed, we need to further reduce the base binary size!
           | 
           | Most of the size comes from the LLVM backend, which is a bit
           | heavy. Wasmer ships many backends by default; if you were
           | to use Wasmer headless, it would be just under 1 MB.
           | 
           | If you want, you can always customize the build with only the
           | backends that you are interested in using.
           | 
           | Note: I've seen some builds of LLVM as small as 5-10 MB, but
           | those require heavy customization. It's clear that we still
           | have some work to do to reduce the size of the general build!
        
             | CyberDildonics wrote:
             | So you are shipping all of llvm?
        
         | benatkin wrote:
         | Um, the author is clearly familiar enough with Wasm, but
         | probably also knows to avoid a company that tried to
         | trademark WebAssembly.
         | 
         | > understandable concerns about the fact we, Wasmer, a VC-
         | backed corporation, attempted to trademark the name of a non-
         | profit organization, specifically WebAssembly
         | 
         | Acknowledgement of wrongdoing.
        
           | kowlo wrote:
           | I hadn't heard about this - terrible.
        
             | arjvik wrote:
             | Read the whole blogpost that quote was taken from:
             | 
             | https://wasmer.io/posts/wasmer-and-trademarks-extended
             | 
             | I don't think this is as much of a smoking gun as it is
             | made out to be.
        
           | syrusakbary wrote:
           | "Mistakes teach us, forgiveness frees us" - ChatGPT (o3-mini)
           | 
           | https://wasmer.io/posts/wasmer-and-trademarks
        
       | pcmoore wrote:
       | I found this article very interesting with regard to direct WASM
       | interpretation: https://arxiv.org/abs/2205.01183
       | 
       | I produced https://github.com/peterseymour/winter on the back of
       | it and learnt WASM is not as simple as it should be.
        
       | whizzter wrote:
       | One tip for the author from another one: the spec tests contain
       | various weird forms of textual wasm that aren't obvious to
       | compile, but the wast2json converter can produce a simpler JSON
       | description accompanied by regular binary wasm files.
        
         | bhelx wrote:
         | Same tip here. We did this with Chicory:
         | https://github.com/dylibso/chicory
         | 
         | I'd add that the earlier you can get this test suite running,
         | the better for the iteration speed and correctness of your
         | project.
         | 
         | It took a bit of time to make everything work, but once we did,
         | we very quickly got to the point of running anything. The test-
         | suite is certainly incomplete but gets you 95% there:
         | https://github.com/WebAssembly/testsuite
        
       | davexunit wrote:
       | This was a fun read! I wrote a Wasm interpreter in Scheme a while
       | back, so it makes me happy to see more people writing their own.
       | It is less difficult than you might think. I encourage others to
       | give the spec a look and give it a try. No need to implement
       | every instruction, just enough to have fun.
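For anyone tempted to try, the first thing such an interpreter has to do is parse the binary module. Here is a minimal C sketch of that first step: check the 4-byte magic (`\0asm`) and the version (1, as 4 little-endian bytes), then walk the sections, each of which is a one-byte id followed by a u32 size in unsigned LEB128. It only counts sections and skips their payloads; `read_uleb32` and `count_sections` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode an unsigned LEB128 u32 at buf[*pos]; returns -1 if malformed. */
static int read_uleb32(const uint8_t *buf, size_t len, size_t *pos, uint32_t *out) {
    uint32_t result = 0;
    for (unsigned shift = 0; shift <= 28; shift += 7) {
        if (*pos >= len) return -1; /* truncated */
        uint8_t byte = buf[(*pos)++];
        if (shift == 28 && (byte & 0x70)) return -1; /* value exceeds 32 bits */
        result |= (uint32_t)(byte & 0x7f) << shift;
        if (!(byte & 0x80)) { *out = result; return 0; }
    }
    return -1; /* longer than the 5 bytes a u32 may use */
}

/* Returns the number of sections in a module, or -1 if malformed. */
static int count_sections(const uint8_t *buf, size_t len) {
    static const uint8_t magic[4] = {0x00, 0x61, 0x73, 0x6d};   /* "\0asm" */
    static const uint8_t version[4] = {0x01, 0x00, 0x00, 0x00}; /* version 1 */
    if (len < 8 || memcmp(buf, magic, 4) != 0 || memcmp(buf + 4, version, 4) != 0)
        return -1;
    size_t pos = 8;
    int count = 0;
    while (pos < len) {
        pos++; /* section id; a real parser also checks the id and section order */
        uint32_t size;
        if (read_uleb32(buf, len, &pos, &size) != 0) return -1;
        if (size > len - pos) return -1; /* section runs past the end */
        pos += size;                     /* skip the payload */
        count++;
    }
    return count;
}
```

An empty module is just the 8-byte preamble, so `count_sections` returns 0 for it; adding a type section with one `() -> ()` function type (`01 04 01 60 00 00`) makes it 1.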
        
       | greasy wrote:
       | This is awesome.
        
       | autumnlani wrote:
       | This is awesome. Nicely done.
        
       | abnercoimbre wrote:
       | Take a look at Orca [0] since I think you'd be a great
       | contributor there.
       | 
       | [0] https://orca-app.dev
        
       | UncleEntity wrote:
       | Heh, I also made the decision to focus on one project instead of
       | hopping between new and shiny, with the exception that I'm getting
       | our AI overlords to do all the yak shaving -- which is
       | frustrating, to say the least...
       | 
       | --edit--
       | 
       | Oh, and I also was going to suggest using a library like libffi
       | to make calls into C so you can do multiple arguments and
       | whatnot.
        
       | Jyaif wrote:
       | Regarding using WebAssembly as a plugin API, like for Zed:
       | 
       | How do plugin developers debug their code? Is there a way for
       | them to do breakpoint debugging, for example? What happens if
       | their code crashes; do they get a stack trace?
        
       ___________________________________________________________________
       (page generated 2025-02-03 23:00 UTC)