[HN Gopher] I Wrote a WebAssembly VM in C
___________________________________________________________________
I Wrote a WebAssembly VM in C
Author : irreducible
Score : 217 points
Date : 2025-02-03 14:30 UTC (8 hours ago)
(HTM) web link (irreducible.io)
(TXT) w3m dump (irreducible.io)
| pdubroy wrote:
| This is great! The WebAssembly Core Specification is actually
| quite readable, although some of the language can be a bit
| intimidating if you're not used to reading programming language
| papers.
|
| If anyone is looking for a slightly more accessible way to learn
| WebAssembly, you might enjoy WebAssembly from the Ground Up:
| https://wasmgroundup.com
|
| (Disclaimer: I'm one of the authors)
| MuffinFlavored wrote:
| I know one of WebAssembly's biggest features by design is
| security / "sandbox".
|
| But I've always gotten confused with... it is secure because by
| default it can't do much.
|
| I don't quite understand how to view WebAssembly. You write in
| one language, it compiles things like basic math (nothing with
| network or filesystem) to another and it runs in an
| interpreter.
|
| I feel like I have a severe lack/misunderstanding. There's a
| ton of hype for years, lots of investment... but it isn't like
| any case where you want to add Lua to an app you can add
| WebAssembly/vice versa?
| jeroenhd wrote:
| WebAssembly can communicate through buffers. WebAssembly can
| also import foreign functions (JavaScript functions in the
| browser).
|
| You can get output by reading the buffer at the end of
| execution/when receiving callbacks. So, for instance, you
| pass a few frames' worth of buffers to WASM, WASM renders
| pixels into the buffers, calls a callback, and the JavaScript
| reads data from the buffer (sending it to a <canvas> or
| similar).
|
| The benefit of WASM is that it can't be very malicious by
| itself. It requires the runtime to provide it with functions
| (which the module imports) and callbacks to do any file I/O,
| network I/O, or spawning of new tasks. Lua and similar tools can
| go deep into the runtime they exist in, altering system state and
| messing with system memory if they want to, while WASM can only
| interact with the specific API surface you provide it.
|
| That makes WASM less powerful, but more predictable, and in
| my opinion better for building integrations with, as there is
| no risk of internal APIs being accessed (that you will be
| blamed for if they break in an update).
| panic wrote:
| I don't believe it is currently possible for a WebAssembly
| instance to access any buffer other than its own memory.
| You have to copy data in and out.
| deathanatos wrote:
| The embedder could hand the module functions for
| manipulating external buffers via externrefs. (I'm not
| sure if that's a good idea, or not, just that it
| _could_.)
|
| But if the module wants to compute on the values in the
| buffer, at some level it would have to copy the data
| in/out.
| davexunit wrote:
| Use the GC instructions and you can freely share heap
| references amongst other modules and the host.
| panic wrote:
| How do you access the contents of a heap reference from
| JavaScript in order to "send it to a <canvas> or
| similar"?
| davexunit wrote:
| Assuming you're talking about reading binary data like
| (array i8), the GC MVP doesn't have a great answer right
| now. Have to call back into wasm to read the bytes.
| Something for the group to address in future proposals.
| Sharing between wasm modules is better right now.
| brabel wrote:
| > Lua and similar tools can go deep into the runtime they
| exist in, altering system state and messing with system
| memory if they want to
|
| That's not correct: when you embed Lua, you can choose which
| APIs are available. To make the full stdlib available, you
| must explicitly call `luaL_openlibs` [1].
|
| [1]
| https://www.lua.org/manual/5.3/manual.html#luaL_openlibs
| Karellen wrote:
| > You write in one language
|
| Not quite. WebAssembly isn't a source language; it's a
| compiler target. So you should be able to write in C, Rust,
| Fortran, or Lua and compile any of those to WebAssembly.
|
| Except that WebAssembly is a cross-platform assembly
| language/machine code which is very similar to the native
| machine code of many/most contemporary CPUs. This means a
| WebAssembly interpreter can be very straightforward, and
| could often translate one WebAssembly instruction to one
| native CPU instruction. Or rather, it can _compile_ a stream
| of WebAssembly instructions almost one-to-one to native CPU
| instructions, which it can then execute directly.
| whizzter wrote:
| A JIT should be able to translate most arithmetic and
| binary instructions to single opcodes; however, anything
| involving memory and function calls needs safety checks
| that becomes multi-instruction. Branches could mostly be
| direct _unless_ the runtime has any kind of metering (it
| should) to stop eternal loops (if it wants to be crash-safe
| and not just exploit-safe).
| kouteiheika wrote:
| > anything involving memory [..] needs safety checks that
| becomes multi-instruction
|
| Not necessarily; on AMD64 you can do memory accesses in a
| single instruction relatively easily by using the CPU's
| paging machinery for safety checks plus some clever use
| of address space.
|
| > branches could mostly be direct _unless_ the runtime
| has any kind of metering (it should) to stop eternal
| loops
|
| Even with metering the branches would be direct, you'd
| just insert the metering code at the start of each basic
| block (so that's two extra instructions at the start of
| each basic block). Or did you mean something else?
| charleslmunger wrote:
| Metering also doesn't require a branch if you implement
| it with page faults. See "Implicit suspend checks" in
| https://android-developers.googleblog.com/2023/11/the-
| secret...
| kouteiheika wrote:
| Yep. That's a nice trick; unfortunately it's non-
| deterministic.
| beardyw wrote:
| Yes, interpretation on the fly was never its intention. The
| intention was to provide interpreted languages with a way
| to implement fast compiled functions.
| pdubroy wrote:
| You should check out the book :-)
|
| We have a chapter called "What Makes WebAssembly Safe?" which
| covers the details. You can get a sneak peek here:
| https://bsky.app/profile/wasmgroundup.com/post/3lh2e4eiwnm2p
| coliveira wrote:
| I think the biggest advantage of wasm in terms of security is
| that it doesn't accept machine language written in the target
| machine, only in this artificial machine language. This means
| that it cannot encode arbitrary code that could be executed
| by the host machine. Everything it runs has necessarily to go
| through the wasm interpreter.
| hb-robo wrote:
| That's quite interesting. This is way outside of my
| wheelhouse - has this kind of approach been tried in other
| security contexts before? What would you even call that,
| virtualization?
| tubs wrote:
| Java.
| dmitrygr wrote:
| The word is "bytecode" and the idea is as old as
| computing.
| wyldfire wrote:
| > This means that it cannot encode arbitrary code that
| could be executed by the host machine.
|
| But the host machine still can, so it's not as big an
| advantage in that regard. If you could somehow deliver a
| payload of native code and jump to it, it'd work just fine.
| The security you get comes from the fact that it's really hard
| to do that, because there are no wasm instructions to jump to
| arbitrary memory locations (even if all the host ISAs do
| have those). Having a VM alone doesn't provide security
| against attacks.
|
| It's often the case that VMs are used with memory-safe
| languages, and those languages' runtime bounds checks and
| other features are what give them safety, more so than their
| VM. In fact, most bytecode languages provide a JIT
| (including some wasm deployments), so you're actually
| running native code regardless.
| Gupta2 wrote:
| Speaking of WebAssembly security, is it vulnerable to
| Spectre-style CPU attacks like those in JavaScript? (WASM
| without imported JS functions)
| saagarjha wrote:
| Yes.
| jeffparsons wrote:
| Yes, if you give the Wasm instance access to timers.
| pizlonator wrote:
| > But I've always gotten confused with... it is secure
| because by default it can't do much.
|
| Yes. That's a super accurate description. You're not
| confused.
|
| > I don't quite understand how to view WebAssembly. You write
| in one language, it compiles things like basic math (nothing
| with network or filesystem) to another and it runs in an
| interpreter.
|
| Almost. Wasm is cheap to JIT compile and the resulting code
| is usually super efficient, sometimes at parity with native
| execution.
|
| > I feel like I have a severe lack/misunderstanding. There's
| a ton of hype for years, lots of investment... but it isn't
| like any case where you want to add Lua to an app you can add
| WebAssembly/vice versa?
|
| It's definitely a case where the investment:utility ratio is
| high. ;-)
|
| Here's the trade off between embedding Lua and embedding
| Wasm:
|
| - Both have the problem that they are only as secure as the
| API you expose to the guest program. If you expose `rm -rf /`
| to either Lua or Wasm, you'll have a bad time. And it's
| surprisingly difficult to convince yourself that you didn't
| accidentally do that. Security is hard.
|
| - Wasm is faster than Lua.
|
| - Lua is a language for humans, no need for another language
| and compiler. That makes Lua a more natural choice for
| embedded scripting.
|
| - Lua is object oriented, garbage collected, and has a very
| principled story for how that gets exposed to the host in a
| safe way. Wasm source languages are usually not GC'd. That
| means that if you want to expose an object-oriented API to the
| guest program, then it'll feel more natural to do that with
| Lua.
|
| - The wasm security model is dead simple and doesn't
| (necessarily) rely on anything like GC, making it easier to
| convince yourself that the wasm implementation is free of
| security vulnerabilities. If you want a sandboxed execution
| environment then Wasm is better for that reason.
| veltas wrote:
| > actually quite readable, although some of the language can be
| a bit intimidating if you're not used to reading programming
| language papers
|
| You're more generous than I am; I think it's rubbish.
|
| It would have been easier to read if they had written it more
| like an ISA manual.
| amw-zero wrote:
| This is an opportunity to learn. The way WebAssembly is
| defined is the standard way PL semantics are defined.
| mananaysiempre wrote:
| You can understand the WASM spec in your sleep if you've ever
| worked through a type-system paper from the last two decades
| (or a logic paper from even earlier I guess).
|
| Granted, not many people have, but there's a reason why it
| makes sense for it to be written in that style: they want it
| to be very clear that the verification (typechecking, really)
| algorithm doesn't have any holes, and for that it's
| reasonable to speak the language of the people who prove that
| type of thing for a living.
|
| The WASM spec is also the ultimate authoritative reference
| for both programmers and implementers. That's different from
| the goals of an ISA manual, which usually only targets
| programmers and just says "don't do that" for certain dark
| corners of the (sole) implementation. (The RISC-V manual is
| atypical in this respect; still, I challenge you to describe
| e.g. which PC value the handler will see if the user code
| traps on a base RV32IMA system.)
| amw-zero wrote:
| I think it's much better to just learn how to read inference
| rules. They're actually quite simple, and are used ubiquitously
| to define PL semantics.
|
| Ruling this out as "not an option" is a big waste of
| time - learning this will open up all of the literature written
| on the subject.
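For readers who haven't seen the notation: an inference rule says "if every judgment above the line holds, the judgment below the line holds". Below are two rules roughly in the style of the Wasm spec's validation rules, simplified rather than copied verbatim. The first has no premises and types a binary numeric instruction such as `i32.add`; the second types an instruction sequence from its prefix and its last instruction.

```latex
\frac{}{C \vdash t.\mathsf{binop} : [t\ t] \to [t]}
\qquad\qquad
\frac{C \vdash \mathit{instr}^{*} : [t_1^{*}] \to [t_2^{*}]
      \qquad
      C \vdash \mathit{instr}_N : [t_2^{*}] \to [t_3^{*}]}
     {C \vdash \mathit{instr}^{*}\ \mathit{instr}_N : [t_1^{*}] \to [t_3^{*}]}
```

Read left to right: in any context C, `t.binop` pops two values of type t and pushes one; and if a sequence turns a stack of types t1* into t2*, and the next instruction turns t2* into t3*, then the whole sequence turns t1* into t3*.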
| shpongled wrote:
| The WASM spec is so well defined presumably because Andreas
| Rossberg is the editor - and he did a bunch of PL research on
| extensions to Standard ML, which is famous for its
| specification!
| deivid wrote:
| This is a really nice write-up! It's giving me motivation to go
| back to my WASM implementation
| syrusakbary wrote:
| This is an interesting approach, great work!
|
| For anyone who wants to see where the meat is, it's mostly in
| this file:
| https://github.com/irrio/semblance/blob/main/src/wrun.c
|
| Thinking out loud, I think it would have been a great idea to
| conform to the Wasm-C-API (https://github.com/WebAssembly/wasm-c-api)
| as a standard interface for the project (most Wasm runtimes,
| such as Wasmer, V8, and wasmi, have adopted it), since the API
| is already in C and it would make the project easier to try for
| developers familiar with that API.
|
| Note for the author: if you feel familiar enough with Wasm and
| you would like to contribute to Wasmer, we would also welcome
| any patches or improvements. Keep up the good work!
| oguz-ismail wrote:
| > Wasmer
|
| > Installed-Size: 266 MB
|
| What the hell
| syrusakbary wrote:
| Indeed, we need to further reduce the base binary size!
|
| Most of the size comes from the LLVM backend, which is a bit
| heavy. Wasmer ships many backends by default; if you were
| to use Wasmer headless, it would be just under 1 MB.
|
| If you want, you can always customize the build with only the
| backends that you are interested in using.
|
| Note: I've seen some builds of LLVM in the 5-10 MB range, but
| those require heavy customization. It's clear that we still have
| some work to do to reduce the size of the general build!
| CyberDildonics wrote:
| So you are shipping all of LLVM?
| benatkin wrote:
| Um, the author is clearly familiar enough with Wasm, but
| probably also knows enough to avoid a company that tried to
| trademark WebAssembly.
|
| > understandable concerns about the fact we, Wasmer, a VC-
| backed corporation, attempted to trademark the name of a non-
| profit organization, specifically WebAssembly
|
| Acknowledgement of wrongdoing.
| kowlo wrote:
| I hadn't heard about this - terrible.
| arjvik wrote:
| Read the whole blogpost that quote was taken from:
|
| https://wasmer.io/posts/wasmer-and-trademarks-extended
|
| I don't think this is as much of a smoking gun as it is
| made out to be.
| syrusakbary wrote:
| "Mistakes teach us, forgiveness frees us" - ChatGPT (o3-mini)
|
| https://wasmer.io/posts/wasmer-and-trademarks
| pcmoore wrote:
| I found this article very interesting with regard to direct WASM
| interpretation: https://arxiv.org/abs/2205.01183
|
| I produced https://github.com/peterseymour/winter on the back of
| it and learnt that WASM is not as simple as it should be.
| whizzter wrote:
| One tip for the author from another implementer: the spec tests
| contain various weird forms of textual wasm that aren't obvious
| to compile, but the wast2json converter can produce a simpler
| JSON description accompanied by regular binary wasm files.
| bhelx wrote:
| Same tip here. We did this with Chicory:
| https://github.com/dylibso/chicory
|
| I'd add to that: the earlier you can get this test suite
| running, the better for the iteration speed and correctness of
| your project.
|
| It took a bit of time to make everything work, but once we did,
| we very quickly got to the point of running anything. The test
| suite is certainly incomplete, but it gets you 95% of the way there:
| https://github.com/WebAssembly/testsuite
| davexunit wrote:
| This was a fun read! I wrote a Wasm interpreter in Scheme a while
| back, so it makes me happy to see more people writing their own.
| It is less difficult than you might think. I encourage others to
| give the spec a look and give it a try. No need to implement
| every instruction, just enough to have fun.
| greasy wrote:
| This is awesome.
| autumnlani wrote:
| This is awesome. Nicely done
| abnercoimbre wrote:
| Take a look at Orca [0] since I think you'd be a great
| contributor there.
|
| [0] https://orca-app.dev
| UncleEntity wrote:
| Heh, I also made the decision to focus on one project instead of
| hopping between new and shiny with the exception that I'm getting
| our AI overlords to do all the yak shaving -- which is
| frustrating to say the least...
|
| --edit--
|
| Oh, and I was also going to suggest using a library like libffi
| to make calls into C so you can pass multiple arguments and
| whatnot.
| Jyaif wrote:
| Regarding using WebAssembly as a plugin API, like for Zed:
|
| How do plugin developers debug their code? Is there a way for
| them to do breakpoint debugging, for example? What happens if
| their code crashes; do they get a stack trace?
___________________________________________________________________
(page generated 2025-02-03 23:00 UTC)