[HN Gopher] WebAssembly and Back Again: Fine-Grained Sandboxing ...
       ___________________________________________________________________
        
       WebAssembly and Back Again: Fine-Grained Sandboxing in Firefox 95
        
       Author : feross
       Score  : 303 points
       Date   : 2021-12-06 13:33 UTC (9 hours ago)
        
 (HTM) web link (hacks.mozilla.org)
 (TXT) w3m dump (hacks.mozilla.org)
        
       | pjmlp wrote:
        | With the recompilation back to C I fail to see how RLbox prevents
        | memory corruption due to lack of bounds checking or UB being
        | exploited by the optimizer.
        
         | azakai wrote:
         | There is that risk, yes. For wasm2c to be correct it must emit
         | C code without undefined behavior. As best we know it does that
         | properly today, and we've tested and fuzzed it quite a lot, but
         | whenever you use an optimizing C compiler on the output there
         | is no 100% guarantee.
        
           | SAI_Peregrinus wrote:
           | There's no guarantee even without optimization. Some
           | optimizations exploit UB, but not all "unexpected" code
           | generation in the presence of UB is due to optimization.
           | 
           | Take signed int overflow on addition. On some platforms ADD
           | wraps, on others it traps. Depending on target CPU you'll get
           | very different behavior even for non-optimizing compilers
           | that just emit an ADD instruction!
           | 
           | WASM is just another target, with its own behavior.
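To make the comment above concrete, here is a minimal C sketch (the helper name is invented) of the contrast: signed addition in C is undefined behavior on overflow, while wasm's i32.add is defined to wrap, and wasm2c-style output can express that defined wrapping by routing the addition through unsigned types.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* In C, signed overflow is UB: the compiler may assume `a + b` never
 * overflows, so the emitted code can wrap, trap, or misbehave
 * depending on target and flags. Wasm instead *defines* i32.add to
 * wrap modulo 2^32. That defined wrapping can be expressed in C by
 * doing the addition on unsigned types, which wrap by definition
 * (helper name invented): */
static int32_t wrapping_add_i32(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a + (uint32_t)b);  /* defined: mod 2^32 */
}
```

Strictly, converting the out-of-range unsigned result back to int32_t is implementation-defined before C23, but it yields two's-complement wrapping on every mainstream platform.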
        
           | Tobu wrote:
           | The host application is still C, and the final compilation of
           | the mix of C and sanitized C still relies on the optimizer to
           | be fast, so it might be possible for the untrusted library to
           | reveal UB in the rest of the application. I can see how the
           | whole approach (wasm + taint) would make compromise of the
           | sanitized bits through crafted data harder, but I'm not sure
           | it does enough for guarding against a supply chain compromise
           | of the library.
        
             | Tobu wrote:
             | And I see the safe languages section[1] in the WasmBoxC
             | post; that looks like it would address most of the
             | remaining UB risk, but Firefox will take some time getting
             | there.
             | 
              | [1]: https://kripken.github.io/blog/wasm/2020/07/27/wasmboxc.html...
        
         | IshKebab wrote:
         | WASM is always bounds checked and doesn't have UB. The C code
         | produced by wasm2c should be the same.
        
           | pjmlp wrote:
           | WASM doesn't do bounds checking inside linear memory
           | segments, good luck preventing corruption on data structures
           | stored on the same segment.
           | 
           | Unless the produced C code is validated against optimizations
           | across every single compiler version, there are no guarantees
           | of that actually being the case.
        
             | afiori wrote:
             | they specifically address this in the article; the
             | objective is to be able to treat the component as untrusted
             | code (the same as a website's js) and sanitize/check the
             | component's output (the same as web APIs called by js).
             | 
             | Corruption is not an issue when you assume the corruptee to
             | already be an attacker
        
             | masklinn wrote:
             | > WASM doesn't do bounds checking inside linear memory
             | segments, good luck preventing corruption on data
             | structures stored on the same segment.
             | 
             | Isn't the point that you don't care?
             | 
             | Each untrusted library is compiled to wasm then C then
             | native, they can corrupt their own datastructures but the
             | point is to prevent that corruption from escaping those
             | boundaries, or at least that's how I understand it.
        
               | azakai wrote:
               | Mostly that's the case, yes. The main benefit of wasm
               | sandboxing in this situation is to keep any exploit of
               | these libraries in the sandbox - no memory corruption
               | outside. That's a big improvement on running the same
               | code outside of a sandbox.
               | 
               | (But in general corruption inside the sandbox is
               | potentially dangerous too. You need to be careful about
               | what you do with data you get from the sandboxed code.
               | RLBox does help in that area as well.)
        
               | pjmlp wrote:
                | Just because corruption doesn't escape the sandbox
                | doesn't mean it isn't exploitable.
                | 
                | This is like attacking microservices: you have a
                | module that exposes a set of interfaces and produces
                | outputs when called with specific APIs.
                | 
                | For the sake of example, let's say you have an
                | authentication module that says whether a given user
                | id is root.
                | 
                | Now imagine producing a sequence of API calls that
                | triggers the side effect of _is_root(id)_ being true
                | for an id that is a plain user.
                | 
                | No sandbox escape took place, only corruption of
                | internal bookkeeping structures that leads
                | _is_root()_ to misbehave.
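A hypothetical C sketch of the failure mode described above (module, layout, and names all invented), with the sandbox's linear memory modeled as a flat byte array the way wasm models it: an unchecked copy inside the sandbox flips an adjacent privilege flag without any sandbox escape.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Everything lives in one linear-memory sandbox, so an unchecked
 * write in one structure can flip a security-relevant flag in the
 * neighboring one. (All addresses and names are invented.) */
static uint8_t mem[64];                       /* the sandbox's linear memory */
enum { USERNAME_ADDR = 0, USERNAME_CAP = 8,   /* 8-byte username buffer      */
       IS_ROOT_ADDR = 8 };                    /* adjacent privilege flag     */

/* A buggy sandboxed API: copies attacker-supplied data without
 * checking len against USERNAME_CAP. */
static void set_username(const uint8_t *name, size_t len) {
    memcpy(&mem[USERNAME_ADDR], name, len);   /* can spill into is_root */
}

static int is_root(void) { return mem[IS_ROOT_ADDR] != 0; }
```

Nothing here touches memory outside the sandbox; the corruption is entirely internal, yet the module's answers are no longer trustworthy.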
        
       | davidkunz wrote:
       | Can't wait for Firefox XP.
        
       | kgeist wrote:
       | I wonder how it deals with intrinsics for optimization (SSE and
       | the like), does it fail to compile, or maybe WASM has some
       | support, or it's completely lost in translation?
        
         | azakai wrote:
         | For something like SSE to work you'd need both wasm and wasm2c
         | to support it.
         | 
         | Wasm doesn't support all of SSE, but wasm does have SIMD
         | support which is a portable subset of common SIMD instructions.
         | You may lose some performance there, but wasm is adding more
         | instructions to help (see "relaxed-simd"). There are also
         | headers to help translate between SSE and wasm SIMD for
         | existing code where possible.
         | 
         | wasm2c does have support for wasm SIMD, although I believe it
         | is not 100% complete yet.
        
       | SubzeroCarnage wrote:
        | Firefox appears to utilize a custom clang toolchain to enable
        | this without documenting how to make such a toolchain (wasi
        | sysroot), and expects you to just download the precompiled
        | version from their servers.
       | 
       | Fedora and Fennec F-Droid have since disabled this feature.
       | 
       | https://src.fedoraproject.org/rpms/firefox/c/4cb1381d80a94c9...
       | 
       | https://gitlab.com/relan/fennecbuild/-/commit/12cdb51bb045c3...
        
         | fabrice_d wrote:
         | Pretty sure you can build it yourself from
         | https://github.com/WebAssembly/wasi-libc given that
         | https://github.com/WebAssembly/wasi-libc/commit/ad5133410f66...
         | is a contribution from a MoCo employee doing a lot of work
         | around toolchains.
        
           | floatboth wrote:
           | There's also the https://github.com/WebAssembly/wasi-sdk repo
           | which is kind of a meta-build-system for all this.
           | 
           | But in FreeBSD we build all the pieces directly, here's our
           | build recipes (with some hacks due to llvm's cmake code being
           | stupid sometimes):
           | 
            | compiler-rt (from llvm):
            | https://github.com/freebsd/freebsd-ports/blob/main/devel/was...
            | 
            | libc (from what you linked):
            | https://github.com/freebsd/freebsd-ports/blob/main/devel/was...
            | 
            | libc++ (from llvm):
            | https://github.com/freebsd/freebsd-ports/blob/main/devel/was...
        
       | rajanaccros wrote:
        | How does this affect process isolation? If only _some_ components
        | can be sandboxed at this fine-grained level, aren't we still
        | subject to process isolation to sandbox everything else? It would
        | seem like one still has to run _fission.autostart true_ to
        | isolate the components that cannot be compiled in this way,
        | therefore not gaining the benefit of less overhead as stated in
        | the article.
        
         | bholley wrote:
         | The purpose of RLBox is to add an extra layer of component-
         | level isolation on top of Firefox's process-based site-level
         | isolation. The reduced overhead is relative to the hypothetical
         | scenario in which we performed the component-level isolation
         | with processes (rather than WebAssembly).
        
           | rajanaccros wrote:
           | Ohh I see. Not a replacement for process based site level
           | isolation. I just wasn't wrapping my head around that. Makes
           | much more sense now. Thanks for the explanation.
        
             | majkinetor wrote:
              | However, without removing processes it will still be as
              | slow as it is today -- removing them is something I really
              | hope browsers will do.
        
               | ekr____ wrote:
                | I think a more likely way to think about it is that this
                | allows us to sandbox things that would otherwise not be
                | sandboxable. For a variety of reasons, it's probably not
                | practical to remove the existing process sandboxes.
        
       | jerheinze wrote:
       | > Cross-platform sandboxing for Graphite, Hunspell, and Ogg is
       | shipping in Firefox 95, while Expat and Woff2 will ship in
       | Firefox 96.
       | 
       | I wonder what the other "good candidates" that he referred to
       | are.
        
       | anonymousDan wrote:
       | What's the performance overhead in comparison to the unsandboxed
       | version I wonder?
        
         | azakai wrote:
         | There are detailed performance numbers here on a variety of
         | real-world codebases:
         | 
         | https://kripken.github.io/blog/wasm/2020/07/27/wasmboxc.html
         | 
         | tl;dr Something like ~14% when using the best bounds checking
         | strategy, or ~42% when using the most portable one. (There are
         | options in the middle as well.)
        
         | bholley wrote:
         | It varies. Here's an example of some performance analysis I did
         | on the expat port:
         | https://bugzilla.mozilla.org/show_bug.cgi?id=1688452#c37
        
       | Jyaif wrote:
       | The downside to this technique is that wasm2c code is 50% slower,
       | so (at least for now) process-isolation is still a win in some
       | cases (when the overhead of process-isolation is small compared
       | to the rest).
       | 
       | Still, that's a very exciting development that could lead to a
       | revolution in operating systems.
        
         | floatboth wrote:
         | 42% slower only in the worst case using the slowest (explicit)
         | bounds checks, only 12% slower with a signal handler:
         | 
         | https://kripken.github.io/blog/wasm/2020/07/27/wasmboxc.html
        
       | chakkepolja wrote:
        | I am probably missing something: Why is WASM required here? Can't
        | these analyses be done directly on LLVM IR?
        
         | glandium wrote:
         | LLVM IR is still CPU dependent, and is a moving target. WASM is
         | also a moving target, but much more controlled.
        
         | azakai wrote:
          | You're right that this could be done on LLVM IR. The MinSFI
          | project did basically exactly that several years ago, but
          | sadly it did not see adoption.
          | 
          | The benefit of wasm over LLVM IR is that wasm has already done
          | the work to define the sandboxed format and build the tooling
          | to compile to it. Wasm is also almost as fast as running
          | normally. (Wasm is also portable and lacks undefined behavior,
          | although for this use case those might matter less.)
         | 
         | See the MinSFI section here which compares it directly to wasm
         | for sandboxing:
         | 
         | https://kripken.github.io/blog/wasm/2020/07/27/wasmboxc.html
         | 
         | And the original MinSFI presentation is here:
         | 
         | https://docs.google.com/presentation/d/1RD3bxsBfTZOIfrlq7HzG...
        
         | ink404 wrote:
          | It could be; it seems like they already had something in WASM
          | for doing this, so it made more sense to use that than to redo
          | it on LLVM IR.
        
         | fooyc wrote:
          | That's what I thought too. C compilers should be able to
          | achieve that directly, and it's incredible that nobody thought
          | of doing so yet.
         | 
         | What's great, though, is that they are achieving this with
         | tools that are already available.
        
       | throw10920 wrote:
        | WebAssembly is kind of a hack here (although a clever hack that
        | saves a lot of effort) - the essence of what the Mozilla folks
        | have done isn't _WebAssembly_, it's a _trusted compiler_ - by
        | which I mean a compiler that emits trustable code, regardless of
        | how untrusted the source is. It's a really neat idea that I hope
        | to see more adoption of, because our current security models for
        | software _suck_.
       | 
       | Security based on process isolation is extremely inefficient and
       | coarse-grained - having a trusted compiler could (eventually)
       | _massively_ increase performance by removing processes entirely
       | (no more virtual memory! no more TLB flushes and misses! less
        | task switch overhead!) and eliminating the kernel/user mode
       | separation, with an _increase_ in security.
       | 
       | "Could" because it's not clear to me if the reduction in
       | expressiveness from our languages now to future languages with a
       | theoretical trusted compiler (all jump targets have to be known
       | at compile-time?) will be accepted by the majority of the
       | populace. Look at how hard it is to get people to accept borrow-
       | checkers...
        
         | [deleted]
        
         | formerly_proven wrote:
         | In 2021, the world finally achieves AS/400 on the web.
        
           | wffurr wrote:
           | Can you expand on that? Was this a property of C compilers or
           | other languages on IBM mainframes?
           | 
           | I get that it's tongue in cheek, but it would probably be
           | even funnier / ironic if I had more context to understand it.
           | 
           | It's the same spirit as languages adopting functional
           | programming techniques aka rediscovering Lisp.
        
             | formerly_proven wrote:
             | AS/400 had a machine-independent binary format which was
             | translated ahead of execution by the system's specific
             | compiler into machine code, and all applications ran in the
             | same address space with zero memory protection because the
             | code generated by the compiler ensured isolation.
        
         | vanderZwan wrote:
         | > _a compiler that emits trustable code, regardless of how
         | untrusted the source is_
         | 
         | Hasn't this been one of the goals of the design of WebAssembly
         | since day one, and something that has been getting people
         | excited about it too? Using something for one of its intended
         | purposes isn't really a "hack", no?
        
           | IshKebab wrote:
           | Sort of. The main goal for WebAssembly is to be a fast
           | platform-agnostic compilation target that you can use in
           | websites. The "in websites" bit means that it has to be
           | completely safe (i.e. no accessing outside memory etc.) but
           | it's not the main goal.
           | 
            | This _is_ a bit of a hack because Mozilla don't care about
           | the platform-agnostic bit, so they're taking LLVM IR,
           | compiling it to WebAssembly, then back to LLVM IR just so
           | that they can ensure that the code is safe.
           | 
           | WebAssembly comes with extra constraints that you probably
           | don't care about if you're compiling to native (e.g. there's
           | no 'goto') so you would get more efficient code (and probably
           | faster compilation) if you just had some way of compiling
           | LLVM IR directly to "safe binary".
           | 
            | That would probably be a mountain of work though, so it's
            | understandable why they went with this. Would be nice if they
            | said how much the performance was impacted.
        
           | oefrha wrote:
            | The hack is translating wasm back to C, then compiling again.
            | It wouldn't be a hack if they were running wasm, but they're
            | not.
        
             | masklinn wrote:
             | AFAIK the ability to compile wasm to native code was pretty
             | much always part of the goal, running wasm in a VM was
             | never the end-game.
             | 
             | Compiling _via C_ might be considered a hack (especially in
             | the sense that it introduces a potential weak link in the
             | chain), but it makes a lot of sense since they want to
             | integrate the result into the Firefox build artefacts, and
             | compiling via C is not exactly novel either.
        
         | madflame991 wrote:
         | > having a trusted compiler could (eventually) massively
         | increase performance by removing processes entirely (no more
         | virtual memory! no more TLB flushes and misses! less task
         | switch overhead!) and eliminating the kernel/user mode
         | separation
         | 
         | I saw a talk a while ago that was advocating for the same
         | thing, except this was about JS and not webassembly. I can't
         | find it tho - I remember it being related to the WAT js talk;
         | It also mentioned that it would eliminate rings on the cpu (and
         | simplify cpus) and context switches which would make execution
         | faster; they were citing some MS research on the matter - damn
         | I really wanna find the talk now...
         | 
         | Edit: https://www.destroyallsoftware.com/talks/the-birth-and-
         | death...
         | 
         | thanks BoppreH
         | 
         | MS research: "Hardware-based isolation incurs nontrivial
         | performance costs (up to 25-33%) and complicates system
         | implementations" (virtual memory and protection rings); I think
         | MS knows what they're talking about here
        
           | SigmundA wrote:
            | Singularity was an experimental OS written in a variant of
            | C# and .NET managed code by MS Research that ran using
            | software-isolated processes rather than hardware isolation;
            | this is probably what they were referencing:
           | 
           | https://en.wikipedia.org/wiki/Singularity_(operating_system)
        
             | kaba0 wrote:
             | http://joeduffyblog.com/2015/11/03/blogging-about-midori/
             | 
             | There is also a really great blog about Singularity's
             | "rebirth" experimental OS, Midori, that continued in its
             | footsteps.
        
           | [deleted]
        
           | throw10920 wrote:
           | Thanks for the link. I would argue that a true trusted
           | compiler needs to accept an unmanaged language and emit code
           | without a runtime, though. A runtime is cheating, because you
           | can always make one that implements an iron-clad sandbox that
           | doesn't require processes...by implementing a (very slow) VM.
           | 
            | To put it another way - I don't think that security or
            | performance are that hard to achieve on their own - the hard
            | part is getting _both at once_. And then, adding
            | expressiveness on top is even more difficult, as Rust has
            | aptly demonstrated.
        
             | kaba0 wrote:
             | Rust is not secure at all in the sense used here --
             | untrusted, arbitrary user code written in rust is a
             | security threat.
        
         | titzer wrote:
         | > it's a trusted compiler
         | 
         | Sorry I have to quibble here, but this term is already a thing,
         | and typically has the opposite connotation: a compiler that
          | _must be trusted_ because we cannot verify the output. It's
          | trusted because we "trust" it (to not screw up).
          | 
          | I would argue that this makes the compiler _untrusted_ -- we
         | don't care what it does, whatever it outputs is going to be
         | both statically and dynamically verified to not break the
         | sandbox properties.
         | 
         | > Security based on process isolation is extremely inefficient
         | and coarse-grained - having a trusted compiler could
         | (eventually) massively increase performance by removing
         | processes entirely (no more virtual memory! no more TLB flushes
         | and misses! less task switch overhead!) and eliminating the
         | kernel/user mode separation, with an increase in security.
         | 
         | I thought this right up until, well, about this time in 2017.
         | Side-channel attacks are a real and bad thing. Our conclusion
          | is that Spectre, in all its flavors, breaks confidentiality for
         | in-process memory, regardless of sandboxing technology. On
         | current hardware, there is no 100% bulletproof way to enforce
         | isolation.
        
         | zozbot234 wrote:
         | We will always need process boundaries to separate information
         | domains, because side-channel vulnerabilities are not addressed
         | by having a "trusted" compile step.
        
           | throw10920 wrote:
           | When it comes to side-channel attacks, process boundaries
           | aren't adequate either. Rowhammer, Meltdown/Spectre, and
           | friends show that handily, and the RSA key leakage attack
           | using SDR shows that even machine isolation isn't going to be
           | enough for some things.
           | 
           | I guess that the idea that trusted compilers are the way
           | forward is predicated on the assumption that we've managed to
           | mitigate most/all side-channel attacks, because there really
           | isn't much you can do about those otherwise.
        
           | lisper wrote:
           | Why can't a trusted compiler prevent side-channel attacks?
           | All you need to do is prevent the code from accessing the
           | side channel. It seems to me that doing this at compile time
           | would actually be easier than doing it at run time.
        
             | titzer wrote:
             | A key side-channel is execution time, and no, in general,
             | you can't prevent a program from getting a clock. Even
             | without a clock, one can construct one easily using shared
             | memory and threads. Clocks are also easy to find. Even with
             | low resolution clocks, timing differences can be amplified
             | programmatically, making them observable.
        
             | sfink wrote:
             | It _is_ easy. You just have to forbid access to timers and
             | loops. Where  "timers" include anything that can count and
             | store the count in shared memory.
             | 
             | Alternatively, you could forbid branches (and therefore
             | loops, implicitly).
        
               | leni536 wrote:
               | If you can use a loop for timing then you can use an
               | unrolled loop for timing too.
        
       | vlovich123 wrote:
       | I can't find it right now but I read a paper that showed that the
       | WASM security model is weaker than native compiled code in some
       | cases. For example, due to compiler and OS hardening techniques,
       | exploits of a libpng flaw weren't exploitable unless run in WASM.
        | You couldn't escape the WASM sandbox but the application itself
       | could be compromised.
       | 
       | I'm sure that this approach is valid as a hardening measure but
       | some of the enthusiasm in the post is perhaps worthy of
       | temperance. This thunk through WASM can't protect against runtime
       | heap overflows and such.
       | 
       | > However, the transformation places two key restrictions on the
       | target code: it can't jump to unexpected parts of the rest of the
       | program, and it can't access memory outside of a specified region
       | 
        | Oof. The paper I recall specifically called these out as not
        | enforceable. The libpng example in the paper directly had an
        | external request make libpng corrupt and access WASM memory
        | other than what it owned (in this model it would be other
        | in-process native memory, I think, or at least other code placed
        | within the same heap region -- unless each component gets its
        | own, which then means you need a fixed memory allocation
        | upfront...).
        
         | miloignis wrote:
         | Are you perhaps thinking of "Everything Old is New Again:
         | Binary Security of WebAssembly"? (
         | https://www.usenix.org/system/files/sec20-lehmann.pdf )
         | 
         | In any case, I think you've misunderstood the security
         | properties. WASM can have weaker security _within_ the sandbox
          | because it doesn't have access to some of the more
         | sophisticated mitigation measures that native code does, but
         | the security of the sandbox boundary itself is _very solid_.
         | 
         | The part of the article that you quote is accurate in the sense
         | that I believe it was meant - the code cannot jump to
         | unexpected parts of _the rest_ of the program (outside the
         | sandbox) and cannot access memory outside of a specified region
         | (the sandboxed memory). A vulnerability might allow the target
         | code to jump to somewhat unexpected parts _inside_ the sandbox,
         | or buffer overflows _inside_ the sandbox, but not outside.
         | 
         | As such, it's actually a really effective application of a WASM
         | sandbox!
        
           | pjmlp wrote:
           | Triggering a fire inside the castle might be enough to change
           | the output of calls being done into the sandbox, it doesn't
           | need to escape it to be exploitable.
        
             | miloignis wrote:
             | They're addressing this as well - from the article:
             | 
             | > This, in turn, makes it easy to apply without major
             | refactoring: the programmer only needs to sanitize any
             | values that come from the sandbox (since they could be
             | maliciously-crafted), a task which RLBox makes easy with a
             | tainting layer.
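RLBox proper is a C++ template library whose tainted<T> wrapper types make the compiler force this sanitization; the following is only a minimal C sketch of the underlying discipline, with invented names: every value crossing out of the sandbox is validated before it is used against host memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the "sanitize everything from the sandbox" discipline
 * (function and names invented; RLBox enforces this via tainted<T>
 * types in C++). A length reported by sandboxed code must be checked
 * before it indexes a host-side buffer, because the sandboxed code
 * may have been compromised and could report anything. */
static int sanitize_len(uint32_t untrusted_len, size_t cap, size_t *out) {
    if (untrusted_len > cap)
        return 0;            /* reject a maliciously large value */
    *out = untrusted_len;    /* now safe to use against a cap-sized buffer */
    return 1;
}
```

The point of the tainting layer is that forgetting this check becomes a compile error rather than a latent vulnerability.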
        
               | pjmlp wrote:
               | Programmer only needs to sanitize....
               | 
               | You mean like all those great programmers that keep
               | introducing bugs like the one recently found out by
               | Project Zero?
        
           | titzer wrote:
           | > WASM can have weaker security within the sandbox because it
           | doesn't have access to some of the more sophisticated
           | mitigation measures that native code does,
           | 
           | And that's mostly read-only data pages. The primary blocker
           | there is how to integrate that capability with ArrayBuffer in
           | the web platform, since Wasm memories can be exposed (or
           | aliased) as ArrayBuffer objects, and most engines aren't
           | prepared to encourage non-writable holes in ArrayBuffers.
        
         | Deukhoofd wrote:
         | > WASM security model is weaker than native compiled code in
         | some cases. For example, due to compiler and OS hardening
         | techniques, exploits of a libpng flaw weren't exploitable
         | unless run in WASM.
         | 
         | If I understand the post correctly, it's still native compiled
         | code in the end, and it won't run through WASM. The goal of the
         | approach sounds more like a code sanitizer tool to ensure the
         | external library they're using isn't making calls outside of
          | it, or requesting memory beyond the region it's given.
        
       | Deukhoofd wrote:
       | So if I'm understanding this correctly, Firefox compiles its
       | dependencies as WASM, effectively blocking function calls to
       | things it shouldn't and illegal memory access, and then
       | translates it back to C so it can compile it normally? Sounds
       | neat!
        
         | fyrn- wrote:
         | Not back into C, from WASM to native executable / asm / object
         | code
        
           | flohofwoe wrote:
           | The solution described in the post actually translates C/C++
           | to WASM, and then translates the WASM bytecode back to C (via
           | a tool called wasm2c), which is then fed back into the C
           | compiler again to compile to native code, all 'offline' in
           | the Firefox build process.
        
           | bholley wrote:
           | Deukhoofd is correct -- we compile the WASM code back into C
           | in order to reuse and reduce friction with our existing
           | compilation pipeline.
        
           | Hendrikto wrote:
           | That was only for the prototype, the current implementation
           | is:
           | 
           | source code -> WASM -> C -> native code
        
       | FpUser wrote:
        | I am confused. Isn't WASM supposed to be eventually AOTed (Ahead
        | of Time Compiled) or at least JITed? Why this bizarre twist with
        | WASM-C-NATIVE? The browser should just do that instead of these
        | dances around it.
        
         | floatboth wrote:
         | This isn't for running WASM _from the web_. This has nothing
          | whatsoever to do with the WASM JIT that's in Spidermonkey.
         | This is sandboxing for internal components of the browser (or
         | any application really). But, this _is_ a kind of AOT
         | compilation involving WASM in the middle.
        
       | mfrw wrote:
        | This sandboxing is achieved via RLBox[0], which is a toolkit for
        | sandboxing third-party libraries. It comprises a WASM sandbox
        | and an API which existing applications can leverage. See the
        | research paper[1].
       | 
       | [0]: https://plsyssec.github.io/rlbox_sandboxing_api/sphinx/
       | 
       | [1]: https://arxiv.org/abs/2003.00572
        
       | paulgdp wrote:
       | How is it different from using Clang's CFI (control flow
       | integrity)?
       | 
       | I thought this was the same technique used in webassembly.
       | 
       | Chromium is using this too, I think.
        
         | azakai wrote:
         | CFI helps with control flow exploits, but it doesn't prevent
         | memory corruption for example.
         | 
         | This sandboxing technique ensures that both control flow and
         | memory accesses remain in the sandbox (except for when you
         | explicitly allow otherwise).
        
       | Jyaif wrote:
       | > it can't access memory outside of a specified region
       | 
       | How are segmentation faults handled?
        
         | azakai wrote:
         | Wasm is defined to trap when it accesses memory outside the
         | sandbox (the embedder can decide how to handle that trap, say
         | by shutting down that particular sandbox).
         | 
         | With wasm2c the trapping can be implemented in a variety of
         | ways, for example using the signal handler trick like wasm VMs
         | do (~14% overhead) or manual bounds checks (~42% overhead, but
         | fully portable).
        
           | bholley wrote:
           | I believe the implementation in Firefox masks off the high
           | bits of pointers and adds the result to the base address
           | before performing a load/store. This requires us to reserve a
           | power-of-two-sized region of address space, but we can
           | lazily/incrementally commit the pages as the sandboxed code
           | invokes sbrk.
        
             | azakai wrote:
             | Thanks for the details bholley!
             | 
             | Do you plan to use the signal handler trick eventually?
             | Less portable but in my tests it shrinks the total overhead
             | by half (from masking's 29% to 14%).
        
       | jayd16 wrote:
       | Reminds me a bit of Apple's AOT protections together with Unity's
       | IL2CPP approach.
        
       | makeworld wrote:
       | I don't have a lot of knowledge in this area, but using WASM for
       | forcing code to be safe seems bizarre. Why aren't there just
       | compiler flags that can enforce the same restrictions they want?
        
         | masklinn wrote:
         | > Why aren't there just compiler flags that can enforce the
         | same restrictions they want?
         | 
         | Because they want to compile arbitrary code in order to sandbox
         | it.
         | 
         | The alternative is something like eBPF, but that imposes a
         | limited subset of the source language, which would be unlikely
         | to work with something like a video decoder.
        
         | 7373737373 wrote:
         | Why is it bizarre? The Wasm function interface seems perfect
         | for sandboxing code. It's a "whitelist" system: the contained
         | code can only call external functions that have been
         | explicitly attached, which is perfect for implementing the
         | capability security paradigm and progressively hollowing out
         | the attack surface by separating functionality into several
         | instances.
        
           | Deukhoofd wrote:
           | Requiring compilation to WASM to then translate it back to C
           | and compile it again might be a bit strange. Clang obviously
           | already has the tools to do the WASM sanitizing, it might be
           | really cool to have a way to directly enforce those rules
           | outside of WASM.
        
             | bilkow wrote:
             | As I understand it, clang doesn't have the tools to
             | sanitize WASM. It just emits WASM, which, malicious or
             | benevolent, can't access memory outside its designated
             | memory regions.
             | 
             | It's wasm2c's job to ensure that the generated C enforces
             | the WASM memory rules, so I'd say the one sanitizing the
             | code is not clang but wasm2c.
        
         | the_duke wrote:
         | Webassembly is much more restricted than regular machine code.
         | 
         | It's a stack machine with a limited set of operations, no
         | direct control over the stack/control flow and restricted
         | access to memory.
         | 
         | It's way easier to compile this limited set of operations to
         | assembly (or C) that is guaranteed to not do things it
         | shouldn't.
        
         | IshKebab wrote:
         | You definitely _could_ do that. It would just be a ton of work
         | and nobody has done it.
        
           | deian wrote:
           | It is doable, but it's hard to make it fast on all
           | platforms.
           | See the SegmentZero32 description in <https://cseweb.ucsd.edu
           | /~dstefan/pubs/kolosick:2022:isolatio...> for an example
           | prototype.
        
         | azakai wrote:
         | For technical reasons adding compiler flags to do that is
         | fairly hard. You'd need to handle a lot of things like
         | compiling to the sandboxed format, system library support, the
         | FFI to normal code, etc. It would be possible to do all that,
         | but wasm has already done it - so compiling to wasm as an
         | intermediary step is the most practical solution.
         | 
         | (See also https://news.ycombinator.com/item?id=29460766)
        
         | bholley wrote:
         | Beyond the reasons others have mentioned, another key issue is
         | that this isn't a transparent transformation. The sandboxed
         | code can only access memory within a restricted subregion,
         | which often requires some small code changes on both sides of
         | the boundary (for example, copying input data into that memory
         | region so that sandboxed code can operate on it).
         | 
         | So implementing this in the compiler would entail some fairly
         | involved handshaking between the code and the compiler beyond
         | the normal scope of C/C++. Doing this in a library instead --
         | and leaning on a well-understood and well-studied execution
         | model -- makes everything a bit more natural to work with.
        
         | Tobu wrote:
         | NaCl (Native Client) sort of did this, but through an
         | entirely
         | separate toolchain. It's not an easy task.
        
         | gostsamo wrote:
         | This looks more like using the compilation to wasm and back to
         | automatically rewrite the code of entire components in a manner
         | that makes them safer.
        
       | sdze wrote:
       | What a stupid idea to run bytecode in a browser.
       | 
       | We have gaping security holes with JavaScript already.
       | 
       | Stop the madness.
        
         | Jyaif wrote:
         | It's compiling the webasm back to C, so it's not running
         | bytecode.
        
       | BoppreH wrote:
       | Getting strong vibes of The Birth and Death of JavaScript (2014)
       | [1], one of the numerous great talks by Gary Bernhardt.
       | 
       | My engineer side is happy seeing how strong tooling enables such
       | creative features with high assurances.
       | 
       | My futurist side is dreading the day Intel launches their first
       | Javascript/WebAssembly-only processor.
       | 
       | [1] https://www.destroyallsoftware.com/talks/the-birth-and-
       | death...
        
         | MangoCoffee wrote:
         | I don't think JavaScript is going to die, but it's time we
         | had another option for the web. JavaScript has its warts.
         | Some people love JavaScript and some don't. It's not fair
         | for JavaScript to be the only option. I see WebAssembly as
         | an option for people who don't like JavaScript's warts to
         | use their favorite language to develop for the web.
        
           | Omnius wrote:
           | I feel like the only people that "like" JavaScript are
           | those that had it as their first language. It's needed,
           | and it's better than it was, but compared to just about
           | any other language it's a total mess.
        
             | k__ wrote:
             | My career was C, Java, PHP, JavaScript.
             | 
             | I like JS the most.
             | 
             | It's flexible, lightweight, and omnipresent.
             | 
             | The only other mainstream language that gives me that
             | feeling is Rust.
        
               | allisfalafel wrote:
               | Rust is hardly omnipresent. I understand it has
               | trouble with lesser-used architectures and operating
               | systems. While yes, you probably don't use them, they
               | do still exist.
        
         | lelandfe wrote:
         | Brilliant, hilarious talk, thanks for linking.
        
         | mmastrac wrote:
         | ARM already has Java and JavaScript extensions in their CPUs,
         | so that day isn't completely off the horizon yet.
         | 
         | I'm not even sure it would be a terrible idea, as we'd have a
         | very interesting JS/WASM-like set of opcodes that we could
         | target with _any_ compiler.
        
           | hajile wrote:
           | The "Javascript instruction" is a bit of a misnomer.
           | 
           | JS accidentally got part of the x86 execution model for float
           | conversion baked into the spec. ARM added an instruction to
           | mimic the old x86 one. It's potentially useful in some other
           | contexts too.
        
             | mmastrac wrote:
             | Regardless, FJCVTZS is still literally a "Javascript"
             | instruction: "Floating-point Javascript Convert to Signed
             | fixed-point, rounding toward Zero".
        
       | dmix wrote:
       | Here is an example of sandboxing a library and then calling
       | functions:
       | rlbox::rlbox_sandbox<rlbox_noop_sandbox> sandbox;
       | sandbox.create_sandbox();
       | sandbox.invoke_sandbox_function(hello);
       | 
       | https://github.com/PLSysSec/rlbox_sandboxing_api/blob/master...
       | 
       | Seems like it could get a bit verbose when used all over the
       | place, but I guess there's always a cost to security, and
       | having clearly defined risky parts also helps. Regardless,
       | I'm happy to see the effort being made beyond process
       | isolation and OS capabilities.
        
       | kevincox wrote:
       | This is a really powerful tool and I hope we see this used more.
       | Traditional process based sandboxing is very efficient inside the
       | process, but IPC is very expensive. This approach flips the
       | tradeoffs exactly backwards as the sandboxed code is slower, but
       | IPC is nearly free. This means that it can cover exactly the
       | space that was too expensive to sandbox before. The two
       | approaches are perfect complements for each other. I now
       | imagine
       | that the vast majority of code can be put into one of these two
       | groups leaving very little code that is unable to be sandboxed
       | for performance reasons.
        
       ___________________________________________________________________
       (page generated 2021-12-06 23:00 UTC)