[HN Gopher] Show HN: (bits) of a Libc, Optimized for Wasm
___________________________________________________________________
Show HN: (bits) of a Libc, Optimized for Wasm
I make a no-CGO Go SQLite driver, by compiling the amalgamation to
Wasm, then loading the result with wazero (a CGO-free Wasm
runtime). To compile SQLite, I use wasi-sdk, which uses wasi-libc,
which is based on musl.

It's been said that musl is slow(er than glibc), which is true, to
a point. musl uses SWAR on a size_t to implement various functions
in string.h. This is fine, except size_t is just 32-bit on Wasm.

I found that implementing a few of those functions with Wasm
SIMD128 can make them go around 4x faster. Other functions don't
even use SWAR; redoing _those_ can make them 16x faster.

Smooth sort also has trouble pulling its own weight; a Shell sort
seems both simpler and faster, while similarly avoiding recursion,
allocations and the addressable stack.

I found that using SIMD intrinsics (rather than SWAR) makes it
easier to avoid UB, but the code would definitely benefit from more
eyeballs.
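
To make the SWAR point concrete, here is a minimal sketch in portable C
of a word-at-a-time scan over 32-bit words, which is what a
size_t-based loop amounts to on Wasm. It is not musl's code: the name
swar_strnlen32 is hypothetical, and it takes an explicit length so
that it never reads past the buffer, which sidesteps the over-read and
aliasing hazards that tend to make SWAR string code easy to get wrong.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x01010101u
#define HIGHS 0x80808080u

// Nonzero if and only if at least one of the four bytes in w is zero.
static int has_zero_byte(uint32_t w) {
    return ((w - ONES) & ~w & HIGHS) != 0;
}

// Hypothetical helper: length of s, looking at no more than n bytes.
size_t swar_strnlen32(const char *s, size_t n) {
    size_t i = 0;
    // Word loop: test 4 bytes per iteration while a full word remains.
    while (n - i >= sizeof(uint32_t)) {
        uint32_t w;
        memcpy(&w, s + i, sizeof w); // load without alignment/aliasing UB
        if (has_zero_byte(w)) break; // terminator is somewhere in this word
        i += sizeof w;
    }
    // Byte loop: pin down the terminator, or finish the tail.
    while (i < n && s[i] != '\0') i++;
    return i;
}
```

Each iteration of the word loop covers 4 bytes, whereas a 128-bit SIMD
loop would cover 16, which may be part of why the post sees roughly 4x
from switching.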
See this for some benchmarks on both x86-64 and Aarch64:
https://github.com/ncruces/go-sqlite3/actions/runs/145169318...
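
For the sorting remark in the post, a Shell sort with a qsort-style
signature can be sketched as below. This is an illustration only, not
the code from the linked repository: the names shell_sort and cmp_int
are hypothetical, and the simple gap-halving sequence is an assumption
(the post does not say which gaps are used). It shows the properties
mentioned: no recursion, no allocation, and only a few scalar locals.

```c
#include <stddef.h>

// Swap two elements of `size` bytes, one byte at a time.
static void swap_bytes(unsigned char *a, unsigned char *b, size_t size) {
    while (size--) {
        unsigned char t = *a;
        *a++ = *b;
        *b++ = t;
    }
}

// Hypothetical qsort-compatible Shell sort using gaps n/2, n/4, ..., 1.
void shell_sort(void *base, size_t nmemb, size_t size,
                int (*cmp)(const void *, const void *)) {
    unsigned char *a = base;
    for (size_t gap = nmemb / 2; gap > 0; gap /= 2) {
        // Gapped insertion sort: sift element i down in steps of `gap`.
        for (size_t i = gap; i < nmemb; i++) {
            for (size_t j = i; j >= gap; j -= gap) {
                unsigned char *hi = a + j * size;
                unsigned char *lo = hi - gap * size;
                if (cmp(lo, hi) <= 0) break; // already in order
                swap_bytes(lo, hi, size);
            }
        }
    }
}

// Example comparator for int elements.
int cmp_int(const void *x, const void *y) {
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}
```

It is called the same way as qsort, for example
shell_sort(v, n, sizeof v[0], cmp_int) for an int array v of n elements.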
Author : ncruces
Score : 46 points
Date : 2025-04-18 18:06 UTC (4 hours ago)
(HTM) web link (github.com)
| phickey wrote:
| This looks like a nice approach to making wasi-libc faster. Could
| you submit these changes upstream?
| ncruces wrote:
| I'd like to be a _little_ more sure that I'm not totally
| messing things up _before_ doing that, but yes, eventually,
| that would be a nice outcome.
|
| I've also only really tested wazero. I can't know for sure that
| this is a straight improvement for other runtimes and
| architectures.
|
| For instance, the code delays using wasm_i8x16_bitmask as much
| as possible, because on Aarch64 it can be slower than not using
| SIMD at all, whereas it's plenty fast on x86-64.
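
A minimal sketch of what "delaying the bitmask" in the comment above
could look like, assuming Clang's wasm_simd128.h intrinsics. The
function name sketch_memchr is hypothetical and this is not the code
from the repository. The loop first asks the cheaper question, whether
any lane matched, and only calls wasm_i8x16_bitmask once a match is
known to be in the block. When not compiled for Wasm with SIMD enabled,
only the scalar loop is built.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__wasm_simd128__)
#include <wasm_simd128.h>
#endif

// Hypothetical memchr-like function; reads exactly n bytes, never more.
void *sketch_memchr(const void *buf, int c, size_t n) {
    const unsigned char *p = buf;
    const unsigned char ch = (unsigned char)c;
#if defined(__wasm_simd128__)
    const v128_t needle = wasm_i8x16_splat((int8_t)ch);
    while (n >= sizeof(v128_t)) {
        // Compare 16 bytes at once; matching lanes become all ones.
        v128_t eq = wasm_i8x16_eq(wasm_v128_load(p), needle);
        // Cheap test first: did any lane match at all?
        if (wasm_v128_any_true(eq)) {
            // Only now extract the bitmask to locate the first match.
            uint32_t mask = (uint32_t)wasm_i8x16_bitmask(eq);
            return (void *)(p + __builtin_ctz(mask));
        }
        p += sizeof(v128_t);
        n -= sizeof(v128_t);
    }
#endif
    // Scalar tail (and the whole loop on non-Wasm builds).
    for (; n > 0; p++, n--) {
        if (*p == ch) return (void *)p;
    }
    return NULL;
}
```

With this shape the bitmask runs at most once per call rather than once
per 16-byte block, which is the trade-off the comment describes for
Aarch64 hosts.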
| phickey wrote:
| The maintainers of wasi-libc are some of the best people to
| review this, and I don't think it would be wasting their time
| to ask them to look at a PR.
| ncruces wrote:
| A PR is a significant investment from me. I'd have to
| figure out where something like this is supposed to fit,
| how the build infra works, etc.
|
| One of the nice things about Go is how much that's a solved
| issue out of the box, compared to almost everything else;
| certainly compared to C.
|
| Pinging them in an issue:
| https://github.com/WebAssembly/wasi-libc/issues/580
| nu11ptr wrote:
| It is still a bit early, but I'm majorly bullish on WASM for
| multiple use cases:
|
| 1. Client side browser polyglot "applets" (Java applets were
| ahead of their time IMO)
|
| 2. Server side polyglot "servlets" (Node.js, embedded runtimes,
| etc.)
|
| 3. Language interop/FFI (Lang A -> WASM -> Lang B, like wasm2c)
|
| Why is #3 so interesting? The hardest thing in language
| conversion is the library calls. WASI standardizes that, so all
| the proprietary libs will eventually compile down to WASI as a
| sort of POSIX/libc like layer. In addition, WASM standardizes
| calling convention. The resulting new source code may not look
| like much, but it will solve the FFI calling
| convention/marshalling/library issues nicely.
| frumplestlatz wrote:
| I'm not sure how it solves the FFI problem. Lowest common
| denominator calling conventions don't make it any easier to
| bridge languages than it already is.
|
| C calling conventions are already the standard for FFI in
| native code, and that means dropping down to what can be
| expressed in C if you want to cross that boundary.
| ncruces wrote:
| As far as Go is concerned, the Wasm sandbox makes the
| (addressable, C) stack explicit, which solves at least _some_
| of the issues CGO has to deal with.
|
| It's not a panacea, though; it introduces other issues.
| fuhsnn wrote:
| Wasm intrinsics look neat as a higher-level fixed-size SIMD
| abstraction. I wonder how well compilers can do when using them
| for AOT targets with libraries like simd-everywhere.
|
| string.h is missing strstr(); there's an algorithm of similar
| complexity you might consider:
| http://0x80.pl/notesen/2016-11-28-simd-strfind.html
| ncruces wrote:
| Yeah, so far I did _exactly_ the ones (my build of) SQLite
| needed and not others.
|
| If there's interest, the set of implemented functions can
| definitely be extended.
___________________________________________________________________
(page generated 2025-04-18 23:00 UTC)