[HN Gopher] WebAssembly: Adding Python support to WASM language ...
___________________________________________________________________
WebAssembly: Adding Python support to WASM language runtimes
Author : assambar
Score : 134 points
Date : 2023-01-30 16:02 UTC (1 day ago)
(HTM) web link (wasmlabs.dev)
(TXT) w3m dump (wasmlabs.dev)
| brrrrrm wrote:
| the issue right now with Python support in WASM (at least for
| machine learning, the main driver of the language) is that Python
| is largely a wrapper language and none of the utilities that make
| it so powerful (numpy, PyTorch, JAX) work particularly well in wasm,
| since it's so limited performance-wise (no FMA, no GPU support).
|
| I'm excited for pairing wasm with WebGPU, which will likely
| unblock these projects from building support for the
| web/untrusted ecosystem. A useful project would be one that makes
| this integration really easy to build today and a flip of the
| switch to turn on in the future.
| mjw1007 wrote:
| I've come across this notion that nowadays machine learning
| provides (in some sense) the biggest group of Python users a
| few times recently.
|
| What reason is there to suppose this is true? It seems
| surprising to me.
| claytonjy wrote:
| It's really hard to do much ML in anything _except_ python.
| Virtually everyone improving the ML ecosystems of other
| languages got their start in Python and is knowingly
| competing with Python (e.g. R, Julia). If you want to get
| started in ML today, python is the obvious easiest path
| forward.
|
| So, most ML users are python users. I don't know how that
| group compares to non-ML python users, but I have a feeling
| there isn't a flood of eager new Django devs the way there is
| of PyTorch users. Most non-ML things you could do with python
| can be done similarly well in Go/Rust/Typescript, but there's
| no other option for most ML stuff.
| mjw1007 wrote:
| I found a recentish (2021) survey at [1] which suggests
| that in 2021 ML was some way behind web development,
| sysadmin stuff, and data analysis among Python users (and
| didn't seem to be on the way up the list).
|
| [1] https://lp.jetbrains.com/python-developers-survey-2021/#Gene...
| claytonjy wrote:
| Great source; looks like I've quite underestimated the
| python-web-dev crowd's size.
|
| I'm curious what the longer-term trends look like; not
| much change between consecutive years.
|
| Data analysis is basically a pre-requisite for ML, so the
| combined "data stuff" usage is quite a lot bigger than
| web dev usage!
| _visgean wrote:
| > What reason is there to suppose this is true? It seems
| surprising to me.
|
| One reason is that it's just super easy for input/output
| operations. ML is all about data, and getting the data to the
| right place is really easy in Python compared to some other
| languages.
| still_grokking wrote:
| Which languages?
|
| Python is OOP; but the "classical" data-centric languages
| are actually all more or less in the FP space. (I count
| array languages and APL-likes as FP in this case).
|
| Just an example: You don't have immutable data types by
| default in Python. This is actually a pretty bad default
| for data processing tasks.
| m00dy wrote:
| I have integrated pyodide + webgpu recently (you can do matmul
| using webgpu's compute pipeline). The real problem is that
| browser tabs have a 4 GB max memory size, so training neural
| networks on this stack is almost impossible. (I don't even
| want to mention PyTorch's dependency hell.)
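|
| For context, the 4 GB figure matches the ceiling of a 32-bit wasm
| linear memory: 64 KiB pages, at most 65,536 of them. A
| back-of-the-envelope sketch of what that leaves for training; the
| factor of 4 buffers per parameter (weights, gradients, two
| optimizer moments) is an assumed rule of thumb and ignores
| activations entirely:

```python
WASM_PAGE_BYTES = 64 * 1024   # page size of a wasm linear memory
MAX_PAGES_WASM32 = 65_536     # 2**32 bytes / 64 KiB per page


def max_linear_memory_bytes() -> int:
    """Largest linear memory addressable with 32-bit indices."""
    return WASM_PAGE_BYTES * MAX_PAGES_WASM32


def max_trainable_params(bytes_per_param: int = 4, copies: int = 4) -> int:
    """Upper bound on parameters if `copies` buffers are kept for each one.

    The defaults (float32, 4 buffers) are illustrative assumptions.
    """
    return max_linear_memory_bytes() // (bytes_per_param * copies)


if __name__ == "__main__":
    print(max_linear_memory_bytes())  # 4294967296
    print(max_trainable_params())     # 268435456
```

| Under those assumptions the cap is roughly 268 million float32
| parameters before any activations or input data are stored.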
| miohtama wrote:
| WebAssembly Memory64 is coming
|
| https://webassembly.org/roadmap/
| brrrrrm wrote:
| My claim is that it's not easy, not that it's impossible.
| There's little incentive to hack in JavaScript or maintain a
| Pyodide-compatible build. The 4 GB limit isn't a technical
| limitation, just a standards thing (it could change easily).
| c120 wrote:
| So for someone who has python installed locally, what's the
| point?
|
| Is it just the sandbox or is there anything else I'm missing?
| kasajian wrote:
| It's not for someone who only runs Python locally.
| angelmm wrote:
| You get an extra layer of isolation, even at your development
| environment level.
|
| I remember a Node.js CVE that was caused by a poisoned
| dependency. It was affecting people when downloading it from
| npm.
|
| There's still a gap here to cover, but the benefits may be
| worth it :)
| ElectricalUnion wrote:
| I don't see how this would in any way prevent you from being
| affected by an equivalent poisoned PyPI dependency; after all
| your secrets/credentials are inside the sandbox anyways or
| your code can't work.
| angelmm wrote:
| With Wasm + WASI, you need to explicitly mount files and
| environment variables. Inside the Wasm VM, the Python
| interpreter, source code and dependencies only have access
| to a very reduced surface. Although you're right that if
| you mount credentials inside, they will be accessible too.
|
| The incident I was talking about was the event-stream[1]
| vulnerability. The attacker introduced code that looked for
| the data of a crypto wallet. This data was stored in the
| user's home.
|
| By default, interpreters may get access to the same
| resources as the user running the process. In Wasm, the
| resources are granted manually.
|
| [1] https://blog.npmjs.org/post/180565383195/details-about-the-e...
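|
| As a rough illustration of "granted manually": the sketch below
| only builds a command line for the wasmtime CLI, listing each
| directory and environment variable the guest may see. `--dir` and
| `--env` are wasmtime flags as far as I know (check `wasmtime run
| --help` for your version); the `app` directory, `APP_MODE`
| variable and script path are made-up placeholders.

```python
import shlex
from typing import List, Mapping, Optional, Sequence


def wasmtime_argv(module: str,
                  guest_args: Sequence[str] = (),
                  dirs: Sequence[str] = (),
                  env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build a `wasmtime run` command that grants only what is listed."""
    argv = ["wasmtime", "run"]
    for host_dir in dirs:
        argv += ["--dir", host_dir]          # preopen one host directory
    for key, value in (env or {}).items():
        argv += ["--env", f"{key}={value}"]  # pass one variable explicitly
    argv.append(module)                      # runtime options go before this
    argv += list(guest_args)                 # everything after goes to the guest
    return argv


if __name__ == "__main__":
    cmd = wasmtime_argv("python.wasm", ["app/main.py"],
                        dirs=["app"], env={"APP_MODE": "demo"})
    print(shlex.join(cmd))
```

| Anything not named here (the user's home directory, other
| environment variables) is simply absent from the guest's view.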
| still_grokking wrote:
| > By default, interpreters may get access to the same
| resources as the user running the process. In Wasm, the
| resources are granted manually.
|
| What's the difference from running the code under a different
| user (for example `nobody` for "full sandboxing", or a "clone
| of nobody" with some additional access rights)?
| chc wrote:
| If you're just looking to run trusted scripts locally, there
| isn't much point. If you're running a system that uses wasm,
| this means you can now easily support Python.
| AshleysBrain wrote:
| How does this handle garbage collection? AFAIK the WebAssembly GC
| proposal is still in development. Does it implement GC in WASM
| code?
| amelius wrote:
| Perhaps it just uses Python's built-in garbage collector that
| just increases/decreases the data segment size as needed by
| calling sbrk()?
| ridruejo wrote:
| Correct, it is just CPython compiled to Wasm (similar to
| compiling to x86 or arm)
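|
| In other words, memory management travels with the interpreter:
| reference counting plus the `gc` module's cycle collector. A
| small check like the one below passes on native CPython; that it
| behaves identically on a given python.wasm build is an
| assumption, not something verified here. `Node` is just an
| illustrative class.

```python
import gc
import weakref


class Node:
    """Tiny object that can point at another Node."""

    def __init__(self):
        self.other = None


def refcount_frees_immediately() -> bool:
    obj = Node()
    probe = weakref.ref(obj)
    del obj                      # last reference dropped: freed right away
    return probe() is None


def cycle_needs_collector() -> bool:
    was_enabled = gc.isenabled()
    gc.disable()                 # keep an automatic pass from interfering
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a  # reference cycle keeps both refcounts > 0
        probe = weakref.ref(a)
        del a, b
        alive_before = probe() is not None
        gc.collect()             # the cycle detector reclaims the pair
        return alive_before and probe() is None
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(refcount_frees_immediately(), cycle_needs_collector())
```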
| robertlagrant wrote:
| The non-Docker version seems to require an external
| site-packages, unless I missed it. Is it possible to produce a
| single wasm binary with all dependencies compiled in?
| seddonm1 wrote:
| I have been following and playing with this repository:
| https://github.com/singlestore-labs/python-wasi/
|
| It builds a single Python WASM module with all dependencies
| included (they use VFS) and a Dockerfile to make the process
| easy (and actually worked first go). It does produce large
| files though: wasi-python3.11.wasm 110MB
| ridruejo wrote:
| Yes! SingleStore is a great team. We are currently using
| some of their work for this Python release, like libz.
| angelmm wrote:
| Hey! Dev here :)
|
| For external libraries, it requires you to mount the libraries
| with WASI when running the python.wasm module. Another option
| we're exploring is to use wasi-vfs[1] to include some common
| modules in our pre-built binaries. For example, Ruby does
| require some extra libraries for common workloads (like JSON
| parsing). This is still in the exploration phase, but we may do
| something with it.
|
| [1] https://github.com/kateinoigakukun/wasi-vfs
| robertlagrant wrote:
| Very cool. We ship some Python as a Debian dependency and so
| this could become a really interesting way to package
| everything up.
| simonw wrote:
| This looks very promising!
|
| The thing I most want to solve right now is this: I want to write
| a regular Python application that can safely execute untrusted
| Python code in a WASM sandbox as part of its execution.
|
| I want to do this so I can let end users customize my web
| applications in weird and interesting ways by pasting their own
| Python code into a textarea - think features like "run this
| Python code to transform my stored data" - without them being
| able to break my system.
|
| This feels like it should be pretty easy with WebAssembly! It's
| the classic code sandboxing problem - long a big challenge in
| the Python world - finally solved in a robust way.
|
| I've been finding it surprisingly hard to get a proof-of-concept
| of this working though.
|
| Essentially I want to be able to do this, in my regular Python
| code:
|
|     import some_webassembly_engine
|
|     python = some_webassembly_engine.load(
|         "python.wasm",
|         max_cpu_time_in_seconds=3.0,
|         max_allowed_memory_in_bytes=32000000,
|     )
|     result = python.execute("3 + 5")
|
| I've not yet figured out the incantations I need to actually do
| this - in particular the limits on CPU time and memory.
|
| I posed this question on Mastodon recently and Jim Kring put
| together this demo, which gets most of the way there (albeit
| using an old Python 3.6 build):
| https://github.com/jimkring/python-sandbox-wasm
|
| It doesn't feel like this should be as hard to figure out as it
| is!
| irrational wrote:
| Why do this on the client? Why not pass it to the server and
| run it on Python there?
| simonw wrote:
| That's what I'm talking about: I want to run Python code on
| my server, but since it's from an untrusted source I want to
| make sure that it's in a sandbox with strict limits on what
| it can do, how much CPU it can use and how much RAM it has
| available to it - so malicious code can't be used to crash my
| server or steal data it shouldn't have access to.
| callahad wrote:
| The startup I'm working at is basically trying to do exactly
| that as a service, but a one-off thing for a regular Python
| application _shouldn't_ be as hard to figure out as it is. Can
| you link to the Mastodon thread (darn lack of search!) and we
| can continue there?
| simonw wrote:
| Here's the Mastodon conversation:
| https://fedi.simonwillison.net/@simon/109682777068881522
|
| (I'm so close to building my own search engine just against
| my own content there.)
| phickey wrote:
| Wasmtime's `wasmtime-py` embedding in python has support for
| Wasm Components:
| https://github.com/bytecodealliance/wasmtime-py#components
| (disclosure, I helped create it)
|
| The remaining piece of the puzzle would be to create a
| wit-bindgen guest generator
| https://github.com/bytecodealliance/wit-bindgen#guests for this
| build of the python interpreter. You could then seamlessly call
| back and forth between the host and guest pythons, without even
| knowing that wasmtime is under the hood.
| simonw wrote:
| If you could provide example code for how to do this - how to
| run a snippet of untrusted Python code using wasmtime-py with
| a CPU and RAM limit - I would shout it from the rooftops. I
| think a LOT of people would benefit from clear examples of
| how to actually achieve this.
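|
| A rough sketch of the shape such an example might take with
| wasmtime-py, not tested here against any particular build.
| "Fuel" stands in for a CPU limit (it budgets executed
| instructions, not seconds) and `Store.set_limits` caps linear
| memory. The `python.wasm` path, the `SandboxLimits` helper and
| its default numbers are illustrative assumptions, and the build
| is assumed to bundle its own standard library. Older wasmtime-py
| releases expose `add_fuel` instead of `set_fuel`.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxLimits:              # hypothetical helper, not part of wasmtime-py
    fuel: int = 400_000_000       # instruction budget, not wall-clock seconds
    memory_bytes: int = 32_000_000


def run_untrusted(code: str, wasm_path: str = "python.wasm",
                  limits: SandboxLimits = SandboxLimits()) -> str:
    """Run `code` inside a WASI build of CPython and return its stdout."""
    # Imported lazily so this module loads even without wasmtime installed.
    from wasmtime import Config, Engine, Linker, Module, Store, WasiConfig

    config = Config()
    config.consume_fuel = True            # enable instruction metering
    engine = Engine(config)
    linker = Linker(engine)
    linker.define_wasi()
    module = Module.from_file(engine, wasm_path)

    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "stdout.txt")
        wasi = WasiConfig()
        wasi.argv = ("python", "-c", code)
        wasi.stdout_file = out_path       # no directories or env vars granted

        store = Store(engine)
        if hasattr(store, "set_fuel"):    # newer wasmtime-py releases
            store.set_fuel(limits.fuel)
        else:                             # older releases
            store.add_fuel(limits.fuel)
        store.set_limits(memory_size=limits.memory_bytes)
        store.set_wasi(wasi)

        instance = linker.instantiate(store, module)
        start = instance.exports(store)["_start"]
        start(store)                      # raises if fuel runs out or the guest traps
        with open(out_path) as f:
            return f.read()
```

| Usage would look like `print(run_untrusted("print(3 + 5)"))`,
| with a try/except around it for the exhausted-fuel case. A memory
| cap below what the module needs at startup may make
| instantiation itself fail, so the numbers need tuning per build.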
| samsquire wrote:
| This would be great, especially with a memory-safe API that
| could be exposed to wasm applications for safety, and with rate
| limiting.
| mritchie712 wrote:
| Have you tried to do it with pyodide? What issues did you hit
| using that?
| simonw wrote:
| Pyodide isn't currently supported outside of browsers, though
| that might change:
| https://github.com/pyodide/pyodide/issues/869
|
| Either way, I couldn't figure out how to do the above
| sequence of steps with any of the available Python WASM
| runtimes - they're all very under-documented at the moment,
| sadly. I tried all three of these:
|
| - https://github.com/wasmerio/wasmer-python
|
| - https://github.com/bytecodealliance/wasmtime-py
|
| - https://github.com/wasm3/pywasm3
| mike_hearn wrote:
| FWIW although it's not WebAssembly based, you can do that with
| GraalVM. It has a concept of language contexts which can be
| sandboxed including those constraints. There are two caveats:
|
| 1. Sandboxing for CPU time and max allowed memory requires the
| enterprise edition, so you'd have to pay for it.
|
| 2. The Python engine isn't 100% compatible with regular Python,
| although that may not matter for your use case as the
| compatibility is pretty good and issues mostly show up around
| extension modules.
| dayeye2006 wrote:
| Can anyone give me an ELI5 version of what the relationship
| between this and Pyodide is?
| ridruejo wrote:
| Pyodide is for the browser; this is intended for server-side
| environments, so it can interact with files, sockets, etc. via
| the WASI standard.
| assambar wrote:
| Ready-to-use python.wasm, also in a Docker+Wasm container image.
| still_grokking wrote:
| But please don't forget to wrap it in at least some VM! /s
|
| That's not even funny, as in real life people would run
| something like that actually in a VM.
|
| So we have now: HW memory protection -> HW virtualization -> VM
| -> OS -> Docker -> WASM -> language runtime -> some code
| snippet.
|
| Things become quite crazy these days, to be honest...
| dom96 wrote:
| There seem to be so many different variants of the same thing
| out there. What makes this unique? For example I know Pyodide
| exists and also runs CPython under WASM.
| ridruejo wrote:
| This one is designed to run on the server side and interface
| with the OS via WASI, so it can read/write files etc
___________________________________________________________________
(page generated 2023-01-31 23:00 UTC)