Post Ac6q9qfoxCsZpd7uy0 by sminnee@mastodon.nz
(DIR) More posts by sminnee@mastodon.nz
(DIR) Post #Ac6ibbedaZDjj4IY6q by simon@fedi.simonwillison.net
2023-11-23T19:08:38Z
0 likes, 0 repeats
Here's a surprisingly difficult question: I'm looking for an implementation of a simple expression language in Python that can do bits of arithmetic and basic string operations (eg concatenation) against some variables I pass to it... but is safe against untrusted input - and operates with limits on CPU and memory usage. I'm not looking for a full sandboxed Python (though that would be nice) - I just want to be able to do "a + b * c" with untrusted input. Any good options I might have missed?
(DIR) Post #Ac6imbUGYE9mOOgtd2 by simon@fedi.simonwillison.net
2023-11-23T19:09:21Z
0 likes, 0 repeats
Think the kind of expressions you might use in an Excel formula
(DIR) Post #Ac6j2zGZuMtgmA6JLk by dvogel@mastodon.social
2023-11-23T19:13:33Z
0 likes, 0 repeats
@simon I don't know of any existing libraries but it should be fairly straightforward to use the ast stdlib module to parse the input and then eval it only after ensuring the tree only contains your limited vocabulary.
(DIR) Post #Ac6jGUli5y8AF66Ua0 by sminnee@mastodon.nz
2023-11-23T19:14:51Z
0 likes, 0 repeats
@simon I made my own little peg parser when I needed that, although it was JS (well, ReScript) not Python.
(DIR) Post #Ac6jTQ3ouVYbPDjlVw by steve@deliverabilit.ie
2023-11-23T19:14:53Z
0 likes, 0 repeats
@simon Sounds like cel might work. https://github.com/cloud-custodian/cel-python is a python implementation.
(DIR) Post #Ac6jemGago4pSh7u9w by bryanculbertson@mastodon.social
2023-11-23T19:15:36Z
0 likes, 0 repeats
@simon "select $expr as result" on in-memory sqlite DB with no data?
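A rough sketch of the in-memory SQLite idea, using only the standard library. The function name `sqlite_eval`, the `vars` CTE and the 100 ms budget are assumptions made for illustration, not anything proposed in the thread. The progress handler gives a crude time limit; the sketch does not bound memory and does not restrict which SQL functions the expression may call, so it should not be read as a claim that this approach is safe.

```python
import sqlite3
import time


def sqlite_eval(expr, variables, time_limit_ms=100):
    """Evaluate expr as a SQL expression against an empty in-memory database.

    Hypothetical helper: variables are exposed as columns of a one-row CTE.
    """
    names = list(variables)
    if not all(name.isidentifier() for name in names):
        raise ValueError("variable names must be plain identifiers")

    conn = sqlite3.connect(":memory:")
    deadline = time.monotonic() + time_limit_ms / 1000
    # Called every 1000 SQLite VM instructions; a non-zero return aborts the query.
    conn.set_progress_handler(lambda: int(time.monotonic() > deadline), 1000)

    if names:
        columns = ", ".join(f'"{name}"' for name in names)
        marks = ", ".join("?" for _ in names)
        # Values are bound as parameters; only the expression text is interpolated.
        sql = (
            f"WITH vars({columns}) AS (VALUES ({marks})) "
            f"SELECT {expr} AS result FROM vars"
        )
    else:
        sql = f"SELECT {expr} AS result"

    try:
        return conn.execute(sql, [variables[name] for name in names]).fetchone()[0]
    finally:
        conn.close()
```

For example, `sqlite_eval("a + b * c", {"a": 1, "b": 2, "c": 3})` evaluates the expression with SQLite's own arithmetic, and string concatenation uses SQL's `||` operator rather than `+`. A query that runs past the deadline is interrupted and surfaces as `sqlite3.OperationalError`.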
(DIR) Post #Ac6jeoPWiYPg6zK4Mi by bryanculbertson@mastodon.social
2023-11-23T19:18:16Z
0 likes, 0 repeats
@simon pandas.eval?
(DIR) Post #Ac6jqrqHmifkEIaUHg by danlyke@researchbuzz.masto.host
2023-11-23T19:15:48Z
0 likes, 0 repeats
@simon I'd just do it in C with Peg or Leg or whatever combo it is. The example calculator grammar makes it pretty easy to start.
(DIR) Post #Ac6k3OziLstUWXyUM4 by jonathanmatthews@fosstodon.org
2023-11-23T19:17:04Z
0 likes, 0 repeats
@simon Starlark could be useful - how about a wrapper such as https://pypi.org/project/starlark-pyo3/?
(DIR) Post #Ac6kERwBkY5YpyaqSe by ianholmes@mastodon.social
2023-11-23T19:21:02Z
0 likes, 0 repeats
@simon JEXL? https://pypi.org/project/pyjexl/
(DIR) Post #Ac6kQjDBLor3t3Of6O by simon@fedi.simonwillison.net
2023-11-23T19:26:54Z
0 likes, 0 repeats
@dvogel that's what's so frustrating here: I don't think it's enormously hard to build one of these from scratch, but I really feel like I shouldn't need to!
(DIR) Post #Ac6kc8HCV79koFiM1Q by simon@fedi.simonwillison.net
2023-11-23T19:27:52Z
0 likes, 0 repeats
@steve oh that looks like exactly what I want, thanks!
(DIR) Post #Ac6kmF1yF8WgeaPXpg by simon@fedi.simonwillison.net
2023-11-23T19:29:19Z
0 likes, 0 repeats
@bryanculbertson oh I hadn't seen that one! Do you know if it's considered to be "safe"?
(DIR) Post #Ac6kyqApyKfvKFZ9aC by buck@fosstodon.org
2023-11-23T19:30:59Z
0 likes, 0 repeats
@simon Have you considered using Sympy? There are some cases where you need to read the documentation to ensure you’re not using eval under the hood, but it may be able to do what you’re looking for.
(DIR) Post #Ac6lAopan29udWLB7g by simon@fedi.simonwillison.net
2023-11-23T19:36:55Z
0 likes, 0 repeats
@nelson if there's a well maintained Python library for embedding it and it's definitely "safe" for untrusted code I'd absolutely consider it!
(DIR) Post #Ac6lM2HH8WG9WddbGK by bryanculbertson@mastodon.social
2023-11-23T19:39:43Z
0 likes, 0 repeats
@simon dunno if it is safe enough to expose to users, but I know it is safer than eval in the sense that only certain arithmetic operations are available
(DIR) Post #Ac6lw4UIOg4unwuXiK by mestachs@mastodon.green
2023-11-23T19:46:07Z
0 likes, 0 repeats
@simon it's your last point that looks harder to find (cpu/mem limits). I once had to play with expressions but soon hit a lot of memory pressure in ruby. But users were still "friendly" (not triggering ddos with large expressions). Initially I went for https://github.com/rubysolo/dentaku but ended up making a go executable where I piped a json https://github.com/BLSQ/go-hesabu#readme
(DIR) Post #Ac6mL1Mf96AmMGEn5s by chris@m.objc.io
2023-11-23T19:50:41Z
0 likes, 0 repeats
@simon I like writing these things and would write my own (not saying you should). I was wondering if there's anything in sqlite you could reuse?
(DIR) Post #Ac6nH4gyvntBTUQKvo by simon@fedi.simonwillison.net
2023-11-23T20:00:50Z
0 likes, 0 repeats
@nelson There's still a Python wiki page about that https://wiki.python.org/moin/SandboxedPython
(DIR) Post #Ac6nlj3LR4ixL3V1Zg by mario@hachyderm.io
2023-11-23T20:06:48Z
0 likes, 0 repeats
@simon for sandboxing, https://healeycodes.com/running-untrusted-python-code might give you some ideas
(DIR) Post #Ac6o8PZC3FcXYzDJQ0 by simon@fedi.simonwillison.net
2023-11-23T20:11:02Z
0 likes, 0 repeats
@mario That's really useful, thanks - https://healeycodes.com/sandboxing-javascript-code looks relevant too
(DIR) Post #Ac6oKAIOecLNfzxgwq by migurski@mastodon.social
2023-11-23T20:05:20Z
0 likes, 0 repeats
@nelson @simon I’m going to have this need real soon too. My first thought was Py’s built-in AST module, which at least gets you a proper tree
(DIR) Post #Ac6oKCaC9PlklmIvy4 by simon@fedi.simonwillison.net
2023-11-23T20:12:00Z
0 likes, 0 repeats
@migurski @nelson I've seen some attempts at that, but they tend to be in some random GitHub repo that's not actively maintained. I want a sandboxing library that some enormous service is using to execute untrusted code thousands of times a second, with a dedicated security team maintaining it
(DIR) Post #Ac6omPkcmkKxibdSXw by migurski@mastodon.social
2023-11-23T20:17:53Z
0 likes, 0 repeats
@simon @nelson Nice thing about AST is that you can filter for a subset of operations allowed in your "a * b + c" example and refuse anything else; I feel like its inclusion in the stdlib makes it a reasonable candidate for safe?
(DIR) Post #Ac6pCr3PtaEZvW26t6 by dasch@mastodon.social
2023-11-23T20:22:30Z
0 likes, 0 repeats
@simon CEL would be a possibility: https://pypi.org/project/cel-python/
(DIR) Post #Ac6pZJ4GQE0vBF8RSy by sminnee@mastodon.nz
2023-11-23T20:24:13Z
0 likes, 0 repeats
@simon you looked at this? https://github.com/FlavioLionelRita/py-expression/wiki
(DIR) Post #Ac6pZJrBUL95cy1Via by simon@fedi.simonwillison.net
2023-11-23T20:26:55Z
0 likes, 0 repeats
@sminnee I hadn't - the lack of mentions of security in the documentation makes it hard for me to trust it, sadly
(DIR) Post #Ac6pm7Wkdma99WZP6W by migurski@mastodon.social
2023-11-23T20:19:08Z
0 likes, 0 repeats
@simon @nelson …like, ast.parse(), verify that every node is in a short allowed list, then just eval() it once you’re satisfied
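The parse, verify, then eval approach described here could look roughly like the sketch below. The allowlist contents and the name `checked_eval` are assumptions chosen for illustration. Attribute access, calls, subscripts and `**` are left out on purpose, but nothing here limits CPU or memory: an expression such as `'x' * 1000000000` would still pass the check.

```python
import ast

# Node types permitted in an expression; anything else is refused.
# Attribute, Call, Subscript and Pow are deliberately absent.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant, ast.Name, ast.Load,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub, ast.UAdd,
)


def checked_eval(source, variables):
    """Parse source, verify every node against the allowlist, then eval() it."""
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float, str)):
            raise ValueError("disallowed constant")
        if isinstance(node, ast.Name) and node.id not in variables:
            raise ValueError(f"unknown variable: {node.id}")
    code = compile(tree, "<expr>", "eval")
    # Empty __builtins__ so no built-in names are reachable from the expression.
    return eval(code, {"__builtins__": {}}, dict(variables))
```

With this, `checked_eval("a + b * c", {"a": 1, "b": 2, "c": 3})` is evaluated by Python itself, while something like `__import__('os')` is rejected at the verification step because `Call` is not in the allowlist.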
(DIR) Post #Ac6pm8HtoUIPVkd3aq by simon@fedi.simonwillison.net
2023-11-23T20:28:01Z
0 likes, 0 repeats
@migurski @nelson There's a project that does that here, but it's not protected against nasty exponential calculation attacks: https://newville.github.io/asteval/motivation.html#how-safe-is-asteval
(DIR) Post #Ac6q9qfoxCsZpd7uy0 by sminnee@mastodon.nz
2023-11-23T20:32:50Z
0 likes, 0 repeats
@simon still, gotta be better than using something turing-complete. In your place I would try some adversarial expressions and put a max length of 1000 chars on it or something.
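A small sketch of what a length cap and a couple of adversarial expressions might look like. `MAX_CHARS` follows the 1000-character suggestion; `MAX_NODES`, `precheck` and `bounded_pow` are hypothetical names and limits introduced only for this example. It also illustrates why a length cap alone is not enough: `9**9**9**9` is ten characters long and passes the cap, so the operator itself needs a bound.

```python
import ast

MAX_CHARS = 1000  # the cap suggested above
MAX_NODES = 200   # hypothetical extra limit on the size of the parsed tree

# Short inputs that pass a length cap yet describe enormous results.
ADVERSARIAL = [
    "9**9**9**9",    # tiny source, astronomically large integer
    "'x' * 10**10",  # huge string from a short expression
]


def precheck(source):
    """Reject oversized input before any evaluation happens."""
    if len(source) > MAX_CHARS:
        raise ValueError("expression too long")
    tree = ast.parse(source, mode="eval")
    if sum(1 for _ in ast.walk(tree)) > MAX_NODES:
        raise ValueError("expression too complex")
    return tree


def bounded_pow(base, exp, max_bits=4096):
    """Refuse integer powers whose result could exceed max_bits bits."""
    if isinstance(base, int) and isinstance(exp, int) and exp > 0:
        # base < 2**k implies base**exp < 2**(k * exp), so this is an upper bound.
        if base.bit_length() * exp > max_bits:
            raise ValueError("result too large")
    return base ** exp
```

An evaluator would call `bounded_pow` in place of the raw `**` operator; a similar size check would be needed for string repetition.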
(DIR) Post #Ac6ryWiX0i0Sf47j5k by interface@mastodon.gamedev.place
2023-11-23T20:53:33Z
0 likes, 0 repeats
@simon not Python but C#... I work on this in my day job and it's 100% modeled after Excel: https://github.com/Microsoft/Power-Fx/
(DIR) Post #Ac6sZxfblNlWy7bfGq by d40cht@mastodon.energy
2023-11-23T21:00:25Z
0 likes, 0 repeats
@simon I'd just write something simple using the ast library to parse and a limited visitor to evaluate.
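One possible shape for "ast to parse and a limited visitor to evaluate", as a minimal sketch rather than a vetted implementation. The class name `LimitedEvaluator`, the operator tables and the `max_str_len` limit are assumptions. Unlike the parse-then-eval approach, the expression is never handed to `eval()`: only the node types with an explicit `visit_` method are evaluated, and everything else falls through to `generic_visit`, which refuses it. It assumes the caller only passes int, float or str values as variables.

```python
import ast
import operator

BINARY_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
}
UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


class LimitedEvaluator(ast.NodeVisitor):
    """Evaluates arithmetic and string concatenation; refuses everything else."""

    def __init__(self, variables, max_str_len=10_000):
        self.variables = variables
        self.max_str_len = max_str_len

    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_Constant(self, node):
        if not isinstance(node.value, (int, float, str)):
            raise ValueError("unsupported constant")
        return node.value

    def visit_Name(self, node):
        if node.id not in self.variables:
            raise ValueError(f"unknown variable: {node.id}")
        return self.variables[node.id]

    def visit_UnaryOp(self, node):
        op = UNARY_OPS.get(type(node.op))
        if op is None:
            raise ValueError("unsupported unary operator")
        return op(self.visit(node.operand))

    def visit_BinOp(self, node):
        op = BINARY_OPS.get(type(node.op))
        if op is None:
            raise ValueError("unsupported binary operator")
        left, right = self.visit(node.left), self.visit(node.right)
        # Strings may only be concatenated with strings; no repetition or formatting.
        if isinstance(left, str) or isinstance(right, str):
            both_str = isinstance(left, str) and isinstance(right, str)
            if not (both_str and isinstance(node.op, ast.Add)):
                raise ValueError("unsupported string operation")
        result = op(left, right)
        if isinstance(result, str) and len(result) > self.max_str_len:
            raise ValueError("string result too long")
        return result

    def generic_visit(self, node):
        # Any node type without an explicit visit_ method lands here.
        raise ValueError(f"unsupported syntax: {type(node).__name__}")


def evaluate(source, variables):
    return LimitedEvaluator(variables).visit(ast.parse(source, mode="eval"))
```

Calling `evaluate("a + b * c", {"a": 1, "b": 2, "c": 3})` walks the tree with normal operator precedence, since precedence is already encoded in the parsed AST. Time and memory limits would still have to come from somewhere else, such as an input length cap or running the evaluator in a separate, resource-limited process.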
(DIR) Post #Ac76t1R1OAwlVNlbAe by kellan@fiasco.social
2023-11-23T23:40:34Z
0 likes, 0 repeats
@simon sounds like CEL https://github.com/google/cel-spec
(DIR) Post #Ac78vLQSEa89zdjjdo by krassowski@fosstodon.org
2023-11-24T00:03:35Z
0 likes, 0 repeats
@simon guarded eval of IPython might be of interest: https://github.com/ipython/ipython/blob/225a1a2699f328c1b4a0ded2de1f7af9f731ad55/IPython/core/guarded_eval.py strictly no security guarantees, but it is as safe as you configure it to be. If you are asking from backend side of a web UI you could also use pyodide or pyscript to move the burden of computation (and risk) to the user side (and communicate results via JSON).
(DIR) Post #Ac796mX1nLfFFLGUoy by IanCal@data-folks.masto.host
2023-11-24T00:05:36Z
0 likes, 0 repeats
@simon perhaps you want something more like firecracker then? https://firecracker-microvm.github.io/ http API, I think there's a python client. It's for doing tiny sandboxed VMs, it was built for aws lambda.
(DIR) Post #Ac80HHOHDESxQ63w9Y by thadguidry@mastodon.social
2023-11-24T10:01:25Z
0 likes, 0 repeats
@simon @mario or GraalVM Polyglot Sandbox with GraalPy? https://www.graalvm.org/latest/security-guide/polyglot-sandbox/
(DIR) Post #Ac8SHoggDGqVP9iNzE by simon@fedi.simonwillison.net
2023-11-24T15:13:43Z
0 likes, 0 repeats
@thadguidry @mario wow that's actually a really interesting option
(DIR) Post #Ac9SIO0t4B8K7lOl3w by thadguidry@mastodon.social
2023-11-25T02:50:21Z
0 likes, 0 repeats
@simon probably best to engage with them directly and ask your questions.
(DIR) Post #AcAuBEi4FZEb7Kw00m by benlk@newsie.social
2023-11-25T19:37:18Z
0 likes, 0 repeats
@simon shell out to the command-line calculator `bc`? https://www.gnu.org/software/bc/manual/html_mono/bc.html as simple as `bc <<< "a + b * c"`
(DIR) Post #AcB0TjbohozFt3DN68 by simon@fedi.simonwillison.net
2023-11-25T20:47:15Z
0 likes, 0 repeats
@benlk sadly bc isn't designed for untrusted scripts - it has a system() function that can run shell commands, plus it isn't protected against resource exhaustion attacks
(DIR) Post #AcB0fiBxnemgy1piSG by zellyn@hachyderm.io
2023-11-25T20:45:05Z
0 likes, 0 repeats
@jonathanmatthews @simon I was half tongue-in-cheek going to suggest embedding starlark/rust, but if someone's already done it, it might be just what you need. If you want full python function definition, etc.
(DIR) Post #AcB0fjFXrpFAFQW5g0 by simon@fedi.simonwillison.net
2023-11-25T20:48:38Z
0 likes, 0 repeats
@zellyn @jonathanmatthews "It is safe to execute untrusted code." Neat! https://github.com/bazelbuild/starlark#design-principles
(DIR) Post #AcB0rGyyn1QY3OGyGG by simon@fedi.simonwillison.net
2023-11-25T20:50:56Z
0 likes, 0 repeats
@zellyn @jonathanmatthews https://pypi.org/project/pystarlark/ looks promising but also looks like it's not actively maintained, and it's also not ready to make confident assertions about its security
(DIR) Post #AcB14CAM0bSwsckrhI by simon@fedi.simonwillison.net
2023-11-25T20:52:09Z
0 likes, 0 repeats
@zellyn @jonathanmatthews this fork looks a lot better though! https://github.com/caketop/python-starlark-go
(DIR) Post #AcBBHCjQDSnrwN024G by simon@fedi.simonwillison.net
2023-11-25T22:48:58Z
0 likes, 0 repeats
@zellyn @jonathanmatthews Unfortunately, it doesn't look like there are simple mechanisms for Starlark for restricting the amount of memory and CPU a rogue program can use
(DIR) Post #AcBQC97fhmBsc1O8UC by zellyn@hachyderm.io
2023-11-26T01:36:16Z
0 likes, 0 repeats
@simon @jonathanmatthews Interesting. Naively, without knowing anything else, I'd expect to prefer starlark-rust for embedding in Python (because of Go's very different stacks/GC/etc.)
(DIR) Post #AcBQMZtB2m24WQaCDg by zellyn@hachyderm.io
2023-11-26T01:37:44Z
0 likes, 0 repeats
@simon @jonathanmatthews Awww, yeah. That makes sense. Looks like the "cel"-themed things are probably closer to what you're looking for. Unless you want to do something similar to what you do with sqlite in datasette, and just spin up time-limited wasm workers for evaluating expressions…
(DIR) Post #AcCOkF4C6flM8tqtQ8 by simon@fedi.simonwillison.net
2023-11-26T12:54:19Z
0 likes, 0 repeats
@zellyn @jonathanmatthews time limited WASM workers are my ideal solution I think, it's still harder than I'd like to run time/memory limited interpreters for different languages in WASM inside a Python script though - some notes here https://til.simonwillison.net/tils/search?q=WebAssembly+sandbox
(DIR) Post #AcD0O0A8dTZ9o0SEbo by benlk@newsie.social
2023-11-26T19:56:08Z
0 likes, 0 repeats
@simon Oof, good to know.
(DIR) Post #AcD4iMDyiee7q0DEWm by zellyn@hachyderm.io
2023-11-26T20:44:45Z
0 likes, 0 repeats
@simon @jonathanmatthews Googling for “self-contained wasm implementation” led me to wasm3 and pywasm3. https://github.com/wasm3/pywasm3/blob/main/examples/02-metered.py is intriguingly named, but apparently requires pre-processing your wasm modules to add metering, using https://github.com/ewasm/wasm-metering. So, I *think* all the pieces are in place to do what you want (and indeed, that metering solution seems like it might work with other, possibly more security-focused wasm runtimes), but the chain of hoops is distressingly long 🙂
(DIR) Post #AcD7b5gU7IaKNHlxtA by simon@fedi.simonwillison.net
2023-11-26T21:16:29Z
0 likes, 0 repeats
@zellyn @jonathanmatthews I got something working with wasmtime but it still feels a bit bleeding edge https://til.simonwillison.net/webassembly/python-in-a-wasm-sandbox