___________________________________________________________________
Show HN: Obelisk - a WASM-based deterministic workflow engine
A lightweight engine for durable execution / deterministic
workflows I built with Rust, wasmtime and the WASM Component Model.
Its main use is running reliable, long-running workflows that can
automatically resume after failures. Looking for feedback on the
approach and potential use cases!
Author : tomasol
Score : 48 points
Date : 2025-04-09 19:24 UTC (3 hours ago)
Web link: obeli.sk
| emgeee wrote:
| This is a pretty cool idea, but I'm trying to think of the
| advantages of WASM over other execution engines.
|
| It seems to me one of the main use-cases for WASM is to execute
| lambdas, which are often short-lived (like 500ms timeout limits).
| Maybe this could have a place in embedded systems?
| tomasol wrote:
| The biggest motivator for me is that the WASM sandbox provides
| truly deterministic execution. Unlike in engines such as
| Temporal, using hashmaps here is 100% deterministic. Attempting
| to spawn a thread is a compile error. It also performs well: the
| bottleneck is the write throughput of SQLite. Last but not
| least, all the interfaces between workflows and activities are
| type-safe and described in a WIT schema.
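|
| To make the replay idea concrete, here is a minimal sketch in
| Python, used only as a neutral illustration language (it is not
| Obelisk's API; `run_workflow`, `call` and the list-based log are
| hypothetical stand-ins for a WASM workflow and its SQLite
| execution log). Activity results are appended to a log; after a
| failure the workflow is re-run from the start, recorded results
| are served from the log, and only steps that never completed are
| executed again. This only works if the workflow code itself is
| deterministic, which is the property the sandbox enforces.

```python
from typing import Any, Callable, Dict, List, Tuple

EventLog = List[Tuple[str, Any]]  # (activity name, recorded result)


def run_workflow(workflow: Callable[[Callable[..., Any]], Any],
                 activities: Dict[str, Callable[..., Any]],
                 log: EventLog) -> Tuple[Any, List[str]]:
    """Run `workflow`, replaying results already in `log` and recording new ones.

    Returns the workflow's result and the names of activities that actually ran.
    """
    position = 0
    ran: List[str] = []

    def call(name: str, *args: Any) -> Any:
        nonlocal position
        if position < len(log):
            # Replay: reuse the recorded result instead of repeating the side effect.
            _, result = log[position]
        else:
            # New step: run the activity and record its result before moving on.
            result = activities[name](*args)
            log.append((name, result))
            ran.append(name)
        position += 1
        return result

    return workflow(call), ran


def provision(call: Callable[..., Any]) -> str:
    """Workflow body: pure business logic, every side effect goes through `call`."""
    vm_id = call("start_vm", "web-1")
    call("configure_bgp", vm_id)
    return vm_id


bgp_attempts = 0


def configure_bgp(vm_id: str) -> str:
    global bgp_attempts
    bgp_attempts += 1
    if bgp_attempts == 1:
        raise ConnectionError("router unreachable")  # simulated transient failure
    return "ok"


activities = {"start_vm": lambda name: f"vm-{name}", "configure_bgp": configure_bgp}
log: EventLog = []

try:
    run_workflow(provision, activities, log)  # first attempt fails at configure_bgp
except ConnectionError:
    pass

result, ran = run_workflow(provision, activities, log)  # resume by replaying the log
print(result, ran)  # vm-web-1 ['configure_bgp']
```

| On the second run `start_vm` is not executed again; its result
| comes from the log.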
| jcmfernandes wrote:
| Somewhat similar to Golem -
| https://github.com/golemcloud/golem - correct?
|
| So, I like this idea, I really do. At the same time, in the
| short term, WASM is relatively messy and, in my opinion, too
| immature (as an ecosystem) for prime time. But even once that
| is sorted out (it eventually will be), you'll have to tell
| people that they can't use any code that relies on threads,
| so they'd better know whether any of the libraries they use do.
| How do you foresee navigating this? Runtime errors suck,
| especially in this context, as fixing them requires either
| live patching code or migrating execution logs to new code
| versions.
| tomasol wrote:
| Yeah, looks like Golem went a similar route, using the WASM
| Component Model and wasmtime.
|
| There is always a chicken-and-egg problem with a new
| platform, but I am hoping that LLMs can partially solve it:
| the activities are just HTTP clients with no complex
| logic.
|
| Regarding the restrictions required for determinism, they
| apply only to workflows, not activities. Workflows should
| describe just the business logic. All the complexities
| of retries, failure recovery, replay after a server crash,
| etc. are handled by the runtime. The WASM sandbox makes it
| impossible to introduce non-determinism: doing so causes a
| compile error, so no runtime checks are needed.
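|
| A hedged sketch of what "retries are handled by the runtime" can
| look like: the runtime, not the workflow author, wraps each
| activity invocation in a retry loop with exponential backoff. The
| helper name, its defaults and the backoff policy are assumptions
| for illustration, not Obelisk's actual settings.

```python
import time
from typing import Any, Callable, List


def call_with_retries(activity: Callable[..., Any], *args: Any,
                      max_attempts: int = 5,
                      base_delay: float = 0.1,
                      sleep: Callable[[float], None] = time.sleep) -> Any:
    """Invoke `activity`, retrying failures with exponential backoff.

    The delay after failed attempt n (1-based) is base_delay * 2 ** (n - 1).
    Re-raises the last error once `max_attempts` is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return activity(*args)
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


calls = 0


def flaky_start_vm(name: str) -> str:
    """Simulated activity that fails twice before succeeding."""
    global calls
    calls += 1
    if calls < 3:
        raise TimeoutError("cloud API timed out")
    return f"vm-{name}"


delays: List[float] = []  # record delays instead of actually sleeping
vm_id = call_with_retries(flaky_start_vm, "web-1", sleep=delays.append)
print(vm_id, delays)  # vm-web-1 [0.1, 0.2]
```

| Because a retried activity may run more than once, this composes
| safely only with idempotent activities, a point made later in the
| thread.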
| jcmfernandes wrote:
| I understand what you mean about being able to fully sandbox
| things and guarantee determinism, which is a must for
| workflows but not for activities (using Temporal lingo).
|
| When you say that the runtime handles, for example,
| retries, doesn't that require me to depend on your HTTP
| client component? Or do I also need to compile activities
| to WASM and have obelisk running them because they are
| essentially background jobs (that is, you have workers
| pulling)?
|
| Finally, do you see the component's interface as the
| right layer for capturing IO? I'm imagining people
| attempting to run managed code (Java, Python, Ruby,
| etc.). The VMs can make thousands of syscalls before they
| start executing the user's code. Logging them one by one
| seems crazy, but I also don't see an alternative.
|
| EDIT:
|
| I RTFM and found the answers to my first two questions in
| the README :)
| tomasol wrote:
| > do I also need to compile activities to WASM
|
| Yes, currently all activities must conform to the WASI
| 0.2 standard. This keeps deployment simple, as you only
| need the obelisk executable and a TOML config file. The
| webhooks, workflows and activities are pulled from an OCI
| registry on startup.
|
| To support native code I plan to add external activities
| as well, with an interface similar to what Netflix
| Conductor uses for its workers.
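|
| For readers unfamiliar with the Conductor model: external workers
| poll the engine for tasks of the types they implement, run them in
| their own process (any language, no WASM needed), and report the
| outcome back. A minimal sketch of that loop, with an in-memory
| queue standing in for the engine; `Task`, `run_worker` and the
| status strings are hypothetical, not Conductor's or Obelisk's API.

```python
import queue
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Task:
    task_id: str
    activity: str
    payload: Any


def run_worker(tasks: "queue.Queue[Task]",
               handlers: Dict[str, Callable[[Any], Any]],
               report: Callable[[str, str, Any], None],
               poll_timeout: float = 0.1) -> None:
    """Poll for tasks, run the matching handler, and report each outcome.

    Returns once no task arrives within `poll_timeout`; a real worker
    would keep polling indefinitely.
    """
    while True:
        try:
            task = tasks.get(timeout=poll_timeout)
        except queue.Empty:
            return
        try:
            output = handlers[task.activity](task.payload)
            report(task.task_id, "COMPLETED", output)
        except Exception as exc:  # the engine, not the worker, decides whether to retry
            report(task.task_id, "FAILED", repr(exc))


def configure_bgp(router: str) -> str:
    raise ConnectionError(f"{router} unreachable")  # simulated failure


reports: List[Tuple[str, str, Any]] = []
pending: "queue.Queue[Task]" = queue.Queue()
pending.put(Task("t1", "start-vm", "web-1"))
pending.put(Task("t2", "configure-bgp", "router-7"))

handlers = {"start-vm": lambda name: f"vm-{name}", "configure-bgp": configure_bgp}
run_worker(pending, handlers, lambda *r: reports.append(r))
print(reports)
```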
|
| > Finally, do you see the component's interface as the
| right layer for capturing IO?
|
| An activity must encapsulate something much higher level
| than a single IO operation, such as "Configure BGP on a
| router" or "Start a VM". It needs to be able to handle
| retries and must therefore be idempotent.
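|
| One common way to make an activity such as "Start a VM" safe to
| retry is an idempotency key: the caller passes a key that stays
| stable across retries (for example, one derived from the
| execution ID), and the activity first checks whether that key
| already produced a resource. The sketch below uses an in-memory
| stand-in for a cloud API; the class, method names and key format
| are hypothetical.

```python
from typing import Dict, Optional


class FakeCloud:
    """In-memory stand-in for a VM provider (hypothetical API)."""

    def __init__(self) -> None:
        self.vms_by_key: Dict[str, str] = {}
        self.create_calls = 0

    def find_by_key(self, key: str) -> Optional[str]:
        return self.vms_by_key.get(key)

    def create_vm(self, key: str, name: str) -> str:
        self.create_calls += 1
        vm_id = f"{name}-{self.create_calls}"
        self.vms_by_key[key] = vm_id  # provider remembers which key created the VM
        return vm_id


def start_vm(cloud: FakeCloud, idempotency_key: str, name: str) -> str:
    """Idempotent activity: repeated calls with the same key yield the same VM."""
    existing = cloud.find_by_key(idempotency_key)
    if existing is not None:
        return existing  # an earlier attempt already created it; don't create twice
    return cloud.create_vm(idempotency_key, name)


cloud = FakeCloud()
first = start_vm(cloud, "exec-42/step-1", "web")
retry = start_vm(cloud, "exec-42/step-1", "web")  # e.g. the first response was lost
other = start_vm(cloud, "exec-43/step-1", "web")
print(first, retry, other, cloud.create_calls)  # web-1 web-1 web-2 2
```

| The check-then-create here is not atomic; if two attempts can run
| concurrently, uniqueness of the key should be enforced by the
| provider or by a database constraint.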
|
| Regarding performance, a workflow execution can call
| 500-700 child executions per second serially, or around
| 1400 child executions per second concurrently.
| disintegrator wrote:
| Really nice project. What's the reasoning behind the AGPL
| license? My understanding is that it will hurt adoption, unless
| you're planning to offer paid licensing options? Either way it's
| a really nice project and I'm keen to try it out. I've found it
| tricky to get a WASM/WASI setup where I can at least make HTTP
| requests (probably my own skill issue).
| tomasol wrote:
| Thanks for the kind words. In an ideal world I would like to
| offer a cloud version that would be monetized. There are a few
| examples of how to make HTTP requests; I have a demo repository
| [1] with GraphQL and regular JSON-over-HTTP activities. I do
| agree that the ecosystem is not mature yet, but I was able to
| generate HTTP activities with an LLM in a single shot.
|
| 1: https://github.com/obeli-sk/demo-stargazers
| SvenL wrote:
| One issue I have had many times with workflow engines is
| updates. Say I have a workflow that already has running
| instances. Two scenarios:
|
| Can I update the workflow while it has running instances without
| interfering with the running instances?
|
| Can I update a running instance with a new version of the
| workflow to patch some flaw? If no, can I replay an updated
| version of a workflow with the log of an old workflow version?
| tomasol wrote:
| Great questions. If you are fixing a bug in a workflow that
| has running executions, there are two scenarios:
|
| Either the fix does not break determinism, meaning the
| execution has not reached the fixed code yet. In this case the
| execution can be replayed and continue on the patched WASM
| component.
|
| Otherwise, replaying the execution causes a "Non determinism
| detected" error. In this case you need to handle the situation
| manually. Since the execution log is in a SQLite file, you can
| select all executions affected by the bug and perform a custom
| cleanup. You can also create a "forked" execution just by
| copying the execution log + child responses into a new
| execution; however, there is no API for it yet.
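|
| The two cases can be illustrated with a small replay checker: the
| patched workflow is re-run against the recorded log, and every
| activity it requests must match the recorded one at that position.
| A fix confined to steps beyond the recorded history replays
| cleanly; a change to already-recorded steps is reported as
| non-determinism. This is a simplified sketch (it compares only
| activity names, and all names are hypothetical), not how Obelisk
| implements the check.

```python
from typing import Any, Callable, List, Tuple


class NonDeterminismError(Exception):
    pass


class EndOfHistory(Exception):
    """Replay caught up with the log; a live execution would continue from here."""


def replay(workflow: Callable[[Callable[..., Any]], Any],
           log: List[Tuple[str, Any]]) -> int:
    """Replay `workflow` against `log`; return how many recorded steps matched."""
    position = 0

    def call(name: str, *args: Any) -> Any:
        nonlocal position
        if position == len(log):
            raise EndOfHistory
        recorded_name, result = log[position]
        if recorded_name != name:
            raise NonDeterminismError(
                f"step {position}: log has {recorded_name!r}, code called {name!r}")
        position += 1
        return result

    try:
        workflow(call)
    except EndOfHistory:
        return position
    if position < len(log):
        raise NonDeterminismError(
            f"workflow finished with {len(log) - position} recorded steps unreplayed")
    return position


# An execution that was interrupted after its first step.
log = [("start_vm", "vm-web-1")]


def fix_after_history(call: Callable[..., Any]) -> None:
    vm_id = call("start_vm", "web-1")
    call("configure_bgp_fixed", vm_id)  # the fix is in a step not yet executed


def fix_inside_history(call: Callable[..., Any]) -> None:
    call("reserve_ip", "web-1")  # a new first step changes the recorded history
    vm_id = call("start_vm", "web-1")
    call("configure_bgp", vm_id)


matched = replay(fix_after_history, log)
try:
    replay(fix_inside_history, log)
    error_message = ""
except NonDeterminismError as exc:
    error_message = str(exc)
print(matched, error_message)
```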
|
| > Can I update the workflow while it has running instances
| without interfering the running instances?
|
| If you mean keeping the in-progress executions on the old
| version of the code, you can do that by introducing a new
| version in the WIT file and/or changing the function name.
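|
| A sketch of why a new version or function name keeps in-flight
| executions on the old code: each execution records the fully
| qualified function it was started with, and the runtime always
| resumes it through that name, while new submissions target the
| new one. The naming scheme (loosely modeled on WIT's
| `namespace:package/interface@version` form) and the `Runtime`
| class are assumptions, and the sketch assumes both versions stay
| deployed side by side.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Runtime:
    # Fully qualified function name -> deployed implementation.
    functions: Dict[str, Callable[[int], int]] = field(default_factory=dict)
    # Execution ID -> function name the execution was started with (pinned).
    executions: Dict[str, str] = field(default_factory=dict)

    def submit(self, execution_id: str, function_name: str) -> None:
        if function_name not in self.functions:
            raise KeyError(f"unknown function {function_name!r}")
        self.executions[execution_id] = function_name

    def resume(self, execution_id: str, amount: int) -> int:
        # Resume through the pinned name, never through "the latest version".
        return self.functions[self.executions[execution_id]](amount)


V1 = "acme:billing/workflow@1.0.0.charge"  # hypothetical names
V2 = "acme:billing/workflow@2.0.0.charge"

rt = Runtime()
rt.functions[V1] = lambda amount: amount + 5  # original fee logic
rt.submit("exec-old", V1)                     # started before the fix

rt.functions[V2] = lambda amount: amount + 3  # fixed fee, deployed alongside v1
rt.submit("exec-new", V2)

print(rt.resume("exec-old", 100), rt.resume("exec-new", 100))  # 105 103
```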
| halamadrid wrote:
| We are using a workflow engine called Unmeshed, which has what
| you are asking about. Workflow definitions can be updated
| without interfering with running instances, and if you
| choose to, you can patch updates onto running workflows.
| You can also rerun workflows with the same input as an older
| execution.
___________________________________________________________________
(page generated 2025-04-09 23:00 UTC)