[HN Gopher] Show HN: Obelisk - a WASM-based deterministic workfl...
       ___________________________________________________________________
        
       Show HN: Obelisk - a WASM-based deterministic workflow engine
        
       A lightweight engine for durable execution / deterministic
       workflows I built with Rust, wasmtime and the WASM Component Model.
       Its main use is running reliable, long-running workflows that can
       automatically resume after failures. Looking for feedback on the
       approach and potential use cases!
        
       Author : tomasol
       Score  : 48 points
       Date   : 2025-04-09 19:24 UTC (3 hours ago)
        
 (HTM) web link (obeli.sk)
 (TXT) w3m dump (obeli.sk)
        
       | emgeee wrote:
       | This is a pretty cool idea but I'm trying to think of the
       | advantage of WASM vs other execution engines.
       | 
       | It seems to me one of the main use-cases for WASM is to execute
       | lambdas, which are often short-lived (like 500ms timeout limits).
       | Maybe this could have a place in embedded systems?
        
         | tomasol wrote:
         | The biggest motivator for me is that the WASM sandbox provides
         | truly deterministic execution. Unlike in engines like Temporal,
         | using hashmaps is 100% deterministic here. Attempting to spawn
         | a thread is a compile error. It also performs well - the
         | bottleneck is in the write throughput of sqlite. Last but not
         | least - all the interfaces between workflows and activities are
         | type safe, described in a WIT schema.
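
To illustrate why hashmap determinism matters for replay: Rust's standard `HashMap` is seeded randomly by default, so iteration order can differ between runs. The sketch below is not Obelisk code. It assumes a sandbox effectively fixes the seed, and stands in for that with a fixed-state hasher; `FixedSeedMap` and `iteration_order` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

// A HashMap whose hasher always starts from the same state. This is an
// assumption standing in for a sandbox where the host controls the seed.
type FixedSeedMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

/// Builds the map the same way each time and returns keys in iteration order.
fn iteration_order(steps: &[&str]) -> Vec<String> {
    let mut map: FixedSeedMap<String, usize> = FixedSeedMap::default();
    for (i, step) in steps.iter().enumerate() {
        map.insert(step.to_string(), i);
    }
    map.keys().cloned().collect()
}

fn main() {
    let steps = ["create-vm", "configure-bgp", "notify"];
    let first_run = iteration_order(&steps);
    let replay = iteration_order(&steps);
    // Same hasher state and same insertions give the same order, so a
    // replay would issue calls in the same sequence as the original run.
    assert_eq!(first_run, replay);
    println!("iteration order: {:?}", first_run);
}
```

With the default `RandomState` hasher the two orders are not guaranteed to match across process restarts, which is the kind of divergence a replaying engine has to rule out.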
        
           | jcmfernandes wrote:
           | Somewhat similar to Golem -
           | https://github.com/golemcloud/golem - correct?
           | 
           | So, I like this idea, I really do. At the same time, in the
           | short-term, WASM is relatively messy and, in my opinion,
           | immature (as an ecosystem) for prime time. But with that out
           | of the way (it will eventually come), you'll have to tell
           | people that they can't use any code that relies on threads,
           | so they better know if any of the libraries they use does it.
           | How do you foresee navigating this? Runtime errors suck,
           | especially in this context, as fixing them requires either
           | live patching code or migrating execution logs to new code
           | versions.
        
             | tomasol wrote:
             | Yeah, looks like Golem went a similar route - using the
             | WASM Component Model and wasmtime.
             | 
             | There is always this chicken and egg problem on a new
             | platform, but I am hoping that LLMs can solve it partially
             | - the activities are just HTTP clients with no complex
             | logic.
             | 
             | Regarding the restrictions required for determinism, they
             | only apply to workflows, not activities. Workflows should
             | be describing just the business logic. All the complexities
             | of retries, failure recovery, replay after server crash
             | etc. are handled by the runtime. The WASM sandbox makes it
             | impossible to introduce non-determinism - it would cause a
             | compile error so no need for runtime checks.
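
The split between workflows and activities described above can be sketched as a replay loop. This is a minimal illustration under assumptions, not Obelisk's API: `ExecutionLog`, `WorkflowCtx`, `call_activity`, `provision` and `resume` are all hypothetical names, and the activities are stubbed with closures.

```rust
/// Results of child (activity) calls, in the order the workflow made them.
#[derive(Default)]
struct ExecutionLog {
    results: Vec<String>,
}

/// Handed to the workflow; its only route to the outside world.
struct WorkflowCtx<'a> {
    log: &'a mut ExecutionLog,
    cursor: usize,        // index of the next call the workflow will make
    activity_runs: usize, // how many activities actually executed this run
}

impl<'a> WorkflowCtx<'a> {
    fn new(log: &'a mut ExecutionLog) -> Self {
        WorkflowCtx { log, cursor: 0, activity_runs: 0 }
    }

    /// Returns the recorded result if this call already happened,
    /// otherwise runs the activity and appends its result to the log.
    fn call_activity(&mut self, activity: impl FnOnce() -> String) -> String {
        let result = if self.cursor < self.log.results.len() {
            self.log.results[self.cursor].clone()
        } else {
            self.activity_runs += 1;
            let fresh = activity();
            self.log.results.push(fresh.clone());
            fresh
        };
        self.cursor += 1;
        result
    }
}

/// Business logic only: no I/O, clocks, or threads of its own.
fn provision(ctx: &mut WorkflowCtx) -> String {
    let vm = ctx.call_activity(|| "vm-1".to_string());
    let ip = ctx.call_activity(|| "10.0.0.7".to_string());
    format!("{vm} at {ip}")
}

/// Re-runs the workflow from the top against an existing log.
fn resume(log: &mut ExecutionLog) -> (String, usize) {
    let mut ctx = WorkflowCtx::new(log);
    let outcome = provision(&mut ctx);
    (outcome, ctx.activity_runs)
}

fn main() {
    // Simulate a crash after the first activity: only its result was logged.
    let mut log = ExecutionLog::default();
    log.results.push("vm-1".to_string());

    let (outcome, runs) = resume(&mut log);
    assert_eq!(outcome, "vm-1 at 10.0.0.7");
    assert_eq!(runs, 1); // only the second activity executed after restart
    println!("{outcome}");
}
```

Because the workflow is deterministic, re-running it from the top reaches the same call sequence, so logged results can be substituted without repeating side effects.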
        
               | jcmfernandes wrote:
               | I understand what you mean by being able to fully sandbox
               | things and guarantee determinism, a must for the
               | workflows and not the activities (using temporal lingo).
               | 
               | When you say that the runtime handles, for example,
               | retries, doesn't that require me to depend on your HTTP
               | client component? Or do I also need to compile activities
               | to WASM and have obelisk running them because they are
               | essentially background jobs (that is, you have workers
               | pulling)?
               | 
               | Finally, do you see the component's interface as the
               | right layer for capturing IO? I'm imagining people
               | attempting to run managed code (Java, python, ruby,
               | etc.). The VMs can do thousands of syscalls before they
               | start executing the user's code. Logging them one by one
               | seems crazy, but I also don't see an alternative.
               | 
               | EDIT:
               | 
               | I RTFM and found the answers to my first two questions in
               | the README :)
        
               | tomasol wrote:
               | > do I also need to compile activities to WASM
               | 
               | Yes, currently all activities must conform to the WASI
               | 0.2 standard. This is the simplest for deployment, as you
               | only need the obelisk executable and a TOML config file.
               | The webhooks, workflows and activities are pulled from an
               | OCI registry on startup.
               | 
               | To support native code I plan to add external activities
               | as well, with an interface similar to what Netflix
               | Conductor uses for its workers.
               | 
               | > Finally, do you see the component's interface as the
               | right layer for capturing IO?
               | 
               | An activity must encapsulate something much higher level
               | than a single IO operation. So something like "Configure
               | BGP on a router", "Start a VM" etc. It needs to be able
               | to handle retries and thus be idempotent.
               | 
               | Regarding performance, a workflow execution can call
               | 500-700 child executions serially, or around 1400 child
               | executions concurrently per second.
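
The point about activities being retried and therefore needing to be idempotent can be sketched as follows. This is an illustration under assumptions, not Obelisk code: `FakeProvider`, `start_vm`, `run_with_retries` and the idempotency key format are hypothetical, and the first attempt is made to fail after its side effect to mimic a lost response.

```rust
use std::collections::HashMap;

/// Stand-in for an external system such as a VM provider (hypothetical).
#[derive(Default)]
struct FakeProvider {
    vms: HashMap<String, String>, // idempotency key -> VM id
    attempts: u32,
}

impl FakeProvider {
    /// Idempotent "start a VM": the same key always maps to the same VM.
    /// The first attempt fails after the side effect, as if the reply was lost.
    fn start_vm(&mut self, key: &str) -> Result<String, String> {
        self.attempts += 1;
        let next_id = format!("vm-{}", self.vms.len() + 1);
        let id = self.vms.entry(key.to_string()).or_insert(next_id).clone();
        if self.attempts == 1 {
            return Err("timeout".to_string());
        }
        Ok(id)
    }
}

/// Retries the activity up to `max_attempts` times, reusing the same key.
fn run_with_retries(
    provider: &mut FakeProvider,
    key: &str,
    max_attempts: u32,
) -> Result<String, String> {
    let mut last_err = String::from("no attempts made");
    for _ in 0..max_attempts {
        match provider.start_vm(key) {
            Ok(id) => return Ok(id),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}

fn main() {
    let mut provider = FakeProvider::default();
    let id = run_with_retries(&mut provider, "execution-42/start-vm", 3).unwrap();
    // The retry found the VM created by the failed attempt instead of
    // creating a second one.
    assert_eq!(id, "vm-1");
    assert_eq!(provider.vms.len(), 1);
    println!("started {id} after {} attempts", provider.attempts);
}
```

Without the key lookup, the retry would create a second VM, which is why a retried activity has to tolerate being run more than once.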
        
       | disintegrator wrote:
       | Really nice project. What's the reasoning behind the AGPL
       | licensing? My understanding is that it will hurt adoption unless
       | you're planning to offer paid licensing options? Either way it's
       | a really nice project and I'm keen to try it out. I've found it
       | tricky to get a WASM/WASI setup where I can at least make my
       | HTTP requests (probably my own skill issue).
        
         | tomasol wrote:
         | Thanks for the kind words. In an ideal world I would like to
         | offer a cloud version that would be monetized. There are a few
         | examples on how to do HTTP requests, I have a demo repository
         | [1] with GraphQL and regular JSON-over-HTTP activities. I do
         | agree that the ecosystem is not mature yet, but I was able to
         | generate HTTP activities using an LLM in a single shot.
         | 
         | 1: https://github.com/obeli-sk/demo-stargazers
        
       | SvenL wrote:
       | One issue I had many times with workflow engines was updates. I
       | have a workflow and it has already running instances. 2
       | scenarios:
       | 
       | Can I update the workflow while it has running instances without
       | interfering with the running instances?
       | 
       | Can I update a running instance with a new version of the
       | workflow to patch some flaw? If no, can I replay an updated
       | version of a workflow with the log of an old workflow version?
        
         | tomasol wrote:
         | Great questions. If you are fixing a bug in a workflow, which
         | has running executions, there are two scenarios:
         | 
         | Either the fix does not break the determinism, meaning the
         | execution did not hit the fix yet. In this case the execution
         | can be replayed and continue on the patched WASM component.
         | 
         | Otherwise, the execution replay causes a "Non determinism
         | detected" error. In this case you need to handle the situation
         | manually. Since the execution log is in a sqlite file, you can
         | select all executions affected by the bug and perform a custom
         | cleanup. Also you can create a "forked" execution just by
         | copying the execution log + child responses into a new
         | execution, however there is no API for it yet.
         | 
         | > Can I update the workflow while it has running instances
         | without interfering with the running instances?
         | 
         | If you mean keeping the in-progress executions on the old
         | version of the code, you can do that by introducing a new
         | version in the WIT file and/or changing the function name.
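
The two bug-fix scenarios above can be sketched as a check of the calls a patched workflow requests against the recorded log. This is a simplified illustration under assumptions, not how Obelisk implements it: `LoggedCall`, `ReplayError` and `replay` are hypothetical, and a workflow version is reduced to the list of function names it would call.

```rust
/// One recorded child call, identified by the function that was requested.
#[derive(Clone, Debug, PartialEq)]
struct LoggedCall {
    function: String,
}

#[derive(Debug, PartialEq)]
enum ReplayError {
    NonDeterminism { index: usize, logged: String, requested: String },
}

/// Replays the calls a workflow version requests against a recorded log.
/// Returns how many calls matched; calls beyond the log would run for real.
fn replay(log: &[LoggedCall], requested: &[&str]) -> Result<usize, ReplayError> {
    for (index, (entry, function)) in log.iter().zip(requested).enumerate() {
        if entry.function != *function {
            return Err(ReplayError::NonDeterminism {
                index,
                logged: entry.function.clone(),
                requested: function.to_string(),
            });
        }
    }
    Ok(log.len().min(requested.len()))
}

fn main() {
    // The old version had completed one step when the fix was deployed.
    let log = vec![LoggedCall { function: "create-vm".to_string() }];

    // Scenario 1: the fix changes a step the execution has not reached yet,
    // so the replay matches the log and continues on the patched code.
    let patched_later = ["create-vm", "configure-bgp-v2"];
    assert_eq!(replay(&log, &patched_later), Ok(1));

    // Scenario 2: the fix changes a step that is already in the log.
    let patched_earlier = ["create-vm-v2", "configure-bgp"];
    let err = replay(&log, &patched_earlier).unwrap_err();
    assert!(matches!(err, ReplayError::NonDeterminism { index: 0, .. }));
    println!("replay failed: {:?}", err);
}
```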
        
         | halamadrid wrote:
         | We are using a workflow engine called Unmeshed - which has what
         | you are asking about. Workflow definitions can be updated
         | without interfering with running instances, and if you choose
         | to, you can patch updates onto running workflows. You can
         | also rerun workflows with the same input from an older
         | execution.
        
       ___________________________________________________________________
       (page generated 2025-04-09 23:00 UTC)