[HN Gopher] Build durable workflows with Postgres
       ___________________________________________________________________
        
       Build durable workflows with Postgres
        
       Author : KraftyOne
       Score  : 67 points
       Date   : 2025-08-08 19:24 UTC (3 hours ago)
        
 (HTM) web link (www.dbos.dev)
 (TXT) w3m dump (www.dbos.dev)
        
       | cpursley wrote:
       | I've been using https://www.pgflow.dev for workflows which is
       | built on pgmq and am really impressed so far. Most of the logic
       | is in the database so I'm considering building an Elixir adapter
       | DSL.
        
         | ishita_julep wrote:
         | what are you using the DSL for?
        
           | cpursley wrote:
           | It's used to generate the database migration that defines the
           | flows. More syntax sugar than anything.
        
         | mmcclure wrote:
         | Just curious, if you're already in Elixir and using Postgres,
         | why not use Oban[1]? It's my absolute favorite background job
         | library, and the thing I often miss most when working in other
         | ecosystems.
         | 
         | [1] https://github.com/oban-bg/oban
        
       | tonyhb wrote:
       | Anything that guarantees exactly once is selling snake oil. Side
       | effects happen inside any transaction, and only when it commits
       | (checkpoints) are the side effects safe.
       | 
       | Want to send an email, but the app crashes before committing? Now
       | you're at-least-once.
       | 
       | You can compress the window that causes at-least-once semantics,
       | but it's always there. For this reason, this blog post oversells
       | the capabilities of these types of systems as a whole. DBOS (and
       | Inngest, see the disclaimer below) try to get as close to exactly
       | once as possible, but the risk _always_ exists, which is why you
       | should always try to use idempotency in external API requests if
       | they support it. Defense in layers.
       | 
       | Disclaimer: I built the original `step.run` APIs at
       | https://www.inngest.com, which offers similar things on any
       | platform... without being tied to DB transactions.
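
A minimal sketch of the idempotency advice above, assuming an external API that deduplicates on a caller-supplied key. `FakeEmailAPI`, `idempotency_key`, and `send_welcome_email` are hypothetical names for illustration, not part of DBOS or Inngest. The key is derived deterministically from the workflow ID and step name, so a step that is re-run after a crash reuses the same key.

```python
import hashlib


class FakeEmailAPI:
    """Hypothetical stand-in for an external service that honors idempotency keys."""

    def __init__(self):
        self.sent = []    # emails actually delivered
        self._seen = {}   # idempotency key -> stored response

    def send(self, to, body, idempotency_key):
        # A repeated key returns the original response without re-sending.
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]
        self.sent.append((to, body))
        response = {"status": "sent", "id": len(self.sent)}
        self._seen[idempotency_key] = response
        return response


def idempotency_key(workflow_id, step_name):
    # Deterministic: a retry of the same step produces the same key.
    return hashlib.sha256(f"{workflow_id}:{step_name}".encode()).hexdigest()


def send_welcome_email(api, workflow_id, to):
    key = idempotency_key(workflow_id, "send_welcome_email")
    return api.send(to, "Welcome!", idempotency_key=key)


api = FakeEmailAPI()
first = send_welcome_email(api, "wf-123", "user@example.com")
# Simulate a crash before the checkpoint committed: the step runs again.
second = send_welcome_email(api, "wf-123", "user@example.com")
```

The step still executes at least once, but the external service sees the duplicate key and delivers the email only once.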
        
         | KraftyOne wrote:
         | As the post says, the exactly-once guarantee is ONLY for steps
         | performing database operations. For those, you actually can get
         | an exactly-once guarantee by running the database operations in
         | the same Postgres transaction as your durable checkpoint.
         | That's a pretty cool benefit of building workflows on Postgres!
         | Of course, if there are side effects outside the database,
         | those happen at-least-once.
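
A rough sketch of the idea described above: write the application data and the step checkpoint in the same transaction, so either both persist or neither does. This is not DBOS's implementation. It uses Python's built-in `sqlite3` as a stand-in for Postgres so it runs anywhere, and the `accounts` and `checkpoints` tables and the `credit_once` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a Postgres connection
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute(
    "CREATE TABLE checkpoints ("
    "workflow_id TEXT, step TEXT, PRIMARY KEY (workflow_id, step))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.commit()


def credit_once(conn, workflow_id, step, account, amount, crash=False):
    """Apply the credit and record the checkpoint in one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        done = conn.execute(
            "SELECT 1 FROM checkpoints WHERE workflow_id = ? AND step = ?",
            (workflow_id, step),
        ).fetchone()
        if done:
            return False  # step already committed; a retry is a no-op
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, account),
        )
        if crash:
            raise RuntimeError("simulated crash before commit")
        conn.execute("INSERT INTO checkpoints VALUES (?, ?)", (workflow_id, step))
        return True


try:
    credit_once(conn, "wf-1", "credit", "alice", 50, crash=True)
except RuntimeError:
    pass  # rolled back: neither the credit nor the checkpoint persisted

applied = credit_once(conn, "wf-1", "credit", "alice", 50)   # recovery run
repeated = credit_once(conn, "wf-1", "credit", "alice", 50)  # duplicate retry
balance = conn.execute(
    "SELECT balance FROM accounts WHERE id = 'alice'"
).fetchone()[0]
```

The guarantee covers only what lives inside the transaction. Anything the step does outside the database remains at-least-once, as noted above.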
        
           | tonyhb wrote:
           | You can totally leverage postgres transactions to give
           | someone... postgres transactions!
           | 
           | I just figured that the exactly once semantics were so worth
           | discussing that any external side effects (which is what
           | orchestration is for) aren't included in that, which is a big
           | caveat.
        
         | jedberg wrote:
         | > Anything that guarantees exactly once is selling snake oil.
         | 
         | That's a pretty spicy take. I'll agree that exactly-once is
         | hard, but it's not impossible. Obviously there are caveats, but
         | the beauty of DBOS using Postgres as the method of coordination
         | instead of an external server (like Temporal or Inngest) is
         | that the exactly-once guarantees of Postgres can carry over to
         | the application. Especially so if you're using that same
         | Postgres to store your application data.
        
       | abtinf wrote:
       | Why not just use Temporal?
        
         | KraftyOne wrote:
         | We wanted to make workflows more lightweight--we're building a
         | Postgres-backed library you can add to your existing
         | application instead of an external orchestrator that requires
         | you to rearchitect your system around it. This post goes into
         | more detail: https://www.dbos.dev/blog/durable-execution-
         | coding-compariso...
        
       | at0mic22 wrote:
       | Every few years someone discovers FOR UPDATE SKIP LOCKED and
       | re-presents it. I remember this going on for at least 15 years.
        
         | qianli_cs wrote:
         | Yup, some features are timeless and deserve a re-intro every
         | now and then. SKIP LOCKED is definitely one of them.
        
           | skrtskrt wrote:
           | with a nice NOWAIT when appropriate
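
For readers who have not seen the pattern, here is a hedged sketch of a job-claiming query built on FOR UPDATE SKIP LOCKED. The `jobs` table, its columns, and `claim_next_job` are hypothetical. The function accepts any DB-API cursor connected to Postgres (for example one from psycopg) and assumes the caller commits the surrounding transaction.

```python
# Claim the oldest pending job. Rows locked by other workers are skipped
# rather than waited on, so concurrent workers never block on or
# double-claim the same row.
CLAIM_JOB_SQL = """
UPDATE jobs
SET status = 'running'
WHERE id = (
    SELECT id
    FROM jobs
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload
"""


def claim_next_job(cur):
    """Return (id, payload) for the claimed job, or None if none is available."""
    cur.execute(CLAIM_JOB_SQL)
    return cur.fetchone()
```

NOWAIT differs in that the statement raises an error immediately when the row is already locked, which suits cases where the caller wants one specific row rather than any available one.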
        
         | atombender wrote:
         | The "someone" in this case happens to be Michael Stonebraker,
         | the creator of Postgres and CTO of DBOS.
        
       | alpb wrote:
       | I've been following DBOS for a while and I think the model isn't
       | too different than Azure Durable Functions (which uses Azure
       | Queues/Tables under the covers to maintain state).
       | https://learn.microsoft.com/en-us/azure/azure-functions/dura...
       | 
       | Perhaps the only difference is that Azure Durable Functions has
       | more syntactic sugar in C# (instead of DBOS's choice of Python)
       | to preserve call results in the persistent storage? Where else do
       | they differ? At the end, all of them seem to be doing what
       | Temporal is doing (which has its own shortcomings and it's also
       | possible to get it wrong if you call a function directly instead
       | of invoking it via an Activity etc)?
        
         | KraftyOne wrote:
         | Both do durable workflows with similar guarantees. The big
         | difference is that DBOS is an open-source library you can add
         | to your existing code and run anywhere, whereas Durable
         | Functions is a cloud offering for orchestrating serverless
         | functions on Azure.
        
           | alpb wrote:
           | As far as I know, Azure Durable Functions doesn't have a
           | server-side proprietary component and it's actually fully
           | open source framework/clients as well. So it's actually not a
           | cloud offering per-se. You can see the full implementations
           | at:
           | 
           | * https://github.com/Azure/durabletask
           | 
           | * https://github.com/microsoft/durabletask-go
        
       | cmdtab wrote:
       | Recently moved some of the background jobs from graphile worker
       | to DBOS. Really recommend for the simplicity. Took me half an
       | hour.
       | 
       | I evaluated temporal, trigger, cloudflare workflows (highly not
       | recommended), etc and this was the easiest to implement
       | incrementally. Didn't need to change our infrastructure at all.
       | Just plugged the worker where I had graphile worker.
       | 
       | The hosted service UX and frontend could use a lot of work,
       | though the hosted service isn't required to use DBOS. OTEL
       | support was there.
        
         | LudwigNagasena wrote:
         | What was the reason for the transition?
        
           | cmdtab wrote:
           | Needed checkpoints in some of our jobs wrapping around the AI
           | agent so we can reduce cost and increase reliability (as
           | workflow will start from mid step as opposed to a complete
           | restart).
           | 
           | We had already checkpointed the agent, but then figured it's
           | better to have a generic abstraction for other stuff we do.
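
A toy illustration of resuming from the middle of a workflow rather than restarting it, assuming step outputs are checkpointed by name. A plain dict stands in for durable storage, and `run_workflow`, `plan`, and `act` are hypothetical names, not the DBOS API.

```python
def run_workflow(workflow_id, steps, store):
    """Run steps in order, skipping any whose result is already checkpointed."""
    results = store.setdefault(workflow_id, {})
    for name, fn in steps:
        if name in results:
            continue  # completed in an earlier attempt; reuse the stored output
        results[name] = fn(results)  # checkpoint as soon as the step returns
    return results


calls = {"plan": 0, "act": 0}
flaky = {"fail_once": True}


def plan(_results):
    # Imagine an expensive LLM call here.
    calls["plan"] += 1
    return "plan-v1"


def act(results):
    calls["act"] += 1
    if flaky["fail_once"]:
        flaky["fail_once"] = False
        raise RuntimeError("simulated crash mid-workflow")
    return f"executed {results['plan']}"


store = {}  # stand-in for a durable checkpoint table
steps = [("plan", plan), ("act", act)]
try:
    run_workflow("wf-42", steps, store)
except RuntimeError:
    pass  # first attempt died during "act"; "plan" is already checkpointed

final = run_workflow("wf-42", steps, store)  # resumes at "act"
```

On the second attempt the expensive `plan` step is not re-run, which is where the cost saving described above comes from.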
        
         | diarrhea wrote:
         | Interesting!
         | 
         | What made you opt for DBOS over Temporal?
        
           | cmdtab wrote:
           | Temporal required re-architecting some stuff, their
           | TypeScript SDK and sandbox are a bit unintuitive to use, so
           | it would have been an additional item to grok for the team,
           | and additional infrastructure to maintain. There was a latency
           | trade-off too, which in our case mattered.
           | 
           | Didn't face any issue though. Temporal observability and UI
           | was better than DBOS. Just harder to do incremental migration
           | in an existing codebase.
        
       | rlili wrote:
       | Some other lightweight solutions around:
       | 
       | https://github.com/iopsystems/durable
       | 
       | https://github.com/maxcountryman/underway
        
       | darkteflon wrote:
       | Often wondered whether it would be possible / advisable to
       | combine DBOS with, e.g., Dagster if you have complex data
       | orchestration requirements. They seem to deal with orthogonal
       | concerns but complement nicely. Is integration with orchestration
       | frameworks something the DBOS team has any thoughts on?
        
         | KraftyOne wrote:
         | Would love to learn more about what you're building--what
         | problems or parts of your system would you solve with Dagster
         | vs DBOS?
        
       | agambrahma wrote:
       | Curious how this compares to Cloudflare, which is the other
       | provider that is really going for simplified workflows
        
       | atombender wrote:
       | While DBOS looks like a nice system, I was really disappointed to
       | learn that Conductor, which is the DBOS equivalent of the
       | Temporal server, is not open source.
       | 
       | Without it, you get no centralized coordination of workflow
       | recovery. On Kubernetes, for example, my understanding is that
       | you will need to use a stateful set to assign stable executor
       | IDs, which the Conductor doesn't need.
       | 
       | I suppose that's their business model, to provide a simplistic
       | foundation where you have to pay money to get the grown up stuff.
        
       | jumploops wrote:
       | I've been looking at migrating to Temporal, but this looks
       | interesting.
       | 
       | For context, we have a simple (read: home-built) "durable" worker
       | setup that uses BullMQ for scheduling/queueing, but all of the
       | actual jobs are Postgres-based.
       | 
       | Due to the cron-nature of the many disparate jobs (bespoke AI-
       | native workflows), we have workers that scale up/down basically
       | on the hour, every hour.
       | 
       | Temporal is the obvious solution, but it will take some
       | rearchitecting to get our jobs to fit their structure. We're also
       | concerned with some of their limits (payload size, language
       | restrictions, etc.).
       | 
       | Looking at DBOS, it's unclear from the docs how to scale the
       | workers:
       | 
       | > DBOS is just a library for your program to import, so it can
       | run with any Python/Node program.
       | 
       | In our ideal case, we can add DBOS to our main application for
       | scheduling jobs, and then have a simple worker app that scales
       | independently.
       | 
       | How "easy" would it be to migrate our current system to DBOS?
        
         | KraftyOne wrote:
         | I'd love to learn more about what you're building--just reach
         | out at peter.kraft@dbos.dev.
         | 
         | One option is that you have DBOS workflows that schedule and
         | submit jobs to an external worker app. Another option is that
         | your workers use DBOS queues
         | (https://docs.dbos.dev/python/tutorials/queue-tutorial). I'd
         | have to better understand your use case to figure out what
         | would be the best fit.
        
       ___________________________________________________________________
       (page generated 2025-08-08 23:00 UTC)