[HN Gopher] Build durable workflows with Postgres
___________________________________________________________________
Build durable workflows with Postgres
Author : KraftyOne
Score : 67 points
Date : 2025-08-08 19:24 UTC (3 hours ago)
(HTM) web link (www.dbos.dev)
(TXT) w3m dump (www.dbos.dev)
| cpursley wrote:
| I've been using https://www.pgflow.dev for workflows which is
| built on pgmq and am really impressed so far. Most of the logic
| is in the database so I'm considering building an Elixir adapter
| DSL.
| ishita_julep wrote:
| what are you using the DSL for?
| cpursley wrote:
| It's used to generate the database migration that defines the
| flows. More syntax sugar than anything.
| mmcclure wrote:
| Just curious, if you're already in Elixir and using Postgres,
| why not use Oban[1]? It's my absolute favorite background job
| library, and the thing I often miss most when working in other
| ecosystems.
|
| [1] https://github.com/oban-bg/oban
| tonyhb wrote:
| Anything that guarantees exactly once is selling snake oil. Side
| effects happen inside any transaction, and only when it commits
| (checkpoints) are the side effects safe.
|
| Want to send an email, but the app crashes before committing? Now
| you're at-least-once.
|
| You can compress the window that causes at-least-once semantics,
| but it's always there. For this reason, this blog post oversells
| the capabilities of these types of systems as a whole. DBOS (and
| Inngest, see the disclaimer below) try to get as close to exactly
| once as possible, but the risk _always_ exists, which is why you
| should always try to use idempotency in external API requests if
| they support it. Defense in layers.
|
| Disclaimer: I built the original `step.run` APIs at
| https://www.inngest.com, which offers similar things on any
| platform... without being tied to DB transactions.
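A minimal sketch of the idempotency advice above, assuming the external service deduplicates on a caller-supplied key. `FakeEmailAPI`, `idempotency_key`, and the workflow/step names are hypothetical and exist only for illustration; real providers differ in how they accept such keys.

```python
import hashlib


def idempotency_key(workflow_id: str, step_name: str) -> str:
    """Derive a stable key: the same workflow step always yields the same key."""
    raw = f"{workflow_id}:{step_name}".encode()
    return hashlib.sha256(raw).hexdigest()


class FakeEmailAPI:
    """Hypothetical stand-in for an external service that honors idempotency keys."""

    def __init__(self) -> None:
        self.sent = []   # emails actually delivered
        self._seen = {}  # idempotency key -> response returned earlier

    def send(self, to: str, body: str, key: str) -> str:
        # A repeated key returns the earlier response without sending again.
        if key in self._seen:
            return self._seen[key]
        self.sent.append((to, body))
        response = f"msg-{len(self.sent)}"
        self._seen[key] = response
        return response


api = FakeEmailAPI()
key = idempotency_key("wf-123", "send_welcome_email")

first = api.send("a@example.com", "Welcome!", key)
# Simulate a crash before the checkpoint commits: recovery runs the step again.
retry = api.send("a@example.com", "Welcome!", key)
```

The step still executes at-least-once, but because the key is derived from the workflow and step rather than generated per attempt, the retry is absorbed by the receiver.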
| KraftyOne wrote:
| As the post says, the exactly-once guarantee is ONLY for steps
| performing database operations. For those, you actually can get
| an exactly-once guarantee by running the database operations in
| the same Postgres transaction as your durable checkpoint.
| That's a pretty cool benefit of building workflows on Postgres!
| Of course, if there are side effects outside the database,
| those happen at-least-once.
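The same-transaction idea described above can be sketched as follows. This is not DBOS code: it uses the standard-library `sqlite3` module as a stand-in for Postgres so it runs anywhere, and the `accounts` and `checkpoints` tables and the `credit_step` function are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute(
    "CREATE TABLE checkpoints ("
    "workflow_id TEXT, step TEXT, PRIMARY KEY (workflow_id, step))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.commit()


def credit_step(conn, workflow_id, amount, crash_before_commit=False):
    """Apply the credit and record the checkpoint in ONE transaction."""
    done = conn.execute(
        "SELECT 1 FROM checkpoints WHERE workflow_id = ? AND step = 'credit'",
        (workflow_id,),
    ).fetchone()
    if done:
        return  # the step already committed; do not apply it again
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = 'alice'",
            (amount,),
        )
        conn.execute("INSERT INTO checkpoints VALUES (?, 'credit')", (workflow_id,))
        if crash_before_commit:
            raise RuntimeError("simulated crash before commit")
        conn.commit()  # data change and checkpoint become visible together
    except Exception:
        conn.rollback()  # neither the update nor the checkpoint survives
        raise


try:
    credit_step(conn, "wf-1", 50, crash_before_commit=True)
except RuntimeError:
    pass
credit_step(conn, "wf-1", 50)  # recovery re-runs the step
credit_step(conn, "wf-1", 50)  # a duplicate retry is a no-op

balance = conn.execute(
    "SELECT balance FROM accounts WHERE id = 'alice'"
).fetchone()[0]
```

Because the update and the checkpoint commit or roll back together, the credit is applied once despite the crash and the duplicate retry. Nothing here covers side effects outside the database, which remain at-least-once.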
| tonyhb wrote:
| You can totally leverage postgres transactions to give
| someone... postgres transactions!
|
| I just figured it was worth discussing that the exactly-once
| semantics don't cover any external side effects (which is what
| orchestration is for), which is a big caveat.
| jedberg wrote:
| > Anything that guarantees exactly once is selling snake oil.
|
| That's a pretty spicy take. I'll agree that exactly-once is
| hard, but it's not impossible. Obviously there are caveats, but
| the beauty of DBOS using Postgres as the method of coordination
| instead of an external server (like Temporal or Inngest) is
| that the exactly-once guarantees of Postgres can carry over to
| the application. Especially so if you're using that same
| Postgres to store your application data.
| abtinf wrote:
| Why not just use Temporal?
| KraftyOne wrote:
| We wanted to make workflows more lightweight--we're building a
| Postgres-backed library you can add to your existing
| application instead of an external orchestrator that requires
| you to rearchitect your system around it. This post goes into
| more detail: https://www.dbos.dev/blog/durable-execution-
| coding-compariso...
| at0mic22 wrote:
| Every few years someone discovers FOR UPDATE SKIP LOCKED and
| re-presents it. I remember this going on for at least 15 years.
| qianli_cs wrote:
| Yup, some features are timeless and deserve a re-intro every
| now and then. SKIP LOCKED is definitely one of them.
| skrtskrt wrote:
| with a nice NOWAIT when appropriate
| atombender wrote:
| The "someone" in this case happens to be Michael Stonebraker,
| the creator of Postgres and CTO of DBOS.
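For readers who have not met the pattern being discussed, here is a hedged sketch of a job-claiming query built on FOR UPDATE SKIP LOCKED. The `jobs` table, its columns, and `claim_next_job` are hypothetical; `conn` is assumed to be any DB-API connection to Postgres (for example one from psycopg), and this is not how any particular library implements its queue.

```python
# SQL for Postgres: lock one pending row, skipping rows other workers hold.
CLAIM_SQL = """
UPDATE jobs
SET status = 'running'
WHERE id = (
    SELECT id
    FROM jobs
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload
"""


def claim_next_job(conn):
    """Claim one pending job and return its row, or None if none is free."""
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_SQL)
        row = cur.fetchone()
        conn.commit()  # releases the row lock; the job is now marked running
        return row
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

With SKIP LOCKED, concurrent workers pass over rows that another transaction has locked instead of blocking on them. Replacing it with NOWAIT makes the statement raise an error on a locked row, which suits cases where the caller wants to know immediately that the row is busy.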
| alpb wrote:
| I've been following DBOS for a while and I think the model isn't
| too different than Azure Durable Functions (which uses Azure
| Queues/Tables under the covers to maintain state).
| https://learn.microsoft.com/en-us/azure/azure-functions/dura...
|
| Perhaps the only difference is that Azure Durable Functions has
| more syntactic sugar in C# (whereas DBOS chose Python) for
| preserving call results in persistent storage? Where else do
| they differ? In the end, all of them seem to be doing what
| Temporal is doing (which has its own shortcomings, and it's also
| possible to get it wrong if you call a function directly instead
| of invoking it via an Activity, etc.).
| KraftyOne wrote:
| Both do durable workflows with similar guarantees. The big
| difference is that DBOS is an open-source library you can add
| to your existing code and run anywhere, whereas Durable
| Functions is a cloud offering for orchestrating serverless
| functions on Azure.
| alpb wrote:
| As far as I know, Azure Durable Functions doesn't have a
| proprietary server-side component, and the framework and
| clients are fully open source as well. So it's not actually a
| cloud offering per se. You can see the full implementations
| at:
|
| * https://github.com/Azure/durabletask
|
| * https://github.com/microsoft/durabletask-go
| cmdtab wrote:
| Recently moved some of the background jobs from graphile worker
| to DBOS. Really recommend for the simplicity. Took me half an
| hour.
|
| I evaluated temporal, trigger, cloudflare workflows (highly not
| recommended), etc and this was the easiest to implement
| incrementally. Didn't need to change our infrastructure at all.
| Just plugged the worker where I had graphile worker.
|
| The hosted service's UX and frontend could use a lot of work,
| but you don't need the hosted service to use it. OTEL support
| was there.
| LudwigNagasena wrote:
| What was the reason for the transition?
| cmdtab wrote:
| Needed checkpoints in some of our jobs that wrap the AI agent
| so we can reduce cost and increase reliability (since a
| workflow resumes from the last completed step instead of
| restarting from scratch).
|
| We had already checkpointed the agent, but then figured it's
| better to have a generic abstraction for other stuff we do.
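The resume-from-checkpoint behavior described above can be illustrated with a small sketch. It is not DBOS's API: `CheckpointStore`, `run_step`, and the `plan`/`act` steps are hypothetical, and the store is an in-memory dict standing in for a durable table.

```python
class CheckpointStore:
    """In-memory stand-in for a durable table of completed step results."""

    def __init__(self):
        self.results = {}  # (workflow_id, step_name) -> saved result


def run_step(store, workflow_id, step_name, fn):
    """Run fn once per (workflow, step); later runs reuse the saved result."""
    key = (workflow_id, step_name)
    if key in store.results:
        return store.results[key]
    result = fn()
    store.results[key] = result  # checkpoint only after the step succeeds
    return result


calls = []  # records which steps actually executed


def plan():
    calls.append("plan")  # imagine an expensive model call here
    return "plan-v1"


def act(plan_result, fail):
    calls.append("act")
    if fail:
        raise RuntimeError("simulated crash mid-workflow")
    return f"{plan_result}:done"


def workflow(store, workflow_id, fail_in_act):
    p = run_step(store, workflow_id, "plan", plan)
    return run_step(store, workflow_id, "act", lambda: act(p, fail_in_act))


store = CheckpointStore()
try:
    workflow(store, "wf-1", fail_in_act=True)  # crashes in the second step
except RuntimeError:
    pass
result = workflow(store, "wf-1", fail_in_act=False)  # recovery run
```

On the recovery run the `plan` step is served from its checkpoint, so only `act` executes again. That is where the cost saving comes from when the early steps are expensive.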
| diarrhea wrote:
| Interesting!
|
| What made you opt for DBOS over Temporal?
| cmdtab wrote:
| Temporal required re-architecting some stuff. Their
| TypeScript SDK and sandbox are a bit unintuitive to use, so
| they would have been one more thing for the team to grok, plus
| additional infrastructure to maintain. There was a latency
| trade-off too, which mattered in our case.
|
| Didn't face any issues though. Temporal's observability and UI
| were better than DBOS's. It was just harder to do an incremental
| migration in an existing codebase.
| rlili wrote:
| Some other lightweight solutions around:
|
| https://github.com/iopsystems/durable
|
| https://github.com/maxcountryman/underway
| darkteflon wrote:
| I've often wondered whether it would be possible / advisable to
| combine DBOS with, e.g., Dagster if you have complex data
| orchestration requirements. They seem to deal with orthogonal
| concerns but complement each other nicely. Is integration with
| orchestration frameworks something the DBOS team has any
| thoughts on?
| KraftyOne wrote:
| Would love to learn more about what you're building--what
| problems or parts of your system would you solve with Dagster
| vs DBOS?
| agambrahma wrote:
| Curious how this compares to Cloudflare, which is the other
| provider that is really going for simplified workflows
| atombender wrote:
| While DBOS looks like a nice system, I was really disappointed to
| learn that Conductor, which is the DBOS equivalent of the
| Temporal server, is not open source.
|
| Without it, you get no centralized coordination of workflow
| recovery. On Kubernetes, for example, my understanding is that
| you will need to use a stateful set to assign stable executor
| IDs, which the Conductor doesn't need.
|
| I suppose that's their business model, to provide a simplistic
| foundation where you have to pay money to get the grown up stuff.
| jumploops wrote:
| I've been looking at migrating to Temporal, but this looks
| interesting.
|
| For context, we have a simple (read: home-built) "durable" worker
| setup that uses BullMQ for scheduling/queueing, but all of the
| actual jobs are Postgres-based.
|
| Due to the cron-nature of the many disparate jobs (bespoke AI-
| native workflows), we have workers that scale up/down basically
| on the hour, every hour.
|
| Temporal is the obvious solution, but it will take some
| rearchitecting to get our jobs to fit their structure. We're also
| concerned with some of their limits (payload size, language
| restrictions, etc.).
|
| Looking at DBOS, it's unclear from the docs how to scale the
| workers:
|
| > DBOS is just a library for your program to import, so it can
| run with any Python/Node program.
|
| In our ideal case, we can add DBOS to our main application for
| scheduling jobs, and then have a simple worker app that scales
| independently.
|
| How "easy" would it be to migrate our current system to DBOS?
| KraftyOne wrote:
| I'd love to learn more about what you're building--just reach
| out at peter.kraft@dbos.dev.
|
| One option is that you have DBOS workflows that schedule and
| submit jobs to an external worker app. Another option is that
| your workers use DBOS queues
| (https://docs.dbos.dev/python/tutorials/queue-tutorial). I'd
| have to better understand your use case to figure out what
| would be the best fit.
___________________________________________________________________
(page generated 2025-08-08 23:00 UTC)