All you need is Postgres until you scale into TBs of data. We use PostgreSQL as a durable workflow engine, vector search, time-series store, BM25 search, OLTP/OLAP engine, and queue. It's basically the only dependency we have for lobu.ai.
The main benefit is centralizing all the data in one place, so we don't need to worry about copying data between multiple systems. Once something becomes the bottleneck, you can eventually migrate to a purpose-specific tool to scale out. To be honest, LISTEN/NOTIFY is in my opinion the most fragile part of PG, but it's fine as a start until you scale out.
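One common way to live with that fragility, sketched here as an assumption rather than how lobu.ai does it: treat NOTIFY only as a wake-up hint, always drain the queue table, and fall back to a timed poll. Postgres delivers a notification only to sessions that are listening at the moment it is sent, so a worker that is disconnected then never sees it. `claim_and_run` is a hypothetical callable; the psycopg2 waiter assumes an autocommit connection that has already executed `LISTEN`.

```python
import select
from typing import Callable, Optional


def psycopg2_wait(conn, timeout_s: float) -> bool:
    """Wait for a NOTIFY on a psycopg2 connection that is in autocommit mode
    and has already executed LISTEN. Returns True if anything arrived."""
    readable, _, _ = select.select([conn], [], [], timeout_s)
    if not readable:
        return False  # timed out; the caller polls the table anyway
    conn.poll()
    arrived = bool(conn.notifies)
    conn.notifies.clear()  # payloads are ignored: the table is the source of truth
    return arrived


def worker_loop(claim_and_run: Callable[[], bool],
                wait_for_notify: Callable[[float], bool],
                poll_interval_s: float = 5.0,
                max_rounds: Optional[int] = None) -> int:
    """Drain every ready job, then block until a notification or the poll
    interval, whichever comes first. A missed NOTIFY only delays work by at
    most poll_interval_s; jobs themselves live in a table, so none are lost."""
    processed = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        # Hypothetical callable: claims and runs one job, returns False when empty.
        while claim_and_run():
            processed += 1
        wait_for_notify(poll_interval_s)
    return processed
```

Wiring it up would look like `worker_loop(claim_and_run, lambda t: psycopg2_wait(conn, t))`. The design choice is that correctness never depends on a notification arriving; NOTIFY only shortens latency.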
Armin Ronacher's `absurd` is an implementation of durable workflows for postgres:
https://lucumr.pocoo.org/2025/11/3/absurd-workflows/
https://github.com/earendil-works/absurd
https://earendil-works.github.io/absurd/
I've not used it, but it's worth comparing to other options
Curious to hear about people's experience with DBOS and Temporal.
I have used Temporal in the past and it works really well. My only problem with it was the limits on request payload and event sizes, which created some inconvenience for us when building solutions. The limits also enforce good engineering practices, but sometimes you don't want to write special logic for when your CSV file is larger than 2 MB: upload it to S3, pass a link, then download it in the workflow.
What is your experience with DBOS? How does it compare to Temporal in terms of operational complexity, feature parity, and anything else?
They've just released an external storage approach to solve the large payload issue. I don't 100% love it (it's bolted on, not an intrinsic part), and it's an early release right now - but you can consider this effectively solved for now.
That's good, because back in the day, if you were putting entire documents in a message queue, I would have laughed you out the door. Putting the document in object storage and passing a link is much more useful (though the distributed-systems part and backing up the current state can be annoying!)
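The S3 workaround described in this thread is usually called the claim-check pattern: large payloads go to object storage and only a reference travels through the workflow. A minimal sketch, assuming a hypothetical `BlobStore` interface (an in-memory dict stands in for S3) and the 2 MB threshold from the comment; check the limit your own Temporal deployment actually enforces, and leave headroom for serialization overhead.

```python
import hashlib
from typing import Dict, Protocol, Union

# Threshold taken from the comment above; adjust to your deployment's limit.
MAX_INLINE_BYTES = 2 * 1024 * 1024


class BlobStore(Protocol):
    """Hypothetical object-store interface (e.g. a thin wrapper around S3)."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for S3 so the sketch runs anywhere."""
    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def to_workflow_arg(data: bytes, store: BlobStore) -> Union[bytes, dict]:
    """Pass small payloads inline; park large ones and pass a reference."""
    if len(data) <= MAX_INLINE_BYTES:
        return data
    # Content-addressed key, so a retried upload writes the same object.
    key = "payloads/" + hashlib.sha256(data).hexdigest()
    store.put(key, data)
    return {"claim_check": key, "size": len(data)}


def from_workflow_arg(arg: Union[bytes, dict], store: BlobStore) -> bytes:
    """Resolve a reference back into the payload inside the workflow/activity."""
    if isinstance(arg, dict) and "claim_check" in arg:
        return store.get(arg["claim_check"])
    return arg
```

The annoying part mentioned above shows up here too: the blobs now have their own lifecycle, so something has to expire or back them up independently of the workflow history.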
We're using DBOS for AI generation workflows and for processing video files. Understanding how to migrate from Celery took time, but in our case it was worth it.
I run a large on-prem Temporal setup (throwaway account, as they would likely out me).
Having run it in prod for over a year, my opinion is that Temporal is poorly designed, slow, and ridiculously heavy infra-wise.
If you're doing anything non-trivial (say, 200+ events/workflow) and you need to run only a couple hundred of them concurrently all day, you're going to spend millions on infra, and it's still going to absolutely suck.
Try running their own benchmarks; the numbers are pathetic.
Their sales team is also absolutely appalling and desperate.
From a developer standpoint, the SDK is quite nice, though.
Don't get trapped into Nexus, and if the sales team calls you, make sure legal is in the room.
Since I'm in ranting mode, here's a good example: you're limited to _ONE_ IO per shard in the history service:
https://github.com/temporalio/temporal/blob/e22e6304b3c4a409...
https://github.com/temporalio/temporal/blob/e22e6304b3c4a409...
Temporal does a crazy number of database operations, and all of them sit behind that mutex.
Oh, and you can't change the shard count on existing clusters.
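To see why a per-shard lock combined with a fixed shard count hurts, here is a back-of-the-envelope bound, assuming (as the comment claims) that each history shard performs one persistence operation at a time. If one operation takes L milliseconds, a shard completes at most 1000 / L operations per second, so the cluster-wide ceiling is shards × 1000 / L regardless of how many hosts you add. The shard count and latency below are illustrative numbers, not measurements.

```python
def max_persistence_ops_per_sec(num_shards: int, op_latency_ms: float) -> float:
    """Upper bound on cluster-wide persistence ops/s when each shard
    serializes its database operations behind a single lock."""
    if num_shards <= 0 or op_latency_ms <= 0:
        raise ValueError("num_shards and op_latency_ms must be positive")
    per_shard = 1000.0 / op_latency_ms  # ops/s a single shard can complete
    return num_shards * per_shard


# Illustrative only: 512 shards, 5 ms per database round trip.
ceiling = max_persistence_ops_per_sec(512, 5.0)
print(f"ceiling: {ceiling:,.0f} ops/s")  # prints "ceiling: 102,400 ops/s"
```

Because the shard count is fixed once the cluster exists, the only remaining lever in this model is per-operation latency, which is why database round-trip time matters so much for that setup.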
Great stuff.
Conductor OSS does this quite well: https://docs.conductor-oss.org/devguide/ai/index.html
https://github.com/agentspan-ai/agentspan, which is essentially an agentic SDK layer for Conductor, can convert any of your LangGraph, OpenAI, Vercel, or ADK agents, making them durable and adding orchestration with no code changes.
We have a durable queue built into Postgres to handle some complex notification-ish logic. It has worked excellently, and while various cloud providers would love to sell us services to do that, ours is extremely cheap to run.
For that particular usage, the volume we process and the business criticality make it a good candidate for inventing here, but for other durable processes we just use off-the-shelf tools, since the cost of maintenance would quickly outstrip the value.
Postgres is a great tool and far more powerful than most people give it credit for, but there's always a balance between in-house maintenance and paying rent for someone else's solution.
What's "maintenance" here? If the app is already using PostgreSQL, it should just be the initial effort of writing or importing the code to run it, no?
Since DBOS doesn't support Rust, we implemented a very minimal Rust version of this at https://github.com/tensorzero/durable. It has been quite stable and extensible but of course you need to be very careful with the SQL implementations. Hope this is interesting to readers here.
Continuously amazed by what you can do with few tools, as long as Postgres is a part of your toolkit.
I recently developed a distributed queue and it works really well; it benchmarks well too, with no race conditions or conflicts. I used SKIP LOCKED so that workers can compete for jobs safely.
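The core of a SKIP LOCKED queue is a single claim statement: each worker locks the oldest ready row, and rows already locked by other workers are skipped instead of waited on. A minimal sketch, not the commenter's code, assuming a hypothetical `jobs` table and a DB-API cursor from a Postgres driver such as psycopg.

```python
# Hypothetical schema assumed by this sketch:
#   CREATE TABLE jobs (
#       id      bigserial PRIMARY KEY,
#       payload jsonb NOT NULL,
#       status  text NOT NULL DEFAULT 'queued',
#       run_at  timestamptz NOT NULL DEFAULT now()
#   );
#   CREATE INDEX jobs_ready_idx ON jobs (run_at) WHERE status = 'queued';

CLAIM_SQL = """
UPDATE jobs
SET status = 'running'
WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'queued' AND run_at <= now()
    ORDER BY run_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED  -- rows locked by other workers are skipped, not waited on
)
RETURNING id, payload
"""

COMPLETE_SQL = "UPDATE jobs SET status = 'done' WHERE id = %s"


def claim_job(cur):
    """Claim one ready job via a DB-API cursor; returns (id, payload) or None.
    Commit right after so the row lock is released and the status is visible."""
    cur.execute(CLAIM_SQL)
    return cur.fetchone()


def complete_job(cur, job_id: int) -> None:
    """Mark a claimed job as finished."""
    cur.execute(COMPLETE_SQL, (job_id,))
```

Committing the claim immediately keeps locks short, but a worker that crashes mid-job leaves its row stuck in `running`, so real queues usually add a lease or heartbeat column and requeue expired claims. The alternative is to hold the transaction open for the whole job so a crash rolls the claim back automatically, at the cost of one open transaction per in-flight job.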
You can also keep multiple workers across nodes from conflicting by using session-wide mutexes, i.e. Postgres advisory locks.
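A sketch of the advisory-lock variant, assuming a DB-API cursor on a dedicated Postgres connection in autocommit mode. `pg_try_advisory_lock` and `pg_advisory_unlock` are built-in Postgres functions; a session-level advisory lock is held until it is explicitly released or the session ends, so a crashed worker's lock is freed when its connection drops. The lock key and the `run_if_leader` helper are hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def advisory_lock(cur, key: int):
    """Try to take a session-level advisory lock without blocking.
    Yields True if this session now holds the lock, False otherwise."""
    cur.execute("SELECT pg_try_advisory_lock(%s)", (key,))
    acquired = bool(cur.fetchone()[0])
    try:
        yield acquired
    finally:
        if acquired:
            # Session-level locks survive COMMIT, so release explicitly.
            cur.execute("SELECT pg_advisory_unlock(%s)", (key,))


# Hypothetical key: any bigint all workers agree on, e.g. one per job type.
NIGHTLY_REPORT_LOCK = 42_001


def run_if_leader(cur, work) -> bool:
    """Run `work` only on the node that wins the lock; others skip this round."""
    with advisory_lock(cur, NIGHTLY_REPORT_LOCK) as got_it:
        if got_it:
            work()
        return got_it
```

The non-blocking `pg_try_` variant suits "only one node should do this" jobs; the blocking `pg_advisory_lock` would instead make the other workers queue up behind the winner.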
Convex has a workpool component that lets you compose big, complicated flows in an understandable way and gives you real-time updates on the status of the various pieces: https://www.convex.dev/components/workflow
I feel it's way too hand-wavy on consistency and correctness. That's my opinion as someone who has implemented marketing workflows that break all the time (and learned tons of painful lessons).
Strong correctness guarantees should not be undermined; they are even more important than availability.
The examples on the website are simple but heavily downplay the importance of correctness. Anyone who implements similar pseudo-code directly will eventually suffer data-correctness issues when crashes happen.
As you said, the example is simple, and it might not be obvious to people without prod experience what the problems can be. Postgres gives you all the primitives you need to solve this at the application layer. Durable workflows on Postgres are an effective way to access these primitives.
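To make the "primitives" point concrete, here is a minimal sketch of the checkpointing idea behind durable workflows: every step's output is recorded under (workflow_id, step number), so re-running a workflow after a crash replays recorded steps instead of repeating them. This is an illustration, not DBOS's implementation; the table and function names are hypothetical, and sqlite3 stands in for Postgres so it runs without a server (in Postgres the same SQL works with `%s` placeholders under psycopg).

```python
import json
import sqlite3
from typing import Any, Callable

# Hypothetical checkpoint table; INSERT ... ON CONFLICT DO NOTHING
# behaves the same way in Postgres.
SCHEMA = """
CREATE TABLE IF NOT EXISTS step_outputs (
    workflow_id TEXT NOT NULL,
    step_no     INTEGER NOT NULL,
    output      TEXT NOT NULL,
    PRIMARY KEY (workflow_id, step_no)
)
"""


def run_step(conn: sqlite3.Connection, workflow_id: str, step_no: int,
             fn: Callable[[], Any]) -> Any:
    """Return the recorded output if this step already finished; otherwise
    run it and record the output. A crash between fn() and the commit means
    fn runs again on recovery (at-least-once), so fn must be idempotent."""
    row = conn.execute(
        "SELECT output FROM step_outputs WHERE workflow_id = ? AND step_no = ?",
        (workflow_id, step_no),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # replay: skip the side effect
    result = fn()
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT INTO step_outputs (workflow_id, step_no, output) "
            "VALUES (?, ?, ?) ON CONFLICT DO NOTHING",
            (workflow_id, step_no, json.dumps(result)),
        )
    return result


def signup_workflow(conn, workflow_id: str, email: str, sent: list) -> str:
    """Hypothetical two-step workflow: create a user, then 'send' an email."""
    user_id = run_step(conn, workflow_id, 1, lambda: f"user-{email}")
    run_step(conn, workflow_id, 2, lambda: sent.append(email) or "sent")
    return user_id
```

The remaining gap is the window between `fn()` and the commit. Steps whose only effects are database writes can close it by doing those writes in the same transaction as the checkpoint; external calls (emails, payments) still need idempotency keys. That window is exactly the kind of detail simple examples tend to gloss over.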
Having inherited a few of these: you tend to home-grow ad-hoc versions of many existing OSS tools, but with fewer of the patterns baked in.
Not sure where the NIH ends and where you're actually better off with a supported orchestration approach. I suppose if you expect your program to be around a while (or need advanced features), maybe think about using something a bit more battle-tested?
How does this compare to Hatchet?
Citing CockroachDB as an example of scaling Postgres made me spit out coffee. Was this LLM-written?
The effort we've put into making Oban (and Pro) work with CRDB has been ridiculous. There's feature detection all over because of a lack of common operators, and functions that can't be used in indexes. The worst part is the rampant "serialization_failure" errors that force continual transaction retries. Not how I'd suggest scaling Postgres.
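For context on those retries: SQLSTATE 40001 (`serialization_failure`) means the whole transaction has to be re-run from the start, not just the failing statement. Postgres raises it under REPEATABLE READ or SERIALIZABLE isolation, and SERIALIZABLE is CockroachDB's default, so client code ends up wrapped in a retry loop like the sketch below. Oban itself is Elixir; this Python version only shows the shape of the loop. It checks psycopg2's `pgcode` and psycopg 3's `sqlstate` attributes; the backoff numbers are arbitrary assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

SERIALIZATION_FAILURE = "40001"


def is_serialization_failure(exc: BaseException) -> bool:
    # psycopg2 exposes the SQLSTATE as .pgcode, psycopg 3 as .sqlstate.
    code = getattr(exc, "pgcode", None) or getattr(exc, "sqlstate", None)
    return code == SERIALIZATION_FAILURE


def run_transaction(txn: Callable[[], T], max_attempts: int = 5,
                    base_delay_s: float = 0.01,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Re-run the whole transaction function on 40001 with jittered
    exponential backoff. `txn` must open, run, and commit its own
    transaction (rolling back on error) so each attempt starts clean."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except Exception as exc:
            if not is_serialization_failure(exc) or attempt == max_attempts:
                raise
            sleep(base_delay_s * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5))
    raise AssertionError("loop always returns or raises")
```

Every statement inside `txn` may now execute several times, which is why side effects outside the database have to stay out of retried transactions.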
That said, as a predecessor to DBOS in building durable workflows using just Postgres, I concur with the overall sentiment.
How do you incorporate secrets in this kind of implementation? Stored in db?
Temporal is an insane piece of software; I'm always surprised people don't know about it. You could replace almost your whole AWS stack with Temporal.
You could replace nearly the entire AWS stack with an Elixir (Erlang) monolith + Postgres.
Sure, if you wanna run a 48-node Cassandra cluster...
PgFlow is pretty awesome for DAG workflows; it's built on pgmq (which does the heavy lifting).
Typescript: https://www.pgflow.dev
Elixir: https://github.com/agoodway/pgflow/blob/main/docs/COMPARISON...
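To illustrate the DAG model PgFlow targets (this does not use PgFlow's or pgmq's API): a step becomes runnable only once all of its dependencies have completed. A minimal in-process sketch using Python's standard `graphlib`; the step names are hypothetical, and in a Postgres-backed engine the "ready" set would come from a query over persisted step states rather than from memory.

```python
from graphlib import TopologicalSorter

# Hypothetical flow: each key depends on the steps in its set.
FLOW = {
    "fetch_article": set(),
    "summarize": {"fetch_article"},
    "embed": {"fetch_article"},
    "publish": {"summarize", "embed"},
}


def run_flow(flow: dict, run_step) -> list:
    """Run steps in dependency order. Steps in the same wave don't depend
    on each other, so a real engine could hand them to workers in parallel."""
    sorter = TopologicalSorter(flow)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        for step in ready:
            run_step(step)
            sorter.done(step)
    return waves


print(run_flow(FLOW, lambda step: None))
# prints [['fetch_article'], ['embed', 'summarize'], ['publish']]
```

The fan-out/fan-in shape (`summarize` and `embed` both waiting on `fetch_article`, `publish` waiting on both) is what a DAG engine adds on top of a plain queue.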