[HN Gopher] Show HN: PgQueuer - Transform PostgreSQL into a Job ...
       ___________________________________________________________________
        
       Show HN: PgQueuer - Transform PostgreSQL into a Job Queue
        
       PgQueuer is a minimalist, high-performance job queue library for
       Python, leveraging the robustness of PostgreSQL. Designed for
       simplicity and efficiency, PgQueuer uses PostgreSQL's LISTEN/NOTIFY
       to manage job queues effortlessly.
        
       Author : jeeybee
       Score  : 84 points
       Date   : 2024-08-18 19:22 UTC (3 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | samwillis wrote:
       | This looks like a great task queue, I'm a massive proponent of
       | "Postgres is all you need" [0] and doubling down on it with my
       | project that takes it to the extreme.
       | 
       | What I would love is a Postgres task queue that does multi-step
       | pipelines, with fan out and accumulation. In my view a structured
       | relational database is a particularly good backend for that as it
       | inherently can model the structure. Is that something you have
       | considered exploring?
       | 
       | The one thing with listen/notify that I find lacking is the max
       | payload size of 8k, it somewhat limits its capability without
       | having to start saving stuff to tables. What I would really like
       | is a streaming table, with a schema and all the rich type
       | support... maybe one day.
       | 
       | 0: https://www.amazingcto.com/postgres-for-everything/
        
         | jeeybee wrote:
         | Thanks for your insights!
         | 
         | Regarding multi-step pipelines and fan-out capabilities: It's a
         | great suggestion, and while PgQueuer doesn't currently support
         | this, it's something I'm considering for future updates.
         | 
         | As for the LISTEN/NOTIFY payload limit, PgQueuer uses these
         | signals just to indicate changes in the queue table, avoiding
         | the size constraint by not transmitting substantial data
         | through this channel.
        
         | halfcat wrote:
         | Is multi-step (fan out, etc) typically something a queue or
         | message bus would handle?
         | 
         | I've always handled this with an orchestrator solution like
         | (think Airflow and similar).
         | 
         | Or is this a matter of use case? Like for a real-time scenario
         | where you need a series of things to happen (user registration,
         | etc) maybe a queue handling this makes sense? Whereas with
         | longer running tasks (ETL pipelines, etc) the orchestrator is
         | beneficial?
        
       | CowOfKrakatoa wrote:
       | How does LISTEN/NOTIFY compare to using select for update skip
       | locked? I thought listen/notify can lose queue items when the
       | process crashes? Is that true? Do you need to code for those
       | cases in some manner?
        
         | jeeybee wrote:
         | LISTEN/NOTIFY and SELECT FOR UPDATE SKIP LOCKED serve different
         | purposes in PgQueuer. LISTEN/NOTIFY notifies consumers about
         | changes in the queue table, prompting them to check for new
         | jobs. This method doesn't inherently lose messages if a process
         | crashes, because it simply triggers a check rather than
         | transmitting data. The actual job handling and locking are
         | managed by SELECT FOR UPDATE SKIP LOCKED, which safely
         | processes each job even when multiple workers are involved.
        
         | severino wrote:
         | I think the usage of listen/notify is just a mechanism to save
         | you from querying the database every X seconds looking for new
         | tasks (polling). That has some drawbacks, because if the
         | timeout is too small, you are making too much queries that
         | usually may not return any new tasks, and if it's too big, then
         | you may start processing the task long after it was submitted.
         | This way, it just notifies you that new tasks are ready so you
         | can query the database.
        
       | cklee wrote:
       | I've been thinking about the potential for PostgreSQL-backed job
       | queue libraries to share a common schema. For instance, I'm a big
       | fan of Oban in Elixir: https://github.com/sorentwo/oban
       | 
       | Given that there are many Sidekiq-compatible libraries across
       | various languages, it might be beneficial to have a similar
       | approach for PostgreSQL-based job queues. This could allow for
       | job processing in different languages while maintaining
       | compatibility.
       | 
       | Alternatively, we could consider developing a core job queue
       | library in Rust, with language-specific bindings. This would
       | provide a robust, cross-language solution while leveraging the
       | performance and safety benefits of Rust.
        
         | memset wrote:
         | I am building an SQS compatible queue for exactly that reason.
         | Use with any language or framework.
         | https://github.com/poundifdef/smoothmq
         | 
         | It is based on SQLite, but it's written in a modular way. It
         | would be easy to add Postgres as a backend (in fact, it might
         | "just work" if I switch the ORM connection string.)
        
           | GordonS wrote:
           | Does SmoothMQ support running multiple nodes for high
           | availability? (I didn't see anything in the docs, but they
           | seem unfinished)
        
         | earthnail wrote:
         | This would be so immensely useful. I'd estimate that there are
         | so many cases where the producer is Node or Rails and the
         | consumer is Python.
        
         | stephenr wrote:
         | Qless "solves" this problem (in redis) by having all core logic
         | written as lua and executed in redis.
         | 
         | You could take a similar approach for pg: define a series of
         | procedures that provide all the required functionality, and
         | then language bindings are all just thin wrappers (to handle
         | language native stuff) around calls to execute a given
         | procedure with the correct arguments.
        
         | rileymichael wrote:
         | If you want a generic queue that can be consumed in any
         | runtime, you can just build it directly into postgres via
         | extensions like https://github.com/tembo-io/pgmq.
        
       | redskyluan wrote:
       | there seems to be a big hype to adapt pg into any infra. I love
       | PG but this seems not be right thing.
        
         | mlnj wrote:
         | I use it as a job queue. Yes, it has it's cons, but not dealing
         | with another moving piece in the big picture is totally worth
         | it.
        
         | sgarland wrote:
         | At low-medium scale, this will be fine. Even at higher scale,
         | so long as you monitor autovacuum performance on the queue
         | table.
         | 
         | At some point it may become practical to bring a dedicated
         | queue system into the stack, sure, but this can massively
         | simplify things when you don't need or want the additional
         | complexity.
        
           | jeeybee wrote:
           | I agree, there is no need for FANG level infrastructure. Imo.
           | in most cases, the simplicity / performance tradeoff for
           | small/medium is worth it. There is also a statistics tooling
           | that helps you monitor throughput and failure rats
           | (aggregated on a per second basis)
        
         | eknkc wrote:
         | Instead of SQS, I recently created a basic abstraction on PG
         | that mimics the SQS apis. The intention was to use it during
         | development and we would simply switch to SQS later.
         | 
         | Never did. The production code still uses PG based queue (which
         | has been improved since) and pg just works perfectly fine.
         | Might still need to go with a dedicated queue service at some
         | point but it has been perfectly fine so far.
        
       | westurner wrote:
       | Does the celery SQLAlchemy broker support PostgreSQL's
       | LISTEN/NOTIFY features?
       | 
       | Similar support in SQLite would simplify testing applications
       | built with celery.
       | 
       | How to add table event messages to SQLite so that the SQLite
       | broker has the same features as AMQP? Could a vtable facade send
       | messages on tablet events?
       | 
       | Are there sqlite Triggers?
       | 
       | Celery > Backends and Brokers:
       | https://docs.celeryq.dev/en/stable/getting-started/backends-...
       | 
       | /? sqlalchemy listen notify:
       | https://www.google.com/search?q=sqlalchemy+listen+notify :
       | 
       | asyncpg.Connection.add_listener
       | 
       | sqlalchemy.event.listen, @listen_for
       | 
       | psychopg2 conn.poll(), while connection.notifies
       | 
       | psychopg2 > docs > advanced > Advanced notifications:
       | https://www.psycopg.org/docs/advanced.html#asynchronous-noti...
       | 
       | PgQueuer.db, PgQueuer.listeners.add_listener; asyncpg
       | add_listener:
       | https://github.com/janbjorge/PgQueuer/blob/main/src/PgQueuer...
       | 
       | asyncpg/tests/test_listeners.py:
       | https://github.com/MagicStack/asyncpg/blob/master/tests/test...
       | 
       | /? sqlite LISTEN NOTIFY:
       | https://www.google.com/search?q=sqlite+listen+notify
       | 
       | sqlite3 update_hook:
       | https://www.sqlite.org/c3ref/update_hook.html
        
       | aflukasz wrote:
       | BTW: Good PostgresFM episode on implementing queues in Postgres,
       | various caveats etc: https://www.youtube.com/watch?v=mW5z5NYpGeA
       | .
        
         | jeeybee wrote:
         | thanks for sharing, added to my to watch list.
        
       | fijiaarone wrote:
       | You can make anything that stores data into a job queue.
        
         | kaoD wrote:
         | But can you make a _decent_ job queue with anything that stores
         | data? Not easily. E.g. you need atomicity if multiple consumers
         | can take jobs, and I think you need CAS for that, not just any
         | storage will do, right?
         | 
         | You probably need ACI and also D if you want your jobs to
         | persist.
        
       | _medihack_ wrote:
       | There is also Procrastinate:
       | https://procrastinate.readthedocs.io/en/stable/index.html
       | 
       | Procrastinate also uses PostgreSQL's LISTEN/NOTIFY (but can
       | optionally be turned off and use polling). It also supports many
       | features (and more are planned), like sync and async jobs (it
       | uses asyncio under the hood), periodic tasks, retries, task
       | locks, priorities, job cancellation/aborting, Django integration
       | (optional).
       | 
       | DISCLAIMER: I am one of the co-maintainers of Procrastinate.
        
       | martinald wrote:
       | Any suggestions for something like this for dotnet?
        
         | eknkc wrote:
         | Hangfire with PostgreSQL driver.
        
       | rgbrgb wrote:
       | Cool, congrats on releasing. Have you seen graphile worker?
       | Wondering how this compares or if you're building for different
       | use-cases.
        
       | ijustlovemath wrote:
       | You could even layer in PostgREST for a nice HTTP API that is
       | available from any language!
        
       ___________________________________________________________________
       (page generated 2024-08-18 23:00 UTC)