[HN Gopher] Show HN: PgQueuer - Transform PostgreSQL into a Job ...
       ___________________________________________________________________
        
       Show HN: PgQueuer - Transform PostgreSQL into a Job Queue
        
       PgQueuer is a minimalist, high-performance job queue library for
       Python, leveraging the robustness of PostgreSQL. Designed for
       simplicity and efficiency, PgQueuer uses PostgreSQL's LISTEN/NOTIFY
       to manage job queues effortlessly.
        
       Author : jeeybee
       Score  : 349 points
       Date   : 2024-08-18 19:22 UTC (1 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | samwillis wrote:
       | This looks like a great task queue, I'm a massive proponent of
       | "Postgres is all you need" [0] and doubling down on it with my
       | project that takes it to the extreme.
       | 
       | What I would love is a Postgres task queue that does multi-step
       | pipelines, with fan out and accumulation. In my view a structured
       | relational database is a particularly good backend for that as it
       | inherently can model the structure. Is that something you have
       | considered exploring?
       | 
       | The one thing with listen/notify that I find lacking is the max
       | payload size of 8k, it somewhat limits its capability without
       | having to start saving stuff to tables. What I would really like
       | is a streaming table, with a schema and all the rich type
       | support... maybe one day.
       | 
       | 0: https://www.amazingcto.com/postgres-for-everything/
        
         | jeeybee wrote:
         | Thanks for your insights!
         | 
         | Regarding multi-step pipelines and fan-out capabilities: It's a
         | great suggestion, and while PgQueuer doesn't currently support
         | this, it's something I'm considering for future updates.
         | 
         | As for the LISTEN/NOTIFY payload limit, PgQueuer uses these
         | signals just to indicate changes in the queue table, avoiding
         | the size constraint by not transmitting substantial data
         | through this channel.
        
         | halfcat wrote:
         | Is multi-step (fan out, etc) typically something a queue or
         | message bus would handle?
         | 
         | I've always handled this with an orchestrator solution (think
         | Airflow and similar).
         | 
         | Or is this a matter of use case? Like for a real-time scenario
         | where you need a series of things to happen (user registration,
         | etc) maybe a queue handling this makes sense? Whereas with
         | longer running tasks (ETL pipelines, etc) the orchestrator is
         | beneficial?
        
         | whateveracct wrote:
         | I've come to hate "Postgres is all you need." Or at least, "a
         | single Postgres database is all you need."
        
           | philippta wrote:
           | Why?
        
             | tmountain wrote:
             | Just to share an anecdote, we've been able to get to market
             | faster than ever before by just using Postgres (Supabase)
             | for basically everything. We're leveraging RLS, and it's
             | saved us from building an API (just using PostgREST). We've
             | knocked months off our project timeline by doing this.
        
           | berkes wrote:
           | I won't call it "hate", but I've run into quite a few
           | situations where the Postgres version caused a lot of pain.
           | 
           | - When it wasn't as easy as a dedicated solution: where
           | installing and managing a focused service is overall easier
           | than shoehorning it into PG.
           | 
           | - When it didn't perform anywhere close to a dedicated
           | solution: overhead from the guarantees that PG makes (ACID
           | and all that) when you don't need them. Or where the
           | relational architecture isn't suited for this type of data:
           | e.g. hierarchical, time-series, etc.
           | 
           | - When it's not as feature complete as a dedicated service:
           | for example I am quite sure one can build (parts of) an
           | ActiveDirectory or Kafka Bus, entirely in PG. But it will
           | lack features that in future you'll likely need - they are
           | built into these dedicated solutions because they are often
           | needed after all.
        
         | jashmatthews wrote:
         | Putting low throughput queues in the same DB is great both for
         | simplicity and for getting exactly-once-processing.
         | 
         | Putting high throughput queues in Postgres sucks because...
         | 
         | No O(1) guarantee to get latest job. Query planner can go
         | haywire.
         | 
         | High update tables bloat like crazy. Needs a whole new storage
         | engine aka ZHEAP
         | 
         | Write amplification as every update has to update every index
         | 
         | LISTEN/NOTIFY doesn't work through connection pooling
        
           | mickeyp wrote:
           | Update-related throughput and index problems are only a
           | problem if you update tables. You can use an append-only
           | structure to mitigate some of that: insert new entries with
           | the updated statuses instead. You gain the benefit of history
           | also. You can even coax the index into holding non-key values
           | for speed with INCLUDE to CREATE INDEX.
           | 
           | You can then delete the older rows when needed or as
           | required.
           | 
           | Query planner issues are a general problem in Postgres and are
           | not unique to this problem. Not sure what O(1) means in this
           | context. I am not sure pg has ever been able to promise
           | constant-time access to anything; indeed, with an index, it'd
           | never be asymptotically upper bounded as constant time at
           | all?
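
The append-only idea above can be sketched as follows. This is a minimal illustration, not PgQueuer's schema: SQLite stands in for Postgres so it runs anywhere, and the `job_events` table and the helper names are hypothetical. The Postgres-specific `INCLUDE` clause mentioned in the comment is not shown, since SQLite has no equivalent.

```python
import sqlite3

# SQLite is only a stand-in for Postgres so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE job_events (
        event_id INTEGER PRIMARY KEY AUTOINCREMENT,
        job_id   INTEGER NOT NULL,
        status   TEXT    NOT NULL
    );
    CREATE INDEX job_events_job_idx ON job_events (job_id, event_id);
    """
)


def record_status(job_id: int, status: str) -> None:
    """Append a new status row instead of updating an existing one."""
    with conn:
        conn.execute(
            "INSERT INTO job_events (job_id, status) VALUES (?, ?)",
            (job_id, status),
        )


def current_statuses() -> dict[int, str]:
    """Return the latest status per job (the row with the highest event_id)."""
    rows = conn.execute(
        """
        SELECT e.job_id, e.status
        FROM job_events AS e
        JOIN (
            SELECT job_id, MAX(event_id) AS last_event
            FROM job_events
            GROUP BY job_id
        ) AS latest
          ON latest.job_id = e.job_id AND latest.last_event = e.event_id
        """
    ).fetchall()
    return dict(rows)


def prune_history() -> int:
    """Delete every row that is no longer the latest event for its job."""
    with conn:
        cur = conn.execute(
            """
            DELETE FROM job_events
            WHERE event_id NOT IN (
                SELECT MAX(event_id) FROM job_events GROUP BY job_id
            )
            """
        )
    return cur.rowcount


record_status(1, "queued")
record_status(2, "queued")
record_status(1, "picked")
record_status(1, "done")
statuses = current_statuses()
removed = prune_history()
```

Reading the current state costs a grouped lookup rather than a single-row read, which is the extra polling cost raised in the reply that follows.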
        
             | jashmatthews wrote:
             | By the time you need append-only job statuses it's better
             | to move to a dedicated queue. Append-only statuses help but
             | they also make the polling query a lot more expensive.
             | 
             | Deleting older rows is a nightmare at scale. It leaves
             | holes in the earlier parts of the table and nerfs half the
             | advantage of using append-only in the first place. You end
             | up paying 8kb page IO costs for a single job.
             | 
             | Dedicated queues have constant time operations for enqueue
             | and dequeue which don't blow up at random times.
        
               | iTokio wrote:
               | partitions are often used to drop old data in constant
               | time.
               | 
               | They can also help to mitigate io issues if you use your
               | insertion timestamp as the partition key and include it
               | in your main queries.
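
A rough sketch of the partition-per-day approach described above. It only builds SQL strings; the `jobs` table, its columns, and the daily naming scheme are assumptions made for illustration. The statements use PostgreSQL's declarative range partitioning syntax and would need to be run against a real server.

```python
from datetime import date, timedelta

# Hypothetical parent table, partitioned by insertion timestamp.
PARENT_DDL = """
CREATE TABLE jobs (
    id         bigint      NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    payload    jsonb
) PARTITION BY RANGE (created_at);
"""


def partition_name(day: date) -> str:
    """Name of the partition holding rows created on `day`."""
    return f"jobs_{day:%Y_%m_%d}"


def create_partition_sql(day: date) -> str:
    """DDL for a one-day partition covering [day, day + 1)."""
    next_day = day + timedelta(days=1)
    return (
        f"CREATE TABLE IF NOT EXISTS {partition_name(day)} PARTITION OF jobs "
        f"FOR VALUES FROM ('{day.isoformat()}') TO ('{next_day.isoformat()}');"
    )


def drop_partition_sql(day: date) -> str:
    """Dropping a whole partition removes old rows without a row-by-row DELETE."""
    return f"DROP TABLE IF EXISTS {partition_name(day)};"


example_create = create_partition_sql(date(2024, 8, 18))
example_drop = drop_partition_sql(date(2024, 8, 11))
```

Queries that filter on `created_at` let the planner skip partitions outside the requested range, which is the I/O benefit the comment refers to.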
        
               | jashmatthews wrote:
               | Yeah the ULID/UUIDs which can be partitioned by time
               | in this way are AWESOME for these use cases.
        
               | felixyz wrote:
               | With a partitioned table you can painlessly remove old
               | rows. Of course, you then have to maintain your
               | partitions, but that's trivial.
        
               | jashmatthews wrote:
               | It's far from trivial. Autoanalyze doesn't work on
               | partitioned tables, only on the partitions themselves.
               | Partitioning a busy job queue table is a nightmare in
               | itself.
        
           | ulrikrasmussen wrote:
           | > LISTEN/NOTIFY doesn't work through connection pooling
           | 
           | What's the problem with using it with connection pooling?
        
             | jeeybee wrote:
             | asyncpg clears out any listeners you have set up once a
             | connection is returned to the pool. This will lead to
             | 'missed' events. I guess it's much the same story with
             | psycopg?
             | 
             | If events are missed, any jobs will be picked up by the next
             | event or by a timer that checks every 30 seconds or so (can
             | be set by the dev).
        
             | jashmatthews wrote:
             | Best to just forget about it and listen on a connection not
             | using pooling.
             | 
             | https://jpcamara.com/2023/04/12/pgbouncer-is-
             | useful.html#lis...
        
           | paffdragon wrote:
           | Indeed, that's my experience too. We used partitions like
           | others mentioned below, but Postgres had issues with moving
           | rows across tables atomically, and we had to implement our
           | own complex queries to overcome it. Plus job expiration was
           | dynamic, so we had to use background cleaning. The bigger
           | problem was the planner not being able to pick up sudden
           | changes in volume, so we had to use a cron job to run ANALYZE
           | on it. Then there was managing retries with backoffs, etc. At
           | some point we stopped fighting it and just moved to SQS. We
           | have had zero problems since, no maintenance needed, and it's
           | still free, so we saved storage cost, time and developer
           | effort for ongoing maintenance.
           | 
           | We still use Postgres for simple queues, but those don't
           | really require a library as it's quite simple usually, with
           | some advisory locks we can handle the crashed job unlocking
           | fairly well too.
        
       | CowOfKrakatoa wrote:
       | How does LISTEN/NOTIFY compare to using select for update skip
       | locked? I thought listen/notify can lose queue items when the
       | process crashes? Is that true? Do you need to code for those
       | cases in some manner?
        
         | jeeybee wrote:
         | LISTEN/NOTIFY and SELECT FOR UPDATE SKIP LOCKED serve different
         | purposes in PgQueuer. LISTEN/NOTIFY notifies consumers about
         | changes in the queue table, prompting them to check for new
         | jobs. This method doesn't inherently lose messages if a process
         | crashes, because it simply triggers a check rather than
         | transmitting data. The actual job handling and locking are
         | managed by SELECT FOR UPDATE SKIP LOCKED, which safely
         | processes each job even when multiple workers are involved.
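
To make the locking half concrete, here is a minimal sketch of a claim query built on `FOR UPDATE SKIP LOCKED`. The `jobs` table, its columns, and the status values are assumptions for illustration and are not PgQueuer's actual schema. `claim_next_job` only assumes a connection object with an asyncpg-style `fetchrow` coroutine; `StubConnection` is a hypothetical stand-in so the snippet runs without a database.

```python
import asyncio

# Hypothetical table: jobs(id, status, payload). Not PgQueuer's real schema.
CLAIM_SQL = """
WITH next_job AS (
    SELECT id
    FROM jobs
    WHERE status = 'queued'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
UPDATE jobs
SET status = 'picked'
FROM next_job
WHERE jobs.id = next_job.id
RETURNING jobs.id, jobs.payload
"""


async def claim_next_job(conn):
    """Claim one queued job, or return None if none is available.

    Rows already locked by another worker are skipped rather than waited on,
    so concurrent workers never block each other or claim the same job.
    """
    return await conn.fetchrow(CLAIM_SQL)


class StubConnection:
    """Stand-in for a real connection; it returns canned rows in order."""

    def __init__(self, rows):
        self._rows = list(rows)

    async def fetchrow(self, query):
        return self._rows.pop(0) if self._rows else None


async def main():
    conn = StubConnection([{"id": 1, "payload": "hello"}])
    first = await claim_next_job(conn)
    second = await claim_next_job(conn)  # queue is empty now
    return first, second


first, second = asyncio.run(main())
```

The notification never carries the job itself. A worker that misses a signal loses nothing, because the job row is still in the table for the next claim query.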
        
         | severino wrote:
         | I think the usage of listen/notify is just a mechanism to save
         | you from querying the database every X seconds looking for new
         | tasks (polling). That has some drawbacks, because if the
         | timeout is too small, you are making too many queries that
         | usually may not return any new tasks, and if it's too big, then
         | you may start processing the task long after it was submitted.
         | This way, it just notifies you that new tasks are ready so you
         | can query the database.
        
         | halfcat wrote:
         | There are two things:
         | 
         | 1. Signaling
         | 
         | 2. Messaging
         | 
         | In some systems, those are, effectively, the same. A consumer
         | listens, and the signal is the message. If the consumer process
         | crashes, the message returns to the queue and gets processed
         | when the consumer comes back online.
         | 
         | If the signal and messaging are separated, as in Postgres,
         | where LISTEN/NOTIFY is the signal, and the skip locked query is
         | the message pull, the consumer process would need to do some
         | combination of polling and listening.
         | 
         | In the consumer, that could essentially be a loop that's just
         | doing the skip locked query on startup, then dropping into a
         | LISTEN query only once there are no messages present in the
         | queue. Then the LISTEN/NOTIFY is just signaling to tell the
         | consumer to check for new messages.
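
The loop described above can be simulated without a database. In this sketch an in-memory deque stands in for the queue table and an `asyncio.Event` stands in for LISTEN/NOTIFY; all names are hypothetical. It also includes the fallback poll timer mentioned elsewhere in the thread, so a missed signal only delays a job.

```python
import asyncio
from collections import deque


async def consumer(jobs, wakeup, processed, poll_interval, stop_after):
    """Drain the queue, then sleep until signalled or until the timer fires."""
    while len(processed) < stop_after:
        # 1. Pull everything currently available (the "skip locked" query
        #    in a real system; a plain popleft in this simulation).
        while jobs:
            processed.append(jobs.popleft())
        if len(processed) >= stop_after:
            break
        # 2. Nothing left: wait for a signal, but never longer than
        #    poll_interval, so a missed signal only delays work.
        try:
            await asyncio.wait_for(wakeup.wait(), timeout=poll_interval)
        except asyncio.TimeoutError:
            pass
        wakeup.clear()


async def producer(jobs, wakeup):
    """Enqueue two jobs; only the first one is announced with a signal."""
    await asyncio.sleep(0.01)
    jobs.append("job-2")
    wakeup.set()               # NOTIFY: "something changed, go look"
    await asyncio.sleep(0.01)
    jobs.append("job-3")       # no signal sent: found by the poll timer


async def main():
    jobs = deque(["job-1"])    # already queued before the consumer starts
    wakeup = asyncio.Event()
    processed = []
    await asyncio.gather(
        consumer(jobs, wakeup, processed, poll_interval=0.05, stop_after=3),
        producer(jobs, wakeup),
    )
    return processed


processed = asyncio.run(main())
```

The `stop_after` parameter exists only so the demonstration terminates; a real worker would loop until shut down.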
        
       | cklee wrote:
       | I've been thinking about the potential for PostgreSQL-backed job
       | queue libraries to share a common schema. For instance, I'm a big
       | fan of Oban in Elixir: https://github.com/sorentwo/oban
       | 
       | Given that there are many Sidekiq-compatible libraries across
       | various languages, it might be beneficial to have a similar
       | approach for PostgreSQL-based job queues. This could allow for
       | job processing in different languages while maintaining
       | compatibility.
       | 
       | Alternatively, we could consider developing a core job queue
       | library in Rust, with language-specific bindings. This would
       | provide a robust, cross-language solution while leveraging the
       | performance and safety benefits of Rust.
        
         | memset wrote:
         | I am building an SQS compatible queue for exactly that reason.
         | Use with any language or framework.
         | https://github.com/poundifdef/smoothmq
         | 
         | It is based on SQLite, but it's written in a modular way. It
         | would be easy to add Postgres as a backend (in fact, it might
         | "just work" if I switch the ORM connection string.)
        
           | GordonS wrote:
           | Does SmoothMQ support running multiple nodes for high
           | availability? (I didn't see anything in the docs, but they
           | seem unfinished)
        
             | memset wrote:
             | Not today. It's a work in progress! There are several
             | iterations that I'm working on:
             | 
             | 1. Primary with secondaries as replicas (replication for
             | availability)
             | 
             | 2. Sharding across multiple nodes (sharding for horizontal
             | scaling)
             | 
             | 3. Sharding with replication
             | 
             | However, those aren't ready yet. The easiest way to
             | implement this would probably be to use Postgres as the
             | backing storage for the queue, which means relying on
             | Postgres' multiple node support. Then the queue server
             | itself could also scale up and down independently.
             | 
             | Working on the docs! I'd love your feedback - what makes
             | them seem unfinished? (What would you want to see that
             | would make them feel more complete?)
        
           | justinclift wrote:
           | Sounds like it wouldn't have immediate notification of new
           | submissions due to no listen/notify support in SQLite?
        
             | memset wrote:
             | It does not implement immediate notification of new
             | submissions because the SQS protocol doesn't have a "push"
             | mechanism, only pull.
             | 
             | The software, however, could support this for a different
             | queue protocol. This is because SQLite is just used as a
             | disk store for queue items. The golang code itself still
             | processes each message before writing to disk. Since that
             | code is "aware" of incoming messages, it could implement an
             | immediate notification mechanism if there was a protocol
             | that supported it.
        
               | agrothberg wrote:
               | SQS does offer long polling, which looks closer to "push"
               | semantics.
        
               | memset wrote:
               | Fair enough. I do implement long polling!
        
         | earthnail wrote:
         | This would be so immensely useful. I'd estimate that there are
         | so many cases where the producer is Node or Rails and the
         | consumer is Python.
        
           | mind-blight wrote:
           | This is the exact use case I'm running into right now. I've
           | been looking at BullMQ since it has good TypeScript support
           | and is working towards a 1.0 for Python. But I haven't tried
           | it out in a production stack yet.
        
             | jmvoodoo wrote:
             | We have been using bullmq in production for just over a
             | year. It is a piece of technology that our team doesn't
             | have to think about, which is pretty much all I could ask
             | for.
             | 
             | We did end up adding some additional generics which allows
             | us to get strong typing between producers and consumers.
             | That I think has been a key piece of making it easy to use
             | and avoiding dumb mistakes.
        
           | bgentry wrote:
           | River ( https://riverqueue.com ) is a Postgres background job
           | engine written in Go, which also has insert only clients in
           | other languages. Currently we have these for Ruby and Python:
           | 
           | https://github.com/riverqueue/riverqueue-ruby
           | 
           | https://github.com/riverqueue/riverqueue-python
        
         | stephenr wrote:
         | Qless "solves" this problem (in redis) by having all core logic
         | written as lua and executed in redis.
         | 
         | You could take a similar approach for pg: define a series of
         | procedures that provide all the required functionality, and
         | then language bindings are all just thin wrappers (to handle
         | language native stuff) around calls to execute a given
         | procedure with the correct arguments.
        
         | rileymichael wrote:
         | If you want a generic queue that can be consumed in any
         | runtime, you can just build it directly into postgres via
         | extensions like https://github.com/tembo-io/pgmq.
        
           | tmountain wrote:
           | Also, pgmq can run as a TLE (trusted language extension), so
           | you can install it into cloud hosted Postgres solutions like
           | Supabase. We're using pgmq, and it's solid so far.
        
           | vantiro wrote:
           | What I like about https://github.com/tembo-io/pgmq is that it
           | can be used with any programming language and does not
           | require any background workers or running a binary in
           | addition to postgres.
        
         | AlphaSite wrote:
         | River is my go to in Golang, it's really handy to have
         | transactional queuing with a nice little ui.
        
         | kraihx wrote:
         | For JavaScript and Perl there is already Minion, which relies
         | on listen/notify + FOR UPDATE SKIP LOCKED.
         | 
         | https://github.com/mojolicious/minion.js
         | 
         | https://github.com/mojolicious/minion
        
         | ukd1 wrote:
         | A common schema is one nice thing, but imho the win of these db
         | backed queues is being able to do things, including enqueue
         | background jobs in a single transaction. e.g. create user,
         | enqueue welcome email - both get done, or not - with
         | Redis-based queues, this is ... not usually a thing; if you fail
         | to do one, it's left half done, leading to more code etc
         | 
         | p.s. I maintain a ruby equivalent called QueueClassic
        
       | redskyluan wrote:
       | There seems to be a lot of hype around adapting PG into any
       | infra. I love PG, but this doesn't seem to be the right thing.
        
         | mlnj wrote:
         | I use it as a job queue. Yes, it has its cons, but not dealing
         | with another moving piece in the big picture is totally worth
         | it.
        
         | sgarland wrote:
         | At low-medium scale, this will be fine. Even at higher scale,
         | so long as you monitor autovacuum performance on the queue
         | table.
         | 
         | At some point it may become practical to bring a dedicated
         | queue system into the stack, sure, but this can massively
         | simplify things when you don't need or want the additional
         | complexity.
        
           | jeeybee wrote:
           | I agree, there is no need for FANG level infrastructure. IMO,
           | in most cases, the simplicity / performance tradeoff for
           | small/medium is worth it. There is also statistics tooling
           | that helps you monitor throughput and failure rates
           | (aggregated on a per second basis)
        
           | arp242 wrote:
           | Aside from that, the main advantage of this is transactions.
           | I can do:
           | 
           |     begin;
           |       insert_row();
           |       schedule_job_for_elasticsearch();
           |     commit;
           | 
           | And it's guaranteed that both the row and job for
           | Elasticsearch update are inserted.
           | 
           | If you use a dedicated queue system then this becomes a lot
           | more tricky:
           | 
           |     begin;
           |       insert_row();
           |       schedule_job_for_elasticsearch();
           |     commit; // Can fail, and then we have an ES job but no SQL row.
           | 
           |     begin;
           |       insert_row();
           |     commit;
           |     schedule_job_for_elasticsearch(); // Can fail, and then we
           |                                       // have a SQL row and no job.
           | 
           | There are of course also situations where this doesn't apply,
           | but this "insert row(s) in SQL and then queue job to do more
           | with that" is a fairly common use case for queues, and in
           | those cases this is a great choice.
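
The all-or-nothing behaviour described above can be demonstrated with a small runnable sketch. SQLite stands in for Postgres here purely so it runs without a server; the `users` and `jobs` tables and the function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE);
    CREATE TABLE jobs (
        id      INTEGER PRIMARY KEY,
        task    TEXT    NOT NULL,
        user_id INTEGER NOT NULL
    );
    """
)


def create_user_and_enqueue(email: str, fail_before_commit: bool = False) -> None:
    """Insert a row and its follow-up job in one transaction."""
    # `with conn` commits on success and rolls back if an exception escapes.
    with conn:
        cur = conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        conn.execute(
            "INSERT INTO jobs (task, user_id) VALUES (?, ?)",
            ("index_in_elasticsearch", cur.lastrowid),
        )
        if fail_before_commit:
            raise RuntimeError("simulated crash before commit")


def counts() -> tuple[int, int]:
    """Return (number of users, number of jobs)."""
    users = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    jobs = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
    return users, jobs


create_user_and_enqueue("a@example.com")
try:
    create_user_and_enqueue("b@example.com", fail_before_commit=True)
except RuntimeError:
    pass
result = counts()
```

After the simulated failure, neither the second user nor its job exists, so the two tables cannot drift apart.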
        
             | nostrebored wrote:
             | Most of these two phase problems can be solved by having
             | separate queue consumers.
             | 
             | And as far as I can tell, this is only a perk when your two
             | actions are mutate the collocated database and do X. For
             | all other situations this seems like a downgrade.
        
               | jashmatthews wrote:
               | Do you mean like the consumer for the first phase
               | enqueues a job for the second phase?
        
             | jashmatthews wrote:
             | Transactional Outbox solves this. You use a table like in
             | the first example but instead of actually doing the
             | ElasticSearch update the Outbox table is piped into the
             | dedicated queue.
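
A minimal sketch of the Transactional Outbox pattern mentioned above, again with SQLite standing in for Postgres and a plain Python list standing in for the dedicated queue. The table names, the `relay_outbox` helper, and the message shape are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
    CREATE TABLE outbox (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,
        topic   TEXT NOT NULL,
        payload TEXT NOT NULL
    );
    """
)

external_queue = []  # stand-in for the dedicated queue (SQS, RabbitMQ, ...)


def save_document(body: str) -> None:
    """Write the row and its outbox message atomically."""
    with conn:
        cur = conn.execute("INSERT INTO documents (body) VALUES (?)", (body,))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("index_document", str(cur.lastrowid)),
        )


def relay_outbox(publish) -> int:
    """Forward pending outbox rows in order, deleting each once published.

    If `publish` raises, the row stays in the outbox and is retried on the
    next run, so delivery is at-least-once and consumers should be idempotent.
    """
    sent = 0
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish((topic, payload))
        with conn:
            conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
        sent += 1
    return sent


save_document("first")
save_document("second")
sent = relay_outbox(external_queue.append)
remaining = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

The database transaction guarantees the outbox row exists whenever the document row does; the relay then moves messages to the external queue on its own schedule.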
        
         | eknkc wrote:
         | Instead of SQS, I recently created a basic abstraction on PG
         | that mimics the SQS apis. The intention was to use it during
         | development and we would simply switch to SQS later.
         | 
         | Never did. The production code still uses PG based queue (which
         | has been improved since) and pg just works perfectly fine.
         | Might still need to go with a dedicated queue service at some
         | point but it has been perfectly fine so far.
        
         | jascha_eng wrote:
         | I mean I love postgres like the next guy. And I like simple
         | solutions as long as they work. I just wonder if this is truly
         | simpler than using a redis or rabbitmq queue if you need
         | Queues. If you're already using a cloud provider sqs is quite
         | trivial as well.
         | 
         | I guess if you already have postgres and don't want to use the
         | cloud provider's solution. You can use this to avoid hosting
         | another piece of infra.
        
           | bdcravens wrote:
           | db-based gives you the ability to query against your queues,
           | if your use case needs it. Other options tend to dispose of the
           | state once the job is finished.
        
         | SOLAR_FIELDS wrote:
         | I think for me the problem with every single new PG queue is
         | that it seems like everyone and their mother thinks they need
         | to reinvent this specific wheel for some reason and the flavor
         | of the day doesn't often bring much new to the space. Probably
         | because it's
         | 
         | 1. Pretty easy to understand and grok the problem space
         | 
         | 2. Scratching the programmer itch of wanting something super
         | generic that you can reuse all over the place
         | 
         | 3. Doable with a modest effort over a reasonable scope of time
         | 
         | 4. Built on rock solid internals (Postgres) with specific
         | guarantees that you can lean on
         | 
         | Here's 7 of them just right quick:
         | 
         | - https://github.com/timgit/pg-boss
         | 
         | - https://github.com/queueclassic/queue_classic
         | 
         | - https://github.com/florentx/pgqueue
         | 
         | - https://github.com/mbreit/pg_jobs
         | 
         | - https://github.com/graphile/worker
         | 
         | - https://github.com/pgq/pgq
         | 
         | - https://github.com/que-rb/que
         | 
         | Probably could easily find more by searching, I only spent
         | about 5 minutes looking and grabbing the first ones I found.
         | 
         | I'm all for doing this kind of thing as an academic exercise,
         | because it's a great way to learn about this problem space. But
         | at this point if you're reinventing the Postgres job queue
         | wheel and sharing it to this technical audience you need to
         | probably also include why your wheel is particularly
         | interesting if you want to grab my attention.
        
       | westurner wrote:
       | Does the celery SQLAlchemy broker support PostgreSQL's
       | LISTEN/NOTIFY features?
       | 
       | Similar support in SQLite would simplify testing applications
       | built with celery.
       | 
       | How to add table event messages to SQLite so that the SQLite
       | broker has the same features as AMQP? Could a vtable facade send
       | messages on table events?
       | 
       | Are there sqlite Triggers?
       | 
       | Celery > Backends and Brokers:
       | https://docs.celeryq.dev/en/stable/getting-started/backends-...
       | 
       | /? sqlalchemy listen notify:
       | https://www.google.com/search?q=sqlalchemy+listen+notify :
       | 
       | asyncpg.Connection.add_listener
       | 
       | sqlalchemy.event.listen, @listen_for
       | 
       | psycopg2 conn.poll(), while connection.notifies
       | 
       | psycopg2 > docs > advanced > Advanced notifications:
       | https://www.psycopg.org/docs/advanced.html#asynchronous-noti...
       | 
       | PgQueuer.db, PgQueuer.listeners.add_listener; asyncpg
       | add_listener:
       | https://github.com/janbjorge/PgQueuer/blob/main/src/PgQueuer...
       | 
       | asyncpg/tests/test_listeners.py:
       | https://github.com/MagicStack/asyncpg/blob/master/tests/test...
       | 
       | /? sqlite LISTEN NOTIFY:
       | https://www.google.com/search?q=sqlite+listen+notify
       | 
       | sqlite3 update_hook:
       | https://www.sqlite.org/c3ref/update_hook.html
        
       | aflukasz wrote:
       | BTW: Good PostgresFM episode on implementing queues in Postgres,
       | various caveats etc: https://www.youtube.com/watch?v=mW5z5NYpGeA
       | .
        
         | jeeybee wrote:
         | thanks for sharing, added to my to watch list.
        
       | fijiaarone wrote:
       | You can make anything that stores data into a job queue.
        
         | kaoD wrote:
         | But can you make a _decent_ job queue with anything that stores
         | data? Not easily. E.g. you need atomicity if multiple consumers
         | can take jobs, and I think you need CAS for that, not just any
         | storage will do, right?
         | 
         | You probably need ACI and also D if you want your jobs to
         | persist.
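
The compare-and-swap requirement raised above can be sketched with a conditional UPDATE: the status flips only if it is still the expected value, and the affected row count tells the worker whether it won. SQLite is used as a stand-in store, and the `jobs` table and `try_claim` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL, owner TEXT)"
)
conn.execute("INSERT INTO jobs (id, status) VALUES (1, 'queued')")
conn.commit()


def try_claim(job_id: int, worker: str) -> bool:
    """Compare-and-swap: flip 'queued' to 'picked' only if still 'queued'."""
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET status = 'picked', owner = ? "
            "WHERE id = ? AND status = 'queued'",
            (worker, job_id),
        )
    # rowcount is 1 if this worker won the swap, 0 if someone else already did.
    return cur.rowcount == 1


first = try_claim(1, "worker-a")
second = try_claim(1, "worker-b")
owner = conn.execute("SELECT owner FROM jobs WHERE id = 1").fetchone()[0]
```

Any store that can apply this check-and-write as one atomic step can hand a job to exactly one consumer; durability then decides whether the claim survives a restart.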
        
       | _medihack_ wrote:
       | There is also Procrastinate:
       | https://procrastinate.readthedocs.io/en/stable/index.html
       | 
       | Procrastinate also uses PostgreSQL's LISTEN/NOTIFY (but can
       | optionally be turned off and use polling). It also supports many
       | features (and more are planned), like sync and async jobs (it
       | uses asyncio under the hood), periodic tasks, retries, task
       | locks, priorities, job cancellation/aborting, Django integration
       | (optional).
       | 
       | DISCLAIMER: I am a co-maintainer of Procrastinate.
        
         | joking wrote:
         | it should be the opposite of procrastination, but good naming
         | anyway.
        
         | lukebuehler wrote:
         | I'm using Procrastinate in several projects. Would definitely
         | like to see a comparison.
         | 
         | What I personally love about Procrastinate is async, locks,
         | delayed and scheduled jobs, queue specific workers (allowing to
         | factor the backend in various ways). All this with a simple
         | codebase and schema.
        
       | martinald wrote:
       | Any suggestions for something like this for dotnet?
        
         | eknkc wrote:
         | Hangfire with PostgreSQL driver.
        
         | hkon wrote:
         | Updlock rowlock readpast should do the trick
        
         | wordofx wrote:
         | It's a simple query to write. You don't need a library or
         | framework.
        
       | rgbrgb wrote:
       | Cool, congrats on releasing. Have you seen graphile worker?
       | Wondering how this compares or if you're building for different
       | use-cases.
        
         | mind-blight wrote:
         | I think graphile worker is Node only. This project is for
         | Python.
        
           | pestaa wrote:
           | There is experimental support for arbitrary executables:
           | 
           | https://worker.graphile.org/docs/tasks#loading-executable-
           | fi...
           | 
           | But you can use a thin JS wrapper to make shell calls from
           | Node. Slightly inconvenient, but works well for my use case.
        
         | jeeybee wrote:
         | I haven't used Graphile Worker since I'm not familiar with
         | JavaScript. PgQueuer is tailored for Python with PostgreSQL
         | environments. I'd be interested to hear about Graphile Worker's
         | features and how they might inspire improvements to PgQueuer.
        
       | ijustlovemath wrote:
       | You could even layer in PostgREST for a nice HTTP API that is
       | available from any language!
        
         | topspin wrote:
         | Already done. See: PostgREST. Want to use PostgreSQL (or most
         | other RDBMSs) as the backend for an actively developed,
         | multiprotocol, multiplatform, open source, battle proven
         | message broker that also provides a messaging REST API of its
         | own? Use ActiveMQ (either flavor) and configure a JDBC backend.
         | Done.
        
       | gmag wrote:
       | You might also want to look at River
       | (https://github.com/riverqueue/river) for inspiration as they
       | support scheduled jobs, etc.
       | 
       | From an end-user perspective, they also have a UI which is nice
       | to have for debugging.
        
         | onionisafruit wrote:
         | I've been using river for some low volume stuff. I love that I
         | can add a job to the queue in the same db transaction that
         | handles the synchronous changes.
        
         | bdcravens wrote:
         | Glancing at it briefly, I like the Workflows feature. I'm a
         | long time Sidekiq user (Ruby), and while you can construct
         | workflows pretty easily (especially using nested batches and
         | callbacks in the Pro version), there really isn't a dedicated
         | UI for visualizing them.
        
         | rubenvanwyk wrote:
         | Also wanted to say I thought this problem has already been
         | solved by River.
         | 
         | Although it seems like OP references a Python library rather
         | than a standalone server, so would probably be useful to Python devs.
        
       | airocker wrote:
       | We use listen notify extensively and it is great. The thing it
       | lacks most for us is a guaranteed single recipient. All
       | subscribers get all notifications, which leads to problems in
       | determining who should act on the message in our case.
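       | 
       | One common workaround, sketched here under assumptions: treat the
       | notification only as a wake-up and let every subscriber try to
       | claim the row with a conditional UPDATE, so exactly one succeeds.
       | SQLite stands in for Postgres, and the jobs table and try_claim
       | helper are hypothetical.

```python
import sqlite3


def make_db():
    """Create one unclaimed job, as if a NOTIFY had just announced it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, claimed_by TEXT)")
    conn.execute("INSERT INTO jobs (id) VALUES (1)")
    conn.commit()
    return conn


def try_claim(conn, job_id, worker):
    """Return True only for the single worker whose UPDATE changes the row."""
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET claimed_by = ? WHERE id = ? AND claimed_by IS NULL",
            (worker, job_id),
        )
    # rowcount is 1 for the winner and 0 for everyone who arrives later.
    return cur.rowcount == 1
```

       | Every subscriber that hears the notification calls try_claim, and
       | only the one that gets True acts on the message.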
        
       | bdcravens wrote:
       | Good Job does the same for Rails
       | 
       | https://github.com/bensheldon/good_job
        
         | strzibny wrote:
         | Wanted to post this, glad it's already here. This is PgQueuer
         | for Rails but also with some history under its belt.
        
         | Lio wrote:
         | There's also the new built in SolidQueue.
         | 
         | https://github.com/rails/solid_queue/
        
           | sandGorgon wrote:
           | https://dev.37signals.com/introducing-solid-queue/
        
         | BilalBudhani wrote:
         | Thanks for mentioning this gem (literally).
         | 
         | I moved my projects to GoodJob and it has been smooth sailing.
        
       | rtpg wrote:
       | I am going to go the other direction on this... to anyone reading
       | this, please consider using a backend-generic queueing system for
       | your Python project.
       | 
       | Why? Mainly because those systems offer good affordances for
       | testing and running locally in an operationally simple way. They
       | also tend to have decent default answers for various futzy
       | questions around disconnects at various parts of the workflow.
       | 
       | We all know Celery is a buggy pain in the butt, but rolling your
       | own job queue likely ends up with you just writing a similarly
       | buggy pain in the butt. We've already done "Celery but simpler",
       | it's stuff like Dramatiq!
       | 
       | If you have backend-specific needs, you won't listen to this
       | advice. But think deeply how important your needs are. Computers
       | are fast, and you can deal with a lot of events with most
       | systems.
       | 
       | Meanwhile if you use a backend-generic system... well you could
       | write a backend using PgQueuer!
        
         | nsonha wrote:
         | > those systems offer good affordances for testing and running
         | locally in an operationally simple way
         | 
         | Define "operationally simple". Most if not all of them need
         | persistence anyway, on top of the queue itself. This eliminates
         | the queue and uses a persistence layer you likely already have.
        
           | rtpg wrote:
           | Well for example, lots of queueing libraries have an "eager
           | task" runtime option. What does that do? Instead of putting
           | work into a backend queue, it just immediately runs the task
           | in-process. You don't need any processing queue!
           | 
           | How many times have you shipped some background task change,
           | only to realize half your test suite doesn't do anything with
           | background tasks, and you're not testing your business logic
           | to the logical conclusion? Eager task execution catches bugs
           | earlier on, and is close enough to the reality for things
           | that matter, while removing the need for, say, multi-process
           | coordination in most tests.
           | 
           | And you can still test things the "real way" if you need to!
           | 
           | And to your other point: you can use Dramatiq with Postgres,
           | for example[0]. I've written custom backends that just use pg
           | for these libs, it's usually straightforward because the
           | broker classes tend to abstract the gnarly things.
           | 
           | [0]: https://pypi.org/project/dramatiq-pg/
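           | 
           | To make the "eager task" idea concrete, here is a toy sketch.
           | TaskQueue and its send/run_pending methods are invented for
           | illustration and are not the API of Celery, Dramatiq, or
           | PgQueuer.

```python
from collections import deque


class TaskQueue:
    """Hypothetical toy queue with an eager switch for tests."""

    def __init__(self, eager=False):
        self.eager = eager
        self.pending = deque()

    def task(self, fn):
        """Decorator that attaches a .send() method to the function."""

        def send(*args, **kwargs):
            if self.eager:
                # Eager mode: run in-process right away, no backend needed.
                return fn(*args, **kwargs)
            # Normal mode: record the call for a worker to pick up later.
            self.pending.append((fn, args, kwargs))
            return None

        fn.send = send
        return fn

    def run_pending(self):
        """Stand-in for a worker loop: run everything that was queued."""
        while self.pending:
            fn, args, kwargs = self.pending.popleft()
            fn(*args, **kwargs)
```

           | With eager=True a test that calls some_task.send(...) sees the
           | side effects immediately; with eager=False nothing happens
           | until a worker (here run_pending) drains the queue.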
        
           | topspin wrote:
           | Some message queue brokers that traditionally implement their
           | own backends can also use Postgresql (and other RDBMSs) for
           | persistence. This is a reasonable option if you a.) want to
           | consolidate persistence backends b.) want a mature, battle
           | proven broker and client stack.
        
         | miohtama wrote:
         | Some names
         | 
         | - Celery (massive and heavy)
         | 
         | - Dramatiq
         | 
         | - APScheduler
         | 
         | - Huey
         | 
         | Today, Redis queues, unless strictly a single process, seem to
         | be most pain free for small scale use.
        
           | nicktrocado wrote:
           | We had a terrible time with Dramatiq; very buggy and
           | resource-heavy. We ended up switching to SNS/SQS combo
        
         | killerstorm wrote:
         | In my experience, it's easy to test locally with PG: we have
         | unit tests which re-create DB for each test... It works.
         | 
         | Also DB transactions are absolutely the best way to provide
         | ACID guarantee
        
       | worik wrote:
       | Why so much code for a very simple concept?
       | 
       | One table.
       | 
       | Producer writes, consumer reads.
       | 
       | A very good idea
        
       | joseferben wrote:
       | for typescript there is pg-boss, works great for us
        
       | est wrote:
       | The most simple job queue in MySQL:
       | 
       |     update job_table set key=value where ... limit 1
       | 
       | It's simple and atomic. Unfortunately PG doesn't allow `update
       | ... limit` syntax
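       | 
       | The usual way to emulate `update ... limit 1` without that syntax
       | is to pick the row in a subquery. A sketch, run against SQLite so
       | it is self-contained; the status and worker columns and the
       | claim_one helper are assumptions. On PostgreSQL the inner SELECT
       | would typically also carry FOR UPDATE SKIP LOCKED.

```python
import sqlite3

# Runs on SQLite as written. On PostgreSQL, add FOR UPDATE SKIP LOCKED to the
# inner SELECT so concurrent workers skip rows another transaction has locked.
CLAIM_ONE = """
UPDATE job_table
SET status = 'running', worker = ?
WHERE id = (
    SELECT id FROM job_table
    WHERE status = 'queued'
    ORDER BY id
    LIMIT 1
)
"""


def make_db(n_jobs):
    """Create job_table with n_jobs queued rows (ids 1..n_jobs)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job_table ("
        "id INTEGER PRIMARY KEY, status TEXT DEFAULT 'queued', worker TEXT)"
    )
    conn.executemany(
        "INSERT INTO job_table (id) VALUES (?)", [(i,) for i in range(1, n_jobs + 1)]
    )
    conn.commit()
    return conn


def claim_one(conn, worker):
    """Claim the oldest queued job for worker; return its id, or None if empty."""
    with conn:
        if conn.execute(CLAIM_ONE, (worker,)).rowcount == 0:
            return None
        # Jobs are claimed in id order, so the newest claim has the highest id.
        # (PostgreSQL could use RETURNING id on the UPDATE instead.)
        row = conn.execute(
            "SELECT MAX(id) FROM job_table WHERE worker = ? AND status = 'running'",
            (worker,),
        ).fetchone()
    return row[0]
```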
        
         | DemocracyFTW2 wrote:
         | > It's simple and atomic
         | 
         | and almost certainly incorrect, gotta read at least
         | https://www.pgcon.org/2016/schedule/attachments/414_queues-p...
         | which discusses FOR UPDATE SKIP LOCKED
        
           | est wrote:
           | that's a nice read, but does it also apply to MySQL (InnoDB)?
        
             | hparadiz wrote:
             | I've done this with mysql. Never do one at a time if you
             | have jobs per minute over 30. It won't scale. Instead have
             | the job dispatcher reserve 100 at a time and then fire that
             | off to a subprocess which will subsequently fire off a
             | process for each job. A three layer approach makes it much
             | easier to build out multiserver. Or if you don't want the
             | headache just use SQS which is pretty much free under 1
             | million jobs.
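             | 
             | A rough sketch of the "reserve 100 at a time" step only,
             | not the full three-layer setup. SQLite stands in for
             | MySQL, and the reserved_by column and reserve_batch helper
             | are assumptions made for the example.

```python
import sqlite3
import uuid


def make_db(n_jobs):
    """Create a jobs table with n_jobs unreserved rows (ids 1..n_jobs)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, reserved_by TEXT)")
    conn.executemany(
        "INSERT INTO jobs (id) VALUES (?)", [(i,) for i in range(1, n_jobs + 1)]
    )
    conn.commit()
    return conn


def reserve_batch(conn, size=100):
    """Tag up to size unreserved jobs with a fresh token and return their ids."""
    token = uuid.uuid4().hex
    with conn:
        # One statement marks the whole batch, instead of one round trip per job.
        conn.execute(
            "UPDATE jobs SET reserved_by = ? WHERE id IN ("
            "  SELECT id FROM jobs WHERE reserved_by IS NULL ORDER BY id LIMIT ?"
            ")",
            (token, size),
        )
        rows = conn.execute(
            "SELECT id FROM jobs WHERE reserved_by = ? ORDER BY id", (token,)
        ).fetchall()
    return [r[0] for r in rows]
```

             | The dispatcher would hand each returned batch to a
             | subprocess, which then runs the individual jobs.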
        
               | est wrote:
               | Yeah it's very basic and limited.
               | 
               | However if I am about to use DB as a job queue for budget
               | reasons, I'd make sure the job doesn't get too
               | complicated.
        
               | hparadiz wrote:
               | For me it was a lot of small jobs.
               | 
               | I was able to get it up to 3500 jobs an hour and likely
               | could have gone far past that but the load on the MySQL
               | server was not reasonable
        
               | jeeybee wrote:
               | I was able to push PgQueuer to 25k jobs a second in my
               | benchmarking script.
        
       | kdunglas wrote:
       | The Symfony framework (PHP) provides a similar feature, which
       | also relies on LISTEN/NOTIFY and FOR UPDATE SKIP LOCKED:
       | https://symfony.com/doc/current/messenger.html#doctrine-tran...
       | 
       | It also supports many other backends including AMQP, Beanstalkd,
       | Redis and various cloud services.
       | 
       | This component, called Messenger, can be installed as a
       | standalone library in any PHP project.
       | 
       | (Disclaimer: I'm the author of the PostgreSQL transport for
       | Symfony Messenger).
        
       | odie5533 wrote:
       | I've been referring to this post about issues with Celery:
       | https://docs.hatchet.run/blog/problems-with-celery
       | 
       | Does PgQueuer address any of them?
        
         | jeeybee wrote:
         | At first glance I see two things that PgQueuer can address. It
         | has native async support, but you can also have sync functions;
         | they will be offloaded to threads via anyio
         | (https://github.com/janbjorge/PgQueuer/blob/99c82c2d661b2ddfc...)
         | 
         | It has a global rate limit, synced via pg notify.
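         | 
         | The general idea of offloading sync functions can be sketched
         | with the standard library. This is not PgQueuer's code: it uses
         | asyncio.to_thread as a stand-in for anyio, and run_job is a
         | hypothetical helper.

```python
import asyncio
import inspect
import threading


async def run_job(fn, *args):
    """Await coroutine functions directly; run plain functions in a thread."""
    if inspect.iscoroutinefunction(fn):
        return await fn(*args)
    # asyncio.to_thread (Python 3.9+) keeps the event loop free while fn blocks.
    return await asyncio.to_thread(fn, *args)


async def async_job(x):
    """Example async job: runs on the event loop's own thread."""
    await asyncio.sleep(0)
    return ("async", x, threading.get_ident())


def sync_job(x):
    """Example sync job: ends up on a worker thread."""
    return ("sync", x, threading.get_ident())
```

         | Calling asyncio.run(run_job(sync_job, 2)) returns a thread id
         | different from the caller's, while the async job reports the
         | caller's own thread.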
        
           | odie5533 wrote:
           | PgQueuer seems pretty similar to Hatchet - both use Postgres,
           | require their own worker processes, both support async.
           | Hatchet seems to be a lot more powerful though: cron, DAG
           | workflows, retries, timeouts, and a UI dashboard.
        
             | TkTech wrote:
             | https://github.com/tktech/chancy is a very early-stage pet
             | project to work around specific issues I keep running into
             | with Celery that has all of this, including a simple
             | plugin-extendable UI with DAG workflow visualization.
        
       | piyushtechsavy wrote:
       | Although I am more of a MySQL guy, I have been exploring
       | PostgreSQL for some time. It seems to have a lot of features out
       | of the box.
       | 
       | This is a very interesting tool.
        
       | _joel wrote:
       | Broadcaster is one we use in production for PUB/SUB stuff with
       | OPA/OPAL. https://pypi.org/project/broadcaster/
        
       | nifal_adam wrote:
       | It looks like PgQueuer integrates well with Postgres RPC calls,
       | triggers, and cronjobs (via pg_cron). Interesting, will check it
       | out.
        
       | bfelbo wrote:
       | How does this compare to the popular Graphile Worker library?
       | 
       | https://github.com/graphile/worker
        
       | mads_quist wrote:
       | I really like the emergence of simple queuing tools for robust
       | database management systems. Keep things simple and remove
       | infrastructure complexity. Definitely a +1 from me!
       | 
       | For handling straightforward asynchronous tasks like sending opt-
       | in emails, we've developed a similar library at All Quiet for C#
       | and MongoDB: https://allquiet.app/open-source/mongo-queueing
       | 
       | In this context:
       | 
       | - LISTEN/NOTIFY in PostgreSQL is comparable to MongoDB's change
       |   streams.
       | 
       | - SELECT FOR UPDATE SKIP LOCKED in PostgreSQL can be likened to
       |   MongoDB's atomic read/update operations.
        
       | jackbravo wrote:
       | Most of the small python alternatives I've seen use Redis as
       | backend:
       | 
       | - https://github.com/rq/rq
       | 
       | - https://github.com/coleifer/huey
       | 
       | - https://github.com/resque/resque
        
       ___________________________________________________________________
       (page generated 2024-08-19 23:00 UTC)