[HN Gopher] Devious SQL: Message Queuing Using Native PostgreSQL
       ___________________________________________________________________
        
       Devious SQL: Message Queuing Using Native PostgreSQL
        
       Author : Scramblejams
       Score  : 107 points
       Date   : 2022-01-28 19:33 UTC (2 days ago)
        
 (HTM) web link (blog.crunchydata.com)
 (TXT) w3m dump (blog.crunchydata.com)
        
       | jpgvm wrote:
       | Implementations that use advisory locks like
       | https://github.com/que-rb/que are much more efficient (atleast
       | when I last tested) and will easily reach 10k job/s on even very
       | modest hardware.
       | 
       | There is a Go port of Que but you can also easily port it to any
       | language you like. I have a currently non-OSS implementation in
       | Rust that I might OSS someday when I have time to clean it up.
        
         | pdimitar wrote:
         | A Rust implementation will be highly valued!
        
       | erichocean wrote:
       | See also: https://github.com/timgit/pg-boss
       | 
       | > _Queueing jobs in Node.js using PostgreSQL like a boss_
       | 
       | You can scale much higher in Postgres 14 if you use partitioned
       | tables, both horizontally and in terms of day-to-day stability
       | because old partitions can be dropped in _O_ (1) time, taking all
       | table bloat with them. Obviously more work to set up, though.
        
       | nitred wrote:
       | This is easily a few thousand dollars worth of advice. I use
       | PostgreSQL regularly but I would never have been able to come up
       | with this solution myself.
       | 
       | Having queues on Postgres is going to be such a great addition to
       | my tool belt especially since there's always a Postgres instance
       | running somewhere anyway.
       | 
       | I know the first use case that jumps to mind is a job queues, but
       | I feel like the method described in the article is quite low
       | level which means it can be used as a base to solve many use
       | cases.
       | 
       | * I won't have to reach for Kafka/Redpanda unless the rate of
       | events/messages reaches 100k-1000k per day.
       | 
       | * I can add one column called queue_id where each unique queue_id
       | refers to a new queue which means I can use a single table for
       | multiple queues.
       | 
       | * If I add a new column called event_type, then I wonder if it's
       | possible to create a composite event queue, where for example the
       | query must return exactly two rows and one row must have
       | event_type=type_1 and second row must have event_type=type_2 and
       | both rows are locked and processed exactly once?
        
       | noodlesUK wrote:
       | Does anyone have anything that resembles Celery or Huey for
       | python that uses the nice Postgres semantics for queueing? I know
       | ruby has something like this in the form of Que. My application
       | has very modest needs for its task queue, probably like a few
       | thousand events per year, and celery seems like massive overkill.
        
         | swuecho wrote:
         | https://procrastinate.readthedocs.io/en/stable/
        
           | noodlesUK wrote:
           | This looks _exactly_ like what I'm looking for. Thank you!
        
         | subleq wrote:
         | https://github.com/gavinwahl/django-postgres-queue
        
           | fouadf wrote:
           | At our company we use Redis, it's lightweight and has list
           | structure which can be used as a push/pop queue
           | 
           | The basic gist of it is that on one end a producer pushes to
           | the list and a consumer(s) on the other end pops the job and
           | executes it. Fire and forget style.
        
             | noodlesUK wrote:
             | Yeah, redis or similar are generally good and battle tested
             | solutions. Celery and Huey use redis, as do many other task
             | queue and scheduling solutions. The issue is then that you
             | have to manage an additional deployment of redis. That's
             | not so bad if it's a core part of your application esp if
             | you're hosting on a paas or major cloud, but it can make
             | local dev and testing suck a bit more.
        
               | fouadf wrote:
               | Yes it's true, we already use Redis for several workloads
               | and operations so it's not really an extra thing to
               | manage, on the contrary we don't need to learn about say
               | rabbitmq and manage it, just worry about Redis.
               | 
               | Why do you consider it bad for local dev? A Redis
               | instance literally takes 1MB memory when started.
        
         | ddorian43 wrote:
         | https://gitlab.com/dalibo/dramatiq-pg
        
       | RedShift1 wrote:
       | I remember coming across a project that implemented a complete
       | queueing system using SQL functions, so to add jobs you would
       | just call a function and same for polling/listening for jobs. It
       | was very well documented, each function was explained and how to
       | use it. But I can't find it anymore.
        
       | 00000000005 wrote:
       | You don't even need a database to implement an effective queueing
       | mechanism.
       | 
       | Linux file move is atomic, which means file system based queues
       | are perfectly viable. Just save one message per file. Move the
       | file to a different directory when it changes state.
       | 
       | I built a prototype queue around this mechanism. Performance was
       | bound to the disks ability to create small files, around 30,000
       | messages per second.
       | 
       | This sort of performance rivals some of much more sophisticated
       | and complex queuing systems, but has zero configuration.
        
         | RedShift1 wrote:
         | Would you mind sharing the code? How do you know subscribers
         | have messages waiting for them, inotify or polling or something
         | else?
        
         | eloff wrote:
         | It has complex failure modes though, unless you're not at all
         | concerned about what happens if the program crashes, power goes
         | out, file gets corrupted, etc.
         | 
         | It's surprisingly difficult to write something that stores
         | changing data on disk correctly and durably.
        
           | 00000000005 wrote:
           | Databases have the same problems when disks fail, power goes
           | out etc.
        
             | eloff wrote:
             | If you mean they have to solve the same problems with very
             | careful design and programming, yes. It's non trivial. Even
             | Postrgres had a correctness issue around fsync for nearly
             | twenty years.
             | 
             | If you roll your own, that's very hard to get right. I
             | wouldn't have confidence in such a system that I've built
             | myself.
        
         | kumarvvr wrote:
         | This mechanism has multiple points of failure, and too many
         | disk writes. Batching up writes saves the hardware.
        
           | 00000000005 wrote:
           | Batching up writes means greater risk if data loss compared
           | to immediate writes.
        
       | bfelbo wrote:
       | Nice post! For anyone serious about using PG for queues, I'd
       | suggest looking into https://github.com/graphile/worker which
       | seems very promising.
        
       | kumarvvr wrote:
       | I was always taught (by extensive reading on SO) that
       | implementing a queue system on top of PGSQL is bad.
       | 
       | So, for small (but critical nonetheless) use cases, do any of you
       | use your own custom built queues on top of a database?
        
         | 00000000005 wrote:
         | There's people who think using a database as a queue is bad,
         | but if you keep pressing then for tangible reasons, it usually
         | turns out to be emotional ...... they just don't like the idea.
         | 
         | Next time someone says it's a bad idea, keep pressing for real
         | answers, not emotion.
        
         | onlywicked wrote:
         | I don't myself but DigitalOcean had there queue system to
         | create VMs built over MySQL.
         | 
         | I couldn't find the original video describing their
         | architecture. (It's private now, I guess).
        
         | bfelbo wrote:
         | We've implemented two types of queues using Postgres tables in
         | production and it's been working nicely. Happy to answer
         | questions about our usage if it's useful :)
        
           | kumarvvr wrote:
           | How scalable do you think your system is? Can it handle 10k
           | req/sec?
           | 
           | My usage is much lower than that but I want to target that
           | number.
        
             | bfelbo wrote:
             | We're currently keeping it as simple as possible. We just
             | use a table and this query that runs every second: "DELETE
             | FROM job_queue RETURNING type, info;"
             | 
             | Works great and reliably so far. I think our current volume
             | is on the order of 100-1000 req/sec. We'll likely switch to
             | graphile-worker when we need more performance, but we're
             | all about avoiding premature optimization. That library has
             | been benchmarked to handle 10k req/sec:
             | https://github.com/graphile/worker#performance
        
           | kumarvvr wrote:
           | Would also love to hear or read about your implementation,
           | the general architecture and failsafe constructs. Thank you.
        
       ___________________________________________________________________
       (page generated 2022-01-30 23:02 UTC)