[HN Gopher] Devious SQL: Message Queuing Using Native PostgreSQL
___________________________________________________________________
Devious SQL: Message Queuing Using Native PostgreSQL
Author : Scramblejams
Score : 107 points
Date : 2022-01-28 19:33 UTC (2 days ago)
(HTM) web link (blog.crunchydata.com)
(TXT) w3m dump (blog.crunchydata.com)
| jpgvm wrote:
| Implementations that use advisory locks like
| https://github.com/que-rb/que are much more efficient (atleast
| when I last tested) and will easily reach 10k job/s on even very
| modest hardware.
|
| There is a Go port of Que but you can also easily port it to any
| language you like. I have a currently non-OSS implementation in
| Rust that I might OSS someday when I have time to clean it up.
| pdimitar wrote:
| A Rust implementation will be highly valued!
| erichocean wrote:
| See also: https://github.com/timgit/pg-boss
|
| > _Queueing jobs in Node.js using PostgreSQL like a boss_
|
| You can scale much higher in Postgres 14 if you use partitioned
| tables, both horizontally and in terms of day-to-day stability
| because old partitions can be dropped in _O_ (1) time, taking all
| table bloat with them. Obviously more work to set up, though.
| nitred wrote:
| This is easily a few thousand dollars worth of advice. I use
| PostgreSQL regularly but I would never have been able to come up
| with this solution myself.
|
| Having queues on Postgres is going to be such a great addition to
| my tool belt especially since there's always a Postgres instance
| running somewhere anyway.
|
| I know the first use case that jumps to mind is a job queues, but
| I feel like the method described in the article is quite low
| level which means it can be used as a base to solve many use
| cases.
|
| * I won't have to reach for Kafka/Redpanda unless the rate of
| events/messages reaches 100k-1000k per day.
|
| * I can add one column called queue_id where each unique queue_id
| refers to a new queue which means I can use a single table for
| multiple queues.
|
| * If I add a new column called event_type, then I wonder if it's
| possible to create a composite event queue, where for example the
| query must return exactly two rows and one row must have
| event_type=type_1 and second row must have event_type=type_2 and
| both rows are locked and processed exactly once?
| noodlesUK wrote:
| Does anyone have anything that resembles Celery or Huey for
| python that uses the nice Postgres semantics for queueing? I know
| ruby has something like this in the form of Que. My application
| has very modest needs for its task queue, probably like a few
| thousand events per year, and celery seems like massive overkill.
| swuecho wrote:
| https://procrastinate.readthedocs.io/en/stable/
| noodlesUK wrote:
| This looks _exactly_ like what I'm looking for. Thank you!
| subleq wrote:
| https://github.com/gavinwahl/django-postgres-queue
| fouadf wrote:
| At our company we use Redis, it's lightweight and has list
| structure which can be used as a push/pop queue
|
| The basic gist of it is that on one end a producer pushes to
| the list and a consumer(s) on the other end pops the job and
| executes it. Fire and forget style.
| noodlesUK wrote:
| Yeah, redis or similar are generally good and battle tested
| solutions. Celery and Huey use redis, as do many other task
| queue and scheduling solutions. The issue is then that you
| have to manage an additional deployment of redis. That's
| not so bad if it's a core part of your application esp if
| you're hosting on a paas or major cloud, but it can make
| local dev and testing suck a bit more.
| fouadf wrote:
| Yes it's true, we already use Redis for several workloads
| and operations so it's not really an extra thing to
| manage, on the contrary we don't need to learn about say
| rabbitmq and manage it, just worry about Redis.
|
| Why do you consider it bad for local dev? A Redis
| instance literally takes 1MB memory when started.
| ddorian43 wrote:
| https://gitlab.com/dalibo/dramatiq-pg
| RedShift1 wrote:
| I remember coming across a project that implemented a complete
| queueing system using SQL functions, so to add jobs you would
| just call a function and same for polling/listening for jobs. It
| was very well documented, each function was explained and how to
| use it. But I can't find it anymore.
| 00000000005 wrote:
| You don't even need a database to implement an effective queueing
| mechanism.
|
| Linux file move is atomic, which means file system based queues
| are perfectly viable. Just save one message per file. Move the
| file to a different directory when it changes state.
|
| I built a prototype queue around this mechanism. Performance was
| bound to the disks ability to create small files, around 30,000
| messages per second.
|
| This sort of performance rivals some of much more sophisticated
| and complex queuing systems, but has zero configuration.
| RedShift1 wrote:
| Would you mind sharing the code? How do you know subscribers
| have messages waiting for them, inotify or polling or something
| else?
| eloff wrote:
| It has complex failure modes though, unless you're not at all
| concerned about what happens if the program crashes, power goes
| out, file gets corrupted, etc.
|
| It's surprisingly difficult to write something that stores
| changing data on disk correctly and durably.
| 00000000005 wrote:
| Databases have the same problems when disks fail, power goes
| out etc.
| eloff wrote:
| If you mean they have to solve the same problems with very
| careful design and programming, yes. It's non trivial. Even
| Postrgres had a correctness issue around fsync for nearly
| twenty years.
|
| If you roll your own, that's very hard to get right. I
| wouldn't have confidence in such a system that I've built
| myself.
| kumarvvr wrote:
| This mechanism has multiple points of failure, and too many
| disk writes. Batching up writes saves the hardware.
| 00000000005 wrote:
| Batching up writes means greater risk if data loss compared
| to immediate writes.
| bfelbo wrote:
| Nice post! For anyone serious about using PG for queues, I'd
| suggest looking into https://github.com/graphile/worker which
| seems very promising.
| kumarvvr wrote:
| I was always taught (by extensive reading on SO) that
| implementing a queue system on top of PGSQL is bad.
|
| So, for small (but critical nonetheless) use cases, do any of you
| use your own custom built queues on top of a database?
| 00000000005 wrote:
| There's people who think using a database as a queue is bad,
| but if you keep pressing then for tangible reasons, it usually
| turns out to be emotional ...... they just don't like the idea.
|
| Next time someone says it's a bad idea, keep pressing for real
| answers, not emotion.
| onlywicked wrote:
| I don't myself but DigitalOcean had there queue system to
| create VMs built over MySQL.
|
| I couldn't find the original video describing their
| architecture. (It's private now, I guess).
| bfelbo wrote:
| We've implemented two types of queues using Postgres tables in
| production and it's been working nicely. Happy to answer
| questions about our usage if it's useful :)
| kumarvvr wrote:
| How scalable do you think your system is? Can it handle 10k
| req/sec?
|
| My usage is much lower than that but I want to target that
| number.
| bfelbo wrote:
| We're currently keeping it as simple as possible. We just
| use a table and this query that runs every second: "DELETE
| FROM job_queue RETURNING type, info;"
|
| Works great and reliably so far. I think our current volume
| is on the order of 100-1000 req/sec. We'll likely switch to
| graphile-worker when we need more performance, but we're
| all about avoiding premature optimization. That library has
| been benchmarked to handle 10k req/sec:
| https://github.com/graphile/worker#performance
| kumarvvr wrote:
| Would also love to hear or read about your implementation,
| the general architecture and failsafe constructs. Thank you.
___________________________________________________________________
(page generated 2022-01-30 23:02 UTC)