[HN Gopher] Do you really need Redis? How to get away with just ...
___________________________________________________________________
Do you really need Redis? How to get away with just PostgreSQL
Author : hyzyla
Score : 689 points
Date : 2021-06-12 06:57 UTC (16 hours ago)
(HTM) web link (spin.atomicobject.com)
(TXT) w3m dump (spin.atomicobject.com)
| markus_zhang wrote:
| I think it more or less depends on the mindset of the developer.
| For example, from my observation, developers from telecom or
| other big companies tend to turn every program into a "solution",
| i.e. it must have the full weaponry and other bells and whistles.
| Me? I'm the opposite and won't add anything or fix anything
| unless it is really useful.
| manishsharan wrote:
| Sometimes we need network-based data structures other than
| relational tables. Redis delivers quite a few of those.
| You can always implement them in a SQL database, but Redis is
| just a better solution for those cases.
| beforeolives wrote:
| Can anyone explain the need for redis in a clear way for someone
| who knows how databases work but isn't a backend developer? What
| alternatives are there? What did people do to solve the same
| problem before redis existed?
| mixmastamyk wrote:
| Generally used for caching expensive lookups, including job
| queues, and other nifty patterns. Before that memcached was
| most popular, but it didn't save to disk, where redis
| eventually does. Before that, it was roll your own or ram disk.
| snarkypixel wrote:
| ELI5:
|
| Redis is really a mixed bag of many useful commands, see
| [https://redis.io/commands]. It has been referred to as the
| "swiss army knife" because you can build many custom solutions
| from these building blocks. Still, its most common use-case is
| for in-memory key/value caching for performance reasons. For
| instance, if you have one database query that takes a
| particularly long time, you can execute it once and store the
| result in redis, and then retrieve that value extremely fast.
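The cache-aside pattern described above can be sketched in a few lines. This is a hypothetical illustration: a plain dict stands in for the Redis client, and `slow_query` for the expensive database call; with a real client you would use GET/SET (plus a TTL) instead.

```python
import time

# Stand-in for a Redis client: a plain dict with get/set-style use.
# A real deployment would use e.g. redis-py and SET with a TTL.
cache = {}

def slow_query(user_id):
    """Pretend this is an expensive database query."""
    time.sleep(0.01)  # simulate query latency
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    """Cache-aside: check the cache first, fall back to the database."""
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]          # fast path: cache hit
    value = slow_query(user_id)    # slow path: run the query once...
    cache[key] = value             # ...and remember the result
    return value
```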
| ibraheemdev wrote:
| Okay, but why? Redis is perfect for job queues and pub/sub. Use
| the right tool for the job.
| encoderer wrote:
| Just because one can does not mean one should.
| fasteo wrote:
| For use case 1 (job queue) I can only recommend beanstalkd[1].
| Simple plain-text TCP protocol. Install and forget. Plus you get
| some really useful features like deferred jobs, burying, kicking,
| job priorities, etc.
|
| We have literally processed tens of billions of jobs without a
| single failure.
|
| Old and simple technology that just works
|
| [1] https://beanstalkd.github.io/
| mianos wrote:
| It is funny, every time I have used beanstalk over the years I
| have put it in thinking I'll replace it later. I never have.
| Multiple startups I have left years ago are still running it,
| untouched years later. No one mentions it probably because it
| didn't do anything but run. I wish the same could be said for
| RabbitMQ.
| chx wrote:
| You really don't need anything fancy to implement a queue using
| SQL. You need a table with a primary id and a "status" field. An
| "expired" field can be used instead of the "status". We used the
| latter because it allows easy retries.
|
| 1. SELECT item_id WHERE expire = 0. If this is empty, no items
| are available.
|
| 2. UPDATE SET expire = some_future_time WHERE item_id =
| $selected_item_id AND expire = 0. Then check whether UPDATE
| affected any rows. If it did, item_id is yours. If not, loop. If
| the database has a sane optimizer it'll note at most one document
| needs locking as the primary id is given.
|
| All this needs is a very weak property: document level atomic
| UPDATE which can return whether it changed anything. (How weak?
| MongoDB could do that in 2009.)
|
| Source code at
| https://git.drupalcode.org/project/drupal/-/blob/9.2.x/core/...
| (We cooked this up for Drupal in 2009 but I am reasonably sure we
| didn't invent anything new.)
|
| Of course, this is not the fastest job queue there is but it is
| quite often good enough.
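A minimal runnable sketch of the claim loop described above, with sqlite3 standing in for the SQL database. The table and column names are illustrative; the linked Drupal implementation differs in detail.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE queue (
    item_id INTEGER PRIMARY KEY,
    expire  INTEGER NOT NULL DEFAULT 0,
    payload TEXT)""")
conn.executemany("INSERT INTO queue (payload) VALUES (?)", [("a",), ("b",)])

def claim_item(conn, lease_seconds=30):
    """Claim one item: find a candidate, then atomically take it.
    The UPDATE's rowcount tells us whether we won the race."""
    while True:
        row = conn.execute(
            "SELECT item_id FROM queue WHERE expire = 0 "
            "ORDER BY item_id LIMIT 1").fetchone()
        if row is None:
            return None  # no items available
        item_id = row[0]
        cur = conn.execute(
            "UPDATE queue SET expire = ? WHERE item_id = ? AND expire = 0",
            (int(time.time()) + lease_seconds, item_id))
        if cur.rowcount == 1:
            return item_id  # the item is ours until the lease expires
        # someone else claimed it first; loop and try the next candidate
```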
| ryanthedev wrote:
| I believe something like this would have issues with multiple
| consumers.
| viraptor wrote:
| This sounds dangerous. Without some good way to prevent workers
| stepping on each other, that will result in lots of useless
| queries and failed updates if a lot of jobs get scheduled at
| the same time.
|
| In the Drupal case, the database returns a single row which is
| pretty much guaranteed to be the same for all workers in
| between updates. You really don't want that in the same
| database your page views hit. At least selecting N rows at a
| time and then claiming a random item from them would be
| slightly better.
| viraptor wrote:
| I find the downvotes weird without any explanation why the
| raised issue doesn't sound important.
| zapstar wrote:
| I have done something similar to this to implement queues in
| Postgres. And at quite a large scale with lots of workers all
| pulling items off the queue at once.
|
| One huge advantage over, say, SQS, is that you also get easy
| visibility! You can query and see what's been completed, what's
| still not worked on, etc.!
|
| Oh, and do you want a priority queue? Just add a priority field
| and sort before picking an item! Do you need to change the
| priority of an item while it's already in the queue? Go right
| ahead. Do you want to enforce some constraints so that
| duplicates don't get added to the queue? It's a database -- add
| the constraints.
|
| If you're already using a relational database, and now you need
| a queue, start with this approach. Odds are, it'll work just
| fine, and it'll give you the perks above.
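A hedged sketch of the priority and dedup points above, again with sqlite3 standing in for the database and invented table/column names: a UNIQUE constraint rejects duplicates, a priority column drives the pick order, and re-prioritizing an already-queued item is a plain UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jobs (
    id       INTEGER PRIMARY KEY,
    task     TEXT UNIQUE,               -- the constraint that blocks duplicates
    priority INTEGER NOT NULL DEFAULT 0,
    done     INTEGER NOT NULL DEFAULT 0)""")

def enqueue(task, priority=0):
    """Returns False instead of inserting a duplicate."""
    try:
        conn.execute("INSERT INTO jobs (task, priority) VALUES (?, ?)",
                     (task, priority))
        return True
    except sqlite3.IntegrityError:
        return False

def pick():
    """Highest priority first; ties broken by insertion order."""
    row = conn.execute(
        "SELECT id, task FROM jobs WHERE done = 0 "
        "ORDER BY priority DESC, id LIMIT 1").fetchone()
    if row:
        conn.execute("UPDATE jobs SET done = 1 WHERE id = ?", (row[0],))
    return row[1] if row else None

enqueue("send-email")
enqueue("rebuild-index", priority=5)
# Change the priority of an item that's already queued:
conn.execute("UPDATE jobs SET priority = 9 WHERE task = 'send-email'")
```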
| GordonS wrote:
| Interesting! What kind of scale did you run it at?
|
| Also keen to know if you saw any disadvantages with this
| approach?
| cerved wrote:
| no you don't but it's a lot better to do an atomic
| select/update IMHO.
|
| Basically you have status = 0,1,2 for queued, processing, done.
| You have a filtering CTE that returns the queued rows, sort by
| id and update status = status +1 of the top one.
|
| You can even get fancy and disallow updates on the status
| column for other users to make it immutable.
|
| I've done this in SQL Server in a very similar way to
| what's explained in the post.
| Twisol wrote:
| For other readers, the `UPDATE` step is an exact analogue of the
| "compare-and-set" atomic instruction [0]. It's really cool to
| see how you've realized it in SQL!
|
| As a capability, compare-and-swap has an infinite consensus
| number [1], meaning it's sufficient to implement wait-free
| consensus algorithms with an arbitrary number of participants.
| That makes it a perfect fit for managing a scalable pool of
| workers that need to coordinate on consuming from a queue!
|
| [0] https://en.wikipedia.org/wiki/Compare-and-swap
|
| [1]
| https://en.wikipedia.org/wiki/Consensus_(computer_science)#C...
| chx wrote:
| Yes, the UPDATE command is the exact equivalent of LOCK
| CMPXCHG (the SELECT can be seen as computing the memory
| address). So the discussion about that in the comments of
| https://stackoverflow.com/a/59022356/308851 totally applies:
| if two queue-runner threads pick the same item, exactly one
| will succeed, so it can't happen that both keep retrying the
| same item. And there's no busy wait (reminder: a busy wait is
| where a thread does nothing but test a condition); each thread
| just goes over the candidates until one claim succeeds.
| abhishekjha wrote:
| I have been asked in interviews how to prevent two
| people from booking the same seat. Interviewers don't
| seem satisfied by the answer that I will let the query
| go to the database and see who ends up booking. They
| want some advanced locking mechanism using Redis, which
| can serialize the two bookings by executing a Lua script
| atomically.
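Letting the database pick the winner needs no extra locking layer: a single conditional UPDATE decides. A runnable sketch with sqlite3 and an invented `seats` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seats (seat TEXT PRIMARY KEY, booked_by TEXT)")
conn.execute("INSERT INTO seats (seat) VALUES ('12A')")

def book(seat, passenger):
    """Only one UPDATE can flip booked_by from NULL to a name;
    the database serializes the writes and picks the winner."""
    cur = conn.execute(
        "UPDATE seats SET booked_by = ? WHERE seat = ? AND booked_by IS NULL",
        (passenger, seat))
    return cur.rowcount == 1  # True only for the winner
```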
| kccqzy wrote:
| I had a similar question asked of me in an interview. The
| interviewer was also surprised by my solution of letting
| the database decide the winner.
| bryanrasmussen wrote:
| based on experience arriving very last minute to flight,
| nobody has problem with letting two people book the same
| seat.
| chx wrote:
| mmmm no
|
| They allow overbooking the flight but you can't book the
| same seat twice.
|
| Overbooking can make financial sense -- even if you need
| to hand out a little compensation from time to time, it's
| still better than flying with empty seats. Of course it
| sucks big time for the travelers, especially if there are
| no volunteers; involuntary denied boarding (IDB) is miserable.
| https://www.transportation.gov/individuals/aviation-
| consumer...
| mianos wrote:
| They obviously don't have the transaction semantics
| worked out. I have been at the gate and the person in
| front of me had the same seat number. They read it out.
| When I presented my pass, the agent politely took my
| ticket and printed another one. If the person hadn't been
| right in front of me, I would never have known why. It is
| not unusual.
| kyleee wrote:
| Let the check-in attendant handle it; that's what I call
| a no-code solution! You're hired
| Tostino wrote:
| That's solid middle management thinking right there!
| agf wrote:
| I think this being a good answer hinges on the resulting
| user experience. If one person spends several minutes
| going through the process of putting in their payment and
| other info, only to be told they don't have a ticket,
| they're going to be pissed. So you need a good answer on
| how to avoid that experience, however you implement the
| locking.
|
| Saying this as someone who has been on both sides of that
| interview question, and also evaluated performance in it
| as a hiring manager.
| lstamour wrote:
| Send bookings to the database early, with an expiry time
| and a counter visible on the screen?
|
| Thing is, because purchasing is often shared across
| third-parties, even then you can't guarantee that you'll
| have the reservation until it's paid and entered into the
| shared booking backend... unless the third party backend
| has a locking mechanism, at which point you're probably
| not using your own Postgres to do this, but their
| mainframe instead.
| spockz wrote:
| Is there any other solution other than going full STM on
| this?
| halifaxbeard wrote:
| Postgres's transactional semantics are really useful when
| building a queue, because of how it interacts with the
| various pieces.
|
| Connection 1:
|     LISTEN job_update;
|
| Connection 2:
|     BEGIN;
|     INSERT INTO jobs (id, ...) VALUES ('a-uuid', ...);
|     SELECT PG_NOTIFY('job_update', 'json blob containing
|       uuid and state change info');
|     COMMIT;
|
| Connection 3 (used when Connection 1 is notified):
|     BEGIN;
|     SELECT id, ... FROM jobs WHERE id = 'a-uuid'
|       FOR UPDATE SKIP LOCKED;
|     UPDATE jobs SET state = 'step1_completed'
|       WHERE id = 'a-uuid';
|     SELECT PG_NOTIFY('job_update', 'json blob containing
|       uuid and state change info');
|     -- do the thing here: computation, calling an
|     -- external API, etc. If it fails, ROLLBACK.
|     COMMIT;
|
| Because notify has transactional semantics, the notify only
| goes out at transaction commit time. You want to use a
| dedicated connection for the notify.
|
| The only downsides I can immediately think of: every
| worker will contend to lock that row, and you'll need
| periodic jobs to clean up and retry failures.
| jayd16 wrote:
| A single Rabbit node seems easier than rolling this yourself
| and doing all the work to test it.
| jasonwatkinspdx wrote:
| There's off the shelf libraries to do this in nearly every
| language.
|
| It's always about tradeoffs in context, but I have seen
| plenty of instances of someone setting up a rabbit cluster
| for something that could be done trivially in their existing
| db cluster via the above.
| z77dj3kl wrote:
| It's a tradeoff between software and infra complexity.
|
| I would argue that if you've built a system up from scratch,
| this is much easier to debug and maintain than a foreign
| piece of software. Rabbit is just way overkill if you have a
| couple of jobs per hour, for example.
| solipsism wrote:
| But if we always do what's easiest, then we will make a bunch
| of horrible choices.
|
| Other things that matter, often more than what's easiest:
|
| * Flexibility. If you aren't 100% sure what you'll need, but
| need something in place today, it can make sense to choose an
| unopinionated, generic solution.
|
| * Familiarity. Maybe you've never used Rabbit.
|
| * Ops simplicity. Maybe you already have a bunch of postgres
| deployed. Adding a new type of system means you need new ways
| of monitoring, new dependencies, different deployment.
| throwaway894345 wrote:
| Of these, I think the last is the most compelling. Why
| introduce something new if you already have all of the
| infra and tooling configured for managing postgres? The
| parent mentioned testing the postgres implementation, but
| that seems like quite a lot less work than configuring the
| infra-as-code, monitoring, alerting, etc required for a new
| type of thing.
| joelbluminator wrote:
| So you need to keep polling the db?
| lolinder wrote:
| Postgres has LISTEN/NOTIFY for this.
| ukd1 wrote:
| You really, really want to do everything (locking, working,
| unlocking, deleting or completing) in a transaction, and use
| SKIP LOCKED to find work. It's actually pretty high-performance
| when you do this.
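The transactional shape being described might look like the sketch below. `FOR UPDATE SKIP LOCKED` is Postgres-specific, so it appears only as a comment; sqlite3 stands in to show how claim, work, and completion commit or roll back together.

```python
import sqlite3

# In Postgres the whole claim would be:
#   BEGIN;
#   SELECT id, payload FROM jobs WHERE done = 0
#     ORDER BY id FOR UPDATE SKIP LOCKED LIMIT 1;
#   -- ... do the work ...
#   UPDATE jobs SET done = 1 WHERE id = ?;
#   COMMIT;  -- ROLLBACK on failure releases the row for another worker

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions explicitly
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, "
             "payload TEXT, done INTEGER NOT NULL DEFAULT 0)")
conn.execute("INSERT INTO jobs (payload) VALUES ('send-invoice')")

def run_one(work):
    """Claim, work, and complete inside one transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        row = conn.execute("SELECT id, payload FROM jobs "
                           "WHERE done = 0 ORDER BY id LIMIT 1").fetchone()
        if row is None:
            conn.execute("ROLLBACK")
            return None
        work(row[1])  # if this raises, the claim rolls back too
        conn.execute("UPDATE jobs SET done = 1 WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
        return row[0]
    except Exception:
        conn.execute("ROLLBACK")
        raise
```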
| phumberdroz wrote:
| do you have a more detailed example of how this could
| look?
| subleq wrote:
| https://github.com/gavinwahl/django-postgres-queue
| taspeotis wrote:
| The article starts by discussing SKIP LOCKED
| habibur wrote:
| Right. And this article reminds me of people trying to use a
| SQL database like a filesystem. "Lock file before update
| otherwise it will get overwritten".
|
| Don't use a record lock as if it were an attribute of the
| row. Add a column called "status" to the table and use it.
| 0x0nyandesu wrote:
| I recently switched from rolling my own table to SQS.
|
| The main issue is making a job-queue dispatcher and
| runner, which most people aren't familiar with.
| kalleboo wrote:
| The way we did it (in MYSQL) was "UPDATE tasks SET
| processid=<unique id for the queue consumer task> WHERE
| processid IS NULL ORDER BY taskid LIMIT 1" and then look for
| processid to find your task. I guess LIMIT on UPDATE might
| be a MySQL extension? Then if your task crashes or dies, you
| can look for invalid processids and do cleanup.
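The MySQL pattern above can be approximated portably with a subquery instead of `UPDATE ... LIMIT`. A sketch with sqlite3 and illustrative names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (taskid INTEGER PRIMARY KEY, processid TEXT)")
for _ in range(3):
    conn.execute("INSERT INTO tasks DEFAULT VALUES")

def claim(worker_id):
    """Tag the oldest unclaimed task with our id, then read it back.
    UPDATE ... LIMIT is a MySQL extension; a subquery works everywhere."""
    conn.execute(
        """UPDATE tasks SET processid = ?
           WHERE taskid = (SELECT taskid FROM tasks
                           WHERE processid IS NULL
                           ORDER BY taskid LIMIT 1)""",
        (worker_id,))
    row = conn.execute("SELECT taskid FROM tasks WHERE processid = ?",
                       (worker_id,)).fetchone()
    return row[0] if row else None
```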
| chx wrote:
| Our initial version did that too, but generating a unique id
| was removed soon after: we had expire in there anyway for
| retries, as I mentioned, and that is enough.
| hughrr wrote:
| Oh please stop using databases as queues. I spent a
| disproportionate amount of time in the last 20 years undoing
| that decision. It doesn't scale at all well.
| blowski wrote:
| You're dealing with survivorship bias - those companies whose
| products were successful enough that they needed your
| services. You don't speak to the teams who went bankrupt
| because they spent too much time fiddling with RabbitMQ, when
| Postgres would have been fine.
|
| As always, there are trade offs, no silver bullets.
| paxys wrote:
| You are wayyy overestimating how complex something like
| RabbitMQ or Redis is. You don't need to hire entire teams
| of engineers or go bankrupt setting up an instance. It is
| less work than implementing a production-level queue in
| Postgres for sure.
| zomgwat wrote:
| I agree. I'll also add that Redis is a level easier to
| operate than RabbitMQ.
| aetherson wrote:
| Have worked at multiple companies that successfully used
| redis or rabbit with teams of less than five engineers.
|
| It's a little insane that a highly rated thread on HN is
| telling people to use postgres as their queuing solution.
| The world is wide; I'm sure that somewhere out there
| there is a situation where using postgres as a queue
| makes sense, but in 99% of all cases, this is a terrible
| idea.
|
| Also, SQS and nsq are simple.
| politician wrote:
| If your data storage pattern involves a database write
| followed by a message queue insert, then the ability to
| wrap those in a transaction can be a good trade-off to
| avoid consistency failures between the writes.
|
| Avoid consuming that queue directly though -- which is
| probably what you're thinking when saying this is a dumb
| idea and I tend to agree. Typically, you want to have a
| worker that loads entries into a more optimal queue for
| your application.
|
| Bottom line though, there is not a single best "queue"
| product. There are lots of queues offering wildly
| different semantics that directly impact use cases.
| GordonS wrote:
| For years I've been using RabbitMQ in small teams, and me
| as a one-man team too.
|
| As long as you don't need clustering (even a single node
| can handle some pretty heavy load), it's actually
| _really_ simple to setup and use - way easier than
| Postgres itself, for example.
|
| My biggest beefs with it have historically been the
| Erlang-style config files (which are basically
| unreadable), and the ridiculous spiel of error messages
| you get if you have an invalid configuration. But
| thankfully RabbitMQ switched to a much simpler config
| file format one or two years back, and I understand the
| Erlang/OTP team is working on better error messages and
| stack traces.
| fauigerzigerk wrote:
| The issue is not necessarily the complexity of RabbitMQ
| or Redis. The complexity comes from having to manage
| another (stateful) process that has to be available at
| all times.
| hughrr wrote:
| Yes. Pay Amazon to do it and use all that saved time
| to add business value instead.
|
| They'll also manage the consumers of the queue and scale
| them too! Serverless is bliss.
| mbreese wrote:
| "Production-level" means different things to different
| groups. Many (most?) groups don't operate at the scale
| where they need a specialized queue in a dedicated
| message broker. Simple queues in a DB will work fine.
| Even if it isn't very complex, why not use the
| infrastructure you probably already have set up -- an
| RDBMS?
|
| Now, if you're using a nosql approach for data storage,
| then you already know your answer.
| gravypod wrote:
| My main concern would be monitoring. Most queue systems
| connect to alerting systems and can page you if you
| suddenly stop processing things or retry the same
| query many, many times. For a DB, since the scope of
| access is much larger, you don't get these sort of
| guarantees for access patterns and you essentially need
| to reinvent the wheel for monitoring.
|
| All to save the 2 to 3 hours of Googling for the best
| queue for your use case and finding a library for your
| language.
|
| It makes sense if you don't care about reliability and
| just need something easy for many many people to deploy
| (ex: Drupal).
| bcrosby95 wrote:
| We use pingdom that hits a page that gives it the health
| of various systems. Queue included.
|
| > All to save 2 to 3 hr of Googling for the best queue
| for your use case and finding a library for your
| language.
|
| The cost of using a new piece of tech in production is
| not just 2 or 3 hours.
| gravypod wrote:
| If you're at the scale where your postgres db can be used
| as a queue and no one on your team has experience running
| these systems (most startups), then pretty much anything
| will work to begin with and as long as you have a clear
| interface that separates your code from your deps for the
| queue it'll be easy to swap out.
|
| At $JOB-1 I wrote a `Queue` and `QueueWorker` abstraction
| that used an environment variable to switch between
| different queue backends while providing the same
| interface. Because of this I got everyone up and running
| with Redis lists as a queue backend and then could add in
| things like MQTT or RabbitMQ as things evolved. I also
| defined very loose constraints for the queue interface
| that kept things simple for the implementer. Essentially there was a
| `push()` which added something into the queue or failed
| and returned an error. Then there was an `onWork()` which
| was called whenever there was work "at least once"
| meaning your system had to handle multiple instances
| being delivered the same work item. We would only ack the
| queue message after `onWork()` completed successfully.
|
| There's not really anything preventing a team from doing
| this, putting a pin into it, and coming back to it when
| there's a scalability or reliability concern.
| cortesoft wrote:
| Sure, but I also feel like this comment is based on an
| assumption that everyone works at startups. A lot of us
| work for larger, already established companies that are
| past that will we or won't we survival stage. We are trying
| to design systems to solve needs for already validated
| products.
|
| It might seem like everyone works for a startup, but big
| companies have a lot of engineers, and we still have to
| solve new problems there even though the company is not
| new.
| robwwilliams wrote:
| Great comment: Been reading Taleb's Black Swan this morning
| on ascertainment bias, so this resonated well.
| skytreader wrote:
| > You're dealing with survivorship bias - those companies
| whose products were successful enough that they needed your
| services
|
| In fairness, Redis has only been available for 12 years so
| someone who has been in the industry longer has probably
| encountered systems using DBs as queues for no better
| reason than an alternative was not available when the
| system was made. (Rabbit just a couple years older I
| think.)
|
| But in this day and age, you have better options to start
| with c'mon.
|
| > they spent too much time fiddling with RabbitMQ, when
| Postgres would have been fine.
|
| Rather ironic claim in a thread full of tips, tricks, and
| gotchas just to make PG behave like a queue no?
|
| I can't speak for RabbitMQ personally but Redis has got to
| be among the simplest drop-in dependencies I've ever
| encountered. For a simple queue, the defaults work out of
| the box.
|
| Gut feel also says Redis's defaults for queuing would
| serve you further than PG's defaults, so you could leave
| it untouched for far longer to focus on your product, get
| to profitability, etc. Feel free to debunk with data.
| boardwaalk wrote:
| You (or whoever is doing ops) will have to pay with your
| time to operate Redis until the product dies though.
|
| I don't think adding another component (especially one
| with storage/backup requirements/more complex failover
| procedures) should be taken lightly.
|
| Paying attention to the tips/tricks/gotchas is something
| you'd pay once and hopefully document in source
| code/documentation.
|
| If you piggyback off someone else's work (another team at
| your job, a cloud-based solution that handles everything,
| etc) the calculus would change, of course.
| skytreader wrote:
| > You (or whoever is doing ops) will have to pay with
| your time to operate Redis until the product dies though.
|
| Repeating my other comment: do you mean to say using an
| in-DB queue will not need special monitoring other than
| what's already standard metrics for DBs?
|
| Since a queue is a different structure, I would want to
| know metrics that might be unfamiliar for standard DB
| monitoring. Queue length, queue throughput, etc. Can I
| get that from my DB queue with negligible cost?
|
| Another thing, there might be standard practices and
| patterns for in-DB queues, often encapsulated in a nice
| library/middleware. But as _databases are not queues_
| there is no assurance from the vendor that those patterns
| will remain applicable in future versions.
|
| > Paying attention to the tips/tricks/gotchas is
| something you'd pay once
|
| Actually, I disagree with this. Every time someone
| connects to that database, they have to remember that one
| special table which is used as a queue, and remember to
| tread differently where it is concerned.
|
| Setting up a queue? You do that once and everyone who
| interacts with it will not forget it is a queue. Heck you
| can't even do your usual DB operations there. How can you
| not remember it is a queue?
| sh87 wrote:
| Using a tool without knowing reasonable bounds of the
| domain, current requirements, and how the tool (redis in
| this case) solves the problem isn't good advice.
|
| Case in point: our team went with Redis, just the
| defaults, used it blindly without fully understanding our
| requirements and how Redis helps scale.
|
| 2 years later we spent 2 sprints, holding back the
| release: trying to understand RDF vs AOF, and why we were
| seeing massive spikes in consumption and performance
| triggering pod eviction; running comparison tests to
| prove which works better and explaining why; running QA
| tests (regression, performance, load); introducing
| postgres for queuing; redoing our code to bypass the sync
| mechanism for data flowing between redis and postgres;
| updating dependencies; migrating existing customer data
| (various on-site locations); explaining this to team
| members, managers and their managers, installation
| technicians, and support engineers; and presenting it at
| engineering townhalls as a case study in bad decisions.
|
| Not worth it.
| skytreader wrote:
| Well, by your admission, you used Redis for a problem
| domain it wasn't suited for in the first place. How is
| this an argument for using in-database queues?
|
| > use it blindly without fully understanding our
| requirements and how redis helps scale
|
| I'm sorry I don't get how I could come across as
| advocating the use of Redis blindly. My point is if your
| data flow looks like a queue, then use a queue, don't
| hack a relational DB to become a queue. I think that's a
| reasonable rule of thumb, not going in blind.
| sateesh wrote:
| I believe you meant "RDB vs AOF".
|
| Also, reading your reply I get the impression that the
| "sync mechanism between redis and postgres" was the
| bottleneck. Wondering if you can add some details around
| it, and whether this was something that couldn't have
| been fixed by fine-tuning the redis config rather than
| completely removing it from your stack.
| sh87 wrote:
| Yes, I did mean RDB. Thanks for pointing that out.
|
| There were many problems, but at the core of it was
| redis writing huge amounts of data to disk very
| frequently.
|
| We could not reduce the frequency (product would not
| allow) and we couldn't find a way to make the writes
| reliably fast.
|
| I like to believe there exists a way of handling
| this, but point being, our team had no time to find out
| how redis works internally and gain confidence that the
| new way wouldn't bring up new surprises.
| [deleted]
| derefr wrote:
| > Rather ironic claim in a thread full of tips, tricks,
| and gotchas just to make PG behave like a queue no?
|
| There are libraries in many languages written
| specifically to accomplish the "make PG behave like a
| queue" part. All these people are explaining the
| principle of how it would work, but in practice, you can
| just pull in one of those libraries and be done. (Or,
| more accurately, you use an abstract external-job-queuing
| library and tell it to use its Postgres adapter, which in
| turn uses that library.)
|
| In exchange, you get the ability to not have to set up /
| maintain any more infra than you already have. Which can
| be important if you're already "at scale", where every
| component you set up needs to be duplicated per
| geographic-region, clustered for High Availability, etc.
| If you've already done all that for Postgres, it'd be
| nice to not _also_ have to do all that _again,
| differently_ for Redis, RabbitMQ /Kafka/NATS, etc.
|
| > Gut feel also says, Redis defaults for queuing would
| also serve more than PG defaults for queuing
|
| Don't know about what you've got your queues doing, but
| our service only does (persistent, out-of-process)
| queuing for two purposes:
|
| 1. Global scheduled background-job queuing. (Think "cron
| jobs.") There are only a static number of these; they
| don't scale up with load.
|
| 2. Lifecycle email dispatch. This scales up with MAU, but
| with very very low coefficients.
|
| For those cases, Postgres would totally be enough. (
| _Anything_ would be enough!)
|
| Most "queues" needed in the real world are requirements-
| light like this. It's a rare company whose product is
| doing something Apache BEAM-alike with its data, where a
| high-throughput reliable MQ is a fundamental part of the
| data workflow. (And those companies, I hope, know that
| they need an actual MQ, not a piece of software that does
| queuing in its spare time.)
|
| Our company doesn't use Postgres queuing; we do, in fact,
| use Redis for it instead. But we only ended up doing
| that, because we already needed Redis for other stuff;
| and if you _already have Redis in play_ (including an ops
| plan for scaling it), _then_ it 's the better tool for
| the job.
| skytreader wrote:
| > In exchange, you get the ability to not have to set up
| / maintain any more infra than you already have.
|
| And, you mean, a PGQ will not need any special monitoring
| other than the common DB metrics?
|
| For instance, if I ever run a queue, it's just due ops
| diligence to know the length of the queue, average time
| an item spends in queue, throughput over time, among
| others. Are there standard monitoring modules that would
| check this for a PGQ? Because in exchange for setting up
| a proper queue cluster, compatibility and documentation
| for common stuff like this is also what you get.
|
| The first one is particularly a sticking point for me. If
| you don't do it right, you will end up issuing a COUNT on
| a table periodically. You might say it's acceptable for
| most companies, because they don't need high-performance
| queues, but I specifically need my monitoring to be
| reliable during times of unexpected high load. Also,
| ideally, there is close to zero chance that my telemetry
| is the one that ends up causing me trouble.
|
| Also, why does no one consider the added "surface for
| human error" that using a DB for a queue introduces? For
| instance, anyone can drop into the DB and perform a query
| you were not supposed to run. A malicious actor can update
| all outgoing emails in the queue to another recipient. If
| these are sensitive emails like password reset or OTP,
| good luck. A dedicated queue process does not allow such
| operations.
|
| I have to say, it's very perplexing to me that people
| (especially in HN) would avoid using the right structure
| just because they have to invest a bit more work.
| derefr wrote:
| > I have to say, it's very perplexing to me that people
| (especially in HN) would avoid using the right structure
| just because they have to invest a bit more work.
|
| You're imagining this as if it's just the CapEx (the work
| of setting up Redis), but in reality it's the OpEx (the
| work of _maintaining_ a Redis cluster) that kills you.
| Think of it in terms of the number of ops salaries you
| have to pay. At scale, this number goes up in proportion
| to the complexity of your stack.
| skytreader wrote:
| Actually, if you read the rest of my previous reply (as
| opposed to an offhand comment at the end) you will see
| that I'm considering the logistics of operating an in-DB
| queue as well.
|
| Using an in-DB queue doesn't give you zero OpEx does it?
| Maybe you can make the argument that it's cheaper but
| it's cheaper for a reason: the job wasn't done properly,
| so to speak.
|
| Both options introduce new failure modes and op costs
| into your system. Might as well do it properly if (and
| this is a huge IF in my opinion) it only costs slightly
| more. When you run into a failure it's standard; maybe
| the solution is even one Google search away, as opposed
| to realizing, one hour into a late-night debugging
| session, that, shit, that table should be treated as a
| queue!
| derefr wrote:
| I read your whole reply. I didn't see any mention of
| anything I'd consider an OpEx cost.
|
| Re: Monitoring
|
| Presuming you're already monitoring anything at all,
| adding monitoring for a new piece of infrastructure is a
| pure CapEx cost. You presumably already have monitoring
| infrastructure running. Its ingestion, by design, will be
| 99% idle -- i.e. it won't need to be scaled horizontally
| proportionally to the number of components. The only
| thing needed, then, will be careful up-front design for
| the monitoring.
|
| Which is usually _also_ a cost already paid for you in
| advance when you use an abstract job-queue library. They
| all know how to expose Prometheus metrics endpoints, and
| they do the accounting to serve those endpoints
| efficiently (usually using process-local per-job-producer
| and per-job-consumer counters, which you must then roll
| up yourself at the PromQL level, taking the irate() to
| find spikes.)
|
| Re: AAA
|
| > Also, why does no one consider the added "surface for
| human error" that using a DB for a queue introduces.
|
| Because the type of person who reaches for their DB first
| to solve a novel problem, is the same type of person who
| understands and makes use of their DB's security
| features, to the point that doing _more_ DB-security-
| config isn't a complex one-off problem for them, but
| something they can whip off in a few minutes.
|
| For a simple option, you can create two service-account
| DB roles: a queue_producer, and a queue_consumer. You can
| put all the job-queue implementation tables in a schema
| owned by the queue_producer; and then grant the
| queue_consumer SELECT privileges on all the tables, and
| UPDATE privileges on some of them. Then, nobody but the
| queue_producer (or a DB superuser) can create or destroy
| jobs; and nobody but a queue_producer, a queue_consumer, or a
| superuser, can read or modify jobs. (Your job-queue
| abstraction library within your service usually maintains
| its own DB connection pool anyway, so it's no sweat to
| have those connections use their own credentials specific
| to their job-queue role.)
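| The two-role scheme above can be sketched as DDL. This is
| an illustrative sketch, not any library's actual setup:
| the role, schema, and table names are assumptions.

```python
# Illustrative DDL for the producer/consumer role split described above.
# All names (queue_producer, queue_consumer, jobs, jobs.job_queue) are
# made up for the example.
GRANT_DDL = [
    "CREATE ROLE queue_producer LOGIN",
    "CREATE ROLE queue_consumer LOGIN",
    # The producer owns the schema, so only it (or a superuser)
    # can create or destroy job tables and rows.
    "CREATE SCHEMA jobs AUTHORIZATION queue_producer",
    # The consumer may read all job tables and update claim/status
    # columns, but cannot INSERT or DELETE jobs.
    "GRANT USAGE ON SCHEMA jobs TO queue_consumer",
    "GRANT SELECT ON ALL TABLES IN SCHEMA jobs TO queue_consumer",
    "GRANT UPDATE ON jobs.job_queue TO queue_consumer",
]

def apply_grants(conn):
    """Run once at deploy time over a superuser connection."""
    with conn.cursor() as cur:
        for statement in GRANT_DDL:
            cur.execute(statement)
```

| The service's job-queue connection pool would then log in
| as queue_producer or queue_consumer rather than as the
| app's main DB user.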
|
| For a more complex -- but perhaps useful? -- option, the
| tables themselves can be "hidden" behind stored
| procedures (DDLed into existence by the abstract job-
| queue library), where nobody has any privileges (incl.
| SELECT) on the tables, only EXECUTE rights on the sprocs.
| And the sprocs are carefully designed to never do
| anything that could have an unbounded CPU time. Then
| anyone can "check up on" or even "insert into" the job-
| queue, but nobody can do anything "funny" to it. (My god,
| it's an encapsulated API!)
|
| Once again -- the libraries that abstract this away,
| already think about concerns like this, and choose one or
| the other of these options. That's why libraries like
| this exist, when the "core premise" is so simple: it's so
| there's a place to put all the fine details derived from
| man-years of thought on how to make this approach
| _robust_.
|
| -----
|
| On a completely different track, though: having a queue
| in the DB can sometimes _be_ the optimal (i.e. the "if
| we had infinite man-hours for design + implementation")
| engineering decision. This case comes when the thing the
| queue is operating upon is _data in the DB_. In such a
| case, the DB data modification and the job's completion
| can succeed or fail together _atomically_, as part of a
| single DB transaction.
|
| To accomplish the same thing when your queue lives
| outside the DB, you usually end up either needing some
| really gnarly distributed-locking logic that both your DB
| and your app layer need to know everything about (leaky
| abstraction!); _or_ you need to completely remodel your
| data _and_ your job queue into an event-streaming
| paradigm, so that you can "rewind" one side when the
| other side fails.
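| A minimal sketch of that atomic coupling, using sqlite3
| as a stand-in for the shared DB (schema and names are
| invented for the example):

```python
import sqlite3

# When the queue lives in the same DB as the data it operates on,
# the data change and the job's completion commit (or roll back)
# together, in one transaction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.execute("INSERT INTO jobs VALUES (42, 'pending')")
conn.commit()

def run_job(conn, job_id, account_id, delta):
    try:
        with conn:  # one transaction: both statements commit, or neither does
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (delta, account_id))
            conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
    except sqlite3.Error:
        pass  # the rollback already happened; the job stays 'pending' for retry

run_job(conn, 42, 1, -30)
balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
status = conn.execute("SELECT status FROM jobs WHERE id = 42").fetchone()[0]
```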
| skytreader wrote:
| > For a simple suggestion, the job queue can live in a
| schema that is owned by a job-queue service-account role.
| Nobody other than that user (or a DB superuser) can issue
| any query in there.
|
| Yes service accounts are fine and dandy but [a] isn't
| that more hoops borne by choosing the wrong tool for the
| job (gotta stay mindful of those permissions, you can
| have an Ansible typo, etc) and [b] as I said, proper
| queue services would not even allow tampering of enqueued
| data. For no extra cost other than installing the
| service. In contrast, your service account falling into a
| malicious actor's hands is still a threat vector.
|
| > For a more complex -- but perhaps useful? -- option ...
|
| Now this is just a queue service with extra steps running
| in a relational DB instead of natively as an OS process.
| You did cite it as just an option but I don't see why
| this is an _attractive_ option.
|
| > I didn't see any mention of anything I'd consider an
| OpEx cost.
|
| It seems to me we have differing definitions of OpEx and,
| judging by your previous comment, you value this almost
| exclusively in terms of "the number of ops salaries you
| have to pay". Even if I play by that definition, I could
| tell from experience (and also corroborated by other
| commenters here, in other subthreads of this discussion),
| that operating a Redis cluster does not need extra warm
| bodies. The people who monitor that your app responds,
| your servers aren't running out of memory, could take on
| the task with little to no additional training.
|
| The experience I want to cite, bearing in mind of course
| that everyone's mileage varies: in my previous job,
| customer base of 2M and growing across two countries, we
| operated and monitored a redundant Redis cluster with an
| ops team of 3. In my first job I talked about in another
| subthread here, we managed three high-traffic Redis
| queues (and a few other smaller-traffic special cases)
| with a team of 7. In both jobs, we also did active
| across-the-stack development; we weren't dedicated to
| monitoring, nor to the components related to the queues.
|
| In fact I would argue running a DB (queue or no queue) is
| just more complex than running a queue service. Rather
| telling is the need for "DB Administrators/Specialists"
| but no such equivalent for Redis or other proper queueing
| solutions.
|
| > Which is usually also a cost already paid for you in
| advance when you use an abstract job-queue library. They
| all know how to expose Prometheus metrics endpoints...
|
| Honestly, this sounds fine but I'm rather wary of the
| caveats that might be present. How do they do it
| efficiently? What do I need to keep in mind to keep
| things efficient? As such, unless you can point me to
| your preferred job-queue library which does all these
| wonders, I have to reserve judgment.
|
| Edit: I just remembered mentioning this in another
| subthread. Even if you have a _fantastic_
| library/middleware abstracting all those queue concerns, that
| ties you in with that library. If someone wants to build
| integrations into your queue, they have to play by that
| library. If the majority of your app is in Java (and so
| is the library), and Data Science (who, in this example,
| uses Python - not improbable) wants to share your data
| pipeline, and that library isn't available for them,
| tough luck I guess? More dev time for someone.
|
| And also, whatever underlying DB features or semantics
| your library might rely on in order to enforce a queue
| structure, you can't be assured that future versions of
| your DB would support those features because, you know,
| your DB isn't really in the business of queues. It opens
| you up to running an outdated DB version just to keep
| that queue functionality.
| derefr wrote:
| > Now this is just a queue service with extra steps
| running in a relational DB instead of natively as an OS
| process. You did cite it as just an option but I don't
| see why this is an attractive option.
|
| Your DB then shares a WAL log with your queue. Meaning a
| single managed physical replication pipeline for them
| both. Meaning only one set of leader-election issues to
| debug, not two. Meaning one canonical way to do
| geographic high-latency async replication. Meaning
| disaster recovery brings back a whole-system consistent
| snapshot state. Etc.
|
| Honestly, if I had my way, every stateful component in
| the stack would all share a single WAL log. That's what
| FoundationDB and the like get you.
|
| > In fact I would argue running a DB (queue or no queue)
| is just more complex than running a queue service.
|
| Well, yeah, but you usually _need_ a DB. So, if you're
| going to be paying the OpEx costs of the DB either way,
| then you may as well understand it deeply in order to
| wring the most productive use you can out of each OpEx
| dollar/man-hour spent.
|
| (I feel the same way about Redis, as it happens: if you
| need it, and are locking your code into its model
| _anyway_, then you may as well take advantage of its
| more arcane features, like Redis Streams, Lua scripting,
| etc.)
|
| However, maybe our company is uncommon in how much our
| service literally _is_ doing fancy complex DB queries
| that use tons of DB features. We're a data analytics
| company. Even the frontend people know arcane SQL here :)
|
| > that ties you in with that library
|
| The difference between what you / apps / abstracting
| libraries do in Redis, and what they do in an SQL DB, is
| that in the DB, the shape of everything has to be
| explained in a vendor-neutral manner: SQL DDL.
|
| _Sometimes_ Redis-based solutions converge on
| conventional schemas; see e.g. Sidekiq's informal schema,
| which several other queuing systems are implemented in
| terms of. But when they don't, there's nothing you can
| really do -- beyond hacking on the libraries involved --
| to bring them into sync.
|
| In an SQL DB, anything can be adapted into the expected
| shape of anything else, by defining SQL views. (Heck, in
| an SQL DB with Redis support, like Postgres with
| redis_fdw, the _Redis_ data can be adapted into any shape
| you like using SQL views.)
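| As a toy illustration of that view-based adaptation
| (sqlite3 here, with made-up names):

```python
import sqlite3

# Adapting one queue schema to another's expected shape with a view.
# The names are invented; the point is that the adaptation lives in
# the DB itself, not in any one app or library.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE email_jobs
                (job_id INTEGER PRIMARY KEY, recipient TEXT, state TEXT)""")
conn.execute("INSERT INTO email_jobs VALUES (1, 'a@example.com', 'queued')")

# A second consumer expects columns (id, payload, status):
# expose that shape via a view over the same rows.
conn.execute("""CREATE VIEW generic_queue AS
                SELECT job_id AS id, recipient AS payload, state AS status
                FROM email_jobs""")

row = conn.execute("SELECT id, payload, status FROM generic_queue").fetchone()
```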
|
| And that's further enabled by the fact that the DB had
| received from the app, through DDL, a schema, that you
| can examine, manipulate, and refactor; or even synthesize
| together with other schemas.
|
| > you can't be assured that future versions of your DB
| would support those features
|
| You can if those features are in the SQL standard. I've
| never heard of a DBMS _regressing_ on its level of SQL
| standard support.
| skytreader wrote:
| > Your DB then shares a WAL log with your queue...
|
| I feel like the understanding we can come to here is that
| we have differing definitions of necessary complexity.
|
| You did mention you work in data analytics and I have
| worked and am working in more traditional product-
| oriented gigs. Everything you mentioned are nice,
| impressive even, but to me they are toys. I don't need a
| queue with a WAL stream, or PITR backups. Queues, to me,
| are inter-process communication mechanisms, or a means to
| concurrency. In fact, worst case, you can delete the
| queues or restart Redis even without triggering a disk
| write (though I note this is another feature that comes
| free with Redis); it would inconvenience our customers
| but they can always just retry.
|
| Of all the benefits you mentioned, leader-election is the
| only one I could make a good case for.
|
| > then you may as well understand it deeply in order to
| wring the most productive use you can out of each OpEx
| dollar/man-hour spent.
|
| Understanding it is one thing but using that
| understanding to create/develop (CapEx) and then maintain
| (OpEx) an in-DB queue feels like unnecessary costs. CapEx
| to install Redis is practically nil, and you seem to
| agree that running it is far simpler OpEx than PG too
| ("Well, yeah, but...").
|
| Of course, I keep in mind your point about libraries
| doing the dirty work. But if it all boils down to a third
| party in the end, the point I'm replying to above is
| rather moot, no? And if it comes to a third-party dependency
| anyway, I've already reasoned out my preference for an
| actual queue solution.
|
| > explained in a vendor-neutral manner: SQL DDL
|
| > You can if those features are in the SQL standard. I've
| never heard of a DBMS regressing on its level of SQL
| standard support.
|
| While this is an ideal case, my experience so far shows
| that every major DB package relies on their own tweaks to
| the SQL standard, and so "If those features are in the
| SQL standard" turns out to be a pretty big "if" to ask. I
| don't worry about vendors regressing on SQL standard
| support but rather that the libraries for DB queues are
| relying on the non-SQL-standard features/behavior to mock
| out a full queue functionality---non standard behavior
| that has no guarantee of consistency across versions.
|
| I mean, if there's a DB queue middleware/library that
| works across DB vendors, be it Postgres, Oracle, Maria,
| MySQL (with a reasonable storage engine, perhaps) then
| that's a convincing argument that the SQL standard can
| support queuing sufficiently. But otherwise, this
| counterpoint is a hard sell for me.
| hughrr wrote:
| I've got about 6 years operational experience with Redis.
| I would not use it for a queue. I struggle to consider
| using it as a data store these days as well with other
| options on the table. But that's a whole new thread so
| I'm not going into details here.
|
| If you want a queue, get a queue.
|
| If you want a relational database, get a relational
| database.
|
| If you want a key-value store, get a key-value store.
|
| Don't cross the streams.
| mikeklaas wrote:
| What would you use redis for, if anything?
| yawaramin wrote:
| I assume a key-value store.
| stickfigure wrote:
| There's one incredibly valuable feature you get with PG
| that you will never get with Redis or RabbitMQ or any
| other task queue: Enqueueing tasks transactionally with
| your unit of work.
|
| If your load requirements fit using your database as a
| queue, it can radically simplify your system.
|
| I've personally done a fair bit of migration from formal
| queue systems _into_ an RDBMS. You simply get a lot more
| control and visibility that way. But I use task queues
| heavily too. It's all very application-specific and I
| don't think you can generalize.
| GordonS wrote:
| You can actually do this with RabbitMQ - it has a feature
| called "Publisher Confirmations", where you send a
| message, and then wait for an "ack" in return.
| blowski wrote:
| If you're using the same database for your queue as your
| business entities, then you can wrap both in the same
| transaction - I think that's what the parent post is
| getting at. For example, save a customer entity and add a
| job to send them a confirmation email. As far as I know,
| that's something you can't do natively if you're using
| RabbitMQ and PostgreSQL. Of course you can work around
| it, but when your workloads are low you don't need to,
| which is great for early stage products.
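| A minimal sketch of that pattern with sqlite3 standing in
| for PostgreSQL (the schema is invented): the customer row
| and its queued email commit together, or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.execute("CREATE TABLE email_queue (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.commit()

def sign_up(conn, email):
    with conn:  # both rows commit together, or neither does
        cur = conn.execute("INSERT INTO customers (email) VALUES (?)", (email,))
        conn.execute("INSERT INTO email_queue (customer_id) VALUES (?)",
                     (cur.lastrowid,))

sign_up(conn, "new@example.com")
try:
    sign_up(conn, None)  # violates NOT NULL: the entire signup rolls back
except sqlite3.IntegrityError:
    pass

customers = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
queued = conn.execute("SELECT COUNT(*) FROM email_queue").fetchone()[0]
```

| With RabbitMQ as the queue, the failed signup could leave
| a published message with no matching customer row.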
| spullara wrote:
| In that case it doesn't help you that much, since you
| can't transactionally send an email, but you would get
| at least one attempt.
| blowski wrote:
| It doesn't help guarantee the email is sent, but it makes
| it easier to find and handle unhappy edge cases, which is
| where a lot of time is spent in an early-stage startup.
| politician wrote:
| Not sure why you were downvoted, but perhaps they were
| thinking about Redis transactions (lua scripts that can
| do multiple operations at once). If Redis is your
| datastore (_groan_), then you can transactionally
| enqueue tasks with your unit of work.
| arthurcolle wrote:
| Real time trading systems can totally get away with only
| using Redis, outside of historical data (which obviously
| you'd stick into Postgres or your favorite flavor of DB)
| 2muchcoffeeman wrote:
| In my experience proper abstraction is the key. If you have
| a clear abstraction and structure you can replace the
| implementation when you need to without too much
| disruption. If it's too leaky you're screwed.
| hughrr wrote:
| Yes. Use a queue abstraction for which there are numerous
| queues available off the shelf!
| aparsons wrote:
| In my experience, great abstractions without a good data
| migration blueprint are about as useful as no
| abstraction at all.
|
| In this example - great, you have a "queue" abstraction.
| When you switch to RabbitMQ or Postgres, how do you move
| your existing data without a quality-of-service hit or
| data loss? It can be difficult with production datastores even
| if the abstraction within your service is beautiful.
| lolinder wrote:
| Isn't migrating a queue super simple? Have a window in
| which you only add to the new system and you listen to
| both. When the old queue is empty, delete it.
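| The cut-over described above can be sketched in a few
| lines (in-memory deques stand in for the old DB queue and
| the new queue system):

```python
from collections import deque

# Queue migration without downtime or data loss:
# producers write only to the new queue, while a consumer
# drains both until the old one is empty.
old_queue = deque(["job-1", "job-2"])   # stand-in for the old Postgres queue
new_queue = deque()                     # stand-in for e.g. Redis

def enqueue(job):
    new_queue.append(job)               # step 1: all new writes go to the new system

def dequeue():
    # Step 2: prefer the old queue so in-flight jobs drain first.
    if old_queue:
        return old_queue.popleft()
    if new_queue:
        return new_queue.popleft()
    return None

enqueue("job-3")
drained = [dequeue() for _ in range(4)]
# Step 3: once the old queue is empty, it can be deleted.
old_done = not old_queue
```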
|
| If you need to keep track of old data, then yes,
| migration is hard. But queues don't save old data--that's
| the point of a queue.
| goodoldneon wrote:
| 100%! Create a good enough solution and put it behind an
| abstraction that remains stable when you later create an
| ideal solution
| pyuser583 wrote:
| I've watched companies maintain their non-scaling DBs all
| the way to the grave.
|
| It's a killer.
|
| If it's "just a cache" OK. Fine. If you need it for
| something important like queuing or certain caching
| scenarios - be aware it won't scale well.
| ikiris wrote:
| You say that as if rabbitmq takes more than an 8th grader
| to set up in a few hours?
| GordonS wrote:
| To be fair, you can pull and run the Docker image in a
| few minutes, and the defaults will work out-of-the-box
| for a lot of folks.
|
| If you don't need clustering, RabbitMQ really is pretty
| simple to setup.
| dorfsmay wrote:
| There is a lot of space between Postgres and RabbitMQ. If
| performance is a factor, it takes a lot of work to make a
| Postgres table fast enough. These days, and for the past
| at least 5 years, you're better off using a service like
| SQS, or setting up Redis yourself.
| rantwasp wrote:
| no. he/she actually has a point. a database is not a queue
|
| https://mikehadlow.blogspot.com/2012/04/database-as-queue-
| an...
| piokoch wrote:
| Not sure. All messaging brokers walked away from
| relational-database-based storage a long time ago (I've
| seen how badly JBoss 4 messaging with persistence sucked),
| and for good reason: either the database was occasionally
| killed by messaging traffic, or messaging traffic was
| crawling because the database was too slow.
|
| These days, with Docker and ready cloud solutions,
| getting a message broker is so simple that there is no
| reason to use the database as a broker. If we know that
| our traffic will be small, just install the DB and
| message broker on the same VM to avoid spending more on
| hardware or hosting.
| alunchbox wrote:
| You're not wrong, but if anyone on the team has already
| worked with a queue system they'd need to manage, then
| utilize something that scales. Otherwise, there's no
| reason not to use the cloud solution from the provider
| you're hosting with, which will hand-hold you to make it
| work quickly with your integration.
| crazygringo wrote:
| Can you explain exactly what doesn't scale well?
|
| Databases scale well in general, it's one of the things
| they're built for.
|
| A single database server with an indexed table (b-tree) and
| an SSD is _extremely_ performant. (And yes there are other
| indexes that could be even _more_ performant, but b-trees are
| already extremely fast.)
|
| But the huge advantage of putting a queue in your database is
| being able to create transactions across both the queue and
| data in other tables that needs to change at the same time.
|
| Not to mention it's one less tool to maintain; existing
| database tools for backups, replication, etc.
| automatically include the queue as well.
|
| As well as the fact that it's incredibly easy to make your
| queue behave according to whatever kind of custom business
| logic you might need. (Which is where additional indexes and
| b-trees can turn out to be necessary, and which is precisely
| what databases are great at.)
|
| So to the contrary -- unless you're at Facebook-level scale,
| using modern databases for queues generally seems like an
| _excellent_ choice. (And in the special cases where it's
| not, that's usually quite obvious.)
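| As a sketch of such a DB-backed queue: in Postgres the
| claim step is usually SELECT ... FOR UPDATE SKIP LOCKED;
| the portable version below (sqlite3, with an invented
| schema) uses an optimistic UPDATE guarded on status
| instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jobs (id INTEGER PRIMARY KEY,
                                   status TEXT DEFAULT 'pending',
                                   payload TEXT)""")
conn.executemany("INSERT INTO jobs (payload) VALUES (?)",
                 [("a",), ("b",), ("c",)])
conn.commit()

def claim_one(conn):
    """Claim the oldest pending job, or return None if the queue is empty."""
    row = conn.execute("""SELECT id, payload FROM jobs
                          WHERE status = 'pending'
                          ORDER BY id LIMIT 1""").fetchone()
    if row is None:
        return None
    # Guard on status so a concurrent worker's claim makes rowcount 0.
    cur = conn.execute("""UPDATE jobs SET status = 'running'
                          WHERE id = ? AND status = 'pending'""", (row[0],))
    conn.commit()
    return row if cur.rowcount else claim_one(conn)

first = claim_one(conn)
second = claim_one(conn)
```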
| cerved wrote:
| In your experience, roughly at which point did it stop
| scaling and why?
| rmetzler wrote:
| I also have a lot of issues with people using tables and
| cronjobs instead of just using queues.
|
| I once was the one implementing this stupid idea. It's very
| easy to shoot yourself in the foot. It seems to be ok at
| first, but it is a stupid idea and will give nightmares to
| the operations team.
|
| It works as long as the table is small. It gets especially
| problematic when there are sudden spikes in the workload,
| so that the table is growing rapidly. People don't test
| these scenarios enough.
|
| Often it's important to do the jobs in the order they came
| in. This means you need to have the index set up this way
| or you'll end up with full table scans. You also need to
| remove jobs which are done, but this will also mess with
| the table's performance.
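| The index point can be demonstrated directly: without an
| index matching the dequeue query's filter and order, the
| planner falls back to a full table scan (sqlite3 here,
| with an invented schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jobs (id INTEGER PRIMARY KEY,
                                   status TEXT, created_at INTEGER)""")
# A typical dequeue query: oldest pending job first.
query = """SELECT id FROM jobs WHERE status = 'pending'
           ORDER BY created_at LIMIT 1"""

plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
# An index matching both the filter and the sort order.
conn.execute("CREATE INDEX jobs_pending ON jobs (status, created_at)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# Before: a full scan of the jobs table; after: an index search
# that also satisfies the ORDER BY, so no separate sort step.
scan_before = any("SCAN" in row[-1] for row in plan_before)
uses_index = any("jobs_pending" in row[-1] for row in plan_after)
```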
|
| There are tricks with databases to help scaling this, but
| at the end of the day it's much simpler to just use a
| queue.
| hughrr wrote:
| It stopped scaling when we put in the PO for a $150k x86-64
| box to run it on and finance said WTF.
| cerved wrote:
| What about in terms of number of requests and requests
| per second?
| sonthonax wrote:
| It can absolutely scale well if you design the database
| properly. However, it'll probably be fairly high latency
| compared to a Redis or Kafka stream.
|
| The things that often make databases appear slow is that
| they're typically configured for large random reads rather
| than small uniform writes. And they have insane initial
| response times because of all the IPC they typically do when
| starting a session.
|
| But aside from that, if you use a heap table without an
| index you can probably get similar throughput to Kafka.
| And if you turn fsync off you might even get up to Redis
| speeds.
| oneplane wrote:
| It's possible at scale, but the engineers who can do that
| are usually outnumbered 1:20 within most organisations
| (excluding FAANG and co). Asking a developer to design a
| database is not even a normal process today; unless the
| ORM does it for you, very little actually is done with
| the database. It might as well be a flat file...
| api wrote:
| That's sad. I learned relational DB design before I even
| really got into systems programming and it was really a
| valuable thing to learn. You can be so much more
| efficient and less error prone with normalized data
| structures.
|
| There's a reason SQL is still around after all these
| years. It's basically math and while the syntax is a
| little moldy the principles are eternal.
| skytreader wrote:
| This please. I feel like "How to Get Away with Just
| PostgreSQL" and the GP comment fall squarely under being
| so preoccupied with whether you could that you didn't
| stop to think if you should.
|
| Whatever happened to use the proper data structures for the
| job? PostgreSQL and MySQL are, at the end of the day, b-trees
| with indices. Throw in relational properties and/or ACID too.
| Those aren't properties you need or want in a queue
| structure.
|
| I know I don't have a solid argument against not doing it;
| it's just experience (and dare I say, common sense) telling
| me not to. Not quite like parent but I spent the first two
| years of my professional career in a team that had the
| brilliant idea to use DBs as queues. The big task I partook
| in for that stint was moving them off that v2 into a v3 which
| used---wait for it---Redis. Everyone's quality of life
| improved with every migration, proportional to the size of
| the v2 cluster we retired.
| zigzag312 wrote:
| > Whatever happened to use the proper data structures for
| the job?
|
| This so much. People too often treat databases as magical
| black-boxes that should handle anything. The database is
| most often the bottleneck, and choosing the proper storage
| engine with appropriate data structures can be 100x more
| efficient than just using the defaults. 1 server vs 100 can
| definitely make a noticeable difference in costs and system
| complexity.
|
| While premature optimization is bad, choosing the right
| tool for the job is still somewhat important and will
| usually pay off in the long run.
| mirekrusin wrote:
| I think your "most often" is more like 0.01%, I'd say the
| inverse is true, that _most_ would be fine with a single
| sqlite host or something like rqlite.
| otoolep wrote:
| rqlite author here. Happy to answer any questions about
| it.
|
| https://github.com/rqlite/rqlite
| mirekrusin wrote:
| Are you planning on adding websockets or something
| similar in the near future to support things like ie.
| data change notifications [0]?
|
| [0] https://www.sqlite.org/c3ref/update_hook.html
| zigzag312 wrote:
| What would you then consider to be the most common
| bottleneck?
|
| I agree that there are many cases with low workload where
| that would be plenty.
| solipsism wrote:
| Money. The most common bottleneck is money and customers.
| Use whatever helps you get new customers faster.
|
| Don't be scared of having to make changes in the future.
| Do the small amount of work it takes today to make sure
| your transition in the future is easy.
|
| Transitioning from a SQL queue to Redis is only
| difficult if you have a bunch of SQL throughout your
| code. If you have that, you did it wrong.
| mirekrusin wrote:
| Most common bottleneck is lack of competence.
|
| Direct visible effects are wrong decisions entangled in
| spaghetti-like complexity.
|
| It's hard to reach a technical bottleneck in well-designed
| systems. Computers are really fast nowadays. Bottlenecks
| will vary greatly depending on what kind of system it is.
| Of the resources - CPU, memory, network, disk I/O - the
| weakest of them, usually the network, will likely be
| saturated first. But that's not a rule; it's easy to have
| a system which will saturate, e.g., the CPU first.
| hughrr wrote:
| This is the best post in this thread.
|
| A lot of people don't see the effects of their decisions.
| They leave a company after 3-5 years and go and work
| somewhere else where they get to make the same mistake
| again. The bottleneck indeed is lack of competence.
|
| As for technical bottlenecks, it's quite easy to hit a
| wall. Be it through layers of stupid or unexpected
| success. We have unexpectedly reached the limit of what
| is possible with x86-64 on a couple of occasions due to
| stupid decisions made over 10 years previously for which
| there is now no longer the budget or attention to fix.
| zigzag312 wrote:
| Competence is expensive :) While I mostly agree, even
| well designed systems have (sometimes considerable)
| tradeoffs.
|
| > It's hard to reach a technical bottleneck in well-
| designed systems. Computers are really fast nowadays.
|
| I have been listening to how fast modern computers are
| for the better part of the past two decades, yet as a
| user I still have to deal daily with too much slow
| software and slow web services.
| mirekrusin wrote:
| Somebody once said "cheap things are expensive". This
| idea applies to developers as well. Cheap developers will
| drive company through bumpy roads towards uninteresting
| plains. Good developers not only pay for themselves but
| bring orders of magnitude more cash in. Only thing that
| touches on this that I can find is "software
| craftsmanship manifesto".
| zigzag312 wrote:
| True
| Seattle3503 wrote:
| Your comment sort of explains why you would use your DB
| as a queue: it is a big task to migrate to a new system.
| If you already have Postgres or MySQL integrated and
| deployed, using it as a queue may be the simplest option.
| lolinder wrote:
| What has me wanting to stick with postgres is that I work
| on a small team (two developers) and adding more
| technologies to our stack is extra overhead that's hard to
| justify. At our peak we're currently handling one request
| per second, and postgres for a queue is more than
| sufficient for that. Is there any good reason for us to
| add, learn, and maintain a technology neither of us yet
| knows? Or would we do just as well to abstract away the
| queue in the code so that we can switch to redis when we do
| run into scaling problems?
| skytreader wrote:
| > Is there any good reason for us to add, learn, and
| maintain a technology neither of us yet knows?
|
| Absolutely and that reason is, you are still a small
| team, with a small user base to boot. That's a fantastic
| opportunity to learn a new technology and build on it
| properly! Remember everything is easier in software
| engineering if you assume you have no users[1] and your
| situation is as close as it gets to this ideal. Leverage
| it.
|
| Plus, as me and others keep saying, Redis (and other
| proper queues) isn't a complex addition to your infra.
| This isn't Hadoop, or Kafka, which is a very special type
| of queue (one way to put it, at least).
|
| > one request per second, and postgres for a queue is
| more than sufficient for that
|
| Yes I agree but...
|
| > Or would we do just as well to abstract away the queue
| in the code so that we can switch to redis when we do run
| into scaling problems?
|
| What I read when I see such statements is this mythical
| software engineering ideal that with enough abstraction,
| a migration is just a matter of writing a new class that
| implements some interface and then changing a config. For
| a sufficiently complex app infra, that happens almost
| never because you could never keep the abstraction leaks
| to an acceptable level.
|
| Another thing, abstraction does not solve all your
| problems if the underlying implementation is a poor fit to
| begin with. Let me paint you a plausible scenario:
|
| Once you are large enough, you might find your PGQ acting
| weird and you realize it's because someone in the team
| wrote code that accesses your queue table like it's an
| actual table of records, not a queue. So you think, okay
| let's prevent that from happening. Maybe you add users
| and permissions to distinguish connections that access
| the tables proper from those that access the queue.
| Maybe you start
| writing stored procs to check and enforce queue
| invariants periodically.
|
| Well, guess what, all those problems would've been solved
| for free if you invested maybe one work day getting a
| Redis server running when you were a two-person op
| serving one request per second.
|
| Lastly, scaling a relational DB is an entirely different
| beast from scaling a queue. Scaling anything does not
| ever come painless but you can reduce the suffering when
| it comes. Would you rather scale PG so it can keep acting
| as a queue or scale a queue that's, you know, really a
| queue in the first place? Heck the latter might even be
| solvable by throwing money at the problem (i.e., give it
| more compute).
|
| [1] Except for the part where you need to make money, of
| course.
| yongjik wrote:
| > Absolutely and that reason is, you are still a small
| team, with a small user base to boot. That's fantastic
| opportunity to learn a new technology and build on it
| properly! Remember everything is easier in software
| engineering if you assume you have no users[1] and your
| situation is as close as it gets to this ideal. Leverage
| it.
|
| I have to disagree. Of course code quality is important,
| but building things "properly" because "we may need it
| later" is a great way to kill a project with complexity.
| KISS, YAGNI. An early startup is, IMHO, not a good place
| to learn about new frameworks while getting paid - you're
| on borrowed time.
|
| Make a back-of-the-envelope calculation about how much
| throughput you need. E.g., if you expect to have 10,000
| users, and each may make one request per hour, you're
| dealing with 3 qps. Anybody who wants to bring in a new
| dependency for this, needs some talking to.
|
| (If you already need Redis anyway and it's a better fit
| than Postgresql, then sure, go ahead.)
| skytreader wrote:
| > but building things "properly" because "we may need it
| later" is a great way to kill a project with _complexity_
|
| Emphasis added because I feel like I addressed this in
| the paragraph immediately after the one you quoted:
|
| > Plus, as me and others keep saying, Redis (and other
| proper queues) _isn't a complex addition to your infra_
|
| I'm speaking from experience and, as I already pointed
| out in another subthread, Postgres is far more complex
| than Redis. Consider the presence of "DB
| Admins/Specialists" and the lack of a counterpart
| for Redis and other queuing solutions.
|
| Of course, if queues are not central to how your platform
| operates, you _might_ be able to get away with Postgres.
| I still advise using Redis as a reasonable hedge against
| someone famous tweeting organically about your service
| because in this case, you don't want your DB to go down
| because some queue table had a surplus of transactions
| (or vice versa).
|
| Not to mention, at an early stage, your tech decisions
| set precedents for the team. Maybe you have 10K users
| with a low qps but soon you are sending marketing emails
| to them and your system has periodic bursts of queue
| activity for all 10K users at once. When discussing this
| marketing "feature", rarely does anyone think, "Hey we
| can't do that with our Postgres queue", rather "Yeah I saw
| functions in our codebase for queuing---this is doable".
| This is a small effort but a huge technical investment
| for later on.
| [deleted]
| onlyrealcuzzo wrote:
| There are A LOT of use cases for a queue that aren't
| particularly computer intensive and you aren't worried about
| ever needing to scale - especially larger than your Postgres
| database.
|
| So while this might be good advice if you will need to scale,
| it's certainly not blanket advice.
|
| In the first case, I would much rather just have Postgres and
| not a separate queue to manage.
| basilgohar wrote:
| For the scale of most tasks, the existing RDBMS that a small
| app is using for its data is perfectly sufficient for the
| queuing needs of said app. This reduces complexity in the app
| and removes the need to maintain additional tools.
|
| Frameworks like Laravel make this easy because they allow you
| choose different backends for your DB, queue, and cache,
| among other things, all or some of which can be an RDBMS like
| Postgres or MySQL/MariaDB.
|
| When your app's need outscale what the RDBMS can rationally
| handle, then you can dedicate the resources to switch to a
| more specific tool as your growth has indicated you should.
|
| I can't say what percentage of apps can handle things
| this way, but the anecdotes mentioned in this thread are
| clearly the cases where growth outstripped what a
| traditional RDBMS like Postgres is well-suited for, and so
| migrating was the right thing. The problem was likely in
| identifying the need, or in delaying the transition until
| it was too painful. I don't think there's a magic bullet here, but I
| also don't think that starting with a dedicated queuing store
| tool is also always the right thing to do.
| baq wrote:
| the other side of the coin is that queue servers are
| databases, _just_ optimized for the use case.
| atombender wrote:
| You're not wrong, but every solution involves compromises.
| More broadly, I would argue, though, that almost nobody
| actually _needs_ queues.
|
| Sure, there are cases where you actually need strict ordering,
| but they are relatively few, in my opinion, involving things
| like transaction processing or other areas where the order of
| operations change the effect of the operations and must be
| strictly sequential.
|
| Ordering itself brings significant technical challenges. For
| example: With a queue, you can only have strict ordering as
| long as you only process one item at a time, with no
| concurrency. Ordering also complicates handling of failures
| and retrying, because it means every newer item has to wait
| until the failed item has been dealt with; if you use things
| like dead letter queues, order is violated.
|
| In almost all use cases -- anything from image processing to
| newsletter delivery -- a better system is to have an
| unordered list of items that anyone can take items out of, in
| any order. Then you throw as many workers as you like at
| this system and design it so that minimal locking is
| needed to prevent workers from stepping on each other's toes.
|
| There are queue systems that have a purpose in a stack,
| specifically pub/sub brokers, which can be used to _schedule_
| work along several dimensions. You use the pub/sub system to
| _signal_ work to be done, but each payload just refers to
| some state that's stored elsewhere.
| hughrr wrote:
| I think you're missing something important here by applying
| the worker pool simplification to the idea. Order _is_
| important in that case as well. It forces fair scheduling
| of the work, and the metrics from the queue can easily
| describe the overall latency of the work being done
| and the capacity and utilisation. The latter can be used to
| scale the consumers of the queue and/or alert on when
| things aren't working properly.
|
| If you throw a proper messaging broker on the table like
| RabbitMQ then you're getting message acknowledgement as
| well which is important if you _must_ do the work. A failed
| consumer will result in the work being rescheduled on
| another consumer transparently. Of course there are metrics
| you can see to get a heads-up when your consumers are not
| working.
|
| Ergo, don't simplify something too soon. Even reliable
| delivery of newsletters is rather important otherwise why
| would all the tracking pixels exist? (they are a form of
| message acknowledgement)
| atombender wrote:
| If latency is important, I would argue that ordering
| doesn't help you.
|
| That is, if you have the computational resources to
| service your entire queue within the required latency,
| then ordering doesn't matter, because workers could
| simply claim the first available random task, and it
| doesn't matter who finishes first or last. If you _don't_
| have the resources, then ordering still doesn't
| matter, because you are blowing your latency budget
| anyway.
|
| For example, say users are uploading photos which go into
| a queue for scaling, cropping, etc. Let's say 10 users
| per second are uploading a photo. In the best-case
| scenario, the queue is empty, so all users should see
| their photo fully uploaded quickly. Worst-case scenario,
| your queue has a million tasks pending, which benefits a
| handful of early users who see their photos completed,
| whereas most users will simply wait forever. Prioritizing
| early users (i.e. queue order) doesn't benefit anyone
| when everyone will be miserable, so it's an arbitrary
| choice.
|
| The ordering imposed by queues can create an illusion of
| fairness when task _execution time_ (i.e. end-to-end
| latency) is not taken into account. If one user uploads a
| 1 gigapixel photo, it's most likely going to take up
| more processing time than other users, and again, there's
| no reason to prioritize _that_ user over any other user
| just because their request arrived earlier.
|
| If you have random-order delivery, you can still measure
| latency, capacity, and utilization, as well as have
| retries and acking, and utilization-based scaling.
| gravypod wrote:
| I use Queues for retries, buffering, and parallelization.
| It's pretty easy to write most programs to support at-
| least-once delivery and to run multiple queue workers that
| you autoscale.
| draw_down wrote:
| I bet a bunch of them love the phrase "use the right tool for
| the job" too. Just shows how hollow/meaningless that phrase
| can often be
| ngrilly wrote:
| Look at SKIP LOCKED in PostgreSQL and MySQL.
| hughrr wrote:
| It's not that. It usually ends up in the same storage
| engine as the main business functionality which leads to IO
| contention. Then some sick and twisted bastard does a join
| to it from somewhere abhorrent which means it requires some
| serious heavy lifting to fix it.
|
| This is unfortunately the status quo in a universe
| slightly less ideal than "seemed like a good idea at the time"
|
| Use SQS or RabbitMQ or something.
|
| Edit: Also if you're using something SQL for queues you're
| going to have to build out your monitoring stack yourself
| based on your implementation rather than suck up a metrics
| endpoint with queue stats or pull from cloudwatch.
| jbverschoor wrote:
| I dunno. If it's usually the same problem, it should be
| easy to identify and fix.
|
| Unless we're talking about a high load, there should be
| no problem doing this.
| hughrr wrote:
| It's only the same problem until someone dumps 25,000
| entries in your queue.
| kasey_junk wrote:
| I've seen all manner of hokey queue implementations in
| sql going back about 20 years and all of them could
| handle 25k enqueue bursts. That wasn't a problem for a
| Sybase database on a commodity host circa 2000.
|
| I think if I were going to argue against using DBs as
| queues it would be around: heavy parallel write use
| cases, latency concerns of both reads/writes and scaling
| to millions of events per second.
|
| If you don't have those concerns using a properly
| normalized and protected schema (which you are doing
| anyway right? Cause if not you are already shooting your
| toes off) for queues goes a very long way and removes a
| very big operational burden and tons of failure modes.
| justsomeuser wrote:
| I agree.
|
| Going from a single database to a (database + queue)
| means two server processes to manage, maintain, observe,
| test etc.
|
| I actually start with SQLite to reduce as much
| distributed state as possible, then move to something
| else once it's proven it will not work.
| hughrr wrote:
| It wasn't a problem for Sybase on a commodity host circa
| 2000 because clearly that host wasn't doing a whole lot
| of other stuff. It's a big problem for our 48 core nodes
| with 2TiB of RAM and a metric shit ton of DAS NVMe. Ergo
| anecdotes don't scale either.
|
| To clarify we just moved this entire problem to SQS.
| crazygringo wrote:
| What database can't handle an extra 25,000 entries?
|
| That's... nothing. Databases handle billions of rows
| effortlessly using a b-tree index. So not really sure
| what point you're trying to make?
| glogla wrote:
| The benefit of having the queue in the same engine as the
| main business functionality is that you can do stuff in a
| single transaction.
|
| Having the queue on its own means you have to handle
| exceptions like "job is done but queue didn't get
| updated" or get into distributed transaction which is
| very different challenge.
|
| Also, operationally having one inherently stateful
| component (db) versus two (db, queue) makes things a lot
| simpler to manage, deploy, update, etc.
|
| Not claiming this is a perfect solution, just that the
| tradeoff isn't obvious.
| hughrr wrote:
| The point of queues is sequencing, storage and decoupling
| of events for the sake of scalability and durability.
|
| By putting it in the same transactional store with the
| same transactional boundaries you're instantly shooting
| the whole point.
|
| Not only that, most queues define boundaries between
| different logical systems where transactional boundaries
| do exist. At which point your database's transaction
| scope is extremely limited in capability.
|
| In the real world of messaging transactions span more
| than just the database engine.
|
| It's just bad architecture. Full stop.
| victor106 wrote:
| Having a separate queue makes sense across application
| boundaries. If you need queuing mechanisms in your own
| application you should be able to get by with a dB for
| 99% of use cases
| dragonwriter wrote:
| > The point of queues is sequencing, storage and
| decoupling of events for the sake of scalability and
| durability.
|
| The point of queues in any particular application is some
| subset of that; the calculus of implementation approaches
| that make sense depends not on the abstract point of
| queues _in general_ but on the concrete point of queues
| _in your use case_.
| flefto wrote:
| So use a separate database.
|
| Is contention with the application your only objection?
| That's pretty weak.
|
| What's the real compelling reason not to use a database?
| You haven't said.
| hughrr wrote:
| Contention, architectural separation, you have to build
| your own monitoring stack, not transactional with
| concerns outside the database without introducing
| distributed transactions and risk, no routing or
| distribution capability, you have to build it yourself,
| point in time message durability is somewhat dubious
| depending on your acknowledgement process which of course
| you had to invent yourself as well.
|
| Not to mention numerous database administrators crying
| into their cups of coffee.
|
| Enough?
|
| Like I said 20 years of people fucking this up is my
| experience.
| deskamess wrote:
| > you have to build your own monitoring stack,
|
| How is this done in Redis automatically? At some point
| you are writing queries (albeit Redis ones) to pull
| metrics. Other queueing systems may expose an API but
| there is always some level of integration. With
| Postgres/other db I would write SQL which is their API
| and a powerful one at that.
|
| I can couple events with triggers to auto-generate other
| events etc, have built in audit capability, roll up
| reporting etc, all with standard SQL. (all in the context
| of monitoring and reporting)
| hughrr wrote:
| Um, I'm not suggesting using Redis. In actual fact I said
| elsewhere I wouldn't use Redis for queues.
|
| As for triggers, reporting, audit, I'm laughing now
| because you miss the point. That's another bunch of IO in
| your monolithic black box, and you're still building it.
|
| Start here: https://www.rabbitmq.com/prometheus.html
| Jarwain wrote:
| Currently at work we use a postgresdb for queues for
| long-running persistent jobs, but I'd like to move away
| from that model since I have doubts about how it'll
| scale.
|
| I've thought about using RabbitMQ, but have a few
| questions: it appears that when a job is consumed, it's
| gone. How would I maintain a record of the jobs & status
| of the job in rabbitmq? If I wanted to display job
| updates how would I handle that? I didn't think you could
| update a message once it's already in the queue
|
| Or am I missing the mark? Do I want to separate the
| business entity "job" that maintains a status and updates
| and such from the "thing that dispatches jobs to
| workers"? And when a worker has the job it just updates
| the business entity in the database?
| doktorhladnjak wrote:
| You might want to look at something like SWF or Temporal
| flefto wrote:
| Meh. All your objections are hand-wavy and not factual.
|
| Databases actually work fine as a queue but emotionally
| you don't like it. That's fine it's just not real strong
| objections.
|
| What you have not said is "it physically does not work",
| and that's because it does work fine.
| phildawes wrote:
| I don't think he's saying 'it cannot be physically made
| to work (given enough effort and discipline)'. The
| impression I got was more like 'if you go down this route
| then the incremental path of least resistance leads
| somewhere bad'.
| rmetzler wrote:
| The range where databases work fine is pretty small.
| Sudden spikes in job creation will kill the performance
| of the database. Often enough the job creations come from
| external systems you can't control.
| hughrr wrote:
| I'm not saying it doesn't work. I'm saying you're
| shooting your toes off down the line.
| mianos wrote:
| Skype at its peak used a postgres queue extension, pgqueue,
| at its core. Seemed to go OK for them. The extension did work
| around some limitations when running a pure sql queue in
| postgres. These days I would run Pulsar but if you only have
| a million users and already have postgres you will be fine.
| conradfr wrote:
| I worked at a place using pg_queue (the person who created
| it worked there before going to Skype IIRC) and maybe it
| was working well operationally but it was not a great
| experience to develop on it. It was hard to generate events
| and debug/observe them (it was eight years ago so the
| details are a bit fuzzy).
|
| MS also moved away from it afterwards when they acquired Skype
| but I guess it was more because of the shift to SQL Server.
| maxrev17 wrote:
| Lean startup begs to differ :p
| vincentmarle wrote:
| Exactly, GP does not sound like someone who has worked on
| anything with decent scale. Because if you did, you would
| know that it is a really, really bad idea.
| manigandham wrote:
| Very few things have "decent scale". 99% of apps can run on
| a single server with modern hardware.
| lolinder wrote:
| Or maybe GP is someone who has worked on projects from
| their infancy. If your queue will be handling less than a
| request per second for the foreseeable future, why _not_
| just use the technologies already in your stack and plan a
| migration path now? What do I gain from adding Redis (and
| thereby making my system even more distributed, with all
| the hazards that carries) if my project isn't even
| guaranteed to make it to 10 rps before going bankrupt?
| aidos wrote:
| The solution I use now is to have the queue in the dB and
| then have a single process pushing to another queue better
| suited to dealing with workers.
|
| I drank from the "keep your queue out of the dB" koolaid
| once, but then you end up worrying about transaction
| boundaries and atomic writes to the queue and the dB. It
| totally depends on your workload, but in my case, I've found
| the halfway solution of the queue in the dB so you get to
| write to it within your transactions, with a dedicated queue
| for the workers to be a much better solution.
| hughrr wrote:
| Are you sure it works properly when something fails? That's
| a wonderful question I like to ask people because the
| answer is usually no. Then someone gets billed twice for
| something and then the guy who said this was a good idea
| gets shot.
| aidos wrote:
| Well, I only ever enqueue idempotent work, so again, in
| my scenario it can fail and be rerun, but I'm also unsure
| about what you think is a bad idea here.
|
| Honestly, I had more code managing two phase commits so I
| would only push to the queue when I was sure (ish) the dB
| commit was ok.
|
| Say you need to trigger something to send an email, but
| let's say that there's an error condition after and the
| transaction is rolled back. How do you handle that
| failure? In my scenario the queue is rolled back too, no
| harm no foul.
| AtNightWeCode wrote:
| The outbox pattern. Can in many cases be solved by using
| topics.
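| The outbox pattern named here is worth a sketch: write the
| outgoing event in the same transaction as the business
| data, then have a separate relay ship it to the real queue.
| The table and column names below are purely illustrative:

```sql
-- Write the business row and the event atomically:
BEGIN;
INSERT INTO orders (customer_id, total) VALUES (42, 99.50);
INSERT INTO outbox (topic, payload)
VALUES ('order_created', '{"order_id": 1}');
COMMIT;

-- A separate relay process polls the outbox, publishes each
-- row to the broker, and deletes it only after the broker
-- acknowledges (:published_ids is a client-side placeholder):
SELECT id, topic, payload FROM outbox ORDER BY id LIMIT 100;
-- ... publish to broker ...
DELETE FROM outbox WHERE id = ANY (:published_ids);
```

| If the transaction rolls back, the event never existed; if
| the relay crashes after publishing but before deleting, the
| event is delivered twice, so consumers should be idempotent.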
| aidos wrote:
| Ah ha. Yup, that's exactly it. Thanks for the pointer.
|
| Edit: can you explain by what you mean by solving it with
| topics? I thought topics were a broadcast mechanism?
| sorokod wrote:
| What happens if the process that performed 2. crashes before it
| was able to complete whatever processing it was supposed to do?
| jayd16 wrote:
| I think they want to select expire < now instead of expire =
| 0. Delete the row on completion.
| jffry wrote:
| The idea, I think, is you wouldn't delete the job from the
| queue until the processing was done.
|
| Of course, this relies on the jobs being something that can
| be retried.
| sorokod wrote:
| Looking at the code, some steps are missing in the
| description:
|
| 3. When processing is complete the matching table entry is
| deleted.
|
| 4. There is a "reaper" process that checks for stale jobs
| (based on time in the future) and resets them to 0.
|
| This of course raises more questions.
| shawnz wrote:
| But you will have marked it as in progress by setting the
| "expire" to a non-zero value, preventing any other workers
| from trying to work on it. How will they know that the
| worker which marked it actually crashed and will never
| finish?
|
| By using SELECT ... FOR UPDATE SKIP LOCKED, the record will
| automatically get unlocked if the worker crashes.
| sorokod wrote:
| This approach has its own gotchas.
| jffry wrote:
| Not sure. Maybe something else is responsible for
| identifying "stuck" jobs and kicking them out to humans
| to decide what to do (assuming worker crashes are rare)
|
| If we're talking PostgreSQL specifically, and newer-ish
| Postgres (9.5+ I think), then you can leverage its
| abilities to do all this in one atomic query:
|
|     UPDATE jobs SET status = 'working'
|     WHERE id = (SELECT id FROM jobs
|                 WHERE status IS NULL
|                 LIMIT 1
|                 FOR UPDATE SKIP LOCKED)
|     RETURNING id;
| shawnz wrote:
| If you are willing to use SELECT ... FOR UPDATE SKIP
| LOCKED, then you can implement the queue without any
| UPDATE query or status column at all. Just lock the
| record to mark it as in progress, that is how you get the
| benefit that when the worker crashes, it will
| automatically be returned to the queue.
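| A minimal sketch of that lock-only variant, assuming an
| illustrative "jobs" table with a serial id:

```sql
BEGIN;

-- Claim: the row lock itself marks the job as in progress.
-- SKIP LOCKED makes other workers pass over claimed rows
-- instead of blocking on them.
SELECT id, payload
FROM jobs
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- ... process the job here, while the transaction is open ...

DELETE FROM jobs WHERE id = :claimed_id;  -- only on success
COMMIT;
-- A crash before COMMIT releases the lock, and the row
-- becomes visible to other workers again.
```

| The trade-off is that the job holds a transaction open for
| its whole duration, which can hurt for long-running work.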
| hardwaresofton wrote:
| Don't forget SKIP LOCKED[0]!
|
| [0]: https://blog.2ndquadrant.com/what-is-select-skip-locked-
| for-...
| masklinn wrote:
| `FOR UPDATE` is also a good idea.
| Koffiepoeder wrote:
| The article mentions that.. The comment by chx was rather a
| more generalized idea for implementing a queue that is not
| database-specific.
| sigil wrote:
| Problems with this approach:
|
| 1. Recovery.
|
| 2. Concurrency.
|
| 3. Polling.
|
| In more detail:
|
| 1. _Recovery_. Suppose a worker dies while processing a job.
| The job should be retried, but how do you know this, since
| expire > 0? You can impose a timeout, but that has drawbacks
| -- there might be long jobs you repeatedly start but never
| finish. To recover failed jobs without imposing a timeout,
| you'd have to run both the UPDATE and the job processing inside
| a transaction. That way, a failed job can result in an implicit
| ROLLBACK, freeing the job for future worker attempts.
|
| 2. _Concurrency_. So now recoverable workers are trying to
| claim jobs inside a transaction. If there are 10 jobs and 10
| workers, we want all 10 jobs processing concurrently. However,
| each worker SELECTs the first item_id and tries to UPDATE that
| same row. 1 worker wins. The other 9 workers block while that
| transaction completes, then update zero rows. Concurrency will
| hover around 1.
|
| 3. _Polling_. In the transactional UPDATE approach, there's no
| way to tell whether there are free jobs short of trying to
| claim them via UPDATE, which blocks. So you must poll, and
| either burn cpu on quiet queues, or introduce an artificial
| delay into job processing times. The beauty of the SKIP LOCKED
| approach is that it _can_ tell whether there are free (not
| currently processing) jobs. Even if you wake all 10 workers up
| via LISTEN/NOTIFY for a lone job, 9 workers in the
| thundering herd will fail to claim the job, and go back to
| sleep.
|
| I blame TFA for not sufficiently explaining the SKIP LOCKED
| approach. A much better explanation is here:
| https://www.2ndquadrant.com/en/blog/what-is-select-skip-lock...
| dillondoyle wrote:
| Thanks for posting this! It is relevant to me right now and
| great example. Especially since I'm using a new to me database,
| Spanner.
|
| Specifically I similarly pull rows into a 'queue' like that
| linked sql by WHERE primary key, primary key, and queue_status
| ORDER BY timestamp, uuid LIMIT 10. I was worried what would
| happen with many requests around the same time if it would
| accidentally pull duplicates into 'queue.'
|
| I think hopefully I've implemented Spanner locking read write
| transaction correctly and that this won't cause problems at
| scale (we're not even close to google scale lol).
| wodenokoto wrote:
| > SELECT item_id WHERE expire = 0
|
| I come across a lot of SQL databases that uses 0 instead of
| false.
|
| Is there any good reason for this?
| masklinn wrote:
| So that you don't need a separate column for an associated
| datum?
| chx wrote:
| false would be "expired", this is called "expire" because it
| is a timestamp, it's the epoch time when the item expires.
| The worker presumably finishes before that time and if it
| does then the item is deleted. So, if any item has a nonzero
| expires in the past then it needs to be reset to 0 so it can
| be retried.
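| A rough sketch of that pattern, assuming epoch-second
| integers in "expire", a 5-minute lease, and an illustrative
| "queue" table (:item_id is a client-side placeholder):

```sql
-- Claim one free item by stamping a deadline on it. The
-- outer "expire = 0" check guards against two workers racing
-- for the same item_id: the loser's UPDATE matches no rows.
UPDATE queue
SET expire = EXTRACT(EPOCH FROM now()) + 300  -- 5-min lease
WHERE expire = 0
  AND item_id = (SELECT item_id FROM queue
                 WHERE expire = 0
                 LIMIT 1)
RETURNING item_id;

-- On success, the worker deletes the item before the lease
-- runs out:
DELETE FROM queue WHERE item_id = :item_id;

-- A periodic reaper frees items whose lease lapsed (the
-- worker died or stalled), so the next worker can retry them:
UPDATE queue
SET expire = 0
WHERE expire <> 0
  AND expire < EXTRACT(EPOCH FROM now());
```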
| xupybd wrote:
| Only in databases that have no Boolean type.
|
| Sometimes it's an artifact of the orm mapping.
| dragonwriter wrote:
| > You need a table with a primary id and a "status" field.
|
| For a queue, you need a table with an insertion order and a
| status indicator. "Primary id" is not enough because "primary
| id" may not be ordered (e.g., a UUID primary key is a primary
| id.)
|
| OTOH, while a DB can do this, there are plenty of well-tested
| open source purpose-built message queueing solutions, so the
| set of scenarios where "roll your own with a DB" isn't a misuse
| of effort is limited.
| lolinder wrote:
| Why on earth would you make a queue table with a non-serial
| primary key? What benefit do you gain from a UUID here?
| [deleted]
| solipsism wrote:
| What am I missing? With your description, nothing ever sets
| expire to 0. Can't look at the code comfortably as I'm on
| mobile.
| pjmlp wrote:
| Yeah, most of the NoSQL solutions boil down to not learning how
| powerful SQL and its programming extensions are.
| rad_gruchalski wrote:
| How so?
| kasey_junk wrote:
| That's a bit of rewriting history. NoSql, particularly
| document style stores came about at least as much as a
| reaction to the overweight policies & orms the sql world has
| made rampant at the time.
|
| You'd be trying to store a tiny bit of simple state and all
| the books/articles would have you standing up read only views
| & stored procedures for all your crud ops. The document
| stores came along with a fresh perspective and easy scaling.
|
| Then there were the columnar stores and time-series stores
| that really did solve the newer scale problems in ways the
| existing sql stores didn't.
|
| I'm a sql guy through and through but it's important to
| recognize the nosql movement was a reaction to real pain
| points. Also it made the sql databases better.
| pjmlp wrote:
| What history rewriting?
|
| I have been using RDBMSes all along since 1996.
|
| Nokia NetAct was scaling GB of data across multiple
| clusters with OLAP reporting engine in 2005 with no
| hiccups.
|
| The experience I had with DynamoDB kind of proved to me I
| haven't lost anything by staying in the SQL path.
|
| Most NoSQL deployments I have seen, could have been easily
| done in Oracle or SQL Server, provided they actually had a
| DBA on the team.
| kasey_junk wrote:
| But that's sort of the point. You are saying "if you have
| a DBA and use proprietary products and have the skill to
| understand the trade offs of running a db with another
| data warehouse product layered in you can handle GB of
| data".
|
| Mongo said "Throw data at me and I'll scale with very
| little work".
|
| Now, I've always largely believed that's penny wise and
| pound foolish but it's certainly a good pitch.
| pjmlp wrote:
| Which is what I originally stated "boil down to not
| learning how powerful SQL and its programming extensions
| are".
|
| One can only have bought into Mongo's story, by lacking
| the skills to understand how fake it was.
| [deleted]
| sorokod wrote:
| "mongo said" - the obligatory mongo DB is webscale video:
|
| https://m.youtube.com/watch?v=b2F-DItXtZs
| nemothekid wrote:
| DynamoDB was designed to solve a very different problem
| than a traditional SQL database. When DynamoDB (and the
| other Cassandra flavors) were released there were no
| databases doing multi-master failover with high write
| throughput - we are talking about TBs, not GBs. It's not
| a coincidence that Google, Facebook, and Amazon all had
| to write to their own database at around the same time
| (BigTable, Cassandra, Dynamo).
|
| With those new tools you had other companies building on
| top of those databases for far cheaper than a license to
| MSSQL or any other OLAP of choice would give you.
| edwinyzh wrote:
| Great! So I learned this technique. But one thing that's not
| clear to me - how a "expired" datetime field can allow easy
| retries? Anyone give me an example? Thanks!
| chx wrote:
| Successful items are deleted from the queue before they
| expire. So an item with expired > current time needs to be
| retried because it failed processing. So, a cronjob resets
| these to expire 0 and the next worker can pick them up.
| edwinyzh wrote:
| Thanks for the explanation; some time is needed in order to
| digest it... I guess one might start with the simple
| "status" field approach ;)
| DangitBobby wrote:
| The status doesn't help if the worker died during
| processing. If you've got something that's been "in
| progress" for 20 minutes of a job that takes 2, your job
| might not be getting done. That's what expired is for.
| layoutIfNeeded wrote:
| Doesn't sound like a robust solution. What if the job takes
| longer than your estimated expire time? Does the worker
| keep updating the expiry date on a background thread while
| doing the processing?
| cbo100 wrote:
| Same for any queuing system. You need to set the expiry
| time long enough for the expected task duration.
|
| In SQS, for example, you use a visibility timeout set
| high enough that you will have time to finish the job and
| delete the message before SQS hands it off to another
| reader.
|
| You won't always finish in time though, so ideally jobs
| are idempotent.
| ptx wrote:
| You mean _expired < current time_, right?
| johbjo wrote:
| Could the process not delete the row, and not commit until
| it has finished processing? Otherwise, the transaction is
| rolled back when the db connection ends. This would not
| require timeout or clean-up.
|
| One would need the appropriate transaction isolation, but
| shouldn't it work?
| papito wrote:
| The world would be a simpler place if people actually knew their
| storage engine like Postgres. I mean, _knew_ it. All the features
| and how to leverage them in a performant way. Would eat less
| cloud energy too.
| sgt wrote:
| As much as I love Postgres, I would rather use Redis for this. I
| haven't used Redis much though, and on our project we actually
| decided on using Kafka. Admittedly it's much heavier and more
| maintenance-intensive, but it seems to do the job very well.
|
| Any opinions and experiences here with Kafka vs Redis as a queue?
| ickyforce wrote:
| Don't use Kafka as a queue.
|
| (I haven't used Kafka for a while, if anything below is
| outdated let me know)
|
| The main issue I've experienced was balancing work across
| consumers - if one consumer is slower than others (e.g. running
| on a machine with some other heavy workload). In such a case, when
| a larger number of jobs is enqueued and expected to process
| quickly it's possible that everything will complete except jobs
| which landed on the partition belonging to the slow consumer.
| There can be only one consumer per partition so others can't
| help.
|
| One could consider using a single partition to avoid this
| imbalance but this means that there could be only a single
| consumer - not good.
|
| In other queue systems consumers fetch up to N jobs from a
| shared pool when they need more work. This means that in the
| case of a slow consumer the lag is limited by the time
| required to process N jobs.
|
| This situation also arises if consumers are equally fast but
| some kinds of jobs are considerably slower. Random distribution
| across partitions mitigates it to a degree.
|
| My case was precomputing caches after some event; the goal was
| to do this within 10 seconds. The first implementation with
| Kafka worked, but very often some consumer couldn't use as much
| CPU time as the others, and we didn't want dedicated nodes for
| this because the event was happening at most a few times a day.
| As a result, jobs from a single partition were processed 2x
| slower - we had to overprovision.
|
| Replacing Kafka with a queue based on Redis solved this right
| away, all consumers were finishing at the same moment. Although
| it wasn't a large scale operation.
|
| The second complaint about using Kafka as a job queue is not
| being able to modify the queue, e.g. to reorder, postpone or
| cancel jobs.
| sgt wrote:
| That is very interesting, thanks. I will look into a possible
| migration to Redis, which in any case is so lightweight and
| simple to run, it seems.
| arpa wrote:
| It's apples versus oranges with rabbits in the middle.
| des1nderlase wrote:
| Ha ha. So true. In theory though you can make any Turing-
| complete piece of software do anything, so rabbits can be
| turned into oranges...
| dandarie wrote:
| How about tags/sets?
| flefto wrote:
| MySQL and Microsoft SQL server also support SKIP LOCKED.
| taf2 wrote:
| Redis to me is the magic solution to so many complex
| multi-process syncing issues: a global lock, rate limiter,
| uniqueness with a set, stasher of all the temporary things... no
| migration, quick solution to complex problems - streams of events
| to organize pending tasks, etc... to me it's all about the
| temporal / temporary state my application needs to keep track of
| in order to work in a multiuser, multiprocess and multi-service
| environment... just a different tool for a different job than
| what I would use a database for... but really it's computer
| science, to each their own
| x7777 wrote:
| Sometimes I wonder what kind of apps you are all developing where
| using postgres instead of redis makes sense? Do you have 3 people
| using your app? Is 800ms latency ok for your users? Or is it that
| you don't care? Honest question.
| jayd16 wrote:
| Postgres is plenty fast. If you think it can only do 3 users or
| 800ms, you're doing something wrong.
| x7777 wrote:
| It is the blind claim of it being "plenty fast", echoed
| throughout this thread, that I find problematic. The article
| describes how to implement queueing/app locks/pub sub
| without providing any benchmarks on how the same
| implementations compare to redis under different loads.
|
| If you are using redis and you don't care about perf you are
| probably doing it wrong. If you are using postgres and you
| care about perf you are probably doing it wrong. "Probably"
| because context is key.
| mixmastamyk wrote:
| Pg is very fast. If it is slower it is because it does more
| and does it more safely. When not needed those things can
| be disabled.
|
| There are a number of tips about that in the discussion
| above, such as turning off logging on a job table and
| partitioning it.
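| The logging tip, for instance, is a table-creation option; a
| sketch assuming a hypothetical job table:

```sql
-- UNLOGGED skips write-ahead logging: much less I/O per job, at
-- the cost of the table being truncated after a crash -- often
-- an acceptable trade for transient queue data.
CREATE UNLOGGED TABLE jobs (
    id         bigserial PRIMARY KEY,
    payload    jsonb NOT NULL,
    status     text NOT NULL DEFAULT 'pending',
    created_at timestamptz NOT NULL DEFAULT now()
);
```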
| cs02rm0 wrote:
| I tend to ask the opposite question more often.
|
| Do you really need Postgres?
| gchamonlive wrote:
| Get your requirements in check.
|
| How sensitive is your app to latency? How much data and request
| volume do you need to handle?
|
| Do proof of concepts, write thorough load tests and go for what
| makes sense.
|
| Either way, no matter which tech you choose, make sure monitoring
| and alarms are in place and that you do regular maintenance
| exercises.
| MichaelMoser123 wrote:
| It's cool that they added LISTEN/NOTIFY to Postgres; I wonder
| when they will add many-to-many relationships. My problems with
| Postgres start when I have to do a many-to-many relationship.
| For those you need an extra table for the link entries; now
| performance tanks, and you need to denormalise this for read
| requests. That's where the fun starts...
| mixmastamyk wrote:
| M2M tables shouldn't be a problem except at the very high end,
| in which case you'll have a number of mitigations at your
| disposal.
| [deleted]
| jdlyga wrote:
| It's kind of like buying all sorts of running outfits and a
| camelbak to go on a 3 mile jog once a week. It's about overeager
| optimism of doing races and marathons in the future. Where in
| reality, you can get away with just running in the track pants
| and tshirt you already have on.
| osigurdson wrote:
| This seems to come up on HN at least once a year. Sure it can
| work but LISTEN ties up a connection which limits scalability as
| connections are limited and expensive. Also, mitigation
| strategies like PgBouncer cannot be used with this approach
| (nor, I think, can scale-out solutions like CitusDB).
|
| Of course, if scalability is not a concern (or the connection
| limitations are eventually fixed in postgres), this would be a
| very viable approach.
| axiosgunnar wrote:
| The point the author is missing is that most people are not
| deliberately choosing Redis.
|
| They use packages such as Celery which happen to use Redis under
| the hood, but the user doesn't have to deal with these details
| other than pasting some Redis connection URL somewhere.
| Nicksil wrote:
| > They use packages such as Celery which happen to use Redis
| under the hood, but the user doesn't have to deal with these
| details other than pasting some Redis connection URL somewhere.
|
| Celery doesn't use Redis under the hood.
|
| However, you may use Redis as a message broker and/or result
| store.
|
| https://docs.celeryproject.org/en/stable/getting-started/bac...
| antihero wrote:
| I think one of the draws of redis was, back when it came out, how
| simple it was to set up and maintain in comparison to an RDBMS.
| sometimesshit wrote:
| I need an article how to get away with just MangoDB
| simonw wrote:
| The best reason I know of to use a relational database as a queue
| is that it lets you trigger queue operations from within a
| transaction - so a queue item is guaranteed to be created if the
| transaction succeeds, and guaranteed not to be created if the
| transaction fails.
|
| Brandur wrote a great article about that here:
| https://brandur.org/postgres-queues
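| A sketch of what that buys you, with hypothetical `orders` and
| `jobs` tables (the names are illustrative, not from the
| article):

```sql
BEGIN;

-- The business write and its follow-up work succeed or fail
-- together: there is no window where the order exists but the
-- receipt job was lost, or vice versa.
INSERT INTO orders (customer_id, total)
VALUES (42, 99.00);

INSERT INTO jobs (payload)
VALUES (jsonb_build_object('type', 'send_receipt',
                           'customer_id', 42));

COMMIT;
```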
| Aeolun wrote:
| This seems a bit silly. We might be able to use only postgres,
| but this seems harder than using Postgres _and_ Redis.
|
| Any time someone says 'this works for small workloads' I hear,
| 'we'll eventually have to migrate this'.
|
| That's not a problem if you are talking about introducing a
| hugely complex piece of infrastructure, but Redis is hardly that.
| gizdan wrote:
| > Any time someone says 'this works for small workloads' I
| hear, 'we'll eventually have to migrate this'.
|
| Personally I aim to develop with future migrations in mind.
| Certain abstractions are worth it. Put an interface around a
| cacher, db store, etc. The day-to-day dev overhead is low, but
| your future self will thank you should you need to migrate away.
|
| Sure this isn't always possible, but do it as much as possible.
| manigandham wrote:
| Most workloads are small workloads. Modern hardware is
| incredibly powerful.
| ryanjshaw wrote:
| Or don't use Postgres to begin with. The reason you need to
| SELECT FOR UPDATE is because Postgres doesn't implement READ
| UNCOMMITTED. If you're implementing an ordered queue, you have
| no way of reliably getting the next page of queue items without
| taking a lock using SELECT FOR UPDATE because there may be
| pending uncommitted transactions that you need to block on. On
| MySQL and SQL Server you can compare counts computed with READ
| UNCOMMITTED and READ COMMITTED (on MSSQL, with snapshot
| isolation) to quickly figure out whether there are any
| uncommitted transactions in the range without taking a lock.
| That being said, SELECT FOR UPDATE and UPDLOCK equivalents are
| the easiest to get right, and if you need higher throughput you
| probably need a message queue or Kafka-equivalent.
| kristopolous wrote:
| I've worked at plenty of businesses with annual revenue in the
| tens of millions using "non-scaled" solutions without any
| issue.
|
| If we rephrase the question to the material reality of "What
| will we do when we're pulling in $20,000,000 a month?" then you
| can see how silly it is.
|
| I dunno, just buy a company then and use their stuff? It's
| literally not a problem that a pre/early-revenue company should
| or realistically can solve. It's part of "founders delusion" -
| the same kind that make them get a huge office that's mostly
| empty or all that unused computer hardware sitting in the
| closet. It's just gambling.
|
| It'd be like saying "we have $500,000 in the bank at this new
| restaurant and need to be ready to service 5,000 national
| locations with our staff of 5."
|
| I mean no, by the time that's no longer just a mental exercise
| (as in you have, say 4,700 locations) you'll be a different
| kind of institution better able to solve the problem. Literally
| just forget about it. It's so profoundly irrelevant...
|
| Solve the problems that are actually real, you've got enough of
| them, I promise.
| simonjgreen wrote:
| I feel like I've spent half my career telling people this.
|
| My general recommendation is to engineer for only 1 order of
| magnitude greater than you need today, but even that stops
| applying eventually.
| kristopolous wrote:
| The response is easy "find me an otherwise successful
| company that actually collapsed from failing to scale
| things like a database and we'll use it as a case study."
|
| Security and PII stuff is unfortunately something you _do_
| have to be super anal about these days, but scalability? No.
| anaganisk wrote:
| Of the thousands of applications deployed, most will probably
| never need to scale.
| zomgwat wrote:
| Security is another factor in choosing not to use a database for
| things like queues. I like to keep access to the database as
| tight as possible. I don't want to start dishing out network and
| account access to services just because they need a queue or a
| distributed lock. I could run a separate database instance but
| that's worse than just running Redis.
| ram_rar wrote:
| Why do you need Postgres, if you can get away with Redis? For
| all the 3 points mentioned in the article, I would rather just
| use Redis as opposed to Postgres - unless you really need a
| relational database for the project.
| dkarras wrote:
| I think the article is assuming you already have Postgres for
| data storage, and arguing against the addition of Redis when
| that is the case, for the common requirements that typically
| call for adding Redis into the mix.
| cpursley wrote:
| With Elixir you can replace both with something like Quantum or
| backed by Postgres with Oban. Or even just a hand rolled
| genserver.
|
| You can also listen to Postgres WAL changes to build an Event
| like system with Elixir. Supabase is doing this with their
| Realtime app.
| Rygu wrote:
| Don't forget to use Partial Indexes on jobs/tasks tables with
| queries like WHERE status = 'pending'
|
| More: https://use-the-index-luke.com/sql/where-clause/partial-
| and-...
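| Assuming the usual jobs table with a `status` column, such an
| index only contains the rows workers actually poll, so it stays
| small and hot no matter how much completed history accumulates:

```sql
-- Completed and failed jobs never enter this index; only the
-- 'pending' rows scanned by the dequeue query are indexed.
CREATE INDEX jobs_pending_idx
ON jobs (created_at)
WHERE status = 'pending';
```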
| BiteCode_dev wrote:
| Oh but redis is much more than that.
|
| It's so simple any little script can interact with it in seconds,
| instead of having to craft complex SQL or import your wrapper.
| You can call redis from the weirdest places, like from the inside
| of an nginx config file. You don't even need a compiled driver,
| if that matters.
|
| It's easier to get it to perform. It's possible to get great
| perf with PostgreSQL, but you just get it for free with redis.
| Very hard to screw up. Reads, writes? N+1? Batching? Who cares,
| it will be fast no matter the concurrency or the load.
|
| Sure you can expire with Postgres, but having done so in
| practice, it's way harder than it looks to get right. With redis,
| you don't have to care. Set a cache, set a lock with a timeout,
| store data you are not sure you need. It will all disappear.
|
| Redis is not just key/value. You have sets, so ensuring
| uniqueness is a piece of cake: no constraints to define, no
| insertion to check against. And of course, you have sorted sets,
| which are kept ordered at insertion by any number you pair the
| value with, such as a timestamp/score/index, and which you can
| truncate by range for extra magic.
|
| And then you have bonuses like HyperLogLog, which you need an
| extension for in Postgres. Super handy for stats.
|
| Finally, you have streams, which for most apps will fill the gap
| for a time-series database or a way to store your app logs. All
| that with 2ms latency at 10k requests/s. None of my projects
| ever needed more than 1k/s though, even one with 1M users/month.
|
| You have all of that with a dead simple install and basically no
| maintenance.
|
| In fact, redis by itself consumes almost no resources; it's not
| even worth it to not have it in your toolset. I just install it
| by default on all my projects: I've never regretted it, there is
| always something it can do for you. If not now, just wait a bit;
| it's costing you nothing, and something will come up.
|
| So no, let's use Postgres for what it's great at, which is being
| a robust all-purpose database. Redis is a fantastic complement to
| it, not a competitor - just use both.
|
| Unless you are Google-sized, there is little chance you will
| reach a stage where you need to migrate off either of them.
|
| It's part of those tech that are just too good to be true, like
| SQLite or Python.
|
| My only regret is that it doesn't exist on windows.
|
| P.S.: if you need caching and can't afford redis for a small
| Python script, use DiskCache, it's awesome:
| http://www.grantjenks.com/docs/diskcache/index.html
| junon wrote:
| Not to mention I've never personally experienced redis go down
| by itself - it's always some stupid user error that could have
| happened to any part of the infrastructure.
|
| Redis is, far and away, _the_ most robust piece of software
| I've ever had the pleasure of deploying to production.
| snicky wrote:
| That + the best documentation in the industry. It is a joy to
| work with.
| bob1029 wrote:
| I feel I could write a similar article titled: "Do you really
| need PostgreSQL? How to get away with just SQLite"
|
| How many apps are running pg on a single node and don't really
| have plans to scale beyond this?
|
| Exclusive ownership of a SQLite database by the application can
| deliver better performance than exclusive ownership over a single
| node postgres database. This also extends to SQL Server, DB2,
| Oracle, or any other hosted solution.
| aeoleonn wrote:
| Yep. Or even
|
| "Do you really need SQLite? How to get away with a dozen
| secretaries, pencil, paper, and a fax machine"
|
| I'm with ya. Why not use the right tool for the right job,
| given the right tool exists?
| mosselman wrote:
| For rails I've used que in the past, which is a job queue adapter
| for Postgres. It supports activejob.
|
| https://github.com/que-rb/que
| shaicoleman wrote:
| Another alternative is GoodJob:
|
| https://island94.org/2020/07/introducing-goodjob-1-0
|
| https://github.com/bensheldon/good_job
| pablodavila wrote:
| For Elixir, Oban is a great library that allows the same, in
| case anyone was looking for one.
|
| https://github.com/sorentwo/oban
| void_mint wrote:
| "Do you really need < insert well-tested, verified technology > ?
| Why not just hand-roll your own in < different technology > ?"
|
| Hopefully the emphasis is clear why this is silly.
| meitham wrote:
| Indeed this is silly, to me this sounded like "Do you really
| need a drill? Just hammer this screw down!".
| void_mint wrote:
| As with all things, if the goal is exploration, hell yes try
| this out.
|
| If you're actually trying to solve the problems Redis is
| meant to solve, just use Redis. Otherwise you become the sole
| maintainer of a glued together, suboptimal version of Redis
| based in Postgres. For most people, they'd rather build
| whatever caused them to need Redis in the first place, rather
| than building their own internal worse-version-of Redis.
| iamgopal wrote:
| Does groupcache for Golang work with Cloud Run, as a Redis
| replacement?
| nameless912 wrote:
| It's always wild when an article on HN has an answer to the exact
| problem i wanted to solve this weekend. Kudos!
| nickjj wrote:
| I think one of the biggest advantages of using Redis for job
| queuing vs Postgres comes down to library support.
|
| For example Python has Celery and Ruby has Sidekiq. As far as I
| know there's no libraries in either language that has something
| as battle hardened with comparable features for background tasks
| using Postgres as a backend.
|
| There's a big difference between getting something to work in a
| demo (achievable by skimming PG's docs and rolling your own job
| queue) vs using something that has tens of thousands of hours of
| dev time and tons of real world usage.
|
| I'm all for using PG for things like full text search when I can
| because it drastically reduces operation complexity if you can
| avoid needing to run Elasticsearch, but Redis on the other hand
| is a swiss army knife of awesome. It's often used for caching or
| as a session back-end so you probably have it as part of your
| stack already. It's also really easy to run, uses almost no
| resources and is in the same tier as nginx in terms of how crazy
| efficient it is and how reliable it is. I don't see not using
| Redis for a job queue as that big of a win.
| kortex wrote:
| Totally agree. The number of job queues which use or can use
| Redis as the backing store is legion. Celery, rq, arq, Golang
| has asynq (inspired by sidekiq iirc), and that's off the top of
| my head. IMHO, it's just a better interface for implementing a
| job queue than a relational database.
|
| It's also probably one of the easiest services to deploy and
| manage; often a one-liner.
|
| Plus like you said, swiss army knife. It has so many uses. It's
| inevitable my stack will include a redis at some point, and my
| reaction is almost always "I should have just started with
| redis in the first place."
|
| Is redis prone to golden hammer syndrome? Of course. But as
| long as you aren't too ridiculous, I've found you can stretch
| it pretty far.
| adventured wrote:
| Redis and I have a golden hammer agreement. I keep finding
| new ways to use it and it just keeps working.
| derethanhausen wrote:
| The elixir world has Oban[0] which implements quite a lot of
| advanced job features on top of PG. Admittedly it doesn't quite
| have the usage of Celery and Sidekiq but most queueing
| libraries don't.
|
| [0] https://github.com/sorentwo/oban
| trulyrandom wrote:
| In the case of Python, Celery does support SQLAlchemy as a
| broker if I remember correctly. So in theory, you _could_ still
| use PostgreSQL and also have a solid queue library.
| nickjj wrote:
| It is supported but it's classified as experimental and not
| officially maintained by the core devs based on their
| documentation.
| RedShift1 wrote:
| But in Postgres you could write functions in your schema to
| handle job queueing/dequeueing with the additional benefit of
| being able to use it in any language that can connect to
| Postgres and being able to reuse the same SQL/interface across
| all languages.
| nickjj wrote:
| I'm totally with you in this regard and I'd like to see that
| too but the reality of the situation is a job queue is more
| than slapping together a few SQL queries.
|
| A good job queue will have all or most of these features:
|
| - Prioritize jobs (queues, weights for certain jobs, etc.)
| - Scheduled jobs (running them once but X time in the future)
| - Periodic jobs (running them every 2nd Tuesday at 3:33am)
| - Static and dynamic configuration (CRUD'ing jobs at runtime,
|   like adding new scheduled tasks)
| - Retry jobs (customizable strategy, such as exponential back
|   off)
| - Rate limit jobs
| - Expire jobs
| - Cancel jobs
| - Unique jobs
| - Batch executing
| - Handling graceful shutdown (integration with your app server)
| - Get metrics (status, health, progress, etc.)
| - Browse job history
| - Web UI (nice to have)
|
| And I'm sure I'm missing things too. These are only off the
| top of my head based on features I tend to use in most
| applications. In other words, this isn't a laundry list of
| "nice to haves in theory", most of these are core or
| essential features IMO. I use them in nearly every web app.
|
| Rolling all of these things on your own would be a massive
| undertaking. Tools like Celery and Sidekiq have been actively
| developed for ~10 years now and have likely processed
| hundreds of billions of jobs through them to iron out the
| kinks.
|
| Even if you managed to do all of that and created it as a
| Postgres extension (which I think is doable on paper), that's
| only half the story. Now you'd have to write language
| specific clients to interface with that so you can create
| jobs in your application using a nice API. This would be a
| very welcome project but I think we're talking a year+ of
| full time development time to release something useful that
| supports a few languages, assuming you're already an expert
| at writing pg extensions, have extensive knowledge of
| job queues and know a few popular programming languages to
| release the initial clients.
| c-cube wrote:
| You can extend redis with lua to perform atomic operations,
| too. And it has bindings in a lot of languages by virtue of
| its protocol being so simple.
| rantwasp wrote:
| please don't. source control/versioning/deployment become a
| nightmare
| taffer wrote:
| I never understood this point:
|
| - You have a separate schema for your procs
|
| - You define your procs in files
|
| - You write tests for your procs
|
| - You put your files into git
|
| - Then, in a transaction, you drop your old schema and
| deploy the new one
| fiznool wrote:
| We've seen success in our current node.js application using
| pgboss [1] for job queueing. Since the primary database is
| Postgres, it's nice to not introduce another dependency into the
| system, and take advantage of the infrastructure that we already
| have. The library supports most of what you'd expect from a job
| queue, and is crucially well maintained!
|
| That being said, I agree with other comments that this is
| somewhat swimming against the tide. Redis is much more commonly
| used, and so if you don't mind adding another service into the
| mix, I'd probably recommend going with Redis instead.
|
| [1] https://github.com/timgit/pg-boss
| queicherius wrote:
| Out of curiosity, how many jobs are you queuing on average and
| how did it perform for you at that level?
|
| I have been thinking about using the library, mainly because
| it's less operational hassle than running another service that
| has to be sticky to one host with persistent storage and
| backups.
| skunkworker wrote:
| Hopefully one day projects like que[1][2] will become stable and
| battle-tested enough to use in a production environment. Until
| then I'll be using something like Sidekiq (if you're going for a
| client-side job queue, e.g. the clients don't really know about
| each other and only have rate limiting, not true throttling).
|
| With Postgres you also need to worry about high churn, especially
| since you are creating/locking/deleting rows constantly. This can
| be alleviated through a variety of means, of which personally I
| would use per-day table partitioning and truncate older
| partitions on a cron, not to mention the sharp increase in
| database connections to the host now required.
|
| Ignoring the literal elephant in the room of synced writes to the
| store. Redis can be used quite effectively in a blocking manner
| with RPOPLPUSH/LMOVE(6.2+) for a reliable queue, allowing an item
| to not be lost because atomically the pop and push from two
| different lists are done together.
|
| [1] https://github.com/que-rb/que
| [2] https://gist.github.com/chanks/7585810
| dutchbrit wrote:
| Why use Redis when you have more suitable solutions like RabbitMQ
| or Kafka?
|
| Obviously depends on the scale and needs of a project, Postgres
| etc is fine for simple queues.
|
| I often see people waste unnecessary time by writing their own
| complex solutions, resulting in increasing technical debt, when
| you already have perfectly suitable open source options available
| that do a better job..
| brentjanderson wrote:
| I have had two separate benefits from using just Postgres in
| apps:
|
| 1. It's one tool to learn. No need to upskill on topics,
| partitions, consumer groups, routers, brokers, etc.
|
| 2. It's one transaction boundary. You get a lot of "magic" for
| free when you know that a rollback will not only revert
| application state but also NOTIFY events published and queue
| jobs in flight. This alone makes Postgres a powerful option.
| jugg1es wrote:
| Why would you use a DB or Redis for job queuing when there are
| extremely inexpensive and highly optimized queuing systems in
| every major cloud provider?
|
| I've had so many horrible experiences with DB-based queuing over
| my career once you get to a certain scale. Just use a message
| bus, bro!
| mixmastamyk wrote:
| Some folks are not using the cloud for everything. Portability
| may be an issue too.
| deckard1 wrote:
| I imagine most people using Redis as a queue were already using
| it as a cache and just needed some limited queuing ability. Much
| like how places end up using a DB as a queue.
|
| Using a DB as a queue has been a thing for a very long time.
| Every billing system I've seen is a form of a queue: at a certain
| point in the month a process kicks off that scans the DB and
| bills customers, marking their record as "current".
|
| The challenge is always going to be: what if the worker dies.
| What if the worker dies, the job is re-ran, and the customer is
| billed twice. Thank god it's been many years since I've had to
| touch cron batch jobs or queue workers. The thought of leaving
| the office knowing some batch job is going to run at 3am and the
| next morning might be total chaos... shudder.
| cerved wrote:
| How would double billing occur if the worker dies. The way I
| would design this, the billing request and bill would be
| committed atomically such that a request can only be completed
| with exactly one associated bill. If the worker dies, no bill
| is created.
|
| Also I'd detect a worker has died by recording the start-time
| and using a timeout. Furthermore I'd requeue requests as
| distinct new entities. A requeued entity would have a self-
| referencing nullable FK to reference its parent request.
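| The atomicity described above is just transaction scoping; a
| sketch with hypothetical `requests` and `bills` tables, where
| `:id` and `:amount` are bind parameters supplied by the worker:

```sql
-- The bill and the status change commit together. If the worker
-- dies mid-way, the transaction rolls back and neither exists:
-- no double billing, no orphaned bill.
BEGIN;

INSERT INTO bills (request_id, amount)
VALUES (:id, :amount);

UPDATE requests
SET status = 'completed'
WHERE id = :id AND status = 'pending';

COMMIT;
```

| A UNIQUE constraint on bills.request_id additionally makes a
| retried completion fail loudly instead of billing twice.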
| deckard1 wrote:
| Murphy's law says that you're going to screw this up any
| number of ways. Maybe not you, specifically, but perhaps your
| coworker.
|
| > committed atomically
|
| Complex billing systems don't work that way. Worker processes
| are not always in these neat boxes of "done" or "not done".
| Much like rollbacks are a developer myth. If a process was
| that trivial then you wouldn't need a queue and workers in
| the first place!
|
| > Also I'd detect a worker has died by recording the start-
| time and using a timeout.
|
| There are many ways to solve this and many ways to get it
| wrong. Not working in UTC? Oops, better hope nothing runs
| during daylight savings changeover. Parent process died but
| worker finished job? Let's hope the parent isn't responsible
| for updating the job completion status. Large job is
| borderline on the timeout? Better hope parent process doesn't
| restart the job while the worker is still working on it.
| Network partition? Uh oh. CAP theorem says you're out of luck
| there (and typically there is at least one network hop
| between the DB server/controlling process and the system
| running the workers).
|
| Probably the more straightforward solution is to give each
| worker an ID and let them update the database with the job
| they pick up. Then, something robust like systemd, would
| monitor and restart workers if they fail. When a worker
| starts, they find any jobs where table.worker_id = myId and
| then start back on those. But you still have network
| partitions to worry about. Again, not at all trivial.
| cerved wrote:
| I can't speak to billing since that's not my area but in
| general I disagree. A process doesn't need a non-trivial
| flow to need a queue and workers. I've done this with route
| optimization requests. They have a priority and need to be
| run in discrete processes. Once they are done, they are
| done.
|
| The database records the start time, in the databases
| system time, in UTC.
|
| The parent process can't update the job status, it can only
| cancel jobs.
|
| There's a pool of workers. Each worker may fetch a pending
| request or wait. Once it has fetched a request, which sets
| it as pending, no other worker is able to fetch this
| request. Only this worker may alter the status of its
| request, it may only set it to completed, and it may
| only do so with its results.
|
| The worker periodically writes its status to a separate
| table. If such status has not been written in a timely
| manner, the worker is assumed to be stalled and a new child
| request is created, available to other workers.
|
| Putting this logic into the database, instead of a parent
| process managing a shared concurrent queue, has been by far
| the more pleasant solution. Granted, we're not talking
| large numbers of concurrent requests. Ten as opposed to ten
| thousand. If we're talking hundreds of concurrent, short
| requests, I can see this starting to become problematic.
|
| I think there's huge difference in the needs of different
| implementations of queues and workers so I'm interested to
| hear more about workloads where this approach fails and
| specifically in which regards. In my experience, it's been
| pleasant.
| jffry wrote:
| For many billing activities, "days" is a small amount of time
| delay. One very viable option to "the worker died" is to just
| look for stuck jobs once the run is complete and kick out an
| alert to let a human decide what to do.
| larodi wrote:
| Yes, do you really need PDF converters when you can have them as
| PostgreSQL extensions?
|
| The point (ok, one point of many) of Redis is that it is not the
| main DB, so you get a sense of security and decoupling in the
| architecture. Besides, there is no silver bullet for all things.
| While you can have your app do everything with PostgreSQL (and
| much more with Oracle, something people dislike about it), that
| fact by itself does not mean it is a good design decision or a
| more stable one.
|
| Because when you have Redis for sessions, Kafka for event
| streams, Postgres (or another RDBMS) for data storage, you have
| components that can fail separately and thus the system degrades
| gracefully.
| xupybd wrote:
| Yes but have you built a tool for the job or a Rube Goldberg
| machine?
|
| Complexity comes at a huge cost. Only add it when the benefits
| outweigh the costs.
|
| You could start out building a product that could scale to
| millions of users overnight. But if you do that, you've spent
| months building something with no users. You could have been in
| the market already, building revenue already. Your requirements
| will change as you find your market fit and you'll need to
| change things. The less you have built, the easier it is to
| change. Why not leave the million-user capabilities until you
| actually need them?
| berkes wrote:
| > Yes but have you built a tool for the job or a Rube
| Goldberg machine?
|
| > Complexity comes at a huge cost. Only add it when the
| benefits outweigh the costs.
|
| I'm honestly unsure if you mean this as opposing "doing
| everything in Postgres" or as opposing "throw more services
| on the stack".
|
| Because both readings are true. You have that
| complexity, regardless of where you implement it. Either you
| are building the rube-goldberg machine inside of postgres out
| of modules, config and SQL or outside of postgres with
| additional services.
|
| The only way to really solve this is to avoid building that
| RG machine in the first place. In this case: don't have
| queues. In practice that probably means introducing
| complexity elsewhere, though.
| xupybd wrote:
| Most web apps I've worked on have had queues in the
| database. The operational simplicity of only having a
| database has far outweighed the code complexity of using
| the relational database as a queue. However the performance
| would not have scaled. Luckily the performance was well
| above peak load of the actual systems.
| chmod775 wrote:
| Funny. My approach is usually the other way around: Can I get
| away with _just_ Redis?
| siscia wrote:
| It was the reason why https://zeesql.com, formerly known as
| https://RediSQL.com, was written.
|
| My clients seem rather happy with the project.
| grenoire wrote:
| My arsenal is Redis to SQLite to pg.
| Jemaclus wrote:
| I sorta do this, but my approach is more Redis-first than
| _just_ Redis. I try to see if I can use Redis for 99.999% of my
| operations and have a more durable store (like Postgres or
| something) as a "backup". The nature of Redis is such that even
| with some persistence features, we kinda have to assume that
| the data could go away at any minute, so I always build some
| way to rebuild Redis as fast as possible.
|
| But I've run billions of queries per day against a single Redis
| instance with zero failures serving up traffic to large,
| enterprise-level customers with no downtime and no major issues
| (knock on wood). The only minor issues we've run into were some
| high-concurrency writes that caused save events and primaries
| to fail over to replicas, resulting in a few minutes of
| downtime at the beginning of our experiments with Redis-first
| approaches, but that was easily mitigated once we realized what
| was happening and we haven't had a problem since.
|
| Huge fan of a Redis-first approach, and while the haters have
| _some_ valid points, I think they're overstated and are missing
| out on a cool way to solve problems with this piece of tech.
| [deleted]
| wokwokwok wrote:
| Why would you choose to use a system that doesn't scale by
| default?
|
| Single user local applications? Fair.
|
| Web applications? Very strange choice imo.
|
| Redis is great, but it is *not* a database, and it's
| thoroughly rubbish at high-load concurrency without clustering,
| which is (still) a massive pain in the ass to set up manually.
|
| Of course, you can just use a hosted version off a cloud
| provider... but, it's generally about 10x more expensive than
| just a plain old database.
|
| /shrug
|
| I mean, sure, it's (arguably...) a step up from just using
| sqlite, but... really, it's _easy_, and that's good... but it
| isn't good enough as a general replacement for having a real
| database.
|
| (To be fair, sqlite has got some pretty sophisticated
| functionality too, even some support for concurrency; it's
| probably a step _up_ from redis in many circumstances).
| Hendrikto wrote:
| > sure, it's (arguably...) a step up from just using sqlite
|
| How so? What's wrong with SQLite?
| wokwokwok wrote:
| I suppose it's a bit more suitable to networked services
| than sqlite is, since it's natively a network service, and
| sqlite is natively a local-only solution.
|
| ...but, I started writing about clustering and the network
| API, but, I can't really articulate why those are actually
| superior in any meaningful way to simply using sqlite, and
| given the irritation I've had in maintaining them in
| production in the past...
|
| I guess you're probably right. If I had to pick, I'd
| probably use sqlite.
| tehbeard wrote:
| > Why would you choose to use a system that doesn't scale by
| default?
|
| By all accounts Postgres seems to be a pain to scale beyond
| a single machine, much more so than redis.
| cbsmith wrote:
| Which PostgreSQL scaling pain point would you be referring
| to? Citus?
| hardwaresofton wrote:
| Postgres is not as automatic as other tools, but that is
| mostly an artifact of it being around so long, with the
| focus being on other things. Few projects have been around
| and stayed as relevant as Postgres.
|
| Most of the time, you really _don't_ need to scale
| postgres more than vertically (outside of the usual read
| replicas), and if you have tons of reads (that aren't
| hitting cache, I guess), then you can scale reads
| relatively easily. The problem is that the guarantees that
| postgres gives you around your data are research-level hard
| -- you either run a quorum protocol or you do 2PC.
|
| Once you start looking into solutions that scale easily, if
| they don't ding you on performance, things get murky
| _really_ quick and all of a sudden you hear a lot of
| "read-your-writes" or "eventual consistency" -- they're
| weakening the problem so it can be solved easily.
|
| All that said -- Citus and PostgresXL do exist. They're not
| perfect by any means, but you also have solutions that
| scale at the table-level like TimescaleDB and others. You
| can literally use Postgres for something it was never
| designed for and still be in a manageable situation -- try
| that with other tools.
|
| All _that_ said, KeyDB[0] looks pretty awesome.
| Multithreaded, easy clustering, and flash-as-memory in a
| pinch; I'm way more excited to roll that out than I am
| Redis these days.
|
| [0]: https://github.com/EQ-Alpha/KeyDB
| victor106 wrote:
| KeyDB is really good. We use it in production to achieve
| millisecond response times on millions of requests per
| second.
| killingtime74 wrote:
| Are you Google search? How do you have millions of
| requests per second?
| manigandham wrote:
| Lots of industries and applications can get to that
| scale. My last few companies were in adtech where that is
| common.
| qeternity wrote:
| It's likely millions of internal requests, which as
| another comment mentions, is common in a number of
| industries.
| hardwaresofton wrote:
| It really looks absolutely amazing, I feel guilty because
| I want to run a service on it, there's almost no downside
| to running it everywhere you'd normally run Redis.
|
| Also in the cool-redis-stuff category:
|
| https://github.com/twitter/pelikan
|
| Doesn't have the feature set that KeyDB has but both of
| these pieces of software feel like they could be the basis
| of a cloud redis product that would be _really_ efficient
| and fast. I've got some plans to do just that.
| arpa wrote:
| redis is not a database. It's a key-value based cache. If
| you're using it as a database, you're gonna have a bad time.
| cube2222 wrote:
| Why so? It has persistence and I'm not aware of any
| reported data loss happening with it.
|
| It's also got loads of complex and useful instructions.
| arpa wrote:
| Data loss can occur between flushes to disk, for example
| (by default every 2 seconds / every I_FORGOT megabytes).
| Perhaps (most likely) it is possible to fine-tune the
| configuration to have redis as a very reliable data
| store, but it doesn't come with such settings by default,
| unlike most RDBMSes.
| miohtama wrote:
| Not all use cases require reliable data storage and it is
| ok to lose a few seconds of data. Think simple discussion
| forums, internal chat applications. There are some
| scenarios where ease of use and single-server scalability
| pay off in faster development and lower devops cost.
| charrondev wrote:
| I work on a SaaS community forums service and I can
| assure you data loss is not acceptable to our clients.
|
| As a result we use MySQL w/ memcached, although we are
| considering a swap to redis for the caching layer.
| [deleted]
| EvilEy3 wrote:
| > Not all use cases require reliable data storage and it is
| ok to lose a few seconds of data. Think simple discussion
| forums, internal chat applications.
|
| That is definitely not ok. I'd be really pissed as a user
| if I wrote a huge comment and it suddenly disappeared.
| miohtama wrote:
| It only disappears if there is a catastrophic failure.
| The likelihood of that happening while you write a huge
| comment is lower than hitting a jackpot in Las Vegas - a
| sensible risk tradeoff for better development experience
| and cost.
| TheCoelacanth wrote:
| How is that a sensible tradeoff compared to just using
| something that was actually designed to be a database
| when you need a database?
| obstacle1 wrote:
| > a sensible risk tradeoff
|
| Note the tradeoff doesn't make sense as soon as you're
| operating at a meaningful scale. A small likelihood of
| failure at small scale translates to "I expect a failure
| a million years from now", whereas at large scale it's
| more like "a month from now". Accepting the same percent
| risk of data loss in the former case might be OK, but in
| the latter case is irresponsible. Provided whatever
| you're storing is not transient data.
| arpa wrote:
| GP was asking why redis is not a reliable storage
| solution/database. Redis is great as an unreliable (not
| source-of-truth) storage.
| kwdc wrote:
| For that temporary use case, how does it compare to
| memcached?
| jasonwatkinspdx wrote:
| Mostly boils down to Redis having a richer API, and
| memcached being faster / more efficient. The new extstore
| feature allows you to leverage fast SSDs to cache
| stupid huge datasets. Memcached is also one of the most
| battle tested things out there in open source. I've used
| them both plenty over the years, but tend to lean towards
| memcached now unless I really need some Redis API
| feature.
| jasonwatkinspdx wrote:
| Redis is inherently lossy as a matter of basic design,
| and that's not even touching on the many other issues
| born of NIH solutions rampant within it. You may not hit
| the behavior until you push real loads through it. If you
| talk to anyone who has, I'm confident they'll agree with
| the criticism that while it may be an excellent cache, it
| should never be treated as a ground-truth database. It's
| excellent as a slower memcached with richer features. It's
| not a database. You can also read Aphyr's reports over
| the years, which to be utterly frank, bent over backwards
| to be charitable.
| zigzag312 wrote:
| I would say Redis with RediSearch is a database.
| dionian wrote:
| well... how much uptime do you need and how many resources do
| you have to devote towards achieving your desired uptime?
| himinlomax wrote:
| If your data has no value whatsoever, sure.
| cheald wrote:
| Redis can be made durable. The WAIT command allows you to
| guarantee writes to a quorum of nodes, and it can be
| configured for on-disk persistence rather easily.
|
| That said, due to its single-threaded nature, blocking on
| quorum writes is likely to bottleneck your application under
| any kind of significant load. It really shines at volatile
| data, and while it can work for valuable data, there are
| better tools for the job.
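|
| As a sketch, the persistence and quorum settings described
| above look roughly like this (values are illustrative, not
| recommendations):

```
# redis.conf: enable the append-only file, fsync once per second
appendonly yes
appendfsync everysec

# From a client, after a write, block until at least 2 replicas
# have acknowledged it, or 1000 ms elapse:
#   WAIT 2 1000
```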
| himinlomax wrote:
| > Redis can be made durable
|
| Postgres, SQLite and many others are durable by default.
| Almost all so-called databases are like that. When you need
| a database, 90% of the time, you want durable. People make
| mistakes, developers are people, developers make mistakes,
| and one such mistake is assuming that Redis is like other
| databases in being durable by default when it's not. It's
| not conjecture, I've seen it done _in production_.
| YetAnotherNick wrote:
| Why does "by default" matter so much to you? Redis has
| persistence support and it can be easily turned on.
| himinlomax wrote:
| vi has persistence support
| gizdan wrote:
| Why? Redis has persistent data stores, and doesn't
| necessarily need to be memory only.
| himinlomax wrote:
| When you commit to Postgres and the server acknowledges it,
| you know for sure that it's been written to disk and that
| it will survive anything but a hardware disk loss (or,
| obviously, system/FS bug). When clustering is enabled with
| synchronous writes, you can also be confident that the data
| has been recorded to another node as well.
|
| With redis clustering, there's no guarantee the data has
| been replicated. I'm not even sure there's any guarantee
| that the data you just asked to be recorded will be stored
| even once if a power outage happens immediately after the
| request.
| mianos wrote:
| If you want to use transactions for multi table updates you are
| probably best to use a proper rdbms. Not to mention read
| consistency. If you only have one table Redis may be fine. I
| usually find my work grows beyond one table. Redis does make an
| unbeatable cache.
| hardwaresofton wrote:
| Hey I assume this is like a joke and not too serious, and we'd
| all switch off when things got a bit hairy, but I sure hope
| other readers can tell.
|
| I am literally in the middle of digging a company out of this
| mistake (keeping Redis too long) right now. If your
| software/data is worth something, take a week or a month and
| figure out a reasonable schema, use an auto-generation tool,
| ORM, or hire a DBA for a little bit to do something for you.
| Even MongoDB is better than redis if you're gonna do something
| like this.
| tluyben2 wrote:
| > take a week or a month and figure out a reasonable schema,
| use an auto-generation tool, ORM, or hire a DBA for a little
| bit to do something for you.
|
| Sorry but am I the only one who is very worried about the
| state of software? There are people who drank so much of the
| schemaless kool-aid (schemas were not an actual issue for any
| dev worth her salt to begin with) that you have to dispense
| this kind of advice? I find it borderline criminal if someone
| who carries the title of programmer did that to you.
|
| Again, maybe that is just me.
|
| Edit: not an attack on the parent: good advice. Just didn't
| know it was that bad. And sad.
| chrisdinn wrote:
| If you store protos in your Redis keys (like most people
| using "NoSQL" for data storage), this comment doesn't have
| much punch. Pretty sure we all can think of some pretty high
| profile examples of NoSQL + structured data working very very
| well at scale.
| hardwaresofton wrote:
| I'm not trying to get on people who are using redis as a
| cache (for photos, or any other ephemeral data).
|
| The idea I was trying to get at was using redis to store
| data traditionally reserved for OLTP workloads.
|
| > Pretty sure we all can think of some pretty high profile
| examples of NoSQL + structured data working very very well
| at scale.
|
| Well that's the thing, you very rarely hear of companies
| who cursed their decision early on to use NoSQL when they
| realized that their data was structured but in 20 different
| ways over the lifetime of the product. Some datasets only
| need light structure (key/value, a loosely defined
| document, schema-included documents), and other things
| should probably have a schema and be stored in a database
| with a tight grip on that schema and data
| consistency/correctness. Please don't use redis in that
| latter case.
| chrisdinn wrote:
| I mean, Google was built on protos in a "NoSQL" database
| (BigTable). I think maybe you are overindexing on
| personal experience.
| hardwaresofton wrote:
| Sure, but:
|
| 1) 99.9% of internet-facing/adjacent businesses are not
| Google and will never reach even 1% of Google's scale
|
| 2) Proto + BigTable is very different from just throwing
| stuff in redis/mongo. Proto schemas are compile-time
| enforced, which is great for some teams and might slow
| others down. Google enforces more discipline than your
| average engineering team -- this is overkill for most
| engineering teams.
| dionian wrote:
| Operations aside, the big problem in my experience
| dealing with these systems is you are extremely limited
| (on purpose) and can't do much
| sorting/filtering/aggregation/querying. That's what
| really makes true DBs powerful. I love redis for what it
| does, I just don't think it replaces a DB well in many
| cases where it's non-transient data.
| gsvclass wrote:
| I just did a thread on the topic myself
| https://twitter.com/dosco/status/1402909104012623873
| truth_seeker wrote:
| While you are at it, don't forget to use UNLOGGED tables.
| UNLOGGED == In Memory.
|
| But if you must use disk based table for Job queueing, set
| fillfactor = 50. This takes care of heavy updates.
|
| Indexes are helpful but cost memory and CPU, so always make
| sure you partition the table based on job_type for a
| performant pending-jobs query.
|
| I wouldn't recommend using LISTEN/NOTIFY unless you are okay with
| "at most once" semantics. I have used a disk-table-based approach
| for PUB/SUB to replace Kafka. A more fine-tuned approach will
| also allow (job_type, consumer_group, publisher) as a partition
| key.
|
| Ref - https://www.postgresql.org/docs/current/sql-
| createtable.html
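|
| As a concrete sketch of the above (table and column names are
| illustrative, not from any particular system):

```sql
-- Disk-based job-queue table; fillfactor = 50 leaves half of each
-- heap page free so frequent status updates can stay on-page (HOT).
CREATE TABLE job_queue (
    id         bigserial PRIMARY KEY,
    job_type   text        NOT NULL,
    payload    jsonb,
    status     text        NOT NULL DEFAULT 'pending',
    created_at timestamptz NOT NULL DEFAULT now()
) WITH (fillfactor = 50);

-- Or, if losing queued jobs after a crash is acceptable, skip the
-- WAL entirely (unlogged, which is not the same as in-memory):
-- CREATE UNLOGGED TABLE job_queue (...) WITH (fillfactor = 50);
```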
| osigurdson wrote:
| My understanding is, UNLOGGED means that changes are not
| written to the WAL and data can be lost in the event of an
| unscheduled shutdown. It doesn't mean that the table only
| exists in memory however - the data is still eventually
| persisted to disk.
| truth_seeker wrote:
| From the official doc link i shared already:
|
| UNLOGGED
|
| If specified, the table is created as an unlogged table. Data
| written to unlogged tables is not written to the write-ahead
| log (see Chapter 29), which makes them considerably faster
| than ordinary tables. However, they are not crash-safe: an
| unlogged table is automatically truncated after a crash or
| unclean shutdown. The contents of an unlogged table are also
| not replicated to standby servers. Any indexes created on an
| unlogged table are automatically unlogged as well.
| andrewflnr wrote:
| Uh, yeah, that confirms what osigurdson said, not that
| they're in-memory. For that IIRC you need to mount a
| ramdisk in your OS and put the table on that. Definitely
| also make it UNLOGGED, though.
| BenoitEssiambre wrote:
| I came here to post this. CREATE UNLOGGED TABLE is basically a
| redis in your postgres (with periodic dumps to disk), but with
| transactional joins into more persistent data.
| kureikain wrote:
| Thanks so much for this. I have been looking for this in
| Postgres. MySQL has memory-backed tables, which helped us a lot
| in the past.
| tpetry wrote:
| Unlogged tables ARE NOT in-memory tables. They are written to
| disk like every other table, but unlogged tables don't use the
| WAL and are therefore much lighter.
| [deleted]
| gsvclass wrote:
| And others like this thread on building a MongoDB like JSON
| database in Postgres in one line.
| https://twitter.com/dosco/status/1401413712842346501
| petepete wrote:
| I've used PostgreSQL in the first two scenarios and would love an
| opportunity to in the third. It's worked really well and hasn't
| caused any problems under decent loads.
|
| The one feature of Redis I'd love to have supported in PostgreSQL
| is to be able to set a TTL on a record. On a recent project where
| we could only save personal data for so long, it was a must-have
| feature, so we had to use Redis for that purpose instead.
| halayli wrote:
| you can do that using postgres partitioning and a trigger to
| drop old partitions from the table.
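|
| A rough sketch of that approach (all names illustrative):

```sql
-- Range-partition the data by creation time...
CREATE TABLE user_data (
    email      text NOT NULL,
    payload    jsonb,
    created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE user_data_2021_06_12 PARTITION OF user_data
    FOR VALUES FROM ('2021-06-12') TO ('2021-06-13');

-- ...then a scheduled job expires a whole day at once; dropping a
-- partition is instant, unlike a bulk DELETE:
DROP TABLE user_data_2021_06_12;
```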
| phiresky wrote:
| Redis EXPIRE doesn't actually delete any data after it expires
| though. Active deletion happens at random, so you can easily
| still have expired values in memory months later:
|
| > Redis keys are expired in two ways: a passive way, and an
| active way.
|
| > A key is passively expired simply when some client tries to
| access it, and the key is found to be timed out.
|
| > Of course this is not enough as there are expired keys that
| will never be accessed again. These keys should be expired
| anyway, so periodically Redis tests a few keys at random among
| keys with an expire set. All the keys that are already expired
| are deleted from the keyspace.
|
| > Specifically this is what Redis does 10 times per second:
|
| 1. Test 20 random keys from the set of keys with an
| associated expire.
| 2. Delete all the keys found expired.
| 3. If more than 25% of keys were expired, start again from
| step 1.
|
| So really it's not much better than doing `SELECT value from
| keys where key=? and expires > now()` with manual deletion.
| Though I agree that it can be more convenient.
| mateuszf wrote:
| > Redis EXPIRE doesn't actually delete any data after it
| expires though.
|
| I guess OP likes the simplicity that built-in expiration
| provides. In your example - all selects reading the value
| will need to have this expiration check. And also some
| scheduled process will have to be written to actually delete
| the values.
| jaggederest wrote:
| I would access the table through a view that had that query
| built into it.
|
| create table all_items(id integer, value text, expires
| timestamp);
|
| create index all_item_expiry on all_items(expires);
|
| create view items as (select id, value, expires from
| all_items where expires > now());
|
| Then you can treat items as your base table and postgres
| neatly allows INSERT/UPDATE/DELETE from it. You'll need a
| job to clean up expires < now() items but it can be done at
| whatever arbitrary interval you like, could even be a
| trigger in PG if you were feeling spicy.
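|
| The cleanup job could be sketched like this (assuming the
| pg_cron extension is available; the job name and interval are
| made up):

```sql
-- Delete expired rows every five minutes via pg_cron:
SELECT cron.schedule(
    'expire-items',    -- job name
    '*/5 * * * *',     -- every 5 minutes
    $$DELETE FROM all_items WHERE expires <= now()$$
);
```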
| yukinon wrote:
| I would contend that it really depends on what one would
| prioritize the most in that scenario. In my experience, Redis
| EXPIRE means it is not selectable. That is the primary
| requirement for a lot of development around EXPIRE/TTL. It is
| OK if it is still in memory in some form, it still won't be
| accessible by applications SDK or CLI. Since Redis 2.6 the
| expire error is from 0 to 1 milliseconds which is accurate
| enough for many use cases. Not to mention, Redis will handle
| that deletion for you. You don't need to run a deletion job
| and/or include an additional condition on a query.
|
| Additionally, the expire/ttl/get/set in Redis is incredibly
| easy to use (and abuse, hence the OP article). Some teams'
| criterion is limiting the number of moving parts - and that's
| great: don't use Redis, use a relational database for
| everything as you mentioned - as a queue, a cache, a message
| broker, etc.
|
| Other teams may care less about an extra moving part if it
| means their code will look simpler, and they leverage
| relational databases for their more common use cases.
| nicoburns wrote:
| You could very easily create a database view that applies
| the where query, and even prevent your db user from
| selecting from the underlying table.
|
| You could also use a library like pg-boss to handle the
| cleanup task.
| petepete wrote:
| The fewer moving parts bit is key.
|
| It was a government project, written by one team (us) to be
| maintained by another.
|
| The data that needed to be expunged was user signup data,
| upon completion the record was sent to a CRM and the Redis
| record destroyed. If the signup wasn't finished it's
| automatically removed after 12 hours.
|
| Referential integrity wasn't really a problem, emails are
| unique and if we clash the two records are auto-merged by
| the CRM.
|
| Setting up scheduled tasks, triggers, partitioning, cron,
| etc, is just more things that can go wrong. If they go
| wrong _and_ go unnoticed we end up with piles of data we
| shouldn't have. That would be many different kinds of bad.
|
| Doing `redis.set(k, v, ex: 12.hours)` or whatever is just
| easier.
| arpa wrote:
| hmmm, I disagree that it's not better. A SELECT most likely
| implies an index scan at O(log n), while a GET operation is
| essentially O(1). And you also have to run a
| DELETE in SQL to remove the expired keys.
|
| Oh, and I'm not entirely sure about the part about redis
| active expiry (disabled by default; the default is to remove
| expired keys lazily, on lookup); you're talking about key
| eviction, which applies to all deleted keys and AFAIR happens
| only when certain watermarks are hit. Since it happens in RAM,
| it's also faaaast, unlike SQL DELETE, which will definitely
| involve disk...
| kureikain wrote:
| Exactly this. Even MongoDB has a TTL index, which is
| amazingly nice.
| sologoub wrote:
| Wouldn't a simple scheduled batch job be enough to go through
| periodically and drop records older than N days?
|
| This would also give you audit logs, etc. As well as
| flexibility to adjust business logic without updating the TTL
| on all records.
| petepete wrote:
| The requirement for the project was that the deletion should
| be handled in as fail-safe a manner as possible. Relying on
| external jobs, cron, other scheduled tasks etc were deemed a
| risk.
|
| We were already using Redis for other things, it was the
| logical choice for this scenario.
| tbarbugli wrote:
| Deleting large amounts of data on Postgresql is expensive.
| First you need an index on a column to select the expired
| data, then you actually need to delete the rows, which
| creates a lot of garbage and is heavy and slow.
|
| Creating and deleting a lot of data on PG is a pain because
| of MVCC and vacuum. One useful trick is to partition data
| into tables and to truncate/drop entire tables;
| drop/truncate is instant and reclaims space immediately.
| squiggleblaz wrote:
| > Creating and deleting a lot of data on PG is a pain
| because of MVCC and vacuum. One useful trick is to
| partition data into tables and to truncate/drop entire
| tables, drop/truncate is instant and reclaims space
| immediately.
|
| In this case, the requirement is that user data must only
| be kept for a certain time and not longer.
|
| If that time is a property of the record's creation and
| we're allowed to keep the data with an error of a day, I
| guess it's easy: we partition the table on a daily basis and
| delete any partition that is older than 28 days.
| Sometimes, a record will be closer to 29 days old when it's
| deleted, but we accepted that, and it's easy enough to
| write queries so that it's unavailable to the application
| if it's more than exactly 28*24*60*60 seconds old if that's
| our constraint.
|
| If the requirement is to keep it based on the last of a
| certain kind of use, we'd need to move it from one
| partition to another if we need to keep it. For instance,
| if we can keep data for 28 days after the user last logs
| in, we can't just drop the partition - unless we've moved
| the user each day they log in.
|
| If we have that kind of a constraint, where data lifespan
| is based on properties that change over the lifetime of the
| data, is partitioning + drop actually a useful approach?
| The drop will still be instant, I guess, but it's the
| movement of data over its lifetime that concerns me here.
| sologoub wrote:
| You'd probably already have indices by user then. There are
| also other ways to store the data, for example not as a
| time series of user actions, but as another data
| structure. It just depends on your requirements and data
| model.
|
| The gist of the original article is asking whether you
| could reduce tech stack complexity and use a single set
| of technologies for more use cases, allowing to
| understand the tool you are using better.
|
| Also, note that a traditional database may or may not be
| the right tool for the job - there are different
| storage/behavior needs for writing lots of data,
| processing lots of data and serving the results.
| squiggleblaz wrote:
| > You'd probably already have indices by user then.
|
| Do you mean "you'd probably already have indices by user
| then, so you won't be able to take advantage of quick
| drops"?
|
| > There are also other ways to store the data, for example
| not as a time series of user actions, but as another data
| structure. It just depends on your requirements and data
| model.
|
| I suppose I want to be a bit more specific. Till now, if
| I wanted to deal with this issue, I would just have used
| a cron job and some indexes and taken on the load. But
| what is an example of a nice way of dealing with this?
| You get to make up plausible requirements and data model
| details, perhaps drawing on particular requirements you
| had when you faced a similar issue.
|
| > The gist of the original article is asking whether you
| could reduce tech stack complexity and use a single set
| of technologies for more use cases, allowing to
| understand the tool you are using better.
|
| The legitimacy of a question in a free-form conversation
| doesn't depend on the original inspiration, several
| iterations ago. But even if it did, the question is here
| is exactly about understanding the tools better and
| whether we can use one tool instead of two, so by your
| summary it's perfectly on topic.
|
| > Also, note that a traditional database may or may not
| be the right tool for the job - there are different
| storage/behavior needs for writing lots of data,
| processing lots of data and serving the results.
|
| The subquestion here is precisely if we can get away with
| just using Postgres for data with mandatory lifespan
| requirements that vary over the lifetime of the data.
|
| Extra tools come at a cost - that is the presumption of
| this article and the thread it has spawned.
|
| If we have to use Postgres, we need to assess the cost of
| doing this in Postgres before we can decide whether or
| not to pay the cost of other tools. Waving in the general
| direction of other tools isn't helpful; it probably takes
| as much work to enumerate candidates as it does to
| calculate the cost of doing it in Postgres.
| plank_time wrote:
| You can do the equivalent by adding a timestamp column with
| an index on your table and adding "where now - timestamp <=
| TTL". (Or some computationally easier way that doesn't
| require math in the query.)
|
| TTL on systems like Cassandra is pretty ugly and deleting
| data is hard as you scale no matter how you do it. I don't
| think Postgres would be able to implement a TTL that is
| magically better than the rest.
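|
| In SQL, the check looks roughly like this (table and column
| names are illustrative; precomputing the cutoff once per query
| avoids per-row date math):

```sql
-- Only rows younger than the TTL are visible:
SELECT *
FROM events
WHERE created_at > now() - interval '12 hours';
```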
| nezirus wrote:
| Deleting whole partitions is a generally useful strategy.
| It's like the difference between single inserts and batch
| inserts (often a huge performance difference, and much
| lower IO).
|
| Since you mentioned Cassandra and TTL, I'll mention
| ClickHouse, which has very nice TTL options; splitting into
| smaller partitions and using "ttl_only_drop_parts=1" has
| proven itself in production with big data ingestion rates.
|
| Last, but not the least, I almost always prefer Postgres
| for data storage needs, one can trust it to be safe and
| fast enough. Only some specific situations warrant other
| solutions, but it's a long way until that point (if
| ever), and it's better not to optimize too early.
| cbsmith wrote:
| I was going to say... the standard approach is to simply
| partition the data by time, at which point it gets really
| easy to manage.
| sologoub wrote:
| Sure, there are different design patterns for different
| data sizes. This also adds querying complexity, so just
| depends on what's needed.
|
| Also, most applications have peak and low periods of load
| that are predictable (e.g. users concentrated in a given
| set of time zones), which make for good times to run
| otherwise disruptive functions, etc.
| radicalbyte wrote:
| It would, and you'd also get nice features like proper
| transactions and referential integrity.
|
| BUT that comes at a cost. Doing that in Redis is just so
| easy, and the vast majority of the time no-one is going to
| notice a few concurrency errors, YOLO!
|
| ;-)
|
| To be serious: it's a nice tool to have in the box and is
| amazing for inexperienced/starting developers because you
| can do a lot with very little time investment or knowledge.
| zigzag312 wrote:
| What's wrong with Redis' transactions?
| [deleted]
| mxyzptlk wrote:
| We tried it with just PostgreSQL and struggled with missed jobs.
| We tried it with Redis + PostgreSQL and haven't looked back. I'll
| take the blame for not engineering the first version adequately
| (this was before upsert), but Redis has been useful in so
| many other ways that I'm glad it ended up in our architecture.
| gsvclass wrote:
| And a search engine in Postgres instead of Elasticsearch
| https://twitter.com/dosco/status/1400643969030127620
| chrisallick wrote:
| Interesting... I'm in the opposite camp. Do you really need
| anything other than a key/value store and a data structure?
| I've used Redis exclusively for close to a decade.
| munro wrote:
| I feel like sanity is being restored. Maybe I'm lazy, but yeah I
| use PostgreSQL for everything I can. Zookeeper is great, but I've
| used PostgreSQL for distributed locking & queueing. Sometimes
| it's quicker for me to write a 10-20 line algorithm than install
| a new piece of infrastructure.
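| As a sketch, the locking half can be as small as Postgres's
| built-in advisory locks (the key 42 below is an arbitrary,
| application-chosen bigint):

```sql
-- Session-level: held until unlocked or the connection closes
SELECT pg_advisory_lock(42);     -- blocks until the lock is acquired
SELECT pg_advisory_unlock(42);

-- Transaction-level, non-blocking: returns true/false immediately
-- and is released automatically at commit or rollback
BEGIN;
SELECT pg_try_advisory_xact_lock(42);
COMMIT;
```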
| pilif wrote:
| For pub/sub, I would recommend against using PostgreSQL if you're
| doing it at any kind of scale because LISTEN ties up one
| connection completely and Postgres connections are very expensive
| compared to a Redis connection.
| postgressomethi wrote:
| > LISTEN ties up one connection completely
|
| I've seen this twice in this thread, but I don't know what that
| means. Can you explain a bit?
| sa46 wrote:
| When a client connects to Postgres, Postgres creates a new
| process just for that client. Separate processes are great
| for isolation but it means Postgres connections are a bit
| expensive (this is an active area of improvement). Postgres
| really starts to struggle once you have a few thousand
| connections, even if those connections aren't doing anything.
|
| The common workaround is to use a connection pooler like
| PGBouncer so that clients reuse connections. This approach
| doesn't work for LISTEN because typically a client will
| listen for its entire lifecycle so you can't share
| connections in a pool.
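| For context, the feature itself is tiny; it's only the
| connection it occupies that's costly. A minimal sketch
| (channel name and payload here are made up):

```sql
-- Consumer session: this connection now waits for notifications
LISTEN job_created;

-- Producer session:
NOTIFY job_created, 'job:123';
-- equivalently, with a payload computed at runtime:
SELECT pg_notify('job_created', 'job:' || 123);
```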
| taffer wrote:
| You need one connection per worker thread, and
| realistically you would only have one worker per cpu core.
| So how many LISTENing connections do you really need?
| sa46 wrote:
| You generally have more than one database connection per
| thread. As an example, consider Node.js which acts like a
| single threaded process. You'd probably want to be able
| to handle more than 1 database query concurrently since
| network latency tends to dominate OLTP requests.
|
| How you set up LISTEN and NOTIFY is app-dependent. In a
| multi-tenant database, you could have 1 NOTIFY channel
| per tenant.
|
| As you scale, you probably do something that listens in a
| smarter way, maybe 1 Go channel per client with a single
| LISTEN instead of 1 database connection per client. The
| downside is that now the app code is responsible for
| tenant isolation instead of the database.
| dvfjsdhgfv wrote:
| I was looking for your comment here, thank you. I hope they
| solve this problem somehow in a future release.
| gunnarmorling wrote:
| The other issue with LISTEN/NOTIFY is that a client will miss
| any notifications sent while it is not running (e.g. due to
| crash, update, etc.). An alternative would be logical
| replication and change data capture (CDC), as provided for
| Postgres by Debezium for instance. That way, any consumers
| won't miss events during downtimes, as they'll continue to read
| from the replication slot from the last offset they had
| processed.
|
| Disclaimer: I work on Debezium
| jbverschoor wrote:
| This is by design and very common with pub/sub.
| supermatt wrote:
| Isn't this also the case for Redis pub/sub?
| Nick-Craver wrote:
| For pub/sub yes that's correct. For full info though: Redis
| later added streams (in 5.x) for the don't-want-to-miss
| case: https://redis.io/topics/streams-intro
| philjohn wrote:
| I'd need to check, but I think so.
|
| Which is probably why if you don't want loss if consumers
| go down a proper queue system (RabbitMQ, ActiveMQ, Amazon
| SQS, Kafka if you don't care about ordering between
| partitions) is the way I'd go.
| hardwaresofton wrote:
| Could you explain a little more about this (Debezium is
| awesome btw)? So after creating the proper replication slots,
| you set up change data capture via Debezium, then you listen
| to the CDC stream _from the original DB you are connected
| to_, correct?
|
| This sounds like it might be vulnerable to dead consumers
| causing WAL to pile up without the right settings/periodic
| checks.
| gunnarmorling wrote:
| > then you listen to the CDC stream from the original DB
| you are connected to, correct?
|
| Yes, exactly.
|
| > This sounds like it might be vulnerable to dead consumers
| causing WAL to pile up without the right settings/periodic
| checks.
|
| There are some subtleties around this indeed. You should
| monitor the backlog of replication slots, so as to identify
| inactive consumers, as you say. Also, there are some corner
| cases you need to take care of: when listening to changes
| from a low-traffic database on the same PG host that also
| has a high-traffic database, the replication slot may not
| be acknowledged often enough. Debezium mitigates this by
| optionally writing changes to a dummy heartbeat table in
| the low-traffic database, so as to advance the replication
| slot.
|
| In Postgres 13 there's also a new option
| "max_slot_wal_keep_size" which limits WAL size retained by
| a replication slot. This prevents unbounded WAL growth, at
| the risk of consumers missing events if they are down for
| too long.
|
| All in all, proper monitoring and alerting is key.
| MichaelMoser123 wrote:
| then what's the point of this feature as part of the db, if
| it doesn't do persistence?
| BiteCode_dev wrote:
| Then use streams.
| dvfjsdhgfv wrote:
| Has anyone tested how listen/notify in pg (for in-memory
| databases) compares to pub/sub in Redis?
| posharma wrote:
| What happened to RabbitMQ? Is that not used as a queue any more?
| lwn wrote:
| Other than its absence from this thread, Rabbit still seems
| to be going strong.
| didip wrote:
| The issue with Redis is that its distributed story is not great.
| I wonder if their Raft experiment is finally GA or not.
| https://github.com/RedisLabs/redisraft
| qatanah wrote:
| For simple task queues, yes pg is ok. For high loads, redis is
| still better as PG generates a lot of WAL & connection overhead.
| phendrenad2 wrote:
| WordPress historically didn't need a cache and just used page
| visits and/or a cron job to kick off automated processes, backed
| by MySQL. Is it fast? Yes. Is it nearly as fast as Redis? No. Do
| you need it to be? Not for WordPress lol
| [deleted]
| avinassh wrote:
| Most of the time Redis's distributed lock works fine;
| however, one should know that it's not failproof and you
| might run into really weird bugs
|
| references:
|
| 0 - https://aphyr.com/posts/283-jepsen-redis
|
| 1 - https://martin.kleppmann.com/2016/02/08/how-to-do-
| distribute...
| antirez wrote:
| 0 - unrelated to RedLock algorithm, since RedLock has its own
| replication in the protocol.
|
| 1 - I replied in a blog post refuting most of the arguments,
| and then there was maybe another reply. You may want to read
| the full thing to form an idea about the safety of RedLock.
| leetrout wrote:
| Watch out for transaction ID and sequence exhaustion if you have
| a lot of things rolling through a table as a queue.
|
| Postgres is awesome but logging and queuing in a table can cause
| gotchas you won't have with redis.
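| For monitoring that wraparound risk, a standard catalog query
| shows how much transaction ID headroom each database has left:

```sql
-- Transaction IDs are 32-bit; autovacuum must freeze old rows
-- before age(datfrozenxid) approaches the ~2 billion hard limit,
-- so alert well before that for hot queue tables:
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```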
| yukinon wrote:
| This is a great article because it outlines an alternative to
| using Redis for any given use cases. If we don't constantly re-
| evaluate our toolsets and their capabilities, it can lead to poor
| decision making.
|
| That being said, I've found Redis largely a pleasure to work with
| for these use cases and don't really see a real incentive to
| changing my current approach.
| arpa wrote:
| As German engineers are known to say: "Why make things simple
| when complicated will do?"
| Fannon wrote:
| If you already have Postgres in your project and now you get
| the requirement for an additional cache store, I think it will
| be less complicated to reuse what you already have instead of
| adding another DB to your stack.
|
| Of course, at some point of scaling a dedicated cache
| store will make sense anyway.
|
| (Just some justifications from a german engineer :) )
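| A common sketch of that reuse is an UNLOGGED table (the table
| and key names below are illustrative):

```sql
-- UNLOGGED skips the WAL: much faster writes, but the table is
-- truncated after a crash, which is usually fine for a cache
CREATE UNLOGGED TABLE cache (
    key        text PRIMARY KEY,
    value      jsonb,
    expires_at timestamptz NOT NULL
);

-- Upsert an entry with a five-minute TTL
INSERT INTO cache (key, value, expires_at)
VALUES ('user:1', '{"name": "Ada"}', now() + interval '5 minutes')
ON CONFLICT (key) DO UPDATE
    SET value = EXCLUDED.value, expires_at = EXCLUDED.expires_at;

-- Read, treating expired rows as misses
SELECT value FROM cache WHERE key = 'user:1' AND expires_at > now();
```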
| arpa wrote:
| True. However, since cache is usually transient, adding this
| key-value store/cache is as easy as "docker run redis". No
| need to provision block storage, and it's really lightweight
| in comparison.
|
| (that being said, I try really hard not to judge; after all,
| i'm not without fault: it's 2021 and i'm using bash over cgi-
| bin to serve web pages for my own hobby projects :))) )
| rualca wrote:
| > True. However, since cache is usually transient, adding
| this key-value store/cache is as easy as "docker run
| redis". No need to provision block storage, and it's really
| lightweight in comparison.
|
| If you're using postgres for caching only, as you do Redis,
| then you also do not need to provision block storage.
|
| If you happen to already have Postgres running for other
| uses, you also do not need to provision block storage.
|
| Finally, I would add that Redis clients such as Redisson
| are resource hogs that cause performance problems on apps,
| while pg clients are barely noticeable.
| mosselman wrote:
| Redis as a cache server requires different settings from
| Redis as a job queue, so you can't reuse the same server
| anyway.
| himinlomax wrote:
| Redis is not a DB. Only idiots think Redis is a DB. To be
| fair, I've met plenty of such idiots, so if you happen to be
| one, you're not alone.
|
| I'm not calling people names to be mean here; I'm trying to
| save your lives.
| cube2222 wrote:
| Could you explain why Redis is not a DB instead of just
| calling people who disagree with you idiots?
| himinlomax wrote:
| It offers no guarantees commonly afforded by databases. I
| have seen several instances, real life instances of
| people incorrectly assuming those guarantees when using
| Redis. So sure, if you know exactly what you're doing,
| you _can_ use Redis as a database, but the reality is
| that it's much safer to assume you can't, as a first
| approximation.
|
| Calling people "idiots" here is a public service. I'm
| sure there are non-idiots who can prove me wrong, but
| they're the minority among production users of Redis who
| rely on it for non-perishable data.
| kube-system wrote:
| Non-ACID databases are still databases. Everyone should
| use the right tool for the job, and ACID is not a
| requirement for every job.
| squiggleblaz wrote:
| > Only idiots think [x]... I'm not calling people names to
| be mean here; I'm trying to save your lives.
|
| If you don't mean to call people names, my advice is to not
| call them names.
|
| For instance, in this particular case, a perfectly
| intelligent person whose expertise is in another area of
| software development might have been misinformed. In this
| case, you're going to achieve your goal (saving "lives")
| far more effectively if you state (a) what you mean by "a
| DB" and (b) why it is always a worse decision to use Redis
| as a DB than to use some other tool for that end.
| himinlomax wrote:
| I mean it in the same way you'd put a "not to be operated
| by morons" sticker on dangerous industrial equipment.
| Redis offers little to no guarantees, particularly in a
| cluster, and unless your data is perishable (session
| state, cache ...) and you know _exactly_ what you're
| doing, that's less than what most users expect. I've seen
| it happen several times already.
| squiggleblaz wrote:
| > I mean it in the same way you'd put a "not to be
| operated by morons" sticker on dangerous industrial
| equipment.
|
| I've never seen a sticker like that on dangerous
| industrial equipment. I'm pretty sure if you tried to put
| a sticker like that on dangerous industrial equipment,
| you would not limit your liability one iota. You would
| certainly receive at least this much pushback, and
| probably more. Normally, for a warning to be effective,
| it needs to say what it is warning about.
|
| >Redis offers little to no guarantees, particularly in a
| cluster, and unless your data is perishable (session
| state, cache ...) and you know exactly what you're doing,
| that's less than what most users expect. I've seen it
| happen several times already.
|
| It reads like you're making a different claim now than
| you did before. Before, you said something like "you're
| an idiot if you use Redis as a database", and we were
| concerned by what you mean by "a database" and how
| useless it is to just call someone an idiot. Now it seems
| like you mean "you'd have to be an expert or an idiot if
| you decide to use Redis at all for any purpose".
|
| Now, I think it's still useless to call someone an idiot
| - it would be more productive to play with a cat than
| call someone an idiot for using a widely regarded tool.
|
| In this form we don't need to worry about what you mean
| by a database - it's the tool that is the signal, and you
| need to take into consideration its properties before you
| decide to use it, rather than reading marketing copy or
| looking for a point on your resume.
|
| And I think trustworthy, expert, well-trained people can
| make judgements that turn out to be wrong. A person can
| make a very good case to use Redis which turns into a
| nightmare by future requirements changes or by an
| incorrect weighting of the pros and cons.
|
| I think you're just trying to say something which I would
| express as "The burden of proof lies on the person who
| wants to add Redis to a system, and on any additional use
| of Redis in a system which already has it. If you're
| considering simplifying a system which uses Redis and
| something else so that it uses only one of them, you
| almost certainly want to remove Redis rather than the
| other tool." If so, I think that's fair (even though the
| word "idiot" will make it harder to get your message
| across, and the mention of "as a database" is a
| distraction - you can only use Redis as a database and
| the question is how much persistence do you need and how
| much do you get), but I think you should have added the
| why to the original message: "Redis offers little to no
| guarantees, particularly in a cluster, and unless your
| data is perishable (session state, cache ...) and you know
| exactly what you're doing, that's less than what most
| users expect." Like this, it emphasises the risk of
| "accidental creep" as well as the persistence limitations
| you're worried about.
|
| After all, any idiot who reads your original text is
| going to ignore it. If they're willing to decide based on
| reputation, Redis is a well known tool used by many
| businesses and you're a pseudonymous commenter on social
| media. If they're willing to decide based on facts, you
| didn't give them any. And if they're willing to decide
| based on whether or not they assess themselves as an idiot,
| they aren't going to assess themselves as an idiot.
| himinlomax wrote:
| https://www.google.com/search?sxsrf=ALeKk02SkvKxY0RsgH6we
| x0w...
| edgyquant wrote:
| There are no images of that sticker on actual equipment
| in that search
| [deleted]
| Fannon wrote:
| It would have been better to frame it more generically as
| "adding another store or db".
|
| But the official self-description of Redis is the
| following:
|
| > Redis is an open source (BSD licensed), in-memory data
| structure store, used as a database, cache, and message
| broker.
|
| Quote: https://redis.io/
|
| I'd be interested on why exactly you think that Redis
| shouldn't be called a DB?
| himinlomax wrote:
| Most products labelled as "database" afford most if not
| all of the ACID guarantees. Redis offers none by default.
| It's a database in the loosest sense of the word, so it's
| a database alright, but then so is a text file.
|
| The problem is that in practice, some people will assume
| that it, being a "database", has similar properties to
| the other databases they know when it doesn't. I've had
| heated debates in meetings, and I wasn't just spewing an
| opinion but pointing at the documentation. In the latest
| case, that involved replication in a cluster, there is
| absolutely no guarantee (or there wasn't at the time that
| happened) of consistency between members. The developers
| had assumed there was. It's the kind of assumption that
| works just fine and dandy in a staging environment, but
| breaks subtly and painfully in production, eventually.
| squiggleblaz wrote:
| > I'd be interested on why exactly you think that Redis
| shouldn't be called a DB?
|
| Normally I think when people say "Redis isn't a DB", they
| mean "an update will always return successfully before
| the data has been committed to storage".
|
| If your user makes a change, you record that in a
| Postgres database, you've informed the user of your
| success, and the power goes out on the Postgres database,
| then the user will not have been misled; the data is in
| the database. If you did it with a persistent Redis
| system, there is some reasonable probability that the
| user has been misled and their data is no longer
| available.
|
| I don't think this is a good definition of a database. In
| reality, we take various decisions, some of which
| increase the likelihood data will be preserved across
| some kinds of failures, some of which decrease the
| likelihood. No one would say "you don't have a database
| because you take daily backups from a single,
| unreplicated Postgres instance". They say "you have made
| decisions which increase the possibility of dataloss
| under certain circumstances" (well, they say things I
| translate that way).
| Fannon wrote:
| And there I'm unsure what we consider essential
| characteristics of a database.
|
| This characteristic you're mentioning is essential for
| your persistence database, but maybe not so much for a
| derived read-only databases (e.g. when using the CQRS
| pattern). Those guarantees and characteristics are often
| a trade off anyway.
| squiggleblaz wrote:
| I certainly agree with you. As engineers, we should be
| comfortable with properties and requirements, and
| selecting the combinations of tools and self-written code
| which match them most maintainably.
|
| Unfortunately, it's easier to think in terms of tools and
| substitutes. For instance, it's difficult for us to say
| "I used a combination of handwritten code and redis to
| provide some of the guarantees that I would've got for
| less handwritten code and Postgres". However, this is
| exactly that sort of article: it argues that sometimes,
| it is better to use Postgres even if the features you
| want are more naturally provided by Redis.
|
| > And there I'm unsure what we consider essential
| characteristics of a database.
|
| I doubt there really is anything. I don't think it's
| possible to actually narrow down "database" to some
| precise definition, so I think for instance the person
| who said "only an idiot thinks Redis is a database" needs
| to explain what they mean by "a database" before we can
| discuss the merits of their position.
| himinlomax wrote:
| > And there I'm unsure what we consider essential
| characteristics of a database.
|
| Simple, people know other databases (Postgres, MySQL,
| SQLite, SQL Server, Berkeley DB, CouchDB, ....), those
| have many things in common, so when they hear "Redis is a
| database" they unconsciously assume Redis has properties
| those other products have. And it has few of the
| properties all those products share.
| squiggleblaz wrote:
| Just fyi: If you're deep in a thread and the Reply button
| doesn't show on the comment you want to reply to, you can
| click on the time. The reply button will be available
| there. Basically, permitted thread depth is relative to
| the view, not absolute to the root.
| rualca wrote:
| > As German engineers are known to say: "Why make things simple
| when complicated will do?"
|
| Isn't Redis an in-memory key/value store? Paying for RAM is not
| the same as paying for SSD.
| welder wrote:
| There are Redis-compatible alternatives that use disk
| space instead of RAM.
|
| https://wakatime.com/blog/45-using-a-diskbased-redis-
| clone-t...
| hardwaresofton wrote:
| A blog post series I've been meaning to write for over 3 years
| now:
|
| * Every database a Postgres 1: Key/Value store
|
| * Every database a Postgres 2: Document stores
|
| * Every database a Postgres 3: Logs (Kafka-esque)
|
| * Every database a Postgres 4: Timeseries
|
| * Every database a Postgres 5: Full Text Search
|
| * Every database a Postgres 6: Message Queues
|
| Low key, you could make almost every single type of database a
| modern startup needs out of Postgres, and get the benefits (and
| drawbacks) of Postgres everywhere.
|
| Should you do it? Probably not. Is it good enough for a
| theoretical ~70% of the startups out there who really don't
| shuffle around too much data or need to pretend to do any hyper
| scaling? Maybe.
|
| If anyone from 2ndQuadrant/Citus/EDB see this, please do a series
| like this, make the solutions open source, and I bet we'd get
| some pretty decent performance out of Postgres compared to the
| purpose built solutions (remember, TimescaleDB did amazing
| compared to InfluxDB, a purpose built tool, not too long ago).
|
| New features like custom table access methods and stuff also
| shift the capabilities of Postgres a ton. I'm fairly certain I
| could write a table access method that "just" allocated some
| memory and gave it to a redis subprocess (or even a compiled-in
| version) to use.
|
| [EDIT] - It's not clear but the listing is in emacs org mode,
| those bullet points are expandable and I have tons of notes in
| each one of these (ex. time series has lots of activity in
| postgres -- TimescaleDB, native partitioning, Citus, etc).
| Unfortunately the first bullet point is 43 (!) bullet points
| down. If someone wants to fund my yak shaving reach out,
| otherwise someone signal boost this to 2Q/Citus/EDB so
| professionals can take a stab at it.
|
| [EDIT2] - I forgot some, Postgres actually has:
|
| - Graph support, w/ AgensGraph now known as AGE[0]
|
| - OLAP workloads with Citus Columnar[1] (and zedstore[2]).
|
| [0]: https://age.apache.org
|
| [1]:
| https://www.citusdata.com/blog/2021/03/05/citus-10-release-o...
|
| [2]: https://github.com/greenplum-db/postgres/tree/zedstore
| rkwz wrote:
| > Should you do it? Probably not. Is it good enough for a
| theoretical ~70% of the startups out there who really don't
| shuffle around too much data or need to pretend to do any hyper
| scaling? Maybe.
|
| It's also useful when you want to quickly build a "good enough"
| version of a feature like search so you can get it in front of
| your users fast and iterate on their feedback. Most of the
| time, they'd be quite happy with the results and you don't have
| to spend time on something like managing Elasticsearch.
|
| I wrote a post on how you can use postgres to add search
| capabilities with support for queries like
|
| jaguar speed -car
|
| ipad OR iphone
|
| "chocolate chip" recipe
|
| http://www.sheshbabu.com/posts/minimal-viable-search-using-p...
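| For reference, that query syntax maps directly onto Postgres
| 11+'s websearch_to_tsquery; the table and columns below are
| made up for illustration:

```sql
-- websearch_to_tsquery understands quoted phrases, OR, and
-- -exclusion, matching the example queries above
SELECT title
FROM posts
WHERE to_tsvector('english', title || ' ' || body)
      @@ websearch_to_tsquery('english', 'jaguar speed -car');
```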
| hardwaresofton wrote:
| Yup -- I'm a big fan of always writing the interface and the
| implementation, even if there's only one. You're always glad
| you wrote `Queue` and `PostgresQueue` when it comes time to
| write `KafkaQueue` or `NATSQueue`.
|
| That said, I am an unrepentant yak shaver, and there is a lot
| to be said in just writing those things when you need it but,
| Postgres would be perfect for rapid prototyping in this way.
| osigurdson wrote:
| I do think this is a product that everyone wants - support all
| popular models (relational, kvs, queue, log, etc) in a
| consistent, scalable, open source and easy to operate service.
| I'm not sure that this is actually possible but I think if such
| a thing did exist it really would dominate.
|
| In the current reality today, implementing everything in
| Postgres is probably going to be slower to market (i.e. for a
| start-up) than using off-the-shelf products. When you do need
| to scale, this is when you get to learn about how valid your
| assumptions were in your abstraction layer - most likely in
| production. As a concrete example, Kafka isn't designed to work
| well with large numbers of topics. Similarly, InfluxDB isn't
| designed to work well with high cardinality time series. I
| think it is generally wiser to "skate where the puck is going"
| in this situation.
|
| Of course, everything is a trade-off. Postgres is incredibly
| reliable (like insane) and simple to operate. I'd say for any
| kind of internal line-of-business type application where
| scalability is less of a concern you really would be doing your
| ops team a service by implementing everything in Postgres.
| des1nderlase wrote:
| But I don't get it, why would you use PG for all these if
| specialized systems (and arguably optimized for that use case)
| already exist?
| jpalomaki wrote:
| One practical thing is that consistent backups can become
| very difficult if you distribute your state to multiple
| places.
| cnorthwood wrote:
| Operational ease
| konschubert wrote:
| Because you already have a Postgres DB running probably and
| you know how to back it up, you know how to upgrade it, all
| your services can already authenticate towards it, your
| developers can run it locally, you know how to mock it...
| hardwaresofton wrote:
| Just repeating what others have said:
|
| - Postgres is probably already running (it's pretty good for
| OLTP workloads)
|
| - Operational ease and robustness
|
| - Cloud support everywhere for Postgres
|
| - People know how to backup and restore postgres
|
| - Sometimes Postgres will beat or wholly subsume your
| specialized system and be a good choice
|
| - Postgres has ACID compliance and a _very_ good production-
| ready grasp on the research-level problems involved in
| transactions. I've never met an etcd/zookeeper cluster I
| didn't wish was simply a Postgres table. Imagine being able
| to actually change your cache _and_ your data at the same
| time and ensure that both changes happen or none of them
| happen (this is a bit vaporware-y, because locks and access
| pattern discrepancies and stuff, but bear with me). You're
| much less likely to see Postgres fail a Jepsen test[0]
|
| [0]: https://jepsen.io
| dave_sid wrote:
| I'd think you don't really need Redis or Elasticsearch in about
| 80% of the places you see them. You certainly don't need the
| baggage.
| Xophmeister wrote:
| SKIP LOCKED looks interesting; I'll have to try that. I've used
| advisory locks in the past, but I kept running into deadlocks
| when I tried to acquire them intelligently (just for popping off
| the queue). It was unclear why, at the time, so I just put an
| advisory lock on every transaction. Obviously that causes serious
| contention problems as the number of jobs and workers increase.
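| For the curious, the SKIP LOCKED pop usually looks something
| like this (the jobs schema here is hypothetical):

```sql
-- Atomically claim one pending job; concurrent workers skip
-- rows another transaction has already locked instead of blocking
UPDATE jobs
SET status = 'running'
WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'pending'
    ORDER BY created_at
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING id, payload;
```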
| [deleted]
| u678u wrote:
| Surely redis is trivial to set up. Postgres gives you lots more
| maintenance and admin headaches.
___________________________________________________________________
(page generated 2021-06-12 23:01 UTC)