[HN Gopher] Multi-tenant queues in Postgres
___________________________________________________________________
Multi-tenant queues in Postgres
Author : abelanger
Score : 80 points
Date : 2024-04-18 15:26 UTC (7 hours ago)
(HTM) web link (docs.hatchet.run)
(TXT) w3m dump (docs.hatchet.run)
| andrewstuart wrote:
| I wonder if Postgres RLS (row-level security) is another
| solution to this.
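| A minimal sketch of what that could look like, assuming a
| tenant_id text column on the tasks table (the policy and
| setting names here are illustrative):
|
|     -- each worker session sets app.tenant_id before querying,
|     -- and the policy hides every other tenant's rows
|     ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;
|     CREATE POLICY tenant_isolation ON tasks
|         USING (tenant_id = current_setting('app.tenant_id'));
|
| That isolates each tenant's rows, though it doesn't by itself
| give you fair scheduling across tenants.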
| ucarion wrote:
| This is pretty fancy stuff! Sorry if I'm just not reading
| carefully enough, but does this approach account for tenants
| whose messages take longer to process, as opposed to a tenant
| that sends a larger volume of messages?
| abelanger wrote:
| No, this assumes you've already split your tasks into quanta of
| approximately the same duration - so all tasks are weighted
| equally in terms of execution time. If the tasks have different
| weights you might be looking at something like deficit
| round-robin [1], which I've looked at implementations of in
| RabbitMQ [2], and which we're thinking about how to implement
| in PG as well (rough sketch below the links).
|
| [1] https://www.cs.bu.edu/fac/matta/Teaching/cs655-papers/DRR.pd...
|
| [2] https://nithril.github.io/amqp/2015/07/05/fair-consuming-wit...
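|
| A very rough sketch of what DRR could look like in PG, assuming
| a hypothetical drr_state table (group_key, quantum, deficit)
| and a per-task weight column; $1 is the tenant's group_key:
|
|     BEGIN;
|
|     -- top up this tenant's deficit at the start of its turn
|     UPDATE drr_state
|     SET deficit = deficit + quantum
|     WHERE group_key = $1;
|
|     -- lock a batch first: FOR UPDATE can't be combined with
|     -- window functions, so the running sum goes in a 2nd CTE
|     WITH locked AS (
|         SELECT id, weight
|         FROM tasks
|         WHERE group_key = $1
|           AND status = 'QUEUED'
|         ORDER BY id
|         LIMIT 100
|         FOR UPDATE SKIP LOCKED
|     ), budgeted AS (
|         SELECT id, weight,
|                SUM(weight) OVER (ORDER BY id) AS running_weight
|         FROM locked
|     ), claimed AS (
|         -- claim tasks whose cumulative weight fits the deficit
|         UPDATE tasks
|         SET status = 'RUNNING'
|         FROM budgeted
|         WHERE tasks.id = budgeted.id
|           AND budgeted.running_weight <=
|               (SELECT deficit FROM drr_state
|                WHERE group_key = $1)
|         RETURNING budgeted.weight
|     )
|     -- pay for what was actually dequeued
|     UPDATE drr_state
|     SET deficit = deficit - (SELECT COALESCE(SUM(weight), 0)
|                              FROM claimed)
|     WHERE group_key = $1;
|
|     COMMIT;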
| jperras wrote:
| Is the `FOR UPDATE SKIP LOCKED` in the CTE necessary? Granted,
| my understanding of Postgres row-level locking and its
| interaction with CTEs may be a bit lacking, but according to
| the docs[1]:
| The sub-statements in WITH are executed concurrently with each
| other and with the main query. Therefore, when using data-
| modifying statements in WITH, the order in which the specified
| updates actually happen is unpredictable. All the statements are
| executed with the same snapshot (see Chapter 13), so they cannot
| "see" one another's effects on the target tables.
|
| 1. https://www.postgresql.org/docs/current/queries-with.html#QU...
| mslot wrote:
| Read committed mode (PostgreSQL's default) can get pretty
| funky.
|
| If two transactions concurrently perform a SELECT (maybe in a
| CTE) followed by an UPDATE, they might see and try to update
| the same rows. That's often undesirable, for instance in the
| example of a queue where messages are supposed to arrive ~once.
| Serializable mode would "solve" the problem by letting one
| transaction fail, and expects the application to retry or
| otherwise deal with the consequences.
|
| FOR UPDATE is a precision tool for working around read
| committed limitations. It ensures rows are locked by whichever
| transaction reads them first, such that the second reader
| blocks and (here's the funky part) when the first transaction
| is done it actually reads the latest row version instead of the
| one that was in the snapshot. That's semantically a bit weird,
| but nonetheless very useful, and actually matches how updates
| work in PostgreSQL.
|
| The biggest issue with SELECT..FOR UPDATE is that it blocks
| waiting for concurrent updaters to finish, even if the rows no
| longer match its filter after the update. The SKIP LOCKED
| avoids all that by simply skipping the locked rows in the
| SELECT. Semantically even weirder, but very useful for queues.
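|
| For anyone following along, the canonical dequeue looks
| something like this (column names assumed from the article):
|
|     -- two workers running this concurrently will each lock a
|     -- different row instead of blocking on the same one
|     WITH next_task AS (
|         SELECT id
|         FROM tasks
|         WHERE status = 'QUEUED'
|         ORDER BY id
|         LIMIT 1
|         FOR UPDATE SKIP LOCKED
|     )
|     UPDATE tasks
|     SET status = 'RUNNING'
|     FROM next_task
|     WHERE tasks.id = next_task.id
|     RETURNING tasks.id;
|
| Without SKIP LOCKED the second worker would block on the locked
| row; without FOR UPDATE, both could claim the same task.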
| jperras wrote:
| Ah, I see - the default transactional isolation level is what
| I wasn't accounting for.
|
| Thanks for the explanation! Very much appreciated.
| plandis wrote:
| At a previous job we did something similar, but ended up having
| workers first poll another table to determine which tenant to
| query against. We called these items tokens; they represented a
| finite amount of dedicated thread time for processing a
| specific tenant's queue.
|
| In practice, a worker thread would first query the token table
| for which tenant to process eligible tasks from, then update
| the token to take a timed lock; for the duration of that lock
| it would solely process eligible tasks from that tenant (a
| rough sketch of the token query follows the list below).
|
| This has some nice properties:
|
| 1. You can scale different tenants using different numbers of
| tokens, which means different but controlled amounts of thread
| time.
|
| 2. It allows for locality on your worker thread. Within a
| specific tenant the processing was usually similar, so any
| shared resources could be cached and reused after polling for
| additional eligible tasks from the tenant's queue.
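|
| Roughly, the token poll could look like this (the table and
| column names are guesses at the scheme described above; $1 is
| the worker's id):
|
|     -- claim a free token, and with it a tenant, for a 30s
|     -- slice; SKIP LOCKED keeps workers off each other's tokens
|     UPDATE tokens
|     SET locked_until = now() + interval '30 seconds',
|         worker_id    = $1
|     WHERE id = (
|         SELECT id
|         FROM tokens
|         WHERE locked_until IS NULL OR locked_until < now()
|         ORDER BY locked_until NULLS FIRST
|         LIMIT 1
|         FOR UPDATE SKIP LOCKED
|     )
|     RETURNING tenant_id;
|
| The worker then drains only that tenant's queue until
| locked_until expires.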
| ndriscoll wrote:
| You've got something wonky going on with that query plan for the
| 2nd partition by attempt. In particular the seq scan on tasks to
| do the `(tasks.id = eligible_tasks.id)` hash join seems odd. The
| filter on queued status in `CTE eligible_tasks` (and not in the
| last join) also seems weird. Is that plan for the same query in
| the article?
|
| If you add an index on `group_key, id WHERE status = 'queued'`
| and remove the 2nd `WHERE tasks."status" = 'QUEUED'` (I believe
| that's redundant?), you might get a better plan. You'd want
| something to make sure you're not preferring one group_key as
| well.
|
| It's also useful in practice to have something like a worker_id
| and a timestamp, and not just set status to RUNNING, in case a
| worker gets stuck/dies and you need to unclaim the work.
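|
| Concretely, something like this (assuming the article's tasks
| schema, plus a hypothetical claimed_at column set when a task
| is claimed):
|
|     -- partial index covering the eligible_tasks scan
|     CREATE INDEX tasks_queued_idx
|         ON tasks (group_key, id)
|         WHERE status = 'QUEUED';
|
|     -- reclaim work from stuck/dead workers
|     UPDATE tasks
|     SET status = 'QUEUED',
|         worker_id = NULL
|     WHERE status = 'RUNNING'
|       AND claimed_at < now() - interval '5 minutes';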
___________________________________________________________________
(page generated 2024-04-18 23:00 UTC)