[HN Gopher] Pg_later: Asynchronous Queries for Postgres
___________________________________________________________________
Pg_later: Asynchronous Queries for Postgres
Author : revskill
Score : 171 points
Date : 2023-08-18 08:15 UTC (14 hours ago)
(HTM) web link (tembo.io)
(TXT) w3m dump (tembo.io)
| MuffinFlavored wrote:
| I'm having a hard time understanding. I see that it's basically a
| queue. Request your query to be inserted, get a job_id back, poll
| that job_id until it is done.
|
| Under the hood, is the query ran like a normal query?
|
| I guess I don't understand the distinction between "run queries
| async" or "put query in a queue, run them sync one a time, poll
| until the one you want is done"
|
| I get how the latter is "async".
| chuckhend wrote:
| It is similar functionality to
| https://docs.snowflake.com/en/developer-guide/sql-
| api/refere..., the `async` flag more specifically. You submit
| the query and are given the job id. A background worker runs it
| like a normal query, but the caller can go about their business
| while Postgres handles it for them.
| pramsey wrote:
| Similar to https://github.com/vibhorkum/pg_background
| DaiPlusPlus wrote:
| If I understand the article correctly, the two main features of
| pg_later is that a SQL batch won't be aborted when your Postgres
| connection dies - and all results are retained until you retrieve
| them, and this is all mediated by Tembo's Postgres message-queue
| extension.
|
| The thing is, both of those things can be done today without any
| extensions: just modify your SQL scripts to run under an Agent
| account, and change every `SELECT` into a `SELECT INTO`
| statement, all dumping into a tempdb - this technique also works
| on SQL Server and Oracle too.
|
| (On the subject of Agents, I'm surprised pgAgent isn't built-in
| to Postgres; while MSSQL and Oracle have had it since the start)
| dventimi wrote:
| I'm confused. Can I already do this today without extensions or
| do I need to install pgAgent first?
| notatoad wrote:
| i think the parent commenter has confused "without any
| extensions" and "without installing any more extensions than
| the ones i've already installed"
| DaiPlusPlus wrote:
| To my knowledge, pgAgent isn't an extension, it's just a
| client.
| dventimi wrote:
| Thanks. That helps.
| ransackdev wrote:
| Idk about pgagent but any table is a resilient queue with the
| multiple locks available in pg along with some SELECT
| pg_advisory_lock or SELECT FOR UPDATE queries, and/or
| LISTEN/NOTIFY.
|
| Several bg job libs are built around native locking
| functionality
|
| > Relies upon Postgres integrity, session-level Advisory
| Locks to provide run-once safety and stay within the limits
| of schema.rb, and LISTEN/NOTIFY to reduce queuing latency.
|
| https://github.com/bensheldon/good_job
|
| > |> lock("FOR UPDATE SKIP LOCKED")
|
| https://github.com/sorentwo/oban/blob/8acfe4dcfb3e55bbf233aa.
| ..
| zekenie wrote:
| Interesting! I've used materialized reviews for this in the past.
| I'm curious what the advantages to pg_later are over that.
| chuckhend wrote:
| I don't think you'd replace a materialized view with pg_later,
| but it might help you populate or update your materialized view
| if you are trying to do that asynchronously. pglater.exec()
| works with DDL too!
|
| I use it a lot for long running queries when doing data science
| and machine learning work, and a lot of times when executing
| queries from a jupyter notebook or CLI. That way if my jupyter
| kernel dies, my query execution continues even if the network
| or my environment has an issue. I've started using it a bit
| more with https://github.com/postgresml/postgresml for model
| training tasks too, since those can be quite long running
| depending on the situation.
| magden wrote:
| It should be helpful for maintenance and DDL tasks. Such as I
| need to build a secondary index over a large table and do
| something once the index is ready. Start the index creation
| with pg_later, check the status later and do what you need to
| do next. Probably, they can support triggers so that once a
| task is completed I can be notified.
| samaysharma wrote:
| We are planning to add support for "Push notifications for
| completed and failed jobs". That's in our roadmap listed
| here: https://tembo.io/blog/introducing-pg-later/#up-next
| chuckhend wrote:
| Hi HN, lead dev on pg_later here. Reach out if you want to try
| pg_later or pgmq on Tembo Cloud. email me, adam at tembo dot io
| or join our slack:
| https://join.slack.com/t/tembocommunity/shared_invite/zt-20d...
| why-el wrote:
| Do the queries still have to be serial? I imagine so. The
| README is not clear but it can be interpreted as me being able
| to fire more than one query (say 2, but the first one is still
| running, like pg_background). Is that the case?
| chuckhend wrote:
| There's currently a single background worker reading jobs
| from a single queue, so processing is limited to serial at
| the moment. But we're planning on making both of those
| configurable, so that you can have priority queues, and
| multiple working pulling from the same queue, for example.
| davea37 wrote:
| It's so good, it gave the elephant an extra leg.
| luto wrote:
| I don't quite understand why blog writers care to include a
| random AI at the top of their posts. Just give me the content,
| I don't want to scroll past a random elephant first.
| brightball wrote:
| Because social sharing seems to work better with an image and
| hand crafting something for every post is exhausting.
| weinzierl wrote:
| Everyone says they that and has been before AI. What they
| always miss is the little word _" relevant"_. A relevant
| picture can be helpful, a random picture is just
| distracting and annoying.
| mosselman wrote:
| I don't find it that random in a post about Postgres to
| show a picture of an elephant. If it had been a badger or
| something, then yes, but an Elephant is pretty relevant
| as animals go.
| DaiPlusPlus wrote:
| > hand crafting something for every post is exhausting.
|
| Spending hours doing trial-and-error with image-generation
| prompts is also exhausting.
|
| Are we at the point where authors can feed their entire
| article text into an image-generator and it repeatedly
| (95%?) produces appropriate, if not very apt, artwork?
| TylerE wrote:
| Most likely some SaaS that is ultimately mechanical
| turking the work out to some country with cheap labor.
| mhd wrote:
| When your LLM is fed by blind Buddhist monks.
| revskill wrote:
| Why Buddhist ? Why not Christians ?
| rrauenza wrote:
| https://en.wikipedia.org/wiki/Blind_men_and_an_elephant
|
| "The earliest versions of the parable of blind men and
| elephant is found in Buddhist, Hindu and Jain texts, as
| they discuss the limits of perception and the importance of
| complete context. The parable has several Indian variations
| [...]"
| PaulHoule wrote:
| arangodb has had this for a long time
|
| https://www.arangodb.com/docs/stable/http/general.html
| chuckhend wrote:
| Snowflake has too! I'm not familiar with arangodb but I'll
| check it out.
| reacharavindh wrote:
| Not so much a big user of Postgres. But, out of curiosity, does
| Postgres have a way to run async queries _and_ have them run in
| the background with lowest priority such that it doesn't affect
| other important queries and transactions?
|
| As an example, a way to run an upsetting query about a cosmetic
| change in big table without locking rows for other transactions
| we care about on the same table..?
| drsim wrote:
| I'm not aware of a way to prioritize one process over another.
|
| The way I've done it is, as you allude to, by managing locks.
| You set the transaction isolation level as appropriate.
|
| You can also batch statements by using a cursor, rather than
| having a single large blocking transaction.
| andruby wrote:
| There's no concept of nice or priority in Postgres.
|
| Queries usually stay out of each other's way, unless they're
| modifying the same data, causing lock contention.
|
| What I've done in the past for "less important background
| queries" is use very short lock_timeout and short
| statement_timeout values. The query will fail if it can't
| acquire the lock quickly (and in turn won't hold extra locks),
| so you put it in a loop with a sleep.
|
| https://www.postgresql.org/docs/15/runtime-config-client.htm...
| samaysharma wrote:
| There isn't an explicit concept of priority / resource
| allocation between queries in Postgres.
|
| However, Postgres does that automatically for certain
| background processes like autovacuum, background worker etc. by
| allowing you to configure how fast / slow they go.
|
| You could implicitly influence how fast / slow something goes
| by setting per role / database parameters and giving less
| resources to certain types of queries
| (https://www.postgresql.org/docs/15/sql-alterrole.html) or by
| using explicit locks + lock_timeout to create some kind of a
| priority.
| gavinray wrote:
| I tested the following and it seems to work:
| CREATE OR REPLACE FUNCTION
| run_with_adjusted_settings(query_text text) RETURNS
| SETOF record LANGUAGE plpgsql AS $$
| DECLARE result_record record; BEGIN
| -- set_config ( setting_name text, new_value text, is_local
| boolean ) - text -- Sets the parameter setting_name
| to new_value, and returns that value. -- If
| is_local is true, the new value will only apply during the
| current transaction. PERFORM
| set_config('statement_timeout', '10s', true);
| PERFORM set_config('work_mem', '1MB', true);
| PERFORM set_config('maintenance_work_mem', '1MB', true);
| PERFORM set_config('max_parallel_workers_per_gather', '1',
| true); -- Execute the provided query
| dynamically and return the results FOR
| result_record IN EXECUTE query_text LOOP
| RETURN NEXT result_record; END LOOP;
| RETURN; END; $$; -- Example usage:
| SELECT * FROM run_with_adjusted_settings('SELECT 1 as
| id, false as some_bool;') AS (id int, some_bool boolean);
| netcraft wrote:
| Ive wanted something like this for a while.
|
| Years ago on db2 on AS/400 we could submit queries to "batch". It
| would save the results in a physical file and you could come back
| and query them later. We were running plenty of things that took
| hours or sometimes days to finish. Being able to have that
| running on the server, not tied to any given client and set their
| priority (and change the priority during the run) was a huge
| benefit.
|
| Theres not too many cases where I need to run hour long queries
| anymore, but still would be a great feature to have for long
| running, lower priority jobs.
|
| Just need this to become available for RDS / Aurora.
___________________________________________________________________
(page generated 2023-08-18 23:01 UTC)