[HN Gopher] Pg_later: Asynchronous Queries for Postgres
       ___________________________________________________________________
        
       Pg_later: Asynchronous Queries for Postgres
        
       Author : revskill
       Score  : 171 points
       Date   : 2023-08-18 08:15 UTC (14 hours ago)
        
 (HTM) web link (tembo.io)
 (TXT) w3m dump (tembo.io)
        
       | MuffinFlavored wrote:
       | I'm having a hard time understanding. I see that it's basically a
       | queue. Request your query to be inserted, get a job_id back, poll
       | that job_id until it is done.
       | 
        | Under the hood, is the query run like a normal query?
       | 
        | I guess I don't understand the distinction between "run queries
        | async" and "put the queries in a queue, run them synchronously,
        | one at a time, and poll until the one you want is done".
       | 
       | I get how the latter is "async".
        
         | chuckhend wrote:
          | It's similar to the functionality described in
          | https://docs.snowflake.com/en/developer-guide/sql-
          | api/refere... - the `async` flag more specifically. You submit
          | the query and are given a job id. A background worker runs it
          | like a normal query, but the caller can go about their
          | business while Postgres handles it for them.
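          | 
          | A minimal sketch of that flow (function names as in the
          | pg_later docs - pglater.exec and pglater.fetch_results -
          | check your installed version):
          | 
          |     -- submit the query; returns a job id immediately
          |     select pglater.exec('select count(*) from big_table');
          | 
          |     -- later: poll for the result by job id
          |     select pglater.fetch_results(1);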
        
       | pramsey wrote:
       | Similar to https://github.com/vibhorkum/pg_background
        
       | DaiPlusPlus wrote:
        | If I understand the article correctly, the two main features of
        | pg_later are that a SQL batch won't be aborted when your
        | Postgres connection dies, and that all results are retained
        | until you retrieve them - all mediated by Tembo's Postgres
        | message-queue extension.
       | 
        | The thing is, both of those things can be done today without any
        | extensions: just modify your SQL scripts to run under an Agent
        | account, and change every `SELECT` into a `SELECT INTO`
        | statement, all dumping into a tempdb - this technique works on
        | SQL Server and Oracle too.
       | 
       | (On the subject of Agents, I'm surprised pgAgent isn't built-in
       | to Postgres; while MSSQL and Oracle have had it since the start)
        
         | dventimi wrote:
         | I'm confused. Can I already do this today without extensions or
         | do I need to install pgAgent first?
        
           | notatoad wrote:
           | i think the parent commenter has confused "without any
           | extensions" and "without installing any more extensions than
           | the ones i've already installed"
        
             | DaiPlusPlus wrote:
             | To my knowledge, pgAgent isn't an extension, it's just a
             | client.
        
               | dventimi wrote:
               | Thanks. That helps.
        
           | ransackdev wrote:
            | Idk about pgAgent, but any table can serve as a resilient
            | queue using the locking primitives available in pg - SELECT
            | pg_advisory_lock or SELECT ... FOR UPDATE - and/or
            | LISTEN/NOTIFY.
            | 
            | Several bg job libs are built around this native locking
            | functionality:
           | 
           | > Relies upon Postgres integrity, session-level Advisory
           | Locks to provide run-once safety and stay within the limits
           | of schema.rb, and LISTEN/NOTIFY to reduce queuing latency.
           | 
           | https://github.com/bensheldon/good_job
           | 
           | > |> lock("FOR UPDATE SKIP LOCKED")
           | 
           | https://github.com/sorentwo/oban/blob/8acfe4dcfb3e55bbf233aa.
           | ..
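            | 
            | The claim step those libraries rely on can be sketched like
            | this (hypothetical jobs table; SKIP LOCKED lets concurrent
            | workers claim different rows instead of blocking each
            | other):
            | 
            |     begin;
            |     -- claim one pending job; rows already locked by other
            |     -- workers are skipped rather than waited on
            |     select id, payload from jobs
            |     where status = 'pending'
            |     order by id
            |     limit 1
            |     for update skip locked;
            |     -- ... run the job, then mark it done:
            |     update jobs set status = 'done' where id = 42;  -- claimed id
            |     commit;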
        
       | zekenie wrote:
        | Interesting! I've used materialized views for this in the past.
        | I'm curious what the advantages of pg_later are over that.
        
         | chuckhend wrote:
         | I don't think you'd replace a materialized view with pg_later,
         | but it might help you populate or update your materialized view
         | if you are trying to do that asynchronously. pglater.exec()
         | works with DDL too!
         | 
         | I use it a lot for long running queries when doing data science
         | and machine learning work, and a lot of times when executing
         | queries from a jupyter notebook or CLI. That way if my jupyter
         | kernel dies, my query execution continues even if the network
         | or my environment has an issue. I've started using it a bit
         | more with https://github.com/postgresml/postgresml for model
         | training tasks too, since those can be quite long running
         | depending on the situation.
        
         | magden wrote:
          | It should be helpful for maintenance and DDL tasks. Say I
          | need to build a secondary index over a large table and do
          | something once the index is ready: start the index creation
          | with pg_later, check the status later, and do whatever comes
          | next. They could probably also support triggers, so that once
          | a task is completed I can be notified.
        
           | samaysharma wrote:
           | We are planning to add support for "Push notifications for
           | completed and failed jobs". That's in our roadmap listed
           | here: https://tembo.io/blog/introducing-pg-later/#up-next
        
       | chuckhend wrote:
       | Hi HN, lead dev on pg_later here. Reach out if you want to try
       | pg_later or pgmq on Tembo Cloud. email me, adam at tembo dot io
       | or join our slack:
       | https://join.slack.com/t/tembocommunity/shared_invite/zt-20d...
        
         | why-el wrote:
            | Do the queries still have to be serial? I imagine so. The
            | README is not clear, but it could be read as letting me fire
            | more than one query (say 2, while the first one is still
            | running, like pg_background). Is that the case?
        
           | chuckhend wrote:
           | There's currently a single background worker reading jobs
           | from a single queue, so processing is limited to serial at
           | the moment. But we're planning on making both of those
           | configurable, so that you can have priority queues, and
            | multiple workers pulling from the same queue, for example.
        
       | davea37 wrote:
       | It's so good, it gave the elephant an extra leg.
        
         | luto wrote:
         | I don't quite understand why blog writers care to include a
         | random AI at the top of their posts. Just give me the content,
         | I don't want to scroll past a random elephant first.
        
           | brightball wrote:
           | Because social sharing seems to work better with an image and
           | hand crafting something for every post is exhausting.
        
             | weinzierl wrote:
              | Everyone says that, and they did long before AI. What they
              | always miss is the little word _"relevant"_. A relevant
              | picture can be helpful; a random picture is just
              | distracting and annoying.
        
               | mosselman wrote:
               | I don't find it that random in a post about Postgres to
               | show a picture of an elephant. If it had been a badger or
               | something, then yes, but an Elephant is pretty relevant
               | as animals go.
        
             | DaiPlusPlus wrote:
             | > hand crafting something for every post is exhausting.
             | 
             | Spending hours doing trial-and-error with image-generation
             | prompts is also exhausting.
             | 
             | Are we at the point where authors can feed their entire
             | article text into an image-generator and it repeatedly
             | (95%?) produces appropriate, if not very apt, artwork?
        
               | TylerE wrote:
               | Most likely some SaaS that is ultimately mechanical
               | turking the work out to some country with cheap labor.
        
         | mhd wrote:
         | When your LLM is fed by blind Buddhist monks.
        
           | revskill wrote:
           | Why Buddhist ? Why not Christians ?
        
             | rrauenza wrote:
             | https://en.wikipedia.org/wiki/Blind_men_and_an_elephant
             | 
             | "The earliest versions of the parable of blind men and
             | elephant is found in Buddhist, Hindu and Jain texts, as
             | they discuss the limits of perception and the importance of
             | complete context. The parable has several Indian variations
             | [...]"
        
       | PaulHoule wrote:
       | arangodb has had this for a long time
       | 
       | https://www.arangodb.com/docs/stable/http/general.html
        
         | chuckhend wrote:
         | Snowflake has too! I'm not familiar with arangodb but I'll
         | check it out.
        
       | reacharavindh wrote:
       | Not so much a big user of Postgres. But, out of curiosity, does
       | Postgres have a way to run async queries _and_ have them run in
       | the background with lowest priority such that it doesn't affect
       | other important queries and transactions?
       | 
        | As an example, a way to run an update query making a cosmetic
        | change in a big table without locking rows for other
        | transactions we care about on the same table..?
        
         | drsim wrote:
         | I'm not aware of a way to prioritize one process over another.
         | 
         | The way I've done it is, as you allude to, by managing locks.
         | You set the transaction isolation level as appropriate.
         | 
         | You can also batch statements by using a cursor, rather than
         | having a single large blocking transaction.
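          | 
          | A cursor-free variant of the same idea is to drive limited
          | chunks from the client until no rows remain (hypothetical
          | table and column names):
          | 
          |     -- repeat from the application until 0 rows are updated
          |     with batch as (
          |         select id from big_table
          |         where flag is null
          |         limit 1000
          |         for update skip locked
          |     )
          |     update big_table t
          |     set flag = false
          |     from batch
          |     where t.id = batch.id;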
        
         | andruby wrote:
         | There's no concept of nice or priority in Postgres.
         | 
         | Queries usually stay out of each other's way, unless they're
         | modifying the same data, causing lock contention.
         | 
         | What I've done in the past for "less important background
         | queries" is use very short lock_timeout and short
         | statement_timeout values. The query will fail if it can't
         | acquire the lock quickly (and in turn won't hold extra locks),
         | so you put it in a loop with a sleep.
         | 
         | https://www.postgresql.org/docs/15/runtime-config-client.htm...
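          | 
          | That pattern looks roughly like this (timeout values are
          | illustrative; tune them for your workload):
          | 
          |     -- per session, before the low-priority statement:
          |     set lock_timeout = '50ms';     -- give up fast if rows are contended
          |     set statement_timeout = '5s';  -- bound total runtime
          | 
          |     -- on failure, sleep briefly in the application and retry
          |     update big_table set note = 'x' where id = 42;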
        
         | samaysharma wrote:
         | There isn't an explicit concept of priority / resource
         | allocation between queries in Postgres.
         | 
          | However, Postgres does support this for certain background
          | processes, like autovacuum and background workers, by letting
          | you configure how fast or slow they go.
         | 
         | You could implicitly influence how fast / slow something goes
         | by setting per role / database parameters and giving less
         | resources to certain types of queries
         | (https://www.postgresql.org/docs/15/sql-alterrole.html) or by
         | using explicit locks + lock_timeout to create some kind of a
         | priority.
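          | 
          | For example, per-role settings for a hypothetical low-priority
          | reporting role might look like:
          | 
          |     -- queries run by this role get less memory, no parallel
          |     -- workers, and a hard runtime cap
          |     alter role reporting set work_mem = '4MB';
          |     alter role reporting set max_parallel_workers_per_gather = 0;
          |     alter role reporting set statement_timeout = '60s';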
        
         | gavinray wrote:
          | I tested the following and it seems to work:
          | 
          |     CREATE OR REPLACE FUNCTION run_with_adjusted_settings(query_text text)
          |     RETURNS SETOF record
          |     LANGUAGE plpgsql
          |     AS $$
          |     DECLARE
          |         result_record record;
          |     BEGIN
          |         -- set_config(setting_name text, new_value text, is_local boolean) -> text
          |         -- Sets the parameter setting_name to new_value and returns that
          |         -- value. If is_local is true, the new value only applies during
          |         -- the current transaction.
          |         PERFORM set_config('statement_timeout', '10s', true);
          |         PERFORM set_config('work_mem', '1MB', true);
          |         PERFORM set_config('maintenance_work_mem', '1MB', true);
          |         PERFORM set_config('max_parallel_workers_per_gather', '1', true);
          | 
          |         -- Execute the provided query dynamically and return the results
          |         FOR result_record IN EXECUTE query_text LOOP
          |             RETURN NEXT result_record;
          |         END LOOP;
          |         RETURN;
          |     END;
          |     $$;
          | 
          |     -- Example usage:
          |     SELECT *
          |     FROM run_with_adjusted_settings('SELECT 1 as id, false as some_bool;')
          |         AS (id int, some_bool boolean);
        
       | netcraft wrote:
        | I've wanted something like this for a while.
       | 
       | Years ago on db2 on AS/400 we could submit queries to "batch". It
       | would save the results in a physical file and you could come back
       | and query them later. We were running plenty of things that took
        | hours or sometimes days to finish. Being able to have that
        | running on the server, not tied to any given client, and to set
        | its priority (and change the priority during the run) was a
        | huge benefit.
       | 
        | There aren't too many cases where I need to run hour-long
        | queries anymore, but it would still be a great feature to have
        | for long-running, lower-priority jobs.
       | 
       | Just need this to become available for RDS / Aurora.
        
       ___________________________________________________________________
       (page generated 2023-08-18 23:01 UTC)