[HN Gopher] Tricking PostgreSQL into using an insane, but faster...
___________________________________________________________________
Tricking PostgreSQL into using an insane, but faster, query plan
Author : cube2222
Score : 156 points
Date : 2022-01-18 16:42 UTC (6 hours ago)
(HTM) web link (spacelift.io)
(TXT) w3m dump (spacelift.io)
| SigmundA wrote:
| PostgreSQL really needs hints. Sometimes I as the developer know
| better than the optimizer and having to trick the optimizer to
| get it to do what I want is a ridiculous situation.
|
| Also having the optimizer decide to do something bad off hours is
| not a good situation again hints or plan locking would help here.
| shock-value wrote:
| It's basically shameful how the maintainers refuse to even
| entertain the thought of allowing query hints. These kinds of
| articles and anecdotes come up again and again, yet seemingly
| no movement whatsoever on the blanket ban on query hints in
| Postgres. (See
| https://wiki.postgresql.org/wiki/OptimizerHintsDiscussion)
|
| Postgres generally is great -- so many great features and
| optimizations and more added all the time. But its query
| optimizer still messes up, often. It's absolutely not the
| engineering marvel some would have you believe.
| pmontra wrote:
| I can understand that people want absolute control over
| their queries sometimes. I've never used hints so I shouldn't
| even write this note, but IMHO that page has a quite balanced
| view of the pros and cons of existing hint systems.
| zmmmmm wrote:
| > that page has a quite balanced view of pros and cons of
| existing hint systems
|
| I found the cons section somewhat disappointing given how
| much respect I have for Postgresql maintainers in general.
|
| Most of the "problems" are essentially manifestations of
| dysfunction within users or PostgreSQL development itself.
| People won't report optimizer bugs if they can fix them
| themselves, etc. (far more likely they won't report the
| bugs if they have no way to prove their alternative query
| plan actually performs better, eg: by adding a hint).
|
| Many are just assertions which I highly doubt would hold up
| if validated ("most of the time the optimizer is actually
| right", "hints in queries require massive refactoring"
| ...).
| gowld wrote:
| > (far more likely they won't report the bugs if they
| have no way to prove their alternative query plan
| actually performs better, eg: by adding a hint).
|
| They don't need to run an alternative plan, they just
| need to report that the main plan did far more work than
| needed.
|
| In the real world, when users have a problem in their
| mission-critical system, they ask for a solution, not
| just give up on the mission.
|
| > Many are just assertions which I highly doubt would
| hold up if validated ("most of the time the optimizer is
| actually right", "hints in queries require massive
| refactoring" ...).
|
| Their 20 years of experience beats your idle speculation.
| zmmmmm wrote:
| > Their 20 years of experience beats your idle
| speculation.
|
| Perhaps.
|
| But it comes across to me on that page as arrogance and
| dismissiveness of user needs. Hence why I find it
| disappointing, even if it is mainly in how it is
| expressed.
| infogulch wrote:
| > they just need to report that the main plan did far
| more work than needed.
|
| And how do they know that the chosen plan is not already
| optimal? How many reports do you think you'd get if every
| time someone encountered a "slow" query they reported it?
| And what fraction of those would be finally found to be
| caused by a bad optimizer as opposed to the user's fault
| like a missing index, outdated statistics, ... 1%? 0.1%?
| See stack overflow for a sample of that ratio. Be careful
| what you wish for.
|
| If hints existed, performance reports could be "the default
| plan of x is slow; I know because when I use hints w,z
| then it's fast".
|
| What should they do in the meantime while they're waiting
| for the pg project to acknowledge the problem, someone to
| propose a solution, and someone to implement a fix, just
| sit on their hands and twiddle their thumbs while their
| database server spews smoke? Hints are an escape hatch
| that empowers users to take over when the optimizer
| veers off course. Not having _any_ way to control the
| plan is like a Tesla autopilot that doesn't allow the
| driver to take control of the car in an emergency.
|
| Also, distrusting users to report performance issues is a
| weird attitude and gives a bad taste. Maybe it's true, or
| maybe you need to make it easier.
| williamdclt wrote:
| It does. But when you're watching your database, and
| therefore your business, crash and burn because it decided
| to change the query plan for an important query, all these
| "problems with existing Hint systems" sound very
| irrelevant.
|
| I'd love to have a way to lock a plan in a temporary
| "emergency measure" fashion. But of course it's
| hard/impossible to design this without letting people abuse
| it and pretend it's just a hint system.
| silon42 wrote:
| Exactly... DB deciding to switch query plans in
| production is sometimes terrible.
| suddendescent wrote:
| >I'd love to have a way to lock a plan in a temporary
| "emergency measure" fashion.
|
| Could this be achieved by extending prepared statements?
| [1] The dirty option would be to introduce a new keyword
| like PREPAREFIXED.
|
| The first time such a statement is executed, the
| execution plan could be stored and then retrieved on
| subsequent queries. There would be no hints and the
| changes in code should be minimal.
|
| Once a query runs successfully, the users can be sure
| that the execution plan won't change.
|
| >But of course it's hard/impossible to design this
| without letting people abuse it
|
| Is this more important than having predictable execution
| times?
|
| [1] https://www.postgresql.org/docs/current/sql-prepare.html
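| Something adjacent already exists: since PostgreSQL 12,
| the plan_cache_mode setting can pin a prepared statement
| to its generic plan, which approximates the "locked plan"
| idea above. A sketch (statement and table names invented):

```sql
-- PostgreSQL 12+ sketch: force the generic plan so later
-- EXECUTEs reuse it instead of re-planning per parameter value.
SET plan_cache_mode = force_generic_plan;
PREPARE pending_runs (bigint) AS
  SELECT COUNT(*) FROM runs
  WHERE stack_id = $1 AND worker_id IS NULL;
EXECUTE pending_runs(42);  -- uses the cached generic plan
```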
| yjftsjthsd-h wrote:
| > These kinds of articles and anecdotes come up again and
| again
|
| Don't articles and anecdotes also come up again and again of
| developers feeding bad hints to the database and cratering
| their performance?
|
| And that wiki page... Very explicitly isn't a blanket ban?
| They straight up say that they're willing to consider the
| idea if somebody wants to say how to do it without the
| pitfalls of other systems. The only thing they say they're
| going to ignore outright is people thinking they should have
| a feature because other databases have it (which seems fair).
| erichocean wrote:
| > _people thinking they should have a feature because other
| databases have it (which seems fair)_
|
| Literally NO ONE wants query hinting in Postgres to check
| some kind of feature box because other databases have it.
|
| We know we want it because...other databases have it, and
| it's INCREDIBLY USEFUL.
|
| > _They straight up say that they're willing to consider
| the idea if somebody wants to say how to do it without the
| pitfalls of other systems._
|
| Pitfalls my ass. We want exactly the functionality that is
| already present in other systems, _pitfalls and all_. That's
| just an excuse to do nothing, Apple-style, "because we
| know better than our own users" while trying to appear
| reasonable.
|
| It's akin to not adopting SQL until you can do so "while
| avoiding the pitfalls of SQL." Just utter bullshit.
| adamrt wrote:
| > We want exactly the functionality that is already
| present in other systems, pitfalls and all. That's just
| an excuse to do nothing
|
| Are you offering to help with the maintenance or
| development associated with it? Or are you just demanding
| features while calling the people that do help with that
| stuff liars?
|
| Maybe they do know better? Or maybe they know about other
| hassles that will come with it that you can't fathom.
| They've been developing and maintaining one of the best
| open source projects on the planet for nearly 3 decades.
| Maybe with all that experience, it's not their opinion
| that is BS?
| mst wrote:
| Given there's already a pg_hint_plan extension I really
| don't see why people are so angry about the developers
| not wanting to bless something they consider a footgun as
| a core feature.
| zepearl wrote:
| "risk", because pg_hint is an external module, written by
| somebody as a hobby, therefore if it stops existing or
| working after some updates or with newer versions of
| Postgres then it becomes a huge problem for PROD systems
| that need it for any reason.
| ghusbands wrote:
| Sure, maybe in not doing what every other database does,
| that really helps users out when they're suffering
| unpredictable performance, they're demonstrating their
| far superior knowledge. And maybe all the users that are
| suffering are somehow incompetent.
|
| Or maybe it's just typical developer hubris of the kind
| that hits everyone at some point.
| fragmede wrote:
| We're software engineers not marketing folk who just want
| to check a box.
|
| There's a difference between "have a feature _just because_
| other databases have it" and "this is a very useful
| feature that would have its own PostgreSQL-isms and also we
| got this idea because other databases have it". The query
| planner isn't infallible, so being able to hint queries to
| not accidentally use a temporary table that just can't fit
| in ram isn't just copying a feature "just because everyone
| else has it".
| ghusbands wrote:
| > they're willing to consider the idea if somebody wants to
| say how to do it without the pitfalls of other systems
|
| That is basically a blanket ban. Saying you won't implement
| a widely-implemented feature unless someone comes up with a
| whole new theory about how to do it better is saying you
| simply won't implement it. Other databases do well enough
| with hints, and they do help with some problems.
| zorgmonkey wrote:
| I've never used it, but there is a Postgres extension called
| pg_hint_plan [0] for query hints, and I'm guessing it is
| pretty decent because it is even mentioned in the AWS Aurora
| documentation
|
| [0]: https://pghintplan.osdn.jp/pg_hint_plan.html
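| For reference, pg_hint_plan hints ride along as a special
| comment placed ahead of the query. A rough sketch (the table
| and index names here are invented for illustration):

```sql
/*+ IndexScan(runs runs_worker_idx) HashJoin(runs stacks) */
SELECT COUNT(*)
FROM runs
JOIN stacks ON stacks.id = runs.stack_id
WHERE runs.worker_id IS NULL;
```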
| zmmmmm wrote:
| Definitely my biggest issue with Postgres.
|
| I have one shameful query where, unable to convince it to
| execute a subquery that is essentially a constant for most of
| the rows outside of the hot loop of the main index scan, I
| pulled it out into a temp table and had the main query select
| from the temp table instead. Even creating a temp table _and
| indexing it_ was faster than the plan the Postgres query
| planner absolutely insisted on. Things like CTEs etc made no
| difference, it would still come up with the same dumb plan
| every way I expressed the query.
|
| The worst thing is not even being able to debug or understand
| what is going on because you can't influence the query plan
| to try alternative hypotheses out easily.
| baq wrote:
| even plan locking itself would be a welcome feature. we've had
| multiple instances of plans going haywire (never traced it
| down, but 99% sure it was after autovacuum analyze) impacting
| prod and getting back to normal after a manual re-ANALYZE.
| SigmundA wrote:
| I haven't had much use for plan locking even after MS SQL
| Server got it, even though it's a nice feature, being able
| to save and load plans from a file etc.
|
| I just like a good set of hints, since they are part of the
| source code and I can put comments and see why they are
| there in source history later.
| jonatron wrote:
| Having been in a situation where Postgres suddenly decides to
| use a much slower query plan on a hot query in production, I'd
| agree.
| quietbritishjim wrote:
| Although it's absolute sacrilege to many, I would honestly
| support a really explicit syntax to compose particular tables
| and indices. Not even on top of a regular SQL query; I mean
| its own first-class language. E.g. "Take this index, restrict
| to this subrange; for each row, pick a row using this other
| index on the foreign key" (but it wouldn't have to be
| imperative like I've phrased it).
|
| There are times when I already know how I want the query to
| execute, and I have to iteratively fiddle with the query to
| indirectly trick the planner into doing that. There's just no
| excuse for that arrangement. Even if I've made a mistake and my
| plan couldn't work efficiently - at least let me execute it
| inefficiently so I can discover and debug that.
| gfody wrote:
| something like sql server's "use plan" it really comes in
| handy when you need it
| (https://forrestmcdaniel.com/2018/07/19/how-to-use-the-use-
| pl...)
| SigmundA wrote:
| Yes, I would too. Over time I have come to appreciate at
| least being able to control the order of operations; just
| forcing the join order is often enough of a hint to keep
| things working properly.
|
| Having used Linq a lot I would actually prefer more of that
| kind of chained statement approach that is more hands on
| without having to explicitly loop.
| derefr wrote:
| Or, to put that another way: Postgres's query planner should
| be emitting an explicit bytecode that is then interpreted by
| a query-execution abstract machine. That same layer should
| also be exposed directly to clients over the network,
| allowing pre-planned query bytecode to be submitted directly
| in place of a textual query.
|
| I've heard the rebuttal to this idea before, that even minor
| DB schema/configuration changes that should be "invisible" to
| the application (e.g. an index being created -- or even
| refreshed -- on the table) would result in the resulting
| bytecode being different; and so you'd never be able to ship
| such bytecode as a static artifact in your client.
|
| But: who said anything about a static artifact? I'm more
| imagining a workflow like this:
|
| 1. user submits a query with leading command "PLAN" (e.g.
| "PLAN SELECT ...")
|
| 2. Postgres returns planned bytecode (as a regular single-
| column bytea rowset)
|
| 3. if desired, the user _modifies_ the received bytecode
| arbitrarily (leaving alone anything the client doesn't
| understand -- much like how optimization passes leave alone
| generated intrinsics in LLVM IR)
|
| 4. user later submits a low-level query-execution command
| over the binary wire protocol, supplying the (potentially
| modified) bytecode
|
| 5. _if the bytecode's assumptions are still valid_, the
| query executes. Otherwise, the query gives a "must re-plan"
| error. The client is expected to hold onto the original SQL
| and re-supply it in this case.
|
| In other words, do it similarly to how Redis's EVAL + EVALSHA
| works; but instead of opaque handles (SHA hashes), the client
| receives a white-box internal representation.
|
| Or: do it similarly to how compiling a shader on a game-
| console with unikernel games + unified RAM/VRAM works. The
| driver's shader compiler spits you back a buffer full of some
| GPU object code -- which, running in kernel-mode, you're free
| to modify before running.
| tshaddox wrote:
| You can't ship SQL queries and then do arbitrary schema
| changes either, right?
| lazide wrote:
| Not arbitrary changes, but nearly so - if you add new
| columns (and your existing queries name the columns they
| use explicitly), then nothing changes as far as the
| client is concerned. If you remove or rename one, only
| those clients who refer to it need to change, and then
| only if you don't do an interposing view/schema/etc.
|
| The server is supposed to take the logical intent
| specified by the queries and do the work of mapping that
| into concrete retrievals in a stable and performant way,
| so a query explicitly telling the server to do something
| in a specific way breaks the model. The more specific,
| the more broken.
|
| DDL changes for instance ideally would not have to tell
| the server how to structure things on disk. The server
| should know how to do it efficiently.
|
| That said, real life intrudes and sometimes (but not
| usually) it's important to do this for stability or
| performance reasons. The more magic involved, the more
| unpredictable a system can be, and the closer a system
| runs to redline, the more chaos/downtime that can cause.
| jhgb wrote:
| > Or, to put that another way: Postgres's query planner
| should be emitting an explicit bytecode that is then
| interpreted by a query-execution abstract machine. That
| same layer should also be exposed directly to clients over
| the network, allowing pre-planned query bytecode to be
| submitted directly in place of a textual query.
|
| And now you've invented Firebird, or even 1980's Interbase.
| ;)
| mst wrote:
| I suspect the extra compatibility/dependency surface would
| make this a non starter in practice but it's an absolutely
| fascinating thought experiment and I'm glad you wrote it
| up.
| SigmundA wrote:
| PostgreSQL doesn't even cache its own byte code, it replans
| every time!
| mst wrote:
| Depending on the placeholder values the optimal plan may
| be very different.
|
| You -can- use a server side prepared statement to force
| it to -ahem- plan ahead but that's -usually- not actually
| worth it.
| SigmundA wrote:
| That's why other databases have parameter sniffing.
|
| And using prepared statements only works on the same
| connection, so they're of limited use.
|
| Luckily PostgreSQL's optimizer is very primitive, so
| planning doesn't take too much time; as it gets more
| advanced, the lack of plan reuse will become more of an
| issue. It's already an issue with the LLVM JIT compilation
| time.
| orthoxerox wrote:
| ClickHouse doesn't go all the way there, but its approach
| to SQL support is very similar: it does exactly what you
| have written. It doesn't rewrite your query in any way: if
| you want to filter table A before you join it with table B,
| you write a subquery.
| magicalhippo wrote:
| There's no technical reason why one couldn't just select or
| join from an index as if it was a table or a view is there?
|
| We also write queries knowing it will use a specific index,
| or we will create an index because of a specific query. And
| then we have to have a scheduled task to periodically
| recalculate statistics just so the DB server doesn't get
| silly ideas.
|
| Of course it could be abused, but I'm in favor of having ways
| of letting programmers tell computers exactly what to do.
| Sometimes we really do know best.
| pzduniak wrote:
| I recall using https://pghintplan.osdn.jp/pg_hint_plan.html for
| this feature, it did what it claimed to be able to do. It's a
| hack, but it really helps in the worst case scenarios.
| remus wrote:
| Not that I disagree with your point, but the opposite of
|
| > Also having the optimizer decide to do something bad off
| hours is not a good situation again hints or plan locking would
| help here.
|
| Can also happen when you force a particular query plan, only
| for it to turn to treacle when some assumption you made about
| the data suddenly breaks.
|
| As with most things there's no free lunch.
| lazide wrote:
| When doing disaster planning, at least something blowing up
| because of a data change is preventable or mitigable.
| Staging environments. Rollbacks, etc.
|
| The system randomly deciding to drive itself off a cliff for
| no reason, with no known way to stop it next time is quite
| concerning.
| ghusbands wrote:
| Since the PostgreSQL folks are clearly against hints, maybe
| another random feature suggestion: If a (read-only) query is
| taking "too long", maybe you could launch a second copy with a
| different plan. Maybe even a third after twice as long and a
| fourth after twice as long again. Kill them all once any one
| returns. Trades some extra load now against waiting forever for
| a 'bad' query plan.
|
| The idea obviously needs work and has probably already been
| suggested and dismissed, somewhere, but I thought I might throw
| it out there, especially with modern computers having so many
| cores.
| lgessler wrote:
| I'm not a DB expert, but aren't most queries bottlenecked by
| disk bandwidth? If so, additional queries might hurt more
| than help.
| zepearl wrote:
| On one hand this could theoretically work if the optimizer
| tracks historical SQLs + the query plans used at that time +
| their runtime and decides to try to use a different query
| plan, then aborts the SQL being executed when its runtime
| exceeds the historical runtime. But in practice it's likely
| to mess things up because data volume/distribution/machine
| load (CPU, disks, etc) might differ compared to the past
| etc... .
|
| On the other hand, in general there are often too many
| potential combinations of query plans to try out (hundreds
| even for a relatively simple SQL) and trying them all out
| would need hours/days/etc... . The "good plan" might be
| something that a machine might categorize as "very unlikely
| to work" so it might end up being the one tested
| automatically at the very end.
|
| Normal hints would still be a lot easier to handle in the
| code and for the user.
| electroly wrote:
| I really appreciate that in SQL Server we can just use query
| hints. I don't think I will ever see eye-to-eye with the
| PostgreSQL team on this. No query planner is perfect. SQL
| Server's planner certainly isn't. We have had the same sort of
| "query plan suddenly changes in production and everything
| breaks" incidents, but they are easily fixed with query hints
| and/or forced plans, and you can be sure that future statistics
| updates won't break them again.
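| For readers unfamiliar with them, SQL Server hints attach
| either to a table reference or to the whole statement. A
| sketch (object names invented for illustration):

```sql
-- T-SQL sketch: pin an index choice, a join strategy, and the
-- join order so a statistics update can't change the plan shape.
SELECT r.id
FROM runs AS r WITH (INDEX (ix_runs_worker))
JOIN stacks AS s ON s.id = r.stack_id
WHERE r.worker_id IS NULL
OPTION (HASH JOIN, FORCE ORDER);
```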
| SigmundA wrote:
| I still have nightmares of SQL Server's optimizer taking
| down production systems at random times due to picking a bad
| plan; judicious use of hints has made my life much easier
| and allowed me as a developer to be in control and not have
| to ask the DB nicely and pray it listens.
| newlisp wrote:
| I guess that is why cloud databases like spanner charge big
| bucks.
| fennecfoxen wrote:
| I worked for a payments firm that submitted transfers in
| bulk every day, but had a monthly surge day, and a late
| cut-off period to submit or cancel payments. It doesn't
| matter if you can do everything else in 15 minutes, if the
| query planner decides to take a 30-second query and turn it
| into a 4-hour nightmare because different customers are
| making different numbers of payments today, and it trips
| over some Postgres heuristic.
|
| The Postgres query planner was, quite frankly, the enemy.
| By the time I left we were at the point that we considered
| it a business risk and were looking at alternatives. If you
| need queries to run in a predictable amount of time --
| forget _fast_, simply _predictable_ -- then Postgres is
| quite simply not fit for purpose.
| slt2021 wrote:
| these query issues are most of the time not a SQL Server
| problem; they are rather a symptom of poor data model
| architecture and table design.
|
| A lot of times I see in operational DBs that hot data gets
| mixed with warm and cold data, and your hot-path query
| with a ton of JOINs/subqueries will get rekt.
|
| Proper redesign and rearchitecture help to provide a large
| buffer against these problems.
|
| Also, not everything should be inside SQL Server; if you
| run a large query every 5 seconds or something, probably
| consider using an in-memory cache or a denormalized model
| fennecfoxen wrote:
| Excuse me. You are replying with advice that is
| fundamentally inappropriate for a batch workload
| (describing a "hot path query" and suggesting in-memory
| caches), and it shows that you have not read and
| comprehended the post I wrote.
|
| Besides which, if you need to re-architect and
| denormalize by moving records in and out of different
| tables just so you can run what is essentially a monthly
| report in a predictable (not fast! just predictable)
| amount of time -- well then, why bother using Postgres in
| the first place, you might as well bite the bullet and go
| with something NoSQL at that point, because at least when
| you read from a given Cassandra partition you know all
| the results will be right next to each other, 100%. Why
| leave it to chance? You've been burned by Postgres before
| slt2021 wrote:
| As a developer I never had to blame SQL Server; it was
| always a developer mistake: poor architecture, a poor
| query, some data quality issue. Don't blame the platform
| for developer mistakes, that's what I was trying to say
| fennecfoxen wrote:
| The wise man builds his house upon the rock. The foolish
| man builds his house upon the sand. I will 100% blame the
| platform when what it provides is deficient, for
| selecting a deficient platform is one of the gravest
| mistakes a developer can make.
|
| And Postgres, again, is materially deficient in its query
| performance predictability, and its developers are
| insistent that they have no desire to allow the
| approaches that would mitigate it. If this matters to
| your application, then the prudent developer will drop
| Postgres and use a real database.
| SigmundA wrote:
| You must not have been working with it very long. Off the
| top of my head: having to disable the "new" optimizer in SQL
| 2014 and go back to the 2012 version due to bad plans
| https://sqlservergeeks.com/sql-server-2014-trace-
| flags-9481/
|
| I have been working with it since way back in 6.5, and it
| has made many obviously wrong decisions in the optimizer
| over the years. It has gotten better but is by no means
| perfect; the good thing is it has hints, unlike PG!
| slt2021 wrote:
| So what's the issue with a marginal difference in
| cardinality estimators? Both queries seem perfectly fine
| speed-wise.
|
| Just splitting hairs. Again, most of the time developers
| fail to notice significant data distribution drift and
| don't rearchitect their data models, then blame the
| engine for the developers' omissions and invalid
| assumptions
| SigmundA wrote:
| https://www.sqlconsulting.com/archives/big-performance-
| probl...
| fragmede wrote:
| Blaming the table architecture doesn't get rid of the
| fact that the PostgreSQL planner can make decisions that
| will cause a query to take an unknown amount of time.
| cube2222 wrote:
| Hey, author here.
|
| The article describes a fun debugging and optimization session
| I've had recently. Happy to answer any questions!
| tqi wrote:
| Interesting write up! Did you happen to try this with a CTE
| instead of a subquery? i.e. something like:
|
|   WITH tmp AS (
|     SELECT accounts_other.id, COUNT(*) n
|     FROM accounts accounts_other
|     JOIN stacks stacks_other
|       ON accounts_other.id = stacks_other.account_id
|     JOIN runs runs_other
|       ON stacks_other.id = runs_other.stack_id
|     WHERE (stacks_other.worker_pool_id IS NULL OR
|            stacks_other.worker_pool_id = worker_pools.id)
|       AND runs_other.worker_id IS NOT NULL
|     GROUP BY 1
|   )
|   SELECT COUNT(*) as "count",
|          COALESCE(MAX(EXTRACT(EPOCH FROM
|            age(now(), runs.created_at)))::bigint, 0) AS "max_age"
|   FROM runs
|   JOIN stacks ON runs.stack_id = stacks.id
|   JOIN worker_pools ON worker_pools.id = stacks.worker_pool_id
|   JOIN accounts ON stacks.account_id = accounts.id
|   LEFT JOIN tmp ON tmp.id = accounts.id
|   WHERE worker_pools.is_public = true
|     AND runs.type IN (1, 4)
|     AND runs.state = 1
|     AND runs.worker_id IS NULL
|     AND accounts.max_public_parallelism / 2 > COALESCE(tmp.n, 0)
|
| I don't know this for sure, but in my experience it has seemed
| like Postgres is bad at optimizing subqueries, and will execute
| once per row.
| cube2222 wrote:
| Haven't run it, but based on the query plan in the first
| picture, PostgreSQL already does that part kind of right. It
| precalculates it for all accounts and then does a three-way
| hash join.
| dragonwriter wrote:
| Aren't CTEs still optimization barriers in Postgres, so
| wouldn't that make any subquery optimization problem worse?
| CBLT wrote:
| Wasn't this solved in Postgres 12?[0]
|
| [0] https://www.depesz.com/2019/02/19/waiting-for-
| postgresql-12-...
| kristjansson wrote:
| Mostly no, since PostgreSQL 12
| [deleted]
| briffle wrote:
| The fun thing is, they have corrected a bunch of that in
| newer versions of postgresql.
|
| I'm surprised that the problem stemmed from an index where
| the 'AND NOT NULL' took forever to scan, and they didn't
| include a partial index, or an index in a different order,
| since they are sorted.
| srcreigh wrote:
| Depends if materialization helps.
|
| As a silly example, if the CTE materializes 1/100th of a
| table via table scan, scanning 1/100th of a table N times
| may be faster than scanning the entire table N times.
| aeyes wrote:
| For Postgres 12+ you'll have to use AS MATERIALIZED to
| force the plan:
|
| > WITH tmp AS MATERIALIZED
| mijoharas wrote:
| Hey, I think you'll probably get better performance if you
| swap your subquery around a little. From:
|
|   SELECT COUNT(*) FROM runs runs_other
|   WHERE (SELECT COUNT(*) FROM stacks
|          WHERE stacks.id = runs_other.stack_id
|            AND stacks.account_id = accounts.id
|            AND stacks.worker_pool_id = worker_pools.id) > 0
|     AND runs_other.worker_id IS NOT NULL
|
| To:
|
|   SELECT COUNT(*) FROM runs runs_other
|   WHERE EXISTS (SELECT * FROM stacks
|                 WHERE stacks.id = runs_other.stack_id
|                   AND stacks.account_id = accounts.id
|                   AND stacks.worker_pool_id = worker_pools.id)
|     AND runs_other.worker_id IS NOT NULL
|
| Your previous option needs to hit every row in the stacks
| subquery to generate the count. The modified one uses EXISTS so
| it only needs to check there's at least one row, and then can
| short-circuit and exit that subquery. (a smart enough query
| planner would be able to do the same thing, but PG generally
| doesn't).
| cube2222 wrote:
| Thanks for the tip! I didn't know EXISTS can provide such
| benefits.
|
| However, I've tried it out and it resulted in a 10x
| performance loss. Looking at the query plan, it actually got
| PostgreSQL to be too smart and undo most of the optimizations
| I've done in this blog post... It's still an order of
| magnitude faster than the original query though!
| mijoharas wrote:
| I stand corrected, and always glad to see the main point
| about perf optimisation proven. (always measure! :) )
| gaha wrote:
| Have you tried unnesting the subquery? From the images posted
| in your blog it's not clear to me how often the subquery is
| run. Maybe unnesting it somehow like this should work well:
|
|   SELECT COUNT(*) as "count",
|          COALESCE(MAX(EXTRACT(EPOCH FROM
|            age(now(), runs.created_at)))::bigint, 0) AS "max_age"
|   FROM runs
|   JOIN stacks ON runs.stack_id = stacks.id
|   JOIN worker_pools ON worker_pools.id = stacks.worker_pool_id
|   JOIN accounts ON stacks.account_id = accounts.id
|   /* unnesting */
|   JOIN (SELECT accounts_other.id, COUNT(*) as cnt
|         FROM accounts accounts_other
|         JOIN stacks stacks_other
|           ON accounts_other.id = stacks_other.account_id
|         JOIN runs runs_other
|           ON stacks_other.id = runs_other.stack_id
|         WHERE (stacks_other.worker_pool_id IS NULL OR
|                stacks_other.worker_pool_id = worker_pools.id)
|           AND runs_other.worker_id IS NOT NULL
|         GROUP BY accounts_other.id) as acc_cnt
|     ON acc_cnt.id = accounts.id
|    AND accounts.max_public_parallelism / 2 > acc_cnt.cnt
|   /* end */
|   WHERE worker_pools.is_public = true
|     AND runs.type IN (1, 4)
|     AND runs.state = 1
|     AND runs.worker_id IS NULL
|   /* maybe also copy these filter predicates inside the
|      unnested query above */
|
| Edit: format and typo
| soperj wrote:
| what are the downsides?
| cube2222 wrote:
| The downside is that a PostgreSQL version update can change
| internal optimizer rules, which may result in the plan being
| changed to a less performant one again.
|
| In that case it's good to have monitoring in place to catch
| something like that. We're using Datadog Database monitoring
| which gives you a good breakdown of what the most pricey
| queries are - it uses information from the pg_stat_statements
| view, which contains a lot of useful stats about query
| execution performance.
|
| You should regularly check such statistics to see if there
| aren't any queries which are unexpectedly consuming a big
| chunk of your database resources, as that might mean you're
| missing an index, or a non-optimal query plan has been chosen
| somewhere.
| topspin wrote:
| > The downside is that a PostgreSQL version update can
| change internal optimizer rules
|
| This has actually happened relatively recently.
|
| PostgreSQL Common Table Expressions (WITH x AS (some
| query)) used to be optimization fences by default, meaning
| that you could influence the optimizer with a CTE. This is
| a well-known technique among PostgreSQL users who resort to
| it for lack of optimizer hints.
|
| In Pg12 the optimizer was enhanced to remove the
| optimization fence _by default_, so the query plans of
| existing queries automatically changed and sometimes became
| much slower as a result. If you want the old CTE behavior
| you have to write the CTE as WITH x AS MATERIALIZED (...).
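For concreteness, here is the syntax difference - a toy sketch with a made-up table. The runnable part uses SQLite via Python's sqlite3; the MATERIALIZED spelling in the trailing comment is PostgreSQL 12+ only:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE runs (id INTEGER PRIMARY KEY, state INTEGER);
    INSERT INTO runs (state) VALUES (1), (1), (2);
""")

# Plain CTE: since PostgreSQL 12 the planner may inline it into the
# outer query, so upgrading from Pg 11 could silently change plans.
query = """
    WITH active AS (SELECT id FROM runs WHERE state = 1)
    SELECT COUNT(*) FROM active
"""
active_count = con.execute(query).fetchone()[0]
print(active_count)  # → 2

# PostgreSQL 12+ spelling to restore the old fencing, per CTE
# (not valid SQLite, shown only for comparison):
#   WITH active AS MATERIALIZED (SELECT id FROM runs WHERE state = 1)
#   SELECT COUNT(*) FROM active;
```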
| Justsignedup wrote:
| heh, I had a similar-style problem - I think it was more 1000x
| than 200x - but it's because there were certain query structures
| the optimizer couldn't figure out; I had to try like 4 different
| rewrites of the SQL till the optimizer could understand it.
|
| I guess at this point I just see this insane stuff as part of my
| job, good to know that others see it as hard :)
| pkrumins wrote:
| Instead of doing this madness with complicated queries, I always
| advise splitting them into many smaller, simple queries and then
| merging the results in the code. This way you don't need to
| debug the queries, as all of them are very simple SELECTs, and
| the code that merges the results is one or two for loops.
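A toy sketch of that pattern, with made-up tables (SQLite via Python's sqlite3) - note that the merge loop is effectively a hand-rolled hash join:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE stacks (id INTEGER PRIMARY KEY, account_id INTEGER);
    CREATE TABLE runs (id INTEGER PRIMARY KEY, stack_id INTEGER, state INTEGER);
    INSERT INTO stacks VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO runs VALUES (1, 1, 1), (2, 1, 2), (3, 3, 1);
""")

# Two trivially simple queries instead of one big join...
stacks = con.execute("SELECT id, account_id FROM stacks").fetchall()
pending = con.execute("SELECT stack_id FROM runs WHERE state = 1").fetchall()

# ...merged in code: a lookup dict plus one loop, counting pending
# runs per account (effectively a hand-rolled hash join).
account_of = {stack_id: acct for stack_id, acct in stacks}
counts = {}
for (stack_id,) in pending:
    acct = account_of[stack_id]
    counts[acct] = counts.get(acct, 0) + 1

print(counts)  # → {10: 1, 20: 1}
```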
| ccleve wrote:
| This. Plus, you can use temporary tables, which limits the
| amount of application code you have to write. Put your
| intermediate results in temp tables, and build indexes on them
| if you have to. It's a cheap way of forcing the optimizer to do
| things your way.
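A minimal sketch of the temp-table variant, again with made-up tables (SQLite via Python's sqlite3; the same CREATE TEMP TABLE ... AS and CREATE INDEX statements also work in PostgreSQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE runs (id INTEGER PRIMARY KEY, stack_id INTEGER, state INTEGER);
    INSERT INTO runs VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
""")

# Materialize the intermediate result once, then index it, so later
# steps query a small, indexed temp table instead of re-deriving it.
con.executescript("""
    CREATE TEMP TABLE pending AS
        SELECT id, stack_id FROM runs WHERE state = 1;
    CREATE INDEX pending_stack ON pending (stack_id);
""")
pending_for_stack_1 = con.execute(
    "SELECT COUNT(*) FROM pending WHERE stack_id = 1").fetchone()[0]
print(pending_for_stack_1)  # → 1
```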
| cultofmetatron wrote:
| In practice that means more data being sent to the application
| code, which then has to process it. 99% of the time, you'll get
| more performance by figuring out the right way to express the
| data structure you want in SQL. The entire computation happens
| in the database, so all the data is immediately available to
| process on disk, and you only send what you need, saving
| bandwidth. If the query is slow, you need to think about how
| you set up your joins and be aggressive about your use of
| predicates in subqueries to reduce the size of temporary
| tables.
| chrisjc wrote:
| Not in every case. Some DBs allow you to reference the
| results and use the result reference in subsequent queries.
| As a result, data stays in the DB until you're actually ready
| to pull it.
|
| Or of course, you can just use temp tables.
| cultofmetatron wrote:
| that's a nice feature. does postgres have it? in that case,
| it's even MORE performant to push your data processing into
| the database via a query
| chrisjc wrote:
| I'm not too sure about Postgres honestly. I was
| specifically talking about Snowflake, although there
| might be others that do it too.
|
| https://docs.snowflake.com/en/sql-
| reference/functions/result...
|
| Sometimes you can follow this pattern and never actually
| pull data into your application, which is especially true
| when you're doing ETLs (table --> table).
| Tostino wrote:
| That can sometimes be a solution... Just keep in mind, you are
| manually writing a join algorithm every time you need to do
| that, and sometimes it will be the right one for the task,
| other times not so much. The query planner will adapt your join
| type for you instead, if you can trust it.
|
| As the shape of your data changes over time, your hard coded
| algorithm choice is not guaranteed to be the best one.
| takeda wrote:
| That won't always work. The main reason for bigger queries is
| to get exactly the data you need, so you don't have to send
| unnecessary data over the wire, which is slow.
| cube2222 wrote:
| That's more custom logic and additional code to maintain
| though.
|
| In this case we're using the Datadog Agent with its
| custom_queries directive to automatically create metrics based
| on periodically evaluated SQL queries.
| jsumrall wrote:
| Sounds like just using CTEs. This is how I write some
| complicated queries, as a few CTEs with specific logic to get
| an overall query written in a more readable way.
| [deleted]
| igammarays wrote:
| Are you happy with Aurora Serverless so far? Is it really the
| "painless scaling" it's made out to be? I'm really hesitant to
| use anything other than self-hosted Postgres with full control,
| because I'm always afraid of running into situations like these
| and not being able to do what I need to do, like have an offline
| analytical read replica. Also I imagine the cost is an order of
| magnitude more than a self-hosted instance.
| cube2222 wrote:
| Not really. But it's mostly because the support for other
| features is basically non-existent. It also seems like all
| development is happening on Aurora Serverless v2 (which I think
| is in preview state?). The scaling also isn't seamless, it
| breaks your connections if you're using prepared statements -
| we have it pinned to a specific number of capacity units.
|
| We'll probably move to plain Aurora at some point, as that
| supports all bells and whistles, but for now Aurora Serverless
| is good enough.
| no-s wrote:
| It's impressive how many developer misconceptions I see cropping
| up in these articles. The point of the query planner is to
| replace an army of programmers, not evade design faults. Using a
| server RDBMS it's trivially easy to decouple schema and query
| evolution from application. Yet somehow developers always seem to
| back themselves into situations requiring extreme coupling.
| Using plan guides in the MSSQL environment helps somewhat as a
| patchwork solution, though the deficiencies are usually related
| to obsolete statistics... (and that's not even necessary). In
| Postgres you should ensure design separation so you can rewrite
| the query easily.
| newlisp wrote:
| _Using a server RDBMS it's trivially easy to decouple schema
| and query evolution from application._
|
| Curious, how do you decouple query evolution from the DB layer
| of your application?
| dragonwriter wrote:
| > Curious, how do you decouple query evolution from the DB
| layer of your application?
|
| Use application-specific views, and the evolution driven by
| DB optimization, etc., can happen in view definitions without
| the app DB layer being involved. It's a traditional best
| practice anyway, going back decades to the time when it was
| expected an RDBMS would serve multiple apps that you needed
| to keep isolated from each others changes, but it also works
| to isolate concerns at different levels for a single-app
| stack.
|
| (Because of practical limits of views in some DBs, especially
| in the 1990s, and developer preferences for procedural code,
| a common enterprise alternative substitutes stored procs for
| views, to similar effect.)
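A toy illustration of the decoupling, with hypothetical table and view names: the application is pinned to a view, so the view body can be rewritten without touching the app query (SQLite via Python's sqlite3; the pattern is the same in PostgreSQL, which additionally offers CREATE OR REPLACE VIEW):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE runs (id INTEGER PRIMARY KEY, state INTEGER, worker_id INTEGER);
    INSERT INTO runs VALUES (1, 1, NULL), (2, 1, 7), (3, 2, NULL);

    -- The app only ever sees this name and these columns.
    CREATE VIEW app_pending_runs AS
        SELECT id FROM runs WHERE state = 1 AND worker_id IS NULL;
""")

# Application code is pinned to the view, not to the tables.
app_query = "SELECT COUNT(*) FROM app_pending_runs"
count_before = con.execute(app_query).fetchone()[0]

# The DBA rewrites the view body (e.g. to help the planner, or to
# repoint it at restructured tables); the app query is untouched.
con.executescript("""
    DROP VIEW app_pending_runs;
    CREATE VIEW app_pending_runs AS
        SELECT id FROM runs WHERE worker_id IS NULL AND state = 1;
""")
count_after = con.execute(app_query).fetchone()[0]
print(count_before, count_after)  # → 1 1
```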
| no-s wrote:
| hah, exactly. "application-specific views...Going back
| decades", or "substitute Stored Procs for views".
|
| It's not really much extra overhead to use a schema+views.
| I ask developers to avoid creating views with "Select *",
| always specify the columns desired.
|
| Further decoupling is possibly beneficial, e.g. CQRS is
| wonderful perspective for designing data models that may be
| more easily distributed or cacheable (by explicitly
| choosing to separate query schema from modification
| schema).
| newlisp wrote:
| Oh yes, views - they aren't effortless and tend to become
| numerous and specific. But they are sorely missed with the
| document store I'm working with right now ;)
| Mathnerd314 wrote:
| Similar article that instead solved it by recording more
| statistics: https://build.affinity.co/how-we-used-postgres-
| extended-stat...
| AaronFriel wrote:
| Regarding this:
|
| > Only a minuscule part of the Runs in the database are active at
| any given time. Most Stacks stay dormant most of the time.
| Whenever a user makes a commit, only the affected Stacks will
| execute Runs. There will obviously be Stacks that are constantly
| in use, but most won't. Moreover, intensive users will usually
| use private Worker Pools. The public shared Worker Pool is
| predominantly utilized by smaller users, so that's another reason
| for not seeing many Runs here.
|
| > Right now, we're iterating over all existing Stacks attached to
| the public worker pool and getting relevant Runs for each.
|
| Is there an opportunity to make this query more robust with a bit
| of denormalization, if necessary, to indicate which runs and
| which stacks are active and a filtered index (index with a WHERE
| clause) on each to allow efficiently retrieving the list of
| active entries?
|
| Best practices for database optimization tell you "don't index on
| low cardinality columns" (columns with few unique values), but
| the real advice should be "ensure each valid key of an index has
| a low cardinality relative to the size of the data" (it's OK to
| have an index that selects 1% of your data!).
|
| That is, conventional wisdom says that indexing on a boolean is
| bad, because in the best case scenario 50% of your entries are
| behind each key, and in the worst case almost all entries are
| behind either "true" or "false". There are performance impacts
| from that and there's the risk the query optimizer tries to use
| the "bad" side of the index, which would be worse than a full
| table scan.
|
| But with the improved advice and insight that 0.01% of rows have
| value "true", you can create an index with a `WHERE active =
| true` clause, et voila: you have an index that finds you exactly
| what you want, without the cost and performance issues of
| maintaining the "false" side of the index.
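A runnable sketch of such a filtered index, with made-up data (SQLite via Python's sqlite3, using active as 0/1 since SQLite lacks a real boolean; PostgreSQL's CREATE INDEX ... WHERE active = true is analogous):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, active INTEGER)")
# 10,000 rows, of which only ~0.1% are active.
con.executemany("INSERT INTO runs (active) VALUES (?)",
                [(1 if i % 1000 == 0 else 0,) for i in range(10_000)])

# The partial index stores only the rows matching its WHERE clause,
# so the huge "false" side costs nothing to maintain or store.
con.execute("CREATE INDEX runs_active_idx ON runs (id) WHERE active = 1")

active_count = con.execute(
    "SELECT COUNT(*) FROM runs WHERE active = 1").fetchone()[0]
print(active_count)  # → 10

# The plan for the filtered query should mention runs_active_idx:
for row in con.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM runs WHERE active = 1"):
    print(row)
```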
| cube2222 wrote:
| Yep, you could indeed use a partial index with the two relevant
| conditions to solve this query well. We're using those all over
| the place as well.
| deltaonefour wrote:
| This article is everything that's wrong with SQL.
|
| How about just providing us access to the query plan and allowing
| users to directly modify it? It's like needing assembly language
| to optimize a really tight section of code but having that access
| blocked off.
|
| Why do we have to rely on leakage to a higher level interface in
| order to manipulate the lower level details? Just give us manual
| access when we need it.
| shock-value wrote:
| I think SQL is generally great but I do agree with you. Access
| to lower level primitives from which one could construct their
| own query plan would be awesome. Query hints (not available in
| Postgres) would also probably get you a lot of the way there.
| hnov wrote:
| In this case denormalizing the data right into the runs table
| would likely give you better, but more importantly more
| predictable performance.
| srcreigh wrote:
| The curious part is why postgres can't estimate that the size of
| the active runs<>stacks join will be very small.
|
| A core part of query planning is estimating the best join order
| using a dynamic programming algorithm. So roughly, A(1M) <> B(1M)
| <> C(10) should use the join order A-(B-C), not (A-B)-C.
|
| I bet it's something like, postgres doesn't know the correlation
| between runs_other.worker_id and runs_other.stack_id. It seems
| like it's seeing a low number of runs_other.worker_id values,
| then _estimating the stacks are split evenly among that small
| number of runs_ in the millions.
|
| (Why it wouldn't know stacks.id correlates to one stack each,
| idk. Is stacks.id not the primary key? Is there a foreign key
| from runs? Very curious.)
|
| What happens if you follow this guide to hint Postgres as to
| which columns are statistically related? [0]
|
| Hinting PG's estimating stats may be a more persistent solution.
| Another core part of query planning is query rewriting - if
| they add a rule to simplify COUNT(select...)>0 to normal joins
| (which are equivalent, I think?) then your trick may stop
| working.
|
| [0]: https://www.postgresql.org/docs/12/multivariate-
| statistics-e...
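The two forms do return the same rows, which is what would make such a rewrite rule legal - a quick sketch of the equivalence with made-up tables (SQLite via Python's sqlite3):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE stacks (id INTEGER PRIMARY KEY, account_id INTEGER);
    CREATE TABLE runs (id INTEGER PRIMARY KEY, stack_id INTEGER);
    INSERT INTO stacks VALUES (1, 10), (2, 20);
    INSERT INTO runs VALUES (1, 1);
""")

# COUNT(...) > 0 asks for the full count before comparing...
count_form = con.execute("""
    SELECT id FROM stacks
    WHERE (SELECT COUNT(*) FROM runs WHERE runs.stack_id = stacks.id) > 0
""").fetchall()

# ...while EXISTS may stop at the first matching row; the visible
# results are identical, so rewriting one into the other preserves
# semantics while changing the plan.
exists_form = con.execute("""
    SELECT id FROM stacks
    WHERE EXISTS (SELECT 1 FROM runs WHERE runs.stack_id = stacks.id)
""").fetchall()

print(count_form, exists_form)  # → [(1,)] [(1,)]
```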
| cube2222 wrote:
| To be fair, we're using Aurora Serverless, so I'll have to dig
| deeper if we can use these kinds of hints. But thanks for the
| tip, this could be really useful!
| mst wrote:
| Given Aurora talks about itself as a replacement for pg/mysql
| that provides compatible skins, I'm honestly wondering whether
| you're dealing with the pg optimiser here in any meaningful
| way.
|
| (exactly how far along the scale from 'tweaked version' to
| 'basically just a skin on a completely new database' aurora
| is, is ... opaque to me ... there's probably something out
| there that answers that but I haven't yet managed to find it)
|
| Edited to add: zorgmonkey downthread points out they have
| support for some extensions like pg_hint_plan - https://docs.
| aws.amazon.com/AmazonRDS/latest/AuroraUserGuide... - which
| suggests at least a decent amount of pg code but I'm still
| curious exactly where the changes start.
| cube2222 wrote:
| I obviously don't know its exact internal architecture, but
| based on my digging and trying out different PostgreSQL-
| native features, it looks like it's PostgreSQL instances as
| query executors with a custom (high-availability) storage
| layer, fronted by PgBouncer.
| slt2021 wrote:
| There is no need to trick the query optimizer, just common
| sense and an understanding of the rdbms engine.
|
| The query seems like a pretty standard and often-occurring
| problem with nested queries. Every JOIN and nested query grows
| the computation due to cartesian combination, so keeping your
| predicates as narrow and specific as possible is a must.
|
| Also Nested Loop is a kiss of death for your query performance
___________________________________________________________________
(page generated 2022-01-18 23:00 UTC)