[HN Gopher] Operating on a minimal two-core Postgres instance: Q...
       ___________________________________________________________________
        
       Operating on a minimal two-core Postgres instance: Query
       optimization insights
        
       Author : rob742
       Score  : 148 points
       Date   : 2023-11-15 14:08 UTC (1 day ago)
        
 (HTM) web link (robertrode.com)
 (TXT) w3m dump (robertrode.com)
        
       | KingOfCoders wrote:
       | This gives me hope; the current idea I'm working on is also
       | planned for a two-core Postgres cloud instance from Hetzner.
        
         | znpy wrote:
         | > This gives me hope; the current idea I'm working on is also
         | planned for a two-core Postgres cloud instance from Hetzner.
         | 
         | Keep in mind that years ago, having two CPUs (single-core
         | CPUs, with vastly worse performance) was the pinnacle of
         | technology, and still people managed to handle a decent amount
         | of data.
        
       | sgarland wrote:
       | Some of the points mentioned are sensible, but others are
       | horrible (shift joins to the application) or simply nonsensical,
       | like this:
       | 
       | > If suboptimal methods (Nested Loop, Hash Join, Merge Join,
       | etc.) are chosen for joins, this can lead to unnecessarily high
       | resource consumption.
       | 
       | Pray tell, what join strategies are left to use? Also, none of
       | those are suboptimal, given the right schema and settings. Nested
       | loop is fine, provided one side of the join is small.
       | 
       | What isn't mentioned (perhaps in a future post it will be) is how
       | table and column statistics can affect performance. That's how
       | you get nested loops when a merge join would actually be more
       | performant.
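
The statistics point is concrete enough to sketch. Below, SQLite stands in for Postgres (both use `ANALYZE`; in Postgres, per-column detail is tuned via `default_statistics_target` or `ALTER TABLE ... ALTER COLUMN ... SET STATISTICS`). Stale or missing statistics are what push the planner toward a nested loop when a merge or hash join would be cheaper. All names here are illustrative, not from the article:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER)")
db.executemany("INSERT INTO t VALUES (?, ?)",
               [(i, i % 10) for i in range(1000)])
db.execute("CREATE INDEX t_v ON t(v)")

# Collect table/index statistics for the planner (in Postgres: ANALYZE t;)
db.execute("ANALYZE")

# SQLite records its gathered stats in sqlite_stat1; Postgres exposes
# the analogous information in pg_stats / pg_class.reltuples.
stats = db.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(stats)
```

Without that `ANALYZE`, the planner falls back on defaults and can badly misestimate row counts, which is exactly how a cheap-looking nested loop gets picked over a merge join.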
        
         | OJFord wrote:
         | Maybe it means (trying to read charitably) that those are the
         | methods, and if a suboptimal one of them (for a given query,
         | schema, etc.) is chosen, then ...
         | 
         | Or it's just a continuation of the first point: any non-
         | application join is suboptimal. (Erm..)
         | 
         | Edit: in fact, re-reading TFA, I think it is the latter or
         | something like it - because the shift to application logic is
         | mentioned as a 'countermeasure' for the 'suboptimal join
         | methods'. I think it means deal with sometimes getting a bad
         | plan by taking control and doing it yourself in the
         | application. (I don't agree, just trying to understand.)
        
         | robertlagrant wrote:
         | If you're joining in the application, why use a relational
         | database?
        
           | horse_dung wrote:
           | This is so true. If you just want the data then use a
           | different store (e.g. NoSQL) and everything will be quicker,
           | lighter weight and slower to develop...
        
             | paulddraper wrote:
             | What does "lighter weight" mean?
             | 
             | Smaller install size? Smaller memory footprint? Schemaless?
        
           | toolz wrote:
           | Just because you're joining in the application for one code
           | path doesn't mean all of the code paths are doing the same
           | thing. There are certainly valid use-cases for joining in the
           | application because the application layer can often be
           | horizontally scaled more cheaply/easily than the persistence
           | layer.
           | 
           | So in a scenario where you have plenty of network IO being
           | underutilized for your database, it's not unimaginable that
           | joining in the application layer would be preferable.
        
           | pard68 wrote:
           | FWIW the article never says to join at the app level. GP read
           | wrong or something. The article says to shift logic to the
           | application.
        
             | robertlagrant wrote:
             | Ah - fair enough. I knew I should've checked the article
             | first before replying!
        
           | OJFord wrote:
           | This may start to depend on exactly what you're willing to
           | call relational, but the idea is: because you still want the
           | schema & foreign key constraints, for example, enforced in
           | the database, you just do the actual mechanics of joining
           | the data in the application?
           |     join_users_and_widgets(
           |         run_sql('select * from users where account_id = :1',
           |                 req.account_id),
           |         run_sql('select * from widgets'),
           |     )
           | 
           | for example, where the db still enforces that
           | `widgets.owner_id` is a fk to `users.id` say. And
           | `join_users_and_widgets` only has to know the current version
           | of the schema, all rows have the same columns as it were.
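
Made concrete, that pattern looks something like the following minimal sketch, using SQLite's in-memory database (so `?` placeholders instead of `:1`). The schema, `run_sql`, and the hash-join helper are illustrative; a production version would match on named columns rather than tuple positions:

```python
import sqlite3

# The database still enforces the schema and the FK; only the join
# mechanics move into the application.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, account_id INTEGER)")
conn.execute("CREATE TABLE widgets (id INTEGER PRIMARY KEY,"
             " owner_id INTEGER REFERENCES users(id))")
conn.execute("INSERT INTO users VALUES (1, 42), (2, 99)")
conn.execute("INSERT INTO widgets VALUES (10, 1), (11, 2)")

def run_sql(query, *params):
    return conn.execute(query, params).fetchall()

def join_users_and_widgets(users, widgets):
    # A hash join in the application: index widgets by owner_id, then
    # probe with each user row.
    by_owner = {}
    for w in widgets:
        by_owner.setdefault(w[1], []).append(w)
    return [(u, w) for u in users for w in by_owner.get(u[0], [])]

pairs = join_users_and_widgets(
    run_sql("SELECT * FROM users WHERE account_id = ?", 42),
    run_sql("SELECT * FROM widgets"),
)
print(pairs)
```

Only user 1's widgets survive the join, since the `users` query already filtered on `account_id`.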
        
         | paulddraper wrote:
         | It just means PostgreSQL can choose the wrong one for a
         | specific instance, not that those are universally wrong.
         | 
         | "PostgreSQL chosing a suboptimal join methods (a 'join method'
         | is Nested Join, Hash Join, Merge Join, etc., each of which is
         | optimized for a specific situation) leads to unnecessarily high
         | resource consumption.
         | 
         | "[And unlike virtually every other DBMS, PostgreSQL doesn't
         | support specifying which method is used for a particular join.
         | So even if it happens relatively infrequently, it's very
         | difficult to correct a situation where it does choose the wrong
         | one.]"
        
           | sgarland wrote:
           | > PostgreSQL doesn't support specifying which method is used
           | for a particular join
           | 
           | Not directly, no. You can make some of them possible /
           | impossible depending on your indexing and server settings, as
           | well as the query itself. For example, if the smaller side of
           | the join can't fit into `work_mem` - which defaults to 4 MiB
           | - you won't get a hash join. You can either create smaller
           | tables, use more restrictive join conditions, or increase the
           | size of `work_mem` (but be careful not to blow up the DB,
           | since that's per connection).
        
             | paulddraper wrote:
             | And you can use the enable_* sledgehammers as well.
        
         | matthewaveryusa wrote:
         | Sometimes it's better to do things in the application instead
         | of baking it into an SQL query.
         | 
         | An example would be a subquery that yields a large temp table,
         | only to be pruned down to a single row.
         | 
         | Your choice is to either have understanding of the query
         | optimizer and formulate a query that forces the subquery with a
         | smaller temp table first, or you can bring it back to the
         | application. Either approach works, but in one case you need
         | to have a deep-ish understanding of the optimizer and SQL, and
         | in the other not so much (at the cost of an RTT).
        
           | sgarland wrote:
           | > but in one case you need to have deep-ish understanding of
           | the optimizer and SQL
           | 
           | I will never understand why devs think they need to
           | understand their chosen application well, but not SQL. Your
           | application relies on SQL, so why shouldn't you deeply
           | understand it? Relational algebra isn't that complex compared
           | to other things we take for granted as necessary.
        
           | whynotmaybe wrote:
           | Bringing more data to the app so that the app filters it
           | instead of the DB server is always wrong.
           | 
           | An app with such a solution is a sign of multiple issues:
           | either the DB schema is wrong, or the devs don't know enough
           | SQL, or both.
           | 
           | SQL has been around for decades, far longer than many of the
           | current dev frameworks. PostgreSQL (and other DBMSes) have
           | also been around for decades; they have perfected the art of
           | fetching data.
        
             | jon-wood wrote:
             | Nothing is ever "always wrong". It may be a sign that a
             | little more thought can lead to a more
             | maintainable/quick/efficient solution, but it may also be
             | that under the circumstances an approach that would be the
             | wrong one if applied widely actually is the correct
             | tradeoff between those factors in this case.
        
         | jmull wrote:
         | The article says "Shift _logic_ to the application level to
         | reduce database load."
         | 
         | (Emphasis mine.)
         | 
         | It may have been edited since you commented (perhaps because of
         | your comment), but it seems like they aren't actually
         | advocating to do joins at the application level.
        
           | 7thaccount wrote:
           | I worked at a company with an extremely complex optimization
           | application with more business rules than any human being
           | could ever comprehend. The logic has to go somewhere. Either
           | the database or the interface or the application itself I
           | guess. There are pro/cons to all. Some of the logic is hard
           | to follow in SQL compared to a general purpose language. The
           | vendor for us of course did some kind of blend where a lot of
           | SQL packages do certain validations and so on. The
           | optimization problem reads a bunch of .csv files from the
           | database and then has its own rules it uses to build the math
           | problem for the mathematical solver.
        
           | sgarland wrote:
           | I could've sworn it said joins, but I'm not positive.
           | 
           | Tbf this is what Prisma does, albeit joining at its own
           | layer, then passing results back to the caller. Still
           | horrifying.
        
         | dboreham wrote:
         | Just to push against the crowd here pooh poohing client-side
         | joins: this is a standard technique for scaling when you are
         | stuck (for the time being) with a RDB and is definitely not in
         | general "horrible". It does _not_ mean pulling entire tables to
         | the client then joining there. It means splitting a query that
         | in effect walks down a chain of pointers (FKs) between tables
         | into separate client-initiated queries. There are several
         | benefits to this:
         | 
         | 1. Makes the design "nosql-ready" for when you design out the
         | RDB.
         | 
         | 2. Makes all DB queries inherently indexed and quick, allowing
         | easy perf analysis (any non-quick query must be a bug) and
         | alleviating connection pool starvation issues.
         | 
         | 3. Prevents pesky devs from adding expensive queries under the
         | radar.
         | 
         | 4. Allows more ready deployment of caching solutions.
         | 
         | 5. Recognizes that client resources scale with the number of
         | clients and are paid for by the client, while RDB resources
         | generally scale with difficulty and $$$, and are paid for by
         | you.
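
A minimal sketch of that FK-chain splitting, with SQLite standing in for the RDB (schema and names invented for illustration): each hop becomes a cheap indexed lookup instead of one multi-table join:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY);
    CREATE TABLE users (id INTEGER PRIMARY KEY, account_id INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE INDEX users_account ON users(account_id);
    CREATE INDEX orders_user ON orders(user_id);
    INSERT INTO accounts VALUES (1);
    INSERT INTO users VALUES (7, 1), (8, 2);
    INSERT INTO orders VALUES (100, 7), (101, 7), (102, 8);
""")

def orders_for_account(account_id):
    # Hop 1: account -> users, via the index on account_id.
    user_ids = [r[0] for r in db.execute(
        "SELECT id FROM users WHERE account_id = ?", (account_id,))]
    if not user_ids:
        return []
    # Hop 2: users -> orders, via the index on user_id. Each query stays
    # an indexed lookup, which is the property the comment is after.
    placeholders = ",".join("?" * len(user_ids))
    return [r[0] for r in db.execute(
        f"SELECT id FROM orders WHERE user_id IN ({placeholders})",
        user_ids)]

print(orders_for_account(1))
```

For large intermediate sets you would batch the `IN` list (or give up and join in the DB after all); the sketch shows the mechanics, not the scaling policy.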
        
       | JCharante wrote:
       | > Shift logic to the application level to reduce database load.
       | 
       | I'm not convinced by this. What type of logic are we talking
       | about?
        
         | leetrout wrote:
         | Some companies use large numbers of stored procedures for even
         | routine tasks.
        
           | JCharante wrote:
           | Ah I see. I thought it meant taking away some merge logic out
           | of the database layer.
        
             | leetrout wrote:
             | Oh I think they mean that, too. I am just trying to be
             | charitable by calling out times where it can be a lift to
             | yank something out of the DB.
             | 
             | This application-side join stuff was real popular about 15
             | years ago with Mongo and similar stores that didn't support
             | joins.
             | 
             | It is one of the many things that separated RethinkDB from
             | the document db crowd at that time.
        
         | northbymidwest wrote:
         | This is a common pattern with NoSQL databases, but really, the
         | author of this article should just spend more than $10 a month
         | on their Postgres instance.
        
         | wongarsu wrote:
         | How to operate on a tiny database server: shift all work from
         | the database server to the application servers.
         | 
         | See also: webscale databases
        
         | nijave wrote:
         | The last place I worked had some JSON columns and doing lots of
         | JSON operations tended to eat up all the Postgres CPU (the
         | servers were generally 16-32 cores)
        
           | ddorian43 wrote:
           | Native columns are better/faster.
        
             | ahoka wrote:
             | Not everything fits into those.
        
               | sgarland wrote:
               | Not with that attitude.
        
           | ttfkam wrote:
           | We had that too. Partial expression indexes mitigated most of
           | the performance problems and CPU load.
           | 
           | We then created consistently-named functions that made the
           | appropriate JSON query. Then an expression index on the
           | function. If the key was missing (NULL result), it wasn't
           | added to the index, and we stripped all JSON nulls before
           | storing.
           | 
           | It wasn't necessarily pretty, but neither was the client's
           | source data, which we had little to no control over. Best of
           | a bad situation.
           | 
           | Jsonb columns required more steps but afforded us more
           | flexibility and timeliness when the client kept asking for
           | "one more feature" at the last minute.
        
         | winrid wrote:
         | Easier to scale application CPU usage horizontally than a
         | single instance DB.
        
           | WJW wrote:
           | Even easier (and much less error prone when it comes to ACID
           | properties) would be to scale the single instance DB
           | vertically instead.
        
         | dkjaudyeqooe wrote:
         | This makes no sense and is typically the opposite of what I'd
         | build. The more business logic (various validation rules and
         | processing) you have in the database (via stored procedures),
         | the fewer problems you have, not least because all that
         | processing becomes ACID.
         | 
         | The reducing-load-on-the-database claim is nonsense, since
         | the application would have to query the database anyway, and
         | less efficiently than the database could, while adding
         | numerous round trips between application and database.
         | 
         | If you implement business logic in front end applications,
         | you're asking for trouble. Many places I've worked didn't allow
         | software outside the database to do anything more than call a
         | stored procedure.
         | 
         | Application servers did come into fashion at one point, with
         | predictable results: subtle timing bugs, inconsistent
         | processing with different results depending on which
         | application server processed the data, a general difficulty in
         | finding the source of bugs, security problems.
         | 
         | Everyone would be a lot less stressed if they did everything in
         | the database.
        
           | count wrote:
           | Oracle, DB2 or SQLServer?
           | 
           | >The reducing-load-on-the-database claim is nonsense, since
           | the application would have to query the database anyway,
           | and less efficiently than the database could, while adding
           | numerous round trips between application and database.
           | 
           | Ah, but by shifting the logic to the app, I can now query
           | multiple databases (of disparate types even! vector,
           | document, object, relational, etc.), each optimized for its
           | particular data and query types and combine the result.
           | That's very....not easy in stored procedures. I can also
           | scale out different tiers of the application based on its
           | needs (heavy compute, gpu-based, etc.). How do you do that
           | when everything is a stored proc on a database?
        
             | dkjaudyeqooe wrote:
             | Well, it depends on the requirements, doesn't it? I'm not
             | suggesting you try to stick all your processing of all
             | things inside a database; I'm saying put all critical
             | processing of data stored in said database within it. Put
             | all related data in the same database where it makes sense.
             | Whatever data doesn't belong in that database can be merely
             | referenced from it.
             | 
             | > I can now query multiple databases
             | 
             | This is another thing I disagree with. Wherever possible
             | related data should be centralised to improve ACID
             | properties and to create a reliable single "source of
             | truth". I know this is unfashionable. with everyone
             | claiming they need massive scale (which does cause some
             | issues with this design), but much like microservices, it
             | ends up being mostly busy work for programmers and related
             | professions.
             | 
             | Having said that, multiple databases, if largely orthogonal
             | to each other, present no real problems, and the same
             | design concept can be applied to each.
        
               | bvirb wrote:
               | FWIW I think I agree with you if by "critical processing
               | of data" you mean data validations (IME via constraints,
               | checks, triggers, etc...). I think where things can
               | really go wrong is when the DB accepts anything the
               | application layer thinks is valid.
               | 
               | My favorite ORM says the same thing:
               | https://sequel.jeremyevans.net/rdoc/files/doc/validations_rd...
               | 
               | I wouldn't be surprised if more people agree with that as
               | well, maybe just using different terms.
        
               | dkjaudyeqooe wrote:
               | > I think where things can really go wrong is when the DB
               | accepts anything the application layer thinks is valid.
               | 
               | No, my view is the opposite of that: the database doesn't
               | allow anything invalid to enter the database.
        
           | Galanwe wrote:
           | That is the exact opposite of my experience. I try very hard
           | to have the database as dumb as possible.
           | 
           | - Having mixed workloads on the database servers makes it
           | almost impossible to profile and tune. By definition, a
           | database answers a lot of queries from a lot of clients. If
           | these queries are hybrid data fetching / compute, then it's
           | pretty much impossible to make sense of any system metrics.
           | 
           | - Having hybrid workloads usually leads to considering DB
           | servers as black boxes (because you cannot make sense of
           | their resource usage easily). This in turn leads to having
           | hyper pumped up DB servers with both crazy RAM, crazy CPU,
           | crazy caches, crazy SSDs and crazy HDDs.
           | 
           | - DB vendors of course understand the previous point, which
           | is why Oracle switched to a "pay per core on the machine"
           | model...
           | 
           | - The DB logic layer ecosystem is most of the time subpar
           | compared to a more traditional programming language. If you
           | depend on your DB server performing a lot of work, you will
           | want to make these awesome queries accessible. That is, you
           | will want stored procs, extensions, etc. Dealing with those
           | is a huge pain to test, version, update...
           | 
           | - Overall, all RDBMSes have special SQL extensions all over
           | the place. You're probably using tons of those without even
           | knowing it. With time, vendor lock-in will become an immense
           | burden and you'll be forced to pay millions to Oracle to
           | keep your business running.
        
           | el_nahual wrote:
           | The problem with having business logic in the DB is not a
           | _performance_ problem, it's a maintainability, testing, and
           | documentation problem.
        
             | dkjaudyeqooe wrote:
             | You've demonstrated the real issue here, though not the
             | one you claim: it's that programmers have no idea how to
             | use (SQL) databases or how they work.
             | 
             | The idea that software has those negative characteristics
             | merely because it resides within a database is laughable.
        
               | setr wrote:
               | All of the PLSQL dialects are pretty much universally
               | mediocre at best. They don't (de-)compose well, they
               | don't version-control well, they don't document well,
               | they don't debug well -- they pretty much lack any of the
               | learnings and tooling that occurred in application
               | languages over the last 20 years. You're basically
               | dealing with all of the problems of a poorly designed DSL
               | made by your favorite coworker.
               | 
               | You could do something more sane like running lang-
               | extensions on the db so you have access to a "real"
               | language to specify your business logic in, but then you
               | haven't really changed anything except to run your app
               | logic on the DB _server_, which doesn't really have
               | anything to do with the RDBMS at that point.
               | 
               | Moving business logic into the schema/queries can be
               | fine, but you still have the issues that you can't
               | document it particularly well, can't version control it
               | well, can't test it well, can't trust your performance
               | tuning well, etc., all of which derives from DBMSes
               | expecting to be treated as "living systems", and has
               | nothing to do with knowing/not knowing SQL.
               | 
               | And the problem with modifying living systems is that you
               | really don't want to be fucking around with it because
               | anything could potentially send it reeling (see:
               | surgeons). You _really_ want it to be stable... so moving
               | your unstable business logic out is almost a necessity
        
               | nicoburns wrote:
               | A lot of the stuff I see people writing application code
               | for doesn't require any more complex SQL features than
               | joins, aggregations and filters. i.e the stuff plain old
               | SQL is really good at.
        
               | setr wrote:
               | That's just born of either psychosis or naivety; but even
               | when one does know SQL, there's more to consider. RDBMS's
               | are awkward systems to work with, and SQL knowledge isn't
               | the issue
               | 
               | Moving business logic into schema/queries is usually
               | fine (and perhaps the only sane place for the DB to
               | directly interact with business logic), though schema
               | changes have all the issues described. Queries get to live
               | with your application code, and so can be treated sanely,
               | but you only really get to a good setup with a decent
               | query builder library (ORM-haters smashing strings
               | together like an ape is naturally full of issues, and
               | ORM-lovers get all the fun of table/object model-mismatch
               | and poor query generation). But then you run into the
               | issue that every database library tries to be database-
               | agnostic, and you lose access to all the fancy features
               | your database provides
        
               | nicoburns wrote:
               | > but you only really get to a good setup with a decent
               | query builder library (ORM-haters smashing strings
               | together like an ape is naturally full of issues, and
               | ORM-lovers get all the fun of table/object model-mismatch
               | and poor query generation). But then you run into the
               | issue that every database library tries to be database-
               | agnostic, and you lose access to all the fancy features
               | your database provides
               | 
               | I've found that smashing strings together works
               | excellently so long as:
               | 
               | - You use something like JavaScript's template strings so
               | that interpolated fragments are inline
               | 
               | - You have named placeholder support (so that you're not
               | trying to count parameter placements which gets hugely
               | complex for optional parameters).
               | 
               | - You mainly use it for SELECTs and have a minimal query
               | builder for INSERTs/UPDATEs/UPSERTs
               | 
               | And you get full access to all of the database's
               | features.
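
For illustration, that discipline in Python with the stdlib `sqlite3` module (which supports `:name` placeholders natively; schema and helper are invented): each optional condition appends its SQL fragment and its named parameter together, so nothing is ever counted positionally:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE widgets (id INTEGER, color TEXT, size INTEGER)")
db.executemany("INSERT INTO widgets VALUES (?, ?, ?)",
               [(1, "red", 10), (2, "blue", 20), (3, "red", 30)])

def find_widgets(color=None, min_size=None):
    # Smash strings together, but keep each fragment next to the named
    # parameter it introduces; optional clauses can't shift anything.
    sql = "SELECT id FROM widgets WHERE 1 = 1"
    params = {}
    if color is not None:
        sql += " AND color = :color"
        params["color"] = color
    if min_size is not None:
        sql += " AND size >= :min_size"
        params["min_size"] = min_size
    return [r[0] for r in db.execute(sql + " ORDER BY id", params)]

print(find_widgets(color="red"))
print(find_widgets(color="red", min_size=20))
```

Values always travel through placeholders, never through interpolation, so the "string smashing" stays injection-safe while retaining full access to the database's SQL dialect.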
        
               | setr wrote:
               | If you're maintaining a layer between construction and
               | finalization (the query builder), then you're not really
               | in the string smashing group I was describing. The main
               | thing I was trying to get at there is no one should make
               | the mistake of "ORMs are shit, I can just write SQL
               | strings directly" because that road is plagued with
               | problems (because SQL is not a language that composes
               | well).
               | 
               | And it's not just named parameters; with a sane query
               | builder you don't need to totally rewrite things when
               | dealing with subqueries vs top-level, aggregations, etc.
               | 
               | As long as you don't go so far as to introduce the horror
               | of an ORM
        
               | JCharante wrote:
               | > The main thing I was trying to get at there is no one
               | should make the mistake of "ORMs are shit, I can just
               | write SQL strings directly"
               | 
               | This has actually been the road I've been heading down on
               | my personal projects. If I were to use this for work I
               | would use something like Kysely so I can have type safety
               | and make sure the application layer is updated to support
               | every migration, but I've taken time to really practice
               | SQL and write queries & migrations manually and avoid
               | ORMs completely. I love writing SQL by hand, however I
               | know that doesn't scale without some automated checking
               | going on.
        
               | dkjaudyeqooe wrote:
               | You're making my point here. RDBMSes are not any more
               | difficult or mysterious than other software, except that
               | you're more familiar with that other software and its
               | language - thus your 'doesn't X well'.
               | 
               | What you will find though is any sufficiently complicated
               | business application software implementing an ad hoc
               | informally-specified bug-ridden slow implementation of
               | half of what the RDBMS is giving you for free.
        
               | setr wrote:
               | > except for the fact that you're more familiar with that
               | software and language, thus your 'doesn't X well'.
               | 
               | That's exactly not my point. My point is that, knowing
               | SQL, knowing what RDBMS's have and what they are, RDBMS's
               | are a bad place to stuff your business logic. You have a
               | variety of options that don't work well. Schemas/queries
               | are the only relevant place where RDBMS's excel, and
               | because it's a living system, you don't really want
               | something so unstable as business logic to permeate
               | through your schema significantly.
               | 
               | It's not a matter of familiarity. It's the wrong tool for
               | the job being discussed.
        
               | dkjaudyeqooe wrote:
               | What's not a "living system"? You can update RDBMSes much
               | more cleanly and reliably than almost all other software.
               | Stored procedures (in a variety of languages) and the
               | like do not necessitate changes to your data schema, and
               | the data that ends up in your schema belongs there,
               | regardless of how it's produced. That is a data design
               | issue.
               | 
               | Being able to roll forward and back with code and data
               | (including an updated schema) that is synchronised and
               | works is a huge win. In most places where I've worked
               | with serious RDBMS systems developers have no access
               | whatsoever to production. The fact that this isn't the
               | case universally illustrates the strength of these RDBMS
               | approaches.
        
               | JCharante wrote:
               | Why can't stored procedures in a DB be version
               | controlled well? People already version control database
               | schemas and migrations, so why couldn't stored
               | procedures be version controlled just as well? You can
               | test it by creating shadow databases
               | (what Prisma does) to make sure all migrations can be
               | done on an empty database just fine, add test data, call
               | the procedure, and check the output. Genuinely please do
               | enlighten me as to why this isn't possible or doable, I
               | would love to learn more about this part of the field.
        
         | stiiv wrote:
         | Here is a scenario that I've encountered a number of times.
         | 
         | We needed to serve up some data for an API endpoint. Great!
         | We'll write a SQL query for it. Just need a couple of joins and
         | where-clause parameters -- it will be speedy, and easy to
         | maintain.
         | 
         | Before long, we are asked to make our query more robust,
         | including a couple of optional parameters. Some null-coalescing
         | in our join expressions and where-clauses is introduced.
         | Performance immediately suffers. (I've seen the same kind of
         | impact when introducing unions or function calls within such
         | expressions -- the query plan can get very messy when you try
         | to scale in this manner.)
         | 
         | So what do we do? Restructuring our data (maybe with a
         | materialized view or something) is one option.
         | 
         | A reasonable alternative is to break up the single query into a
         | few more specialized queries, and then use application logic to
         | inspect parameters and choose the best query to use for each
         | request.
         | 
         | Plenty of trade-offs to consider in both cases.
        
           | JCharante wrote:
           | I don't think there's any trade-offs in that situation. It
           | doesn't make sense to have your queries handle optional
            | parameters when you have that info already present in the
           | application layer and can choose the specialized query. I
           | suppose if you were really rushed you could end up in that
           | starting scenario.
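
The application-side dispatch described above might look like this minimal sketch (table, column, and parameter names are all hypothetical, with SQLite standing in for Postgres):

```python
import sqlite3

# One specialized query per parameter combination, instead of a single
# query full of COALESCE'd optional parameters (schema is invented).
QUERIES = {
    (): "SELECT id FROM tickets",
    ("status",): "SELECT id FROM tickets WHERE status = :status",
    ("status", "user_id"):
        "SELECT id FROM tickets WHERE status = :status AND user_id = :user_id",
}

def fetch_ticket_ids(db, **params):
    """Inspect the parameters actually supplied and pick the matching query."""
    sql = QUERIES[tuple(sorted(params))]
    return [row[0] for row in db.execute(sql, params)]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tickets "
           "(id INTEGER PRIMARY KEY, status TEXT, user_id INTEGER)")
db.executemany("INSERT INTO tickets VALUES (?, ?, ?)",
               [(1, "open", 7), (2, "closed", 7), (3, "open", 8)])
```

Each query in the table stays simple enough for the planner to optimize well; the branching lives in the application, where the parameter information already exists.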
        
       | ranting-moth wrote:
       | I agree that most DB instances are grossly underutilized (or
       | abused performance wise).
       | 
       | But an "entire company" is a tad crude unit of measure when
       | talking about optimization.
       | 
       | Good point though. I've seen companies create a completely new
       | replicated DB instance for every little pet project they start.
       | Crazy. Nice if you're a cloud provider though.
        
       | mrweasel wrote:
       | One issue I frequently ponder is: What would happen if we gave a
       | modern computer to a developer 30 or 40 years ago?
       | 
       | They'd be overwhelmed by the instruction set I suppose, but how
       | large a business could they run on a modern desktop?
        
         | narag wrote:
         | You know, some of us are still alive today :)
        
       | throwawaaarrgh wrote:
       | I don't think most people today realize how wasteful modern
       | software engineering is. They think, it's fine to be inefficient,
       | I'll just get a bigger X. And we are now blessed with technology
       | and solutions that make that a possibility, when for the longest
       | time it wasn't.
       | 
       | But you don't want to have to optimize when you don't expect to.
       | It takes time away from your features and quality, and is risky.
       | You want to predict when you'll need to grow and how much effort
       | it might take, and control costs. That's why software development
       | should be efficient from the start. By only using what you need,
       | you don't run into situations where you have to optimize to
       | survive.
       | 
       | Anyone who says to me "I'll just put everything in PostgreSQL and
       | optimize later" sounds just as naive as the person who says
       | "let's build for scalability before we have any customers". Both
       | are giving themselves problems they don't need, and trying to
       | justify it by thinking it's going to save them time. The former
       | is giving themselves problems in the long term, the latter is
       | giving themselves problems in the short term. Even if these
       | tradeoffs are acceptable to you, they are often unnecessary!
        
         | api wrote:
         | ... then they see the cloud bill.
         | 
         | Cloud companies found a way to directly monetize developers'
         | love of complexity and overengineering. It's genius.
        
           | robocat wrote:
           | Developers often love simplicity and underengineering, which
           | can also lead to performance problems.
           | 
           | My personal definition of engineering is: choosing good
           | compromises between competing needs.
        
           | ecshafer wrote:
           | This comment will be taught in business schools one day.
        
             | api wrote:
             | 1. Find pathology.
             | 
             | 2. Attach toll booth.
             | 
             | 3. Profit.
        
           | aleph_minus_one wrote:
           | > Cloud companies found a way to directly monetize
           | developers' love of complexity and overengineering. It's
           | genius.
           | 
           | The developers' tastes differ a lot. Consider the extreme
           | counterexamples
           | 
           | * the "small code" scene
           | 
           | * people who love to port modern games to C64, Amiga, retro
           | consoles, ...
           | 
           | The situation is in my opinion rather that quite some
           | managers love complexity and "future-proofness", and thus
           | hire developers who have a similar taste.
        
           | _a_a_a_ wrote:
           | IME it's idiot managers that go for love of complexity and
           | overengineering - though not all mgrs are idiots.
        
         | okeuro49 wrote:
         | > "I'll just put everything in PostgreSQL and optimize later"
         | 
         | Sounds good to me!
        
           | 0xbadcafebee wrote:
           | "I'll just drink and eat heavily in my 20s and worry about
           | exercising later"
           | 
           | Or you could... not?
        
             | yjftsjthsd-h wrote:
             | To what benefit? In my (early) twenties, I _could_ do that
             | without consequence. But what benefit is there to not doing
              | it while you can? (I mean, assuming you're doing so at a
             | non dangerous level)
        
         | zamadatix wrote:
         | A new business already has a laundry list of "I need to
         | ${thing} to survive". Being able to move one such problem a
         | couple of years in the future is already an attractive value
         | before considering most businesses won't last that long anyways
         | (for reasons unrelated to how optimized the database was). I
         | mean if you already have the know how, confidence, and
         | experience to do it right in the same amount of time then duh.
         | If not, is spending precious extra time that early on really
          | going to help you survive long enough to reach the point
          | where how optimized your infrastructure spend is becomes an
          | important problem for your cash flow?
         | 
         | Sometimes I think this then gets conflated with more
         | established businesses where continuing to start every project
         | so heavily this way does tend to create more problems than it
         | solves.
        
         | SkyPuncher wrote:
         | > Anyone who says to me "I'll just put everything in PostgreSQL
         | and optimize later" sounds just as naive as the person who says
         | "let's build for scalability before we have any customers".
         | Both are giving themselves problems they don't need
         | 
         | Both are giving themselves problems, but they are fundamentally
         | different decisions. I will almost always take the "optimize
         | later" approach as there's not always a "later".
         | 
         | For most new products, the biggest challenge is finding product
         | market fit. Once you have that, the world is your oyster. I
         | will happily pay a 10% tax after PMF to be 10% better at
         | finding PMF now.
         | 
         | If I've found PMF, I will have more revenue, more traction,
         | more budget, etc to fix the problems I created. If I don't have
         | PMF, everything goes in the trash.
        
           | adamnemecek wrote:
           | It's not 10% though, is it.
        
             | SkyPuncher wrote:
             | 10% was just a smallish number. In my experience, it's
             | actually less than 1%.
             | 
             | Code typically follows one of two paths:
             | 
             | * It's in an unused/underused part of the app. Performance
             | doesn't really matter.
             | 
             | * It's in a highly used part of the app. Performance
             | matters, but you'll never be able to predict the features
             | that get layered on top of it. Fix performance as you build
             | those new features.
        
         | nijave wrote:
         | Especially in small/medium sized businesses, hardware is
         | cheaper than payroll. It's most cost effective to pay people to
         | build features (that drive revenue) while paying a bigger
         | hardware bill than spend engineering hours optimizing hardware
         | usage.
         | 
          | At some scale there'll be a break-even point/shift.
        
         | hliyan wrote:
         | I once ran the daily trade reconciliation of an entire US
         | primary exchange on my laptop, using a PERL script (in 2007). I
         | wrote it in a couple of days, and each day's processing ran in
         | minutes. Ten years later, I witnessed a team spending weeks
         | spinning up spark clusters to perform a job that was
         | significantly smaller. At the time I thought _I_ was the person
          | who was missing some important context. Now I'm beginning to
         | realise that I was not in fact misinformed or insane.
        
           | cutemonster wrote:
           | Maybe you were missing something? You could have been a
           | manager, commanding a team of Spark engineers and an even
           | higher salary?
        
         | teaearlgraycold wrote:
         | I don't know if you've ever had to build an MVP at an early
         | stage company
        
         | numpad0 wrote:
         | In my fuzzy sentimental memory, software bloat seemed to always
         | self-regulate to 5-10 lines per second of logs or 2-3 screen
         | flickers per minute, whichever applicable. Things go slower and
          | optimizations come about, go faster and another Middleware.js
         | comes to be.
         | 
         | My working theory is this is the common denominator max
         | comfortable flickering rate for developers. There's got to be
          | sweet spot for UI, how do I say, fluidity. And that
          | fluidity usually isn't controlled for in dev tools; the
          | tools themselves must evolve to stay at a comfortable
          | refresh rate, by natural selection. That shouldn't be too
          | weird of an idea.
        
         | mynameisash wrote:
         | > They think, it's fine to be inefficient, I'll just get a
         | bigger X. And we are now blessed with technology and solutions
         | that make that a possibility, when for the longest time it
         | wasn't.
         | 
         | Unfortunately, the culture in my group is basically this. We
         | never address things until they're dire inefficiencies; thus, I
         | spend a nontrivial amount of my time addressing workloads that
         | take 10+ hours but _need_ to be faster (and often with minor
         | changes are cut down to 2 hours).
         | 
         | Having a culture of "I'll just throw more hardware at it" / "If
         | it takes an hour or ten, I still am going to run it and get the
         | results tomorrow" is costing ridiculous amounts of money and
         | lost time. It's such an uphill battle to try to change that
         | culture.
        
       | alkonaut wrote:
       | You have to love the revival of things like "Let's render the
       | html on the server!", "Lets create the whole page with just one
       | request!", "Let's use SQL, and perhaps just ONE server!", "Try
       | using a server in your office to save on cloud bills", "Put all
       | your microservices in one process so they are cheap to run and
       | easy to deploy!".
        
         | riku_iki wrote:
         | > "Try using a server in your office to save on cloud bills"
         | 
         | and then company stops working when that server dies.
        
           | nik736 wrote:
           | You could also say the same for the cloud when your internet
           | is not working.
        
             | riku_iki wrote:
             | It is easy to setup fault tolerant, simple, low maintenance
             | and cheap for small traffic and data infra in the cloud
             | (e.g. AppEngine+CloudSQL) where provider will provide
             | 99.99% availability.
             | 
              | An internet outage can usually be mitigated through
              | multiple providers, cellular included.
        
               | narag wrote:
               | You can setup redundant local servers too.
        
               | riku_iki wrote:
               | yes, and then you need to have engineer with $300k salary
               | to support your app/postgresql cluster, load balancer,
                | fallback, backups, security, etc. with unknown quality.
        
               | narag wrote:
                | So it seems I'm severely underpaid and underemployed.
               | Seriously I'm wasting my time reading HN.
        
       | jabart wrote:
       | Whatever is spent optimizing this to two cores is hours wasted
       | when you could expand. Developer time is still cheaper than Cloud
       | Cores or Ram. Yes you should write SQL queries with basic best
       | practices and have a schema that is just right between
       | performance and normalized.
       | 
       | Not mentioned was table size. You can do a lot of odd things with
       | < 100 million rows of data and get away with it.
        
         | outside1234 wrote:
         | Yes - I think the overall headline of this is correct (most
         | people should just use PostgreSQL) but they should NOT go into
         | five zoom levels of detail on optimization or anything other
         | than really understanding and utilizing knowledge of how a
         | relational database works (utilizing indexes etc.)
        
         | sgarland wrote:
         | Nothing in the article was in the weeds, and you can do all of
          | it in RDS as well if you'd like (pg_stat_statements is an
          | available extension, I think enabled by default).
        
       | auspiv wrote:
       | Not entirely clear from the article/blog - is this an RDS
       | instance (equivalent) or EC2 instance (equivalent) in a cloud
       | with storage attached?
       | 
       | I didn't think it was possible to edit postgres.conf on a RDS
       | type managed database. But the author also talks about scaling up
       | quickly as performance walls are hit while they work on
       | optimizing stuff.
        
         | rob742 wrote:
         | In RDS this would be possible via Parameter/Option groups
        
       | monlockandkey wrote:
        | Does anyone have any experience with Pgtune, i.e. has it made
        | a difference in performance?
       | 
       | https://pgtune.leopard.in.ua/
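
For context, pgtune's output is a postgresql.conf fragment along these lines. The values below are illustrative, in the spirit of its small-instance "web" profile for a 2-core / 4 GB box; run the tool itself for real numbers:

```ini
# postgresql.conf fragment -- illustrative, not authoritative
max_connections = 200
shared_buffers = 1GB            # roughly 25% of RAM
effective_cache_size = 3GB      # roughly 75% of RAM
maintenance_work_mem = 256MB
work_mem = 2621kB               # scales down as max_connections grows
random_page_cost = 1.1          # assumes SSD storage
```

The point of such a fragment is simply to replace the very conservative defaults with values sized to the actual machine.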
        
       | paxys wrote:
       | The optimizations are no doubt impressive, but looking at it from
       | a costs perspective - they run their company on two DB instances,
       | each with two CPU cores and 4 GB of ram. The annual cost of
       | running such a machine on AWS (with SSD included) is ~$4000. Even
       | if their DB infra was only half as efficient as it is today,
       | their costs would increase by a whopping $8000/yr.
       | 
       | How much do they pay an average software developer? How many
       | hours have developers spent on query monitoring, optimization,
       | removing joins, removing aggregations? What is the ongoing cost
       | of learning and applying these techniques to all new code written
       | at the company? Even with the most conservative estimates I'm
       | confident that it costs them several orders of magnitude more in
       | developer productivity than they are saving on hardware.
       | 
       | Optimization is good, but every company needs to recognize where
       | they are throwing away dollars to save cents. There is an optimal
       | level of inefficiency that companies _want to_ have.
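
The trade-off paxys describes reduces to simple break-even arithmetic (all figures below are hypothetical):

```python
def breakeven_hours(annual_hw_savings_usd, dev_hourly_cost_usd):
    """Hours of optimization work that the hardware savings can pay for."""
    return annual_hw_savings_usd / dev_hourly_cost_usd

# Hypothetical: $8,000/yr of hardware savings vs. a $100/hr developer
# pays for at most 80 hours of optimization work per year (ignoring
# compounding savings, learned skills, and opportunity cost).
hours = breakeven_hours(8000, 100)  # -> 80.0
```

Whether 80 hours a year of attention is "cheap" or "orders of magnitude too expensive" is exactly what the rest of this thread argues about.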
        
         | mcqueenjordan wrote:
         | How did you arrive at $4000?
         | 
         | I used https://calculator.aws/#/estimate and my number is over
         | an order of magnitude less than $8000 for 2 instances with
         | SSDs, 4GB RAM, and 2 cores.
        
           | wheaties wrote:
           | I/O, backups, and such. Just having the instances cost money.
           | Using them costs even more!
        
             | sgt wrote:
             | And egress at AWS.
        
           | SteveNuts wrote:
           | I've always thought there should be a meta "AWS Calculator"
           | Calculator tool, which takes the output from the official AWS
           | calculator and applies an algorithm to make it more realistic
           | to factor in the things people forget about (mostly data
           | transfer)
        
         | fatherzine wrote:
         | How much would they pay an average SRE to deal with random
         | fires operating a poorly designed, poorly instrumented and
         | poorly understood system?
        
           | jezovuk wrote:
           | This. Optimized system means the devs took the time to
           | understand the system, know the (most frequent) usage
           | patterns and will not be nagged by 'something is slow'
           | problems every so often. Of course, as with everything, there
           | is a danger of going overboard. (And industry is firmly
           | headed in the 'just throw more people/hardware/AI at the
           | problem' direction anyway).
        
         | YetAnotherNick wrote:
          | Reserved m6a.xlarge (4 vCPU, 16 GB RAM) costs <$1000. Assuming
         | most companies use reserved for things like DB. So yeah their
         | savings is like less than $1000, at which point it feels more
         | like a developer satisfaction project rather than having any
         | ROI on money terms.
        
           | _a_a_a_ wrote:
           | Link? ta
        
         | rob742 wrote:
         | The balancing act between one-time developer hours and the
         | recurring costs associated with scaling up database resources
         | can be challenging. If sufficient funds are available, scaling
         | up may seem like an easy decision. However, investing in the
         | development of expertise in efficient database operations is
         | usually more cost-effective in the long run. Additionally,
         | acquiring the skills to debug queries and optimize database
         | configurations can ultimately save on developer hours.
        
         | mamcx wrote:
         | > How much do they pay an average software developer?
         | 
          | Let's say that you pay a lot. Like you spend $20,000 USD
          | doing this.
         | 
         | But we forget the _cost power of software_.
         | 
          | Your monthly savings extend to infinity.
         | 
         | Your dev is now freer to do other things that are precluded in
         | a more complex scenario ("you save pennies here, but expend
         | pounds elsewhere"), because complexity DEMANDS MORE ATTENTION.
         | 
          | This is something I have observed all my life. I have _never_
          | regretted optimizing for operational costs, even if this
          | means I "waste" a lot of time on it. I work in a more
         | cutthroat environment than most startups (ERPs), where my time
         | to deploy to production something is measured in hours, and is
          | an environment that is filled with the graveyards of my
          | competitors that have drunk all the kool-aid.
         | 
          | Complex operational environments ALWAYS turn out worse/more
          | complex than doing the simplest things.
         | 
          | P.S.: I think this is similar to spending time on refactoring,
          | cleanups, testing, security audits, training, and even resting.
         | 
          | All of that sounds like "costs". But you do this stuff
          | alongside your normal work; just budget a few hours a week
          | and the profits come and come.
         | 
         | Never regret using time on this stuff.
         | 
          | Do you NEED to be smart about this? Sure. But it is pure
          | muscle, and after a while it is great.
        
           | nerdponx wrote:
           | Carefully designing your application around an underpowered
           | database is almost surely _more_ complicated than just paying
           | for a more powerful database.
           | 
           | We're not talking about K8s or sharding. We're talking about
           | adding more cores and RAM to the single database node that
           | they're already using.
           | 
           | Imagine how much easier the backend dev's life would be if
           | they could just write SELECT FROM JOIN WHERE like everyone
           | else, and not have to pore over query plans and
           | design/implement/test a bunch of custom joining routines.
           | This path sounds like a high-effort artisanal approach to
           | something where it doesn't seem warranted. It's penny-wise
           | and pound-foolish in exactly the way you (very nicely)
           | described.
        
             | hobs wrote:
             | It's true, but its great experience for future things when
             | the costs actually matter - I cut my teeth at a biz that
             | refused to pay more for database hardware and now my
             | performance skills pay the bills at much nicer companies.
        
               | smallnamespace wrote:
               | Surely it's great for you as the developer to get paid
               | extra to learn transferable skills.
               | 
               | That's not necessarily true of the business.
        
               | hobs wrote:
               | Absolutely, I did not feel it as a positive at the time
               | and recommended not that much more money for
               | significantly better utilization of human time, but you
               | take what you can get with intractable CEOs.
        
         | renegade-otter wrote:
         | And now your system can scale better, saving you enough money
         | to justify the optimization work long-term.
         | 
         | The StackOverflow folks famously take their DB design and
         | performance very seriously, and now they do not need an army of
         | overpriced "architects" to plug the performance holes with more
         | insane and expensive complexity.
         | 
         | https://stackexchange.com/performance
         | 
         | Coincidentally I just wrote up a post about the importance of
         | minding your database skills:
         | 
         | https://renegadeotter.com/2023/11/12/your-database-skills-ar...
        
           | paxys wrote:
           | Stack Overflow is the poster child of over-optimizing DB
           | engineering while neglecting every other product feature that
           | end users actually want, so not sure if that should be your
           | example.
        
             | renegade-otter wrote:
             | There is a case to be made they took it to an extreme, but
             | there is too much going in the other direction as well.
             | It's not like most teams are working on the meat of the
             | product anyway - they probably spend 40% of their time
             | running a small project on Kubernetes and getting rid of
             | daily security warnings from NPM.
        
             | adrianN wrote:
             | Somehow they managed to become one of the most important
             | websites for software developers despite neglecting every
             | product feature you miss.
        
               | paxys wrote:
               | They have never made a dollar of profit in their 15 years
               | of operation and just sold themselves to a private equity
               | firm for pennies. "Most important website for software
               | developers" doesn't really mean anything unless one can
               | also be a sustainable company in the process.
        
             | hobs wrote:
             | Have you seen their code base? They did not over optimize
             | the DB engineering by any means that I could tell.
             | 
             | Having a simple monolithic database is not optimizing your
             | DB engineering by any means.
             | 
             | edit: and inventing a better ORM doesn't count either.
        
         | sharadov wrote:
         | As someone who does DB optimization for a living - there is a
         | tradeoff, I've seen far too many places where db instances run
         | with default settings out of the box, queries missing basic
         | indexes, no one even cared to run an explain analyze.
         | 
          | I've seen far too many experienced developers with no basic
         | understanding of how relational databases operate. And it's
         | gotten worse over the years - as they can keep throwing
         | hardware at the problem. Performance optimization has become a
         | lost art. This is as much a problem of negligence as much as it
         | is of education.
         | 
         | I won't even touch on proper design, which requires someone
         | with significant database expertise from the get-go ( which
         | most startups may not have).
        
           | dkjaudyeqooe wrote:
           | > Performance optimization has become a lost art.
           | 
           | It's really not very hard, basically add indexes wherever
           | lookups are done but don't go overboard, especially with
           | tables that see a lot of inserts. The DB will tell you what
           | it's doing if you ask, it's all easily investigated. If your
           | joins resist optimisation your data structure is probably
           | off.
           | 
           | > I won't even touch on proper design, which requires someone
           | with significant database expertise from the get-go
           | 
           | In more extreme cases this is true, but a few basic
           | principles and an understanding of databases will get you 95%
           | of the way there and is quite intuitive. Something as basic
           | as "don't copy data around, each individual entity exists in
           | one place, and use its key to refer to it otherwise" will
           | help enormously.
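
The "ask the DB what it's doing" advice is easy to demonstrate. This sketch uses SQLite's EXPLAIN QUERY PLAN as a stand-in for Postgres's EXPLAIN, with an invented table:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

def plan(sql):
    """Join the last column of each plan row, which describes the access path."""
    return " ".join(row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql))

QUERY = "SELECT id FROM users WHERE email = 'a@example.com'"
before = plan(QUERY)  # full table scan, e.g. "SCAN users"
db.execute("CREATE INDEX users_email_idx ON users (email)")
after = plan(QUERY)   # an index search mentioning users_email_idx
```

In Postgres the same check is `EXPLAIN (ANALYZE)` before and after `CREATE INDEX`; the workflow is identical even though the plan output looks different.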
        
             | alex7734 wrote:
             | > It's really not very hard, basically add indexes
             | 
             | It's really not but you'd be surprised how many people
             | don't know the basics of how indexes work. I've had people
             | tell me that their queries couldn't be optimized further
             | because "they had already added an index for every column
             | of the table".
             | 
             | It doesn't help that SQL, by design, hides the actual
             | algorithm doing the data access from the users while
             | simultaneously relying on them to add indexes to achieve
             | performance, which is in my humble opinion the worst
             | mistake of SQL.
        
               | corytheboyd wrote:
               | I love to ask this very simple question in interviews: if
               | a database index makes queries faster, why not add an
               | index for every column on a table?
               | 
               | I want them to say "the heck are you talking about,
               | because that's not how anything works?"... it does trip
               | some people up though that just legitimately don't
               | understand what's happening.
        
               | ako wrote:
               | That is not a SQL mistake, but an implementation choice.
               | Most modern databases can determine where indexes should
               | be added, and add these automatically. E.g.:
               | https://www.oracle.com/news/connect/oracle-database-
               | automati...
        
           | jjav wrote:
           | > I've seen far too many places where db instances run with
           | default settings out of the box, queries missing basic
           | indexes, no one even cared to run an explain analyze.
           | 
           | Indeed! One place I worked at, there was something like 10x
           | (forget exactly) gains to be had simply by editing the MySQL
           | config to match the workload. Seems like nobody bothers
           | anymore. There is _so much_ money to be saved by reducing
           | excess operating costs.
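
A workload-matched config of the kind described usually starts with just a handful of settings. This my.cnf fragment shows illustrative starting points for a dedicated MySQL host, not drop-in values:

```ini
# my.cnf fragment -- illustrative starting points, not drop-in values
[mysqld]
innodb_buffer_pool_size = 6G     # bulk of RAM on a dedicated 8 GB DB host
innodb_log_file_size    = 512M   # a larger redo log smooths write bursts
innodb_flush_method     = O_DIRECT
max_connections         = 200
```

The default buffer pool (128 MB) is sized for a shared dev box, which is why tuning it alone can produce the large gains described above.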
        
         | qingcharles wrote:
         | You can run this on Hetzner ARM hosts for about $30/year o_O
        
         | jerf wrote:
         | The last line from the article: "We must not forget that 20
         | years ago only a fraction of the performance of current
         | hardware resources was available, and despite these
         | restrictions, performance database-based applications were
         | developed."
         | 
         | I think you have a mental model in which these people are
         | spending weeks and months optimizing a system they should just
         | upgrade while the real tasks languish, but in my experience,
         | probably all they are doing is just paying a bit of attention
         | every so often and simply doing things that don't require 32
         | cores and terabytes of RAM. A lot of things don't need that
         | power.
         | 
         | 20 years ago you had a lot more real variance in power. If your
         | 100MHz server couldn't handle it, maybe your 400MHz server
         | could, and the gap between those two systems is probably even
         | bigger than the numbers imply because that 400MHz was probably
         | equipped with better everything else (RAM, disk, cache, better
         | performance/cycle for the CPU) to go with it. Nowadays, with
         | CPU speeds having stalled, it's easy to go wide and service
         | lots of diverse queries at a time, but if you have a particular
         | query that your lowest-end DB node is struggling with, there's
         | a pretty decent chance that just pushing the button to go to a
         | bigger node will have a distinctly sub-linear result on
         | performance improvement. I've certainly seen this from a number
         | of teams in a number of places. You can push a button to get
          | _more_ CPUs but you can't push a button to get much _better_
         | CPUs. Some better, yes, but not like it used to be.
         | 
         | If you just try a little, modern commodity systems can do a
         | _lot_. Even with only 4 cores and 4GB. Just try a little. Yes,
          | there are absolutely tasks that that doesn't work with; if
         | you've got 10,000 hours of video to re-encode that system isn't
         | going to cut it. But a lot of developers are _really_ bad at
          | understanding how long things should take and don't realize
         | that the thing that takes two minutes during CI really should
         | take something more like 50ms because
         | https://accidentallyquadratic.tumblr.com/ and somebody really
         | ought to look into that rather than just bump the CI node size
         | again hoping it solves the problem.
         | 
         | (And for Pete's sake, when bumping the instance size does
         | happen to not solve the problem... _turn it back down again!_
         | Despite what I just wrote I do understand there's a time and a
         | place to spray money at the problem, but when spraying money at
         | the problem doesn't even help, _STOP_.)
        
         | jjav wrote:
         | > The optimizations are no doubt impressive
         | 
         | The sad thing is that the things they discuss like monitoring
         | and optimizing slow queries have always, until recently, been
         | baseline competence.
         | 
         | It's only recently that the universal answer to something being
         | too slow is to simply spin up another dozen, or heck make it
         | two dozen, cloud instances to compensate.
         | 
         | > Even if their DB infra was only half as efficient as it is
         | today, their costs would increase by a whopping $8000/yr.
         | 
         | I think the interesting comparison is not so much the same team
         | doing things slightly less efficiently, but comparing to the
         | average status quo in the industry.
         | 
         | So many companies are spending $100K+/month (month not year) on
         | AWS costs to run very tiny minimal loads due to the attitude
         | that efficiency never matters, just spin up more instances.
         | Pretty soon you're talking real money. And by the time the CFO
         | starts screaming, it's very difficult to go back to a more sane
         | infrastructure.
        
       | latchkey wrote:
       | I'm picturing that meme with the guy sweating over pressing a
       | button.
       | 
        | Add new features that generate revenue, or optimize the
        | database.
       | 
       | Hmmm.
       | 
       | Of course the response is that you want to optimize the database
       | so that when you do eventually add new features that generate
       | revenue, you can scale to meet the demand.
       | 
       | Or is it...
        
         | Draiken wrote:
          | This meme is a bit misleading.
         | 
         | In reality tons of startups have a shitty product with zillions
         | of bugs because "move fast break things". Then later on they
         | don't survive because the only reason people even pay for them
         | is because the other startups are spending silly money on it.
         | When the well dries up, it all comes crumbling down.
         | 
         | Turns out that startup's software is too shitty and simply not
         | good enough to survive a cost cutting round.
         | 
         | People love to pretend that just adding more features always
         | beats everything else, but it's not as clear cut as people make
         | it seem.
        
           | latchkey wrote:
           | We're talking about premature optimization, not necessarily
           | shitty/buggy software.
        
       | lgkk wrote:
       | Nice.
       | 
       | I do something similar with low cost instances. I use Go and I
       | get a great bang for my buck (performance to dx to costs).
       | 
       | In my case I also leverage the cheap access to gobs of SSD or
       | nvme. I use super cheap VMs for the app servers and they come
       | with 10-20gb disk. I use this as KV embedded cache. For a few
       | bucks a month I'm able to pump out 7-10k rps at load (synthetic
       | but still).
       | 
       | I have a low end pg instance that's able to pump out 2k tps
       | (synthetic) so with the little bit of caching I'm able to save
       | some costs and also offer good performance. All under $100 a
       | month.
       | 
        | I hope that when my business does take off, this will give me a
        | good base to expand from, so I can get far without spending
        | tons of money on inefficient compute.
        
       | mannyv wrote:
       | If you want a data store, just use a NoSQL server.
       | 
       | If you want an RDBMS, learn how to use it.
       | 
       | At least with NoSQL you know you're trading ignorance for ease of
       | use.
        
       | bhaak wrote:
       | > Improve database schemas, e.g., through normalization, to
       | promote efficient filtering.
       | 
       | Cries in ActiveRecord type columns.
       | 
       | In less sad news, I recently learned that Postgres can have
        | serious problems with joining a lot of tables. I looked at a
        | join of several tables (table A: 80MM rows, table B: 30MM,
        | table C: 1MM) and instead of using the WHERE predicate to
        | reduce the result set of table A to a few thousand rows, it
        | fetched all rows from table C first.
       | 
       | The applied heuristic in this case apparently being "join tables
       | by order of lower row count".
       | 
        | Setting "join_collapse_limit" to 1 forces the query planner to
        | join the tables in exactly the order they appear in the query,
        | and that solved my performance issues neatly.
        
         | nerdponx wrote:
         | Now that CTEs are no longer optimization fences in recent
         | versions of Postgres, is there a way to somehow manually
         | declare an optimization fence around a certain region of the
         | query?
         | 
         | I've had similar problems in Snowflake with the query planner
         | taking a bad path, and no obvious way to hint otherwise.
        
           | bhaak wrote:
           | Functions are hard barriers for optimization and I don't see
           | that changing any time soon.
           | 
            | This goes so far that EXPLAIN doesn't look inside them and
            | you just get a black box of "Function call to ..." in the
            | execution plan.
           | 
           | Using auto_explain is the only workaround known to me for
           | that.
        
             | emilsedgh wrote:
             | But that's not the case when functions are inlined.
        
               | bhaak wrote:
               | Interesting. I've never seen this though. I guess that
               | only happens for simple cases?
        
           | baq wrote:
           | WITH MATERIALIZED restores the fence.
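            | 
            | For example (names invented): since Postgres 12 a CTE is
            | inlined by default, and MATERIALIZED forces it to be
            | computed once, as written, restoring the pre-12 fence:
            | 
            | ```sql
            | WITH recent AS MATERIALIZED (
            |     SELECT id
            |     FROM big_table
            |     WHERE created_at > now() - interval '1 day'
            | )
            | SELECT o.*
            | FROM other_table o
            | JOIN recent r ON r.id = o.big_id;
            | ```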
        
         | anarazel wrote:
         | > The applied heuristic in this case apparently being "join
         | tables by order of lower row count".
         | 
         | Not really - it's all cost based analysis. There are a few main
         | culprits that might be at work:
         | 
          | 1) Selectivity estimates can be way off if you have cross-
          | column correlations or your joins are over multiple
          | aggregations / DISTINCT or such.
         | 
         | 2) Join order planning is exponential in the number of joins.
         | Once there are more than geqo_threshold (default = 12) joins,
         | we don't explore the whole space anymore.
         | 
          | 3) If you have more than join_collapse_limit /
          | from_collapse_limit joins or subqueries in the FROM list,
          | we'll not flatten them anymore, to prevent the join ordering
          | problem from becoming even harder. It might be worth trying
          | to just increase them for your query.
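          | 
          | For reference, a sketch of raising those limits for the
          | current session (defaults are 8, 8, and 12 respectively;
          | the values below are just an example, not a recommendation):
          | 
          | ```sql
          | SET join_collapse_limit = 16;
          | SET from_collapse_limit = 16;
          | SET geqo_threshold = 16;
          | ```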
        
       ___________________________________________________________________
       (page generated 2023-11-16 23:00 UTC)