[HN Gopher] Speedup of deletes on PostgreSQL
       ___________________________________________________________________
        
       Speedup of deletes on PostgreSQL
        
       Author : Ianvdl
       Score  : 72 points
       Date   : 2024-05-29 14:48 UTC (8 hours ago)
        
 (HTM) web link (ivdl.co.za)
 (TXT) w3m dump (ivdl.co.za)
        
       | EdSchouten wrote:
       | Out of curiosity, would placing all DELETE queries within a
       | single transaction also help? Or does that still cause PostgreSQL
       | to process each of the queries sequentially?
        
         | londons_explore wrote:
         | Separate queries within a transaction aren't optimized
         | together. It wouldn't help (apart from possibly some caching
         | benefits).
        
         | peter_l_downs wrote:
         | No -- most constraint checks are by default deferred until the
         | end of the transaction, but you'd still need to check them, and
         | without an index you'll still have to do a large scan.
         | 
         | See https://www.postgresql.org/docs/17/sql-set-constraints.html
         | for more information regarding constraint checking.
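         | 
         | A minimal sketch, assuming a hypothetical orders.customer_id
         | -> customers(id) FK that was declared DEFERRABLE (SET
         | CONSTRAINTS has no effect on NOT DEFERRABLE constraints):
         | 
         |     BEGIN;
         |     -- Constraint name is hypothetical; postpone its check.
         |     SET CONSTRAINTS orders_customer_id_fkey DEFERRED;
         |     DELETE FROM customers WHERE id = 42;
         |     -- ... more deletes ...
         |     COMMIT;  -- the FK check (and its scan, if unindexed) runs here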
        
       | mypalmike wrote:
       | I wonder how common it is to learn to add FK indexes in Postgres
       | after watching a system be surprisingly slow. I learned the same
       | lesson in a very similar way.
        
         | hn_throwaway_99 wrote:
         | I'm surprised there doesn't seem to be more consensus on this
         | issue, but I guess it's because changing the default after the
         | fact would be backwards incompatible.
         | 
         | I'm pretty sure MySQL creates fk indexes by default, but I
         | believe MS SQL Server does not, like Postgres.
        
           | RegnisGnaw wrote:
           | I haven't touched MySQL since the 5.7 days, but I don't think
           | it does. I remember that back then the creation of the FK
           | would fail if there was no index, but it didn't create them.
        
             | hu3 wrote:
             | https://dev.mysql.com/doc/refman/5.7/en/constraint-
             | foreign-k...
             | 
             | Documentation for 5.7 says it does create indexes for FKs
             | automatically if one doesn't already exist.
             | 
             | > MySQL requires that foreign key columns be indexed; if
             | you create a table with a foreign key constraint but no
             | index on a given column, an index is created.
        
       | hn_throwaway_99 wrote:
       | I had problems with the "no foreign key indexes by default"
       | issue, and as much as I love Postgres I think this is an
       | unfortunate foot gun.
       | 
       | I think it would be much better to create indexes for foreign
       | keys by default, and then allow skipping index creation with
       | something like a `NO INDEX` clause if explicitly desired.
        
         | peter_l_downs wrote:
         | I agree with you. What I've done in the past (and continue to
         | do with new projects) is write a database-backed test that:
         | 
         | - applies migrations
         | 
         | - parses the resulting schema
         | 
         | - finds all foreign key references: ref{tableA columnA ->
         | tableB columnB}
         | 
         | - finds all the indexes: index{tableA columnA [columnB...]}
         | 
         | - checks that there is an explicit index for every reference
         | column: index{tableA columnA} must exist
         | 
         | So basically, by default when you add a new foreign key, the
         | tests fail until you either explicitly add an exception OR
         | create the necessary index. Easy.
         | 
         | This strategy is also really nice for linting other relational
         | properties in your database. For instance, for
         | GDPR/CCPA/correctness purposes, you probably want to prevent
         | deletion of certain rows unless it's done by specialized code
         | with an audit log. These kinds of lints can check to make sure
         | that there are no ON DELETE CASCADE foreign keys from those
         | tables that would result in a surprise deletion. You can also
         | check to make sure that foreign keys are either ON DELETE
         | CASCADE, ON DELETE SET NULL, or explicitly covered by custom
         | deletion code.
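         | 
         | A minimal sketch of the ON DELETE part of such a lint, assuming
         | a hypothetical protected table named users (in pg_constraint,
         | confdeltype is the ON DELETE action and 'c' means CASCADE):
         | 
         |     -- FKs on the protected table whose ON DELETE CASCADE would
         |     -- let a delete on the parent silently remove protected rows.
         |     SELECT conname, confrelid::regclass AS parent_table
         |     FROM pg_catalog.pg_constraint
         |     WHERE contype = 'f'
         |       AND conrelid = 'users'::regclass  -- hypothetical table
         |       AND confdeltype = 'c';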
        
         | ltbarcly3 wrote:
         | Not a footgun, it is well thought out and better than what you
         | propose, and less surprising. (edited to be less insulting)
         | 
         | What makes you think that Postgres automatically creating an
         | arbitrary number of indexes, on an arbitrary number of tables
         | that you aren't even trying to modify, which might be extremely
         | bad for overall performance or take weeks to build, will save
         | you from the rest of the things you haven't bothered to learn?
        
           | hu3 wrote:
           | I'll have to defend your parent commenter on this one.
           | 
           | Not having indexes for FKs is on average much worse for
           | overall performance. Defaults should be reasonable.
           | 
           | In the great majority of cases you WANT to have indexes on
           | FKs.
           | 
           | > expect the universe to magically fix all of your mistakes
           | 
           | This kind of derogatory hyperbole is neither necessary nor
           | productive.
           | 
           | I should expect tools to help me avoid mistakes. Not having
           | an index on FKs is, more often than not, a mistake. It is
           | reasonable to expect PostgreSQL to help me here.
        
             | stouset wrote:
             | I'll play devil's advocate. To be clear I generally agree
             | that foreign keys should essentially always have a
             | corresponding index, and that not including an index is a
             | mistake far more often than it isn't.
             | 
             | My only counterargument is that--especially in production--
             | adding indexes is expensive. Adding foreign keys is cheap.
             | Latching a potentially expensive operation that can result
             | in downtime to what _should_ be (and often is _expected_ to
             | be) a cheap operation can cause an unexpected immediate
             | loss of service. Though I believe (but am not certain) that
             | in PostgreSQL 's case, it should be relatively easy to
             | recover from since DDL is transactional, and I can't see
             | why you wouldn't be able to abort the index-creating
             | transaction in progress. In MySQL I believe it's much more
             | difficult to recover from this type of situation.
             | 
             | Not having an index on a foreign key can cause problems,
             | but they tend to be of the long-term performance-reducing
             | kind rather than the immediate outage kind. And in the
             | event that adding a foreign key causes an issue, removing
             | it is as simple as creating it.
             | 
             | That's all I've got: you're latching a slow, table-locking
             | operation to what is expected to be an immediate one. Yes,
             | I understand you're only suggesting this be the default,
             | but I wouldn't expect most developers to predict the
             | possible implications. Especially if the migration worked
             | quickly in staging, where there's less data.
             | 
             | Edit: Actually, another one: if the index is created
             | automatically, should it be removed when the foreign key is
             | removed? This isn't a "problem" so much as a design issue
             | with--I think--no necessarily clear, great answer. Just
             | different choices with potentially-awkward tradeoffs.
        
               | hu3 wrote:
               | Correct me if I'm wrong, but FKs are rarely created for
               | existing columns.
               | 
               | You usually create the column and the FK in the same
               | script, and usually starting with a NULL value for
               | existing rows.
               | 
               | And if it's a new table then there are no rows anyway.
               | 
               | So the most common operations when creating FKs aren't
               | expensive as far as I know.
               | 
               | You know what's expensive? Creating an index on a large
               | table because you or your RDBMS forgot to create the index
               | when the FK was created and now JOINs are crawling to a
               | halt.
        
               | stouset wrote:
               | FK indexes by necessity need to be placed on the
               | _foreign_ table, which is just as likely to be a
               | preexisting table that already contains data.
               | 
               | To be clear I 100% agree that adding indexes later is
               | extremely painful. A little care when first creating
               | tables goes a long way, and I've never seen a database
               | fall over due to preemptive over-indexing but I've seen
               | countless do so thanks to being underindexed.
               | 
               | Still, taking a DDL operation which is presumed to be
               | essentially instantaneous and adding a default behavior
               | that requires locking completely separate tables for a
               | potentially-lengthy update does give me pause.
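               | 
               | For what it's worth, a sketch of the lock-friendlier way
               | to backfill such an index on a live table (names are
               | hypothetical): CREATE INDEX CONCURRENTLY doesn't block
               | writes, though it can't run inside a transaction block.
               | 
               |     CREATE INDEX CONCURRENTLY IF NOT EXISTS
               |         orders_customer_id_idx ON orders (customer_id);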
        
           | switchbak wrote:
           | You're right in that the developer ought to have control over
           | the kind of index that gets created.
           | 
           | Having the ability to lint a DB and check for missing indexes
           | would be useful; that could even be part of vanilla Postgres
           | (however you want to provide it). Perhaps auto-creating them
           | is a step too far.
           | 
           | Note that your statement: "you just don't know what you are
           | doing" comes off as unnecessarily arrogant and off-putting.
        
           | vlovich123 wrote:
           | If I apply your reasoning, is there anything that is a
           | footgun? I can just excuse anything as you are just expecting
           | the universe to magically fix all of your mistakes & then the
           | word loses all meaning.
           | 
           | Footguns are when your expectations are subverted in
           | surprising ways. It means the system is set up to point a gun
           | on a hair trigger at your feet and then just wait for someone
           | to bump into it and set it off. You could blame the person who
           | bumped into the footgun, or you could consider that maybe you
           | shouldn't be pointing a gun at people's feet on a hair trigger
           | and blaming whoever was unlucky or clumsy enough to bump
           | into it. Subverting reasonable expectations, having defaults
           | tuned for the minority situation, and having inconsistent
           | defaults are all footguns in my opinion. Footguns can be
           | unavoidable in many cases when you don't have any reason to
           | believe there is a majority or minority usage pattern, but
           | that doesn't seem to be the case here based on what the
           | author & people in the thread seem to be saying.
           | 
           | Arguing that someone needs to learn arbitrarily many things
           | to properly use a tool is just gatekeeping; this isn't the
           | only footgun in Postgres. If you notice, there's reflection
           | going on here on whether there may be ways to improve the
           | tool to begin with (e.g. maybe the default for FKs should be
           | to index them given that that's what people usually do on FKs
           | anyway & it's the default for PKs).
        
           | hn_throwaway_99 wrote:
           | Did someone pee in your coffee this morning?
           | 
           | I know very clearly how it works. It's essentially a
           | tradeoff: don't create indexes by default, and you don't take
           | the hit of creating indexes but then hit the risk of queries
           | starting to fall over due to lack of indexes when you start
           | to scale. Alternatively, create the indexes (again, _by
           | default_), with the risk that you may be creating ones you
           | don't need.
           | 
           | Importantly, all I am recommending is what the _default_
           | behavior is - I still think you should be able to opt out
           | when creating the FK. It's just that (a) in my experience
           | you _do_ end up wanting an index at least 80-90% of the time,
           | and (b) _not_ adding indexes usually has much worse
           | implications than adding them unnecessarily.
           | 
           | Also, this is obviously an easy thing to get wrong given the
           | number of times I've seen different developers hit this, and
           | the fact that different DB engines have settled on different
           | defaults (e.g. MySQL _does_ automatically create FK indexes
           | and requires an index on all FKs).
           | 
           | But please, continue, let me know what an idiot I am.
        
       | saisrirampur wrote:
       | Nice post! Love the unique insight here. TL;DR Create indexes on
       | the foreign key columns of the table on which the foreign keys
       | are defined.
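       | 
       | A minimal example, with hypothetical names (orders.customer_id
       | references customers.id):
       | 
       |     -- Index the referencing column so a DELETE on customers
       |     -- doesn't have to scan all of orders for each deleted row.
       |     CREATE INDEX orders_customer_id_idx ON orders (customer_id);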
        
       | SoftTalker wrote:
       | This is DBA 101 stuff. If a database is part of your software,
       | you really need someone on the team who knows how it works.
        
         | ltbarcly3 wrote:
         | 100% this, however my first boss shared some wisdom with me:
         | 
         | "The kind of people who recognize the value of expertise don't
         | need to be told to look for it, the kind of people who don't
         | recognize it can't be told anything."
        
         | AnotherGoodName wrote:
         | I've witnessed companies react with delight at the results
         | after spending millions on consultants when all the consultants
         | did was a few hours of explain query and create index.
         | 
         | Not my problem when I see it - way to go, consultants charging
         | millions - but it's amazing how poorly big companies are run
         | that this actually happens.
         | 
         | You use a database? You have someone who knows how to add
         | indexes right? Right?!
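         | 
         | Roughly that workflow, sketched with hypothetical names
         | (EXPLAIN ANALYZE actually executes the statement, so wrap the
         | DELETE in a rollback; an unindexed FK shows up as time spent
         | in the constraint trigger):
         | 
         |     BEGIN;
         |     EXPLAIN ANALYZE DELETE FROM customers WHERE id = 42;
         |     ROLLBACK;
         | 
         |     -- The fix, more often than not:
         |     CREATE INDEX orders_customer_id_idx ON orders (customer_id);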
        
         | DaiPlusPlus wrote:
         | Yes, but consider the marketing, uh... I mean, messaging coming
         | from the NoSQL and DBaaS (CosmosDb, DynamoDb, etc) and Firebase
         | crowd: "databases are _hard_ , let us manage it all for you" -
         | and they've got a point: it's 2024 now, we arguably shouldn't
         | need to handle those kinds of non-functional requirements by
         | ourselves: a DB engine should be able to automatically infer
         | the necessary indexes from the schema design, and automatically
         | rebuild them asynchronously - if Google can search the web in
         | under a second, then your RDBMS should have no problem querying
         | your data in a fraction of the time.
         | 
         | ...which I imagine is the impression made to a lot of (even
         | most?) people who got started writing software-that-uses-a-
         | database within the past decade. If you're using NodeJS then
         | using a KV-store library feels far more natural than writing
         | SQL in a string. At least the kids are using parameters now, so
         | it's not like how every PHP+MySQL site was vulnerable to
         | injection attacks...
         | 
         | (I know that some RDBMSs have recently started implementing
         | automatic indexes based on runtime query-profiling, which is
         | great, but it isn't pre-emptive, and often gets it wrong too)
         | 
         | -------
         | 
         | Also, who calls themselves a "DBA" anymore? That word makes me
         | think of a pipe-and-suspenders type, still employed well-past
         | retirement age because they're the only ones who know how to
         | keep the company's Big Iron (...or AS/400) database from
         | keeling over. These days it's all "Ops" - "DevOps", "SysOps",
         | ..."DatabaseOps"?
        
           | teaearlgraycold wrote:
           | Google doesn't search the internet in under a second. They
           | are doing a key-value lookup. Any time they change the search
           | algorithm or add to their index they recompute every query
           | ever searched and prepare the answers to be quickly recalled.
        
         | arp242 wrote:
         | You're not wrong, but unfortunately many teams don't.
         | 
         | Probably my favourite "ya'll don't understand how databases
         | work" was where they "reserved" space for MySQL enums; for
         | example for the "active" column it would be something like:
         | enum(           'active',           'deleted',
         | '_futureval1',           '_futureval2',
         | '_futureval3',           '_futureval4',
         | '_futureval5',           '_futureval6',
         | '_futureval7',           '_futureval8',           '_futureval9'
         | )
         | 
         | Enums don't work like that at all; it's just a mapping of int
         | to a string value for readability, and you don't need to
         | "reserve" space to add future values just like you don't need
         | to for ints. Adding a new enum value is easy and cheap.
         | Removing them is not, as it requires a full scan of all rows to
         | verify they aren't used. Even worse, you couldn't easily rename
         | enum labels (at the time, don't know if you can now), making it
         | all worse than useless.
         | 
         | Since this was all on large tables and the effort to fix it was
         | relatively large, but without adding much business value, we
         | never fixed it. We were basically stuck with it. It sure as
         | hell annoyed me every time I looked at it.
         | 
         | I'm not a DBA either, but spending about 5 seconds on the
         | documentation for "enum" would have prevented this. This really
         | doesn't require a PhD in SQL.
        
           | mbell wrote:
           | I'm not sure what the current state of things is since I
           | haven't used MySQL recently, but this used to be a perfectly
           | valid thing to do.
           | 
           | The issue was that MySQL doesn't use a full int to store
           | enums. If your enum has 8 values, it stores it in 1 byte; if
           | it has more than 8, it stores it in 2 bytes. Adding that 9th
           | value thus requires re-writing the entire table. So yes - it
           | can make sense to "reserve space" to avoid a future table re-
           | write.
           | 
           | You also had to be careful to include `ALGORITHM=INPLACE,
           | LOCK=NONE;` in your `ALTER TABLE` statement when changing the
           | enum or it would lock the table and rewrite it.
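           | 
           | Something like this (table and column names hypothetical),
           | assuming the only change is appending a value to the end of
           | the list, which is what qualifies for an in-place change:
           | 
           |     ALTER TABLE accounts
           |         MODIFY COLUMN status
           |             ENUM('active', 'deleted', 'banned')
           |             NOT NULL DEFAULT 'active',
           |         ALGORITHM=INPLACE, LOCK=NONE;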
        
             | colonwqbang wrote:
             | A byte would fit at least 255 different values, right? How
             | often is this limit exceeded in practice?
        
             | arp242 wrote:
             | You can store 255 values in one byte, and reserving two
             | bytes is not what that did.
             | 
             | And even if it did, it still leaves the inability to
             | actually rename enums without scanning the full table at
             | least twice (which still doesn't seem possible in MariaDB,
             | unless I missed something there).
             | 
             | If you potentially want great flexibility you shouldn't be
             | using enums in the first place but int and a relational
             | mapping to another table.
        
       | ltbarcly3 wrote:
       | Find missing indexes, return SQL to create them.
       | 
       |     SELECT CONCAT(
       |                'CREATE INDEX ', relname, '_', conname, '_ix ON ',
       |                nspname, '.', relname, ' ',
       |                regexp_replace(
       |                    regexp_replace(
       |                        pg_get_constraintdef(pg_constraint.oid, true),
       |                        ' REFERENCES.*$', '', ''),
       |                    'FOREIGN KEY ', '', ''),
       |                ';') AS query
       |     FROM pg_constraint
       |     JOIN pg_class ON (conrelid = pg_class.oid)
       |     JOIN pg_namespace ON (relnamespace = pg_namespace.oid)
       |     WHERE contype = 'f' AND
       |         NOT EXISTS (
       |             SELECT 1
       |             FROM pg_index
       |             WHERE indrelid = conrelid AND
       |                 conkey::int[] @> indkey::int[] AND
       |                 indkey::int[] @> conkey::int[]);
        
         | netcraft wrote:
         | You've got an extra escape in there, probably an artifact of HN
         | or something; it's showing as *
        
           | ltbarcly3 wrote:
           | thank you, fixed
        
         | heuristic81 wrote:
         | This seems to have a few false positives. Multi-column indexes
         | that have the first column as the foreign key work as well as a
         | single column index. For more modern postgres, partial indexes
         | with `WHERE column_name IS NOT NULL` on columns that can be
         | null are also valid and more performant.
         | 
         | Here's what we use in CI to check for missing indexes:
         | 
         |     -- Unindexed FK -- Missing indexes - For CI
         |     WITH y AS (
         |         SELECT
         |             pg_catalog.format('%I', c1.relname) AS referencing_tbl,
         |             pg_catalog.quote_ident(a1.attname) AS referencing_column,
         |             (SELECT pg_get_expr(indpred, indrelid)
         |              FROM pg_catalog.pg_index
         |              WHERE indrelid = t.conrelid
         |                AND indkey[0] = t.conkey[1]
         |                AND indpred IS NOT NULL
         |              LIMIT 1) partial_statement
         |         FROM pg_catalog.pg_constraint t
         |         JOIN pg_catalog.pg_attribute a1 ON a1.attrelid = t.conrelid
         |                                        AND a1.attnum = t.conkey[1]
         |         JOIN pg_catalog.pg_class     c1 ON c1.oid = t.conrelid
         |         JOIN pg_catalog.pg_namespace n1 ON n1.oid = c1.relnamespace
         |         JOIN pg_catalog.pg_class     c2 ON c2.oid = t.confrelid
         |         JOIN pg_catalog.pg_namespace n2 ON n2.oid = c2.relnamespace
         |         JOIN pg_catalog.pg_attribute a2 ON a2.attrelid = t.confrelid
         |                                        AND a2.attnum = t.confkey[1]
         |         WHERE t.contype = 'f'
         |         AND NOT EXISTS (
         |             SELECT 1
         |             FROM pg_catalog.pg_index i
         |             WHERE i.indrelid = t.conrelid
         |             AND i.indkey[0] = t.conkey[1]
         |             AND indpred IS NULL
         |         )
         |     )
         |     SELECT referencing_tbl || '.' || referencing_column AS column
         |     FROM y
         |     WHERE (partial_statement IS NULL OR partial_statement
         |            <> ('(' || referencing_column || ' IS NOT NULL)'))
         |     ORDER BY 1;
         | 
         | Additionally I have this to specify the index creation commands
         | (CONCURRENTLY is recommended for existing tables in production
         | as it doesn't cause locking):
         | 
         |     -- Unindexed FK -- Missing indexes - Show Create Syntax
         |     WITH y AS (
         |         SELECT
         |             pg_catalog.format('%I.%I', n1.nspname, c1.relname)
         |                 AS referencing_tbl,
         |             pg_catalog.quote_ident(a1.attname) AS referencing_column,
         |             (SELECT pg_get_expr(indpred, indrelid)
         |              FROM pg_catalog.pg_index
         |              WHERE indrelid = t.conrelid
         |                AND indkey[0] = t.conkey[1]
         |                AND indpred IS NOT NULL
         |              LIMIT 1) partial_statement,
         |             t1.typname AS referencing_type,
         |             t.conname AS existing_fk_on_referencing_tbl,
         |             pg_catalog.format('%I.%I', n2.nspname, c2.relname)
         |                 AS referenced_tbl,
         |             pg_catalog.quote_ident(a2.attname) AS referenced_column,
         |             t2.typname AS referenced_type,
         |             pg_relation_size(
         |                 pg_catalog.format('%I.%I', n1.nspname, c1.relname)
         |             ) AS referencing_tbl_bytes,
         |             pg_relation_size(
         |                 pg_catalog.format('%I.%I', n2.nspname, c2.relname)
         |             ) AS referenced_tbl_bytes,
         |             pg_catalog.format(
         |                 $$CREATE INDEX CONCURRENTLY IF NOT EXISTS %I ON %s%I(%I)%s;$$,
         |                 c1.relname || '_' || a1.attname || 'x',
         |                 CASE WHEN n1.nspname = 'public' THEN ''
         |                      ELSE n1.nspname || '.' END,
         |                 c1.relname,
         |                 a1.attname,
         |                 CASE WHEN a1.attnotnull THEN ''
         |                      ELSE ' WHERE ' || a1.attname || ' IS NOT NULL' END
         |             ) AS suggestion
         |         FROM pg_catalog.pg_constraint t
         |         JOIN pg_catalog.pg_attribute a1 ON a1.attrelid = t.conrelid
         |                                        AND a1.attnum = t.conkey[1]
         |         JOIN pg_catalog.pg_type      t1 ON a1.atttypid = t1.oid
         |         JOIN pg_catalog.pg_class     c1 ON c1.oid = t.conrelid
         |         JOIN pg_catalog.pg_namespace n1 ON n1.oid = c1.relnamespace
         |         JOIN pg_catalog.pg_class     c2 ON c2.oid = t.confrelid
         |         JOIN pg_catalog.pg_namespace n2 ON n2.oid = c2.relnamespace
         |         JOIN pg_catalog.pg_attribute a2 ON a2.attrelid = t.confrelid
         |                                        AND a2.attnum = t.confkey[1]
         |         JOIN pg_catalog.pg_type      t2 ON a2.atttypid = t2.oid
         |         WHERE t.contype = 'f'
         |         AND NOT EXISTS (
         |             SELECT 1
         |             FROM pg_catalog.pg_index i
         |             WHERE i.indrelid = t.conrelid
         |             AND i.indkey[0] = t.conkey[1]
         |             AND i.indpred IS NULL
         |         )
         |     )
         |     SELECT referencing_tbl,
         |            referencing_column,
         |            existing_fk_on_referencing_tbl,
         |            referenced_tbl,
         |            referenced_column,
         |            pg_size_pretty(referencing_tbl_bytes) AS referencing_tbl_size,
         |            pg_size_pretty(referenced_tbl_bytes) AS referenced_tbl_size,
         |            suggestion
         |     FROM y
         |     WHERE (partial_statement IS NULL OR partial_statement
         |            <> ('(' || referencing_column || ' IS NOT NULL)'))
         |     ORDER BY referencing_tbl_bytes DESC,
         |              referenced_tbl_bytes DESC,
         |              referencing_tbl,
         |              referenced_tbl,
         |              referencing_column,
         |              referenced_column;
        
       | mbb70 wrote:
       | Similar to 'If an article title poses a question, the answer is
       | no', if an article promises a significant speedup of a database
       | query, an index was added.
        
       | chasil wrote:
       | Lacking indexes on columns involved in a foreign key will also
       | cause deadlocks in Oracle.
       | 
       | This problem is common.
       | 
       | "Obviously, Oracle considers deadlocks a self-induced error on
       | part of the application and, for the most part, they are correct.
       | Unlike in many other RDBMSs, deadlocks are so rare in Oracle they
       | can be considered almost non-existent. Typically, you must come
       | up with artificial conditions to get one.
       | 
       | "The number one cause of deadlocks in the Oracle database, in my
       | experience, is un-indexed foreign keys. There are two cases where
       | Oracle will place a full table lock on a child table after
       | modification of the parent table: a) If I update the parent
       | table's primary key (a very rare occurrence if you follow the
       | rules of relational databases that primary keys should be
       | immutable), the child table will be locked in the absence of an
       | index. b) If I delete a parent table row, the entire child table
       | will be locked (in the absence of an index) as well...
       | 
       | "So, when do you not need to index a foreign key? The answer is,
       | in general, when the following conditions are met: a) You do not
       | delete from the parent table. b) You do not update the parent
       | table's unique/primary key value (watch for unintended updates to
       | the primary key by tools!) c) You do not join from the parent to
       | the child (like DEPT to EMP). If you satisfy all three above,
       | feel free to skip the index - it is not needed. If you do any of
       | the above, be aware of the consequences. This is the one very
       | rare time when Oracle tends to 'over-lock' data."
       | 
       | -Tom Kyte, _Expert One-on-One Oracle_ First Edition, 2005.
        
         | trollied wrote:
         | Tom's great. Ask Tom taught me LOADS about Oracle back in the
         | day, and I still have the book you reference.
        
           | chasil wrote:
           | This is the second edition; the above deadlock discussion is
           | on page 211.
           | 
           | https://javidhasanov.wordpress.com/wp-
           | content/uploads/2012/0...
           | 
           | Kyte said to search for "expert oracle database architecture
           | pdf" to find these versions.
           | 
           | https://asktom.oracle.com/ords/asktom.search?tag=tom-
           | kytes-b...
        
         | hinkley wrote:
         | Why aren't indexes for FK relationships the default?
         | 
         | If you really don't want one there should be a hint/pragma to
         | turn it off.
         | 
         | It's just such a stupid reason for a full table scan.
        
           | marcosdumay wrote:
           | On Postgres, they are. You'll get an informational message
           | saying the index was automatically generated.
           | 
           | But this is not normal behavior. I think Postgres is the only
           | one that does this.
        
       | dagss wrote:
       | What popular SQL databases need is an option/hint to return an
       | error instead of taking a slow query plan.
       | 
       | That way a lot of SQL index creation -- something considered a
       | black art by surprisingly many -- would just be prompted by test
       | suite failures. If you don't have the right indices, your test
       | fails. Simple.
       | 
       | In this case, have TestDeleteCustomer fail, realize you need to
       | add an index, and 5 minutes later you're done and have learned
       | something. It would be so much easier for newcomers... instead of
       | a giant footgun and obscure lore that only becomes evident after
       | you have a lot of data in production.
       | 
       | Google Data Store does this; it just assumes that _of course_
       | you were not looking to do a full table scan. Works great. Not
       | SQL, but
       | no reason popular SQL DBs could not have an option to have query
       | planners throw errors at certain points instead of always making
       | a plan no matter how bad.
       | 
       | SQL has a reputation for a steep learning curve and I blame this
       | single thing -- that you get a poor plan instead of an error -- a
       | lot for it.
        
         | John23832 wrote:
         | > What popular SQL databases need is an option/hint to return
         | an error instead of taking a slow query plan.
         | 
         | You can run ANALYSE on Postgres.
        
           | dagss wrote:
           | I am an MS SQL user and haven't touched postgres, but can I
           | assume analyse is a tool for displaying the chosen query plan
           | for a query?
           | 
           | If so, it is in all DBs I would think, but it is a bit too
           | manual for my taste - still a big hurdle and opt-in... vs just
           | writing code (including SQL code) and tests like normal and
           | opting out of getting errors if you get poor plans.
           | 
           | But...probably some tooling could be made to do such analysis
           | automatically and throw similar errors...
           | 
           | Do statistics ever cause query plans to suddenly change on
           | postgres? In MS SQL you would also need to pin the plan /
           | disable statistics on tables...
        
             | arp242 wrote:
             | Analyze collects statistics the query planner uses to
             | determine the query plan. It can change the resulting plan,
             | yes.
             | 
             | Production databases using different query plans sure is
             | annoying and causes problems, but I'm not so sure whether
             | returning errors is better. "Slow" beats "not working at
             | all" in almost all cases. The typical case is that it will
             | select a different query plan once the data grows, which is
             | not so straightforward to test for, especially since your
             | production hardware may have quite different performance
             | characteristics.
             | 
             | Pinning the plan is tempting, but has the downside that you
             | risk running a bad plan, because what works well for your
             | 100k test rows may not work equally well for your 1b actual
             | rows, and testing all of that is again tricky. That's not
             | really brilliant either, and may also make your application
             | slow.
             | 
             | Just keeping an eye on slow query logs and/or query
             | performance statistics is the general approach. I don't
             | think it's really possible to improve on that without
             | making some pretty serious trade-offs in other areas.
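             | 
             | To make the split concrete (table name hypothetical):
             | ANALYZE only refreshes statistics, EXPLAIN shows the plan
             | the planner would pick without running the query, and
             | EXPLAIN ANALYZE runs it and reports what actually happened.
             | 
             |     ANALYZE orders;
             |     EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
             |     EXPLAIN ANALYZE
             |         SELECT * FROM orders WHERE customer_id = 42;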
        
               | dagss wrote:
               | Note that I am ONLY talking about a mode to use for
               | limited, trivial OLTP style queries. The kind where the
               | query planner will never be in doubt -- if you just have
               | the right indices in place.
               | 
               | The kind of simple backend software queries where people
               | consider NoSQL instead to avoid SQL's oddities.
               | 
               | The mode I talk about is very inappropriate for any kind
               | of reporting or analytics query or ad hoc queries etc.
               | 
               | > "Slow" beats "not working at all" in almost all cases.
               | 
               | In the specific context specified above, I disagree with
               | this.
               | 
               | Mainly because "not working at all" will be caught during
               | testing (because you should have test coverage of your
               | SQL queries). Slow = undiscovered during testing.
               | 
               | But even assuming you didn't have test coverage and the
               | code made it to production -- yes, I probably want a
               | seldom-used "DeleteUser" API call to crash if foreign
               | keys were missing indices, instead of doing it anyway and
               | consuming lots of DB resources.
               | 
               | > Just keeping an eye on slow query logs and/or query
               | performance statistics is the general approach.
               | 
               | The feature I proposed was to help newbies learn SQL.
               | 
               | This requires expertise on the team, and easily shifts
               | work away from the newbies writing SQL (who don't think
               | properly through indices during development) to the
               | single SQL expert on the team.
               | 
               | It depends a bit on how your work is organized, etc.; I
               | like SQL indices to be as closely linked to the backend
               | development process as possible, not treated as an
               | after-the-fact optimization.
               | 
               | If you get the error, you can either make the index you
               | need for a perfect and trivial query plan -- or declare
               | "non-trivial mode" and be back to today's situation.
        
               | arp242 wrote:
               | You don't know what the "right indexes" are, because
               | sometimes "no index" _is_ the  "right index". Sometimes
               | because a full table scan is faster. Sometimes because
               | you're okay accepting the various performance trade-offs
               | (e.g. insert speed vs. update speed, storage space on
               | disk).
               | 
               | Many applications don't have tests for every single last
               | trivial SQL query, and adding those just because the SQL
               | server may decide to bail out because it might perhaps
               | possibly could be 100ms slower is not a good way for most
               | teams to spend their time.
        
             | John23832 wrote:
             | > If so it is in all DBs I would think but it is a bit too
             | manual for my taste, still a big hurdle and opt-in...vs
             | just writing code (including SQL code) and tests like
             | normal and opt-out on getting errors if you get poor plans.
             | 
             | The problem with that is that what is considered a "poor
             | plan" largely comes down to the indexes used, and the
             | suitability of an index is totally dependent on how it is
             | used in the application.
             | 
             | Who sees the error? The DBA? The application developer?
             | What's the cutoff for "poor"?
             | 
             | The stats of the query allow those that know/care to make
             | decisions. That's the one-size-fits-all, simple tool.
        
               | dagss wrote:
               | What I had in mind was the simple OLTP use cases. To
               | compete with NoSQL in developer ergonomics.
               | 
               | My context is people choosing NoSQL because SQL is too
               | hard to learn and has too many caveats; not because of
               | performance etc
               | 
               | So basically a mode for the planner where you:
               | 
               | - throw away statistics
               | 
               | - consider all tables infinitely large
               | 
               | - ...except the log(N) of an index lookup is OK
               | 
               | - then find a non-infinite plan or crash
               | 
               | Yes, it does not work in all cases. But it avoids giving
               | SQL a bad reputation and pushing people towards NoSQL DBs
               | for the common case of simplistic OLTP queries.
               | 
               | The error should be seen during testing. If the "DBA" is
               | the one who sees it, there are issues - then you are
               | deploying code without test coverage.
               | 
               | For more sophisticated queries, you would not enable this
               | flag, and be aware that you are doing a nontrivial query.
        
         | tedunangst wrote:
         | This seems like a very easy thing for any sort of middleware
         | (or ORM) to do for you. Maybe even add typed where clauses that
         | only exist for indexed columns.
        
           | dagss wrote:
           | I don't understand the idea... do you mean a middleware that
           | parses the SQL query, contains a query planner, and has full
           | knowledge of all the indices in the database... or something
           | else?
        
         | marcosdumay wrote:
         | What postgres needs is some command to freeze query plans. It
         | doesn't need it very badly, because the planner is excellent,
         | but the need exists.
         | 
         | But what is a slow query plan in development has no relation at
         | all to what would be slow to run in production. You create
         | that error and the consequence will be a lot of spurious
         | indexing with little practical benefit.
        
       | hakanderyal wrote:
       | Having come across this early in my freelancing career, I now
       | create indexes on foreign keys by default by auto-configuring it
       | in the ORM.
       | 
       | They're almost always needed, and it's easier to remove them if
       | they somehow become a problem.
        
       ___________________________________________________________________
       (page generated 2024-05-29 23:02 UTC)