[HN Gopher] Speedup of deletes on PostgreSQL
___________________________________________________________________
Speedup of deletes on PostgreSQL
Author : Ianvdl
Score : 72 points
Date : 2024-05-29 14:48 UTC (8 hours ago)
(HTM) web link (ivdl.co.za)
(TXT) w3m dump (ivdl.co.za)
| EdSchouten wrote:
| Out of curiosity, would placing all DELETE queries within a
| single transaction also help? Or does that still cause PostgreSQL
| to process each of the queries sequentially?
| londons_explore wrote:
| seperate queries within a transaction aren't optimized
| together. It wouldn't help (apart from possibly some caching
| benefits).
| peter_l_downs wrote:
| No -- most constraint checks are by default deferred until the
| end of the transaction, but you'd still need to check them, and
| without an index you'll still have to do a large scan.
|
| See https://www.postgresql.org/docs/17/sql-set-constraints.html
| for more information regarding constraint checking.
| mypalmike wrote:
| I wonder how common it is to learn to add FK indexes in Postgres
| after watching a system be surprisingly slow. I learned the same
| lesson in a very similar way.
| hn_throwaway_99 wrote:
| I'm surprised there doesn't seem to be more consensus on this
| issue, but I guess it's because changing the default after the
| fact would be backwards incompatible.
|
| I'm pretty sure MySQL creates fk indexes by default, but I
| believe MS SQL Server does not, like Postgres.
| RegnisGnaw wrote:
| I haven't touched MySQL since the 5.7 says, but I don't think
| it does. I remember back the creation of the FK will fail if
| there is no index, but it doesn't create them.
| hu3 wrote:
| https://dev.mysql.com/doc/refman/5.7/en/constraint-
| foreign-k...
|
| Documentation for 5.7 says it does create indexes for FKs
| automatically if one isn't created.
|
| > MySQL requires that foreign key columns be indexed; if
| you create a table with a foreign key constraint but no
| index on a given column, an index is created.
| hn_throwaway_99 wrote:
| I had problems with the "no foreign key indexes by default"
| issue, and as much as I love Postgres I think this is an
| unfortunate foot gun.
|
| I think it would be much better to create indexes for foreign
| keys by default, and then allow skipping index creation with
| something like a `NO INDEX` clause if explicitly desired.
| peter_l_downs wrote:
| I agree with you. What I've done in the past (and continue to
| do with new projects) is write a database-backed test that:
|
| - applies migrations
|
| - parses the resulting schema
|
| - finds all foreign key references: ref{tableA columnA ->
| tableB columnB}
|
| - finds all the indexes: index{tableA columnA [columnB...]}
|
| - checks that there is an explicit index for every reference
| column: index{tableA columnA} must exist
|
| So basically, by default when you add a new foreign key, the
| tests fail until you either explicitly add an exception OR
| create the necessary index. Easy.
|
| This strategy is also really nice for linting other relational
| properties in your database. For instance, for
| GDPR/CCPA/correctness purposes, you probably want to prevent
| deletion of certain rows unless it's done by specialized code
| with an audit log. These kinds of lints can check to make sure
| that there are no ON DELETE CASCADE foreign keys from those
| tables that would result in a surprise deletion. You can also
| check to make sure that foreign keys are either ON DELETE
| CASCADE, ON DELETE SET NULL, or explicitly covered by custom
| deletion code.
| ltbarcly3 wrote:
| Not a footgun, it is well thought out and better than what you
| propose, and less surprising. (edited to be less insulting)
|
| What makes you think that Postgres automatically making an
| arbitrary number of indexes on an arbitrary number of tables
| that you aren't trying to modify, that might be extremely bad
| for overall performance or take weeks to create, will save you
| from the rest of the things you haven't bothered to learn?
| hu3 wrote:
| I'll have to defend your parent commenter on this one.
|
| Not having indexes for FKs is on average much worse for
| overall performance. Defaults should be reasonable.
|
| In the great majority of cases you WANT to have indexes in
| FKs.
|
| > expect the universe to magically fix all of your mistakes
|
| This kind of derogatory hyperbole is not necessary nor
| productive.
|
| I should expect tools to help me avoid mistakes. Not having
| an index on FKs is, more often than not, a mistake. It is
| reasonable to expect PostgreSQL to help me here.
| stouset wrote:
| I'll play devil's advocate. To be clear I generally agree
| that foreign keys should essentially always have a
| corresponding index, and that not including an index is a
| mistake far more often than it isn't.
|
| My only counterargument is that--especially in production--
| adding indexes is expensive. Adding foreign keys is cheap.
| Latching a potentially expensive operation that can result
| in downtime to what _should_ be (and often is _expected_ to
| be) a cheap operation can cause an unexpected immediate
| loss of service. Though I believe (but am not certain) that
| in PostgreSQL 's case, it should be relatively easy to
| recover from since DDL is transactional, and I can't see
| why you wouldn't be able to abort the index-creating
| transaction in progress. In MySQL I believe it's much more
| difficult to recover from this type of situation.
|
| Not having an index on a foreign key can cause problems,
| but they tend to be of the long-term performance-reducing
| kind rather than the immediate outage kind. And in the
| event that adding a foreign key causes an issue, removing
| it is as simple as creating it.
|
| That's all I've got: you're latching a slow, table-locking
| operating to what is expected to be an immediate one. Yes,
| I understand you're only suggesting this be the default,
| but I wouldn't expect most developers to predict the
| possible implications. Especially if the migration worked
| quickly in staging, where there's less data.
|
| Edit: Actually, another one: if the index is created
| automatically, should it be removed when the foreign key is
| removed? This isn't a "problem" so much as a design issue
| with--I think--no necessarily clear, great answer. Just
| different choices with potentially-awkward tradeoffs.
| hu3 wrote:
| Correct me if I'm wrong but, FKs are rarely created for
| existing columns.
|
| You usually create the column and the FK in the same
| script. And usually starting with a NULL value for
| existing rows.
|
| And if it's a new table then there's no rows anyway.
|
| So the most common operations when creating FK's aren't
| expensive as far as I know.
|
| You know what's expensive? Creating an index on a large
| table because you or your RDMS forgot to create the index
| when the FK was created and now JOINS are crawling to
| halt.
| stouset wrote:
| FK indexes by necessity need to be placed on the
| _foreign_ table, which is just as likely to be a
| preexisting table that already contains data.
|
| To be clear I 100% agree that adding indexes later is
| extremely painful. A little care when first creating
| tables goes a long way, and I've never seen a database
| fall over due to preemptive over-indexing but I've seen
| countless do so thanks to being underindexed.
|
| Still, taking a DDL operation which is presumed to be
| essentially instantaneous and adding a default behavior
| that requires locking completely separate tables for a
| potentially-lengthy update does give me pause.
| switchbak wrote:
| You're right in that the developer ought to have control over
| the kind of index that gets created.
|
| Having an ability to lint a DB and check for missing indexes
| would be useful, that could even be useful in a vanilla
| Postgres (however you want to provide it). Perhaps auto-
| creating them is a step too far.
|
| Note that your statement: "you just don't know what you are
| doing" comes off as unnecessarily arrogant and off-putting.
| vlovich123 wrote:
| If I apply your reasoning, is there anything that is a
| footgun? I can just excuse anything as you are just expecting
| the universe to magically fix all of your mistakes & then the
| word loses all meaning.
|
| Footguns are when your expectations are subverted in
| surprising ways. It means the system is set up to point a gun
| on a hair trigger at your feet and then just wait for someone
| to bump in and set it off - you could blame the person who
| bumped into the footgun or you could consider that maybe you
| shouldn't be pointing a gun at peoples feet on a hair trigger
| & blaming the person who was unlucky or clumsy enough to bump
| into it. Subverting reasonable expectations, having defaults
| tuned for the minority situation, and having inconsistent
| defaults are all footguns in my opinion. Footguns can be
| unavoidable in many cases when you don't have any reason to
| believe there is a majority or minority usage pattern, but
| that doesn't seem to be the case here based on what the
| author & people in the thread seem to be saying.
|
| Arguing that someone needs to learn arbitrarily many things
| to properly use a tool is just gatekeeping; this isn't the
| only footgun in Postgres. If you notice, there's reflection
| going on here on whether there may be ways to improve the
| tool to begin with (e.g. maybe the default for FKs should be
| to index them given that that's what people usually do on FKs
| anyway & it's the default for PKs).
| hn_throwaway_99 wrote:
| Did someone pee in your coffee this morning?
|
| I know very clearly how it works. It's essentially a
| tradeoff: don't create indexes by default, and you don't take
| the hit of creating indexes but then hit the risk of queries
| starting to fall over due to lack of indexes when you start
| to scale. Alternatively, create the indexes (again, _by
| default_ ), with the risk that you may be creating ones you
| don't need.
|
| Importantly, all I am recommending is what the _default_
| behavior is - I still think you should be able to opt out
| when creating the FK. It 's just that (a) in my experience
| you _do_ end up wanting an index at least 80-90% of the time,
| and (b) _not_ adding indexes usually has much worse
| implications than adding them unnecessarily.
|
| Also, this is obviously an easy thing to get wrong given the
| number of times I've seen different developers hit this, and
| the fact that different DB engines have settled on different
| defaults (e.g. MySQL _does_ automatically create FK indexes
| and requires an index on all FKs).
|
| But please, continue, let me know what an idiot I am.
| saisrirampur wrote:
| Nice post! Love the unique insight here. TL;DR Create indexes on
| the foreign key columns of the table on which the foreign keys
| are defined.
| SoftTalker wrote:
| This is DBA 101 stuff. If a database is part of your software,
| you really need someone on the team who knows how it works.
| ltbarcly3 wrote:
| 100% this, however my first boss shared some wisdom with me:
|
| "The kind of people who recognize the value of expertise don't
| need to be told to look for it, the kind of people who don't
| recognize it can't be told anything."
| AnotherGoodName wrote:
| I've witnessed companies react with delight at the results
| after spending millions on consultants when all the consultants
| did was a few hours of explain query and create index.
|
| Not my problem when I see it, way to go consultants charging
| millions, but it's amazing how poorly big companies are run
| that this seriously happens.
|
| You use a database? You have someone who knows how to add
| indexes right? Right?!
| DaiPlusPlus wrote:
| Yes, but consider the marketing, uh... I mean, messaging coming
| from the NoSQL and DBaaS (CosmosDb, DynamoDb, etc) and Firebase
| crowd: "databases are _hard_ , let us manage it all for you" -
| and they've got a point: it's 2024 now, we arguably shouldn't
| need to handle those kinds of non-functional requirements by
| ourselves: a DB engine should be able to automatically infer
| the necessary indexes from the schema design, and automatically
| rebuild them asynchronously - if Google can search the web in
| under a second, then your RDBMS should have no problem querying
| your data in a fraction of the time.
|
| ...which I imagine is the impression made to a lot of (even
| most?) people who got started writing software-that-uses-a-
| database within the past decade. If you're using NodeJS then
| using a KV-store library feels far more natural than writing
| SQL in a string. At least the kids are using parameters now, so
| it's not like how every PHP+MySQL site was vulnerable to
| injection attacks...
|
| (I know that recently RDBMS now do implement automatic indexes
| based on runtime query-profiling, which is great, but it isn't
| pre-emptive, and often gets it wrong too)
|
| -------
|
| Also, who calls themselves a "DBA" anymore? That word makes me
| think of a pipe-and-suspenders type, still employed well-past
| retirement age because they're the only ones who knows how to
| keep the company's Big Iron (...or AS/400) database from
| keeling over. Thesedays it's all "Ops" - "DevOps", "SysOps",
| ..."DatabaseOps"?
| teaearlgraycold wrote:
| Google doesn't search the internet in under a second. They
| are doing a key-value lookup. Any time they change the search
| algorithm or add to their index they recompute every query
| ever searched and prepare the answers to be quickly recalled.
| arp242 wrote:
| You're not wrong, but unfortunately many teams don't.
|
| Probably my favourite "ya'll don't understand how databases
| work" was where they "reserved" space for MySQL enums; for
| example for the "active" column it would be something like:
| enum( 'active', 'deleted',
| '_futureval1', '_futureval2',
| '_futureval3', '_futureval4',
| '_futureval5', '_futureval6',
| '_futureval7', '_futureval8', '_futureval9'
| )
|
| Enums don't work like that at all; it's just a mapping of int
| to a string value for readability, and you don't need to
| "reserve" space to add future values just like you don't need
| to for ints. Adding a new enum value is easy and cheap.
| Removing them is not as it requires a full scan of all rows to
| verify they're used. Even worse, you couldn't easily rename
| enum labels (at the time, don't know if you can now), making it
| all worse than useless.
|
| Since this was all on large tables and the effort to fix it was
| relatively large, but without adding much business value, we
| never fixed it. We were basically stuck with it. It sure as
| hell annoyed me every time I looked at it.
|
| I'm not an DBA either, but spending about 5 seconds on the
| documentation for "enum" would have prevented this. This really
| doesn't require a PhD in SQL.
| mbell wrote:
| I'm not sure what the current state of things are since I
| haven't use MySQL recently but this used to be a perfectly
| valid thing to do.
|
| The issue was that MySQL doesn't use a full int to store
| enums. If your enum has 8 values, it stores in 1 byte, if it
| has more than 8, it stores it in 2 bytes. Adding that 9th
| value thus requires re-writing the entire table. So yes - it
| can make sense to "reserve space" to avoid a future table re-
| write.
|
| You also had to be careful to include `ALGORITHM=INPLACE,
| LOCK=NONE;` in your `ALTER TABLE` statement when changing the
| enum or it would lock the table and rewrite it.
| colonwqbang wrote:
| A byte would fit at least 255 different values, right? How
| often is this limit exceeded in practice.
| arp242 wrote:
| You can store 255 values in one byte, and reserving two
| bytes is not what that did.
|
| And even if I did, it still leaves the inability to
| actually rename enums without scanning the full table at
| least twice (which still doesn't seem possible in MariaDB,
| unless I missed something there).
|
| If you potentially want great flexibility you shouldn't be
| using enums in the first place but int and a relational
| mapping to another table.
| ltbarcly3 wrote:
| Find missing indexes, return SQL to create them.
| SELECT CONCAT('CREATE INDEX ', relname, '_', conname, '_ix ON ',
| nspname, '.', relname, ' ', regexp_replace(
| regexp_replace(pg_get_constraintdef(pg_constraint.oid, true),
| ' REFERENCES.*$','',''),
| 'FOREIGN KEY ','',''), ';') AS query FROM
| pg_constraint JOIN pg_class ON (conrelid =
| pg_class.oid) JOIN pg_namespace ON (relnamespace
| = pg_namespace.oid) WHERE contype = 'f' AND NOT
| EXISTS ( SELECT 1 FROM
| pg_index WHERE indrelid = conrelid AND
| conkey::int[] @> indkey::int[] AND
| indkey::int[] @> conkey::int[]);
| netcraft wrote:
| youve got an extra escape in there, probably an artifact of HN
| or something, its showing as *
| ltbarcly3 wrote:
| thank you, fixed
| heuristic81 wrote:
| This seems to have a few false positives. Multi-column indexes
| that have the first column as the foreign key work as well as a
| single column index. For more modern postgres, partial indexes
| with `WHERE column_name IS NOT NULL` on columns that can be
| null are also valid and more performant.
|
| Here's what we use in CI to check for missing indexes:
| -- Unindexed FK -- Missing indexes - For CI WITH y
| AS ( SELECT pg_catalog.format('%I', c1.relname)
| AS referencing_tbl, pg_catalog.quote_ident(a1.attname)
| AS referencing_column, (SELECT pg_get_expr(indpred,
| indrelid) FROM pg_catalog.pg_index WHERE indrelid = t.conrelid
| AND indkey[0] = t.conkey[1] AND indpred IS NOT NULL LIMIT 1)
| partial_statement FROM pg_catalog.pg_constraint t
| JOIN pg_catalog.pg_attribute a1 ON a1.attrelid = t.conrelid
| AND a1.attnum = t.conkey[1] JOIN pg_catalog.pg_class
| c1 ON c1.oid = t.conrelid JOIN pg_catalog.pg_namespace
| n1 ON n1.oid = c1.relnamespace JOIN pg_catalog.pg_class
| c2 ON c2.oid = t.confrelid JOIN pg_catalog.pg_namespace
| n2 ON n2.oid = c2.relnamespace JOIN
| pg_catalog.pg_attribute a2 ON a2.attrelid = t.confrelid AND
| a2.attnum = t.confkey[1] WHERE t.contype = 'f'
| AND NOT EXISTS ( SELECT 1 FROM
| pg_catalog.pg_index i WHERE i.indrelid = t.conrelid
| AND i.indkey[0] = t.conkey[1] AND indpred IS NULL
| ) ) SELECT referencing_tbl || '.' ||
| referencing_column as column FROM y WHERE
| (partial_statement IS NULL OR partial_statement <> ('(' ||
| referencing_column || ' IS NOT NULL)')) ORDER BY 1;
|
| Additionally I have this to specify the index creation commands
| (CONCURRENTLY is recommended for existing tables in production
| as it doesn't cause locking): -- Unindexed FK
| -- Missing indexes - Show Create Syntax WITH y AS
| ( SELECT pg_catalog.format('%I.%I',
| n1.nspname, c1.relname) AS referencing_tbl,
| pg_catalog.quote_ident(a1.attname) AS referencing_column,
| (SELECT pg_get_expr(indpred, indrelid) FROM pg_catalog.pg_index
| WHERE indrelid = t.conrelid AND indkey[0] = t.conkey[1] AND
| indpred IS NOT NULL LIMIT 1) partial_statement,
| t1.typname AS referencing_type, t.conname AS
| existing_fk_on_referencing_tbl,
| pg_catalog.format('%I.%I', n2.nspname, c2.relname) AS
| referenced_tbl,
| pg_catalog.quote_ident(a2.attname) AS referenced_column,
| t2.typname AS referenced_type,
| pg_relation_size( pg_catalog.format('%I.%I', n1.nspname,
| c1.relname) ) AS referencing_tbl_bytes,
| pg_relation_size( pg_catalog.format('%I.%I', n2.nspname,
| c2.relname) ) AS referenced_tbl_bytes,
| pg_catalog.format($$CREATE INDEX CONCURRENTLY IF NOT EXISTS %I
| ON %s%I(%I)%s;$$, c1.relname || '_' || a1.attname || 'x' , CASE
| WHEN n1.nspname = 'public' THEN '' ELSE n1.nspname || '.' END,
| c1.relname, a1.attname, CASE WHEN a1.attnotnull THEN '' ELSE '
| WHERE ' || a1.attname || ' IS NOT NULL' END) AS suggestion
| FROM pg_catalog.pg_constraint t JOIN
| pg_catalog.pg_attribute a1 ON a1.attrelid = t.conrelid AND
| a1.attnum = t.conkey[1] JOIN pg_catalog.pg_type
| t1 ON a1.atttypid = t1.oid JOIN pg_catalog.pg_class
| c1 ON c1.oid = t.conrelid JOIN
| pg_catalog.pg_namespace n1 ON n1.oid = c1.relnamespace
| JOIN pg_catalog.pg_class c2 ON c2.oid = t.confrelid
| JOIN pg_catalog.pg_namespace n2 ON n2.oid = c2.relnamespace
| JOIN pg_catalog.pg_attribute a2 ON a2.attrelid = t.confrelid
| AND a2.attnum = t.confkey[1] JOIN
| pg_catalog.pg_type t2 ON a2.atttypid = t2.oid
| WHERE t.contype = 'f' AND NOT EXISTS (
| SELECT 1 FROM pg_catalog.pg_index i
| WHERE i.indrelid = t.conrelid AND i.indkey[0] =
| t.conkey[1] AND i.indpred IS NULL )
| ) SELECT referencing_tbl,
| referencing_column,
| existing_fk_on_referencing_tbl, referenced_tbl,
| referenced_column,
| pg_size_pretty(referencing_tbl_bytes) AS referencing_tbl_size,
| pg_size_pretty(referenced_tbl_bytes) AS referenced_tbl_size,
| suggestion FROM y WHERE (partial_statement IS
| NULL OR partial_statement <> ('(' || referencing_column || ' IS
| NOT NULL)')) ORDER BY referencing_tbl_bytes
| DESC, referenced_tbl_bytes DESC,
| referencing_tbl, referenced_tbl,
| referencing_column, referenced_column;
| mbb70 wrote:
| Similar to 'If an article title poses a question, the answer is
| no', if an article promises a significant speedup of a database
| query, an index was added.
| chasil wrote:
| Lacking indexes on columns involved in a foreign key will also
| cause deadlocks in Oracle.
|
| This problem is common.
|
| "Obviously, Oracle considers deadlocks a self-induced error on
| part of the application and, for the most part, they are correct.
| Unlike in many other RDBMSs, deadlocks are so rare in Oracle they
| can be considered almost non-existent. Typically, you must come
| up with artificial conditions to get one.
|
| "The number one cause of deadlocks in the Oracle database, in my
| experience, is un-indexed foreign keys. There are two cases where
| Oracle will place a full table lock on a child table after
| modification of the parent table: a) If I update the parent
| table's primary key (a very rare occurrence if you follow the
| rules of relational databases that primary keys should be
| immutable), the child table will be locked in the absence of an
| index. b) If I delete a parent table row, the entire child table
| will be locked (in the absence of an index) as well...
|
| "So, when do you not need to index a foreign key? The answer is,
| in general, when the following conditions are met: a) You do not
| delete from the parent table. b) You do not update the parent
| table's unique/primary key value (watch for unintended updates to
| the primary key by tools! c) You do not join from the parent to
| the child (like DEPT to EMP). If you satisfy all three above,
| feel free to skip the index - it is not needed. If you do any of
| the above, be aware of the consequences. This is the one very
| rare time when Oracle tends to 'over-lock' data."
|
| -Tom Kyte, _Expert One-on-One Oracle_ First Edition, 2005.
| trollied wrote:
| Tom's great. Ask Tom taught me LOADS about Oracle back in the
| day, and I still have the book you reference.
| chasil wrote:
| This is the second edition, above deadlock discussion on page
| 211.
|
| https://javidhasanov.wordpress.com/wp-
| content/uploads/2012/0...
|
| Kyte said to search for "expert oracle database architecture
| pdf" to find these versions.
|
| https://asktom.oracle.com/ords/asktom.search?tag=tom-
| kytes-b...
| hinkley wrote:
| Why aren't indexes for FK relationships the default?
|
| If you really don't want one there should be a hint/pragma to
| turn it off.
|
| It's just such a stupid reason for a full table scan.
| marcosdumay wrote:
| On Postgres, they are. You'll get an informational message
| saying the index was automatically generated.
|
| But this is not normal behavior. I think Postgres is the only
| one that does this.
| dagss wrote:
| What popular SQL databases need is an option/hint to return an
| error instead of taking a slow query plan.
|
| That way a lot of SQL index creation -- something considered a
| black art by surprisingly many -- would just be prompted by test
| suite failures. If you don't have the right indices, your test
| fails. Simple.
|
| In this case, have TestDeleteCustomer fail, realize you need to
| add index, 5 minutes later done and learned something. Would be
| so much easier to newcomers... instead of a giant footgun and
| obscure lore that only becomes evident after you have a oot of
| data in production.
|
| Google Data Store does this, just assumes that _of course_ you
| did not look to do a full table scan. Works great. Not SQL, but
| no reason popular SQL DBs could not have an option to have query
| planners throw errors at certain points instead of always making
| a plan no matter how bad.
|
| SQL has a reputation for a steep learning curve and I blame this
| single thing -- that you get a poor plan instead of an error -- a
| lot for it.
| John23832 wrote:
| > What popular SQL databases need is an option/hint to return
| an error instead of taking a slow query plan.
|
| You can run ANALYSE on Postgres.
| dagss wrote:
| I am an MS SQL user and didn't touch postgres but can I
| assume analyse is a tool for displaying the chosen query plan
| for a query?
|
| If so it is in all DBs I would think but it is a bit too
| manual for my taste, still a big hurdle and opt-in...vs just
| writing code (including SQL code) and tests like normal and
| opt-out on getting errors if you get poor plans.
|
| But...probably some tooling could be made to do such analysis
| automatically and throw similar errors...
|
| Does statistics ever cause query plans to suddenly change on
| postgres? In MS SQL you would also need to pin the plan /
| disable statistics on tables...
| arp242 wrote:
| Analyze collects statistics the query planner uses to
| determine the query plan. It can change the resulting plan,
| yes.
|
| Production databases using different query plans sure is
| annoying and cause problems, but I'm not so sure whether
| returning errors is better. "Slow" beats "not working at
| all" in almost all cases. The typical case it will select a
| different query plan once the data grows, which is not so
| straight-forward to test for, especially since the hardware
| of your production may have quite different performance
| characteristics.
|
| Pinning the plan is temping, but has the downside you risk
| running a bad plan because what works well for your 100k
| test rows may not work equally well for your 1b actual
| rows, and testing all of that is again tricky. That's not
| really a brilliant either, and may also make your
| application slow.
|
| Just keeping an eye on slow query logs and/or query
| performance statistics is the general approach. I don't
| think it's really possible to improve on that without
| making some pretty serious trade-offs in other areas.
| dagss wrote:
| Note that I am ONLY talking about a mode to use for
| limited, trivial OLTP style queries. The kind where the
| query planner will never be in doubt -- if you just have
| the right indices in place.
|
| The kind of simple backend software queries where people
| consider NoSQL instead to avoid SQL's oddities.
|
| The mode I talk about is very inappropriate for any kind
| of reporting or analytics query or ad hoc queries etc.
|
| > "Slow" beats "not working at all" in almost all cases.
|
| In the specific context specified above, I disagree with
| this.
|
| Mainly because "not working at all" will be caught during
| testing (because you should have test coverage of your
| SQL queries). Slow = undiscovered during testing.
|
| But even assume you didn't have test coverage and the
| code made it to production -- yes I probably want a
| seldomly used "DeleteUser" API call to crash if foreign
| keys were missing indices, instead of doing it anyway and
| consume lots of DB resources.
|
| > Just keeping an eye on slow query logs and/or query
| performance statistics is the general approach.
|
| The feature I proposed was to help newbies learn SQL.
|
| This requires expertise in the team, and easily shifts
| work away from the newbies in the team writing SQL (don't
| think properly through indices during development) to the
| single SQL expert in the team.
|
| Depends a bit how your work is organized etc; I like that
| SQL indices etc are as closely linked to the backend
| development process as possible; not considered a post-
| optimization..
|
| If you get the error, you can either make the index you
| need for a perfect and trivial query plan -- or declare
| "non-trivial mode" and be back to today's situation.
| arp242 wrote:
| You don't know what the "right indexes" are, because
| sometimes "no index" _is_ the "right index". Sometimes
| because a full table scan is faster. Sometimes because
| you're okay accepting the various performance trade-offs
| (e.g. insert speed vs. update speed, storage space on
| disk).
|
| Many applications don't have tests for every single last
| trivial SQL query, and adding those just because the SQL
| server may decide to bail out because it might perhaps
| possibly could be 100ms slower is not a good way for most
| teams to spend their time.
| John23832 wrote:
| > If so it is in all DBs I would think but it is a bit too
| manual for my taste, still a big hurdle and opt-in...vs
| just writing code (including SQL code) and tests like
| normal and opt-out on getting errors if you get poor plans.
|
| The problem with that is what is considered a "poor plan"
| largely comes down to the indexes used, and suitability of
| an index is totally dependent on how it is used in the
| application.
|
| Who sees the error? The DBA? The application developer?
| What's the cutoff for "poor".
|
| The stats of the query allow those that know/care to make
| decisions. That's the one size fits all, simple, tool.
| dagss wrote:
| What I had in mind was the simple OLTP usecases. To
| compete with NoSQL in developer ergonomics.
|
| My context is people choosing NoSQL because SQL is too
| hard to learn and has too many caveats; not because of
| performance etc
|
| So basically a mode for the planner where you:
|
| - throw away statistics
|
| - consider all tables infinitely large
|
| - ...except the log(N) of an index lookup is OK
|
| - then find a non-infinite plan or crash
|
| Yes, it does not work in all cases. But it avoids giving
| SQL a bad reputation and push people towards NoSQL DBs
| for the common case of simplistic OLTP queries.
|
| The error should be seen during testing. The "DBA" sees
| it there are issues then you are deploying code without
| test coverage..
|
| For more sophisticated queries, you would not enable this
| flag, and be aware that you are doing a nontrivial query.
| tedunangst wrote:
| This seems like a very easy thing for any sort of middleware
| (or ORM) to do for you. Maybe even add typed where clauses that
| only exist for indexed columns.
| dagss wrote:
| I don't understand the idea ... do you mean a middleware that
| parse the SQL query and contains a query planner and has full
| knowledge of all the indices in the database .. or something
| else?
| marcosdumay wrote:
| What postgres needs is some command to freeze query plans. It
| doesn't need it very badly, because the planner is excellent,
| but the need exists.
|
| But what is a slow query plan in development has no relation at
| all with what would be slow to run in production. You create
| that error and the consequence will be a lot of spurious
| indexing with little practical benefit.
| hakanderyal wrote:
| Coming across that early in my freelancing career, I create
| indexes on foreign keys by default now via auto-configuring it
| with ORM.
|
| It's almost always needed, and it's easier to remove them if they
| somehow become a problem.
___________________________________________________________________
(page generated 2024-05-29 23:02 UTC)