[HN Gopher] The SQLite Index Suggester
___________________________________________________________________
The SQLite Index Suggester
Author : polyrand
Score : 147 points
Date : 2022-07-05 18:02 UTC (4 hours ago)
(HTM) web link (www.sqlite.org)
(TXT) w3m dump (www.sqlite.org)
| yvan wrote:
| I find it interesting, as recently I was reading about how
| complex it would/could be to create an index suggester.
|
| https://www.depesz.com/2021/10/22/why-is-it-hard-to-automati...
| Someone wrote:
| It's like writing a compiler or interpreter: writing one is
| easy; writing a good one extremely hard.
|
| This suggester isn't very good. It takes a single query and
| suggests indexes for it. A good one would take a mix of queries
| and suggest a set of indexes, also considering the impact on
| write speed of additional indexes (table updates often need to
| update indexes, too)
|
| For the example in this article, if the table is large and the
| average number of rows with a given 'a' value is close to 1 or
| if most queries are for 'a' values that aren't in the database,
| it may even be better to do CREATE INDEX x1a ON
| x1(a);
|
| That gives you a smaller index, decreasing disk usage.
| zimpenfish wrote:
| > This suggester isn't very good. It takes a single query and
| suggests indexes for it.
|
| The underlying API can analyse multiple queries - looks like
| they've only coded up the test `.expert` command for one.
|
| From [1], "The sqlite3expert object is configured with one or
| more SQL statements by making one or more calls to
| sqlite3_expert_sql(). Each call may specify a single SQL
| statement, or multiple statements separated by semi-colons."
| then "sqlite3_expert_analyze() is called to run the
| analysis."
|
| [1] https://www.sqlite.org/src/dir?ci=trunk&name=ext/expert
| rileymat2 wrote:
| >It takes a single query and suggests indexes for it. A good
| one would take a mix of queries and suggest a set of indexes,
| also considering the impact on write speed of additional
| indexes (table updates often need to update indexes, too)
|
| This is my pet peeve with SQL Server SSMS will give you a
| missing index suggestion and cost... the problem is
| inexperienced people will take the suggestion as is and
| create way too many highly specialized indexes over time.
| lfittl wrote:
| As one of the authors of the tool that was critiqued in
| depesz' blog post from last year, I fully agree that looking
| at this on a per-query basis is too simple.
|
| Since we wrote our initial index suggestion tool for
| Postgres, we actually went back to the drawing board,
| examined the concerns brought up, and developed a new per-
| table Index Advisor for Postgres that we recently released
| [1].
|
| The gist of it: Instead of looking at the "perfect" index for
| each query, its important to test out different "good enough"
| indexes that cover multiple queries. Additionally, as you
| note, the write overhead of indexes needs to be considered
| (both from a table writes / second approach, as well as disk
| space used at a given moment in time).
|
| I think this is a fascinating field and there is lots more
| work to be done. I've also found the 2020 paper "Experimental
| Evaluation of Index Selection Algorithms" [2] pretty useful,
| that compares a few different approaches.
|
| [1] https://pganalyze.com/blog/automatic-indexing-system-
| postgre...
|
| [2] https://www.vldb.org/pvldb/vol13/p2382-kossmann.pdf
| krylon wrote:
| This sounds really cool!
|
| I've sometimes wondered why server-based RDBMSs don't offer
| something like this. Is it too hard to implement? Or did people
| just not think of it? Or do they have something like this and I
| just never learned about it?
| pronoiac wrote:
| You'd probably enjoy this other discussion on the front page:
| https://news.ycombinator.com/item?id=31990836
| [deleted]
| A321321 wrote:
| Microsoft SQL Server definitely has suggestions for missing
| indexes. The quality of the suggestions are debatable though
| gfody wrote:
| sql server has the index tuning wizard (itwiz) that will
| sample your data and make suggestions, and more lately the
| missing index stuff as a consequence of how the execution
| planner is designed (iirc it tries to generate optimal plans
| regardless of what indexes are available and when plans are
| eliminated because the necessary indexes don't exist it emits
| records for the various "missing index" dmvs)
| dmitriid wrote:
| Microsoft SQL Server query analyzer was was essential in
| identifying missing indexes. Wherever you saw "full table
| scan" on a table, you knew it was missing an index.
|
| I don't know if it's still around, but in mid-2000s it was
| light years ahead of any other database.
| striking wrote:
| https://explain.dalibo.com/ and https://explain.depesz.com/
| are indispensable tools for visualizing Postgres' EXPLAIN
| output in a similar way. I will agree that the MSSQL query
| visualizer is more visually effective, though.
| iasay wrote:
| Indeed. It's even more fun when it starts generating really
| bad execution plans for table statistics it has completely
| got wrong, almost always while I'm eating my lunch.
| krylon wrote:
| Thank you!
| abraae wrote:
| Oracle (and likely others) have had this since well into the
| last century. As you might expect since it can save a huge
| amount of money on a large database, it's part of an
| (expensive) add on.
| realPubkey wrote:
| I did something similar for a NoSQL database [1]. The biggest
| surprise was how much the query performance can change for an
| index when the data distribution changes slightly. For example
| using a real distribution for an 'age' field instead of just
| using a random number like in the test data.
|
| [1] https://rxdb.info/query-optimizer.html
| contingencies wrote:
| _Sin #5 - Unrepresentative workloads: The common assumption in
| academic research systems is that the cluster workload is
| relatively homogenous. Most research evaluations measure
| performance by running a single job on an otherwise idle
| cluster._ - Schwarzkopf et al, 'The seven deadly sins of cloud
| computing research' (2014)
|
| .. via https://github.com/globalcitizen/taoup
| jessaustin wrote:
| It seems better for birthdate to be stored in the database and
| age just to be calculated when needed?
| realPubkey wrote:
| Yes of course you better store the birthday normally. This
| was just a projection of the real data set, used in my
| testings.
| HPsquared wrote:
| The 'age' thing might be due to the difference between floats
| and ints?
|
| Integer "age" has many repeats, but random floats are unique.
| That, or random ints might be from a large pool, again not many
| repeats.
___________________________________________________________________
(page generated 2022-07-05 23:01 UTC)