[HN Gopher] How does database indexing work? (2008)
___________________________________________________________________
How does database indexing work? (2008)
Author : luu
Score : 188 points
Date : 2022-03-07 22:30 UTC (1 days ago)
(HTM) web link (stackoverflow.com)
(TXT) w3m dump (stackoverflow.com)
| programmarchy wrote:
| I had assumed btree in Postgres was referring to a binary tree
| but that is definitely not the case. I also learned when to use a
| hash index, rather than a btree. (If I understand correctly if
| you're only using the = operator rather than something that needs
| sorting with >, <, <=, etc. then a hash will perform better.)
| efficax wrote:
| theoretically a hash lookup is O(1) so should perform better if
| the key is unique, but you give up a lot of other features
| (like partial matches, searching, and range queries).
|
| Binary trees are the ideal in a world of a pure von Neumann
| model. In practice data structures perform better when data is
| grouped together and this is what the b-tree gets you. It's
| both more disk and cache efficient in real world workflows
| dikei wrote:
| Unless your own benchmarks prove hash indexes to have major
| benefits for your queries, stick with the default B-tree index:
| you never know what queries you'll need in the future.
| latch wrote:
| That's a bit too YAGNI for me. You have a `users` table with
| an `tenant_id uuid not null` column. You aren't going to need
| uniqueness on it and you aren't going to need range
| operations on it it.
|
| And..even if you did, you could re-index it.
| settrans wrote:
| Indeed, they perform much better, but caveat emptor: hash index
| operations are not currently WAL-logged. That means that: 1) if
| you crash during a hash index update, all bets are off and
| you'll have to rebuild, and 2) they will just be wrong in a
| replicated configuration.
|
| EDIT: This is no longer true. Thanks latch!
| latch wrote:
| This was fixed in PG 10 (Oct 2017). See
| https://rhaas.blogspot.com/2017/09/postgresqls-hash-
| indexes-...
| settrans wrote:
| Oh, that's great news! (I last used evaluated them in
| August 2017)
| latch wrote:
| Less important now, but they're unsafe to use prior to PG 10.
|
| As of 14.1, they still don't support unique constraints:
| select amname, pg_indexam_has_property(oid, 'can_unique') from
| pg_am amname | pg_indexam_has_property
| --------+------------------------- heap | $?
| btree | t hash | f gist | f gin
| | f spgist | f brin | f
| dhosek wrote:
| A btree is a kind of binary tree, one which is self-balancing
| to keep O(log n) lookups from turning into O(n) lookups which
| is possible without the self-balancing. (Think about a naive
| binary tree where you insert elements in sorted order--then
| everything will be in the right (or left if you start with the
| maximum value) branch and you need to traverse the whole
| structure to find the node you're looking for.)
|
| For sorting or adjacency queries, you want a btree index. A
| hash index can give you O(1) lookup with the drawback that you
| lose ordering completely (whereas getting a sorted selection of
| records with a tree is a O(1) operation). The trick here is
| having a hashing function which is both fast and avoids
| collisions (the worst case scenario is hash(x)=1 where
| everything hashes to the same value and you end up again with
| O(n) lookups).
| masklinn wrote:
| > A btree is a kind of binary tree
|
| By definition a _binary_ tree means there are two children
| per node.
|
| Binary is the one thing a b-tree is _not_.
| programmarchy wrote:
| Right, I think you could say a binary tree is a kind of
| B-tree, though.
| jlokier wrote:
| It's not, because in a standard B-tree all the leaf nodes
| are at exactly the same depth, whereas in a binary tree
| they are not in general, unless the binary tree is
| balanced with exactly 2N leaves.
|
| This difference occurs because a B-tree can have interior
| nodes that aren't completely full, whereas a binary
| tree's interior nodes have exactly 2 children.
| getcrunk wrote:
| Uhh. A b tree is a self balancing binary (search) tree.
| coding123 wrote:
| Not binary, it can and generally has more than 2 children
| per node. That's the main difference.
| broken8ball wrote:
| Binary implies two. Both B-trees and BSTs exist in the
| universe of m-ary trees. A BST and a B-tree would be
| equivalent only if the branching factor of the B-tree was
| set to 2, but practically this is rarely the case with
| indexes, given that a higher branching factor is
| generally more favorable to lookup times ("block
| accesses") since the hight of the B-tree is reduced as m
| increases.
| masklinn wrote:
| > practically this is rarely the case with indexes
|
| Seems like it would be extremely odd to have a 2-2 btree,
| you'd just get a significantly more complicated BST no?
|
| I'd figure you'd want to fill a cacheline with something
| typical-ish, so probably at least 4 (this way if you have
| 4 children and 3 keys, the keys are 8 bytes and the child
| links are straight pointers your node is 56 bytes and you
| can add some metadata e.g. a bitmap).
|
| Apparently Rust's BTreeMap is a 6-11 btree but I don't
| know how they picked the branching factor.
| winrid wrote:
| Don't forget sorting when doing index design! With the right
| index the DB doesn't even need to sort - it can just fetch from
| the index in index order!
| blueberrychpstx wrote:
| Read here [0]
|
| Why would I do this? [1]
|
| [0]
| [https://otter.ai/u/SDndzmSLow_a2rNNzm35ohw4y9w](https://otte...
|
| [1]
|
| [https://www.notion.so/enoemos/https-stackoverflow-com-questi...
| leto_ii wrote:
| Completely besides the important point, but is it the same guy
| who asked and answered the question? What am I missing?
| yardshop wrote:
| I've seen it happen a bunch. Since StackOverflow is a Q&A
| format, if someone knows that many people will ask a question
| they know the answer to, they will write both. Preemptive
| questioning perhaps!
|
| He also wrote the other question that he links to, How to index
| a database column, with the request to get answers for each
| major type of database. So he's asking not so much to learn the
| answer but to provide a place for others to provide a catalog
| of answers.
|
| This is different of course from the case where someone asks a
| genuine question then comes back and writes an answer to
| themselves when they have learned it. This happens a lot too.
| westurner wrote:
| https://en.wikipedia.org/wiki/Database_index
|
| https://en.wikipedia-on-ipfs.org/wiki/Database_index
|
| From "Hosting SQLite Databases on GitHub Pages"
| https://news.ycombinator.com/item?id=28021766 re: edgesearch,
| HTTP/3 QUIC UDP, :
|
| > _Serverless full-text search with Cloudflare Workers,
| WebAssembly, and Roaring
| Bitmapshttps://github.com/wilsonzlin/edgesearch _
|
| >> _How it works: Edgesearch builds a reverse index by mapping
| terms to a compressed bit set (using Roaring Bitmaps) of IDs of
| documents containing the term, and creates a custom worker script
| and data to upload to Cloudflare Workers_
| manish_gill wrote:
| Relevant Book: https://www.databass.dev/
| [deleted]
| dhx wrote:
| Also remember to create tables using efficient structure packing
| and reordering[1], taking into consideration how the storage
| serializer operates (see [2] for PostgreSQL). An example provided
| by Gitlab[3] demonstrates structure packing and reordering
| principles in PostgreSQL. It may also be beneficial depending on
| types of workload to limit the number of columns in a table to
| optimise for locality of reference[4] once the data is
| deserialized into memory.
|
| [1] http://www.catb.org/esr/structure-packing/
|
| [2] https://www.postgresql.org/docs/14/storage-page-layout.html
|
| [3]
| https://docs.gitlab.com/ee/development/ordering_table_column...
|
| [4] https://en.wikipedia.org/wiki/Locality_of_reference
| snake_plissken wrote:
| Can someone expand on this statement (from the top answer, last
| paragraph):
|
| "Since indices are only used to speed up the searching for a
| matching field within the records, it stands to reason that
| indexing fields used only for output would be simply a waste of
| disk space and processing time when doing an insert or delete
| operation, and thus should be avoided."
|
| Specifically "indexing fields used only for output", what does
| this mean? I interpreted it as "indexes used only for `select`
| statements", but that would see completely counter to the main
| reason you'd want an index e.g. to speed up record retrieval.
| esnard wrote:
| "fields used only for output" means "fields that aren't used in
| any WHERE, ORDER BY, GROUP BY clauses".
|
| They are just SELECTed, i.e. sent in the output (the query
| results).
| fabian2k wrote:
| That part is wrong if your database can perform index-only
| queries. In that case if all columns you query are part of the
| index the query can be answered from the index alone, so you
| don't have to read any parts of the actual table.
| geophile wrote:
| I think that advice is not necessarily correct.
|
| Suppose you have this query: select a, b
| from T where x = 1
|
| And suppose you have an index on x. You can locate the x=1
| INDEX records using the index quickly (probably no more than
| 1-3 disk accesses). But then the qualifying TABLE records have
| to be retrieved. That index lookup could turn up thousands of
| qualifying records, and now you have to retrieve each record to
| get the a, b values.
|
| Now suppose that you replace your index on (x) with an index
| keyed by (x, a, b). You can still search for x=1 using this
| index, but now, the (a, b) values that you are SELECTing are
| present in the index itself. You don't have get the table
| records to get those values. That can save a lot of random
| accesses, and there are secondary effects, since the pages
| containing those records aren't brought into the disk cache.
|
| Yes, this wider index has space and update costs, but the
| benefit is often worth it.
| uhoh-itsmaciek wrote:
| Right, Postgres supports this use case explicitly with
| INCLUDE [1], which can be more efficient than just indexing
| all the columns you need. Pretty sure SQL Server has
| something similar.
|
| [1]: https://www.postgresql.org/docs/current/sql-
| createindex.html
| [deleted]
| didip wrote:
| You know what I would like to see (for learning)?
|
| A simplified project how indices, for example in b-tree, are
| stored in segment files.
| xtracto wrote:
| This. I was hopeful that the linked article would explain how
| b-trees are practically used for indexes in DB engines. I
| already know what a db index is (databases 101 software eng)
| and why are they used, but would love a clear explanation of
| how are they implemented
| btilly wrote:
| https://perl.plover.com/BTree/article.html is an introductory
| article about how a b-tree is implemented. It has example
| code for it in Perl.
|
| It doesn't actually match low level implementations for a
| number of reasons. Such as the need to fix the page size the
| block fits in rather than the count of things in the block,
| and locking for concurrent access. But the data structure
| itself still looks the same.
| kyberias wrote:
| Does anyone know what an index card was?
| Xorakios wrote:
| In school and public libraries it was a physical piece of thick
| paper, similar to a Rolodex card, which had multiple copies,
| always one per author and one per title, but sometimes backups
| by subject manager. And when I was a kid, there were often
| multiple, redundant copies that one could take to the librarian
| to check out the physical book if it were particularly
| important.
|
| Yep, that old :(
| dhosek wrote:
| Wait, you had card catalogs where you took the cards out of
| the drawer? I grew up with card catalogs where there was a
| metal rod at the bottom of each drawer that ran through holes
| in the bottom of the cards. There were cards for author,
| title and subject with multiple subject card possible for
| each book. The CIP blocks on the copyright pages of books
| list all the possible subject headings.
|
| And I've been thinking that I don't use the subject indexes
| on the online catalog anywhere near enough. They can surface
| things that the standard keyword search might miss or bury
| among irrelevant results.
| tomhallett wrote:
| I highly recommend the book "SQL Performance Explained" by Markus
| Winand: https://www.amazon.com/Performance-Explained-Everything-
| Deve...
| eatonphil wrote:
| If you want to see a dumb example of how you can implement
| indexing on top of a SQL database without it, I wrote a tutorial
| on implementing basic indexes [0] as part of a series on making a
| SQL database from scratch.
|
| And of course, Use the Index Luke is a great reference for the
| real world.
|
| [0] https://notes.eatonphil.com/database-basics-indexes.html
|
| [1] https://use-the-index-luke.com/
| unemphysbro wrote:
| Phil, These are great! Thank you!
| bogomipz wrote:
| I loved your series on writing a SQL DB from scratch. I also
| enjoyed your recent SMTP post as well. I hope you continue to
| do these, they're always good reading.
| eatonphil wrote:
| Thank you! There's a very long list of topics to write about
| so I'm not too worried about continuing. :) Always open to
| ideas as well.
| frosted-flakes wrote:
| [1] above is a fantastic resource. I highly recommend Markus
| Winand's book "SQL Performance Explained" if you're in need of
| better performing SQL queries. It's quite short and small, but
| it covers all the bases, and uses introductory language.
| jihadjihad wrote:
| My team has really enjoyed the concise and informative book [0]
| from Markus Winand. It comes up sometimes here on HN, and IMHO
| it's a great resource on the topic for anyone who touches a
| relational database in their line of work.
|
| His website "Use the Index, Luke!" [1] is also great.
|
| 0: https://sql-performance-explained.com
|
| 1: https://use-the-index-luke.com
| quelltext wrote:
| So how does a binary search work when the disk blocks are
| essentially a linked list as the top answer initially says? They
| just assume that it's possible to jump through a contiguous array
| of blocks for their later analysis but initially state blocks
| aren't necessary contiguous.
| jameshart wrote:
| Their point is that in order to get log(n) access speed you
| need a sorted random access list, not that if you sort the
| items in a linked block list you can use binary search on it.
| They go on to explain how a database index constructs such a
| thing over the linked block list.
| jlokier wrote:
| Honest answer: If the blocks aren't* contiguous, there's an
| index for that!
|
| (* It depends on the database type. In an LSM-tree layer, the
| blocks are typically contiguous in a file, so direct binary
| search is possible, though it might not be the most efficient
| method. The filesystem handles locating blocks in this case.)
|
| The database table is disk blocks containing data where the
| blocks are logically in order of keys, as if in a contiguous
| array. Really they are laid out differently on disk.
|
| So there's a block index, which is just a smaller version of
| the same table data structure: a table mapping keys to values,
| implemented as blocks in logical order of keys, as if in a
| contiguous array. Except in this index, the map is from key-
| ranges of the first table to block locations on disk. This
| index lets you go from keys to block locations on disk, and it
| also lets you do a course-grained version of the binary search
| to narrow down to a single block of the larger table.
|
| Ok, but then how do you look things up in the block index if
| it's using the same kind of data structure, made of multiple
| blocks? Same again: Each block index has its own smaller block
| index.
|
| This neat recursive definition gives you a tower of
| progressively smaller block indexes until the size is just one
| block, which doesn't need an index.
|
| Guess what you get when each table of logically sorted,
| contiguous blocks has a smaller index to say where the blocks
| are really located?
|
| The data structure is called a B-tree (technically a B+tree),
| and the smallest index is the root block.
|
| The tree structure arises from the recursive description, where
| each table has another table (until it stops).
|
| This is a decidedly non-standard way of describing the B-tree
| data structure. There's no explicit tree. But it's a valid and
| useful alternative view. (Note, these block indexes are not
| what is generally meant by database indexes, and they are not
| visible at the SQL level. They are an implementation detail.)
|
| One of the useful things to emerge from this view is that each
| level is just logically a flat table, made of blocks that are
| logically in key order but have arbitrary physical location.
| It's not necessary for every index to use the same data
| structure, or be on the same storage. Depending on how you
| think about algorithms, this description might be simpler to
| work with.
| dvirsky wrote:
| That's why disk indexes usually use B-trees.
| anamax wrote:
| disk block # = f(index) can be implemented with a b-tree but
| it can also be implemented (in some cases) with a deep-
| learning based approach.
| tonymet wrote:
| Take a large unordered text file and then create an ordered index
| in a separate file of word ==> line
|
| write a brief binary search algo to search the index.
|
| Compare searching the words with a "table scan" on the first file
| using grep, vs the binary search on the index.
|
| You will find the table scan is O(n) and your binary search is
| roughly O(log n)
|
| In 60 minutes you'll understand more about indexing than reading
| stack overflow.
| tonymet wrote:
| bonus points your interview skills will improve on both coding
| & system design
| maxmcd wrote:
| Obligatory recommendation for the CMU Databases Systems lecture
| series by Andy Pavlo.
| https://www.youtube.com/watch?v=oeYBdghaIjc&list=PLSE8ODhjZX...
|
| Here's the lecture on tree indexes:
| https://www.youtube.com/watch?v=JHZFc4hMGhk
| nijave wrote:
| ...the same way as non-database indexing? e.g. a lookup table for
| values in a different order than they're stored
| crdrost wrote:
| There's kind of three things you want to show people to get
| them to understand indexes. Let's say you sell an internet
| police service and have a table of URLs you are crawling for a
| table of clients, each client gets some client_id, each URL
| references the client_id of the client who wants it policed.
| Here are the things you want to say:
|
| 1. Binary search. It's be really handy to have the URLs stored
| in a different order, maybe grouped by client_id first and then
| all the URLs for the same client are alphabetical. The index
| conceptually just contains the same rows in a different order.
|
| 2. The read vs write trade-off. Your data fits in memory if
| there's less than 10 TB of it (possibly up to 100 TB) so the
| big thing is latency, what an index does is, it requires you to
| write the same information twice in order to read it
| exponentially faster, so depending on how often do you write
| versus how often you read, indexes make less or more sense.
|
| 3. High fan-out trees. Once you have explained that the index
| is just an the same data in a different order, you have
| inserted a sort of ticking time bomb of misunderstanding. The
| problem is, a beginner programmer's notion of a freely
| indexable list, such that binary search works on it, is an
| array. How do we insert into an array? With all the work of
| array copies! So you get people who think that random ID
| columns should never be indexed, because every write into the
| middle of the array has to shove half the data one cell to the
| right--unacceptable! So it really helps to say, "now we can
| store this as a binary tree, make one comparison per level of
| the tree." Pause for understanding that the "list," can be
| stored with such a tree. "but, modern computers have a nice
| property that right after accessing some memory the memory near
| that location is loaded into a cache... this makes arrays real
| fast. Suppose we don't just have the binary tree thing of 2
| child nodes each representing 50% of our data, but an array
| with 100 pointers each representing 1% of the data, you get
| something like a 6x memory speedup from the cache, because your
| tree is one sixth the depth of the binary tree. And that's
| basically what a B-tree is, you use a higher fan-out of your
| sort tree to exploit the cache."
| mikestew wrote:
| It is, but with all respect, it's not a _useful_ answer (or
| even accurate; see below). If one takes, for example and
| contrast, the top-voted answer, well, I could probably
| implement a database index scheme of some sort based on that
| explanation. Might not be a good one, but it would work. Parent
| comment's answer? I wouldn't know where to start.
|
| Additionally, parent's answer doesn't actually answer the
| question, by saying what an index _is_ , but the question was
| "how does it work?" The top SO answer does a great job with
| this one, I think. Another one a few answer down also goes into
| some of the downsides (it slows down writes, for one). The
| whole page is worth a skim if you're not a DBA but have to
| fiddle with DBs on occasion.
| zomnoys wrote:
| This feels a bit reductionary. Database indexes definitely have
| more to consider than just simple lookup tables. I'd say that
| mental model breaks down in practice.
___________________________________________________________________
(page generated 2022-03-08 23:01 UTC)