[HN Gopher] Index Merges vs. Composite Indexes in Postgres and M...
___________________________________________________________________
Index Merges vs. Composite Indexes in Postgres and MySQL
Author : Sirupsen
Score : 74 points
Date : 2022-11-27 19:02 UTC (3 hours ago)
(HTM) web link (sirupsen.com)
(TXT) w3m dump (sirupsen.com)
| magicalhippo wrote:
| A composite index can also be used for a partial index scan[1],
| so if you're frequently looking for just int100 as well as the
| int100,int1000 combination (as per the article's example), then a
| composite index int100,int1000 can be used for queries just
| filtering on int100.
|
| The order of the columns in the composite index might matter
| then. We got some nice savings by reordering columns in indexes
| and changing the join order in queries (our DB wasn't _that_
| smart) or adding a "useless" filters to the where clause,
| allowing us to consolidate multiple indexes into one composite
| one.
|
| [1]: might be using the wrong terminology here, I'm not talking
| about a partial index[2], but using only the first N dimensions
| of a M-dimensional index.
|
| [2]: https://www.postgresql.org/docs/current/indexes-partial.html
| winrid wrote:
| There are some DBs where the order of fields doesn't matter in
| the compound index if you're just searching by one field (but
| of course is less efficient). I think Oracle supports this.
| magicalhippo wrote:
| Good point, I reworded it to be more general. Can't recall
| seeing our DB being that clever.
| thom wrote:
| Postgres is capable of doing this with some index types, but
| generally more slowly than a B-tree index being asked about
| its leftmost columns:
|
| https://www.postgresql.org/docs/current/indexes-
| multicolumn....
|
| Also worth noting that for very complex workloads that need
| to support arbitrary subsets of equality matches over
| columns, a Bloom filter might work best:
|
| https://www.postgresql.org/docs/current/bloom.html
| mattashii wrote:
| This dependz on the index type, but PostgreSQL also supports
| queries based on arbitrary attributes of multi-attribute
| indexes: both GIN and BRIN can be used for queries on any of
| the indexed expressions.
|
| Maybe eventually PG will also get index skip scans for
| btrees, which could allow for arbitrary columns searches in
| the btree; but we're not there yet by a large margin.
| bawolff wrote:
| Honestly im kind of surprised that they are even that close. I
| wonder if this changes at scale when the intersection is larger.
| Sirupsen wrote:
| Ideally, I would add three graphs to the post:
|
| (1) Table size on the x-axis, and time on the y-axis for index
| merge vs composite index
|
| (2) Number of columns on the x-axis, and time on the y-axis for
| both
|
| (3) Number of final matches on the x-axis, and time on the
| y-axis for both
|
| But ran out of time and decided to test with the table size of
| 10M rows, and a 100-ish result set. That's in my experience a
| decent representation for what you might be doing with a
| relational database.
| Lukas1994 wrote:
| Good stuff! What's the size difference between the composite
| index vs the two separate indices?
| Sirupsen wrote:
| I've added this to the article, thanks!
|
| Composite index (int64, int64): ~70 MiB in Postgres, ~350 MiB
| in MySQL
|
| Single index (int64): ~70 MiB in Postgres, ~240 MiB in MySQL
|
| If you assume the majority of an index are index entries of
| (int64, int64, int64) where the third number is some sort of
| identifier for the record on the heap, you'd expect this to be
| ~230 MiB. So Postgres does some compression very well here, and
| MySQL has a bit more overhead for its indexes it seems.
| libraryofbabel wrote:
| As you mention in the article, this is small by modern
| hardware standards and means the indexes can be entirely held
| in memory. Curious how things look for much larger tables
| where the index has to be read from disk?
| fabian2k wrote:
| Could be the deduplication newer Postgres versions have for
| B-Tree indexes:
|
| https://www.postgresql.org/docs/current/btree-
| implementation...
|
| > 67.4.3. Deduplication
|
| > A duplicate is a leaf page tuple (a tuple that points to a
| table row) where all indexed key columns have values that
| match corresponding column values from at least one other
| leaf page tuple in the same index. Duplicate tuples are quite
| common in practice. B-Tree indexes can use a special, space-
| efficient representation for duplicates when an optional
| technique is enabled: deduplication.
|
| > Deduplication works by periodically merging groups of
| duplicate tuples together, forming a single posting list
| tuple for each group. The column key value(s) only appear
| once in this representation. This is followed by a sorted
| array of TIDs that point to rows in the table. This
| significantly reduces the storage size of indexes where each
| value (or each distinct combination of column values) appears
| several times on average. The latency of queries can be
| reduced significantly. Overall query throughput may increase
| significantly. The overhead of routine index vacuuming may
| also be reduced significantly.
| Sirupsen wrote:
| Excellent, thank you! I'll add that to the article.
| fabian2k wrote:
| This is a guess on my part, though I think a plausible
| one. To verify you'd probably have to compare an index
| with unique values to one with many identical ones.
| masklinn wrote:
| An other thing which might be of interest: what if you use
| convering indexes for the two single indexes? Is postgres
| able to do an index-only scan then, thanks to the coverage?
|
| Or does it decide to use only one index and filter based on
| the covered values?
| pdhborges wrote:
| For these 64-bit index entries we'd expect to have to scan
| roughly: index_row_size[?]rows=2[?]64bit[?]10^5=1.5MiB
|
| Where do the 10^5 rows come from? With a composite index and a
| point query doesn't the database scan just the 100 returned rows?
| Sirupsen wrote:
| You're absolutely right!
|
| I forgot to move this around when I updated the article's
| structure. This is only relevant when doing the index merge.
| The article has been updated
___________________________________________________________________
(page generated 2022-11-27 23:00 UTC)