[HN Gopher] Index Merges vs. Composite Indexes in Postgres and M...
___________________________________________________________________
Index Merges vs. Composite Indexes in Postgres and MySQL
Author : Sirupsen
Score : 132 points
Date : 2022-11-27 19:02 UTC (9 hours ago)
(HTM) web link (sirupsen.com)
(TXT) w3m dump (sirupsen.com)
| magicalhippo wrote:
| A composite index can also be used for a partial index scan[1],
| so if you're frequently looking for just int100 as well as the
| int100,int1000 combination (as per the article's example), then a
| composite index int100,int1000 can be used for queries just
| filtering on int100.
|
| The order of the columns in the composite index might matter
| then. We got some nice savings by reordering columns in indexes
| and changing the join order in queries (our DB wasn't _that_
| smart) or adding "useless" filters to the where clause,
| allowing us to consolidate multiple indexes into one composite
| one.
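|
| For example, a rough (untested) sketch in Postgres syntax, with
| table and column names borrowed from the article's benchmark: a
| single composite index can serve both shapes of query.
|         create index t_int100_int1000_idx
|             on test_table (int100, int1000);
|         -- uses the full (int100, int1000) key
|         explain select * from test_table
|             where int100 = 1 and int1000 = 1;
|         -- can still scan the same index on its int100 prefix
|         explain select * from test_table
|             where int100 = 1;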
|
| [1]: might be using the wrong terminology here, I'm not talking
| about a partial index[2], but using only the first N dimensions
| of an M-dimensional index.
|
| [2]: https://www.postgresql.org/docs/current/indexes-partial.html
| winrid wrote:
| There are some DBs where the order of fields doesn't matter in
| the compound index if you're just searching by one field (but
| of course is less efficient). I think Oracle supports this.
| chrsig wrote:
| > (but of course is less efficient)
|
| This means the order _does_ matter
| winrid wrote:
| Yes...?
|
| Sorry my comment is not semantically accurate enough for
| you.
| dspillett wrote:
| All can search an index by just one field if it is the first
| field covered by the index; few can use an index if your
| predicate concerns the second and not the first.
|
| IIRC Oracle does support this and calls it a skip-scan. If
| the first column in the index is not very selective this can
| be very efficient, though if the first column _is_ good in
| terms of selectivity then a skip-scan will be very
| inefficient and the DB might as well just scan the whole
| index. Given that a good choice of index usually has a fairly
| selective column as its first, and that unless storage space
| is very cramped you are likely better off just defining an
| extra index without the column that you want to not touch in
| some queries, a skip-scan isn't helpful as often as one might
| think. That is one of the reasons DB engine maintainers give
| for not spending time implementing and maintaining the
| feature.
|
| There are some use cases where having the option to skip-scan
| is definitely a significant bonus, but they are rare enough
| that I understand why most engine maintainers have not
| implemented them.
| winrid wrote:
| Yes, I'm talking about skip-scan. I know pretty much all
| DBs can use compound index predicates for querying :)
|
| I can think of several cases where it would have been nice
| to have, sometimes "good enough" for a particular read
| pattern is better than additional write overhead etc.
| Sirupsen wrote:
| MySQL can do skip-scans
|
| https://dev.mysql.com/doc/refman/8.0/en/range-
| optimization.h...
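|
| A rough sketch of the shape of query this covers (hypothetical
| index name, and MySQL's skip scan only kicks in under fairly
| narrow conditions, e.g. the index has to cover the query):
|         create index idx_int10_int1000
|             on test_table (int10, int1000);
|         -- no filter on the leading int10 column; MySQL 8.0.13+
|         -- may report "Using index for skip scan" in EXPLAIN
|         explain select int10, int1000 from test_table
|             where int1000 > 998;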
| winrid wrote:
| Nice, I did not know MySQL supported this. Someday I'll
| work with MySQL again, good to know.
| magicalhippo wrote:
| Good point, I reworded it to be more general. Can't recall
| seeing our DB being that clever.
| thom wrote:
| Postgres is capable of doing this with some index types, but
| generally more slowly than a B-tree index being asked about
| its leftmost columns:
|
| https://www.postgresql.org/docs/current/indexes-
| multicolumn....
|
| Also worth noting that for very complex workloads that need
| to support arbitrary subsets of equality matches over
| columns, a Bloom filter might work best:
|
| https://www.postgresql.org/docs/current/bloom.html
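|
| A rough sketch of the bloom variant (contrib extension, column
| names from the article's benchmark; bloom indexes only support
| equality and are lossy, so every hit is rechecked against the
| heap):
|         create extension if not exists bloom;
|         create index t_bloom_idx on test_table
|             using bloom (int1000, int100, int10, int10_2);
|         -- any subset of equality filters can use the one index
|         explain select count(*) from test_table
|             where int100 = 1 and int10_2 = 3;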
| mattashii wrote:
| This depends on the index type, but PostgreSQL also supports
| queries based on arbitrary attributes of multi-attribute
| indexes: both GIN and BRIN can be used for queries on any of
| the indexed expressions.
|
| Maybe eventually PG will also get index skip scans for
| btrees, which could allow for arbitrary column searches in
| the btree; but we're not there yet by a large margin.
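|
| For plain integer columns that means something like the
| following sketch; btree_gin is needed to make scalar types
| GIN-indexable, and GIN/BRIN lookups behave quite differently
| from a btree, so it's not a drop-in swap:
|         create extension if not exists btree_gin;
|         create index t_gin_idx on test_table
|             using gin (int1000, int100);
|         -- either column alone can use this index
|         explain select count(*) from test_table
|             where int100 = 1;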
| [deleted]
| bawolff wrote:
| Honestly I'm kind of surprised that they are even that close. I
| wonder if this changes at scale when the intersection is larger.
| Sirupsen wrote:
| Ideally, I would add three graphs to the post:
|
| (1) Table size on the x-axis, and time on the y-axis for index
| merge vs composite index
|
| (2) Number of columns on the x-axis, and time on the y-axis for
| both
|
| (3) Number of final matches on the x-axis, and time on the
| y-axis for both
|
| But I ran out of time and decided to test with a table size of
| 10M rows, and a 100-ish result set. That's in my experience a
| decent representation for what you might be doing with a
| relational database.
| arynda wrote:
| Comparison on ClickHouse: it also runs in about 30-40ms;
| however, there's no indexing being used and this is a
| full-table scan.
|
|         create table if not exists test_table (
|             id UInt64, text1 String, text2 String,
|             int1000 UInt64, int100 UInt64,
|             int10 UInt64, int10_2 UInt64
|         ) engine = MergeTree() order by (id);
|
|         insert into test_table
|         with repeat('b', 1024) as one_kib,
|              repeat('b', 255) as bytes_255
|         select number as id, one_kib, bytes_255,
|                rand() % 1000 as int1000, rand() % 100 as int100,
|                rand() % 10 as int10, rand() % 10 as int10_2
|         from numbers(10e6);
|
|         > select count(*) from test_table
|           where int1000 = 1 and int100 = 1;
|           +-count()-+
|           |    9949 |
|           +---------+
|
|         1 row in set. Elapsed: 0.034 sec. Processed 10.00 million
|         rows, 160.00 MB (290.93 million rows/s., 4.65 GB/s.)
|
| The same table but with 1B rows instead, runs in ~1800ms
|
|         > select count(*) from test_table
|           where int1000 = 1 and int100 = 1;
|           +-count()-+
|           |  999831 |
|           +---------+
|
|         1 row in set. Elapsed: 1.804 sec. Processed 1.00 billion
|         rows, 16.00 GB (554.24 million rows/s., 8.87 GB/s.)
|
| [1] Converted the table create and insert logic from here:
| https://github.com/sirupsen/napkin-math/blob/master/newslett...
| hodgesrm wrote:
| > however there's no indexing being used and this is a full-
| table scan.
|
| That first statement about "no indexing being used" is not
| quite correct if the query is run exactly as you show in your
| nice example.
|
| ClickHouse performs what is known as PREWHERE processing which
| will effectively use the int1000 and int100 columns as indexes.
| It scans those columns and knocks out any blocks (technically
| granules, containing by default 8192 rows) that do not contain
| values that match the filter conditions. It then performs a
| scan on the remaining blocks to get the actual counts.
|
| PREWHERE is effective because columns are compressed and scans
| are fast. If there's any pattern to the filter columns (for
| example monotonically increasing counters) or their values have
| high cardinality, PREWHERE processing will remove a large number
| of blocks. This will make the rest of the scan far faster.
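|
| The clause can also be written explicitly instead of relying on
| the optimizer to move the filter there; a sketch using the
| parent comment's table:
|         select count(*) from test_table
|         prewhere int1000 = 1 and int100 = 1;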
|
| In your dataset it may not be especially efficient because you
| use random values, which don't necessarily compress well, and
| the values will appear in many blocks. It works much better in
| real datasets where data are more correlated.
|
| EDIT: PREWHERE is _much_ faster in cases where you are doing
| more complex aggregation on many columns. Counts of course
| don't need to scan any extra values, so it's not helpful in
| this case.
|
| p.s. Scans are ridiculously fast.
| arynda wrote:
| > ClickHouse performs what is known as PREWHERE processing
|
| > p.s. Scans are ridiculously fast.
|
| Good point, I should have mentioned this was basically a
| worst-case scenario for ClickHouse: the data layout is totally
| random (same approach as OP used in their benchmark), so it
| isn't able to utilize any granule pruning, sorting, or skip
| indexing, yet it still achieves such remarkable speeds.
| hodgesrm wrote:
| What's cool is that even in this case ClickHouse is still
| stupid fast compared to most other databases. ;)
| paulmd wrote:
| > p.s. Scans are ridiculously fast.
|
| this is really the lesson of SOLR. full-scan all the things,
| aggregate as you go, broadcast disk IO to multiple listeners.
|
| why do a bunch of 4K random IO when you could full-scan at
| bus speed? yeah you can make the 4K random IO super fast but
| that's not where hardware is going, and it's also
| scalable/clusterable where RDBMS caps out at one machine and
| clustering is kinda ugly.
| Sirupsen wrote:
| Are you aware of a good write-up on how Clickhouse/other
| columnar databases do the intersection?
| twoodfin wrote:
| One obvious way is to build a bitmap indexed by row position
| for each filter. Both the "&" intersect and the final bit
| count can be rocket fast on modern CPU vector units.
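|
| That's roughly what the Postgres index-merge plan in the
| article does as well: each Bitmap Index Scan builds a bitmap of
| matching row locations and a BitmapAnd node intersects them. A
| sketch reusing the article's column names:
|         create index t_int100_idx on test_table (int100);
|         create index t_int1000_idx on test_table (int1000);
|         -- the plan typically shows BitmapAnd over two
|         -- Bitmap Index Scans
|         explain select count(*) from test_table
|             where int100 = 1 and int1000 = 1;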
| arynda wrote:
| Not in particular sorry, most of the good content I've found
| is on Altinity [1] and Alibaba's technical blogs [2][3].
| These tend to be mostly focused on how the data itself is
| stored and how to use Clickhouse, but don't really dive into
| the specifics of how query processing is performed.
|
| [1] https://altinity.com/blog/
|
| [2] https://www.alibabacloud.com/blog/clickhouse-kernel-
| analysis...
|
| [3] https://www.alibabacloud.com/blog/clickhouse-analysis-of-
| the...
| hodgesrm wrote:
| ClickHouse uses a single primary key index, which matches the
| sort order, plus skip indexes, which knock out blocks to
| scan. Here's a writeup that explains skip indexes.
|
| https://altinity.com/blog/clickhouse-black-magic-skipping-
| in...
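|
| For the test table above, a skip index would look roughly like
| this (hypothetical name; set(N) is just one of several index
| types, and it only helps if matching values are clustered into
| relatively few granules):
|         alter table test_table
|             add index idx_int100 int100 type set(100) granularity 4;
|         alter table test_table materialize index idx_int100;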
|
| You can also check out the following webinar, which explains
| how ClickHouse indexes work in general. Here's a link to the
| discussion of indexes.
|
| https://youtu.be/1TGGCIr6dMY?t=1933
|
| p.s. The blog article is missing some images that WordPress
| seems to have lost but you'll still get the idea. (Should be
| fixed shortly.)
|
| Disclaimer: I work for Altinity
| Lukas1994 wrote:
| Good stuff! What's the size difference between the composite
| index vs the two separate indices?
| Sirupsen wrote:
| I've added this to the article, thanks!
|
| Composite index (int64, int64): ~70 MiB in Postgres, ~350 MiB
| in MySQL
|
| Single index (int64): ~70 MiB in Postgres, ~240 MiB in MySQL
|
| If you assume the majority of an index are index entries of
| (int64, int64, int64) where the third number is some sort of
| identifier for the record on the heap, you'd expect this to be
| ~230 MiB. So Postgres compresses this very well, and MySQL
| seems to have a bit more overhead for its indexes.
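|
| (Back of the envelope: 3 columns · 8 bytes · 10^7 rows =
| 240 MB, or about 229 MiB.)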
| libraryofbabel wrote:
| As you mention in the article, this is small by modern
| hardware standards and means the indexes can be entirely held
| in memory. Curious how things look for much larger tables
| where the index has to be read from disk?
| fabian2k wrote:
| Could be the deduplication newer Postgres versions have for
| B-Tree indexes:
|
| https://www.postgresql.org/docs/current/btree-
| implementation...
|
| > 67.4.3. Deduplication
|
| > A duplicate is a leaf page tuple (a tuple that points to a
| table row) where all indexed key columns have values that
| match corresponding column values from at least one other
| leaf page tuple in the same index. Duplicate tuples are quite
| common in practice. B-Tree indexes can use a special, space-
| efficient representation for duplicates when an optional
| technique is enabled: deduplication.
|
| > Deduplication works by periodically merging groups of
| duplicate tuples together, forming a single posting list
| tuple for each group. The column key value(s) only appear
| once in this representation. This is followed by a sorted
| array of TIDs that point to rows in the table. This
| significantly reduces the storage size of indexes where each
| value (or each distinct combination of column values) appears
| several times on average. The latency of queries can be
| reduced significantly. Overall query throughput may increase
| significantly. The overhead of routine index vacuuming may
| also be reduced significantly.
| Sirupsen wrote:
| Excellent, thank you! I'll add that to the article.
| fabian2k wrote:
| This is a guess on my part, though I think a plausible
| one. To verify you'd probably have to compare an index
| with unique values to one with many identical ones.
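|
| Something like this sketch could verify it (hypothetical index
| names; id is effectively unique while int10 has only ten
| distinct values, so deduplication should shrink the latter far
| more):
|         create index t_id_idx on test_table (id);
|         create index t_int10_idx on test_table (int10);
|         select pg_size_pretty(pg_relation_size('t_id_idx')),
|                pg_size_pretty(pg_relation_size('t_int10_idx'));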
| masklinn wrote:
| Another thing which might be of interest: what if you use
| covering indexes for the two single indexes? Is Postgres
| able to do an index-only scan then, thanks to the coverage?
|
| Or does it decide to use only one index and filter based on
| the covered values?
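|
| For reference, the Postgres 11+ spelling of that, as a sketch
| with hypothetical index names:
|         create index t_int100_cov_idx
|             on test_table (int100) include (int1000);
|         create index t_int1000_cov_idx
|             on test_table (int1000) include (int100);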
| pdhborges wrote:
| For these 64-bit index entries we'd expect to have to scan
| roughly: index_row_size · rows = 2 · 64 bit · 10^5 = 1.5 MiB
|
| Where do the 10^5 rows come from? With a composite index and a
| point query doesn't the database scan just the 100 returned rows?
| Sirupsen wrote:
| You're absolutely right!
|
| I forgot to move this around when I updated the article's
| structure. This is only relevant when doing the index merge.
| The article has been updated.
___________________________________________________________________
(page generated 2022-11-28 05:00 UTC)