[HN Gopher] Taming High Cardinality by sharding a stream
___________________________________________________________________
Taming High Cardinality by sharding a stream
Author : trojanalert
Score : 20 points
Date : 2023-08-21 05:58 UTC (17 hours ago)
(HTM) web link (last9.io)
(TXT) w3m dump (last9.io)
| thamer wrote:
| If you're considering sharding a database, please spend some time
| finding the best key distribution strategy, and don't just use
| `key % shard_count` as if it were automatically the right way to
| do it. The values on the left side of this mod operator are not
| necessarily uniformly distributed, so the result will not
| necessarily be an equal distribution over the shards.
|
| Some will add a hash function around the key, but this only
| addresses part of the problem: for example, if you started with N
| shards and ever need to add one, all but roughly 1/(N+1) of the
| keys will map to a different shard and have to be moved. And it's
| not just about growing the number of shards permanently: other
| maintenance operations, such as replacing a host, can require you
| to redistribute the data depending on how replication is set up.
|
| Consistent hashing can often help drastically reduce the number
| of keys to shuffle, but in any case it's something worth spending
| some time on early on and getting right rather than having to pay
| later for having overlooked its impact.
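
To make the comparison above concrete, here is a minimal sketch in Python that measures how many keys change shard when going from 8 to 9 shards, once with hash-then-modulo and once with a simple consistent-hash ring. Everything in it is an illustrative assumption rather than anything from the linked article: the names `stable_hash`, `mod_shard`, `HashRing` and `moved_fraction`, the `shard-N` labels, the `user:N` keys, and the choice of 100 virtual nodes per shard.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic 64-bit hash (the built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def mod_shard(key: str, shard_count: int) -> int:
    """Hash-then-modulo placement: hash(key) % shard_count."""
    return stable_hash(key) % shard_count


class HashRing:
    """Toy consistent-hash ring; vnodes=100 is an arbitrary illustrative choice."""

    def __init__(self, shards, vnodes=100):
        # Each shard is placed at `vnodes` pseudo-random points on the ring.
        self._points = sorted(
            (stable_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def lookup(self, key: str) -> str:
        # A key belongs to the first point clockwise from its hash,
        # wrapping around to the start of the ring if needed.
        i = bisect.bisect_right(self._hashes, stable_hash(key)) % len(self._hashes)
        return self._points[i][1]


def moved_fraction(before, after, keys) -> float:
    """Fraction of keys whose shard differs between two placement functions."""
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"user:{i}" for i in range(10_000)]  # hypothetical key set
n = 8

mod_moved = moved_fraction(
    lambda k: mod_shard(k, n), lambda k: mod_shard(k, n + 1), keys
)

old_ring = HashRing([f"shard-{i}" for i in range(n)])
new_ring = HashRing([f"shard-{i}" for i in range(n + 1)])
ring_moved = moved_fraction(old_ring.lookup, new_ring.lookup, keys)

print(f"modulo:          {mod_moved:.1%} of keys moved")
print(f"consistent hash: {ring_moved:.1%} of keys moved")
```

With a uniform hash, the modulo scheme should move close to N/(N+1) of the keys (about 89% for N = 8), while the ring should move roughly 1/(N+1), and every key that does move lands on the newly added shard. Real systems layer replication and rebalancing policy on top of this, so treat the numbers as a way to reason about the trade-off, not as a benchmark.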
___________________________________________________________________
(page generated 2023-08-21 23:01 UTC)