[HN Gopher] Overflow in consistent hashing (2018)
___________________________________________________________________
Overflow in consistent hashing (2018)
Author : g0xA52A2A
Score : 68 points
Date : 2024-05-19 10:22 UTC (1 day ago)
(HTM) web link (rmarcus.info)
(TXT) w3m dump (rmarcus.info)
| User23 wrote:
| Anyone interested in consistent hashing should take a look at the
| simpler and more general rendezvous hashing[1] too.
|
| [1] https://en.m.wikipedia.org/wiki/Rendezvous_hashing
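For readers unfamiliar with it, rendezvous (highest-random-weight) hashing fits in a few lines. A minimal sketch in Python — the node names and the choice of SHA-256 are illustrative, not from the comment:

```python
import hashlib

def rendezvous(key: str, nodes: list[str]) -> str:
    """Return the node with the highest hash(node, key) weight."""
    def weight(node: str) -> int:
        digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=weight)
```

The consistency property falls out directly: removing a node only reassigns the keys that node owned, because every other key's winning weight is unchanged.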
| Snawoot wrote:
| There is also Aggregated Rendezvous Hashing[1] available, which
| kind of solves the problem of the growing number of calculations for
| large numbers of nodes.
|
| [1] https://github.com/SenseUnit/ahrw
| vlovich123 wrote:
| I used a different technique that seems to work well in
| practice. Prehash the names of the nodes beforehand. Then
| hash the key and combine the hashes using a much cheaper
| algorithm [1]. You only need to do a single hash per key as
| with consistent hashing and then a very fast O(n) operation
| instead of a hash to find the optimal node. This does degrade
| to an O(nlogn) sort if you need to find the N best nodes
| instead of the single best node (e.g. to have a concept of
| fallbacks that you hand off to the next link in the chain),
| but I found that to not actually matter where I implemented
| it (generally was routing on < 100 nodes total in the first
| place).
|
| [1] https://stackoverflow.com/a/27952689
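A sketch of that approach, assuming a boost::hash_combine-style mixer as in the linked answer — the node names and the 64-bit constant are illustrative choices, not the commenter's actual code:

```python
import hashlib

MASK = (1 << 64) - 1

def prehash(s: str) -> int:
    # Full-strength hash: done once per node at setup, once per key at lookup.
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

def combine(seed: int, h: int) -> int:
    # Cheap mixer in the style of boost::hash_combine, 64-bit variant.
    return (seed ^ (h + 0x9E3779B97F4A7C15
                    + ((seed << 6) & MASK) + (seed >> 2))) & MASK

NODE_HASHES = {name: prehash(name) for name in ["node-a", "node-b", "node-c"]}

def route(key: str) -> str:
    kh = prehash(key)  # the only expensive hash per lookup
    # O(n) scan over precomputed node hashes; no hashing inside the loop.
    return max(NODE_HASHES, key=lambda name: combine(NODE_HASHES[name], kh))
```

Finding the N best fallback nodes is then a sort of the combined values, which is the O(n log n) degradation the comment mentions.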
| T0pH4t wrote:
| There is also jump consistent hashing
| https://arxiv.org/pdf/1406.2294
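The jump hash algorithm from that paper is remarkably small; here it is transcribed to Python (the paper presents it in C++):

```python
def jump_hash(key: int, num_buckets: int) -> int:
    """Map a 64-bit key to a bucket in [0, num_buckets) (Lamping & Veach, 2014)."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step, as in the paper.
        key = (key * 2862933555777941757 + 1) & ((1 << 64) - 1)
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b
```

Growing from n to n+1 buckets moves only an expected 1/(n+1) of the keys, and the keys that move always land in the new bucket.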
| kwillets wrote:
| I believe this problem diminishes as the key count goes up.
| RMarcus wrote:
| Depends!
|
| If you double the number of keys and you double the number of
| bins (load factor stays constant), then the problem becomes
| _much worse_ very quickly.
|
| If you double the number of keys and you double the _size_ of
| each bin (load factor stays constant), then the problem
| diminishes as you suggest. BUT, larger bins are more sensitive
| to changes in load factor.
|
| The sibling comment (
| https://news.ycombinator.com/item?id=40415826 ) does a good job
| of summarizing how post-2018 systems handle this issue.
| lucianbr wrote:
| Aren't you supposed to use more buckets than nodes, with each
| node hosting a number of buckets not all adjacent on the circle?
| This I expect would reduce the problems described in the article,
| though not eliminate them of course.
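That is the usual "virtual nodes" trick. A minimal ring sketch with multiple points per physical node — node names, hash choice, and vnode count are all illustrative:

```python
import bisect
import hashlib

def h64(s: str) -> int:
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

class Ring:
    def __init__(self, nodes: list[str], vnodes: int = 100):
        # Each physical node contributes `vnodes` points scattered on the circle.
        points = sorted((h64(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.hashes = [p for p, _ in points]
        self.owners = [n for _, n in points]

    def lookup(self, key: str) -> str:
        # First point clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self.hashes, h64(key)) % len(self.hashes)
        return self.owners[i]
```

As mjb notes in the reply below, this evens out the arc sizes per physical node, but it cannot fix the randomness of which keys land on which arc.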
| mjb wrote:
| Having multiple logical buckets per physical node doesn't fix
| this problem. It does help ensure that the bucket sizes are
| closer to uniform, but not that items hash uniformly into the
| available buckets. Even if all the buckets are exactly uniform
| (as in some of the simulations on this page, if I understand
| correctly), the inconsistent hashing of items to buckets leads
| to inconsistent load.
|
| Multiplicity does help with a major operational concern,
| though: when a node fails and recovery is needed, the recovery
| traffic can be spread uniformly across all cluster members
| rather than hot-spotting a small number of neighbors.
| Incidentally, this is a classic congestive collapse scenario in
| consistent hashed systems: a node looks failed because it's
| overloaded, which starts recovery, which adds load to the
| neighbors which makes them look overloaded, and the whole thing
| collapses.
| nielsole wrote:
| I always thought what systems did in practice is that each
| node can have a variable number of logical buckets assigned
| to it, so if there is an uneven distribution, the physical
| node subscribes to more or fewer buckets. This makes it so
| that the maximum logical bucket size is the maximum number of
| items a physical node can hold.
| mjb wrote:
| This is a very cool page. I love little simulations like this for
| building intuition for systems problems.
|
| Practical systems deal with this by not caring strongly about
| overflow (caches), by running at low enough utilizations that
| overflow is very unlikely given their item counts (e.g. Dynamo),
| by using explicit partitioning rather than consistent hashing
| (e.g. DynamoDB), by being able to take advantage of multi-tenancy
| to drive up per-physical-node utilization even in the case of low
| per-logical-node utilization, or by using some additional
| algorithmic sophistication (e.g. Chen et al
| https://arxiv.org/pdf/1908.08762).
|
| In practice, this kind of overflow is a big deal for systems that
| deal with relatively small numbers of large objects, and not
| as big a deal for systems that deal with large numbers of small
| objects. Try out the numbers in the page's "Handy Calculator" to
| see how that plays out.
|
| It's also worth mentioning that this isn't unique to consistent
| hashing, but is a problem with random load balancing more
| generally. "Pick a random server and send traffic to it" is an OK
| load balancing strategy when requests are small and servers are
| large, but a terrible one when requests become relatively large
| or expensive. In the general load balancing/placement problem
| this is easier than the storage case, because you don't need to
| find requests again after dispatching them. That makes simple
| algorithms like best-of-2 and best-of-k applicable.
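The page's "Handy Calculator" boils down to a binomial tail. A sketch, assuming keys hash uniformly and independently into bins:

```python
import math

def overflow_prob(keys: int, bins: int, capacity: int) -> float:
    """P(one particular bin receives more than `capacity` of the keys)."""
    p = 1.0 / bins
    # Exact tail of X ~ Binomial(keys, 1/bins): P(X > capacity).
    return sum(math.comb(keys, i) * p**i * (1 - p)**(keys - i)
               for i in range(capacity + 1, keys + 1))
```

For small per-bin probabilities, the chance that *some* bin overflows is roughly `bins` times this value, which is one way to see the asymmetry RMarcus describes above: doubling keys and bins together multiplies the union bound, while doubling keys and per-bin capacity shrinks the tail.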
| RMarcus wrote:
| This is my post from 2018 (I didn't submit it to HN), and it
| could definitely use a "here's what practical systems do"
| update! I'll put it on the TODO list...
|
| Your point about systems dealing with a relatively small number
| of large objects vs. small objects also makes sense: this is
| essentially the "cost" of an overflow (4kb spills once in a
| blue moon? Oh well, handle that as a special case. 4TB spills
| once in a blue moon? The system might crash). This is more
| obvious, as you also point out, in load balancing.
|
| One aspect I found very counter-intuitive: before this
| investigation, I would've guessed that having a large number of
| large bins makes overflow increasingly unlikely. This is only
| partially true: more bins is obviously good, but larger bins
| are actually _more_ sensitive to changes in load factor!
|
| Overall, I think you are right that this is not really a
| concern in modern systems today. Compared to Dynamo, I still
| think Vimeo's solution (linked at the bottom of the post) is
| both intuitive and low-complexity. But regardless, more of an
| interesting mathematical diversion than a practical systems
| concern these days.
| anonymousDan wrote:
| Yeah there's a nice load balancing paper called the power of
| two choices that shows sending two requests gives surprisingly
| good results.
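A quick simulation of that effect, comparing uniform random placement against best-of-two (server and request counts here are arbitrary):

```python
import random

def simulate(n_servers: int, n_requests: int, choices: int, seed: int = 42) -> int:
    """Max load after sending each request to the least-loaded of `choices` random servers."""
    rng = random.Random(seed)
    loads = [0] * n_servers
    for _ in range(n_requests):
        picks = [rng.randrange(n_servers) for _ in range(choices)]
        loads[min(picks, key=lambda i: loads[i])] += 1
    return max(loads)

one = simulate(100, 10_000, choices=1)
two = simulate(100, 10_000, choices=2)
```

With one choice the max load overshoots the mean by roughly sqrt(m/n · log n); with two choices the overshoot drops to O(log log n), which is the "surprisingly good" part.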
| 10000truths wrote:
| Load balancing solves the issue of non-uniform hashing by
| generating two hashes and picking the hash that corresponds to
| the node with lower load. Can something similar be done here?
| klysm wrote:
| No because the point of consistent hashing is that it doesn't
| require state other than the number of nodes to map items to
| nodes. Picking the lower load hash would require tracking much
| more state.
| pjdesno wrote:
| Actually, I think what a lot of real systems do is equivalent to
| pre-computing a "random" table that has suitable balancing
| properties, and then use the hash to index into it.
|
| e.g. Ceph used to have a big problem with overloaded placement
| groups, causing some disks to get twice as much load; max
| throughput was reached when those disks maxed out, leaving the
| rest half-idle.
| I don't recall the details of the current solution, but I think
| it's equivalent to generating a random assignment, and then
| tweaking it to get rid of over-full bins.
|
| The original Chord-style consistent hashing is easier in the P2P
| environments it was designed for, but typically consistent
| hashing is used today in much more closely-coupled systems.
___________________________________________________________________
(page generated 2024-05-20 23:01 UTC)