[HN Gopher] Overflow in consistent hashing (2018)
       ___________________________________________________________________
        
       Overflow in consistent hashing (2018)
        
       Author : g0xA52A2A
       Score  : 68 points
       Date   : 2024-05-19 10:22 UTC (1 day ago)
        
 (HTM) web link (rmarcus.info)
 (TXT) w3m dump (rmarcus.info)
        
       | User23 wrote:
       | Anyone interested in consistent hashing should take a look at the
       | simpler and more general rendezvous hashing[1] too.
       | 
       | [1] https://en.m.wikipedia.org/wiki/Rendezvous_hashing
        
         | Snawoot wrote:
          | There is also Aggregated Rendezvous Hashing[1], which
          | largely addresses the problem of the growing number of hash
          | calculations required as the node count gets large.
         | 
         | [1] https://github.com/SenseUnit/ahrw
        
           | vlovich123 wrote:
            | I used a different technique that seems to work well in
            | practice. Prehash the names of the nodes beforehand, then
            | hash the key and combine the hashes using a much cheaper
            | algorithm [1]. You only need a single hash per key, as
            | with consistent hashing, followed by a very fast O(n)
            | scan (rather than a full hash per node) to find the
            | optimal node. This does degrade to an O(n log n) sort if
            | you need the N best nodes instead of the single best one
            | (e.g. to have a concept of fallbacks that you hand off to
            | the next link in the chain), but I found that didn't
            | actually matter where I implemented it (I was generally
            | routing across < 100 nodes total in the first place).
           | 
           | [1] https://stackoverflow.com/a/27952689
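            | 
            | A minimal sketch of that approach (node names below are
            | hypothetical, and the combiner is a boost-style 64-bit
            | hash_combine, which may differ in detail from the linked
            | answer):
            | 
            |   import hashlib
            | 
            |   MASK = 0xFFFFFFFFFFFFFFFF
            | 
            |   def h64(s):
            |       # Stable 64-bit hash of a string (a stand-in for
            |       # any good hash function).
            |       return int.from_bytes(
            |           hashlib.blake2b(s.encode(),
            |                           digest_size=8).digest(), "big")
            | 
            |   def combine(a, b):
            |       # Cheap combine of two 64-bit hashes; much cheaper
            |       # than rehashing the key once per node.
            |       return (a ^ (b + 0x9E3779B97F4A7C15
            |                    + ((a << 6) & MASK) + (a >> 2))) & MASK
            | 
            |   # Prehash the node names once, up front.
            |   node_hashes = {n: h64(n)
            |                  for n in ("node-a", "node-b", "node-c")}
            | 
            |   def pick_node(key):
            |       key_hash = h64(key)  # single hash per key
            |       # O(n) scan: the highest combined score wins
            |       # (rendezvous-style "highest random weight").
            |       return max(node_hashes,
            |                  key=lambda n: combine(key_hash,
            |                                        node_hashes[n]))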
        
         | T0pH4t wrote:
         | There is also jump consistent hashing
         | https://arxiv.org/pdf/1406.2294
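          | 
          | The core of the algorithm in that paper is tiny; a rough
          | Python translation of the published function (it maps a
          | 64-bit key to a bucket in [0, num_buckets)):
          | 
          |   def jump_consistent_hash(key, num_buckets):
          |       b, j = -1, 0
          |       while j < num_buckets:
          |           b = j
          |           # 64-bit linear congruential step from the paper.
          |           key = ((key * 2862933555777941757 + 1)
          |                  & 0xFFFFFFFFFFFFFFFF)
          |           j = int((b + 1)
          |                   * ((1 << 31) / ((key >> 33) + 1)))
          |       return b
          | 
          |   print(jump_consistent_hash(123456789, 10))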
        
       | kwillets wrote:
       | I believe this problem diminishes as the key count goes up.
        
         | RMarcus wrote:
         | Depends!
         | 
         | If you double the number of keys and you double the number of
         | bins (load factor stays constant), then the problem becomes
         | _much worse_ very quickly.
         | 
         | If you double the number of keys and you double the _size_ of
         | each bin (load factor stays constant), then the problem
         | diminishes as you suggest. BUT, larger bins are more sensitive
         | to changes in load factor.
         | 
         | The sibling comment (
         | https://news.ycombinator.com/item?id=40415826 ) does a good job
         | of summarizing how post-2018 systems handle this issue.
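          | 
          | A back-of-the-envelope way to see both effects (a sketch
          | that approximates each bin's key count as Poisson and
          | treats bins as independent):
          | 
          |   from math import exp
          | 
          |   def p_bin_overflow(n_keys, n_bins, capacity):
          |       # P(one given bin receives more than `capacity` keys)
          |       # when keys hash uniformly across n_bins bins.
          |       mean = n_keys / n_bins
          |       pmf = exp(-mean)      # P(count == 0)
          |       cdf = pmf
          |       for k in range(1, capacity + 1):
          |           pmf *= mean / k   # Poisson recurrence
          |           cdf += pmf
          |       return 1.0 - cdf
          | 
          |   def p_any_overflow(n_keys, n_bins, capacity):
          |       p = p_bin_overflow(n_keys, n_bins, capacity)
          |       return 1.0 - (1.0 - p) ** n_bins
          | 
          |   # Load factor is 0.8 in all three cases.
          |   print(p_any_overflow( 8_000, 100, 100))  # baseline
          |   print(p_any_overflow(16_000, 200, 100))  # 2x keys+bins: worse
          |   print(p_any_overflow(16_000, 100, 200))  # 2x bin size: better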
        
       | lucianbr wrote:
       | Aren't you supposed to use more buckets than nodes, with each
       | node hosting a number of buckets not all adjacent on the circle?
       | This I expect would reduce the problems described in the article,
       | though not eliminate them of course.
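        | 
        | For reference, a minimal sketch of that setup (node names and
        | the per-node virtual-bucket count are illustrative):
        | 
        |   import bisect, hashlib
        | 
        |   def h(s):
        |       # 64-bit position on the ring for a string.
        |       return int.from_bytes(
        |           hashlib.blake2b(s.encode(),
        |                           digest_size=8).digest(), "big")
        | 
        |   class Ring:
        |       # Consistent-hash ring with many virtual buckets per
        |       # physical node, scattered around the circle.
        |       def __init__(self, nodes, vnodes_per_node=64):
        |           self._points = sorted(
        |               (h(f"{node}#{i}"), node)
        |               for node in nodes
        |               for i in range(vnodes_per_node))
        |           self._keys = [p for p, _ in self._points]
        | 
        |       def node_for(self, key):
        |           # First virtual bucket clockwise from the key.
        |           i = (bisect.bisect(self._keys, h(key))
        |                % len(self._points))
        |           return self._points[i][1]
        | 
        |   ring = Ring(["node-a", "node-b", "node-c"])
        |   print(ring.node_for("some-object-key"))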
        
         | mjb wrote:
         | Having multiple logical buckets per physical node doesn't fix
         | this problem. It does help ensure that the bucket sizes are
         | closer to uniform, but not that items hash uniformly into the
         | available buckets. Even if all the buckets are exactly uniform
         | (as in some of the simulations on this page, if I understand
          | correctly), the non-uniform hashing of items into buckets
          | still leads to uneven load.
         | 
         | Multiplicity does help with a major operational concern,
         | though: when a node fails and recovery is needed, the recovery
         | traffic can be spread uniformly across all cluster members
         | rather than hot-spotting a small number of neighbors.
          | Incidentally, this is a classic congestive-collapse
          | scenario in consistent-hashed systems: a node looks failed
          | because it's overloaded, which starts recovery, which adds
          | load to its neighbors, which makes them look overloaded,
          | and the whole thing collapses.
        
           | nielsole wrote:
            | I always thought what systems did in practice is give
            | each node a variable number of logical buckets, so if the
            | distribution is uneven, a physical node subscribes to
            | more or fewer buckets. That way the effective limit on a
            | logical bucket's size is just the maximum number of items
            | a physical node can hold.
        
       | mjb wrote:
       | This is a very cool page. I love little simulations like this for
       | building intuition for systems problems.
       | 
       | Practical systems deal with this by not caring strongly about
       | overflow (caches), by running at low enough utilizations that
       | overflow is very unlikely given their item counts (e.g. Dynamo),
       | by using explicit partitioning rather than consistent hashing
       | (e.g. DynamoDB), by being able to take advantage of multi-tenancy
       | to drive up per-physical-node utilization even in the case of low
       | per-logical-node utilization, or by using some additional
       | algorithmic sophistication (e.g. Chen et al
       | https://arxiv.org/pdf/1908.08762).
       | 
       | In practice, this kind of overflow is a big deal for systems that
        | deal with relatively small numbers of large objects, but not
        | as big a deal for systems that deal with large numbers of small
       | objects. Try out the numbers in the page's "Handy Calculator" to
       | see how that plays out.
       | 
       | It's also worth mentioning that this isn't unique to consistent
       | hashing, but is a problem with random load balancing more
       | generally. "Pick a random server and send traffic to it" is an OK
       | load balancing strategy when requests are small and servers are
       | large, but a terrible one when requests become relatively large
       | or expensive. In the general load balancing/placement problem
       | this is easier than the storage case, because you don't need to
       | find requests again after dispatching them. That makes simple
       | algorithms like best-of-2 and best-of-k applicable.
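        | 
        | A sketch of the contrast (server names and the load table are
        | illustrative; note best-of-2 needs per-server load state,
        | which is exactly what consistent hashing avoids):
        | 
        |   import random
        | 
        |   def pick_random(servers):
        |       # Plain random dispatch: fine when requests are tiny
        |       # relative to server capacity, bad when a single
        |       # request is expensive.
        |       return random.choice(servers)
        | 
        |   def pick_best_of_2(servers, load):
        |       # Sample two servers and send the request to whichever
        |       # currently has less load. This works for dispatch
        |       # because you never need to find the request again.
        |       a, b = random.sample(servers, 2)
        |       return a if load[a] <= load[b] else b
        | 
        |   servers = ["s1", "s2", "s3", "s4"]
        |   load = {"s1": 3, "s2": 0, "s3": 7, "s4": 2}
        |   print(pick_best_of_2(servers, load))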
        
         | RMarcus wrote:
         | This is my post from 2018 (I didn't submit it to HN), and it
         | could definitely use a "here's what practical systems do"
         | update! I'll put it on the TODO list...
         | 
         | Your point about systems dealing with a relatively small number
         | of large objects vs. small objects also makes sense: this is
         | essentially the "cost" of an overflow (4kb spills once in a
         | blue moon? Oh well, handle that as a special case. 4TB spills
         | once in a blue moon? The system might crash). This is more
         | obvious, as you also point out, in load balancing.
         | 
         | One aspect I found very counter-intuitive: before this
         | investigation, I would've guessed that having a large number of
         | large bins makes overflow increasingly unlikely. This is only
         | partially true: more bins is obviously good, but larger bins
         | are actually _more_ sensitive to changes in load factor!
         | 
         | Overall, I think you are right that this is not really a
          | concern in modern systems. Compared to Dynamo, I still
         | think Vimeo's solution (linked at the bottom of the post) is
          | both intuitive and low-complexity. But regardless, it's more
          | of an interesting mathematical diversion than a practical
          | systems concern these days.
        
         | anonymousDan wrote:
          | Yeah, there's a nice load balancing paper on "the power of
          | two choices" that shows sampling two random servers and
          | sending to the less-loaded one gives surprisingly good
          | results.
        
       | 10000truths wrote:
       | Load balancing solves the issue of non-uniform hashing by
       | generating two hashes and picking the hash that corresponds to
       | the node with lower load. Can something similar be done here?
        
         | klysm wrote:
          | No, because the point of consistent hashing is that it
          | doesn't require state other than the number of nodes to map
          | items to nodes. Picking the lower-load hash would require
          | tracking much more state.
        
       | pjdesno wrote:
       | Actually, I think what a lot of real systems do is equivalent to
       | pre-computing a "random" table that has suitable balancing
        | properties, and then using the hash to index into it.
       | 
        | E.g. Ceph used to have a big problem with overloaded placement
        | groups causing some disks to get twice as much load; maximum
        | throughput was reached when those disks maxed out, leaving the
        | rest half-idle.
       | I don't recall the details of the current solution, but I think
       | it's equivalent to generating a random assignment, and then
       | tweaking it to get rid of over-full bins.
       | 
       | The original Chord-style consistent hashing is easier in the P2P
       | environments it was designed for, but typically consistent
       | hashing is used today in much more closely-coupled systems.
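        | 
        | The shape of that idea as a sketch (this is not Ceph's actual
        | mechanism, which is CRUSH plus a balancer; slot count and node
        | names are illustrative):
        | 
        |   import hashlib, random
        | 
        |   def build_table(nodes, slots=4096, seed=0):
        |       # Start from a random assignment of slots to nodes,
        |       # then move slots off over-full nodes until every node
        |       # holds at most ceil(slots / len(nodes)) of them.
        |       rng = random.Random(seed)
        |       table = [rng.choice(nodes) for _ in range(slots)]
        |       cap = -(-slots // len(nodes))    # ceiling division
        |       counts = {n: table.count(n) for n in nodes}
        |       underfull = [n for n in nodes if counts[n] < cap]
        |       for i, n in enumerate(table):
        |           if counts[n] > cap:
        |               target = underfull[0]
        |               table[i] = target
        |               counts[n] -= 1
        |               counts[target] += 1
        |               if counts[target] >= cap:
        |                   underfull.pop(0)
        |       return table
        | 
        |   def node_for(table, key):
        |       # Index the precomputed table with the key's hash.
        |       k = int.from_bytes(
        |           hashlib.blake2b(key.encode(),
        |                           digest_size=8).digest(), "big")
        |       return table[k % len(table)]
        | 
        |   table = build_table(["node-a", "node-b", "node-c"])
        |   print(node_for(table, "some-object-key"))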
        
       ___________________________________________________________________
       (page generated 2024-05-20 23:01 UTC)