[HN Gopher] The power of two random choices
       ___________________________________________________________________
        
       The power of two random choices
        
       Author : mfiguiere
       Score  : 128 points
       Date   : 2024-02-07 02:05 UTC (1 days ago)
        
 (HTM) web link (brooker.co.za)
 (TXT) w3m dump (brooker.co.za)
        
       | Zanni wrote:
       | Nice animated visualization of this:
       | https://twitter.com/GrantSlatton/status/1754912113246798036
        
         | koolba wrote:
         | That's neat. An easy concept to understand here is that with a
         | pair to consider you will never target the worst node. At worst
         | it's the second most loaded node.
        
       | atq2119 wrote:
       | The underlying mathematical driver of the power of two random
       | choices is a pretty amazing fact about the "n balls into n bins"
       | random experiment.
       | 
       | If the balls are thrown independently and uniformly at random,
       | the most loaded bin will have theta(log n) balls in it (almost
       | certainly).
       | 
       | If you instead pick two bins uniformly and independently at
       | random for each ball and throw the ball into the bin which has
       | fewer balls in it so far, then the most loaded bin will have
       | theta(loglog n) balls in it, i.e a massive asymptotic
       | improvement.
        
         | mjb wrote:
         | The balls-into-bins problem comes up often in distribute
         | systems engineering, and so it's super important to have the
         | right intuition. I think a lot of people come in to the space
         | assuming that random balls-into-bins is much more even than it
         | really is (and so underestimating the value of better-than-
         | random placement).
         | 
         | Another interesting example is hash-based partitioning of data.
         | It doesn't spread data out nearly as evenly as one might
         | expect!
         | 
         | (I wrote a blog post about that too:
         | https://brooker.co.za/blog/2018/01/01/balls-into-bins.html)
        
           | nyrikki wrote:
           | I have found remembering that both the law of large numbers
           | and the central limit theorem are dependent on the
           | independent and identically distributed assumption.
           | 
           | As an example, many computing systems are Markovian, which is
           | memoryless with an exponential distribution.
           | 
           | So the second part of I.I.D. doesn't hold. But may be close
           | enough for your needs.
        
             | mjb wrote:
             | However, in many real systems the aggregate behavior
             | follows the asymptotic trend of the central limit theorem
             | even though the underlying distributions may not fit the
             | strict requirements.
        
               | nyrikki wrote:
               | Yes, I always start with the IID, but try watch for
               | evidence that those assumptions need to change.
               | 
               | Too many convenient properties are lost once you lose IID
               | to prematurely drop it.
        
       | mkhnews wrote:
       | I used this method for a cache eviction/replacement scheme. For
       | our use case it was better than traditional lru.
        
       | zellyn wrote:
       | The most interesting question this has always raised for me is
       | whether it implies you should always try two
       | doctors/dentists/therapists/etc. before making a choice. Seems
       | like it would be just as powerful "in real life".
        
         | taeric wrote:
         | Well, this strategy would only help you find ones that are not
         | over loaded? And probably only works if you can change choices
         | over time?
         | 
         | That said, sampling, in general, works surprisingly well in all
         | things.
        
         | mjb wrote:
         | There's a whole area of math in "optimal stopping" which aims
         | at exactly this question: how many dentists should you try
         | before you pick your favorite?
         | 
         | For example: https://en.wikipedia.org/wiki/Secretary_problem
        
           | vikingerik wrote:
           | That problem is always sort of intuitively misleading. The
           | success criterion is defined as picking the single best
           | option from the field. But that's not how real life works.
           | You don't have to pick the single best option for a job or
           | spouse or dentist or whatever; you'll likely be satisfied
           | with anything in the top ten or quintile or somewhere around
           | there. It's interesting mathematically, but it's seldom
           | really applicable to real life.
        
         | chrisweekly wrote:
         | That reminds me of a great book -- "Algorithms to Live By" --
         | which examines precisely this kind of question. Useful,
         | educational, and entertaining = highly recommended.
        
       | ckcheng wrote:
       | Another place this technique was used and discussed recently:
       | 
       | Caches: LRU vs. Random (2014) (danluu.com)
       | 
       | https://news.ycombinator.com/item?id=39093109
        
         | Night_Thastus wrote:
         | It _feels_ subjectively like Caching and Load Balancing both
         | belong to the same  'class' of problems at least at some level.
         | I'm curious if there's a term that encompasses both.
        
       | dahart wrote:
       | The most interesting tidbit, I think, is cut off from the right
       | side of the plot -- eventually with enough delay, 1 random choice
       | is the winner, for exactly the same reason that 3 random choices
       | wins for small delay, and 2 random choices wins for medium delay.
       | 
       | I kinda wonder now if the crossover points between best, 3, 2,
       | and 1 are equidistant on a log-scale plot. If I squint a little,
       | I could maybe see 4, 16, and 64 being the approximate crossover
       | points, especially if there was a little less noise.
        
         | mjb wrote:
         | Yes, it's super interesting when systems move into the regime
         | where stale information becomes worse than no information at
         | all. Part of what makes best-of-2 interesting is how long it
         | resists falling into this pit, but it does eventually happen
         | (as, I suspect, it must with _any_ distributed placement /load
         | balancing algorithm).
        
       | amluto wrote:
       | > Using stale data for load balancing leads to a herd behavior,
       | where requests will herd toward a previously quiet host for much
       | longer than it takes to make that host very busy indeed.
       | 
       | This phenomenon is common with cars using traffic-aware
       | navigation, too.
        
       ___________________________________________________________________
       (page generated 2024-02-08 23:02 UTC)