[HN Gopher] Hash Ordering and Hyrum's Law
       ___________________________________________________________________
        
       Hash Ordering and Hyrum's Law
        
       Author : ColinWright
       Score  : 37 points
       Date   : 2024-09-27 17:27 UTC (5 days ago)
        
 (HTM) web link (eaftan.github.io)
 (TXT) w3m dump (eaftan.github.io)
        
       | sorokod wrote:
       | _Hash iteration order is a great example of Hyrum's Law_
       | 
       | An even better example (requires less words)
       | https://xkcd.com/1172/
        
       | jbandela1 wrote:
       | https://www.youtube.com/watch?v=ncHmEUmJZf4
       | 
       | Is a fun presentation by Matthew Kulukundis (designer of Google's
       | Hash Table) with Hyrum Wright offering objections from the crowd
       | (Hyrum's law) about the design and rollout of it at Google
        
       | ljnelson wrote:
       | Interesting note: the JDK deliberately randomizes iteration order
       | of certain immutable sets and maps (https://github.com/openjdk/jd
       | k/blob/jdk-23%2B37/src/java.bas...).
        
       | mise_en_place wrote:
       | Alternatively, maintain backward compatibility as much as
       | possible. Function overloading might be too clever. Extra
       | verbosity in exchange for clarity might be a worthwhile tradeoff.
       | WET is acceptable in these scenarios.
        
       | lsy wrote:
       | What's the advantage of specifying it as random over specifying
       | it as sequential in some way? Aren't both just specifying a
       | behavior, where one behavior is potentially more useful than the
       | other? I guess I understand the principled point that a set of
       | hash keys is not an array. But it seems like more complexity than
       | is necessary, and even possible to fall victim to Hyrum's law
       | besides... you can imagine someone using random hash ordering to
       | implicitly randomize something without needing to call an RNG
       | explicitly.
       | 
       | It seems like the most principled approach might be an re-
       | specification of the data structure: why is "iteration" even
       | possible over a hash table? Shouldn't the keys be specified as a
       | "set" on which only set-appropriate operations like map can be
       | performed?
        
         | fwip wrote:
         | I think removing methods from the Hashmap was not in scope for
         | their work - they were working on the JDK, and presumably
         | didn't have the bandwidth to patch every single Java library
         | used at Google to use set-based methods instead of iteration-
         | based.
         | 
         | Also, I think in practice you'll find you want to extract items
         | from a set in some order, so you need some kind of
         | transformation from set->list. e.g: you want to print them out.
         | 
         | Edit: Forgot to address your main question. If the
         | specification requires a specific ordering, the implementation
         | is forever bound to do that, even if other implementations
         | would be more efficient/secure/other-desirable-properties. By
         | introducing randomness, you reduce the risk of people
         | accidentally relying on unspecified behavior, and are more able
         | to change your implementation later.
        
         | kstrauser wrote:
         | One advantage to randomization, and why various languages did
         | it in the first place, is that it prevents miscreants from
         | DoSing your service if they know its implementation.
         | 
         | Say you're operating a service and you share its source on
         | GitHub such that anyone can see it. Your language doesn't
         | randomize hash values. Hashes are O(1), right? Well, not if a
         | clever attacker can send you a set of values where they know
         | each one will hash into the same bucket. The you end up with
         | like 9 empty buckets and 1 with 100,000 items in it. Oops!
         | 
         | Old Python pre-randomization had a dict.items() method that
         | would yield all the keys in order, conceptually kind of like:
         | for bucket in self._buckets:           for item in bucket:
         | yield item
         | 
         | The order of those buckets would be repeatable between runs, so
         | bucket foo would always come first, then bucket bar. Then
         | Python added hash randomization so that the resulting hash key
         | was something like hash(salt+key) instead of just hash(key).
         | Now there's no way to tell in advance which bucket an item
         | would get filed into, and the buckets would end up more or less
         | balanced in size.
         | 
         | Newer Pythons (since 3.6? 3.7?) do something altogether
         | different, and I can't explain exactly how their ordered dicts
         | work, except to say I sat through a presentation on them and
         | thought it was freaking genius even if I could re-implement it
         | myself without sitting down with their docs.
        
       | hoistbypetard wrote:
       | Relevant XKCD:
       | 
       | https://xkcd.com/1172/
        
       ___________________________________________________________________
       (page generated 2024-10-02 23:00 UTC)