[HN Gopher] Hash Ordering and Hyrum's Law
___________________________________________________________________
Hash Ordering and Hyrum's Law
Author : ColinWright
Score : 37 points
Date : 2024-09-27 17:27 UTC (5 days ago)
(HTM) web link (eaftan.github.io)
(TXT) w3m dump (eaftan.github.io)
| sorokod wrote:
| _Hash iteration order is a great example of Hyrum's Law_
|
| An even better example (requires less words)
| https://xkcd.com/1172/
| jbandela1 wrote:
| https://www.youtube.com/watch?v=ncHmEUmJZf4
|
| Is a fun presentation by Matthew Kulukundis (designer of Google's
| Hash Table) with Hyrum Wright offering objections from the crowd
| (Hyrum's law) about the design and rollout of it at Google
| ljnelson wrote:
| Interesting note: the JDK deliberately randomizes iteration order
| of certain immutable sets and maps (https://github.com/openjdk/jd
| k/blob/jdk-23%2B37/src/java.bas...).
| mise_en_place wrote:
| Alternatively, maintain backward compatibility as much as
| possible. Function overloading might be too clever. Extra
| verbosity in exchange for clarity might be a worthwhile tradeoff.
| WET is acceptable in these scenarios.
| lsy wrote:
| What's the advantage of specifying it as random over specifying
| it as sequential in some way? Aren't both just specifying a
| behavior, where one behavior is potentially more useful than the
| other? I guess I understand the principled point that a set of
| hash keys is not an array. But it seems like more complexity than
| is necessary, and even possible to fall victim to Hyrum's law
| besides... you can imagine someone using random hash ordering to
| implicitly randomize something without needing to call an RNG
| explicitly.
|
| It seems like the most principled approach might be an re-
| specification of the data structure: why is "iteration" even
| possible over a hash table? Shouldn't the keys be specified as a
| "set" on which only set-appropriate operations like map can be
| performed?
| fwip wrote:
| I think removing methods from the Hashmap was not in scope for
| their work - they were working on the JDK, and presumably
| didn't have the bandwidth to patch every single Java library
| used at Google to use set-based methods instead of iteration-
| based.
|
| Also, I think in practice you'll find you want to extract items
| from a set in some order, so you need some kind of
| transformation from set->list. e.g: you want to print them out.
|
| Edit: Forgot to address your main question. If the
| specification requires a specific ordering, the implementation
| is forever bound to do that, even if other implementations
| would be more efficient/secure/other-desirable-properties. By
| introducing randomness, you reduce the risk of people
| accidentally relying on unspecified behavior, and are more able
| to change your implementation later.
| kstrauser wrote:
| One advantage to randomization, and why various languages did
| it in the first place, is that it prevents miscreants from
| DoSing your service if they know its implementation.
|
| Say you're operating a service and you share its source on
| GitHub such that anyone can see it. Your language doesn't
| randomize hash values. Hashes are O(1), right? Well, not if a
| clever attacker can send you a set of values where they know
| each one will hash into the same bucket. The you end up with
| like 9 empty buckets and 1 with 100,000 items in it. Oops!
|
| Old Python pre-randomization had a dict.items() method that
| would yield all the keys in order, conceptually kind of like:
| for bucket in self._buckets: for item in bucket:
| yield item
|
| The order of those buckets would be repeatable between runs, so
| bucket foo would always come first, then bucket bar. Then
| Python added hash randomization so that the resulting hash key
| was something like hash(salt+key) instead of just hash(key).
| Now there's no way to tell in advance which bucket an item
| would get filed into, and the buckets would end up more or less
| balanced in size.
|
| Newer Pythons (since 3.6? 3.7?) do something altogether
| different, and I can't explain exactly how their ordered dicts
| work, except to say I sat through a presentation on them and
| thought it was freaking genius even if I could re-implement it
| myself without sitting down with their docs.
| hoistbypetard wrote:
| Relevant XKCD:
|
| https://xkcd.com/1172/
___________________________________________________________________
(page generated 2024-10-02 23:00 UTC)