[HN Gopher] A hash table by any other name
       ___________________________________________________________________
        
       A hash table by any other name
        
       Author : signa11
       Score  : 96 points
       Date   : 2024-07-26 06:53 UTC (16 hours ago)
        
 (HTM) web link (lwn.net)
 (TXT) w3m dump (lwn.net)
        
       | pphysch wrote:
       | > "I've _assumed_ an average chain length of 10 for rhashtable in
       | the above memory calculations. "
       | 
       | > Reviewers were, as ever, skeptical in the face of assertions
       | about performance without corresponding measurements. Peng Zhang
       | asked "how much performance improvement is expected if it is
       | applied to dcache?" Wilcox didn't reply to Zhang's question.
       | 
       | Am I reading this right that the author supplied no actual
       | (benchmarking) evidence to support their proposed performance
       | optimization?
        
         | kelnos wrote:
         | That's how I read it too, and that seems bizarre to me. The one
         | request for actual performance numbers was ignored? That seems
         | pretty crappy.
        
         | RandomThoughts3 wrote:
         | Seriously, a simple google search or taking a minute to follow
         | the first link in the article would have shown you that the
          | patch set obviously came with a benchmark [1]. What's missing
          | is performance data from actual real-world use in the kernel.
          | This is someone posting a patch to the kernel mailing list,
          | not some random kid on GitHub; there's no point assuming they
          | are an idiot trying to shine.
         | 
         | My days as a LWN subscriber are now far behind but gosh reading
         | the comments there was a blast. I had forgotten how good proper
         | technical discussions used to be.
         | 
         | [1]
         | https://lore.kernel.org/lkml/20240222203726.1101861-1-willy@...
        
           | jonhohle wrote:
           | I poked through a few messages in that thread and while there
           | is a space benchmark, there is no time benchmark, something I
           | would expect to see on switching from a LL to a cache-aware
           | implementation. Phoronix, who would benchmark a potato if you
           | told them it gave you better performance, has an article with
           | no benchmarks. The commit says it should be good enough for
           | someone to run a perf benchmark.
           | 
           | Is it common to add an unused data structure into the kernel
           | without understanding its performance characteristics? If
            | they are available, they are not obvious to someone who has
            | read the article, read the commit message, read lkml, gone
            | looking for benchmarks, and still come up with no idea of
            | how this might perform in realistic or synthetic scenarios.
        
           | pphysch wrote:
           | The entire point of a LWN article like this is to accurately
           | summarize the patch sets and email discussions for casual
           | readers. No need for the snark.
        
           | scottlamb wrote:
           | I'm pretty sure pphysch is correct.
           | 
           | > Seriously, a simple google search or taking a minute to
           | follow the first link in the article would have shown you
           | that the patch set obviously came with a benchmark [1].
           | 
           | If you're referring to the table with headers "number xarray
           | maple rhash rosebush", it's an estimate of memory usage, not
           | a benchmark. And the author (Matthew Wilcox) says "I've
           | assumed an average chain length of 10 for rhashtable in the
           | above memory calculations", which folks questioned. So
           | there's reason to doubt its correctness.
           | 
           | > It's someone posting a patch to the kernel mailing list,
           | not some random kid on GitHub, there is no point assuming
           | they are an idiot to try to shine.
           | 
            | Let's just let the work speak rather than flagging the
            | author as an idiot or a genius... at present, rosebush is
            | unproven and based on questionable assumptions. That might
            | change in future rounds.
        
         | rurban wrote:
          | Because many others have run such benchmarks before and found
          | vast improvements over simple linked lists.
        
       | o11c wrote:
        | Is it just me, or is this unsafely assuming that collisions can
        | only happen in the low bits, not in the full hash (despite it
        | being only 32 bits)?
        
         | masklinn wrote:
          | No? From the text at least, the low bits are just used to
          | select the bucket. This is a common technique (and the reason
          | for using power-of-two sizes: you avoid a full modulo
          | operation).
         | 
         | Once you're in a bucket, it does full hash match, then I assume
         | (didn't look at the patch) actual key comparison.
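
For illustration, the power-of-two trick masklinn describes can be sketched as follows (a hypothetical helper, not the actual rosebush code):

```c
#include <assert.h>
#include <stdint.h>

/* Select a bucket from the low bits of a 32-bit hash. When the table
 * size is a power of two, masking with (size - 1) gives the same
 * result as hash % size without a division instruction. */
static inline uint32_t bucket_index(uint32_t hash, uint32_t table_size)
{
    /* table_size must be a power of two for the mask to be valid */
    return hash & (table_size - 1);
}
```

Entries that land in the same bucket are then disambiguated by comparing the full stored hash, and finally the keys themselves.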
        
           | o11c wrote:
           | That would be fine, except that in v2 the buckets are fixed
           | size.
           | 
           | What happens when more than 20 objects end up with the exact
           | same hash?
        
             | samatman wrote:
             | If it were my hash table, the arrays would just be unrolled
             | linked lists.
             | 
             | So a "twenty object" bucket would actually store 19
             | objects, and if the last one wasn't null it would contain a
             | pointer to the next bucket.
             | 
             | I'm willing to presume these folks know what they're doing,
             | so that's probably what's happening here.
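
An unrolled-linked-list bucket of the kind samatman describes might look roughly like this (a sketch under that assumption; field names and sizes are illustrative, not rosebush's actual layout):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS_PER_BUCKET 19  /* "20 objects", minus one slot reused as a link */

/* A fixed-size array of entries per bucket; when the bucket fills,
 * `next` chains to an overflow bucket. Lookups are mostly linear
 * scans over contiguous memory, with only occasional pointer chases. */
struct bucket {
    uint32_t hashes[SLOTS_PER_BUCKET];  /* full hashes, for cheap filtering */
    void    *objects[SLOTS_PER_BUCKET];
    struct bucket *next;                /* NULL until the bucket overflows */
};

/* Return the first object in the chain whose stored hash matches.
 * (A real table would go on to compare the full key as well.) */
static void *bucket_find_hash(struct bucket *b, uint32_t hash)
{
    for (; b != NULL; b = b->next)
        for (int i = 0; i < SLOTS_PER_BUCKET; i++)
            if (b->objects[i] != NULL && b->hashes[i] == hash)
                return b->objects[i];
    return NULL;
}
```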
        
             | tialaramex wrote:
              | Yes, it does seem like a problem. By my reading, this
              | will just endlessly try to "split" the bucket of 20
              | items, hoping that eventually one of the items goes into
              | a new bucket, leaving space; but if they all have the
              | same hash, that never happens: disaster.
             | 
             | If I'm wrong I don't think this code does a good enough job
             | of explaining what's going on, and if I'm correct it's
             | defective and should not ship.
             | 
             | Even if this situation "shouldn't" happen, it clearly _can_
              | happen, you just have to get unlucky, and it's difficult
              | to compute how unlikely that is for real systems, so better
             | to ensure that "We got unlucky" isn't a disaster for the
             | kernel.
        
               | rurban wrote:
                | No disaster. On an actual attack (e.g. if the seed
                | leaked) you count the collisions and return empty at
                | some low threshold, like 100 on 32-bit.
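
rurban's suggested mitigation, counting same-hash entries at insert time and refusing past a threshold, could be sketched like this (a hypothetical helper; the threshold is a parameter here for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count how many entries in a bucket already carry this exact full
 * hash; at or above `limit`, reject the insert rather than looping
 * forever trying to split a bucket that cannot be split. */
static int reject_insert(const uint32_t *bucket_hashes, size_t n,
                         uint32_t hash, size_t limit)
{
    size_t collisions = 0;

    for (size_t i = 0; i < n; i++)
        if (bucket_hashes[i] == hash)
            collisions++;
    return collisions >= limit;
}
```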
        
               | tialaramex wrote:
               | Where in the code is this "no disaster" routine you
               | describe?
        
             | bufferoverflow wrote:
             | I think they address this in the article. They use arrays
             | instead of linked lists. Which makes direct key comparison
             | very fast, because it's all in CPU cache.
        
               | tialaramex wrote:
               | "I just use an array" doesn't solve the problem "What if
               | the array is full?".
               | 
               | In fact it makes that problem acute. A linked list
               | structure will degrade gradually as we shove more entries
               | with the same hash into it, every look-up incurs a
               | pointer chase down the entire list, slower is eventually
               | unacceptable but we do get signs first -- but with arrays
               | we're just screwed, in C it's pretty likely we end up
               | with an illegal dereference or we lose data once there's
               | no room.
        
               | mst wrote:
               | If it's designed to be used for caching, then replacing
               | an old entry or throwing away the new one is quite
               | possibly a perfectly reasonable "out."
               | 
               | If it's designed to be used for other purposes as well,
               | not so much.
               | 
               | (I wasn't entirely sure which from reading the article)
        
             | tubs wrote:
             | They resize the top level array and rehash items into their
             | new buckets.
        
               | tialaramex wrote:
               | Sure. There were 20 items in the bucket with the same
               | hash, we wanted to add one more.
               | 
               | So we grow the top level hash table and we put all of the
               | twenty items in the same new bucket, because they have
               | the same hash, and then we... oh, we don't have enough
               | room. Try again? (The code presented does). Too bad, that
               | can't help.
               | 
                | In your understanding, why does this not happen? That's
                | the question. How does this code ensure this can't
                | happen? If it doesn't, if it's just hoping to get
                | lucky, the code must not be used.
        
               | tubs wrote:
               | They go to the same bucket because their hash *mod the
               | size of the top array* collides.
        
               | tialaramex wrote:
                | This whole sub-thread is about the case where the
                | _entire hash_ collides.
               | 
               | Changing which bits we use changes nothing for this
               | problem case.
        
         | 10000truths wrote:
         | The rhashtable's bucket_table struct stores a random seed (the
         | hash_rnd field) that is folded into the calculation of the
         | hash. Without knowing what the seed is going to be, an attacker
         | has no way to precompute keys with colliding hashes.
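
The idea of folding a per-table seed into the hash can be sketched like this (an illustrative FNV-1a-style mixer, not the kernel's actual jhash-based implementation):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a secret per-table seed into the hash. Without knowing the
 * seed, an attacker cannot precompute keys that collide. This is a
 * simplified FNV-1a-style illustration of the seeding idea only. */
static uint32_t seeded_hash(const void *key, size_t len, uint32_t seed)
{
    const unsigned char *p = key;
    uint32_t h = seed;            /* start from the secret seed */

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x01000193u;         /* FNV-1a prime */
    }
    return h;
}
```

A different seed gives a different mapping of keys to buckets, so a set of keys that collides under one seed is useless against a table built with another.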
        
       | metadat wrote:
       | Related reading:
       | 
       | What is RCU, Fundamentally? - https://lwn.net/Articles/262464/
       | 
       | Amusingly, the first several hits for "Golang RCU" point to
       | GitHub repos which contain a bunch of Mutexes (locks), import
       | "unsafe", and haven't been updated in 7+ years. Perhaps this is
       | not the most well-understood area of computer science.
       | 
       | Let's face it, when was the last time you needed RCU for a web
       | app?
        
       | commandlinefan wrote:
        | Interestingly, I was just reading the book "Java Concurrency in
        | Practice", where the author compares the performance of an
        | array-based hash table like the one described here against a
        | linked-list based hash table and finds that the linked list
        | comes out ahead performance-wise.
        
       ___________________________________________________________________
       (page generated 2024-07-26 23:08 UTC)