[HN Gopher] My favourite small hash table
___________________________________________________________________
My favourite small hash table
Author : speckx
Score : 140 points
Date : 2025-12-09 14:47 UTC (19 hours ago)
(HTM) web link (www.corsix.org)
(TXT) w3m dump (www.corsix.org)
| clbrmbr wrote:
| Awesome blog! Looking at the code I feel like there's a kindred
| soul behind that keyboard, but there's no About page afaict. Who
| beeth this mysterious writer?
| LargoLasskhyfv wrote:
| https://github.com/corsix
| loeg wrote:
| He's done some interesting work with crc32 ~recently:
|
| https://www.corsix.org/content/fast-crc32c-4k
|
| https://github.com/corsix/fast-crc32/
| judofyr wrote:
| Is there a specific reason to store the key + value as an
| `uint64_t` instead of just using a struct like this?
| struct slot {
|     uint32_t key;
|     uint32_t value;
| };
| zimpenfish wrote:
| Maybe trying to avoid struct padding? Although having done a
| quick test on {arm64, amd64} {gcc, clang}, they all give the
| same `sizeof` for a struct with 2x`uint32_t`, a struct with a
| single `uint64_t`, or a bare `uint64_t`.
| simonask wrote:
| In any struct where all fields have the same size (and no
| field type requires higher alignment than its size), it is
| guaranteed on every (relevant) ABI that there are no padding
| bytes.
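| A compile-time check of that guarantee, as a sketch (C11 or
| later, where <assert.h> provides static_assert):
|
|       #include <assert.h>
|       #include <stdint.h>
|
|       struct slot { uint32_t key; uint32_t value; };
|       static_assert(sizeof(struct slot) == sizeof(uint64_t),
|                     "no padding between key and value");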
| zimpenfish wrote:
| TIL! Thanks!
| nitnelave wrote:
| The alignment constraint is different: a uint64_t slot lets
| them load key and value together as one 64-bit integer and
| compare it to 0 (the empty slot).
|
| You could work around that with a union or casts with explicit
| alignment constraints, but this is the shortest way to express
| that.
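| A minimal sketch of that idea (helper names are illustrative,
| not from the post): with the slot held as a single uint64_t,
| the empty-slot test is one compare against zero.
|
|       #include <stdint.h>
|
|       static inline uint32_t slot_key(uint64_t s) {
|           return (uint32_t)s;       /* assumed: key in low half */
|       }
|       static inline uint32_t slot_value(uint64_t s) {
|           return (uint32_t)(s >> 32);
|       }
|       static inline int slot_empty(uint64_t s) {
|           return s == 0;
|       }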
| Asooka wrote:
| In that case you can use bit fields in a union:
| union slot {
|     uint64_t keyvalue;
|     struct {
|         uint64_t key : 32;
|         uint64_t value : 32;
|     };
| };
|
| Since both members of the union are effectively the exact
| same type, there is no issue. C99: "If the member used to
| access the contents of a union is not the same as the member
| last used to store a value, the object representation of the
| value that was stored is reinterpreted as an object
| representation of the new type". Meaning, you can initialise
| keyvalue and that will initialise both key and value, so
| writing "union slot s{0}" initialises everything to 0. One
| issue is that the exact layout for bit fields is
| implementation defined, so if you absolutely need to know
| where key and value are in memory, you will have to read
| GCC's manual (or just experiment). Another is that you cannot
| take the address of key or value individually, but if your
| code was already using uint64_t, you probably don't need to.
|
| Edit: Note also that you can cast a pointer to slot to a
| pointer to uint64_t and that does not break strict aliasing
| rules.
| nitnelave wrote:
| You can probably get away with just a union between a 64-bit
| integer and two 32-bit integers.
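| Something like this sketch (the order of the two halves
| depends on endianness):
|
|       #include <stdint.h>
|
|       union slot {
|           uint64_t keyvalue;  /* load/compare both halves at once */
|           uint32_t half[2];   /* key and value, individually */
|       };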
| crest wrote:
| C has finally gained `alignas`, so you can avoid the union
| hack, or you could just rely on malloc to always return the
| maximum alignment anyway.
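| A sketch of the alignas route (C11 _Alignas via <stdalign.h>),
| with a memcpy for the combined load so the read stays within
| strict-aliasing rules:
|
|       #include <stdalign.h>
|       #include <stdint.h>
|       #include <string.h>
|
|       struct slot {
|           alignas(uint64_t) uint32_t key;  /* over-align to 8 bytes */
|           uint32_t value;
|       };
|
|       static inline int slot_empty(const struct slot *s) {
|           uint64_t both;
|           memcpy(&both, s, sizeof both);  /* one aligned 64-bit load */
|           return both == 0;
|       }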
| loeg wrote:
| No real reason. Slightly terser to compare with zero to find an
| empty slot.
| mwkaufma wrote:
| Or better, just store keys and values in separate arrays, so
| you can have compact cache lines of just keys when probing.
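| A structure-of-arrays sketch of that layout (field names
| assumed, not from the post):
|
|       #include <stddef.h>
|       #include <stdint.h>
|
|       struct table {
|           uint32_t *keys;    /* scanned while probing: 16 keys per
|                                 64-byte cache line */
|           uint32_t *values;  /* touched only once a key matches */
|           size_t    nslots;
|       };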
| Aardwolf wrote:
| > The table occupies at most 32 GiB of memory.
|
| This constraint allows making a linear array of all 4 billion
| values, with the key as array index, which fits in 16 GiB.
| Another 512 MiB is enough to have a bit per key indicating
| whether it is present.
|
| Perhaps text strings as keys and values would give a more
| interesting example...
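| A sketch of that direct-mapped alternative (16 GiB of values
| plus a 512 MiB presence bitmap; allocation and error handling
| omitted):
|
|       #include <stdint.h>
|
|       static uint32_t *values;   /* 1ull << 32 entries, 16 GiB */
|       static uint64_t *present;  /* (1ull << 32) / 64 words, 512 MiB */
|
|       static inline void put(uint32_t key, uint32_t value) {
|           values[key] = value;
|           present[key >> 6] |= 1ull << (key & 63);
|       }
|
|       static inline int has(uint32_t key) {
|           return (present[key >> 6] >> (key & 63)) & 1;
|       }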
| dragontamer wrote:
| This hashtable implements a multiset. Not (merely) a simple
| set.
| re wrote:
| > a linear array of all the 4 billion values, with the key as
| array index, which fits in 16 GiB
|
| The hash table has the significant advantage of having a _much_
| smaller _minimum_ size.
|
| > Perhaps text strings as keys and values would give a more
| interesting example
|
| Keep reading to "If keys and values are larger than 32 bits"
| kazinator wrote:
| That's not so much a constraint as the worst-case size.
|
| If you actually only have a handful of entries in the table, it
| is measurable in bytes.
|
| A linear array of all 4 billion possible values will occupy 16
| GiB (of virtual memory) upfront. We have then essentially
| replaced the hash table with a radix tree --- one made up of
| the page directories and tables of the VM system. If only the
| highest and lowest value are present in the table, then we only
| need the highest and lowest page (4 kB or whatever) to be
| mapped. It's not very compact for small sets; storing N numbers
| in random locations could require as many as N virtual memory
| pages to be committed.
| artur44 wrote:
| I always find it interesting how often the simplest hash table
| layouts end up performing best in real workloads. Once you avoid
| pointer chasing and keep everything in a compact array, CPU
| caches do most of the heavy lifting.
|
| It's also a good reminder that clarity of layout often beats more
| "clever" designs, especially when the dataset fits comfortably in
| memory.
| hinkley wrote:
| Until you get high memory contention from the rest of the code.
| Once eviction gets high you get some pretty counterintuitive
| improvements by fixing things that seem like they shouldn't
| need to be fixed.
|
| My best documented case was a 10x speed up from removing a
| double lookup that was killing caches.
| crest wrote:
| My best improvement was just bit-interleaving both axes of a
| 2x32-bit integer coordinate (aka a z-curve). I obtained a
| ~100x throughput improvement (yes, a factor, not a percentage)
| over locality in only one dimension. All it took was ~10 lines
| of bit twiddling. The runtime went from a bit above 300ms to
| slightly less than 3ms.
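| For reference, a sketch of that kind of bit-interleaving (a
| standard 2-D Morton/z-curve encode, not the commenter's actual
| code):
|
|       #include <stdint.h>
|
|       /* spread the 32 bits of v into the even bit positions */
|       static uint64_t spread(uint32_t v) {
|           uint64_t x = v;
|           x = (x | (x << 16)) & 0x0000FFFF0000FFFFull;
|           x = (x | (x << 8))  & 0x00FF00FF00FF00FFull;
|           x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0Full;
|           x = (x | (x << 2))  & 0x3333333333333333ull;
|           x = (x | (x << 1))  & 0x5555555555555555ull;
|           return x;
|       }
|
|       /* nearby (x, y) points get nearby codes, helping locality */
|       static uint64_t morton(uint32_t x, uint32_t y) {
|           return spread(x) | (spread(y) << 1);
|       }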
| throw-the-towel wrote:
| I'm wondering how you folks even come up with this kind of
| optimisation.
| hinkley wrote:
| Sheer stubbornness.
|
| Profilers lie and some more than most. I've gotten 3x
| from code with a perfectly rectangular profile output.
|
| Part of it comes down to a trick game devs steal from
| finance: give each task a budget and keep it to the
| budget even if it's not the tall tent pole.
|
| You should not spend 10% of your response time on
| telemetry and logging combined. Yet I pulled 10% of TTFB
| out of just the logging and telemetry code on a project.
| It was a frog-boiling situation: every new epic used the
| new code, and determining the cumulative cost wasn't easy.
| hinkley wrote:
| End to end gets weird. I was asked to look at an admin
| page, nobody could figure out why it was 30s. Literally the
| first thing I tried got it under 4 and the second down to
| three. It was pulling the same list of rows twice, applying
| two filters and then looking at the intersection. I changed
| the signature to send the list as input instead of the
| query constraints. Then I changed them to avoid the
| intersect.
|
| If you had asked me to bet on which one would have the bigger
| impact, I would have split the difference.
|
| My second favorite was similar. Two functions making a call
| instead of sharing the answer. Profiler said 10%
| cumulative. I removed half. Instead of 5% I got 20%. Which
| just demonstrates how much data a profiler cannot show you.
| saltcured wrote:
| To me, these sorts of examples always seem contrived. To the
| first order, I've never had a real hash table problem that was
| on machine word keys.
|
| I've nearly always had a variable length string or other
| complex structure that was being hashed, not their handles.
|
| Back in my early career in C, this would be a generic API to
| hash and store void pointers, but the pointers were not being
| hashed. The domain-specific hash function needed to downcast
| and perform the appropriate remote memory access to fetch the
| variable-length material that was actually being hashed.
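| For instance, such a generic API might look roughly like this
| (a sketch, with assumed names):
|
|       #include <stdint.h>
|
|       /* callbacks hash/compare the pointed-to, possibly
|          variable-length data, not the pointers themselves */
|       struct htab_ops {
|           uint64_t (*hash)(const void *key);
|           int      (*equal)(const void *a, const void *b);
|       };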
| jasonwatkinspdx wrote:
| I'm a big fan of the basic power of two choices hash table
| design. It's simple to understand and implement, has reasonable
| constant factors, and hits high load factors on real world
| datasets.
|
| You can use more elaborate probe and relocation schemes, but
| just choosing the less full bucket and resizing if both choices
| are full gets you surprisingly far.
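| A minimal sketch of that scheme (the bucket layout and hash
| mixer are assumptions, not from the comment):
|
|       #include <stdbool.h>
|       #include <stddef.h>
|       #include <stdint.h>
|
|       #define BUCKET_CAP 8
|       struct bucket { uint8_t n; uint64_t items[BUCKET_CAP]; };
|
|       /* two hash functions from one mixer (MurmurHash3 finalizer) */
|       static uint64_t mix(uint64_t x, uint64_t seed) {
|           x += seed;
|           x ^= x >> 33; x *= 0xff51afd7ed558ccdull;
|           x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ull;
|           x ^= x >> 33;
|           return x;
|       }
|
|       static bool insert(struct bucket *b, size_t nbuckets,
|                          uint64_t key) {
|           struct bucket *b1 = &b[mix(key, 1) % nbuckets];
|           struct bucket *b2 = &b[mix(key, 2) % nbuckets];
|           struct bucket *pick = (b1->n <= b2->n) ? b1 : b2;
|           if (pick->n == BUCKET_CAP)
|               return false;  /* both choices full: resize, retry */
|           pick->items[pick->n++] = key;
|           return true;
|       }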
| kccqzy wrote:
| Power-of-two length is also the natural choice for a growable
| linear array where the designer has no idea how many elements
| there will be.
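| E.g. the usual doubling scheme (sketch):
|
|       #include <stdint.h>
|       #include <stdlib.h>
|
|       struct vec { uint64_t *data; size_t len, cap; };
|
|       static int vec_push(struct vec *v, uint64_t x) {
|           if (v->len == v->cap) {
|               size_t ncap = v->cap ? v->cap * 2 : 16;  /* power of 2 */
|               uint64_t *p = realloc(v->data, ncap * sizeof *p);
|               if (!p) return -1;
|               v->data = p;
|               v->cap = ncap;
|           }
|           v->data[v->len++] = x;
|           return 0;
|       }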
| ww520 wrote:
| This hash table is pretty good. In the best case it takes a
| single memory read when there's no collision. Randomized keys
| can still collide to an arbitrary degree, though.
| air217 wrote:
| Used Gemini to explain this. Very interesting, and I think
| Gemini did a good job explaining the Robin Hood fairness
| mechanism:
|
| https://gemini.google.com/share/5add15a1c12f
| Joker_vD wrote:
| > If the Zba extension is present, sh3add.uw is a single
| instruction for zero-extending idx from 32 bits to 64,
| multiplying it by sizeof(uint64_t), and adding it to slots.
|
| Yay, we've got an equivalent of the SIB byte, but as three
| (six?) separate opcodes. Well, sub-opcodes.
|
| It's a shame though that Zcmp extension didn't get into RVA23
| even as an optional extension.
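| In C terms, the quoted sh3add.uw folds something like this
| into a single instruction (sketch):
|
|       #include <stdint.h>
|
|       static inline uint64_t *slot_addr(uint64_t *slots,
|                                         uint32_t idx) {
|           /* zero-extend idx, scale by sizeof(uint64_t), add base */
|           return (uint64_t *)((uintptr_t)slots
|                               + ((uint64_t)idx << 3));
|       }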
| attractivechaos wrote:
| This is a smart implementation of Robin Hood hashing that I
| was not aware of. In my understanding, a standard
| implementation keeps the probe length of each entry; this one
| avoids that thanks to its extra constraints. I don't quite
| understand the following strategy, though:
|
| > _To meet property (3) [if the key 0 is present, its value is
| not 0] ... "array index plus one" can be stored rather than
| "array index"._
|
| If the hash code can take any value in [0,2^32), how do you
| define a special value for empty buckets? The more common
| solution is to have a special key, not a special hash code,
| for empty slots, which is easier to achieve. In addition, as
| the author points out, supporting generic keys requires
| storing 32-bit hash values. With the extra 4 bytes per bucket,
| it is not clear whether this implementation is better than
| plain linear probing (my favorite). The fastest hash table
| implementations, like boost and abseil, don't often use Robin
| Hood hashing.
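| For what it's worth, a sketch of the quoted "array index plus
| one" trick, assuming (as in the post) that values are indices
| into a separate array:
|
|       #include <stdint.h>
|
|       /* storing idx + 1 means key 0 never pairs with value 0,
|          so an all-zero 64-bit slot can only mean "empty" */
|       static inline uint64_t make_slot(uint32_t key, uint32_t idx) {
|           return (uint64_t)key | ((uint64_t)(idx + 1) << 32);
|       }
|
|       static inline uint32_t slot_index(uint64_t slot) {
|           return (uint32_t)(slot >> 32) - 1;
|       }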
| Panzerschrek wrote:
| I don't understand the part about key removal. Is it correct to
| shift values to the left? Can't it break lookups for keys
| inserted into their natural positions?
| unixhero wrote:
| I am dipping my toes into programming, but I can't follow
| this without graphical representations of the tables as the
| author (brilliantly) walks through them.
___________________________________________________________________
(page generated 2025-12-10 10:01 UTC)