[HN Gopher] Building a faster hash table for high performance SQ...
___________________________________________________________________
Building a faster hash table for high performance SQL joins
Author : nhourcard
Score : 48 points
Date : 2023-12-20 16:26 UTC (6 hours ago)
(HTM) web link (questdb.io)
(TXT) w3m dump (questdb.io)
| pixelpoet wrote:
| Serious question, if performance is the lynchpin, why write it in
| Java?
|
| Especially considering they use unsafe "heavily", for big joins
| they could easily just call out to some native code if the
| surrounding code reaaaaally must be Java (again, why?). It's the
| worst of both worlds using unsafe Java: you don't get native
| speed, there's loads of memory overhead from everything being an
| Object (besides the rest of the VM stuff), and you get to
| "enjoy" GC pauses in the middle of your hot loops, all with
| fewer safety guarantees than something like Rust.
| bluestreak wrote:
| this is THE number one question being asked about QuestDB :)
| There is a thread that might help
| https://news.ycombinator.com/item?id=37557880
| pixelpoet wrote:
| I see... but that seems a little weak considering it's a
| funded product, the first adjective they use to describe it
| is "fast", and good old C++ would totally slay it. The author
| has a C++ background, maybe he could spend an afternoon
| trying that?
|
| I couldn't even try to count the number of great posts I've
| read about fast hash tables from e.g. Paul Khuong alone...
| gavinray wrote:
| In the most non-inflammatory way possible: I am not sure I'm
| convinced of the performance benefits of crossing a JNI
| boundary to invoke a method in another language, versus
| letting the JVM JIT and optimize native JVM code.
|
| Would be interesting to see benchmarks here, do you know if
| there are any public ones available, specific to QuestDB's
| interop code?
| bluestreak wrote:
| In certain situations, crossing the JNI boundary can be
| advantageous. When data resides in "native" memory, outside
| the Java heap, the coordination of concurrent execution
| logic can be handled in Java, while the actual execution
| occurs in C++ or Rust. In this context, the negligible
| penalty for crossing the JNI boundary once per 10 million
| rows pales in comparison to the substantial benefits
| achievable through C++ optimization or CPU-specific
| assembly.
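|
| As a rough sketch of that pattern (class and method names here
| are made up for illustration, not QuestDB's actual API):
|
|   // Java coordinates the work; the hot loop runs in native
|   // code over off-heap memory. One JNI crossing covers a whole
|   // batch of rows, so the per-call overhead is amortized away.
|   public final class NativeJoin {
|       static { System.loadLibrary("nativejoin"); }
|
|       // implemented in C++/Rust, operates on off-heap buffers
|       private static native long joinBatch(
|               long leftAddr, long rightAddr, long rowCount);
|
|       public static long join(long leftAddr, long rightAddr,
|                                long rowCount) {
|           // one boundary crossing per multi-million-row batch
|           return joinBatch(leftAddr, rightAddr, rowCount);
|       }
|   }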
| lukax wrote:
| They started with Java, tried to use GC as little as possible,
| and ended up writing hot loops in C, C++ and Rust. Logic is in
| Java; other stuff is mostly native.
|
| https://questdb.io/blog/leveraging-rust-in-our-high-performa...
| toasted-subs wrote:
| Yeah, I genuinely don't understand why so many people reach for
| Java for everything.
| marginalia_nu wrote:
| Because it's a very well-rounded language, and whatever
| papercuts exist are either on the way out in newer/future Java
| versions or made up for by the tooling ecosystem.
| dimgl wrote:
| Interestingly I recently saw an MQ broker written in Java and
| it seemed to have some pretty impressive performance. I'm not a
| huge fan of Java's DX but I have to admit, that's some serious
| performance. Can't remember the name but it was on the homepage
| this week I believe.
| gavinray wrote:
| I always enjoy reading stuff written by Andrey, he's a brilliant
| fellow for sure.
|
| Can highly recommend his personal blog as well:
| https://puzpuzpuz.dev/
| puzpuzpuz-hn wrote:
| Thanks, Gavin, I'm pleased to hear that. And thanks for
| recommending my blog!
| _a_a_a_ wrote:
| From article
|
| "Imagine that we run this query over a few hundred million rows.
| This means at least a few hundred million hash table operations.
| As you might imagine, a slow hash table would make for a slower
| query. A faster hash table? Faster queries!"
|
| I'll read the article properly after this - this is just a
| quick skim - but I can't see how this quote can be correct.
| Unless I'm missing something, the hashing function is fast
| compared to bouncing randomly around inside RAM - very much
| faster than random memory accesses. So I can't see how it makes
| a difference.
|
| Okay, I'll read the article now...
|
| Edit:
|
| "If you insert "John" and then "Jane" string keys into a FastMap,
| then that would later become the iteration order. While it
| doesn't sound like a big deal for most applications, this
| guarantee is important in the database world.
|
| If the underlying table data or index-based access returns sorted
| data, then we may want to keep the order to avoid having to sort
| the result set. This is helpful in case of a query with an ORDER
| BY clause. Performance-wise, direct iteration over the heap is
| also beneficial as it means sequential memory access."
|
| but "...if the underlying table data or index-based access
| returns sorted data..." Then you've got sorted data, in which
| case use a merge join instead of a hash join surely.
| puzpuzpuz-hn wrote:
| > Unless I'm missing something, hashing function is fast
| compared to random bouncing around inside ram - very much
| faster then random memory accesses. So I can't see how it make
| a difference.
|
| In a GROUP BY, you may have a few hundred million rows, but
| only a few hundred groups within them. A slow hash function
| would slow things down dramatically in that case, since the
| hash table remains small and data access is potentially linear.
|
| > Then you've got sorted data, in which case use a merge join
| instead of a hash join surely.
|
| This property is beneficial for a GROUP BY that includes a
| timestamp or a function over the timestamp. QuestDB organizes
| data sorted by time, so relying on insertion order may help to
| avoid redundant sorting if there is an ORDER BY clause on the
| timestamp column.
|
| As for merge join, we also use it in ASOF join:
| https://questdb.io/docs/reference/sql/join/#asof-join
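|
| A toy version of that access pattern, using a plain
| java.util.HashMap just to show the shape (the map in the
| article works over native memory instead):
|
|   import java.util.HashMap;
|   import java.util.Map;
|
|   // ~100M rows but only a few hundred groups: the map stays
|   // tiny and cache-resident, so per-row hashing and probing
|   // cost dominates rather than random memory access.
|   static Map<Integer, Long> groupCounts() {
|       Map<Integer, Long> counts = new HashMap<>();
|       for (long i = 0; i < 100_000_000L; i++) {
|           int group = (int) (i % 500);  // few hundred groups
|           counts.merge(group, 1L, Long::sum); // one op per row
|       }
|       return counts;
|   }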
| senderista wrote:
| Since the blog post mentioned a PR to replace linear probing with
| Robin Hood, I just wanted to mention that I found bidirectional
| linear probing to outperform Robin Hood across the board in my
| Java integer set benchmarks:
|
| https://github.com/senderista/hashtable-benchmarks/blob/mast...
|
| https://github.com/senderista/hashtable-benchmarks/wiki/64-b...
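|
| For context, the core of Robin Hood probing is that an insert
| may displace an entry that sits closer to its ideal slot than
| the incoming key does. A minimal sketch for an open-addressed
| long-key table (illustrative only, not the benchmark or QuestDB
| code; assumes a power-of-two capacity with free slots left):
|
|   static void insert(long[] keys, boolean[] used, long key) {
|       int mask = keys.length - 1;
|       int idx = hash(key) & mask;
|       int dist = 0;                  // current probe distance
|       while (used[idx]) {
|           // how far is the incumbent from its own ideal slot?
|           int home = (idx - (hash(keys[idx]) & mask)) & mask;
|           if (home < dist) {         // "steal from the rich"
|               long tmp = keys[idx];
|               keys[idx] = key;       // displace the incumbent
|               key = tmp;
|               dist = home;
|           }
|           idx = (idx + 1) & mask;
|           dist++;
|       }
|       keys[idx] = key;
|       used[idx] = true;
|   }
|
|   static int hash(long x) {          // cheap 64-to-32-bit mix
|       x *= 0x9E3779B97F4A7C15L;
|       return (int) (x >>> 32);
|   }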
| jerrinot wrote:
| A QuestDB engineer here: These are cool benchmarks! The idea to
| try Robin Hood probing came to me after receiving some feedback
| on Reddit. I ran initial experiments, and the results were
| promising, leading to its integration into our codebase. Thank
| you so much for sharing your repository. Perhaps one day we'll
| explore bidirectional probing as well!
|
| A snapshot of my happiness after running first experiments with
| Robin Hood:
| https://twitter.com/jerrinot/status/1730147245285150743 :)
| brandonasuncion wrote:
| Hi there!
|
| I made the initial suggestion to look into Robin Hood hashing
| when it was first posted on Reddit.
|
| Glad to see it make its way into the repo!
| puzpuzpuz-hn wrote:
| Thanks! We're still benchmarking Robin Hood hashing and are
| open to further experiments. The benchmarks look promising.
| mlochbaum wrote:
| Worth pointing out that this can depend a lot more on fiddly
| details than you might expect. In particular, you're dealing
| with a small fixed width allowing the hash to be stored in the
| table instead of the key. The article emphasizes variable-
| length keys, and I don't see any specialization on key sizes
| (if 4- and 8-byte keys aren't common then this makes sense; if
| they are then I'd expect dedicated table code for those sizes
| to be valuable). And set lookups are also just a bit different
| from value lookups. I think these cases are different enough
| that I have no idea if the results would carry over, although I
| can see how the bidirectional approach would reduce probing
| more than RH which seems good.
|
| ...and since I've done a lot of work with Robin Hood on small-
| key lookups, I can point out some little tweaks that have made
| a big difference for me. I have 8-byte lookups at just over
| 3ns/lookup[0], albeit at a very low load factor, typically
| <50%. A key step was to use the maximum possible hash as a
| sentinel value, handling it specially in case it shows up in
| the data. This way, instead of probing until finding an empty
| bucket _or_ greater hash, probing just finds the first slot
| that 's greater than or equal to the requested key's hash. So
| the lookup code[1] is very simple (the rest, not so much). The
| while loop is only needed on a hash collision, so at a low load
| factor a lookup is effectively branchless. However, these
| choices are specialized for a batched search where the number
| of insertions never has to be higher than the number of
| searches, and all the insertions can be done first. And focused
| on small-ish (under a million entries) tables.
|
| [0] https://mlochbaum.github.io/bencharray/pages/search.html
|
| [1]
| https://github.com/dzaima/CBQN/blob/5c7ab3f/src/singeli/src/...
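|
| In rough Java terms, the lookup described above boils down to
| something like this (the real code is Singeli/C, so this is
| just the shape of it; it assumes non-negative hashes that
| uniquely identify an 8-byte key, and trailing sentinel slots so
| the scan can't run off the end):
|
|   // Empty slots are pre-filled with the MAX_HASH sentinel and
|   // occupied slots are kept ordered by hash (Robin Hood), so a
|   // probe just scans to the first slot >= the target hash.
|   static final long MAX_HASH = Long.MAX_VALUE;
|
|   static boolean contains(long[] table, int shift, long hash) {
|       int i = (int) (hash >>> shift);  // initial bucket guess
|       while (table[i] < hash) {        // rare at low load
|           i++;
|       }
|       // keys hashing to MAX_HASH itself need special handling
|       return table[i] == hash;
|   }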
| senderista wrote:
| Thanks for the links; the BQN impl looks really interesting.
| I believe TFA deals with only hash codes and offsets in the
| hash table proper (keys and values are stored separately in a
| dynamic array), so fixed-width keys/values still apply. It's
| true that you can't use keys interchangeably with hash codes
| for variable-length keys like I do for integer keys, but I
| don't expect that to affect the relative performance of RH
| vs. BLP. (I'm curious how they handle colliding hash codes;
| 32-bit hashes mean you have a ~50% probability of at least
| one collision at 2^16 keys, which isn't much.)
| mlochbaum wrote:
| Looks like full keys are always compared if hash codes test
| equal, which is what I'd expect. For example:
| https://github.com/questdb/questdb/blob/master/core/src/main...
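|
| In sketch form, that lookup pattern (hash plus offset in the
| table, full key compared only on a hash match, probing
| continues on a mismatch) looks roughly like this; plain linear
| probing with 0 reserved as the empty marker, not the actual
| QuestDB code:
|
|   import java.util.Arrays;
|
|   static int find(long[] hashes, int[] offsets, byte[][] keys,
|                   long h, byte[] key) {
|       int mask = hashes.length - 1;
|       int i = (int) h & mask;
|       while (hashes[i] != 0) {      // 0 marks an empty slot
|           // a hash match is only a candidate; verify the key
|           if (hashes[i] == h
|                   && Arrays.equals(keys[offsets[i]], key)) {
|               return offsets[i];    // index into key storage
|           }
|           i = (i + 1) & mask;       // colliding hashes tolerated
|       }
|       return -1;                    // not found
|   }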
| hawk_ wrote:
| Can bidirectional linear probing be used for any key type? Or
| do the keys need to be of some integral type?
| mlochbaum wrote:
| Since the hashes are stored, you could order by hash. This
| would leave keys with the same hash unordered, so if you find
| an entry with equal hash but unequal key you have to keep
| probing, but that only matters on a full-hash collision.
| senderista wrote:
| The keys-as-hash codes trick only works for fixed-width
| integers; otherwise you have to explicitly store the hash
| codes. I assumed unique hash codes in my implementation, but
| I think you could easily adapt the algorithm to allow
| duplicate hash codes. Tolerating hash code collisions would
| avoid having to increase hash code size (for collisions in
| their case, you'd just need to probe multiple offsets in the
| key/value array).
| rkerno wrote:
| Hi, I'm curious how you deal with the potential for hash
| collisions across a large data set - is that a post-join check?
___________________________________________________________________
(page generated 2023-12-20 23:00 UTC)