[HN Gopher] Building a faster hash table for high performance SQ...
       ___________________________________________________________________
        
       Building a faster hash table for high performance SQL joins
        
       Author : nhourcard
       Score  : 48 points
       Date   : 2023-12-20 16:26 UTC (6 hours ago)
        
 (HTM) web link (questdb.io)
 (TXT) w3m dump (questdb.io)
        
       | pixelpoet wrote:
       | Serious question, if performance is the lynchpin, why write it in
       | Java?
       | 
        | Especially considering they use unsafe "heavily", for big joins
        | they could easily just call out to some native code if the
        | surrounding code reaaaaally must be Java (again, why?). Using
        | unsafe Java is the worst of both worlds: you don't get native
        | speed, there's loads of memory overhead from everything being an
        | Object (besides the rest of the VM stuff), you get to "enjoy" GC
        | pauses in the middle of your hot loops, and you have fewer safety
        | guarantees than something like Rust.
        
         | bluestreak wrote:
         | this is THE number one question being asked about QuestDB :)
         | There is a thread that might help
         | https://news.ycombinator.com/item?id=37557880
        
           | pixelpoet wrote:
           | I see... but that seems a little weak considering it's a
           | funded product, the first adjective they use to describe it
           | is "fast", and good old C++ would totally slay it. The author
           | has a C++ background, maybe he could spend an afternoon
           | trying that?
           | 
           | I couldn't even try to count the number of great posts I've
           | read about fast hash tables from e.g. Paul Khuong alone...
        
           | gavinray wrote:
           | In the most non-inflammatory way possible: I am not sure I'm
           | convinced of the performance benefits of crossing a JNI
           | boundary to invoke a method in another language, versus
           | letting the JVM JIT and optimize native JVM code.
           | 
           | Would be interesting to see benchmarks here, do you know if
           | there are any public ones available, specific to QuestDB's
           | interop code?
        
             | bluestreak wrote:
             | In certain situations, crossing the JNI boundary can be
             | advantageous. When data resides in "native" memory, outside
             | the Java heap, the coordination of concurrent execution
             | logic can be handled in Java, while the actual execution
             | occurs in C++ or Rust. In this context, the negligible
             | penalty for crossing the JNI boundary once per 10 million
             | rows pales in comparison to the substantial benefits
             | achievable through C++ optimization or CPU-specific
             | assembly.
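A minimal Java sketch of the pattern described above (illustrative names only, and the "native" kernel is stubbed in pure Java here so the example is self-contained; in QuestDB it would be a JNI-backed C++/Rust function):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// "Coordinate in Java, execute natively": rows live in off-heap (direct)
// memory, and a single call hands the whole batch to the fast kernel, so
// the boundary-crossing cost is amortized over every row in the buffer.
public class BatchedKernel {

    // Stand-in for: private static native long sumLongs(long addr, long count);
    static long sumLongs(ByteBuffer batch, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += batch.getLong(i * Long.BYTES);
        }
        return sum;
    }

    public static long sumBatch(long[] rows) {
        // One boundary crossing per batch, not per row.
        ByteBuffer offHeap = ByteBuffer
                .allocateDirect(rows.length * Long.BYTES)
                .order(ByteOrder.nativeOrder());
        for (long v : rows) {
            offHeap.putLong(v);
        }
        return sumLongs(offHeap, rows.length);
    }

    public static void main(String[] args) {
        System.out.println(sumBatch(new long[]{1, 2, 3})); // 6
    }
}
```

With, say, 10 million rows per buffer, even a JNI call costing hundreds of nanoseconds adds a negligible per-row overhead.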
        
         | lukax wrote:
          | They started with Java, tried to generate as little garbage as
          | possible, and moved hot loops into C, C++ and Rust. The logic
          | is in Java; the rest is mostly native.
         | 
         | https://questdb.io/blog/leveraging-rust-in-our-high-performa...
        
         | toasted-subs wrote:
         | Yeah, I genuinely don't understand why so many people reach for
         | Java for everything.
        
           | marginalia_nu wrote:
            | Because it's a very well-rounded language, and whatever
            | papercuts exist are either on the way out in newer/future
            | Java versions or made up for by the tooling ecosystem.
        
         | dimgl wrote:
         | Interestingly I recently saw an MQ broker written in Java and
         | it seemed to have some pretty impressive performance. I'm not a
         | huge fan of Java's DX but I have to admit, that's some serious
         | performance. Can't remember the name but it was on the homepage
         | this week I believe.
        
       | gavinray wrote:
       | I always enjoy reading stuff written by Andrey, he's a brilliant
       | fellow for sure.
       | 
       | Can highly recommend his personal blog as well:
       | https://puzpuzpuz.dev/
        
         | puzpuzpuz-hn wrote:
         | Thanks, Gavin, I'm pleased to hear that. And thanks for
         | recommending my blog!
        
       | _a_a_a_ wrote:
       | From article
       | 
       | "Imagine that we run this query over a few hundred million rows.
       | This means at least a few hundred million hash table operations.
       | As you might imagine, a slow hash table would make for a slower
       | query. A faster hash table? Faster queries!"
       | 
        | I'll read the article properly after this (this is just a quick
        | skim), but I can't see how this quote can be correct. Unless I'm
        | missing something, the hashing function is fast compared to
        | bouncing around randomly inside RAM - very much faster than
        | random memory accesses. So I can't see how it makes a difference.
       | 
       | Okay, I'll read the article now...
       | 
       | Edit:
       | 
       | "If you insert "John" and then "Jane" string keys into a FastMap,
       | then that would later become the iteration order. While it
       | doesn't sound like a big deal for most applications, this
       | guarantee is important in the database world.
       | 
       | If the underlying table data or index-based access returns sorted
       | data, then we may want to keep the order to avoid having to sort
       | the result set. This is helpful in case of a query with an ORDER
       | BY clause. Performance-wise, direct iteration over the heap is
       | also beneficial as it means sequential memory access."
       | 
        | But "...if the underlying table data or index-based access
        | returns sorted data..." - then you've already got sorted data,
        | in which case surely use a merge join instead of a hash join.
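The insertion-order guarantee quoted above can be mimicked with `java.util.LinkedHashMap` (a stand-in here; FastMap itself is QuestDB-internal). If the underlying scan already delivers keys in timestamp order, iterating the map reproduces that order, so an ORDER BY on the same column needs no extra sort:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Group rows by (already sorted) timestamp; because the map preserves
// insertion order, the group keys come back out still sorted.
public class InsertionOrder {

    public static List<Long> groupKeysInScanOrder(long[] sortedTimestamps) {
        Map<Long, Long> counts = new LinkedHashMap<>();
        for (long ts : sortedTimestamps) {
            counts.merge(ts, 1L, Long::sum); // group by timestamp
        }
        return new ArrayList<>(counts.keySet()); // iteration == insertion order
    }

    public static void main(String[] args) {
        long[] ts = {10, 10, 20, 30, 30, 30};
        System.out.println(groupKeysInScanOrder(ts)); // [10, 20, 30]
    }
}
```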
        
         | puzpuzpuz-hn wrote:
          | > Unless I'm missing something, the hashing function is fast
          | compared to bouncing around randomly inside RAM - very much
          | faster than random memory accesses. So I can't see how it
          | makes a difference.
         | 
          | In a GROUP BY, you may have a few hundred million rows, but
          | only a few hundred groups within them. A slow hash function
          | would slow things down dramatically in that case, since the
          | hash table remains small (and cache-resident) and data access
          | is potentially linear - the per-row hashing cost dominates.
         | 
         | > Then you've got sorted data, in which case use a merge join
         | instead of a hash join surely.
         | 
          | This property is beneficial for a GROUP BY that includes a
          | timestamp or a function over the timestamp. QuestDB stores
          | data sorted by time, so relying on insertion order can avoid a
          | redundant sort when there is an ORDER BY clause on the
          | timestamp column.
         | 
         | As for merge join, we also use it in ASOF join:
         | https://questdb.io/docs/reference/sql/join/#asof-join
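The small-table GROUP BY point above can be sketched as follows (illustrative only, using `java.util.HashMap` rather than QuestDB's tables): the hash function runs once per row, while the table itself holds only as many entries as there are groups.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Why the hash function dominates a GROUP BY with few groups: it is
// invoked once per row, but the table stays tiny. The counter below
// tallies hashCode calls to make that visible.
public class GroupByHashCost {

    static final AtomicLong HASH_CALLS = new AtomicLong();

    // Wrapper key that counts how often it is hashed.
    static final class Key {
        final int group;
        Key(int group) { this.group = group; }
        @Override public int hashCode() {
            HASH_CALLS.incrementAndGet();
            return Integer.hashCode(group);
        }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).group == group;
        }
    }

    public static Map<Key, Long> groupBy(int rows, int groups) {
        Map<Key, Long> counts = new HashMap<>();
        for (int i = 0; i < rows; i++) {
            counts.merge(new Key(i % groups), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Key, Long> counts = groupBy(1_000_000, 200);
        System.out.println(counts.size());       // only 200 entries...
        System.out.println(HASH_CALLS.get());    // ...but >= 1,000,000 hashes
    }
}
```

A million rows collapsing into 200 groups means a million hash computations against a table that fits in cache, so shaving cycles off the hash function (and probe sequence) pays off directly.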
        
       | senderista wrote:
       | Since the blog post mentioned a PR to replace linear probing with
       | Robin Hood, I just wanted to mention that I found bidirectional
       | linear probing to outperform Robin Hood across the board in my
       | Java integer set benchmarks:
       | 
       | https://github.com/senderista/hashtable-benchmarks/blob/mast...
       | 
       | https://github.com/senderista/hashtable-benchmarks/wiki/64-b...
        
         | jerrinot wrote:
         | A QuestDB engineer here: These are cool benchmarks! The idea to
         | try Robin Hood probing came to me after receiving some feedback
         | on Reddit. I ran initial experiments, and the results were
         | promising, leading to its integration into our codebase. Thank
         | you so much for sharing your repository. Perhaps one day we'll
         | explore bidirectional probing as well!
         | 
         | A snapshot of my happiness after running first experiments with
         | Robin Hood:
         | https://twitter.com/jerrinot/status/1730147245285150743 :)
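For context, a minimal Robin Hood probing sketch for a set of ints (illustrative only, not QuestDB's implementation): on a collision, the element closer to its ideal slot ("rich") yields to the one probing from farther away ("poor"), which keeps probe lengths short and uniform.

```java
import java.util.Arrays;

// Minimal open-addressing set with Robin Hood displacement.
// Capacity must be a power of two; keys are non-negative ints.
public class RobinHoodSet {
    private static final int EMPTY = -1;
    private final int[] slots;
    private final int mask;

    public RobinHoodSet(int capacityPow2) {
        slots = new int[capacityPow2];
        mask = capacityPow2 - 1;
        Arrays.fill(slots, EMPTY);
    }

    private int ideal(int key) {
        int h = key * 0x9E3779B9;      // Fibonacci hashing
        return (h ^ (h >>> 16)) & mask;
    }

    private int distance(int slot, int key) {
        return (slot - ideal(key)) & mask; // probe distance, wrapping
    }

    public void add(int key) {
        int slot = ideal(key);
        int dist = 0;
        while (true) {
            int incumbent = slots[slot];
            if (incumbent == EMPTY) { slots[slot] = key; return; }
            if (incumbent == key) { return; } // already present
            int incumbentDist = distance(slot, incumbent);
            if (incumbentDist < dist) {
                // "Rich" incumbent yields its slot; carry it forward instead.
                slots[slot] = key;
                key = incumbent;
                dist = incumbentDist;
            }
            slot = (slot + 1) & mask;
            dist++;
        }
    }

    public boolean contains(int key) {
        int slot = ideal(key);
        int dist = 0;
        while (true) {
            int incumbent = slots[slot];
            // An absent key can stop early: anything stored closer to its
            // ideal slot than we've probed would have displaced us.
            if (incumbent == EMPTY || distance(slot, incumbent) < dist) {
                return false;
            }
            if (incumbent == key) { return true; }
            slot = (slot + 1) & mask;
            dist++;
        }
    }

    public static void main(String[] args) {
        RobinHoodSet set = new RobinHoodSet(64);
        for (int k = 0; k < 40; k++) set.add(k);
        System.out.println(set.contains(7) && !set.contains(99)); // true
    }
}
```

The early-exit in `contains` is the main win over plain linear probing: misses terminate as soon as the probe distance invariant is violated, instead of scanning to the next empty slot.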
        
           | brandonasuncion wrote:
           | Hi there!
           | 
           | I made the initial suggestion to look into Robin Hood hashing
           | when it was first posted on Reddit.
           | 
           | Glad to see it make its way into the repo!
        
         | puzpuzpuz-hn wrote:
         | Thanks! We're still benchmarking Robin Hood hashing and are
         | open to further experiments. The benchmarks look promising.
        
         | mlochbaum wrote:
         | Worth pointing out that this can depend a lot more on fiddly
         | details than you might expect. In particular, you're dealing
         | with a small fixed width allowing the hash to be stored in the
         | table instead of the key. The article emphasizes variable-
         | length keys, and I don't see any specialization on key sizes
         | (if 4- and 8-byte keys aren't common then this makes sense; if
         | they are then I'd expect dedicated table code for those sizes
         | to be valuable). And set lookups are also just a bit different
         | from value lookups. I think these cases are different enough
         | that I have no idea if the results would carry over, although I
         | can see how the bidirectional approach would reduce probing
         | more than RH which seems good.
         | 
         | ...and since I've done a lot of work with Robin Hood on small-
         | key lookups, I can point out some little tweaks that have made
         | a big difference for me. I have 8-byte lookups at just over
         | 3ns/lookup[0], albeit at a very low load factor, typically
         | <50%. A key step was to use the maximum possible hash as a
         | sentinel value, handling it specially in case it shows up in
         | the data. This way, instead of probing until finding an empty
         | bucket _or_ greater hash, probing just finds the first slot
          | that's greater than or equal to the requested key's hash. So
         | the lookup code[1] is very simple (the rest, not so much). The
         | while loop is only needed on a hash collision, so at a low load
         | factor a lookup is effectively branchless. However, these
         | choices are specialized for a batched search where the number
         | of insertions never has to be higher than the number of
         | searches, and all the insertions can be done first. And focused
         | on small-ish (under a million entries) tables.
         | 
         | [0] https://mlochbaum.github.io/bencharray/pages/search.html
         | 
         | [1]
         | https://github.com/dzaima/CBQN/blob/5c7ab3f/src/singeli/src/...
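A Java sketch of the sentinel trick described above (the linked CBQN code is C/Singeli; all names here are illustrative). Slots are kept sorted by hash, and empty slots hold the maximum hash value as a sentinel, so the probe loop is a single comparison - advance while the stored hash is smaller than the wanted one - with no separate empty-slot test:

```java
import java.util.Arrays;

// Batched-build table: keys are inserted first (with shifting to keep
// slots sorted by hash), then lookups use the branch-light probe.
public class SentinelTable {
    private static final long SENTINEL = Long.MAX_VALUE;
    private static final int TAIL = 64; // overflow slots instead of wrapping

    private final long[] hashes;
    private final long[] keys;
    private final int shiftBits;

    public SentinelTable(int log2Capacity) {
        int capacity = 1 << log2Capacity;
        hashes = new long[capacity + TAIL];
        keys = new long[capacity + TAIL];
        Arrays.fill(hashes, SENTINEL);
        shiftBits = 63 - log2Capacity; // top hash bits pick the ideal slot
    }

    private static long hash(long key) {
        long h = key * 0x9E3779B97F4A7C15L;
        h ^= h >>> 32;
        h &= Long.MAX_VALUE;              // keep it non-negative
        return h == SENTINEL ? h - 1 : h; // the sentinel value is reserved
    }

    public void add(long key) {
        long h = hash(key);
        int pos = (int) (h >>> shiftBits);
        while (hashes[pos] < h) pos++;
        int end = pos;
        while (hashes[end] != SENTINEL) end++;
        // Shift the displaced run right to keep slots sorted by hash.
        System.arraycopy(hashes, pos, hashes, pos + 1, end - pos);
        System.arraycopy(keys, pos, keys, pos + 1, end - pos);
        hashes[pos] = h;
        keys[pos] = key;
    }

    public boolean contains(long key) {
        long h = hash(key);
        int pos = (int) (h >>> shiftBits);
        while (hashes[pos] < h) pos++;     // only loops on a collision
        for (; hashes[pos] == h; pos++) {  // equal hashes sit together
            if (keys[pos] == key) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        SentinelTable t = new SentinelTable(8); // 256 slots, low load factor
        for (long k = 0; k < 100; k++) t.add(k);
        System.out.println(t.contains(42) && !t.contains(1000)); // true
    }
}
```

As in the comment above, this is specialized for the batched case: insertions happen up front, the load factor stays low, and the overflow tail sidesteps wraparound logic entirely.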
        
           | senderista wrote:
           | Thanks for the links; the BQN impl looks really interesting.
           | I believe TFA deals with only hash codes and offsets in the
           | hash table proper (keys and values are stored separately in a
           | dynamic array), so fixed-width keys/values still apply. It's
           | true that you can't use keys interchangeably with hash codes
           | for variable-length keys like I do for integer keys, but I
           | don't expect that to affect the relative performance of RH
           | vs. BLP. (I'm curious how they handle colliding hash codes;
           | 32-bit hashes mean you have a ~50% probability of at least
           | one collision at 2^16 keys, which isn't much.)
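A quick arithmetic check of that collision claim, using the standard birthday approximation p ≈ 1 − exp(−n²/2m) for n keys drawn from m = 2³² possible 32-bit hash values:

```java
// Birthday-bound estimate for at least one hash collision among n keys.
public class BirthdayBound {
    public static double collisionProbability(long n) {
        double m = Math.pow(2, 32);
        return 1.0 - Math.exp(-(double) n * n / (2.0 * m));
    }

    public static void main(String[] args) {
        // At 2^16 keys: n^2/(2m) = 2^32 / 2^33 = 0.5, so p ≈ 1 - e^-0.5 ≈ 0.39
        System.out.printf("%.3f%n", collisionProbability(1L << 16));
    }
}
```

So "~50%" is slightly generous: the approximation gives about 39% at exactly 2^16 keys, crossing 50% near 77,000 keys - either way, collisions are expected at realistic key counts.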
        
             | mlochbaum wrote:
              | Looks like full keys are always compared if hash codes
              | test equal, which is what I'd expect. For example:
              | https://github.com/questdb/questdb/blob/master/core/src/main...
        
         | hawk_ wrote:
         | Can bidirectional linear probing be used for any key type? Or
         | do the keys need to be of some integral type?
        
           | mlochbaum wrote:
           | Since the hashes are stored, you could order by hash. This
           | would leave keys with the same hash unordered, so if you find
           | an entry with equal hash but unequal key you have to keep
           | probing, but that only matters on a full-hash collision.
        
           | senderista wrote:
           | The keys-as-hash codes trick only works for fixed-width
           | integers; otherwise you have to explicitly store the hash
           | codes. I assumed unique hash codes in my implementation, but
           | I think you could easily adapt the algorithm to allow
           | duplicate hash codes. Tolerating hash code collisions would
           | avoid having to increase hash code size (for collisions in
           | their case, you'd just need to probe multiple offsets in the
           | key/value array).
        
       | rkerno wrote:
       | Hi, I'm curious how you deal with the potential for hash
       | collisions across a large data set - is that a post-join check?
        
       ___________________________________________________________________
       (page generated 2023-12-20 23:00 UTC)