[HN Gopher] Static B-Trees: A data structure for faster binary s...
       ___________________________________________________________________
        
       Static B-Trees: A data structure for faster binary search
        
       Author : sereja
       Score  : 162 points
       Date   : 2022-02-17 17:42 UTC (5 hours ago)
        
 (HTM) web link (en.algorithmica.org)
 (TXT) w3m dump (en.algorithmica.org)
        
       | ww520 wrote:
        | For static data, would it be faster to use a trie or radix tree?
        
       | sydthrowaway wrote:
       | I need this for my interview skills.
        
       | ignoramous wrote:
       | An interesting resource.
       | 
        | Just this past day, I prototyped a Skip List backing a sorted
        | key/value store, and it was super simple to do (compared to an
        | AVL or Red-Black tree).
       | 
        | B-Trees are notorious for their complexity, which is how
        | SingleStore (memSQL) justified moving to Skip Lists to back
        | their tables: https://archive.is/Qhpgn
       | 
       | It would be interesting to see the author attempt faster Skip
       | Lists, as their expertise in this topic seems deep enough.
        
         | josephg wrote:
         | I've been wondering something similar. I recently published a
          | Rust rope library based on skip lists. (Ropes are "fat strings"
         | optimized for efficient insertion and deletion of arbitrary
         | characters.)
         | 
         | My rope is backed by a skip list, where each node in the skip
         | list has a gap buffer. Even without the gap buffer my skip list
          | is ~1.5x faster than the next fastest library on crates.io
          | (ropey - which is b-tree based). With gap buffers the resulting
          | code is 4x faster. My library is also significantly smaller,
          | both in lines of code and compiled size.
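          | 
          | (If gap buffers are unfamiliar: each node keeps its free space
          | at the last edit point, so repeated inserts at the same spot
          | copy nothing. A toy sketch, not jumprope's actual layout:)
          | 
          |     #include <cstring>
          | 
          |     struct GapBuffer {
          |         char buf[128];      // fixed node capacity
          |         int gap_start = 0;  // text = buf[0..gap_start)
          |         int gap_end = 128;  //      + buf[gap_end..128)
          | 
          |         // O(distance); free when editing at the same spot.
          |         void move_gap(int pos) {
          |             if (pos < gap_start) {
          |                 int n = gap_start - pos;
          |                 std::memmove(buf + gap_end - n, buf + pos, n);
          |                 gap_start -= n; gap_end -= n;
          |             } else if (pos > gap_start) {
          |                 int n = pos - gap_start;
          |                 std::memmove(buf + gap_start, buf + gap_end, n);
          |                 gap_start += n; gap_end += n;
          |             }
          |         }
          | 
          |         bool insert(int pos, char c) {  // pos in text coords
          |             if (gap_start == gap_end)
          |                 return false;  // full: split the node instead
          |             move_gap(pos);
          |             buf[gap_start++] = c;
          |             return true;
          |         }
          |     };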
         | 
          | I've got another b-tree library I've been using for CRDT
          | editing in Diamond Types. Based on this result I'm considering
          | rewriting my btree as a skip list to see how it compares. It'd
          | be a direct apples-to-apples comparison - I've already tuned my
          | btree as well as I can. I'm very curious how performance would
          | change.
         | 
         | https://crates.io/crates/jumprope
        
         | CamperBob2 wrote:
         | What's the advantage of a skip list over a simple hash map?
        
         | Comevius wrote:
          | Skip lists are so simple conceptually that I'm managing to
          | make them concurrent, verifiable, and even persistent.
          | 
          | For reference, this is a concurrent skip list that a high
          | school student could implement. It's beautiful.
         | 
         | https://people.csail.mit.edu/shanir/publications/LazySkipLis...
        
           | sadiq wrote:
            | If anyone wants one based on Herlihy et al.'s "The Art of
            | Multiprocessor Programming", there's a well-reviewed, tested,
            | and heavily commented concurrent lock-free one in OCaml's
            | runtime now:
            | https://github.com/ocaml/ocaml/blob/trunk/runtime/lf_skiplis...
           | 
           | It's used to manage code fragments that need to be accessed
           | in signal handlers.
        
         | ot wrote:
         | Note that the article is exclusively about _static_ data. The
         | vast majority of the complexity of B-trees comes from
         | supporting updates, but constructing a balanced B-tree from a
         | sorted list is relatively straightforward.
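          | 
          | (For instance, with an implicit layout where node k's children
          | are nodes k*(B+1)+1 ... k*(B+1)+B+1, a single in-order pass
          | over the sorted input fills every node. A rough sketch, close
          | in spirit to the article's layout but simplified:)
          | 
          |     #include <array>
          |     #include <climits>
          |     #include <vector>
          | 
          |     const int B = 16;  // keys per node (a cache line of ints)
          |     std::vector<std::array<int, B>> btree;
          | 
          |     void build(const std::vector<int>& sorted, size_t& pos,
          |                size_t k = 0) {
          |         if (k >= btree.size()) return;
          |         for (int i = 0; i < B; i++) {
          |             build(sorted, pos, k * (B + 1) + i + 1);  // left of key i
          |             btree[k][i] =
          |                 pos < sorted.size() ? sorted[pos++] : INT_MAX;
          |         }
          |         build(sorted, pos, k * (B + 1) + B + 1);  // rightmost child
          |     }
          | 
          |     // usage: btree.assign((sorted.size() + B - 1) / B, {});
          |     //        size_t pos = 0; build(sorted, pos);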
        
         | marcosdumay wrote:
         | Why does the article benchmark indexes on full scan operations?
        
       | ot wrote:
        | Very interesting post. The best results I knew of for static
        | ordered search [1] were from the 2015 Khuong-Morin paper [2]
        | (not cited in this article, though the author cites it in the
        | previous post); in their benchmarks, Eytzinger+prefetching
        | performed best. The improvement on top of that is pretty
        | impressive. At the larger sizes, the latencies are basically 1
        | cache miss (~50ns).
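        | 
        | (For reference, the Eytzinger search from [2] is tiny: keys are
        | stored in BFS order, the descent is branchless, and a prefetch
        | grabs descendants a few levels down. From memory, so treat the
        | exact indexing as a sketch:)
        | 
        |     // a[1..n] holds the keys in BFS (Eytzinger) order; a[0] is
        |     // unused. Returns the index of the first key >= x, 0 if none.
        |     size_t eytzinger_lower_bound(const int* a, size_t n, int x) {
        |         size_t k = 1;
        |         while (k <= n) {
        |             __builtin_prefetch(a + k * 16);  // 4 levels ahead
        |             k = 2 * k + (a[k] < x);          // branchless descent
        |         }
        |         k >>= __builtin_ffsll(~k);  // cancel the "went right" steps
        |         return k;
        |     }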
       | 
        | However, I have one comment on the benchmark: the iterations
        | have no loop-carried dependency on the previous result, so the
        | CPU is allowed to pipeline some loads. This is rarely
        | representative of real workloads, where the key being searched
        | comes from other computations, so this is more a measure of the
        | highest possible reciprocal throughput than of actual latency.
        | It would be interesting to see what happens if they make the key
        | being searched a function of the result of the previous lookup.
        | (EDIT: as gfd below points out, the article actually discusses
        | this!)
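        | 
        | (Concretely, the difference between the two loops below, where
        | lower_bound_idx stands in for whatever search is being measured:)
        | 
        |     // Throughput-style: lookups are independent, loads overlap.
        |     size_t sum = 0;
        |     for (size_t i = 0; i < m; i++)
        |         sum += lower_bound_idx(queries[i]);
        | 
        |     // Latency-style: the next key depends on the previous
        |     // result, so lookup i+1 can't start before lookup i ends.
        |     size_t r = 0;
        |     for (size_t i = 0; i < m; i++)
        |         r = lower_bound_idx(queries[i] ^ (r & 1));
        | 
        | (The xor with r & 1 barely perturbs the key, but it forces the
        | loads to serialize.)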
       | 
        | Also, some of the tricks only work for 32-bit keys, which is
        | what allows the vectorized search within nodes. For larger keys,
        | different layouts may still be preferable.
       | 
       | [1] Ordered search is when you need to look up keys that are not
       | in the dataset, and find predecessor/successor. If you only do
       | exact lookups, a hashtable will almost certainly perform better,
       | though it would require a bit more space.
       | 
       | [2] https://arxiv.org/pdf/1509.05053.pdf
        
         | gfd wrote:
         | If you search for "reciprocal throughput" in the article, the
         | author did do another benchmark with dependencies to compare
         | "real latency".
        
           | ot wrote:
           | Thanks! You're right, I missed it because it was in the
           | section "Comparison with std::lower_bound" and I thought
           | there would be just some gloating :)
           | 
            | Most published data structure benchmarks have this pitfall
            | and don't discuss it; then, when you plug the structure into
            | a real workload and try to measure the improvement, it turns
            | out the microbenchmark was not predictive. When I see
            | latencies that top out at 50ns from a structure that clearly
            | has to make more than one causal load, I immediately get
            | skeptical.
           | 
           | This is one of the very few articles I've seen that clearly
           | explains this, which makes it even more excellent. I only
           | wish it didn't label the other graphs "Latency" too, or at
           | least put an asterisk there.
        
             | sereja wrote:
              | By the time I realized it needed correcting, I was already
              | like 15 graphs in and really didn't want to remake them.
        
               | _0ffh wrote:
                | Somewhat understandable. Since I sometimes stumble into
                | comparable situations, I've made it a habit never to
                | make graphs by hand; I always write a script that
                | generates them. That makes it quick and easy to go back
                | and regenerate them if you later change your mind about
                | some detail.
        
       | rurban wrote:
       | Still does not beat static perfect hashes, the usual solution in
       | these cases.
        
       | SNosTrAnDbLe wrote:
        | This is a very good resource. I am bookmarking it! It is very
        | engineer-friendly.
        
       | tabtab wrote:
        | In general, the "best" algorithm seems to depend on specific
        | usage patterns. A different angle is to study or sample actual usage
       | patterns and automatically adjust the structure/algorithm based
       | on that. The indexes can be readjusted during off/slow hours. (If
       | you don't have off hours, then such a system may not be the best
       | fit.)
        
       ___________________________________________________________________
       (page generated 2022-02-17 23:00 UTC)