[HN Gopher] B-Trees Require Fewer Comparisons Than Balanced Bina...
___________________________________________________________________
B-Trees Require Fewer Comparisons Than Balanced Binary Search Trees
Author : greghn
Score : 155 points
Date : 2024-06-23 16:06 UTC (1 day ago)
(HTM) web link (databasearchitects.blogspot.com)
(TXT) w3m dump (databasearchitects.blogspot.com)
| shagie wrote:
| Long ago, I was taking the sophomore level data structures and
| algorithms class. It was being transitioned from C to C++ and
| being taught by Professor DeWitt. I have vague memories of linked
| lists and then binary trees. ... And then we got to the spot
| where the syllabus from previous years had AVL trees and red
| black trees... and he scoffed at teaching it.
|
| And that I do remember. I suspect he had been doing lecture
| planning over the weekend (this was in the "we're redoing the
| class, teach it as it needs to be taught" type mandate) and he
| had us skip that section of the text. The first lecture he was
| going off of overhead transparencies that he was writing as he
| lectured. The second lecture he gave us photocopies of hand
| written notes and diagrams.
|
| Instead of AVL and red black trees, we learned about 2-3 trees
| and B trees.
|
| For what it's worth, I still know how to do 2-3 trees from
| scratch.
|
| Later I looked at AVL trees... and they _still_ give me a
| headache.
| abhgh wrote:
| Red-black trees used to make me anxious. I think the first time
| I grasped them was after watching Erik Demaine's lecture on the
| topic. For anyone interested it is this [1]; it's still a very
| good reference on the topic!
|
| [1] https://videolectures.net/mit6046jf05_demaine_lec10/
| cassepipe wrote:
| RB trees still make me anxious which is why I use AA trees
| instead
|
| https://en.wikipedia.org/wiki/AA_tree
| teruakohatu wrote:
| I feel there must be better ways of learning the principles of
| algorithms than balancing trees. I suffered through RB trees.
| 082349872349872 wrote:
| There are two fundamental things to teach in an algorithms
| course:
|
| 1) that there are many programs that implement the same
| algorithm, and many algorithms that implement the same
| function.
|
| 2) that there are a few basic tradeoffs we can make when
| implementing a function/customising an algorithm: space vs
| time, read vs write, deterministic vs indeterministic, etc.
|
| I'd guess trees have traditionally been used, because there's
| a whole zoo of them, which clearly presents (1) and allows
| (2) to be learned (in "cookbook" style) even if they're never
| explicitly presented (in "textbook" style).
| ajuc wrote:
| Suffering is the point. It's like exercises in math or push-
| ups - if you find a way to do them effortlessly you won't
| develop the skills you are training for.
|
| I agree they suck tho.
| slaymaker1907 wrote:
| B-trees are definitely more useful, but I don't think they're
| really a good idea to start out with as your first balanced
| tree.
| saghm wrote:
| > Instead of AVL and red black trees, we instead learned about
| 2-3 trees and B trees
|
| When we first were taught about B trees in my databases course
| in college, the professor quipped something like "these are the
| reason you'll never actually use a red-black tree in practice".
| I've actually found that in practice I rarely ever end up actually
| needing to use any sort of data structure that maintains sort
| order; for whatever reason, I've run into cases far more often
| where I need to maintain insertion order than sort order, but
| I'm not confident about whether this is due more to the type of
| stuff I work on or if it's just a more common requirement in
| general.
| globular-toast wrote:
| They're not about maintaining sort order, though? They're
| about arbitrary lookup. Or do you mean "if I had to pick one
| nice side effect"?
| masklinn wrote:
| If your concern is _just_ arbitrary lookups then your
| baseline is a hashmap though.
| globular-toast wrote:
| Not really. Hashmaps don't have guaranteed performance
| characteristics and require the "correct" hash function
| to work properly. In particular, their performance
| degrades as they become more "full" requiring a re-
| allocation to a table of generally n-times the size.
| Balanced trees could be regarded as a more generic
| solution to the mapping problem and are more suitable in
| many applications.
| fanf2 wrote:
| Trees require a lot more memory indirections than
| hashmaps, which makes them slower to traverse. They
| require a lot more interior pointers, which makes them
| bigger. It's easy for a hashmap to beat a tree.
| globular-toast wrote:
| Not in the worst case.
| sqeaky wrote:
| Which it will be the developer's responsibility to know,
| and that is rarely the default or normal case.
| masklinn wrote:
| > Not really.
|
| Yes really, that's why the vast majority of languages
| default to a hashmap (if they even have tree-based maps
| at all). Turns out the average application is a _lot_
| more interested in a low-constant-factor O(1) best case
| than in a high-constant-factor O(log n) worst case.
|
| > Hashmaps don't have guaranteed performance
| characteristics
|
| Of course they do.
|
| > require the "correct" hash function to work properly.
|
| And tree maps require correct comparators to work
| correctly.
|
| > In particular, their performance degrades as they
| become more "full" requiring a re-allocation to a table
| of generally n-times the size.
|
| Yes? Also "n-times" seems like pretty fuddy fud. And it
| seems weird to be more worried by the reallocation of a
| hashmap when a tree requires an allocation per node, so
| your average balanced tree requires a heap allocation
| _per entry_. And two pointers, for up to 16x overhead not
| including the allocator's own metadata.
|
| > Balanced trees could be regarded as a more generic
| solution to the mapping problem and are more suitable in
| many applications.
|
| And the amphibious cycle could be regarded as the more
| generic solution to the transport problem.
| lrschaeffer wrote:
| >> Hashmaps don't have guaranteed performance
| characteristics
|
| >Of course they do
|
| No, they do not, at least not worst case O(1). If you
| write a _very_ poor (but functional!) hash function or an
| adversary is choosing your items, then congratulations,
| your hashmap is now a linked list with extra steps.
|
| Sure, you _could_ use a robust cryptographic hash
| function, but you didn't because (a) it's not the
| default in your language and (b) it's slow and you are "a
| _lot_ more interested in a low-constant-factor O(1) best
| case".
|
| Otherwise, you're more or less correct.
| Maxatar wrote:
| I mean it's kind of pedantic but yes hash maps do have
| guaranteed performance characteristics. They are not
| guaranteed to be O(1), but they will never be something
| like O(2^N) either. You can give guarantees about their
| performance if you can make guarantees about the
| distribution of the inputs and the properties of the hash
| used.
|
| In my own experience, most uses of hash maps are
| typically O(log n). A good rule of thumb is hash maps are
| preferable unless you need predictable worst case
| performance or your hash maps are directly exposed to the
| public.
| masklinn wrote:
| > A good rule of thumb is hash maps are preferable unless
| you need predictable worst case performance or your hash
| maps are directly exposed to the public.
|
| Also, obviously, features only one of them provides, e.g.
| if you need range queries then a hash table is quite
| useless. Meanwhile if you need insertion ordering, that's
| reasonably easy to do with a hashmap.
| globular-toast wrote:
| > Yes? Also "n-times" seems like pretty fuddy fud. And it
| seems weird to be more worried by the reallocation of a
| hashmap when a tree requires an allocation per node, so
| your average balanced tree requires a heap allocation per
| entry.
|
| Yes, it requires allocation for every entry whereas a
| hash map requires it for some entries. You don't know
| which ones. So real time constraints are out the window.
| But no it's not something your average web programmer
| (including me) needs to worry about. Some people do,
| though.
| saghm wrote:
| Yep, this is the thought process I had; a red-black tree
| or B-tree is something I'd consider for a map that needed
| sort order, and a hashmap would be what I'd use by
| default if I didn't care about order.
|
| For additional context, I work almost entirely in Rust
| nowadays, and BTreeMap and HashMap are the two map types
| in the standard library (https://doc.rust-
| lang.org/std/collections/index.html). I've ended up
| needing insertion-ordered maps quite often in my work
| though, which is not something I've seen taught very
| often or in standard libraries, although there are plenty
| of third-party implementations out there, and in most
| languages it's not too difficult to hand-roll an
| implementation using an unordered map and a linked list
| by keeping both the value and a reference to the node
| containing the element in the map, which provides
| constant time insertion, lookup and deletion. (Rust is
| somewhat notorious for _not_ being a language where
| implementing something like this by hand is super easy
| though, since tracking mutable references to individual
| nodes of a linked list across separate hashmap values is
| not something the borrow checker is enthusiastic about)
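|
| A minimal sketch of that hand-rolled idea in Rust (hypothetical
| type and method names; it sidesteps the borrow-checker issue by
| replacing linked-list node references with indices into a Vec of
| tombstoned slots, keeping insert, lookup, and removal O(1)):
|
|   use std::collections::HashMap;
|   use std::hash::Hash;
|
|   // Insertion-ordered map: a HashMap of key -> slot index into a
|   // Vec. Removal leaves a tombstone (None) so indices stay valid;
|   // iteration walks the Vec in insertion order, skipping
|   // tombstones.
|   struct InsertionOrderedMap<K, V> {
|       index: HashMap<K, usize>,
|       slots: Vec<Option<(K, V)>>,
|   }
|
|   impl<K: Eq + Hash + Clone, V> InsertionOrderedMap<K, V> {
|       fn new() -> Self {
|           Self { index: HashMap::new(), slots: Vec::new() }
|       }
|
|       fn insert(&mut self, key: K, value: V) {
|           if let Some(&i) = self.index.get(&key) {
|               self.slots[i] = Some((key, value)); // keep position
|           } else {
|               self.index.insert(key.clone(), self.slots.len());
|               self.slots.push(Some((key, value)));
|           }
|       }
|
|       fn get(&self, key: &K) -> Option<&V> {
|           let &i = self.index.get(key)?;
|           self.slots[i].as_ref().map(|(_, v)| v)
|       }
|
|       fn remove(&mut self, key: &K) -> Option<V> {
|           let i = self.index.remove(key)?;
|           self.slots[i].take().map(|(_, v)| v)
|       }
|
|       fn iter(&self) -> impl Iterator<Item = (&K, &V)> + '_ {
|           self.slots.iter()
|               .filter_map(|s| s.as_ref().map(|(k, v)| (k, v)))
|       }
|   }
|
| The tombstones trade memory for simplicity; compacting the Vec
| (or threading an index-based doubly linked list through the
| slots) would restore dense iteration at the cost of more
| bookkeeping.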
| jeffparsons wrote:
| > I've run into cases far more often where I need to maintain
| insertion order than sort order, but I'm not confident about
| whether this is due more to the type of stuff I work on or if
| it's just a more common requirement in general.
|
| I've had the exact opposite experience, so I'm guessing it
| has a lot to do with the type of stuff you've (and I've)
| worked on.
| kristopolous wrote:
| Honestly for me it's either linear search, dictionary, or
| external store like database and that's actually it.
|
| I think I've had to implement my own traversable data
| structure literally only in interviews
| saghm wrote:
| To clarify, I definitely don't mean I've been implementing
| these structures by hand! Even when using data structures
| from the standard library or other dependencies though, I
| just don't end up needing sort-ordered collections very
| often in my work, and I thought that was interesting
| (although sibling comments might indicate that this isn't
| necessarily typical).
| pcwalton wrote:
| In many cases just using a hash table and sorting the keys is
| faster than using a B-tree anyway. I tried switching from a
| separate sort to a hash table in the Bevy game engine for
| binning of transparent objects recently and it was a big
| regression.
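|
| A minimal sketch of that pattern (illustrative only, not the Bevy
| code): bin values under a hash map during the hot loop, then sort
| the keys once at the end instead of paying ordered-insertion
| costs on every entry.
|
|   use std::collections::HashMap;
|
|   // Hash-then-sort-once: O(1) amortized inserts, one O(m log m)
|   // sort over the m distinct keys at the end.
|   fn binned_in_order<V>(items: Vec<(u64, V)>) -> Vec<(u64, Vec<V>)> {
|       let mut bins: HashMap<u64, Vec<V>> = HashMap::new();
|       for (key, value) in items {
|           bins.entry(key).or_default().push(value);
|       }
|       let mut out: Vec<_> = bins.into_iter().collect();
|       out.sort_unstable_by_key(|&(key, _)| key);
|       out
|   }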
| shagie wrote:
| > When we first were taught about B trees in my databases
| course in college
|
| I'll note that the instructor for this was Professor David
| DeWitt - https://en.wikipedia.org/wiki/David_DeWitt
|
| He normally taught databases and one of my great regrets was
| dropping his 500 level database class one semester when
| things were rough. (Another regret was not taking networking
| from Professor Landweber
| https://en.wikipedia.org/wiki/Lawrence_Landweber .)
| prerok wrote:
| May I ask why you regret it?
|
| It could be that my professors were simply not that good, which
| may be why I don't really understand this. I mostly grokked the
| subjects by working through the books and exercises in
| them. The professors' lectures almost always went over some
| detail too quickly, so in a 3 hour lecture, I usually ended
| up following only the first hour or so.
|
| My question is maybe better put as: what quality or
| approach makes them great lecturers?
| shagie wrote:
| Having a class in databases from the person behind the
| DeWitt Clause in Oracle licenses.
|
| I had his class before (the aforementioned sophomore data
| structures and algorithms) and I knew he was a good
| lecturer. That wasn't a question. It was a "too much
| going on that semester and I wasn't able to do the
| assignments in a timely manner".
|
| Professor Landweber's class had a waiting list that I
| scoffed at as a college student and looked for something
| that I could get into and fill the requirement without
| realizing _who_ he was. It wasn't until later, while
| reading the UNIX and Linux System Administration Handbook
| and seeing an image in there attributed to him, that the
| name clicked - but he wasn't teaching the next semester
| and I had filled that requirement with another class.
|
| Another regret was dropping numerical methods with
| Professor de Boor (
| https://en.wikipedia.org/wiki/Carl_R._de_Boor ) when I
| couldn't get my head around splines rather than going to
| his office hours (he was also the undergraduate advisor
| and _very_ helpful).
|
| Those classes could have changed the trajectory of my
| career. I was looking at being a sysadmin instead and
| writing perl and keeping some big iron running. My career
| pivoted from sysadmin to webmaster to web developer
| (perl) to web developer (Java). I have no regrets from
| that path - I do have regret for not learning what I
| could from the experts in their field when I was in a
| place and time to learn from them.
| Twirrim wrote:
| Slightly random anecdote. Had an intern one summer at a large
| tech company I worked for, who was determined to inject a
| red-black tree into the project they were working on for us,
| using hand-rolled code.
|
| I'm not sure what was going through his head at the time,
| whether he'd just learned how to write them at college, or if
| he just thought they were the pinnacle of engineering.
|
| None of the cases they were trying to shoehorn them into
| made sense. These were places where you could argue that even
| a straight array would be sufficient, due to the limited number
| of items. The worst case of searching to the end of the array
| wouldn't matter. Doubly so given they weren't working on
| anything anywhere near a hot path, or a time sensitive
| operation. Standard library components would have been
| perfectly fine, and not carried the maintenance burden.
|
| They did, on at least a couple of occasions, smugly explain
| what a red-black tree was to their mentor, who had likely
| implemented their first red-black tree at college before that
| intern had ever even touched a computer for the first time.
| He showed _remarkable_ patience.
| rdtsc wrote:
| > I'm not sure what was going through his head at the time,
| whether he'd just learned how to write them at college, or
| if he just thought they were the pinnacle of engineering.
|
| That's understandable and rather common with fresh
| grads. They have the energy and motivation but may not
| quite have the experience. We were all there once,
| including me.
|
| My history professor from 8th grade used to say: "There is a
| difference between knowing what you're doing and doing what
| you know".
| saghm wrote:
| > I'm not sure what was going through his head at the time,
| whether he'd just learned how to write them at college, or
| if he just thought they were the pinnacle of engineering.
|
| I think it's easy to forget that a lot of interns might
| never have actually worked on a "real" codebase before; I
| know I certainly hadn't before my first internship! For
| someone who had only ever worked on homework code in
| classes before and maybe some side projects for fun, it's
| very easy to imagine that they honestly didn't understand
| that implementing the hardest data structure they knew by
| hand wouldn't actually be an impressive display of
| engineering to the company they were hoping to get a job
| offer from.
| mywittyname wrote:
| I remember part of it being that they beat efficiency
| into your brain throughout most of early CS. In college,
| using the naive approach is treated as being ignorant.
|
| Then you work on real software and realize that computers
| are pretty fast and basic hashmaps & arrays will likely
| suffice for the size of the datasets you're working with,
| and beyond that, performance is more about tool/database
| choice than rolling your own algorithms.
| Twirrim wrote:
| First code reviews are always interesting with interns; I
| love reading through them to learn stuff (I'm an
| operations focussed / sysadmin type, coding is more
| something I've picked up as I've gone along and wanted to
| automate stuff)
|
| This particular intern was dead set on this red-black
| tree, despite it being rejected constructively,
| repeatedly, every time they tried to introduce it, and
| despite mentoring on the subject from the experienced
| developer who was his mentor, and guidance from other
| engineers in the group.
|
| He was the only intern I ever had to deal with who didn't
| seem to recognise that production code is different from
| academic code, where the balance of clever vs
| maintainable is vastly different :)
| 29athrowaway wrote:
| Left leaning red black trees are simpler to implement than the
| regular red black tree.
| skribanto wrote:
| Left leaning red black trees enforce extra constraints, which
| remove symmetries. This actually adds an edge case when
| instead you can just treat left/right rotations and
| comparisons as symmetric operations.
|
| Left Leaning Red Black Trees Considered Harmful:
| https://www.read.seas.harvard.edu/~kohler/notes/llrb.html
| tgv wrote:
| AVL isn't complicated. Insert the node as you would in a binary
| tree. Do that recursively. Then, when you return from
| inserting, you check at each level if the height of the tree
| under the left and right children differ at most 1. If not, you
| perform a little shuffle of the nodes, update the levels, and
| return to the previous level. That's all. The proof that that
| works is a bit complicated, though.
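|
| A sketch of that scheme in Rust (illustrative, not production
| code): insert recursively, then on the way back up check the
| height difference and do the "little shuffle" (a rotation or
| two) where it exceeds 1.
|
|   struct Node {
|       key: i64,
|       height: i32,
|       left: Option<Box<Node>>,
|       right: Option<Box<Node>>,
|   }
|
|   fn height(n: &Option<Box<Node>>) -> i32 {
|       n.as_ref().map_or(0, |n| n.height)
|   }
|
|   fn update_height(n: &mut Node) {
|       n.height = 1 + height(&n.left).max(height(&n.right));
|   }
|
|   fn balance(n: &Node) -> i32 {
|       height(&n.left) - height(&n.right)
|   }
|
|   fn rotate_right(mut n: Box<Node>) -> Box<Node> {
|       let mut l = n.left.take().unwrap();
|       n.left = l.right.take();
|       update_height(&mut n);
|       l.right = Some(n);
|       update_height(&mut l);
|       l
|   }
|
|   fn rotate_left(mut n: Box<Node>) -> Box<Node> {
|       let mut r = n.right.take().unwrap();
|       n.right = r.left.take();
|       update_height(&mut n);
|       r.left = Some(n);
|       update_height(&mut r);
|       r
|   }
|
|   // The post-insert check at each level on the way back up.
|   fn rebalance(mut n: Box<Node>) -> Box<Node> {
|       update_height(&mut n);
|       if balance(&n) > 1 {
|           if balance(n.left.as_ref().unwrap()) < 0 {
|               n.left = Some(rotate_left(n.left.take().unwrap()));
|           }
|           rotate_right(n)
|       } else if balance(&n) < -1 {
|           if balance(n.right.as_ref().unwrap()) > 0 {
|               n.right = Some(rotate_right(n.right.take().unwrap()));
|           }
|           rotate_left(n)
|       } else {
|           n
|       }
|   }
|
|   fn insert(n: Option<Box<Node>>, key: i64) -> Box<Node> {
|       match n {
|           None => Box::new(Node {
|               key, height: 1, left: None, right: None,
|           }),
|           Some(mut node) => {
|               if key < node.key {
|                   node.left = Some(insert(node.left.take(), key));
|               } else if key > node.key {
|                   node.right = Some(insert(node.right.take(), key));
|               }
|               rebalance(node)
|           }
|       }
|   }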
| e12e wrote:
| I have a soft spot for red-black trees ever since reading about
| them in Dr. Dobbs - it's the first time I remember reading
| about a data structure (and it went mostly over my head at the
| time - but the fond memories remain):
|
| I believe it was the 1992 article by Bruce Schneier:
|
| https://jacobfilipp.com/DrDobbs/articles/DDJ/1992/9204/9204c...
| DanielHB wrote:
| > Later I looked at AVL trees... and they still give me a
| headache.
|
| Why? From what I remember AVL trees were much easier to
| implement than B-trees. Especially if using recursion.
| whizzter wrote:
| Insertion in AVL trees is fairly simple and doesn't really
| have any cascades, so it is quite predictable; deletions, on
| the other hand, cascade upwards with a bunch of different
| imbalances, and thus rotations (and post-rotation rebalances)
| to be accounted for.
|
| Recently implemented them for an advanced hobby project and
| having a solid fuzzing tester in place really alleviated a
| lot of headaches in the long term since it was able to
| produce quite an exhaustive set of conditions.
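|
| The shape of such a fuzz tester is small; a sketch (with std's
| BTreeSet standing in for the hand-rolled tree under test, and a
| sorted Vec as the trivially-correct reference model):
|
|   use std::collections::BTreeSet;
|
|   fn fuzz_against_model(ops: u32, mut seed: u64) {
|       let mut tested = BTreeSet::new(); // swap in your own tree
|       let mut model: Vec<i64> = Vec::new();
|       for _ in 0..ops {
|           // xorshift: tiny deterministic RNG so failures reproduce
|           seed ^= seed << 13;
|           seed ^= seed >> 7;
|           seed ^= seed << 17;
|           let key = (seed % 64) as i64; // small space forces overlap
|           if seed & 1 == 0 {
|               tested.insert(key);
|               if let Err(pos) = model.binary_search(&key) {
|                   model.insert(pos, key);
|               }
|           } else {
|               tested.remove(&key);
|               if let Ok(pos) = model.binary_search(&key) {
|                   model.remove(pos);
|               }
|           }
|           // compare full contents after every operation
|           let snapshot: Vec<i64> = tested.iter().copied().collect();
|           assert_eq!(snapshot, model);
|       }
|   }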
| nayuki wrote:
| B-tree deletions are quite complicated too, requiring
| stealing keys from either-side sibling or merging with
| either-side sibling.
|
| In practice, I find that a B-tree implementation is 2 to 3x
| the length of an AVL tree impl.
|
| I also fuzz-tested my implementations thoroughly.
| https://www.nayuki.io/page/avl-tree-list ,
| https://www.nayuki.io/page/btree-set , etc.
| DanielHB wrote:
| Yeah, this is what I was thinking too. The rebalancing on
| AVL is a bit tricky but once you grok it, it is not that
| much code...
| nayuki wrote:
| I would not scoff at AVL trees. They are easy to prove
| correctness for, and admit a short implementation at ~100
| lines. They're as short as AA trees, but AA trees are harder to
| prove with more cases. https://www.nayuki.io/res/aa-tree-
| set/BasicAvlTreeSet.java , https://www.nayuki.io/page/aa-tree-
| set
|
| B-trees require a much longer implementation than AVL trees.
| https://www.nayuki.io/page/btree-set
| harrison_clarke wrote:
| red black trees can be thought of as 2-4 trees, and they're way
| less headache-inducing when you do
|
| if you inline the red nodes into the black nodes, you get a 2-4
| tree node. except that the 3-nodes have chirality. but then,
| you can just use a 2-4 tree
|
| AVL trees seem not worth learning. they have slightly different
| performance characteristics than red-black trees. but, if i
| care about performance beyond big O, i'm probably using a hash
| table or a cache-aware tree of some kind instead of a CS101
| data structure
| cperciva wrote:
| In a sense, red-black and 2-3 trees are just special cases of
| B-trees with small nodes. Smaller B-tree nodes have (by a
| constant factor) more expensive searching than larger nodes; but
| smaller nodes are more efficient for mutating operations. And
| indeed, if you have an insert/delete heavy workload you can't
| beat something like a red-black tree.
| anonymoushn wrote:
| This sounds surprising, unless you mean specifically a workload
| where a bunch of inserts and deletes are made using a stored
| cursor, so that they don't need to follow a bunch of pointers
| from the root.
| aidenn0 wrote:
| AA trees are isomorphic to 2,3 trees and RB trees are
| isomorphic to 2,4 trees. IIRC we learned 2,4 trees first in my
| data-structures class, and then demonstrated that RB trees were
| a different representation of the same idea.
| masklinn wrote:
| > Practical B-trees often use fairly large values of k (e.g.,
| 100) and therefore offer tight bounds -- in addition to being
| more cache-friendly than binary search trees.
|
| That is true on-disk, where you would not use BSTs anyway. In-
| memory, mid to high single digit values are practical. For
| instance iirc Rust's btree (map and set) uses something like k=6.
| leni536 wrote:
| Interesting, Abseil's btree claims to use a larger value:
|
| _... for absl::btree_set<int32_t>, nodes currently have 62
| children ..._
|
| https://abseil.io/about/design/btree
| vlovich123 wrote:
| From the looks of it Rust [1] uses a constant branching
| factor based on number of items whereas ABSEIL generally uses
| a target of 256 bytes [3] for branching and fits however many
| elements fit within that [2]. Rust's approach seems to be
| more primitive as ABSEIL is optimizing for cache line usage
| (not sure why it's several multiples of a cache line - maybe
| to help the prefetcher or to minimize cache line bouncing?)
|
| [1] https://github.com/rust-
| lang/rust/blob/master/library/alloc/...
|
| [2] https://github.com/abseil/abseil-
| cpp/blob/74f8c1eae915f90724...
|
| [3] https://github.com/abseil/abseil-
| cpp/blob/74f8c1eae915f90724...
| masklinn wrote:
| > not sure why it's several multiples of a cache line -
| maybe to help the prefetcher or to minimize cache line
| bouncing?
|
| The AMD64 cache line is only 64 bytes; that would make for
| very low branching factors, given interior nodes need a
| pointer per child, plus a key, plus a record pointer if it's
| a b-tree (as opposed to a b+tree).
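|
| Back-of-envelope, under an assumed layout of f child pointers
| and f-1 keys at 8 bytes each (not any particular library's
| actual node format):
|
|   const PTR: usize = 8;
|   const KEY: usize = 8;
|
|   // largest f with PTR*f + KEY*(f-1) <= node_bytes
|   fn max_fanout(node_bytes: usize) -> usize {
|       (node_bytes + KEY) / (PTR + KEY)
|   }
|
|   fn main() {
|       assert_eq!(max_fanout(64), 4);   // one 64-byte cache line
|       assert_eq!(max_fanout(256), 16); // a 256-byte node
|   }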
| vlovich123 wrote:
| To be clear, I was examining only leaf nodes. I'm not
| sure if interior nodes use the same logic. Still, ABSEIL
| generally uses a larger branching factor than Rust for
| some undocumented reason.
| rtheunissen wrote:
| That does not mean that b-trees are unequivocally better than
| binary search trees. There are some applications, like
| concurrency or persistence, where comparison count is not as
| important.
|
| For instance, fewer values per node means less information to
| copy when copying the node. Locking is more granular with fewer
| values per node because fewer values are locked at a time.
|
| The binary structure also makes it possible to support randomized
| balancing and self-adjusting strategies like splay trees.
|
| I am disappointed to still see frequent mention of red-black
| trees - I see no reason for red-black trees to ever be the best
| choice in practice, or in theory, or in school. LBST's
| (logarithmic binary search trees) [1] are generally simpler, more
| intuitive, and more efficient [2] than red-black trees. They also
| support positional access inherently, so the balancing
| information is useful (subtree size).
|
| [1] https://www.semanticscholar.org/paper/A-New-Method-for-
| Balan... [2] https://rtheunissen.github.io/bst
| dragontamer wrote:
| > For instance, fewer values per node means less information to
| copy when copying the node. Locking is more granular with fewer
| values per node because fewer values are locked at a time.
|
| But this also assumes single-threaded computation.
| Increasingly, high-performance computers are SIMD-compute (aka:
| GPUs, and if not, at a minimum you're using AVX512 or ARM SVE).
|
| If you have 1024-threads per thread group (common maximum in
| GPUs), that will commonly be used to create 1024-element B-Tree
| nodes, as sorting can be accomplished in a single pass network.
| (Bitonic network: https://en.wikipedia.org/wiki/Bitonic_sorter)
|
| A singular lock that covers the whole 1024-node would better
| match these very wide GPUs that perform very wide SIMD-
| execution in practice. Physically, GPUs are 32-wide for NVidia
| and 32-wide or 64-wide for AMD. But various software tricks
| increase the width by either ganging-up neighboring compute-
| units in parallel and synchronizing them... or by running the
| same code across sequential passes like in AMD GCN
| architecture.
|
| However, the programming interface of a 1024-wide (looking)
| piece of hardware/equipment naturally maps to 1024-wide B-Tree
| nodes, a 1024-wide Bitonic sort (or MergePath sort), and other
| such parallel sorting networks.
|
| ----------
|
| B-Trees are "some number larger than 2", which I think for
| modern high-performance SIMD-based GPUs, the number absolutely
| should be "more than 2".
|
| Where I'm wondering, is if the optimal algorithm is 32-wide
| (aka: one NVidia wave, the smallest unit of computation on an
| NVidia GPU), or if the optimal algorithm is 1024-wide (the
| maximum gang of waves that can work as a team). Or if maybe
| using the said maximum-gang across an even larger node (ex:
| 4096-wide) has any benefits.
|
| --------
|
| We are reaching the point where compute is so cheap that even a
| 1024-wide parallel sort is a trifling thing.
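|
| For reference, the compare-exchange structure of that bitonic
| network, written sequentially (on a GPU, each inner pass runs
| across all lanes at once, which is why power-of-two node widths
| like 1024 map onto it so cleanly):
|
|   fn bitonic_sort(a: &mut [u32]) {
|       let n = a.len();
|       assert!(n.is_power_of_two());
|       let mut k = 2;
|       while k <= n {
|           let mut j = k / 2;
|           while j > 0 {
|               // one data-parallel pass: every i pairs with i ^ j
|               for i in 0..n {
|                   let p = i ^ j;
|                   if p > i {
|                       let ascending = (i & k) == 0;
|                       if (a[i] > a[p]) == ascending {
|                           a.swap(i, p);
|                       }
|                   }
|               }
|               j /= 2;
|           }
|           k *= 2;
|       }
|   }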
| orlp wrote:
| One thing to keep in mind is that keeping the keys sorted on an
| insert requires O(k) element moves, so an insert is O(log2(n) +
| k) operations. So if you make k very large your inserts get
| slower.
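|
| Concretely, in a sorted node array the search step is cheap but
| the shift is not; a sketch:
|
|   // binary_search finds the slot in O(log k) comparisons, but
|   // Vec::insert then shifts everything after it: the O(k) term.
|   fn insert_sorted(keys: &mut Vec<u64>, key: u64) {
|       if let Err(pos) = keys.binary_search(&key) {
|           keys.insert(pos, key);
|       }
|   }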
| kazinator wrote:
| Where B-trees can win is by taking advantage of each node's keys
| being sorted in an array. This allows a binary search to be used.
|
| A B-tree of any branching factor (not only 2) is equivalent to a
| binary tree.
|
| We can take any B-tree node _n_:
|
|                  n
|       ---------+-----------
|       |   |   |   |   |   |
|       a   b   c   d   e   f
|
| and regard it as an unbalanced tree:
|
|       n
|      / \
|     a  /\
|       b  /\
|         c  /\
|           d  /\
|             e  /\
|               f  nil
|
| When searching for an item, if we linearly search every node of
| the B-tree, like from _a_ to _f_, it is as if we are traversing
| the unbalanced tree (except for constant-time instruction and
| cache level efficiencies).
|
| What the unbalanced tree cannot do is binary search through _a_
| to _f_ to find which one to descend into; it doesn't have those
| nodes in a neat array for that.
|
| That binary search makes a B-tree more or less equivalent to a
| well-balanced binary tree.
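|
| A sketch of that equivalence (hypothetical node layout): the
| binary search inside each node plays the role of the small
| balanced tree the unbalanced picture is missing.
|
|   struct BTreeNode {
|       keys: Vec<u64>,
|       children: Vec<BTreeNode>, // empty for leaves
|   }
|
|   fn contains(mut node: &BTreeNode, key: u64) -> bool {
|       loop {
|           // ~log2(m) comparisons within the m sorted keys
|           match node.keys.binary_search(&key) {
|               Ok(_) => return true,
|               Err(i) => {
|                   if node.children.is_empty() {
|                       return false; // leaf: key absent
|                   }
|                   node = &node.children[i]; // descend between keys
|               }
|           }
|       }
|   }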
| jvanderbot wrote:
| I don't think so?
|
| I think the b-tree is using the start/end sequence knowledge at
| each node to do _fewer_ node evaluations. That is, a b-tree is
| "shorter" so of course it'll take fewer node evaluations to
| reach the leaf.
|
| "b-tree is equivalent to balanced binary tree", and "unbalanced
| tree", don't make a ton of sense when ostensibly the article
| compares b-trees to balanced BST.
| bunderbunder wrote:
| Though, just spitballing here - I'd guess that, in practice, a
| linear search still tends to handily outperform a binary search
| (and, by extension, a well-balanced binary tree) whenever the
| keys are stored in a compact array. Due to better interaction
| with the cache and speculative execution.
| sam0x17 wrote:
| yeah locality of reference is way better as you're traversing
| the array than as you follow links in the tree
| shpongled wrote:
| Definitely depends on the branching factor! There are other
| encodings like Eytzinger that can provide further improvements
| over "standard" binary search too [1].
|
| [1] https://en.algorithmica.org/hpc/data-structures/binary-
| searc...
| kazinator wrote:
| Suppose the key comparisons are much more expensive than the
| array or pointer navigation. Like the article does: it says
| that it is about theory, and it is looking at number of
| comparisons as the performance measure.
|
| The fact that we can search a B-tree node that has, say, 256
| children in 8 steps means that the node effectively contains
| the equivalent of a small balanced binary tree.
| kevin_thibedeau wrote:
| It does. The break even point varies by platform but linear
| will win for sufficiently short arrays.
| torginus wrote:
| I remember reading a blogpost about this, they compared
| binary search on integer keys with a highly-optimized
| unrolled SIMD linear search.
|
| The takeaway was that the break-even point was surprisingly
| small - just tens of elements, which is crazy considering a
| CPU can do like 2x 8-wide AVX2 integer comparisons in a
| single cycle. Modern CPUs are just that good at executing
| branchy code.
|
| But with B-trees we can have the best of both worlds - trees
| with a relatively high branch factor, like 16, that can still
| be tested in one go using SIMD.
|
| Another advantage is cache locality - which is one of the
| reasons B-trees are favored when implementing filesystems -
| much of the tree doesn't need to be fetched into memory while
| the parts that do are contiguous in memory.
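|
| A sketch of that node probe (assumed fixed 16-key nodes, unused
| slots padded with u64::MAX): a branch-free count of keys below
| the needle, written so the compiler can auto-vectorize it, whose
| result is exactly the child index to descend into.
|
|   fn node_rank(keys: &[u64; 16], needle: u64) -> usize {
|       keys.iter().map(|&k| (k < needle) as usize).sum()
|   }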
| sqeaky wrote:
| I think you might be missing the real performance
| characteristics of the hardware. The amount of comparisons
| might be the same, but comparisons are only a proxy for the
| expensive parts of lookups, the latency to tell RAM to provide
| data.
|
| If the whole tree is in CPU cache then likely the difference is
| noise, but if the tree is large enough to have some data in RAM
| then the cost to copy that into cache will be larger based not
| on the size, but on the amount of queries from the CPU to the
| RAM (unless the size it way out of proportion). RAM has latency
| that is generally an order of magnitude slower than CPU Cache
| and generally an order of magnitude slower than fast
| contemporary storage.
|
| There are many places you can find "Latency numbers every
| programmer needs to know"
| https://fullstackengineer.pro/blog/latency-numbers-programme...
|
| But this trend has held for about 30 years and seems fundamental
| to the economics of building hardware. CPUs will be fastest,
| cache slower, RAM slower still, and fast storage slower yet,
| etc.
|
| From your example, if it takes 100ns to get the node containing
| a~f then the whole of the B-Tree search for f might take 150ns
| (1 lookup from RAM + a few CPU instructions), but with the
| strictly binary tree it will need 6 lookups to RAM, and we can
| see that we can ignore the CPU time because it is so small.
| This could take 600ns (or 1300ns if each branch is a node with
| only pointers and the data needs to be dereferenced too). A
| smart cache prefetcher might guess that these were pointers and
| precache a and b once n was referenced, and might even cache
| things pointed to by loaded pointers but even in that highly
| optimistic scenario it could still take 300ns to load stuff.
|
| Consider filesystems on storage and how they have had this
| problem but worse for a long time. Tree structures with many
| pointers per node are faster than binary trees in proportion to
| the latency between the thing processing the tree and the place
| the tree is stored.
|
| EDIT - And don't get me started on clever use of SIMD
| instructions. Imagine if a node in a tree had a binary layout
| matching an AVX register, then comparisons looking for the
| searched item could check a whole node in a single instruction!
| I am curious if there are any implementations that do this?
|
| EDIT 2 - It does exist and I am late to the show, people have
| been doing this for like 15 years:
| https://stackoverflow.com/questions/20616605/using-simd-avx-...
| kazinator wrote:
| Like the article, I'm only concerned with comparison counts,
| trying to explain why B-trees look like balanced BSTs, and
| why that gets tighter and tighter with increasing numbers of
| children per node.
|
| It's simply because the nodes themselves contain the
| equivalent of a balanced binary tree, in the form of a
| binary-searched array.
|
| In the ultimate case, we set the number of children to be
| large enough to contain all the data, so then there is only
| one B-tree node, which is binary searched to find the
| element. Then we have the same number of comparisons as a
| well-balanced binary search tree.
| 9659 wrote:
| Just because it was first discovered 15 years ago, does not
| mean it wasn't still a good idea.
|
| Keep on thinking. You never know what else you will come up
| with.
| shin_lao wrote:
| B-Trees are a good fit if your values are small, but as the value
| size grows, they lose their advantage.
|
| A binary tree is a good default choice for a sorted structure.
|
| Underappreciated structure: sorted arrays (but come with big
| caveats).
| nayuki wrote:
| Blogspot supports HTTPS. Please always use it when linking to the
| site.
| exabrial wrote:
| I think either is a good choice when you know nothing about the
| data set.
|
| After n grows large enough, I think you'd want to benchmark your
| -exact- dataset on your -exact- prod hardware. Some datasets may
| cause certain trees to perform badly. Some hardware is much
| faster at memory lookups and cache misses are not as painful.
|
| For instance, developing locally on Intel and deploying to Arm
| could yield pretty wild performance differences!
___________________________________________________________________
(page generated 2024-06-24 23:01 UTC)