[HN Gopher] Binary Trees are optimal except when they're not
___________________________________________________________________
Binary Trees are optimal except when they're not
Author : jjgreen
Score : 20 points
Date : 2021-07-21 07:57 UTC (15 hours ago)
(HTM) web link (hbfs.wordpress.com)
(TXT) w3m dump (hbfs.wordpress.com)
| cakoose wrote:
| Yup, ignoring the costs of memory/storage access (cache, main
| memory, disk) can hurt an algorithm's real-world performance.
| Laying your data out in chunks that match the memory/storage
| block sizes (cache line size, disk block size) can yield much
| better performance.
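|
| A rough sketch of that block-matched layout in C (sizes here are
| illustrative assumptions, not from the article): pack several
| keys into one node so a single cache-line fetch serves several
| comparisons.
|
|     #include <stdint.h>
|
|     /* Assume a 64-byte cache line; a real layout would be tuned
|        to the actual hierarchy. */
|     #define LINE_SIZE     64
|     #define KEYS_PER_NODE ((LINE_SIZE - 4) / 4)  /* 15 keys */
|
|     struct packed_node {
|         uint32_t nkeys;                /* keys actually stored */
|         uint32_t keys[KEYS_PER_NODE];  /* sorted, in one line  */
|         /* child pointers would live in a parallel array */
|     };
|
|     /* Even a linear scan of 15 keys touches only one line:
|        returns the index of the first key >= the probe. */
|     int node_find(const struct packed_node *n, uint32_t key)
|     {
|         for (uint32_t i = 0; i < n->nkeys; i++)
|             if (n->keys[i] >= key)
|                 return (int)i;
|         return (int)n->nkeys;
|     }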
|
| And while having the algorithm be aware of the chunk sizes in
| every layer of your memory hierarchy is optimal, it can be
| inconvenient to get all that set up. But what's cool is that you
| can often lay your data out in a fractal-like pattern, and get
| pretty good performance no matter what the various chunk sizes
| are: https://en.wikipedia.org/wiki/Cache-oblivious_algorithm
| riggsdk wrote:
| Basically algorithmic complexity analysis often ignores the cost
| of accessing data because the underlying storage is sometimes
| unknown - be it in memory, on a spinning disc or on magnetic
| tape.
|
| Unfortunately that gives suboptimal algorithms, because that
| basic practicality is ignored. You then end up with a bunch of
| algorithms that look worse "on paper" but still outperform the
| theoretically optimal ones.
|
| Often the number of items is also ignored. If you want to
| implement a set-like feature but you'll be using fewer than X
| elements, you can often get better performance from a linear
| search through an array than from a full-fledged hash set.
| jsmith45 wrote:
| Agreed. There are plenty of cases where an asymptotically
| "worse" algorithm performs better than a "better" one for all
| practical sizes.
|
| Like it is very much possible that the asymptotically "better"
| algorithms have coefficients so high that the break-even point
| takes a week or more of computation to reach. There are not
| many programs whose users will tolerate so much data that
| operations take that long.
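|
| A toy illustration in C (the cost models are invented for the
| example, not measured from anything):
|
|     #include <math.h>
|     #include <stdio.h>
|
|     /* Made-up costs: an O(n log n) algorithm with a huge
|        constant factor vs. a plain O(n^2) one. */
|     static double fancy(double n)  { return 1e4 * n * log2(n); }
|     static double simple(double n) { return n * n; }
|
|     int main(void)
|     {
|         double n = 2.0;
|         while (simple(n) < fancy(n))   /* find the crossover */
|             n *= 2.0;
|         printf("\"worse\" algorithm wins below n ~ %.0f\n", n);
|         return 0;
|     }
|
| With these made-up numbers the crossover only arrives around a
| quarter of a million elements; crank the constant up further and
| it moves past any input a user would wait for.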
| valbaca wrote:
| Author seems to have just backed into rediscovering B-trees
|
| https://en.wikipedia.org/wiki/B-tree
|
| > To conclude, binary trees are optimal when we ignore the cost
| of accessing a node, but they aren't when it becomes costly to
| access a node. When we access the nodes on disk, with a high
| cost, it becomes interesting to bundle many keys in a node, and
| we gave the optimal solution. However, the problem is often
| solved more directly. We just fit as many keys as possible in
| an I/O block. Disks operate on small blocks of (typically) 512
| bytes, but file systems operate in somewhat larger, fixed-size
| blocks, something like 4 or 8k. A good strategy is therefore to
| fit as many keys as possible in that block, since even if the
| number of comparisons within the block is large, they will
| still cost less than bringing the block into main memory.
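|
| In C, that strategy might look roughly like this (a sketch
| assuming 4 KB blocks and 64-bit keys, not code from the
| article):
|
|     #include <stdint.h>
|
|     /* A node sized to one 4 KB file-system block, so a single
|        read brings in hundreds of keys instead of one. */
|     enum { BLOCK_SIZE = 4096 };
|
|     struct btree_node {
|         uint32_t nkeys;          /* keys actually in use       */
|         uint32_t pad;            /* keep 64-bit arrays aligned */
|         uint64_t keys[255];      /* sorted separator keys      */
|         uint64_t children[256];  /* blocks of the child nodes  */
|     };                           /* 8 + 2040 + 2048 = 4096     */
|
|     _Static_assert(sizeof(struct btree_node) == BLOCK_SIZE,
|                    "node must fill exactly one block");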
| bob1029 wrote:
| Skiplists are my favorite data structure. There are a few use
| cases that they can't satisfy, but for the typical memory-bound
| key-value store application, they're a very simple & effective
| approach.
|
| If you have to go to disk, a combination of a splay tree,
| batching, and an append-only log structure is your best bet.
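|
| For anyone who hasn't seen one, a minimal search-only sketch in
| C (illustrative; a real store adds insertion with random level
| picks, plus memory management):
|
|     #include <stddef.h>
|
|     /* A skip list is just sorted linked lists stacked on top
|        of each other, searched from the top level down. */
|     #define MAX_LEVEL 16
|
|     struct sl_node {
|         int             key;
|         struct sl_node *next[MAX_LEVEL];  /* one per level */
|     };
|
|     struct skiplist {
|         struct sl_node head;   /* sentinel node */
|         int            level;  /* levels in use */
|     };
|
|     /* Returns the node holding key, or NULL if absent. */
|     struct sl_node *sl_find(struct skiplist *sl, int key)
|     {
|         struct sl_node *n = &sl->head;
|         for (int i = sl->level - 1; i >= 0; i--)
|             while (n->next[i] && n->next[i]->key < key)
|                 n = n->next[i];
|         n = n->next[0];
|         return (n && n->key == key) ? n : NULL;
|     }
|
| Insertion is the same top-down descent, plus a coin flip per
| level to decide how tall the new node is.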
___________________________________________________________________
(page generated 2021-07-21 23:00 UTC)