[HN Gopher] Atree: A simple and efficient pointer-free tree impl...
___________________________________________________________________
Atree: A simple and efficient pointer-free tree implementation
Author : spenczar5
Score : 108 points
Date : 2023-12-16 17:52 UTC (5 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| tlack wrote:
| Always surprising to click through a link on HN and discover it
| is one's own work. For a time I was very interested in
| lightweight array-based implementations of common data structures
| and this one seemed particularly handy.
| spenczar5 wrote:
| It sounds a little like it didn't work out as well as you
| hoped. How did it fare?
|
| I am interested because I have some scientific software
| (astrodynamics - propagating orbits in space) that would really
| benefit from a cache-friendly tree, and this seemed promising.
| ninepoints wrote:
| It does? An O(n) scan every time you need to query the
| children of a node feels like a nonstarter for most
| applications (the readme is strangely optimistic about this
| point). There are plenty of cache-friendly tree
| implementations out there, and this one actually seems cache
| hostile, with few benefits in my mind aside from ease of
| implementation.
|
| Also, I write a lot of code that runs on a GPU, and I find
| the claim that this tree is somehow GPU-friendly particularly
| dubious.
| jiggawatts wrote:
| There are use cases where that doesn't matter, such as some
| compilers that make a full pass over the source code for
| every optimisation type.
| spenczar5 wrote:
| For me, N is small. It's also N-ary, not binary, which
| crosses off a bunch of the first options. Anyway, I am not
| sure this will work, just worth trying. Empirical numbers
| beat theory every time :)
| ninepoints wrote:
| You are using N in a different sense than I am. Unless
| I'm reading the tree description incorrectly, N is the
| size of the tree itself, not the number of children.
| spenczar5 wrote:
| Oh, I was being sloppy and mixed N into the arity: I
| meant N elements, each with a variable number of children
| (as many as 8).
| ninepoints wrote:
| I would hazard a guess that a regular n-ary tree would
| outperform the OP tree in many usage scenarios with no
| extra effort, and with a number of B+ tree variants being
| strictly better at the cost of more effort.
| mgaunard wrote:
| Vector programming requires you to change your way of
| thinking; instead of computing something for 1 element, you
| compute it for N elements.
| ninepoints wrote:
| I do lots of SIMD and shader programming, but regardless
| of register width, O(n) is not O(1).
| mgaunard wrote:
| The point is that you shouldn't try to get all the
| children of a single node, but rather all the children of
| all the nodes, which is still O(n) and not O(n^2).
| a1369209993 wrote:
| > Feels like an O(n) scan every time you need to query the
| children of a node is a nonstarter for most applications
| (the readme is strangely optimistic about this point).
|
| The trick is that querying the children of N different
| nodes is usually _still_ a single O(N) scan, so if you
| operate on it array-style (which APL heavily encourages
| anyway), it's amortized constant time. Of course that's
| not _always_ viable, but APL programmers tend to be
| surprisingly good at making array operations out of things
| you wouldn't expect to use arrays for.
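|
| Concretely (a minimal C++ sketch, assuming the tree is just a
| parents[] index array; the names are illustrative):
|
|     #include <vector>
|
|     // Build the child lists for *every* node in one pass over
|     // the parent array. parents[i] is node i's parent, -1 = root.
|     std::vector<std::vector<int>>
|     children_of_all(const std::vector<int>& parents) {
|         std::vector<std::vector<int>> children(parents.size());
|         for (int i = 0; i < (int)parents.size(); ++i)
|             if (parents[i] != -1)
|                 children[parents[i]].push_back(i);  // O(N) total
|         return children;
|     }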
|
| > cache hostile
|
| If you additionally enforce that all parent indexes point
| to lower numbers, a preorder traversal is a linear scan
| forward, and a postorder traversal with child order
| reversed (which you can usually correct for one way or
| another) is a linear scan backward.
|
| (This assumes you only need dependency ordering, ie the
| parent node uses or supplies data to/from its children; if
| you need a true sequential traversal, the array has to be
| sorted according to that traversal (but is still a valid
| Apter Tree).)
|
| > the claim that this tree is somehow gpu friendly I find
| particularly dubious
|
| Yeah, array programming is generally kind of hit-or-miss at
| that and this does _look_ like a miss.
| ninepoints wrote:
| I don't think the linear scan you are talking about gives
| you any sort of ordered traversal, right? Unless I'm
| missing something.
| a1369209993 wrote:
| For an _arbitrary_ Apter Tree, a linear scan is unordered.
| You can impose additional constraints to get an ordered
| traversal (in the same way that, eg, you can sort an
| assoc-list/JSON-style key-value table by keys to get a
| key-order traversal), and the result is still a valid
| Apter Tree (respectively valid list of key-value pairs).
| tlack wrote:
| I still like that setup for using trees in low level
| languages.
|
| But personally I've been working at higher levels of the
| stack the last few years, where these kinds of decisions seem
| less important.
|
| And on another level, it seems like coders in general aren't
| that interested in vector-oriented languages and techniques,
| which makes their study somewhat isolating.
| fwsgonzo wrote:
| It is always like that when you venture off the well-trodden
| path. I am studying low-latency emulation and it's also
| isolating.
| 8372049 wrote:
| Friendly heads up, "pseudo" is frequently misspelled "psuedo"
| in the readme.
| rewmie wrote:
| It might be of interest that there's an implicit binary
| tree data structure dubbed Eytzinger's tree/method that only
| requires a single vector.
|
| https://opendatastructures.org/ods-cpp/10_1_Implicit_Binary_...
|
| I dare say that no tree data structure beats Eytzinger's method
| in cache locality.
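|
| A minimal sketch of the indexing (C++, 1-based as in the classic
| heap layout; the names are illustrative):
|
|     #include <cstddef>
|
|     // Node i lives at tree[i]; no child pointers are stored.
|     std::size_t parent(std::size_t i)      { return i / 2; }
|     std::size_t left_child(std::size_t i)  { return 2 * i; }
|     std::size_t right_child(std::size_t i) { return 2 * i + 1; }
|
|     // A search descends by index arithmetic alone, e.g.
|     //   i = (key < tree[i]) ? left_child(i) : right_child(i);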
| ulatich wrote:
| There is also the "Binary Ostensibly-Implicit Tree". Like
| Eytzinger's tree/method but without the need to pad memory.
| Generalises to n-ary trees too.
| amluto wrote:
| > I dare say that no tree data structure beats Eytzinger's
| method in cache locality
|
| Cache locality is good toward the leaves and bad everywhere
| else. This is why binary search on a sorted array is actually
| quite slow.
|
| ORC in Linux, for example, was first prototyped as a sorted
| array of little structures. It got _much_ faster by adding an
| auxiliary index giving O(1) access to a credible starting point
| for a search.
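|
| The general shape of that trick, sketched in C++ (not the actual
| ORC code; a small bucket index over a sorted array, with
| illustrative names):
|
|     #include <algorithm>
|     #include <cstdint>
|     #include <vector>
|
|     struct Entry { uint32_t key; uint32_t value; };
|
|     // entries is sorted by key; index[b] is the offset of the
|     // first entry whose top byte is b (257 offsets total), so a
|     // lookup only searches one small bucket.
|     uint32_t lookup(const std::vector<Entry>& entries,
|                     const std::vector<uint32_t>& index,
|                     uint32_t key) {
|         uint32_t b = key >> 24;             // O(1) starting point
|         auto first = entries.begin() + index[b];
|         auto last  = entries.begin() + index[b + 1];
|         auto it = std::lower_bound(first, last, key,
|             [](const Entry& e, uint32_t k) { return e.key < k; });
|         return it->value;                   // assumes key is present
|     }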
| gumby wrote:
| A pointer is simply an index into an array which is RAM.
| lelandbatey wrote:
| Conceptually, yes. There are a lot of finer points to this
| concept though. One of the points most relevant here is that
| accessing "RAM" is not _actually_ an O(1) operation, due to CPU
| caches which can speed up sequential access of "close together"
| data. Thus, if you manually ensure that data is "close
| together" (by putting it all in an array together), you can
| massively speed up your program in the real world because now
| all your keys/data can be squeezed into the cache at once.
| gumby wrote:
| You can allocate your data the same way, especially using
| custom allocators in C++. And every reference requires some
| arithmetic too, if you're worrying about that.
|
| But my point is that storing them in another array doesn't
| buy you a lot if you allow deallocations (you can then still
| have a use-after-free, or even read the wrong object), while
| if you don't deallocate you might as well just use pointers.
|
| Just trying to understand the benefit of going to all this
| work. I can see the point in old FORTRAN code before FORTRAN
| 90.
| lelandbatey wrote:
| I think it's mostly just an experiment, example, proof-of-
| concept or other "toy" implementation
| demonstrating/exploring the idea.
| gumby wrote:
| Yeah, no judgement on that.
|
| But I've worked with indexed classes (mixin class) to
| avoid pointers in the past and have never found them
| worth the effort. Since someone else is mentioning it
| here on HN I'm asking if there is an advantage over
| pointers that I don't realise -- if so I might be glad to
| take advantage of it myself.
| zappb wrote:
| Pointers are a bit more abstract than that. Since each program
| has a virtual address space, and said address space is made up
| of memory pages, even raw pointers need to be translated to the
| physical address via TLBs. In that sense, indices into arrays
| are more like pointers than the other way around.
| mrkeen wrote:
| But they point into _your_ RAM, which means you can't store
| them on disk or pass them over the network.
| gumby wrote:
| A minor but legitimate benefit.
| edflsafoiewq wrote:
| SoA form also lets you trivially store additional data at
| every node, eg. if an algorithm needs a flag for each node,
| you just alloc an array for it.
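|
| For instance (a tiny C++ sketch; the struct and names are
| illustrative):
|
|     #include <cstdint>
|     #include <vector>
|
|     struct Tree {                // SoA: parallel arrays, same index
|         std::vector<int32_t> parents;
|         std::vector<int32_t> values;
|     };
|
|     // Per-node scratch data for one algorithm is just one more
|     // parallel array, sized to the node count.
|     std::vector<uint8_t> make_flags(const Tree& t) {
|         return std::vector<uint8_t>(t.parents.size(), 0);
|     }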
| mihaic wrote:
| One change I'd make to this structure would be to pack all the
| children of a node together. That way you could find out the list
| of all children of a node with a binary search, and they'd have
| some cache-friendly properties.
|
| Inserting a child would need a memmove in O(N), but if edits are
| rare after an initial build it wouldn't be that bad.
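|
| A minimal sketch of that lookup (C++, assuming the nodes are kept
| sorted by parent index; the names are illustrative):
|
|     #include <algorithm>
|     #include <utility>
|     #include <vector>
|
|     // With parents[] sorted, all children of node p occupy one
|     // contiguous range, found by an O(log N) binary search.
|     std::pair<int, int> child_range(const std::vector<int>& parents,
|                                     int p) {
|         auto [lo, hi] = std::equal_range(parents.begin(),
|                                          parents.end(), p);
|         return { int(lo - parents.begin()),
|                  int(hi - parents.begin()) };
|     }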
| catlifeonmars wrote:
| I feel like atree in TFA is just giving a name to something I do
| all the time when it makes sense to do so. For example, I used
| something similar recently when building an ECS entity tree for
| a game engine.
| jiggawatts wrote:
| I was about to comment that this reminds me of Aaron Hsu's Dyalog
| compiler, but it's mentioned right there in the README as one of
| the inspirations.
|
| IMHO, compilers are about an order of magnitude slower than they
| could be, and this type of tree representation could be one key
| element for fixing that.
|
| One interesting approach would be to combine egg[1] with this
| style of vectorised trees for efficient optimisation passes.
|
| [1] https://arxiv.org/pdf/2004.03082.pdf
| amluto wrote:
| > Pointers are annoying anyway.
|
| But the parent indices _are_ pointers. Not in the index-into-all-
| memory sense but in the offset-into-array sense. They're smaller,
| type safe, inherently well packed, and can't point outside the
| instance of the data structure in question if they're bounds
| checked. But I would still think of them as a sort of pointer.
|
| So this is just a tree, with only a parent pointer, in SOA form,
| and iterable (in no particular order). Which is maybe useful if
| you want that specific data structure.
|
| And it's utterly useless if you want, say, a tree with children
| and need better-than-O(n) access to children. Which you probably
| do need. Of course, you can add child pointers in SOA form if you
| want.
|
| (SOA is Struct Of Arrays)
| alexchamberlain wrote:
| I think you could achieve something close to (I haven't sat
| down and thought about it) O(log N) with sorted arrays? You'd
| have the downside of inserting things in the middle, but I
| think that would be lost in cache efficiency elsewhere for many
| use cases.
| amluto wrote:
| A sorted array to less inefficiently iterate children of a
| node? This is getting ridiculous.
|
| The "no pointers" thing is cache efficient for three reasons:
|
| 1. Indices can be smaller than pointers.
|
| 2. The data is packed in memory. This means that cachelines
| will be full of relevant data, and adjacent cache lines
| (which are a bit faster than faraway lines) get used.
|
| 3. SOA form is more or less cache efficient depending on what
| you're doing with it.
|
| And that's _it_.
|
| If you need child pointers, use child pointers. If you want a
| cache-efficient tree, use a cache-efficient tree, e.g. a
| B-tree or an appropriate variant.
| alexchamberlain wrote:
| Sorted arrays can be navigated like a tree. For
| sufficiently randomised data, the middle element is
| approximately the middle of your data. You can use binary
| search. You can implement range iteration very efficiently.
| It's not wholly ridiculous.
| amluto wrote:
| Binary search into a sorted array is slow. Intro
| algorithms classes might count _operations_, and you discover
| discover that about log_2(n) operations are needed, which
| is optimal in a certain sense.
|
| But those operations are horrible operations. Until you
| are log_2(cache line size / element size) steps from the
| leaves, you are hitting a different cache line each time,
| and you're doing a branch such that the next cache line
| you want depends on the branch result.
|
| So this is roughly log_2(n)-3 serial rounds of (cache miss,
| compare, branch, compute next address). (Assume 8-byte
| entries -- it's much worse with bigger entries.) This is not
| even close to optimal.
|
| You will find that, for reasonably large data sets (maybe
| a few hundred elements or more?), any cache-optimized
| searchable data structure will beat binary search by a
| large factor. And for small arrays (or the last stage of
| an optimized large structure), you want a vectorized
| linear search.
|
| Binary search is mostly nice because sorted arrays are
| elegant and binary search can be implemented in very
| little code. And it's easy to explain and analyze.
| stevage wrote:
| Interesting the casual references to languages K, J and Q. I have
| never heard of them, and they are hard to google. Anyone know?
| spenczar5 wrote:
| Oh boy, they are weird. They are all related to APL. Extremely
| terse, dense languages. Got some traction in actuarial and
| financial circles.
|
| Arthur Whitney is the origin of much of that family of
| languages. Bryan Cantrill's interview with him is good:
| https://dl.acm.org/doi/pdf/10.1145/1515964.1531242
| mgaunard wrote:
| K and Q are the languages behind KDB.
|
| K is the lower-level language and is a variant of APL. Q is the
| higher-level one, bringing in elements of SQL.
|
| J is another APL-like programming language, unrelated to KDB,
| by the actual creator of APL.
|
| All are easy to google by adding "programming language" to your
| query, and each has a Wikipedia page.
| laszlokorte wrote:
| They are array languages. If you have never heard of array
| languages, check out the code_report YouTube channel.
| icsa wrote:
| See Aaron Hsu's excellent descriptions (using APL):
|
| https://www.youtube.com/watch?v=hzPd3umu78g
| https://www.youtube.com/watch?v=X5_5MtOYNos
| rav wrote:
| Lots of comments about iterating over children being O(N) for
| this style of tree. It's actually easy to generalize the atree
| design by e.g. adding pointers for "first child" and "next
| sibling" and potentially removing the parent pointers, if that's
| what you need in your application. I think the "Operations in
| psuedocode" section should simply state that there's no O(1) way
| to access children of a node - instead of recommending the O(N)
| approach, you should recommend changing the data structure to
| support the operations you need.
|
| Storing nodes in arrays and using indices for pointers is a must
| whenever you're implementing algorithms on trees. I typically
| prefer using an array of structs, putting the key and the parent
| index next to each other, instead of putting them in separate
| arrays. If you need to access the children of a node, then be
| sure to consider if you can save memory by having a separate
| structure for leaves - remember that over half of the nodes will
| be leaves, so using space to store a child pointer on both
| internal nodes and leaves can be a wasteful use of memory.
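|
| A minimal sketch of that variant (C++, array-of-structs with
| first-child/next-sibling indices instead of a parent index; the
| names are illustrative):
|
|     #include <cstdint>
|     #include <vector>
|
|     struct Node {
|         int32_t key;
|         int32_t first_child  = -1;  // -1 if the node is a leaf
|         int32_t next_sibling = -1;  // -1 if it is the last child
|     };
|
|     // Visit the children of node i in O(#children), not O(N).
|     template <class F>
|     void for_each_child(const std::vector<Node>& nodes, int i, F f) {
|         for (int c = nodes[i].first_child; c != -1;
|              c = nodes[c].next_sibling)
|             f(c);
|     }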
| zeroCalories wrote:
| Looks nice. That said, if you want to reduce mallocs and
| encourage data locality, maybe you could try a more traditional
| tree implementation using a pool of nodes similar to thread
| pools. I've found that most of my problems related to those were
| solved by such techniques.
| OnlyMortal wrote:
| A custom pool for objects is often a good idea. Pre-allocate
| "enough" memory and dish them out as appropriate, releasing
| back to the pool when the reference count goes to zero.
|
| You can also use the pool to bound the maximum size you're able
| to allow.
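|
| A bare-bones sketch of such a pool (C++, fixed capacity with an
| index free list; reference counting omitted, names illustrative):
|
|     #include <cstddef>
|     #include <vector>
|
|     template <class T>
|     class NodePool {
|         std::vector<T>   slots;
|         std::vector<int> free_list;      // indices of unused slots
|     public:
|         explicit NodePool(std::size_t cap) : slots(cap) {
|             for (int i = int(cap) - 1; i >= 0; --i)
|                 free_list.push_back(i);
|         }
|         bool full() const { return free_list.empty(); }
|         int alloc() {                    // check full() before calling
|             int i = free_list.back();
|             free_list.pop_back();
|             return i;
|         }
|         void release(int i) { free_list.push_back(i); }
|         T& operator[](int i) { return slots[i]; }
|     };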
| mathiasgredal wrote:
| Why is this better than an arena style allocator?
| readthenotes1 wrote:
| This reminds me of data structures used to represent data on
| old tape devices.
| nurettin wrote:
| Wow this is pretty similar to how I used to store hierarchical
| data back in my qbasic days to solve minimax problems. Didn't
| know it was called steven or whatever.
| matt3210 wrote:
| An integer which references another data location is a pointer.
| This project only replaces system pointers with home-grown
| pointers.
| 8372049 wrote:
| I think in this context, "pointer-free" is meant to imply
| (spatial) locality of reference, no additional memory
| allocations, address-independent and presumably memory safety
| (but I didn't read the code and may be wrong about the last
| one).
| edflsafoiewq wrote:
| This is the representation I usually see used for the tree of
| nodes for skinning 3D models. There each node has a transform,
| and the most common operation is to recompute the world transform
| for all nodes, formed by composing the transform for each node
| with the one for all of its parents. If the array is sorted so
| that parents always precede their children, that's just a
| straight loop over the arrays:
|
|     for i = 0, num_nodes - 1 do
|       if parents[i] == -1 then
|         world_xforms[i] = xforms[i]
|       else
|         world_xforms[i] = world_xforms[parents[i]] * xforms[i]
|       end
|     end
___________________________________________________________________
(page generated 2023-12-16 23:00 UTC)