[HN Gopher] Atree: A simple and efficient pointer-free tree impl...
       ___________________________________________________________________
        
       Atree: A simple and efficient pointer-free tree implementation
        
       Author : spenczar5
       Score  : 108 points
       Date   : 2023-12-16 17:52 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | tlack wrote:
       | Always surprising to click through a link on HN and discover it
       | is one's own work. For a time I was very interested in
       | lightweight array-based implementations of common data structures
       | and this one seemed particularly handy.
        
         | spenczar5 wrote:
         | It sounds a little like it didn't work out as well as you
         | hoped. How did it fare?
         | 
         | I am interested because I have some scientific software
         | (astrodynamics - propagating orbits in space) that would really
         | benefit from a cache-friendly tree, and this seemed promising.
        
           | ninepoints wrote:
           | It does? Feels like an O(n) scan every time you need to query
           | the children of a node is a nonstarter for most applications
           | (the readme is strangely optimistic about this point). Plenty
           | of cache friendly tree implementations out there, and this
           | one seems actually cache hostile with few benefits in my mind
           | aside from ease of implementation.
           | 
           | Also, I write a lot of code that runs on a gpu, and the claim
           | that this tree is somehow gpu friendly I find particularly
           | dubious.
        
             | jiggawatts wrote:
             | There are use cases where that doesn't matter, such as
             | compilers that make a full pass over the source code for
             | every optimisation type.
        
             | spenczar5 wrote:
             | For me, N is small. It's also N-ary, not binary, which
             | crosses off a bunch of the first options. Anyway, I am not
             | sure this will work, just worth trying. Empirical numbers
             | beat theory every time :)
        
               | ninepoints wrote:
               | You are using N in a different sense than I am. Unless
               | I'm reading the tree description incorrectly, N is the
               | size of the tree itself, not the number of children.
        
               | spenczar5 wrote:
               | Oh, I was being sloppy and mixed N into the arity: I
               | meant N elements, each with a variable number of children
               | (as many as 8).
        
               | ninepoints wrote:
               | I would hazard a guess that a regular n-ary tree would
               | outperform the OP tree in many usage scenarios with no
               | extra effort, and with a number of B+ tree variants being
               | strictly better at the cost of more effort.
        
             | mgaunard wrote:
             | Vector programming requires you to change your way of
             | thinking; instead of computing something for 1 element, you
             | compute it for N elements.
        
               | ninepoints wrote:
               | I do lots of simd and shader programming, but regardless
               | of register width, O(n) is not O(1)
        
               | mgaunard wrote:
               | The point is that you shouldn't try to get all the
               | children of a single node, but rather all the children of
               | all the nodes, which is still O(n) and not O(n^2).
        
             | a1369209993 wrote:
             | > Feels like an O(n) scan every time you need to query the
             | children of a node is a nonstarter for most applications
             | (the readme is strangely optimistic about this point).
             | 
             | The trick is that querying the children of N different
             | nodes is usually _still_ a single O(N) scan, so if you
             | operate on it array-style (which APL heavily encourages
             | anyway), it's amortized constant time. Of course that's
             | not _always_ viable, but APL programmers tend to be
             | surprisingly good at making array operations out of
             | things you wouldn't expect to use arrays for.
             | 
             | > cache hostile
             | 
             | If you additionally enforce that all parent indexes point
             | to lower numbers, a preorder traversal is a linear scan
             | forward, and a postorder traversal with child order
             | reversed (which you can usually correct for one way or
             | another) is a linear scan backward.
             | 
             | (This assumes you only need dependency ordering, ie the
             | parent node uses or supplies data to/from its children; if
             | you need a true sequential traversal, the array has to be
             | sorted according to that traversal (but is still a valid
             | Apter Tree).)
             | 
             | > the claim that this tree is somehow gpu friendly I find
             | particularly dubious
             | 
             | Yeah, array programming is generally kind of hit-or-miss at
             | that and this does _look_ like a miss.
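             | 
             | To make the first point concrete, here is a minimal C
             | sketch (illustrative names, not code from the repo) of
             | collecting the children of _every_ node in one O(N)
             | counting pass plus a prefix sum:
             | 
             |     /* parents[i] = index of i's parent, -1 for the
             |      * root. Afterwards the children of node p are
             |      * child_list[child_start[p] .. child_start[p+1]-1].
             |      * cursor is an n-entry scratch array. */
             |     void children_of_all(const int *parents, int n,
             |                          int *child_start, /* n+1 */
             |                          int *child_list,  /* n   */
             |                          int *cursor)      /* n   */
             |     {
             |         for (int i = 0; i <= n; i++)
             |             child_start[i] = 0;
             |         for (int i = 0; i < n; i++)       /* count */
             |             if (parents[i] >= 0)
             |                 child_start[parents[i] + 1]++;
             |         for (int i = 0; i < n; i++)  /* prefix sum */
             |             child_start[i + 1] += child_start[i];
             |         for (int i = 0; i < n; i++)
             |             cursor[i] = child_start[i];
             |         for (int i = 0; i < n; i++)     /* scatter */
             |             if (parents[i] >= 0)
             |                 child_list[cursor[parents[i]]++] = i;
             |     }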
        
               | ninepoints wrote:
               | The linear scan you are talking about I don't think gives
               | you any sort of ordered traversal right? Unless I'm
               | missing something.
        
               | a1369209993 wrote:
                | For an _arbitrary_ Apter Tree, a linear scan is
                | unordered. You can impose additional constraints to
                | get an ordered traversal (in the same way that, eg,
                | you can sort an assoc-list/JSON-style key-value table
                | by keys to get a key-order traversal), and the result
                | is still a valid Apter Tree (respectively a valid
                | list of key-value pairs).
        
           | tlack wrote:
           | I still like that setup for using trees in low level
           | languages.
           | 
           | But personally I've been working at higher levels of the
           | stack the last few years, where these kinds of decisions seem
           | less important.
           | 
            | And on another level, it seems like coders in general
            | aren't that interested in vector-oriented languages and
            | techniques, which makes their study somewhat isolating.
        
             | fwsgonzo wrote:
             | It is always like that when you venture off the well
             | trodden path. I am studying low latency emulation and it's
             | also isolating.
        
         | 8372049 wrote:
         | Friendly heads up, "pseudo" is frequently misspelled "psuedo"
         | in the readme.
        
       | rewmie wrote:
       | It might be of interest to anyone that there's an implicit binary
       | tree data structure dubbed Eytzinger's tree/method that only
       | requires a single vector.
       | 
       | https://opendatastructures.org/ods-cpp/10_1_Implicit_Binary_...
       | 
       | I dare say that no tree data structure beats Eytzinger's method
       | in cache locality.
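        | 
        | A minimal C sketch of the search over that layout (assuming
        | a 1-indexed array already permuted into Eytzinger order;
        | names are illustrative):
        | 
        |     /* Children of a[k] live at a[2k] and a[2k+1], so the
        |      * descent is pure index arithmetic, no child pointers.
        |      * Returns the index of the first element >= key, or 0
        |      * if every element is smaller. */
        |     int eytzinger_lower_bound(const int *a, int n, int key)
        |     {
        |         int k = 1;
        |         while (k <= n)                /* descend */
        |             k = 2 * k + (a[k] < key);
        |         while (k & 1)     /* undo trailing right-turns */
        |             k >>= 1;
        |         k >>= 1;     /* ...and the left-turn above them */
        |         return k;
        |     }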
        
         | ulatich wrote:
         | There is also the "Binary Ostensibly-Implicit Tree". Like
         | Eytzinger's tree/method but without the need to pad memory.
         | Generalises to n-ary trees too.
        
         | amluto wrote:
         | > I dare say that no tree data structure beats Eytzinger's
         | method in cache locality
         | 
         | Cache locality is good toward the leaves and bad everywhere
         | else. This is why binary search on a sorted array is actually
         | quite slow.
         | 
         | ORC in Linux, for example, was first prototyped as a sorted
         | array of little structures. It got _much_ faster by adding an
         | auxiliary index giving O(1) access to a credible starting point
         | for a search.
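          | 
          | Not the real ORC layout, but a toy C sketch of that
          | auxiliary-index idea for a sorted array of 16-bit keys: a
          | small table maps the top bits of a key to a starting
          | position, so a lookup only touches one small bucket.
          | 
          |     #include <stdint.h>
          | 
          |     #define SHIFT 8   /* 16-bit keys, 256 buckets */
          | 
          |     /* start[b] = first position whose key is in bucket
          |      * b; start has 257 entries and start[256] == n.   */
          |     void build_index(const uint16_t *keys, int n,
          |                      int *start)
          |     {
          |         int pos = 0;
          |         for (int b = 0; b <= 256; b++) {
          |             while (pos < n && (keys[pos] >> SHIFT) < b)
          |                 pos++;
          |             start[b] = pos;
          |         }
          |     }
          | 
          |     int find(const uint16_t *keys, const int *start,
          |              uint16_t x)
          |     {
          |         int b = x >> SHIFT;
          |         for (int i = start[b]; i < start[b + 1]; i++)
          |             if (keys[i] == x)
          |                 return i;   /* position of x */
          |         return -1;          /* not present   */
          |     }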
        
       | gumby wrote:
       | A pointer is simply an index into an array which is RAM.
        
         | lelandbatey wrote:
         | Conceptually, yes. There are a lot of finer points to this
          | concept though. One of the points most relevant here is that
         | accessing "RAM" is not _actually_ an O(1) operation, due to CPU
         | caches which can speed up sequential access of "close together"
         | data. Thus, if you manually ensure that data is "close
         | together" (by putting it all in an array together), you can
         | massively speed up your program in the real world because now
         | all your keys/data can be squeezed into the cache at once.
        
           | gumby wrote:
           | You can allocate your data the same way, especially using
           | custom allocators in C++. And every reference requires some
           | arithmetic too, if you're worrying about that.
           | 
            | But my point is that storing them in another array doesn't
            | buy you a lot if you allow deallocations (you can then
            | still have a use-after-free, or even read the wrong
            | object), while if you don't deallocate you might as well
            | just use pointers.
           | 
           | Just trying to understand the benefit of going to all this
           | work. I can see the point in old FORTRAN code before FORTRAN
            | 90.
        
             | lelandbatey wrote:
             | I think it's mostly just an experiment, example, proof-of-
             | concept or other "toy" implementation
             | demonstrating/exploring the idea.
        
               | gumby wrote:
               | Yeah, no judgement on that.
               | 
               | But I've worked with indexed classes (mixin class) to
               | avoid pointers in the past and have never found them
               | worth the effort. Since someone else is mentioning it
               | here on HN I'm asking if there is an advantage over
               | pointers that I don't realise -- if so I might be glad to
               | take advantage of it myself.
        
         | zappb wrote:
         | Pointers are a bit more abstract than that. Since each program
         | has a virtual address space, and said address space is made up
         | of memory pages, even raw pointers need to be translated to the
         | physical address via TLBs. In that sense, indices into arrays
         | are more like pointers than the other way around.
        
         | mrkeen wrote:
          | But they point into _your_ RAM, which means you can't store
          | them on disk or pass them over the network.
        
           | gumby wrote:
           | A minor but legitimate benefit.
        
           | edflsafoiewq wrote:
           | SoA form also lets you trivially store additional data at
           | every node, eg. if an algorithm needs a flag for each node,
           | you just alloc an array for it.
        
       | mihaic wrote:
       | One change I'd make to this structure would be to pack all the
       | children of a node together. That way you could find out the list
       | of all children of a node with a binary search, and they'd have
       | some cache-friendly properties.
       | 
       | Inserting a child would need a memmove in O(N), but if edits are
       | rare after an initial build it wouldn't be that bad.
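        | 
        | A minimal C sketch of that variant (illustrative, not the
        | repo's layout): keep the rows sorted by parent index, then
        | two binary searches return the contiguous block of children.
        | 
        |     /* first index i in [0, n) with parents[i] >= p */
        |     static int lower(const int *parents, int n, int p)
        |     {
        |         int lo = 0, hi = n;
        |         while (lo < hi) {
        |             int mid = lo + (hi - lo) / 2;
        |             if (parents[mid] < p)
        |                 lo = mid + 1;
        |             else
        |                 hi = mid;
        |         }
        |         return lo;
        |     }
        | 
        |     /* children of node p are the rows [*begin, *end) */
        |     void child_range(const int *parents, int n, int p,
        |                      int *begin, int *end)
        |     {
        |         *begin = lower(parents, n, p);
        |         *end   = lower(parents, n, p + 1);
        |     }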
        
       | catlifeonmars wrote:
        | I feel like atree in TFA is just giving a name to something I
        | do all the time when it makes sense to do so. For example, I
        | used something similar recently when building an ECS entity
        | tree for a game engine.
        
       | jiggawatts wrote:
       | I was about to comment that this reminds me of Aaron Hsu's Dyalog
       | compiler, but it's mentioned right there in the README as one of
       | the inspirations.
       | 
       | IMHO, compilers are about an order of magnitude slower than they
       | could be, and this type of tree representation could be one key
       | element for fixing that.
       | 
       | One interesting approach would be to combine egg[1] with this
       | style of vectorised trees for efficient optimisation passes.
       | 
       | [1] https://arxiv.org/pdf/2004.03082.pdf
        
       | amluto wrote:
       | > Pointers are annoying anyway.
       | 
       | But the parent indices _are_ pointers. Not in the index-into-all-
       | memory sense but in the offset-into-array sense. They're smaller,
       | type safe, inherently well packed, and can't point outside the
       | instance of the data structure in question if they're bounds
       | checked. But I would still think of them as a sort of pointer.
       | 
        | So this is just a tree, with only a parent pointer, in SOA
        | form, and iterable (in no particular order). Which is maybe
        | useful if you want that specific data structure.
       | 
       | And it's utterly useless if you want, say, a tree with children
       | and need better-than-O(n) access to children. Which you probably
       | do need. Of course, you can add child pointers in SOA form if you
       | want.
       | 
       | (SOA is Struct Of Arrays)
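        | 
        | For reference, a minimal C sketch of the structure being
        | described (illustrative names, not the repo's actual API):
        | the whole tree is parallel arrays indexed by node id, and
        | extra per-node data is just one more parallel array.
        | 
        |     #define MAX_NODES 1024
        | 
        |     struct soa_tree {
        |         int   n;                    /* node count        */
        |         float value[MAX_NODES];     /* payload of node i */
        |         int   parent[MAX_NODES];    /* parent, -1 = root */
        |         char  flag[MAX_NODES];      /* extra per-node    */
        |     };
        | 
        |     /* appending a node returns its index -- that index is
        |      * the only "pointer" anyone ever holds */
        |     int add_node(struct soa_tree *t, float value, int parent)
        |     {
        |         int i = t->n++;
        |         t->value[i]  = value;
        |         t->parent[i] = parent;
        |         t->flag[i]   = 0;
        |         return i;
        |     }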
        
         | alexchamberlain wrote:
          | I think you could achieve something close to O(log N) with
          | sorted arrays (I haven't sat down and thought it through).
          | You'd have the downside of inserting things in the middle,
          | but I think that cost would be outweighed by the cache
          | efficiency elsewhere for many use cases.
        
           | amluto wrote:
           | A sorted array to less inefficiently iterate children of a
           | node? This is getting ridiculous.
           | 
           | The "no pointers" thing is cache efficient for three reasons:
           | 
           | 1. Indices can be smaller than pointers.
           | 
           | 2. The data is packed in memory. This means that cachelines
           | will be full of relevant data, and adjacent cache lines
           | (which are a bit faster than faraway lines) get used.
           | 
           | 3. SOA form is more or less cache efficient depending on what
           | you're doing with it.
           | 
           | And that's _it_.
           | 
           | If you need child pointers, use child pointers. If you want a
           | cache-efficient tree, use a cache-efficient tree, e.g. a
           | B-tree or an appropriate variant.
        
             | alexchamberlain wrote:
             | Sorted arrays can be navigated like a tree. For
             | sufficiently randomised data, the middle element is
             | approximately the middle of your data. You can use binary
             | search. You can implement range iteration very efficiently.
             | It's not wholly ridiculous.
        
               | amluto wrote:
               | Binary search into a sorted array is slow. Intro
                | algorithms classes might count _operations_, and you
               | discover that about log_2(n) operations are needed, which
               | is optimal in a certain sense.
               | 
               | But those operations are horrible operations. Until you
               | are log_2(cache line size / element size) steps from the
               | leaves, you are hitting a different cache line each time,
               | and you're doing a branch such that the next cache line
               | you want depends on the branch result.
               | 
                | So this is like log_2(n) - 3 serial (cache miss,
                | compare, branch, compute address) operations. (Assume
                | 8-byte entries -- it's much worse with bigger
                | entries.) This is not even close to optimal.
               | 
               | You will find that, for reasonably large data sets (maybe
               | a few hundred elements or more?), any cache-optimized
               | searchable data structure will beat binary search by a
               | large factor. And for small arrays (or the last stage of
               | an optimized large structure), you want a vectorized
               | linear search.
               | 
               | Binary search is mostly nice because sorted arrays are
               | elegant and binary search can be implemented in very
               | little code. And it's easy to explain and analyze.
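                | 
                | (Rough worked numbers to make that concrete: with
                | 8-byte entries and 64-byte cache lines, only the
                | last log_2(64/8) = 3 steps share a line, so a
                | search over n = 10^6 entries costs about
                | log_2(10^6) - 3, i.e. roughly 17 cache misses,
                | each one stalled on the previous compare and
                | branch.)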
        
       | stevage wrote:
        | Interesting to see the casual references to the languages K,
        | J and Q. I have never heard of them, and they are hard to
        | google. Anyone know?
        
         | spenczar5 wrote:
         | Oh boy, they are weird. They are all related to APL. Extremely
          | terse, dense languages. Got some traction in actuarial and
         | financial circles.
         | 
         | Arthur Whitney is the origin of much of that family of
         | languages. Bryan Cantrill's interview with him is good:
         | https://dl.acm.org/doi/pdf/10.1145/1515964.1531242
        
         | mgaunard wrote:
         | K and Q are the languages behind KDB.
         | 
         | K is the lower-level language and is a variant of APL. Q is the
         | higher-level one, bringing in elements of SQL.
         | 
         | J is another APL-like programming language, unrelated to KDB,
         | by the actual creator of APL.
         | 
         | All are easy to google by adding "programming language" to your
         | query, and each has a wikipedia page.
        
         | laszlokorte wrote:
          | They are array languages. If you have never heard of array
          | languages, check out the code_report youtube channel.
        
       | icsa wrote:
       | See Aaron Hsu's excellent descriptions (using APL):
       | 
       | https://www.youtube.com/watch?v=hzPd3umu78g
       | https://www.youtube.com/watch?v=X5_5MtOYNos
        
       | rav wrote:
       | Lots of comments about iterating over children being O(N) for
       | this style of trees. It's actually easy to generalize the atree
       | design by e.g. adding pointers for "first child" and "next
       | sibling" and potentially removing the parent pointers, if that's
       | what you need in your application. I think the "Operations in
       | psuedocode" section should simply state that there's no O(1) way
       | to access children of a node - instead of recommending the O(N)
       | approach, you should recommend changing the data structure to
       | support the operations you need.
       | 
       | Storing nodes in arrays and using indices for pointers is a must
       | whenever you're implementing algorithms on trees. I typically
       | prefer using an array of structs, putting the key and the parent
       | index next to each other, instead of putting them in separate
       | arrays. If you need to access the children of a node, then be
       | sure to consider if you can save memory by having a separate
        | structure for leaves - remember that over half of the nodes
        | will be leaves, so using space to store a child pointer on
        | both internal nodes and leaves can be a wasteful use of
        | memory.
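        | 
        | A minimal C sketch of that first-child / next-sibling
        | variant, as an array of structs (illustrative, not from the
        | repo):
        | 
        |     #define NIL -1
        | 
        |     struct node {
        |         int key;
        |         int first_child;    /* index of first child, or
        |                              * NIL if this is a leaf    */
        |         int next_sibling;   /* index of next sibling    */
        |     };
        | 
        |     /* visit the children of nodes[p] in O(#children) */
        |     void for_each_child(const struct node *nodes, int p,
        |                         void (*visit)(int child))
        |     {
        |         for (int c = nodes[p].first_child; c != NIL;
        |              c = nodes[c].next_sibling)
        |             visit(c);
        |     }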
        
       | zeroCalories wrote:
        | Looks nice. That said, if you want to reduce mallocs and
        | encourage data locality, maybe you could try a more
        | traditional tree implementation that allocates nodes from a
        | pool, similar to a thread pool. I've found that most of my
        | problems with allocation and locality were solved by such
        | techniques.
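        | 
        | A minimal C sketch of such a pool (illustrative; reference
        | counting and growth left out): one up-front block of nodes
        | threaded onto a free list, so "allocating" a node is popping
        | an index and "freeing" is pushing it back.
        | 
        |     #define POOL_SIZE 4096
        |     #define NIL -1
        | 
        |     struct pool_node {
        |         int parent;      /* payload ...             */
        |         int next_free;   /* ... or link when unused */
        |     };
        | 
        |     struct pool {
        |         struct pool_node nodes[POOL_SIZE];
        |         int free_head;
        |     };
        | 
        |     void pool_init(struct pool *p)
        |     {
        |         for (int i = 0; i < POOL_SIZE - 1; i++)
        |             p->nodes[i].next_free = i + 1;
        |         p->nodes[POOL_SIZE - 1].next_free = NIL;
        |         p->free_head = 0;
        |     }
        | 
        |     int pool_alloc(struct pool *p)  /* NIL if exhausted */
        |     {
        |         int i = p->free_head;
        |         if (i != NIL)
        |             p->free_head = p->nodes[i].next_free;
        |         return i;
        |     }
        | 
        |     void pool_free(struct pool *p, int i)
        |     {
        |         p->nodes[i].next_free = p->free_head;
        |         p->free_head = i;
        |     }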
        
         | OnlyMortal wrote:
         | A custom pool for objects is often a good idea. Pre-allocate
         | "enough" memory and dish them out as appropriate, releasing
         | back to the pool when the reference count goes to zero.
         | 
         | You can also use the pool to bound the maximum size you're able
         | to allow.
        
           | mathiasgredal wrote:
           | Why is this better than an arena style allocator?
        
       | readthenotes1 wrote:
        | This reminds me of data structures used to represent data on
        | the old tape devices.
        
       | nurettin wrote:
       | Wow this is pretty similar to how I used to store hierarchical
       | data back in my qbasic days to solve minimax problems. Didn't
       | know it was called steven or whatever.
        
       | matt3210 wrote:
       | An integer which references another data location is a pointer.
        | This project only replaces system pointers with home-grown
        | pointers.
        
         | 8372049 wrote:
          | I think in this context, "pointer-free" is meant to imply
          | (spatial) locality of reference, no additional memory
          | allocations, address independence, and presumably memory
          | safety (but I didn't read the code and may be wrong about
          | the last one).
        
       | edflsafoiewq wrote:
        | This is the representation I usually see used for the tree of
        | nodes for skinning 3D models. There each node has a transform,
        | and the most common operation is to recompute the world
        | transform of every node by composing its local transform with
        | its parent's world transform. If the array is sorted so that
        | parents always precede their children, that's just a straight
        | loop over the arrays:
        | 
        |     for i = 0, num_nodes do
        |         if parents[i] == -1 then
        |             world_xforms[i] = xforms[i]
        |         else
        |             world_xforms[i] = world_xforms[parents[i]] * xforms[i]
        |         end
        |     end
        
       ___________________________________________________________________
       (page generated 2023-12-16 23:00 UTC)