[HN Gopher] Flattening ASTs and other compiler data structures (...
       ___________________________________________________________________
        
       Flattening ASTs and other compiler data structures (2023)
        
       Author : aw1621107
       Score  : 144 points
       Date   : 2025-01-10 19:23 UTC (1 day ago)
        
 (HTM) web link (www.cs.cornell.edu)
 (TXT) w3m dump (www.cs.cornell.edu)
        
       | ww520 wrote:
       | This is a fantastic idea. An AST works well in an array-based
       | allocation block since there is no need to free individual
       | nodes. It's an add-only allocation.
        
         | JonChesterfield wrote:
         | What about transforming the AST after it is built, or deriving
         | a new tree which partly aliases the original in persistent
         | structure fashion?
        
       | dmagyari wrote:
       | "Instead of allocating Expr objects willy-nilly on the heap,
       | we'll pack them into a single, contiguous array." Zig compiler
       | pipeline (AST, Zir, Air, Sema) does exactly this on all layers.
       | Not only contiguous, but instead of array-of-structs it is
       | struct-of-arrays, so walking the tree is even more cache
       | friendly. For AST see:
       | https://github.com/ziglang/zig/blob/master/lib/std/zig/Ast.z...
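A rough Rust sketch of the struct-of-arrays idea (an illustration of the layout, not Zig's actual code):

```rust
// Struct-of-arrays sketch: instead of Vec<Node> where each node carries
// a tag plus operands, keep parallel arrays. A pass that only needs the
// tags streams through a much smaller, denser array.
#[derive(Clone, Copy, PartialEq)]
enum Tag {
    Lit,
    Add,
}

struct Soa {
    tags: Vec<Tag>,      // one small tag per node
    data: Vec<[u32; 2]>, // operand indices (or a literal) per node
}

impl Soa {
    fn new() -> Self {
        Soa { tags: Vec::new(), data: Vec::new() }
    }

    fn push(&mut self, tag: Tag, data: [u32; 2]) -> u32 {
        self.tags.push(tag);
        self.data.push(data);
        (self.tags.len() - 1) as u32
    }
}

fn main() {
    let mut t = Soa::new();
    let a = t.push(Tag::Lit, [1, 0]);
    let b = t.push(Tag::Lit, [2, 0]);
    t.push(Tag::Add, [a, b]);
    // A pass that only inspects tags never touches the `data` array.
    let adds = t.tags.iter().filter(|&&x| x == Tag::Add).count();
    assert_eq!(adds, 1);
}
```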
        
         | agumonkey wrote:
         | Makes me wonder whether people in the APL/J/K community have
         | influenced, or been influenced by, this kind of technique.
         | IIRC Aaron Hsu does tree processing through arrays (but I'm
         | not skilled enough to analyze his code)
        
           | gsf_emergency wrote:
           | Do you have a link to such an example of Aaron's code? Thank
           | you in advance!
        
             | 082349872349872 wrote:
             | https://scholarworks.iu.edu/dspace/items/3ab772c9-92c9-4f59
             | -...
             | 
             | Iverson's 1962 book also mentions tree representations, see
             | pp45-62: https://archive.org/details/aprogramminglanguage19
             | 62/page/n6...
        
             | agumonkey wrote:
             | Can't remember where exactly but he did demo his code in
             | talks/conferences with links.
        
         | gritzko wrote:
         | I work on a C dialect where _everything_ is flattened. JSON and
         | other trees in particular. Binary heaps are flat, merge sort
         | and iterator heaps are absolutely great, can build LSM
         | databases with that. Stacks, circular buffers, hash maps, etc,
         | all flat. Templated output (PHP like) is done by a flat data
         | structure.
         | 
         | https://github.com/gritzko/librdx/blob/master/abc/B.md
         | 
         | Apart from locality and lifetimes, these flat data structures
         | improve _composability_. When every data structure is a flat
         | buffer, you can mmap them or zip them or send them by the
         | network, all by the same routine. They are uniform like bricks,
         | in a sense.
        
           | rurban wrote:
           | I worked in a language where all data structures were
           | "flattened", could be trivially serialized to disk, and read
           | in again. Called print and read. The language was called
           | lisp. All flat, just parens.
           | 
           | Some of my compilers export the AST as lisp trees. Much
           | smaller and more readable than json, and it can be executed.
           | Uniform like bricks
        
             | vanderZwan wrote:
             | > _All flat, just parens._
             | 
             | So not flat then. Prefix is not postfix. Forth, and most
             | concatenative languages, are much closer to _actually_
             | being flat.
             | 
             | Lisp is trivial to flatten, but that's not the same thing.
        
       | cardanome wrote:
       | Amazing article, very good advice to keep your data structures
       | flat.
       | 
       | Adding to that, it also makes editing the AST vastly more
       | efficient.
       | 
       | I discovered that principle on my own when I worked on an
       | editor that operated directly on the AST instead of text. I
       | found manipulating the tree-style AST so painful, constantly
       | traversing the tree and all. Once I made it flat, my life was a
       | whole lot easier. You can directly index any part of the AST in
       | constant time.
        
       | emptysea wrote:
       | Rust-analyzer uses a similar technique for parsing
       | https://github.com/rust-lang/rust-analyzer/blob/master/crate...
       | which then gets fed into https://github.com/rust-analyzer/rowan
       | (lossless syntax tree)
        
         | bradrn wrote:
         | There is a longer overview of it here: https://github.com/rust-
         | lang/rust-analyzer/blob/master/docs/...
        
       | ndesaulniers wrote:
       | Cool! Carbon is doing exactly this. I had asked leads if there
       | was a paper on this approach, but they didn't have anything for
       | me. I'll send them this post!
        
         | benatkin wrote:
         | Zig uses a MultiArrayList which sounds similar
         | https://mitchellh.com/zig/parser
        
         | wrsh07 wrote:
         | Chandler discusses it in this video though!
         | https://youtu.be/ZI198eFghJk
         | 
         | You get some traversals for free with this layout (preorder,
         | reverse post order). Can search for subtrees with string
         | searching algorithms or more complex things with regex.
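One concrete freebie: with children stored before their parents, a plain forward loop visits every node after its children, with no recursion. A small sketch (hypothetical node type):

```rust
#[derive(Clone, Copy)]
enum Expr {
    Lit(i64),
    Add(u32, u32), // child indices, always earlier in the array
}

// One forward pass: every child index is smaller than the parent's, so
// by the time we reach a node its children's sizes are already known.
fn subtree_sizes(nodes: &[Expr]) -> Vec<u32> {
    let mut size = vec![0u32; nodes.len()];
    for (i, &n) in nodes.iter().enumerate() {
        size[i] = match n {
            Expr::Lit(_) => 1,
            Expr::Add(l, r) => 1 + size[l as usize] + size[r as usize],
        };
    }
    size
}

fn main() {
    // Flattened "1 2 +": nodes for 1, 2, then the add.
    let nodes = [Expr::Lit(1), Expr::Lit(2), Expr::Add(0, 1)];
    assert_eq!(subtree_sizes(&nodes), vec![1, 1, 3]);
}
```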
        
       | kazinator wrote:
       | > _Instead of allocating Expr objects willy-nilly on the heap,
       | we'll pack them into a single, contiguous array._
       | 
       | This happens naturally if you bump-allocate them in a garbage-
       | collected run-time, particularly under a copying collector. Free
       | lists also tend to co-locate because they are produced during
       | sweep phases of GC which run through heaps in order of address.
       | 
       | Don't make me bring out the L word for the billionth time.
       | 
       | > _A flat array of Exprs can make it fun and easy to implement
       | hash consing_
       | 
       | OK, it's not a case of L-ignorance, just willful neglect.
        
         | samps wrote:
         | FWIW I did acknowledge this in the article:
         | 
         | > A sufficiently smart memory allocator might achieve the same
         | thing, especially if you allocate the whole AST up front and
         | never add to it
         | 
         | > Again, a really fast malloc might be hard to compete with--
         | but you basically can't beat bump allocation on sheer
         | simplicity.
        
         | layer8 wrote:
         | And if you don't need more than 32 GB of heap space, the JVM
         | also gives you the ability to reduce reference sizes to 32
         | bits, with compressed references. (Due to alignment
         | requirements, the lower 3 bits of a pointer are zero and hence
         | do not need to be stored, which effectively gives you a 35-bit
         | address space.) Of course, the presence of object headers
         | counteracts this to a certain extent.
        
       | hencq wrote:
       | As the article mentions, this makes it quite similar to a
       | bytecode vm. I think the traditional wisdom is that an AST walker
       | is easy to write, but for speed you'd want a bytecode
       | interpreter. It'd be interesting to see how close the performance
       | gets with this flattened AST.
       | 
       | In practice I think there are more differences. E.g. AST
       | interpreters tend to pass environments around while bytecode
       | interpreters often store these on a stack (though I guess there's
       | nothing stopping you from doing this with an AST either). I
       | wonder if there's some goldilocks zone for ease of implementation
       | with decent performance.
        
         | kazinator wrote:
         | If you instead flatten the expression tree into RPN, then you
         | can execute it like that, with a stack machine.
         | 
         | I seem to recall that the Red Dragon Book ( _Compilers:
         | Principles, Techniques and Tools_ , Aho, Sethi, Ullman [1988])
         | describes a technique whereby intermediate code is represented
         | in RPN, and transformations are performed by pattern matches on
         | it.
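A sketch of that stack-machine execution over an RPN-flattened expression (illustrative opcodes, not the book's notation):

```rust
// Evaluate a flat RPN program with an explicit stack: operands push,
// operators pop their arguments and push the result.
#[derive(Clone, Copy)]
enum Op {
    Push(i64),
    Add,
    Mul,
}

fn eval(prog: &[Op]) -> i64 {
    let mut stack = Vec::new();
    for &op in prog {
        match op {
            Op::Push(n) => stack.push(n),
            Op::Add => {
                let r = stack.pop().unwrap();
                let l = stack.pop().unwrap();
                stack.push(l + r);
            }
            Op::Mul => {
                let r = stack.pop().unwrap();
                let l = stack.pop().unwrap();
                stack.push(l * r);
            }
        }
    }
    stack.pop().unwrap()
}

fn main() {
    // (1 + 2) * 4 in RPN: 1 2 + 4 *
    let prog = [Op::Push(1), Op::Push(2), Op::Add, Op::Push(4), Op::Mul];
    assert_eq!(eval(&prog), 12);
}
```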
        
           | finnh wrote:
           | The sample flat program in the post is exactly RPN, no?
        
             | samps wrote:
             | I think it would be more like RPN if it used a stack, and
             | operands were specified as relative offsets (i.e., stack
             | offsets). In the version I wrote, operands are still
             | represented as absolute offsets in the expression table.
        
       | dang wrote:
       | Related:
       | 
       |  _Flattening ASTs and other compiler data structures (2023)_ -
       | https://news.ycombinator.com/item?id=42181603 - Nov 2024 (2
       | comments)
       | 
       |  _Flattening ASTs and other compiler data structures_ -
       | https://news.ycombinator.com/item?id=36559346 - July 2023 (81
       | comments)
        
       | jgrowl wrote:
       | I thought a reddit comment on this article had an interesting
       | point:
       | 
       | https://www.reddit.com/r/rust/comments/1d3b356/my_new_favori...
       | 
       | Timzhy0, 3 points, 7 months ago:
       | 
       | Btw I think one can go a step further than the author, there is
       | no need to keep two explicit ExprRef baked in a binary node (lhs,
       | rhs). You can exploit locality: basically the AST, seen in the
       | LISP way, is just an arbitrarily nestable list, where elements
       | are atoms or other lists. Hence all you need to know is where
       | each list ends (an atom can be assumed to span one node), plus
       | one bit marking the last entry in a list, which is quite
       | ergonomic as well (it tells you whether moving to the next slot
       | in the AST means there is a sibling).
       | Basically it's easier to keep it sync while constructing and
       | takes up less memory per node. I pay 40 bits per node, stored
       | interleaved for best cache locality (some unaligned accesses but
       | I think it's still worthwhile), 8 bits for the tag, 32 for the
       | data, if data is bigger, 32 is an index into some auxiliary
       | segment (basically a ptr).
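A sketch of the encoding as described (details assumed; this is not Timzhy0's actual code):

```rust
// Assumed encoding reconstructed from the comment: an 8-bit tag plus
// 32-bit data per node, with one tag bit marking "last entry in its
// list", so sibling iteration needs no stored child references.
const LAST: u8 = 0x80;

#[derive(Clone, Copy)]
struct Node {
    tag: u8,   // high bit: last-sibling flag; low 7 bits: node kind
    data: u32, // literal value, or an index into an auxiliary segment
}

fn is_last(n: Node) -> bool {
    n.tag & LAST != 0
}

fn main() {
    // A flat list of three sibling atoms; only the last has the flag.
    let nodes = [
        Node { tag: 1, data: 10 },
        Node { tag: 1, data: 20 },
        Node { tag: 1 | LAST, data: 30 },
    ];
    assert_eq!(nodes.iter().filter(|&&n| is_last(n)).count(), 1);
    // Note: Rust pads this struct to 8 bytes; the comment's 40 bits per
    // node come from packing the fields unaligned in a byte buffer.
    assert_eq!(std::mem::size_of::<Node>(), 8);
}
```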
        
         | catgary wrote:
         | An arbitrarily nestable list is a tree, no?
        
       | userbinator wrote:
       | Rediscovering techniques that were somewhat well-known in the 70s
       | and 80s.
       | 
       | See also: https://en.wikipedia.org/wiki/Binary_heap
        
         | Taniwha wrote:
         | heh - I built compilers like this back in the 70s because the
         | machine I was working on (B6700 Algol) didn't really do
         | pointers as a first-class data structure. It's not really
         | surprising to find someone doing something similar in another
         | language that makes pointers difficult to deal with
        
         | torginus wrote:
         | Yup, and chances are if you're using a good C++ STL
         | implementation, most containers already use packed storage
         | internally. The STL doesn't even have a dedicated heap data
         | structure: the heap helpers (std::make_heap, std::push_heap,
         | etc.) operate on an std::vector.
        
       | torginus wrote:
       | I guess Rust's contribution to high performance programming is
       | that its borrow checker is so loathsome that it pushes people
       | into using ideas like ECS or arenas just to not have to bother
       | with it.
        
       | Agraillo wrote:
       | About 10 years ago, working with ASTs, I (re)invented a
       | structure representing trees in a flat array. It reminds me of
       | what is described here but not exactly. In my case I needed only
       | two indices per node: total sub-region length of all the children
       | and sub-children and parent index (so no need to have indices of
       | all children). Total sub-length basically can be interpreted as
       | the index of the next sibling. With such a structure it's
       | easy/cheap to execute FindFirstChild/FindNextSibling.
       | 
       | Later the theory behind such structures was revealed as "Nested
       | set model" [1]. The article seems to not mention the internal
       | representation, but I think that the implementation should use
       | something like my solution, i.e. a fixed number of references
       | per node.
       | 
       | [1] https://en.wikipedia.org/wiki/Nested_set_model
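A sketch of that navigation scheme (hypothetical layout: pre-order, with subtree lengths and parent indices per node):

```rust
// Each node stores its total subtree length and its parent's index.
// In a pre-order layout, the first child of node i (if any) is i + 1,
// and its next sibling is i + len[i].
struct FlatTree {
    len: Vec<u32>,    // subtree size, including the node itself
    parent: Vec<u32>, // parent index (the root points at itself here)
}

impl FlatTree {
    fn first_child(&self, i: usize) -> Option<usize> {
        if self.len[i] > 1 { Some(i + 1) } else { None }
    }

    fn next_sibling(&self, i: usize) -> Option<usize> {
        let j = i + self.len[i] as usize;
        // A sibling must exist and share our parent.
        if j < self.len.len() && self.parent[j] == self.parent[i] {
            Some(j)
        } else {
            None
        }
    }
}

fn main() {
    // Pre-order layout of: root { a { c }, b }
    //   0: root (len 4), 1: a (len 2), 2: c (len 1), 3: b (len 1)
    let t = FlatTree {
        len: vec![4, 2, 1, 1],
        parent: vec![0, 0, 1, 0],
    };
    assert_eq!(t.first_child(0), Some(1));
    assert_eq!(t.next_sibling(1), Some(3));
    assert_eq!(t.next_sibling(3), None);
    assert_eq!(t.first_child(2), None);
}
```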
        
       | efitz wrote:
       | Did the author just reinvent Forth?
        
       | jonstewart wrote:
       | You can also be smart about memory with lexing for great winnage.
       | Have a struct for your tokens that has an enum for your token
       | type and either a pointer, indices, or a string_view (or a &str
       | but lol lotsa luck with the borrow checker). You can then have a
       | vector of your token structs for fast allocation and iteration
       | and you have a slice for the token back into the original input,
       | no substring copying.
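A sketch of such a token struct in Rust (illustrative names): each token stores a byte range into the input instead of a copied substring.

```rust
// Tokens carry a kind tag plus a (start, end) byte range into the
// original input; the lexeme is borrowed back out on demand.
#[derive(Clone, Copy)]
enum TokKind {
    Ident,
    Number,
}

#[derive(Clone, Copy)]
struct Token {
    kind: TokKind,
    start: u32,
    end: u32,
}

impl Token {
    // Slice the lexeme out of the input; no allocation, no copy.
    fn text<'a>(&self, input: &'a str) -> &'a str {
        &input[self.start as usize..self.end as usize]
    }
}

fn main() {
    let input = "foo 42";
    let tokens = vec![
        Token { kind: TokKind::Ident, start: 0, end: 3 },
        Token { kind: TokKind::Number, start: 4, end: 6 },
    ];
    assert_eq!(tokens[0].text(input), "foo");
    assert_eq!(tokens[1].text(input), "42");
    assert!(matches!(tokens[1].kind, TokKind::Number));
}
```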
        
         | uecker wrote:
         | Yes, the C FE I write (in C) does exactly this and then
         | outputs, in one pass, flattened intermediate code. I did not
         | see it as an AST because it does semantic analysis during
         | parsing, but it still has all the information.
        
       | pfdietz wrote:
       | One advantage to this is the ease with which it handles ephemeral
       | annotations.
       | 
       | Suppose you want to attach additional information to some of the
       | nodes of the AST. Different algorithms on the AST will attach
       | different information; you don't necessarily need them all at the
       | same time or know ahead of time what you'll need.
       | 
       | With nodes, you have to have some sort of node/value hash table,
       | or hang a key/value map off each node. But with this flattened
       | representation, each datum gets its own flat array as well, which
       | can be independently allocated and deallocated.
       | 
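A minimal sketch of such a side table (hypothetical pass and node type): because node IDs are dense indices, an analysis can own its own parallel Vec and drop it when it finishes.

```rust
#[derive(Clone, Copy)]
enum Expr {
    Lit(i64),
    Add(u32, u32), // child indices, earlier in the array
}

// A constness analysis records, per node, whether it is a compile-time
// constant -- in its own side table, never touching the AST itself.
fn const_table(nodes: &[Expr]) -> Vec<bool> {
    let mut is_const = vec![false; nodes.len()];
    for (i, &n) in nodes.iter().enumerate() {
        is_const[i] = match n {
            Expr::Lit(_) => true,
            Expr::Add(l, r) => is_const[l as usize] && is_const[r as usize],
        };
    }
    is_const
}

fn main() {
    let nodes = [Expr::Lit(1), Expr::Lit(2), Expr::Add(0, 1)];
    let table = const_table(&nodes);
    assert_eq!(table, vec![true, true, true]);
    // `table` is freed independently; other passes allocate their own.
}
```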
       | One other thing I noticed about this flat representation is that
       | it throws static typing into a cocked hat! All you have to refer
       | to other nodes is indices. All different kinds of nodes are
       | stored in the same array.
        
       | Tarean wrote:
       | Twee (an equational theorem prover in Haskell used by quickspec)
       | has an interesting take on this. Terms are slices of arrays, but
       | you get a normal interface including pattern matching via
       | synonyms. It can also be nice to use phantom types on your
       | references (array offsets), so if you project them into flat
       | view types you can do so type-safely.
       | 
       | Requires the language to have something equivalent to pattern
       | synonyms to be as invisible as twee, though.
       | 
       | In twee a TermList is a slice of a bytearray (two ints for
       | offset/length plus a pointer).
       | 
       | And a term is an int for the function symbol and an unpacked
       | TermList for the arguments.
       | 
       | The pattern match synonyms load a flat representation from the
       | array into a view type, and the allocation of the view type
       | cancels out with the pattern matching so everything remains
       | allocation free.
       | 
       | https://hackage.haskell.org/package/twee-lib-2.4.2/docs/Twee...
        
         | Tarean wrote:
         | Forgot to mention: In the twee style, the int for the function
         | id contains metadata (is it a unification variable or constant
         | name? how many args does it take?). That way f1(f3(f5(), f7()))
         | would be serialised as something like [1,3,5,7], without even
         | references to other offsets
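A sketch of decoding such an offset-free serialization (the arity function is assumed for illustration):

```rust
// If each symbol id has a known arity, a pre-order list of ids encodes
// the whole tree with no stored offsets: f1(f3(f5(), f7())) -> [1,3,5,7].
fn arity(id: u32) -> usize {
    match id {
        1 => 1, // assume f1 takes one argument
        3 => 2, // assume f3 takes two
        _ => 0, // f5, f7 are constants
    }
}

// Skip past the subtree starting at `pos`; returns the index just after.
fn skip(terms: &[u32], pos: usize) -> usize {
    let mut end = pos + 1;
    for _ in 0..arity(terms[pos]) {
        end = skip(terms, end);
    }
    end
}

fn main() {
    let terms = [1, 3, 5, 7];
    // The whole list is exactly one term:
    assert_eq!(skip(&terms, 0), terms.len());
    // f3's subtree spans indices 1..4:
    assert_eq!(skip(&terms, 1), 4);
}
```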
        
       ___________________________________________________________________
       (page generated 2025-01-11 23:02 UTC)