[HN Gopher] Flattening ASTs (and Other Compiler Data Structures)
___________________________________________________________________
Flattening ASTs (and Other Compiler Data Structures)
Author : aw1621107
Score : 40 points
Date : 2025-01-10 19:23 UTC (3 hours ago)
(HTM) web link (www.cs.cornell.edu)
(TXT) w3m dump (www.cs.cornell.edu)
| ww520 wrote:
| This is a fantastic idea. AST works well in an array based
| allocation block since it has no need for freeing individual
| nodes. It's an add-only allocation.
| dmagyari wrote:
| "Instead of allocating Expr objects willy-nilly on the heap,
| we'll pack them into a single, contiguous array." Zig compiler
| pipeline (AST, Zir, Air, Sema) does exactly this on all layers.
| Not only contiguous, but instead of array-of-structs it is
| struct-of-arrays, so walking the tree is even more cache
| friendly. For AST see:
| https://github.com/ziglang/zig/blob/master/lib/std/zig/Ast.z...
| cardanome wrote:
| Amazing article, very good advice to keep your data structures
| flat.
|
| Adding to that, it also makes editing the AST vastly more
| efficient.
|
| I have discovered that principle on my own when I worked on an
| editor that directly operated on the AST instead of text. I found
| manipulating the tree-style AST so painful, constantly traversing
| the tree and all. Once I made it flat, my life was a hell lot
| easier. You can just directly index any part of AST in linear
| time.
| emptysea wrote:
| Rust-analyzer uses a similar technique for parsing
| https://github.com/rust-lang/rust-analyzer/blob/master/crate...
| which then gets fed into https://github.com/rust-analyzer/rowan
| (lossless syntax tree)
| ndesaulniers wrote:
| Cool! Carbon is doing exactly this. I had asked leads if there
| was a paper on this approach, but they didn't have anything for
| me. I'll send them this post!
| kazinator wrote:
| > _Instead of allocating Expr objects willy-nilly on the heap,
| we'll pack them into a single, contiguous array._
|
| This happens naturally if you bump-allocate them in a garbage-
| collected run-time, particularly under a copying collector. Free
| lists also tend to co-locate because they are produced during
| sweep phases of GC which run through heaps in order of address.
|
| Don't make me bring out the L word for the billionth time.
|
| > _A flat array of Exprs can make it fun and easy to implement
| hash consing_
|
| OK, it's not a case of L-ignorance, just willful neglect.
| samps wrote:
| FWIW I did acknowledge this in the article:
|
| > A sufficiently smart memory allocator might achieve the same
| thing, especially if you allocate the whole AST up front and
| never add to it
|
| > Again, a really fast malloc might be hard to compete with--
| but you basically can't beat bump allocation on sheer
| simplicity.
___________________________________________________________________
(page generated 2025-01-10 23:00 UTC)