[HN Gopher] Implementing a Struct of Arrays
___________________________________________________________________
Implementing a Struct of Arrays
Author : mpweiher
Score : 110 points
Date : 2025-05-09 10:52 UTC (12 hours ago)
(HTM) web link (brevzin.github.io)
(TXT) w3m dump (brevzin.github.io)
| TheHideout wrote:
| So, basically it's allowing you to use a struct like you would in
| OOP, but get the array benefits of ECS when that struct is in a
| vector?
| throwawaymaths wrote:
| Isn't ECS often Type-Heterogeneous? I think you mean DoD.
| dgb23 wrote:
| It's not necessary to think about the data interface in terms
| of object orientation.
|
| You can think about it as being a composition of fields, which
| are individually stored in their respective array.
|
| (Slightly beside the point: Often they are also stored in pairs
| or larger, for example coordinates, slices and so on are almost
| always operated on in the same function or block.)
|
| The benefit comes from execution. If you have functions that
| iterate over these structs, they only need to load the arrays
| that contain the fields they need.
|
| Zig does some magic here based on comptime to make this happen
| automatically.
|
| An ECS does something similar at a fundamental level. But
| usually there's a whole bunch of additional stuff on top, such
| as mapping ids to indices, registering individual components on
| the fly, selecting a components for entities and so on. So it
| can be a lot more complicated than what is presented here and
| more stuff happens at runtime. It is also a bit of a one size
| fits all kind of deal.
|
| The article recommends watching Andrew Kelley's Talk on DoD,
| which inspired the post. I agree wholeheartedly, it's a very
| fun and interesting one.
|
| One of the key takeaways for me is that he didn't just slap on
| a design pattern (like ECS), but went to each piece
| individually, thought about memory layout, execution, trade
| offs in storing information versus recalculating, doing
| measurements and back of the envelope calculations etc.
|
| So the end result is a conglomerate of cleverly applied
| principles and learnings.
| jayd16 wrote:
| More like they used reflection to take a struct and generate a
| SOA collection for that type. Funnily enough, they skip the
| part where you can actually get at the arrays and focus on the
| struct type deconstruction and construction.
| monkeyelite wrote:
| Interest in SOA is bringing to mind the "art of the meta object
| protocol" which argues for a stage between class definition and
| implementation that would allow you to choose the layout and
| access method for instances of a class.
| corysama wrote:
| Yep. We're in a situation where C-like languages couple layout
| and access interface very tightly. But, now cache is such an
| overriding issue in optimization, you really want to rapidly
| experiment with different layouts without rewriting your whole
| algorithm every time. AOS, SOA, AOSOA, hot/cold data for
| different stages, etc...
|
| Jon Blow's Jai language famously added a feature to references
| that allowed you to easily experiment with moving data members
| between hot/cold arrays of structs.
|
| https://halide-lang.org/ tackles a related problem. It
| decouples the math to be done from the access order so as to
| allow you to rapidly test looping over data in complicated ways
| to achieve cache-friendly access patterns for your specific
| hardware target without rewriting your whole core loop every
| time.
|
| Halide is primarily an about image processing convolution
| kernels. I'm not sure how general purpose it can get.
| tialaramex wrote:
| AIUI Jai no longer has that SOA versus AOS functionality
| which Jon was very proud of when he invented it, I expect
| it's one of those "hot idea" things where for one week it
| seems as though this is a breakthrough with applications
| everywhere and then a week later you realise it was not as
| fundamental and you were just seeing applications everywhere
| due to recency illusion.
|
| The week after I first saw a Bloom Filter I imagined lots of
| places this would surely be great, in the decades since I
| probably have one bona fide use for a Bloom Filter per year,
| maybe less.
| monkeyelite wrote:
| Exactly. The SOA thing makes sense in so few cases -
| usually the one big data type your program operates on.
| jayd16 wrote:
| I suppose something like this would be considered a violation
| of OOO...
|
| That said Java might be able to naturally decompose record
| types into SoA collections without a big lift. Same for C#
| struct.
|
| You might even be able to access views types that still code
| like objects but point to the backing fields in the SoA
| collection.
| mpweiher wrote:
| Yep. Objective-S[1] takes the ideas of "the art of the
| metaobject protocol" and expands on them...using software
| architectural principles [2].
|
| One early result was the idea of _storage combinators_ [3], and
| with those, AoS/SoA pretty much just falls out.
|
| Storage Combinators are basically an interface for data. Any
| kind of data, so generalized a bit from the variations of
| "instances of a class" that you get in CLOS's MOP.
|
| If you code against that interface, it doesn't matter how your
| data is actually stored: objects, dictionaries, computed on-
| demand, environment variables, files, http servers, whatever.
| And the "combinator" part means that these can be
| stacked/combined.
|
| While you can do this using a library, and a lot is in fact
| just implemented in libraries, you need the language support so
| that things like objects/structs/variables can be accessed
| quickly using this mechanism.
|
| SoA storage for tables has been on the list of things to do for
| a long time. I just started to work on some table ideas, so
| might actually get around to it Real Soon Now(tm).
|
| Currently I am working on other aspects of the table
| abstraction, so for example just integrated query, so I ca
| write the following: invoices[{Amount>30}]
|
| And have that query work the same against an array of objects
| (also a kind of table) and a SQL database.
|
| [1] https://objective.st/
|
| [2] https://dl.acm.org/doi/10.1145/3689492.3690052
|
| [3] https://2019.splashcon.org/details/splash-2019-Onward-
| papers...
| sph wrote:
| define_aggregate(^^Pointers, nsdms(^^T)
| | std::views::transform([](std::meta::info member){
| return data_member_spec(add_pointer(type_of(member)),
| {.name = identifier_of(member)}); }));
|
| What operator is ^^type?
| diath wrote:
| https://isocpp.org/files/papers/P2996R4.html#the-reflection-...
| tialaramex wrote:
| back in R4 this operator is spelled ^ but in the accepted
| draft it was spelled ^^
| tialaramex wrote:
| This is the reflection operator. The committee+ spent some time
| bikeshedding different squiggles for this operator, but
| eventually it looks like ^^ won because it was least awful.
|
| + WG21 of SC22 of JTC1 between ISO and the IEC, "the C++
| committee".
|
| See P2996R11 for the latest draft of the work -
| https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p29...
| PessimalDecimal wrote:
| I started thinking this would be a rehashing of how column-
| oriented storage formats can be far more efficient for certain
| access patterns. But it really was a showcase of what a bloated
| mess C++ has become.
| monkeyelite wrote:
| And you just do not need to be doing this. Just write out which
| columns you need and use an index. You can write generic sort
| and remove, and just overload the swap function for your SOA
| thing.
| pragma_x wrote:
| That was my takeaway as well. Pivoting a conventional
| structure in a column-major storage system just smells like a
| look-aside of some kind (a database). Plus, you lose the
| benefits of bulk copy/move that the compiler will likely give
| you in a standard struct (e.g. memcpy()), should it be on the
| big side. From there, we can use lots of other tricks to
| further speed things up: hashes, trees, bloom filters...
|
| At the same time, I don't completely understand how such a
| pivot would result in a speedup for random access. I suppose
| it would speed up sequential access, since the multiple array
| storage scheme might force many more cache line updates.
| thesz wrote:
| These days one's program rarely does purely sequential
| random access. Usually, there are several threads or
| requests, all going into same data structure.
|
| In the extreme case where all memory accesses of a
| immensely massively parallel program are deferred, the
| result is complete absence of cache misses [1].
| [1] https://disneyanimation.com/technology/hyperion/
|
| While our programs usually not ran at Hyperion scale, they
| still can benefit from accesses shared between processing
| of several requests.
|
| For one such example, consider speech recognition with the
| HCLG graph. That graph is created through composition of
| four WFST [2] graphs, the last one is Grammar, derived from
| SRILM n-gram language model. This HCLG graph has scale free
| property [3] due to all G FST states has to backoff to the
| order-0 model, and some states are visited exponentially
| often than others. [2]
| https://en.wikipedia.org/wiki/Finite-
| state_transducer#Weighted_automata [3]
| https://en.wikipedia.org/wiki/Scale-free_network
|
| By sorting HCLG graph states by their fan-ins (number of
| states referring to a particular one), we sped recognition
| process up 5%. E.g., most referred state gets index 0,
| second most referred is 1, etc, and that's it.
|
| No code change, just preprocessing step that, I believe,
| lessen the pressure on the CPU's memory protection
| subsystem.
|
| The speech recognition process with HCLG graph uses beam
| search [4], there are several (9-20) hundredths of states
| in the beam front to be evaluated. Most of them have
| outgoing arc to some of the most visited (fan-in) states.
| By having these states close to each other we incur less
| page faults exception handling during processing.
| [4] https://en.wikipedia.org/wiki/Beam_search
|
| Basically, we shared more MMU information between requests
| in beam front.
|
| PS
|
| Fun fact: WFST created by OpenFST has "structure of arrays"
| layout. ;)
| dexterous8339 wrote:
| I recently went down this rabbit hole myself and came out with
| my own column-major array type.
|
| The explanation drifts into thinking of 2D byte arrays as 3D
| bit matrices, but in the end it was a 20-30x improvement in
| speed and binary size.
|
| I was honestly surprised that C++ doesn't have anything built
| in for this, but at least it's trivial to write your own array
| type
| thechao wrote:
| I showed up to the C++ rodeo right as C++98 was taking off, and
| got to work with Jaakko & Gaby & Bjarne (at TAMU) on what would
| be C++11 (C++0x). Man -- those were great times. I still can't
| decide which craziness -- reflection or coroutines -- I detest
| worse in "modern" C++. In either case, I really feel like I'm
| channeling my inner/outer old-man-shakes-fist-at-clouds. On the
| other hand, I see languages like Nim, Zig, Rust, Swift, Go, ...
| and, I mean, _seriously_.
| mac3n wrote:
| JOVIAL language had TABLE structures, which could be declared
| SERIAL or PARALLEL
| (https://ntrl.ntis.gov/NTRL/dashboard/searchResults/titleDeta...
| page 281).
| gr4vityWall wrote:
| Interesting article. It does show how modern C++ can be quite
| scary.
|
| It reminded me of a Haxe macro used by the Dune: Spice Wars devs
| to transform an AoS into a SoA at compile time to increase
| performance: https://youtu.be/pZcKyqLcjzc?t=941
|
| The end result is quite cool, though those compile time type
| generation macros always look too magical to me. Makes me wonder
| if just getting values using an index wouldn't end up being more
| readable.
| Enthouan wrote:
| I'm definitely missing the point, but reading the article, I kept
| thinking "This would've been so much easier in C."
| trealira wrote:
| I don't see how it would be easier. Whatever you can do in C,
| you can do in C++. That said, I was also a bit confused, since
| I guess I don't keep up with the latest C++ updates enough.
| quotemstr wrote:
| No, it would be impossible in C, the "it" here being the
| _automatic_ conversion of a struct to an array of the struct 's
| fields. Sure, you can do that pretty easily _by hand_ in C, but
| you can also do it easily _by hand_ in C++, so whatever. The
| point of the article is the metaprogramming.
| zahlman wrote:
| I get the impression that ("pure") C programmers say things
| like that because they don't get themselves into situations
| where they feel they'd _want_ the metaprogramming.
| quotemstr wrote:
| They might think that they don't need metaprograming, but
| like everyone else, they eventually do.
|
| That's how we get monstrosities like
| https://github.com/freebsd/freebsd-
| src/blob/master/sys/sys/q... (plenty of Linux equivalents
| --- e.g. the tracepoint stuff)
|
| Better to use a language with thoughtful metaprogramming
| than tack it on badly later.
| gpderetta wrote:
| The use of reflection is interesting, but is there a significant
| advantage, compared to something like this:
| template<template<class> class G> struct Point {
| G<int> x; G<int> y; auto get() { return
| std::tie(x,y); } };
| template<template<template<class> class> class C> struct
| SOA { template<class T> using Id = T;
| template<class T> using Ref = T&; C<std::vector> vs;
| void push_back(C<Id> x) { std::apply([&] (auto&&...
| r) { std::apply([&](auto&&... v){ (
| (r.push_back(v)),...); }, x.get()); }, vs.get());
| } C<Ref> operator[](size_t i) { return
| std::apply([&] (auto&&... r) { return C<Ref>{ r[i]...}; },
| vs.get()); } }; int main() {
| SOA<Point> soa_point; soa_point.push_back({1,2});
| auto [x,y] = soa_point[0]; }
| OskarS wrote:
| > template<template<template<class> class> class C>
|
| Boy howdy!
| the__alchemist wrote:
| I think I've lost the thread on the abstractions. (Me not being
| very familiar with Zig outside of its most basic syntax is
| probably why.) I've been doing a lot of SoA work in rust lately;
| specifically because I have numerical/scientific code that uses
| CPU SIMD and CUDA; SoA works great for these.
|
| The workflow is, I set up Vec3x8, and Quaternionx8, which wrap a
| simd instrinsic for each field (x: f32x8, y: f32x8...) etc.
|
| For the GPU and general flattening, I just pack the args as Vecs,
| then the fn signature contains them like eps: &[f32], sigma:
| &[f32] etc. So, I'm having trouble mapping this SoA approach to
| the abstractions used in the article. Then the (C++-like CUDA)
| kernel sees these as *float3 params etc. It also feels like a
| complexity reverse of the Rust/Zig stereotypes...
|
| Examples: struct Vec3x8 { x: f32x8,
| y: f32x8, z: f32x8 } // appropriate operator
| overloads... struct Setup { eps:
| Vec<f32>, sigma: Vec<f32>, }
|
| So, Structs of Arrays, plainly. Are the abstractions used here
| something like Jai is attempting, where the internal
| implementation is decoupled from the API, so you don't compromise
| on performance vice ergonomics?
| jayd16 wrote:
| Towards the middle you realize this is about the reflection more
| than anything else.
|
| I do like how directly accessing the fields individually (the
| whole reason you would do this) is a hypothetical presented as an
| after thought. Enjoyably absurd.
| perihelions wrote:
| I attempted to write a minimal version of the idea in Common
| Lisp, if anyone was curious about how it'd look like in that
| language,
|
| https://peri.pages.dev/struct-of-arrays-snippet
| Dwedit wrote:
| Struct of Arrays vs Array of Structs have different performance,
| depending on how they are accessed, mainly due to how memory will
| be cached.
|
| If you are iterating over objects sequentially, and looking at
| every field, they will perform about the same. If you are
| iterating over objects sequentially, and only looking at a few
| fields, Struct of Arrays performs better. If you are accessing
| objects at random, and reading every field, Array of Structs
| performs better.
|
| Array of Structs has a multiplication step when you calculate the
| address of an element. Struct of Arrays basically guarantees that
| the multiplication will be by a power of 2, so it's a bitshift
| rather than a multiplication. (Unless your struct contains a type
| whose size is not a power of 2). Multiplication is still very
| fast though on most architectures, so that's not all that much of
| a difference.
| bombela wrote:
| How could it not be a power of 2?
|
| The compiler makes sure to align fields and pad structs to
| power of 2 anyways.
| tomsmeding wrote:
| If your struct contains a field that is itself a struct, e.g.
| one containing three int32_t's, then that field has size 12,
| which is not a power of two.
| compiler-guy wrote:
| Doesn't even need to be a struct within a struct:
|
| struct Foo { int8_t a; int8_t b;
| int8_t c;
|
| };
|
| will have a size of 3 and an alignment of 1. Alignment is
| always power of 2, and size modulus alignment will always
| be zero. But alignment can be less than size, and size need
| not be a power of two, just have one power-of-two factor.
| bombela wrote:
| Brain was not fully loaded yet, thank you.
|
| I don't think the struct in struct is relevant in your
| example by the way. i32 requires 4-bytes alignment. An
| array of a struct{3xi32} will be aligned to 12 bytes. Which
| maintains the 4 bytes alignment of the fields.
|
| If this structure is embedded into another one, it will be
| padded accordingly to the alignment requirement of the
| outer field.
|
| { { 3xi32 } i32 } will be 16 bytes.
|
| { { 3xi32 } i64 } will be 12, padding 4, 8. Totalling 24.
|
| I am assuming x86_64.
| compiler-guy wrote:
| Struct of arrays have the potential to be much more memory
| efficient if the original struct had padding.
|
| struct Foo { int64_t i; int8_t c;
| // 7 bytes of padding for alignment.
|
| };
|
| A Foo array will have seven bytes of wasted space per element.
| That can add up, and can even blow away what would be nice
| caching effects if looking at both elements.
|
| Access patterns do matter a lot to this, but it isn't as easy
| as describing one. Always profile.
|
| [Edit, I wish hn supported markdown.]
| WalterGR wrote:
| Lines that start with 2 spaces are monospaced.
|
| https://news.ycombinator.com/formatdoc
| silisili wrote:
| I haven't benched them, but it feels like in some languages
| bounds checking would hamper SoA performance, no?
| adzm wrote:
| Neat to see the reflection / unibrow operator ^^ in the wild!
| gwenzek wrote:
| Interesting to see the new C++ at work. But also I'm sure it's
| easier to learn Zig full language that just the new C++
| metaprogramming
___________________________________________________________________
(page generated 2025-05-09 23:01 UTC)