[HN Gopher] The Lost Art of C Structure Packing (2014)
___________________________________________________________________
The Lost Art of C Structure Packing (2014)
Author : isaacimagine
Score : 78 points
Date : 2022-04-27 16:50 UTC (6 hours ago)
(HTM) web link (www.catb.org)
(TXT) w3m dump (www.catb.org)
| CalChris wrote:
| Struct packing has its points for small structs. Indeed, you can
| reduce cache use and increase cache locality. However for large
| structs, page aligned structs, the cache lines will be
| constrained to a particular set. Moreover, pointer following from
| struct to struct can incur a TLB miss; the TLB is another small
| cache. So while you may cleverly encode things to squeeze size,
| you may then watch things slow to a crawl.
|
| You are packing small structs in order to squeeze lots of them
| into the caches. However for large structs, you should at least
| consider refactoring them into small structs which you can then
| pack to your heart's content.
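A minimal sketch of the small-struct case, assuming a typical ABI where each member is aligned to its own size. The struct names and members are made up for illustration; only the declaration order differs between the two.

```c
#include <assert.h>
#include <stddef.h>

/* Careless declaration order: each char is followed by padding so the
   next, wider member lands on its natural alignment. */
struct naive {
    char   tag;
    double value;
    char   flag;
    int    count;
};

/* Same members, widest first, so the small ones share one tail. */
struct reordered {
    double value;
    int    count;
    char   tag;
    char   flag;
};

/* Bytes saved per element by reordering (0 if the ABI adds no padding). */
size_t bytes_saved(void) {
    assert(sizeof(struct reordered) <= sizeof(struct naive));
    return sizeof(struct naive) - sizeof(struct reordered);
}
```

On a common x86-64 ABI this should come out as 24 bytes versus 16, so three of the naive structs span more than one 64-byte cache line where four of the reordered ones fit exactly.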
| Comevius wrote:
| Also related: https://media.handmade-seattle.com/practical-data-
| oriented-d...
| dang wrote:
| Related:
|
| _The Lost Art of C Structure Packing_ -
| https://news.ycombinator.com/item?id=15626205 - Nov 2017 (49
| comments)
|
| _The Lost Art of C Structure Packing (2014)_ -
| https://news.ycombinator.com/item?id=12231464 - Aug 2016 (112
| comments)
|
| _The Lost Art of C Structure Packing_ -
| https://news.ycombinator.com/item?id=9517623 - May 2015 (4
| comments)
|
| _The Lost Art of C Structure Packing_ -
| https://news.ycombinator.com/item?id=9069031 - Feb 2015 (113
| comments)
|
| _The Lost Art of C Structure Packing_ -
| https://news.ycombinator.com/item?id=6995568 - Jan 2014 (143
| comments)
| ncmncm wrote:
| It neglects to mention that bit fields have always been the
| buggiest part of C compilers, and there is never a good enough
| reason to rely on them, if you have a choice at all. Honest
| shift-and-mask operations on unsigned machine words are always
| better, if you absolutely must pack bitwise.
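For readers who have not written the shift-and-mask version by hand, here is a sketch of what it can look like on a 32-bit unsigned word. The helper names (`low_mask`, `field_get`, `field_set`) are illustrative, not from any library.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `width` bits set; width must be 1..32. */
uint32_t low_mask(unsigned width) {
    assert(width >= 1 && width <= 32);
    return width == 32 ? UINT32_MAX : ((UINT32_C(1) << width) - 1);
}

/* Read the `width`-bit field that starts at bit `pos` (bit 0 = LSB). */
uint32_t field_get(uint32_t word, unsigned pos, unsigned width) {
    assert(pos < 32 && width <= 32 - pos);
    return (word >> pos) & low_mask(width);
}

/* Return `word` with that field replaced by `value`.
   Bits of `value` that do not fit in the field are dropped. */
uint32_t field_set(uint32_t word, unsigned pos, unsigned width, uint32_t value) {
    assert(pos < 32 && width <= 32 - pos);
    uint32_t mask = low_mask(width) << pos;
    return (word & ~mask) | ((value << pos) & mask);
}
```

Because the position and width are explicit arguments, the bit layout is decided by the programmer rather than by the compiler's bit-field rules.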
| camgunz wrote:
| Super agree. I read up on bit fields [0] a while ago and some
| of the details about them are bonkers:
|
| > Multiple adjacent bit-fields are usually packed together
| (although this behavior is implementation-defined)
|
| > The special unnamed bit-field of size zero can be forced to
| break up padding.
|
| > int b:3; may have the range of values 0..7 or -4..3 in C
|
| > on some platforms, bit-fields are packed left-to-right, on
| others right-to-left
|
| I wouldn't touch them unless I absolutely had to, and knew I
| could guarantee compiler and platform.
|
| [0]: https://en.cppreference.com/w/cpp/language/bit_field
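The signedness point can be probed directly. This is a small sketch with made-up struct names: storing -1 is well defined either way, and reading it back shows which interpretation the compiler chose for a plain `int` bit-field.

```c
#include <assert.h>

struct plain3    { int          b : 3; };  /* signedness is implementation-defined */
struct unsigned3 { unsigned int b : 3; };  /* always holds 0..7 */

/* Store v in an unsigned 3-bit field; the value is reduced modulo 8. */
int unsigned_roundtrip(int v) {
    struct unsigned3 s;
    s.b = (unsigned int)v;
    return (int)s.b;
}

/* Returns 1 if this compiler treats `int b : 3` as signed, 0 otherwise. */
int plain_int_bitfield_is_signed(void) {
    struct plain3 s;
    s.b = -1;            /* reads back as -1 if signed, 7 if unsigned */
    assert(s.b == -1 || s.b == 7);
    return s.b < 0;
}
```

Spelling the type as `unsigned int` or `signed int` removes the ambiguity; only the bare `int` form is left to the implementation.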
| [deleted]
| mjevans wrote:
| Until I read this article that's what I thought C Bitfields
| _were_. I didn't realize the specification was so uselessly
| sloppy about alignment and packing that a programmer couldn't
| reliably address specific bit-field members as exact-width
| fields laid out from lowest bit to highest. It's quite
| annoying that such is not what those are.
| scatters wrote:
| If the specification is lax, it's because the expected
| behavior is different across platforms. C (and C++) has to
| support platforms where bytes are more than 8 bits, where
| floats are non-IEEE, and (until recently) where signed
| integers are not 2's complement. If you want a specific
| behavior and don't care about portability all you have to do
| is read the ABI spec alongside the standard.
| ithinkso wrote:
| Bitfields are used a lot when you have constrained resources,
| in my case in LTE/5G both on modem and bts sides. Every
| struct's field takes as much as it needs to and you leave the rest
| as 'reservedX'. You never know when a new feature will have to
| be implemented and a few more bits will be needed for some new
| field.
|
| Without bitfields the code would be absolutely filled with bit-
| access macros decreasing readability and screwing with IDE's
| indexers and static analyzers big time
|
| Not to mention the pain it would be to refactor/reorder/change
| fields sizes which is relatively painless with bitfields
| zozbot234 wrote:
| The drawback though is that you can't reference individual
| fields by pointer. You basically need to write the
| equivalents of OOP getters and setters to really keep the
| code tidy. The compiler can't plan for such things on its
| own, not across multiple compilation units at any rate.
| chrisseaton wrote:
| > shift-and-mask operations on unsigned machine words are
| always better
|
| ... but that's what a bit field is?
| jcranmer wrote:
| That's not what a bit field is.
|
| A bit field (in C/C++) is a weird object type that can only
| exist in a structure or union type, which kind of acts like
| an underlying regular integral type except for those
| situations where it does not.
|
| For an example of why compilers might have issues compiling
| bit fields properly (although this requires C++, since C's
| ternary operator works on rvalues, not lvalues):
|     struct A { int x: 3; int y: 5; } a;
|     (choice ? a.x : a.y) = val;
|
| Enjoy making that codegen work properly.
| mwint wrote:
| Can someone explain further why this is so hard? I'm not
| familiar enough with any of this to understand without some
| help.
| jcranmer wrote:
| There's a few layers of complexity here.
|
| The first is lvalues. In compiler jargon, an lvalue is a
| kind of object that can have a value stored to it. And
| you can usually represent it as the address of some
| memory location [1]. Of course, bitfields break this
| representation: you need to know what the bit offset and
| bit size of the field you're storing is (as well as the
| signedness).
|
| The next level of complexity is the conditional operator.
| This means that, when conditional operators yield lvalues
| [2], you now end up in a situation where the lvalue now
| has a _conditional_ bit offset and bit size within the
| address. Or maybe one leg of the expression returns a
| bit-field and the other leg returns a regular int lvalue.
| Imagine how complex your datastructure needs to be to
| represent an lvalue during this code generation phase.
|
| [1] Not all lvalues need to have memory locations. But if
| you're writing a C compiler, it's an easy first
| approximation to give every variable, even those marked
| register, some memory location and rely on an
| optimization pass to convert stack memory locations into
| register locations, rather than keeping track of this
| information when the frontend does code generation.
|
| [2] As mentioned elsewhere, conditional operators in C do
| not yield lvalues. But conditional operators in C++ do.
| ncmncm wrote:
| It is just very finicky, with myriad edge cases easy to
| get wrong, and even easier to neglect to have complete
| tests for. Each target CPU design and version has quirks.
| Many involve sign extension.
| jstimpfle wrote:
| Extracting a subrange of bits and shifting them to the
| beginning is not exactly rocket science.
| ncmncm wrote:
| Experience indicates otherwise.
|
| If you _must_ use bit fields, make them unsigned. Bugs
| love to hide under signed bit fields.
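One way to keep the storage unsigned and still get signed values out is to do the sign extension explicitly. This is a sketch, with `sign_extend` as an illustrative name, restricted to widths of 1..31 so that every intermediate value fits in `int32_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low `width` bits of `raw` as a two's-complement number. */
int32_t sign_extend(uint32_t raw, unsigned width) {
    assert(width >= 1 && width <= 31);
    uint32_t sign = UINT32_C(1) << (width - 1);  /* the field's sign bit */
    uint32_t low  = raw & ((sign << 1) - 1);     /* keep only the field's bits */
    /* Flipping the sign bit and subtracting it back maps
       [sign, 2*sign) onto [-sign, 0) and leaves [0, sign) alone. */
    return (int32_t)(low ^ sign) - (int32_t)sign;
}
```

The conversion is then visible in the source and testable on its own, instead of depending on how a given compiler handles signed bit-fields.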
| jstimpfle wrote:
| That makes sense, looking at how signed arithmetic works
| on different architectures it would feel strange to use
| signed bitfields.
|
| Unsigned bitfields are a nice way to get modular
| arithmetic with n bits without syntactic clutter.
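A sketch of the modular-arithmetic idea as the language rules describe it: a value stored to an unsigned bit-field is reduced modulo 2^n. The names `ring_index` and `advance` are made up for illustration.

```c
#include <assert.h>

/* A 3-bit unsigned counter: values wrap modulo 8 on every store. */
struct ring_index { unsigned int i : 3; };

/* Start at `start` (reduced modulo 8) and step forward `steps` times. */
unsigned int advance(unsigned int start, unsigned int steps) {
    struct ring_index r;
    r.i = start;
    while (steps--) {
        r.i++;           /* 7 + 1 is stored back as 0 */
    }
    assert(r.i < 8);
    return r.i;
}
```

This relies on the compiler implementing the standard's rule correctly for the target in question.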
| ncmncm wrote:
| "Unsigned bitfields are a nice ..."
|
| Appear to be. Are, when all the stars align. Are not in
| fact, often enough that you are issued a red warning you
| may ignore if you are insulated from all consequences.
| chrisseaton wrote:
| Is that valid C code? Is a ternary on an L-value an
| L-value? I'm not sure it is - regardless of bitfields or
| not?
|
| https://godbolt.org/z/aP8v5xKaz
| jcranmer wrote:
| I mentioned in the note that it's not legal C, since
| ternaries must yield rvalues in C. It is legal C++,
| however, since there ternaries may be lvalues.
| chrisseaton wrote:
| I feel like you added that after I replied.
| WalterBright wrote:
| The compiler can rewrite it as:
|     choice ? ((a.x = val), a.x) : ((a.y = val), a.y);
| iainmerrick wrote:
| But, you'd have an even harder time making that work with
| mask-and-shift macros!
|
| It also doesn't seem like something that would come up very
| often. I can't think of the last time I conditionally
| stored to one of two struct fields, if I ever have.
|
| The much more normal case would be:
|     val = choice ? a.x : a.y;
|
| That one seems pretty straightforward from a codegen
| perspective.
| ncmncm wrote:
| "Seems" is not the domain under discussion.
| jcranmer wrote:
| The example I gave is an example of something legal with
| bitfields (in C++) that is legitimately challenging to
| implement [1] that leads to bugs in compilers. It's not
| meant to be something that anyone is intended to use--
| indeed, I'd firmly suggest that the standard ought to
| prohibit this kind of usage.
|
| The broader point is that bitfields are actually weird
| little objects that look a lot like regular objects in
| many, but not all, contexts. And it's very easy from a
| language design or implementation perspective to forget
| to account for the possibility that you're dealing with a
| weird little object. This leads to underspecified
| language specifications and compilers that crash if you
| do something weird (but legal) such as virtually inherit
| from a struct containing a bitfield as its last member.
|
| [1] So challenging, in fact, that Clang gives an error
| message "cannot compile this conditional operator yet".
| It does work in g++, icx, and MSVC though.
| iainmerrick wrote:
| That all makes it sound like a C++ problem, not a C
| problem.
| jcranmer wrote:
| You can make a lot of the "fun" of bitfields go away with
| lvalue-to-rvalue conversion, and C tends to do this
| conversion very rapidly so that it's hard to find good
| cases for truly bizarre stuff, whereas C++ makes lvalues
| last a lot longer.
|
| Of course, if you go reach for C's standard "fun with
| lvalue" operations, you can get some crazy nonsense. What
| machine code should you generate here [1]:
| struct A { int x : 5; volatile _Atomic int y: 3; } a;
| a.y++;
|
| I will note that the intersection of volatile and
| bitfields has been another fruitful area of compiler bugs
| [2] historically speaking. While C++ does provide better
| what-the-ever-living-fuck moments for bitfields, C has
| had its fair share of issues with bitfields.
|
| [1] Whether or not you can make a bitfield _Atomic in C
| is implementation-defined, so it's possible that someone
| writes a C implementation where this is legal. I will
| note that, in a rare display of sanity, all C compilers I
| can test do in fact sensibly reject _Atomic bitfields,
| but for the purposes of argument, assume that someone has
| one where it's permitted, since it is allowable by the
| standard.
|
| [2] Or programmer bugs blamed on the compiler. This is
| the intersection of two areas that are notorious for
| underspecification to begin with, and combined with the
| general tendency of programmers to expect C compilers to
| be a thin veneer over assembly, makes it awfully
| difficult to figure out which behavior is language-
| intended.
| dfox wrote:
| I vaguely recall that gcc supports this even in C mode.
|
| The underlying problem has to do with whether the IR has
| first-class concept of arbitrary lvalue or whether the
| frontend has to convert lvalues that get passed around to
| some pointer-like thing.
|
| It might look irrelevant for discussion of low-level AOT
| compilers, but it is also interesting to compare how this
| is implemented in dynamic/"scripting" runtimes and how
| the choice of underlying implementation of the concept of
| "lvalue"/"place" influences the user visible language.
| Somewhat notably first draft of Common Lisp had something
| akin to first-class lvalues and the final standard
| replaced all that with significantly simpler mechanism
| that purely relies on macros.
| jmwilson wrote:
| With shifts and masks you know where the bits are. With
| bitfields, you don't because the specification leaves
| everything up to the compiler.
|
|     struct foo { char a : 4; char b : 4; };
|
| Is a in the high-order 4 bits, or the lower 4 bits? Both
| choices are allowed, so it's up to the compiler and makes the
| code non-portable.
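The question can be answered empirically for one compiler and target. This sketch uses `unsigned int` rather than `char` for the fields, since `char` bit-fields are themselves an implementation-defined extension, and assumes the two fields share one storage unit no larger than an `unsigned int`.

```c
#include <assert.h>
#include <string.h>

struct foo {
    unsigned int a : 4;
    unsigned int b : 4;
};

/* Set only `a` to 1 and return the struct's raw bits as an unsigned int.
   Exactly one bit is set; which one shows where the compiler put `a`. */
unsigned int probe_a(void) {
    struct foo f;
    unsigned int raw = 0;
    assert(sizeof f <= sizeof raw);
    memset(&f, 0, sizeof f);
    f.a = 1;
    memcpy(&raw, &f, sizeof f);
    return raw;
}
```

A result of 0x1 means `a` occupies the least significant bits of its storage unit; a high-order bit means the opposite allocation order. Either answer holds only for the compiler and ABI that produced it.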
| chrisseaton wrote:
| Surely there's an ABI? Otherwise how does this work at all?
| pavlov wrote:
| Never ever use bitfields in structs that may cross
| library boundaries. There are some corners of C that are
| not fit for public APIs.
| masklinn wrote:
| > Otherwise how does this work at all?
|
| Hopes, prayers, and a single version of a single compiler
| being involved.
| [deleted]
| iainmerrick wrote:
| _With shifts and masks you know where the bits are._
|
| You know where the bits are _within a single word_. But if
| you have a struct with multiple fields, it's not safe to
| rely on the exact memory layout even if it doesn't have any
| bitfields.
|
| If you need to represent a very specific memory layout,
| it's not just bitfields you need to avoid, it's structs in
| general.
|
| Conversely, if you don't need to guarantee a specific
| layout, bitfields are fine to use, and could be a useful
| optimisation hint for the compiler.
| ncmncm wrote:
| In other words, you don't understand.
| iainmerrick wrote:
| Here's an example where I think bitfields are totally
| appropriate:
|
| Say I have a window manager, and I want to attach a bunch
| of boolean flags to each window object (isVisible,
| isMaximized, etc). I don't need to serialize them to
| disk. It's highly preferable that they should be
| efficiently bit-packed, but not strictly essential.
|
| The conservative way to implement that would be bit-
| shifts and masking (either manually or via a macro). But
| implementing it with bitfields would be a lot easier and
| less error-prone, and would work just as well. What
| problems do you see with the bitfield approach?
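For comparison, here is a sketch of both approaches to the window-flag case. The struct layout, flag names and helper functions are hypothetical; the flags never leave the process, so the exact bit positions do not matter.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit-field version: the compiler picks the bit positions. */
struct window {
    int x, y, width, height;
    unsigned int is_visible   : 1;
    unsigned int is_maximized : 1;
    unsigned int has_focus    : 1;
};

/* Mask version: the programmer picks the bit positions. */
enum { WIN_VISIBLE = 1 << 0, WIN_MAXIMIZED = 1 << 1, WIN_FOCUSED = 1 << 2 };
struct window_masked {
    int x, y, width, height;
    unsigned int flags;
};

void maximize(struct window *w) {
    assert(w);
    w->is_maximized = 1;
    w->has_focus = 1;
}

void maximize_masked(struct window_masked *w) {
    assert(w);
    w->flags |= WIN_MAXIMIZED | WIN_FOCUSED;
}

bool is_maximized_masked(const struct window_masked *w) {
    assert(w);
    return (w->flags & WIN_MAXIMIZED) != 0;
}
```

The mask version allows several flags to be set or tested in one expression, while the bit-field version reads like ordinary member access.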
| dmitrygr wrote:
| sometimes you do not care, and
|     x = foo.a
|
| is simpler than
|     x = (foo & FOO_MASK_A) >> FOO_SHIFT_A
|
| and for assignments, the difference is even bigger:
|     foo.a = x
|
| is much better than
|     foo = (foo & ~FOO_MASK_A) | ((x << FOO_SHIFT_A) & FOO_MASK_A)
| zozbot234 wrote:
| If "foo" is defined as part of an API/ABI that's used in
| multiple compile units you will always care, since
| otherwise a random change in "implementation defined"
| bitfield encodings on some obscure architecture might
| break your build. Bitfields are a misfeature in most
| real-world cases.
| InitialLastName wrote:
| The case where a) you don't care about the in-memory
| representation of your struct and b) you care a lot about
| being able to pack into the absolute minimum memory
| space, but not enough to make sure the compiler actually
| packs the fields (depending on architecture and
| optimization settings, they might not!) is vanishingly
| small.
|
| The more frequent perceived use for bit-fields (in the
| situation where they actually work) is to pack into a
| serialized data format, such that memory or a data stream
| can be accessed elsewhere. In that case, "the compiler
| can do whatever it wants with your data packing" is
| pretty useless, since your "elsewhere" might have a
| different compiler that does a totally different thing.
| dmitrygr wrote:
| > is vanishingly small.
|
| Ladies and gentlemen, this thought is why we now consider
| 8GB of ram to be a "weak device".
|
| No, no no no no, 1000 times no. Every situation is a low
| ram situation. Every!
| InitialLastName wrote:
| What I'm saying is that the case where you want to use
| less RAM for a bit field but you don't actually care if
| the compiler _allocates less than an addressable line of
| RAM for that bit field_ (because it actually just might
| not) is pretty empty.
|
| Edit: I know it's hard to read a whole sentence at once,
| but I made that same point directly up there too.
| Findecanor wrote:
| While the C and C++ _language_ specs don't specify the
| layout of bitfields, modern platforms tend to have a
| specified _ABI_ which compilers follow when compiling for
| that platform.
|
| 64-bit Linux distros and the BSDs follow the convention
| once set by the "C ABI for Itanium".
|
| In that, bitfields are grouped in declaration order into
| container words of the same width as the bitfield's type
| (char, int, etc.). Bitfields don't span multiple container
| words, and container words don't overlap. On little-endian
| platforms, bitfields are packed LSB first, but on big-
| endian platforms they are packed MSB first within their
| container word. Alignment rules apply only to the container
| words.
| ncmncm wrote:
| That is all very fine.
|
| If the instructions emitted and the instructions
| implemented both happen to match that, on every chip your
| code must run on, you got lucky.
| [deleted]
| dfox wrote:
| The point is that if you care about the resulting in-
| memory layout then you by definition know on what
| platform the code will run and what is the ABI.
|
| If you want to produce same sequence of bytes regardless
| of underlying platform, then you have to do it by hand
| with uint8_t[] buffers and explicit shifts and masks.
| Casting pointer to struct to char* and writing it
| somewhere is inherently non-portable and this has nothing
| to do with bitfields and nothing to do with things like
| __attribute__((packed)), although both of these things
| are useful when you want to do that and understand the
| (non-)portability implications.
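A sketch of the by-hand approach, writing a record into a `uint8_t` buffer in little-endian order. The `reading` record and its 7-byte wire format are invented for illustration; the in-memory struct may have any padding the compiler likes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory record; its layout is never written out directly. */
struct reading {
    uint16_t sensor_id;
    uint32_t timestamp;
    uint8_t  flags;
};

enum { READING_WIRE_SIZE = 7 };  /* 2 + 4 + 1 bytes, no padding */

/* Encode field by field, least significant byte first. */
void reading_encode(const struct reading *r, uint8_t out[READING_WIRE_SIZE]) {
    assert(r && out);
    out[0] = (uint8_t)(r->sensor_id & 0xFF);
    out[1] = (uint8_t)(r->sensor_id >> 8);
    out[2] = (uint8_t)(r->timestamp & 0xFF);
    out[3] = (uint8_t)((r->timestamp >> 8) & 0xFF);
    out[4] = (uint8_t)((r->timestamp >> 16) & 0xFF);
    out[5] = (uint8_t)((r->timestamp >> 24) & 0xFF);
    out[6] = r->flags;
}

/* Decode the same byte sequence on any platform. */
void reading_decode(const uint8_t in[READING_WIRE_SIZE], struct reading *r) {
    assert(in && r);
    r->sensor_id = (uint16_t)(in[0] | (in[1] << 8));
    r->timestamp = (uint32_t)in[2] | ((uint32_t)in[3] << 8) |
                   ((uint32_t)in[4] << 16) | ((uint32_t)in[5] << 24);
    r->flags = in[6];
}
```

The byte sequence depends only on the shifts and masks, not on the host's endianness, struct padding or bit-field rules.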
| ncmncm wrote:
| Physically, yes. The difference is whether you let the
| compiler generate and hide the shift-and-mask ops, or code
| them by hand. _Normally_ it is better to leave details to the
| compiler. This is the exception to that rule.
|
| A result of people avoiding declaring bit fields in serious
| use cases has been that compiler vendors didn't worry too
| much about bitfield codegen bugs.
|
| _Probably_ Gcc and Clang are OK on x86, by now. But that
| does not carry to, e.g., obscure microcontrollers. Heaven
| help you if your bit field members are supposed to correspond
| to hardware register sub-fields.
| iainmerrick wrote:
| The same applies to structs in general, not just bitfields.
| ncmncm wrote:
| Common experience is that compilers, ABIs, and
| instruction implementations get _ordinary_ struct fields
| right.
| layer8 wrote:
| Bit fields are a specific C language feature, allowing you to
| treat bit slices as a unit whose length doesn't need to
| correspond to one of the integer types. See for example
| https://docs.microsoft.com/en-us/cpp/c-language/c-bit-
| fields....
| chrisseaton wrote:
| Yeah they compile to the same machine code operations
| though. If those machine code operations aren't right as a
| bitfield then they aren't going to be right done manually
| either.
|
| https://godbolt.org/z/csvTx89EG
| layer8 wrote:
| Bit field support being buggy exactly means that they
| don't compile to the same machine code as the bit
| shifting/masking code you would write by hand (if your
| hand-written code is correct).
| alcover wrote:
| Is it not rather
|     void bar(struct y *s, unsigned int foo) {
|         s->c = (s->c & 0xf0) | foo;
|     }
| tom_ wrote:
| ARM is little-endian, and by tradition bitfield bit
| indexes are assigned from least significant (bit 0 in ARM
| terms) to more significant. b occupies bits 4-7
| inclusive.
| minipci1321 wrote:
| > Yeah they compile to the same machine code operations
| though.
|
| Not always. Switch your example to AARCH64 and check out
| the BFI instruction.
| bsder wrote:
| Quite true. C bitpacking is lousy.
|
| The best "bitpacking" I have ever dealt with is the "Erlang Bit
| Syntax". I really wish more languages would adopt it.
|
| See:
| https://www.erlang.org/doc/programming_examples/bit_syntax.h...
| WalterBright wrote:
| > bit fields have always been the buggiest part of C compilers
|
| Not in my experience. The buggiest part was the preprocessor.
| You don't hear much about preprocessor bugs anymore because the
| C standard doesn't dare change it, and in 40 years people have
| finally got them working right :-/
|
| Personally, I had to scrap and rewrite the C preprocessor 3
| times to get it right.
| AdamH12113 wrote:
| Bitfields make for much more readable code when accessing
| individual fields of hardware registers, although there are
| some caveats if the registers are poorly-designed. The main one
| is that bitfield writes are usually read-modify-writes, so if
| reading the register or writing back its current value causes
| something to happen, bitfields are a no-go. But when they work,
| you get code like:
|
|     old_divider = SpiRegs.CONFIG_REG.bit.CLK_DIVIDER;
|     SpiRegs.CONFIG_REG.bit.CLK_DIVIDER = new_divider;
|
| instead of:
|
|     old_divider = (SpiRegs.CONFIG_REG & SPI_CONFIG_CLK_DIVIDER_MASK)
|                   >> SPI_CONFIG_CLK_DIVIDER_POS;
|     SpiRegs.CONFIG_REG = (SpiRegs.CONFIG_REG & ~SPI_CONFIG_CLK_DIVIDER_MASK)
|                          | (new_divider << SPI_CONFIG_CLK_DIVIDER_POS);
|
| or the slightly nicer but even longer:
|
|     config = SpiRegs.CONFIG_REG;
|     old_divider = (config & SPI_CONFIG_CLK_DIVIDER_MASK)
|                   >> SPI_CONFIG_CLK_DIVIDER_POS;
|     config &= ~SPI_CONFIG_CLK_DIVIDER_MASK;
|     config |= new_divider << SPI_CONFIG_CLK_DIVIDER_POS;
|     SpiRegs.CONFIG_REG = config;
|
| For anything other than hardware registers, I agree that
| they're not portable enough to rely on.
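The read-modify-write caveat is easier to see when the read and the write are spelled out. This sketch wraps the explicit version in two helpers; the field position, mask and function names are assumptions for illustration, not taken from any real SPI peripheral.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field: a 4-bit clock divider at bits 4..7 of a config register. */
#define CLK_DIVIDER_POS   4u
#define CLK_DIVIDER_MASK  (UINT32_C(0xF) << CLK_DIVIDER_POS)

/* One volatile read, then extract the field. */
uint32_t clk_divider_read(const volatile uint32_t *reg) {
    return (*reg & CLK_DIVIDER_MASK) >> CLK_DIVIDER_POS;
}

/* Exactly one volatile read and one volatile write of the whole register. */
void clk_divider_write(volatile uint32_t *reg, uint32_t divider) {
    assert(divider <= (CLK_DIVIDER_MASK >> CLK_DIVIDER_POS));
    uint32_t v = *reg;                                   /* the read */
    v = (v & ~CLK_DIVIDER_MASK) | (divider << CLK_DIVIDER_POS);
    *reg = v;                                            /* the write */
}
```

On real hardware `reg` would point at a memory-mapped address; an ordinary `volatile uint32_t` variable is enough to exercise the logic on a host machine.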
| dataflow wrote:
| It feels weird to see arguments like this when you could just
| use a language (C++ being the elephant in the room here) that
| lets you define methods, then call those methods instead.
| nomel wrote:
| A method that does that will often (depending on the
| architecture) have much more overhead than a struct lookup.
| If you're doing hardware stuff, you often care about
| performance.
| dataflow wrote:
| Inlining?
| foldr wrote:
| You can equally define helper functions to update registers
| in C.
| dataflow wrote:
| Definitely, but it's more ergonomically annoying with IDE
| stuff others complained about. [1]
|
| [1] https://news.ycombinator.com/item?id=31185723
| LAC-Tech wrote:
| I remember being surprised that setting an individual bit on
| an AVR hardware register in assembly was so much shorter than
| doing all that C bit masking stuff.
| pavon wrote:
| Helper functions or macros are just as clean as the bitfield
| syntax. That said, hardware register access is one of those
| things that is intrinsically tied to a specific platform (and
| if you target multiple platforms, it will already be behind
| an abstraction layer), so you can usually know the quirks of
| how the toolchain for that platform supports bitfields, and
| use them accordingly. Still more work for people reading the
| code, though, since there are a lot more hidden assumptions
| behind that deceptively simple "=" than with an explicit mask
| and shift.
| AdamH12113 wrote:
| >Helper functions or macros are just as clean as the
| bitfield syntax.
|
| They can be, if done right, but then I have to remember the
| names of all the helper functions and macros. :-) An IDE
| can auto-complete bitfield names.
|
| >Still more work for people reading the code, though, since
| there are a lot more hidden assumptions behind that
| deceptively simple "=" than with an explicit mask and
| shift.
|
| Depends on the platform. IIRC on ARM a bitfield access is
| masking and shifting, only done by the compiler instead of
| me. With optimized code I often have to look at the
| disassembly anyway if I want to know what's really going
| on.
| ncmncm wrote:
| Readable code that does not necessarily execute what it says
| does nobody any favors.
| nomel wrote:
| This is why we test our code, or build in runtime checks,
| before releasing it.
| dmitrygr wrote:
| > Honest shift-and-mask operations on unsigned machine words
| are always better, if you absolutely must pack bitwise
|
| Now you go ahead and teach GCC to use the arm UBFX instruction
| for those cases. It _DOES_ use it for actual bitfields. shift +
| mask = 2-3 instructions (immediate load may be needed). UBFX is
| one.
| ncmncm wrote:
| The more complicated the instruction is, the less likely it
| was implemented to spec on all the various products and mask
| steppings you might execute on ... and the less likely its
| published definition exactly matches C or C++ Standard and
| platform C ABI specs. And, the less likely that ABI spec
| nails down all the details.
|
| Compiler implementors don't like to guess, but don't get a
| choice. If the instruction provided doesn't match the
| Standard, which do they implement? Both choices are wrong.
| [deleted]
| [deleted]
| mistrial9 wrote:
| bad compilers make bad days.. custom hardware used to (?) use
| memory locations to control/enable features.. anything from
| electronic access paths to actual servo-motors firing. Probably a
| better idea to use human-readable constructs and avoid this
| compact and tiny use pattern, IMO. If you want a tricky test for
| yourself, perhaps some actual hardware design is a better use of
| time these days?
| loup-vaillant wrote:
| I once encountered a structure that was packed, even though it
| shouldn't have been. Took me over a day to notice where the error
| came from. I was poking at the internals of a library so I could
| gather information that it had, but did not provide. There was
| this context structure I normally only could access through a
| pointer, but copying the definition of the structure into my own
| code ought to do the trick...
|
| ...except it didn't.
|
| The way the library was compiled by default made the structure
| there _smaller_ than my copy. Took me some time to guess why my
| data was all garbled, but the cause was pretty simple: there was
| no padding, even if it meant some members ended up unaligned. I
| had to replace the unaligned members by char arrays to get it to
| work (I did not dare explore the compilation options of the
| library).
|
| And then I found a totally different solution for my problem. Oh
| well.
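A sketch of how this kind of mismatch arises, assuming GCC or Clang (the `packed` attribute is a GNU extension) and a made-up context struct. The same member list gives two different layouts depending on whether packing is in effect, and reading through byte offsets is one way around it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout as the library was built: no padding at all. */
struct __attribute__((packed)) ctx_packed {
    uint8_t  state;
    uint32_t counter;   /* lands at offset 1, unaligned */
    uint8_t  mode;
};

/* The same declaration copied into code compiled without packing. */
struct ctx_padded {
    uint8_t  state;
    uint32_t counter;   /* usually pushed to offset 4 */
    uint8_t  mode;
};

/* Read `counter` from a library-owned context without trusting our own
   struct layout: copy the bytes from the packed offset instead. */
uint32_t read_counter_from_library(const void *lib_ctx) {
    assert(lib_ctx != NULL);
    uint32_t v;
    memcpy(&v, (const unsigned char *)lib_ctx
                   + offsetof(struct ctx_packed, counter), sizeof v);
    return v;
}
```

Using `memcpy` also avoids dereferencing an unaligned `uint32_t` pointer, which some targets do not tolerate.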
| digikata wrote:
| Is what the article says re: the pahole utility still correct
| (that it's not maintained)? Looks like it might be maintained now
| w/ kernel git under the dwarves area.
|
| Pahole is a decent utility to look at what the packing of a
| structure actually ended up after everything has had its last
| effect in the compile chain.
| hnur wrote:
| "a technique for reducing the memory footprint of programs in
| compiled languages with C-like structures"
|
| I figure that's not the primary reason for structure packing, but
| rather for fine-grained control over writing to very specific
| memory layouts (think global descriptor table) as structs.
| UncleEntity wrote:
| I know in blender there is a compile time check to ensure the
| structs are properly packed and it has nothing to do with specific
| memory layouts.
|
| I _think_ it has to do with reading/writing them to disk but
| honestly never cared enough to ask anyone. Did make things
| convenient sometimes when you could 'steal' a padding value and
| magically got backwards compatibility because the older versions
| just ignored that field (and when reading an older file just
| set it to a sane default).
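A compile-time check of this kind can be sketched with C11 `_Static_assert`. This is not Blender's actual mechanism, which the comment does not describe; the struct and its members are hypothetical, and the check assumes a 4-byte, 4-byte-aligned `float`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk record with its padding made explicit. */
struct mesh_header {
    int32_t vertex_count;
    int16_t flags;
    int8_t  kind;
    int8_t  _pad;       /* explicit padding, free for a future field */
    float   scale;
};

/* Fail the build if the compiler inserted any hidden padding. */
_Static_assert(sizeof(struct mesh_header) ==
                   sizeof(int32_t) + sizeof(int16_t) + 2 * sizeof(int8_t) +
                   sizeof(float),
               "mesh_header contains implicit padding");
_Static_assert(offsetof(struct mesh_header, scale) == 8,
               "scale is not where the file format expects it");

size_t mesh_header_size(void) {
    return sizeof(struct mesh_header);
}
```

A later version can rename `_pad` to a real field without changing the record size, which is the "stolen padding" trick described above.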
| waynecochran wrote:
| I found it interesting that Pascal had a "packed" keyword and C
| didn't (outside of implementation specific attributes like
| __attribute__ ((packed)) in GNU).
| layer8 wrote:
| The reason is that Pascal was used on computers with long
| machine words (e.g. 36 bits) where memory wasn't byte
| addressable. It was customary (in assembly code) to "pack"
| multiple logical fields into a single word, in particular
| multiple characters of a text string. The "packed" feature in
| Pascal was added for that purpose.
| dragontamer wrote:
| I feel like "struct of arrays" style coding has really taken off
| in the past decade, and seems to be the best way to maximize
| memory operations these days.
|
| It's not so much that "structure packing" is dead, as much as a
| wide variety of techniques have been developed above-and-beyond
| just simply structure packing. There are many ways to skin a cat
| these days, and packing your structures more intelligently is
| just one possible data optimization.
| kolbusa wrote:
| It really depends on the domain. HPC is more frequently SOA
| (think CSR sparse matrices), while AOS may make more sense in
| other cases.
| layer8 wrote:
| That entirely depends on the access patterns. SOA makes sense
| when you don't often access the different fields of the same
| object (array index) at the same time. If you do, on the other
| hand, then AOS is more efficient.
| TillE wrote:
| Right. There's a little too much cargo-culting in the "struct
| of arrays" pattern, you really want to understand why it
| works or doesn't.
|
| If you have some giant bloated struct and you only care about
| one or two fields at a time, that's one thing. But if you
| have a well-aligned, correctly packed struct and you're
| processing all its data, it's total nonsense to break that
| up.
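The two layouts under discussion, as a minimal sketch with a made-up particle record and a fixed capacity. Summing one field touches every fourth float in the AoS layout and a contiguous run of floats in the SoA layout.

```c
#include <assert.h>
#include <stddef.h>

enum { PARTICLE_CAP = 4 };

/* Array of structs: one element holds all fields of one particle. */
struct particle { float x, y, z, mass; };

/* Struct of arrays: one array per field, indexed by particle. */
struct particles {
    float x[PARTICLE_CAP], y[PARTICLE_CAP], z[PARTICLE_CAP];
    float mass[PARTICLE_CAP];
};

/* Reads one float out of every 16-byte element. */
float total_mass_aos(const struct particle *p, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) sum += p[i].mass;
    return sum;
}

/* Reads n consecutive floats. */
float total_mass_soa(const struct particles *p, size_t n) {
    assert(n <= PARTICLE_CAP);
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) sum += p->mass[i];
    return sum;
}
```

A loop that needs `x`, `y`, `z` and `mass` of the same particle together reverses the picture: the AoS element delivers all four in one contiguous read.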
| dragontamer wrote:
| I certainly think SOA has been cargo-culted to all hell and
| back.
|
| But empirically speaking, it seems like SOA / AOS is the
| easiest "beginner topic" to get high-performance
| programmers thinking about memory-layout issues.
|
| Maybe in the 90s or 00s, it was more popular to think about
| struct layouts, alignment issues and the like. But today,
| SOA is popular because RAM has gotten less... random... and
| more sequential.
|
| I think it's the changing nature of 90s era computers (RAM
| behaving more random-accessy) vs the nature of 10s era
| computers (RAM behaving more sequential-accessy)
|
| --------
|
| It's not like the 90s techniques don't work anymore. But the
| 10s technique of "structure of arrays" and then iterating
| for-loops over your data works better with prefetchers,
| multiple-cache hierarchies, and other tidbits that have
| made RAM more sequential than ever before.
|
| Hopefully programmers continue to study the techniques and
| understand what is going on under-the-hood, instead of
| cargo-culting the pattern. Alas, we all know that cargo-
| culting works in the short term and is easier to do than
| actually learning the underlying machine!
| layer8 wrote:
| It's similar to row vs. column oriented databases.
| monocasa wrote:
| Wasn't that one of the cool things about the language Jai?
| Struct definitions could be cleanly inverted between AoS and
| SoA at use time?
| pclmulqdq wrote:
| "Struct of arrays" becoming popular may also have something to
| do with few people understanding structure packing. AOS has
| much better performance if you pack your structs well than if
| you pack them naively.
___________________________________________________________________
(page generated 2022-04-27 23:01 UTC)