hngopher.com

       [HN Gopher] Writing memory efficient C structs
       ___________________________________________________________________
        
       Writing memory efficient C structs
        
       Author : aragonite
       Score  : 119 points
       Date   : 2025-07-30 13:32 UTC (9 hours ago)
        
 (HTM) web link (tomscheers.github.io)
 (TXT) w3m dump (tomscheers.github.io)
        
       | sim7c00 wrote:
       | definitely interesting points for C structs. regarding CPU it
       | might be interesting to look a little further still regarding
       | optimal sizes for structs regarding cachcing, figuring out hot
       | and cold areas and either including padding or putting cold
       | values behind pointer to a substruct.
       | 
       | like the examples though, very clear what the changes are and the
       | specific impact. always a good reminder to review for these
       | things after a session hacking in new things!
        
       | uticus wrote:
       | for another good intro to the fun with structs, Beej's guide is
       | always a good goto:
       | 
       | https://beej.us/guide/bgc/html/split-wide/structs-ii-more-fu...
       | 
       | ...goes into offsets, padding bytes, bit-field, unions, etc etc.
        
         | degenoah wrote:
         | I feel like this page would have definitely helped me when I
         | used to try reverse engineering things I had lying around on my
         | computer.
        
       | warmwaffles wrote:
       | This is also applicable outside of C as well.
        
         | mcraiha wrote:
         | "The x_position and y_position can unfortunately not get a
         | smaller type nor can a float be unsigned." If you have latest
         | C++, you can use float16_t or bfloat16_t to get smaller type
         | https://en.cppreference.com/w/cpp/types/floating-point.html
        
           | david2ndaccount wrote:
           | C23 has _Float16
        
       | rwmj wrote:
       | No mention of pahole / dwarves? https://lwn.net/Articles/335942/
       | It's the standard tool used by the Linux kernel developers.
        
         | uticus wrote:
         | for somebody starting out, where do you go for a list of tools
         | like this?
        
         | petergeoghegan wrote:
         | Recent versions of clangd show pahole-like struct
         | size/alignment padding overhead annotations. I find this built-
         | in support a lot more convenient than using pahole.
        
         | ryao wrote:
         | pahole is an awesome tool.
        
       | _rend wrote:
       | For completeness, this description of alignment is misleading:
       | 
       | > Well, dear reader, this padding is added because the CPU needs
       | memory to be aligned in sets of 4 bytes because it's optimized in
       | that fashion.
       | 
       | > ...
       | 
       | > Remember: since structs are aligned to 4 bytes, any padding is
       | therefore unnecessary if the size of the struct is a multiple of
       | 4 without the padding.
       | 
       | Individual data types have their own alignment (e.g.,
       | `bool`/`char` may be 1, `short` may be 2, `int` may be 4, `long`
       | may be 8, etc.), and the alignment of a compound type (like a
       | struct) defaults to the maximum alignment of its constituent
       | types.
       | 
       | In this article, `struct Monster` has an alignment of 4 because
       | `int` and `float` have an alignment of 4 for the author's
       | configuration. Expanding one of the `int`s to a `long` could
       | increase the alignment to 8 on some CPUs, and removing the `int`
       | and `float` fields would decrease the alignment to 1 for most
       | CPUs.
        
         | sumtechguy wrote:
         | Also keep in mind that is also all very CPU and compiler
         | specific. Had one compiler where it packed everything at 4/8,
         | usually 8. Not the 1/2/4/8 you would expect. That was because
         | the CPU would just seg fault if you didnt play nice with the
         | data access. The compiler hid a lot of it if you set the
         | packing with offsets and mem moves and shifting. It was clever
         | but slow. So they by default picked a wide enough packing that
         | removed the extra instructions at the cost of using more
         | memory. x86 was by far the most forgiving while at the time I
         | was doing it. ARM was the least forgiving (at least on the
         | platform I was using). With MIPS being OK in some cases but not
         | others.
        
           | gavinsyancey wrote:
           | Um ... isn't alignment generally dictated by the platform ABI
           | so that programs compiled by different compilers can be
           | linked together?
        
             | plorkyeran wrote:
             | The widely used platforms with multiple compilers generally
             | have one or more written down ABIs that the compilers all
             | follow, but more niche platforms frequently have exactly
             | one compiler (often a very out of date fork of gcc) that
             | just does whatever they felt like implementing and may not
             | even support linking together things built by different
             | versions of that one compiler.
        
       | david2ndaccount wrote:
       | The author relies on the compiler fitting the bitfield in unused
       | padding bytes after "speed", but that is implementation-defined
       | behavior (almost everything about bitfields is implementation
       | defined). MSVC will align the unsigned bitfield to
       | alignof(unsigned), whereas GCC and clang will use the padding
       | bytes. To portably pack the struct, use unsigned integers and use
       | flags (monster.flags & CAN_FLY instead of monster.can_fly).
       | 
       | See https://c.godbolt.org/z/K7oY77KGj
        
         | arjvik wrote:
         | Why can't there be a standard defined for bitfields in future C
         | releases? This is a long-discussed drawback of a feature that I
         | really really want to be able to use in production code.
        
           | ajross wrote:
           | There is, but it's part of the platform ABI and not the C
           | language standard. The latter specifies syntax and behavior,
           | the former is what's concerned with interoperability details
           | like memory layout.
           | 
           | I happen to have a copy of AAPCS64 in front of me, and you
           | can find the specification of bit packing in section 10.1.8.
           | The sysv ABI for x86/x86_64 has its own wording (but I'm
           | pretty sure is compatible). MSVC does something else I
           | believe, etc...
        
           | gizmo686 wrote:
           | There is. I'm not sure when it was added, but looking at
           | https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf,
           | page 106, note 13
           | 
           | > An implementation may allocate any addressable storage unit
           | large enough to hold a bit-field. If enough space remains, a
           | bit-field that immediately follows another bit-field in a
           | structure shall be packed into adjacent bits of the same
           | unit. If insufficient space remains, whether a bit-field that
           | does not fit is put into the next unit or overlaps adjacent
           | units is implementation-defined. The order of allocation of
           | bit-fields within a unit (high-order to low-order or low-
           | order to high-order) is implementation-defined. The alignment
           | of the addressable storage unit is unspecified.
           | 
           | In practice, I've been using this feature for ages (along
           | with __attribute__((packed))). There comes a point where you
           | can start depending on compiler specific features instead of
           | relying solely on the standard.
        
             | nomel wrote:
             | Can this be standardized? It seems it would have
             | accommodate the most limited architecture, in terms of
             | memory access, probably making it practically useless.
             | Seems best left to the compiler.
        
               | wahern wrote:
               | The quoted text is from the C standard.
        
             | wahern wrote:
             | The portion you quoted derives word-for-word from the
             | original ANSI standard (C89/C90):
             | https://port70.net/~nsz/c/c89/c89-draft.html#3.5.2.1
        
           | petermclean wrote:
           | I've been working on an open source library that, while it
           | doesn't solve bitfields, can provide convenient ergonomic
           | access to bit-granular types:
           | 
           | https://github.com/PeterCDMcLean/BitLib
           | 
           | You can inherit from bit_array and create a pseudo bitfield:
           | class mybits_t : bit::bit_array<17> {         public:
           | // slice a 1 bit sized field           // returns a proxy
           | reference to the single bit           auto field1() {
           | return (*this)(0, 1);           }                // slice a 3
           | bit sized field            // returns a proxy reference to
           | the 3 bits           auto field2() {             return
           | (*this)(1, 4);           }        };       mybits_t
           | mybits(7);       assert(mybits.field1() == 1);
           | mybits.field1() = 0;       assert(static_cast<int>(mybits) ==
           | 6); //no implicit cast to integers, but explicit supported
           | 
           | There is minimal overhead depending on how aggressively the
           | compiler inlines or optimizes*
        
       | bullen wrote:
       | Adding to this you want the struct to be exactly 64 bytes, even
       | on Apple MX processors which have 128 bytes of cache.
        
       | jakey_bakey wrote:
       | Loved this, thanks!
        
       | signa11 wrote:
       | and i raise you this: https://www.catb.org/esr/structure-packing/
        
         | rzzzt wrote:
         | Hmm, the certificate for this site has a billion SANs but catb
         | .org is not one of them...
        
       | variadix wrote:
       | Some other things you can do to improve packing are using integer
       | indices/offsets/relative pointers instead of absolute pointers
       | (especially on 64-bit machines), overlapping fields for unions
       | (as an example you may have a tag that can share some bits with
       | another field in each member of the union), using a structs
       | position in memory to encode information (for example, if you
       | have a binary tree of array allocated nodes, you can specify that
       | the right child always follows the parent in memory, then you
       | only need to store the index of the left child), and of course
       | structure packing pragmas/attributes.
        
       | westurner wrote:
       | Is it possible to apply these optimizations to Arrow?
       | 
       | Arrow's struct is called StructArray. Fields of StructArray have
       | a StructType.
       | 
       | StructType has .bit_width and .byte_width attrs in Python and
       | probably the other implementations too:
       | https://arrow.apache.org/docs/python/generated/pyarrow.Struc...
       | 
       | Arrow supports bitfields with BooleanArray, and enums with
       | categoricals but
       | 
       | "BUG: Categorical columns using the PyArrow backend requires 4x
       | more memory" (open) https://github.com/pandas-
       | dev/pandas/issues/58062 :
       | 
       | > _On disk Parquet appears to store the category data as logical
       | type String which is compressed with snappy and encoded_
       | 
       | Arrow Flight RPC handles nested structs with enums over the wire
       | somehow too FWIU
        
       | Retr0id wrote:
       | I don't think I've ever seen struct bitfields used in real code,
       | it's almost always manual bit-masks against a numeric type, often
       | via macros. (I assume due to portability)
        
         | abnercoimbre wrote:
         | Yep. If I need a boolean for a struct I know I'll want more
         | than one. So I'll stick an integer type that lets me OR
         | multiple enum values (and so on).
        
         | gizmo686 wrote:
         | I see it decently often in embedded code for memory mapped IO
         | registers. Other than that, I don't think I've ever seen it
         | either.
         | 
         | I'm not sure portability is the reason though. I've seen it in
         | places that have no problem spamming #define __GNU_SOURCE
        
           | nomel wrote:
           | This is the only place I've seen it used, and the only placed
           | I've used it.
           | 
           | It's absolutely wonderful for IO.
        
         | tom_ wrote:
         | In terms of the underlying bit assingments, bitfields have been
         | pretty safe for a very long time now, as you can assume little-
         | endian. Just a question of bearing in mind the packing rules,
         | not too difficult given stdint.h. I went full bitfield shortly
         | after the time I went full size_t, about 15 years ago now.
         | 
         | Big-endian systems sort-of still exist (I do indeed have an
         | Amiga, and an Atari STe, and an Xbox 360, and they all work, so
         | they must count) - but they're only slightly less weird from a
         | practical C programming perspective than stuff like the DSPs
         | with CHAR_BIT==24, and Harvard architecture microcontrollers
         | with 2-byte ints. And so you can ignore them. People that use
         | still actually these systems will be used to it.
        
         | dirt_like wrote:
         | I have seen bit fields used in real code. Long ago. The reason
         | I know it so well is that bit field updates are NOT atomic, and
         | this caused a rare crashing bug under heavy load. Unless you
         | read the header the code looks atomic.
         | 
         | S->buffer_free = 1;
         | 
         | This is a very serious limitation if the structure/class is
         | being updated in a multi-threaded environment.
        
         | ziml77 wrote:
         | The Windows API uses bitfields like that in some places. I know
         | because I was working with the display configuration API
         | recently and found the bitfields very annoying to use through
         | generated FFI bindings (especially because they were used in
         | anonymous structs inside anonymous unions).
        
       | miguel_martin wrote:
       | >"... Note that I'm not an expert, in anything, so take
       | everything I write about with a grain of salt I guess :)."
       | 
       | >"I'm by far an expert and I could be wrong in some
       | specifications"
       | 
       | You're missing a "not" on that 2nd sentence ;)
       | 
       | In all seriousness, you don't need this disclaimer, you're
       | selling yourself short.
       | 
       | There's nothing wrong with your post to make the disclaimer.
       | 
       | And if there is something wrong with the post, let the comments
       | flood in and tell you that rather than yourself.
        
       | cogman10 wrote:
       | Nice little article.
       | 
       | The next thing to really explore is how the struct is being used.
       | Smaller is always nice, but sometimes you need it to be fast. To
       | do that, you'll want to group together fields that are commonly
       | read. For example, x, y, and speed are likely always read
       | together. So, it'd make sense to group those fields together to
       | optimize for cache reads.
       | 
       | The next thing to question is if your Monster is being interacted
       | with as an individual thing or as a group of things. If it's as a
       | group of things then it might make sense to create a "Struct of
       | arrays" of monsters. Imagine, for example, you have a system
       | which removes all the dead monsters. It'd be faster to iterate
       | through an `isAlive` array rather than an array of `Monster`
       | structs.
       | 
       | And now you are on your way to reinventing an ECS :)
        
       | jesse__ wrote:
       | Minor nitpick, 32bit max value is ~4 billion, not 4 million
        
       | zzo38computer wrote:
       | I would expect many C programmers would know these things,
       | because I do like that when writing programs in C, too (although
       | for bit fields, I will use a single field and then named bits by
       | macros). There are some other considerations as well though, such
       | as static vs dynamic allocation, and the requirements of how the
       | specific program is working.
        
       | andrewmcwatters wrote:
       | The article casually ignores the fact that more production code
       | uses bitmasking than bit fields.
        
       | cjensen wrote:
       | This is good stuff to have "in the toolbox," however habitually
       | organizing structs this way is premature optimization. Much
       | better to sensibly organize the fields and ignore alignment
       | entirely unless you have a really good reason to do this. Good
       | reasons could be that you know you will create a lot of these
       | objects, or it has become a performance issue due to cache misses
       | this would improve.
        
       | 4gotunameagain wrote:
       | Unpacked structs are also very useful for parsing network data.
       | You take a package, cast it to an unpacked struct, and can access
       | everything nicely.
       | 
       | Many a spacecraft telemetry stacks work like that.
        
       | foota wrote:
       | Some related thoughts (more for C++ than C).
       | 
       | You can eliminate the cost of a wrapping class with no members by
       | taking advantage of the empty base class optimization (or rather:
       | by avoiding members in a base class you will be taking advantage
       | of EBCO).
       | 
       | You can use an untagged union in C++ (taking care to avoid
       | undefined behavior) to implement small object optimization. In
       | it's general form, the idea is you can store some smaller struct
       | in a union (as a char[], since C++ doesn't have native unions)
       | with a larger structure, and use a sentinel in the larger
       | structure to determine which it is. E.g., if large struct has
       | some field that must not be zero, then make it zero in the
       | smaller struct and you can distinguish between the two without a
       | tag (like would be required with std::variant).
       | 
       | I'm also a huge fan in general of "make invalid states
       | unrepresentable" and following this can often reduce redundancy
       | and eliminate extra fields.
        
       | kazinator wrote:
       | Generally speaking, sort the members in descending order by
       | alignment, if you want the tightest packing without resorting to
       | nonportable compiler extensions. This descending order eliminates
       | padding between members producing the smallest offset for the
       | last byte of the last member, and that then minimizes the padding
       | required at the end of the struct.
       | 
       | In C, alignment for basic types is linked to size. A type may
       | have an alignment requirement that is as much as its size: e.g.
       | an 8 byte data type might be aligned as strictly as addresses
       | divisible by 8.
       | 
       | The alignment of an array aggregate is that of its element type.
       | 
       | The alignment of a struct is that of its most strictly aligned
       | member.
       | 
       | In portable coding, we don't exactly know, but we can make the
       | worst-case assumption that every basic type has the strictest
       | possible alignment---i.e. its size. We also don't know the sizes
       | of all the types, but we know that integers of higher ranks are
       | at least as large as lower ranks; i.e. sizeof(char) <=
       | sizeof(short) <= sizeof(int) <= sizeof(long), etc.
       | 
       | Things are tricky when you have a mixture of floating-point,
       | pointers and integers. E.g. pointer could be the same size as a
       | float, or bigger.
       | 
       | Here is a trick you use: combine together clumps of like types,
       | and try to make combinations that are powers of two. For
       | instance, if your structure has two or four pointers to things,
       | try to make them adjacent. E.g.:                 struct s {
       | void *p1, *p2, *p3, p4;         short s1, s2, s3, s4;
       | float f1, f2, f3, f4;       };
       | 
       | Even though float is almost certainly larger than short and may
       | have stricter alignment, its moot because I have four of
       | everything. Even if this is on a tiny system where pointers are
       | 16 bits wide, four of them will make 8 bytes. So when the f1
       | member is being placed into the struct, the offset is at least 8
       | byte aligned: we don't expect padding.
       | 
       | If you aggregate enough members of the same type in a power-of-
       | two sized clump, you can easily create alignment which meets or
       | exceeds the strictest alignment in the system.
       | 
       | So those are the main principles: group together members of the
       | same type. Then keeping those groups together, sort by descending
       | alignment strictness.
        
       | TinkersW wrote:
       | Could improve it by using enum MonsterName :u8, moving it down
       | near the other byte sized item, and specifying the exact type of
       | the bitfield(u8). Then it would be 16 bytes.
        
       | _mocha wrote:
       | I'm somewhat surprised to see this on the front page of HN. I
       | distinctly remember this material being covered in a single
       | lecture during our freshman CS course 20 years ago. I doubt this
       | would've reached the front page in HN's early days, and it makes
       | me wonder if we've moved away from teaching these fundamentals
       | due to the heavy reliance on abstraction and ai agents in today's
       | industry.
        
         | jerf wrote:
         | One of the neat things about programming is that you can
         | develop a high degree of skill in it without any formal
         | education at all.
         | 
         | But it does mean that we've always had those programmers who
         | have had characteristic holes in their education as a result.
         | 
         | Personally I enjoy having a mix of self-taught and formally-
         | taught around; I think there quite a few things the self-taught
         | are often better at than the formally educated as a group. I
         | respect both sides. I'm just saying that there has always been
         | a market for this sort of thing on HN because the self-taught
         | have always been around.
        
           | nomel wrote:
           | > I think there quite a few things the self-taught are often
           | better at than the formally educated as a group.
           | 
           | Would you mind expanding on this a bit?
        
         | asa400 wrote:
         | I've never taken a lick of CS in a formal setting, and I feel
         | like that's increasingly common as programming has broadened
         | its employment base. Most of the people I work with haven't
         | done formal CS, either. Consequently I've had to learn all of
         | this stuff on my own. There are communities out there that
         | value education this kind of education. I can vouch for the
         | Rust community, which has helped me learn _a ton_ about this
         | kind of "lower level" stuff over the last 5 years despite
         | starting my career doing Ruby.
        
       | o11c wrote:
       | Note that although alignment and size are platform-specific, in
       | practice padding can be minimized simply by using a fixed order:
       | (various vector types)       int128_t       long double
       | long long, int64_t, int_fast64_t, double       off_t       time_t
       | (can be 64 bits on 32-bit platforms only if off_t also is)
       | register_t, int_fast16_t, int_fast32_t (64-bit on e.g. x32)
       | void *, void(*)(), intptr_t       ptrdiff_t (theoretically wrong
       | if fully aligned and bigger than pointer)       size_t, ssize_t
       | long (32-bit on win64; generally has 16-bit alignment if bigger
       | than pointer)       int32_t, float       int (can be 16-bit)
       | short, int16_t       char, bool, int8_t, int_fast8_t       (a
       | trailing array member must come last; note when calculating
       | combined size you should use offsetof instead of sizeof to not
       | waste padding)
       | 
       | Besides this, `int_leastN_t` goes with `intN_t`, and all unsigned
       | types go with their signed types. Note that types like `id_t` are
       | hard to place however.
       | 
       | Although there do exist many platforms where this is not in order
       | of _size_ , the bigger types generally have smaller alignment on
       | such platforms. Despite this I have split out rows occasionally
       | despite not knowing of any such platform.
        
       | teo_zero wrote:
       | Neither TFA nor any comment mention using a struct of arrays
       | instead of an array of structs. This technique alone eliminates
       | all padding bytes!
        
       | herf wrote:
       | Sometimes struct-of-arrays (in-memory column store) is the right
       | solution, and you get automatic packing that way. C makes it so
       | much easier to store an array of structs that people don't do
       | this often enough, but column stores prove it has its uses.
        
       | munificent wrote:
       | This is a good article, but it bugs me that it sort of only tells
       | half the story.
       | 
       |  _Why_ are you making your structs smaller. Probably for
       | "performance". If your goal is literally just reducing memory
       | usage, then this is all fine. Maybe you're running on a small
       | microcontroller and you just straight up don't have much RAM.
       | 
       | But these days, in most areas, memory is cheap and plentiful.
       | What's expensive is _CPU cache_. If you 're optimizing memory use
       | for that, it's really about runtime _speed_ and less about total
       | memory usage. You 're trying to make things small so you have
       | fewer cache misses and the CPU doesn't get stuck waiting as much.
       | 
       | In that case, the advice here is only part of the story. Using
       | bitfields and smaller integers saves memory. But in order to
       | _access_ those fields and do anything with them, they need to get
       | loaded into registers. I 'm not an expert here, but my
       | understanding is that loading a bitfield or small int into a
       | word-sized register can sometimes have some bit of overhead, so
       | you have to pay attention to the trade-off.
       | 
       | If I was optimizing for overall runtime speed and considering
       | packing structs to do that, I'd really want to have good
       | profiling and benchmarks in place first. Otherwise it's easy to
       | think you're making progress when you're actually going
       | backwards.
        
         | zeusk wrote:
         | The real issue with packed structs and bitfields happens when
         | concurrency gets involved. Majority of modern CPU caches are
         | private and only allow one core to hold the cache line - so it
         | creates more false dependencies when cores are trying to alter
         | information that was compacted into a cache line on another
         | core.
        
         | wbl wrote:
         | Everything is microarch dependant but typically the memory
         | subsystem can handle subword loads and writes. What it usually
         | can't do is bits.
        
         | ryao wrote:
         | If you are using an intrusive data structure, keeping the key
         | on the same cache line as the pointers can make traversals
         | faster for example. Reducing padding can make this easier to
         | do.
         | 
         | In general, keeping commonly accessed together things on the
         | same cache line is a good idea and the work needed to remove
         | excessive padding overlaps with the work needed to do this.
        
       ___________________________________________________________________
       (page generated 2025-07-30 23:00 UTC)