[HN Gopher] Writing memory efficient C structs
___________________________________________________________________
Writing memory efficient C structs
Author : aragonite
Score : 119 points
Date : 2025-07-30 13:32 UTC (9 hours ago)
(HTM) web link (tomscheers.github.io)
(TXT) w3m dump (tomscheers.github.io)
| sim7c00 wrote:
| definitely interesting points for C structs. regarding CPU it
| might be interesting to look a little further still regarding
| optimal sizes for structs regarding cachcing, figuring out hot
| and cold areas and either including padding or putting cold
| values behind pointer to a substruct.
|
| like the examples though, very clear what the changes are and the
| specific impact. always a good reminder to review for these
| things after a session hacking in new things!
| uticus wrote:
| for another good intro to the fun with structs, Beej's guide is
| always a good goto:
|
| https://beej.us/guide/bgc/html/split-wide/structs-ii-more-fu...
|
| ...goes into offsets, padding bytes, bit-field, unions, etc etc.
| degenoah wrote:
| I feel like this page would have definitely helped me when I
| used to try reverse engineering things I had lying around on my
| computer.
| warmwaffles wrote:
| This is also applicable outside of C as well.
| mcraiha wrote:
| "The x_position and y_position can unfortunately not get a
| smaller type nor can a float be unsigned." If you have latest
| C++, you can use float16_t or bfloat16_t to get smaller type
| https://en.cppreference.com/w/cpp/types/floating-point.html
| david2ndaccount wrote:
| C23 has _Float16
| rwmj wrote:
| No mention of pahole / dwarves? https://lwn.net/Articles/335942/
| It's the standard tool used by the Linux kernel developers.
| uticus wrote:
| for somebody starting out, where do you go for a list of tools
| like this?
| petergeoghegan wrote:
| Recent versions of clangd show pahole-like struct
| size/alignment padding overhead annotations. I find this built-
| in support a lot more convenient than using pahole.
| ryao wrote:
| pahole is an awesome tool.
| _rend wrote:
| For completeness, this description of alignment is misleading:
|
| > Well, dear reader, this padding is added because the CPU needs
| memory to be aligned in sets of 4 bytes because it's optimized in
| that fashion.
|
| > ...
|
| > Remember: since structs are aligned to 4 bytes, any padding is
| therefore unnecessary if the size of the struct is a multiple of
| 4 without the padding.
|
| Individual data types have their own alignment (e.g.,
| `bool`/`char` may be 1, `short` may be 2, `int` may be 4, `long`
| may be 8, etc.), and the alignment of a compound type (like a
| struct) defaults to the maximum alignment of its constituent
| types.
|
| In this article, `struct Monster` has an alignment of 4 because
| `int` and `float` have an alignment of 4 for the author's
| configuration. Expanding one of the `int`s to a `long` could
| increase the alignment to 8 on some CPUs, and removing the `int`
| and `float` fields would decrease the alignment to 1 for most
| CPUs.
| sumtechguy wrote:
| Also keep in mind that is also all very CPU and compiler
| specific. Had one compiler where it packed everything at 4/8,
| usually 8. Not the 1/2/4/8 you would expect. That was because
| the CPU would just seg fault if you didnt play nice with the
| data access. The compiler hid a lot of it if you set the
| packing with offsets and mem moves and shifting. It was clever
| but slow. So they by default picked a wide enough packing that
| removed the extra instructions at the cost of using more
| memory. x86 was by far the most forgiving while at the time I
| was doing it. ARM was the least forgiving (at least on the
| platform I was using). With MIPS being OK in some cases but not
| others.
| gavinsyancey wrote:
| Um ... isn't alignment generally dictated by the platform ABI
| so that programs compiled by different compilers can be
| linked together?
| plorkyeran wrote:
| The widely used platforms with multiple compilers generally
| have one or more written down ABIs that the compilers all
| follow, but more niche platforms frequently have exactly
| one compiler (often a very out of date fork of gcc) that
| just does whatever they felt like implementing and may not
| even support linking together things built by different
| versions of that one compiler.
| david2ndaccount wrote:
| The author relies on the compiler fitting the bitfield in unused
| padding bytes after "speed", but that is implementation-defined
| behavior (almost everything about bitfields is implementation
| defined). MSVC will align the unsigned bitfield to
| alignof(unsigned), whereas GCC and clang will use the padding
| bytes. To portably pack the struct, use unsigned integers and use
| flags (monster.flags & CAN_FLY instead of monster.can_fly).
|
| See https://c.godbolt.org/z/K7oY77KGj
| arjvik wrote:
| Why can't there be a standard defined for bitfields in future C
| releases? This is a long-discussed drawback of a feature that I
| really really want to be able to use in production code.
| ajross wrote:
| There is, but it's part of the platform ABI and not the C
| language standard. The latter specifies syntax and behavior,
| the former is what's concerned with interoperability details
| like memory layout.
|
| I happen to have a copy of AAPCS64 in front of me, and you
| can find the specification of bit packing in section 10.1.8.
| The sysv ABI for x86/x86_64 has its own wording (but I'm
| pretty sure is compatible). MSVC does something else I
| believe, etc...
| gizmo686 wrote:
| There is. I'm not sure when it was added, but looking at
| https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf,
| page 106, note 13
|
| > An implementation may allocate any addressable storage unit
| large enough to hold a bit-field. If enough space remains, a
| bit-field that immediately follows another bit-field in a
| structure shall be packed into adjacent bits of the same
| unit. If insufficient space remains, whether a bit-field that
| does not fit is put into the next unit or overlaps adjacent
| units is implementation-defined. The order of allocation of
| bit-fields within a unit (high-order to low-order or low-
| order to high-order) is implementation-defined. The alignment
| of the addressable storage unit is unspecified.
|
| In practice, I've been using this feature for ages (along
| with __attribute__((packed))). There comes a point where you
| can start depending on compiler specific features instead of
| relying solely on the standard.
| nomel wrote:
| Can this be standardized? It seems it would have
| accommodate the most limited architecture, in terms of
| memory access, probably making it practically useless.
| Seems best left to the compiler.
| wahern wrote:
| The quoted text is from the C standard.
| wahern wrote:
| The portion you quoted derives word-for-word from the
| original ANSI standard (C89/C90):
| https://port70.net/~nsz/c/c89/c89-draft.html#3.5.2.1
| petermclean wrote:
| I've been working on an open source library that, while it
| doesn't solve bitfields, can provide convenient ergonomic
| access to bit-granular types:
|
| https://github.com/PeterCDMcLean/BitLib
|
| You can inherit from bit_array and create a pseudo bitfield:
| class mybits_t : bit::bit_array<17> { public:
| // slice a 1 bit sized field // returns a proxy
| reference to the single bit auto field1() {
| return (*this)(0, 1); } // slice a 3
| bit sized field // returns a proxy reference to
| the 3 bits auto field2() { return
| (*this)(1, 4); } }; mybits_t
| mybits(7); assert(mybits.field1() == 1);
| mybits.field1() = 0; assert(static_cast<int>(mybits) ==
| 6); //no implicit cast to integers, but explicit supported
|
| There is minimal overhead depending on how aggressively the
| compiler inlines or optimizes*
| bullen wrote:
| Adding to this you want the struct to be exactly 64 bytes, even
| on Apple MX processors which have 128 bytes of cache.
| jakey_bakey wrote:
| Loved this, thanks!
| signa11 wrote:
| and i raise you this: https://www.catb.org/esr/structure-packing/
| rzzzt wrote:
| Hmm, the certificate for this site has a billion SANs but catb
| .org is not one of them...
| variadix wrote:
| Some other things you can do to improve packing are using integer
| indices/offsets/relative pointers instead of absolute pointers
| (especially on 64-bit machines), overlapping fields for unions
| (as an example you may have a tag that can share some bits with
| another field in each member of the union), using a structs
| position in memory to encode information (for example, if you
| have a binary tree of array allocated nodes, you can specify that
| the right child always follows the parent in memory, then you
| only need to store the index of the left child), and of course
| structure packing pragmas/attributes.
| westurner wrote:
| Is it possible to apply these optimizations to Arrow?
|
| Arrow's struct is called StructArray. Fields of StructArray have
| a StructType.
|
| StructType has .bit_width and .byte_width attrs in Python and
| probably the other implementations too:
| https://arrow.apache.org/docs/python/generated/pyarrow.Struc...
|
| Arrow supports bitfields with BooleanArray, and enums with
| categoricals but
|
| "BUG: Categorical columns using the PyArrow backend requires 4x
| more memory" (open) https://github.com/pandas-
| dev/pandas/issues/58062 :
|
| > _On disk Parquet appears to store the category data as logical
| type String which is compressed with snappy and encoded_
|
| Arrow Flight RPC handles nested structs with enums over the wire
| somehow too FWIU
| Retr0id wrote:
| I don't think I've ever seen struct bitfields used in real code,
| it's almost always manual bit-masks against a numeric type, often
| via macros. (I assume due to portability)
| abnercoimbre wrote:
| Yep. If I need a boolean for a struct I know I'll want more
| than one. So I'll stick an integer type that lets me OR
| multiple enum values (and so on).
| gizmo686 wrote:
| I see it decently often in embedded code for memory mapped IO
| registers. Other than that, I don't think I've ever seen it
| either.
|
| I'm not sure portability is the reason though. I've seen it in
| places that have no problem spamming #define __GNU_SOURCE
| nomel wrote:
| This is the only place I've seen it used, and the only placed
| I've used it.
|
| It's absolutely wonderful for IO.
| tom_ wrote:
| In terms of the underlying bit assingments, bitfields have been
| pretty safe for a very long time now, as you can assume little-
| endian. Just a question of bearing in mind the packing rules,
| not too difficult given stdint.h. I went full bitfield shortly
| after the time I went full size_t, about 15 years ago now.
|
| Big-endian systems sort-of still exist (I do indeed have an
| Amiga, and an Atari STe, and an Xbox 360, and they all work, so
| they must count) - but they're only slightly less weird from a
| practical C programming perspective than stuff like the DSPs
| with CHAR_BIT==24, and Harvard architecture microcontrollers
| with 2-byte ints. And so you can ignore them. People that use
| still actually these systems will be used to it.
| dirt_like wrote:
| I have seen bit fields used in real code. Long ago. The reason
| I know it so well is that bit field updates are NOT atomic, and
| this caused a rare crashing bug under heavy load. Unless you
| read the header the code looks atomic.
|
| S->buffer_free = 1;
|
| This is a very serious limitation if the structure/class is
| being updated in a multi-threaded environment.
| ziml77 wrote:
| The Windows API uses bitfields like that in some places. I know
| because I was working with the display configuration API
| recently and found the bitfields very annoying to use through
| generated FFI bindings (especially because they were used in
| anonymous structs inside anonymous unions).
| miguel_martin wrote:
| >"... Note that I'm not an expert, in anything, so take
| everything I write about with a grain of salt I guess :)."
|
| >"I'm by far an expert and I could be wrong in some
| specifications"
|
| You're missing a "not" on that 2nd sentence ;)
|
| In all seriousness, you don't need this disclaimer, you're
| selling yourself short.
|
| There's nothing wrong with your post to make the disclaimer.
|
| And if there is something wrong with the post, let the comments
| flood in and tell you that rather than yourself.
| cogman10 wrote:
| Nice little article.
|
| The next thing to really explore is how the struct is being used.
| Smaller is always nice, but sometimes you need it to be fast. To
| do that, you'll want to group together fields that are commonly
| read. For example, x, y, and speed are likely always read
| together. So, it'd make sense to group those fields together to
| optimize for cache reads.
|
| The next thing to question is if your Monster is being interacted
| with as an individual thing or as a group of things. If it's as a
| group of things then it might make sense to create a "Struct of
| arrays" of monsters. Imagine, for example, you have a system
| which removes all the dead monsters. It'd be faster to iterate
| through an `isAlive` array rather than an array of `Monster`
| structs.
|
| And now you are on your way to reinventing an ECS :)
| jesse__ wrote:
| Minor nitpick, 32bit max value is ~4 billion, not 4 million
| zzo38computer wrote:
| I would expect many C programmers would know these things,
| because I do like that when writing programs in C, too (although
| for bit fields, I will use a single field and then named bits by
| macros). There are some other considerations as well though, such
| as static vs dynamic allocation, and the requirements of how the
| specific program is working.
| andrewmcwatters wrote:
| The article casually ignores the fact that more production code
| uses bitmasking than bit fields.
| cjensen wrote:
| This is good stuff to have "in the toolbox," however habitually
| organizing structs this way is premature optimization. Much
| better to sensibly organize the fields and ignore alignment
| entirely unless you have a really good reason to do this. Good
| reasons could be that you know you will create a lot of these
| objects, or it has become a performance issue due to cache misses
| this would improve.
| 4gotunameagain wrote:
| Unpacked structs are also very useful for parsing network data.
| You take a package, cast it to an unpacked struct, and can access
| everything nicely.
|
| Many a spacecraft telemetry stacks work like that.
| foota wrote:
| Some related thoughts (more for C++ than C).
|
| You can eliminate the cost of a wrapping class with no members by
| taking advantage of the empty base class optimization (or rather:
| by avoiding members in a base class you will be taking advantage
| of EBCO).
|
| You can use an untagged union in C++ (taking care to avoid
| undefined behavior) to implement small object optimization. In
| it's general form, the idea is you can store some smaller struct
| in a union (as a char[], since C++ doesn't have native unions)
| with a larger structure, and use a sentinel in the larger
| structure to determine which it is. E.g., if large struct has
| some field that must not be zero, then make it zero in the
| smaller struct and you can distinguish between the two without a
| tag (like would be required with std::variant).
|
| I'm also a huge fan in general of "make invalid states
| unrepresentable" and following this can often reduce redundancy
| and eliminate extra fields.
| kazinator wrote:
| Generally speaking, sort the members in descending order by
| alignment, if you want the tightest packing without resorting to
| nonportable compiler extensions. This descending order eliminates
| padding between members producing the smallest offset for the
| last byte of the last member, and that then minimizes the padding
| required at the end of the struct.
|
| In C, alignment for basic types is linked to size. A type may
| have an alignment requirement that is as much as its size: e.g.
| an 8 byte data type might be aligned as strictly as addresses
| divisible by 8.
|
| The alignment of an array aggregate is that of its element type.
|
| The alignment of a struct is that of its most strictly aligned
| member.
|
| In portable coding, we don't exactly know, but we can make the
| worst-case assumption that every basic type has the strictest
| possible alignment---i.e. its size. We also don't know the sizes
| of all the types, but we know that integers of higher ranks are
| at least as large as lower ranks; i.e. sizeof(char) <=
| sizeof(short) <= sizeof(int) <= sizeof(long), etc.
|
| Things are tricky when you have a mixture of floating-point,
| pointers and integers. E.g. pointer could be the same size as a
| float, or bigger.
|
| Here is a trick you use: combine together clumps of like types,
| and try to make combinations that are powers of two. For
| instance, if your structure has two or four pointers to things,
| try to make them adjacent. E.g.: struct s {
| void *p1, *p2, *p3, p4; short s1, s2, s3, s4;
| float f1, f2, f3, f4; };
|
| Even though float is almost certainly larger than short and may
| have stricter alignment, its moot because I have four of
| everything. Even if this is on a tiny system where pointers are
| 16 bits wide, four of them will make 8 bytes. So when the f1
| member is being placed into the struct, the offset is at least 8
| byte aligned: we don't expect padding.
|
| If you aggregate enough members of the same type in a power-of-
| two sized clump, you can easily create alignment which meets or
| exceeds the strictest alignment in the system.
|
| So those are the main principles: group together members of the
| same type. Then keeping those groups together, sort by descending
| alignment strictness.
| TinkersW wrote:
| Could improve it by using enum MonsterName :u8, moving it down
| near the other byte sized item, and specifying the exact type of
| the bitfield(u8). Then it would be 16 bytes.
| _mocha wrote:
| I'm somewhat surprised to see this on the front page of HN. I
| distinctly remember this material being covered in a single
| lecture during our freshman CS course 20 years ago. I doubt this
| would've reached the front page in HN's early days, and it makes
| me wonder if we've moved away from teaching these fundamentals
| due to the heavy reliance on abstraction and ai agents in today's
| industry.
| jerf wrote:
| One of the neat things about programming is that you can
| develop a high degree of skill in it without any formal
| education at all.
|
| But it does mean that we've always had those programmers who
| have had characteristic holes in their education as a result.
|
| Personally I enjoy having a mix of self-taught and formally-
| taught around; I think there quite a few things the self-taught
| are often better at than the formally educated as a group. I
| respect both sides. I'm just saying that there has always been
| a market for this sort of thing on HN because the self-taught
| have always been around.
| nomel wrote:
| > I think there quite a few things the self-taught are often
| better at than the formally educated as a group.
|
| Would you mind expanding on this a bit?
| asa400 wrote:
| I've never taken a lick of CS in a formal setting, and I feel
| like that's increasingly common as programming has broadened
| its employment base. Most of the people I work with haven't
| done formal CS, either. Consequently I've had to learn all of
| this stuff on my own. There are communities out there that
| value education this kind of education. I can vouch for the
| Rust community, which has helped me learn _a ton_ about this
| kind of "lower level" stuff over the last 5 years despite
| starting my career doing Ruby.
| o11c wrote:
| Note that although alignment and size are platform-specific, in
| practice padding can be minimized simply by using a fixed order:
| (various vector types) int128_t long double
| long long, int64_t, int_fast64_t, double off_t time_t
| (can be 64 bits on 32-bit platforms only if off_t also is)
| register_t, int_fast16_t, int_fast32_t (64-bit on e.g. x32)
| void *, void(*)(), intptr_t ptrdiff_t (theoretically wrong
| if fully aligned and bigger than pointer) size_t, ssize_t
| long (32-bit on win64; generally has 16-bit alignment if bigger
| than pointer) int32_t, float int (can be 16-bit)
| short, int16_t char, bool, int8_t, int_fast8_t (a
| trailing array member must come last; note when calculating
| combined size you should use offsetof instead of sizeof to not
| waste padding)
|
| Besides this, `int_leastN_t` goes with `intN_t`, and all unsigned
| types go with their signed types. Note that types like `id_t` are
| hard to place however.
|
| Although there do exist many platforms where this is not in order
| of _size_ , the bigger types generally have smaller alignment on
| such platforms. Despite this I have split out rows occasionally
| despite not knowing of any such platform.
| teo_zero wrote:
| Neither TFA nor any comment mention using a struct of arrays
| instead of an array of structs. This technique alone eliminates
| all padding bytes!
| herf wrote:
| Sometimes struct-of-arrays (in-memory column store) is the right
| solution, and you get automatic packing that way. C makes it so
| much easier to store an array of structs that people don't do
| this often enough, but column stores prove it has its uses.
| munificent wrote:
| This is a good article, but it bugs me that it sort of only tells
| half the story.
|
| _Why_ are you making your structs smaller. Probably for
| "performance". If your goal is literally just reducing memory
| usage, then this is all fine. Maybe you're running on a small
| microcontroller and you just straight up don't have much RAM.
|
| But these days, in most areas, memory is cheap and plentiful.
| What's expensive is _CPU cache_. If you 're optimizing memory use
| for that, it's really about runtime _speed_ and less about total
| memory usage. You 're trying to make things small so you have
| fewer cache misses and the CPU doesn't get stuck waiting as much.
|
| In that case, the advice here is only part of the story. Using
| bitfields and smaller integers saves memory. But in order to
| _access_ those fields and do anything with them, they need to get
| loaded into registers. I 'm not an expert here, but my
| understanding is that loading a bitfield or small int into a
| word-sized register can sometimes have some bit of overhead, so
| you have to pay attention to the trade-off.
|
| If I was optimizing for overall runtime speed and considering
| packing structs to do that, I'd really want to have good
| profiling and benchmarks in place first. Otherwise it's easy to
| think you're making progress when you're actually going
| backwards.
| zeusk wrote:
| The real issue with packed structs and bitfields happens when
| concurrency gets involved. Majority of modern CPU caches are
| private and only allow one core to hold the cache line - so it
| creates more false dependencies when cores are trying to alter
| information that was compacted into a cache line on another
| core.
| wbl wrote:
| Everything is microarch dependant but typically the memory
| subsystem can handle subword loads and writes. What it usually
| can't do is bits.
| ryao wrote:
| If you are using an intrusive data structure, keeping the key
| on the same cache line as the pointers can make traversals
| faster for example. Reducing padding can make this easier to
| do.
|
| In general, keeping commonly accessed together things on the
| same cache line is a good idea and the work needed to remove
| excessive padding overlaps with the work needed to do this.
___________________________________________________________________
(page generated 2025-07-30 23:00 UTC)