[HN Gopher] I write type-safe generic data structures in C
       ___________________________________________________________________
        
       I write type-safe generic data structures in C
        
       Author : todsacerdoti
       Score  : 188 points
       Date   : 2025-06-30 16:55 UTC (6 hours ago)
        
 (HTM) web link (danielchasehooper.com)
 (TXT) w3m dump (danielchasehooper.com)
        
       | uecker wrote:
       | It is cool trick. I already use in my experimental library though
       | ;-) https://github.com/uecker/noplate/blob/main/src/list.h
        
         | eqvinox wrote:
         | I guess if anyone might know it might be you--do you see any
         | way of doing this for intrusive data structures, embedding the
         | node struct in the data (and as side effect supporting an
         | object to be on multiple containers) rather than the data in
         | the node like you're doing there?
        
           | uecker wrote:
           | You could put the dummy member into the embedded node. But
           | for intrusive data structures you often want them to erase
           | the type so that you write generic algorithms as regular
           | functions. In this case, it makes more sense to have a run-
           | time check do to down casts. I do this with my variadic type
           | which has an intrusive "super" member:
           | https://godbolt.org/z/ofdKe7Pfv The overhead is often
           | completely removed by the compiler.
        
             | eqvinox wrote:
             | Mhm. Putting the dummy member into the embedded node
             | doesn't give a path from the proper object to find the
             | embedded node "mid-struct". run-time checks are the "easy
             | way out". We/I'll stick to macro soup probably, so we have
             | compile-time checks.
             | 
             | btw. For ISO C WG14... has anyone suggested adding _Include
             | to the preprocessor, along the lines of _Pragma? It'd
             | really help with doing this kind of really long macros,
             | hiding the clunky "#define, #define, #define, #include"
             | inside a macro...
        
               | uecker wrote:
               | I don't think anybody has proposed this, at least not
               | recently. There is a proposal for multi-line macros:
               | https://www.open-
               | std.org/jtc1/sc22/wg14/www/docs/n3531.txt
        
               | eqvinox wrote:
               | That one might actually be better, nice to know & thanks
               | for the pointer! (I hope it actually goes somewhere...)
        
       | asplake wrote:
       | Interesting! I'm working on toy/educational generator of ML-style
       | tagged variants and associated functions in C (for a compiler)
       | and when I'm a bit further along I will see if they're
       | compatible.
        
       | HexDecOctBin wrote:
       | The key idea here seems to be to use function pointer's type to
       | enforce type safety rather than using the data "handle" type
       | (that is often found in implementations inspired by Sean
       | Barrett's strechy_buffers).
       | 
       | > One annoying thing about C is that it does not consider these
       | two variables to have the same type
       | 
       | C23 solves that too: https://www.open-
       | std.org/jtc1/sc22/wg14/www/docs/n3037.pdf
       | 
       | Supported by latest GCC and Clang, but not by MSVC.
        
         | dhooper wrote:
         | Author here. Not quite. The key idea is about using a union to
         | associate type information with a generic data type. Type
         | casting a function is not the only way to use that type
         | information. I discuss that as well as the C23 changes in the
         | footnotes and the "typeof on old compilers" section.
        
           | wahern wrote:
           | FWIW, as far back as 2015 my feature check library documents
           | Visual Studio as supporting "__typeof".[1] Note the leading
           | but not trailing underscores. Perhaps I was mistaken, but I
           | usually tested that sort of thing. It's also possible
           | __typeof had slightly different semantics.
           | 
           | [1] See https://github.com/wahern/autoguess/blob/b44556e4/con
           | fig.h.g... (that's the 2015 revision, but HEAD has the same
           | code).
        
             | dhooper wrote:
             | msvc 19.39 is the first to support it, which I mention in
             | the article. You can confirm it _didn 't_ work up through
             | 19.38 in godbolt [1]. I don't use Visual Studio, so I don't
             | know what version of that first started using msvc 19.39
             | 
             | [1] https://godbolt.org/z/M7zPYdssP
        
       | eqvinox wrote:
       | Interesting idea with the union and using typeof(). We (I) went
       | with large macros defining wrappers instead, which, I believe, is
       | a necessity with intrusive data structures, or at least I don't
       | immediately see how to do that with unions & typeof. Maybe it's
       | possible...?
       | 
       | e.g. hash table wrapper:
       | https://github.com/FRRouting/frr/blob/master/lib/typesafe.h#...
       | 
       | (cf. https://docs.frrouting.org/projects/dev-
       | guide/en/latest/list...)
        
       | notnmeyer wrote:
       | pretty sure C is the new Go.
        
         | revskill wrote:
         | Without the concurreny part.
        
           | oflebbe wrote:
           | OpenMP to the rescue
        
           | sltkr wrote:
           | Or garbage collection. Or interfaces. Or packages. Or actual
           | generics.
        
         | qustrolabe wrote:
         | pretty sure C has to go
        
       | o11c wrote:
       | For your level 2 code, `uint64_t data[];` is wrong for types
       | whose alignment is greater than `uint64_t`, and also wasteful for
       | types whose alignment is smaller (for example, under an ilp32 ABI
       | on 64-bit architectures).
       | 
       | For your level 3 code, it should be `int main() { List(Foo)
       | foo_list = {NULL};`
       | 
       | Note that working around a lack of `typeof` means you can't
       | return anything. Also, your particular workaround allows
       | `const`ness errors since `==` is symmetrical.
       | 
       | You can't safely omit `payload` since you need it to know the
       | correct size. Consider a `List(int64_t)` and you try to add an
       | `int32_t` to it - this should be fine, but you can't `sizeof` the
       | `int32_t`. Your code is actually lacking quite a bit to make this
       | work.
       | 
       | =====
       | 
       | There are 2 major limitations to generics in C right now:
       | 
       | * Delegating to a vtable (internal or external) is limited in
       | functionality, since structs cannot contain macros, only
       | functions.
       | 
       | * Delegating to an external vtable (mandatory to avoid overhead)
       | means that you have to forward-declare _all_ of the types you 'll
       | ever use a vtable with. So far the best approach I've found is to
       | declare (but not define) static functions in the same forwarding
       | header I declare the typedefs in; note that GCC and Clang differ
       | in what phase the "undefined static" warning appears in for the
       | case where you don't actually include that particular type's
       | header in a given TU.
       | 
       | (think about writing a function that accepts either `struct
       | SizedBuffer {void *p; size_t len;};` or `struct BoundedBuffer
       | {void *begin; void *end;};`, and also const versions thereof -
       | all from different headers).
        
         | EPWN3D wrote:
         | I would love for `union`s to be federated, that is, a type
         | could declare itself as thought it was part of a union with
         | another type, without having to pre-declare all possible types
         | in one place.
        
           | o11c wrote:
           | For layout-compatible types, you can often just include a
           | `_base` member in each child. Maybe twice (once named and
           | once unnamed) to avoid excess typing - I don't understand the
           | common-initial-subsequence rule but people do this enough
           | that compilers have to allow it.
        
         | rectang wrote:
         | > _Delegating to an external vtable (mandatory to avoid
         | overhead) means that you have to forward-declare all of the
         | types you 'll ever use a vtable with._
         | 
         | We went down the rabbit hole of writing a compiler for this as
         | part of a project I used to work on (Apache Clownfish[1], a
         | subproject of the retired Apache Lucy project). We started off
         | parsing .h files, but eventually it made sense to create our
         | own small header language (.cfh "Clownfish Header" files).
         | 
         | Here's some generated code for invoking the CharBuf version of
         | the "Clone" method defined in parent class "Obj":
         | typedef cfish_CharBuf*
         | (*CFISH_CharBuf_Clone_t)(cfish_CharBuf* self);
         | extern uint32_t CFISH_CharBuf_Clone_OFFSET;              static
         | inline cfish_CharBuf*
         | CFISH_CharBuf_Clone(cfish_CharBuf* self) {             const
         | CFISH_CharBuf_Clone_t method                 =
         | (CFISH_CharBuf_Clone_t)cfish_obj_method(
         | self,                     CFISH_CharBuf_Clone_OFFSET
         | );             return method(self);         }
         | 
         | Usage:                   cfish_CharBuf *charbuf =
         | cfish_CharBuf_new();         cfish_CharBuf *clone =
         | CFISH_CharBuf_Clone(charbuf);
         | 
         | We had our reasons for going to these extremes: the point of
         | Clownfish was to provide a least-common-denominator object
         | model for bindings to multiple dynamic languages (similar
         | problem domain to SWIG), and the .cfh files also were used to
         | derive types for the binding languages. But there was truly an
         | absurd amount of boilerplate being generated to get around the
         | issue you identify.
         | 
         | This is why almost everybody just uses casts to void* for the
         | invocant, skipping type safety.
         | 
         | [1] https://github.com/apache/lucy-clownfish
        
           | zem wrote:
           | i am firmly of the opinion that compiling to c is a better
           | route than doing clever c tricks to sort of get what you
           | want. the compiler can be pretty minimal and as you note it
           | pays for itself.
        
         | n_plus_1_acc wrote:
         | This is also problematic, because there might be padding and
         | the calculated size might be too small:
         | 
         | `malloc(sizeof(*node) + data_size);`
        
           | o11c wrote:
           | There's no a problem with the author's _current_ code, since
           | the padding is already included in the node size, but it
           | would be a problem after doing alignment more intelligently.
        
         | kccqzy wrote:
         | > it should be `int main() { List(Foo) foo_list = {NULL};`
         | 
         | In C `int main()` means the function takes an unknown number of
         | arguments. You need `int main(void)` to mean it doesn't take
         | any arguments. This is a fact frequently forgotten by those who
         | write C++.
        
           | tedunangst wrote:
           | This is incorrect. In a function definition, an empty list
           | means it takes no parameters. 6.7.5.3 Function declarators
           | 
           | > 14. An empty list in a function declarator that is part of
           | a definition of that function specifies that the function has
           | no parameters.
        
             | s3graham wrote:
             | As you surely know if you're quoting the standard, it
             | depends on which standard!
        
               | gpderetta wrote:
               | I believe that since C23 foo() is now a nullary function.
               | As this is the last approved standard and it supersedes
               | all previous standards, it is technically correct to say
               | that de-jure this is what the (unqualified) C standard
               | mandates.
               | 
               | Of course de-facto things are more nunanced.
        
               | el_pollo_diablo wrote:
               | C23 does not change anything in this situation, because
               | we are talking about the _definition_ of main(), not a
               | forward declaration. More details here:
               | 
               | https://news.ycombinator.com/item?id=38729278#38732366
        
               | tedunangst wrote:
               | Quote a different standard.
        
           | flohofwoe wrote:
           | That had been harmonized with C++ in C23 (e.g. func() is
           | equivalent with func(void) now).
           | 
           | It's not really relevant for main() though, even in older C
           | versions main() works fine and simply means "I don't need
           | argc and argv".
        
             | el_pollo_diablo wrote:
             | This is about a function definition, not a random function
             | declarator. C23 does not change anything in that case.
        
       | gritzko wrote:
       | Hi. I object.
       | 
       | The trick#0 you mention is how I made an entire C dialect. Here
       | is a generic binary heap, for example
       | https://github.com/gritzko/librdx/blob/master/abc/HEAPx.h The
       | syntax is a bit heavyweight, but a huge huge advantage is: you
       | get regular C structs in the end, very plain, very predictable,
       | very optimizable. Compiler would eat them like donuts.
       | 
       | In the other cases, it is void* and runtime memory sizing and you
       | have to define macros anyway.
        
         | variadix wrote:
         | I agree, there are actually several reasons to prefer the
         | header impl. Debugging is better, both because you can step
         | through the header code where you can't with a macro function,
         | and because the type information available to the debugger is
         | better. There are more opportunities for compiler optimizations
         | because each instantiation is monomorphized and you don't pay a
         | runtime cost with variable sizing, generic structures can also
         | be placed on the stack because of the fixed sizing.
         | 
         | There are workarounds for at least two of the problems the
         | author mentions. Naming can be changed from Bar_func(args...)
         | to func(Bar)(args...) with a function name macro that just does
         | name mangling. You can avoid some of the binary bloat by using
         | weak symbols, letting the linker deduplicate functions shared
         | between translation units at link time.
         | 
         | There are other problems for generic containers of pointer
         | types however, you can work around them by using a typedef or a
         | type alias.
         | 
         | Intrusive data structures are more convenient in C still, but
         | working with them in a debugger is a pain.
        
           | dhooper wrote:
           | Author here. It's worth noting that no work is being done in
           | the macros of my article, they compile down to a normal c
           | function call which you can step through in a debugger.
           | 
           | There is little benefit in monomorphizing the implementation
           | of a data structure like a linked list where its behavior
           | doesn't depend on the contents of the data it contains
           | (compared to, say, a max heap)
        
         | dhooper wrote:
         | Author here. Binary heaps and linked lists are different use
         | cases. A binary heap must read the data you put in it to store
         | it correctly, but a linked list doesn't. If I were writing a
         | generic binary heap, maybe I would weigh my options
         | differently. I mentioned this in the footnotes.
        
           | wordglyph wrote:
           | And that's why I like C++ templates
        
         | knutwannheden wrote:
         | > Compiler would eat them like donuts.
         | 
         | Made me laugh out loud!
        
       | b0a04gl wrote:
       | what happens if two types have same size and alignment but
       | different semantics : like `int` vs `float` or `struct { int a;
       | }` vs `int`? does the type tag system catch accidental reuse .
       | aka defending against structual collisions
        
       | ape4 wrote:
       | Or write in CFront and have it translated to C
        
         | zabzonk wrote:
         | And where are you going to get a cfront compiler these days?
        
       | layer8 wrote:
       | The casting of the function type assumes that the item pointer
       | type (e.g. Foo*) has the same representation as void*, which the
       | C standard doesn't guarantee (in standardese: the two types
       | aren't "compatible"). Calling the function with the converted
       | type therefore constitutes undefined behavior. It also impacts
       | aliasing analysis by compilers (see [0], incidentally), even if
       | the pointer representation happens to be the same.
       | 
       | This casting of the functions to different argument types
       | constitutes the core of the type safety of the generic
       | invocations; I'm not sure it can be fixed.
       | 
       | [0] https://news.ycombinator.com/item?id=44421185
        
         | dhooper wrote:
         | This is addressed in the footnotes. casting is not the core of
         | the type safety. Read the whole article.
        
           | layer8 wrote:
           | Ah, that's what I get for not reading the footnotes. However,
           | the alternative solution presented evaluates the item
           | argument twice, which is problematic as well (but could
           | probably be worked around by passing `(list)->payload` on
           | instead). Secondly, the assignment for type-checking doesn't
           | work for read-only operations on a const List, or does it?
           | And doesn't the assignment overwrite the head? Lastly, the
           | do-while construction means you can't use it for operations
           | that return a value (without compiler extensions).
           | 
           | I also don't agree it's "squeamish" to be wary of aliasing
           | analysis going wrong. It's not a clean abstraction and can
           | hide subtle bugs down the road.
        
       | monkeyelite wrote:
       | Another way is to not try to write generic data structures. When
       | you tailor them to the use case you can simplify.
       | 
       | The #1 data structure in any program is array.
        
       | hgs3 wrote:
       | I'm curious what a hashmap looks like with this approach. It's
       | one thing to pass through or hold onto a generic value, but
       | another to perform operations on it. Think computing the hash
       | value or comparing equality of generic keys in a generic hashmap.
        
       | mfuzzey wrote:
       | There's also the method used in the Linux kernel to embed the
       | list information (struct list_head) within the type specific
       | struct. https://kernelnewbies.org/FAQ/LinkedLists
        
         | nixpulvis wrote:
         | The naming of LIST_HEAD_INIT and INIT_LIST_HEAD is confusing to
         | me.
        
           | mfuzzey wrote:
           | The way I remember it is:
           | 
           | INIT_LIST_HEAD is of form VERB_NOUN so is called from within
           | a function to programatically initialise the list.
           | 
           | LIST_HEAD_INIT is NOUN_VERB and is used within a structure
           | initialiser not from a function.
           | 
           | But my main point was to show the "embed the list in the
           | data" approach rather than "embed the data in the list" or
           | "point to the data from the list" and not to discuss the
           | naming details in the kernel implementation of the concept.
        
       | cherryteastain wrote:
       | Why would you jump through all these hoops instead of just
       | writing C++ if you want "C with generics"
        
         | Kranar wrote:
         | Because for many of the use cases where C is used, switching to
         | C++ involves jumping through even more hoops.
        
           | lionkor wrote:
           | Do you have a couple of real world examples?
        
             | adastra22 wrote:
             | Embedded systems, for example.
        
               | teraflop wrote:
               | I know it used to be, but is it really still common for
               | embedded systems to use weird architectures that
               | G++/Clang don't support?
        
               | adastra22 wrote:
               | Unless it is a popular system or common architecture,
               | yes.
        
             | mikepurvis wrote:
             | Any established C codebase, for example the kernel or
             | Postgres?
             | 
             | Traditionally microcontroller firmwares as well, though
             | those are increasingly friendly to C++, you just have to be
             | careful about allocations as C++ makes it way easier to
             | accidentally allocate than C does.
        
               | neonz80 wrote:
               | I'm not sure about other compilers, but compiling C code
               | as C++ with MSVC ends up with pretty much the exact same
               | code, instruction by instruction. C++ is a bit more
               | strict though especially with casting, so a lot of code
               | won't compile out of the box.
        
               | vbezhenar wrote:
               | C++ code compiles to a different function names in object
               | file (name mangling). You probably need to put a lot of
               | ugly `#ifdef __cplusplus extern "C" {` boilerplate in
               | your headers, otherwise C and C++ files will not compile
               | together.
        
             | rectang wrote:
             | Writing extensions for projects that support C extensions
             | but may not support C++ extensions, e.g. many dynamic
             | languages.
        
               | Snarwin wrote:
               | You can still write the extension in C++ and expose an
               | extern "C" interface.
        
               | rectang wrote:
               | That's possible, but then the people building your
               | extension need a C++ toolchain.
               | 
               | The question was "please provide examples where switching
               | to C++ involves jumping through even more hoops", and in
               | my view requiring downstream to use a C++ environment
               | when they're expecting to use a C environment qualifies.
        
               | uecker wrote:
               | True. For me, C++ itself is the maze of hoops I would
               | rather want to avoid.
        
       | WalterBright wrote:
       | Here's how to do it in D:                   struct ListNode(T) {
       | ListNode* next;             T data;         }              T!int
       | node;
       | 
       | Why suffer the C preprocessor? Using preprocessor macros is like
       | using a hammer for finish carpentry, rather than a nail gun. A
       | nail gun is 10x faster, drives the nail perfectly every time, and
       | no half moon dents in your work.
        
         | dhooper wrote:
         | Thanks, this post is about C.
         | 
         | On some projects you must use C.
        
           | WalterBright wrote:
           | If I may may be provocative :-) this post isn't about C. It's
           | about layering on a custom language using C preprocessor
           | macros.
           | 
           | My compilers were originally written in C. I started using
           | the C preprocessor to do metaprogramming. After some years I
           | got fed up with it and removed nearly all of the preprocessor
           | use, and never looked back. My code was much easier to
           | understand.
           | 
           | An amusing story: long ago, a friend of mine working for
           | Microsoft was told by a team leader that a 50K program had a
           | bug in it, and sadly the developer was long gone. He'd
           | assigned programmer after programmer to it, who could not fix
           | it. My friend said he'd give it a try, and had it fixed in 2
           | hours.
           | 
           | The source code was written in Microsoft MASM, where the M
           | stood for "Macro". You can guess where this is going. The
           | developer had invented his own high level language using the
           | macro system (which was much more powerful than C's).
           | Unfortunately, he neglected to document it, and the other
           | programmers spent weeks studying it and could not figure it
           | out.
           | 
           | The leader, astonished, asked him how he figured it out in 2
           | hours? My friend said simple. He assembled it to object code,
           | then disassembled the object code with obj2asm (a
           | disassembler I wrote that converts object code back to source
           | code). He then immediately found and fixed the bug, and
           | checked in the "new" source code which was the disassembled
           | version.
           | 
           | I've seen many very smart and clever uses of the C macros,
           | the article is one of them. But isn't it time to move on?
        
             | ryao wrote:
             | If the C compiler accepts it, it is C.
        
       | ryao wrote:
       | uint64_t data[] in level 2 violates the strict aliasing rule. Use
       | the char type instead to avoid the violation.
        
       | david2ndaccount wrote:
       | The "typeof on old compilers" section contains the code:
       | (list)->payload = (item); /* just for type checking */\
       | 
       | That is not a no-op. That is overwriting the list head with your
       | (item). Did you mean to wrap it in an `if(0)`?
        
       ___________________________________________________________________
       (page generated 2025-06-30 23:00 UTC)