[HN Gopher] I write type-safe generic data structures in C
___________________________________________________________________
I write type-safe generic data structures in C
Author : todsacerdoti
Score : 188 points
Date : 2025-06-30 16:55 UTC (6 hours ago)
(HTM) web link (danielchasehooper.com)
(TXT) w3m dump (danielchasehooper.com)
| uecker wrote:
| It is cool trick. I already use in my experimental library though
| ;-) https://github.com/uecker/noplate/blob/main/src/list.h
| eqvinox wrote:
| I guess if anyone might know it might be you--do you see any
| way of doing this for intrusive data structures, embedding the
| node struct in the data (and as side effect supporting an
| object to be on multiple containers) rather than the data in
| the node like you're doing there?
| uecker wrote:
| You could put the dummy member into the embedded node. But
| for intrusive data structures you often want them to erase
| the type so that you write generic algorithms as regular
| functions. In this case, it makes more sense to have a run-
| time check do to down casts. I do this with my variadic type
| which has an intrusive "super" member:
| https://godbolt.org/z/ofdKe7Pfv The overhead is often
| completely removed by the compiler.
| eqvinox wrote:
| Mhm. Putting the dummy member into the embedded node
| doesn't give a path from the proper object to find the
| embedded node "mid-struct". run-time checks are the "easy
| way out". We/I'll stick to macro soup probably, so we have
| compile-time checks.
|
| btw. For ISO C WG14... has anyone suggested adding _Include
| to the preprocessor, along the lines of _Pragma? It'd
| really help with doing this kind of really long macros,
| hiding the clunky "#define, #define, #define, #include"
| inside a macro...
| uecker wrote:
| I don't think anybody has proposed this, at least not
| recently. There is a proposal for multi-line macros:
| https://www.open-
| std.org/jtc1/sc22/wg14/www/docs/n3531.txt
| eqvinox wrote:
| That one might actually be better, nice to know & thanks
| for the pointer! (I hope it actually goes somewhere...)
| asplake wrote:
| Interesting! I'm working on toy/educational generator of ML-style
| tagged variants and associated functions in C (for a compiler)
| and when I'm a bit further along I will see if they're
| compatible.
| HexDecOctBin wrote:
| The key idea here seems to be to use function pointer's type to
| enforce type safety rather than using the data "handle" type
| (that is often found in implementations inspired by Sean
| Barrett's strechy_buffers).
|
| > One annoying thing about C is that it does not consider these
| two variables to have the same type
|
| C23 solves that too: https://www.open-
| std.org/jtc1/sc22/wg14/www/docs/n3037.pdf
|
| Supported by latest GCC and Clang, but not by MSVC.
| dhooper wrote:
| Author here. Not quite. The key idea is about using a union to
| associate type information with a generic data type. Type
| casting a function is not the only way to use that type
| information. I discuss that as well as the C23 changes in the
| footnotes and the "typeof on old compilers" section.
| wahern wrote:
| FWIW, as far back as 2015 my feature check library documents
| Visual Studio as supporting "__typeof".[1] Note the leading
| but not trailing underscores. Perhaps I was mistaken, but I
| usually tested that sort of thing. It's also possible
| __typeof had slightly different semantics.
|
| [1] See https://github.com/wahern/autoguess/blob/b44556e4/con
| fig.h.g... (that's the 2015 revision, but HEAD has the same
| code).
| dhooper wrote:
| msvc 19.39 is the first to support it, which I mention in
| the article. You can confirm it _didn 't_ work up through
| 19.38 in godbolt [1]. I don't use Visual Studio, so I don't
| know what version of that first started using msvc 19.39
|
| [1] https://godbolt.org/z/M7zPYdssP
| eqvinox wrote:
| Interesting idea with the union and using typeof(). We (I) went
| with large macros defining wrappers instead, which, I believe, is
| a necessity with intrusive data structures, or at least I don't
| immediately see how to do that with unions & typeof. Maybe it's
| possible...?
|
| e.g. hash table wrapper:
| https://github.com/FRRouting/frr/blob/master/lib/typesafe.h#...
|
| (cf. https://docs.frrouting.org/projects/dev-
| guide/en/latest/list...)
| notnmeyer wrote:
| pretty sure C is the new Go.
| revskill wrote:
| Without the concurreny part.
| oflebbe wrote:
| OpenMP to the rescue
| sltkr wrote:
| Or garbage collection. Or interfaces. Or packages. Or actual
| generics.
| qustrolabe wrote:
| pretty sure C has to go
| o11c wrote:
| For your level 2 code, `uint64_t data[];` is wrong for types
| whose alignment is greater than `uint64_t`, and also wasteful for
| types whose alignment is smaller (for example, under an ilp32 ABI
| on 64-bit architectures).
|
| For your level 3 code, it should be `int main() { List(Foo)
| foo_list = {NULL};`
|
| Note that working around a lack of `typeof` means you can't
| return anything. Also, your particular workaround allows
| `const`ness errors since `==` is symmetrical.
|
| You can't safely omit `payload` since you need it to know the
| correct size. Consider a `List(int64_t)` and you try to add an
| `int32_t` to it - this should be fine, but you can't `sizeof` the
| `int32_t`. Your code is actually lacking quite a bit to make this
| work.
|
| =====
|
| There are 2 major limitations to generics in C right now:
|
| * Delegating to a vtable (internal or external) is limited in
| functionality, since structs cannot contain macros, only
| functions.
|
| * Delegating to an external vtable (mandatory to avoid overhead)
| means that you have to forward-declare _all_ of the types you 'll
| ever use a vtable with. So far the best approach I've found is to
| declare (but not define) static functions in the same forwarding
| header I declare the typedefs in; note that GCC and Clang differ
| in what phase the "undefined static" warning appears in for the
| case where you don't actually include that particular type's
| header in a given TU.
|
| (think about writing a function that accepts either `struct
| SizedBuffer {void *p; size_t len;};` or `struct BoundedBuffer
| {void *begin; void *end;};`, and also const versions thereof -
| all from different headers).
| EPWN3D wrote:
| I would love for `union`s to be federated, that is, a type
| could declare itself as thought it was part of a union with
| another type, without having to pre-declare all possible types
| in one place.
| o11c wrote:
| For layout-compatible types, you can often just include a
| `_base` member in each child. Maybe twice (once named and
| once unnamed) to avoid excess typing - I don't understand the
| common-initial-subsequence rule but people do this enough
| that compilers have to allow it.
| rectang wrote:
| > _Delegating to an external vtable (mandatory to avoid
| overhead) means that you have to forward-declare all of the
| types you 'll ever use a vtable with._
|
| We went down the rabbit hole of writing a compiler for this as
| part of a project I used to work on (Apache Clownfish[1], a
| subproject of the retired Apache Lucy project). We started off
| parsing .h files, but eventually it made sense to create our
| own small header language (.cfh "Clownfish Header" files).
|
| Here's some generated code for invoking the CharBuf version of
| the "Clone" method defined in parent class "Obj":
| typedef cfish_CharBuf*
| (*CFISH_CharBuf_Clone_t)(cfish_CharBuf* self);
| extern uint32_t CFISH_CharBuf_Clone_OFFSET; static
| inline cfish_CharBuf*
| CFISH_CharBuf_Clone(cfish_CharBuf* self) { const
| CFISH_CharBuf_Clone_t method =
| (CFISH_CharBuf_Clone_t)cfish_obj_method(
| self, CFISH_CharBuf_Clone_OFFSET
| ); return method(self); }
|
| Usage: cfish_CharBuf *charbuf =
| cfish_CharBuf_new(); cfish_CharBuf *clone =
| CFISH_CharBuf_Clone(charbuf);
|
| We had our reasons for going to these extremes: the point of
| Clownfish was to provide a least-common-denominator object
| model for bindings to multiple dynamic languages (similar
| problem domain to SWIG), and the .cfh files also were used to
| derive types for the binding languages. But there was truly an
| absurd amount of boilerplate being generated to get around the
| issue you identify.
|
| This is why almost everybody just uses casts to void* for the
| invocant, skipping type safety.
|
| [1] https://github.com/apache/lucy-clownfish
| zem wrote:
| i am firmly of the opinion that compiling to c is a better
| route than doing clever c tricks to sort of get what you
| want. the compiler can be pretty minimal and as you note it
| pays for itself.
| n_plus_1_acc wrote:
| This is also problematic, because there might be padding and
| the calculated size might be too small:
|
| `malloc(sizeof(*node) + data_size);`
| o11c wrote:
| There's no a problem with the author's _current_ code, since
| the padding is already included in the node size, but it
| would be a problem after doing alignment more intelligently.
| kccqzy wrote:
| > it should be `int main() { List(Foo) foo_list = {NULL};`
|
| In C `int main()` means the function takes an unknown number of
| arguments. You need `int main(void)` to mean it doesn't take
| any arguments. This is a fact frequently forgotten by those who
| write C++.
| tedunangst wrote:
| This is incorrect. In a function definition, an empty list
| means it takes no parameters. 6.7.5.3 Function declarators
|
| > 14. An empty list in a function declarator that is part of
| a definition of that function specifies that the function has
| no parameters.
| s3graham wrote:
| As you surely know if you're quoting the standard, it
| depends on which standard!
| gpderetta wrote:
| I believe that since C23 foo() is now a nullary function.
| As this is the last approved standard and it supersedes
| all previous standards, it is technically correct to say
| that de-jure this is what the (unqualified) C standard
| mandates.
|
| Of course de-facto things are more nunanced.
| el_pollo_diablo wrote:
| C23 does not change anything in this situation, because
| we are talking about the _definition_ of main(), not a
| forward declaration. More details here:
|
| https://news.ycombinator.com/item?id=38729278#38732366
| tedunangst wrote:
| Quote a different standard.
| flohofwoe wrote:
| That had been harmonized with C++ in C23 (e.g. func() is
| equivalent with func(void) now).
|
| It's not really relevant for main() though, even in older C
| versions main() works fine and simply means "I don't need
| argc and argv".
| el_pollo_diablo wrote:
| This is about a function definition, not a random function
| declarator. C23 does not change anything in that case.
| gritzko wrote:
| Hi. I object.
|
| The trick#0 you mention is how I made an entire C dialect. Here
| is a generic binary heap, for example
| https://github.com/gritzko/librdx/blob/master/abc/HEAPx.h The
| syntax is a bit heavyweight, but a huge huge advantage is: you
| get regular C structs in the end, very plain, very predictable,
| very optimizable. Compiler would eat them like donuts.
|
| In the other cases, it is void* and runtime memory sizing and you
| have to define macros anyway.
| variadix wrote:
| I agree, there are actually several reasons to prefer the
| header impl. Debugging is better, both because you can step
| through the header code where you can't with a macro function,
| and because the type information available to the debugger is
| better. There are more opportunities for compiler optimizations
| because each instantiation is monomorphized and you don't pay a
| runtime cost with variable sizing, generic structures can also
| be placed on the stack because of the fixed sizing.
|
| There are workarounds for at least two of the problems the
| author mentions. Naming can be changed from Bar_func(args...)
| to func(Bar)(args...) with a function name macro that just does
| name mangling. You can avoid some of the binary bloat by using
| weak symbols, letting the linker deduplicate functions shared
| between translation units at link time.
|
| There are other problems for generic containers of pointer
| types however, you can work around them by using a typedef or a
| type alias.
|
| Intrusive data structures are more convenient in C still, but
| working with them in a debugger is a pain.
| dhooper wrote:
| Author here. It's worth noting that no work is being done in
| the macros of my article, they compile down to a normal c
| function call which you can step through in a debugger.
|
| There is little benefit in monomorphizing the implementation
| of a data structure like a linked list where its behavior
| doesn't depend on the contents of the data it contains
| (compared to, say, a max heap)
| dhooper wrote:
| Author here. Binary heaps and linked lists are different use
| cases. A binary heap must read the data you put in it to store
| it correctly, but a linked list doesn't. If I were writing a
| generic binary heap, maybe I would weigh my options
| differently. I mentioned this in the footnotes.
| wordglyph wrote:
| And that's why I like C++ templates
| knutwannheden wrote:
| > Compiler would eat them like donuts.
|
| Made me laugh out loud!
| b0a04gl wrote:
| what happens if two types have same size and alignment but
| different semantics : like `int` vs `float` or `struct { int a;
| }` vs `int`? does the type tag system catch accidental reuse .
| aka defending against structual collisions
| ape4 wrote:
| Or write in CFront and have it translated to C
| zabzonk wrote:
| And where are you going to get a cfront compiler these days?
| layer8 wrote:
| The casting of the function type assumes that the item pointer
| type (e.g. Foo*) has the same representation as void*, which the
| C standard doesn't guarantee (in standardese: the two types
| aren't "compatible"). Calling the function with the converted
| type therefore constitutes undefined behavior. It also impacts
| aliasing analysis by compilers (see [0], incidentally), even if
| the pointer representation happens to be the same.
|
| This casting of the functions to different argument types
| constitutes the core of the type safety of the generic
| invocations; I'm not sure it can be fixed.
|
| [0] https://news.ycombinator.com/item?id=44421185
| dhooper wrote:
| This is addressed in the footnotes. casting is not the core of
| the type safety. Read the whole article.
| layer8 wrote:
| Ah, that's what I get for not reading the footnotes. However,
| the alternative solution presented evaluates the item
| argument twice, which is problematic as well (but could
| probably be worked around by passing `(list)->payload` on
| instead). Secondly, the assignment for type-checking doesn't
| work for read-only operations on a const List, or does it?
| And doesn't the assignment overwrite the head? Lastly, the
| do-while construction means you can't use it for operations
| that return a value (without compiler extensions).
|
| I also don't agree it's "squeamish" to be wary of aliasing
| analysis going wrong. It's not a clean abstraction and can
| hide subtle bugs down the road.
| monkeyelite wrote:
| Another way is to not try to write generic data structures. When
| you tailor them to the use case you can simplify.
|
| The #1 data structure in any program is array.
| hgs3 wrote:
| I'm curious what a hashmap looks like with this approach. It's
| one thing to pass through or hold onto a generic value, but
| another to perform operations on it. Think computing the hash
| value or comparing equality of generic keys in a generic hashmap.
| mfuzzey wrote:
| There's also the method used in the Linux kernel to embed the
| list information (struct list_head) within the type specific
| struct. https://kernelnewbies.org/FAQ/LinkedLists
| nixpulvis wrote:
| The naming of LIST_HEAD_INIT and INIT_LIST_HEAD is confusing to
| me.
| mfuzzey wrote:
| The way I remember it is:
|
| INIT_LIST_HEAD is of form VERB_NOUN so is called from within
| a function to programatically initialise the list.
|
| LIST_HEAD_INIT is NOUN_VERB and is used within a structure
| initialiser not from a function.
|
| But my main point was to show the "embed the list in the
| data" approach rather than "embed the data in the list" or
| "point to the data from the list" and not to discuss the
| naming details in the kernel implementation of the concept.
| cherryteastain wrote:
| Why would you jump through all these hoops instead of just
| writing C++ if you want "C with generics"
| Kranar wrote:
| Because for many of the use cases where C is used, switching to
| C++ involves jumping through even more hoops.
| lionkor wrote:
| Do you have a couple of real world examples?
| adastra22 wrote:
| Embedded systems, for example.
| teraflop wrote:
| I know it used to be, but is it really still common for
| embedded systems to use weird architectures that
| G++/Clang don't support?
| adastra22 wrote:
| Unless it is a popular system or common architecture,
| yes.
| mikepurvis wrote:
| Any established C codebase, for example the kernel or
| Postgres?
|
| Traditionally microcontroller firmwares as well, though
| those are increasingly friendly to C++, you just have to be
| careful about allocations as C++ makes it way easier to
| accidentally allocate than C does.
| neonz80 wrote:
| I'm not sure about other compilers, but compiling C code
| as C++ with MSVC ends up with pretty much the exact same
| code, instruction by instruction. C++ is a bit more
| strict though especially with casting, so a lot of code
| won't compile out of the box.
| vbezhenar wrote:
| C++ code compiles to a different function names in object
| file (name mangling). You probably need to put a lot of
| ugly `#ifdef __cplusplus extern "C" {` boilerplate in
| your headers, otherwise C and C++ files will not compile
| together.
| rectang wrote:
| Writing extensions for projects that support C extensions
| but may not support C++ extensions, e.g. many dynamic
| languages.
| Snarwin wrote:
| You can still write the extension in C++ and expose an
| extern "C" interface.
| rectang wrote:
| That's possible, but then the people building your
| extension need a C++ toolchain.
|
| The question was "please provide examples where switching
| to C++ involves jumping through even more hoops", and in
| my view requiring downstream to use a C++ environment
| when they're expecting to use a C environment qualifies.
| uecker wrote:
| True. For me, C++ itself is the maze of hoops I would
| rather want to avoid.
| WalterBright wrote:
| Here's how to do it in D: struct ListNode(T) {
| ListNode* next; T data; } T!int
| node;
|
| Why suffer the C preprocessor? Using preprocessor macros is like
| using a hammer for finish carpentry, rather than a nail gun. A
| nail gun is 10x faster, drives the nail perfectly every time, and
| no half moon dents in your work.
| dhooper wrote:
| Thanks, this post is about C.
|
| On some projects you must use C.
| WalterBright wrote:
| If I may may be provocative :-) this post isn't about C. It's
| about layering on a custom language using C preprocessor
| macros.
|
| My compilers were originally written in C. I started using
| the C preprocessor to do metaprogramming. After some years I
| got fed up with it and removed nearly all of the preprocessor
| use, and never looked back. My code was much easier to
| understand.
|
| An amusing story: long ago, a friend of mine working for
| Microsoft was told by a team leader that a 50K program had a
| bug in it, and sadly the developer was long gone. He'd
| assigned programmer after programmer to it, who could not fix
| it. My friend said he'd give it a try, and had it fixed in 2
| hours.
|
| The source code was written in Microsoft MASM, where the M
| stood for "Macro". You can guess where this is going. The
| developer had invented his own high level language using the
| macro system (which was much more powerful than C's).
| Unfortunately, he neglected to document it, and the other
| programmers spent weeks studying it and could not figure it
| out.
|
| The leader, astonished, asked him how he figured it out in 2
| hours? My friend said simple. He assembled it to object code,
| then disassembled the object code with obj2asm (a
| disassembler I wrote that converts object code back to source
| code). He then immediately found and fixed the bug, and
| checked in the "new" source code which was the disassembled
| version.
|
| I've seen many very smart and clever uses of the C macros,
| the article is one of them. But isn't it time to move on?
| ryao wrote:
| If the C compiler accepts it, it is C.
| ryao wrote:
| uint64_t data[] in level 2 violates the strict aliasing rule. Use
| the char type instead to avoid the violation.
| david2ndaccount wrote:
| The "typeof on old compilers" section contains the code:
| (list)->payload = (item); /* just for type checking */\
|
| That is not a no-op. That is overwriting the list head with your
| (item). Did you mean to wrap it in an `if(0)`?
___________________________________________________________________
(page generated 2025-06-30 23:00 UTC)