[HN Gopher] Parameterized types in C using the new tag compatibi...
       ___________________________________________________________________
        
       Parameterized types in C using the new tag compatibility rule
        
       Author : ingve
       Score  : 123 points
       Date   : 2025-06-27 05:31 UTC (17 hours ago)
        
 (HTM) web link (nullprogram.com)
 (TXT) w3m dump (nullprogram.com)
        
       | unwind wrote:
       | I think this is an interesting change. Even though I (as someone
       | who has loved C for 30+ years and uses it daily in a professional
       | capacity) don't immediately see a lot of use-cases, I'm sure they
       | can be found, as the author demonstrates. Cool, and a good post!
        
         | glouwbug wrote:
         | Combined with C23's auto (see vec_for) you can technically
         | backport the entirety of C++'s STL (of course with skeeto's
         | limitation in his last paragraph in mind). gcc -std=c23. It is
         | a _very_ useful feature for even the mundane, like resizable
         | arrays:
         | 
         |     #include <stdlib.h>
         |     #include <stdio.h>
         | 
         |     #define vec(T) struct { T* val; int size; int cap; }
         | 
         |     #define vec_push(self, x) {                                             \
         |         if((self).size == (self).cap) {                                     \
         |             (self).cap = (self).cap == 0 ? 1 : 2 * (self).cap;              \
         |             (self).val = realloc((self).val, sizeof(*(self).val) * (self).cap); \
         |         }                                                                   \
         |         (self).val[(self).size++] = x;                                      \
         |     }
         | 
         |     #define vec_for(self, at, ...)             \
         |         for(int i = 0; i < (self).size; i++) { \
         |             auto at = &(self).val[i];          \
         |             __VA_ARGS__                        \
         |         }
         | 
         |     typedef vec(char) string;
         | 
         |     void string_push(string* self, char* chars)
         |     {
         |         if(self->size > 0)
         |         {
         |             self->size -= 1;
         |         }
         |         while(*chars)
         |         {
         |             vec_push(*self, *chars++);
         |         }
         |         vec_push(*self, '\0');
         |     }
         | 
         |     int main()
         |     {
         |         vec(int) a = {};
         |         vec_push(a, 1);
         |         vec_push(a, 2);
         |         vec_push(a, 3);
         |         vec_for(a, at, {
         |             printf("%d\n", *at);
         |         });
         |         vec(double) b = {};
         |         vec_push(b, 1.0);
         |         vec_push(b, 2.0);
         |         vec_push(b, 3.0);
         |         vec_for(b, at, {
         |             printf("%f\n", *at);
         |         });
         |         string c = {};
         |         string_push(&c, "this is a test");
         |         string_push(&c, " ");
         |         string_push(&c, "for c23");
         |         printf("%s\n", c.val);
         |     }
        
           | int_19h wrote:
           | What I don't quite get is why they didn't go all the way
           | and basically enable full-fledged structural typing for
           | anonymous structs.
        
             | uecker wrote:
             | That was my plan, but the committee had concerns about type
             | safety.
        
               | int_19h wrote:
               | This would probably need some special rules around stuff
               | like:
               | 
               |     typedef struct { ... } foo_t;
               |     typedef struct { ... } bar_t;
               |     foo_t foo = (bar_t){ ... };
               | 
               | i.e. these are meant to be named types and thus should
               | remain nominal even though it's technically a typedef.
               | And ditto for similarly defined pointer types etc. But
               | this is a pattern regular enough that it can just be
               | special-cased while still allowing proper structural
               | typing for cases where that's obviously what is intended
               | (i.e. basically everywhere else).
        
       | Surac wrote:
       | i fear this will make sloppy code compile OK more often.
        
         | ioasuncvinvaer wrote:
         | Can you give an example?
        
         | poly2it wrote:
         | Dear God I hope nobody is committing unreviewed LLM output in C
         | codebases.
        
           | pests wrote:
           | No worries, the LLM commits it for you.
        
           | pjmlp wrote:
           | Eventually they will generate executables directly.
        
       | rwmj wrote:
       | Slightly off-topic, why is he using ptrdiff_t (instead of size_t)
       | for the cap & len types?
        
         | r1chardnl wrote:
         | From one of his other blogposts, "Guidelines for computing
         | sizes and subscripts":
         | 
         |     Never mix unsigned and signed operands. Prefer signed. If
         |     you need to convert an operand, see (2).
         | 
         | https://nullprogram.com/blog/2024/05/24/
         | 
         | https://www.youtube.com/watch?v=wvtFGa6XJDU
        
           | poly2it wrote:
           | I still don't understand how these arguments make sense for
           | new code. Naturally, sizes should be unsigned because they
           | represent values which cannot be negative. If you do
           | pointer/size arithmetic, the only solution to avoid overflows
           | is to overflow-check and range-check before computation.
           | 
           | You cannot even check the signedness of a signed size to
           | detect an overflow, because signed overflow is undefined!
           | 
           | The remaining argument from what I can tell is that
           | comparisons between signed and unsigned sizes are bug-prone.
           | There is however, a dedicated warning to resolve this
           | instantly.
           | 
           | It makes sense that you should be able to assign a pointer to
           | a size. If the size is signed, this cannot be done due to its
           | smaller capacity.
           | 
           | Given this, I can't understand the justification. I'm
           | currently using unsigned sizes. If you have anything
           | contradicting, please comment :^)
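A minimal sketch of the "overflow-check before computation" approach described in the comment above, for both an unsigned and a signed size type. The helper names `checked_add_size` and `checked_add_ptrdiff` are hypothetical, not from the article or any library. In both cases the test happens before the addition, which for the signed version is what avoids relying on undefined behavior.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned sizes: wraparound is well defined, but the check still has
   to come first if the wrapped value must never be used. */
bool checked_add_size(size_t a, size_t b, size_t *out)
{
    if (a > SIZE_MAX - b) {
        return false;               /* a + b would wrap around */
    }
    *out = a + b;
    return true;
}

/* Signed sizes (non-negative operands only): inspecting the sign of
   a + b afterwards is not an option, because the overflowing addition
   itself would already be undefined behavior. */
bool checked_add_ptrdiff(ptrdiff_t a, ptrdiff_t b, ptrdiff_t *out)
{
    if (a < 0 || b < 0 || a > PTRDIFF_MAX - b) {
        return false;
    }
    *out = a + b;
    return true;
}
```

Either helper would be called as `if (!checked_add_size(len, extra, &total)) { /* handle error */ }`; the shape of the check is the same regardless of signedness.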
        
             | sim7c00 wrote:
             | I dont know either.
             | 
             | int somearray[10];
             | 
             | new_ptr = somearray + signed_value;
             | 
             | or
             | 
             | element = somearray[signedvalue];
             | 
             | this seems almost criminal to how my brain does logic/C
             | code.
             | 
             | The only thing i could think of is this:
             | 
             | somearray+=11; somearray[-1] // index set to somearray[10]
             | ??
             | 
             | if i'd see my CPU execute that i'd want it to please stop.
             | I'd want my compiler to shout at me like a little child,
             | and be mean until i do better.
             | 
             | -Wall -Wextra -Wpedantic <-- that should flag, i think,
             | any of these weird practices.
             | 
             | As you stated tho, i'd be keen to learn why i am wrong!
        
               | windward wrote:
               | In the implementation of something like a deque or merge
               | sort, you could have a variable that represents offsets
               | from pointers but which could sensibly be negative. C
               | developers culturally aren't as particular about
               | theoretical correctness of types as developers in some
               | other languages - there's a lot of implicit casting being
               | used - so you'll typically see an `int` used for this. If
               | you do wish to bring some rigidity to your type system,
               | you may argue that this value is distinct from a general
               | integer which could be used for any arithmetic and
               | definitely not just a pointer. So it should be a signed
               | pointer difference.
               | 
               | Arrays aren't the best example, since they are inherently
               | about linear, scalar offsets, but you might see a
               | negative offset from the start of a (decayed) array in
               | the implementation of an allocator with clobber canaries
               | before and after the data.
        
               | mandarax8 wrote:
               | Any kind of relative/offset pointers require negative
               | pointer arithmetic.
               | https://www.gingerbill.org/article/2020/05/17/relative-
               | point...
        
               | poly2it wrote:
               | I don't think you can make such a broad statement and be
               | correct in all cases. Negative pointer arithmetic is not
               | by itself a reason to use signed types, except if you
               | are:
               | 
               | 1. Certain your added value is negative.
               | 
               | 2. Checking for underflows after computation, which you
               | shouldn't.
               | 
               | The article was interesting.
        
             | ncruces wrote:
             | > It makes sense that you should be able to assign a
             | pointer to a size. If the size is signed, this cannot be
             | done due to its smaller capacity.
             | 
             | Why?
             | 
             | By the definition of ptrdiff_t, ISTM the size of any object
             | allocated by malloc _cannot_ be out of bounds of ptrdiff_t,
             | so I'm not sure how you can have a _useful_ size_t that
             | uses the sign bit?
        
             | foldr wrote:
             | Stroustrup believes that signed should be preferred to
             | unsigned even for values that can't be less than zero:
             | https://www.open-
             | std.org/jtc1/sc22/wg21/docs/papers/2019/p14...
        
               | poly2it wrote:
               | I've of course read his argument before, and I think it
               | might be more applicable to C++. I exclusively program in
               | C, and in that regard, the relevant aspects as far as I
               | can tell wouldn't be clearly in favour of a signed type.
               | I also think his discussion on iterator signedness mixes
               | issues with improper bounds checking and attributes it to
               | the size type signedness. What remains I cannot see
               | justifying the use of a signed type other than "just
               | because". I'm not sure it's applicable to C.
        
               | uecker wrote:
               | I also prefer signed types in C for sizes and indices.
               | You can screen for overflow bugs easily using UBSan (or
               | use it to prevent exploitation).
        
             | sparkie wrote:
             | C offers a different solution to the problem in Annex K of
             | the standard. It provides a type `rsize_t`, which like
             | `size_t` is unsigned, and has the same bit width, but where
             | `RSIZE_MAX` is recommended to be `SIZE_MAX >> 1` or
             | smaller. You perform bounds checking as `<= RSIZE_MAX` to
             | ensure that a value used for indexing is not in the range
             | that would be considered negative if converted to a signed
             | integer. A negative value provided where `rsize_t` is
             | expected would fail the check `<= RSIZE_MAX`.
             | 
             | IMO, this is a better approach than using signed types for
             | indexing, but AFAIK, it's not included in GCC/glibc or
             | gnulib. It's an optional extension and you're supposed to
             | define `__STDC_WANT_LIB_EXT1__` to use it.
             | 
             | I don't know if any compiler actually supports it. It came
             | from Microsoft and was submitted for standardization, but
             | ISO made some changes from Microsoft's own implementation.
             | 
             | https://www.open-
             | std.org/JTC1/SC22/WG14/www/docs/n1173.pdf#p...
             | 
             | https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1225.pdf
        
               | poly2it wrote:
               | This is an interesting middle ground. As ncruces pointed
               | out in a sibling comment, the sign bit in a pointer
               | cannot be set without contradicting the ptrdiff_t type.
               | That makes this seem like a reasonable approach to
               | storing sizes.
        
             | windward wrote:
             | Pointer arithmetic that could overflow would probably
             | involve a heap and therefore be less likely to require a
             | relative, negative offset. Just use the addresses and
             | errors you get from allocation.
        
               | poly2it wrote:
               | Yes, but there are definitely cases where this doesn't
               | apply, for example when deriving an offset from a user
               | pointer. As such this is not a universal solution.
        
             | uecker wrote:
             | "Naturally, sizes should be unsigned because they represent
             | values which cannot be negative."
             | 
             | Unsigned types in C have modular arithmetic, I think they
             | should be used exclusively when this is needed, or maybe if
             | you absolutely need the full range.
        
               | enqk wrote:
               | yeah unsigned are really about opting to perform modular
               | arithmetic, or for mapping hardware registers.
               | 
               | C is weakly typed, the basic types are really not to
               | maintain invariants or detect their violation
        
             | int_19h wrote:
             | > It makes sense that you should be able to assign a
             | pointer to a size. If the size is signed, this cannot be
             | done due to its smaller capacity.
             | 
             | You can, since the number of bits is the same. The mapping
             | of pointer bits to signed integer bits will mean that you
             | can't then do arithmetic on the resulting integers and get
             | meaningful results, but the behavior of such shenanigans is
             | already unspecified with no guarantees other than you can
             | get an integer out of a pointer and then convert it back
             | later.
             | 
             | But also, semantically, what does it even mean to convert a
             | single pointer to a size? A size of an object is naturally
             | defined as the count of chars between two pointers, one
             | pointing at the beginning of the object, the other at its
             | end. Which is to say, a size is a subset of pointer
             | difference that just happens to always be non-negative. So
             | long as the implementation guarantees that, for any object,
             | that non-negative difference will always fit in a signed
             | int of the appropriate size, it seems reasonable to reflect
             | this in the types.
        
         | rurban wrote:
         | Skeeto and Stroustrup are a bit confused about valid index
         | types. They prefer signed, which will lead to overflows on
         | negative values, but have the advantage of using only half of
         | the valid ranges, so there's more heap for the rest. Very
         | confused
        
         | foobar12345quux wrote:
         | Hi Rich, using ptrdiff_t is (alas) the right thing to do:
         | pointer subtraction returns that type, and if the result
         | doesn't fit, you get UB. And ptrdiff_t is a signed type.
         | 
         | Assume you successfully allocate an array "arr" with "sz"
         | elements, where "sz" is of type "size_t". Then "arr + sz" is a
         | valid expression (meaning the same as "&arr[sz]"), because it's
         | OK to compute a pointer one past the last element of an array
         | (but not to dereference it). Next you might be tempted to write
         | "arr + sz - arr" (meaning the same as "&arr[sz] - &arr[0]"),
         | and expect it to produce "sz", because it is valid to compute
         | the element offset difference between two "pointers into an
         | array or one past it". However, that difference is always
         | signed, and if "sz" does not fit into "ptrdiff_t", you get UB
         | from the pointer subtraction.
         | 
         | Given that the C standard (or even POSIX, AIUI) doesn't relate
         | ptrdiff_t and size_t to each other, we need to restrict array
         | element counts, before allocation, with two limits:
         | 
         | - nelem <= (size_t)-1 / sizeof(element_type)
         | 
         | - nelem <= PTRDIFF_MAX
         | 
         | (I forget which standard header #defines PTRDIFF_MAX;
         | surprisingly, it is not <limits.h>.)
         | 
         | In general, neither condition implies the other. However, once
         | you have enforced both, you can store the element count as
         | either "size_t" or "ptrdiff_t".
        
       | fuhsnn wrote:
       | The recent #def #enddef proposal[1] would eliminate the need for
       | backslashes to define readable macros, making this pattern much
       | more pleasant, fingers crossed for its inclusion in C2Y!
       | 
       | [1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3531.txt
        
         | cb321 wrote:
         | While long defs might be nice, even back in ANSI C89 you can
         | get rid of the backslash pattern (or need to cc -E and run
         | through GNU indent/whatever) by "flipping the script" and
         | defining whole files "parameterized" by their macro environment
         | like https://github.com/c-blake/bst or
         | https://github.com/glouw/ctl/
         | 
         | Add a namespacing macro and you have a whole generics system,
         | unlike that in TFA.
         | 
         | So, it might add more value to have the C std add an `#include
         | "file.c" name1=val1 name2=val2` preprocessor syntax where
         | name1, name2 would be on a "stack" and be popped after
         | processing the file. This would let you do
         | types/functions/whatever "generic modules" with manual
         | instantiation which kind of fits with C (manual management of
         | memory, bounds checking, etc.) but preprocessor-assisted "macro
         | scoping" for nested generics. Perhaps an idea to play with in
         | your slimcc fork?
        
           | glouwbug wrote:
           | I've been thinking of maybe doing CTL2 with this. Maybe if
           | #def makes it in.
        
             | cb321 wrote:
             | I think the #include extension could make vec_vec /
             | vec_list / lst_str type nesting more natural/maybe more
             | general, but maybe just my opinion. :-)
             | 
             | I guess ctags-type tools would need updating for the new
             | possible definition location. Mostly someone needs to
             | decide on a separation syntax for stuff like
             | `name1(..)=expansion1 name2(..)=expansion2` for "in-line"
             | cases. Compiler _programs_ have had `cc
             | -Dname(..)=expansion` or equivalents since the dawn of the
             | language, but they actually get the OS /argv idea of
             | separation from whatever CL args or Windows APIs or etc.
             | 
             | Anyway, might make sense to first get experience with a
             | slimcc/tinycc/gcc/clang cpp++ extension. ;-) Personally,
             | these days I mostly just use Nim as a better C.
        
         | hyperbolablabla wrote:
         | I really don't think the backslashes are that annoying? Seems
         | unnecessary to complicate the spec with stuff like this.
        
           | cb321 wrote:
           | FWIW, https://www.cs.cornell.edu/andru/ Andrew Myers had some
           | patch to gcc to do this back in the late 90s.
           | 
           | Anyway, as is so often the case, it's about the whole
           | ecosystem not just of tooling but the ecosystem of
           | assumptions about & around tooling.
           | 
           | As I mentioned in my other comment, if you want you can
           | always cc -E and re-format the code somehow, although the
           | main times you want to do that are for line-by-line stepping
           | in debuggers or maybe for other cases of "lines as source
           | coordinates" like line-by-line profilers.
           | 
           | Of course, a more elegant solution might be just having more
           | "adjustable step size/source coordinates" like "single
           | ';'-statement" or maybe "single sequence control point" in
           | debuggers rather than just "line orientation". This is, in fact, so
           | natural an idea that it seems a virtual certainty some C
           | debugger has an "expressional step/next", especially if
           | written by a fan more of Lisp than assembly. Of course, at
           | some point a library is just debugged/trusted, but if there
           | are "user hooks" those can be buggy. If it's performance
           | important, it may never be unwelcome to have better profile
           | reports.
           | 
           | While addr2line has been a thing forever, I've never heard of
           | an addr2expr - probably because "how would you label it?" So,
           | pros & cons, but easy for debugger/profilers is one reason I
           | think the parameterized file way is lower friction.
        
             | kreco wrote:
             | This Facebook repository also uses a new "extension" to do a
             | similar thing:
             | 
             | https://github.com/facebookresearch/CParser#multiline-macros
        
             | core-explorer wrote:
             | debugging information is more precise than line numbers, it
             | usually conveys line and column in a source file.
             | 
             | Some debuggers make use of it when displaying the current
             | program state, the major debuggers do not allow you to step
             | into a specific sub-call on a line (e.g. skip function
             | arguments and go straight to the outermost function call).
             | This is purely a UI issue, they have enough information. I
             | believe the nnd debugger has implemented selecting the call
             | to step into.
             | 
             | Addr2line could be amended. I am working on my own debugger
             | and I keep re-implementing existing command line tools as
             | part of my testing strategy. A finer-grained addr2line
             | sounds like a good exercise.
        
           | kreco wrote:
           | The backslashes themselves make the preprocessor way more
           | complicated for no real advantage (apart from when they're
           | unavoidable, like in macros).
           | 
           | For every single symbol you need to actually check if there
           | is a splice (backslash + new line) in it. For a single-pass
           | compiler, this contributes to a very slow lexing phase, as this
           | splice can appear _anywhere_ in C/C++ code.
        
             | jcelerier wrote:
             | I don't think this is optimizing for the right thing. I've
             | sat in front of hundreds of gcc & clang compile-time
             | traces, and lexing is a minuscule percentage of the time
             | spent in the compiler
        
       | tialaramex wrote:
       | It seems as though this makes it impossible to do the new-type
       | paradigm in C23 ? If Goose and Beaver differ only in their name,
       | C now thinks they're the same type so too bad we can tell a
       | Beaver to fly even though we deliberately required a Goose ?
        
         | yorwba wrote:
         | "Tag compatibility" means that the name has to be the same. The
         | issue the proposal is trying to address is that "struct Goose {
         | float weight; }" and "struct Goose { float weight; }" are
         | different types if declared in different locations of the same
         | translation unit, but the same if declared in different
         | translation units. With tag compatibility, they would always be
         | treated as being the same.
         | 
         | "struct Goose { float weight; }" and "struct Beaver { float
         | weight; }" would remain incompatible, as would "struct { float
         | weight; }" and "struct { float weight; }" (since they're
         | declared without tags.)
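As a sketch of what the same-tag rule allows, the macro below expands to a full struct definition with a tag derived from the element type, so every use redeclares the same tagged type. It assumes a compiler that implements the C23 rule (for example gcc with -std=c23, as in the comment further up); the `Pair` macro and function names are illustrative. The `#else` branch shows what is needed without the rule: one up-front definition per instantiation, then tag-only references.

```c
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
/* C23: repeating the definition is fine; same tag plus same members
   gives a compatible type every time the macro is expanded. */
#define Pair(T) struct Pair_##T { T first; T second; }
#else
/* Before C23: each instantiation is defined once, and the macro can
   only refer to the tag. */
struct Pair_int { int first; int second; };
#define Pair(T) struct Pair_##T
#endif

/* Return type, local variable and parameter each expand Pair(int). */
Pair(int) make_pair_int(int a, int b)
{
    Pair(int) p = { a, b };
    return p;
}

int sum_pair_int(Pair(int) p)
{
    return p.first + p.second;
}
```

With an untagged struct in the macro, as in `struct { T first; T second; }`, each expansion would still be a distinct, incompatible type, which matches the last point in the comment above.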
        
           | tialaramex wrote:
           | Ah, thanks, that makes sense.
        
       | JonChesterfield wrote:
       | Not personally interested in this hack, but
       | https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3037.pdf means
       | struct foo {} defined multiple times with the same fields in the
       | same TU now refers to the same thing instead of to UB and that is
       | a good bugfix.
        
       | IAmLiterallyAB wrote:
       | If you're reaching for that hack, just use C++? You don't have to
       | go all in on C++-isms, you can always write C-style C++ and only
       | use the features you need.
        
         | waynecochran wrote:
         | Not always a viable option -- especially for embedded and
         | systems programming.
        
           | _proofs wrote:
           | i work in an embedded space in the context of devices and
           | safety. if it were as simple as "just use c++ for these
           | projects" most of us would use a subset, and our newer
           | projects try to make this a requirement (we roll our own ETL
           | for example).
           | 
           | however for some niche os specific things, and existing
           | legacy products where oversight is involved, simply rolling
           | out a c++ porting of it on the next release is, well, not a
           | reality, and often not worth the bureaucratic investment.
           | 
           | while i have no commentary on the post because i'm not really
           | a c programmer, i think a lot of comments forget some
           | projects have requirements, and sometimes those requirements
           | become obsolete, but you're stuck with what you got until
           | gen2, or lazyloading standardization across teams.
        
         | pton_xd wrote:
         | Yeah as someone who writes C in C++, every time I see posts
         | bending over backwards trying to fit parameterized types into C
         | I just cringe a little. I understand the appeal of sticking to
         | pure C, but... why do that to yourself? Come on over, we've got
         | lambdas, and operator overloading for those special
         | circumstances... the water's fine!
        
           | pjmlp wrote:
           | Some people will do as much as they can to hurt themselves,
           | only to avoid using C++.
           | 
           | Note that the newer versions are basically a "C++ without
           | Classes" kind of thing.
        
             | glouwbug wrote:
             | I think the main appeal is subset lock-down and compile
             | times. ~5000 lines in C gets me sub second iteration times,
             | while ~5000 lines in C++ hits the 10 second mark. Including
             | both iostream and format in C++ gets any project up into
             | the ~1.5 second mark which kills my iteration interests.
             | 
             | Second to that I'd say the appeal is just watching
             | something you've known for a long time grow slowly and
             | steadily.
        
               | kilpikaarna wrote:
               | This, and the two pages of incomprehensible compiler spam
               | you get when you make a typo in C++.
        
             | uecker wrote:
             | I see it the other way round. People hurt themselves by
             | using C++. C++ fans will never understand it, but if you
             | can solve your problem in a much simpler way, this is far
             | better.
        
         | sim7c00 wrote:
         | you are so right.. though historically i would have disagreed
         | just by being triggered.
         | 
         | templates is the main thing c++ has over c. its trivial to
         | circumvent or escape the thing u dont 'like' about c++ like new
         | and delete (personal obstacle) and write good nice modern c++
         | with templates.
         | 
         | C generic can help but ultimately, in my opinion, the need for
         | templating is a good reason to go from C to C++.
        
       | uecker wrote:
       | Here is my experimental library for generic types with some
       | godbolt links to try: https://github.com/uecker/noplate
        
       | o11c wrote:
       | Are we getting a non-broken `_Generic` yet? Because that's the
       | thing that made me give up with disgust the last project I tried
       | to write in C. Manually having to do `extern template` a few
       | times is nothing in comparison.
        
         | uecker wrote:
         | What is a non-broken `_Generic' ?
        
           | o11c wrote:
           | A `_Generic` that only requires its expressions to be valid
           | for the type associated with them, rather than spewing errors
           | everywhere.
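The complaint above can be illustrated with two unrelated structs. In a `_Generic` selection every association expression has to be valid for the actual argument type, not just the selected one. A common workaround, sketched here with made-up names, is to have `_Generic` select a function and apply it to the argument outside the selection.

```c
struct Circle { double radius; };
struct Square { double side; };

/* This version does not compile. For a struct Circle argument the
   unselected branch (s).side is still checked and is an error:

   #define extent(s) _Generic((s), struct Circle: (s).radius,
                                   struct Square: (s).side)
*/

double circle_extent(struct Circle c) { return 2.0 * c.radius; }
double square_extent(struct Square q) { return q.side; }

/* Workaround: the associations are plain function names, valid for any
   argument type; only the selected function is called with s. */
#define extent(s) _Generic((s),        \
    struct Circle: circle_extent,      \
    struct Square: square_extent)(s)
```

The cost is one named function per type, which is the boilerplate the comment objects to.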
        
       | Arnavion wrote:
       | Neat similarity to Zig's approach to generic types. The generic
       | type is defined as a type constructor, a function that returns a
       | type. Every instantiation of that generic type is an invocation
       | of that function. So the generic growable list type is `fn
       | ArrayList(comptime T: type) type` and a function that takes two
       | lists of i32 and returns a third is `fn foo(a: ArrayList(i32), b:
       | ArrayList(i32)) ArrayList(i32)`
        
       ___________________________________________________________________
       (page generated 2025-06-27 23:00 UTC)