[HN Gopher] Parameterized types in C using the new tag compatibi...
___________________________________________________________________
Parameterized types in C using the new tag compatibility rule
Author : ingve
Score : 123 points
Date : 2025-06-27 05:31 UTC (17 hours ago)
(HTM) web link (nullprogram.com)
(TXT) w3m dump (nullprogram.com)
| unwind wrote:
| I think this is an interesting change, even though I (as someone
| who has loved C for 30+ years and uses it daily in a professional
| capacity) don't immediately see a lot of use-cases. I'm sure they
| can be found, as the author demonstrates. Cool, and a good post!
| glouwbug wrote:
| Combined with C23's auto (see vec_for) you can technically
| backport the entirety of C++'s STL (of course with skeeto's
| limitation in his last paragraph in mind). gcc -std=c23. It is
| a _very_ useful feature for even the mundane, like resizable
| arrays:
|
|     #include <stdlib.h>
|     #include <stdio.h>
|
|     #define vec(T) struct { T* val; int size; int cap; }
|
|     #define vec_push(self, x) {                                             \
|         if((self).size == (self).cap) {                                     \
|             (self).cap = (self).cap == 0 ? 1 : 2 * (self).cap;              \
|             (self).val = realloc((self).val, sizeof(*(self).val) * (self).cap); \
|         }                                                                   \
|         (self).val[(self).size++] = x;                                      \
|     }
|
|     #define vec_for(self, at, ...)             \
|         for(int i = 0; i < (self).size; i++) { \
|             auto at = &(self).val[i];          \
|             __VA_ARGS__                        \
|         }
|
|     typedef vec(char) string;
|
|     void string_push(string* self, char* chars) {
|         if(self->size > 0) {
|             self->size -= 1;
|         }
|         while(*chars) {
|             vec_push(*self, *chars++);
|         }
|         vec_push(*self, '\0');
|     }
|
|     int main() {
|         vec(int) a = {};
|         vec_push(a, 1);
|         vec_push(a, 2);
|         vec_push(a, 3);
|         vec_for(a, at, { printf("%d\n", *at); });
|         vec(double) b = {};
|         vec_push(b, 1.0);
|         vec_push(b, 2.0);
|         vec_push(b, 3.0);
|         vec_for(b, at, { printf("%f\n", *at); });
|         string c = {};
|         string_push(&c, "this is a test");
|         string_push(&c, " ");
|         string_push(&c, "for c23");
|         printf("%s\n", c.val);
|     }
| int_19h wrote:
| What I don't quite get is why they didn't go all the way in
| and basically enable full-fledged structural typing for
| anonymous structs.
| uecker wrote:
| That was my plan, but the committee had concerns about type
| safety.
| int_19h wrote:
| This would probably need some special rules around stuff
| like:
|
|     typedef struct { ... } foo_t;
|     typedef struct { ... } bar_t;
|     foo_t foo = (bar_t){ ... };
|
| i.e. these are meant to be named types and thus should
| remain nominal even though it's technically a typedef.
| And ditto for similarly defined pointer types etc. But
| this is a pattern regular enough that it can just be
| special-cased while still allowing proper structural
| typing for cases where that's obviously what is intended
| (i.e. basically everywhere else).
| Surac wrote:
| i fear this will make sloppy code compile OK more often.
| ioasuncvinvaer wrote:
| Can you give an example?
| poly2it wrote:
| Dear God I hope nobody is committing unreviewed LLM output in C
| codebases.
| pests wrote:
| No worries, the LLM commits it for you.
| pjmlp wrote:
| Eventually they will generate executables directly.
| rwmj wrote:
| Slightly off-topic, why is he using ptrdiff_t (instead of size_t)
| for the cap & len types?
| r1chardnl wrote:
| From one of his other blog posts, "Guidelines for computing
| sizes and subscripts":
|
|     Never mix unsigned and signed operands. Prefer signed. If
|     you need to convert an operand, see (2).
|
| https://nullprogram.com/blog/2024/05/24/
|
| https://www.youtube.com/watch?v=wvtFGa6XJDU
| poly2it wrote:
| I still don't understand how these arguments make sense for
| new code. Naturally, sizes should be unsigned because they
| represent values which cannot be negative. If you do
| pointer/size arithmetic, the only solution to avoid overflows
| is to overflow-check and range-check before computation.
|
| You cannot even check the signedness of a signed size to
| detect an overflow, because signed overflow is undefined!
|
| The remaining argument from what I can tell is that
| comparisons between signed and unsigned sizes are bug-prone.
| There is however, a dedicated warning to resolve this
| instantly.
|
| It makes sense that you should be able to assign a pointer to
| a size. If the size is signed, this cannot be done due to its
| smaller capacity.
|
| Given this, I can't understand the justification. I'm
| currently using unsigned sizes. If you have anything
| contradicting, please comment :^)
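The "overflow-check and range-check before computation" advice above can be sketched for signed sizes as a pre-check that rejects the operands before the addition happens, so the undefined overflow never occurs. This is only an illustration: the helper name `add_size_checked` is made up here and does not come from the thread or the linked article.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: adds two signed sizes only if the sum is
   representable, so the signed overflow (which is UB) never happens. */
bool add_size_checked(ptrdiff_t a, ptrdiff_t b, ptrdiff_t *out)
{
    if (b > 0 && a > PTRDIFF_MAX - b) return false;  /* sum would exceed the max */
    if (b < 0 && a < PTRDIFF_MIN - b) return false;  /* sum would go below the min */
    *out = a + b;
    return true;
}
```

The same shape of check works for unsigned sizes with `SIZE_MAX`; the point is that the test runs on the operands, not on the result.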
| sim7c00 wrote:
| I don't know either.
|
| int somearray[10];
|
| new_ptr = somearray + signed_value;
|
| or
|
| element = somearray[signedvalue];
|
| this seems almost criminal to how my brain does logic/C
| code.
|
| The only thing i could think of is this:
|
| somearray+=11; somearray[-1] // index set to somearray[10]
| ??
|
| if i'd see my CPU execute that i'd want it to please stop.
| I'd want my compiler to shout at me like a little child,
| and be mean until i do better.
|
| -Wall -Wextra -Wpedantic <-- that should flag, i think, any of
| these weird practices.
|
| As you stated tho, i'd be keen to learn why i am wrong!
| windward wrote:
| In the implementation of something like a deque or merge
| sort, you could have a variable that represents offsets
| from pointers but which could sensibly be negative. C
| developers culturally aren't as particular about
| theoretical correctness of types as developers in some
| other languages - there's a lot of implicit casting being
| used - so you'll typically see an `int` used for this. If
| you do wish to bring some rigidity to your type system,
| you may argue that this value is distinct from a general
| integer which could be used for any arithmetic and
| definitely not just a pointer. So it should be a signed
| pointer difference.
|
| Arrays aren't the best example, since they are inherently
| about linear, scalar offsets, but you might see a
| negative offset from the start of a (decayed) array in
| the implementation of an allocator with clobber canaries
| before and after the data.
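The canary case mentioned above might look roughly like the following. It is a sketch under assumptions: the names `block`, `canary_init`, `canary_intact` and the constants are invented for illustration, and a real allocator would be more involved. What it shows is that a negative index is well defined when the pointer refers to the middle of an array.

```c
#include <assert.h>
#include <stddef.h>

enum { CANARY = 0x5A, DATA_LEN = 8 };

/* One canary byte, DATA_LEN bytes of payload, one canary byte. */
static unsigned char block[1 + DATA_LEN + 1];

/* Hypothetical helper: writes both canaries and returns a pointer to the
   payload, i.e. one element past the start of the underlying array. */
unsigned char *canary_init(void)
{
    block[0] = CANARY;
    block[1 + DATA_LEN] = CANARY;
    return block + 1;
}

/* Checks both canaries through the payload pointer. data[-1] is a valid
   access because data points into the middle of block. */
int canary_intact(const unsigned char *data)
{
    ptrdiff_t before = -1;        /* signed offset to the leading canary */
    ptrdiff_t after = DATA_LEN;   /* offset to the trailing canary */
    return data[before] == CANARY && data[after] == CANARY;
}
```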
| mandarax8 wrote:
| Any kind of relative/offset pointers require negative
| pointer arithmetic.
| https://www.gingerbill.org/article/2020/05/17/relative-point...
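As a rough illustration of why relative links want a signed type, here is a sketch in which a node refers to another node of the same array by a signed element offset from itself. The `struct node` layout and the helper names are assumptions made for this example; they are not taken from the linked article.

```c
#include <assert.h>
#include <stddef.h>

/* A node links to another node in the same array by a signed element
   offset relative to itself; 0 means "no link". */
struct node {
    int value;
    ptrdiff_t rel;   /* target - self, may be negative */
};

/* Both pointers must point into the same array for the subtraction
   to be defined. */
void node_link(struct node *self, struct node *target)
{
    self->rel = target - self;
}

struct node *node_follow(struct node *self)
{
    return self->rel ? self + self->rel : NULL;
}
```

Because only the offset is stored, the links stay valid when the whole array is copied to a different address.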
| poly2it wrote:
| I don't think you can make such a broad statement and be
| correct in all cases. Negative pointer arithmetic is not
| by itself a reason to use signed types, except if you
| are:
|
| 1. Certain your added value is negative.
|
| 2. Checking for underflows after computation, which you
| shouldn't.
|
| The article was interesting.
| ncruces wrote:
| > It makes sense that you should be able to assign a
| pointer to a size. If the size is signed, this cannot be
| done due to its smaller capacity.
|
| Why?
|
| By the definition of ptrdiff_t, ISTM the size of any object
| allocated by malloc _cannot_ be out of bounds of ptrdiff_t,
| so I'm not sure how you can have a _useful_ size_t that
| uses the sign bit?
| foldr wrote:
| Stroustrup believes that signed should be preferred to
| unsigned even for values that can't be less than zero:
| https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p14...
| poly2it wrote:
| I've of course read his argument before, and I think it
| might be more applicable to C++. I exclusively program in
| C, and in that regard, the relevant aspects as far as I
| can tell wouldn't be clearly in favour of a signed type.
| I also think his discussion on iterator signedness mixes
| issues with improper bounds checking and attributes it to
| the size type signedness. I cannot see what remains
| justifying the use of a signed type other than "just
| because". I'm not sure it's applicable to C.
| uecker wrote:
| I also prefer signed types in C for sizes and indices.
| You can screen for overflow bugs easily using UBSan (or
| use it to prevent exploitation).
| sparkie wrote:
| C offers a different solution to the problem in Annex K of
| the standard. It provides a type `rsize_t`, which like
| `size_t` is unsigned, and has the same bit width, but where
| `RSIZE_MAX` is recommended to be `SIZE_MAX >> 1` or
| smaller. You perform bounds checking as `<= RSIZE_MAX` to
| ensure that a value used for indexing is not in the range
| that would be considered negative if converted to a signed
| integer. A negative value provided where `rsize_t` is
| expected would fail the check `<= RSIZE_MAX`.
|
| IMO, this is a better approach than using signed types for
| indexing, but AFAIK, it's not included in GCC/glibc or
| gnulib. It's an optional extension and you're supposed to
| define `__STDC_WANT_LIB_EXT1__` to use it.
|
| I don't know if any compiler actually supports it. It came
| from Microsoft and was submitted for standardization, but
| ISO made some changes from Microsoft's own implementation.
|
| https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1173.pdf#p...
|
| https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1225.pdf
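Since Annex K is not available in glibc, the `<= RSIZE_MAX` check described above can only be approximated there. The sketch below uses a stand-in macro, `MY_RSIZE_MAX`, and a helper `size_ok`; both names are made up for this example and are not part of any standard header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Annex K's RSIZE_MAX, using the recommended upper bound.
   The name MY_RSIZE_MAX is invented for this sketch. */
#define MY_RSIZE_MAX (SIZE_MAX >> 1)

/* Returns 1 if n is a plausible size: anything with the top bit set,
   such as a negative value converted to size_t, is rejected. */
int size_ok(size_t n)
{
    return n <= MY_RSIZE_MAX;
}
```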
| poly2it wrote:
| This is an interesting middle ground. As ncruces pointed
| out in a sibling comment, the sign bit in a pointer
| cannot be set without contradicting the ptrdiff_t type.
| That makes this seem like a reasonable approach to
| storing sizes.
| windward wrote:
| Pointer arithmetic that could overflow would probably
| involve a heap and therefore be less likely to require a
| relative, negative offset. Just use the addresses and
| errors you get from allocation.
| poly2it wrote:
| Yes, but there are definitely cases where this doesn't
| apply, for example when deriving an offset from a user
| pointer. As such this is not a universal solution.
| uecker wrote:
| "Naturally, sizes should be unsigned because they represent
| values which cannot be negative."
|
| Unsigned types in C have modular arithmetic, I think they
| should be used exclusively when this is needed, or maybe if
| you absolutely need the full range.
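A small sketch of both sides of that point: wraparound as a surprise when an unsigned index is decremented, and wraparound as the intended behavior in a hash. The function names are illustrative; the constants are the standard 32-bit FNV-1a offset basis and prime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned subtraction wraps modulo 2^N: for i == 0 this yields SIZE_MAX
   rather than a negative number. */
size_t prev_index_unsigned(size_t i)
{
    return i - 1;
}

/* 32-bit FNV-1a hash: a case where wraparound is the intended behavior. */
uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;   /* the product wraps modulo 2^32 by definition */
    }
    return h;
}
```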
| enqk wrote:
| yeah unsigned are really about opting to perform modular
| arithmetic, or for mapping hardware registers.
|
| C is weakly typed, the basic types are really not to
| maintain invariants or detect their violation
| int_19h wrote:
| > It makes sense that you should be able to assign a
| pointer to a size. If the size is signed, this cannot be
| done due to its smaller capacity.
|
| You can, since the number of bits is the same. The mapping
| of pointer bits to signed integer bits will mean that you
| can't then do arithmetic on the resulting integers and get
| meaningful results, but the behavior of such shenanigans is
| already unspecified with no guarantees other than you can
| get an integer out of a pointer and then convert it back
| later.
|
| But also, semantically, what does it even mean to convert a
| single pointer to a size? A size of an object is naturally
| defined as the count of chars between two pointers, one
| pointing at the beginning of the object, the other at its
| end. Which is to say, a size is a subset of pointer
| difference that just happens to always be non-negative. So
| long as the implementation guarantees that for every object
| that non-negative difference will always fit in a signed
| int of the appropriate size, it seems reasonable to reflect
| this in the types.
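The "size is a pointer difference" view above can be written down directly. A minimal sketch, with `range_size` as a made-up name:

```c
#include <assert.h>
#include <stddef.h>

/* Size of the byte range [beg, end). Both pointers must refer to the same
   object (or one past its end); the subtraction has type ptrdiff_t. */
ptrdiff_t range_size(const char *beg, const char *end)
{
    return end - beg;
}
```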
| rurban wrote:
| Skeeto and Stroustrup are a bit confused about valid index
| types. They prefer signed, which will lead to overflows on
| negative values, but have the advantage of using only half of
| the valid ranges, so there's more heap for the rest. Very
| confused
| foobar12345quux wrote:
| Hi Rich, using ptrdiff_t is (alas) the right thing to do:
| pointer subtraction returns that type, and if the result
| doesn't fit, you get UB. And ptrdiff_t is a signed type.
|
| Assume you successfully allocate an array "arr" with "sz"
| elements, where "sz" is of type "size_t". Then "arr + sz" is a
| valid expression (meaning the same as "&arr[sz]"), because it's
| OK to compute a pointer one past the last element of an array
| (but not to dereference it). Next you might be tempted to write
| "arr + sz - arr" (meaning the same as "&arr[sz] - &arr[0]"),
| and expect it to produce "sz", because it is valid to compute
| the element offset difference between two "pointers into an
| array or one past it". However, that difference is always
| signed, and if "sz" does not fit into "ptrdiff_t", you get UB
| from the pointer subtraction.
|
| Given that the C standard (or even POSIX, AIUI) doesn't relate
| ptrdiff_t and size_t to each other, we need to restrict array
| element counts, before allocation, with two limits:
|
| - nelem <= (size_t)-1 / sizeof(element_type)
|
| - nelem <= PTRDIFF_MAX
|
| (I forget which standard header #defines PTRDIFF_MAX;
| surprisingly, it is not <limits.h>.)
|
| In general, neither condition implies the other. However, once
| you have enforced both, you can store the element count as
| either "size_t" or "ptrdiff_t".
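The two limits above could be enforced in an allocation wrapper along these lines. This is a sketch, and `alloc_array` is a hypothetical name. It relies on `<stdint.h>` for `SIZE_MAX` and `PTRDIFF_MAX`, and uses `SIZE_MAX` in place of `(size_t)-1`, which has the same value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: refuses element counts whose byte size would
   overflow size_t or that pointer subtraction could not represent. */
void *alloc_array(size_t nelem, size_t elem_size)
{
    if (elem_size == 0) return NULL;
    if (nelem > SIZE_MAX / elem_size) return NULL;     /* byte count overflows */
    if (nelem > (size_t)PTRDIFF_MAX) return NULL;      /* count exceeds ptrdiff_t */
    return malloc(nelem * elem_size);
}
```

Once an array has passed both checks, its element count can be kept in either `size_t` or `ptrdiff_t` without losing information.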
| fuhsnn wrote:
| The recent #def #enddef proposal[1] would eliminate the need for
| backslashes to define readable macros, making this pattern much
| more pleasant, fingers crossed for its inclusion in C2Y!
|
| [1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3531.txt
| cb321 wrote:
| While long-def's might be nice, you can even back in ANSI C 89
| get rid of the backslash pattern (or need to cc -E and run
| through GNU indent/whatever) by "flipping the script" and
| defining whole files "parameterized" by their macro environment
| like https://github.com/c-blake/bst or
| https://github.com/glouw/ctl/
|
| Add a namespacing macro and you have a whole generics system,
| unlike that in TFA.
|
| So, it might add more value to have the C std add an `#include
| "file.c" name1=val1 name2=val2` preprocessor syntax where
| name1, name2 would be on a "stack" and be popped after
| processing the file. This would let you do
| types/functions/whatever "generic modules" with manual
| instantiation which kind of fits with C (manual management of
| memory, bounds checking, etc.) but preprocessor-assisted "macro
| scoping" for nested generics. Perhaps an idea to play with in
| your slimcc fork?
| glouwbug wrote:
| I've been thinking of maybe doing CTL2 with this. Maybe if
| #def makes it in.
| cb321 wrote:
| I think the #include extension could make vec_vec /
| vec_list / lst_str type nesting more natural/maybe more
| general, but maybe just my opinion. :-)
|
| I guess ctags-type tools would need updating for the new
| possible definition location. Mostly someone needs to
| decide on a separation syntax for stuff like
| `name1(..)=expansion1 name2(..)=expansion2` for "in-line"
| cases. Compiler _programs_ have had `cc
| -Dname(..)=expansion` or equivalents since the dawn of the
| language, but they actually get the OS /argv idea of
| separation from whatever CL args or Windows APIs or etc.
|
| Anyway, might make sense to first get experience with a
| slimcc/tinycc/gcc/clang cpp++ extension. ;-) Personally,
| these days I mostly just use Nim as a better C.
| hyperbolablabla wrote:
| I really don't think the backslashes are that annoying? Seems
| unnecessary to complicate the spec with stuff like this.
| cb321 wrote:
| FWIW, https://www.cs.cornell.edu/andru/ Andrew Myers had some
| patch to gcc to do this back in the late 90s.
|
| Anyway, as is so often the case, it's about the whole
| ecosystem not just of tooling but the ecosystem of
| assumptions about & around tooling.
|
| As I mentioned in my other comment, if you want you can
| always cc -E and re-format the code somehow, although the
| main times you want to do that are for line-by-line stepping
| in debuggers or maybe for other cases of "lines as source
| coordinates" like line-by-line profilers.
|
| Of course, a more elegant solution might be just having more
| "adjustable step size/source coordinates" like "single
| ';'-statement" or maybe "single sequence control point" in
| debuggers than just "line orientation". This is, in fact, so
| natural an idea that it seems a virtual certainty some C
| debugger has an "expressional step/next", especially if
| written by a fan more of Lisp than assembly. Of course, at
| some point a library is just debugged/trusted, but if there
| are "user hooks" those can be buggy. If it's performance
| important, it may never be unwelcome to have better profile
| reports.
|
| While addr2line has been a thing forever, I've never heard of
| an addr2expr - probably because "how would you label it?" So,
| pros & cons, but easy for debugger/profilers is one reason I
| think the parameterized file way is lower friction.
| kreco wrote:
| This Facebook repository also uses a new "extension" to do a
| similar thing:
|
| https://github.com/facebookresearch/CParser#multiline-macros
| core-explorer wrote:
| debugging information is more precise than line numbers, it
| usually conveys line and column in a source file.
|
| Some debuggers make use of it when displaying the current
| program state, the major debuggers do not allow you to step
| into a specific sub-call on a line (e.g. skip function
| arguments and go straight to the outermost function call).
| This is purely a UI issue, they have enough information. I
| believe the nnd debugger has implemented selecting the call
| to step into.
|
| Addr2line could be amended. I am working on my own debugger
| and I keep re-implementing existing command line tools as
| part of my testing strategy. A finer-grained addr2line
| sounds like a good exercise.
| kreco wrote:
| The backslashes themselves make the preprocessor way more
| complicated for no real advantage (apart from when it's
| unavoidable, like in macros).
|
| For every single symbol you need to actually check if there
| is a splice (backslash + new line) in it. For a single-pass
| compiler, this contributes to a very slow lexing phase, as this
| splice can appear _anywhere_ in C/C++ code.
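To make "anywhere" concrete: line splicing happens before tokenization, so a backslash-newline may even sit in the middle of an identifier. A small sketch, with names chosen only for illustration:

```c
#include <assert.h>

/* The backslash-newline pair is deleted before tokenization, so the two
   physical lines below form the single identifier `value`. */
int val\
ue = 42;

int read_value(void)
{
    return value;
}
```

The continuation line has to start in the first column here, since any leading whitespace would end up between `val` and `ue` and split them into two tokens.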
| jcelerier wrote:
| I don't think this is optimizing for the right thing, I've
| sat in front of hundreds of gcc & clang compiler time
| traces and lexing is a minuscule percentage of the time
| spent in the compiler
| tialaramex wrote:
| It seems as though this makes it impossible to do the new-type
| paradigm in C23? If Goose and Beaver differ only in their name,
| C now thinks they're the same type so too bad we can tell a
| Beaver to fly even though we deliberately required a Goose?
| yorwba wrote:
| "Tag compatibility" means that the name has to be the same. The
| issue the proposal is trying to address is that "struct Goose {
| float weight; }" and "struct Goose { float weight; }" are
| different types if declared in different locations of the same
| translation unit, but the same if declared in different
| translation units. With tag compatibility, they would always be
| treated as being the same.
|
| "struct Goose { float weight; }" and "struct Beaver { float
| weight; }" would remain incompatible, as would "struct { float
| weight; }" and "struct { float weight; }" (since they're
| declared without tags.)
| tialaramex wrote:
| Ah, thanks, that makes sense.
| JonChesterfield wrote:
| Not personally interested in this hack, but
| https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3037.pdf means
| struct foo {} defined multiple times with the same fields in the
| same TU now refers to the same thing instead of to UB and that is
| a good bugfix.
| IAmLiterallyAB wrote:
| If you're reaching for that hack, just use C++? You don't have to
| go all in on C++-isms, you can always write C-style C++ and only
| use the features you need.
| waynecochran wrote:
| Not always a viable option -- especially for embedded and
| systems programming.
| _proofs wrote:
| i work in an embedded space in the context of devices and
| safety. if it were as simple as "just use c++ for these
| projects" most of us would use a subset, and our newer
| projects try to make this a requirement (we roll our own ETL
| for example).
|
| however for some niche os specific things, and existing
| legacy products where oversight is involved, simply rolling
| out a c++ porting of it on the next release is, well, not a
| reality, and often not worth the bureaucratic investment.
|
| while i have no commentary on the post because i'm not really
| a c programmer, i think a lot of comments forget some
| projects have requirements, and sometimes those requirements
| become obsolete, but you're stuck with what you got until
| gen2, or lazyloading standardization across teams.
| pton_xd wrote:
| Yeah as someone who writes C in C++, every time I see posts
| bending over backwards trying to fit parameterized types into C
| I just cringe a little. I understand the appeal of sticking to
| pure C, but... why do that to yourself? Come on over, we've got
| lambdas, and operator overloading for those special
| circumstances... the water's fine!
| pjmlp wrote:
| Some people will do as much as they can to hurt themselves,
| only to avoid using C++.
|
| Note that the newer versions of C are basically a "C++ without
| classes" kind of thing.
| glouwbug wrote:
| I think the main appeal is subset lock-down and compile
| times. ~5000 lines in C gets me sub second iteration times,
| while ~5000 lines in C++ hits the 10 second mark. Including
| both iostream and format in C++ gets any project up into
| the ~1.5 second mark which kills my iteration interests.
|
| Second to that I'd say the appeal is just watching
| something you've known for a long time grow slowly and
| steadily.
| kilpikaarna wrote:
| This, and the two pages of incomprehensible compiler spam
| you get when you make a typo in C++.
| uecker wrote:
| I see it the other way round. People hurt themselves by
| using C++. C++ fans will never understand it, but if you
| can solve your problem in a much simpler way, this is far
| better.
| sim7c00 wrote:
| you are so right.. though historically i would have disagreed
| just by being triggered.
|
| templates is the main thing c++ has over c. its trivial to
| circumvent or escape the thing u dont 'like' about c++ like new
| and delete (personal obstacle) and write good nice modern c++
| with templates.
|
| C generic can help but ultimately, in my opinion, the need for
| templating is a good reason to go from C to C++.
| uecker wrote:
| Here is my experimental library for generic types with some
| godbolt links to try: https://github.com/uecker/noplate
| o11c wrote:
| Are we getting a non-broken `_Generic` yet? Because that's the
| thing that made me give up with disgust the last project I tried
| to write in C. Manually having to do `extern template` a few
| times is nothing in comparison.
| uecker wrote:
| What is a non-broken `_Generic`?
| o11c wrote:
| A `_Generic` that only requires its expressions to be valid
| for the type associated with them, rather than spewing errors
| everywhere.
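One common way around that limitation is to have each `_Generic` association name a function and to apply the argument only after the selection. A minimal sketch; `struct vec2`, `norm1_int`, `norm1_vec2` and `norm1` are all invented for this example:

```c
#include <assert.h>

struct vec2 { int x, y; };

int norm1_int(int v)          { return v < 0 ? -v : v; }
int norm1_vec2(struct vec2 v) { return norm1_int(v.x) + norm1_int(v.y); }

/* Each association names a function and only the selected one is called
   with x. Writing the bodies inline (for instance `(x).x + (x).y` in the
   struct vec2 branch) would fail to compile when x is an int, because
   every branch is still type-checked. */
#define norm1(x) _Generic((x), int: norm1_int, struct vec2: norm1_vec2)(x)
```

This keeps the macro usable for both types at the cost of one named function per type.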
| Arnavion wrote:
| Neat similarity to Zig's approach to generic types. The generic
| type is defined as a type constructor, a function that returns a
| type. Every instantiation of that generic type is an invocation
| of that function. So the generic growable list type is `fn
| ArrayList(comptime T: type) type` and a function that takes two
| lists of i32 and returns a third is `fn foo(a: ArrayList(i32), b:
| ArrayList(i32)) ArrayList(i32)`
___________________________________________________________________
(page generated 2025-06-27 23:00 UTC)