[HN Gopher] Learning that you can use unions in C for grouping t...
___________________________________________________________________
Learning that you can use unions in C for grouping things into
namespaces
Author : deafcalculus
Score : 102 points
Date : 2021-08-01 14:48 UTC (8 hours ago)
(HTM) web link (utcc.utoronto.ca)
(TXT) w3m dump (utcc.utoronto.ca)
| 10000truths wrote:
| Anonymous nested structs are also quite useful for creating
| struct fields with explicit offsets:
|
|     #include <stddef.h>
|     #include <stdint.h>
|     #include <stdio.h>
|
|     #define YDUMMY(suffix, size) char dummy##suffix[size]
|     #define XDUMMY(suffix, size) YDUMMY(suffix, size)
|     #define PAD(size) XDUMMY(__COUNTER__, size)
|
|     struct ExplicitLayoutStruct {
|         union {
|             struct __attribute__((packed)) { PAD(3); uint32_t foo; };
|             struct __attribute__((packed)) { PAD(5); uint16_t bar; };
|             struct __attribute__((packed)) { PAD(13); uint64_t baz; };
|         };
|     };
|
|     int main(void) {
|         // offset foo = 3
|         // offset bar = 5
|         // offset baz = 13
|         printf("offset foo = %zu\n", offsetof(struct ExplicitLayoutStruct, foo));
|         printf("offset bar = %zu\n", offsetof(struct ExplicitLayoutStruct, bar));
|         printf("offset baz = %zu\n", offsetof(struct ExplicitLayoutStruct, baz));
|         return 0;
|     }
| WalterBright wrote:
| Anytime macros are used for metaprogramming, it's time to reach
| for a more powerful language.
| Spivak wrote:
| Doesn't work in a lot of cases unfortunately. If you're
| writing a library designed to be consumed by other languages
| you're stuck with writing C abi compatible code which can be
| written in other languages that can "extern" them but it puts
| limits on what's possible in those libraries.
| 10000truths wrote:
| Macros are useful, as long as they're used sparingly. I think
| that in this case, it's used well - the struct is still
| perfectly readable, and the sole purpose of it is to make it
| so that you don't have to manually name the dummy fields. But
| you could totally just write out dummy1, dummy2, dummy3 etc.
| yourself if you want to get rid of the macros.
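A minimal sketch of the macro-free variant mentioned above, with the padding fields named by hand. The struct name `ExplicitLayoutManual` is hypothetical, and the code assumes GCC or Clang (`__attribute__((packed))`) plus C11 anonymous structs and unions, so it is not portable C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout idea as the macro version, but every padding field is
 * written out explicitly. Each anonymous packed struct places one real
 * field at a chosen byte offset; the union overlays them all. */
struct ExplicitLayoutManual {
    union {
        struct __attribute__((packed)) { char dummy1[3];  uint32_t foo; };
        struct __attribute__((packed)) { char dummy2[5];  uint16_t bar; };
        struct __attribute__((packed)) { char dummy3[13]; uint64_t baz; };
    };
};

/* Check the offsets at compile time instead of printing them. */
static_assert(offsetof(struct ExplicitLayoutManual, foo) == 3,  "foo offset");
static_assert(offsetof(struct ExplicitLayoutManual, bar) == 5,  "bar offset");
static_assert(offsetof(struct ExplicitLayoutManual, baz) == 13, "baz offset");
```

The compile-time checks make a layout change fail the build rather than surface at run time.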
| WalterBright wrote:
| > Macros are useful, as long as they're used sparingly.
|
| Everybody says that. Everybody believes it. And everybody
| goes to town making a rat's nest with macros, just like
| that snarl of cables under my desk that resist all attempts
| to make it nice.
|
| Myself included. I've even written an article about clever
| C macros. Look, ma! I was so proud of myself.
|
| But then I got older. I started replacing the macros in my
| C code with regular code. It turns out they weren't that
| necessary at all. I liked the C code a lot better when it
| didn't have a single # in it other than #include.
| throwaway17_17 wrote:
| I want to be clear about your meaning, because I don't know
| if I'm reading your comment correctly. Are you referring
| explicitly to syntax based, preprocessor macros? Or does your
| comment extend to other metaprogramming techniques? I am
| inclined to think you mean the first considering the amount
| of emphasis on generic programming in D? Just curious.
| WalterBright wrote:
| I'm referring to both syntax based (AST) macros, and text
| based (preprocessor) macros. The latter, of course, are
| much worse.
|
| An example of the former is so-called "expression
| templates" in C++. I've seen them used to create a regular
| expression language using C++ expression templates. The
| author was quite proud of them, and indeed they were very
| clever.
|
| However nice the execution, the concept was terrible. There
| was no way to visually tell that some ordinary code was
| actually doing regular expressions.
|
| C++ expression templates had their day in the sun, but
| fortunately they seem to have been thrown onto the trash
| pile of sounds-like-a-good-idea-but-oops.
|
| (I wrote an article showing how to do expression templates
| in D, mainly to answer criticisms that D couldn't do it,
| not because it was a good idea.)
| cryptonector wrote:
| You might as well have written that "any time you're reaching
| for C, it's time to reach for a more powerful language".
|
| But if -sadly- you must use C, metaprogramming using macros
| is not a terrible thing.
| nyanpasu64 wrote:
| Do foo and bar deliberately overlap?
| 10000truths wrote:
| Yes, I was looking to demonstrate the flexibility of the
| approach by including overlapping fields.
| midjji wrote:
| If you write to either, accessing the other, even on the
| overlap, is undefined behaviour
| 10000truths wrote:
| Type punning/aliasing with unions is well defined in gcc.
| Linus even has a humorous rant about it on the topic:
|
| https://www.yodaiken.com/2018/06/07/torvalds-on-aliasing/
|
| Sure, it's compiler-specific, but I'm already using
| `__attribute__((packed))` anyways.
| midjji wrote:
| All undefined behaviour is well defined for each
| compiler. What it really means is implementation defined
| and subject to change without notice or documentation
| with every compiler version, host OS, set of enabled
| flags, or a thousand other things. Why use an approach
| which strictly relies on specific versions of specific
| compilers, rather than a completely portable and
| standard-compliant struct with a char array and a few
| char pointers? Or if you want a convenient interface and
| aren't explicitly writing for the kernel, switch to a
| restricted subset of C++ and do it right?
| rocqua wrote:
| GCC explicitly states and documents that type punning
| through unions will work as expected. This doesn't work
| by accident. It works very explicitly by design of GCC.
|
| Different GCC versions aren't randomly going to change
| documented behavior. And when they accidentally do, they
| will consider it a bug.
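The two positions in this sub-thread can be put side by side in a small sketch: reading a float's bytes through a union (the behaviour GCC documents) and through `memcpy` (which does not depend on any union rule). The function names are hypothetical, and the code assumes `float` and `uint32_t` have the same size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Type punning through a union: store one member, read the other.
 * This is the pattern GCC documents as supported. */
static uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* Alternative that avoids the union question entirely: copy the
 * object representation byte by byte. */
static uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Both functions should return the same bit pattern for any input; compilers typically reduce the `memcpy` version to a plain register move.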
| gumby wrote:
| One of the very few things from C that I miss in C++ is
| anonymous structs and enums. I really don't understand why they
| are not allowed.
|
| That is, C style enums don't have to have a name but "type
| safe" (enum class) ones do. One classic use is to name an
| otherwise boolean option in a function signature; there's
| typically no need to otherwise name it.
|
| C++ incompatibly requires a name for all struct and class
| declarations, again a waste when you will only have a single
| object of a given type.
| WalterBright wrote:
| > I really don't understand why they are not allowed.
|
| I don't, either. Such were in D from 2000 or so.
|
| I also don't understand why `class` in C++ sits in the tag
| name space. I wrote Bjarne in the 1980s asking him to remove
| it from the tag name space, as the tag name space is an
| abomination. He replied that there was too much water under
| that bridge.
|
| D doesn't have the tag name space, and in 20 years not a
| single person has asked for it.
|
| This did cause some trouble for me with ImportC to support
| things like:
|
|     struct S { ... };
|     int S;
|
| but I found a way. Although such code is an abomination. I've
| only seen it in the wild in system .h files.
| Gibbon1 wrote:
| The only explanation I saw was that C++ standards guys were
| horrified by the idea of unpredictable side effects as a
| result of initialization of a struct.
|
| I think C++ though is adding them.
|
| What I'd like in C is designated function parameters.
|
|     // these are the same
|     bar(.a = 10, .b = 12);
|     bar(.b = 12, .a = 10);
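C has no designated function parameters, but C99 can approximate the call syntax with a struct, a compound literal, and a variadic macro. This is a sketch of that workaround, not a language feature; `bar_args`, `bar_impl`, and the arithmetic inside it are invented for illustration.

```c
/* Arguments are bundled into a struct so they can be named at the call site. */
struct bar_args {
    int a;
    int b;
};

/* The real function takes the bundle by value. */
static int bar_impl(struct bar_args args)
{
    return args.a * 100 + args.b;
}

/* The macro wraps whatever the caller writes in a compound literal, so
 * designated initializers work; fields that are not named default to 0. */
#define bar(...) bar_impl((struct bar_args){ __VA_ARGS__ })
```

With this in place, `bar(.a = 10, .b = 12)` and `bar(.b = 12, .a = 10)` produce the same call, and `bar(.b = 7)` leaves `a` at zero.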
| cjaybo wrote:
| > C++ incompatibly requires a name for all struct and class
| declarations
|
| You're right about "enum class", but anonymous classes and
| structs are perfectly valid in C++:
|
| https://godbolt.org/z/7MbcqhnoK
| dataflow wrote:
| Try
|
|     struct S { struct { int x; }; };
|
| under -pedantic and you'll get
|
|     warning: ISO C++ prohibits anonymous structs [-Wpedantic]
| midjji wrote:
| Pedantic is for the older C++ standard; it's not pedantic
| for the later ones, e.g. C++11. I think this changed.
| dataflow wrote:
| Well that blows my mind, I never realized pedantic
| ignores the language setting. Is this the only case where
| it does that?
| junon wrote:
| No, pedantic is for disabling compiler extensions. You
| still need to explicitly specify a standard.
| midjji wrote:
| Use an enum in a namespace, or an anonymous namespace
| gumby wrote:
| This is an example of the desired use case:
|
|     static obj& some_call (obj& o,
|                            enum struct { abandon, save } disposition) { ... };
|
| This is a common case (and should be more common) to avoid
| using an obscure boolean flag, which can lead to bugs. It
| shouldn't need a name.
|
| An anonymous namespace just means the name itself won't
| leak out; under C++ rules I _need_ the name even to specify
| the enum tag, which is absurd.
| midjji wrote:
| There are two kinds of undefined behaviour being invoked in
| using this. It's a horrible idea and a horrible code smell, get
| rid of it if you ever see something like this.
| 10000truths wrote:
| I don't see any undefined behavior here. As I mentioned
| below, gcc explicitly documents type punning via unions as
| being well defined. But yes, this is compiler specific and is
| not guaranteed to work elsewhere.
| formerly_proven wrote:
| Accessing packed struct members works fine on x86, but will
| blow up at runtime or do weird things on platforms which
| don't support unaligned loads or stores.
|
| The correct way to access packed structs is through memcpy,
| just like you'd access any other potentially unaligned
| object.
| 10000truths wrote:
| For architectures where unaligned accesses are illegal,
| gcc will generate multiple load/store instructions when
| accessing packed struct fields by name. The main caveat
| to look out for is taking the address of a packed struct
| member and then dereferencing it.
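The `memcpy` approach for potentially unaligned data can be sketched as a pair of helpers. The names `load_u32` and `store_u32` are hypothetical; the point is that no `uint32_t *` is ever formed to an address that might be misaligned.

```c
#include <stdint.h>
#include <string.h>

/* Read a uint32_t from an address with no alignment guarantee.
 * The value is copied into a properly aligned local first. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Write a uint32_t to an address with no alignment guarantee. */
static void store_u32(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

These helpers use the host byte order, so they suit in-memory layouts; a wire format would additionally need an explicit byte-order conversion.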
| midjji wrote:
| There is absolutely undefined behaviour there. Undefined
| behaviour is defined not as nasal daemons but as: The
| compiler implementer does not guarantee that this behaviour
| will be hardware, circumstance, compiler version, or os
| consistent, nor that we will warn if we change this.
|
| Packed is technically not undefined behaviour, but it is
| certainly a trap, especially because compiler-detection
| macros lead people to write defines which select packed
| per compiler automatically. Then the fallback case for an
| unrecognized compiler is just left empty, meaning the code
| compiles but no longer does what you think.
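One way to avoid the silent-fallback trap described above is to make the unrecognized-compiler branch a hard error and to assert the expected size. This is a sketch under assumptions: the `PACKED` macro and `wire_header` struct are hypothetical, and only the GCC/Clang attribute spelling is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
#  define PACKED __attribute__((packed))
#else
   /* Fail loudly instead of defining PACKED as nothing. */
#  error "PACKED has no definition for this compiler; add one before building"
#endif

struct PACKED wire_header {
    uint8_t  kind;    /* offset 0 */
    uint32_t length;  /* offset 1 when packed, usually 4 otherwise */
};

/* If packing was silently dropped, the size would grow and this fails. */
static_assert(sizeof(struct wire_header) == 5, "wire_header is not packed");
```

The `static_assert` catches the case where a macro expands to nothing even if someone later removes the `#error` branch.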
| fifjdynb wrote:
| _You_ don't get to decide what UB means. It really does
| mean nasal demons are a possibility: all bets are off
| when you run that executable. Use of the term "undefined
| behaviour" to mean something else may be on the increase,
| unfortunately
| (https://mars.nasa.gov/technology/helicopter/status/298/what-...),
| but if we're talking about C, its meaning is fixed.
| rightbyte wrote:
| I don't regard this as a "perverse" hack. If I ever do embedded
| memory mapped stuff in C11 this is way too tempting.
| midjji wrote:
| You are practically guaranteed to invoke undefined behaviour if
| you do. Just use a map on a std::array of e.g. std::byte
| dexterhaslem wrote:
| they said C11 tho
| flohofwoe wrote:
| I'm using anonymous nested structs extensively for grouping
| related items, but I consider the extra field name a feature, not
| something that should be hidden:
|
| https://github.com/floooh/sokol-samples/blob/bfb30ea00b5948f...
|
| (also note the 'inplace initialization' which follows the state
| struct definition using C99's designated initialization)
| remram wrote:
| The first example seems wrong, instead of `struct sub { ... };`
| what is meant is `struct { ... } sub;`
| siebenmann wrote:
| You're right; thanks for noticing and I've updated the first
| example. My C is a bit rusty these days and I didn't check it
| with a compiler the way I should have.
|
| (I'm the author of the linked-to article.)
| kevin_thibedeau wrote:
| The result is uglier and less maintainable than a pair of macros.
| Or just stop trying to hide syntax. This is ultimately on the
| same level as typedefing pointers.
| adamnemecek wrote:
| Don't actually do this.
| sp332 wrote:
| The Linux kernel is using this for bounds checking.
| https://news.ycombinator.com/item?id=28015263
| ufo wrote:
| Like the parent poster, when I read the article I assumed
| that there was no conceivable reason to ever use this feature
| in a real C program. Let me just say that I'm pleasantly
| surprised to be proven wrong!
| Subsentient wrote:
| Bleurgh. I have a deep soft spot for C, and I'm known to get
| twisted pleasure from using obscure language features in new ways
| to annoy people, but this is a level of abuse that even I can't
| get behind. If you need namespacing, use C++. As much as I love
| C, it's terrible for large projects.
| kktkti9 wrote:
| People will make a mess of a large project regardless of the
| language.
| vbezhenar wrote:
| Linux kernel is large project and clearly C is sufficient for
| it, given the fact that migrating to C++ would probably be very
| easy (not using all C++ features, but just selected ones), yet
| it did not happen.
|
| I think that C++ is better than C, but C is not that bad, even
| for large projects.
| AlotOfReading wrote:
| A big issue with introducing C++ into a codebase is that it's
| incredibly hard to stick to a particular subset or standard.
| There's always a well-justified argument for the next
| standard or "just this one additional feature". Eventually
| you end up with the whole kitchen sink, regardless of where
| you started.
|
| I've had far more success hard-firewalling C++ into its own
| box where programmers can use whatever they can get running
| than trying to limit people to subsets.
| adtac wrote:
| the kernel has to live with the choices it made in the 90s,
| you don't
| midjji wrote:
| Yeah they should have upgraded to some restricted subset of
| C++ or new restrictive language ages ago. I mostly buy the
| arguments against having exceptions, perhaps even against
| polymorphism in general, but the argument against
| destructors, or atomics... hell no.
| humanrebar wrote:
| > ...against polymorphism...
|
| C has polymorphism. Inheritance-based virtual dispatch is
| just one kind of polymorphism. It's common to wire up
| polymorphism in C with bespoke data structures using
| tagged unions or function pointers. Changing an
| implementation at link time is even a form of
| polymorphism.
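Both forms of C polymorphism named above can be sketched briefly. All type and function names here (`shape`, `token`, and so on) are invented for illustration, and integer arithmetic is used only to keep the example small.

```c
#include <stddef.h>

/* Function-pointer dispatch: each object carries its own behaviour. */
struct shape {
    int (*area)(const struct shape *self);
    int w, h;
};

static int rect_area(const struct shape *s)     { return s->w * s->h; }
static int triangle_area(const struct shape *s) { return s->w * s->h / 2; }

/* Callers go through the pointer and never need to know the concrete kind. */
static int total_area(const struct shape *shapes, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += shapes[i].area(&shapes[i]);
    return sum;
}

/* Tagged-union dispatch: the tag says which union member is live. */
enum token_kind { TOKEN_INT, TOKEN_CHAR };

struct token {
    enum token_kind kind;
    union { int i; char c; } as;
};

static int token_value(const struct token *t)
{
    switch (t->kind) {
    case TOKEN_INT:  return t->as.i;
    case TOKEN_CHAR: return (int)t->as.c;
    }
    return -1; /* unknown tag */
}
```

The function-pointer form is open to new kinds without touching callers, while the tagged-union form keeps every case visible in one `switch`.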
| dkersten wrote:
| > Linux kernel is large project and clearly C is sufficient
| for it
|
| Sure, and operating systems have been written in assembly
| too. The question is whether it would be better than just
| sufficient if Linux were written in C++, today (ie C++17 or
| 20, not something old). Switching now probably wouldn't be
| feasible (even ignoring technical reasons, the kernel
| developer community is familiar with the C codebase and code
| standards and bought into it), but if Linux were started
| today, would it be a better choice?
|
| Maybe the answer is still no and C would still be chosen, but
| the choice today is very different than it was when Linux was
| started. Of course, maybe Rust or something would be chosen
| today instead.
| dathinab wrote:
| And the Kernel devs would probably get _really_ annoyed if
| you try to push this kind of name-spacing.
|
| > C++ would probably be very easy
|
| Not necessarily. Besides some small(?) problems due to C++
| allowing "more magic optimizations" than C, they would have to
| switch to a subset of C++, and you would need to communicate to
| all contributors that a lot of C++ things are not allowed. It
| might be easier to simply not use C++. I mean, if it were that
| easy, the kernel likely would have switched.
| PostThisTooFast wrote:
| I will point out, however, that you can abuse "struct" in a
| similar way to simulate namespaces in Swift.
| bruce343434 wrote:
| Imo this is not "perverse". In my vector library I alias a vec3
| as float x,y,z and float[3] using this technique.
| midjji wrote:
| This is also known as the most common invocation of undefined
| behaviour in game programming. If you do this, write to y, then
| read from [1], you are invoking undefined behaviour, and
| compilers doing different things here between Windows, Linux,
| Mac, and different compiler versions is a common cause of "why
| isn't my game working right on XXX, it works fine on YYY"
| questions.
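For reference, the vec3 aliasing under discussion looks roughly like the sketch below. The names are hypothetical. In C, reading a union member other than the one last written reinterprets the stored bytes (a footnote to C11 6.5.2.3 describes this); ISO C++ does not give the same guarantee, which may be part of the disagreement here. The layout only lines up if the anonymous struct has no padding, so the sketch asserts that.

```c
#include <assert.h>

typedef union vec3 {
    struct { float x, y, z; };  /* C11 anonymous struct: a.x, a.y, a.z */
    float v[3];                 /* the same storage viewed as an array */
} vec3;

/* The two views match element for element only without padding. */
static_assert(sizeof(vec3) == 3 * sizeof(float), "unexpected padding in vec3");

/* Loops can use the array view while other code uses the named fields. */
static float vec3_sum(const vec3 *a)
{
    float s = 0.0f;
    for (int i = 0; i < 3; i++)
        s += a->v[i];
    return s;
}
```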
| midjji wrote:
| This is probably a terrible idea. Remember that if you have
| written one member of a union, all other members remain public,
| yet accessing any of them in any way is undefined behaviour. This
| is made way worse by most compilers mostly choosing to let you do
| what you think it will. They just don't guarantee they always will
| or in all cases.
| drfuchs wrote:
| I believe you are mistaken. The C11 standard, section 6.5.2.3
| "Structure and union members" pgf 6, says "One special
| guarantee is made in order to simplify the use of unions: if a
| union contains several structures that share a common initial
| sequence (see below), and if the union object currently
| contains one of these structures, it is permitted to inspect
| the common initial part of any of them anywhere that a
| declaration of the completed type of the union is visible. Two
| structures share a common initial sequence if corresponding
| members have compatible types (and, for bit-fields, the same
| widths) for a sequence of one or more initial members." And
| that seems to be what's being used here.
| midjji wrote:
| No: from https://en.cppreference.com/w/cpp/language/union.
|
| The union is only as big as necessary to hold its largest
| data member. The other data members are allocated in the same
| bytes as part of that largest member. The details of that
| allocation are implementation-defined but all non-static data
| members will have the same address (since C++14). It's
| undefined behavior to read from the member of the union that
| wasn't most recently written. Many compilers implement, as a
| non-standard language extension, the ability to read inactive
| members of a union.
|
| What 6.5.2.3 simplifies is the use of unions of the type:
|
|     struct A { int type; DataA a; };
|     struct B { int type; DataB b; };
|     union U { A a; B b; };
|
|     U u;
|     switch (u.a.type) ...
|
| It's not what is being used here.
|
| std::variant is designed to deprecate all legitimate uses of
| union
| throwaway17_17 wrote:
| Your response to GP is based on the C++ reference and his
| is explicitly based on the C standard. Your assertion that
| '[t]he details of that allocation are implementation-defined
| but all non-static data members will have the same
| address (since C++14)' seems to directly conflict with the
| C11 standard. Also, your closing comment about std::variant
| is clearly only applicable to C++. I am just curious why
| you are using C++ when the article and GP are specifically
| addressing C?
| drfuchs wrote:
| The post is about C, not C++. My comment stands, as the
| original post has two structs in a union, and they start
| the same way, so it's exactly the case covered in the C11
| Standard.
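The common-initial-sequence rule quoted from C11 6.5.2.3 can be illustrated with a small sketch. The struct and function names are hypothetical; the only thing relied on is that both structs begin with an `int type` member and that the union type is visible where the access happens.

```c
enum msg_type { MSG_A = 1, MSG_B = 2 };

/* Both structs start with `int type`: that is their common initial sequence. */
struct msg_a { int type; int   payload; };
struct msg_b { int type; float payload; };

union msg {
    struct msg_a a;
    struct msg_b b;
};

/* Whichever struct the union currently holds, `type` may be inspected
 * through either member, because it belongs to the shared prefix. */
static int msg_type_of(const union msg *m)
{
    return m->a.type;
}
```

The guarantee covers only the shared prefix; reading `payload` through the wrong member is a separate question.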
| sesuximo wrote:
| Doesn't matter for C, but in C++ this could make your constexpr
| functions UB since you can only use one member of a union in
| constexpr contexts (the "active" member).
| midjji wrote:
| Constexpr unions are the sane/safe way to use them. It's great,
| because if you access a member which isn't the last one written,
| constexpr will explicitly prevent it at compile time. Whereas all
| other examples here are explicitly undefined behaviour!
| pjmlp wrote:
| In C++ we have namespaces for 30 years now, no need for such
| tricks.
| comex wrote:
| C++ namespaces are unrelated to this. They don't accomplish
| the same thing.
| pjmlp wrote:
| The goal of inline namespaces is exactly to allow for
| migrating libraries across versions.
| tialaramex wrote:
| Hmm. How do C++ namespaces help with the structure naming
| problem in this example? They seem completely orthogonal.
|
| C++ namespaces are a way to avoid library A's symbol "cow"
| clashing with library B's symbol "cow" without everything
| being named library_a_cow and library_b_cow all over the
| place which is annoying. I agree C would be nicer with such a
| namespace feature.
|
| However _this_ technique is about what happens when you
| realise your structure members x and y should be inside a
| sub-structure position, and you want both:
|
| d = calculate_distance(s.x, s.y); // Old code
|
| and
|
| d = calculate_distance(s.position.x, s.position.y); // New
|
| ... to work while you transition to this naming.
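A sketch of the transition technique described above, assuming C11 anonymous structs and unions. The `sprite` struct and the helper are invented for illustration; the two inner structs have identical member types, so `s.x` and `s.position.x` name the same storage.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sprite {
    int id;
    union {
        struct { int x, y; };           /* old spelling: s.x, s.y */
        struct { int x, y; } position;  /* new spelling: s.position.x */
    };
};

/* Both spellings must refer to the same bytes. */
static_assert(offsetof(struct sprite, x) == offsetof(struct sprite, position.x),
              "x and position.x must overlap");
static_assert(offsetof(struct sprite, y) == offsetof(struct sprite, position.y),
              "y and position.y must overlap");

/* The named member lets a whole group of fields be copied in one call. */
static void sprite_copy_position(struct sprite *dst, const struct sprite *src)
{
    memcpy(&dst->position, &src->position, sizeof dst->position);
}
```

Old code keeps compiling with `s.x` while new code, or a `memcpy` of the whole group, can use `s.position`.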
| pjmlp wrote:
| You can use inline namespaces for versioning symbols.
|
| https://www.foonathan.net/2018/11/inline-namespaces/
| tialaramex wrote:
| First of all, C++ 11 may _feel_ like thirty years ago,
| and certainly some of its proponents look thirty years
| older than they did at the time, but it was only ten
| years ago. C++ namespaces date to standardisation work
| (so after the 1985 C++ but before the 1995 _standard_
| C++) but they don't get this job done. _Inline_
| namespaces are a newer feature.
|
| Secondly this technique does something different. The C
| hack doesn't touch the old code. But this "inline
| namespace" trick means old code has to explicitly opt
| into this backward compatibility fix or else it might
| blow up.
|
| Lastly, I didn't try this, but presumably you did. Are
| the two separately namespaced classes the "same thing" as
| far as type checking is concerned? A vital feature of
| this union trick is that it's just one structure, it type
| checks as the same structure because it _is_ the same
| structure. At a glance, I think the C++ solution results
| in _two_ types with similar names, so that would fail
| type checking.
| thaumasiotes wrote:
| Based on the writeup, this technique isn't really about
| enabling you to start writing `s.position.x` where the old
| code would have written `s.x`. If that were all you wanted,
| you'd just keep writing `s.x`. It's about enabling you to
| write `s.x` everywhere, in old code and new code, while
| also being able to pass `s.position` to memcpy calls.
| You're never supposed to write `s.position.x`.
| pjmlp wrote:
| Triggering UB is a compiler error in constexpr code.
|
| https://shafik.github.io/c++/undefined%20behavior/2019/05/11...
| sesuximo wrote:
| True, you'll hopefully get a compiler error.
| ferdek wrote:
| In other words: please always be wary of differences in C and
| C++, for instance type punning [0].
|
| [0] https://stackoverflow.com/a/25672839
___________________________________________________________________
(page generated 2021-08-01 23:00 UTC)