[HN Gopher] Learning that you can use unions in C for grouping t...
       ___________________________________________________________________
        
       Learning that you can use unions in C for grouping things into
       namespaces
        
       Author : deafcalculus
       Score  : 102 points
       Date   : 2021-08-01 14:48 UTC (8 hours ago)
        
 (HTM) web link (utcc.utoronto.ca)
 (TXT) w3m dump (utcc.utoronto.ca)
        
       | 10000truths wrote:
       | Anonymous nested structs are also quite useful for creating
       | struct fields with explicit offsets:
       | 
       |     #include <stddef.h>   // offsetof
       |     #include <stdio.h>
       |     #include <stdint.h>
       | 
       |     #define YDUMMY(suffix, size) char dummy##suffix[size]
       |     #define XDUMMY(suffix, size) YDUMMY(suffix, size)
       |     #define PAD(size) XDUMMY(__COUNTER__, size)
       | 
       |     struct ExplicitLayoutStruct {
       |         union {
       |             struct __attribute__((packed)) { PAD(3); uint32_t foo; };
       |             struct __attribute__((packed)) { PAD(5); uint16_t bar; };
       |             struct __attribute__((packed)) { PAD(13); uint64_t baz; };
       |         };
       |     };
       | 
       |     int main(void) {
       |         // offset foo = 3
       |         // offset bar = 5
       |         // offset baz = 13
       |         // offsetof yields a size_t, so print it with %zu
       |         printf("offset foo = %zu\n",
       |                offsetof(struct ExplicitLayoutStruct, foo));
       |         printf("offset bar = %zu\n",
       |                offsetof(struct ExplicitLayoutStruct, bar));
       |         printf("offset baz = %zu\n",
       |                offsetof(struct ExplicitLayoutStruct, baz));
       |         return 0;
       |     }
        
         | WalterBright wrote:
         | Anytime macros are used for metaprogramming, it's time to reach
         | for a more powerful language.
        
           | Spivak wrote:
           | Doesn't work in a lot of cases unfortunately. If you're
           | writing a library designed to be consumed by other languages
           | you're stuck with writing C ABI compatible code, which can be
           | written in other languages that can "extern" them but it puts
           | limits on what's possible in those libraries.
        
           | 10000truths wrote:
           | Macros are useful, as long as they're used sparingly. I think
           | that in this case, it's used well - the struct is still
           | perfectly readable, and the sole purpose of it is to make it
           | so that you don't have to manually name the dummy fields. But
           | you could totally just write out dummy1, dummy2, dummy3 etc.
           | yourself if you want to get rid of the macros.
        
             | WalterBright wrote:
             | > Macros are useful, as long as they're used sparingly.
             | 
             | Everybody says that. Everybody believes it. And everybody
             | goes to town making a rat's nest with macros, just like
             | that snarl of cables under my desk that resists all attempts
             | to make it nice.
             | 
             | Myself included. I've even written an article about clever
             | C macros. Look, ma! I was so proud of myself.
             | 
             | But then I got older. I started replacing the macros in my
             | C code with regular code. It turns out they weren't that
             | necessary at all. I liked the C code a lot better when it
             | didn't have a single # in it other than #include.
        
           | throwaway17_17 wrote:
           | I want to be clear about your meaning, because I don't know
           | if I'm reading your comment correctly. Are you referring
           | explicitly to syntax based, preprocessor macros? Or does your
           | comment extend to other metaprogramming techniques? I am
           | inclined to think you mean the first considering the amount
           | of emphasis on generic programming in D? Just curious.
        
             | WalterBright wrote:
             | I'm referring to both syntax based (AST) macros, and text
             | based (preprocessor) macros. The latter, of course, are
             | much worse.
             | 
             | An example of the former is so-called "expression
             | templates" in C++. I've seen them used to create a regular
             | expression language using C++ expression templates. The
             | author was quite proud of them, and indeed they were very
             | clever.
             | 
             | However nice the execution, the concept was terrible. There
             | was no way to visually tell that some ordinary code was
             | actually doing regular expressions.
             | 
             | C++ expression templates had their day in the sun, but
             | fortunately they seem to have been thrown onto the trash
             | pile of sounds-like-a-good-idea-but-oops.
             | 
             | (I wrote an article showing how to do expression templates
             | in D, mainly to answer criticisms that D couldn't do it,
             | not because it was a good idea.)
        
           | cryptonector wrote:
           | You might as well have written that "any time you're reaching
           | for C, it's time to reach for a more powerful language".
           | 
           | But if -sadly- you must use C, metaprogramming using macros
           | is not a terrible thing.
        
         | nyanpasu64 wrote:
         | Do foo and bar deliberately overlap?
        
           | 10000truths wrote:
           | Yes, I was looking to demonstrate the flexibility of the
           | approach by including overlapping fields.
        
             | midjji wrote:
             | If you write to either, accessing the other, even on the
             | overlap, is undefined behaviour
        
               | 10000truths wrote:
               | Type punning/aliasing with unions is well defined in gcc.
               | Linus even has a humorous rant about it on the topic:
               | 
               | https://www.yodaiken.com/2018/06/07/torvalds-on-aliasing/
               | 
               | Sure, it's compiler-specific, but I'm already using
               | `__attribute__((packed))` anyways.
        
               | midjji wrote:
               | All undefined behaviour is well defined for each
               | compiler; what it really means is implementation defined
               | and subject to change without notice or documentation
               | with every compiler version, host OS, set of enabled
               | flags, or a thousand other things. Why use an approach
               | which strictly relies on specific versions of specific
               | compilers, rather than a completely portable and standard
               | compliant struct with a char array and a few char pointers?
               | Or if you want a convenient interface and aren't
               | explicitly writing for the kernel, switch to a restricted
               | subset of C++ and do it right?
        
               | rocqua wrote:
               | GCC explicitly states and documents that type punning
               | through unions will work as expected. This doesn't work
               | by accident. It works very explicitly by design of GCC.
               | 
               | Different GCC versions aren't randomly going to change
               | documented behavior. And when they accidentally do, they
               | will consider it a bug.
        
         | gumby wrote:
         | One of the very few things from C that I miss in C++ is
         | anonymous structs and enums. I really don't understand why they
         | are not allowed.
         | 
         | That is, C style enums don't have to have a name but "type
         | safe" (enum class) ones do. One classic use is to name an
         | otherwise boolean option in a function signature; there's
         | typically no need to otherwise name it.
         | 
         | C++ incompatibly requires a name for all struct and class
         | declarations, again a waste when you will only have a single
         | object of a given type.
        
           | WalterBright wrote:
           | > I really don't understand why they are not allowed.
           | 
           | I don't, either. Such were in D from 2000 or so.
           | 
           | I also don't understand why `class` in C++ sits in the tag
           | name space. I wrote Bjarne in the 1980s asking him to remove
           | it from the tag name space, as the tag name space is an
           | abomination. He replied that there was too much water under
           | that bridge.
           | 
           | D doesn't have the tag name space, and in 20 years not a
           | single person has asked for it.
           | 
           | This did cause some trouble for me with ImportC to support
           | things like:
           | 
           |     struct S { ... };
           |     int S;
           | 
           | but I found a way. Although such code is an abomination. I've
           | only seen it in the wild in system .h files.
        
             | Gibbon1 wrote:
             | The only explanation I saw was that C++ standards guys were
             | horrified by the idea of unpredictable side effects as a
             | result of initialization of a struct.
             | 
             | I think C++ though is adding them.
             | 
             | What I'd like in C is designated function parameters.
             | 
             |     // these are the same
             |     bar(.a = 10, .b = 12);
             |     bar(.b = 12, .a = 10);
        
           | cjaybo wrote:
           | > C++ incompatibly requires a name for all struct and class
           | declarations
           | 
           | You're right about "enum class", but anonymous classes and
           | structs are perfectly valid in C++:
           | 
           | https://godbolt.org/z/7MbcqhnoK
        
             | dataflow wrote:
             | Try
             | 
             |     struct S { struct { int x; }; };
             | 
             | under -pedantic and you'll get
             | 
             |     warning: ISO C++ prohibits anonymous structs [-Wpedantic]
        
               | midjji wrote:
               | Pedantic is for the older C++ standard; it's not pedantic
               | for the later ones, e.g. C++11. I think this changed.
        
               | dataflow wrote:
               | Well that blows my mind, I never realized pedantic
               | ignores the language setting. Is this the only case where
               | it does that?
        
               | junon wrote:
               | No, pedantic is for disabling compiler extensions. You
               | still need to explicitly specify a standard.
        
           | midjji wrote:
           | Use an enum in a namespace, or an anonymous namespace
        
             | gumby wrote:
             | This is an example of the desired use case:
             | 
             |     static obj& some_call (obj& o,
             |         enum struct { abandon, save } disposition) { ... };
             | 
             | This is a common case (and should be more common) to avoid
             | using an obscure boolean flag, which can lead to bugs. It
             | shouldn't need a name.
             | 
             | An anonymous namespace just means the name itself won't
             | leak out; under C++ rules I _need_ the name even to specify
             | the enum tag, which is absurd.
        
         | midjji wrote:
         | There are two kinds of undefined behaviour being invoked in
         | using this. It's a horrible idea and a horrible code smell, get
         | rid of it if you ever see something like this.
        
           | 10000truths wrote:
           | I don't see any undefined behavior here. As I mentioned
           | below, gcc explicitly documents type punning via unions as
           | being well defined. But yes, this is compiler specific and is
           | not guaranteed to work elsewhere.
        
             | formerly_proven wrote:
             | Accessing packed struct members works fine on x86, but will
             | blow up at runtime or do weird things on platforms which
             | don't support unaligned loads or stores.
             | 
             | The correct way to access packed structs is through memcpy,
             | just like you'd access any other potentially unaligned
             | object.
        
               | 10000truths wrote:
               | For architectures where unaligned accesses are illegal,
               | gcc will generate multiple load/store instructions when
               | accessing packed struct fields by name. The main caveat
               | to look out for is taking the address of a packed struct
               | member and then dereferencing it.
        
             | midjji wrote:
             | There is absolutely undefined behaviour there. Undefined
             | behaviour is defined not as nasal daemons but as: The
             | compiler implementer does not guarantee that this behaviour
             | will be hardware, circumstance, compiler version, or os
             | consistent, nor that we will warn if we change this.
             | 
             | Packed is technically not undefined behaviour, but it is
             | certainly a trap, especially because compiler-detection
             | macros lead people to write defines which select packed
             | automatically per compiler. Then the special case of an
             | unrecognized compiler is just left empty, meaning the code
             | compiles but no longer does what you think.
        
               | fifjdynb wrote:
               | _You_ don't get to decide what UB means. It really does
               | mean nasal demons are a possibility: all bets are off
               | when you run that executable. Use of the term "undefined
               | behaviour" to mean something else may be on the increase,
               | unfortunately
               | (https://mars.nasa.gov/technology/helicopter/status/298/what-...),
               | but if we're talking about C, its meaning is fixed.
        
       | rightbyte wrote:
       | I don't regard this as a "perverse" hack. If I ever do embedded
       | memory mapped stuff in C11 this is way too tempting.
        
         | midjji wrote:
         | You are practically guaranteed to invoke undefined behaviour if
         | you do. Just use a map on a std::array of e.g. std::byte
        
           | dexterhaslem wrote:
           | they said C11 tho
        
       | flohofwoe wrote:
       | I'm using anonymous nested structs extensively for grouping
       | related items, but I consider the extra field name a feature, not
       | something that should be hidden:
       | 
       | https://github.com/floooh/sokol-samples/blob/bfb30ea00b5948f...
       | 
       | (also note the 'inplace initialization' which follows the state
       | struct definition using C99's designated initialization)
        
       | remram wrote:
       | The first example seems wrong, instead of `struct sub { ... };`
       | what is meant is `struct { ... } sub;`
        
         | siebenmann wrote:
         | You're right; thanks for noticing and I've updated the first
         | example. My C is a bit rusty these days and I didn't check it
         | with a compiler the way I should have.
         | 
         | (I'm the author of the linked-to article.)
        
       | kevin_thibedeau wrote:
       | The result is uglier and less maintainable than a pair of macros.
       | Or just stop trying to hide syntax. This is ultimately on the
       | same level as typedefing pointers.
        
       | adamnemecek wrote:
       | Don't actually do this.
        
         | sp332 wrote:
         | The Linux kernel is using this for bounds checking.
         | https://news.ycombinator.com/item?id=28015263
        
           | ufo wrote:
           | Like the parent poster, when I read the article I assumed
           | that there was no conceivable reason to ever use this feature
           | in a real C program. Let me just say that I'm pleasantly
           | surprised to be proven wrong!
        
       | Subsentient wrote:
       | Bleurgh. I have a deep soft spot for C, and I'm known to get
       | twisted pleasure from using obscure language features in new ways
       | to annoy people, but this is a level of abuse that even I can't
       | get behind. If you need namespacing, use C++. As much as I love
       | C, it's terrible for large projects.
        
         | kktkti9 wrote:
         | People will make a mess of a large project regardless of the
         | language.
        
         | vbezhenar wrote:
         | Linux kernel is large project and clearly C is sufficient for
         | it, given the fact that migrating to C++ would probably be very
         | easy (not using all C++ features, but just selected ones), yet
         | it did not happen.
         | 
         | I think that C++ is better than C, but C is not that bad, even
         | for large projects.
        
           | AlotOfReading wrote:
           | A big issue with introducing C++ into a codebase is that it's
           | incredibly hard to stick to a particular subset or standard.
           | There's always a well-justified argument for the next
           | standard or "just this one additional feature". Eventually
           | you end up with the whole kitchen sink, regardless of where
           | you started.
           | 
           | I've had far more success hard-firewalling C++ into its own
           | box where programmers can use whatever they can get running
           | than trying to limit people to subsets.
        
           | adtac wrote:
           | the kernel has to live with the choices it made in the 90s,
           | you don't
        
             | midjji wrote:
             | Yeah they should have upgraded to some restricted subset of
             | C++ or new restrictive language ages ago. I mostly buy the
             | arguments against having exceptions, perhaps even against
             | polymorphism in general, but the argument against
             | destructors, or atomics... hell no.
        
               | humanrebar wrote:
               | > ...against polymorphism...
               | 
               | C has polymorphism. Inheritance-based virtual dispatch is
               | just one kind of polymorphism. It's common to wire up
               | polymorphism in C with bespoke data structures using
               | tagged unions or function pointers. Changing an
               | implementation at link time is even a form of
               | polymorphism.
        
           | dkersten wrote:
           | > Linux kernel is large project and clearly C is sufficient
           | for it
           | 
           | Sure, and operating systems have been written in assembly
           | too. The question is whether it would be better than just
           | sufficient if Linux were written in C++, today (ie C++17 or
           | 20, not something old). Switching now probably wouldn't be
           | feasible (even ignoring technical reasons, the kernel
           | developer community is familiar with the C codebase and code
           | standards and bought into it), but if Linux were started
           | today, would it be a better choice?
           | 
           | Maybe the answer is still no and C would still be chosen, but
           | the choice today is very different than it was when Linux was
           | started. Of course, maybe Rust or something would be chosen
           | today instead.
        
           | dathinab wrote:
           | And the Kernel devs would probably get _really_ annoyed if
           | you try to push this kind of name-spacing.
           | 
           | > C++ would probably be very easy
           | 
           | Not necessarily. Besides some small(?) problems due to C++
           | allowing "more magic optimizations" than C, they would switch
           | to a subset of C++, and you would need to communicate to all
           | contributors that a lot of C++ things are not allowed. It
           | might be easier to simply not use C++. I mean, if it were
           | that easy, the kernel likely would have switched.
        
         | PostThisTooFast wrote:
         | I will point out, however, that you can abuse "struct" in a
         | similar way to simulate namespaces in Swift.
        
       | bruce343434 wrote:
       | Imo this is not "perverse". In my vector library I alias a vec3
       | as float x,y,z and float[3] using this technique.
        
         | midjji wrote:
         | This is also known as the most common invocation of undefined
         | behaviour in game programming. If you do this, write to y, then
         | read from [1], you are invoking undefined behaviour, and
         | compilers doing different things here between Windows, Linux,
         | Mac, and different compiler versions is a common cause of "why
         | isn't my game working right on XXX, it works fine on YYY"
         | questions.
        
       | midjji wrote:
       | This is probably a terrible idea, remember that if you have
       | written one member of a union, all other members remain public,
       | yet accessing any of them in any way is undefined behaviour. This
       | is made way worse by most compilers mostly choosing to let you do
       | what you think it will. They just don't guarantee they always will
       | or in all cases.
        
         | drfuchs wrote:
         | I believe you are mistaken. The C11 standard, section 6.5.2.3
         | "Structure and union members" pgf 6, says "One special
         | guarantee is made in order to simplify the use of unions: if a
         | union contains several structures that share a common initial
         | sequence (see below), and if the union object currently
         | contains one of these structures, it is permitted to inspect
         | the common initial part of any of them anywhere that a
         | declaration of the completed type of the union is visible. Two
         | structures share a common initial sequence if corresponding
         | members have compatible types (and, for bit-fields, the same
         | widths) for a sequence of one or more initial members." And
         | that seems to be what's being used here.
        
           | midjji wrote:
           | No: from https://en.cppreference.com/w/cpp/language/union.
           | 
           | The union is only as big as necessary to hold its largest
           | data member. The other data members are allocated in the same
           | bytes as part of that largest member. The details of that
           | allocation are implementation-defined but all non-static data
           | members will have the same address (since C++14). It's
           | undefined behavior to read from the member of the union that
           | wasn't most recently written. Many compilers implement, as a
           | non-standard language extension, the ability to read inactive
           | members of a union.
           | 
           | What 6.5.2.3 simplifies is the use of unions of the type:
           | 
           | struct A{int type; DataA a;};
           | 
           | struct B{int type; DataB b;};
           | 
           | union U{A a; B b;};
           | 
           | U u;
           | 
           | switch(u.a.type)...
           | 
           | It's not what is being used here.
           | 
           | std::variant is designed to deprecate all legitimate uses of
           | union
        
             | throwaway17_17 wrote:
             | Your response to GP is based on the C++ reference and his
             | explicitly is based on the C standard. Your assertion that
             | ' [t]he details of that allocation are implementation-
             | defined but all non-static data members will have the same
             | address (since C++14)' seems to directly conflict with the
             | C11 standard. Also, your closing comment about std::variant
             | is clearly only applicable to C++. I am just curious why
             | you are using C++ when the article and GP are specifically
             | addressing C?
        
             | drfuchs wrote:
             | The post is about C, not C++. My comment stands, as the
             | original post has two structs in a union, and they start
             | the same way, so it's exactly the case covered in the C11
             | Standard.
        
       | sesuximo wrote:
       | Doesn't matter for C, but in C++ this could make your constexpr
       | functions UB since you can only use one member of a union in
       | constexpr contexts (the "active" member).
        
         | midjji wrote:
         | Constexpr unions is the sane/safe way to use them. It's great,
         | because when you access a member which isn't the last one
         | written, constexpr will explicitly prevent it at compile time.
         | Whereas all other examples here are explicitly undefined
         | behaviour!
        
         | pjmlp wrote:
         | In C++ we have had namespaces for 30 years now, no need for such
         | tricks.
        
           | comex wrote:
           | C++ namespaces are unrelated to this. They don't accomplish
           | the same thing.
        
             | pjmlp wrote:
             | The goal of inline namespaces is exactly to allow for
             | migrating libraries across versions.
        
           | tialaramex wrote:
           | Hmm. How do C++ namespaces help with the structure naming
           | problem in this example? They seem completely orthogonal.
           | 
           | C++ namespaces are a way to avoid library A's symbol "cow"
           | clashing with library B's symbol "cow" without everything
           | being named library_a_cow and library_b_cow all over the
           | place which is annoying. I agree C would be nicer with such a
           | namespace feature.
           | 
           | However _this_ technique is about what happens when you
           | realise your structure members x and y should be inside a
           | sub-structure position, and you want both:
           | 
           | d = calculate_distance(s.x, s.y); // Old code
           | 
           | and
           | 
           | d = calculate_distance(s.position.x, s.position.y); // New
           | 
           | ... to work while you transition to this naming.
        
             | pjmlp wrote:
             | You can use inline namespaces for versioning symbols.
             | 
             | https://www.foonathan.net/2018/11/inline-namespaces/
        
               | tialaramex wrote:
               | First of all, C++ 11 may _feel_ like thirty years ago,
               | and certainly some of its proponents look thirty years
               | older than they did at the time, but it was only ten
               | years ago. C++ namespaces date to standardisation work
               | (so after the 1985 C++ but before the 1995 _standard_
               | C++) but they don't get this job done. _Inline_
               | namespaces are a newer feature.
               | 
               | Secondly this technique does something different. The C
               | hack doesn't touch the old code. But this "inline
               | namespace" trick means old code has to explicitly opt
               | into this backward compatibility fix or else it might
               | blow up.
               | 
               | Lastly, I didn't try this, but presumably you did. Are
               | the two separately namespaced classes the "same thing" as
               | far as type checking is concerned? A vital feature of
               | this union trick is that it's just one structure, it type
               | checks as the same structure because it _is_ the same
               | structure. At a glance, I think the C++ solution results
               | in _two_ types with similar names, so that would fail
               | type checking.
        
             | thaumasiotes wrote:
             | Based on the writeup, this technique isn't really about
             | enabling you to start writing `s.position.x` where the old
             | code would have written `s.x`. If that were all you wanted,
             | you'd just keep writing `s.x`. It's about enabling you to
             | write `s.x` everywhere, in old code and new code, while
             | also being able to pass `s.position` to memcpy calls.
             | You're never supposed to write `s.position.x`.
        
         | pjmlp wrote:
         | Triggering UB is a compiler error in constexpr code.
         | 
         | https://shafik.github.io/c++/undefined%20behavior/2019/05/11...
        
           | sesuximo wrote:
           | True, you'll hopefully get a compiler error.
        
         | ferdek wrote:
         | In other words: please always be wary of differences in C and
         | C++, for instance type punning [0].
         | 
         | [0] https://stackoverflow.com/a/25672839
        
       ___________________________________________________________________
       (page generated 2021-08-01 23:00 UTC)