[HN Gopher] Lesser known tricks, quirks and features of C
       ___________________________________________________________________
        
       Lesser known tricks, quirks and features of C
        
       Author : jandeboevrie
       Score  : 218 points
       Date   : 2023-02-19 07:27 UTC (15 hours ago)
        
 (HTM) web link (blog.joren.ga)
 (TXT) w3m dump (blog.joren.ga)
        
       | vocram wrote:
       | Yet another proof that C is simple but not easy.
        
         | SAI_Peregrinus wrote:
         | The simplest languages tend to be the most difficult.
         | Brainfuck, Binary Lambda Calculus, Unlambda, and other "Turing
         | tarpits" are all extremely difficult to use for anything even
         | mildly complex.
        
           | elcritch wrote:
           | Forth. It's easy for a small cute script. But quickly becomes
           | a PITA.
        
           | mtlmtlmtlmtl wrote:
           | I always felt like unlambda and other SKI-calculus-esque
           | esolangs (iota comes to mind) could have some kinda strange
           | use case in some kind of generalised genetic programming. It
           | should be possible to create a binary notation for SKI
           | calculus where arbitrary bitstrings will be valid, and so one
           | could randomly mutate and recombine arbitrary programs.
           | Though I've never delved deeper into genetic algorithms and
           | evolutionary programming, my sense is that genetic algorithms
           | tend to be restricted to parameterised algorithms where the
           | "genes" determine the various parameters. Which can be great
           | for optimisation problems.
           | 
           | It's one of those weird ideas I've had kicking about for
           | years but never did anything about, and yet I keep coming
           | back to it.
        
             | alcover wrote:
             | This must be an avenue for very exciting explorations. I'm
             | quite ignorant about this stuff but have some questions:
             | 
             | > It should be possible to create a binary notation for SKI
             | > calculus where arbitrary bitstrings will be valid
             | 
             | What if it's not? How will your genetic petri dish spot
             | and eliminate invalid programs?
             | 
             | > one could randomly mutate and recombine arbitrary programs
             | 
             | What if non-halting programs get generated?
             | 
             | In this vein I've seen magnificent images of 1D cellular
             | automatons that use the surrounding pattern to decide on
             | the local rule for next gen.
        
               | mtlmtlmtlmtl wrote:
               | I assume that an invalid program would not compile/parse,
               | and so would die and fail to reproduce. The issue is more
               | that if the space of invalid programs is too large
               | compared to the space of valid ones, generating valid
               | offspring by combining two programs would be too rare and
               | the population would die off.
               | 
               | Though if the space is small enough I imagine you could
               | get past that. It's a bit of a gnarly point, hard to tell
               | how this would turn out without trying I suppose.
               | 
               | As for the halting problem there's of course no clever
               | solution there other than limiting CPU time. So I guess
               | pick a reasonable limit that makes sense for whatever
               | you're trying to do.
        
       | kevin_thibedeau wrote:
       | > The 0 width field tells that the following bit fields should be
       | set on the next atomic entity (char).
       | 
       | This isn't correct since int can't be less than 16-bits. Fields
       | are placed on the nearest natural alignment for the target
       | platform, which might not support unaligned access.
        
         | Jorengarenar wrote:
         | I think I'll use other example. Thanks!
        
           | kevin_thibedeau wrote:
           | You can just expand your example to use 16-bit values or
           | switch to uint8_t. Bitfields with signed integers are also a
           | minefield so it's best to never attempt it.
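
A minimal sketch of the zero-width bit-field under discussion, assuming unsigned fields as suggested above. The struct and function names are hypothetical. The standard only says that a zero-width field closes the current storage unit; how large that unit is, and therefore the exact sizes, is implementation-defined.

```c
#include <stddef.h>

/* Hypothetical structs for illustration; names are not from the article. */
struct packed_flags {
    unsigned int a : 4;
    unsigned int b : 4;   /* typically shares a storage unit with a */
};

struct split_flags {
    unsigned int a : 4;
    unsigned int   : 0;   /* zero width: no further packing into a's unit */
    unsigned int b : 4;   /* starts in a fresh storage unit */
};

/* Returns how many extra bytes the zero-width member costs on this target. */
size_t split_overhead(void)
{
    return sizeof(struct split_flags) - sizeof(struct packed_flags);
}
```

On common compilers the split struct comes out larger than the packed one, but the actual numbers should be checked per target rather than assumed.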
        
         | titzer wrote:
         | C is fundamentally confused, because it offers (near) machine-
         | level specifications but then leaves just enough wiggle room
         | for compilers to "optimize" (through alignment and such) while
         | ruining the precision of a specification. You end up not
         | getting exactly what you want at the machine level. It's
         | infuriating.
         | 
         | The bitfield stuff in C would be fantastic if it weren't
         | fundamentally broken. E.g. some Microsoft compilers in the past
         | interpreted bit fields as signed... _always_. In V8 we had a
         | work around with templates to avoid bitfields altogether. Fail.
        
         | rerdavies wrote:
         | int isn't a bitfield.
        
       | milgra wrote:
       | Very nice collection. My favorite C feature is actually a
       | gcc/clang feature : the __INCLUDE_LEVEL__ predefined macro. It
       | made me code&maintain my C projects exactly twice as fast as
       | before because file count dropped to half :
       | https://github.com/milgra/headerlessc .
        
       | zwieback wrote:
       | Be interesting to see when these features showed up. I learned C
       | from the K&R book back in the day and it doesn't mention most of
       | these.
       | 
       | Designated initializer is something I'll try to remember, seems
       | handy.
        
         | AceJohnny2 wrote:
         | Yeah the K&R, while being a masterpiece of clarity and
         | conciseness, is severely outdated in many important ways.
         | 
         | I wish there was some effort to create a modern version while
         | preserving the clarity and conciseness of Kernighan and
         | Ritchie.
         | 
         | Designated initializers in particular are extremely useful. I
         | once halted a factory line for days because of a mistake they
         | would have avoided.
        
         | suprjami wrote:
         | Designated initialisers were added in C99
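
For readers who have not met designated initializers, here is a short sketch of both the struct and the array form. The `motor_config` struct, its fields, and the values are invented for illustration only.

```c
/* Hypothetical config struct; field names and values are illustrative. */
struct motor_config {
    int    max_rpm;
    int    ramp_ms;
    double gain;
    int    reversed;
};

struct motor_config default_motor_config(void)
{
    /* Each value is tied to a field name, so reordering or adding fields
       in the struct cannot silently shift values. Fields that are not
       named are zero-initialized. */
    struct motor_config cfg = {
        .max_rpm = 3000,
        .gain    = 0.5,
    };
    return cfg;
}

/* Array form: the indices are named too, here via enum constants. */
enum color { RED, GREEN, BLUE, COLOR_COUNT };

const char *color_name(enum color c)
{
    static const char *const names[COLOR_COUNT] = {
        [BLUE]  = "blue",
        [RED]   = "red",
        [GREEN] = "green",
    };
    return names[c];
}
```

The order of the initializers no longer has to match the declaration order, which is the class of mistake a positional initializer list invites.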
        
       | ikran03 wrote:
       | Too young to know about anything in there, but these look so
       | interesting. Can't wait to show off '%n' in my next uni project
        
       | LegionMammal978 wrote:
       | > volatile type qualifier
       | 
       | > This qualifier tells the compiler that a variable may be
       | accessed by other means than the current code (e.g. by code run
       | in another thread or it's MMIO device), thus to not optimize away
       | reads and writes to this resource.
       | 
       | It's dangerous to mention cross-thread data access as a use case
       | for volatile. In standard C, modifying any non-atomic value on
       | one thread, while accessing it on another thread without
       | synchronization, is always UB. Volatile variables do not get any
       | exemption from this rule. In practice, the symptoms of such a
       | data race include the modification not being visible on the other
       | thread, or the modified value getting torn between its old and
       | new states.
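
As a hedged illustration of the alternative to `volatile` for cross-thread signalling, the sketch below uses C11 `<stdatomic.h>`. The one-shot mailbox design and the names `publish` and `try_consume` are assumptions made for this example; thread creation is left out, and the code assumes a single producer that publishes once.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical one-shot mailbox; names are illustrative. */
static int payload;          /* plain data, published via the flag */
static atomic_bool ready;    /* the synchronization point, initially false */

/* Producer side: write the data, then publish with release ordering. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer side: the acquire load pairs with the release store, so a
   consumer that observes ready == true also observes the payload write. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

Declaring `payload` and `ready` as `volatile` instead would not give this guarantee in standard C; the atomic flag is what removes the data race.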
        
       | clnq wrote:
       | Do we have something like this for C++ (parts not shared with C)?
        
       | dantle wrote:
       | Nice article. Saw a few things I wish I'd known about.
       | 
       | 1. %n in printf would be handy when writing CLIs dealing w/
       | multiple lines or precise counts of backspaces.
       | 
       | 2. Using enums as a form of static_assert() is a great idea
       | (triggering a div by zero compiler error).
        
         | jcelerier wrote:
         | using enums as a form of static_assert is very bad when C
         | nowadays literally has static_assert (_Static_assert:
         | https://gcc.godbolt.org/z/bfv6rKdKM)
        
         | tom_ wrote:
         | The enum idea is interesting. I've previously used an extern
         | with a conditional size of either 1 (valid) or -1 (invalid).
         | This requires no additional boilerplate, and is #define-able
         | into a static assert when built with a recent enough compiler.
         | Something like this, from memory:
         | 
         |     #define STATIC_ASSERT(COND) extern char static_assert_cond_[(COND)?1:-1] /* C99 or earlier */
         |     #define STATIC_ASSERT(COND) _Static_assert(COND, #COND) /* C11 or later */
         | 
         | As both are declarations, I don't think you'll end up in a
         | situation where one is valid and the other isn't - but I could
         | be wrong, and I suspect it would rarely matter in practice
         | anyway.
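
One way the two definitions could be selected automatically is sketched below, keyed on `__STDC_VERSION__`. This is an assumption about how one might wire it up, not the commenter's actual code. Note that C11 and C17 require the message argument to `_Static_assert`; the single-argument form only became standard in C23.

```c
#include <limits.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
  /* C11 and later: the built-in form, with the condition as the message. */
  #define STATIC_ASSERT(cond) _Static_assert(cond, #cond)
#else
  /* Older compilers: a negative array size makes the declaration invalid. */
  #define STATIC_ASSERT(cond) extern char static_assert_cond_[(cond) ? 1 : -1]
#endif

/* File scope: both expansions are plain declarations. */
STATIC_ASSERT(CHAR_BIT == 8);   /* assumption about the target platform */

/* Block scope works as well. */
int int_bits(void)
{
    STATIC_ASSERT(sizeof(int) >= 2);
    return (int)(sizeof(int) * CHAR_BIT);
}
```

Because the fallback always declares the same `extern char` array with size 1, repeated uses are compatible redeclarations rather than conflicts.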
        
         | torstenvl wrote:
         | %n is an extremely poor fit for CLI manipulation or
         | tokenization for backspacing.
         | 
         | %n is for _bytes_ , not user-perceived characters.
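
A small sketch of the bytes-versus-characters point, assuming UTF-8 text. The helper name is made up for illustration. Some C libraries restrict or disable `%n`, so this may not behave the same everywhere.

```c
#include <stdio.h>

/* Returns the count that %n reports after formatting `text` into a
   scratch buffer. The function name is illustrative only. */
int bytes_reported_by_n(const char *text)
{
    char buf[64];
    int n = -1;
    snprintf(buf, sizeof buf, "%s%n", text, &n);
    return n;
}
```

Calling it with `"hello"` reports 5, while `"h\xc3\xa9llo"` (the UTF-8 encoding of "héllo", five user-perceived characters) reports 6, so the value cannot be used directly as a count of columns or backspaces.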
        
       | nstbayless wrote:
       | Here's another one. Handy "syntax" that makes it possible to
       | iterate an unsigned type from N-1 to 0. (Normally this is
       | tricky.)
       | 
       | for (unsigned int i = N; i --> 0;) printf("%u\n", i);
       | 
       | This --> construction also works in JavaScript and so on.
        
         | titzer wrote:
         | AFAICT this would parse as "(i--) > 0", there's no "-->"
         | operator.
        
           | texaslonghorn5 wrote:
           | https://stackoverflow.com/questions/1642028/what-is-the-
           | oper...
        
         | skribanto wrote:
         | how would you iterate over every possible value of a unsigned
         | int?
        
           | mtklein wrote:
           | Usually I use a do-while loop,
           | 
           |     unsigned char x = 0;
           |     do {
           |         printf("%d\n", x);
           |     } while (++x);
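
The do-while idiom above can be checked by counting iterations instead of printing. This is a sketch; the function name is hypothetical.

```c
/* Counts how many times the loop body runs when walking every value of
   unsigned char exactly once. */
unsigned long count_all_uchar_values(void)
{
    unsigned long visits = 0;
    unsigned char x = 0;
    do {
        ++visits;        /* body sees x = 0, 1, ..., UCHAR_MAX */
    } while (++x);       /* x wraps to 0 after UCHAR_MAX, ending the loop */
    return visits;
}
```

The test runs after the body, which is what lets the loop visit 0 first and still terminate when the counter wraps back to 0. The same shape works for `unsigned int`, just with far more iterations.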
        
         | mtklein wrote:
         | It's worth noting that this does also work on signed types, so
         | it can be a kind of handy idiom to see
         | 
         |     while (N --> 0) { ... }
         | 
         | and know it will execute N times no matter the details of the
         | type of N.
        
         | Jorengarenar wrote:
         | I was hesitant to put it on the list, but fine, you convinced
         | me
        
         | stonegray wrote:
         | If you're gonna test the i--, shouldn't it fall through on zero
         | anyway?
         | 
         |     for (unsigned int i = N; i--;){}
         | 
         |     unsigned int i = N;
         |     while(i--){ ... }
         | 
         | Also I think I'm missing the tricky part. Couldn't this be a
         | bog-standard for loop?
         | 
         |     for (unsigned int i = N - 1; i > 0; i--){ ... }
         | 
         | The "downto" pseudooperator definitely scores some points for
         | coolness and aesthetics, but there's no immediately obvious use
         | case for me.
        
           | Jorengarenar wrote:
           | The former executes the loop body when `i` is 0, which the
           | latter never does.
           | 
           | And we cannot change the comparison to `>=` in the latter,
           | because an unsigned value is always greater than or equal to
           | 0, so we would get an infinite loop.
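
The difference between the variants in this subthread can be made concrete with a sketch that records the values each loop visits. Both function names and the `cap` guard are assumptions added for the example.

```c
#include <stddef.h>

/* Fills out[0..n-1] with n-1, n-2, ..., 0 using the "i --> 0" spelling,
   which parses as (i--) > 0. Returns the number of iterations. */
size_t fill_descending(unsigned int *out, unsigned int n)
{
    size_t k = 0;
    for (unsigned int i = n; i-- > 0; )
        out[k++] = i;    /* the body sees i already decremented */
    return k;
}

/* The "bog-standard" variant never visits 0, and when n == 0 the
   expression n - 1 wraps to UINT_MAX. `cap` bounds the writes so the
   wrapped case stays inside the caller's buffer. */
size_t fill_descending_naive(unsigned int *out, unsigned int n, size_t cap)
{
    size_t k = 0;
    for (unsigned int i = n - 1; i > 0 && k < cap; i--)
        out[k++] = i;
    return k;
}
```

With `n = 4` the first function visits 3, 2, 1, 0 and the second only 3, 2, 1. With `n = 0` the first does nothing, while the second starts from `UINT_MAX` and runs until the cap stops it.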
        
       | Miserlou57 wrote:
       | "Quirks and features"
        
         | aranchelk wrote:
         | Where's the DougScore?
        
       | int_19h wrote:
       | One non-obvious thing about named function types is that they can
       | also be used to declare (but not define) functions:
       | 
       |     typedef void func(int);
       |     func f;
       |     void f(int x) {}
       | 
       | I don't think I've ever seen a practical use for this in C,
       | though. In C++, where this also works, and extends to member
       | functions, this can be very occasionally useful in conjunction
       | with decltype to assert that a function has signature identical
       | to some other function - e.g. when you're intercepting and
       | detouring some shared library calls:
       | 
       |     int foo();
       |     decltype(foo) bar;
       | 
       | I suppose with typeof() in C23 this might also become more
       | interesting.
        
         | mtklein wrote:
         | I have found this pretty handy for declaring a bunch of
         | functions of all the same type, e.g. steps in a direct-threaded
         | interpreter.
         | 
         |     typedef void Step(whatever...);
         |     Step add,sub,mul,div,
         |          load,store,
         |          etc...;
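
A compilable sketch of this pattern follows. The `Step` signature, the tiny int-stack design, and the `run` dispatcher are all assumptions made for illustration; the comment above does not say what its real step functions take.

```c
/* Hypothetical interpreter steps operating on an int stack. */
typedef void Step(int *stack, int *sp);

/* One declaration line covers several functions of the same type. */
Step step_add, step_mul;

/* Definitions still have to spell out the full signature. */
void step_add(int *stack, int *sp)
{
    stack[*sp - 2] += stack[*sp - 1];
    --*sp;
}

void step_mul(int *stack, int *sp)
{
    stack[*sp - 2] *= stack[*sp - 1];
    --*sp;
}

/* An array of pointers to Step works as a simple program. */
int run(Step *const program[], int count, int *stack, int sp)
{
    for (int i = 0; i < count; i++)
        program[i](stack, &sp);
    return stack[sp - 1];
}
```

If a definition drifts from the typedef, the compiler reports a conflicting declaration, which is the consistency check the named function type buys.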
        
       | localplume wrote:
       | I remember once upon a time I thought C was fairly simple, so I
       | decided to write a program to generate ASTs from C programs. I
       | was very wrong and it was kind of a nightmare. There are so many
       | weird little quirks or lesser-used features that I never saw in
       | the wild even in large production codebases; I feel like you
       | really don't _need_ a lot of these features. I can't imagine
       | doing proper compiler work, especially for something like C++.
       | Nice article.
        
         | HybridCurve wrote:
         | > I remember once upon a time I thought C was fairly simple, so
         | I decided to write a program to generate ASTs from C programs.
         | 
         | Oh man, I think we all have been this young and naive at some
         | point.
         | 
         | I have spent time working with compilers for this purpose
         | (having realized I did _not_ want to attempt parsing source and
         | generating the AST) and decided it is much easier to let them
         | do the work. That being said, it can still be more than a
         | handful (both GCC and Clang have their eccentricities) and
         | depending on how you are using it you still might be in over
         | your head.
         | 
         | When you start a project like this and end up failing because
         | you simply do not have the depth of knowledge or time to see it
         | to completion it often feels a bit demoralizing from the loss
         | of investment. Truthfully though, having started many such
         | ventures (emulators for 6502 and 80386 to name a few), you get
         | all the benefit of experience from working on difficult
         | problems without the misery of debugging and model checking
         | until everything is more or less perfect. It's great fun,
         | you learn a lot, and you should never avoid trying simply
         | because it might be too much to handle.
        
       | AceJohnny2 wrote:
       | Compound Literals in C are great. They're no surprise to anyone
       | coming from more sophisticated languages, but I've never seen
       | them used in the C codebases I've worked on.
       | 
       | What with C also allowing structures as return values, another
       | rarely-used feature, they're really useful for allowing a richer
       | API than the historical `int foo(...)` that so many people are
       | used to seeing.
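
A brief sketch of the richer-API idea: a function that returns a struct built from a compound literal instead of an `int` status plus out-parameter. The `parse_result` type and `parse_long` helper are hypothetical names chosen for this example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical result type: success flag and value in one return. */
struct parse_result {
    bool ok;
    long value;
};

struct parse_result parse_long(const char *text)
{
    char *end;
    errno = 0;
    long v = strtol(text, &end, 10);

    /* Reject empty input, trailing junk, and out-of-range values. */
    if (end == text || *end != '\0' || errno == ERANGE)
        return (struct parse_result){ .ok = false };

    return (struct parse_result){ .ok = true, .value = v };
}
```

A caller can then write `struct parse_result r = parse_long(s); if (r.ok) ...` without reserving a sentinel return value for errors.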
       | 
       | C has so much legacy that it's really hard for even decades-old
       | (C99!) features to establish themselves. Or perhaps that's MSVC's
       | lagging support that's to blame :p
        
       | Joker_vD wrote:
       | Never quite understood why compound literals are lvalues, but
       | fine, whatever, I guess, it's so that you can write "&(struct
       | Foo){};" instead of "struct Foo tmp; &tmp;"... which, on a
       | tangential note, reminds me about Go: the proposals to make
       | things like &5 and &true legal in Go were rejected because "the
       | implied semantics would be unclear" even though &structFoo{} is
       | legal and apparently has obvious semantics.
        
         | cataphract wrote:
         | It's useful when a function has a out or in/out struct
         | parameter whose value at the end you're not interested in. Or
         | in functions where the struct is an input parameter, but they
         | return it as a return value too, which you can then assign to a
         | pointer variable or immediately pass to another function.
         | 
         | Note that the struct values thus created have longer lifetimes
         | than temporary C++ objects created directly inside the argument
         | list of a function call.
        
         | leni536 wrote:
         | In C compound literals have a relatively long lifetime compared
         | to C++ temporaries. With these lifetime rules it makes sense
         | that they are lvalues, although I like C++ rvalues (especially
         | prvalues) more.
         | 
         | https://cigix.me/c17#6.5.2.5.p5
         | 
         | > If the compound literal occurs outside the body of a
         | function, the object has static storage duration; otherwise, it
         | has automatic storage duration associated with the enclosing
         | block.
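
The lvalue and lifetime points can be illustrated with a sketch like the one below. The `stats` struct and all three functions are invented for the example.

```c
/* Hypothetical API with an out-parameter the caller may not care about. */
struct stats { int min, max; };

int sum_with_stats(const int *v, int n, struct stats *out)
{
    int sum = 0;
    out->min = out->max = n > 0 ? v[0] : 0;
    for (int i = 0; i < n; i++) {
        sum += v[i];
        if (v[i] < out->min) out->min = v[i];
        if (v[i] > out->max) out->max = v[i];
    }
    return sum;
}

/* Throwaway out-parameter: the compound literal is an lvalue, so its
   address can be passed directly. */
int sum_only(const int *v, int n)
{
    return sum_with_stats(v, n, &(struct stats){0});
}

/* Inside a function the literal has automatic storage tied to the
   enclosing block, so the pointer stays valid after the call returns. */
int range_of(const int *v, int n)
{
    struct stats *s = &(struct stats){0};
    sum_with_stats(v, n, s);
    return s->max - s->min;
}
```

In C++ a temporary in the same position would be gone at the end of the full expression, which is why `range_of` is a C-only pattern.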
        
       | ufo wrote:
       | Fun fact about %n:
       | 
       | Mazda cars used to have a bug where they used printf(str) instead
       | of printf("%s", str) and their media system would crash if you
       | tried to play the "99% Invisible" podcast in them. All because
       | the "% In" was parsed as a "%n" with some extra modifiers.
       | https://99percentinvisible.org/episode/the-roman-mars-mazda-...
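
The fix described above amounts to never letting external text act as the format string. A minimal sketch, with a hypothetical helper name and `snprintf` standing in for whatever the media system actually called:

```c
#include <stdio.h>

/* Formats an untrusted title safely: the title is data, never the format.
   Returns what snprintf returns (the length that was needed). */
int format_title(char *dst, size_t cap, const char *title)
{
    /* Unsafe would be: snprintf(dst, cap, title);
       any '%' in the title would then start a conversion specification. */
    return snprintf(dst, cap, "%s", title);
}
```

With `"%s"` as the format, a title such as "99% Invisible" is copied through byte for byte.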
        
         | rerdavies wrote:
         | Fun fact about %n:
         | 
         | The %n functionality also makes printf accidentally Turing-
         | complete even with a well-formed set of arguments. A game of
         | tic-tac-toe written in the format string is a winner of the
         | 27th IOCCC.
         | 
         | - sez wiki.
         | 
         | A not so fun fact:
         | 
         | Because the %n format is inherently insecure, it's disabled by
         | default.
         | 
         | - MSVC reference.
        
         | tom_ wrote:
         | "format not a string literal" is one warning I always upgrade
         | to an error. Dear reader: you should do this, too!
        
           | wrigby wrote:
           | Thanks! This prompted me to look up the flag to enable this.
           | For GCC it's:
           | 
           |     -Werror=format-security
        
           | Gigachad wrote:
           | Why are these not compiler errors by default? Opting in to
           | such important safety features seems like broken design.
        
       | zabzonk wrote:
       | the c training course at a popular uk training company (the
       | instruction set) had duff's device on something like page 5 of
       | their c course - expunging it was one of the first things i did
       | when i joined them. there were many others.
        
         | zabzonk wrote:
         | i don't ask this too often - but what is wrong with this
         | comment?
        
       | gallier2 wrote:
       | Cool. Two of the tricks shown are from my contribution in
       | stackoverflow.
        
       | neverrroot wrote:
       | It's cool to have these, it's fun to use them for fun. But please
       | don't use them in production code. Also don't assume most of them
       | will be known by other developers.
        
         | suprjami wrote:
         | Professional C developers definitely should be using at least
         | designated init and FAM, standard features both added in C99
         | and currently 24 years old.
        
         | Jorengarenar wrote:
         | > Also don't assume most of them will be known by other
         | developers.
         | 
         | Given the title of the article, one ought to assume the
         | opposite ;)
        
         | Gigachad wrote:
         | Please don't use C at all in production if you can help it.
        
       | foobiekr wrote:
       | Most of these are pretty familiar if you're old enough, but this is a
       | wonderful list.
       | 
       | I didn't know C23 was getting rid of trigraphs. That's probably a
       | good thing and easy to clean up if needed.
        
         | int_19h wrote:
         | The bit about "register" is old enough that I don't think it's
         | meaningful anymore.
         | 
         | The stock verbiage about how modern compilers ignore "register"
         | because they can do better but it may be useful on simpler
         | ones, has been around in this exact form 20 years ago already.
         | And one curious thing is that even back then, such statements
         | would never list specific compilers where "register" still did
         | something useful.
         | 
         | So far as I can tell, "register" was in actual use back when
         | many C compilers were still single-pass, or at least didn't
         | have a full-fledged AST, and thus their ability to do things
         | like escape analysis was limited. With that in mind, "register"
         | was basically a promise to such a compiler to not take the
         | address of a local in the function body (this is the only
         | standard way in which it affects C semantics!). But we haven't
         | had such compilers for a very long time now, even when
         | targeting embedded - the compilers themselves run on full-power
         | hardware, so there's no reason for them to take shortcuts.
        
           | camel-cdr wrote:
           | I think register is closer to const, as in: it's a hint to
           | the programmer not the compiler.
           | 
           | So if you want to make absolutely sure that a variable can
           | always be in a register then you should consider adding the
           | register specifier to stop other programmers from taking the
           | address of that variable.
        
       | tastysandwich wrote:
       | "Expert C Programming: Deep C Secrets" is a really good book to
       | learn a lot of C tricks and quirks, plus some history. I read it
       | a few years ago and loved it.
       | 
       | I was a grad when I read it and remember annoying my older
       | coworkers for a few weeks with little gotchas I picked up. "hey
       | what do you think THIS example prints?" "Stop sending me these!"
        
       | somewhereoutth wrote:
       | I'm a bit better at English than c, and in the spirit of language
       | peculiarities, this jumped out at me:
       | 
       | > It's possible, because C cares less than more about whitespace
       | 
       | Idiomatically we'd say 'couldn't care less'. I guess we should be
       | glad it wasn't the diabolical and illogical 'could care less'
        
         | Jedd wrote:
         | I don't believe those are functionally / semantically
         | equivalent - couldn't care less does imply a min() value of
         | care.
         | 
         | In contrast, the author is suggesting a comparative only.
         | 
         | And, on careful re-reading, I suspect the author is having a
         | play on syntax & semantics here -- the context of the quote is:
         | 
         | > You may ask, since when C has such operator and the answer
         | is: since never. --> is not an operator, but two separate
         | operators -- and > written in a way they look like one. It's
         | possible, because C cares less than more about whitespace.
         | 
         | Given that '--' is decrement (kind of 'lessen') and > is
         | greater than (kind of 'more'). Perhaps I am reading too much
         | into that.
         | 
         | (I feel 'couldn't care less' is perhaps more common in northern
         | America than elsewhere, and while TFA has a Gabon TLD, appears
         | to be resident in Poland, so automatically receives a lot of
         | leeway in their use of idiomatic English.)
        
       ___________________________________________________________________
       (page generated 2023-02-19 23:00 UTC)