[HN Gopher] Lesser known tricks, quirks and features of C
       ___________________________________________________________________
        
       Lesser known tricks, quirks and features of C
        
       Author : rramadass
       Score  : 87 points
       Date   : 2024-09-26 16:25 UTC (6 hours ago)
        
 (HTM) web link (jorenar.com)
 (TXT) w3m dump (jorenar.com)
        
       | coreyp_1 wrote:
       | That's a nice list!
       | 
       | I've been digging into cross-platform (Windows and Linux) C for a
       | while, and it has been fascinating. On top of that, I've been
       | writing a JIT-ted scripting (templating) language, and the ABI
       | differences (not just fastcall vs stdcall vs cdecl) are often not
       | easy to find documentation about.
       | 
       | I've decided that if I ever get to teach a university class on C
       | again, I want to cover some of these things that I feel are
       | often left out, and this list is a helpful reference! Thanks!
        
       | ranger_danger wrote:
       | > quirks and features
       | 
       | Someone is a fan of Doug DeMuro.
        
         | randomdata wrote:
         | _This..._ is the 1972 Ritchie C
        
       | saagarjha wrote:
       | Mentioning %n without explaining that it is overwhelmingly used
       | for exploits is a little reckless IMO.
        
         | greiskul wrote:
         | I'm curious about this, didn't know about %n before. What are
         | the common pitfalls and exploits using this enables?
        
           | lights0123 wrote:
           | If the user can control the formatting string, they can write
           | to pointers stored on the stack. It's important to use
           | printf("%s", str) instead of printf(str).
        
             | rep_lodsb wrote:
             | Useless use of printf; what's wrong with "puts(str)"?
        
               | shawn_w wrote:
               | puts() adds a newline at the end. gcc will happily turn
               | printf("%s\n", str) into puts(str), though.
               | 
               | I've never tested to see if printf("%s", str) becomes the
               | equivalent fputs(str, stdout)
        
           | mananaysiempre wrote:
           | You would expect a printf call with a user-controlled format
           | string to be, at worst, an arbitrary read. Thanks to %n, it
           | can be a write as well.
        
         | _kst_ wrote:
         | Background: A %n format specifier in a printf call stores the
         | number of characters written so far into a specified variable.
         | For example:
         | 
         |     #include <stdio.h>
         | 
         |     int main(void) {
         |         int count;
         |         printf("%s%n\n", "hello, world", &count);
         |         printf("count = %d\n", count);
         |     }
         | 
         | The output is:
         | 
         |     hello, world
         |     count = 12
         | 
         | %n can be exploited to write data to an arbitrary memory
         | location, but _only_ if the format string is something other
         | than a string literal.
         | 
         | %n can be exploited, but it's entirely possible to use it
         | safely.
        
       | jonathrg wrote:
       | Multi-character constants are one of the many things in C that
       | would be nice to use if the language would just choose some well-
       | defined behaviour for it. It doesn't really matter which.
        
         | mananaysiempre wrote:
         | Mainstream compilers agree on multicharacter literals being big
         | endian; that is, 'AB' is usually 'A' << CHAR_BIT | 'B'. The
         | exception is MSVC, which also works like that as long as you
         | don't use character escapes, but if you do it emits some sort
         | of illogical, undocumented mess that looks like an ancient
         | implementation bug fossilized into a compatibility constraint.
        
       | golergka wrote:
       |     switch (n % 2) {
       |         case 0:
       |             do {
       |                 ++i;
       |         case 1:
       |                 ++i;
       |             } while (--n > 0);
       |     }
       | 
       | Someone really ought to record a "WAT" video about C.
        
         | mananaysiempre wrote:
         | The switch statement in C is not a very limited pattern match.
         | The switch statement in C is a very ergonomic jump table. Do
         | not think ML's case-of with only integer literals for patterns;
         | think FORTRAN's computed GO TO with better syntax. And it will
         | cease to be a WAT. (For a glimpse of the culture before pattern
         | matching was in programmers' collective consciousness, try the
         | series on designing a CASE statement for Forth that ran for
         | several issues of Forth Dimensions.)
        
           | russellbeattie wrote:
           | I don't think there's any confusion of how it works, it's the
           | deep horror in discovering that it's possible in the first
           | place, and a morbid curiosity of the chaos it could cause if
           | abused.
        
             | mananaysiempre wrote:
             | At least for me, the feelings you describe are
             | characteristic of a footgun, not a WAT. A WAT is rather a
             | desperate bewilderment as to who could ever design the
             | thing that way and why, and for switch statements computed
             | gotos are the answer to that question.
             | 
             | As for the footgun side, I mean, it could be one in theory,
             | sure. But I don't think I've ever seen it actually fired.
             | And I can't really appreciate the Javaesque "abuse"
             | thinking--it is to some extent the job of the language
             | designer to prevent the programmer from accidentally doing
             | something bad, but I don't see how it is their job to
             | prevent a programmer from deliberately doing strange
             | things, as long as the result looks appropriately strange
             | as well.
             | 
             | (There are reasons to dislike C's switch statement, I just
             | don't think the potential for "abuse" is one.)
        
         | PhilipRoman wrote:
         | Just think of the "case" statements like any other label,
         | despite the misleading indentation. Then it becomes perfectly
         | natural to jump in the middle of a loop.
        
         | agumonkey wrote:
         | I wonder if there's any other instance (in programming or else)
         | of intersecting grammar constructs being accepted.
        
         | tom_ wrote:
         | This sort of thing is pretty handy sometimes. Don't forget you
         | can have code (e.g., start of the loop) before any of the cases
         | too!
        
       | fuhsnn wrote:
       | My recent favorite is glibc's hack to implement _Static_assert
       | under C99:
       | https://codebrowser.dev/glibc/glibc/misc/sys/cdefs.h.html#56...
       | 
       | It uses the constant expression to give a bit-field a width of -1
       | when the assertion fails, and leaves the compiler to error on
       | that as the intended assertion. The actual declaration is an
       | extern function returning a pointer to an array which has sizeof
       | the aforementioned bit-field struct as its size.
       | 
       | Another one encountered in Toybox is (0 || "foo") being a const
       | expression that evaluates to 1. Apparently the string literal
       | must have been soundly created in data section, so its pointer
       | address is safely assumed to be non-zero.
        
       | wolfspaw wrote:
       | Really liked the trick of defining the struct in the return part
       | of the function.
       | 
       | Array pointers: Array to pointer decay is extremely annoying, if
       | it was implemented as Array to "slice" decay it would be great.
       | 
       | Static array indices in function parameter declarations: awesome,
       | a shame that C++ (and Tiny C) do not support it >/
       | 
       | flexible array member: extremely useful, and now there are good
       | compiler flags for ensuring correct flexible array member usage
       | 
       | X-Macro: nice, no-overhead enum to string name. Didn't know the
       | trick
       | 
       | Combining default, named and positional arguments: Named-
       | arguments/default-arg, C version xD. It would be cool if it was
       | added to C language as a native feature, instead of having to do
       | the struct hiding macro.
       | 
       | Comma operator: really useful, especially in macros
       | 
       | Digraphs, trigraphs and alternative tokens: di/tri/graphs rarely
       | useful, the alternative spellings in iso646.h are awesome, love
       | using and/or instead of &&/||
       | 
       | Designated initializer: super awesome, could not use if you
       | wanted C++ portability. Now C++ supports some part of it.
       | 
       | Compound literals: fantastic, but in C++ it will explode due to
       | stack deallocation in the same line. C++ should fix this and
       | allow the C idiom >/
       | 
       | Bit fields: nice for more control of structs layout
       | 
       | constant string concat: "MultiLine" String, C version xD
       | 
       | Ad hoc struct declaration in the return type of a function:
       | didn't know this trick, "multi value" return, C version xD
       | 
       | Cosmopolitan-libc: incredible project. Already knew of it, it's
       | awesome to offer a binary that runs on all OSes at the same time.
       | 
       | Evaluate sizeof at compile time by causing duplicate case error:
       | ha, nice trick for debugging the size of anything.
        
         | fuhsnn wrote:
         | >Static array indices in function parameter declarations:
         | awesome, a shame that C++ (and Tiny C) do not support it >/
         | 
         | The outermost array dimension always decays to a pointer
         | anyway, so supporting it in a compiler without analysis passes,
         | like TCC, is just a matter of skipping the "static" token and
         | the size.
        
       | o11c wrote:
       | Bah, those are all well-known.
       | 
       | What value does the following program return?
       |     int main()
       |     {
       |         int *p = 0;
       | 
       |     loop:
       |         if (p)
       |             return *p;
       | 
       |         int v = 1;
       |         p = &v;
       |         v = 2;
       |         goto loop;
       | 
       |         return 3;
       |     }
       | 
       | Also, rather than doing `sizeof` via one error at a time, it's
       | better to just emit them to a char array {'0' + sz/10, '0' +
       | sz%10, '\0'}. Generalizing this to signed numbers of arbitrary
       | size is left as an exercise for the reader.
        
         | sweeter wrote:
         | Is it 2? I'm not exactly sure though. I'm interested in hearing
         | the logic
        
           | tylerhou wrote:
           | gcc, msvc, and clang all produce code that exits with code
           | 2: https://godbolt.org/z/WEYjns85Y
        
       ___________________________________________________________________
       (page generated 2024-09-26 23:00 UTC)