[HN Gopher] Lesser known tricks, quirks and features of C
___________________________________________________________________
Lesser known tricks, quirks and features of C
Author : rramadass
Score : 87 points
Date : 2024-09-26 16:25 UTC (6 hours ago)
(HTM) web link (jorenar.com)
(TXT) w3m dump (jorenar.com)
| coreyp_1 wrote:
| That's a nice list!
|
| I've been digging into cross-platform (Windows and Linux) C for a
| while, and it has been fascinating. On top of that, I've been
| writing a JIT-ted scripting (templating) language, and the ABI
| differences (not just fastcall vs stdcall vs cdecl) are often not
| easy to find documentation about.
|
| I've decided that if I ever get to teach a University class on C
| again, I want to cover some of these things that I feel are
| often left out, and this list is a helpful reference! Thanks!
| ranger_danger wrote:
| > quirks and features
|
| Someone is a fan of Doug DeMuro.
| randomdata wrote:
| _This..._ is the 1972 Ritchie C
| saagarjha wrote:
| Mentioning %n without explaining that it is overwhelmingly used
| for exploits is a little reckless IMO.
| greiskul wrote:
| I'm curious about this; I didn't know about %n before. What are
| the common pitfalls, and what exploits does it enable?
| lights0123 wrote:
| If the user can control the formatting string, they can write
| to pointers stored on the stack. It's important to use
| printf("%s", str) instead of printf(str).
| rep_lodsb wrote:
| Useless use of printf; what's wrong with "puts(str)"?
| shawn_w wrote:
| puts() adds a newline at the end. gcc will happily turn
| printf("%s\n", str) into puts(str), though.
|
| I've never tested to see if printf("%s", str) becomes the
| equivalent fputs(str, stdout).
| mananaysiempre wrote:
| You would expect a printf call with a user-controlled format
| string to be, at worst, an arbitrary read. Thanks to %n, it
| can be a write as well.
| _kst_ wrote:
| Background: A %n format specifier in a printf call stores the
| number of characters written so far into a specified variable.
| For example:
|
|     #include <stdio.h>
|
|     int main(void) {
|         int count;
|         printf("%s%n\n", "hello, world", &count);
|         printf("count = %d\n", count);
|     }
|
| The output is:
|
|     hello, world
|     count = 12
|
| %n can be exploited to write data to an arbitrary memory
| location, but _only_ if the format string is something other
| than a string literal.
|
| %n can be exploited, but it's entirely possible to use it
| safely.
| jonathrg wrote:
| Multi-character constants are one of the many things in C that
| would be nice to use if the language would just choose some
| well-defined behaviour for them. It doesn't really matter which.
| mananaysiempre wrote:
| Mainstream compilers agree on multicharacter literals being big
| endian; that is, 'AB' is usually 'A' << CHAR_BIT | 'B'. The
| exception is MSVC, which also works like that as long as you
| don't use character escapes, but if you do it emits some sort
| of illogical, undocumented mess that looks like an ancient
| implementation bug fossilized into a compatibility constraint.
| golergka wrote:
|     switch (n % 2) {
|     case 0: do {
|                 ++i;
|     case 1:     ++i;
|             } while (--n > 0);
|     }
|
| Someone really ought to record a "WAT" video about C.
| mananaysiempre wrote:
| The switch statement in C is not a very limited pattern match.
| The switch statement in C is a very ergonomic jump table. Do
| not think ML's case-of with only integer literals for patterns;
| think FORTRAN's computed GO TO with better syntax. And it will
| cease to be a WAT. (For a glimpse of the culture before pattern
| matching was in programmers' collective consciousness, try the
| series on designing a CASE statement for Forth that ran for
| several issues of Forth Dimensions.)
| russellbeattie wrote:
| I don't think there's any confusion about how it works; it's the
| deep horror in discovering that it's possible in the first
| place, and a morbid curiosity of the chaos it could cause if
| abused.
| mananaysiempre wrote:
| At least for me, the feelings you describe are
| characteristic of a footgun, not a WAT. A WAT is rather a
| desperate bewilderment as to who could ever design the
| thing that way and why, and for switch statements computed
| gotos are the answer to that question.
|
| As for the footgun side, I mean, it could be one in theory,
| sure. But I don't think I've ever seen it actually fired.
| And I can't really appreciate the Javaesque "abuse"
| thinking--it is to some extent the job of the language
| designer to prevent the programmer from accidentally doing
| something bad, but I don't see how it is their job to
| prevent a programmer from deliberately doing strange
| things, as long as the result looks appropriately strange
| as well.
|
| (There are reasons to dislike C's switch statement, I just
| don't think the potential for "abuse" is one.)
| PhilipRoman wrote:
| Just think of the "case" statements like any other label,
| despite the misleading indentation. Then it becomes perfectly
| natural to jump in the middle of a loop.
| agumonkey wrote:
| I wonder if there's any other instance (in programming or
| elsewhere) of intersecting grammar constructs being accepted.
| tom_ wrote:
| This sort of thing is pretty handy sometimes. Don't forget you
| can have code (e.g., start of the loop) before any of the cases
| too!
| fuhsnn wrote:
| My recent favorite is glibc's hack to implement _Static_assert
| under C99:
| https://codebrowser.dev/glibc/glibc/misc/sys/cdefs.h.html#56...
|
| It uses the constant expression to create a bit-field of width -1
| when the assertion fails, and leaves it to the compiler to reject
| that as the intended assertion. The actual statement is an extern
| declaration of a function returning a pointer to an array whose
| size is derived from sizeof the aforementioned bit-field struct.
|
| Another one encountered in Toybox is (0 || "foo") being a constant
| expression that evaluates to 1. Apparently the string literal must
| be placed in a data section, so its address can safely be assumed
| to be non-zero.
| wolfspaw wrote:
| Really liked the trick of defining the struct in the return part
| of the function.
|
| Array pointers: array-to-pointer decay is extremely annoying; if
| it were implemented as array-to-"slice" decay, it would be great.
|
| Static array indices in function parameter declarations: awesome,
| a shame that C++ (and Tiny C) do not support it >/
|
| flexible array member: extremely useful, and now there are good
| compiler flags for ensuring correct flexible array member usage
|
| X-Macro: nice, no-overhead enum-to-string names. Didn't know the
| trick.
|
| Combining default, named and positional arguments: Named-
| arguments/default-arg, C version xD. It would be cool if it was
| added to C language as a native feature, instead of having to do
| the struct hiding macro.
|
| Comma operator: really useful, especially in macros
|
| Digraphs, trigraphs and alternative tokens: digraphs and trigraphs
| are rarely useful, but the alternative spellings (the synonyms from
| iso646.h) are awesome; I love using and/or instead of &&/||
|
| Designated initializers: super awesome, but you couldn't use them
| if you wanted C++ portability. Now C++ (since C++20) supports part
| of it.
|
| Compound literals: fantastic, but in C++ they blow up because the
| temporary is destroyed at the end of the full-expression. C++
| should fix this and allow the C idiom >/
|
| Bit fields: nice for more control over struct layout
|
| constant string concat: "MultiLine" String, C version xD
|
| Ad hoc struct declaration in the return type of a function:
| didn't know this trick, "multi value" return, C version xD
|
| Cosmopolitan-libc: incredible project. I already knew of it; it's
| awesome to offer a binary that runs on all OSes at the same time.
|
| Evaluate sizeof at compile time by causing duplicate case error:
| ha, nice trick for debugging the size of anything.
| fuhsnn wrote:
| >Static array indices in function parameter declarations:
| awesome, a shame that C++ (and Tiny C) do not support it >/
|
| The first array dimension always decays to a pointer anyway, so
| supporting it in a compiler without analysis passes, like TCC, is
| just a matter of skipping the "static" token and the size.
| o11c wrote:
| Bah, those are all well-known.
|
| What value does the following program return?
|     int main() {
|         int *p = 0;
|     loop:
|         if (p) return *p;
|         int v = 1;
|         p = &v;
|         v = 2;
|         goto loop;
|         return 3;
|     }
|
| Also, rather than doing `sizeof` via one error at a time, it's
| better to just emit them to a char array {'0' + sz/10, '0' +
| sz%10, '\0'}. Generalizing this to signed numbers of arbitrary
| size is left as an exercise for the reader.
| sweeter wrote:
| Is it 2? I'm not exactly sure, though. I'm interested in hearing
| the logic.
| tylerhou wrote:
| gcc, msvc, and clang all produce code that exits with code
| 2: https://godbolt.org/z/WEYjns85Y
___________________________________________________________________
(page generated 2024-09-26 23:00 UTC)