[HN Gopher] Lesser known tricks, quirks and features of C
___________________________________________________________________
Lesser known tricks, quirks and features of C
Author : jiripospisil
Score : 198 points
Date : 2023-07-01 13:59 UTC (9 hours ago)
(HTM) web link (jorengarenar.github.io)
(TXT) w3m dump (jorengarenar.github.io)
| dundarious wrote:
| The register keyword is still useful in compiler specific
| contexts. e.g., for GCC-ish compilers like gcc, clang:
|     long result;
|     register long nr __asm__("rax") = __NR_close;
|     register long rfd __asm__("rdi") = fd;
|     __asm__ volatile ("syscall\n"
|                       : "=a"(result)
|                       : "r"(rfd), "0"(nr)
|                       : "rcx", "r11", "memory", "cc");
|
| The above is basically how you might implement the close(int)
| syscall on x86-64.
|
| You don't need to be doing embedded programming to find it useful
| to dip down into assembly like that (though syscalls are perhaps
| a bad example, even for a syscall not provided by your C library
| -- that library probably provides a `syscall` function/macro that
| does all this in a platform agnostic way).
|
| Also, "%.*" is extremely useful with strings, i.e., "%.*s". Your
| code base should be using length delimited strings (basically
| `struct S { int len; char* str; };`) throughout, in which case
| you can do `printf("%.*s\n", s.len, s.str);`
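For illustration, the close(int) snippet above could be wrapped in a function roughly as follows. This is a minimal sketch, assuming x86-64 Linux and a GCC-compatible compiler; the name `raw_close` and the fallback branch through libc's `syscall()` are illustrative additions, not something from the comment.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Close fd without going through the libc close() wrapper.
 * Returns 0 on success or a negative errno value, mirroring the raw
 * kernel ABI. `raw_close` is a hypothetical name for this sketch. */
static long raw_close(int fd)
{
#if defined(__x86_64__) && defined(__linux__) && defined(__GNUC__)
    /* Pin the syscall number and the first argument to the registers
     * the x86-64 Linux syscall ABI expects. */
    register long nr  __asm__("rax") = SYS_close;
    register long rfd __asm__("rdi") = fd;
    long result;
    __asm__ volatile ("syscall"
                      : "=a"(result)          /* rax holds the result   */
                      : "r"(rfd), "0"(nr)     /* rdi and rax are inputs */
                      : "rcx", "r11", "memory", "cc");
    return result;
#else
    /* Assumed fallback: libc's syscall() returns -1 and sets errno,
     * so convert to the negative-errno convention used above. */
    return syscall(SYS_close, fd) == 0 ? 0 : -(long)errno;
#endif
}
```

A libc wrapper would translate a negative result such as -EBADF into errno plus a return value of -1; the sketch leaves that step out.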
| messe wrote:
| Um, no. You could remove the register/__asm__("reg")
| qualifiers entirely and just specify "D" (rfd) as an input
| parameter and the code would work fine. There is absolutely no
| need for register there.
|
| The only good use of register these days is for project-wide
| globals in embedded contexts. IIRC one example of this is the
| decompilation of Mario 64, where a certain register ALWAYS
| contains the floating point value 1.0.
| dundarious wrote:
| Both are valid, but I much prefer the `register` method I
| gave (documented in https://gcc.gnu.org/onlinedocs/gcc/Local-
| Register-Variables....), as it is far more self-explanatory.
| GCC's extended asm syntax has too many inscrutable constraint
| and modifier codes even excluding these GCC-ext-asm-specific
| codes to reference machine-specific registers by name. As
| such, I totally disagree with your statement about "the only
| good use". Given that I first learned about that method of
| specifying registers by reading linux kernel source code, I
| think others would disagree as well.
| Dwedit wrote:
| I think the GCC ASM syntax for specifying inputs and
| outputs is quite clear enough, and doesn't require a
| variable declaration with unusual syntax.
| dundarious wrote:
| I'm merely referring to the fact that I specified rdi by
| writing "rdi" not "D". I can specify r10 by writing
| "r10", but I can't remember how to specify that directly
| in the inputs/outputs constraints -- a glance at
| https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html
| didn't show me how either, but I'm guessing it's there.
|
| [Edit: Although, on second glance, from
| https://gcc.gnu.org/onlinedocs/gcc/Extended-
| Asm.html#Input-O...:
|
| > If you must use a specific register, but your Machine
| Constraints do not provide sufficient control to select
| the specific register you want, local register variables
| may provide a solution (see Specifying Registers for
| Local Variables).
|
| indicates in the r10 case maybe you _must_ use the syntax
| I gave?]
|
| My preference is for the syntax that requires looking up
| fewer tables in GCC docs, but as I said, the version you
| prefer is fine too.
| LegionMammal978 wrote:
| Indeed, the register variable syntax is necessary for
| many of the registers; there are only so many of them
| that have been stuffed into the one-letter constraint.
| I've used it before for making raw x86-64 Linux syscalls
| (which can use r10, r8, and r9) without going through
| errno, as part of a silly little project ([0]) to read
| from a file descriptor without writing to process memory.
|
| [0] https://pastebin.com/mepsedCC
| dundarious wrote:
| Nice. Yes, linux/tools/include/nolibc has syscall macros
| that look near identical.
| asveikau wrote:
| One of the reasons for gcc's inline asm syntax being so
| verbose is it tells the compiler which registers are used
| and how. There is no indication in your last asm() that the
| value of rax before the asm() is read from. This means the
| compiler could assume it's safe to clobber rax just before.
| After all, you assigned a value into rax and never read it,
| a reasonable optimizer might say, why emit that first
| assignment at all?
| dundarious wrote:
| I think you should read
| https://gcc.gnu.org/onlinedocs/gcc/Extended-
| Asm.html#Input-O... and the link I gave earlier, if you
| think my example is something novel of my own
| construction. It's basically straight from the GCC docs.
|
| If your point is that I don't use result, that's because
| it's a snippet written into Hacker News. I didn't write
| the code to convert it into an errno and return -1 on
| error, etc., but doing so would be perfectly valid, and
| safe from your reasonable optimizer concerns.
| asveikau wrote:
| I see absolutely no examples in that link there where
| they assign into a register via C code, not using it
| anywhere else, and then assume you can read the same
| value back from that register in an asm() statement
| without declaring it as an input.
| dundarious wrote:
| I referenced both links (the latter links to the former), and
| the former, which was in the first comment of mine you replied
| to, contains the following examples:
|
|     register int *p1 asm ("r0") = ...;
|     register int *p2 asm ("r1") = ...;
|     register int *result asm ("r0");
|     asm ("sysint" : "=r" (result) : "0" (p1), "r" (p2));
|
| and
|
|     int t1 = ...;
|     register int *p1 asm ("r0") = ...;
|     register int *p2 asm ("r1") = t1;
|     register int *result asm ("r0");
|     asm ("sysint" : "=r" (result) : "0" (p1), "r" (p2));
|
| In my example, nr is rax and listed in the input section,
| rfd is rdi and also listed in the input section, and
| result is rax and listed in the output section (I even
| used your preferred syntax for specifying rax here).
| Using result after the syscall asm statement is perfectly
| valid.
| gpderetta wrote:
| Even on x86, there are some registers that have no corresponding
| exact input operand constraint, so register is the only option.
| But I forgot which registers.
| pm215 wrote:
| clang does not support specific-register variables for
| registers which are allocatable by its register allocator
| (https://clang.llvm.org/docs/UsersManual.html#gcc-
| extensions-...) so this is gcc-only. If you care about
| portability between gcc and clang you'll need some other
| approach...
| WalterBright wrote:
| D's inline assembler:
|
|     asm {
|         mov RAX,__NR_close;
|         mov RDI, fd;
|         call syscall;
|     }
|
| The compiler automatically keeps track of which registers were
| modified.
| schemescape wrote:
| I learned a few new things, but it would be more helpful if this
| had info on whether these are standard and, if so, which standard
| they are a part of.
| thaliaarchi wrote:
| The trick for preserving array lengths in function signatures
| looks quite useful (e.g., `void foo(int p[static 1]);` for a non-
| null pointer p). Unfortunately, I think the overloaded use of
| `const` and `static` somewhat obfuscates its semantics.
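A minimal sketch of what the `static` array-parameter syntax expresses, assuming a C99-or-later compiler. The function names `first` and `sum4` are made up for illustration. The promise runs one way only: passing NULL or a shorter array is undefined behavior, and compilers may warn about it but are not required to.

```c
/* `static 1` tells the compiler that p points to at least one int,
 * which in practice means "p is not NULL". */
static int first(const int p[static 1])
{
    return p[0];
}

/* `static 4` documents that the caller must supply at least four
 * elements; the parameter is still an ordinary pointer inside. */
static int sum4(const int v[static 4])
{
    return v[0] + v[1] + v[2] + v[3];
}
```

Calling `first(a)` and `sum4(a)` with `int a[4] = {1, 2, 3, 4};` is valid; `sum4` with a three-element array would break the contract even though it compiles.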
| asicsp wrote:
| This article was already discussed here:
| https://news.ycombinator.com/item?id=34855331 _(410 points | 4
| months ago | 176 comments)_
|
| But that link no longer works.
| bryancoxwell wrote:
| > I did a sloppy job of gathering some of them in this post (in
| no particular order) with even sloppier short explanations
|
| I wonder why developers tend to be so self deprecating
| tyre wrote:
| I think it's in part a preemptive defense against the endless
| nitpicking from their audience.
| camel-cdr wrote:
| Yeah, I think it's this. IIRC one of the first places it was
| posted is on the C_Programming reddit, where the bar for
| "lesser known" is quite high.
| badtension wrote:
| Being afraid of failure and the impostor syndrome. You mark the
| territory and lower the expectations to look better in the end
| (even if only in your own eyes). A ton of people do it, it's
| hard to get out of it.
| dmvdoug wrote:
| The earlier comments were so cynical I felt like I needed to
| offer another possibility: maybe they set out to do this in a
| more systematic way, then got so deep in the weeds they
| realized it would take them forever to put it into more
| systematic form, but they didn't want to just leave it sitting
| there sight unseen. So they acknowledge it's sloppier than they
| would like it to be, but hey, at least it's something. That's
| not really self deprecating so much as just... being
| transparent?
| wongarsu wrote:
| I assume it's just that being self-deprecating or humble
| correlates with many traits that make you a good developer, so
| people with those traits are more likely to end up in this
| career path and stick around in it.
|
| Just like being a sales person doesn't automatically make you
| overconfident, but being overconfident makes you a good sales
| person.
| bmacho wrote:
| For me it is what is written there: doing a less than
| satisfactory job, and not wanting to do it correctly.
| jnspts wrote:
| > C11 added _Generic to language, but turns out metaprogramming
| by inhumanely abusing the preprocessor is possible even in pure
| C99: meet Metalang99 library.
|
| I'm actually working on a library doing just that! It's still in
| very (very) early development, but maybe someone may find it to
| be interesting. [1]
|
| Link [2] is the implementation of a vector. Link [3] is a test
| file implementing a vector of strings.
|
| [1]: https://github.com/jenspots/libwheel
|
| [2]:
| https://github.com/jenspots/libwheel/blob/main/include/wheel...
|
| [3]:
| https://github.com/jenspots/libwheel/blob/main/tests/impl/st...
| burstmode wrote:
| C++ metaprogramming is also just a bunch of sugarcoated
| preprocessor macros, and it was never anything else.
| idispatch wrote:
| This is plainly not true
| cempaka wrote:
| The downvotes make me laugh, I did something pretty similar not
| long after the _Generic keyword came out and remember getting a
| pretty icy reception even though I was pretty up front about
| how painful and crufty it is.
|
| https://abissell.com/2014/01/16/c11s-_generic-keyword-macro-...
| buserror wrote:
| I "abuse" unions and anonymous unions all the time. It's very
| practical, and makes the 'user' code a lot clearer, as you can
| access the small 'namespace' however is convenient. Here, for
| example, I can access it as named coordinates, x,y points or a
| vector.
|
|     typedef union c2_rect_t {
|         struct { c2_pt_t tl, br; };
|         c2_coord_t v[4];
|         struct { c2_coord_t l,t,r,b; };
|     } c2_rect_t;
| synergy20 wrote:
| Confused by the code, can you elaborate more?
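To unpack the union above: all three members overlay the same four coordinates, so one rectangle can be read as two corner points, as a flat array, or as four named edges. The sketch below assumes C11 (anonymous structs) and guesses at `c2_coord_t` and `c2_pt_t`, whose definitions are not shown in the comment; `c2_rect_width` is a hypothetical helper.

```c
typedef int c2_coord_t;                               /* assumed definition */
typedef struct c2_pt_t { c2_coord_t x, y; } c2_pt_t;  /* assumed definition */

typedef union c2_rect_t {
    struct { c2_pt_t tl, br; };          /* top-left, bottom-right points */
    c2_coord_t v[4];                     /* same data as a 4-vector       */
    struct { c2_coord_t l, t, r, b; };   /* same data as named edges      */
} c2_rect_t;

/* The overlay only works if the views have identical layout, so check
 * that no padding crept in. */
_Static_assert(sizeof(c2_rect_t) == 4 * sizeof(c2_coord_t),
               "c2_rect_t views must overlap exactly");

/* Hypothetical helper using the "named edges" view. */
static c2_coord_t c2_rect_width(const c2_rect_t *rect)
{
    return rect->r - rect->l;
}
```

Writing `rc.l = 1; rc.t = 2;` therefore also sets `rc.tl.x` and `rc.tl.y`, and `rc.v[0]`, `rc.v[1]`, because they name the same storage.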
| trentnelson wrote:
| 100% agree! I use whatever tool in the C toolbox results in the
| easiest-to-(read|grok) code, which means tons of anonymous
| union/struct "abuse", bit fields, function pointer typedefs,
| strictly adhering to "Cutler Normal Form".
| projektfu wrote:
| I used to use the comma operator to return a status code in the
| same line as an operation, but for some reason nobody liked my
| style.
|
|     if (error_condition)
|         return *result=0, ERR_CODE;
|
| So, back to writing lots of statements.
| gpderetta wrote:
| I have used it in C++ to take a scoped lock without naming it
| and return a mutex-protected value:
|
|     return scoped_lock{foo_mux_}, return foo_;
|
| Also nobody likes it.
| Joker_vD wrote:
| Doesn't such an unnamed variable get destructed immediately,
| right at the comma? I am pretty certain I'd hit exactly that
| problem and had to switch to a __LINE__-macro to name such
| scoped locks.
| _kst_ wrote:
| I think you mean:
|
|     return scoped_lock{foo_mux_}, foo_;
| enriquto wrote:
| I love this style (avoiding multiple-statement blocks)!
|
| Sometimes you can still avoid multiple statements by
| rearranging your code otherwise. For example, in your case, you
| can set *result=0 at the beginning of the function. Other
| times, you can also cram the assignment inside the condition
| using short circuit evaluation; this trick somehow seems more
| palatable to normies than the comma operator.
| projektfu wrote:
| Yeah, I felt that in cases (like when doing COM programming)
| where the true result is almost always returned in a
| parameter and the status code as the return value, it made a
| lot of sense to me to combine those into one line wherever
| they appeared. But, like the article says, this is a
| lesser-known operator. In libc style, a similar thing makes
| sense, e.g.
|
|     return errno=ENOENT, (FILE*)0;
|
| I don't know if anyone uses this style.
|
| In K&R, they say that it's mostly used in for loops, such as
|
|     for (i = 0, j = strlen(s) - 1; i < j; i++, j--) ...
|
| So that is where I use it now.
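Both uses of the comma operator mentioned in this subthread can be sketched as follows. The names `checked_len`, `reverse`, `STATUS_OK` and `ERR_NULL_ARG` are invented for the example. The comma operator evaluates its left operand first, discards the value, and yields the right operand.

```c
#include <stddef.h>
#include <string.h>

enum { STATUS_OK = 0, ERR_NULL_ARG = -1 };  /* hypothetical status codes */

/* Out-parameter plus status code, set and returned in one statement. */
static int checked_len(const char *s, size_t *result)
{
    if (s == NULL)
        return *result = 0, ERR_NULL_ARG;
    return *result = strlen(s), STATUS_OK;
}

/* The K&R-style use: two loop variables advanced in one for header. */
static void reverse(char *s)
{
    size_t n = strlen(s);
    if (n < 2)
        return;                  /* also avoids n - 1 wrapping when n == 0 */
    for (size_t i = 0, j = n - 1; i < j; i++, j--) {
        char tmp = s[i];
        s[i] = s[j];
        s[j] = tmp;
    }
}
```

Note that the comma in `size_t i = 0, j = n - 1` is a declarator separator, while the one in `i++, j--` is the actual comma operator.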
| omoikane wrote:
| %n format specifier was lesser known until an IOCCC winner made
| it famous.
|
| https://news.ycombinator.com/item?id=23445546 - Tic-Tac-Toe in a
| single call to printf
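For readers who have not met %n: it writes nothing to the output and instead stores the number of characters produced so far into an `int` passed by pointer. A small sketch, with the hypothetical helper `value_column`; some hardened C libraries restrict %n, so treat this as illustrative.

```c
#include <stdio.h>

/* Formats "key = value" into out and returns the column at which the
 * value starts, as recorded by %n. Assumes out is large enough. */
static int value_column(char *out, size_t cap,
                        const char *key, const char *value)
{
    int col = -1;
    snprintf(out, cap, "%s = %n%s", key, &col, value);
    return col;
}
```

The IOCCC entry pushes the same mechanism much further by using the stored counts as its program state.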
| Croftengea wrote:
| > Compound literals are lvalues
|
| > ((struct Foo){}).x = 4;
|
| Do such lvalues have any real use?
| eqvinox wrote:
| The key thing about it being an lvalue is that you can take its
| address -- you can only take the address of lvalues. Other than
| that, no, no real use.
| kzrdude wrote:
| You could pass them to a function where something non-null is
| required but you don't want to use it, like : `f(&(struct
| Foo){0})`
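A short sketch of taking the address of a compound literal, assuming C99 or later. `struct Foo` follows the article's example; `fill`, `sum` and `demo` are made-up names. A compound literal inside a function body lives until the enclosing block ends, so the pointers below stay valid within `demo`.

```c
struct Foo { int x; };

/* Requires a non-null pointer even when the caller ignores the result. */
static void fill(struct Foo *out)
{
    out->x = 42;
}

static int sum(const int *v, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += v[i];
    return total;
}

static int demo(void)
{
    fill(&(struct Foo){0});                  /* throwaway output object   */

    struct Foo *p = &(struct Foo){ .x = 4 }; /* addressable, hence lvalue */
    p->x += 1;                               /* and modifiable            */

    return p->x + sum((int[]){1, 2, 3}, 3);  /* array literal as argument */
}
```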
| [deleted]
| eqvinox wrote:
| The %.* example is so close to hitting its single most useful
| application:
|
|     char *something; /* no null termination */
|     size_t something_length;
|     printf("%.*s", (int)something_length, something);
|
| Unfortunately, the .* argument has type (int), not size_t, and
| it's signed... but if that's not a problem this is a great way to
| format non-\0-terminated strings.
|
| (And of course you can also use it to print a substring without
| copying it around first.)
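A sketch of the non-terminated-buffer and substring cases, using snprintf so the surrounding text comes for free. `format_slice` is a hypothetical helper; clamping the length to INT_MAX is one way to handle the size_t-to-int mismatch mentioned above, not the only one.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Writes "[<len bytes of data>]" into out. data need not be
 * null-terminated: the precision caps how many bytes %s reads. */
static int format_slice(char *out, size_t cap, const char *data, size_t len)
{
    /* The '*' precision must be an int; a negative value would be
     * treated as "no precision", so clamp instead of casting blindly. */
    int precision = len > (size_t)INT_MAX ? INT_MAX : (int)len;
    return snprintf(out, cap, "[%.*s]", precision, data);
}
```

Passing `data + 1` with a shorter length prints a substring without copying it first.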
| jrpelkonen wrote:
| In this simple case, if the int cast is a problem, fwrite would
| be an adequate alternative, don't you think?
| PaulDavisThe1st wrote:
| not for s(n)printf ...
| LegionMammal978 wrote:
| For that, the analogue would be memcpy; but both
| alternatives lose the ease of surrounding the string with
| other text, since you either have to do the length
| calculations or define helper functions.
| thaliaarchi wrote:
| Somewhat related to this, printf alone in a loop is Turing-
| complete, by using %-directives like that. It was introduced in
| "Control-Flow Bending: On the Effectiveness of Control-Flow
| Integrity" (Carlini, et al. 2015) and the authors have
| implemented Brainfuck and an obfuscated tic-tac-toe with it.
|
| [0]: https://nebelwelt.net/publications/files/15SEC.pdf
|
| [1]: https://github.com/HexHive/printbf
|
| [2]: https://github.com/carlini/printf-tac-toe
| EPWN3D wrote:
| Casting to and from void is a "lesser known" feature of C?
| chrishill89 wrote:
| > Multi-character constants
|
| I asked on SO why C characters use `'` on both ends instead of
| just one (e.g. why not just `'a` instead of `'a'`?). This seems
| to have been the biggest reason.
| qsort wrote:
| The main reason for that is practicality. '\n' and '\0' are
| also characters. You could somehow still parse it, but it would
| be less clear and possibly need more escaping.
|
| Multi-character constants are historical baggage.
| dfox wrote:
| > Multi-character constants are historical baggage.
|
| It is more that it is an implementation detail of some
| compilers that was then (ab)used by certain platforms.
| chrishill89 wrote:
| I don't see the issue. If it's a literal character it's one
| character; if it's `\n` or `\0` then it's two; if it is an
| octal escape it's four; and so on.
|
| You have to parse them the same way in a character literal as
| in a string literal, anyway.
| chrishill89 wrote:
| > if it is an octal escape it's four;
|
| I just figured out that
|
| 1. `\0` and octal numbers share the same prefix
|
| 2. Octal numbers can have 1-3 digits (not fixed)
|
| So maybe it's more tricky than I thought.
| _kst_ wrote:
| An octal-escape-sequence is a backslash followed by 1, 2,
| or 3 octal digits.
|
| '\0' is just another octal escape sequence, not a
| special-case syntax for the null character.
|
| "\0", "\00", and "\000" all represent the same value;
| "\0000" is a string literal containing a null character
| followed by the digit '0'.
|
| Hexadecimal escape sequences can be arbitrarily long. If
| you need a string containing character 0x12 followed by
| the digit '3', you can write "\x12" "3".
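The escape-sequence rules above can be checked with `sizeof`, which counts the terminating null as well. A minimal sketch; the array names are invented.

```c
/* "\0" and "\000" are both a single null character. */
static const char one_nul[]      = "\0";
static const char also_one_nul[] = "\000";

/* An octal escape stops after three digits, so this is '\000'
 * followed by the digit '0'. */
static const char nul_then_zero[] = "\0000";

/* A hex escape has no length limit, so "\x123" would be read as one
 * (out-of-range) escape. Splitting the literal ends the escape
 * before the two pieces are concatenated. */
static const char ctrl_then_three[] = "\x12" "3";
```

So `sizeof one_nul` is 2, while `sizeof nul_then_zero` and `sizeof ctrl_then_three` are both 3.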
| Ontonator wrote:
| Not relevant to C, of course, but Ruby supports something like
| this with `?a` being equivalent to `"a"` (both of which are
| strings, since Ruby doesn't distinguish strings from
| characters). From what I've seen, it is recommended against in
| most styles, I assume because it is harder to read for most
| people.
| djur wrote:
| In older days before Ruby had encoding-aware strings, ?a
| would return the ASCII integer value of 'a'. It made sense in
| that context but is now pretty much a quirky fossil.
| bluetomcat wrote:
| Fun fact: the order of type qualifiers (const, volatile,
| restrict), type specifiers (char, int, long, short, float,
| double, signed, unsigned) and storage-class specifiers (auto,
| register, static, extern, typedef) is not enforced at the current
| indirection level. This means that the following declarations are
| identical:
|
|     long long int x;
|     int long long x;
|     long int long x;
|
|     typedef int myint;
|     int typedef myint;
|
|     const char *s;
|     char const *s;
|
|     const char * const volatile restrict *ss;
|     const char * volatile const restrict *ss;
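One way to confirm that these spellings name the same types is C11's `_Generic`, which selects a branch by type. A sketch, assuming a C11 compiler; `myint_a`, `myint_b` and `specifier_order_checks` are illustrative names.

```c
typedef int myint_a;
int typedef myint_b;   /* legal, though compilers may warn about the order */

/* Each _Generic yields 1 only if the reordered spelling on the left
 * has exactly the type named in the association list. */
static int specifier_order_checks(void)
{
    return _Generic((long int long)0, long long: 1, default: 0)
         + _Generic((int long long)0, long long: 1, default: 0)
         + _Generic((myint_b)0, myint_a: 1, default: 0)
         + _Generic((char const *)0, const char *: 1, default: 0);
}
```

All four checks contribute 1, so the function returns 4.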
| qsort wrote:
| But preference in ordering immediately qualifies you as coming
| from the east or the west const.
| dmvdoug wrote:
| I like the idea of someone sitting down and looking at
| someone else's code, leaning back with satisfaction after
| they notice the programmer's preference. "I like the cut of
| their jib."
| jfghi wrote:
| I remember some really nice macro usage.
| thumbuddy wrote:
| This is the sprinkles on the icing of the five-tier cake of why C
| scares me. Thanks for sharing this, I'm sure it will help someone,
| but I sincerely hope to never write C again.
| lelanthran wrote:
| > This is the sprinkles on the icing of the five-tier cake of why
| C scares me. Thanks for sharing this, I'm sure it will help
| someone, but I sincerely hope to never write C again.
|
| I looked through this list, and I gotta ask, which items
| exactly do you find scary? Most other popular languages have
| similar, if not worse, quirks than the ones in this particular
| list.
| [deleted]
___________________________________________________________________
(page generated 2023-07-01 23:00 UTC)