[HN Gopher] Mildly interesting quirks of C
___________________________________________________________________
Mildly interesting quirks of C
Author : goranmoomin
Score : 206 points
Date : 2022-11-20 11:54 UTC (11 hours ago)
(HTM) web link (gist.github.com)
(TXT) w3m dump (gist.github.com)
| photochemsyn wrote:
| Related: "A primer on some C obfuscation tricks"
|
| https://news.ycombinator.com/item?id=22961054
| guenthert wrote:
| > UB is impossible
|
| What? UB is clearly _undesirable_ , but assuming it is impossible
| and deducing other outcomes must be meant are clearly wrong
| assumptions by the compiler writer.
|
| More sensible compilers (including older versions of clang) do the
| right thing (TM) here and yield a compiler error.
|
| There were earlier attempts at do-what-i-mean programming
| languages. They are rightfully buried in history.
| MauranKilom wrote:
| > UB is clearly undesirable, but assuming it is impossible and
| deducing other outcomes must be meant are clearly wrong
| assumptions by the compiler writer.
|
| Compilers can and absolutely do assume that UB is impossible in
| this code (no integer overflow) and deduce other outcomes must
| be meant (the loop operates on contiguous memory):
|
|     void foo(char* arr, int32_t end) {
|         for (int32_t i = 0; i != end; ++i)
|             arr[i] = 0;
|     }
|
| (Based on code from the gist comments.)
| cvoss wrote:
| UB is not impossible; I think the author is being a little
| cheeky there. But the standard does grant compilers extreme
| liberties as far as how they deal with programs which can
| execute UB. LLVM's choice of what to do with that liberty, in
| this case, seems to be to assume the UB is unreachable and
| continue legally optimizing the program under that assumption.
| That's not a wrong assumption according to the definition of C.
|
| It's debatable whether it's a _good_ assumption. But not wrong.
| antirez wrote:
| The top comment in the gist looks like it came from the "Hacker News
| Parody Thread".
| xigoi wrote:
| The parody thread should include a comment that references
| another comment by its position, not realizing that it might
| change.
| hoosieree wrote:
| 8. Modifiers to array sizes in parameter definitions
| [https://godbolt.org/z/FnwYUs]
|
|     void foo(int arr[static const restrict volatile 10]) {
|         // static: the array contains at least 10 elements
|         // const, volatile and restrict all apply to the array type.
|     }
|
| I imagine most of these depend on the C version, but this one
| specifically bit me because one tool only supported c99 and the
| other was c11 or something later.
| teddyh wrote:
| See also the _comp.lang.c Frequently Asked Questions_ :
| https://c-faq.com/
| marcthe12 wrote:
| C quirks. This is interesting. I have used some of the tricks
| myself, #1,#2,#4,#5.
|
| #2 and #5 can be combined to make an interesting hack. When
| combined with memcpy you can do
|
|     int *a = memcpy(&(int){0}, b, sizeof *b);
|
| C23 typeof makes this even more interesting.
|
| If you want a challenge, here is some standard-compliant C code. Try
| to understand it. If you can, you are a master of C's type
| system:
|
|     static int* (*const *(*restrict x)[5])(
|         volatile union {struct{int a;int b;};}[static const restrict 5],
|         register enum{HELLO,WORLD} a) = {0};
| flohofwoe wrote:
| Another one, ad hoc struct declaration in the return type of a
| function:
|
|     struct bla_t { int a, b, c, d; } make_bla(void) {
|         return (struct bla_t){ .a=1, .b=2, .c=3, .d=4 };
|     }
|
| https://www.godbolt.org/z/Pha7dPzeq
|
| Also to be pedantic: "= {};" is not valid C (at least until C23)
| and fails to compile on MSVC - GCC and Clang accept it as a non-
| standard language extension though (the proper form would be "=
| {0};").
| sbf501 wrote:
| The switch/case anywhere looks equally useful and dangerous, and
| is so close to assembly that it really illustrates the low-level
| capabilities of C.
| Teknoman117 wrote:
| I consider the array pointer stuff a bit of a foot-gun in C. I've
| seen too many examples of people mixing up uint8_t[][] and
| uint8_t**.
|
| The "compound literals are lvalues" pattern I've seen many times
| for inline initializing a struct that's only going to be around
| as a parameter to a single function call.
| PointyFluff wrote:
| As someone who's moved on to Rust, I see this as one long list of
| nightmares.
| tmtvl wrote:
| I also see a fair few elements on that list as being
| problematic, to say the least. Can't stand Rust, though, so for
| those times I really need high performance I try and keep my C
| knowledge sharp-ish.
|
| Fortunately GCC has a whole bucket-list of warnings that can be
| enabled (I like compiling with -Wall -Wextra -pedantic, myself)
| which can, combined with proper tooling, catch many issues.
| rahen wrote:
| I don't think the "I use Rust btw" comments contribute much to
| the discussion.
|
| C and Rust don't perfectly overlap, especially since Rust is
| more a replacement for C++ than C.
| xcdzvyn wrote:
| While a few of these were interesting I'd love to see a short
| technical explanation of each quirk for the feeble high-level
| programmer (me). The first one for example, is foo initialised?
| How so?
| coliveira wrote:
| The reason is that a struct doesn't generate a new scope, like
| in C++. If you define something inside a struct it will also be
| available outside of the struct.
| veltas wrote:
| I think it's aimed at C programmers. foo is a struct, so it's a
| type, it's not a variable. The point is just that struct bar is
| also defined by the definition of struct foo.
| unnouinceput wrote:
| Quote: "4. Flexible array members ..... int elems[]; // <--
| flexible array member"
|
| TIL that a dynamic array is also called flexible. This
| generation, out of boringness, is trying to redefine well
| established paradigms? Because, for me, a 90's formed developer,
| "flexible" means maybe inheritance, or even better polymorphism.
| There is nothing flexible about a dynamic array. Its structure is
| well defined in the stack/heap, and with current compiler
| optimizations can even be demoted to a simple static array for
| faster access within CPU registers.
| Jorengarenar wrote:
| "Dynamic array" refers to block of memory allocated via
| malloc() which you just happen to use as array.
|
| "Flexible array member" [0] is when you have a _struct_ and its
| last member is an array with unspecified size.
|
| An example:
|
|     #include <stdio.h>
|     #include <stdlib.h>
|
|     struct Foo {
|         int len;
|         int* arr; // dynamic "array"
|     };
|
|     struct Bar {
|         int len;
|         int arr[]; // FAM
|     };
|
|     int main() {
|         const int n = 12;
|
|         // have to allocate it myself; no guarantee it will be near
|         // the rest of the struct
|         struct Foo* a = malloc(sizeof *a);
|         a->arr = malloc(n * sizeof *(a->arr));
|
|         // array is part of the memory allocated for the struct
|         struct Bar* x = malloc((sizeof *x) + n*(sizeof *(x->arr)));
|
|         return 0;
|     }
|
| [0]: https://en.wikipedia.org/wiki/Flexible_array_member
| unnouinceput wrote:
| >"Dynamic array" refers to block of memory allocated via
| malloc() which you just happen to use as array.<
|
| No. A dynamic array is an array which can be expanded or
| shrunk during its runtime life. The fact that C/C++ uses
| malloc for that (and btw, it's not the only way to do it)
| is its problem. In other languages you have dynamic arrays
| that can be expanded/shrunk without using an extra line -
| main reason why nowadays Rust is a replacement for C/C++
|
| >[0]< From your own wiki reference: "the flexible array member
| must be last"
|
| LMAO, really? Well, that indeed is a bigger C quirk. In
| Pascal, as an example, I can have it anywhere inside the
| record (struct equivalent of C), and it can be just as
| "flexible".
| blep_ wrote:
| It has to be last because it's not a pointer to the array,
| it _is_ the array. The array elements are immediately after
| the struct in memory. You can't resize it without
| reallocating the whole struct.
| kazinator wrote:
| Regarding 12, alignment of bitfields, how I believe it works is
| that when the bitfield of type long is laid out, then the
| structure so far is considered to be a vector of storage cells
| whose size and alignment are those of long:
|
|     struct foo {
|         char a;
|         long b: 16;
|         char c;
|     };
|
| So, _a_ has been laid into the structure, so the current offset
| is 1 byte. This is considered to be occupying a portion of an
| existing _long_ type bitfield cell. In other words _a_ is
| essentially taken to be an 8-bit field in the first _long_ -sized
| cell of the structure. That cell looks like it has 56 bits left
| in it (if we assume 64 bit long). Since 56 > 16, the new bitfield
| _b_ is placed into that cell. When that field is placed, the
| placement offset becomes 3. The type of _c_ being char, that
| offset is acceptable for _c_.
|
| I've painstakingly reverse engineered the rules when developing
| the FFI for TXR Lisp:
|
|     1> (sizeof (struct foo (a char) (b (bit 16 long)) (c char)))
|     8
|     2> (alignof (struct foo (a char) (b (bit 16 long)) (c char)))
|     8
|     3> (offsetof (struct foo (a char) (b (bit 16 long)) (c char)) a)
|     0
|     4> (offsetof (struct foo (a char) (b (bit 16 long)) (c char)) b)
|     ** ffi-offsetof: b is a bitfield in #<ffi-type
|        (struct foo (a char) (b (bit 16 long)) (c char))>
|     4> (offsetof (struct foo (a char) (b (bit 16 long)) (c char)) c)
|     3
|
| I've summarized my empirically-obtained understanding for the
| benefit of users and anyone else doing similar work in a
| different project.
|
| https://www.nongnu.org/txr/txr-manpage.html#N-027D075C
| plq wrote:
| Whenever the subject of C/C++ quirks is brought up, I always like
| to point out the Deep C/C++ presentation:
|
| http://www.pvv.org/~oma/DeepC_slides_oct2011.pdf
|
| Source: https://freecomputerbooks.com/Deep-C-and-
| Cpp.html#downloadLi...
|
| Previous discussion: https://news.ycombinator.com/item?id=3093323
|
| It could be considered a bit dated at this point (it predates
| C++11), but I still find it both entertaining and educational.
| lanorienne wrote:
| [deleted]
| sureglymop wrote:
| That was very interesting!
| andrepd wrote:
| Loved that, read it start to finish. C is already a minefield
| and it looks positively tame when compared to C++!
| veltas wrote:
| "Flat Initializer Lists" is given as an example in K&R C I think,
| at least the first edition, when writing those extra braces to
| fill out an initializer must have felt very redundant.
|
| These days many compilers will warn if you do this, however, as
| it is rare people do this and usually indicates a
| misunderstanding of the type used.
|
| I think it's quite readable though, so it's a shame it causes
| warnings. What do you think?
|
|     struct {
|         const char *name;
|         int age;
|     } records[] = {
|         "John", 20,
|         "Bertha", 40,
|         "Andrew", 30,
|     };
| blep_ wrote:
| I find it slightly worse to read. It's C, so my brain is in
| "newlines don't matter" reading mode, so I see an array of 6
| things and then have to mentally split them back up.
| fb03 wrote:
| My favorite C "quirk": If you have an array and you want to
| access an item of it, you can swap the variable and the index
| number (put the variable name inside brackets and the number
| outside):
|
|     a[5]
|
| is the same as:
|
|     5[a]
|
| Why? a[5] is actually sugar for *(a + 5), so by the commutative
| property of addition, you can also do *(5 + a) to access the same
| memory position :-)
| FartyMcFarter wrote:
| One funny variant is this expression: "abcde"[4]
| whoopdedo wrote:
| You mean: 4["abcde"]
| FartyMcFarter wrote:
| Actually yes, oops :)
| jwilk wrote:
| That's #15 on the list.
| brookst wrote:
| *15#
| cptnapalm wrote:
| &(list + 15) = https://gist.github.com/fay59/5ccbe684e6e56a7d
| f8815c3486568f...
| the_svd_doctor wrote:
| *(list + 14) ?
| yccs27 wrote:
| That's #list on the 15.
| [deleted]
| [deleted]
| camel-cdr wrote:
| Here are two of my favorite obscure quirks of C:
|
|     struct X { char x[8]; };
|     struct X awoo(void);
|     printf("%s\n", awoo().x);
|
| The above is UB in <= C99 and valid in >= C11. [0]
|
|     struct X { char b[8]; } foo();
|     char *b = foo().b;
|     printf("%s\n", b);
|
| The above is UB in >= C11 and valid in <= C99. [1]
|
| [0]
| https://wiki.sei.cmu.edu/confluence/plugins/servlet/mobile?c...
|
| [1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1285.htm
| Asooka wrote:
| I really wish both would be valid in C11. Or rather I wish I
| had "systems-C" where all the undefined behaviour added for
| high performance computing was filed off and defined as
| "whatever the platform does".
| masklinn wrote:
| > all the undefined behaviour added for high performance
| computing
|
| UBs were added for cross-incompatibilities, where operations
| were too "core" (and / or untestable) for IBs to be
| acceptable. The reason was not performance (aside from not
| imposing a runtime check where that would have been possible)
| but portability:
|
| > 3.4.3 undefined behavior: behavior, upon use of a
| nonportable or erroneous program construct or of erroneous
| data, for which this International Standard imposes no
| requirements
|
| Those UBs were leveraged later on by optimising compilers,
| because they provide constraints compensating for C's useless
| type system.
|
| So you can just use a non-optimising compiler or one which
| only does simple optimisations (e.g. tcc), and see what the
| compiler generates from your UBs.
| tsimionescu wrote:
| The standard also has implementation-defined behavior,
| doesn't it?
| rwmj wrote:
| Reminds me a bit of _"Who Says C is Simple?"_ written by the
| people who wrote a C parser & analyser in OCaml (CIL):
|
| https://cil-project.github.io/cil/doc/html/cil/cil016.html
|
| Also: https://cil-project.github.io/cil/doc/html/cil/cil012.html
| still_grokking wrote:
| > "Who Says C is Simple?"
|
| People who don't know what "simple" means and confuse it with
| "easy".
|
| https://www.entropywins.wtf/blog/2017/01/02/simple-is-not-ea...
|
| https://www.infoq.com/presentations/Simple-Made-Easy/
|
| "Easy" things almost always lead to astonishing complexity.
|
| Also it's easy to see just how complex C is: Have a look at a
| formal description of it! (And compare to a truly simple
| language like e.g. LISP).
|
| https://github.com/kframework/c-semantics/tree/master/semant...
|
| In contrast some basic Lambda calculus language semantics fit
| 0.5 of a page in K.
|
| https://www.youtube.com/watch?v=eSaIKHQOo4c
|
| https://www.youtube.com/watch?v=y5Tf1EZVj8E
| owl_vision wrote:
| +1 for simple is not easy, yet with enough thinking and
| ingenious ideas, it is achievable. Thanks for the links.
|
| "simplicity is the ultimate sophistication." -- Leonardo da
| Vinci
| 752963e64 wrote:
| still_grokking wrote:
| After learning about a few of these I started to understand why
| people coming from C always said that PHP is a well designed
| language...
|
| But OK, I understand that my mind is just not made for the
| complexity of C. Most likely I'm not a real programmer.
|
| I instantly get knots in my brain and start to bang my head
| against the wall when I need to look at C code for too long.
| Actually even C documentation is enough to trigger this. (I get
| mad every time I have to look at a Linux system man page).
|
| This is highly subjective of course. Other people seem to love C!
|
| I'm more of a grug brain [1], who mostly only understands plain pure
| functions.
|
| Input in, output out. No magic. Everything else's too taxing.
|
| [1] https://grugbrain.dev/
| [deleted]
| beyonddream wrote:
| Can someone explain how "A constant-expression macro that tells
| you if an expression is an integer constant" works ?
| scatters wrote:
| If `x` is an integer constant expression, `(x) * 0l` is a zero
| constant, so `(void*)((x) * 0l)` is a null pointer constant. When a
| null pointer constant is one branch of a ternary conditional, the
| expression takes the (pointer) type of the other branch.
|
| If `x` is not a constant, `(void*)((x) * 0l)` is a void pointer
| to address 0 (which may not even be a null pointer at runtime,
| since null may have a runtime address distinct from zero!). The
| ternary conditional then unifies the types of the branches,
| resulting in `void*`.
| beyonddream wrote:
| My understanding of how it works is, with constant value, the
| compiler replaces (x) with the constant 0 and converts (void *)
| into (int *), which makes the size equality return true. But
| I am not entirely sure :)
| ainar-g wrote:
| Are there any practical cases where you'd want "extern void foo"?
| veltas wrote:
| You could use it for getting an address that will be linked in
| later. On GCC I get a warning (which I don't think I can mask)
| for taking the address of such an object, because its
| expression is type void. A better way of achieving this is
| usually to declare something like extern unsigned char foo[]
| instead, but that has a type other than void*.
___________________________________________________________________
(page generated 2022-11-20 23:00 UTC)