[HN Gopher] C Integer Quiz
___________________________________________________________________
C Integer Quiz
Author : rwmj
Score : 247 points
Date : 2022-09-04 11:45 UTC (11 hours ago)
(HTM) web link (www.acepace.net)
(TXT) w3m dump (www.acepace.net)
| nayuki wrote:
| This tool I recently made can help:
| https://www.nayuki.io/page/summary-of-c-cpp-integer-rules#ar...
|
| For example, you can see what (signed long) > (unsigned int)
| would convert to, under different environment bit width
| assumptions.
|
| Also, there's a discussion on Reddit:
| https://www.reddit.com/r/cpp/comments/x4x01f/cc_arithmetic_c...
| jart wrote:
| Another interesting thing about C integers that the quiz doesn't
| cover is that remainder is not modulus. For example, in Python:
|          >>> 2 % -5
|          -3
|
| But in C: 2 % -5 == 2
|
| If you want Python-style floored modulus in C rather than C's
| truncated remainder, you have to use a function like this, which
| does what Python does:
|
|     long mod(long x, long y) {
|         if (y == -1) return 0;
|         return x - y * (x / y - (x % y && (x ^ y) < 0));
|     }
| cozzyd wrote:
| One of my favorite tables:
| https://en.m.wikipedia.org/wiki/Modulo_operation#In_programm...
| marshallward wrote:
| Fortran supports both of these, with `mod` as the C-like
| truncated modulo and `modulo` as the floored modulo. Having
| both is convenient, but you do get errors from people who don't
| realize the difference.
| mtreis86 wrote:
| Common Lisp as well, Rem and Mod
| qsort wrote:
| Mathematically speaking, the mod is usually taken to be
| positive. Any reasonable definition would involve taking the
| quotient of Z over the congruence relation, so I can see both
| -3 and 2 being reasonable conventions to represent partitions.
|
| OTOH, the largest "wat" of C-like languages is the following:
|
|     > -8 % 5
|     -3
|
| why
|
| Python here is once again correct:
|
|     >>> -8 % 5
|     2
| veltas wrote:
| Mathematically speaking if you ask for the modulus of -5 you
| may as well get a negative number, and I think people may
| validly have their own interpretations of what's "correct" at
| this point.
| qsort wrote:
| I didn't say it's wrong.
|
| You may as well use {white, blue, black, red, green} to
| represent congruences mod 5, mathematically speaking it's
| not wrong, as long as they respect the axioms of a field.
|
| What's "wat" (in the same sense as the js "wat" talk) is
| that the answer changes if the argument becomes negative.
| By all reasonable definitions of modular arithmetic, -8 and
| 7 are in the same class mod 5. Why, then, is
|
|     > (-8 % 5) == (7 % 5)
|     false
|
| in C-like languages? I'm pretty sure it's perfectly
| consistent with all the relevant language standards, but
| it's a "wat" nonetheless.
| masklinn wrote:
| > What's "wat" (in the same sense as the js "wat" talk)
| is that the answer changes if the argument becomes
| negative.
|
| It's the same in Python though? You just prefer the way
| it behaves (though to be fair it's generally more useful
| and less troublesome): Python's remainder follows the
| sign of the divisor (because it uses floored division),
| while C's follows the sign of the dividend (because it
| uses truncated division).
|
| Strictly speaking, neither is a Euclidean modulo.
| owl57 wrote:
| It's not symmetric. I believe most people with strong
| mathematical background (like Guido:)) expect lambda x: x
| % m to be some "computer approximation" to the standard
| mapping from Z to Z/mZ, while not having deep
| expectations about lambda m: x % m.
| veltas wrote:
| The really useful fact of C's remainder operator is that
|
|     x == x / y * y + x % y
|
| provided nothing overflowed. This is usually
| what you want when doing accurate calculations with integer
| division.
| xigoi wrote:
| The modulo operation together with floor division also has
| this property. And unlike C's operators, it also has the
| useful property that (x + y) % y == x % y.
| temac wrote:
| Undefined behavior is probably the worst way we can imagine to
| define constraints usable by optimizers. Sad that major languages
| and implementations went this way.
| Karellen wrote:
| Yeah, I think a lot of C's `undefined behaviour` semantics
| around assignments to ints should be reconsidered and changed
| to `unspecified` or `implementation defined` behaviour, and
| compilers can just do "whatever the hardware does". If that
| includes traps on one arch, fine, let it trap.
|
| I think `undefined behaviour` still has its place in C -
| dereferencing a freed pointer comes to mind as an obvious
| example - but I think a good proportion of the really
| unintuitive UB conditions could be made saner without
| sacrificing portability or optimisation opportunities.
| owl57 wrote:
| If you extend "unspecified" to allow traps, reading a bogus
| pointer can also be unspecified, with only writes undefined in
| the current sense.
| Karellen wrote:
| I don't think so. With "unspecified" and "implementation-
| defined" behaviours, the implementation has to pick a
| behaviour and be consistent with it. The difference is
| whether they have to document that behaviour or not.
|
| If the hardware traps on bogus pointers, then reading a
| bogus pointer may trap. But if you read a recently-freed
| pointer, it may still be valid according to the hardware
| (e.g. will have valid PTEs into the process's address
| space) so won't trap. Therefore you won't be able to
| guarantee any particular behaviour on an invalid read, so I
| don't think you'd be able to get away with "unspecified" or
| "implementation defined" behaviour on most hardware.
| owl57 wrote:
| AFAIR reading uninitialized int, for example, is
| "unspecified" (any value could be there). If we consider
| adding "implementation defined with possible trap" for
| overflow, we might as well add "unspecified with possible
| trap" for reading an invalid pointer (any value could be
| there, or it could trap, but no nasal demons).
| temac wrote:
| Using uninitialized values is UB. Going back to naive
| pointers is a lost cause because compilers have started
| to do crazy optims (not even currently allowed by the
| standards...) like origin analysis. You can't steal that
| toy from the people implementing optims.
| UncleMeat wrote:
| "Implementation defined" is worse in a lot of ways. Now the
| compiler has way less authority to tell you to stop! And we
| still have the problem of the application probably not doing
| what you want.
| temac wrote:
| Current compilers don't "tell you to stop". They silently
| transform your program into complete garbage without even
| causality restraints. The sanitizers can help but they are
| merely dynamic when the dangerous transformations are static
| in the first place.
| UncleMeat wrote:
| It is true that the language has failed to provide tools
| that help developers prevent UB and that this is a very
| bad thing for the ecosystem. It _can_ change, though.
|
| In practice, compilers aren't actually adversarial. A lot
| of the discussion around UB is catastrophizing and talks
| about how the compiler will order you pizza or delete
| your disk. Some problems are real and I know some people
| whose graduate school work was very specifically on the
| problems that this causes for security-related checks but
| compilers really really do not transform your program
| into "complete garbage." They transform your program into
| a program with a bug, which is a true statement about
| that program.
|
| I'm reminded of the apocryphal story about being asked
| how a computer knows how to do the right thing if given
| the wrong inputs. This feels similar.
| UncleMeat wrote:
| I do agree that the story around UB in C and C++ sucks. But UB
| doesn't exist because the compiler engineers want to be able to
| stick in optimizations. Once they are in the spec it makes
| sense to follow the rules but most UB comes from a desire for
| portability and to not privilege one platform over another.
|
| And further, defining a lot of UB won't actually improve
| things. Imagine we define signed integer overflowing behavior.
| Hooray. Now your program just has a _different_ bug. If you've
| accidentally got signed integer overflow in your application
| then "did some weird things because the compiler assumed it
| would never overflow" is going to cause exactly the same amount
| of havoc as "integer overflowed and now your algorithm is
| almost certainly wrong."
| nayuki wrote:
| > And further, defining a lot of UB won't actually improve
| things. Imagine we define signed integer overflowing
| behavior. Hooray. Now your program just has a different bug.
|
| This is exactly what JF Bastien argues in his hourlong talk:
| https://youtu.be/JhUxIVf1qok?t=2284
|
| So yeah, defining signed integer overflow isn't going to fix
| bugs in the vast majority of existing programs.
|
| That being said, enforcing signed overflow wraparound at
| least makes debugging easier because it's reproducible. This
| is how it is in Java land - int overflow wraps around, and
| integer type widths are fixed, so if you trigger an overflow
| while testing then it is reliably reproducible across all
| conforming Java compiler and VM versions and platforms.
| UncleMeat wrote:
| It also makes tools less able to loudly yell at you for
| having it in your application. Yes, you can turn on
| optional warnings if you want but if you've got one or two
| intended uses for this behavior then now you need
| suppressions and all sorts of mess.
| gary_0 wrote:
| I got most of them right. My god, what have I become?
| [deleted]
| BirAdam wrote:
| I have used C quite a bit, and I'm amazed at how many I got
| wrong. Nice example of why C is hated by so many I suppose.
|
| Personally, I love C but I don't use it in anything serious.
| Probably for the best given I apparently cannot remember how C
| ints work.
| [deleted]
| [deleted]
| antirez wrote:
| Surprised I got all the questions right because I tend to code
| _around_ those limits, sticking to defining always safe types for
| the problem at hand, in all the cases I'm not sure about ranges,
| possible overflows and so forth. What I mean is that if you have
| to remember the rules in a piece of code you are writing, it is
| better to rewrite the code with new types that are obviously
| safe.
| bornfreddy wrote:
| Admittedly my C is (very) rusty, but I was surprised at how few
| of the answers I knew, so your comment made me feel even worse.
| But then I saw your nickname. Yeah. :) (redis rocks btw)
|
| Completely agree with you, and not just in C. Using esoteric
| features of any language is equivalent to putting landmines in
| front of your teammates - bad idea.
| tomxor wrote:
| > Using esoteric features of any language is equivalent to
| putting landmines in front of your teammates
|
| Also, outside of production code, using esoteric features is
| a good way to get familiar with every corner of that language
| - which is useful so that you can defuse those landmines
| when they are accidentally created, i.e. know them to avoid
| them.
| jherskovic wrote:
| Haven't written C in a long time. What a nightmare this is. Got
| most of them predictably wrong.
| simias wrote:
| I did fairly well in the test (I didn't remember that you
| couldn't shift into the sign bit), but it really helps highlight
| how stupid the C promotion rules are. In C as soon as I have to
| do arithmetics with anything but unsigned ints all sorts of alarm
| bells start going off and I end up casting and recasting
| everything to make sure it does what I think it does, coupled
| with heavy testing. Now add floats and doubles into the mix and
| all bets are off.
|
| Rust not having a default "int" type and forcing you to
| explicitly cast everything is such an improvement. Truly a poster
| child for "less is more". Yeah it makes the code more verbose,
| but at least I don't have to worry about "lmao you have a signed
| int overflow in there you absolute nincompoop, this is going to
| break with -O3 when GCC 16 releases 8 years from now, but only on
| ARM32 when compiled in Thumb mode!"
| WalterBright wrote:
| The trouble with explicit casting is if the code is refactored
| to change the underlying integer type, the explicit casts may
| silently truncate the integer, introducing bugs.
|
| D follows the C integral promotion rules, with a couple crucial
| modifications:
|
| 1. No implicit conversions are done that throw away information
| - those will require an explicit cast. For example:
|     int i = 1999;
|     char c = i;            // not allowed
|     char d = cast(char)i;  // explicit cast
|
| 2. The compiler keeps track of the range of values an
| expression could have, and allows narrowing conversions when
| they can be proven to not lose information. For example:
|     int i = 1999;
|     char c = i & 0xFF;  // allowed
|
| The idea is to safely avoid needing casts, in order to avoid
| the bugs that silently creep in with refactoring.
|
| Continuing with the notion that casts should be avoided where
| practical, the cast expression has its own keyword. This
| makes casting greppable, so code review can find them. C
| casts require a C parser with lookahead to find.
|
| One other difference: D's integer types have fixed sizes. A
| char is 8 bits, a short is 16, an int is 32, and a long is 64.
| This is based on my experience that a vast amount of C
| programming time is spent trying to account for the
| implementation-defined sizes of the integer types. As a result,
| D code out of the box tends to be far more portable than C.
|
| D also defines integer math as 2's complement arithmetic. All
| that 1's complement stuff belongs in the dustbin of history.
| scrame wrote:
| I never really clicked with D, but I always like reading your
| discussions on these details.
| simias wrote:
| Yeah for sure, having more expressive casts and
| differentiating between "upcasts" and "downcasts" is
| definitely better. My point is that even "lossy" casts are
| better than C's weird, arcane implicit promotion rules by
| quite a margin IMO.
|
| Rust's current handling of this issue is by no means perfect,
| although it's been steadily improving and I definitely don't
| use `as` as much as I used to.
|
| >D also defines integer math as 2's complement arithmetic.
|
| I think modern C standards do so as well, but I _think_
| signed overflow is still UB, so it's mostly about
| defining signed-to-unsigned conversions. There are flags on
| many compilers to tell them to assume wrapping arithmetic but
| obviously that's not standard...
| WalterBright wrote:
| It's not perfect. People do complain that:
|     char a, b, c;
|     a = b + c;              // error
|     a = cast(char)(b + c);  // ok
|
| produces a truncation error, and requires a cast. But char
| types have a very small range, and overflow may be
| unexpected. So D makes the right choice here to promote to
| int, and require a cast.
| loeg wrote:
| The most recent C standards also explicitly define integer
| math as 2's complement. Admittedly, much later than D.
| WalterBright wrote:
| The last 1's complement machine I encountered was, never.
| Not in nearly 50 years of programming. I think those
| machines left for the gray havens before I was born.
|
| C should also make char unsigned (D does). Optionally
| signed chars are an abomination.
| xeeeeeeeeeeenu wrote:
| >The last 1's complement machine I encountered was,
| never. Not in nearly 50 years of programming. I think
| those machines left for the gray havens before I was
| born.
|
| Unisys OS 2200 uses one's complement[1] and Unisys MCP
| uses signed magnitude[2]. Both are still around.
|
| [1] - https://public.support.unisys.com/2200/docs/CP19.0/
| 78310422-... (page 108) - "UCS C represents an integer in
| 36-bit ones complement form (or 72-bit ones complement
| form, if the long long type attribute is specified)."
|
| [2] -
| https://public.support.unisys.com/aseries/docs/ClearPath-
| MCP... (page 304) - "ClearPath MCP C uses a signed-
| magnitude representation for integers instead of
| two's-complement representation. Furthermore, ClearPath
| MCP C integers use only 40 of the 48 bits in the word: a
| separate sign bit and the low order 39 bits for the
| absolute value."
| Measter wrote:
| > The trouble with explicit casting is if the code is
| refactored to change the underlying integer type, the
| explicit casts may silently truncate the integer, introducing
| bugs.
|
| That depends on how the casting is provided. For the C-style
| casting, or Rust's `as` casting, yes that is a problem.
| However, another way casting could be provided is through
| conversion functions that are only infallible if information
| isn't lost. For example, let's say we have the functions
| `to_u16` and `to_i16`. For an `i8` the first function could
| return `Option<u16>`, the second `i16`, while for a `u8` they
| would return `u16` and `i16`. That way, any change to the
| types that could cause it to now silently truncate would
| instead cause a compiler error because of the type mismatch.
|
| Rust almost gets there with its `Into` and `TryInto` traits
| which do provide that functionality, but trying to use them
| in an expression causes type inference to fail, which just
| makes them a pain in the ass to use.
| nayuki wrote:
| You can use From and TryFrom, like u32::try_from(5i32).
| https://doc.rust-lang.org/std/convert/trait.From.html ,
| https://doc.rust-lang.org/std/convert/trait.TryFrom.html
| tialaramex wrote:
| And, notably in this context, Rust's traits are auto-
| implemented in a chain, so if From<Foo> for Bar, then
| Into<Bar> for Foo, and thus TryFrom<Foo> for Bar, and
| thus in turn TryInto<Bar> for Foo.
|
| This means that if first_thing used to have a non-
| overlapping value space so that converting it to
| other_thing might fail, so you wrote
| other_thing = first_thing.try_into().blahblahblah;
|
| ... if you later refactor and now first_thing is a subset
| of other_thing so that the conversion can never fail, the
| previous code still works fine, the try_into() call just
| never fails. In fact, the compiler even _knows_ it can't
| fail, because its error type is now Infallible, a sum
| type with nothing in it, so the compiler can see this
| never happens, and optimise accordingly.
| WalterBright wrote:
| Yeah, having two different cast operations can help here.
| But I like the simpler approach.
| tialaramex wrote:
| Still, I would rather not have Rust's 'as' cast. I should like
| to see all, or at least most uses of 'as' deprecated in a later
| Rust edition.
|
| Rust's 'as' will silently throw away data to achieve what you
| asked. Sometimes you wanted that, sometimes you didn't realise,
| and requiring into() and try_into() instead helps fix that.
|
| For example suppose I have a variable named beans, I forgot
| what type it is, but it's got the count of beans in it, there
| should definitely be a non-negative value because we were
| counting beans, and it shouldn't be more than a few thousand,
| so we're going to put that in a u16 variable named enough:
|
|     let enough = beans as u16;
|
| This works just fine, even though beans is i64 (ie a signed
| 64-bit integer). Until one day beans is -1 due to an arithmetic
| error elsewhere, and now enough is 65535, which is pretty
| surprising. It's not _undefined_ but it is surprising.
|
| If we instead write let enough = beans.into(); we get a
| compiler error, the compiler can't see any way to turn i64 into
| u16 without risk of loss, so this won't work. We can write
| beans.try_into().unwrap() and get a panic if actually beans was
| out of range, or we can actually write the try_into() handler
| properly if we realise, seeing it can fail, what we're actually
| dealing with here.
| jcranmer wrote:
| If I can get away with using .into() over as, I will.
| However, there are so many cases where you can't, and
| .try_into().unwrap() is just way too unwieldy. In particular,
| there's no way for me to go "I know I'm only ever going to
| run on 64-bit machines where sizeof(usize) == sizeof(u64), so
| usize should implement From<u64>" (and even more annoying, I
| might be carefully making sure this works on 32-bit and
| 64-bit machines and get screwed because Rust thinks it could
| be portable to a 16-bit usize despite the fact I'm allocating
| hundreds of MB of memory).
|
| And of course there are times I do want to bitcast negative
| values to large positive unsigned values or vice versa,
| without error. So while I do understand why maybe you
| shouldn't use as, at the end of the day, it just ends up
| being easier to use it than not use it.
| gspr wrote:
| > try_into().unwrap() is just way too unwieldy.
|
| I always carry with me a tiu() alias method for exactly
| that reason (in a TryIntoUnwrap trait implemented for every
| pair of types which implements TryInto).
| arcticbull wrote:
| You can also do try_from()? if you impl From<TryFromIntError>
| on your local error type.
| ridiculous_fish wrote:
| A flip side is that some safe conversions also produce
| compiler errors. On a 64-bit system, usize and u64 are the
| same width, but they are not convertible: neither is Into the
| other, so you cannot lean on the compiler in the way you
| describe.
|
| You might use try_into(), but then you risk panicking on a
| 32-bit system, instead of getting a compile-time error.
| svnpenn wrote:
| Exactly why I don't use C anymore. Other languages like Go don't
| allow this stuff.
|
| https://go.dev/play/p/D2J6Y9ol41W
| ummonk wrote:
| Wait so does an unsigned int promote to signed int for shift?
| Jimajesty wrote:
| I felt pretty smart getting that first question, then proceeded
| to fall down the stairs.
| Turing_Machine wrote:
| The correct answer for many of these is "don't do that". :-)
| IncRnd wrote:
| That is the correct answer to the questions in this quiz!
|
| Most professionals don't worry about these effects, instead
| choosing to cast and use parentheses directly. Those instruct
| the compiler instead of relying upon warnings from the
| compiler.
| jstimpfle wrote:
| It's important to notice the disclaimer that this applies to
| x86/x86-64 GCC-like platforms; in particular, int is assumed to
| be 32 bits.
|
| As antirez said as well, I don't pride myself in understanding
| the intricacies of integer arithmetic and promotion well. I try
| to write clear code by writing around the less commonly
| understood rules. Nevertheless I wanted to test myself. There are
| two questions that surprised me somewhat.
|
| Apparently you can't left-shift a negative value even by a zero
| amount, as in (-1 << 0).
|
| And is it true that the value of "-1L > 1U" is platform
| dependent? I had assumed that 1U would be promoted to 1L in this
| expression, even on x86 where unsigned int and long have the same
| number of bits. According to the following document, long int has
| a "higher rank" than unsigned int.
| https://wiki.sei.cmu.edu/confluence/display/c/INT02-C.+Under... .
| (Edit: according to rules 4 and 5 of "Usual arithmetic
| conversions", it's not only the rank but also about "the values
| that can be represented")
| Dwedit wrote:
| On Windows, `long` and `int` are the same size, even when
| building for x64.
| einpoklum wrote:
| I taught first-semester C for several years, and still got a
| couple of these wrong - although, to be honest, the way we taught
| the class at my alma mater we steered well clear of these
| situations. We told students that C performs type promotions and
| automatic conversions, but either had them use a uniform type to
| begin with - int typically - or told them to convert explicitly
| and avoid
|
| Still, the most important suggestion I would make here is: Always
| compile with warnings enabled, in particular:
|
| * -Wstrict-overflow=1 and maybe even -Wstrict-overflow=3
|
| * -Wsign-compare
|
| * -Wsign-conversion
|
| * -Wfloat-conversion
|
| * -Wconversion
|
| * -Wshift-negative-value
|
| (or just `-Wall -Wextra`)
|
| And maybe also:
|
| * -fsanitize=signed-integer-overflow
|
| * -fsanitize=float-cast-overflow
|
| (These are GCC flags, should work for clang as well.)
| Cola2265 wrote:
| pif wrote:
| In the end, it's very simple and intuitive.
|
| 1- If you need arithmetic, use signed; if you need a bitmask, use
| unsigned. C is not assembly, and bit shift is no multiplication
| nor division.
|
| 2- Make sure you stay within bounds, as in: don't even think you
| can approach the boundaries. C is not assembly, and overflow flag
| does not exist.
| codeflo wrote:
| I thought I did well until the bit-shifting questions. Who in
| their right mind designs a language where shifting the bits of a
| u16 silently converts into an i32? Doubly so since that fact
| alone directly causes UB -- keeping the u16 would have been
| perfectly fine.
|
| (Edit, since some respondents seem to miss this, explanations
| about efficiency or ISAs might justify promoting to u32 (though
| even that's debatable), but not i32. A design that auto-promotes
| an unsigned type, where every operation is nicely defined and
| total, into a signed type, where you run into all kinds of
| undefined behavior on overflow, is simply crazy.)
| jcranmer wrote:
| If you have a 32-bit architecture, you don't necessarily have
| 8-bit and 16-bit hardware operations outside of memory
| operations (load/store).
|
| Now, I don't find this reasoning persuasive--it's not _that_
| hard to emulate an 8-bit or 16-bit operation--and judging from
| the history of post-C languages, most other language designers
| are equally unmoved by this reasoning, but I can see someone in
| their right mind designing a language that acts like this.
| Especially if the first architecture they're developing on is
| precisely such an architecture (the PDP-11 doesn't have byte-sized
| add/sub/mul/div).
| simias wrote:
| I think the prevailing philosophy for C (and later C++) was
| that code should map closely to the hardware and not expand
| to complicated "microcode". A left shift in C should just be a
| left shift in the underlying ISA, within reason. That's where
| most of the undefined behaviours come from. Having to add
| masking and other operations to emulate a 16-bit shift on a
| 32-bit architecture feels un-C-like, for better or worse.
|
| IMO the real issue is not so much the fact that any shift of
| a type < int is treated as if it were an int, it's that the
| language doesn't force you to acknowledge that in the code.
| If you got a compilation error when trying to shift a short
| and had to explicitly promote to int in order to make it
| through, at the very least it can't lead to an oversight from
| a careless programmer.
|
| C is trying to be clever but only goes half way, resulting in
| the worst of both worlds IMO.
| codeflo wrote:
| The ISA might force someone to extend a value to 32 bits
| (debatable, but let's go with it). It never forces you to
| treat an unsigned int as signed. It also doesn't require
| inserting UB into the process.
| simias wrote:
| I agree, the whole signed vs. unsigned generally feels
| like an afterthought in C (probably because, to a certain
| extent, it was). `char`'s sign being implementation-
| defined is a pretty wild design choice that wasted many
| hours of my life while porting code between ARM and x86.
|
| UBs are not required, but you need them if you want C to
| behave as a macro-assembler as well as allowing for
| aggressive optimizations. For instance `a << b` if b is
| greater than or equal to a's width is genuinely UB if you write
| portable code, different CPUs will do different things in
| this situation. Defining the behaviour means that the
| compiler would have to insert additional opcodes to make
| the behaviour identical on all platforms.
|
| You may argue that it's still better than having UB but
| that's just not C's design philosophy, for better or
| worse.
| gsliepen wrote:
| It's worse than that. `char`'s signedness being
| implementation defined is one thing, but then having the
| standard library provide a function called `getchar()`
| that returns not a `char` but an `unsigned char` cast to
| an `int` is diabolical.
| tsimionescu wrote:
| > For instance `a << b` if b is greater than a's width is
| genuinely UB if you write portable code, different CPUs
| will do different things in this situation. Defining the
| behaviour means that the compiler would have to insert
| additional opcodes to make the behaviour identical on all
| platforms.
|
| You seem to be mixing up implementation-defined behavior
| and undefined behavior. It would have been perfectly
| reasonable to make this choice if signed integer overflow
| were implementation-defined, but it is unfortunately not
| - it is undefined behavior instead. This means that a
| program containing this instruction is not valid C and
| may "legally" have any effect whatsoever.
| codeflo wrote:
| That's true, I'd add something more. Your reasoning about
| hardware differences would only justify implementation-
| defined behavior, not undefined behavior. The distinction
| is important here: undefined behavior is when the
| compiler can make surprising optimizations in other parts
| of the code assuming something doesn't happen.
| jcranmer wrote:
| I imagine the reason oversized shifts are UB is because
| some 40-year-old computer hardware trapped on oversized
| shifts, as traps are always UB in C.
| nayuki wrote:
| > Who in their right mind designs a language where shifting the
| bits of a u16 silently converts into an i32?
|
| Yeah, hence why I asked this question years ago:
| https://stackoverflow.com/questions/39964651/is-masking-befo...
| veltas wrote:
| Because everything smaller than an int is usually promoted to
| int. int is the 'word' in C that things are calculated in. Even
| character constants are ints, all enum constants are ints, the
| default type was int when default types were still a thing.
| edflsafoiewq wrote:
| Yes. The model C has is that the CPU has an ALU that
| operates as (word, word) -> word, with int being the smallest
| word size. This explains many of C's conversion rules: to
| operate on a single integer, it first has to be promoted to a
| word size; to operate on two integers, they first have to be
| converted to a common word size, etc.
| lifthrasiir wrote:
| And that makes writing correct _and_ portable C futile.
| Yes, defined-size types are a thing and I almost exclusively
| use them, but since the size of `int` itself is unknown and
| integer promotion depends on that size, the meaning of the
| code using only defined-size types can still vary across
| platforms. intN_t etc. are only typedefs and do not form
| their own type hierarchy, which in my opinion is a huge
| mistake.
| jefftk wrote:
| But why not turn u16 into u32? Why switch it to being signed
| on promotion?
| rwmj wrote:
| I'm sure the answer is going to be along the lines of
| "because PCC did that and they standardized it" :-/
|
| Here's a fun standardization problem I came across recently
| (nothing to do with C):
| http://mywiki.wooledge.org/BashFAQ/105
| ynfnehf wrote:
| ANSI describes why in their rationale:
| https://www.lysator.liu.se/c/rat/c2.html#3-2-1
|
| > The unsigned preserving rules greatly increase the number
| > of situations where unsigned int confronts signed int to
| > yield a questionably signed result, whereas the value
| > preserving rules minimize such confrontations. Thus, the
| > value preserving rules were considered to be safer for the
| > novice, or unwary, programmer. After much discussion, the
| > Committee decided in favor of value preserving rules,
| > despite the fact that the UNIX C compilers had evolved in
| > the direction of unsigned preserving.
| veltas wrote:
| Believe it or not, this makes the behavior more like what
| you'd expect in many cases. For example:
|
|     uint8_t x = 4;
|     extern volatile uint64_t *reg;
|     *reg &= ~x;
|
| In the last statement x is promoted to an int, and then
| when the bitwise NOT occurs every bit is set to 1,
| including the high bit. When it's converted to a uint64_t
| for the AND, the high bits are also set to 1. So the result
| is that the final statement clears only bit 2 in *reg.
|
| If it promoted to unsigned int, then it would also clear
| bits 32-63.
| codeflo wrote:
| I don't find that very convincing. It's simply ambiguous,
| I might want this:
|             *reg &= ~(uint64_t)x;
|
| or, and there's no elegant way to even write this in C, I
| might want:
|             *reg &= (uint64_t)(uint8_t)~x;
|
| The fact that I have to write two casts here to undo the
| damage of the auto-promotion is evidence of how broken
| this is.
| veltas wrote:
| That second line can be written:
|             *reg &= (uint8_t)~x;
|
| Or:
|             *reg &= ~x & 0xFF;
| tialaramex wrote:
|             extern volatile uint64_t *reg;
|             *reg &= ~x;
|
| People should stop doing this. What this _means_ is:
|             extern volatile uint64_t *reg;
|             uint64_t tmp = *reg;
|             tmp &= ~x;
|             *reg = tmp;
|
| But of course when you write _that_, chances are somebody
| will point out that you're running in interruptible
| context sometimes in this function, so that's actually
| introducing a race condition. Why didn't they say so when
| you wrote it your way? Because that looked like a single
| operation and so it wasn't obvious it might get
| interrupted.
| scatters wrote:
| Because sub-integer types are for storage, not computation.
|
| Yes, it'd be better if you had to explicitly cast to int or
| unsigned to perform arithmetic, but that ship has sailed.
| leni536 wrote:
| Doesn't list my favorite footgun.
|
| x and y are unsigned short. The expression `x*y` has defined
| behavior for...
|
| a) all values of x and y.
|
| b) some values of x and y.
|
| c) no values of x and y.
| nayuki wrote:
| If short = 16 bits and int = 16 bits, then x and y will be
| promoted to unsigned int. Unsigned multiplication has
| wraparound behavior, so x*y will be defined for all values.
|
| If short = 16 bits and int = 32 bits (or heck even 17 bits),
| then x and y will be promoted to signed int. Signed
| multiplication overflow is undefined behavior, so x*y will be
| undefined for some values when x*y is too large. In particular,
| 0xFFFF * 0xFFFF = 0xFFFE_0001, which is larger than INT_MAX =
| 0x7FFF_FFFF.
|
| If short = 16 bits and int = 64 bits (or even 33 bits), then x
| and y will be promoted to signed int. The range of x*y will
| always fit in int, so no overflow occurs, and the expression is
| defined for all input values.
|
| Isn't C fun?
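| The usual fix, sketched below under the assumption of 16-bit
| short and 32-bit int (the name mul_u16 is mine), is to convert
| one operand up to an unsigned type before multiplying:

```c
#include <stdint.h>

/* Multiplying two uint16_t values directly promotes both to signed
   int on 32-bit-int platforms, so 0xFFFF * 0xFFFF overflows (UB).
   Casting one operand first keeps the whole multiplication in
   unsigned arithmetic, which wraps by definition. */
static uint32_t mul_u16(uint16_t x, uint16_t y) {
    return (uint32_t)x * y;   /* y converts to uint32_t as well */
}
```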
| loeg wrote:
| (B), assuming int is 32-bit and short is 16-bit. The
| multiplication promotes both operands and result to signed int,
| right? So if both x and y are uint16_max, the result overflows
| signed int and is UB, I think.
|
| But if int were larger (eg 64 bit) and short remained 16 bits,
| there's no overflow and the answer is (A). I think.
| jstimpfle wrote:
| There was definitely a question that covered the auto-promotion
| to int, maybe with slightly different types.
| [deleted]
| shultays wrote:
| I am gonna say A but I guess I don't see the footgun here
| Veliladon wrote:
| The shorts are not promoted to ints (if you even declared the
| result as that type) until after the multiplication. The
| result will first be put into a short and then promoted to
| int. It's basically asking for an overflow error. Given that
| 256^2 is 65536 you don't have to be multiplying large numbers
| before hitting that overflow.
| loeg wrote:
| Unsigned short has defined overflow, though.
| shultays wrote:
| I don't follow, assuming short is half the size of int it
| would only overflow if most significant bits of both values
| are 1. Where did that 256^2 come from? It wouldn't overflow
| if a and b were 256
|
| I missed that the operands would be promoted to signed int,
| which overflows when the most significant bits are set
| [deleted]
| LegionMammal978 wrote:
| All operands are always promoted to int (or unsigned int,
| long, unsigned long, etc.) _before_ any operation. The
| footgun in this example is that even though you're using
| unsigned integers and would expect guaranteed wrapping
| semantics, the promotion to signed int makes it UB for
| USHORT_MAX*USHORT_MAX.
| jandrese wrote:
| 65535 * 65535 would overflow a 32-bit int.
| shultays wrote:
| Ah I didn't know it would be promoted to signed int.
| Another thread here explains it as well. Thanks
| synergy20 wrote:
| typically you should avoid shifts on signed integers, and
| especially NEVER left-shift a signed char|short|int|whatever.
|
| I limit shift strictly to unsigned integer numbers.
| nayuki wrote:
| Following this rule strictly can be tricky.
|             uint16_t x = (...);
|             uint16_t y = x << 15;
|
| Any arithmetic involving uint16_t will be promoted to some kind
| of int. If int is 16 bits wide, then uint16_t will be promoted
| to unsigned int before the shift, and all values of x are safe.
| Otherwise, if int is at least 17 bits wide, then uint16_t will be
| promoted to signed int before the shift.
|
| On a weird but legal platform where int is 24 bits wide, the
| expression (uint16_t)0xFFFF << 15 will cause undefined
| behavior.
|
| My workaround for this is to force promotion to unsigned int:
|             (0U + x) << 15
| https://stackoverflow.com/questions/39964651/is-masking-befo...
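| That workaround can be wrapped up like this (shl15 is my name
| for it, and the sketch assumes unsigned int is at least 32 bits
| wide, as on the quiz's x86/x86-64 targets):

```c
#include <stdint.h>

/* 0U + x forces x up to unsigned int before the shift, so shifting
   into (what would otherwise be) the sign bit is well defined even
   when int is wider than 16 bits. */
static uint32_t shl15(uint16_t x) {
    return (0U + x) << 15;
}
```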
| siggen wrote:
| Got all these correct. I was expecting something more convoluted
| or exotic undefined behavior.
| dinom wrote:
| Seems like a good example of why interview quizzes aren't a
| panacea.
| flykespice wrote:
| This quiz only reminded me how _little_ I know about C
| (thankfully those cases are corner-cases of incompetent
| programming, so others and I can ignore their existence).
|
| Good grief, thanks Quiz for reminding me to stay away from that
| mess of language.
| greesil wrote:
| -Wall -Werror
|
| #include <stdint.h>
|
| and don't use primitive types, and you will avoid many of these
| issues.
| ghoward wrote:
| I'm a heavy user of C, and most of what I got wrong was around
| integer promotion.
|
| I'm glad I run clang with -Weverything and use ASan and UBSan.
| jcranmer wrote:
| The answer of the final question (INT_MIN % -1) is wrong in a way
| that's somewhat dangerous.
|
| If you read the text of C99 carefully, yes, it's implied that
| INT_MIN % -1 should be well-defined to be 0. However, the %
| operator is usually implemented in hardware as part of the same
| instruction that does division, which means that on hardware
| where INT_MIN / -1 traps (thereby causing undefined behavior),
| INT_MIN % -1 will also trap. The wording was changed in C11 (and
| C++11) to make INT_MIN % -1 explicitly undefined behavior, and
| given the reasoning for why the wording was changed, users should
| expect that it retains its undefined behavior even in C89 and C99
| modes, even on 20-year-old compilers that predate C11.
| greaterthan3 wrote:
| >The wording was changed in C11
|
| And here's an exact quote from the C11 standard:
|
| >If the quotient a/b is representable, the expression (a/b)*b +
| a%b shall equal a; otherwise, the behavior of both a/b and a%b
| is undefined.
|
| http://port70.net/~nsz/c/c11/n1570.html#6.5.5p6
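| One portable way to sidestep this in real code is to special-case
| the divisor before using %; this is a sketch, and safe_rem is my
| name for the helper (the divisor must still be nonzero):

```c
#include <limits.h>

/* INT_MIN / -1 overflows (and traps on x86), and since C11
   INT_MIN % -1 is explicitly undefined as well. Checking the
   divisor first yields the mathematically expected 0 without UB. */
static int safe_rem(int a, int b) {
    if (b == -1)
        return 0;      /* a % -1 is always 0 mathematically */
    return a % b;
}
```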
| lultimouomo wrote:
| Beware that this is very much not a "C integer quiz", but an
| "ILP32/LP64 integer quiz". As soon as you move, not to some weird
| exotic architecture, but simply to 64-bit Windows(!!), some quiz
| answers will not hold.
|
| For a website meant to educate programmers on C language gotchas,
| this is a pretty lackluster effort.
|
| Even the initial disclaimer, "assume x86/x86-64 GCC/CLang", is
| wrong, as the compiler does not have anything to do with integer
| widths.
| flykespice wrote:
| My impression is this wasn't so much to educate programmers on C
| language gotchas but to remind you just how messy and fragile
| this language is (so you can stay far away).
| lultimouomo wrote:
| One more reason not to give the impression that you can
| assume that long is wider than int, which is wider than short!
| ok123456 wrote:
| The quiz is wrong because they assume that the length of short
| is less than int in some questions. A short can be the same
| length as an int. It just can't be longer.
| analog31 wrote:
| One of my friends likes to scold me about using a programming
| language (Python) that doesn't enforce type declarations. I
| gently remind him that his language (C) has eight different types
| of integers.
| einpoklum wrote:
| { char, short, int, long, long long } x { signed, unsigned } =
| 10 types, and then there's bool and maybe int128_t, so maybe
| 12.
| Aardwolf wrote:
| Heh, it has _way_ more than 8.
|
| For char, you have 3: signed char, unsigned char and char. It's
| implementation-defined whether plain char is signed or unsigned.
|
| You have integer types such as size_t, ssize_t and ptrdiff_t.
| They may, under the hood, match one of the other standard int
| types, however this differs per platform, so you can't e.g.
| just easily print size_t using the standard printf formatters,
| you really have to treat it as its own type. Also wchar_t and
| such of course.
|
| Then you have all the integers in stdint.h and inttypes.h. Same
| here applies as for size_t. At least you know how many bits you
| get from several of them, unlike from something like "long".
|
| Then your compiler may also provide additional types such as
| __int128 and __uint128_t.
| spc476 wrote:
| > so you can't e.g. just easily print size_t using the
| standard printf formatters
|
| This has been fixed in C99. For size_t, it's "%zu", for
| ptrdiff_t it's "%td", for ssize_t it's "%zd" and for wchar_t,
| it's "%lc".
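| A quick self-check of those C99 length modifiers (the function
| name formats_ok is mine, for illustration only):

```c
#include <stdio.h>
#include <string.h>
#include <stddef.h>

/* The z and t length modifiers print size_t and ptrdiff_t
   portably, with no need to guess which basic integer type they
   alias on a given platform. */
static int formats_ok(void) {
    char buf[32];
    size_t    sz = 42;
    ptrdiff_t pd = -7;
    snprintf(buf, sizeof buf, "%zu %td", sz, pd);
    return strcmp(buf, "42 -7") == 0;
}
```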
| veltas wrote:
| Third question is wrong. It's implementation defined, because
| unsigned short might be the same rank as unsigned int in some
| implementations, in which case it remains unsigned when promotion
| occurs.
| jwilk wrote:
| They may have the same width, but not the same rank.
|
| From C99 §6.3.1.1:
|
| > _-- The rank of long long int shall be greater than the rank
| of long int, which shall be greater than the rank of int, which
| shall be greater than the rank of short int,_ [...]
|
| > _-- The rank of any unsigned integer type shall equal the
| rank of the corresponding signed integer type, if any._
| [deleted]
| moefh wrote:
| That's true, but it doesn't matter to the point being made.
|
| According to the standard[1], if short and int have the same
| size (even if not the same rank) both numbers are converted
| to unsigned int (that is, the unsigned integer type
| corresponding to int) because int can't represent all values
| of unsigned short.
|
| The usual arithmetic conversions never "promote" an unsigned
| to signed if doing so would change the value.
|
| [1] http://port70.net/~nsz/c/c99/n1256.html#6.3.1.8
| rwmj wrote:
| He does say on the first page: _All other things being equal,
| assume GCC/LLVM x86/x64 implementation-defined behaviors._ Are
| there any normal, modern machines where short and int are the
| same size? I think the last machine where that was true was the
| 8086.
| codeflo wrote:
| Define "machine". All kinds of microcontrollers are
| programmed with C.
| moefh wrote:
| That's still a bad question: the behavior is implementation-
| defined according to the standard, so having "implementation-
| defined" as a wrong option is ambiguous and confusing.
|
| The question would be fine (given that note) if
| "implementation-defined" was not an option, like for example
| the question about "SCHAR_MAX == CHAR_MAX".
| fuckstick wrote:
| > I think the last machine where that was true was the 8086.
|
| The last machine with a word size of 16 bits? The 286 was as
| well. There were others like the WDC 65816 (the Apple IIgs
| and SNES CPU).
|
| It just so happens that there were simply far fewer 16-bit CPUs
| than there were 8 or 32 (or "32/16" like the 68k). Also 8-bit
| CPUs are simply a poor fit for C by their nature and the
| assumptions C makes. But the numerous ones still relevant
| today will use 16 bit ints.
|
| The use of Real or V86 mode on the x86 went on for many years
| after the demise of the 8086. I think it is somewhat of a
| joke at this point that they're teaching Turbo C in some
| developing countries.
| veltas wrote:
| On x86 systems like 8086 short and int were the same size.
| And that appears to be a footnote they've added after being
| called out for being wrong. The question gives
| "implementation defined" as an option and in other questions
| seems to specify the ABI, and in some assumes it without
| saying again. Very inconsistent, they should fix their quiz
| really.
|
| The last x86 processor I know of to have these sizes is
| 80286.
| galangalalgol wrote:
| AVR microcontrollers still have 16-bit ints, probably 8051
| and PIC too, but I don't use those. Lots of people do
| though. TI DSPs use a 48-bit long, so don't count on int and
| long being the same either.
| MauranKilom wrote:
| _> What does the expression SCHAR_MAX == CHAR_MAX evaluate to?_
|
| _> Sorry about that -- I didn't give you enough information to
| answer this one. The signedness of the char type is
| implementation-defined_
|
| ...why not have an "implementation-defined" answer button then,
| because that's what people should know (instead of knowing all
| ABIs) and what the question is about anyway?
|
| _> If these operators were right-associative, the expression [x
| - 1 + 1] would be defined for all values of x._
|
| That's just wrong, no? If + and - were right-associative, it
| would be parsed as x - (1 + 1), which is decidedly not "defined
| for all values of x".
| shultays wrote:
| I raised an eyebrow at that as well; some questions have
| implementation-defined as an answer, so I assumed the author
| just wrongly assumed undefined covers that
| necovek wrote:
| Other questions might also be "implementation defined", thus a
| caveat to assume GCC/LLVM implementations on x86/amd64 at the
| start.
|
| I.e. the C standard specifies undefined behavior fairly
| sparingly iirc, and most of the corner cases are
| implementation defined.
|
| I may also be misremembering things: it's been 20 years since
| I've carefully read C99 (draft, really) for fun :))
| kevin_thibedeau wrote:
| Even the compiler assumption isn't enough. There is also an
| implicit assumption in the answers that x86-64 is using LP64
| when some platforms will use LLP64 (Windows) or ILP64 (Cray).
| 6a74 wrote:
| Neat quiz. Reminds me that the absolute value of INT_MIN in C
| (and many other languages) is undefined, but will generally still
| return a negative value. This is a "gotcha" that a lot of people
| are unaware of.
|
| > abs(-2147483648) = -2147483648
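| A defensive alternative, sketched here with my own helper name
| uabs (the 2147483648u magnitude assumes the quiz's 32-bit int):

```c
#include <limits.h>

/* abs(INT_MIN) is UB because +2147483648 does not fit in a 32-bit
   int. Negating in unsigned arithmetic is well defined, and
   unsigned int can represent the magnitude of every int value. */
static unsigned int uabs(int x) {
    return x < 0 ? 0u - (unsigned int)x : (unsigned int)x;
}
```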
| lizardactivist wrote:
| A consequence of most, if not all, CPUs today using two's
| complement integers.
|
| I think one's complement is more sensible since it doesn't have
| this problem, but it loses out because it requires a more
| complex ISA and implementation.
| [deleted]
| nayuki wrote:
| Ones' complement (correct spelling) has negative zero, which
| I would argue is a far worse problem.
| xigoi wrote:
| Why in the fuck is it "ones' complement", but "two's
| complement"?
| tsimionescu wrote:
| According to Wikipedia:
|
| > The name "ones' complement" (note this is possessive of
| the plural "ones", not of a singular "one") refers to the
| fact that such an inverted value, if added to the
| original, would always produce an 'all ones' number
| veltas wrote:
| It's annoying that negation, ABS, and division can overflow
| with two's complement. But how I look at it: lots of
| operations can already overflow, just a fact of signed
| integers, and you need to guard against that overflow in
| portable code already. It doesn't seem to be fundamentally
| worse that those extra operations can overflow.
| shultays wrote:
| It is undefined since it involves integer overflow
| Cu3PO42 wrote:
| One thing to note is that long int is the same size as int on x64
| Windows, at least in the MSVC ABI. clang also conforms to this.
|
| This is relevant to the question asking if -1L > 1U.
| auxym wrote:
| same with ARM-EABI (32bit cortex-M MCUs). Int and long int are
| both 32 bits.
| MobiusHorizons wrote:
| I think that is expected (it's what happens for x86 as well)
| what's surprising about the parent is that long is apparently
| 32 bit on windows x64. Usually long should be 64 bit on a
| machine with 64 bit words
| boxfire wrote:
| It's very worth pointing out and in fact advocating for the
| compiler options -fwrapv and -ftrapv
|
| Which make signed integers wrap with the expected two's
| complement behavior, or trap on signed integer overflow,
| respectively.
|
| N.B. left shifting into the sign bit is still undefined
| behavior... On x86-64 gcc and clang seem to perform the shift as
| if the number is interpreted unsigned, then shifted, then
| interpreted signed.
| forrestthewoods wrote:
| In my opinion the three original sins of C are: nullptr, null-
| terminated strings, and implicit conversions.
|
| Abolish those three concepts and C is a significantly improved
| language!
| pjmlp wrote:
| no bounds checking, enums are hardly better than #define,
| pointer decay....
| kazinator wrote:
| Found a disappointing bug in the test:
|             (unsigned short)1 > -1
|
| Correct answer is: implementation-defined.
|
| The left operand of > has type _unsigned short_, before
| promotion. On C implementations for today's popular, mainstream
| machines, _short_ is narrower than _int_; therefore, whether it
| is signed or unsigned it goes to _int_.
|
| In that common case, we are doing a 1 > -1 comparison in the
| _int_ type.
|
| However, _unsigned short_ may be exactly as wide as int, in which
| case it cannot promote to _int_, because its values do not fit
| into that type. It promotes to _unsigned int_ in that case. Both
| sides will go to _unsigned int_ {*}, and so we are comparing 1 >
| UINT_MAX which is 0.
|
| Maybe the author should name this "GCC integer quiz for 32 bit
| x86", and drop the harder choices like "implementation-defined".
|
| ---
|
| {*} It's more nuanced here. If we have a signed and unsigned
| integer operand of the same rank, it could go either way. If the
| unsigned type has a limited range so that all its vallues are
| representable in the signed type, then the unsigned type goes to
| signed. My remark represents only the predominant situation
| whereby signed and unsigned types of the same rank have
| overlapping ranges that don't fit into each other: the unsigned
| version of a type not only lacks negatives, but has extra
| positives.
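| The two cases can be pinned down side by side (function names are
| mine; this assumes the quiz's x86-64 ABI, where short is 16 bits
| and int is 32):

```c
/* With short narrower than int, (unsigned short)1 promotes to
   *signed* int, so the comparison is the ordinary 1 > -1.
   With unsigned int on the left, -1 instead converts to UINT_MAX
   and the comparison goes the other way. */
static int cmp_promoted(void) { return (unsigned short)1 > -1; }
static int cmp_unsigned(void) { return (unsigned int)1 > -1; }
```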
| mort96 wrote:
| The site says:
|
| > All other things being equal, assume GCC/LLVM x86/x64
| implementation-defined behaviors.
| jwilk wrote:
| Discussed in this thread:
|
| https://news.ycombinator.com/item?id=32712578
| phist_mcgee wrote:
| I know this is a tangential comment, but the fact that the site
| author went to the effort to add a css reset, and then _doesn't_
| go ahead and add in any kind of margins is pretty bizarre.
|
| The fact that the text butts up to the left of the page with no
| margin is pretty incredible.
| loeg wrote:
| The quiz is a little broken. CHAR_MAX is implementation defined
| but for some reason that question doesn't have that option, even
| though other questions do.
| ynfnehf wrote:
| Another fun one:
|             struct { long unsigned a : 31; } t = { 1 };
|
| What is t.a > -1? What about if a is instead a bit-field of width
| 32? (Assuming the same platform as in the quiz.)
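| A sketch of the typical answer on GCC/Clang for x86-64 (struct
| and function names are mine; note that promotion of bit-fields
| with non-int declared types is implementation-defined, so this
| shows common behavior, not a guarantee):

```c
/* All values of a 31-bit unsigned bit-field fit in int, so t.a
   promotes to signed int and the comparison is 1 > -1. At 32 bits
   the field stays unsigned, so -1 converts to the unsigned type
   and the comparison flips. */
struct bf31 { long unsigned a : 31; };
struct bf32 { long unsigned a : 32; };

static int gt31(void) { struct bf31 t = { 1 }; return t.a > -1; }
static int gt32(void) { struct bf32 t = { 1 }; return t.a > -1; }
```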
| manaskarekar wrote:
| Tangentially related, https://cppquiz.org/, for C++.
| RedShift1 wrote:
| I only have cursory experience with C, mostly by programming some
| Arduino stuff, so I can only add this: good god this is a
| minefield.
| rwmj wrote:
| I've been a professional C programmer for nearly 40 years and
| didn't do especially well (I got about 2/3rds right). In my
| view it doesn't really matter. Enable maximum warnings and fix
| everything the compiler warns about. I also use Coverity once
| in a while, and every test is run with and without valgrind. If
| you do that you'll be fine.
| adastra22 wrote:
| Yeah, in practice these sort of gotchas ought to trigger
| warnings. And for a C/C++ developer, warnings should never be
| allowed to persist in a code base.
| RedShift1 wrote:
| I thought some deductive reasoning and thinking about how
| things work at the byte level would save me. Computer said
| no.
| nayuki wrote:
| When you write C programs, you are coding on the C abstract
| machine as specified by the language standard. If your code
| triggers signed integer overflow, then the C abstract
| machine says that it's undefined behavior, and the compiler
| and runtime can make your code behave however it wishes. It
| doesn't matter that the underlying concrete machine (POSIX,
| x86 ISA, etc.) has wraparound signed overflow, because your
| code has to interact with both the abstract machine and the
| concrete machine at the same time. See:
| https://www.youtube.com/watch?v=ZAji7PkXaKY
| petergeoghegan wrote:
| > When you write C programs, you are coding on the C
| abstract machine as specified by the language standard.
|
| Are you really, though? I would argue that it's a matter
| of perspective and/or semantics.
|
| The Linux kernel is built with -fwrapv and with
| -fno-strict-aliasing, and uses idioms that depend on it
| directly. We can surmise from that that the kernel must
| be:
|
| 1. Exhibiting undefined behavior (according to a literal
| interpretation of the standard)
|
| OR:
|
| 2. Not written in C.
|
| Either way, it's quite reasonable to wonder just how much
| practical applicability your statement really has in any
| given situation -- since you didn't have any caveats.
| It's not as if the kernel is some esoteric, obscure case;
| it's arguably the single most important C codebase in the
| world. Plus there are plenty of other big C codebases
| that take the same approach besides Linux.
|
| Lots of compiler people seem to take the same hard line
| on the issue -- "the C abstract machine" and whatnot. It
| always surprises me, because it seems to presuppose that
| the _only_ thing that matters is what the ISO standard
| says. The actual experience of people working on large C
| codebases doesn't seem to even get _acknowledged_. Nor
| does the fact that the committee and the people that work
| on compilers have significant overlap.
|
| I'm not claiming that "low-level C hackers are right and
| the compiler people are wrong". I'm merely pointing out
| that there is a vast cultural chasm that just doesn't
| seem to be acknowledged.
| nayuki wrote:
| > In my view it doesn't really matter.
|
| One of these days, the compiler will do something surprising
| to one of your expressions involving signed integer overflow,
| like converting x < x + 1 to true. Or it'll delete a whole
| loop because it noticed that your code is guaranteed to
| trigger an out-of-bounds array read, e.g.:
| https://devblogs.microsoft.com/oldnewthing/?p=633
|
| > If you do that you'll be fine.
|
| I would not trust code written based on the methodology you
| described. However, if you also add
| -fsanitize=undefined,address (UBSan and ASan) and pass those
| tests, then I would trust your code.
| bluetomcat wrote:
| Almost all of these examples would be sloppy programming in
| real code. The "int", "short", "long", "long long" and "char"
| types are generic labels which shouldn't be used where size and
| signedness matters. For a guarantee of size and signedness one
| should use the "[u]intXX_t" types. For sizes and pointer
| arithmetic - size_t. For subtracting pointers - ptrdiff_t. You
| simply wouldn't have most of these issues if you stick to the
| correct types.
| rwmj wrote:
| Absolutely. _int_ is a "code smell" for us, except in some
| well-defined cases (storing file descriptors for example). If
| someone is using int to iterate a loop, then the code is more
| usually wrong than right, they should be using size_t.
| fizzynut wrote:
| I don't think a for loop using an int is bad or even "more
| wrong than right". If anything int is much better than
| using size_t.
|
| Using an integer, in the 1000s of for loops I've written,
| none get even remotely close to the billions - it is
| optimizing for a 1 in a million case, and if I know
| something can run into the billions of iterations I'm going
| to pay more attention to anyway. I've seen 0 occurrences of
| bugs relating to this kind of overflow.
|
| Using a size_t, it is effectively an unsigned integer that
| risks underflowing which can easily cause bugs like
| infinite loops if decrementing or other bugs if doing any
| index arithmetic. I've seen many occurrences of these kind
| of bugs.
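| The decrementing-loop hazard mentioned above has a standard
| idiom; this is a minimal sketch (the function and its data are
| mine, purely for illustration):

```c
#include <stddef.h>

/* `for (i = n - 1; i >= 0; i--)` never terminates with size_t,
   because i >= 0 is always true for an unsigned type. The
   i-- > 0 idiom tests before decrementing: it visits n-1 down
   to 0 and then stops, handling n == 0 cleanly as well. */
static long sum_backwards(size_t n) {
    static const int a[] = { 1, 2, 3, 4 };
    long total = 0;
    for (size_t i = n; i-- > 0; )
        total += a[i];
    return total;
}
```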
| rwmj wrote:
| On Linux/x86-64 int has only 31 value bits, so you've probably
| introduced "1000s" of security bugs where the attacker
| only needs the persistence to add 2 billion items to a
| network input, local file or similar, to generate a
| negative pointer of their choosing.
|
| Any such code submitted to one of our projects would be
| rejected or fixed to use the proper type.
| fizzynut wrote:
| Please don't make this adversarial.
|
| I've not introduced a security bug in every for loop I've
| written. What I've written shouldn't be controversial,
| just take a look at Googles style guide:
|
| "We use int very often, for integers we know are not
| going to be too big, e.g., loop counters. Use plain old
| int for such things. You should assume that an int is at
| least 32 bits, but don't assume that it has more than 32
| bits. If you need a 64-bit integer type, use int64_t or
| uint64_t.
|
| For integers we know can be "big", use int64_t.
|
| You should not use the unsigned integer types such as
| uint32_t, unless there is a valid reason such as
| representing a bit pattern rather than a number, or you
| need defined overflow modulo 2^N. In particular, do not
| use unsigned types to say a number will never be
| negative. Instead, use assertions for this.
|
| If your code is a container that returns a size, be sure
| to use a type that will accommodate any possible usage of
| your container. When in doubt, use a larger type rather
| than a smaller type.
|
| Use care when converting integer types. Integer
| conversions and promotions can cause undefined behavior,
| leading to security bugs and other problems."
| scatters wrote:
| The true types (int, unsigned etc) are correct for local
| variables, since they describe registers. The sized aliases
| are correct for struct fields and for arrays in memory.
|
| You should prefer signed types for computing (if not storing)
| sizes and for pointer arithmetic, since they are more
| forgiving with underflow.
| mananaysiempre wrote:
| Unfortunately, even if you're using uint16_t (which,
| remember, the C standard _does not guarantee exists_ ) on a
| platform with (say) 32-bit integers, you're still actually
| computing in _signed int_ due to promotion.
| dave84 wrote:
| Didn't notice I got some wrong because the green color made me
| think I got them right.
| ape4 wrote:
| Yeah that's a UX bug in the quiz
| jbverschoor wrote:
| That's to keep the obscurity vibe
| gonzo41 wrote:
| Just like programming in C. You think you got it right, but
| then bugs...
| pjmlp wrote:
| Or thinking "how my compiler does it" == ISO C.
| junon wrote:
| Just a note for the author: The Green background on wrong answers
| was entirely confusing. Thought I was a C god at first.
| CamperBob2 wrote:
| Correct answer to most of these questions: "I don't know, and I
| don't care, because it would never even cross my mind to write
| code in which most of these questions would come up. If forced to
| review or debug code written by someone else who knew the answers
| and leveraged that knowledge, I would either complain about it
| loudly or rewrite it quietly."
|
| C, like Rome, is a wilderness of tigers. Why ask for trouble?
| up2isomorphism wrote:
| These examples do look scary. However, in my 16 years of
| experience mostly writing in C, I have never once needed to
| actually use the expressions whose answers I got wrong.
|
| Never mix unsigned and signed integer comparison unless you know
| exactly what you are doing.
|
| And never do arithmetic on boundary definitions like INT_MAX;
| they are boundaries, so why would you need to compute a derived
| value from a boundary?
|
| If you do not need arithmetic behavior, do not use signed
| integers; use unsigned. The computer does not understand sign,
| so if you do not need a sign, do not use it.
|
| You do not need to be a language committee member to write C,
| you just need to understand the reason behind its design.
| bee_rider wrote:
| Is it your experience that there are many projects where you
| can reliably say some variable will never need arithmetic
| behavior?
|
| I don't have 16 years of experience writing C, but
|
| > If you do not need arithmetic behavior, do not use signed
| integer, use unsigned
|
| does seem in a roundabout way to match the advice that I've
| gotten from other folks -- usually stick to signed (because you
| never know if somebody is going to want to use an integer in,
| say, a downwards loop). Your comment just seems to highlight
| the less usual case, where you can be sure that nobody will
| ever need that arithmetic behavior... maybe it depends on the
| type of applications, though.
| up2isomorphism wrote:
| It always depends on the area of focus. You can also
| consistently choose signed as the default int type, fully
| aware that you lose half the range but gain signed integer
| arithmetic. For me, I mostly do systems and networking, so I
| tend toward unsigned. The key is to choose a consistent
| default while staying aware of the non-intuitive zone that
| default type puts you in.
| tredre3 wrote:
| > If you do not need arithmetic behavior, do not use signed
| integer, use unsigned. Because computer does not understand
| sign, so if you do not need a sign, do not use it.
|
| That's interesting. In my career (embedded development) I've
| learned to do the opposite. Always use signed unless you have a
| reason not to. Even if a value can't naturally be negative, use
| signed. Use unsigned only if you need the extra bit, or if
| you're doing bitwise operations.
|
| > Because computer does not understand sign
|
| Computers understand signs just fine, we're long past the days
| of the 6502 N-flag being a glorified bit 7 check. All CPUs have
| signed instructions.
| up2isomorphism wrote:
| > That's interesting. In my career (embedded development)
| I've learned to do the opposite. Always use signed unless you
| have a reason not to. Even if a value can't naturally be
| negative, use signed. Use unsigned only if you need the extra
| bit, or if you're doing bitwise operations.
|
| In this case, since you make sure that the extra range gained
| by unsigned is not important to you at all, you can also go
| with signed by default. Basically it is the tradeoff between
| 1. robust in majority of the use cases 2. capability to do
| signed arithmetic 3. additional positive integer range. As
| long as you make a consistent selection and be mindful when
| you are in the danger zone, it can be handled.
| WaffleIronMaker wrote:
| > Always use signed unless you have a reason not to. Even if
| a value can't naturally be negative, use signed.
|
| Can you elaborate on what benefits this approach has? I would
| feel that, especially when a number cannot be negative,
| unsigned integers seem like a proper representation of the
| data?
| gugagore wrote:
| Here is an example:
| https://wesmckinney.com/blog/avoid-unsigned-integers/
| Sirened wrote:
| > And never do arithmetic on boundaries definitions like
| INT_MAX, they are boundaries
|
| I'd argue something stronger: if you care about boundaries like
| INT_MAX, you should never be comparing them using your regular
| comparison tools. I.e., even though there are correct ways to
| compute whether x + 1 will overflow, don't bother trying to do
| that and instead always use __builtin_add_overflow since you
| can't fuck it up. Getting these sorts of edge checks right is
| incredibly hard, and it has led to numerous security
| vulnerabilities due to checks being optimized out. The builtins
| do exactly what they say and you don't have to worry about UB
| blowing your foot off.
| jcelerier wrote:
| The problem is that for unsigned one of the boundaries is 0,
| which is an extremely common number - hardly a couple of months
| go by without my finding a bug due to a size-1 somewhere
| 10000truths wrote:
| Subtraction would presumably fall under the arithmetic
| behavior that OP was talking about.
| pantalaimon wrote:
| The compiler will also warn about these cases, where you'd
| better be explicit to avoid undesired behavior.
| Nokinside wrote:
| The first rule of good C programming: "Treat warnings as errors".
| Compile with all warnings enabled, selectively ignore only
| warnings you know for sure are not important for program
| semantics.
|
| Simple MISRA C with all warnings on is already close to a
| language with strict type checking. C compilers give you a
| choice; use it. If you are programming for money, a good static
| analyzer makes it possible to write safety-critical code.
| pjmlp wrote:
| Worth noting Dennis's own words,
|
| "Although the first edition of K&R described most of the rules
| that brought C's type structure to its present form, many
| programs written in the older, more relaxed style persisted,
| and so did compilers that tolerated it. To encourage people to
| pay more attention to the official language rules, to detect
| legal but suspicious constructions, and to help find interface
| mismatches undetectable with simple mechanisms for separate
| compilation, Steve Johnson adapted his pcc compiler to produce
| lint [Johnson 79b], which scanned a set of files and remarked
| on dubious constructions."
|
| Dennis M. Ritchie --
| https://www.bell-labs.com/usr/dmr/www/chist.html
|
| Unfortunately too many think they know better than the language
| authors themselves.
| protomikron wrote:
| What is the history of "undefined behavior" [in C compilers
| and the standard] in general? I suppose originally it was
| supposed to guide compiler engineers, but we all know that
| backfired, as many compilers try to exploit undefined
| behavior to optimize code, but that can be problematic in
| security sensitive code (e.g. if uninitialized memory is
| optimized away) - there have been discussions between
| security engineers, kernel developers and GCC hackers about
| how to implement/interpret the standard.
|
| Would it be possible to have a standard, where undefined
| behavior is just a compile error? What would we lose - apart
| from legacy compatibility?
| jcranmer wrote:
| > Would it be possible to have a standard, where undefined
| behavior is just a compile error?
|
| No [if you're aiming for something in the same vein as C].
| Undefined behavior is ultimately an inherently dynamic
| property--certain values could make a statement execute
| undefined behavior, and consequently, virtually every
| statement could potentially cause undefined behavior. Note
| that this remains true even in languages like Rust: Rust
| has _loads_ of undefined behavior, but you do have to wrap
| code in unsafe blocks to potentially cause undefined
| behavior.
|
| > What would we lose - apart from legacy compatibility?
|
| In particular, it is clear at this point that if you want
| to permit converting integers to pointers, you will either
| have to live with undefined behavior (via pointer
| provenance) or forgo basically _all_ optimization
| whatsoever.
| pjmlp wrote:
| C sucked on 8 and 16-bit home computers; to be fair, all
| high-level systems programming languages had their own set
| of issues regarding optimal code generation, thus Assembly
| was the name of the game for ultimate performance.
|
| UB started as means to not kick out computer architectures
| that would otherwise not be able to be targeted by fully
| compliant ISO C compilers.
|
| Given that C prefers to be a kind of portable macro
| assembler rather than care about security, it was only a
| matter of time until those escape hatches started to be
| taken advantage of for optimizations.
|
| Same applies to other languages, however since their
| communities tend to prefer security before ultimate
| performance, some optimization paths are not considered as
| that would hurt their safety goals.
|
| In what concerns C, C++ and Objective-C, dropping UB
| optimizations would mean going back to the 1990's in terms
| of code quality.
| robryk wrote:
| Would                int foo(int x, int y) { return x + y; }
|
| compile? After all, this function can be called in a way
| that causes UB.
| Veliladon wrote:
| That's pretty much what Rust was created to do.
| pjmlp wrote:
| Like many others before it, hopefully it gets more
| adoption this time.
| wyldfire wrote:
| Unfortunately some of this stuff just can't be detected
| statically. So while warnings are an excellent starting point,
| I recommend also building and testing with UBSan+ASan enabled.
| veltas wrote:
| I do agree about compiler warnings, but MISRA C imposes a lot
| of unnecessary rules, a lot of unhelpful rules on how you write
| expressions, and tries to also act like C's type system works a
| different way than it does. In practice I have found it to
| actually create bugs. Read the MISRA rules and appendices on
| their effective type model alongside the C standard: they have
| gaps where MISRA actually forces you to write code that looks
| correct but doesn't work with C's model. I strongly recommend
| against using MISRA, even in automotive or aviation code
| (although it may unfortunately be a requirement on any such
| project).
| lizardactivist wrote:
| I really hate C.
| qznc wrote:
| I made a similar test for floating point:
| https://beza1e1.tuxen.de/no_real_numbers.html
|
| Since most languages use the same IEEE-754, you can hate them
| all.
| lizardactivist wrote:
| Unbiased, equal hate, fully in line with today's political
| correctness? Fine, I really hate programming languages.
| ananonymoususer wrote:
| I have an issue with this one: Assume x has type int . Is the
| expression x<<32 ...
|
| Defined for all values of x
|
| Defined for some values of x
|
| Defined for no values of x
|
| (chose second answer) Wrong answer
|
| Shifting (in either direction) by an amount equalling or
| exceeding the bitwidth of the promoted operand is an error in
| C99.
|
| So according to Wikipedia:
| https://en.wikipedia.org/wiki/C_data_types
|
| int / signed / signed int: Basic signed integer type. Capable
| of containing at least the [-32,767, +32,767] range.[3][a]
|
| So a minimum of 16 bits is used for int, but no maximum is
| specified. Thus, if my C compiler on my 64-bit architecture uses
| 64 bits for int, this is perfectly allowed by the specification
| and my answer is correct.
| nayuki wrote:
| The top of the page says:
|
| > All other things being equal, assume GCC/LLVM x86/x64
| implementation-defined behaviors.
|
| However, you're right to point out that the quiz would be
| better if you can't make any more assumptions than guaranteed
| by the basic language standard.
___________________________________________________________________
(page generated 2022-09-04 23:00 UTC)