[HN Gopher] Using signed or unsigned int by default in C (2015)
       ___________________________________________________________________
        
       Using signed or unsigned int by default in C (2015)
        
       Author : synergy20
       Score  : 27 points
       Date   : 2022-08-23 15:59 UTC (7 hours ago)
        
 (HTM) web link (blog.robertelder.org)
 (TXT) w3m dump (blog.robertelder.org)
        
       | reltuk wrote:
       | Similar arguments are also seen in
       | https://graphitemaster.github.io/aau/.
       | 
       | When using unsigned ints for backward iteration, I'm partial to
       | looping as:
       | 
       |     for (i = size - 1; i < size; i--) {
       |         ...
       |     }
       | 
       | since it has so much symmetry with forward iteration.
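
A minimal sketch of the loop reltuk describes, assuming `i` is a `size_t`. The helper name `reverse_copy` is hypothetical and only there to give the loop something to do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies src into dst in reverse order using the
 * wraparound-terminated loop from the comment above. */
void reverse_copy(int *dst, const int *src, size_t size)
{
    assert(size == 0 || (dst != NULL && src != NULL));
    size_t out = 0;
    /* size == 0: size - 1 wraps to SIZE_MAX, so i < size fails at once.
     * Otherwise i runs from size - 1 down to 0, then i-- wraps to
     * SIZE_MAX and the same test ends the loop. Unsigned wraparound is
     * well defined in C, so no undefined behavior is involved. */
    for (size_t i = size - 1; i < size; i--) {
        dst[out++] = src[i];
    }
}
```

The condition is the same `i < size` as in a forward loop, which is the symmetry the comment refers to. It only works while `i` has an unsigned type.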
        
         | 10000truths wrote:
         | I would rather just use a plain for loop and manipulate the
         | index in the loop body:
         | 
         |     for(uint32_t i = 0; i < length; ++i) {
         |         i = (length - 1) - i;
         |         // ...
         |     }
         | 
         | Much easier for me to understand, making it less likely to mess
         | anything up.
        
           | WalterBright wrote:
           | Even easier (for D):
           | 
           |     foreach (i; 0 .. length)
           | 
           | D takes care of selecting the correct type for `i`.
        
           | edflsafoiewq wrote:
           | You'll need a new variable there, but otherwise yeah.
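
A sketch of the variant edflsafoiewq hints at: assigning to `i` inside the body changes the value the loop condition tests next, so the mirrored index goes into a second variable. The names `visit_backward` and `j` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: records in out[] the order in which the
 * mirrored index visits the positions 0 .. length - 1. */
void visit_backward(uint32_t *out, uint32_t length)
{
    assert(length == 0 || out != NULL);
    for (uint32_t i = 0; i < length; ++i) {
        /* Mirrored index. i itself is left alone, so the loop still
         * counts 0, 1, ..., length - 1 and terminates normally. */
        uint32_t j = (length - 1) - i;
        out[i] = j;
    }
}
```

Since `i < length` holds inside the body, `(length - 1) - i` never wraps.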
        
         | phkahler wrote:
         | I'd argue that is a terrible way to write it since it depends
         | on the rollover to terminate. Using i >= 0 is less mental work
         | but also requires that i be signed.
        
           | ghoward wrote:
           | Rollover on unsigned ints is well-defined in C.
        
             | hbn wrote:
             | I think their point is that it's less well defined in the
             | average human's brain
             | 
             | When most people see 0 - 1 they don't immediately think
             | 11111111111111111111111111111111
        
               | ghoward wrote:
               | We're talking about C, though. I hope the average human
               | is _not_ writing C.
        
           | xdavidliu wrote:
           | if we used i >= 0 and someone were to come in and change "int
           | i" to "unsigned i" or "size_t i", the result would be
           | catastrophic
        
             | Leherenn wrote:
             | The same argument can be applied to GP's code with the
             | added bonus it will cause UB at some point.
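
An idiom not mentioned in the thread, offered only as a sketch: testing `i-- > 0` walks an array backward without depending on whether the index type is signed or unsigned. The function name `last_index_of` and its "return `size` when not found" convention are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index of the last element equal to value,
 * or size if there is none. */
size_t last_index_of(const int *a, size_t size, int value)
{
    assert(size == 0 || a != NULL);
    /* The test compares the old value of i with 0 and then decrements,
     * so the body sees size - 1 down to 0. The final decrement wraps
     * for an unsigned i (well defined) or yields -1 for a signed one;
     * either way i is not used again. */
    for (size_t i = size; i-- > 0; ) {
        if (a[i] == value) {
            return i;
        }
    }
    return size;
}
```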
        
       | Asooka wrote:
       | Personally, I choose to just write non-standard C and turn on my
       | compiler's option to define signed overflow as wrapping.
       | 
       | Note that without that option, multiplication of unsigned shorts
       | is undefined on platforms where sizeof(unsigned short) =
       | sizeof(int)/2. That is due to the integer promotion rules. Let's
       | say shorts are 16-bit and ints are 32-bit, which is the case on
       | all consumer devices. Integer promotion rules lead to both
       | unsigned short operands being first converted to ints before
       | multiplication. If they are both equal to 0xffff, then the result
       | is bigger than INT_MAX, and thus undefined. Which means that just
       | multiplying two unsigned shorts is dangerous unless you cast them
       | to unsigned ints first. Here is a godbolt link demonstrating the
       | problem: https://gcc.godbolt.org/z/3445PEGqY
       | 
       | I have never seen the benefit of leaving signed overflow
       | undefined, so I just make it defined to avoid weird problems like
       | that. Most people think that all unsigned arithmetic is well
       | defined, but unfortunately that's false.
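
A small sketch of the workaround Asooka mentions, assuming 16-bit `uint16_t` operands on a platform with 32-bit `int`. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Documents the intent: the full product of two 16-bit values always
 * fits in 32 unsigned bits. */
static_assert((uint64_t)UINT16_MAX * UINT16_MAX <= UINT32_MAX,
              "16x16-bit product fits in uint32_t");

/* Full 32-bit product. The casts happen before the multiplication, so
 * the operands are not promoted to signed int and the result cannot
 * overflow a signed type. */
uint32_t mul_u16(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* Product reduced modulo 2^16, which is what many people expect from
 * "unsigned short times unsigned short". */
uint16_t mul_u16_wrap(uint16_t a, uint16_t b)
{
    return (uint16_t)((uint32_t)a * (uint32_t)b);
}
```

Writing `a * b` directly with both operands equal to `0xffff` is the case the comment warns about: the operands are promoted to `int` and the mathematical result exceeds `INT_MAX`.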
        
         | kevin_thibedeau wrote:
         | > I have never seen the benefit of leaving signed overflow
         | undefined
         | 
         | The benefit is that architectures are free to implement it with
         | saturating arithmetic.
        
       | cesarb wrote:
       | Reposting a comment of mine from another thread
       | (https://news.ycombinator.com/item?id=29770689):
       | 
       | For me, the main reason to prefer unsigned integers whenever
       | possible is: fewer special cases.
       | 
       | When a signed integer is used as an array index, the value can be
       | in one of three ranges: the <0 range, the >=0 && <size range, and
       | the >=size range. To validate the index, you need two
       | comparisons. When an unsigned integer is used as an array index,
       | there are only two ranges: the <size range, and the >=size range,
       | and you need only a single comparison.
       | 
       | And that's not the worst situation with signed integers. From a
       | more general point of view, there are four "classes" of signed
       | integer: positive (>0), zero, negative (<0), and INT_MIN.
       | Everybody tends to forget about that last one, but it's "special"
       | in that it can break things in unexpected ways. Negating it
       | doesn't work (you'd expect negating any number less than zero to
       | result in a number greater than zero, but for INT_MIN that
       | doesn't happen). Dividing it can trap (see for instance
       | https://kqueue.org/blog/2012/12/31/idiv-dos/ which has a couple
       | of examples) or worse.
       | 
       | With unsigned integers, there are only two "classes", zero and
       | non-zero, and it's common to not even need special treatment for
       | zero, reducing the whole thing to a single "class" of values.
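
The two points above can be sketched as follows. All function names are hypothetical, and the `INT_MIN` check assumes a two's-complement `int`, where `-INT_MIN` is not representable.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Signed index: three ranges, so two comparisons. */
bool index_ok_signed(ptrdiff_t i, size_t size)
{
    return i >= 0 && (size_t)i < size;
}

/* Unsigned index: two ranges, so one comparison. */
bool index_ok_unsigned(size_t i, size_t size)
{
    return i < size;
}

/* Negation that refuses the one value whose negation overflows. */
bool checked_negate(int x, int *out)
{
    assert(out != NULL);
    if (x == INT_MIN) {
        return false;   /* -INT_MIN would be undefined behavior */
    }
    *out = -x;
    return true;
}
```

A negative value converted to `size_t` becomes a very large number, which is why the single unsigned comparison also rejects what would have been the `<0` range.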
        
       | dvh wrote:
       | I hate that stdint.h (or whoever) defines those types as uint8_t
       | instead of simply uint8. Why was the _t necessary?
        
         | mytailorisrich wrote:
         | The convention '_t' is to unambiguously indicate that the name
         | is a type.
         | 
         | Some people also use '_e' for enums, etc.
        
           | qsort wrote:
           | The _t suffix is reserved for types defined by the standard.
           | Although this is an excess of pedantry, strictly speaking you
           | should not be defining names ending in _t.
        
             | mytailorisrich wrote:
             | By 'the standard' do you mean POSIX? Or has that been added
             | to a C standard?
             | 
             | Either way, in general I'd agree this is an excess of
             | pedantry and I don't really see an issue with using _t for
             | types if that works for you.
        
               | alerighi wrote:
               | While this is true, I don't see a reason to use _t in your
               | code either. Since it's your code, you don't have the
               | problem the _t suffix was introduced to solve, namely
               | avoiding conflicts that can break other people's code. You
               | don't get any benefit from the _t suffix either, since
               | it's obvious whether something is a type or not,
               | especially nowadays (just look at the syntax coloring).
               | You don't even need the _t to distinguish, for example,
               | between a struct, enum or union definition and its
               | related typedef:
               | 
               |     typedef struct MyStruct { ... } MyStruct;
               | 
               | is perfectly valid, and in my opinion better than:
               | 
               |     typedef struct MyStruct_s { ... } MyStruct_t;
               | 
               | where the suffixes _s and _t don't add anything useful and
               | only complicate the code for nothing.
        
               | qsort wrote:
               | I wasn't sure, I just checked.
               | 
               | POSIX reserves everything ending in _t. C99 and later
               | reserve identifiers starting with "int" or "uint" and
               | ending in "_t".
               | 
               | Source:
               | https://en.cppreference.com/w/c/language/identifier
        
               | Someone wrote:
               | For those wondering: that phrase has to be read as _"C99
               | and later reserve identifiers ((starting with "int" or
               | "uint") and ending in "_t")"_, not as _"C99 and later
               | reserve identifiers starting with "int" or "uint" and
               | identifiers ending in "_t""_
               | 
               | It's not as if a program with a variable called
               | _interest_rate_ or _ford_model_t_ would be nonconforming
               | according to the standard.
        
         | pornel wrote:
         | A shorter name probably broke someone's code in 1998.
         | 
         | _t acts as a namespace, and that's all namespacing you can get
         | in C.
        
         | cesarb wrote:
         | Calling it just "uint8" could conflict with software which
         | already used it as an identifier, or worse, a macro; on the
         | other hand, names ending with _t are reserved (see for instance
         | https://www.gnu.org/software/libc/manual/html_node/Reserved-...
         | ), so they could be used by the standard for these new types.
        
       | cannam wrote:
       | I agree with the content of this article, but all the same my
       | advice is usually the opposite - for reasons that its author sets
       | out rather well in the table in his follow-up article
       | (https://blog.robertelder.org/signed-or-unsigned-part-2/)
       | 
       | I wrote a little piece about this myself once
       | (http://soundsoftware.ac.uk/c-pitfall-unsigned.html)
       | 
       | I suppose I would loosely characterise my view as
       | 
       | - if your profession is "C programmer" and you really know what
       | you're doing, you probably want to use unsigneds most of the time
       | 
       | - if your profession is something else, e.g. signal processing
       | researcher who has to translate an algorithm to C or C++, you are
       | probably better off not using unsigneds at all.
        
         | ghoward wrote:
         | I think I agree with you, as much as I don't want to.
         | 
         | I am a C programmer, and I think I do know what I am doing. (I
         | implemented safe, UB-free two's-complement arithmetic in C
         | using only unsigned integers.) I prefer unsigned.
         | 
         | But for someone who doesn't know, as long as they're not
         | working on anything critical, signed might be "good enough".
        
       | w0mbat wrote:
       | Repost from 2015
        
       | qsort wrote:
       | Doesn't this depend?
       | 
       | - When iterating an array, you're supposed to use "size_t",
       | because it's the only type that's guaranteed to contain the
       | maximum size of an array.
       | 
       | - When storing the difference between two pointers, you're
       | supposed to use "ptrdiff_t" because it's the only type that's
       | guaranteed to contain the signed difference between two related
       | pointers.
       | 
       | - When storing numbers, prefer unsigned to signed because signed
       | overflow is UB.
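
A brief sketch of the first two bullets: `size_t` for the array index and `ptrdiff_t` for the difference of two pointers into the same array. The function name `find_offset` and its `-1` "not found" convention are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset of the first element equal to value,
 * or -1 if it does not occur. */
ptrdiff_t find_offset(const int *a, size_t n, int value)
{
    assert(n == 0 || a != NULL);
    for (size_t i = 0; i < n; i++) {        /* size_t index */
        if (a[i] == value) {
            const int *hit = &a[i];
            return hit - a;                 /* pointer difference is ptrdiff_t */
        }
    }
    return -1;
}
```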
        
         | FartyMcFarter wrote:
         | > - When storing numbers, prefer unsigned to signed because
         | signed overflow is UB.
         | 
         | There's an interesting caveat to this one - if you rely on
         | sanitizers like ubsan to catch overflow errors, using unsigned
         | won't work. It will happily and silently overflow.
        
           | Leherenn wrote:
           | For what it's worth (though you likely know it), you can add
           | "-fsanitize=unsigned-integer-overflow" to trigger a
           | diagnostic even if unsigned integer overflow is not UB. Of
           | course, this only works if your code does not rely on
           | unsigned integer overflow or if you suppress each case one by
           | one.
           | 
           | I would tend to say this is a good practice to enable it,
           | because the number of times where you rely on the overflow
           | should be very low and well documented.
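
Code that never relies on unsigned wraparound stays quiet under the check Leherenn describes (to my knowledge a Clang option that is not part of the default `-fsanitize=undefined` group). One way to get there, sketched with a hypothetical `checked_add_u32` helper, is to test for overflow before adding:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Adds a and b only if the sum fits in 32 bits. Returns false and
 * leaves *out untouched otherwise, so the addition itself never wraps. */
bool checked_add_u32(uint32_t a, uint32_t b, uint32_t *out)
{
    assert(out != NULL);
    if (a > UINT32_MAX - b) {   /* UINT32_MAX - b cannot wrap */
        return false;
    }
    *out = a + b;
    return true;
}
```

Places that do want modular arithmetic, such as hash functions, are the cases that would need to be suppressed and documented one by one.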
        
         | pkhuong wrote:
         | This all makes sense at first sight, but remember that we can
         | always take the difference between the address of an array's
         | first element and one past the end, so you run into a lot of
         | issues when an array (or any object) is larger than PTRDIFF_MAX
         | bytes.
         | 
         | https://trust-in-soft.com/blog/2016/05/20/objects-larger-tha...
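
One defensive pattern that follows from pkhuong's point, given here as a sketch rather than anything the linked post prescribes: refuse to allocate an object whose size in bytes exceeds `PTRDIFF_MAX`, so that any two pointers into it can be subtracted safely. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* True if count * elem_size is at most PTRDIFF_MAX bytes. The division
 * avoids computing a product that might wrap. */
bool size_fits_ptrdiff(size_t count, size_t elem_size)
{
    assert(elem_size > 0);
    return count <= (size_t)PTRDIFF_MAX / elem_size;
}

/* malloc wrapper that rejects oversized requests up front. */
void *alloc_bounded(size_t count, size_t elem_size)
{
    if (!size_fits_ptrdiff(count, elem_size)) {
        return NULL;
    }
    return malloc(count * elem_size);   /* product cannot wrap here */
}
```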
        
       | wnoise wrote:
       | Previously: https://news.ycombinator.com/item?id=9988266
       | 
       | And part 2: https://news.ycombinator.com/item?id=10156265
        
       | coding123 wrote:
       | use longs because it's not arch dependent
        
         | messe wrote:
         | long is definitely arch dependent.
         | 
         | It's even operating system dependent:
         | 
         | It's 32 bits on 32-bit x86 unix-likes and windows, and also
         | 32 bits on 64-bit windows.
         | 
         | On 64-bit unix-likes it's 64-bit.
         | 
         | Embedded is so varied (and generally 16 or 32 bit), that I'm
         | not even going to hazard a guess on the distribution there.
        
         | xdavidliu wrote:
         | long in C and C++ is absolutely arch dependent. In Java it's
         | not dependent, but neither are any of the other primitives in
         | Java.
        
         | acuozzo wrote:
         | long is only guaranteed to be a minimum of 32 bits.
         | 
         | It's 64 on Tru64 UNIX systems, for instance.
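
The replies above can be checked at compile time. This sketch assumes a C11 compiler and a platform that provides the exact-width types from `<stdint.h>`; the function name `long_bits` is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* The only portable guarantee for long: at least 32 bits. */
static_assert(sizeof(long) * CHAR_BIT >= 32, "long has at least 32 bits");

/* The exact-width types have the same size wherever they exist. */
static_assert(sizeof(int32_t) * CHAR_BIT == 32, "int32_t is 32 bits");
static_assert(sizeof(int64_t) * CHAR_BIT == 64, "int64_t is 64 bits");

/* Storage size of long in bits on the platform this is compiled for,
 * e.g. 32 on 64-bit Windows and 64 on 64-bit unix-likes. */
unsigned long_bits(void)
{
    return (unsigned)(sizeof(long) * CHAR_BIT);
}
```

When a specific width matters, `int32_t` or `int64_t` states it directly instead of depending on what `long` happens to be.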
        
       | purpleblue wrote:
       | The mathematical definition of "integer" includes the set of
       | negative integers. That's the whole point, otherwise we could
       | just use natural numbers. So making ints signed makes the most
       | sense otherwise there's an impedance mismatch that you need to
       | explain away.
        
         | benj111 wrote:
         | But neither signed nor unsigned ints can represent all ints.
         | It's a question of what subset of ints you want to be able to
         | represent.
        
         | mcguire wrote:
         | The mathematical definition of "integer" is unbounded above and
         | below.
         | 
         | The "numbers" used in computing systems are not mathematical
         | numbers in any way.
        
           | Asooka wrote:
           | The numbers used in computing systems are normal mathematical
           | numbers in the Z_2^32 field. The signed ones are shifted by
           | half the range in the negative direction. There is nothing
           | un-mathematical about them, people who have studied any math
           | in university, i.e. all CS graduates, should have gone
           | through an introductory algebra course that covers fields,
           | rings, etc. So I'm not sure why people keep saying ints are
           | not mathematical.
        
             | tsimionescu wrote:
             | That depends on the computing system. You're right if we're
             | talking about the processor level (on any modern
             | processor), but the "signed ints" in C or C++ are not a
             | field at all, since INT_MAX + 1 is undefined, as are
             | -INT_MIN and INT_MIN - 1 (and many many many
             | multiplications). Also, as someone else was pointing out,
             | unsigned short arithmetic is not well defined either.
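
Because `INT_MAX + 1` and `INT_MIN - 1` are undefined rather than wrapping, portable code has to test before it adds. A minimal sketch, with the hypothetical name `checked_add_int`:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Adds a and b only if the mathematical sum is representable as int.
 * Each comparison is arranged so that it cannot overflow itself. */
bool checked_add_int(int a, int b, int *out)
{
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) ||   /* sum would exceed INT_MAX */
        (b < 0 && a < INT_MIN - b)) {   /* sum would drop below INT_MIN */
        return false;
    }
    *out = a + b;
    return true;
}
```

The function is partial in the sense tsimionescu describes: for some inputs there is simply no result to return.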
        
             | mcguire wrote:
             | To be honest, I adapted a phrase I usually use for floating
             | point numbers.
             | 
             | But anyway, "_So making ints signed makes the most sense
             | otherwise there's an impedance mismatch that you need to
             | explain away_" is a pretty poor argument when you're
             | dealing with Z_2^32 (or 2^64) minus 2^31, with some
             | operations becoming partial. (And that's assuming you're on
             | a twos-complement machine; those goofy ones-complement
             | boxes had positive and negative zero.)
        
             | qsort wrote:
             | I know what you mean and I'm not saying this with any
             | animus, but this is pretty funny:
             | 
             | > people who have studied any math in university, i.e. all
             | CS graduates, should have gone through an introductory
             | algebra course that covers fields, rings, etc.
             | 
             | > Z_2^32 _field_
        
         | qsort wrote:
         | I mean, it's not the only word in programming that's
         | overloading a word with a mathematical definition. Just off the
         | top of my head:
         | 
         | - function
         | 
         | - set
         | 
         | - map
         | 
         | - monad
         | 
         | - vector
         | 
         | - field
         | 
         | - class
         | 
         | Some of them are obviously referring to something else
         | entirely. Some of them are _kind of_ like the math thing but
         | subtly different.
        
         | pornel wrote:
         | Integers can also be infinitely large. C doesn't use
         | mathematical definitions, but its own.
        
       | giomasce wrote:
       | I don't know if I am too stupid or too smart, but I use signed
       | when some number can be negative and unsigned when it cannot. Of
       | course I am aware of their other differences, e.g. when they
       | wrap, and act accordingly.
        
       | jasonhansel wrote:
       | -fwrapv
        
       ___________________________________________________________________
       (page generated 2022-08-23 23:02 UTC)