[HN Gopher] Using signed or unsigned int by default in C (2015)
___________________________________________________________________
Using signed or unsigned int by default in C (2015)
Author : synergy20
Score : 27 points
Date : 2022-08-23 15:59 UTC (7 hours ago)
(HTM) web link (blog.robertelder.org)
(TXT) w3m dump (blog.robertelder.org)
| reltuk wrote:
| Similar arguments are also seen in
| https://graphitemaster.github.io/aau/.
|
| When using unsigned ints for backward iteration, I'm partial to
| looping as:
|
|     for (i = size - 1; i < size; i--) {
|         ...
|     }
|
| since it has so much symmetry with forward iteration.
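Here is a minimal sketch of the `i < n` idiom above next to the other common unsigned reverse loop, `i-- > 0`. It assumes a `size_t` count `n`, and the helper names are hypothetical. Both loops rely only on unsigned wraparound, which is well defined.

```c
#include <stddef.h>

/* Hypothetical helper: writes n-1, n-2, ..., 0 into out using the
   "i < n" idiom. When i is 0, i-- wraps to SIZE_MAX (well defined for
   unsigned types), which fails the test i < n and ends the loop. */
size_t reverse_indices(size_t n, size_t *out)
{
    size_t k = 0;
    for (size_t i = n - 1; i < n; i--)
        out[k++] = i;
    return k;
}

/* Same traversal with the "i-- > 0" idiom: the condition checks the
   old value of i, and the body sees the already-decremented value. */
size_t reverse_indices_alt(size_t n, size_t *out)
{
    size_t k = 0;
    for (size_t i = n; i-- > 0;)
        out[k++] = i;
    return k;
}
```

Both versions skip the body entirely when `n == 0`. With a signed `i` and an `i >= 0` test, changing the type to `size_t` later would turn the condition into an always-true comparison.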
| 10000truths wrote:
| I would rather just use a plain for loop and manipulate the
| index in the loop body:
|
|     for (uint32_t i = 0; i < length; ++i) {
|         i = (length - 1) - i;
|         // ...
|     }
|
| Much easier for me to understand, making it less likely to mess
| anything up.
| WalterBright wrote:
| Even easier (for D):
|
|     foreach (i; 0 .. length)
|
| D takes care of selecting the correct type for `i`.
| edflsafoiewq wrote:
| You'll need a new variable there, but otherwise yeah.
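As quoted, the forward-loop snippet assigns the reversed index back to `i`. On the first pass `i` becomes `length - 1`, and `++i` then makes it `length`, so the loop ends after visiting only the last element. A minimal sketch of the fix with a separate variable follows. The function name and the `order` output array are hypothetical and only record the visiting order.

```c
#include <stdint.h>

/* Hypothetical helper: records the order in which indices are visited
   when a forward loop is used to walk an array backwards. */
void visit_reversed(uint32_t length, uint32_t *order)
{
    for (uint32_t i = 0; i < length; ++i) {
        uint32_t j = (length - 1) - i;  /* reversed index in its own variable */
        order[i] = j;                   /* i keeps counting up undisturbed */
    }
}
```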
| phkahler wrote:
| I'd argue that is a terrible way to write it since it depends
| on the rollover to terminate. Using i >= 0 is less mental work
| but also requires that i be signed.
| ghoward wrote:
| Rollover on unsigned ints is well-defined in C.
| hbn wrote:
| I think their point is that it's less well defined in the
| average human's brain.
|
| When most people see 0 - 1, they don't immediately think
| 11111111111111111111111111111111.
| ghoward wrote:
| We're talking about C, though. I hope the average human
| is _not_ writing C.
| xdavidliu wrote:
| If we used i >= 0 and someone were to come in and change "int
| i" to "unsigned i" or "size_t i", the result would be
| catastrophic.
| Leherenn wrote:
| The same argument can be applied to GP's code, with the
| added bonus that it will cause UB at some point.
| Asooka wrote:
| Personally, I choose to just write non-standard C and turn on my
| compiler's option to define signed overflow as wrapping.
|
| Note that without that option, multiplication of unsigned shorts
| can be undefined on platforms where sizeof(unsigned short) ==
| sizeof(int)/2. That is due to the integer promotion rules. Let's
| say shorts are 16-bit and ints are 32-bit, which is the case on
| virtually all consumer devices. The integer promotion rules
| convert both unsigned short operands to (signed) int before the
| multiplication. If they are both equal to 0xffff, the product is
| 0xfffe0001, which is bigger than INT_MAX, and thus the behavior
| is undefined. This means that just multiplying two unsigned
| shorts is dangerous unless you cast them to unsigned int first.
| Here is a godbolt link demonstrating the problem:
| https://gcc.godbolt.org/z/3445PEGqY
|
| I have never seen the benefit of leaving signed overflow
| undefined, so I just make it defined to avoid weird problems like
| that. Most people think that all unsigned arithmetic is well
| defined, but unfortunately that's false.
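The fix described above can be sketched as follows. This assumes 16-bit `unsigned short` and 32-bit `int`, and the function name is hypothetical. Converting one operand to `unsigned int` makes the usual arithmetic conversions choose `unsigned int` for the multiplication, and that type wraps instead of overflowing.

```c
/* Assumes 16-bit unsigned short and 32-bit int. Without the cast, both
   operands would be promoted to (signed) int, and 0xffff * 0xffff would
   overflow int: undefined behaviour. With the cast, the multiplication
   is done in unsigned int, where 0xfffe0001 fits without overflow. */
unsigned int mul_ushort(unsigned short a, unsigned short b)
{
    return (unsigned int)a * b;   /* b is converted to unsigned int too */
}
```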
| kevin_thibedeau wrote:
| > I have never seen the benefit of leaving signed overflow
| undefined
|
| The benefit is that architectures are free to implement it with
| saturating arithmetic.
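For readers unfamiliar with the term, saturating addition clamps at the representable limits instead of wrapping. The sketch below shows those semantics in portable C. It only illustrates the idea: C itself never specifies saturation, and `sat_add` is a hypothetical name.

```c
#include <limits.h>

/* Hypothetical helper: saturating signed addition. The range checks run
   before the addition, so no signed overflow can occur. */
int sat_add(int a, int b)
{
    if (b > 0 && a > INT_MAX - b)
        return INT_MAX;           /* would overflow upwards: clamp */
    if (b < 0 && a < INT_MIN - b)
        return INT_MIN;           /* would overflow downwards: clamp */
    return a + b;
}
```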
| cesarb wrote:
| Reposting a comment of mine from another thread
| (https://news.ycombinator.com/item?id=29770689):
|
| For me, the main reason to prefer unsigned integers whenever
| possible is: less special cases.
|
| When a signed integer is used as an array index, the value can be
| in one of three ranges: the <0 range, the >=0 && <size range, and
| the >=size range. To validate the index, you need two
| comparisons. When an unsigned integer is used as an array index,
| there are only two ranges: the <size range, and the >=size range,
| and you need only a single comparison.
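The two-comparison vs. one-comparison point can be shown concretely. In this sketch the function names are hypothetical, and the signed version assumes the size fits in a `long`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Signed index: negative values and values >= size are both invalid,
   so two comparisons are needed. */
bool valid_index_signed(long i, long size)
{
    return i >= 0 && i < size;
}

/* Unsigned index: a "negative" value has already wrapped around to a
   huge number, so the single comparison i < size rejects it as well. */
bool valid_index_unsigned(size_t i, size_t size)
{
    return i < size;
}
```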
|
| And that's not the worst situation with signed integers. From a
| more general point of view, there are four "classes" of signed
| integer: positive (>0), zero, negative (<0), and INT_MIN.
| Everybody tends to forget about that last one, but it's "special"
| in that it can break things in unexpected ways. Negating it
| doesn't work (you'd expect negating any number less than zero to
| result in a number greater than zero, but for INT_MIN that
| doesn't happen). Dividing it by -1 can trap (see for instance
| https://kqueue.org/blog/2012/12/31/idiv-dos/ which has a couple
| of examples) or worse.
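The INT_MIN corner cases can be guarded explicitly. Here is a minimal sketch with hypothetical helper names. It assumes two's complement, where `-INT_MIN` is not representable and `INT_MIN / -1` overflows; on x86 the `idiv` instruction raises a divide error for that case.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical checked negation: refuses the one value whose negation
   does not fit in an int. */
bool checked_neg(int x, int *out)
{
    if (x == INT_MIN)
        return false;
    *out = -x;
    return true;
}

/* Hypothetical checked division: rejects division by zero and the
   INT_MIN / -1 overflow. */
bool checked_div(int a, int b, int *out)
{
    if (b == 0 || (a == INT_MIN && b == -1))
        return false;
    *out = a / b;
    return true;
}
```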
|
| With unsigned integers, there are only two "classes", zero and
| non-zero, and it's common to not even need special treatment for
| zero, reducing the whole thing to a single "class" of values.
| dvh wrote:
| I hate that stdint.h (or whoever) defines those types as uint8_t
| instead of simply uint8. Why was the _t necessary?
| mytailorisrich wrote:
| The convention '_t' is to unambiguously indicate that the name
| is a type.
|
| Some people also use '_e' for enums, etc.
| qsort wrote:
| The _t suffix is reserved for types defined by the standard.
| Although this is an excess of pedantry, strictly speaking you
| should not be defining names ending in _t.
| mytailorisrich wrote:
| By 'the standard' do you mean POSIX? Or has that been added
| to a C standard?
|
| Either way, in general I'd agree this is an excess of
| pedantry and I don't really see an issue with using _t for
| types if that works for you.
| alerighi wrote:
| While this is true, I don't see a reason to use _t in your
| own code either. Since it's your code, you don't have the
| problem the _t suffix was introduced to solve, that is,
| avoiding conflicts that can break other people's code. Nor
| do you get any benefit from the _t suffix, since it's
| obvious whether something is a type or not, especially
| nowadays (just look at the syntax coloring). You don't
| even need the _t to distinguish, for example, between a
| struct, enum or union definition and its corresponding
| typedef:
|
|     typedef struct MyStruct { ... } MyStruct;
|
| is perfectly valid, and in my opinion better than:
|
|     typedef struct MyStruct_s { ... } MyStruct_t;
|
| where the _s and _t suffixes don't add anything useful
| and only complicate the code for nothing.
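The suffix-free typedef pattern compiles as written because struct tags and ordinary identifiers (including typedef names) live in separate name spaces in C. The minimal sketch below shows this. The `x` and `next` members and `sum_list` are hypothetical, added only so the type does something.

```c
#include <stddef.h>

/* The tag MyStruct and the typedef name MyStruct can share a name
   because C keeps tags and ordinary identifiers in different name spaces. */
typedef struct MyStruct {
    int x;
    struct MyStruct *next;  /* only the tag is usable inside the definition */
} MyStruct;

/* Hypothetical helper that uses the typedef name. */
int sum_list(const MyStruct *p)
{
    int total = 0;
    for (; p != NULL; p = p->next)
        total += p->x;
    return total;
}
```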
| qsort wrote:
| I wasn't sure, I just checked.
|
| POSIX reserves everything ending in _t. C99 and later
| reserve identifiers starting with "int" or "uint" and
| ending in "_t".
|
| Source:
| https://en.cppreference.com/w/c/language/identifier
| Someone wrote:
| For those wondering: that phrase has to be read as _"C99
| and later reserve identifiers ((starting with "int" or
| "uint") and ending in "_t")"_, not as _"C99 and later
| reserve identifiers starting with "int" or "uint" and
| identifiers ending in "_t""_
|
| It's not as if a program with a variable called
| _interest_rate_ or _ford_model_t_ would be nonconforming
| according to the standard.
| pornel wrote:
| A shorter name probably broke someone's code in 1998.
|
| _t acts as a namespace, and that's all namespacing you can get
| in C.
| cesarb wrote:
| Calling it just "uint8" could conflict with software which
| already used it as an identifier, or worse, a macro; on the
| other hand, names ending with _t are reserved (see for instance
| https://www.gnu.org/software/libc/manual/html_node/Reserved-...
| ), so they could be used by the standard for these new types.
| cannam wrote:
| I agree with the content of this article, but all the same my
| advice is usually the opposite - for reasons that its author sets
| out rather well in the table in his follow-up article
| (https://blog.robertelder.org/signed-or-unsigned-part-2/)
|
| I wrote a little piece about this myself once
| (http://soundsoftware.ac.uk/c-pitfall-unsigned.html)
|
| I suppose I would loosely characterise my view as
|
| - if your profession is "C programmer" and you really know what
| you're doing, you probably want to use unsigneds most of the time
|
| - if your profession is something else, e.g. signal processing
| researcher who has to translate an algorithm to C or C++, you are
| probably better off not using unsigneds at all.
| ghoward wrote:
| I think I agree with you, as much as I don't want to.
|
| I am a C programmer, and I think I do know what I am doing. (I
| implemented safe, UB-free two's-complement arithmetic in C
| using only unsigned integers.) I prefer unsigned.
|
| But for someone who doesn't know, as long as they're not
| working on anything critical, signed might be "good enough".
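ghoward doesn't show their implementation. As a rough illustration of the general idea only (not their code), 32-bit two's-complement values can be stored in `uint32_t`, where addition and subtraction wrap modulo 2^32 by definition, and converted to a signed value only on output. Converting an out-of-range unsigned value to a signed type is implementation-defined, so this sketch computes the signed value arithmetically instead. All names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical representation: a 32-bit two's-complement value stored
   in an unsigned type, so wraparound is always well defined. */
typedef uint32_t tc32;

/* v is assumed to lie in [INT32_MIN, INT32_MAX]; conversion to an
   unsigned type is always reduction modulo 2^32. */
tc32 tc_from_i64(int64_t v) { return (tc32)v; }

/* Modulo-2^32 addition and subtraction are exactly two's-complement
   addition and subtraction. */
tc32 tc_add(tc32 a, tc32 b) { return (tc32)(a + b); }
tc32 tc_sub(tc32 a, tc32 b) { return (tc32)(a - b); }

/* Top bit set means negative: the value is -(2^32 - x) == -(~x) - 1.
   Computed in int64_t, so no implementation-defined conversion occurs. */
int64_t tc_to_i64(tc32 x)
{
    if (x & UINT32_C(0x80000000))
        return -(int64_t)(uint32_t)~x - 1;
    return (int64_t)x;
}
```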
| w0mbat wrote:
| Repost from 2015
| qsort wrote:
| Doesn't this depend?
|
| - When iterating an array, you're supposed to use "size_t",
| because it's the only type that's guaranteed to contain the
| maximum size of an array.
|
| - When storing the difference between two pointers, you're
| supposed to use "ptrdiff_t" because it's the only type that's
| guaranteed to contain the signed difference between two related
| pointers.
|
| - When storing numbers, prefer unsigned to signed because signed
| overflow is UB.
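The first two rules of thumb look like this in practice. In this minimal sketch the function names are hypothetical: `size_t` is used for the count and index, and `ptrdiff_t` for a possibly negative distance between two pointers into the same array.

```c
#include <stddef.h>

/* size_t for indices and sizes: it can represent the size of any object. */
size_t count_zeros(const int *a, size_t n)
{
    size_t zeros = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] == 0)
            zeros++;
    return zeros;
}

/* ptrdiff_t for the signed distance between two pointers into the same
   array (or one past its end). */
ptrdiff_t distance(const int *from, const int *to)
{
    return to - from;
}
```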
| FartyMcFarter wrote:
| > - When storing numbers, prefer unsigned to signed because
| signed overflow is UB.
|
| There's an interesting caveat to this one - if you rely on
| sanitizers like ubsan to catch overflow errors, using unsigned
| won't work. It will happily and silently overflow.
| Leherenn wrote:
| For what it's worth (though you likely know it), with Clang
| you can add "-fsanitize=unsigned-integer-overflow" to trigger
| a diagnostic even though unsigned integer overflow is not UB.
| Of course, this only works if your code does not rely on
| unsigned integer overflow, or if you suppress each such case
| one by one.
|
| I would say it's good practice to enable it, because the
| number of places where you rely on the overflow should be
| very low and well documented.
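Because unsigned wraparound is well defined, ubsan's default checks stay silent about it. Apart from Clang's opt-in sanitizer, a portable option is to check before adding. Here is a minimal sketch with a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: reports failure instead of silently wrapping
   when a + b would exceed UINT32_MAX. */
bool add_u32_checked(uint32_t a, uint32_t b, uint32_t *out)
{
    if (a > UINT32_MAX - b)
        return false;
    *out = a + b;
    return true;
}
```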
| pkhuong wrote:
| This all makes sense at first sight, but remember that we can
| always take the difference between the address of an array's
| first element and one past the end, so you run into a lot of
| issues when an array (or any object) is larger than PTRDIFF_MAX
| bytes.
|
| https://trust-in-soft.com/blog/2016/05/20/objects-larger-tha...
| wnoise wrote:
| Previously: https://news.ycombinator.com/item?id=9988266
|
| And part 2: https://news.ycombinator.com/item?id=10156265
| coding123 wrote:
| use longs because it's not arch dependent
| messe wrote:
| long is definitely arch dependent.
|
| It's even operating system dependent:
|
| It's 32 bits on 32-bit x86 Unix-likes and Windows, and also
| 32 bits on 64-bit Windows.
|
| On 64-bit Unix-likes it's 64 bits.
|
| Embedded is so varied (and generally 16 or 32 bit) that I'm
| not even going to hazard a guess at the distribution there.
| xdavidliu wrote:
| long in C and C++ is absolutely arch dependent. In Java it's
| not dependent, but neither are any of the other primitives in
| Java.
| acuozzo wrote:
| long is only guaranteed to be a minimum of 32 bits.
|
| It's 64 bits on Tru64 UNIX systems, for instance.
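A quick way to see what `long` is on a given platform is to query it. The standard only guarantees at least 32 bits; 64-bit Unix-likes typically use LP64 (64-bit `long`), and 64-bit Windows uses LLP64 (32-bit `long`). When a specific width is required, `<stdint.h>` types such as `int32_t` and `int64_t` are the usual choice. In the minimal sketch below, `long_storage_bits` is a hypothetical name, and the value counts storage bits, including any padding bits.

```c
#include <limits.h>

/* Hypothetical helper: storage size of long in bits on this platform. */
int long_storage_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}
```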
| purpleblue wrote:
| The mathematical definition of "integer" includes the set of
| negative integers. That's the whole point, otherwise we could
| just use natural numbers. So making ints signed makes the most
| sense otherwise there's an impedance mismatch that you need to
| explain away.
| benj111 wrote:
| But neither signed nor unsigned ints can represent all ints.
| It's a question of what subset of ints you want to be able to
| represent.
| mcguire wrote:
| The mathematical definition of "integer" is unbounded above and
| below.
|
| The "numbers" used in computing systems are not mathematical
| numbers in any way.
| Asooka wrote:
| The numbers used in computing systems are normal mathematical
| numbers in the Z_2^32 field. The signed ones are shifted by
| half the range in the negative direction. There is nothing
| un-mathematical about them, people who have studied any math
| in university, i.e. all CS graduates, should have gone
| through an introductory algebra course that covers fields,
| rings, etc. So I'm not sure why people keep saying ints are
| not mathematical.
| tsimionescu wrote:
| That depends on the computing system. You're right if we're
| talking about the processor level (on any modern
| processor), but the "signed ints" in C or C++ are not a
| field at all, since INT_MAX + 1 is undefined, as are
| -INT_MIN and INT_MIN - 1 (and many many many
| multiplications). Also, as someone else was pointing out,
| unsigned short arithmetic is not well defined either.
| mcguire wrote:
| To be honest, I adapted a phrase I usually use for floating
| point numbers.
|
| But anyway, " _So making ints signed makes the most sense
| otherwise there's an impedance mismatch that you need to
| explain away_" is a pretty poor argument when you're
| dealing with Z_2^32 (or 2^64) minus 2^31, with some
| operations becoming partial. (And that's assuming you're on
| a two's-complement machine; those goofy ones'-complement
| boxes had positive and negative zero.)
| qsort wrote:
| I know what you mean and I'm not saying this with any
| animus, but this is pretty funny:
|
| > people who have studied any math in university, i.e. all
| CS graduates, should have gone through an introductory
| algebra course that covers fields, rings, etc.
|
| > Z_2^32 _field_
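qsort's point can be checked directly. Unsigned 32-bit arithmetic is the ring Z/2^32Z, not a field, because only odd values have multiplicative inverses: 2 * x is always even, so it can never equal 1 mod 2^32. The minimal sketch below computes the inverse of an odd value with Newton's iteration. It assumes `int` is no wider than 32 bits, so `uint32_t` arithmetic is not promoted to signed `int`. The function name is hypothetical.

```c
#include <stdint.h>

/* Multiplicative inverse of an odd a modulo 2^32 (hypothetical helper).
   Newton's step x <- x * (2 - a*x) doubles the number of correct low
   bits; x = a is already correct to 3 bits because a*a == 1 (mod 8) for
   odd a, so 4 steps give 3 -> 6 -> 12 -> 24 -> 48 >= 32 bits. */
uint32_t inverse_mod_2_32(uint32_t a)   /* a must be odd */
{
    uint32_t x = a;
    for (int k = 0; k < 4; k++)
        x *= 2u - a * x;
    return x;
}
```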
| qsort wrote:
| I mean, it's not the only term in programming that
| overloads a word with a mathematical definition. Just off the
| top of my head:
|
| - function
|
| - set
|
| - map
|
| - monad
|
| - vector
|
| - field
|
| - class
|
| Some of them are obviously referring to something else
| entirely. Some of them are _kind of_ like the math thing but
| subtly different.
| pornel wrote:
| Integers can also be infinitely large. C doesn't use
| mathematical definitions, but its own.
| giomasce wrote:
| I don't know if I am too stupid or too smart, but I use signed
| when some number can be negative and unsigned when it cannot. Of
| course I am aware of their other differences, e.g. when they
| wrap, and act accordingly.
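One of the differences worth keeping in mind with this approach shows up when signed and unsigned values meet. In a comparison between `int` and `unsigned int`, the `int` is converted to `unsigned int`, so -1 compares greater than 1u. GCC and Clang can warn about this with `-Wsign-compare`. Here is a minimal sketch with hypothetical function names.

```c
/* The int operand is converted to unsigned int, so -1 becomes UINT_MAX
   and mixed_less(-1, 1u) returns 0. */
int mixed_less(int a, unsigned int b)
{
    return a < b;
}

/* A negative int is less than every unsigned value, so handle that case
   first; otherwise a is non-negative and the unsigned comparison is exact. */
int mixed_less_fixed(int a, unsigned int b)
{
    return a < 0 || (unsigned int)a < b;
}
```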
| jasonhansel wrote:
| -fwrapv
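`-fwrapv` is the GCC/Clang option that defines signed overflow as two's-complement wrapping, which appears to be the option Asooka describes. When only a few operations need wrapping or overflow detection, both compilers also provide `__builtin_add_overflow`. It stores the result reduced modulo 2^N and returns whether the mathematical result did not fit. C23 standardizes similar functionality as `ckd_add` in `<stdckdint.h>`. In the sketch below, `add_wraps` is a hypothetical wrapper, and the builtin is compiler-specific, not standard C.

```c
#include <limits.h>
#include <stdbool.h>

/* GCC/Clang-specific: returns true if a + b overflowed int; *wrapped
   receives the two's-complement wrapped result either way. */
bool add_wraps(int a, int b, int *wrapped)
{
    return __builtin_add_overflow(a, b, wrapped);
}
```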
___________________________________________________________________
(page generated 2022-08-23 23:02 UTC)