[HN Gopher] Checked integer arithmetic in the prospect of C23
       ___________________________________________________________________
        
       Checked integer arithmetic in the prospect of C23
        
       Author : signa11
       Score  : 50 points
       Date   : 2022-12-19 15:45 UTC (7 hours ago)
        
 (HTM) web link (gustedt.wordpress.com)
 (TXT) w3m dump (gustedt.wordpress.com)
        
       | tinglymintyfrsh wrote:
       | tl;dr
       | 
       |     #include <stdckdint.h>
       |     bool ckd_add(type1 *result, type2 a, type3 b);
       |     bool ckd_sub(type1 *result, type2 a, type3 b);
       |     bool ckd_mul(type1 *result, type2 a, type3 b);
       | 
       |     #include <stdckdint.h>
       |     #include <limits.h>
       |     /* ... */
       |     int x;
       |     int a = INT_MAX;
       |     int b = INT_MAX;
       |     if (!chk_add(&x, a, b)) {
       |         /* error! */
       |     }
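For comparison, here is a minimal sketch of the polarity the replies below describe: `ckd_add` returns `true` when the mathematical result does not fit (overflow) and `false` when it does, so the error branch is the one *without* `!`. It assumes a GCC- or Clang-compatible compiler. It uses `<stdckdint.h>` when the toolchain ships it and otherwise falls back to `__builtin_add_overflow`, which takes the result pointer last. The helper name `add_checked` is hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Use the C23 header when present; otherwise fall back to the GCC/Clang
   builtin (same semantics, result pointer passed last instead of first). */
#if defined(__has_include)
#  if __has_include(<stdckdint.h>)
#    include <stdckdint.h>
#  endif
#endif
#ifndef ckd_add
#  define ckd_add(r, a, b) __builtin_add_overflow((a), (b), (r))
#endif

/* Hypothetical helper: returns 0 on success, -1 on overflow. */
static int add_checked(int *x, int a, int b) {
    if (ckd_add(x, a, b)) {  /* true means the exact sum did not fit */
        return -1;           /* error! *x holds the wrapped value */
    }
    return 0;                /* *x holds the exact sum */
}
```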
       | 
       | Other stuff on the table for C23
       | 
       | - https://thephd.dev/c-the-improvements-june-september-virtual...
       | 
       | - (PDF) https://open-std.org/jtc1/sc22/wg14/www/docs/n3054.pdf
        
         | RustyRussell wrote:
         | Let's take this as evidence that the proposal is counter-
         | intuitive?
        
         | heywhatupboys wrote:
         | > if (!chk_add(&x, a, b)) {
         | >     /* error! */
         | > }
         | 
         | does it return non-zero on success???
        
           | EdSchouten wrote:
           | It returns a boolean.
        
           | tinglymintyfrsh wrote:
           | They're using a bool, so it violates the old-school C
           | paradigm used for library calls. This is more of a macro
           | than a syscall or standard library function call.
        
             | comex wrote:
             | It returns a bool, but true means error, not false.
        
         | comex wrote:
         | You have it backwards: it returns true on error.
        
       | olliej wrote:
       | This is simply standardizing the builtins that the major C++
       | compilers already have. It does not remove the absurd "overflow
       | is UB" semantics that introduce security bugs.
        
         | gustedt wrote:
         | Only that here we are talking about C. But yes, most C
         | compilers already seem to have this as builtins.
        
           | olliej wrote:
           | haha, I'm so used to reading C++2x I assumed C++ - however
           | the problem exists in both :-/
        
       | Someone wrote:
       | FTA: _"Their working is quite simple: the arithmetic is as if
       | performed in the set of mathematical integers and then the value
       | is written to *result. If it fits, the return value is false. If
       | it doesn't fit, the return value is true"_
       | 
       | They also give example code:
       | 
       |     bool add_invalid = ckd_add(&result_add, a, b);
       | 
       | I can see that fits with "most of the time, anything positive
       | means 'no error'", for example in _malloc_, _write_, _read_ or
       | _printf_, but these new functions return bool, not int, and the
       | chosen method will require writing a double negation sometimes:
       | 
       |     if(!add_invalid) { ... }
       | 
       | That's not too bad, but if I were to see
       | 
       |     if(!ckd_add(&result_add, a, b)) { ... }
       | 
       | I would expect that to test for failure, not success.
       | 
       | Because of that, I think I would have chosen to return true on
       | success, false on failure. I'm curious as to what arguments led
       | to the choice made.
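The "as if performed in the set of mathematical integers" rule matters most when the operand types and the result type differ. The check is made against the type that `*result` points to, not against the operand types. Below is a minimal sketch under stated assumptions: GCC or Clang, with the C23 header used if available and otherwise the builtins, which GCC documents as computing in infinite precision and then storing the wrapped value. The function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#if defined(__has_include)
#  if __has_include(<stdckdint.h>)
#    include <stdckdint.h>
#  endif
#endif
#ifndef ckd_add  /* pre-C23 fallback: GCC/Clang builtins, result pointer last */
#  define ckd_add(r, a, b) __builtin_add_overflow((a), (b), (r))
#  define ckd_sub(r, a, b) __builtin_sub_overflow((a), (b), (r))
#endif

/* INT_MAX + INT_MAX overflows int, but the exact sum fits in long long,
   so ckd_add returns false and stores the exact value. */
static bool sum_fits_long_long(long long *out) {
    return !ckd_add(out, INT_MAX, INT_MAX);
}

/* 0 - 1 is -1 mathematically, which an unsigned int cannot hold, so
   ckd_sub returns true and stores the wrapped value UINT_MAX. */
static bool diff_fits_unsigned(unsigned *out) {
    return !ckd_sub(out, 0, 1);
}
```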
        
         | dooglius wrote:
         | I find the most readable, unambiguous thing is to define
         | explicit macros or constants, e.g.
         | 
         |     if(ckd_add(&result_add, a, b) == CKD_SUCCESS) { ... }
         | 
         | or alternatively,
         | 
         |     if(CKD_SUCCESS(ckd_add(&result_add, a, b))) { ... }
        
           | kevin_thibedeau wrote:
           | The return value is the overflow condition so just name it
           | CKD_OVERFLOW.
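Here is a sketch of the naming suggestions above. The names `CKD_OVERFLOW` and `add_ok` are hypothetical and are not part of C23. Because the standard return value already means "overflowed", one option is to name the comparison after that. Another is to wrap the call in a helper whose `true` means success. The sketch assumes GCC or Clang, with a builtin fallback when `<stdckdint.h>` is missing.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#if defined(__has_include)
#  if __has_include(<stdckdint.h>)
#    include <stdckdint.h>
#  endif
#endif
#ifndef ckd_add  /* pre-C23 fallback: GCC/Clang builtin, result pointer last */
#  define ckd_add(r, a, b) __builtin_add_overflow((a), (b), (r))
#endif

/* Hypothetical: spells out that a true return from ckd_* means overflow. */
#define CKD_OVERFLOW true

/* Hypothetical helper with success polarity: true means *r is exact. */
static inline bool add_ok(int *r, int a, int b) {
    return !ckd_add(r, a, b);
}
```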
        
         | RustyRussell wrote:
         | Yes, it's backwards. And counter-intuitive use of bool :(
         | 
         | They felt fine changing the argument order, why stick with the
         | reverse polarity?
        
         | chongli wrote:
         | In C, the value 0 is equivalent to false and all nonzero values
         | are equivalent to true. It's a convention throughout the C
         | standard library to return 0 on success and nonzero when some
         | error occurred. The behaviour of the new checked arithmetic
         | library is consistent with that convention.
        
           | SAI_Peregrinus wrote:
           | And in shell it's convention for programs to return 0 for
           | success and nonzero when an error occurred. The issue is that
           | in shell, the `true` builtin returns `0` and `false` returns
           | `1`, which is the opposite of C's `bool`. And almost every
           | other language's Boolean type.
        
           | gustedt wrote:
           | Unfortunately there are several error conventions in the C
           | standard.
           | 
           | Here, the committee just standardized existing practice,
           | namely the gcc builtins. We just adjusted the call sequence
           | by putting the pointer parameter for the result first.
        
           | Someone wrote:
           | > It's a convention throughout the C standard library to
           | return 0 on success and nonzero when some error occurred.
           | 
           | If only it were so simple. _read_ and _write_ , for example,
           | return a number less than zero on error and a non-negative
           | number on success, and _malloc_ returns zero on error, and
           | nonzero on success.
           | 
           | The general rule for early C seems to be "whatever's the best
           | way to cram a return value or an error in an int" (probably
           | the correct decision for the time)
           | 
           | Also, these new functions return a bool, which, in C23, gets
           | integer-converted to zero for _false_ and one for _true_
           | (https://en.cppreference.com/w/c/language/bool_constant; C17
           | had macros for true and false, with false being zero)
           | 
           | and the reverse, converting to _bool_, similarly maps zero to
           | false (https://en.cppreference.com/w/c/language/conversion#Boolean_...):
           | 
           |  _"A value of any scalar type (including nullptr_t) (since
           | C23) can be implicitly converted to _Bool. The values that
           | compare equal to an integer constant expression of value zero
           | are converted to 0 (false), all other values are converted to
           | 1 (true)."_
           | 
           | (https://en.cppreference.com/w/c/language/bool_constant)
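The conversion rules quoted above are easy to confirm. This is a minimal sketch: before C23, `bool`, `true` and `false` come from `<stdbool.h>`, while C23 makes them keywords. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Any nonzero scalar converts to true (1); zero converts to false (0). */
static bool to_bool(long v) { return v; }

/* In the other direction, a bool converts to int as exactly 0 or 1. */
static int bool_to_int(bool b) { return b; }
```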
        
             | dahfizz wrote:
             | _when the return value is an error code_, zero means
             | success and nonzero means failure. Functions like `read`,
             | `recv`, etc etc don't just return an error code. They
             | return an actual value.
             | 
             | Functions that only return an error code like `stat`,
             | `connect`, and the proposed ckd_add, return 0 on success
             | and nonzero on error.
        
         | dahfizz wrote:
         | The proposal fits my intuition. When the return value is an
         | error code, truthy values are an error.
         | 
         |     int rc = func();
         |     if( rc ) { /*handle error*/ }
         | 
         | examples from the stdlib: connect(), stat(), etc. Hell, even
         | main is defined to return 0 on success, nonzero on error.
        
           | RustyRussell wrote:
           | Not for bool though.
        
       | jacquesm wrote:
       | What was the rationale for not simply making it optional to throw
       | an exception on overflow?
        
         | gustedt wrote:
         | Besides C not having exceptions (that you could catch), the
         | point was and is to have a way such that such a call has
         | defined behaviour under any circumstances. The return value of
         | the functions can even be ignored if the wrap around of the
         | overflowing value is what your code expects.
         | 
         | So it is on the programmer to define what happens on error,
         | they could ignore, try to back off by computing the high value
         | bits, `exit` or `abort`.
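Here is a sketch of what "it is on the programmer to define what happens on error" can look like with `int` operands. The function names are hypothetical. Saturation is shown as one possible back-off strategy; it is not the high-bits computation mentioned above. The sketch assumes GCC or Clang, with a builtin fallback when `<stdckdint.h>` is missing.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#if defined(__has_include)
#  if __has_include(<stdckdint.h>)
#    include <stdckdint.h>
#  endif
#endif
#ifndef ckd_add  /* pre-C23 fallback: GCC/Clang builtin, result pointer last */
#  define ckd_add(r, a, b) __builtin_add_overflow((a), (b), (r))
#endif

/* Ignore the flag: the result already holds the wrapped value. */
static int add_wrapping(int a, int b) {
    int r;
    (void)ckd_add(&r, a, b);
    return r;
}

/* Clamp to the limit that was crossed. On overflow b is nonzero, and its
   sign says whether the exact sum went above INT_MAX or below INT_MIN. */
static int add_saturating(int a, int b) {
    int r;
    if (ckd_add(&r, a, b))
        return b > 0 ? INT_MAX : INT_MIN;
    return r;
}

/* Treat overflow as fatal. */
static int add_or_abort(int a, int b) {
    int r;
    if (ckd_add(&r, a, b)) {
        fputs("integer overflow\n", stderr);
        abort();
    }
    return r;
}
```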
        
         | dahfizz wrote:
         | C doesn't have exceptions...?
        
           | heywhatupboys wrote:
           | kinda does though. floating point exception vectors are a
           | thing and settable from most compiler impls
        
             | dahfizz wrote:
             | fenv is available in standard C99. There is a GNU extension
             | to trap and throw a signal when a fp error is hit
             | (feenableexcept). I would definitely argue that signals !=
             | exceptions.
        
           | jacquesm wrote:
           | The floating point implementation can signal SIGFPE, with a
           | code FPE_INTOVF, it would seem to me that that is a suitable
           | exception mechanism, it's just that the source isn't the
           | floating point unit but the regular CPU.
           | 
           | Signals (kill), signal, raise, sigsetjmp, siglongjmp etc are
           | C's exception handling mechanism. It's not as well integrated
           | into the language as say a try-catch construct but it works
           | well enough for situations like these. See: signal.h and
           | setjmp.h
           | 
           | https://en.wikipedia.org/wiki/Signal_(IPC)
        
             | dahfizz wrote:
             | Signals are not exceptions. Languages with exceptions still
             | have to handle signals separately. Signals are an OS
             | construct, and exceptions are a language construct.
             | 
             | To the question of "why not raise a signal on integer
             | overflow in C?" - because signals are a terrible way of
             | dealing with this. The signal handler doesn't know what
             | code caused the overflow, and can't really do anything
             | about it. Once the signal handler returns, the code itself
             | has no idea it caused an overflow. Signals are a way for
             | the OS to send signals to your program and not for control
             | flow, after all. That's why `feenableexcept` is a niche
             | extension that nobody uses.
             | 
             | The standard way of checking for fp errors is by calling
             | `fetestexcept`. Personally I prefer this strategy (doing
             | operation, then checking for errors) vs the new proposal
             | for ints (checking for potential errors before doing the
             | operation). But that is a matter of taste.
        
               | jacquesm wrote:
               | Interesting, I've always considered signals to be an
               | exception handling mechanism, as in the 'normal flow' of
               | a program is interrupted and dealt with - or not -
               | through some other mechanism. Learn a new thing every
               | day, even after 40 years of programming in C :) Thanks!
               | 
               | https://en.wikipedia.org/wiki/Exception_handling
               | 
               | (which has this bit: "C does not have try-catch exception
               | handling, but uses return codes for error checking. The
               | setjmp and longjmp standard library functions can be used
               | to implement try-catch handling via macros.")
        
               | dahfizz wrote:
                 | It's all semantics, I guess. You could argue that signals
               | can be used to handle exceptional conditions (dereffing
               | NULL). But signals are significantly different than what
               | other programming languages call "exceptions".
               | 
                 | It's the same thing as "run time". Pedantically, crt0
               | exists and therefore C has a "runtime". But it is nothing
               | like what we refer to as a "runtime" today. The literal
               | words are true but the meaning of the words doesn't match
               | expectations.
        
               | jacquesm wrote:
               | > signals are significantly different than what other
               | programming languages call "exceptions"
               | 
               | C is likely considerably older than those 'other
               | programming languages' and I'm still stuck in the past
               | with my terminology.
        
               | kevin_thibedeau wrote:
               | Ada had exceptions in 83 that are analogous to their
               | modern incarnation. C was still just a baby then.
        
               | jacquesm wrote:
               | Ada was unobtanium for a long time for mere mortals like
               | myself. In 1983 I was 18 and had access to a C compiler,
               | an Ada compiler would have cost me an arm, a leg and my
               | still to be first born, and information about the
               | language was pretty much limited to what you could get
               | from magazine articles of people that had maybe at some
               | point known someone who had seen an Ada compiler in the
               | wild.
               | 
               | The only other realistic options outside of
               | government/enterprise were Pascal, BASIC and assembler,
                 | and within those the bulk of the work was done in COBOL.
        
         | gpderetta wrote:
         | These operations simply return the carry flag, they are
         | supposed to map directly to the hardware which normally doesn't
         | raise exceptions for integer overflow.
         | 
         | Also they can be useful to implement bigints.
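Here is a sketch of the bigint use mentioned above. It adds two multi-limb unsigned numbers, and the `bool` returned by `ckd_add` on unsigned limbs is exactly the carry out of each limb. The limb layout (least significant first, `uint32_t` limbs) and the name `bigint_add` are assumptions made for illustration. The sketch assumes GCC or Clang, with a builtin fallback when `<stdckdint.h>` is missing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__has_include)
#  if __has_include(<stdckdint.h>)
#    include <stdckdint.h>
#  endif
#endif
#ifndef ckd_add  /* pre-C23 fallback: GCC/Clang builtin, result pointer last */
#  define ckd_add(r, a, b) __builtin_add_overflow((a), (b), (r))
#endif

/* r = a + b over n limbs, least significant limb first.
   Returns the carry out of the most significant limb. */
static bool bigint_add(uint32_t *r, const uint32_t *a, const uint32_t *b,
                       size_t n) {
    bool carry = false;
    for (size_t i = 0; i < n; i++) {
        uint32_t s;
        bool c1 = ckd_add(&s, a[i], b[i]);              /* carry from a + b */
        bool c2 = ckd_add(&r[i], s, (uint32_t)carry);   /* carry from + cin */
        carry = c1 || c2;  /* at most one of the two can be set */
    }
    return carry;
}
```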
        
       | raphlinus wrote:
       | I posted overflow checking of signed integer arithmetic as a
       | puzzle yesterday[1]. I got some good responses but none quite as
       | minimal wrt number of instructions as my own solution:
       | 
       |     bool add_will_overflow(int32_t a, int32_t b) {
       |         uint32_t c = (uint32_t)a + (uint32_t)b;
       |         return (((uint32_t)a ^ c) & ((uint32_t)b ^ c)) >> 31;
       |     }
       | 
       | That produces the following assembly (see Godbolt[2]):
       | 
       |     lea     edx, [rdi+rsi]
       |     mov     eax, edi
       |     xor     eax, edx
       |     xor     esi, edx
       |     and     eax, esi
       |     shr     eax, 31
       |     ret
       | 
       | In Rust, you can write a.checked_add(b).is_none() which produces
       | the following assembly[3]:
       | 
       |     add     edi, esi
       |     seto    al
       |     ret
       | 
       | A fun fact about this code: the overflow flag which is set by the
       | add instruction and then harvested dates back at least to the
       | 8080 (almost 50 years ago) and is not present in vanilla ARM.
       | However, Apple Silicon has it as an extension, to make life
       | easier for Rosetta 2 binary translation[4]. So when you do get to
       | use this shorter code sequence, be thankful of the effort that
       | chip designers put in to make it execute efficiently.
       | 
       | I expect the C23 built-in functions will perform as well as Rust
       | here, which is a win both for ergonomics (you can't really
       | consider the current state of "will a+b overflow" to be
       | discoverable) and performance.
       | 
       | [1]: https://mastodon.online/@raph/109535617953722719
       | 
       | [2]: https://godbolt.org/z/17zMsWjYv
       | 
       | [3]: https://rust.godbolt.org/z/36Ta9oP1P
       | 
       | [4]: https://news.ycombinator.com/item?id=33635720
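One way to gain confidence in the bit trick above is to compare it with `ckd_add` on boundary values. The trick works because signed overflow happens exactly when the wrapped sum's sign bit differs from the sign bits of both operands. In that case bit 31 of `(a ^ c) & (b ^ c)` is set. This is a minimal sketch assuming GCC or Clang, with `<stdckdint.h>` when present and otherwise the builtin it was standardized from. `add_will_overflow_ckd` and `versions_agree` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__has_include)
#  if __has_include(<stdckdint.h>)
#    include <stdckdint.h>
#  endif
#endif
#ifndef ckd_add  /* pre-C23 fallback: GCC/Clang builtin, result pointer last */
#  define ckd_add(r, a, b) __builtin_add_overflow((a), (b), (r))
#endif

/* The bit-trick version from the comment above. */
static bool add_will_overflow(int32_t a, int32_t b) {
    uint32_t c = (uint32_t)a + (uint32_t)b;  /* wrapping add, no UB */
    return (((uint32_t)a ^ c) & ((uint32_t)b ^ c)) >> 31;
}

static bool add_will_overflow_ckd(int32_t a, int32_t b) {
    int32_t r;
    return ckd_add(&r, a, b);
}

/* True if both versions agree on every pair of edge values. */
static bool versions_agree(void) {
    static const int32_t v[] = {INT32_MIN, INT32_MIN + 1, -2, -1, 0, 1, 2,
                                INT32_MAX - 1, INT32_MAX};
    size_t n = sizeof v / sizeof v[0];
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (add_will_overflow(v[i], v[j]) !=
                add_will_overflow_ckd(v[i], v[j]))
                return false;
    return true;
}
```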
        
         | jcranmer wrote:
         | Checked overflow operations are kind of the go-to example for
         | "it's easy in assembly, hard in programming languages"--in
         | hardware terms, it's usually check a flag, but since flag
         | registers are not provided for in high-level languages, it
         | becomes a game of try to write it in a pattern that the
         | compiler can recognize, which is never a fun game to play. Even
         | worse than addition is multiplication. Thankfully, C23 has
         | _finally_ added these operations.
         | 
         | Although, recently, I ran into a case where I wanted checked
         | (u32 - u32) -> i32 and (u32 + i32) -> u32 operations, which
         | even Rust's standard library doesn't provide. (The use case is
         | keeping track of a running delta between two lists of u32
         | values: the delta can go positive or negative, so it has to be
         | signed, but the values in the lists can never be negative.)
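In C23 these mixed-type operations need no dedicated functions. The three arguments of the `ckd_*` macros may have different integer types, and the check is made against the result type. Here is a sketch assuming GCC or Clang, with the C23 header if present and otherwise the builtins, which have the same mixed-type semantics. `delta_sub` and `apply_delta` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__has_include)
#  if __has_include(<stdckdint.h>)
#    include <stdckdint.h>
#  endif
#endif
#ifndef ckd_add  /* pre-C23 fallback: GCC/Clang builtins, result pointer last */
#  define ckd_add(r, a, b) __builtin_add_overflow((a), (b), (r))
#  define ckd_sub(r, a, b) __builtin_sub_overflow((a), (b), (r))
#endif

/* (u32 - u32) -> i32: returns true if a - b does not fit in int32_t. */
static bool delta_sub(int32_t *delta, uint32_t a, uint32_t b) {
    return ckd_sub(delta, a, b);
}

/* (u32 + i32) -> u32: returns true if the exact result is negative
   or larger than UINT32_MAX. */
static bool apply_delta(uint32_t *out, uint32_t value, int32_t delta) {
    return ckd_add(out, value, delta);
}
```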
        
           | raphlinus wrote:
           | The addition operator just landed in Rust 1.66
           | (checked_add_signed[1]), but for the subtraction one it looks
           | like you'd need to roll your own.
           | 
           | [1]: https://github.com/rust-lang/rust/issues/87840
        
           | TazeTSchnitzel wrote:
           | > it becomes a game of try to write it in a pattern that the
           | compiler can recognize
           | 
           | Worse, in C or C++ you also need to find a way to do it
           | without undefined behaviour. You can't just do the operation
           | and see if the result matches expectations...
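For contrast, the traditional UB-free approach checks the operands against the limits before adding, so the overflowing addition is never evaluated. Here is a minimal sketch for `int`; the name `int_add_would_overflow` is hypothetical. This is the kind of hand-written pattern that compilers must then recognize in order to emit the flag-based code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Decide whether a + b overflows without ever computing an overflowing
   sum: INT_MAX - b cannot overflow when b > 0, nor INT_MIN - b when b < 0. */
static bool int_add_would_overflow(int a, int b) {
    if (b > 0)
        return a > INT_MAX - b;
    if (b < 0)
        return a < INT_MIN - b;
    return false;  /* adding zero never overflows */
}
```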
        
         | dezgeg wrote:
         | Hmm, isn't the Apple-specific magic only for parity(PF) and aux
         | carry (AF)? aarch64 does have a 'V' flag for signed overflow.
        
           | raphlinus wrote:
           | Oops, you're right. Too late to edit, sorry about the
           | confusion.
        
         | dooglius wrote:
         | This is already present as a builtin in GNU C (as indicated
         | in TFA) and it already results in the optimal code:
         | https://godbolt.org/z/qc4zvav7E
        
         | tinglymintyfrsh wrote:
         | Also, Hacker's Delight and OpenBSD probably have clever
         | solutions for these.
        
       | unwind wrote:
       | Really fun, great introduction to the exciting new features.
       | 
       | What about the part of the example that uses `nullptr` instead
       | of `NULL`? Is that also from the future?
        
         | gustedt wrote:
         | Yes, `nullptr` will also be in C23.
        
           | pwdisswordfish9 wrote:
           | What for, if ((void *)0) is sufficient in C?
        
             | saagarjha wrote:
             | Apparently implementations don't use this or something like
             | that.
        
       | RcouF1uZ4gsC wrote:
       | One major concern with these type of safety functions is that you
       | have to explicitly opt in. If you are actually thinking about the
       | need to call these functions, you are already thinking about
       | overflow.
        
         | jdhdjdbdjdbd wrote:
         | And that's why you use C in the first place?
        
         | jacquesm wrote:
         | > If you are actually thinking about the need to call these
         | function, you are already thinking about overflow.
         | 
         | As you should, for any datatype.
        
       | ash_gti wrote:
       | I know most of these are compiler intrinsics, but it's good to
       | have them standardized.
        
        
       ___________________________________________________________________
       (page generated 2022-12-19 23:01 UTC)