[HN Gopher] Why do we need for an Undefined Behavior Annex to C++
       ___________________________________________________________________
        
       Why do we need for an Undefined Behavior Annex to C++
        
       Author : luu
       Score  : 47 points
       Date   : 2024-02-28 18:51 UTC (2 days ago)
        
 (HTM) web link (community.intel.com)
 (TXT) w3m dump (community.intel.com)
        
       | planede wrote:
       | Great stuff. It looks like a ton of upfront work, but will
       | greatly improve the standard, and shouldn't be too hard to
       | maintain once it's complete.
        
         | mike_hock wrote:
         | Once it's complete, make it authoritative. Any mention of
         | undefined behavior should be required to reference the
         | corresponding item in the UB annex.
        
           | marcosdumay wrote:
           | And even better, start reducing the list after it's complete,
           | by pushing things like integer overflow into implementation
           | defined; several uses of inc/decrement operators into
           | compilation error; etc.
        
             | bluGill wrote:
             | I don't want integer overflow to be implementation defined
             | though, I want it undefined so my compiler can optimize my
             | code for the fact that I don't overflow my integers in the
             | first place.
        
           | nlewycky wrote:
           | The hard part of UB isn't where it's mentioned in the
           | standard. The problem is when the standard says "in case A, X
           | occurs, in case B, Y occurs" then somebody invents a
           | situation where neither A nor B apply.
        
       | SunlitCat wrote:
       | I know, the goal is (in short) a listing of expected undefined
       | behavior, which could improve security if developers follow
       | along.
       | 
       | But isn't putting undefined behavior into the standard an
       | oxymoron, kinda?
        
         | mike_hock wrote:
         | No, because UB is not avoidable in principle unless you create
         | a walled garden/sandbox that must be enforced at non-zero
         | runtime cost.
        
           | vlovich123 wrote:
           | I don't know. Safe rust has 0 UB although if I recall
           | correctly things are still a bit messy with integer overflow.
           | I could imagine a world where we define annotations around
           | "signed integer will never overflow" that you can add to hot
           | paths but otherwise disallow optimizing around that UB and
           | for all other UB require an explicit annotation acknowledging
           | it or just a warning and a missed optimization instead.
           | 
           | The current status quo is not the best.
        
             | aw1621107 wrote:
             | The key is "that must be enforced at non-zero runtime
             | cost." Safe Rust is indeed generally free of UB, but that
             | requires bounds checks at a minimum.
             | 
             | > I could imagine a world where we define annotations
             | around "signed integer will never overflow" that you can
             | add to hot paths but otherwise disallow optimizing around
             | that UB
             | 
             | That's kind of what Rust does - overflow is not UB, but you
             | can use unchecked_add/mul/div/sub to get UB-on-overflow if
             | you explicitly want it.
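
A minimal sketch of the explicit-overflow methods discussed above, using only standard-library integer methods. The helper name `describe_add` is hypothetical. The unsafe `unchecked_add` is deliberately not called, since overflowing with it would be UB.

```rust
/// Hypothetical helper: adds two i32 values and reports how each
/// standard-library overflow policy handles the result.
fn describe_add(a: i32, b: i32) -> (Option<i32>, i32, i32, bool) {
    let checked = a.checked_add(b); // None on overflow
    let wrapping = a.wrapping_add(b); // two's-complement wraparound
    let saturating = a.saturating_add(b); // clamps to i32::MIN / i32::MAX
    let (_, overflowed) = a.overflowing_add(b); // wrapped value plus a flag
    (checked, wrapping, saturating, overflowed)
}

fn main() {
    println!("{:?}", describe_add(1, 2));
    println!("{:?}", describe_add(i32::MAX, 1));
}
```

Each method states the intended overflow policy at the call site, so nothing is left for the compiler to assume.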
        
               | jeffbee wrote:
               | How does Rust as a language avoid the "UB" traps that are
               | fundamentally just compiler optimizations? For example in
               | the case where you have an array of 8 elements, you
               | access the element at index x, then (in program order)
               | test whether x is less than 8. It seems that an
               | optimizing compiler with a known-bits analysis could
               | conclude that x is always less than 8, and remove that
               | test. I don't know much about languages, so I am
               | interested in how that trap is a consequence of C and
               | avoided by Rust.
        
               | aw1621107 wrote:
               | The Rust compiler/stdlib inserts bounds checks for
               | accesses that cannot be proven to be safe, so in this
               | example you'd get a panic on the out-of-bounds access
               | itself instead of hitting the assert afterwards.
               | 
               | If you use get_unchecked() to intentionally bypass those
               | bounds checks, the assert is indeed removed, but that
               | requires an unsafe block.
               | 
               | Speaking more generally, Rust gates UB behavior behind
               | unsafe blocks, so it's much harder to unintentionally hit
               | UB than in C.
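
A rough sketch of the two access styles described above, for an 8-element array. The function names `safe_get` and `unchecked_get` are hypothetical; `get` and `get_unchecked` are the standard slice methods.

```rust
/// Safe lookup: returns None instead of panicking when `i` is out of bounds.
fn safe_get(data: &[u8; 8], i: usize) -> Option<u8> {
    data.get(i).copied()
}

/// Unchecked lookup: skips the bounds check entirely.
///
/// # Safety
/// The caller must guarantee `i < 8`; otherwise the behavior is undefined.
unsafe fn unchecked_get(data: &[u8; 8], i: usize) -> u8 {
    // SAFETY: upheld by the caller per the contract above.
    unsafe { *data.get_unchecked(i) }
}

fn main() {
    let data = [10, 11, 12, 13, 14, 15, 16, 17];
    println!("{:?}", safe_get(&data, 3));
    println!("{:?}", safe_get(&data, 8));
    // SAFETY: 3 < 8.
    println!("{}", unsafe { unchecked_get(&data, 3) });
}
```

The unchecked path cannot be called without writing `unsafe`, which is the gating the comment above refers to.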
        
               | Kranar wrote:
               | Rust performs a runtime check to protect against out of
               | bounds array accesses. In some cases those checks can be
               | optimized out if the compiler can prove it's not needed.
        
               | jeffbee wrote:
               | This is my point precisely. Having proven to itself that
               | 0 <= x < 8, the compiler could go on to remove any test
               | for x < 8, right?
        
               | heinrich5991 wrote:
               | Yes. It even does so in some cases.
        
               | sgift wrote:
               | Yes, and the compiler does that. That's one of the
               | reasons that iterators are preferred over for-loops with
               | manual access in Rust. While compilers can for very
               | simple cases determine that you will always stay within
               | the bounds (as in your example, where it can find at
               | compile time that the length will always be 8 and no
               | access outside happens), with iterators the compiler
               | doesn't need to know the actual length and can always
               | elide bound checks.
        
               | GrumpySloth wrote:
               | Pedantic note: compiler doesn't elide bounds checks in
               | code using iterators. They're not there to begin with.
               | Iterators in the standard library are implemented using
               | unsafe code, which calls non-bounds-checking variants of
               | functions getting elements from collections. There is
               | nothing for the compiler left to do here regarding
               | bounds-checking.
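
To make the iterator point concrete, here is a small sketch comparing an indexed loop with an iterator over the same slice. The function names are hypothetical, and whether the indexed version keeps its bounds checks after optimization depends on the compiler.

```rust
/// Indexed loop: each `values[i]` is a bounds-checked access unless the
/// optimizer can prove `i < values.len()`.
fn sum_indexed(values: &[u32]) -> u32 {
    let mut total = 0u32;
    for i in 0..values.len() {
        total = total.wrapping_add(values[i]);
    }
    total
}

/// Iterator loop: no index appears in user code, so there is no
/// per-element bounds check to remove.
fn sum_iter(values: &[u32]) -> u32 {
    values.iter().fold(0u32, |acc, v| acc.wrapping_add(*v))
}

fn main() {
    let values = [1, 2, 3, 4];
    println!("{} {}", sum_indexed(&values), sum_iter(&values));
}
```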
        
               | sgift wrote:
               | Pedantic, but interesting - didn't know that. Thanks!
        
               | gizmo686 wrote:
               | Rust performs runtime bounds checks on array access
               | unless the compiler can prove that the index is in
               | bounds. As such, removing the check is safe, because it
               | is actually true that the program will never reach that
               | line with an out of bounds array.
               | 
               | This is in contrast to C, where the out of bounds access
               | is merely undefined, so the compiler is allowed to have
               | the program continue execution past it.
        
               | vlovich123 wrote:
               | There's lots of things beyond bounds checks that
               | constitute UB and bounds checks are but 1 small item.
               | Generally Rust avoids the runtime cost of that through a
               | three-pronged strategy of idiomatic Rust amortizing the
               | cost of a bounds check to 0 (e.g. efficient iterator
               | implementations that don't need bounds checks), eliding
               | the check altogether if the compiler can prove it's
               | duplicate / unneeded for some reason, or providing unsafe
               | Rust as an escape hatch when you absolutely need it. C++
               | already has something similar via `.at` vs `[]` although
               | the shorter notation being the unsafer but faster option
               | is debatable & likely the thing most people use by
               | default. Explicit annotations are probably the better
               | approach.
               | 
               | It's also got a great ecosystem. I love the assume crate
               | to do these annotations instead of writing unsafe code
               | explicitly:
               |
               |     assume!(unsafe: i < v.len())
               | 
               | Now you've explicitly written an assumption that will
               | cause the compiler to elide the bounds check in release
               | mode but still assert it in debug mode (vs just doing
               | unsafe & using variants that bypass the bounds check).
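
As an illustration of the idea only, here is a hand-rolled sketch of such an annotation built on `std::hint::unreachable_unchecked`. This is not the `assume` crate's implementation; `assume_true` and `element_at` are hypothetical names.

```rust
/// Hypothetical stand-in for an "assume" annotation.
///
/// # Safety
/// `cond` must be true; in release builds a false `cond` is undefined behavior.
#[inline(always)]
unsafe fn assume_true(cond: bool) {
    // Debug builds still assert the condition.
    debug_assert!(cond, "assumption violated");
    if !cond {
        // SAFETY: the caller promised `cond` holds, so this is unreachable.
        unsafe { std::hint::unreachable_unchecked() }
    }
}

/// # Safety
/// The caller must guarantee `i < v.len()`.
unsafe fn element_at(v: &[u32], i: usize) -> u32 {
    // SAFETY: guaranteed by this function's own contract.
    unsafe { assume_true(i < v.len()) };
    v[i] // the optimizer may drop this bounds check in release builds
}

fn main() {
    let v = [5, 6, 7];
    // SAFETY: 1 < v.len().
    println!("{}", unsafe { element_at(&v, 1) });
}
```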
        
               | aw1621107 wrote:
               | > There's lots of things beyond bounds checks that
               | constitute UB and bounds checks are but 1 small item.
               | 
               | Indeed, hence "at a minimum". The type system and
               | choosing to avoid UB when defining some operations helps
               | a lot as well, but those have no direct overhead so
               | there's no runtime cost there. Bounds checks (and
               | overflow checks, if those are ever added to release mode)
               | are just the one thing that has a direct runtime cost
               | when they aren't elided.
        
             | jpcfl wrote:
             | > Safe rust has 0 UB
             | 
             | Safe Rust aims for 0 UB, but I don't think you can make the
             | claim that it absolutely has no UB.
             | 
             | This program SEGFAULTs on my system (macOS), because it's
             | reading an invalid memory address due to a stack overflow:
             |
             |     const N: usize = 1024*1024*1024;
             |
             |     fn main() {
             |         let var: [u8; N] = [0; N];
             |         println!("var: {:?}", var);
             |     }
        
               | Kranar wrote:
               | Safe Rust has no undefined behavior. Undefined behavior
               | does not mean no crashing, it means that the semantics of
               | the program are undefined.
               | 
               | Rust's semantics are to abort on a stack overflow.
               | Languages like C or C++ have no such semantics, they may
               | abort or they may continue running and producing
               | gibberish.
        
               | vlovich123 wrote:
               | I don't know if it's technically UB or well defined. The
               | crash is a SEGFAULT and not a panic/abort, but it's
               | probably a SEGFAULT due to guard pages. Still, it's
               | possible to evade guard pages so if you access var[X]
               | such that X points to the heap, it's possible you're
               | reading aliased memory which would be UB in safe Rust.
               | 
               | EDIT: Going to take it back. I'm unable to create a
               | situation where I create a large stack array that doesn't
               | result in an immediate stack overflow. I even tried
               | nightly MaybeUninit::uninit_array but that crashed
               | explicitly with a "fatal runtime error: stack overflow"
               | so it seems like the standard library has improved
               | reporting instead of the old SEGFAULT. So no UB.
        
               | Kranar wrote:
               | Panics are not quite the same as an abort in Rust. Most
               | notably a panic can be caught and execution can resume so
               | as to gracefully terminate the application, but an abort
               | is an immediate termination, a go to jail do not pass go
               | kind of situation.
               | 
               | An out of bounds access in Rust will result in a panic
               | but a stack overflow is an abort.
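
A small sketch of the panic half of that distinction, assuming the default unwinding panic strategy (with `panic = "abort"` nothing can be caught). `index_or_recover` is a hypothetical helper; `std::panic::catch_unwind` is the standard-library function.

```rust
use std::panic;

/// Indexes a slice with a runtime index and recovers if the bounds
/// check panics, instead of letting the panic end the program.
fn index_or_recover(data: &[u8], i: usize) -> Result<u8, String> {
    panic::catch_unwind(|| data[i]).map_err(|_| "recovered from panic".to_string())
}

fn main() {
    let data = [1, 2, 3];
    println!("{:?}", index_or_recover(&data, 1));
    // The out-of-bounds access panics (a message goes to stderr),
    // but execution resumes here.
    println!("{:?}", index_or_recover(&data, 9));
}
```

A stack overflow offers no such recovery point: the process is terminated rather than unwound.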
        
               | vlovich123 wrote:
               | A segfault would imply it's not an abort either although
               | it seems like it has been converted to a proper abort in
               | newer versions of Rust.
        
               | fweimer wrote:
               | How does Rust implement this on targets where LLVM does
               | not implement stack clash protection?
        
               | jpcfl wrote:
               | The fact that this program results in reading/writing an
               | unmapped memory address means it's doing an out-of-bounds
               | access. It segfaults on macOS because the runtime/OS has
               | allocated the stack such that the overflow results in a
               | bad memory access, but that is a behavior of the
               | runtime/OS/hardware, not the language.
               | 
               | I guarantee I could exploit this on a system that does
               | not have virtual memory, or a runtime that does not have
               | unmapped addresses at the end of the stack, to, say,
               | manipulate the contents of another thread's stack.
               | Therefore, this behavior is undefined.
        
               | avgcorrection wrote:
               | Report it.
        
               | dzaima wrote:
               | The language runtime can require that the OS & hardware
               | always results in an exception on stack overflow (or,
               | alternatively, compile in explicit checks for it). You
               | running the program in an environment without that is,
               | technically, just as wrong as running it on a system
               | where integer addition does multiplication.
               | 
               | Now perhaps this means that there are real rust
               | deployments that are "wrong", but that shouldn't include
               | regular sane standard systems, and embedded users should
               | know the tradeoffs.
               | 
               | https://godbolt.org/z/Y75KTT87M:
               |
               |     .LBB3_1:
               |             sub     rsp, 4096
               |             mov     qword ptr [rsp], 0
               |             cmp     rsp, r11
               |             jne     .LBB3_1
               | 
               | That's a loop at the start of your 'main' that probes the
               | stack specifically to ensure a segfault definitely
               | happens if your array didn't fit on the stack.
        
             | crotchfire wrote:
             | _things are still a bit messy with integer overflow_
             | 
             | No, they aren't messy, they're fully defined. Integer
             | overflow in safe Rust works just like Java: the numbers
             | wrap around. It's totally defined.
             | 
             | You can, optionally, also enable a runtime check for this
             | wraparound. This is usually enabled for debug builds. There
             | is of course a performance cost to doing this.
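
A hedged sketch of how plain `+` behaves under the two build settings. The helper names are hypothetical, and the example assumes the default unwinding panic strategy so that the overflow panic can be observed.

```rust
use std::panic;

/// Plain addition: wraps when overflow checks are off, panics when on.
fn add_one(x: i32) -> i32 {
    x + 1
}

/// Returns Ok(result) if the addition completed, Err(()) if the
/// overflow check panicked.
fn try_add_one(x: i32) -> Result<i32, ()> {
    panic::catch_unwind(|| add_one(x)).map_err(|_| ())
}

fn main() {
    match try_add_one(i32::MAX) {
        Ok(v) => println!("wrapped to {v}"),
        Err(()) => println!("overflow check panicked"),
    }
}
```

Either outcome is defined behavior; which one occurs is decided by the build configuration, not by the optimizer.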
        
             | deathanatos wrote:
             | Safe Rust has 0 UB, but the person you're responding to
             | qualified that with:
             | 
             | > _that must be enforced at non-zero runtime cost._
             | 
             | Some of Rust's behavior that permits this is not zero
             | runtime costs. Bounds-checking, for example, has a non-zero
             | cost: the bounds check!
             | 
             | Now, I _greatly_ prefer Rust's approach: I'd rather have
             | the marginal CPU cost: I value my time far higher, at least
             | until the profiler speaks up, and in many cases the
             | optimizer is pretty good at eliminating checks that I
             | myself might look at and go "but I _know_ $condition is
             | true here!" -- so too does the optimizer. (And these days,
             | Godbolt makes testing that very simple.)
             | 
             | And in the worst case, there's unsafe{}, and it is at least
             | explicitly labelled as such to the next reader.
             | 
             | > _things are still a bit messy with integer overflow_
             | 
             | Integer overflow is well-defined behavior, but the behavior
             | depends on compile settings.
             | (https://doc.rust-lang.org/book/ch03-02-data-types.html#integ...)
             | I do hope that someday, the debug behavior becomes _the_
             | behavior: IME most overflows are errors / the author did not
             | intend for it to occur. And the "panic on overflow" behavior
             | is removable by simply using one of the functions that
             | specifies an overflow behavior, in which case the author's
             | intention is just explicitly stated.
        
         | crotchfire wrote:
         | What they are listing is not the undefined behaviors, but
         | rather _the C++ language constructs which might lead to them_.
        
         | mort96 wrote:
         | There's already a bunch of stuff in the standard which
         | explicitly says, "if you do xyz, the behavior is undefined".
         | Stuff doesn't become "defined" just because the standard points
         | out that it's not defined.
        
         | pornel wrote:
         | It's a poor name all around, because the spec already precisely
         | defines boundaries of what is UB -- "if _this and this_
         | happens, the behavior is undefined".
         | 
         | These are precisely defined situations, that in modern
         | compilers' interpretation, are _forbidden_ from ever happening
         | in any program under any circumstances.
         | 
         | UB is only a name for nebulous consequences of violating these
         | very specific prohibitions.
        
       | j16sdiz wrote:
       | > We also will require future proposal authors to keep the annex
       | updated if they add or remove undefined behavior. They are making
       | undefined behavior an explicit topic to be discussed when
       | reviewing a proposal.
       | 
       | This is great.
        
       | wahern wrote:
       | Notably, the C standard _does_ have an undefined behavior annex,
       | Annex J.2, since C99. It's not mentioned in the post, and this
       | omission in tandem with repeated usage of "C and C++" at various
       | points might suggest otherwise to some people.
        
         | quelsolaar wrote:
         | And C has a UB study group and we are working on a technical
         | report that describes in great detail how UB works in C, and a
         | comprehensive list of sample code for all known UB.
        
           | tux3 wrote:
           | Is there hope that we could some day achieve an exhaustive
           | list of all undefined behavior that C programmers need to be
           | aware of?
           | 
           | I realize no one really knows today, but is there a path?
        
             | trws wrote:
             | No. It would require listing not just every place the
             | standard states undefined behavior is the result, but also
             | every corner case _not_ covered by the spec, or that is
             | under-specified. It's possible to document known instances,
             | but never to document all possible sources of undefined or
             | unspecified behavior.
        
       ___________________________________________________________________
       (page generated 2024-03-01 23:01 UTC)