[HN Gopher] Why do we need an Undefined Behavior Annex for C++
___________________________________________________________________
Why do we need an Undefined Behavior Annex for C++
Author : luu
Score : 47 points
Date : 2024-02-28 18:51 UTC (2 days ago)
(HTM) web link (community.intel.com)
| planede wrote:
| Great stuff. It looks like a ton of upfront work, but will
| greatly improve the standard, and shouldn't be too hard to
| maintain once it's complete.
| mike_hock wrote:
| Once it's complete, make it authoritative. Any mention of
| undefined behavior should be required to reference the
| corresponding item in the UB annex.
| marcosdumay wrote:
| And even better, start reducing the list after it's complete,
| by pushing things like integer overflow into
| implementation-defined behavior, several uses of the
| increment/decrement operators into compilation errors, etc.
| bluGill wrote:
| I don't want integer overflow to be implementation defined
| though, I want it undefined so my compiler can optimize my
| code for the fact that I don't overflow my integers in the
| first place.
| nlewycky wrote:
| The hard part of UB isn't where it's mentioned in the
| standard. The problem is when the standard says "in case A, X
| occurs, in case B, Y occurs" then somebody invents a
| situation where neither A nor B apply.
| SunlitCat wrote:
| I know, the goal is (in short) a listing of expected undefined
| behavior, which could improve security if developers follow
| along.
|
| But isn't putting undefined behavior into the standard an
| oxymoron, kinda?
| mike_hock wrote:
| No, because UB is not avoidable in principle unless you create
| a walled garden/sandbox that must be enforced at non-zero
| runtime cost.
| vlovich123 wrote:
| I don't know. Safe Rust has 0 UB, although if I recall
| correctly things are still a bit messy with integer overflow.
| I could imagine a world where we define annotations around
| "signed integer will never overflow" that you can add to hot
| paths but otherwise disallow optimizing around that UB, and
| for all other UB require an explicit annotation acknowledging
| it, or just emit a warning and miss the optimization instead.
|
| The current status quo is not the best.
| aw1621107 wrote:
| The key is "that must be enforced at non-zero runtime
| cost." Safe Rust is indeed generally free of UB, but that
| requires bounds checks at a minimum.
|
| > I could imagine a world where we define annotations
| around "signed integer will never overflow" that you can
| add to hot paths but otherwise disallow optimizing around
| that UB
|
| That's kind of what Rust does - overflow is not UB, but you
| can use unchecked_add/mul/div/sub to get UB-on-overflow if
| you explicitly want it.
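
To make the "overflow is not UB" point concrete, here is a minimal Rust sketch of the explicit overflow flavours available on integers, using only stable safe std methods. The helper name `add_modes` is made up for this illustration, and the unsafe `unchecked_*` variants are left out so the sketch stays entirely in safe code.

```rust
// Sketch: the overflow behaviours Rust lets you pick explicitly.
// `add_modes` is a hypothetical helper used only for this illustration.
fn add_modes(a: i32, b: i32) -> (Option<i32>, i32, i32) {
    (
        a.checked_add(b),    // None on overflow
        a.wrapping_add(b),   // two's-complement wraparound
        a.saturating_add(b), // clamps to i32::MIN / i32::MAX
    )
}

fn main() {
    let (checked, wrapped, saturated) = add_modes(i32::MAX, 1);
    println!("checked:   {:?}", checked);
    println!("wrapped:   {}", wrapped);
    println!("saturated: {}", saturated);
}
```

Each variant states the intended overflow behaviour at the call site, so none of the three results depends on build settings.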
| jeffbee wrote:
| How does Rust as a language avoid the "UB" traps that are
| fundamentally just compiler optimizations? For example in
| the case where you have an array of 8 elements, you
| access the element at index x, then (in program order)
| test whether x is less than 8. It seems that an
| optimizing compiler with a known-bits analysis could
| conclude that x is always less than 8, and remove that
| test. I don't know much about languages, so I am
| interested in how that trap is a consequence of C and
| avoided by Rust.
| aw1621107 wrote:
| The Rust compiler/stdlib inserts bounds checks for
| accesses that cannot be proven to be safe, so in this
| example you'd get a panic on the out-of-bounds access
| itself instead of hitting the assert afterwards.
|
| If you use get_unchecked() to intentionally bypass those
| bounds checks, the assert is indeed removed, but that
| requires an unsafe block.
|
| Speaking more generally, Rust gates UB behind unsafe
| blocks, so it's much harder to unintentionally hit UB
| than in C.
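
The "access first, test afterwards" example can be sketched in safe Rust roughly as follows. The function names are hypothetical and the array contents are arbitrary; the point is only that the indexing itself is the checked operation, so the later test can never observe an out-of-range `x`.

```rust
// Hypothetical helper mirroring the example in the thread:
// access element `x` first, test `x < 8` afterwards.
fn access_then_test(arr: &[u8; 8], x: usize) -> u8 {
    let value = arr[x]; // bounds-checked: panics if x >= 8
    // If execution gets here, x < 8 really holds, so an optimizer
    // may drop this assert without changing behaviour.
    assert!(x < 8);
    value
}

// Non-panicking alternative: an out-of-bounds index becomes None.
fn try_access(arr: &[u8; 8], x: usize) -> Option<u8> {
    arr.get(x).copied()
}

fn main() {
    let arr = [10u8, 11, 12, 13, 14, 15, 16, 17];
    println!("{}", access_then_test(&arr, 3));
    println!("{:?}", try_access(&arr, 8));
}
```

Calling `access_then_test(&arr, 8)` would panic at the indexing line, before the assert is ever reached.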
| Kranar wrote:
| Rust performs a runtime check to protect against out of
| bounds array accesses. In some cases those checks can be
| optimized out if the compiler can prove they're not needed.
| jeffbee wrote:
| This is my point precisely. Having proven to itself that
| 0 <= x < 8, the compiler could go on to remove any test
| for x < 8, right?
| heinrich5991 wrote:
| Yes. It even does so in some cases.
| sgift wrote:
| Yes, and the compiler does that. That's one of the
| reasons that iterators are preferred over for-loops with
| manual indexing in Rust. Compilers can, for very simple
| cases, determine that you will always stay within the
| bounds (as in your example, where it can find at compile
| time that the length will always be 8 and no access
| outside happens), but with iterators the compiler doesn't
| need to know the actual length and can always elide
| bounds checks.
| GrumpySloth wrote:
| Pedantic note: the compiler doesn't elide bounds checks in
| code using iterators. They're not there to begin with.
| Iterators in the standard library are implemented using
| unsafe code, which calls non-bounds-checking variants of
| the functions that get elements from collections. There
| is nothing left for the compiler to do here regarding
| bounds-checking.
| sgift wrote:
| Pedantic, but interesting - didn't know that. Thanks!
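
A small sketch of the two loop styles being compared, with made-up function names. Whether the optimizer actually removes the checks in the indexed version depends on the compiler and the surrounding code, so treat this as an illustration of the source-level difference only.

```rust
// Indexed loop: each `data[i]` is a bounds-checked access in the source,
// and it is up to the optimizer to prove the check redundant.
fn sum_indexed(data: &[u32]) -> u32 {
    let mut total = 0u32;
    for i in 0..data.len() {
        total = total.wrapping_add(data[i]);
    }
    total
}

// Iterator version: no index appears in user code, so there is no
// per-element bounds check to remove in the first place.
fn sum_iter(data: &[u32]) -> u32 {
    data.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

fn main() {
    let data = [1u32, 2, 3, 4];
    println!("{} {}", sum_indexed(&data), sum_iter(&data));
}
```

Both functions compute the same result; `wrapping_add` is used only so the sum has one defined behaviour in debug and release builds.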
| gizmo686 wrote:
| Rust performs runtime bounds checks on array access
| unless the compiler can prove that the index is in
| bounds. As such, removing the check is safe, because it
| is actually true that the program will never reach that
| line with an out of bounds array.
|
| This is in contrast to C, where the out of bounds access
| is merely undefined, so the compiler is allowed to have
| the program continue execution past it.
| vlovich123 wrote:
| There's lots of things beyond bounds checks that
| constitute UB and bounds checks are but 1 small item.
| Generally Rust avoids the runtime cost of that through a
| three-pronged strategy: idiomatic Rust amortizes the cost
| of a bounds check to 0 (e.g. efficient iterator
| implementations that don't need bounds checks), the
| compiler elides the check altogether if it can prove it's
| duplicate / unneeded for some reason, and unsafe Rust is
| provided as an escape hatch when you absolutely need it.
| C++ already has something similar via `.at` vs `[]`,
| although it's debatable whether the shorter notation
| should be the less safe but faster option, since it's
| likely the thing most people use by default. Explicit
| annotations are probably the better approach.
|
| It's also got a great ecosystem. I love the assume crate
| to do these annotations instead of writing unsafe code
| explicitly: assume!(unsafe: i < v.len())
|
| Now you've explicitly written an assumption that will
| cause the compiler to elide the bounds check in release
| mode but still assert it in debug mode (vs just doing
| unsafe & using variants that bypass the bounds check).
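
The general pattern of "state the assumption, check it in debug builds, trust it in release builds" can be sketched by hand with std only. This is not the `assume` crate's implementation, and `first_n_sum` is a hypothetical function invented for the example.

```rust
// Hand-rolled sketch of an explicitly stated assumption.
// `first_n_sum` is a hypothetical helper, not part of any crate.
fn first_n_sum(v: &[u32], n: usize) -> u32 {
    // One up-front check that stays in every build profile and
    // makes the unchecked accesses below sound.
    assert!(n <= v.len());
    let mut total = 0u32;
    for i in 0..n {
        // Debug builds re-verify the assumption on every iteration.
        debug_assert!(i < v.len());
        // SAFETY: i < n and n <= v.len() by the assert above.
        total = total.wrapping_add(unsafe { *v.get_unchecked(i) });
    }
    total
}

fn main() {
    println!("{}", first_n_sum(&[1, 2, 3, 4], 3));
}
```

The unsafe block is confined to one line with its justification next to it, which is the property the comment above is praising.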
| aw1621107 wrote:
| > There's lots of things beyond bounds checks that
| constitute UB and bounds checks are but 1 small item.
|
| Indeed, hence "at a minimum". The type system and
| choosing to avoid UB when defining some operations help
| a lot as well, but those have no direct overhead so
| there's no runtime cost there. Bounds checks (and
| overflow checks, if those are ever added to release mode)
| are just the one thing that has a direct runtime cost
| when they aren't elided.
| jpcfl wrote:
| > Safe rust has 0 UB
|
| Safe Rust aims for 0 UB, but I don't think you can make the
| claim that it absolutely has no UB.
|
| This program SEGFAULTs on my system (macOS), because it's
| reading an invalid memory address due to a stack overflow:
|     const N: usize = 1024*1024*1024;
|     fn main() {
|         let var: [u8; N] = [0; N];
|         println!("var: {:?}", var);
|     }
| Kranar wrote:
| Safe Rust has no undefined behavior. Undefined behavior
| does not mean no crashing, it means that the semantics of
| the program are undefined.
|
| Rust's semantics are to abort on a stack overflow.
| Languages like C or C++ have no such semantics; they may
| abort or they may continue running and produce
| gibberish.
| vlovich123 wrote:
| I don't know if it's technically UB or well defined. The
| crash is a SEGFAULT and not a panic/abort, but it's
| probably a SEGFAULT due to guard pages. Still, it's
| possible to evade guard pages so if you access var[X]
| such that X points to the heap, it's possible you're
| reading aliased memory which would be UB in safe Rust.
|
| EDIT: Going to take it back. I'm unable to create a
| situation where I create a large stack array that doesn't
| result in an immediate stack overflow. I even tried
| nightly MaybeUninit::uninit_array but that crashed
| explicitly with a "fatal runtime error: stack overflow"
| so it seems like the standard library has improved
| reporting instead of the old SEGFAULT. So no UB.
| Kranar wrote:
| Panics are not quite the same as an abort in Rust. Most
| notably a panic can be caught and execution can resume so
| as to gracefully terminate the application, but an abort
| is an immediate termination, a go to jail do not pass go
| kind of situation.
|
| An out of bounds access in Rust will result in a panic
| but a stack overflow is an abort.
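
The "a panic can be caught" half of this can be shown with `std::panic::catch_unwind`. This is a minimal sketch that assumes the default unwinding panic strategy (with `panic = "abort"` nothing can be caught), and `try_index` is a hypothetical helper name.

```rust
use std::panic;

// Hypothetical helper: returns Err if the indexing panicked.
fn try_index(v: &[i32], i: usize) -> Result<i32, String> {
    panic::catch_unwind(|| v[i])
        .map_err(|_| "caught out-of-bounds panic".to_string())
}

fn main() {
    let v = vec![1, 2, 3];
    println!("{:?}", try_index(&v, 1));
    // The out-of-bounds access panics, the panic is caught,
    // and execution continues normally afterwards.
    println!("{:?}", try_index(&v, 10));
    println!("still running after the panic was caught");
}
```

The default panic hook still prints its message to stderr for the second call. A stack overflow, by contrast, is not delivered as a panic, so this mechanism does not apply to it.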
| vlovich123 wrote:
| A segfault would imply it's not an abort either although
| it seems like it has been converted to a proper abort in
| newer versions of Rust.
| fweimer wrote:
| How does Rust implement this on targets where LLVM does
| not implement stack clash protection?
| jpcfl wrote:
| The fact that this program results in reading/writing an
| unmapped memory address means it's doing an out-of-bounds
| access. It segfaults on macOS because the runtime/OS has
| allocated the stack such that the overflow results in a
| bad memory access, but that is a behavior of the
| runtime/OS/hardware, not the language.
|
| I guarantee I could exploit this on a system that does
| not have virtual memory, or a runtime that does not have
| unmapped addresses at the end of the stack, to, say,
| manipulate the contents of another thread's stack.
| Therefore, this behavior is undefined.
| avgcorrection wrote:
| Report it.
| dzaima wrote:
| The language runtime can require that the OS & hardware
| always raise an exception on stack overflow (or,
| alternatively, compile in explicit checks for it). Running
| the program in an environment without that is,
| technically, just as wrong as running it on a system
| where integer addition does multiplication.
|
| Now perhaps this means that there are real Rust
| deployments that are "wrong", but that shouldn't include
| regular sane standard systems, and embedded users should
| know the tradeoffs.
|
| https://godbolt.org/z/Y75KTT87M:
|     .LBB3_1:
|         sub rsp, 4096
|         mov qword ptr [rsp], 0
|         cmp rsp, r11
|         jne .LBB3_1
|
| That's a loop at the start of your 'main' that probes the
| stack specifically to ensure a segfault definitely
| happens if your array didn't fit on the stack.
| crotchfire wrote:
| _things are still a bit messy with integer overflow_
|
| No, they aren't messy, they're fully defined. Integer
| overflow in safe Rust works just like Java: the numbers
| wrap around. It's totally defined.
|
| You can, optionally, also enable a runtime check for this
| wraparound. This is usually enabled for debug builds. There
| is of course a performance cost to doing this.
| deathanatos wrote:
| Safe Rust has 0 UB, but the person you're responding to
| qualified that with:
|
| > _that must be enforced at non-zero runtime cost._
|
| Some of the Rust behavior that permits this does not come
| at zero runtime cost. Bounds-checking, for example, has a
| non-zero cost: the bounds check!
|
| Now, I _greatly_ prefer Rust's approach: I'd rather have
| the marginal CPU cost. I value my time far higher, at least
| until the profiler speaks up, and in many cases the
| optimizer is pretty good at eliminating checks that I
| myself might look at and go "but I _know_ $condition is
| true here!" -- so too does the optimizer. (And these days,
| Godbolt makes testing that very simple.)
|
| And in the worst case, there's unsafe{}, and it is at least
| explicitly labelled as such to the next reader.
|
| > _things are still a bit messy with integer overflow_
|
| Integer overflow is well-defined behavior, but the behavior
| depends on compile settings.
| (https://doc.rust-lang.org/book/ch03-02-data-types.html#integ...)
| I do hope that someday, the debug behavior becomes _the_
| behavior: IME most overflows are errors / the author did not
| intend for it to occur. And the "panic on overflow" behavior
| is removable by simply using one of the functions that
| specifies an overflow behavior, in which case the author's
| intention is just explicitly stated.
| crotchfire wrote:
| What they are listing is not the undefined behaviors, but
| rather _the C++ language constructs which might lead to them_.
| mort96 wrote:
| There's already a bunch of stuff in the standard which
| explicitly says, "if you do xyz, the behavior is undefined".
| Stuff doesn't become "defined" just because the standard points
| out that it's not defined.
| pornel wrote:
| It's a poor name all around, because the spec already precisely
| defines boundaries of what is UB -- "if _this and this_
| happens, the behavior is undefined".
|
| These are precisely defined situations that, in modern
| compilers' interpretation, are _forbidden_ from ever happening
| in any program under any circumstances.
|
| UB is only a name for nebulous consequences of violating these
| very specific prohibitions.
| j16sdiz wrote:
| > We also will require future proposal authors to keep the annex
| updated if they add or remove undefined behavior. They are making
| undefined behavior an explicit topic to be discussed when
| reviewing a proposal.
|
| This is great.
| wahern wrote:
| Notably, the C standard _does_ have an undefined behavior annex,
| Annex J.2, since C99. It's not mentioned in the post, and this
| omission in tandem with repeated usage of "C and C++" at various
| points might suggest otherwise to some people.
| quelsolaar wrote:
| And C has a UB study group and we are working on a technical
| report that describes in great detail how UB works in C, along
| with a comprehensive list of sample code for all known UB.
| tux3 wrote:
| Is there hope that we could some day achieve an exhaustive
| list of all undefined behavior that C programmers need to be
| aware of?
|
| I realize no one really knows today, but is there a path?
| trws wrote:
| No. It would require listing not just every place the
| standard states undefined behavior is the result, but also
| every corner case _not_ covered by the spec, or that is
| under-specified. It's possible to document known instances,
| but never to document all possible sources of undefined or
| unspecified behavior.
___________________________________________________________________
(page generated 2024-03-01 23:01 UTC)