[HN Gopher] A Guide to Undefined Behavior in C and C++ (2010)
___________________________________________________________________
A Guide to Undefined Behavior in C and C++ (2010)
Author : tmalsburg2
Score : 52 points
Date : 2023-08-17 17:18 UTC (5 hours ago)
(HTM) web link (blog.regehr.org)
(TXT) w3m dump (blog.regehr.org)
| Joker_vD wrote:
| > Case 2: (b == 0) || ((a == INT32_MIN) && (b == -1))
|
| > A Java compiler, in contrast, has obligations in Case 2 and
| must deal with it (though in this particular case, it is likely
| that there won't be runtime overhead since processors can usually
| provide trapping behavior for integer divide by zero).
|
| Actually, there _will_ be runtime overhead on x86/x64: Java
| mandates that Integer.MIN_VALUE / (-1) evaluates to
| Integer.MIN_VALUE (see 15.17.2. "Division Operator /" of the Java
| Language Specification) but the IDIV instruction raises #DE in such
| circumstances. So the JITter actually emits
|
|         cmp  eax, 0x80000000
|         jne  .normalCase
|         xor  edx, edx
|         cmp  $reg, -1
|         je   .specialCase
|     .normalCase:
|         cdq
|         idiv $reg
|     .specialCase:
|
| code sequence, as you can see in its source ([0][1]), instead of
| the simplistic "cdq; idiv $reg", because it _does not_ want trapping
| behaviour in this particular case. AArch64, by contrast, traps on
| neither division by zero nor INT_MIN / -1. That's why accurately
| implementing your language's semantics on different platforms is
| so annoying and why the C standard left itself a nice shortcut.
|
| [0]
| https://github.com/openjdk/jdk/blob/d27daf01d6361513a815e783...
|
| [1]
| https://github.com/openjdk/jdk/blob/d27daf01d6361513a815e783...
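|
| The same two special cases can be handled by hand in C. The
| following is only a minimal sketch, assuming you want Java-like
| results; the name java_style_div is made up for illustration and
| is not taken from the JDK or the article.

```c
#include <limits.h>
#include <stdbool.h>

/* Divide a by b, giving Java-like results for the two cases that are
 * undefined in C. Returns false (and leaves *out untouched) when
 * b == 0, where Java would throw ArithmeticException. */
bool java_style_div(int a, int b, int *out)
{
    if (b == 0) {
        return false;            /* division by zero: report, don't trap */
    }
    if (a == INT_MIN && b == -1) {
        *out = INT_MIN;          /* Java defines this case to yield MIN_VALUE */
        return true;
    }
    *out = a / b;                /* every remaining case is well defined */
    return true;
}
```

| The two explicit branches play the same role as the compare-and-jump
| pair in the JIT sequence quoted above.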
| fluoridation wrote:
| On the other hand, C left the burden of implementing portable
| semantics to its users.
| Joker_vD wrote:
| Yes, but when C was being made, the application-level
| programmers knew the quirks of the platforms they used just
| as well as the compiler writers because they were almost
| precisely the same people.
| Animats wrote:
| The three big questions:
|
| 1. How big is it?
|
| 2. Who owns it?
|
| 3. Who locks it?
|
| Most undefined behavior in C/C++ involves those three questions.
|
| #1 is historically the most troublesome. And the most
| inexcusable. Pascal, which predates C, didn't have that problem,
| because arrays carried size info. Nor did Algol, Modula I, Modula
| II, and Modula III. Modula I was a very low level language -
| device registers were a language concept.
|
| Something I wrote on this back in 2012.[1] There was some
| consensus at the time that this would work and would be backwards
| compatible with C. But it would be a tough sell, and I didn't
| want to spend my life selling it.
|
| [1] http://animats.com/papers/languages/safearraysforc43.pdf
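|
| For readers wondering what "arrays carrying size info" can look
| like in plain C today, here is a rough sketch. It is not the scheme
| from the linked paper; int_slice, slice_get and slice_set are
| hypothetical names chosen for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* A pointer that travels together with its element count. */
typedef struct {
    int   *data;
    size_t len;
} int_slice;

/* Bounds-checked read: returns false instead of reading out of range. */
bool slice_get(int_slice s, size_t i, int *out)
{
    if (i >= s.len) {
        return false;
    }
    *out = s.data[i];
    return true;
}

/* Bounds-checked write: returns false instead of writing out of range. */
bool slice_set(int_slice s, size_t i, int value)
{
    if (i >= s.len) {
        return false;
    }
    s.data[i] = value;
    return true;
}
```

| Nothing stops a caller from building an int_slice with the wrong
| length, so this answers "how big is it?" only by convention.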
| jll29 wrote:
| ...and Ada, too. I like the idea of attributes of data objects:
| to access the size of x, simply write x'Size (also for types,
| e.g. Natural'Size).
|
| The Wirth languages (from which Ada is also a descendant) were
| so much more readable than C, yet relatively capable for
| systems programming, as demonstrated by systems like TeX,
| MacOS, Wirth's Modula compilers and the OS for the Lilith
| workstation he co-designed from scratch.
| Gibbon1 wrote:
| Never used Ada but I think you can define range types so int
| range 0...11. Which I feel is something that you really want
| in embedded and applications level programming.
| thesuperbigfrog wrote:
| >> Never used Ada but I think you can define range types so
| int range 0...11.
|
| Yes. Ada supports integral types with custom ranges:
|
| https://learn.adacore.com/courses/intro-to-ada/chapters/stro...
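|
| C has no range types, but the idea can be approximated with a
| checked constructor. This is a sketch only; month_index and
| month_index_make are invented names, and unlike Ada nothing in C
| prevents code from writing to the field directly.

```c
#include <stdbool.h>

/* A value meant to be created only through month_index_make, so a
 * month_index obtained that way is known to be in 0..11. */
typedef struct {
    int value;
} month_index;

/* Returns false, leaving *out untouched, when raw is outside 0..11. */
bool month_index_make(int raw, month_index *out)
{
    if (raw < 0 || raw > 11) {
        return false;
    }
    out->value = raw;
    return true;
}
```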
| tialaramex wrote:
| In the medium-long term I want to do this for Rust as
| "Pattern types" because the thing I actually want (custom
| types with niches) is gated on Pattern types, as the way to
| explain to the type system where the niche goes is a
| Pattern. I was persuaded that we can't/shouldn't just say
| we'll half ass it, we must do it properly if we're doing
| it.
|
| e.g. I don't necessarily have a use for an integer from 0
| to 11, but I _do_ see a use for BalancedI8, a one byte type
| with values -127 to +127 via 0, thus omitting -128. I
| reckon lots of people don't need -128, whereas a niche is
| very useful. Rust provides NonZeroI8, which has -128
| through +127 but no zero, but I find that's less often what
| you want, and it's not today possible to make your own in
| stable Rust (and in nightly Rust you need a not-for-mortals
| perma-unstable attribute today).
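|
| To make the "niche" idea concrete for non-Rust readers: the unused
| bit pattern can stand for "no value", so an optional fits in the
| same single byte. Below is a hand-rolled C sketch of that layout
| trick, with hypothetical names; in Rust the compiler would do this
| automatically for Option<BalancedI8>.

```c
#include <stdbool.h>
#include <stdint.h>

/* One byte holding either "nothing" or a value in -127..127.
 * INT8_MIN (-128) is the niche: it never represents a value. */
typedef struct {
    int8_t raw;
} opt_balanced_i8;

/* The empty state is encoded as the niche value. */
opt_balanced_i8 opt_none(void)
{
    return (opt_balanced_i8){ INT8_MIN };
}

/* Store v if it lies in -127..127; otherwise return false. */
bool opt_some(int v, opt_balanced_i8 *out)
{
    if (v < -127 || v > 127) {
        return false;
    }
    out->raw = (int8_t)v;
    return true;
}

/* Retrieve the value, or return false if the slot is empty. */
bool opt_get(opt_balanced_i8 o, int8_t *out)
{
    if (o.raw == INT8_MIN) {
        return false;
    }
    *out = o.raw;
    return true;
}
```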
| winrid wrote:
| #4 which is partly #2 - what thread is this callback being
| invoked in? The calling thread? A thread pool in the library?
|
| Mostly a problem I have in java libraries, though.
| JonChesterfield wrote:
| I think a C implementation with overhead instead of UB is
| implementable. I'd like to know what the fundamental
| performance delta we get from UB is. Likewise not sure it's the
| right choice for my life's work.
| Quekid5 wrote:
| The MINIMUM baseline is probably somewhere around
| ASAN/UBSAN/etc. and those aren't exactly cheap... and they
| don't even promise to catch _all_ the problems. The problem
| is that almost every single little thing you can do in C has
| _potential_ for UB, even just the + operator.
|
| So it would absolutely come at a HUGE performance cost,
| unfortunately.
|
| More esoteric stuff is: If you do pointer arithmetic that
| technically goes out of bounds and then _in_ bounds again...
| that's technically UB (can't remember if this is C++ only or
| both), so you can't rely on knowing where everything is +
| bounds checks.
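|
| As an illustration of the cost involved, this is roughly what a
| UB-free signed + has to do when no hardware trap is available. A
| minimal sketch; checked_add is a made-up helper, not a standard
| library function.

```c
#include <limits.h>
#include <stdbool.h>

/* Add a and b without ever evaluating an overflowing expression.
 * Returns false (leaving *out untouched) if the sum would not fit. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;            /* a + b would overflow */
    }
    *out = a + b;                /* safe: result is representable */
    return true;
}
```

| The range test happens before the addition, because checking the
| result afterwards would already have invoked the UB.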
| matt3210 wrote:
| What behaviors are undefined in rust? Oh wait nobody knows, since
| it has no standard or language spec.
| jcranmer wrote:
| * Reading uninitialized memory
|
| * Violating pointer provenance
|
| * Out-of-bounds pointer accesses (though unlike C, I think,
| it's legal to make a pointer go out-of-bounds and bring it back
| in-bounds and use it)
|
| * Use-after-lifetime
|
| * Storing trap representations in variables
|
| * Having two mutable references to the same memory location
|
| * Data races
|
| Not an exhaustive list, and C has most of these (even the last
| one, although change "two mutable references" to "two restrict
| pointers"). Of course, C itself doesn't have an exhaustive list
| (J.2 is not, in fact, an exhaustive list).
| JonChesterfield wrote:
| Pointer provenance is a nice example. A block of memory
| cannot be read as an array of simd types sometimes and scalar
| types otherwise. It can't contain atomic values which are
| operated on using non-atomic operations during program
| startup before you spawn any threads.
|
| There were proposals to let one mmap existing structures but
| I don't know if any landed. Usually done with reinterpret
| cast and hoping that rule violation doesn't break you.
|
| Pointer provenance does make most application code faster but
| other times it opens a performance gap that you have to step
| outside of C++ to close. Compiler extensions, switching off
| the analysis, changing language.
| angiosperm wrote:
| Use of mmap itself is undefined in the language.
|
| Posix provides a definition that programs rely on, instead.
| Implementers are allowed to define literally anything the
| union of all standards leaves undefined.
| JonChesterfield wrote:
| Mmap itself is alright. You've got a void* from
| somewhere, that's OK. You can placement new into it to
| make objects.
|
| What isn't allowed is casting it to a hashtable type and
| then using it as such. Because there is no hashtable
| instance anywhere, and specifically not there, so you've
| violated the pointer aliasing rules.
|
| The obvious fix is to guarantee that placement new
| doesn't change the bytes, perhaps only for trivially
| copyable types or similar constraint. I didn't see the
| proposals in that direction land but also didn't see them
| fail, so maybe the newer standard permits it.
| LegionMammal978 wrote:
| As I understand it, that's precisely what
| std::start_lifetime_as<T>() does: it effectively performs
| a placement new to create a T object, except that it
| retains the existing bytes at the address. It only works
| with implicit-lifetime types (i.e., scalars, or classes
| with a trivial constructor), though, so it probably
| wouldn't work with your hash table example, except
| perhaps for an inline hash table.
| JonChesterfield wrote:
| Superb! Looking through
| https://en.cppreference.com/w/cpp/memory/start_lifetime_as,
| this appears to be the right
| thing. It also has volatile overloads (which it looks
| like placement new still does not). This doesn't appear
| to be implemented in libc++ yet but that seems fixable,
| it'll go down the same object construction logic
| placement new does. Thank you for the reference, that'll
| fix some ugly edge cases in one of my libraries.
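|
| Where start_lifetime_as is not available, a common fallback that
| is valid in both C and C++ is to memcpy the bytes into a properly
| typed object rather than casting the buffer pointer. The sketch
| below assumes a hypothetical two-field record named entry; it
| copies the data instead of viewing it in place, so it is not a
| drop-in replacement for the mmap use case discussed above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout: two 32-bit fields, no padding. */
typedef struct {
    uint32_t key;
    uint32_t value;
} entry;

/* Copy the idx-th record out of a raw byte buffer (for example one
 * obtained from mmap) instead of casting the buffer to entry*. */
entry read_entry(const unsigned char *bytes, size_t idx)
{
    entry e;
    memcpy(&e, bytes + idx * sizeof e, sizeof e);
    return e;
}
```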
| agalunar wrote:
| > A block of memory cannot be read as an array of simd
| types sometimes and scalar types otherwise.
|
| As far as I can tell, it is _currently_ the case that,
| _using raw pointers,_ this is not actually undefined
| behavior (but I never entirely trust my conclusions on
| these matters).
|
| "&mut T and &T follow LLVM's scoped noalias model"
| [1][referring to 2 and 3] but I am fairly sure this does
| not currently apply to raw pointers, and "provenance is
| implicitly shared with all pointers transitively derived
| from the original pointer through operations like offset,
| borrowing, and pointer casts." [4]
|
| [1] https://doc.rust-lang.org/reference/behavior-considered-unde...
|
| [2] https://llvm.org/docs/LangRef.html#pointeraliasing
|
| [3] "noalias" under
| https://llvm.org/docs/LangRef.html#parameter-attributes
|
| [4] https://doc.rust-lang.org/core/ptr/index.html
|
| Also excellent are
|
| https://faultlore.com/blah/fix-rust-pointers
|
| https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html
|
| https://www.ralfj.de/blog/2020/12/14/provenance.html
|
| https://www.ralfj.de/blog/2022/04/11/provenance-exposed.html
|
| It seems likely you'd already be familiar with these; I'm
| just putting them out there for anyone interested.
| JonChesterfield wrote:
| LLVM can represent various aliasing relationships, modulo
| some risk of C++ inspired bugs in some passes. They might
| all be stamped out now. I remember a bug report about one
| that was open for many years.
|
| I'm happy to hear rust can (probably) represent the same
| relationships LLVM can. C++ cannot, at least as of about
| two years ago when I last looked through the
| corresponding papers. All it can do is different types do
| not alias, where atomic_int and int are different types.
| proto_lambda wrote:
| There is no undefined behaviour in Safe Rust. You're right
| about Unsafe Rust of course.
| lionkor wrote:
| The ultimate "the code is the documentation" is "the compiler
| is the language spec".
| thesuperbigfrog wrote:
| >> The ultimate "the code is the documentation" is "the
| compiler is the language spec".
|
| Rust has a great potential to become a replacement for C and
| C++, but the lack of a language specification is a
| shortcoming that needs to be addressed for it to see wider
| adoption, especially for safety-critical systems.
|
| If the Rust compiler does something surprising, people will
| ask, "Is this a bug?" and without a spec the answer becomes
| the language developers or the community asking, "What should
| the compiler do in this situation?".
|
| It makes sense because the correct behavior (whatever that
| is) has not been defined, but it has a feeling of "we are
| making this up as we go along" because there is no formalized
| answer defined. While this approach is fine for running your
| website or building a command line tool, it is not acceptable
| for safety-critical software. If the software breaks and
| people die, the "we are making this up as we go along"
| approach is not acceptable because it has too much risk.
| lionkor wrote:
| I fully agree, and it's definitely a strange feeling coming
| from C++ to not have a single, complete and extensive spec
| to read up on if all else fails.
|
| I want to like Rust, but it's already such a kitchen sink, on
| par with C++ in complexity and misused quirks (not to mention
| macros, which hide complexity just like C macros did), that
| the lack of a committee and spec makes it very difficult to
| trust that it won't get more and more features as time goes
| on (becoming like C++, in only the bad ways).
|
| I understand they have an RFC process, but that's not enough
| for a language which is now so commonplace in discussion
| (usually in the form of "if you did it in Rust, this
| problem wouldn't exist", which is often even true).
| iknowstuff wrote:
| Rust macros don't hide anything. They're hygienic and
| clearly annotated when used.
| mike_hock wrote:
| Rust macros are a crutch to work around the language's
| shortcomings. It's just a better crutch than C's.
| iknowstuff wrote:
| >a shortcoming that needs to be addressed for it to see
| wider adoption, especially for safety-critical systems.
|
| This seems like just a hunch of yours that does not seem to
| be reflected by the real world.
| thesuperbigfrog wrote:
| >> This seems like just a hunch of yours that does not
| seem to be reflected by the real world.
|
| What safety-critical systems are written in Rust?
|
| Where can I buy a validated Rust toolchain for safety-
| critical work?
|
| Ferrocene is an effort to build a safety-critical Rust,
| but it is not done yet:
|
| https://ferrous-systems.com/blog/ferrocene-update/
| mjw1007 wrote:
| The good news is that the Rust project has recently agreed
| to write a specification, and has a budget to hire an
| editor for it.
|
| The less good news is that it's likely to take a long time
| before anything resembling a complete description gets
| written.
|
| You can follow its status at
| https://github.com/rust-lang/rust/issues/113527
| thesuperbigfrog wrote:
| >> The good news is that the Rust project has recently
| agreed to write a specification, and has a budget to hire
| an editor for it.
|
| This is awesome to hear. Following that issue . . .
| zer8k wrote:
| > In the long run, unsafe programming languages will not be used
| by mainstream developers, but rather reserved for situations
| where high performance and a low resource footprint are critical.
|
| I see no world where so-called "unsafe" languages would not be
| used. Most graduates of Computer Science programs can, perhaps
| with some trouble, implement a half decent C compiler in a
| weekend or two. This is not a footnote. This fact alone means
| that for any given piece of hardware you're more likely to find a
| random C compiler you can use than anything else. Rust, being the
| most likely contender to replace it, still cannot self-host and
| the grammar is exponentially more complicated than C. It is more
| likely that C + <whatever> will co-exist peacefully than something like
| C being replaced (even ignoring the millions of lines of code
| that already exist). Not for performance reasons but more that
| you can churn out a C compiler quickly for almost anything given
| a spec of the hardware.
|
| On topic, I find a desk reference for this is very useful. The
| CERT C standard is pretty good to thumb through even if you don't
| adhere to every suggestion.
| pjmlp wrote:
| Just wait until CVEs become a liability like handling hazardous
| chemicals.
| ladberg wrote:
| Eh, I don't disagree that unsafe languages will continue to be
| used, but I disagree with ease of compiler design as the
| reason.
|
| You are comparing one of the easier languages to write a
| compiler for (C) with one of the hardest (Rust), and that's not
| due to UB but due to other facets of the languages. I could
| make up a new language that's equivalent to C in every way
| except replace all UB with defined behavior and it wouldn't
| make the naive compiler any different.
|
| Additionally, writing a compiler for a language should really
| be a thing that happens only a handful of times while executing
| the code happens trillions of times so I hope we don't
| sacrifice safety to save compiler authors some work.
| dralley wrote:
| > Rust, being the most likely contender to replace it, still
| cannot self-host
|
| What do you mean, "still cannot self-host?"
|
| You say that like it's a critical failure of the Rust project
| that they need and are attempting to address rather than a
| trivia item. Rust is perfectly happy relying on LLVM just like
| (checks notes) _half the other languages in existence_.
|
| Libraries like LLVM are precisely what the comment you quote is
| talking about.
|
| I'm not even sure that's true, anyway, with the cranelift
| backend. Someone can chime in on whether it's good enough for
| bootstrapping.
| merlincorey wrote:
| Self Hosting your own compiler traditionally was the "end-
| game" of making a compile-able language. It's a sort of proof
| of fitness that the language can literally stand on its own.
|
| This article about Zig achieving self-hosted status in
| 2022[0] points out that they gained many advantages at the
| cost of a lot of time and effort through this process.
| Incidentally, they decided to self-host while also supporting
| LLVM because of deficiencies in LLVM (mainly speed and target
| limitations). This flexibility includes a separate "C"
| backend to compile Zig to C in order to target for example
| game consoles that require a specific C compiler be used.
|
| > You say that like it's a critical goal of the Rust project
| rather than a trivia item.
|
| In my opinion, you are overly minimizing the potential
| benefits to Rust and the Rust community for Rust to be self-
| hosted.
|
| Of course, practically, right now it doesn't matter because
| most people are more than happy to use the already working
| system.
|
| [0] https://kristoff.it/blog/zig-self-hosted-now-what/
| dralley wrote:
| As I said, the cranelift backend exists, and it provides
| many of the same benefits such as improved compilation
| speed. And it's written in Rust.
|
| But it still feels like a trivia item. C compilers written
| in C exist, but almost nobody actually uses them. They use
| GCC, Clang, and MSVC, written in C++. Everybody knows that
| it's possible to self-host C, so the benefit of actually
| doing so in practice is minimal.
|
| It's obviously possible to write a Rust compiler in Rust
| end-to-end. Acting like it's a second tier language because
| actively doing so is not a top focus of the community is
| gatekeep-y and ridiculous.
| merlincorey wrote:
| > Acting like it's a second tier language because
| actively doing so not a top focus of the community is
| gatekeep-y and ridiculous.
|
| Here's where I think you are quite a bit off target,
| personally.
|
| I certainly was not and I don't believe the GP you
| originally responded to was saying that "Rust is a second
| tier language due to [lack of self-hosted compiler]", so
| hopefully we can set that statement aside and ignore it
| now.
|
| Let's instead focus on your first statement, which is
| directly related to what GP and I were arguing:
|
| > It's obviously possible to write a Rust compiler in
| Rust end-to-end.
|
| It is certainly possible but actually doing so is
| completely non-obvious because the grammar for Rust is
| much more complicated than C, and Rust has no formal
| language specification (let alone an international
| standard).
|
| While Python does not have an international standard, it
| does have a formal language specification, which is what
| allows for things like PyPy to exist.
|
| Meanwhile, to truly understand Rust, one must be an
| expert in C and learn the `rustc` code base.
|
| It seems like, practically, knowing C and being able to
| write compilers in C is quite useful if you want to make
| an impact in Rust or maybe try your hand at making some
| future Rust replacement (hopefully with a language
| specification that others can follow).
| dralley wrote:
| > It is certainly possible but actually doing so is
| completely non-obvious because the grammar for Rust is
| much more complicated than C, and Rust has no formal
| language specification (let alone an international
| standard).
|
| The Rust compiler frontend is written in Rust. It doesn't
| matter how non-trivial writing a Rust frontend is if you
| can restrict the problem domain to writing a new backend
| for the existing compiler frontend.
|
| And you can. As it stands there is the LLVM backend that
| everyone is familiar with, the GCC backend which is
| nearing completion, and the Cranelift backend which is
| written in Rust.
|
| Zig is similar. Yes, they are going to replace LLVM by
| default, but they're not getting rid of their LLVM
| backend entirely. The main difference between Rust and
| Zig here is a matter of defaults, where Rust defaults to
| using LLVM while Zig will default to their self-hosted
| compiler.
|
| > Meanwhile, to truly understand Rust, one must be an
| expert in C and learn the `rustc` code base.
|
| Are you under the impression that the "rustc" codebase is
| written in C/C++? It is not... It uses LLVM, yes, but
| it's written in Rust.
|
| > I certainly was not and I don't believe the GP you
| originally responded to was saying that "Rust is a second
| tier language due to [lack of self-hosted compiler]", so
| hopefully we can set that statement aside and ignore it
| now.
|
| The discussion started with the statement that Rust will
| never replace unsafe languages without the ability to
| self-host, and then continued with the statement that
| "Self Hosting your own compiler traditionally was the
| "end-game" of making a compile-able language. It's a sort
| of proof of fitness that the language can literally stand
| on its own."
|
| I don't think that was a completely unfair reading of
| these statements. The implication is that Rust is "not a
| fit language" because it "cannot stand on its own" and
| therefore "will never replace unsafe languages".
| zer8k wrote:
| > I don't think that was a completely unfair reading of
| these statements. The implication is that Rust is "not a
| fit language" because it "cannot stand on its own" and
| therefore "will never replace unsafe languages".
|
| I didn't intend this. The primary gripe I had was the
| grammar being complicated (and to be fair...not really
| available in an easy way). That means the places we are
| most likely to see such bare metal shenanigans may not adopt
| it because they can't draft an XYZ Co. compiler. This is a
| semi-common pattern with chip manufacturers.
|
| The conversation diverged after that. Self-hosting is
| simply a signal that a language is "strong enough to
| stand on its own". That doesn't mean non-self hosted
| languages are bad. It just means you still need something
| else to bootstrap it. In the land of bare metal stuff
| like this matters.
| merlincorey wrote:
| > Zig is similar. Yes, they are going to replace LLVM by
| default, but they're not getting rid of their LLVM
| backend entirely.
|
| In the article I linked, they did not say they were
| replacing LLVM by default, but they did say it would
| become the default for DEBUG builds due to the faster
| speed of compilation, to be clear.
|
| > > Meanwhile, to truly understand Rust, one must be an
| expert in C and learn the `rustc` code base.
|
| > Are you under the impression that the "rustc" codebase
| is written in C/C++? It is not... It uses LLVM, yes, but
| it's written in Rust.
|
| I am not under that impression, but I can see how my
| phrasing leads to that conclusion.
|
| After reviewing Rust's Bootstrap on Github[0] I can now
| more precisely state that one's understanding of low-level
| Rust will be enhanced by knowing C/C++ (for the LLVM
| portions) as well as Python (for the "Rust does not exist
| on this system" case, where the stage0 Cargo and Rust
| compiler binaries are downloaded from somewhere else).
|
| > Cranelift backend which is written in Rust
|
| When this happens, it seems like it'll be possible to get
| the LLVM bits out of the bootstrap process and lead to a
| fully self-hosted Rust.
|
| So while you may not personally value that, it seems like
| some people in the Rust community do.
|
| [0] https://github.com/rust-lang/rust/tree/master/src/bootstrap
| LegionMammal978 wrote:
| > When this happens, it seems like it'll be possible to
| get the LLVM bits out of the bootstrap process and lead
| to a fully self-hosted Rust.
|
| What do you mean by "when this happens"? GP's point is
| that this has _already_ happened: the Cranelift backend
| is feature-complete from the perspective of the language
| [0], except for inline assembly and unwinding on panic.
| It was merged into the upstream compiler in 2020 [1], and
| a Cranelift-based Rust compiler is perfectly capable of
| building another Rust compiler (with some config
| changes).
|
| [0] https://github.com/bjorn3/rustc_codegen_cranelift
|
| [1] https://github.com/rust-lang/rust/pull/77975
| zer8k wrote:
| Except gluing yourself to LLVM has its own problems.
| Like, for example, any platform that LLVM doesn't support
| you can't support either. LLVM is great. The monoculture
| and smug elitism it produces is not.
|
| > Acting like it's a second tier language because
| actively doing so not a top focus of the community is
| gatekeep-y and ridiculous.
|
| It is probably one of the major reasons we won't see a
| Rust compiler shipped with an operating system for a very
| long time. That doesn't make it second tier. However,
| Rust fans seem to want to stick their head in the sand
| when their baby is criticized. I am a Rust (language) fan
| myself. I am just willing to criticize the language. I do
| not understand why the Rust community has such a volatile
| response to honest, valid, criticism.
| learn-forever wrote:
| it's a ridiculous criticism, and the insult doesn't make
| it less ridiculous
| dralley wrote:
| >It is probably one of the major reasons we won't see a
| Rust compiler shipped with an operating system for a very
| long time.
|
| Even most linux distros don't ship with GCC out of the
| box... much less MacOS and Windows with their respective
| compilers.
|
| If your standard is "Gentoo and FreeBSD will never ship
| it out of the box" then I'm going to 100% stand by my
| statement that this is weird and gatekeep-y.
|
| Especially when the Windows kernel and userspace system
| libraries both have Rust in them.
|
| https://www.bleepingcomputer.com/news/microsoft/new-windows-...
|
| https://www.thurrott.com/windows/282471/microsoft-is-rewriti...
| tialaramex wrote:
| > we won't see a Rust compiler shipped with an operating
| system for a very long time.
|
| I can't figure out what this constraint means.
|
| My Windows laptop doesn't seem to have provided a C
| compiler, so, maybe that's a problem for Windows?
|
| Huh, well I guess I can buy or download a third party
| compiler, that's easy enough, but then, I can do that for
| Rust too, so, doesn't seem like a difference.
|
| Meanwhile on this Fedora machine, the Rust compiler came
| with the OS. So, is this not an operating system? Maybe
| the stuff it comes with isn't "shipped with" it somehow?
| And so there's no C compiler "shipped with" this
| operating system either, although GCC was installed too?
| I just don't know what to make of such a criticism.
| patrec wrote:
| > Most graduates of Computer Science programs can, perhaps with
| some trouble, implement a half decent C compiler in a weekend
| or two.
|
| Where "most" of course means < 0.1%.
| badsectoracula wrote:
| > Most graduates of Computer Science programs can, perhaps with
| some trouble, implement a half decent C compiler in a weekend
| or two. This is not a footnote. This fact alone means that for
| any given piece of hardware you're more likely to find a random
| C compiler you can use than anything else.
|
| I think C being a (relatively) very simple language is indeed a
| feature. However, that is not so much because you can make a
| compiler for it easily (not that it isn't a pro, but it isn't
| that important in practice) but because it means it is easier to
| learn and easier to write tools for.
| dale_glass wrote:
| I don't see how that reasoning is supposed to work in modern
| times.
|
| Who out there is seriously using a compiler churned out in a
| weekend? The fact that you can do it doesn't mean anybody
| seriously would use that.
|
| We're also not really creating architectures anymore. There's
| RISC-V, and Rust already supports that.
| zer8k wrote:
| > Who out there is seriously using a compiler churned out in
| a weekend?
|
| Someone at a chip manufacturer writing something for a brand
| new chipset, for example. It takes a long time to get stuff
| shoved into GCC. Only in recent history has life settled
| on one or two "big" compilers. There are still plenty of
| other places where you will find bespoke compilers. Perhaps
| not commonly, but they do exist (especially in embedded).
| zabzonk wrote:
| perhaps it is just me, but i have never experienced any of the
| problems outlined in the comments here, despite writing a
| shedload of C and C++ code (and fortran, assembler and other
| stuff). and i don't think i am a coding god.
| AnimalMuppet wrote:
| (2010)
| dang wrote:
| Added. Thanks!
| dang wrote:
| Related:
|
| _A Guide to Undefined Behavior in C and C++ (2010)_ -
| https://news.ycombinator.com/item?id=18372613 - Nov 2018 (103
| comments)
|
| _A Guide to Undefined Behavior in C and C++ (2010)_ -
| https://news.ycombinator.com/item?id=9884074 - July 2015 (10
| comments)
|
| _A Guide to Undefined Behavior in C and C++, Part 1_ -
| https://news.ycombinator.com/item?id=2544159 - May 2011 (2
| comments)
| jbandela1 wrote:
| If you want some nice examples of how undefined behavior results
| in weirdness, see
| https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...
|
| An interesting example from there is how the compiler can turn
|
|     int table[4];
|     bool exists_in_table(int v) {
|         for (int i = 0; i <= 4; i++) {
|             if (table[i] == v) return true;
|         }
|         return false;
|     }
|
| Into:
|
|     bool exists_in_table(int v) {
|         return true;
|     }
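|
| For comparison, the loop the author presumably intended stops at
| index 3. A sketch of the corrected function follows; TABLE_LEN is
| a name introduced here so the bound and the array size cannot
| drift apart, and it does not appear in the linked post.

```c
#include <stdbool.h>
#include <stddef.h>

#define TABLE_LEN 4   /* hypothetical name for the array length */

int table[TABLE_LEN];

bool exists_in_table(int v)
{
    /* i < TABLE_LEN, not i <= TABLE_LEN: valid indices are 0..3. */
    for (size_t i = 0; i < TABLE_LEN; i++) {
        if (table[i] == v) {
            return true;
        }
    }
    return false;   /* now reachable without any out-of-bounds read */
}
```

| With the in-bounds loop there is a defined path to "return false",
| so the compiler can no longer fold the function to "return true".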
| fluoridation wrote:
| What's odd about that example is that the optimization is only
| valid if the loop in fact overflows the array every time. So
| the compiler is proving that the array is being overflowed and
| rather than emitting a warning to that effect, it generates
| absurd code.
| kllrnohj wrote:
| > So the compiler is proving that the array is being
| overflowed and rather than emitting a warning to that effect
|
|     <source>:5:13: warning: unsafe buffer access [-Wunsafe-buffer-usage]
|         5 |         if (table[i] == v) return true;
|           |             ^~~~~
|
| https://godbolt.org/z/zGxnKxvz6
|
| This one is weirdly hard to get a compiler warning out of,
| which is a fair critique, but so many of the "Look what the
| compiler did to my UB silently!" issues are not at all silent
| and would have been stopped dead with "-Wall -Wextra -Werror".
| iainmerrick wrote:
| As noted elsewhere in this thread, GCC by default does the
| "optimization" and doesn't warn. No doubt there are other
| examples where Clang is the one that misbehaves.
|
| How are we supposed to know whether our code is being
| compiled sensibly or not, without poring over the
| disassembly? Just set all the warning flags and hope for
| the best?
| UncleMeat wrote:
| I think that a big problem is that for every compile that
| seems "not sensible" and is actually not sensible, there
| are 100s or 1000s of compiles that would look absolutely
| insane to a human but are actually exactly what you want
| when you sit down and think about it for a long time.
|
| Almost all of the "don't do the overly clever stuff!"
| proposals would throw away a huge amount of actually
| productive clever stuff.
| fluoridation wrote:
| I think what the GP means by "not sensible" is that
| proving that the code is broken in order to silently
| optimize it more aggressively is not sensible. If your
| theorem prover can find a class of bugs then have it emit
| diagnostics. Don't _only_ use those bugs to make the code
| run faster. Yes, make the code run faster, but let me
| know I may be doing something nonsensical, since chances
| are that it is nonsensical and it doesn't cost anything
| at run time.
| UncleMeat wrote:
| Right and the next part is the hard part: defining this
| clearly. What I'm saying is that there is a surprising
| amount of "wait, actually I do want that" when you dig
| into this proposal.
| mike_hock wrote:
| A warning is only useful if it prescribes a code
| transformation that affirms the programmer's intent and
| silences the warning (unless the warning was a true
| positive and caught a bug). You cannot simply emit a
| warning every time you optimize based on UB.
|
| There is no `if(obvious out-of-bound access) silently
| emit nonsense har har har` in the compiler's source code.
| The compiler doesn't understand intent or the program as
| a whole. It applies micro transformations that all make
| sense in isolation. And while the compiler also tries to
| detect erroneous programming patterns and warn about
| those, that's exceedingly more difficult.
| moefh wrote:
| > whether our code is being compiled sensibly or not
|
| I'm failing to see what's not sensible about how that
| code is compiled.
|
| The only possible way that function could return false is
| if you read past the end of the array and the value there
| happens to be different from `v`. Is it really more
| sensible to rely on that, rather than fixing a known
| behavior in case of array overflow?
| robinsonb5 wrote:
| If the compiler's going to interpret undefined behaviour
| as license to do something that runs counter to the
| programmer's expectations, the most sensible course of
| action is for the compiler to yell very loudly about it
| instead of near-silently producing (differently!) broken
| code.
|
| Currently that piece of code doesn't trigger a warning
| with -Wall. It's not even flagged with -Wextra - it needs
| -Weverything.
| moefh wrote:
| One man's "broken code produced by the compiler" is
| another man's "excellently optimized code by the
| compiler".
|
| Where to draw the line is not always clear, but here's a
| very clear-cut example[1] where emitting a warning would
| be bad. If you don't want to watch the video, it's
| basically this:
|
| - the code technically contains undefined behavior, but
| it will never be actually triggered by the program
|
| - changing the code to remove undefined behavior forces
| the compiler to emit terrible code
|
| Making the compiler yell at the programmer in this case
| would be terrible, but it's clearly a consequence of what
| you're asking.
|
| [1] https://youtu.be/yG1OZ69H_-o?t=2358
| fanf2 wrote:
| No, the logic for the optimization is:
|
| - a correct program does not access table[4]
|
| - therefore the loop must always exit early
|
| - the only way to exit early is to return true
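|
| For contrast, a minimal sketch of the loop with the off-by-one
| removed, assuming the intent was to scan exactly the four
| elements. The table contents are made up for illustration. With
| the bound fixed, the "return false" path is reachable without
| undefined behavior, so the chain of reasoning above no longer
| applies:

```cpp
#include <cstddef>

// Hypothetical table contents, chosen only for illustration.
static int table[4] = {10, 20, 30, 40};

// Corrected loop: the index stops at 3, so table[4] is never read.
// A well-defined execution can now fall out of the loop, which means
// the compiler may no longer assume the function always returns true.
bool exists_in_table(int v) {
    const std::size_t count = sizeof table / sizeof table[0];
    for (std::size_t i = 0; i < count; i++) {
        if (table[i] == v) return true;
    }
    return false;
}
```

| Deriving the bound from the array itself, rather than repeating
| the literal 4, is one way to keep the two from drifting apart.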
| tedunangst wrote:
| No, the compiler knows the array isn't overflowed, because C
| programs don't contain overflows. Therefore the loop must
| exit via one of the return true statements.
| JonChesterfield wrote:
| The amazing part about examples like that is people read them,
| check that the compiler really does work on that basis, and
| then continue writing things in C++ anyway. Wild.
|
| Suppose I should expand on this. The idea seems to be either
| 1/disbelief - compilers wouldn't really do this or 2/
| infallibility - my code contains no UB.
|
| Neither of those positions bears up well under reality.
| Programming C++ is working with an adversary that will make
| your code faster wherever it can, regardless of whether you
| like the resulting behaviour of the binary.
|
| I suspect rust has inherited this perspective in the compiler
| and guards against it with more aggressive semantic checks in
| the front end.
| Gibbon1 wrote:
| What's amazing is programmers haven't tarred and feathered the
| standards committee and compiler writers for allowing crap
| like that.
| ninepoints wrote:
| It's just as "amazing" to read these takes from techno
| purists. You use software written in C++ daily, and it can be
| a pragmatic choice regardless of your sensibilities.
| erik_seaberg wrote:
| And we have the core dumps to prove it.
|
| When any Costco sells a desktop _ten thousand_ times faster
| than the one I started on, we can afford runtime sanity
| checks. We don't have to keep living like this, with stacks
| that randomly explode.
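|
| As one illustration of a runtime sanity check that standard C++
| already offers, here is a small sketch using std::array::at(),
| which throws std::out_of_range instead of reading past the end.
| The helper name checked_read_fails and the array contents are
| hypothetical, invented for this example:

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: returns true if reading index i was rejected
// by the bounds check, false if the read succeeded.
bool checked_read_fails(std::size_t i) {
    std::array<int, 4> table{1, 2, 3, 4};
    try {
        (void)table.at(i);  // at() checks i against size() at run time
        return false;
    } catch (const std::out_of_range&) {
        return true;        // an out-of-range index lands here
    }
}
```

| The check costs a comparison per access, which is the trade being
| argued for here: a defined failure instead of undefined behavior.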
| johnbellone wrote:
| But it isn't Rust.
| jacquesm wrote:
| Lots of things 'aren't Rust'. In fact almost everything
| isn't Rust. For now. That may change in due course but
| right now I would guesstimate the amount of Rust code
| running on my daily drivers to be pretty close to 0%. The
| bulk is C or C++.
| angiosperm wrote:
| Hardly anything is. Literally none of the programs on my
| machine are coded in Rust. (Firefox is reputed to have a
| bit in it.)
| jacquesm wrote:
| About FF and Rust:
|
| https://news.ycombinator.com/item?id=30743577
| JonChesterfield wrote:
| Definitely. There's loads of value delivered by C++
| implementations, including implementations of C++ and other
| languages. The language design of speed over safety mostly
| imposes a cost in developer / debugging time and fear of
| upgrading the compiler toolchain. Occasionally it shows up
| in real world disasters.
|
| I think we've got the balance wrong, partly because some
| engineering considerations derive directly from separate
| compilation. ODR no diagnostic required doesn't have to be
| a thing any more.
| peppermint_gum wrote:
| >The amazing part about examples like that is people read
| them, check that the compiler really does work on that basis,
| and then continue writing things in C++ anyway. Wild.
|
| Well, in modern C++ this code would look like this:
|     std::array<int, 4> table;
|     bool exists_in_table(int v) {
|         for (auto &elem : table) {
|             if (elem == v) return true;
|         }
|         return false;
|     }
|
| Or even simpler:
|     std::array<int, 4> table;
|     bool exists_in_table(int v) {
|         return std::ranges::contains(table, v);
|     }
|
| There's no shortage of footguns in C++, but nonetheless,
| modern C++ is safer than C.
| mike_hock wrote:
| Weirdly, GCC fails to optimize this, but Clang does (if you
| make the table static as in the original example).
| gizmo686 wrote:
| I actually would prefer to get the second output. The result
| is wrong, but consistently and deterministically so. The
| naive implementation of the broken code is a heisenbug.
| Sometimes it will work, and sometimes it won't, and any
| attempt to debug it would likely perturb the system enough to
| make the issue not surface.
|
| It wouldn't surprise me if I have run into the latter
| situation without realizing it. When I got to the problem, I
| would have just (incorrectly) assumed that the memory right
| after the array happened to have the relevant value. I would
| be counting my blessings that it happened consistently enough
| to be debuggable.
| jll29 wrote:
| I agree that it is better to get deterministic and
| predictable behavior.
|
| Reminds me of when for a while, I worked on HP 9000s under
| HP-UX and in parallel on an Intel 80486-based Linux box,
| and what I noticed is that the Unix workstations crashed
| sooner and more predictably with segmentation faults than
| Linux on the PC (not sure if this has changed since the
| early 1990s - probably had to do with the MMU); so
| developing on HP under Unix and then finally compiling
| under Linux led to better code quality.
| _gabe_ wrote:
| > check that the compiler really does work on that basis, and
| then continue writing things in C++ anyway. Wild.
|
| My compiler (MSVC) doesn't do that[0]. Clang also doesn't do
| this[1]. It's wild to me that GCC does this optimization[2].
| It's very subtle, but Raymond Chen and OP both say a compiler
| _can_ create this optimization, not that it _will_.
|
| [0]: https://godbolt.org/z/bdx4EMzxe
|
| [1]: https://godbolt.org/z/z833Wa391
|
| [2]: https://godbolt.org/z/6b8aq59M9
| jandrewrogers wrote:
| > The amazing part about examples like that is people read
| them, check that the compiler really does work on that basis,
| and then continue writing things in C++ anyway.
|
| That isn't idiomatic C++ and hasn't been for a long time.
| Sure, it's _possible_ to do it retro C-style, because
| backward compatibility, but you generally don't see that in
| a modern code base.
| JonChesterfield wrote:
| The modern codebase has grown from a legacy one. The legacy
| one with parts of the codebase that were C, then got
| partially turned into object oriented C++, then partially
| turned into template abstractions. The parts least likely
| to have comprehensive test coverage. _That_ place is indeed
| where a compiler upgrade is most likely to change the
| behaviour of your application.
___________________________________________________________________
(page generated 2023-08-17 23:00 UTC)