[HN Gopher] Catch-23: The New C Standard Sets the World on Fire
___________________________________________________________________
Author : donmcc
Score : 215 points
Date : 2023-04-01 21:01 UTC (1 days ago)
(HTM) web link (queue.acm.org)
| i-use-nixos-btw wrote:
| This is written with quite a lot of hyperbole.
|
| The predominant focus is realloc(ptr,0) becoming UB instead of
| what the author misleadingly describes as useful, consistent
| behaviour. It is far from that, and that's the entire reason that
| it was declared UB in the first place:
| https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf. Note
| that this wasn't a proposal to change something; it's a defect
| report: the original wording was never suitable.
|
| The second part is the misconception about the impact of UB.
| Making something UB does not dictate that its usage will initiate
| the rise of zombie velociraptors. It grants the implementation
| the power to decide the best course of action. That is, after
| all, what they've been doing all this time anyway.
|
| Note that this differs from implementation-defined behaviour,
| which has to be consistent and documented.
| Where implementations choose to let realloc(ptr,0) summon the
| zombie raptors, they are free to do so. Don't like it? Don't
| target their implementation. Again, this isn't a change from the
| POV of implementers - it's a defect in the existing wording.
|
| In this case, the course of action that any implementation will
| choose is to stick with the status quo. It is clearly not a
| deciding factor in whether or not you embrace the new standard,
| and to suggest otherwise is dishonest, sensationalist nonsense.
| The feature was broken, and it's just being named as such.
| Asooka wrote:
| > The second part is the misconception about the impact of UB.
| Making something UB does not dictate that its usage will
| initiate the rise of zombie velociraptors. It grants the
| implementation the power to decide the best course of action.
| That is, after all, what they've been doing all this time
| anyway.
|
| Wrong. UB never happens. That is the promise the program writer
| makes to the compiler. UB never happens. A correct C program
| never executes UB. This allows the compiler to assume that
| anything that is UB never happens. Does some branch of your
| program unconditionally execute realloc(..., 0) after constant
| propagation? That branch never happens and can just be deleted.
|
| Reading the defect report, they state "Classifying a call to
| realloc with a size of 0 as undefined behavior would allow
| POSIX to define the otherwise undefined behavior however they
| please." which is wrong. UB cannot be defined; if you define
| it, you are no longer writing standard C. It should instead
| have been classified as "implementation-defined behaviour".
|
| In any case it's not that hard to just write a sane wrapper.
| This one is placed in the Public Domain:
|
|     #include <stdlib.h>
|     void *sane_realloc(void *ptr, size_t sz) {
|         if (sz == 0) {
|             free(ptr); /* free(NULL) is a no-op */
|             return NULL;
|         }
|         if (ptr == NULL) {
|             return malloc(sz);
|         }
|         return realloc(ptr, sz);
|     }
|
| I am calling it sane and not safe, because it is not safe. You
| still have the confusion of what happens when the function
| returns NULL (was it allocation failure or did we free the
| object?); check errno. However, it has the same fully defined
| semantics on almost all implementations and acts like people
| would expect.
|
| You may be tempted to make the function return the value of
| errno, mark it [[nodiscard]] and take a
| pointer-to-pointer-to-void, so that the value of the pointer
| will only be changed if the reallocation was successful. I am
| not sure if that is safer. You are trading one possible bug (a
| null pointer on allocation failure, which will then cause a
| segmentation fault) for another (a stale pointer on allocation
| failure, but with an updated size). The latter is more likely to
| be used in buffer overflow attacks than the former.
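The pointer-to-pointer variant described above might look roughly like the sketch below. It is not the commenter's code: the name realloc_in_place and the NODISCARD macro are hypothetical, and returning ENOMEM assumes a POSIX-style <errno.h> (ISO C itself only guarantees EDOM, EILSEQ and ERANGE). The [[nodiscard]] attribute is only used when the compiler reports a post-C17 language mode.

```c
#include <errno.h>
#include <stdlib.h>

/* Use C23 [[nodiscard]] when the language mode is newer than C17;
   otherwise the macro expands to nothing. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ > 201710L
#define NODISCARD [[nodiscard]]
#else
#define NODISCARD
#endif

/*
 * Hypothetical helper: resize the block *pp to sz bytes.
 * Returns 0 on success, ENOMEM (assumed POSIX errno value) on failure.
 * On failure *pp is left unchanged and still owns the old block.
 * sz == 0 frees the block and sets *pp to NULL, so realloc(p, 0)
 * is never called.
 */
NODISCARD static int realloc_in_place(void **pp, size_t sz) {
    if (sz == 0) {
        free(*pp);              /* free(NULL) is a no-op */
        *pp = NULL;
        return 0;
    }
    void *q = realloc(*pp, sz); /* realloc(NULL, sz) acts like malloc(sz) */
    if (q == NULL)
        return ENOMEM;          /* old pointer stays valid */
    *pp = q;
    return 0;
}
```

The caller's pointer should itself be a `void *` variable; passing `(void **)&some_int_ptr` is not portable, because `int *` and `void *` are distinct types that need not share a representation.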
| Arch-TK wrote:
| I agree that realloc was poorly defined for the 0 size case. I
| think UB or IDB both would have worked in this case to really
| drive that point home; the WG chose UB.
|
| That being said, you're completely wrong about what UB means.
| Making use of UB may as well initiate the rise of zombie
| velociraptors. Except for the situation where your
| implementation explicitly specifies that it provides a
| predictable behaviour for a specific case of UB, there's
| literally no guarantee of what will happen. Assuming that the
| implementation will stick with some status quo and your code
| won't exhibit absolutely unusual behaviour is just naive.
|
| Please don't mislead people into thinking that it's ever a good
| idea to assume that undefined behaviour will be handled
| sensibly; this kind of mislead assumption is one of the major
| sources of bugs in C code.
| cryptonector wrote:
| Right, this should have been left to the implementor if they
| didn't want to standardize one behavior. Making it UB is the
| worst possible outcome. Yes, people who write portable code
| will still want to not rely on `realloc()`'s freeing
| behavior, but if you do and your realloc() implementation
| doesn't, then you suffer a leak, while if you do and
| realloc() decides to wipe your drive and make your power
| supply explode...
| astrange wrote:
| > Except for the situation where your implementation
| explicitly specifies that it provides a predictable behaviour
| for a specific case of UB, there's literally no guarantee of
| what will happen.
|
| That situation is "when you have UBSan turned on".
| G_z9 wrote:
| [flagged]
| G_z9 wrote:
| How this gets downvoted is beyond me
|
| Yeah, I'm being downvoted by a bot or something.
| SAI_Peregrinus wrote:
| It's not directly related to the topic at hand. It's
| meta-commentary about the discussion, not about the
| actual topic. That, and your post I'm replying to, and
| this reply I'm making should all be downvoted as they're
| all off-topic.
| Dylan16807 wrote:
| > How this gets downvoted is beyond me
|
| Primarily because you're bringing in an argument from a
| different story entirely rather than figuring out a
| better option.
|
| But also you're being very rude in that other thread, and
| calling twitter "high bandwidth" for a discussion is...
| weird.
|
| > Yeah, I'm being downvoted by a bot or something.
|
| Uh huh.
| G_z9 wrote:
| Ok, that's not unreasonable. But I think that making an
| unrelated comment like that is really only a bad thing if
| it's in bad faith. He made that comment expecting a
| response, I'm not like hounding him. And yes, I said some
| rude things. Are you going to downvote every comment that
| you see from me because some other comments I made were
| rude? Doesn't really add up. And why are you policing the
| threads? That's more weird than asking for a twitter
| space.
|
| I don't think asking for a twitter space is all that
| weird. I am constantly frustrated talking with people on
| HN because what could take 2 seconds takes 20 minutes. I
| often find that a debate never even has a chance to be
| resolved because everyone just gets worn out trying to
| talk through a digital straw. Plus, asking for a twitter
| space doesn't involve exchange of personal information or
| anything concerning. It's definitely not done and off the
| wall but I don't think it's problematic.
|
| Edit: the more I think about it, the more sense it makes.
| HN has a problem with being flooded with vitriol and
| lots of other negative behavior, long chains that are
| just useless. It would make a lot of sense to offload
| most of that to another platform since HN as a platform
| is not well suited to debating. Instead of initiating a
| huge chain of vitriol, a twitter space could be initiated
| when people want to debate something. Instead of tons of
| noise and garbage, HN would host a link to the space. And
| it would be better because the nature of a space lends
| itself to people coming to a conclusion, covering the
| issue more thoroughly and people letting loose less hate,
| all because of the high bandwidth, intimate nature of
| real-time audio. It also helps filter out people who are
| bots or aren't serious or who don't really care about the
| topic being debated. I am legitimately going to email HN
| admin about this.
| astrange wrote:
| Hey, I think getting into arguments for a day then
| randomly giving up and wandering off is what the site's
| all about. Actually, I think the guy who stops replying
| first kind of wins - it's similar to why you shouldn't
| double-text when dating.
|
| > I don't think asking for a twitter space is all that
| weird.
|
| My issue is that I don't think I have anything to
| contribute as I'm not making original conclusions but
| kind of just quoting a typical labor economist.
| (Different from quoting the average person, they're
| usually worried about different things.)
|
| Example being https://www.apricitas.io/p/chatgpt-please-take-my-job.
| Dylan16807 wrote:
| I didn't downvote you, by the way.
|
| But sure posting in the wrong topic will get a downvote
| to turn your comment gray. What is wrong with that
| policing?
|
| > Are you going to downvote every comment that you see
| from me because some other comments I made were rude?
| Doesn't really add up.
|
| ...what? You continued the thread here. Nobody is
| downvoting random comments of yours. Your comments on
| that story and this story are part of the same
| conversation.
|
| And if you can't figure out how to reply in a deep thread
| you can just wait a couple minutes for the link to be
| there.
|
| > a twitter space
|
| Oh, the chat thing. I thought you meant _tweets_. Sure,
| that's a reasonable idea for some conversations.
| G_z9 wrote:
| Yes I've figured out the timer. As a 2010 account, I kind
| of have to just yield to you.
| coliveira wrote:
| > this kind of mislead assumption is one of the major sources
| of bugs in C code.
|
| This is not even close to being true. Most bugs in C code are
| from programmer mistakes, not from UB. The
| exaggeration that is spread by some people regarding UB is
| close to absurd. If something is UB, it may generate
| different results in different situations, even with the same
| compiler. The standard is just clarifying this problem. A
| good compiler will do something sensible, or at least issue a
| warning when this situation is detected. If you have a bad
| compiler that does strange things with your code, it's not a
| defect of UB but the compiler instead.
| wruza wrote:
| Optimizing compilers don't work like that. They can either
| deviate from the standard and leave it as defined behavior,
| or mark it UB and go with it as usual.
|
| To get some insight by analogy, consider this set of
| constraints (unrelated to C):
|
|     x <= 7
|     2x >= 5
|     ...(more with x, y, z but not further constraining x)...
|
| When you feed this to a linear constraint solver, you may
| get anything from 2.5 to 7 as x. E.g. 3.1415926. Not
| because a solver wanted to draw some circles, but because
| it transformed your geometric problem into an abstract
| representation for its own algorithm, performed some fast
| operations over it and returned the result. Nobody knows
| how exactly a specific solving method will behave wrt
| (underconstrained) x given that the description above is
| all you have.
|
| When you feed UB into an optimizer, you feed a bit of lava
| into a plastic pipe, figuratively. You'll get anything from
| program #2500...0000 to program #6999...9999, where "..."
| stands for a few thousand or million more digits. Run some of
| those numbers as an .exe to see if something absurd happens.
|
| The nature of UB and optimizers is that you either relax
| UBs into DBs and get worse efficiency, or you specify more
| UBs and get worse programming safety. What happens in
| between can be perceived as completely random. And the
| better/faster the optimizer is, the more random the outcome
| will likely be.
|
| _The exaggeration that is spread by some people regarding
| UB is close to absurd_
|
| UB-in-code is absurd by definition, no exaggeration here.
| Arch-TK wrote:
| > Most bugs in C code are from programmer mistakes
|
| These most often lead to the triggering of UB. The reason
| why programmer mistakes lead to confusing bugs instead of
| simple and straightforward bugs which are easy to catch in
| the development process is mainly because UB imposes no
| restrictions on what the compiler should do. In the vast
| majority of UB cases the compilers simply don't do
| anything, and assume it can't happen. This is why
| dereferencing a pointer and then checking if it's null ends
| up eliding the null check (because if you've dereferenced
| it, it can't be null, that would be UB). Accessing past the
| end of an array is UB so it can't happen, therefore your
| compiler won't check for it. Accessing past the end of an
| array and accidentally reading from/writing to another
| variable - likewise.
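The null-check pattern described above can be sketched as follows. The function name first_or_zero is hypothetical, and the problematic ordering is kept in a comment only; whether a particular compiler actually deletes the late check depends on its optimizer and flags.

```c
#include <stddef.h>

/*
 * The problematic order, kept as a comment only:
 *
 *     int first_or_zero_bad(const int *p) {
 *         int v = *p;       <- UB if p is NULL, so the compiler may
 *         if (p == NULL)       assume p != NULL from here on...
 *             return 0;     <- ...and remove this check as dead code
 *         return v;
 *     }
 */

/* Hypothetical safe version: test the pointer before dereferencing it,
   so there is no UB-based license to drop the check. */
static int first_or_zero(const int *p) {
    if (p == NULL)
        return 0;
    return *p;
}
```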
|
| UB encompasses ALL behavior for which the standard does not
| provide an explicit definition. The reason why the C
| standard provides explicit instances of UB usually boils
| down to clarifying situations where people were confused
| about whether something was UB or not. But if the behaviour
| is not defined in the standard, then it is by definition
| UB.
| SCLeo wrote:
| If I am not wrong, one major security bug that C programs
| usually face is buffer overflow, which is an undefined
| behavior.
| omoikane wrote:
| > This is written with quite a lot of hyperbole
|
| The first sight of "catch fire" might not have caught my
| attention, but by the time it got to "instrument of arson" and
| "Molotov cocktails", the style was sufficiently distracting
| that I was convinced I wasn't the intended audience.
| c4mpute wrote:
| > The second part is the misconception about the impact of UB.
| [...] It grants the implementation the power to decide the best
| course of action. That is, after all, what they've been doing
| all this time anyway.
|
| Wrong, Wrong, Wrong.
|
| UB allows the implementation to take any arbitrary course of
| action, without informing anyone, without documentation,
| without any conscious decision, without weighing anything to be
| better/worse. Nondeterministically catching fire and launching
| nuclear rockets is a completely compliant reaction to UB.
|
| What you are describing is "implementation defined" behavior.
| That has to be deterministic, documented, and conforming to
| some definition of sanity. Examples are the binary
| representation of NULL, sizes of integer types or stuff like
| the maximum filename length. Sadly, too many things in C have
| "undefined behavior", too few have "implementation defined"
| behavior.
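To make the contrast with UB concrete: implementation-defined properties such as integer widths are documented and can be inspected from code. A minimal sketch, assuming a C11 compiler for _Static_assert; int_width_bits is a hypothetical helper.

```c
#include <limits.h>
#include <stddef.h>

/* The standard only sets a minimum range for int; the actual width is
   implementation-defined and documented by each implementation. */
_Static_assert(INT_MAX >= 32767, "C requires int to hold at least 16 bits");

/* Hypothetical helper: storage width of int in bits on this implementation.
   This equals the value width only if int has no padding bits, which holds
   on mainstream platforms but is not guaranteed by the standard. */
static size_t int_width_bits(void) {
    return sizeof(int) * CHAR_BIT;
}
```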
|
| And UB has always been an excuse for compilers to screw over
| programmers in hideous ways. Programmers are rightfully afraid
| of any kind of new UB being introduced, because it will mean
| that whole new classes of bugs will arise because the compiler
| optimized out that realloc(..., a) where a might be 0, because
| that's UB, so screw you and your code... And this change is
| especially dangerous because it makes a lot of existing code
| UB.
| jcranmer wrote:
| The case of realloc being declared UB (as opposed to impl-
| defined) was not driven by the compiler writers but by the
| people who write the C libraries.
|
| This isn't a case of compilers screwing over the programmers,
| because the people who are responsible for those
| optimizations are the people who are scratching their heads
| as to why it's UB and not impl-defined behavior.
| AlotOfReading wrote:
| I wish UB were only as nasty as "nondeterministic behavior".
| In fact, if there's UB in _anything_ the compiler sees,
| nothing at all can be assumed, including whether you even get
| an output. What you've given the compiler isn't C, so it
| doesn't have any obligations to do anything with it. The
| codepath with UB doesn't have to run for the nuclear rockets
| to launch and the nasal demons to appear.
|
| Since approximately every nontrivial program ever written has
| UB, in actual practice we're only saved by the fact that
| compilers aren't entirely maliciously compliant.
| Dylan16807 wrote:
| That's not true. If the program's execution path from start
| to finish avoids UB then you're safe. (Also the source code
| itself has to avoid UB, but that part isn't hard.)
|
| It's true that code with UB does not have to be reached,
| per se, but it does have to be something your program
| _will_ reach before it can hurt you.
| AlotOfReading wrote:
| You're correct in practical terms, but I'm making a very
| pedantic point about what the standard requires happen,
| mainly because this pedantry has important implications
| for e.g. safety critical C. Note 1 to the definition in
| 3.4.3 provides some clarification about the extent of UB
| and states that UB can manifest at translation time. It
| also says that the translator should behave in a
| documented manner when encountering UB, but does not
| _require_ that it do so.
| AnimalMuppet wrote:
| Fine. HN is, after all, a place where you can be
| pedantic.
|
| But those of us who are actually writing programs mostly
| care about "in practical terms", and in practical terms,
| this doesn't happen, so _we don 't care_. We've got
| enough trouble worrying about what _does_ happen; we don
| 't have time and energy to worry about what _doesn 't_
| and _won 't_ happen.
| still_grokking wrote:
| That's like saying: "I don't care what the standard
| says!"
|
| Sure, this is perfectly fine.
|
| Only that you're not writing any C/C++ then, but
| something in the "gcc 12 language with some switches", or
| maybe the "LLVM 15 language with some switches", or
| something like that.
| AnimalMuppet wrote:
| Well, if Visual Studio (or whatever Microsoft calls their
| compiler these days), and all known versions of gcc, and
| all known versions of LLVM all do something sane, then
| I'm not sure I care all that much about the theoretical
| possibility that some compiler someday might do something
| insane.
| AlotOfReading wrote:
| To provide some more context/motivation for why you might
| care, I write safety-critical code. I'm often advising
| people what they need to do for certification, etc. If
| all you need to do is ensure that you never execute
| undefined operations and knock out the list of specified
| UB, that's totally, 100% manageable. Throw some
| sanitizers on, provide realistic input, and test the hell
| out of it. Normal stuff.
|
| If the reality is that any UB can invalidate the entire
| program (as is the interpretation taken by other
| standards re: C), then that's not remotely sufficient.
| You have to ensure the complete absence of UB.
| LegionMammal978 wrote:
| C has both translation-time UB and runtime UB. (C++
| explicitly separates the two concepts into "ill-formed,
| no diagnostic required" and "undefined behavior".) You
| can tell them apart from the condition for UB to occur:
| if it's a translation-time condition, then it's
| translation-time UB, and if it's a runtime condition,
| then it's runtime UB. (Same with implicit UB: is it a
| translation-time or a runtime assumption being violated?)
|
| Usually when we talk about UB, we're implicitly talking
| about runtime UB, since translation-time UB is generally
| far less subtle. If a program contains only conditional
| runtime UB, the compiler is not permitted to break the
| entire program from the very beginning, since all
| possible executions that do not trigger runtime UB must
| execute correctly as per 5.1.2.3.
| AlotOfReading wrote:
| 5.1.2.3 only binds _conforming_ programs. Programs
| containing UB are by definition non-conforming.
|
| I hadn't considered the C++ standard here, but 1.9 is
| much clearer than the corresponding C wording. 1.9.5 is
| exactly what's described upthread, where any "execution
| [that] contains an undefined operation" has no prescribed
| behavior. But the note to the requirement immediately
| before that (1.9.4) doesn't use that language and instead
| "imposes no requirements on programs that contain UB". If
| they had intended only to avoid specifying semantics for
| programs that hit UB during some possible execution, they
| would have used the same language as 1.9.5.
| Kranar wrote:
| Your claim is actually false. C differentiates between a
| conforming program and a strictly conforming program.
| 5.1.2.3 binds to conforming programs, which are permitted
| to produce output dependent on undefined behavior.
|
| Only strictly conforming programs may not produce output
| dependent on undefined behavior.
| AlotOfReading wrote:
| No? Conformance allows unspecified and implementation
| defined. Strict conformance is the absence of that (i.e.
| same output in every conforming environment). Neither
| includes UB, as UB is "outside the standard" in some
| sense and doesn't have defined semantics.
| Kranar wrote:
| It's a common misconception that a conforming program may
| not engender undefined behavior. In fact this very
| article touches on how realloc has introduced new (and
| backwards incompatible) undefined behavior precisely to
| accommodate the POSIX standard (so that POSIX compliant
| implementations of C can redefine the otherwise undefined
| behavior however they please).
| AlotOfReading wrote:
| Can you cite that? It runs against a plain reading of the
| standards (both C and C++) and would be insane for the
| standard to allow "correct" programs to include those
| with undefined behavior. There was even an unadopted
| proposal (n853 [1]) attempting to clarify this.
|
| While I was making sure I wasn't missing something
| obvious, I took a look through the rest of the WG14
| proposals to see if I was somehow off in my understanding
| regarding translators being allowed to barf over UB
| anywhere in the program. There was a proposal clarifying
| the situation to the possible-execution understanding
| from upthread submitted by Victor Yodaiken (n2278 [2]),
| but unfortunately it was also never adopted.
|
| [1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n853.htm
|
| [2] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2278.pdf
| [deleted]
| coliveira wrote:
| > approximately every nontrivial program ever written has
| UB
|
| You can replace "UB" with "bugs" and the result is the same.
| UB is a bug on the part of the programmer, from the point
| of view of C, similar to dereferencing a null pointer. When
| the standard says that something is UB, it is just
| clarifying what these situations are.
| pmarin wrote:
| If they are bugs they should be reported to the user and
| end the compilation with an error.
| the_why_of_y wrote:
| Compilers actually have some options to enable that.
|
| The problem is, it only works well in the simplest cases
| when the code will 100% exhibit UB within a single
| function.
|
| In most cases, the UB would only manifest on particular
| input values - if you want your compiler to warn about
| that then it will report one "potential UB" for every 10
| lines of C code, and nobody wants to use such a compiler.
| mjevans wrote:
| That's exactly why a compiler shouldn't be able to
| 'optimize' in the face of UB, it should be an ERROR and
| the section of undefined behavior highlighted in the
| error message.
| gpderetta wrote:
| We rehash this argument every few weeks. Please search
| the comment history why it is nonsensical.
| circuit10 wrote:
| Doing that at compile time would require being able to
| perfectly predict everything the program can do, which is
| equivalent to solving the halting problem (make the
| program do something undefined after it finishes, then if
| you get an error at compile time then it halts) and is
| mathematically impossible. Doing it at runtime would have
| a massive performance impact.
| chongli wrote:
| This would mean you'd have to insert a check every time
| you add two signed integers together, because signed
| overflow is UB. You'd also have to wrap every memory
| access with bounds checks, because OOB memory access is
| UB.
|
| There are also tons and tons of loop optimizations
| compilers do for side-effect free loops which would have
| to be removed completely. This is because infinite loops
| without side effects are UB. So if you wanted these
| optimizations you'd have to prove to the compiler -- at
| compile time -- that your loop is guaranteed to terminate
| since it is not allowed to assume that it will. Without
| these loop optimizations, numerical C code (such as
| numpy) would be back in the stone ages of performance.
|
| _Edit_: I just wanted to point out that one of the new
| features in C23 is a standard library header called
| _<stdckdint.h>_ that provides type-generic macros
| (ckd_add, ckd_sub, ckd_mul) for checked integer
| arithmetic. This allows you to safely write code for
| adding, subtracting, and multiplying two unknown signed
| integers and getting back a flag which indicates
| whether the result overflowed. This will be the standard
| preferred way of doing overflow-safe math.
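A sketch of checked addition that uses C23's ckd_add when <stdckdint.h> is available and falls back to a portable limit test otherwise. The wrapper name checked_add_int is hypothetical; ckd_add itself returns true when the exact result does not fit. One difference worth noting: on overflow ckd_add still stores the wrapped result, whereas this fallback leaves *out untouched.

```c
#include <limits.h>
#include <stdbool.h>

/* Pull in the C23 header only in a post-C17 mode where it exists. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ > 201710L && defined(__has_include)
#  if __has_include(<stdckdint.h>)
#    include <stdckdint.h>
#  endif
#endif

/* Hypothetical wrapper: stores a + b in *out and returns false, or returns
   true if the exact sum does not fit in an int (do not use *out then). */
static bool checked_add_int(int *out, int a, int b) {
#ifdef ckd_add
    return ckd_add(out, a, b);  /* C23: true means overflow */
#else
    /* Portable fallback: compare against the limits before adding,
       so the addition itself can never overflow. */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return true;
    *out = a + b;
    return false;
#endif
}
```

For example, checked_add_int(&r, INT_MAX, 1) returns true, while checked_add_int(&r, 2, 3) returns false and stores 5 in r.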
| heywhatupboys wrote:
| > because signed overflow is UB
|
| no longer
| mafuy wrote:
| > you'd have to insert a check every time you add two
| signed integers together,
|
| This is exactly what is done in serious code. It is
| typically combined with contracts and static analysis
| (often human), e.g. "it is guaranteed that this input is
| in range 10-20, so adding it with this other 16 bit int
| can be assumed to be below sint32_max".
| pclmulqdq wrote:
| Great, those checks can stay in "serious" code, and those
| of us who don't want them can take the UB. C++ 20
| actually ended up specifying that all ints are two's
| complement, removing this from the category of "UB," but
| a lot more weird stuff is programmed in C.
| gpderetta wrote:
| Note that signed overflow is still UB in C++ even with
| two's complement being guaranteed for signed types.
| DangitBobby wrote:
| Another option would be to define behaviors for integer
| overflow and out of bounds memory access. Presumably they
| happen fairly often and it might be a good idea to nail
| down what should happen in those cases.
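One way to get defined wrapping behaviour for a particular addition, without changing the language, is to do the arithmetic in unsigned types, whose overflow is defined to wrap modulo 2^N. The helper below is a hypothetical sketch; GCC and Clang also offer the -fwrapv flag, which makes signed overflow wrap for everything compiled with it.

```c
#include <limits.h>

/*
 * Hypothetical helper: wrapping signed addition without signed-overflow UB.
 * Unsigned arithmetic wraps modulo 2^N by definition. Converting an
 * out-of-range unsigned value back to int is implementation-defined rather
 * than undefined; GCC and Clang document it as wrapping, which this sketch
 * assumes.
 */
static int wrapping_add(int a, int b) {
    return (int)((unsigned int)a + (unsigned int)b);
}
```

On GCC and Clang, wrapping_add(INT_MAX, 1) yields INT_MIN. The cost is exactly the one discussed in the thread: the compiler can no longer assume the sum is larger than either operand.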
| gpderetta wrote:
| Good luck defining the behaviour of use after free or
| accessing out-of-bounds stack memory without bounds
| checking and GC.
| bluecalm wrote:
| UB is a better option though. When your signed integer
| overflows it's a bug nevertheless. Why force the compiler
| to generate code for a pointless case instead of letting
| it optimize the intended one?
|
| If you value never having bugs over performance then just
| insert a check or run your program with a sanitizer that
| does that for you. It's a solved problem for a case where
| performance doesn't matter. The thing is that it does.
| skitter wrote:
| That would be great if it was possible, but how do you
| specify & implement sensible behavior for this:
|
|     void foo(int *a, int b) { a[b] = 1; }
|
| At runtime there is no information about whether that
| write is in bounds and no way to prevent this from
| corrupting arbitrary data unless you compile for
| something like CHERI.
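A sketch of what a checkable version of foo() would need: the length has to travel with the pointer, since a bare int * carries no bounds information at runtime. foo_checked and its parameters are hypothetical, and the check costs a compare and branch on every write.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical bounds-checked variant of foo(): the caller passes the array
 * length explicitly. Using an unsigned size_t index also rules out negative
 * indices (a negative int converts to a huge size_t and is rejected).
 */
static bool foo_checked(int *a, size_t len, size_t idx) {
    if (idx >= len)
        return false;   /* refuse the out-of-bounds write */
    a[idx] = 1;
    return true;
}
```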
| chongli wrote:
| Those things aren't up to the language, they're up to
| hardware. C is a portable language that runs on many
| different platforms. Some platforms might have protected
| memory and trap on out of bounds memory access. Other
| platforms have a single, flat address space where out of
| bounds memory access is not an error, it just reads
| whatever is there since your program has full access to
| all memory.
|
| The same goes for integer overflow. Some platforms use
| 1's complement signed integers, some platforms use 2's
| complement. Signed overflow would simply give different
| answers on these platforms. The standards committee long
| ago decided that there's no sensible answer to give which
| covers all cases, so they declared it undefined behaviour
| which allows compilers to assume it'll never happen in
| practice and make lots of optimizations.
|
| Forcing signed overflow to have a defined behaviour means
| forcing every single signed arithmetic operation through
| this path, removing the ability for compilers to combine,
| reorder, or elide operations. This makes a lot of
| optimizations impossible.
| johnny22 wrote:
| doesn't C force 2s complement now? If so, one less thing
| to worry about.
| adrian_b wrote:
| The problem is that there is a vicious circle.
|
| Most old computer architectures had a much more complete
| set of hardware exceptions, including cases like integer
| overflow or out-of-bounds access.
|
| In modern superscalar pipelined CPUs, implementing all
| the desirable hardware exceptions without reducing the
| performance remains possible (through speculative
| execution), but it is more expensive than in simple CPUs.
|
| Because of that, the hardware designers have taken
| advantage of the popularity gained by languages like C
| and C++ and almost all modern programming languages,
| which no longer specify the behavior for various errors,
| and they omit the required hardware means, to reduce the
| CPU cost, justifying their decision by the existing
| programming language standards.
|
| The correct way to solve this would have been to include
| in all programming language standards well-defined and
| uniform behaviors for all erroneous conditions, which
| would have forced the CPU designers to provide efficient
| means to detect such conditions, like they are forced to
| implement the IEEE standard for floating-point
| arithmetic, despite their desire to provide unreliable
| arithmetic, which is cheaper and which could win
| benchmarks by cheating.
| chongli wrote:
| CPU designers don't like having their hand forced like
| that. If you create a new standard forcing them to add
| extra hardware to their designs, they'll skip your
| standard and target the older one (which has way more
| software marketshare anyway). They will absolutely bend
| over backwards to save a few cycles here and a few
| transistors there, just so they can cram in an extra
| feature or claim a better score on some microbenchmark.
| They absolutely do not care at all about making life
| easier for low-level programmers, hardware testers, or
| compiler writers.
| rini17 wrote:
| I don't believe adding simple checks against data already
| present in L1 caches and marked as "unlikely to fail"
| should be so onerous.
| cryptonector wrote:
| Bugs are UB-like in a sense (what's the code going to do?
| well, you'll have to think about it, or try it and see),
| but UB is strictly worse than bugs (different compilers,
| even different versions of the same compiler, can do
| radically different things way beyond the scope of the
| bug).
| AlotOfReading wrote:
| What the standard explicitly calls out as UB is only a
| small subset of actual UB.
|
| While you can certainly classify all UB as "bugs", doing
| so misses the critical differences between UB and other
| categories of bugs. If you have a logic bug for example,
| your program will correctly and consistently do the wrong
| thing. It will continue doing that wrong thing with a
| different compiler, on a different platform today and 10
| years from now. Implementation defined behavior is a bit
| looser, but will still be consistent with any particular
| implementation (which will document the behavior) and
| will only manifest in the code that depends on it. A PR
| inserting one of these "normal" bugs doesn't invalidate
| the entire rest of the program.
|
| UB is different. You can't make assumptions about UB
| because from the point of view of the standard, UB is
| "not C". There are no assumptions to be made, it's just
| all the stuff that doesn't have assigned semantics. And
| since the input is meaningless, so is the entirety of
| whatever the compiler gives you back.
| coliveira wrote:
| > If you have a logic bug for example, your program will
| correctly and consistently do the wrong thing.
|
| Not correct. Bugs can occur differently in different
| architectures, even in high level languages. UB is just a
| kind of bug whose effect depends on how the compiler
| behaves, so you have to be careful to test your code on
| different compiler settings. This is nothing new in
| programming languages; it is only made explicit in the C
| standard. Suddenly people started to believe that
| pointing out the obvious source of bugs (UB) in the
| standard is equivalent to letting programs misbehave.
| AlotOfReading wrote:
| I'm not sure if you're making a point about "unspecified
| behavior" (where the compiler can choose between multiple
| valid behaviors), but no, a strictly conforming program
| will have the same semantics on different architectures.
| Strictly conforming programs can still have bugs, but
| their nature is completely different than UB because
| that's the point of the standard.
| adgjlsfhk1 wrote:
| > you have to be careful to test your code on different
| compiler settings.
|
| The problem is you have to test your code on compilers
| that don't exist yet with compiler settings that do
| different things from any compiler that ever might exist.
| coliveira wrote:
| This has always been the case. If you write code that has
| UB, new compilers can do something yet undefined, by
| definition.
| chongli wrote:
| _And UB has always been an excuse for compilers to screw over
| programmers in hideous ways_
|
| Your reply was great up until this. Compiler writers aren't
| looking to screw over programmers, they're looking to make
| code faster. UB gives them the ability to make assumptions
| about what is and is not true, at a particular moment in
| time, in order to skip doing unnecessary work at runtime.
|
| By assuming that code is always on the happy path, you can
| cut a lot of corners and skip checks that would otherwise
| greatly slow down the code. Furthermore, these benefits can
| cascade into more and more optimizations. Sometimes you can
| have these large, complicated functions and call graphs get
| optimized down to a handful of inlined instructions.
| Sometimes the speedup can be so dramatic that the entire
| application is unusable without it!
|
| Many of these optimizations would be impossible if compilers
| were forced to assume the opposite: that UB will occur
| whenever possible.
|
| The tool programmers have available to them is compiler
| flags. You can use flags to turn off these assumptions, at
| the cost of losing out on optimizations, if your code needs
| it and you're unable to fix it. But it's better to turn on
| all possible warnings and treat warnings as errors, rather
| than ignoring them, to push yourself to fix the code.
| adgjlsfhk1 wrote:
| The thing that makes UB almost malicious is that it
| propagates inter-procedurally. This makes reasoning about
| code with UB basically impossible which means that you
| should always assume that the compiler is going to screw
| you over if you use it because there is no way to know
| whether it will.
| chongli wrote:
| You should consider a program with undefined behaviour to
| be the equivalent of a mathematical proof that contains
| an unstated contradiction. _Ex falso quodlibet_ : from a
| falsehood anything follows. Also called the principle of
| explosion.
|
| Undefined behaviour renders your entire program
| meaningless. It must be avoided at all costs. Using
| undefined behaviour on purpose is like sticking a fork in
| an electrical socket.
| Kranar wrote:
| It's funny that your original post was an objection to
| how undefined behavior gives license to screw developers
| over, but here you are talking about how undefined
| behavior is like sticking a fork in an electrical socket.
| chongli wrote:
| My original post was an objection to the implied intent
| on the part of compiler writers. An electrical socket
| does not have intent, it's just a hazard that also
| happens to provide enormous benefits to our lifestyles.
|
| I think it's a perfect analogy to undefined behaviour in
| C: enormous benefits but also a hazard to be wary of. A
| lot of people don't understand the benefits, they just
| see the hazard. Throughout this discussion I've been
| trying to clarify that, with perhaps limited success.
| tsegratis wrote:
| But just to be clear @chongli is logical
|
| Think of UB as a probabilistic error. I.e. it is always
| stupid to rely on it
|
| 1. Write code without errors -- sensible
| 2. Allow compilers to assume the absence of errors --
| occasionally sensible, since it speeds up your program
|
| In defence of UB, for the most part they are things that
| should break your program anyway: stack overflow is never
| correct. So your choice is mostly to fail badly quickly,
| or to fail slowly well
|
| Thanks to Google making the UB sanitizers you are free to
| make that choice even in C
| Kranar wrote:
| I'd argue that it's stupid to think that it's stupid to
| rely on UB.
|
| Almost any non-trivial software explicitly relies on
| undefined behavior, including safety critical libraries
| such as cryptographic libraries, the Linux operating
| system has rampant undefined behavior that it makes a
| conscious decision to use. POSIX makes use of undefined
| behavior for shared libraries (it treats functions loaded
| from shared libraries as void*, which is undefined
| behavior).
| Gibbon1 wrote:
| That's not an argument to keep live grenades laying
| around, it's an argument to remove them from the spec.
|
| Like signed int overflow being UB. Define it to have two's complement
| semantics. Problem solved. I'm sure the nutters trying to
| extend C++ with templates will howl but this is C not
| C++. And seriously C++ is dead man walking at this point.
| pjmlp wrote:
| Until LLVM, GCC, key game engines and GPGPU SDK get
| rewritten into something else, it is going to be Resident
| Evil day for a looong time.
| chongli wrote:
| C23 does make two's complement standard. It also adds
| checked arithmetic so you can safely avoid signed
| overflow.
|
| It does not make signed overflow defined behaviour. This
| would prevent integer operation reordering as an
| optimization, leading to slower code.
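| A minimal sketch of both points, assuming a GCC or Clang
| toolchain: ckd_add() from C23's <stdckdint.h> reports overflow
| through its bool return value and stores the two's-complement
| wrapped result. Where the header is missing, the sketch falls
| back to the compiler builtin __builtin_add_overflow(), which
| takes the result pointer last. checked_sum() is a hypothetical
| helper name.

```c
#include <limits.h>
#include <stdbool.h>

#if defined(__has_include)
#  if __has_include(<stdckdint.h>)
#    include <stdckdint.h>
#  endif
#endif

#ifndef ckd_add
/* Assumption: GCC/Clang builtin; note the result pointer comes last. */
#  define ckd_add(r, a, b) __builtin_add_overflow((a), (b), (r))
#endif

/* Hypothetical helper: stores the exact sum in *out and returns true,
 * or returns false if a + b does not fit in an int. In the overflow
 * case *out still receives the wrapped (two's complement) value. */
bool checked_sum(int a, int b, int *out) {
    return !ckd_add(out, a, b);   /* ckd_add returns true on overflow */
}
```

| Unlike a plain `a + b`, the overflowing call is well defined;
| the caller decides what a false return should mean.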
| Gibbon1 wrote:
| Yeah, but it's reversed: signed overflow shouldn't be UB by
| default. You should have to explicitly opt in for that.
|
| The reason, of course, why they refuse to do that is
| because if that were the case most shops would up and
| ban unsafe signed arithmetic.
| properparity wrote:
| >This would prevent integer operation reordering as an
| optimization, leading to slower code.
|
| The sane way to address that is to add explicit opt-in
| annotations like 'restrict'.
|   #push_optimize(assume_no_integer_overflow)
|   int x = a + b;
|   // more performance orientated code
|   #pop_optimize // back to sane C
|
|   #push_optimize(assume_no_alias(a, b), assume_stride(a, 16), assume_stride(b, 16))
|   void compute(float *a, float *b, int index) {
|       // here the compiler can assume a and b do not alias
|       // and it can assume it can always load 16 bytes at a time:
|       // the programmer has made sure it's aligned and padded so that
|       // with any index there's always 16 bytes to load,
|       // so go on, use any vectorized SIMD instruction you want
|   }
|   #pop_optimize // back to sane C
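| For comparison, the aliasing half of this wish is already
| expressible in standard C99 with `restrict`. A minimal sketch
| (scale_add() is a hypothetical name): the qualifiers promise the
| compiler that dst and src never overlap during the call, which
| frees it to vectorize the loop. Alignment has no standard
| spelling; GCC and Clang offer __builtin_assume_aligned() as an
| extension, which this sketch does not use.

```c
#include <stddef.h>

/* dst[i] += k * src[i] for i in [0, n).
 * restrict: within this call, dst and src are promised not to alias,
 * so the compiler need not reload src after each store to dst. */
void scale_add(float *restrict dst, const float *restrict src,
               float k, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] += k * src[i];
    }
}
```

| Calling scale_add() with overlapping arrays breaks the promise
| and is itself undefined behaviour, so `restrict` is opt-in in
| the sense the comment asks for.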
| chongli wrote:
| That's a lot uglier and clunkier than just using the
| ckd_add, ckd_mul etc. safe checked arithmetic. Plus if an
| overflow occurs you still get an incorrect result which
| you probably don't want.
|
| Or maybe I'm wrong? Do people actually want overflows to
| occur and incorrect results? If they're willing to
| tolerate incorrect results, why would they also want
| optimizations disabled?
| Gibbon1 wrote:
| The thing is it's ugly in the rare case that absolute
| performance is worth fighting for. And not ugly in the
| majority case where it isn't in the top three important
| things.
| chongli wrote:
| No, GP's proposal is ugly in the majority case. If you're
| going to make signed overflow defined behaviour then
| every time you write:
|   int c = a + b;
|
| You have to assume it will overflow and give an incorrect
| result. So now you need to check everything, everywhere,
| and you don't get any optimizations unless you explicitly
| ask for them with those ugly #push_optimize annotations.
| I completely fail to see how this is an advantage.
|
| The way C works right now, the assumption is that you
| want optimization by default and safety is opt-in. The
| GP's proposal takes away the optimization by default. It
| then makes incorrect results the default, but it does not
| make safety the default. To make safety the default you
| would have to force people to write conditionals all over
| the place to check for the overflows with ckd_add,
| ckd_mul etc. Merely writing:
|   int c = a + b;
|
| Does not give you any assurances that your answer will be
| correct.
| pclmulqdq wrote:
| C++ 20 did that too.
| Joker_vD wrote:
| > Undefined behaviour renders your entire program
| meaningless
|
| That's exactly the complaint. Consider that the
| implementations of the standard library sometimes have
| exposed UB: that renders behaviour of _all_ of the
| running code on the system undefined.
|
| Many programmers believe that the fallout of the UB
| could, and therefore should, be limited in scope.
| coliveira wrote:
| To achieve your goal, compilers would have to disable any
| sufficiently powerful optimization. If you write bugs
| (UB), a powerful compiler will eventually catch them and
| generate code that you didn't intend at the beginning.
| However, this is not the fault of the compiler or the
| language.
| chongli wrote:
| Compiler writers have already done this. With flags you
| can disable any optimization you like, with all of the
| performance loss that entails. But then people complain
| that their programs are slow.
|
| What people really want is an AI that ignores the code
| they write and just "does what they really meant." But of
| course that's not foolproof either. Every day people ask
| each other to do things and miscommunications occur, with
| the wrong thing being done. I don't really know what to
| say other than "people should be more careful and also
| more forgiving."
| coliveira wrote:
| Exactly. All the hoopla about UB is complaining about how
| compiler optimizations work and the fact that the
| standard committee makes clear (with each new meeting)
| what is considered undefined behavior or not. They should
| instead thank the committee for clarifying this.
| adgjlsfhk1 wrote:
| It is the fault of the language to the extent that the
| purpose of the language is to make it easy to write
| correct programs, and UB makes it really hard (and in
| some cases impossible) to write correct programs.
| coliveira wrote:
| It is just the opposite. UB is a clarification to tell
| programmers what the language considers to be undesired
| behavior. If they didn't say anything, it would be always
| a mystery if a certain construct was allowed or not,
| effectively making it compiler dependent. Compilers would
| also have less avenue for creating optimizations. In the
| next iterations of the C standard we may see more
| constructs classified as UB.
| adgjlsfhk1 wrote:
| That sounds good in theory, but many things that are UB
| in C/C++ are UB because they are really hard to verify at
| compile time which makes them almost impossible to
| program around. Any signed addition in C is potential UB
| unless you have a proof that all numbers that will ever
| be input to the addition won't cause overflow (which is
| made harder because C doesn't define the size of the
| default integer types). Furthermore, no progress is UB
| which means that as a programmer, you have to solve the
| halting problem for your program before knowing whether
| it has a bug.
| jcranmer wrote:
| > many things that are UB in C/C++ are UB because they
| are really hard to verify at compile time which makes
| them almost impossible to program around
|
| The second half of the sentence doesn't follow from the
| first. Take everyone's favorite example, signed integer
| overflow: all you have to do to avoid UB on signed
| integer overflow is check for overflow before doing the
| operation (and C23 _finally_ adds features to do that for
| you).
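| A minimal sketch of "check before the operation" that works in
| any C standard, without <stdckdint.h>: the test compares against
| INT_MAX/INT_MIN using subtractions that cannot themselves
| overflow, so there is no UB for the optimizer to reason from. By
| contrast, a post-hoc test such as `if (a + b < a)` performs the
| overflowing addition first and may legally be folded away.
| add_would_overflow() is a hypothetical name.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if a + b would overflow an int. Only the
 * subtraction for the matching sign of b is evaluated, and neither
 * INT_MAX - b (b > 0) nor INT_MIN - b (b <= 0) can overflow. */
bool add_would_overflow(int a, int b) {
    if (b > 0) {
        return a > INT_MAX - b;   /* sum would exceed INT_MAX */
    }
    return a < INT_MIN - b;       /* sum would go below INT_MIN */
}
```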
|
| Taking a step back, the fundamental thing about UB is
| that it is very nearly always a bug in your code (and
| this includes especially integer overflow!). Even if you
| gave well-defined semantics to UB, the semantics you'd
| give would very rarely make the program not buggy.
| Complaining that we can't prove programs free of UB is
| tantamount to complaining that we can't prove programs
| free of bugs.
|
| It turns out that UB is actually extremely
| helpful for tools that try to help programmers find bugs
| in their code. Since UB is automatically a bug, any tool
| that finds UB knows that it found a bug; if you give it
| well-defined semantics instead, it's a lot trickier to
| assert that it's a bug. In a real-world example, the
| infamous buffer overflow vulnerability Heartbleed stymied
| most (all?) static analyzers for the simple reason that,
| due to how OpenSSL did memory management, _it wasn't
| actually undefined behavior by C's definition_. Unsigned
| integer overflow also falls into this bucket: it's very
| hard to distinguish intentional cases of unsigned
| integer overflow (e.g., hashing algorithms) from
| unintentional cases (e.g., calculating buffer sizes).
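| Both kinds of unsigned wraparound in one minimal sketch (function
| names are hypothetical): the 32-bit FNV-1a hash depends on
| multiplication wrapping modulo 2^32, while a buffer-size
| calculation that wraps silently yields a too-small allocation,
| so buffer_size() refuses instead. To a tool, the two
| multiplications look alike.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Intentional wraparound: 32-bit FNV-1a. h *= prime wraps by design. */
uint32_t fnv1a32(const char *s) {
    uint32_t h = 2166136261u;              /* FNV offset basis */
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;                    /* FNV prime */
    }
    return h;
}

/* Unintentional wraparound: count * size could wrap and produce a
 * buffer that is too small. Refuse instead of wrapping. */
bool buffer_size(size_t count, size_t size, size_t *out) {
    if (size != 0 && count > SIZE_MAX / size) {
        return false;                      /* product would wrap */
    }
    *out = count * size;
    return true;
}
```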
| xigoi wrote:
| > all you have to do to avoid UB on signed integer
| overflow is check for overflow before doing the operation
| (and C23 finally adds features to do that for you).
|
| ...making your code practically unreadable, since you
| have to write
|   ckd_add(ckd_add(ckd_mul(a,a),ckd_mul(ckd_mul(2,a),b)),ckd_mul(b,b))
| instead of a * a + 2 * a * b + b * b.
| chongli wrote:
| That's not the correct syntax for the ckd_ operations.
| They take 3 operands, the first being a pointer to an
| integer where the result should be stored. And they
| return a bool, which you need to check in a conditional.
| If you're just going to throw out the bool and ignore the
| overflows, why bother with checked operations in the
| first place?
| xigoi wrote:
| Yeah, I realize that now. That's even worse. So you'll
| have to write something like
|   int aa, twoa, twoab, bb, aaplustwoab, aaplustwoabplusbb;
|   if (ckd_mul(&aa, a, a)) { return error; }
|   if (ckd_mul(&twoa, 2, a)) { return error; }
|   // ...
|   if (ckd_add(&aaplustwoabplusbb, aaplustwoab, bb)) { return error; }
|   return aaplustwoabplusbb;
|
| So ergonomic!
|
| > If you're just going to throw out the bool and ignore
| the overflows, why bother with checked operations in the
| first place?
|
| I'd expect the functions to return the result on success
| and crash on failure. Or better, raise an exception, but
| C doesn't have exceptions...
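| A minimal sketch of the "return the result, crash on failure"
| style, built on C23's ckd_add()/ckd_mul(), with a GCC/Clang
| builtin fallback as an assumption. add_or_die(), mul_or_die() and
| expanded_square() are hypothetical helpers; they keep the
| formula readable at the cost of aborting the whole program on
| overflow.

```c
#include <stdio.h>
#include <stdlib.h>

#if defined(__has_include)
#  if __has_include(<stdckdint.h>)
#    include <stdckdint.h>
#  endif
#endif
#ifndef ckd_add
/* Assumption: GCC/Clang builtins; the result pointer comes last. */
#  define ckd_add(r, a, b) __builtin_add_overflow((a), (b), (r))
#endif
#ifndef ckd_mul
#  define ckd_mul(r, a, b) __builtin_mul_overflow((a), (b), (r))
#endif

/* Return a + b, or abort() if the exact sum does not fit in an int. */
int add_or_die(int a, int b) {
    int r;
    if (ckd_add(&r, a, b)) {
        fputs("integer overflow in add_or_die\n", stderr);
        abort();
    }
    return r;
}

/* Return a * b, or abort() if the exact product does not fit. */
int mul_or_die(int a, int b) {
    int r;
    if (ckd_mul(&r, a, b)) {
        fputs("integer overflow in mul_or_die\n", stderr);
        abort();
    }
    return r;
}

/* a*a + 2*a*b + b*b with every intermediate step checked. */
int expanded_square(int a, int b) {
    return add_or_die(add_or_die(mul_or_die(a, a),
                                 mul_or_die(mul_or_die(2, a), b)),
                      mul_or_die(b, b));
}
```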
| chongli wrote:
| Why not just write:
|   bool aplusb_sqr(int* c, int a, int b) {
|       /* true on success: the ckd_ macros return true on overflow */
|       return c && !ckd_add(c, a, b) && !ckd_mul(c, *c, *c);
|   }
| xigoi wrote:
| Obviously you could do that in this case, I just wanted
| to come up with a complicated formula.
| c4mpute wrote:
| > all you have to do to avoid UB on signed integer
| overflow is check for overflow before doing the operation
|
| All you have to do is add a check for overflow _that the
| compiler will not throw away because "UB won't happen"_.
| The very thing you want to avoid makes avoiding it very
| hard, and lots of bugs have resulted from compilers
| "optimizing" away such overflow checks.
| chongli wrote:
| This is covered in the article and numerous replies in
| this thread. Use <stdckdint.h>.
| the_why_of_y wrote:
| My complaint here is that it took C more than 30 years
| between defining signed integer overflow as UB and
| providing programmers with standard library facilities to
| check if a signed integer operation would result in
| overflow.
|
| I much prefer Rust's approach to arithmetic, where
| overflow with plain arithmetic operators is defined as a
| bug, and panics on debug-enabled builds, plus special
| operations in the standard library like _wrapping_add_
| and _saturating_add_ for the special cases where overflow
| is expected.
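| Rough C counterparts of those two Rust operations, as a sketch
| (saturating_add() and wrapping_add() are hypothetical names, not
| a standard C API). Saturation uses overflow-free comparisons;
| wrapping does the arithmetic in unsigned, where wraparound is
| defined. Converting the out-of-range unsigned result back to int
| is implementation-defined; GCC documents it as reduction modulo
| 2^N and Clang behaves the same, which is the assumption here.

```c
#include <limits.h>

/* Like Rust's i32::saturating_add: clamp instead of overflowing.
 * INT_MAX - b (b > 0) and INT_MIN - b (b < 0) cannot overflow. */
int saturating_add(int a, int b) {
    if (b > 0 && a > INT_MAX - b) {
        return INT_MAX;
    }
    if (b < 0 && a < INT_MIN - b) {
        return INT_MIN;
    }
    return a + b;
}

/* Like Rust's i32::wrapping_add: unsigned addition wraps by
 * definition; the cast back to int relies on GCC/Clang semantics. */
int wrapping_add(int a, int b) {
    return (int)((unsigned)a + (unsigned)b);
}
```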
| chongli wrote:
| _My complaint here is that it took C more than 30 years
| ... I much prefer Rust's approach_
|
| That's an odd complaint. Rust didn't spring forth fully
| formed from the ether, it stands on the shoulders of C
| (and other giants of PL history). 30 years ago you
| couldn't use Rust at all because it didn't exist.
|
| The reason the committee doesn't just radically change C
| in all these nice ways to catch up to Rust is because it
| would be incompatible. Then you wouldn't have fixed C,
| you'd just have two languages: "old C", which all of the
| existing C code in the world is written in, and "new C",
| which nothing is written in. At that point why not just
| start over from scratch, like they did with Rust?
| the_why_of_y wrote:
| Interestingly, the first Ada standard in 1983 defined
| signed integer overflow to raise a CONSTRAINT_ERROR
| exception.
|
| But apparently it lacked unsigned integers with modular
| arithmetic?
|
| http://archive.adaic.com/standards/83lrm/html/lrm-11-01.html...
| http://archive.adaic.com/standards/83lrm/html/lrm-03-05.html
|
| The 2012 version is a bit more readable, and has unsigned
| integers:
|
| _For a signed integer type, the exception
| Constraint_Error is raised by the execution of an
| operation that cannot deliver the correct result because
| it is outside the base range of the type. For any integer
| type, Constraint_Error is raised by the operators "/",
| "rem", and "mod" if the right operand is zero._
|
| _For a modular type, if the result of the execution of a
| predefined operator (see 4.5) is outside the base range
| of the type, the result is reduced modulo the modulus of
| the type to a value that is within the base range of the
| type._
|
| http://www.ada-auth.org/standards/rm12_w_tc1/html/RM-3-5-4.h...
| chongli wrote:
| See my other comment [1] which addresses the exact things
| you brought up here. Safe checked arithmetic is a new
| standard feature in C23. If a lack of forward progress were not UB, then
| tons of loop optimizations would be impossible and then
| we couldn't have nice things, like numpy.
|
| [1] https://news.ycombinator.com/item?id=35406554
| coliveira wrote:
| > Any signed addition in C is potential UB unless you
| have a proof that all numbers that will ever be input to
| the addition won't cause overflow
|
| This has always been the case. Standard C has always
| operated with the possibility that addition can overflow.
| The programmer or library writer is responsible to check
| if the used types are large enough. If you want to be
| perfectly sure you need to check for overflow. Making
| this UB has not changed the nature of the issue.
|
| > is made harder because C doesn't define the size of the
| default integer types
|
| They correctly made this implementation defined. But C
| now has exact-width integer types (int8_t, int32_t, etc.)
| if you want to be sure.
| CJefferson wrote:
| Is the improved performance of C over say Java, or Rust
| (which both have much less undefined behaviour -- Java
| almost none) worth the pain and bugs which have been
| caused by UB?
|
| Honestly, I don't think so, and as computers get more
| powerful and the amount of the world which relies on
| their correct functioning grows, I feel the arguments for
| UB become increasingly difficult to justify.
| chongli wrote:
| I went to look up undefined behaviour in Rust and I got
| this scary warning:
|
| _Warning: The following list is not exhaustive. There is
| no formal model of Rust's semantics for what is and is
| not allowed in unsafe code, so there may be more behavior
| considered unsafe. The following list is just what we
| know for sure is undefined behavior. Please read the
| Rustonomicon before writing unsafe code._
|
| After the warning was a list of many of the same types of
| things that are undefined behaviour in C. In addition,
| there's a bunch more undefined behaviour related to
| improper usage of the unsafe keyword.
|
| So I don't think you get a free lunch with Rust here.
| What you get is a "safe" playground if you stay within
| the guard rails and avoid using the unsafe keyword. But
| then you are limited to writing programs which can be
| expressed in safe Rust, a proper subset of all programs
| you might want to write.
|
| Furthermore, the lack of a formal specification for Rust
| is one area where it lags behind C, a standardized
| language. All of the undefined behaviour in C is decreed
| and documented by the standard, having been decided by
| the committee. Rust, on the other hand, may have weird
| and unpredictable behaviour that you just have to debug
| yourself, which may or may not be compiler bugs.
| Kranar wrote:
| C does not have a formal specification either. It has a
| standards document that is written in formal English,
| but it does not provide a formal spec of C's semantics. A
| formal spec of a programming language's semantics would
| entail using a formal semantic model such as operational
| or denotational semantics. Some programming languages do
| specify the formal semantics for the entire language or
| some subset of the language but C is not one of them.
|
| Your claim that the C Standard lists all undefined
| behavior is actually false. The C Standard only lists out
| the explicit list of undefined behavior, but it does not
| list out the implicit list of undefined behavior. There
| have been efforts to make just such a list but it's an
| incredibly difficult task.
| CJefferson wrote:
| I agree rust isn't perfect, but I think you underestimate
| the value of "safe" code.
|
| I often write programs that have unsafe code. However,
| the unsafe code is never more than 100 lines, which means
| I have a very small amount of code to reason about --
| Rust users expect (of course, you as a programmer have to
| enforce) that it should be impossible to cause UB from safe
| code, so my "safe interface" to my unsafe code ensures my
| code can't cause UB, no matter what I call.
|
| One problem with Rust is generally when you mess up it
| panics -- I think that's better than buffer overflows and
| the like, but still not a good user experience.
|
| This means there is a very small amount of code I have to
| really think about, while in C or C++ I have to think about
| basically any place x[i] appears (regardless of whether x
| is a pointer or a std::vector).
|
| You can of course write safe C code, people do, but it's
| hard, and it only takes one slip up anywhere in your
| program to blow it.
| chongli wrote:
| In one sense, C is the unsafe code block for myriad other
| languages, like Python. Python users don't want to deal
| with undefined behaviour either. They want to write their
| high level code in NumPy or PyTorch and just have
| everything work very fast.
|
| Little do they know: they rely on C for those libraries
| and for things like ATLAS and LAPACK, which implement the
| underlying numerical linear algebra code. Well, it turns
| out that ATLAS relies pretty heavily on optimizing C
| compilers to generate optimal code on many different
| platforms. At the bottom of all this are the many loop
| optimizations included in compilers which, thanks to
| undefined behaviour in the C spec, are able to assume
| that code is always on the happy path.
|
| It also turns out that Rust includes bindings to ATLAS
| and LAPACK. I would imagine at some point people might
| want to write a new linear algebra package in pure Rust.
| I think it'll be quite difficult to match the performance
| of those two in safe Rust, but we'll see.
| jamincan wrote:
| Isn't LAPACK written in Fortran?
| chongli wrote:
| You're right, and ATLAS is as well, but Fortran has
| undefined behaviour [1] for all the same reasons that C
| does.
|
| [1] https://stackoverflow.com/a/57558908
| benj111 wrote:
| My understanding was that they're changing realloc()
| because they previously allowed zero-length arrays, and
| since you can't tell whether the result is a zero-length
| array, you need to either get rid of zero-length arrays or
| change realloc().
|
| So the feature wasn't broken to begin with, it was broken by
| another feature.
| [deleted]
| GuB-42 wrote:
| UB can initiate the rise of zombie velociraptors.
|   int n;
|   printf("type 0 to stop the rise of zombie velociraptors");
|   scanf("%d", &n);
|   realloc(pre, n);
|   if (n != 0)
|       rise_zombie_velociraptors();
|
| May result in velociraptors rising even if the user enters
| "0".
|
| The reason is that because realloc(pre, 0) is UB, for the
| compiler, it cannot happen, so n can't be 0, so the n != 0 test
| can be optimized out, so, velociraptors.
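| Whatever one makes of this scenario, the practical defence is to
| never hand realloc() a zero size (undefined in C23, and only
| implementation-defined before). A minimal sketch, where
| resize_or_free() is a hypothetical wrapper that treats size 0
| explicitly as "free the block":

```c
#include <stdlib.h>

/* Resize p to n bytes without ever calling realloc(p, 0).
 * n == 0: free p and return NULL.
 * n > 0:  return realloc(p, n); on failure NULL is returned and
 *         p is left untouched and still valid. */
void *resize_or_free(void *p, size_t n) {
    if (n == 0) {
        free(p);
        return NULL;
    }
    return realloc(p, n);
}
```

| A NULL return then means "freed" only when n was 0; for any
| other n it means the allocation failed and p is still live.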
| a-bit-of-code wrote:
| Is it just me that thinks that the article is a [skilfully
| drafted] joke (or parody or whatever the correct word is)? The
| fact that it has been published close to April 1st raises more
| suspicions.
| still_grokking wrote:
| My interpretation would be rather that the C language is a
| carefully drafted joke or parody.
| brxaf wrote:
| I thought the same initially, but the realloc() parts are
| definitely true.
| PointyFluff wrote:
| [dead]
| Dylan16807 wrote:
| > C23 furthermore gives the compiler license to use an
| unreachable annotation on one code path to justify removing,
| without notice or warning, an entirely different code path that
| is not marked unreachable: see the discussion of puts() in
| Example 1 on page 316 of N3054.[9]
|
| I don't agree with that description at all. Here's the code:
|   1 if (argc <= 2)
|   2     unreachable();
|   3 else
|   4     return printf("%s: we see %s", argv[0], argv[1]);
|   5 return puts("this should never be reached");
|
| The only code path that's "entirely different" is lines 1,4,5 and
| in that case of course you remove a return that's after a return.
|
| And the other valid code path is 1,2,5, which has `puts` after
| `unreachable`.
|
| To need `puts` you have to imagine a code path that gets past the
| "if" without taking either branch?
|
| Maybe the author means something by "code path" that's very
| different from how I interpret it?
|
| I would be pretty surprised if the above code means something
| different from:
|   if (argc <= 2) {
|       unreachable();
|       return puts("this should never be reached");
|   } else {
|       return printf("%s: we see %s", argv[0], argv[1]);
|       return puts("this should never be reached");
|   }
| cryptonector wrote:
| There's no problem with this feature. I don't understand TFA's
| problem with it. As a programmer I get to not use
| `unreachable()` if I don't want to, and if I do I'm happy that
| the compiler takes my word for it and does the right thing.
| This is not at all like code elision in UB cases.
|
| The `realloc()` change though...
| wahern wrote:
| I think the point is that if the `argc <= 2` path is
| unreachable, then that means argc is _always_ greater than 2,
| permitting the compiler to optimize the entire block to just:
|   return printf("%s: we see %s", argv[0], argv[1]);
|
| IOW, the conditional has been elided. But you're right in that
| the wording of the complaint doesn't match the example. The
| author presumably had in mind some of the more infamous NULL
| pointer-related optimizations, without spending the time to put
| together a properly analogous example.
| dtolnay wrote:
| I interpreted the author's characterization to be about
| something like:
|   1 if (argc <= 2)
|   2     puts("A");
|   3 puts("B");
|   4 if (argc <= 2)
|   5     unreachable();
|   6 else
|   7     return puts("C");
|   8 return puts("D");
|
| in which not just lines 4-6,8 go away (as you said) but also
| lines 1-2.
|
| It makes sense to me but I can see why the author would
| characterize this situation as _"license to use an
| unreachable annotation on one code path to justify removing
| an entirely different code path that is not marked
| unreachable"_. In a different world one might expect A to be
| printed "before the UB happens".
| masklinn wrote:
| On the other hand, that has been the behaviour of
| optimising compilers in the face of UBs for years at this
| point, decades maybe. The Linux kernel was hit by
| constraint propagation from a pointer dereference back in
| 2009 or so.
|
| This is a behaviour I would absolutely expect from the
| construct, I would even qualify it as "the point".
| alwaysbeconsing wrote:
| One way to look at it (and I am not sure if this is correct,
| but it may be what the essay author meant) is to not treat the
| `unreachable` as affecting the presence of the decision, but
| only the result of the decision. If `unreachable` was replaced
| by a normal statement, we'd have:
|   if (argc <= 2)
|       do_something();
|   else
|       return printf("%s: we see %s", argv[0], argv[1]);
|
| So the `return printf` is executed when `argc` is greater than
| 2. If we remove _just the body_ of the first branch:
|   if (argc <= 2)
|       ;
|   else
|       return printf("%s: we see %s", argv[0], argv[1]);
|
| the same thing holds. And additionally when `argc <= 2`,
| control _will_ move past the `if`.
|
| Under this view, if the `unreachable` doesn't cause the entire
| removal of the `if`, the compiler will produce the equivalent
| of:
|   if (argc > 2)
|       return printf("%s: we see %s", argv[0], argv[1]);
|   return puts("this should never be reached");
|
| Again, I don't say this is the correct interpretation, but it
| is one possibility, that would have to be ruled out by other
| parts of the standard.
| Dylan16807 wrote:
| I understand that interpretation, but that's what the end of
| my comment is about. If we treat unreachable as affecting the
| block it's in, but pretend it's not there for control flow,
| then the two versions of the code do different things. That's
| confusing and hard to preserve.
| badrabbit wrote:
| Shouldn't the compiler warn or error on unreachable code?
| codeflo wrote:
| This is not about code that's found to be unreachable through
| static analysis (where compilers might warn), but about a
| manual programmer annotation that claims the code is
| dynamically unreachable even though statically it might look
| otherwise.
| benj111 wrote:
| Why would you want that?
|
| Is it to aid building for multiple targets? For debug
| builds?
| masklinn wrote:
| > Why would you want that?
|
| To aid with optimisation, it basically lets you ask the
| compiler to remove branches, and provide constraints to
| the same.
|
| An implementation might trap in debug code, but given no
| context would be provided you'd likely avoid this and
| would instead use your own wrapper macro to output a
| message of some sort in that case.
| properparity wrote:
| But why put in unreachable? Doesn't make any sense to me.
|
| If a branch is truly not supposed to ever happen, why
| have a branch at all? Just remove that code from the
| source entirely; that helps the optimizer even more,
| because the most optimal code is of course no code at
| all.
| masklinn wrote:
| > But why put in unreachable? Doesn't make any sense to
| me.
|
| Because sometimes you don't have a choice e.g. say you
| have a switch/case, if you don't do anything and none of
| the cases match, then it's equivalent to having an empty
| `default`. But you may want a `default: unreachable()`
| instead, to tell the compiler that it needs no fallback.
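| A minimal sketch of that pattern, assuming a compiler that
| provides C23's unreachable() in <stddef.h>, or else GCC/Clang's
| __builtin_unreachable() as a fallback. color_name() and enum
| color are hypothetical. Passing any value outside the
| enumerators is now undefined behaviour, which is exactly the
| promise that lets the compiler drop the fallback path.

```c
#include <stddef.h>   /* C23 defines the unreachable() macro here */

#ifndef unreachable
/* Assumption: pre-C23 GCC/Clang; use the equivalent builtin. */
#  define unreachable() __builtin_unreachable()
#endif

enum color { RED, GREEN, BLUE };

/* Caller promises c is always one of the three enumerators. */
const char *color_name(enum color c) {
    switch (c) {
    case RED:   return "red";
    case GREEN: return "green";
    case BLUE:  return "blue";
    default:    unreachable();   /* no fallback code is needed */
    }
}
```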
|
| > If a branch is truly not supposed to ever happen, why
| have a branch at all? Just remove that code from the
| source entirely; that helps the optimizer even more,
| because the most optimal code is of course no code at
| all.
|
| Except the compiler may compile code with the assumption
| that it needs to handle edge cases you "know" are not
| valid. By providing these branches-which-are-not, you're
| giving the compiler more data to work with. That extra
| data might turn out to be useless, but it might not.
| benj111 wrote:
| But this example isn't adding a constraint. The if
| statement is getting optimised away???
| masklinn wrote:
| It is adding a constraint. The constraint is that argc
| can't be smaller than 2. This is a literal "can't", as
| far as the compiler is concerned it's a logical
| impossibility.
|
| The branch containing the unreachable() obviously gets
| removed but the compiler then propagates the constraint
| (the condition for that illegal branch), and can prune
| any other path where `argc <= 2` upstream and downstream,
| as they are dead code per the constraint.
| ufo wrote:
| It helps optimization. One example is if you have code
| like this:
|
|     if (condition) {
|         error_stuff();
|         abort();
|     }
|     normal_stuff();
|
| If the compiler doesn't know that abort exits the
| program, it has to compile the normal_stuff path under
| the assumption that the error path might have run before
| it. This might result in suboptimal code.
|
| Currently, many compilers support annotations such as
| __attribute__((noreturn)) and __builtin_unreachable() to
| manually indicate that a code path is unreachable. C23 is
| now standardizing these features (with a slight tweak to
| the syntax).
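| A small C sketch of the same idea. Instead of the GCC attribute,
| it uses C11's standard _Noreturn function specifier. C23 also
| accepts [[noreturn]]. fatal() and checked_div() are hypothetical
| names.

```c
#include <stdio.h>
#include <stdlib.h>

/* _Noreturn promises the compiler that fatal() never returns. */
static _Noreturn void fatal(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    exit(EXIT_FAILURE);
}

/* Because fatal() cannot return, den != 0 is known to hold whenever
   the division below is reached. */
static int checked_div(int num, int den)
{
    if (den == 0)
        fatal("division by zero");
    return num / den;
}
```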
| _0ffh wrote:
| You can, for example, use it to give the compiler hints
| that allow optimisations it couldn't do otherwise.
|
| Described e.g. here https://web.archive.org/web/20160508051118/http://blog.regeh...
|
| Github https://github.com/preames/llvm-assume-hack
| flohofwoe wrote:
| Unreachable is mainly used as an optimization hint. For
| instance if you put an unreachable into the default
| branch of a contiguous and non-exhaustive (from the pov
| of the compiler) switch-case statement, the compiler will
| not emit a range check for the jump table lookup.
| Asooka wrote:
| This just shows that "unreachable" is almost impossible to use
| safely. The only safe use of unreachable is if it is
| immediately after an instruction that makes the program stop
| running. It is _not_ for "this cannot happen", because things
| that "cannot happen" happen all the time. If you use
| "unreachable", you're just asking for trouble and it seems the
| compiler authors are happy to oblige.
| josephcsible wrote:
| This couldn't be more wrong. What you say to never use
| unreachable for is one of the most important use cases of
| unreachable. The whole point is to give the optimizer an
| assumption that it can't figure out on its own.
| ternaryoperator wrote:
| This reminds me of a point made by the late Stan Kelly-Bootle,
| who for years wrote the Devil's Advocate column in UNIX Review
| magazine. In the early 1990s, he was discussing Microsoft's new
| C compiler and noted that in the promo material for the new
| compiler, it showed a benchmark for a loop that counted from 1
| to 10,000 then printed "Hello". MS claimed that without
| optimization it took a few milliseconds, after optimization: 0
| ms. A small asterisk explained the optimizer simply removed the
| loop. Kelly-Bootle pointed out that the only reason a
| developer would write such a loop was to introduce a needed
| delay. Therefore, deleting the loop was not optimizing, but in
| fact pessimizing. And so, it was in fact Microsoft's
| Pessimizing C compiler.
| kzrdude wrote:
| I think it's a concrete example of how the C language has,
| in practice, become a higher-level abstraction than it used
| to be, and of how that unsettles those used to the old
| behaviour.
| viraptor wrote:
| Those delay loops are common on microcontrollers and the
| usual solution is to either make the counter volatile or
| insert something opaque to the compiler in the loop body.
|
| It would be of course nice if a warning was produced for that
| specific case: This whole loop was removed - is it really
| what you wanted, or is it a broken delay loop?
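| A minimal sketch of the volatile-counter approach. The function
| name is hypothetical. In C, accesses to volatile objects are
| part of a program's observable behavior, so the compiler must
| perform every iteration. How long the loop actually takes still
| depends entirely on the CPU and its clock.

```c
/* Busy-wait delay: i is volatile, so every load and store of it must
   happen and the loop cannot be deleted. Returns the final count. */
static unsigned long delay_loop(unsigned long iterations)
{
    volatile unsigned long i;
    for (i = 0; i < iterations; i++) {
        /* intentionally empty */
    }
    return i;
}
```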
| hyperhopper wrote:
| This is not true at all:
|
| I've seen many loops that turn into no-ops because all the
| functionality has been refactored out but this fact is hidden
| in function calls.
|
| Sure, this should ideally be surfaced as a lint error, not a
| compiler optimization, but you cannot say that intentional
| delays are the "only" reason.
|
| Also since processing time is variable, using that as a
| method should be extremely heavily
| discouraged/warned/require-opt-in
| codeflo wrote:
| Of course, that's technically incorrect. The way the
| standards are written, the compiler is free to replace the
| program with any other program that has the same (in a
| precisely defined sense) observable behavior (these are the
| famous "as if" formulations in language specs). Heating up
| the CPU is not considered observable behavior.
|
| If someone really just wants a delay, it's easy to either
| (for programs running on normal OSs) call a sleep function,
| or (on tiny embedded systems) add an empty inline assembler
| statement that the compiler can't see through.
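| A sketch of the empty-inline-assembler variant. The
| `__asm__ __volatile__("" ::: "memory")` statement is a GCC/Clang
| extension, not standard C, and the function name is
| hypothetical. The statement emits no instructions, but the
| optimizer cannot see through it, so the loop around it is kept.

```c
/* Each iteration executes an opaque (empty) asm statement, so the
   loop survives optimization. Returns the number of iterations run. */
static unsigned long spin(unsigned long iterations)
{
    unsigned long done = 0;
    for (unsigned long i = 0; i < iterations; i++) {
        __asm__ __volatile__("" ::: "memory"); /* GCC/Clang only */
        done++;
    }
    return done;
}
```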
| carlmr wrote:
| >Heating up the CPU is not considered observable behavior.
|
| Neither is measuring delays of cached versus non-cached
| instructions. Yet it turns out to be very observable.
| codeflo wrote:
| Of course these things are "observable" in the literal
| sense. And yet, they aren't considered to be observable
| by the memory model of any language spec that I know of.
| Same as CPU power draw, which has been used as a side-
| channel to extract bits of crypto keys, and is very much
| influenced by common optimizations.
|
| Practically, if you need to execute a specific sequence
| of machine instructions in order to prevent side-channel
| attacks, then you have to rely on assembler, compiler
| intrinsics and/or OS support. But that was true way
| before Spectre.
| [deleted]
| JonChesterfield wrote:
| Author is angry but not wrong. Lifting the most damning quote
| from the article as I haven't seen it for a while.
|
| C inventor Dennis Ritchie pointed to several flaws in [ANSI C]
| ... which he said is a licence for the compiler to undertake
| aggressive optimisations that are completely legal by the
| committee's rules, but make hash of apparently safe programs; the
| confused attempt to improve optimisation ... spoils the language.
|
| --Dennis Ritchie on the first C standard
| kgbcia wrote:
| I just need built-in string handling
| MatmaRex wrote:
| > As C89 was taking shape, the neurodivergent notion of a "zero-
| length object" was making the rounds
|
| I'm surprised that the authors decided to, and were able to, slip
| in this little euphemism.
| nimish wrote:
| It's still apt, even as someone ostensibly in that category.
|
| It does require some abstract thinking to comprehend sets of
| zero measure, negative measure or complex measure in
| mathematics. A "zero length object" is also encountered pretty
| often in practice: http://docs.autodesk.com/CIV3D/2013/ENU/index.html?url=files...
| and zero-length files come to mind.
|
| The euphemism ends up working out fine, though likely not the
| author's intent.
| blahedo wrote:
| Thanks for pointing this out. When I read the article I
| tripped on that word, thought it odd, wasn't sure what the
| author was trying to say, and moved on; but now that you call
| it out, it very obviously seems to be used in just the same way
| that a lot of people used to use the r-slur (and some still
| do).
| [deleted]
| bee_rider wrote:
| I wonder if somewhere along the chain there was an automated
| tool to convert frequently abused mental-health related terms
| like "insane" into something less hurtful, or something along
| those lines?
|
| I haven't seen widespread use of the word "neurodivergent" as a
| kind of... whatever this is, weirdly euphemistic slur, almost?
| wizzwizz4 wrote:
| > _I wonder if somewhere along the chain there was an
| automated tool to convert frequently abused mental-health
| related terms like "insane" into something less hurtful, or
| something along those lines?_
|
| And yet again, these Lincoln systems mess up. While giving
| advice to the author can avoid Great Grimsby mistakes, making
| the replacements _automatically_ is an utterly Scunthorpe
| decision, with failures as Slough as they are foreseeable.
|
| If the intent is to be unhurtful, it should need to choose a
| _different word_ to if the intent is to be hurtful! Even our
| most sophisticated automated systems are Milton Keynes at
| determining that kind of thing.
| shagie wrote:
| As an aside, this is one of the spots where GPT does a
| really good job of fixing things:
|
|     Reword the following passage. Change euphemisms to
|     wording that has similar meaning though no negative
|     conotations. Indicate changed words by putting them
|     in "{{word}}"
|
|     ### Standards are supposed to lead ...
|     Why are such requests made? Often because of
|     arithmetic bugs. And what is a non-null pointer from
|     malloc(0) good for? Absolutely nothing, except
|     shooting yourself in the foot. ###
|
| To which it returned:
|
|     Standards are meant to provide the groundwork for a
|     better world by enabling code portability. Real
|     standardization necessitates {{managing}} a variety
|     of compiler and library implementations while
|     maintaining sensible rules. The saga of the realloc
|     story demonstrates that C standardization does not
|     operate in this fashion any longer.
|
|     As C89 was being crafted, there was a
|     {{unconventional}} thought of a "zero-length object":
|     Supporters proposed that a non-null pointer to such an
|     object should be given for requests for zero-byte
|     allocation. Why do such requests come about?
|     Frequently due to miscalculations. And what is a
|     non-null pointer from malloc(0) useful for? Nothing,
|     apart from {{setting oneself up for failure}}.
|
| It changed (along with some wording around to make the
| changes fit):
|
| herding cats -> managing
|
| neurodivergent -> unconventional
|
| shooting yourself in the foot -> setting oneself up for
| failure
| bee_rider wrote:
| "Unconventional" seems like a bad pick to me, too
| neutral. Clearly the author intends to say something
| negative about zero-length objects. And of course it is
| fine to dislike things; it is just a matter of not using
| hurtful language.
| liquidify wrote:
| There is no possible way to have style without the
| potential to bother someone. Just write how you feel. If
| the readers are so offended, they can stop reading. Life
| will go on.
| wizzwizz4 wrote:
| There's no way to say "this thing is rubbish" without the
| potential to bother people who like it. But it's entirely
| possible to say it without pissing off those who don't
| speak, or have motor disabilities, or like Justin Bieber.
| bee_rider wrote:
| There are so many less hurtful words, I can't accept the
| idea that style requires these particular words. I mean
| the sentence is clunky with "neurodivergent" anyway, and
| this unusual use of the word sticks out and is
| distracting. The style is not improved by this pick.
|
| How about "awful" "asinine" or "shit-tastic" instead?
| smsm42 wrote:
| I guess that'd be the way to detect if the text has been
| written by the AI - it'd be completely devoid of
| metaphors and cleansed from anything that could possibly
| offend somebody. I wouldn't ever call it a "good job" but
| I guess it's useful.
| xigoi wrote:
| Excuse me, but I'm still offended by the word
| "miscalculations". It implies that calculations can be
| wrong, which dehumanizes people with dyscalculia.
| smsm42 wrote:
| File a report to OpenAI, I'm sure they'll teach it to say
| "calculations that do not exceed certain high standards
| of accuracy" very soon. That's the beauty of it - you can
| run the treadmill on computer speed now.
| bongobingo1 wrote:
| Sorry, `unconventional` is also offensive.
|
| > Reword the following passage. Change euphemisms to
| wording that has similar meaning though no negative
| conotations. Indicate changed words by putting them in
| "{{word}}"
|
| >
|
| > The couples were of unconventional make up, including
| male and female pairings, male and male pairings as well
| as female and female.
|
| >> The couples had non-traditional compositions, with
| pairings consisting of men and women, men and men, and
| women and women.
|
| So's male and female apparently.
| chongli wrote:
| _I haven't seen widespread use of the word "neurodivergent"
| as a kind of... whatever this is, weirdly euphemistic slur,
| almost?_
|
| It's a continuation of the euphemism treadmill [1]. It won't
| be long before "neurodivergent" is considered politically
| incorrect and a new term is invented to replace it.
|
| [1] https://www.urbandictionary.com/define.php?term=Euphemism%20...
| peterashford wrote:
| Yeah, that's pretty gross, tbh
| Dwedit wrote:
| Did we ever legalize type punning?
| JonChesterfield wrote:
| We have "pointer provenance", which gives compilers license to
| track type punning across more of your program than ever before
| in order to delete more parts of it with no diagnostic required.
|
| For bonus marks, int and atomic_int are unrelated types, and
| simd vector types aren't a thing, so enjoy the unfixable
| performance cost of choosing C.
| kzrdude wrote:
| Through union yes, I think
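| A short C illustration of the two well-defined ways to
| reinterpret an object's bytes. The first is reading the other
| member of a union, which C (unlike C++) permits; C99 TC3 added
| a footnote saying the bytes are then reinterpreted in the new
| type. The second is memcpy. The sketch assumes float is a
| 32-bit IEEE 754 type, and only the size part of that is checked
| below. The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE 754 binary32 (only the size is checked). */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Type punning through a union: write one member, read the other. */
static uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u; /* bytes of f reinterpreted as uint32_t */
}

/* The memcpy route: copy the object representation into a uint32_t. */
static uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```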
| cryptonector wrote:
| Asking the real questions. Without looking I'm willing to bet
| the answer is "no, and stop asking".
| ChancyChance wrote:
| Is the world finally realizing that "a + b" actually returns two
| values: pass/fail and the value if pass?
|
| "a + b = c;" is a fundamentally flawed operation from a computer
| architecture perspective.
| notfed wrote:
| It's a flaw that has a pretty good tradeoff: unparalleled
| readability.
| ChancyChance wrote:
| It depends. If you want to study maths, yes. If you want to
| be a programmer:
|
| [status, value] = add(a, b);
|
| Is much more unparalleled-ly (?) readable from the
| perspective of how a computer actually operates. In reality,
| this:
|
| uint c = (uint)a + (uint)b; // (to make that other guy happy)
|
| is really:
|
| c = (a + b) % (sizeof(uint));
|
| in "C", which is less readable but far more accurate.
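| The "[status, value] = add(a, b)" idea can be written in plain
| C99 for unsigned operands. This is a sketch with hypothetical
| names. Unsigned addition is always defined and wraps modulo
| UINT_MAX + 1, so the status only reports whether that
| wraparound happened.

```c
#include <limits.h>
#include <stdbool.h>

struct add_result {
    bool ok;        /* false if the true sum did not fit in unsigned */
    unsigned value; /* the sum reduced modulo UINT_MAX + 1 */
};

static struct add_result add_u(unsigned a, unsigned b)
{
    struct add_result r;
    r.ok = a <= UINT_MAX - b; /* checked without overflowing */
    r.value = a + b;          /* well defined: wraps for unsigned */
    return r;
}
```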
| ChancyChance wrote:
| That should be 2^(CHAR_BIT * sizeof(uint)), i.e. UINT_MAX + 1.
| Arch-TK wrote:
| There is actually another option.
|
| A more sophisticated type system.
|
| Let's say you had some pseudocode like this:
|
|     let a = 5
|     let b = 12
|     let c = a + b
|
| The type of a would be Integer[5..5], the type of b would be
| Integer[12..12], the type of c would therefore be
| Integer[17..17]. In a more complex example:
|
|     def foo(a: Integer[0..10], b: Integer[0..10]):
|         return a + b
|
| The return type of this function would be Integer[0..20].
|
| This kind of type system can solve a number of issues, all but
| division by zero (which would probably still have to be solved
| with some kind of optional type).
|
| If type inference dictates that the upper range of an integer
| would be too large to physically store in a machine data type,
| then you either resort to bignums or you make it a compilation
| error. By adding modular and saturating integer types you can
| handle situations where you want special integer behaviours. By
| explicitly casting (with the operation returning an optional)
| you can handle situations where you want to bound the range.
| This drastically simplifies a lot of code by removing explicit
| bounds checks in all places except where they are absolutely
| necessary. If for some reason you care about the space or
| computational efficiency of the underlying machine type, you
| can have additional annotations (like C's
| u?int_(least|fast)[0-9]+_t). If you absolutely must map to a
| machine type (this is usually misguided, unless you are dealing
| with existing C interfaces, for which such a language can
| provide special types) you can have more annotations.
|
| Ada has something resembling this. I believe there are some
| other languages that implement similar features. I believe this
| sort of thing has a name, but I am not great with remembering
| the names of things.
|
| Hopefully this is some food for thought.
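| C itself has no range types, so only the run-time half of this
| idea carries over. That half is an explicit narrowing step that
| either confirms a value lies in [lo, hi] or reports failure,
| in the spirit of "a cast returning an optional". A sketch with
| hypothetical names:

```c
#include <stdbool.h>

/* A poor man's optional: has_value is false when narrowing failed. */
typedef struct {
    bool has_value;
    long value;
} opt_long;

/* Explicitly narrow v to the range [lo, hi]. */
static opt_long narrow_to_range(long v, long lo, long hi)
{
    opt_long r = { false, 0 };
    if (v >= lo && v <= hi) {
        r.has_value = true;
        r.value = v;
    }
    return r;
}
```

| The compile-time part (inferring Integer[0..20] for the sum of
| two Integer[0..10] values) has no C equivalent. It is what a
| language with refinement or range types would add.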
| still_grokking wrote:
| > I believe this sort of thing has a name, [...]
|
| https://en.wikipedia.org/wiki/Refinement_type
|
| But the concept is just a little bit over 30 years old. So
| don't expect it to show up in most mainstream languages before
| the end of the next 20 years, and don't expect it to come to
| the C languages ever.
|
| Meanwhile in mainstream ML-land:
|
| https://github.com/Iltotore/iron
|
| (Or for the older version of the language:
| https://github.com/fthomas/refined)
|
| (Please also note that for this feature both versions don't
| need language support at all but are "just" libraries, as the
| language is powerful enough to express all kinds of type
| level / compile time computations in general.)
| [deleted]
| JonChesterfield wrote:
| Compilers do this sort of range tracking anyway. At least
| within a function. It's useful for loop optimisations.
| im3w1l wrote:
| I think the issue with this is that the worst-case bounds
| normally grow much faster than the actual values. And it can
| be easy to see for the programmer that the values can't
| actually grow that much because a is only big when b is small
| or some property like that, but then you have to convince the
| compiler of the same. I might be misremembering though.
| codethief wrote:
| > because a is only big when b is small or some property
| like that
|
| Exactly, the expressiveness of the type system then
| (typically) becomes the obstacle: How do you express that a
| and b could each reach INT_MAX but their sum never exceeds
| INT_MAX?
| Arch-TK wrote:
| Those kinds of assumptions are where you explicitly cast
| to a smaller ranged type with the option of an error if
| the sum does exceed a limit. The point of this type
| system is not to be able to fully encode every possible
| interaction between numbers in a system, but rather to
| remove unnecessary bounds checking in a bunch of cases
| and make it explicit in the few cases where you ARE
| actually making an assumption.
| wizzwizz4 wrote:
| > _but then you have to convince the compiler of the same._
|
| In conventional parlance, this is known as "handling
| overflow".
| [deleted]
| c4mpute wrote:
| First, you might have meant c = a+b;
|
| The other way isn't really definable as an assignment
| mathematically.
|
| And there is a lot more to it than just pass/fail. First, an
| addition doesn't fail, from a computer architecture
| perspective, the addition will always succeed, the only thing
| that could fail (in all the usual architectures) are possible
| memory fetch and store operations when not strictly dealing in
| register or immediate operands. Second, there is no fail flag.
| There is a overflow flag, an underflow flag, a zero flag, a
| sign and a few more that are irrelevant here. Any of overflow,
| underflow, zero or sign might mean that the operation "failed"
| depending on the types of your operand. Where the processor
| doesn't know anything about the type, so there won't be a
| straightforward 'fail' flag in any case. Only the library or
| compiler can use type information such as (un)signedness,
| bignum-ness, nonzeroness, desired wraparound (for modular
| types) and other possible types together with aforementioned
| flags to decide if that addition might have failed.
|
| So nothing is fundamentally flawed, what you are describing is
| just insufficiently complex (because there is no fail flag,
| just a ton of other flags) or overly complex (because uint32_t
| c = a + b is modular 2^32 arithmetics and cannot fail).
| khazhoux wrote:
| > First, you might have meant c = a+b;
|
| > The other way isn't really definable as an assignment
| mathematically.
|
| This correction is condescending and unnecessary. Unless the
| person had never written a single line of code in their life,
| they would obviously know "a+b" is not a modifiable
| lvalue.
|
| And the point about pass/fail was also obviously not meant to
| capture the full complexity of the flags set by a CPU
| operation. It was very clearly a statement about how basic
| addition does not behave in computers the way it does on
| paper -- as simple as that.
|
| From HN guidelines: "Please respond to the strongest
| plausible interpretation of what someone says, not a weaker
| one that's easier to criticize."
| c4mpute wrote:
| You might be right on the first point. Edit: actually, you
| might not be. There are languages with compound lvalues and
| CPU architectures with multiple result registers (x86 being
| the best-known example). E.g. you can write "result, flags,
| err := do_stuff(a, b, c)" in Go, and x86 DIV stores
| different parts of the division result in different
| registers:
| https://c9x.me/x86/html/file_module_x86_id_72.html And
| generally with common CPU architectures, flags are another
| such result register that is always written, such that any
| operation like c := a+b is actually something like (c,
| flags) := a+b. And for stuff like multiplication, there is
| actually the notion of two result registers being the
| higher and lower part of the resulting operation, like (a *
| 2^32 + b) = c * d (see x86 MUL). Therefore some precision
| in language is necessary for the discussion (and yes, the
| different meanings of ==, =, := in various languages and
| mathematics are also confusing, even to me ;).
|
| I do strongly disagree on the second one about pass/fail.
| This kind of nitpicking is necessary here, because the
| discussion is about a standard intended to precisely
| describe such operations, and how the underlying hardware
| might be utilized to execute them. Being imprecise in this
| context is dangerous, wrong, problematic and leads to the
| whole point of the discussion being lost in a sea of
| handwaving.
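| The two-register result mentioned above, written as
| (a * 2^32 + b) = c * d for x86 MUL with a as the high half and
| b as the low half, can be expressed in portable C. The idea is
| to widen to 64 bits and then split the product. Compilers
| typically turn this into a single widening multiply on x86. A
| sketch with a hypothetical function name:

```c
#include <stdint.h>

/* Full 64-bit product of two 32-bit values, split into halves. */
static void mul_wide_u32(uint32_t c, uint32_t d, uint32_t *hi, uint32_t *lo)
{
    uint64_t product = (uint64_t)c * d; /* cannot overflow 64 bits */
    *hi = (uint32_t)(product >> 32);
    *lo = (uint32_t)product;
}
```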
| JonChesterfield wrote:
| > The other way isn't really definable as an assignment
| mathematically.
|
| It's an equality sign. See also, := and unification.
| antiquark wrote:
| C reached its zenith in C90, and saw a few good ideas in C99.
| Everything since has been wankery from people who either are
| bored, or have a severe case of C++-envy.
| pjmlp wrote:
| Even then it was already outdated when compared against
| languages like Modula-2 and Object Pascal; it got lucky riding
| the wave of UNIX adoption.
| GianFabien wrote:
| Maybe I'm being dense. To me it appears that the standards are
| telling compiler writers what should be done. In doing so the
| compilers will become ever more complex and thus bug-prone.
|
| I learnt C back when K&R (first edition) was the reference. Ok,
| it was hardly much more than a universal assembler to make every
| computer look like a PDP-11. In my experience C is the language
| to use when you want to be close to the metal. For the rest I use
| whichever high-level language/environment is best suited.
| Admittedly some FFI are a pain to use, but once you get the
| boilerplate bedded down your much higher level language gets the
| coordination done.
| RobotToaster wrote:
| >To me it appears that the standards are telling compiler
| writers what should be done.
|
| Isn't that what standards are supposed to do?
| JonChesterfield wrote:
| Traditionally they recorded existing practice and gently
| encouraged diverging implementations to converge.
|
| The alternative approach is to invent things by committee,
| hopefully with some implementers watching, and hope for the
| best.
| juunpp wrote:
| > The ckd_* macros steer a refreshingly sane path around
| arithmetic pitfalls including C's "usual arithmetic conversions."
|
| A 7 letter function to add two numbers and that returns a
| boolean... not entirely sure I'd call that 'sane'.
| ludocode wrote:
| I'd prefer if it were more letters. It bothers me when API
| designers omit random letters just to save a few keystrokes.
| These are particularly egregious because I keep forgetting
| which letters they kept. Is it "chk"? or "ckd"? or "chd"? or
| something else?
|
| I wrote a portability library that wraps these with compiler
| intrinsics and standard C fallbacks. I chose to spell out the
| full word in addition to making the type explicit. It's a lot
| more verbose of course but a lot clearer to read:
|
| https://github.com/ludocode/ghost/blob/develop/include/ghost...
| goatlover wrote:
| A saner language would handle the conversion for you so it
| would work with just the normal math operators.
| masklinn wrote:
| How would that work for the largest type supported by the
| platform?
| pjmlp wrote:
| A panic would be thrown, like in memory safe system
| programming languages, those that were in use outside
| Bell Labs and unfortunately lost to UNIX.
| pjmlp wrote:
| And zero focus on improving the root causes of memory corruption
| due to strings and array indexing errors.
|
| The security world will keep burning it seems.
| heywhatupboys wrote:
| > The security world will keep burning it seems.
|
| For network protocols and IPC, there is no alternative to the
| string types C has: you get a length and a byte array. If you
| trust the user, you can assume the length is correct. Otherwise,
| no.
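| A sketch of the "don't trust the length" case. It assumes a
| hypothetical wire format: a 2-byte big-endian length followed by
| the payload. The claimed length is checked against the number
| of bytes actually received before anything reads the payload.

```c
#include <stddef.h>

/* Returns 0 and sets *payload/*payload_len on success, -1 if the
   message is truncated or claims more bytes than were received. */
static int parse_message(const unsigned char *buf, size_t buflen,
                         const unsigned char **payload, size_t *payload_len)
{
    if (buflen < 2)
        return -1;                      /* header itself truncated */
    size_t len = ((size_t)buf[0] << 8) | buf[1];
    if (len > buflen - 2)
        return -1;                      /* untrusted length too large */
    *payload = buf + 2;
    *payload_len = len;
    return 0;
}
```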
| pjmlp wrote:
| Sure there are, as proven by distributed networking stacks
| not written in C.
|
| In fact, Ethernet's early days go back to Mesa, not C.
|
| UNIX did not invent networking, networking predates UNIX for
| at least a decade.
| RustyRussell wrote:
| Frankly, the C standards ctte went off the deep end when they
| effectively banned NULL to memset etc (obv with zero length).
|
| Not because these functions couldn't handle it, but because this
| assertion simplifies optimizations _elsewhere_.
|
| This has required adding extra checks in my code, found mainly by
| trial and error, and has made it less readable _and_ less
| optimal.
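| A minimal sketch of the kind of guard the comment describes.
| The wrapper names are hypothetical. Each wrapper skips the
| library call when the length is zero, so a null pointer paired
| with a zero length never reaches memcpy() or memset().

```c
#include <string.h>

/* memcpy that treats n == 0 as a no-op even for null pointers. */
static void *memcpy_n0(void *dst, const void *src, size_t n)
{
    if (n == 0)
        return dst; /* never pass a possibly-NULL pointer to memcpy */
    return memcpy(dst, src, n);
}

/* memset with the same zero-length guard. */
static void *memset_n0(void *dst, int c, size_t n)
{
    if (n == 0)
        return dst;
    return memset(dst, c, n);
}
```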
|
| Finally, the checked arithmetic operations returning _false_ on
| success is a horror show. Fortunately it will be found on the
| first time the code is run, but that's a damnably low bar :(
| Kamq wrote:
| > Finally, the checked arithmetic operations returning false on
| success
|
| That's what got you? C functions returning error flags (with
| zero meaning no error) isn't exactly new.
| Dwedit wrote:
| Replace memset with a macro, that's the C way.
| notfed wrote:
| Isn't the return value just a carry bit?
| spc476 wrote:
| Not every CPU C runs on has a carry bit. MIPS and RISC-V, for
| example, don't have the concept of a "carry bit."
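| On ISAs without a carry flag, code recovers the carry with a
| comparison instead: an unsigned sum that wrapped is smaller than
| either operand. A sketch of two-limb (64-bit) addition built
| from 32-bit halves. The type and function names are
| hypothetical.

```c
#include <stdint.h>

typedef struct { uint32_t lo, hi; } u64_limbs;

/* Add two 64-bit values stored as 32-bit limbs, propagating the
   carry out of the low limb by comparison rather than a flag. */
static u64_limbs add_limbs(u64_limbs a, u64_limbs b)
{
    u64_limbs r;
    r.lo = a.lo + b.lo;
    uint32_t carry = r.lo < a.lo; /* 1 if the low limb wrapped */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```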
| ericpauley wrote:
| > Finally, the checked arithmetic operations returning false on
| success is a horror show.
|
| This seems in line with C conventions? Generally a 0 return
| code means success.
| wruza wrote:
| With int statuses, not with bools. It's just a twisted logic
| in return value you have to deal with in your head.
|
| "If checked operation has a status, then it failed." - ok
|
| "If checked operation [is true], then it failed." - wat
| SAI_Peregrinus wrote:
| The checked operations ask "did an error occur?". If it's
| false, then the check passed and no error occurred. If it's
| true, then the check indicated an error.
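| A sketch of how the C23 checked-add macro reads in practice. It
| uses <stdckdint.h> when the toolchain provides ckd_add.
| Otherwise it falls back to the GCC/Clang __builtin_add_overflow,
| which ckd_add mirrors. CKD_ADD and add_int are hypothetical
| names. CKD_ADD is true when the sum did not fit, which is the
| "did an error occur?" reading above.

```c
#include <stdbool.h>

/* Prefer C23 <stdckdint.h> when the toolchain ships it. */
#if defined(__has_include)
#  if __has_include(<stdckdint.h>)
#    include <stdckdint.h>
#  endif
#endif

/* CKD_ADD(r, a, b): stores a + b in *r, true if it did NOT fit. */
#if defined(ckd_add)
#  define CKD_ADD(r, a, b) ckd_add((r), (a), (b))
#elif defined(__GNUC__) || defined(__clang__)
#  define CKD_ADD(r, a, b) __builtin_add_overflow((a), (b), (r))
#else
#  error "no checked-add facility available"
#endif

/* Opposite convention on purpose: true on success; *out is written
   only when the sum fits in an int. */
static bool add_int(int a, int b, int *out)
{
    int sum;
    if (CKD_ADD(&sum, a, b))
        return false; /* true from ckd_add means overflow occurred */
    *out = sum;
    return true;
}
```

| The wrapper flips the convention to "true means success". That
| is one way to keep the inverted boolean out of calling code.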
| masklinn wrote:
| > With int statuses, not with bools
|
| Which C historically did not have, so int played that role.
| The function is the same, and the existing idioms remain.
| wruza wrote:
| I find it strange to introduce real bools (which these
| macros return according to their official signatures) and
| then to assign them a meaning of a still-nonexistent but
| widely used C type. At least my C intuition stumbles upon
| that immediately, no matter how long I think about it.
|
| Ah, anyway, standard C/libc is basically a lost cause. It
| can't get any worse, since you have to refer to a manual
| at every call to not step on a landmine.
| blippage wrote:
| #embed is what I really want. And separators.
|
| > Standard C advances slowly
|
| They're not joking, either. C is conservative to a fault, I
| think.
| AlbertoGP wrote:
| > #embed is what I really want. And separators.
|
| If you want to try out those features now, I made a pre-
| processor that translates that into standard C99:
|
| https://sentido-labs.com/en/library/cedro/202106171400/use-e...
|
| https://sentido-labs.com/en/library/cedro/202106171400/#numb...
|
| It includes a cc wrapper called cedrocc that you can use as a
| drop-in replacement:
|
| https://sentido-labs.com/en/library/cedro/202106171400/#cedr...
| solidsnack9000 wrote:
| "Looking forward, marijuana legalization will surely beget
| notions such as fractional-, imaginary-, and negative-length
| objects, each with as much potential for mayhem as zero-length
| objects."
|
| It's a funny thing to say.
| garbagecoder wrote:
| >negative-length
|
| _nervous Minkowski laughter_
| firstlink wrote:
| Rust seems to do fine with ZSTs somehow.
| kibwen wrote:
| ZSTs work splendidly in Safe Rust, but you do need to
| consider them if you're writing unsafe generic code. Here's
| the relevant section of the Rustonomicon: https://doc.rust-
| lang.org/nomicon/exotic-sizes.html#zero-siz... .
| andrepd wrote:
| > All C standards from C89 onward have permitted compilers to
| delete code paths containing undefined operations--which
| compilers merrily do, much to the surprise and outrage of
| coders.[16] C23 introduces a new mechanism for astonishing elision:
| By marking a code path with the new unreachable annotation,[12] the
| programmer assures the compiler that control will never reach it
| and thereby explicitly invites the compiler to elide the marked
| path.
|
| I don't agree with this in the slightest. I'm not "outraged" by
| undefined behaviour, it's a _fundamental tool_ for writing
| performant code. Ensuring that dereferencing a null pointer or
| accessing outside the bounds of an array is undefined behaviour
| is what lets the compiler not emit a branch on every array access
| and pointer dereference.
|
| Furthermore, I really don't understand the outrage that there is
| another _explicit_ tool to achieve behaviour the author may or
| may not consider harmful. If it's an explicit macro, it's not a
| tarpit!
| GuB-42 wrote:
| I actually like unreachable() a lot. What it does is that it
| invokes undefined behavior, that's all.
|
| It does nothing trickier than any other kind of UB. In fact, I
| could implement unreachable() like this: void unreachable() {
| *(char *)0 = 1; }.
|
| Standardizing it however gives interesting options for compilers
| and tool writers. The best use I can find is to bound the values
| of the argument of a function. For example, if we have "void
| foo(int a) { if (a <= 0) unreachable(); }", it tells the compiler
| that a will always be >0 and it will optimize accordingly, but it
| can also be used in debug builds to trigger a crash, and static
| analyzers can use that to issue warnings if, for example, we call
| foo(0). The advantage of using unreachable() instead of any other
| UB is that the intention is clear.
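| A sketch of the precondition use described above, written as a
| hypothetical ASSUME() macro. It aborts in debug builds. When
| NDEBUG is defined, it becomes an optimizer assumption through
| the GCC/Clang __builtin_unreachable(), which C23's
| unreachable() corresponds to. In that release configuration a
| false condition is undefined behavior. That is the trade-off
| against assert() raised in the replies.

```c
#include <stdlib.h>

#if !defined(NDEBUG)
   /* Debug: a violated precondition crashes deterministically. */
#  define ASSUME(cond) do { if (!(cond)) abort(); } while (0)
#elif defined(__GNUC__) || defined(__clang__)
   /* Release: the condition becomes an optimizer assumption. */
#  define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#  define ASSUME(cond) ((void)0)
#endif

/* Mirrors the foo(int a) example: the caller promises a > 0. */
static int halve_positive(int a)
{
    ASSUME(a > 0);
    return a / 2; /* with a > 0 known, no negative-rounding fixup needed */
}
```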
| lprib wrote:
| Using `unreachable()` instead of `assert()` for your
| preconditions without profiling first is just pre-loading the
| gun to shoot yourself in the foot in the future. When those
| preconditions are inevitably violated at some point, you will
| get random UB corruption rather than simply aborting as is the
| case for assert.
| lionkor wrote:
| Respectfully, you would already be doing this in any C
| codebase, with `assert()`, right? We are all checking our
| preconditions with assert... right?
| GuB-42 wrote:
| AFAIK, assert() is not undefined behavior, so it can't be
| used for optimization. In debug mode a failed assert() prints
| a diagnostic and calls abort(); in release mode (NDEBUG
| defined) it does nothing.
|
| For example:
|
|     assert(a >= 0);
|     if (a < 0)
|         printf("a is negative");
|
| In release mode, assert() will be gone, so the if/printf()
| will stay. If we used "if (a < 0) unreachable();" instead of
| assert(), it would optimize away both lines.
| pornel wrote:
| NDEBUG makes these checks disappear, so that's not an option
| for checks that are supposed to stay in the program.
| ptx wrote:
| > _What it does is that it invokes undefined behavior, that's
| all. [...] it can also be used in debug builds to trigger a
| crash_
|
| How can it be used to trigger a crash (a specific behavior) if
| the behavior it invokes is undefined? Are you saying it would
| be defined differently for debug builds so that it doesn't
| invoke undefined behavior?
| quintussss wrote:
| I always wonder how much use these new C standards get, as C is now
| mostly used in areas where one is severely limited when it comes
| to compiler choice. Where I work, we use GCC 6.2 and iso9899:1990
| (C90). If we were able to use a modern compiler, we would
| probably just use C++.
| eternalban wrote:
| C is a very large language masquerading as a small language.
| MichaelZuo wrote:
| What does that make C++?
| eternalban wrote:
| https://upload.wikimedia.org/wikipedia/commons/a/a7/Frankens.
| ..
|
| (don't get me wrong. love C. but in an innocent sort of way,
| like a teenager quite unaware of betrayals, heartbreak, love
| triangles, or UB, UsB, and IDB..)
| pjmlp wrote:
| Only because many keep worshiping K&R C, ignoring the actual C
| that modern compilers support.
| GuB-42 wrote:
| > C17 purports to be a bug-fix revision of C11. Does the word
| "toto" on page 1 indicate (a) the editor's musical tastes; (b)
| that nobody bothered to spell-check the document; (c) that we're
| not in Kansas anymore; or (d) none of the above?
|
| As a French guy I'd go with (d).
|
| I've often seen "toto" used as a placeholder name, sometimes
| followed by "titi", "tata", "tutu"; I have even used it myself.
| It is similar to "foo", "bar", "baz". I don't know if it is
| specific to France or French-speaking countries, but it is
| definitely a thing here.
| rahen wrote:
| Most likely "toto" as the French equivalent of foobar.
|
| Jens Gustedt is a member of the C committee and participated in C23.
| He also works for INRIA in France:
| https://en.wikipedia.org/wiki/French_Institute_for_Research_...
| layer8 wrote:
| While the situation with realloc() is unfortunate, it is also not
| difficult to write a wrapper that does what the author wants.
| I've done that before, because it has long been known that not
| all realloc() implementations conform to the (prior) C standard.
| One can furthermore assume that existing implementations won't
| change their behavior just because C23 made it UB.
| p0nce wrote:
| Honestly I'm happy the C standard now addresses how realloc
| behaves in detail. It was already hard before, and now it's
| documented.
| __s wrote:
| tl;dr `realloc(p, 0)` is slated to be undefined behavior in C23,
| whereas until now it has been more or less implementation-defined,
| with the recommendation being that realloc(p, 0) is equivalent to
| free(p).
|
| Seems a bit tone deaf to create new undefined behavior in memory
| handling, especially when a sane default behavior seems to be de
| facto
|
| I've used that free-on-0 behavior myself. Unfortunately, code that
| relies on it usually passes a length variable that happens to be
| 0, so it's hard to grep for. Ideally musl and glibc will both keep
| treating this undefined behavior as free, and gcc/clang won't start
| pointing their optimizations at it.
|
| Lest we have to stop using realloc outside of a safe_realloc
| wrapper:
|
|     static void *safe_realloc(void *p, size_t newlen) {
|         if (newlen == 0) {
|             free(p);
|             return NULL;
|         }
|         return realloc(p, newlen);
|     }
|
| What made this whole thing weird is that C doesn't like zero-sized
| objects, but implementations were allowed to return a unique
| pointer for a zero-sized allocation. Portable code therefore had to
| free that reserved chunk on implementations that didn't free on 0.
| In theory this reservation could be more efficient when code
| frequently reallocates between 0 and some small value. And there
| was uncertainty because NULL is how realloc signals allocation
| failure, so code that NULL-checked realloc's return value also had
| to check that the size was non-zero.
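| A minimal sketch of one common way around that ambiguity, using a
| hypothetical resize_buffer() helper: it never passes 0 to realloc,
| assigns realloc's result to a temporary so the old block isn't
| leaked on failure, and reports failure through a separate return
| value instead of overloading NULL.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: resize *buf to n bytes; returns true on success.
 * On failure *buf is left untouched and still owned by the caller, which
 * a plain "p = realloc(p, n)" would not guarantee: it would overwrite p
 * with NULL and leak the old block. */
static bool resize_buffer(void **buf, size_t n)
{
    if (n == 0) {
        /* Never hand 0 to realloc (undefined in C23): free explicitly. */
        free(*buf);
        *buf = NULL;
        return true;
    }
    void *tmp = realloc(*buf, n);
    if (tmp == NULL)
        return false;   /* n != 0 here, so NULL really means failure */
    *buf = tmp;
    return true;
}
```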
| wahern wrote:
| > Seems a bit tone deaf to create new undefined behavior in
| memory handling,
|
| It's only tone deaf to people who understand "undefined
| behavior" as an epithet or as synonymous with giving a license
| to compilers to screw you over. The term doesn't have either of
| those meanings to those on the C committee. In fact, one of the
| explicit rationales for the proposal is that, "Classifying a
| call to realloc with a size of 0 as undefined behavior would
| allow POSIX to define the otherwise undefined behavior however
| they please."
| https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf
|
| > especially when a sane default behavior seems to be de facto
|
| The above proposal, N2464, gives the behavior for AIX, zOS, BSD
| (unspecified), MSVC (crt unspecified), and glibc. They _each_
| have different behaviors.
|
| Why they chose to finally make it undefined (it was marked as
| obsolescent for a long time) rather than keep it
| implementation-defined, I don't know. Perhaps because 1) it
| simplifies the standard, and 2) making it undefined suggests
| compilers should start warning about it; despite all this time,
| no consensus has arisen among implementations about the best
| behavior, and programmers aren't aware that the behavior actually
| varies widely.
|
| EDIT: The draft SUSv5/POSIX-202x standard has indeed directly
| addressed this issue. See, e.g.,
| https://www.austingroupbugs.net/view.php?id=374 The most recent
| draft included the following addition to RETURN VALUE:
|
|     OB If size is 0, OB CX or either nelem or elsize is 0, OB either:
|
|     OB * A null pointer shall be returned OB CX and, if ptr is not
|          a null pointer, errno shall be set to [EINVAL].
|
|     OB * A pointer to the allocated space shall be returned, and the
|          memory object pointed to by ptr shall be freed. The
|          application shall ensure that the pointer is not used to
|          access an object.
|
| CX marks points of divergence with C17. The first CX is because
| of the addition of reallocarray, absent from C17. The second is
| because POSIX will mandate the setting of EINVAL if NULL is
| returned.
| peppermint_gum wrote:
| >It's only tone deaf to people who understand "undefined
| behavior" as an epithet or as synonymous with giving a
| license to compilers to screw you over. The term doesn't have
| either of those meanings to those on the C committee.
|
| It's unfortunate but not surprising that the C committee
| isn't aware of the problems with undefined behavior.
|
| In fact, after I started reading WG14 meeting minutes, I
| completely lost faith that any of the serious problems with
| the standard will ever get fixed.
| coliveira wrote:
| This is not a problem with the committee and is not a
| problem with compiler writers. The committee is only
| marking certain behaviors as UB. Compilers can do what they
| think is most sensible in these situations. And compiler
| writers are not forcing you to accept these extreme
| optimizations. You always have the option of disabling
| optimizations and accepting that your code has bugs (UB). You
| just need to test the code you write under different
| compiler settings, similarly to how you test code in
| different environments.
| __s wrote:
| "Just disable optimizations" is not a solution unless the
| compiler offers fine-grained enough control that the solution
| is a flag like `-ffree-zero-sized-realloc`.
| adgjlsfhk1 wrote:
| > It's only tone deaf to people who understand "undefined
| behavior" as an epithet or as synonymous with giving a
| license to compilers to screw you over.
|
| Unfortunately, this is the correct understanding of UB.
| JoshTriplett wrote:
| realloc to 0 size being free is useful in particular because it
| means a function pointer to realloc is a complete memory
| allocator: call realloc with pointer NULL to get malloc, and
| call realloc with size 0 to get free.
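| A minimal sketch of that single-function allocator idea, in the
| spirit of Lua's lua_Alloc (whose default implementation frees when
| the new size is 0). The alloc_fn type, std_alloc() and int_vec are
| hypothetical names; std_alloc() spells out the free-on-zero case so
| it doesn't depend on what realloc(p, 0) does, which is undefined in
| C23.

```c
#include <stddef.h>
#include <stdlib.h>

/* One function covers malloc, realloc and free:
 *   ptr == NULL  -> allocate size bytes
 *   size == 0    -> free ptr, return NULL
 *   otherwise    -> resize ptr to size bytes */
typedef void *(*alloc_fn)(void *ptr, size_t size);

static void *std_alloc(void *ptr, size_t size)
{
    if (size == 0) {
        free(ptr);      /* explicit, instead of relying on realloc(ptr, 0) */
        return NULL;
    }
    return realloc(ptr, size);
}

/* A tiny growable int array that only ever talks to its allocator. */
struct int_vec {
    int *data;
    size_t len, cap;
    alloc_fn alloc;
};

static int int_vec_push(struct int_vec *v, int x)
{
    if (v->len == v->cap) {
        size_t ncap = v->cap ? v->cap * 2 : 4;
        int *nd = v->alloc(v->data, ncap * sizeof *nd);
        if (nd == NULL)
            return -1;              /* old data is still valid */
        v->data = nd;
        v->cap = ncap;
    }
    v->data[v->len++] = x;
    return 0;
}

static void int_vec_free(struct int_vec *v)
{
    v->data = v->alloc(v->data, 0);  /* size 0 means "free" */
    v->len = v->cap = 0;
}
```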
| moremetadata wrote:
| > What got this whole thing weird is that C doesn't like zero
| sized objects, but implementations were allowed to return a
| unique pointer for a zero sized allocation.
|
| Some of the Windows APIs work like this, so how much of this is
| pressure from MS?
|
| Same discussion from 7 months ago.
|
| https://news.ycombinator.com/item?id=32352965
|
| https://thephd.dev/c23-is-coming-here-is-what-is-on-the-menu...
|
| https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2897.htm
|
| Pattern-matching RAM for variables/objects while they exist,
| even if zeroed or prefilled with a value, doesn't give perfect
| security. Random values would make it harder to identify the
| variable/object.
| tzs wrote:
| > Pointers to free'd memory are akin to uninitialized pointers,
| so free(p) followed by if (p==q) is an instrument of arson
|
| What's the reason for this?
| coliveira wrote:
| Using a freed pointer is incorrect behavior; in short, a bug.
| If you do anything with a freed pointer (other than
| assigning new memory to it), you're inviting all kinds of bugs
| (independent of what the compiler might be doing with your
| code).
| xigoi wrote:
| Obviously _dereferencing_ a freed pointer is incorrect
| behavior, but what harm is there in using its numerical
| value?
| tedunangst wrote:
| 900 years ago there was a CPU which stored pointers in special
| registers and trapped if you loaded a pointer with an invalid
| segment. And so loading the pointer into a register to compare
| it would crash.
| Dylan16807 wrote:
| I can't tell you exactly why but it's consistent with just
| about everything else involving p being undefined, and the
| result of the comparison would be useless anyway.
| tzs wrote:
| Why would the comparison be useless?
|
| I can imagine situations where a pointer q might sometimes be
| a copy of pointer p and sometimes might point to something
| else, and the code wants to free q if and only if it is not a
| copy of p (because p has been free'd earlier).
| Dylan16807 wrote:
| Because a new object can have the same address as p, so
| comparing to p isn't enough to tell you if you have a copy
| of p or a live pointer to something else.
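| A minimal sketch of one way to handle that scenario without ever
| inspecting a freed pointer: decide whether q borrows p's buffer
| while p is still valid, record it in a flag, and free based on the
| flag. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical holder: q either borrows p's buffer or owns its own. */
struct maybe_borrowed {
    char *ptr;
    bool owned;   /* true if ptr must be freed through this holder */
};

/* Decide ownership up front, while p is still a valid pointer. */
static struct maybe_borrowed make_q(char *p, bool want_copy_of_p)
{
    struct maybe_borrowed q;
    if (want_copy_of_p) {
        q.ptr = p;
        q.owned = false;
    } else {
        q.ptr = malloc(16);
        q.owned = (q.ptr != NULL);
    }
    return q;
}

/* Safe even after p has been freed: the decision uses the flag, never a
 * comparison against p's (now indeterminate) value. */
static void release_q(struct maybe_borrowed *q)
{
    if (q->owned)
        free(q->ptr);
    q->ptr = NULL;
    q->owned = false;
}
```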
| jcranmer wrote:
| Given the following code:
|
|     void *p = malloc(N);
|     do_random_stuff(p);
|     void *q = malloc(N);
|
| With this rule, the compiler can conclude that p and q cannot
| alias, even if it doesn't have the body of do_random_stuff.
| Without it, it would first have to prove that p is never freed
| before q is allocated, which is basically impossible (moving the
| body of intervening code into a different file, for example,
| would do the trick).
| firstlink wrote:
| > and that such changes may impose themselves on old code without
| recompilation when dynamically linked libraries are upgraded.
|
| All I can do is laugh. This is what the dynamic linker fanatics
| wanted. This is what they explicitly advocate for to this day.
| Share and enjoy!!
| bayindirh wrote:
| I'd rather have small binaries and memory-efficient systems
| instead of huge blobs, each with its own disconnected environment
| and inconsistent behavior in the same situation, and wasting tons
| of memory besides.
|
| If I have something that critical, I can always statically
| compile.
| coliveira wrote:
| Exactly! Shared libraries mean that new code with modified
| behavior can and will be called when made available,
| independent of how the original code was compiled. It is
| interesting that people come out to complain about this obvious
| behavior.
| hermitdev wrote:
| The problem isn't changing implementation. This is expected
| with shared libs. The problem is changing the contract of the
| function and then expecting it to be drop in compatible. It's
| not. It _should_ be treated as a breaking ABI change, because
| the old behavior and new behavior are not compatible, yet
| it's being passed off as compatible. It's quite literally the
| same behavior/attitude behind the "w" vs "wt" change that led
| to aCropalypse.
| AshamedCaptain wrote:
| I really don't think anyone could possibly want the _specified
| behavior_ of a function changing below their feet.
|
| However, the author is unlikely to be correct here. E.g., to
| this day, glibc contains _multiple implementations of memcpy_
| just to satisfy those executables that depend on the older,
| memmove-like behavior that was once part of the unspecified
| behavior of glibc. The only way to get the dynamic linker to
| choose one of the newer versions is to, well, rebuild the
| executable. It is inconceivable that glibc would not use symbol
| versioning with an actual specification change.
|
| The behavior is practically the same as with static linking,
| and you still get the benefits of dynamic linking.
| throwaway892238 wrote:
| People who don't understand dynamic linking are doomed to re-
| implement it, poorly.
| tedunangst wrote:
| It's a really weird complaint. The standard specifies that it's
| now undefined behavior. That imposes zero requirements to
| change the library. Whatever it is the library was doing, it's
| one possible undefined behavior.
| cryptonector wrote:
| The `realloc()` change calls for pitchforks.
| otabdeveloper4 wrote:
| Hopefully not literally. (But C23 is exactly the kind of
| programming language you expect to do that.)
___________________________________________________________________
(page generated 2023-04-02 23:02 UTC)