[HN Gopher] Making memcpy(NULL, NULL, 0) well-defined
       ___________________________________________________________________
        
       Making memcpy(NULL, NULL, 0) well-defined
        
       Author : gslin
       Score  : 188 points
       Date   : 2024-12-11 12:19 UTC (10 hours ago)
        
 (HTM) web link (developers.redhat.com)
 (TXT) w3m dump (developers.redhat.com)
        
       | voidUpdate wrote:
       | I feel like I've misunderstood something here... shouldn't
       | memcpy(anything, anything, 0) just do nothing, because you're
       | copying 0 bytes?
        
         | mjg59 wrote:
         | That's a reasonable intuitive interpretation of how it _should_
         | behave, but according to the spec it's undefined behaviour and
         | compilers have a great degree of freedom in what happens as a
         | result.
        
           | voidUpdate wrote:
           | Why didn't they just... define it, back when they wrote it?
        
             | frabert wrote:
             | Every time they leave something undefined, they do so to
             | leave implementations free to use the underlying platform's
             | default behavior, and to allow compilers to use it as an
             | optimization point
        
               | jcelerier wrote:
               | Here it's more that it lets the implementation assume
               | this is never the case, so no additional check is needed
               | in it, I assume?
        
               | lucozade wrote:
               | > Every time they leave something undefined, they do so to
               | leave implementations free to use the underlying
               | platform's default behavior
               | 
               | That's implementation defined (more or less) ie the
               | compiler can do whatever makes most sense for its
               | implementation.
               | 
               | Undefined means (more or less) that the compiler can
               | assume the behaviour never happens so can apply
               | transforms without taking it into account.
               | 
               | > to allow compilers to use it as an optimization point
               | 
               | That's the main advantage of undefined behaviour ie if
               | you can ignore the usage, you may be able to apply
               | optimisations that you couldn't if you had to take it
               | into account. In the article, for example, GCC eliminated
               | what it considered dead code for a NULL check of a
               | variable that couldn't be NULL according to the C spec.
               | 
               | That's also probably the most frustrating thing about
               | optimisations based on undefined behaviour ie checks that
               | prevent undefined behaviour are removed because the
               | compiler thinks that the check can't ever succeed
               | because, if it did, there must have been undefined
               | behaviour. But the way the developer was ensuring defined
               | behaviour was with the check!
        
               | frabert wrote:
               | AFAIK, something having undefined behavior in the spec
               | does not prevent an implementation- (platform-)specific
               | behavior being defined.
               | 
               | As to your point about checks being erased, that
               | generally happens when the checks happen too late
               | (according to the compiler), or in a wrong way. For
               | example, checking that `src` is not NULL _after_
               | memcpy(dst, src, 0) is called. Or, checking for overflow
               | by doing `if(x+y<0) ...` when x and y are nonnegative
               | signed ints.
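
A minimal C sketch of the two cases above, with each check placed before the operation it guards rather than after it. The helper names `checked_add` and `copy_if_valid` are hypothetical and not from the thread or the article.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: adds x and y only if the result fits in an int.
 * The range test runs first, so no overflowing signed addition is ever
 * evaluated and there is nothing for the optimizer to assume away. */
bool checked_add(int x, int y, int *out)
{
    assert(out != NULL); /* caller bug, not a runtime condition */
    if ((y > 0 && x > INT_MAX - y) || (y < 0 && x < INT_MIN - y))
        return false;
    *out = x + y;
    return true;
}

/* Hypothetical helper: tests the pointers before memcpy sees them, so
 * no earlier call exists from which "non-null" could be inferred. */
bool copy_if_valid(void *dst, const void *src, size_t n)
{
    if (dst == NULL || src == NULL)
        return false;
    memcpy(dst, src, n);
    return true;
}
```

By contrast, `if (x + y < 0)` with nonnegative `x` and `y` can only be true after an overflow has already happened, which is why a compiler may fold it to false.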
        
             | nephanth wrote:
             | I mean, they might not have given thought to that
             | particular corner case, they probably wrote something like
             | 
             | > memcpy(void* ptr1, void* ptr2, int n)
             | 
             | Copy n bytes from ptr1 to ptr2. UNDEFINED if ptr1 is NULL
             | or ptr2 is NULL
             | 
             | -------
             | 
             | It might also have come from a "explicit better than
             | implicit" opinion, as in "it is better to have developers
             | explicitly handle cases where the null pointer is involved".
        
               | jbverschoor wrote:
               | I think it's more a strategy. C was not created to be
               | safe. It's pretty much a tiny wrapper around assembler.
               | Every limitation requires extra cycles, compile time or
               | runtime, both of which were scarce.
               | 
               | Of course, someone needs to check in the layers of
               | abstraction. The user, programmer, compiler, cpu,
               | architecture.. They chose programmers, who like to
               | call themselves "engineers" these days.
        
               | wruza wrote:
               | Not sure what your last remark means wrt everything else.
        
               | poincaredisk wrote:
               | I disagree with your premise. C was designed to be a high
               | level (for its time) language, abstracted from actual
               | hardware
               | 
               | >It's pretty much a tiny wrapper around assembler
               | 
               | Assembler has zero problem with adding "null + 4" or
               | computing "null-null". C does, because it's not actually
               | a tiny wrapper.
        
             | larschdk wrote:
             | When C was conceived, CPU architectures and platforms were
             | more varied than what we see today. In order to remain
             | portable and yet performant, some details were left as
             | either implementation defined, or completely undefined
             | (i.e. the responsibility of the programmer). Seems archaic
             | today, but it was necessary when C compilers had to be two-
             | pass and run in mere kilobytes of RAM. Even warnings for
             | risky and undefined behavior are a relatively modern concept
             | (last 10-20 years) compared to the age of C.
        
               | actionfromafar wrote:
               | When C was conceived, it was made for a specific DEC CPU,
               | for making an operating system. The idea of a C
               | _standard_ was in the future.
               | 
               | If you wanted to know what (for instance) memcpy
               | _actually_ did, you looked at the source code, or even
               | more likely, the assembler or machine code output. That
               | was "the standard".
        
               | anticensor wrote:
               | No, K&R's book was the standard.
        
               | actionfromafar wrote:
               | First came the language, then a few years later they
               | described it in a book.
        
               | da_chicken wrote:
               | I think it's reasonable to assume that GP clearly meant
               | the C standard being conceived, as, obviously, K&R's C
               | implementation of the language was ad hoc rather than
               | exhibiting any prescribed specification.
        
               | scoutt wrote:
               | > Seems archaic today ... run in mere kilobytes of RAM
               | 
               | There is an entire industry that does pretty much that...
               | today. They might run in flash instead of RAM, but still,
               | a few kilobytes.
               | 
               | Probably there are more embedded devices out there than
               | PCs. PIC, AVR, MSP, ARM, custom archs. There might be one
               | of those right now under your hand, in that thing you use
               | to move the cursor.
        
               | krisoft wrote:
               | > There is an entire industry that does pretty much
               | that... today.
               | 
               | Which industry runs C compilers on embedded devices?
               | Because that is what the part you elided was
               | talking about.
        
               | scoutt wrote:
               | Oh... yes. You are right. My bad.
        
               | sitzkrieg wrote:
               | many do tho. i have targeted c89 and maybe c99 on
               | several embedded devices
        
               | 0xffff2 wrote:
               | But you're running the compiler on the device rather than
               | cross-compiling?
        
               | vlovich123 wrote:
               | They cross compile. No one is compiling code on these
               | machines.
        
             | hyperman1 wrote:
             | memcpy used to be a rep movsb on 8086 DOS compilers. I
             | don't remember if rep movsb stops if cx=0 on entry, or
             | decrements first and wraps around, copying 64K of data.
        
               | connicpu wrote:
               | I know at least MSVC's memcpy on x86_64 still results in
               | a rep movsb if the cpuid flag that says rep movsb is fast
               | is set, which it should be on all x86 chips from about
               | 2011/2012 and onward ;)
        
               | dfox wrote:
               | The specification does not explicitly say that, but the
               | clear intention is that REP with CX=0 should be no-op
               | (you get exactly that situation when REP gets interrupted
               | during the last iteration, in that case CX is zero and IP
               | points to the REP, not the following instruction).
        
               | bonzini wrote:
               | Rep movsb copies 64K if CX=0 (that's actually very
               | useful), but memcpy could be implemented as two
               | instructions:
               | 
               |         jcxz skip
               |         rep movsb
               |     skip:
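
The same guard can be written at the C level for code that has to stay portable to implementations where `memcpy(NULL, NULL, 0)` is still undefined. This is only a sketch, and `memcpy_or_skip` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: returns before reaching memcpy when n == 0,
 * playing the role of the jcxz guard in the comment above. For n > 0
 * the pointers must be valid, which the assert documents. */
void *memcpy_or_skip(void *dst, const void *src, size_t n)
{
    if (n == 0)
        return dst; /* nothing to copy; pointers are never inspected */
    assert(dst != NULL && src != NULL);
    return memcpy(dst, src, n);
}
```

On an implementation that adopts the change the article describes, the early return becomes redundant but stays harmless.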
        
             | wat10000 wrote:
             | The original C standard was more descriptive than
             | prescriptive. There was probably an implementation where it
             | crashed or misbehaved.
        
             | menaerus wrote:
             | A charitable interpretation: back when the contract of
             | this function was standardized, presumably in C89 (~35
             | years ago), neither CPUs nor C compilers were as powerful,
             | so spending an extra couple of CPU cycles to check this
             | condition was much more expensive than it is today.
             | Because of that contract, as the example in the comments
             | below shows, the compiler is also free to eliminate the
             | dead code, which shaves off some extra CPU cycles as well.
        
             | lmm wrote:
             | Back when they wrote it they were trying to accommodate
             | existing compilers, including those who did useful things
             | to help people catch errors in their programs (e.g. making
             | memcpy trap and send a signal if you called it with NULL).
             | The current generation of compilers that use undefined
             | behaviour as an excuse to do horrible things that screw
             | over regular programmers but increase performance on
             | microbenchmarks postdates the standard.
        
             | FartyMcFarter wrote:
             | Because the benefit was probably seen as very little, and
             | the cost significant.
             | 
             | When you're writing a compiler for an architecture where
             | every byte counts you don't make it write extra code for
             | little benefit.
             | 
             | Programmers were routinely counting bytes (both in code
             | size and data) when writing Assembly code back then, and I
             | mean that literally. Some of that carried into higher-level
             | languages, and rightly so.
        
             | killerstorm wrote:
             | From what I understand:
             | 
             | 1. Initially, they just wanted to give compiler makers more
             | freedom: both in the sense "do whatever is simplest" and
             | "do something platform-specific which dev wants".
             | 
             | 2. Compiler devs found that they can use UB for
             | optimization: e.g. if we assume that a branch with UB is
             | unreachable we can generate more efficient code.
             | 
             | 3. Sadly, compiler devs started to exploit every
             | opportunity for optimization, e.g. removing code with a
             | potential segfault.
             | 
             | I.e. the people who wrote the standard thought that a
             | compiler would remove a no-op call to memcpy, but GCC
             | removes the whole branch that makes the call, as it
             | considers the whole branch impossible. Standard makers
             | thought that compiler devs would be more reasonable
        
               | kllrnohj wrote:
               | > Standard makers thought that compiler devs would be
               | more reasonable
               | 
               | This is a bit of a terrible take? Compiler devs never did
               | anything "unreasonable", they didn't sit down and go
               | "mwahahaha we can exploit the heck out of UB to break
               | everything!!!!"
               | 
               | Rather, repeatedly applying a series of targeted
               | optimizations, each one in isolation being "reasonable",
               | results in an eventual "unreasonable" total
               | transformation. But this is more an emergent property of
               | modern compilers having hundreds of optimization passes.
               | 
               | At the time the standards were created, the idea of
               | compilers applying so many optimization passes was just
               | not conceivable. Compilers struggled to just do basic
               | compilation. The assumption was a near 1:1 mapping
               | between code & assembly, and that just didn't age well at
               | all.
        
               | LegionMammal978 wrote:
               | One could argue that "optimizing based on signed
               | overflow" was an unreasonable step to take, since any
               | given platform will have some sane, consistent behavior
               | when the underlying instructions cause an overflow. A
               | developer using signed operations without poring over the
               | standard might have easily expected incorrect values (or
               | maybe a trap if the platform likes to use those), but not
               | big changes in control flow. In my experience, signed
               | overflow is generally the biggest cause of "they're
               | putting UB in my reasonable C code!", followed by the
               | rules against type punning, which are violated every day
               | by ordinary usage of the POSIX socket functions.
        
               | kllrnohj wrote:
               | > One could argue that "optimizing based on signed
               | overflow" was an unreasonable step to take
               | 
               | That optimization allows using 64-bit registers / offset
               | loads for signed ints which it can't do if it has to
               | overflow, since that overflow must happen at 32-bits.
               | That's not an uncommon thing.
        
               | uecker wrote:
               | I started to like signed overflow rules, because it is
               | really easy to find problems using sanitizers.
               | 
               | The strict aliasing rules are not violated by typical
               | POSIX socket code as a cast to a different pointer type,
               | i.e. `struct sockaddr` by itself is well-defined
               | behavior. (and POSIX could of course just define
               | something even if ISO C leaves it undefined, but I don't
               | think this is needed here)
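
For the type-punning rules mentioned in this sub-thread, the usual way to reinterpret an object's bytes without running into the aliasing rules is to copy them with `memcpy`. A minimal sketch, assuming `float` is a 32-bit IEEE-754 type; `float_bits` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the object representation of f as a
 * 32-bit integer. Copying bytes with memcpy avoids reading a float
 * through an lvalue of an incompatible type. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f); /* assumption: 32-bit float */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Compilers commonly turn a fixed-size `memcpy` like this into a plain register move, so the well-defined form usually costs nothing.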
        
               | UncleMeat wrote:
               | There isn't a "find UB branches" pass that is seeking out
               | this stuff.
               | 
               | Instead what happens is that you have something like a
               | constant folding or value constraint pass that computes a
               | set of possible values that a variable can hold at
               | various program points by applying constraints of various
               | options. Then you have a dead code elimination pass that
               | identifies dead branches. This pass doesn't know _why_
               | the "dest" variable can't hold the NULL value at the
               | branch. It just knows that it can't, so it kills the
               | branch.
               | 
               | Imagine the following code:
               | 
               |     int x = abs(get_int());
               |     if (x < 0) {
               |         // do stuff
               |     }
               | 
               | Can the compiler eliminate the branch? Of course. All
               | that's happened here is that the constraint propagation
               | feels "reasonable" to you in this case and "unreasonable"
               | to you in the memcpy case.
        
               | meonukk wrote:
               | Why is it allowed to eliminate the branch? In most
               | architectures abs(INT_MIN) returns INT_MIN which is
               | negative
        
               | plorkyeran wrote:
               | Calling abs(INT_MIN) on a two's-complement machine is not
               | allowed by the C standard. The behavior of abs() is
               | undefined if the result would not fit in the return
               | value.
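
Code that cannot rule out `INT_MIN` as an input has to test for it before calling `abs()`, since testing the result afterwards is exactly the kind of branch that can be removed. A minimal sketch; `checked_abs` is a hypothetical helper name.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: stores |v| in *out and returns true, or returns
 * false when v is INT_MIN. On two's-complement machines -INT_MIN does
 * not fit in an int, so abs(INT_MIN) would be undefined behaviour. */
bool checked_abs(int v, int *out)
{
    assert(out != NULL); /* caller bug, not a runtime condition */
    if (v == INT_MIN)
        return false; /* rejected before abs() is ever called */
    *out = abs(v);
    return true;
}
```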
        
               | Sohcahtoa82 wrote:
               | I didn't believe this so I looked it up, and yup.
               | 
               | Because of 2's complement limitations, abs(INT_MIN) can't
               | actually be represented and it ends up returning INT_MIN.
        
               | UncleMeat wrote:
               | It's possible that there is an edge case in the output
               | bounds here. I'm just using it as an example.
               | 
               | Replace it with "int x = foo() ? 1 : 2;" if you want.
        
               | mjevans wrote:
               | More reasonable: Emit a warning or error to make the code
               | and human writing it better.
               | 
               | NOT-reasonable: silently 'optimize' a 'gotcha' into
               | behavior the programmer(s) didn't intend.
        
               | gpderetta wrote:
               | NOT-reasonable: expecting the compiler to read the
               | programmer's mind.
        
             | ynik wrote:
             | Probably because they did not think of this special case
             | when writing the standard, or did not find it important
             | enough to consider complicating the standard text for.
             | 
             | In C89, there's just a general provision for all standard
             | library functions:
             | 
             | > Each of the following statements applies unless
             | explicitly stated otherwise in the detailed descriptions
             | that follow. If an argument to a function has an invalid
             | value (such as a value outside the domain of the function,
             | or a pointer outside the address space of the program, or a
             | null pointer), the behavior is undefined. [...]
             | 
             | And then there isn't anything on `memcpy` that would
             | explicitly state otherwise. Later versions of the standard
             | explicitly clarified that this requirement applies even to
             | size 0, but at that point it was only a clarification of an
             | existing requirement from the earlier standard.
             | 
             | People like to read a lot more intention into the standard
             | than is reasonable. Lots of it is just historical accident,
             | really.
        
           | david-gpu wrote:
           | More information on this behavior in the link below.
           | 
           |  _> Note that, apart from contrived examples with deleted
           | null checks, the current rules do not actually help the
           | compiler meaningfully optimize code. A memcpy implementation
           | cannot rely on pointer validity to speculatively read
           | because, even though memcpy(NULL, NULL, 0) is undefined,
           | slices at the end of a buffer are fine. [And if the end of
           | the buffer] were at the end of a page with nothing allocated
           | afterwards, a speculative read from memcpy would break_
           | 
           | https://davidben.net/2024/01/15/empty-slices.html
        
             | Someone wrote:
             | > [And if the end of the buffer] were at the end of a page
             | with nothing allocated afterwards, a speculative read from
             | memcpy would break
             | 
             | 'Only' on platforms that have memory protection hardware.
             | Even there, the platform can always allocate an overflow
             | page for a process, or have the page fault handler check
             | whether the page fault happened due to a speculative read,
             | and repair things (I think the latter is hugely, hugely,
             | hugely impractical, but the standard cannot rule it out)
        
               | immibis wrote:
               | Platforms without memory protection hardware also have no
               | problem reading NULL.
        
               | Someone wrote:
               | My comment is a reply to (part of) a comment that isn't
               | talking about reading from NULL. That's what the _[And if
               | the end of the buffer]_ part implies.
               | 
               | Even if it didn't, I don't think the standard should
               | assume that _"Platforms without memory protection
               | hardware also have no problem reading NULL"_
               | 
               | An OS could, for example, have a very simple memory
               | protection feature where the bottom half of the memory
               | address range is reserved for the OS, the top half for
               | user processes, and any read from an address with the
               | high bit clear by code in the top half of the address
               | range traps and makes the OS kill the process doing the
               | read.
        
               | BenjiWiebe wrote:
               | Doesn't it take memory protection hardware to trap on a
               | memory read?
        
               | hun3 wrote:
               | Not really. MMIO mapped at 0x0 for example.
        
               | david-gpu wrote:
               | Yikes! I would love sipping coffee watching the chief
               | architect chew up whoever suggested that. That sounds
               | awful even on a microcontroller.
        
               | bonzini wrote:
               | On s390 the memory at address 0 (low core) has all sorts
               | of important stuff. Of course s390 has paging enabled
               | pretty much always but still...
        
               | colejohnson66 wrote:
               | AVR's registers are mapped to address 0. So reading and
               | writing NULL is actually modifying r0.
        
               | formerly_proven wrote:
               | AVR's r0 is also a totally normal register, unlike most
               | other RISC which typically have r0 == 0.
        
               | kevin_thibedeau wrote:
               | They may also expect writes to address 0.
        
             | Zondartul wrote:
             | What does "speculative" mean in this case? I understand it
             | as CPU-level speculative execution a.k.a. branch mis-
             | prediction, but that shouldn't have any real-world effects
             | (or else we'd have segfaults all the time due to executing
             | code that didn't really happen)
        
               | dwattttt wrote:
               | Turns out you can have that kind of speculative failure
               | too!
               | https://randomascii.wordpress.com/2018/01/07/finding-a-
               | cpu-d...
        
           | xbar wrote:
           | Upon which some people may rely...
        
             | int_19h wrote:
             | People will only rely on UB when it is well defined by a
             | particular implementation, either explicitly or because of
             | a long history of past use. E.g. using unions for type
             | punning in gcc, or allowing methods to be called on null
             | pointers in MSVC.
             | 
             | But there's nothing like that here.
        
           | jancsika wrote:
           | I get that for the library. But I'm a bit puzzled about the
           | optimizations done by a compiler based on this behavior.
           | 
           | E.g., suppose we patch GCC to preserve any conditional
           | containing the string 'NULL' in it. Would that have a
           | measurable performance impact on Linux/Chromium/Firefox?
        
           | captainmuon wrote:
           | I feel strongly they should split undefined behavior into
           | behavior that is not defined, and things that the compiler is
           | allowed to assume. The former basically already exists as
           | "implementation defined behavior". The latter should be
           | written out explicitly in the documentation:
           | 
           | > memcpy(dest, src, count)
           | 
           | > Copies count bytes from src to dest. [...] Note this is not
           | a plain function, but a special form that applies the
           | constraints dest != NULL and src != NULL to the surrounding
           | scope. Equivalent to:
           | 
           |     assume(dest != NULL)
           |     assume(src != NULL)
           |     actual_memcpy(dest, src, count)
           | 
           | The conflation of both concepts breaks the mental model of
           | many programmers, especially ones who learned C/C++ in the
           | 90s where it was common to write very different code, with
           | all kinds of now illegal things like type punning and
           | checking this != NULL.
           | 
           | I'd love to have a flag "-fno-surprizing-ub" or "-fhighlevel-
           | assembler" _combined_ with the above `assume` function or
           | some other syntax to let me help the compiler, so that I can
           | write C like in the 90s - close to the metal but with fewer
           | surprises.
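
An explicit `assume` along these lines can be approximated today with compiler builtins. The sketch below assumes GCC or Clang for `__builtin_unreachable()`; the `ASSUME` macro and `assumed_memcpy` are hypothetical names, not an existing API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ASSUME macro:
 *  - debug builds (NDEBUG not defined): the condition is checked;
 *  - release builds on GCC/Clang: a false condition is declared
 *    unreachable, which hands the optimizer the same fact explicitly;
 *  - other compilers: expands to nothing. */
#if !defined(NDEBUG)
#define ASSUME(cond) assert(cond)
#elif defined(__GNUC__)
#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#define ASSUME(cond) ((void)0)
#endif

/* Spells out the non-null constraints at the call site instead of
 * leaving them implied by memcpy's contract. */
void *assumed_memcpy(void *dest, const void *src, size_t count)
{
    ASSUME(dest != NULL);
    ASSUME(src != NULL);
    return memcpy(dest, src, count);
}
```

In a release build a violated `ASSUME` is still undefined behaviour. The difference is only that the assumption is visible in the source rather than inherited from the library specification.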
        
             | Thorrez wrote:
             | >Note this is not a plain function, but a special form that
             | applies the constraints dest != NULL and src != NULL to the
             | surrounding scope.
             | 
             | Plain functions can apply constraints to the surrounding
             | code:
             | 
             | https://godbolt.org/z/fP58WGz9f
        
         | rcxdude wrote:
         | Purely mechanically, yes, but in terms of the definition of the
         | behaviour in the C abstract machine, no, because certain
         | operations on null pointers are undefined, even if the obvious
         | low-level compilation turns into nothing.
        
           | codedokode wrote:
           | Maybe we should get rid of "abstract machine" and treat
           | pointers as memory addresses?
        
             | davidt84 wrote:
             | Congratulations, you've invented an entirely new language.
             | 
             | Now, who's going to write the compiler for it?
        
               | anticensor wrote:
               | No, it's C at -O0.
        
               | davidt84 wrote:
               | No, it's not.
               | 
               | Undefined behaviour is undefined behaviour whatever
               | optimisation level you use.
               | 
               | Some -f flags may extend the C standard and remove
               | undefined behaviour in some cases (e.g. strict aliasing,
               | signed integer overflow, writable string constants, etc.)
        
             | gpderetta wrote:
             | int* oracle();
             | 
             | int foo() {
             |     int x = 1;
             |     *oracle() = 42;
             |     return x;
             | }
             | 
             | Is the above program allowed to return anything other than
             | 1 in your language?
        
               | kibwen wrote:
               | To elaborate, we treat pointers as more than just
               | integers because it gives optimizers the latitude to
               | reorder and eliminate pointer operations. In the example
               | above we cannot do this, because we cannot prove at
               | compile time that x doesn't live at the address returned
               | by oracle.
               | 
               | For some high-quality further discussion, see Ralf Jung's
               | series of blog posts starting with
               | https://www.ralfj.de/blog/2018/07/24/pointers-and-
               | bytes.html
        
               | shultays wrote:
               | > However, given how low-level a language C++ is, we can
               | actually break this assumption by setting i to y-x. Since
               | &x[i] is the same as x+i, this means we are actually
               | writing 23 to &y[0].
               | 
               | But that is undefined, you can't do x + (y - x) ie
               | pointer arithmetic that ends outside the bounds of an
               | array. Since it is undefined, shouldn't C++ assume that
               | changing x[..] can't change y[0]?
               | 
               | edit: welp, if I had read a few more lines into the
               | article I would have seen that it also says it is undefined
        
               | gpderetta wrote:
               | to be clear, in my example the result of oracle() cannot
               | possibly alias with 'x' in C or C++ (and in fact gcc will
               | optimize accordingly). In a different language where
               | addresses are mere integers, things would be more
               | complicated.
        
               | codedokode wrote:
               | The result of oracle can point to anything if you write
               | it as return (int *)rand();
               | 
               | Note that rand() returns 32-bit value so you have to call
               | it twice and merge the results to obtain a 64-bit
               | pointer.
        
               | gpderetta wrote:
               | The numerical value returned by oracle might physically
               | match the address of the stack slot for 'x', assuming
               | that it exists, but it doesn't mean that, from a language
               | point of view, it is a valid pointer.
               | 
               | If forging pointers had defined behaviour, it would be
               | impossible to use the language sanely or perform any kind
               | of optimization.
        
               | alerighi wrote:
               | Well even in C is not guaranteed to return anything other
               | than 1, since oracle() may return the memory address of
               | variable 1.
        
               | gpderetta wrote:
               | the literal 1 is not an object in C or C++ hence it does
               | not have an address. If you meant 'x', then also no,
               | oracle() can't return the address of 'x' because of
               | pointer provenance rules.
        
               | shultays wrote:
               | Is it allowed to return anything else in C? Is there
               | anything in standard C that would allow oracle() to
               | access memory address of x?
               | 
               | Sure different compilers might allow inlining assembly or
               | some other ways to access x on previous stack perhaps but
               | then it is not really "C"
        
               | wat10000 wrote:
               | That's the point. C allows this function to be optimized
               | to always return 1. A "pointers are addresses, just emit
               | reads and writes and stop trying to be so clever" version
               | of C would require x to be spilled to the stack, then the
               | write, then reload x and return whatever it contained.
        
               | cv5005 wrote:
               | Then use the register keyword or just reword the standard
               | to assume the register behavior if a variables address
               | hasn't been taken.
               | 
               | The majority of useful optimizations can be kept in a
               | "Sane C" with either code style changes (cache stuff in
               | local vars to avoid aliasing for example) or with minor
               | tweaks to the standard.
        
               | wat10000 wrote:
               | Register behavior is what you want essentially all of the
               | time. So we'd just have to write `register` all over the
               | place for no gain.
               | 
               | "Don't optimize this, read and write it even if you think
               | it's not necessary" is a very rare case so it shouldn't
               | be the default. If you want it, use the volatile keyword.
               | 
               | There's no need to reword the standard to assume the
               | register behavior if the variable's address hasn't been
               | taken. That's already how it works. In this example, if
               | you escape the value of `&x`, it's not legal to optimize
               | this function to always return 1.
        
               | codedokode wrote:
               | When using C, this can return anything (or crash if the
               | oracle function returns an invalid pointer, or rewrite
               | its own code if the code section is writable). So if you
               | get rid of the "abstract machine", nothing changes - the
               | program can return anything or crash.
        
               | wat10000 wrote:
               | A conforming C compiler is allowed to emit that function
               | to perform the write and then return the constant 1.
               | Should that be allowed?
        
             | sixfiveotwo wrote:
             | How would you define what a memory address is without first
             | defining in which context it has a meaning?
        
               | codedokode wrote:
               | C was written as a portable assembly language, so I think
               | a memory address is a number that CPU considers to be a
               | memory address.
        
               | layer8 wrote:
               | That's currently the case in C, in that you can convert
               | pointers to and from _uintptr_t_. However, not every
               | number representable in that type needs to be valid
               | memory (that's true on the assembly level as well), hence
               | it's only defined for valid pointers.
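A minimal sketch of the round trip layer8 describes, assuming the implementation provides the optional uintptr_t type (the helper name round_trip is made up for illustration). The standard only promises that a valid pointer survives the trip; it says nothing useful about integers that were never derived from a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a valid object pointer to an integer and back again.
   round_trip is a hypothetical helper, not a standard function. */
int *round_trip(int *p) {
    assert(p != NULL);                      /* only valid pointers are covered */
    uintptr_t bits = (uintptr_t)(void *)p;  /* pointer -> integer */
    return (int *)(void *)bits;             /* integer -> same pointer */
}
```

Calling round_trip(&x) on a live int x should give back a pointer that compares equal to &x and can be used to read or write x.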
        
               | sixfiveotwo wrote:
               | > I think a memory address is a number that CPU considers
               | to be a memory address
               | 
               | I meant to say that, indeed, there must be some concept
               | of CPU for a memory address to have a meaning, and for
               | this concept of CPU to be as widely applicable as
               | possible, surely defining it as abstract as possible is
               | the way to go. Ergo, the idea of a C abstract machine.
               | 
               | Anyway, other people in this thread are discussing the
               | matter more accurately and in more detail than I could
               | hope to do, so I'll leave it like that.
        
             | lmm wrote:
             | 20 years ago, making a C compiler that provided sane
             | behaviour and better guarantees (going beyond the minimum
             | defined in the standard) to make code safer and
             | programmers' lives easier, even at the cost of some
             | performance, might have been a good idea. Today any
             | programmer who thinks things like not having security bugs
             | are more important than having bigger numbers on
             | microbenchmarks has already moved on from C.
        
               | uecker wrote:
               | This is certainly not true. Many programmers also learned
               | to use the tools available to write reasonably safe code
               | in C. I do not personally find this problematic.
        
               | quotemstr wrote:
               | > safe code in C
               | 
               | You're like a Japanese holdout in the 60s refusing to
               | leave his bunker long after the war is over.
               | 
               | C lost. Memory safety is a huge boon for security. Human
               | beings, even the best of them, cannot consistently write
               | correct C code. (Look at OpenBSD.) You can keep fighting
               | the war your side has already lost or you can move on.
        
               | uecker wrote:
               | Well, memory safety is great but it seems Rust
               | programmers also manage to create memory safety issues
               | just fine:
               | 
               | https://rustsec.org/advisories/RUSTSEC-2024-0401.html
               | https://rustsec.org/advisories/RUSTSEC-2024-0400.html
               | https://rustsec.org/advisories/RUSTSEC-2024-0377.html
               | https://rustsec.org/advisories/RUSTSEC-2024-0374.html
               | etc.
        
               | whytevuhuni wrote:
               | I think the first one, stack overflow, is technically not
               | a memory safety issue, just denial-of-service on resource
               | exhaustion. Stack overflow is well defined as far as I
               | know.
               | 
               | The other three are definitely memory safety issues.
        
               | quotemstr wrote:
               | C++ is a better unsafe language than unsafe Rust, IMHO.
               | The thing about the social dynamic of Rust, though, is
               | that it keeps unsafe code to a minimum.
        
               | ryao wrote:
               | I would consider a stack overflow to be a memory safety
               | issue. The C++ language authors likely would too. C++
               | famously refused to support variable length stack
               | allocated arrays because of memory safety concerns.
               | Specifically, they were worried that code at runtime would
               | make an array so big that it would jump the OS
               | guard page, allowing access to unallocated memory that of
               | course is not noticed ahead of time during development.
               | This is probably easy to do unintentionally if you have
               | more stack variables after an enormous stack allocated
               | array and touch them before you touch the array. The
               | alternative is to force developers to use compiler
               | extensions such as alloca(). That makes it easy to pass
               | pointers outside of the stack frame where they are valid
               | and is a definite safety issue. The C++ nitpicking over
               | variable length arrays is silly since it gives us a
               | status quo where C++ developers use alloca() anyway, but
               | it shows that stack overflows are considered a memory
               | safety issue.
        
               | whytevuhuni wrote:
               | In the general case, I think you might be right, although
               | it's a bit mitigated by the fact that Rust does not have
               | support for variable length arrays, alloca, or anything
               | that uses them, in the standard library. As you said
               | though, it's certainly possible.
               | 
               | I was more referring to that specific linked advisory,
               | which is unlikely to use either VLAs or alloca. In that
               | case, where stack overflow would be caused by recursion,
               | a guard frame will always be enough to catch it, and will
               | result in a safe abort [0].
               | 
               | [0] https://github.com/rust-lang/rust/pull/31333
        
               | ryao wrote:
               | Use a sound static analyzer like astree and you can
               | produce memory safe C code:
               | 
               | https://www.absint.com/astree/index.htm
               | 
               | Note that the key word here is sound. The more common
               | static analyzers are unsound tools that will miss cases.
               | Sound tools do not, but few people know of them, they are
               | rare and they are typically proprietary and expensive.
        
               | quotemstr wrote:
               | Sure. I'm also a big fan of what Microsoft has done with
               | SAL. And of course you have formally proven C, as used in
               | seL4. I'd say that the contortions you have to go through
               | to write code with these systems takes you out of the
               | domain of "C" and into a domain of a different, safer
               | language merely resembling C. Such a language might be a
               | fine tool! But it's not arbitrary C.
        
             | NobodyNada wrote:
             | If you do this, your C code will run significantly slower
             | than, say, Java, Go, or C#, because the compiler is unable
             | to apply even the most basic optimizations (which it can do
             | still in all those other languages).
             | 
             | So, at that point why even use C at all? Today, C is used
             | where the overhead of a managed language is unacceptable.
             | If you could just eat the performance cost, you'd probably
             | already be using a managed language. There's not much
             | desire for a variant of C with what would be at least a 10x
             | slowdown in many workloads.
        
               | cv5005 wrote:
               | Or it could be made faster because certain manual
               | optimizations become possible.
               | 
               | An example would be a table of interned strings that you
               | wanna match against (say you're writing a parser). Since
               | standard C says thou shall not compare pointers with < or
               | > unless they both point into the same 'object' you are
               | forbidden from doing the speed of light code:
               | 
               |     char *keywords_begin, *keywords_end;
               |     if(some_str >= keywords_begin && some_str < keywords_end) ...
               | 
               | Official standard sanctioned workarounds would require
               | extra indirection (using indices for example) which is
               | suboptimal.
        
               | gpderetta wrote:
               | You can cast them to uintptr_t and compare them to your
               | heart's desire.
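A rough sketch of that suggestion, assuming uintptr_t is available and that the platform has a flat address space where the converted integers follow address order (the mapping is implementation-defined, so this is not guaranteed everywhere). The name in_range is invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Report whether s lies in [begin, end) by comparing integer values
   instead of using < and > on pointers to possibly unrelated objects.
   in_range is a hypothetical helper. */
bool in_range(const char *s, const char *begin, const char *end) {
    uintptr_t p  = (uintptr_t)s;
    uintptr_t lo = (uintptr_t)begin;
    uintptr_t hi = (uintptr_t)end;
    assert(lo <= hi);               /* caller passes a well-formed range */
    return p >= lo && p < hi;
}
```

With a keyword table stored in one array, in_range(some_str, table, table + sizeof table) answers the "is this an interned keyword" question without an extra index lookup.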
        
             | layer8 wrote:
             | That would restrict C to memory models with a linear
             | address space. That is usually the case nowadays for C
             | implementations, but maybe we don't want to set that in
             | stone, because it would be virtually impossible to revert
             | such a guarantee.
             | 
             | There's also cases like memory address ranges that map to
             | non-memory hardware (i.e. that don't behave like "dumb"
             | memory), and how would you have the C standard define
             | behavior for those?
             | 
             | Lastly, CPU caches require _some_ sort of abstract model as
             | soon as you have multi-threading.
        
             | Measter wrote:
             | The value of an abstract machine is that it allows you to
             | specify how a given program behaves without needing to
             | point to a specific piece of hardware. Compilers then have
             | this as a target when compiling a program for a specific
             | piece of hardware so that they know when the compiler's
             | output is correct.
             | 
             | The issue here is that the abstract machine is under or
             | badly specified.
        
         | IcePic wrote:
         | "man bcopy" on BSD:
         | 
         | 'If len is zero, no bytes are copied.'
         | 
         | Seems reasonable.
        
         | pkhuong wrote:
         | It does nothing, but is only defined when the pointers point
         | into or one past the end of valid objects (live allocations),
         | because that's how the standard defines the C VM, in terms of
         | objects, not a flat byte array.
        
           | whytevuhuni wrote:
           | What if the objects are non-NULL, but invalid (not actually
           | allocated)?
           | 
           | For example, Rust will use address 1 with length 0 for static
           | empty strings, because 1 is a properly aligned non-null
           | pointer.
           | 
           | I would imagine such strings end up being passed to C code
           | sometimes, which may end up calling memcpy with a length of 0
           | on them.
        
             | pkhuong wrote:
             | also UB according to the spec, but LLVM is free to define
             | it. e.g., clang often converts trivial C++ copy
             | constructors to memcpy, which is UB for self-assignment,
             | but I assume that's fine because the C++ front-end only
             | targets LLVM, and LLVM presumably defines the behaviour to
             | do what you'd expect.
        
               | whytevuhuni wrote:
               | Where I work, it is quite normal to link together C code
               | compiled with GCC and Rust code compiled with LLVM, due
               | to how the build system is set up.
               | 
               | As far as I know that disables LTO, but the build system
               | is so complex, and the C code so large, that nobody
               | bothers switching the C side to Clang/LLVM as well.
        
             | creshal wrote:
             | > What if the objects are non-NULL, but invalid (not
             | actually allocated)?
             | 
             | Still UB, since they're restricted pointers that must be
             | valid to begin with.
        
               | bonzini wrote:
               | This is wrong. If you do p=malloc(256), p+256 is valid
               | even though it does not point to a valid address (it
               | might be in an unmapped page; check out ElectricFence).
               | Rust's non-null aligned other pointer is the same, memcpy
               | can't assume it can be dereferenced if the size is zero.
               | The standard text in the linked paper says the same.
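To illustrate the one-past-the-end case, here is a small sketch under the reading given above, where such a pointer is acceptable as long as nothing is accessed through it. copy_tail is a made-up helper, and both buffers are assumed to hold at least size bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the bytes src[offset..size) into the same positions of dst.
   copy_tail is a hypothetical helper; both buffers are assumed to be
   at least `size` bytes long and not to overlap. */
size_t copy_tail(char *dst, const char *src, size_t size, size_t offset) {
    assert(offset <= size);
    size_t remaining = size - offset;
    /* When offset == size, both pointers are one past the end and
       remaining is 0, so nothing is read or written. */
    memcpy(dst + offset, src + offset, remaining);
    return remaining;
}
```

The interesting call is copy_tail(dst, src, n, n): neither pointer can be dereferenced, yet the zero-length copy needs no special casing in the caller.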
        
             | badmintonbaseba wrote:
             | Still technically UB according to the proposed wording. The
             | proposed wording only deals with allowing null pointers
             | explicitly.
        
         | bluetomcat wrote:
         | A trivial implementation wouldn't dereference dest or src in
         | case the length is 0. That's how a student would write it with
         | a for loop (byte-by-byte copy). A non-trivial implementation
         | might do something with the pointers before entering the copy
         | loop.
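The "student" version might look roughly like the sketch below (naive_memcpy is an invented name, not any libc's actual implementation). With a length of 0 the loop body never runs, so neither pointer is dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-by-byte copy: the pointers are only touched inside the loop. */
void *naive_memcpy(void *dest, const void *src, size_t n) {
    unsigned char *d = dest;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++) {
        d[i] = s[i];
    }
    return dest;
}

/* Usage: a zero-length call never touches memory, even with null pointers. */
void naive_memcpy_demo(void) {
    char dst[4] = {0};
    assert(naive_memcpy(dst, "abc", 4) == dst);
    assert(naive_memcpy(NULL, NULL, 0) == NULL);
}
```

An optimized implementation that prefetches or checks alignment before looking at n is where the trouble could start.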
        
         | ryukoposting wrote:
         | Yes and no.
         | 
         | No, because ISO never said it _must_ behave this way.
         | 
         | Yes, because every libc I've personally encountered acts this
         | way. At a glance, glibc's x86 implementation[1, 2], musl, and
         | picolibc all handle 0-length memcpy as you'd expect. I'm sure
         | other folks could dig up the code for Newlib, uclibc, and
         | others, and they'd see the same thing.
         | 
         | On a related note, ISO C has THREE different things that most
         | people tend to lump together as "undefined behavior." They are:
         | 
         | Implementation-defined behavior: ISO doesn't require any
         | particular behavior, but they _do_ require implementations to
         | consistently apply a particular behavior, and document that
         | behavior.
         | 
         | Unspecified behavior: ISO doesn't require any particular
         | behavior, but they _do_ require implementations to consistently
         | use a particular behavior, but they _don't_ require that
         | behavior to be documented.
         | 
         | Undefined behavior: ISO doesn't require any particular
         | behavior, and they don't require implementations to define any
         | particular behavior either.
         | 
         | [1]:
         | https://github.com/lattera/glibc/blob/master/string/memcpy.c
         | [2]:
         | https://github.com/lattera/glibc/blob/895ef79e04a953cac14938...
        
         | ryao wrote:
         | I have asked this question in the past and was told that
         | memcpy() is allowed to preemptively read before it has
         | determined it needs to write to make it faster on some CPUs.
         | The presumption is that if you are going to be copying data,
         | there is at least one cache line there already, so reading can
         | start early.
        
       | whytevuhuni wrote:
       | How interesting. GCC does indeed remove that branch.
       | 
       | https://godbolt.org/z/aPcr1bfPe
        
         | mjg59 wrote:
         | Explanation for the above: passing NULL as the destination
         | argument to memcpy() is undefined behaviour at present. gcc
         | assumes that the fact that memcpy() is called therefore means
         | that the destination argument can't be NULL, so "knows" that
         | the dest == NULL check can never be true, and so removes the
         | test and the do_thing1() branch entirely.
         | 
         | Interestingly, replacing len with 0 in the memcpy() call results in
         | gcc instead removing the memcpy() call and retaining the check
         | - presumably a different optimisation routine decides that it's
         | a no-op in that case. https://godbolt.org/z/cPdx6v13r is,
         | therefore, interesting - despite this only ever calling test()
         | with a len of 0, the elision of the dest == NULL check is still
         | there, but test() has been inlined _without_ the memcpy
         | (because len == 0) but _with_ do_thing2() (because the
         | behaviour is undefined and so it can assume dest isn't NULL
         | even though there's a NULL literally right there!)
         | 
         | Fucking compilers, man.
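Until compilers implement the new wording, one defensive pattern is to keep the zero-length case away from memcpy() altogether. The wrapper below is only a sketch with an invented name (memcpy_allow_empty); it assumes callers pass valid, non-overlapping buffers whenever len is non-zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Skip the library call entirely for empty copies, so the compiler
   learns nothing about dest or src on that path.
   memcpy_allow_empty is a hypothetical helper. */
void *memcpy_allow_empty(void *dest, const void *src, size_t len) {
    if (len == 0) {
        return dest;            /* may be NULL; nothing is copied */
    }
    assert(dest != NULL && src != NULL);
    return memcpy(dest, src, len);
}
```

Because memcpy() is only reached when len != 0, a later dest == NULL check in the caller stays meaningful on the empty-copy path.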
        
           | nayuki wrote:
           | > Fucking compilers, man.
           | 
           | They're just acting as agents that derive the logical
           | consequences of the code.
           | 
           | The fact that the given example code is "surprising" is
           | analogous to this mathematical derivation:
           | 
           |     a = b
           |     a*a = b*a
           |     a*a - b*b = b*a - b*b
           |     (a - b)(a + b) = b(a - b)
           |     (a - b)(a + b)/(a - b) = b(a - b)/(a - b)
           |     ^ Divide by 0, undefined behavior! Everything below is
           |       not necessarily true.
           |     a + b = b
           |     b + b = b
           |     2b = b
           |     2 = 1
           |     2 - 1 = 1 - 1
           |     1 = 0
           | 
           | The source of truth about what is/isn't allowed is the C
           | standard, not your personal simplified model of it that may
           | contain dangerous misconceptions. The fact that your mental
           | model doesn't match the document is an education problem, not
           | a problem with the compiler.
        
             | marssaxman wrote:
             | > They're just acting as agents that derive the logical
             | consequences of the code.
             | 
             | In a particularly pedantic, uptight, and sometimes
             | unhelpful way, yes.
             | 
             | Compilers don't _have_ to be designed this way; in fact it
             | is a relatively recent development in the history of such
             | tools.
        
             | saurik wrote:
             | > The fact that your mental model doesn't match the
             | document is an education problem, not a problem with the
             | compiler.
             | 
             | Or it is a problem with the document, which is the entire
             | reason we are having this discussion: N3322 argued the
             | document should be fixed, and now it will be for C2y.
        
           | jpollock wrote:
           | How does gcc infer anything about memcpy? Can't I replace the
           | c-library memcpy with my own, so how does it know that dest
           | == NULL can never be true?
        
             | 0xffff2 wrote:
             | If I'm understanding the OP correctly, the C standard says
             | so, i.e. the semantics of memcpy are defined by the
             | standard and the standard says that it's UB to pass NULL.
        
               | tialaramex wrote:
               | Unlike all the more complicated languages the
               | "freestanding" mode C doesn't even have a memcpy feature,
               | so it may not define how one works - maybe you've decided
               | to use the name "memcpy" for your function which
               | generates a memorandum about large South American
               | rodents, and "memo_capybara" was too much typing.
               | 
               | In something like C++ or Rust, even their bare metal
               | "What do you mean _Operating System_?" modes quietly
               | require memcpy and so on because we're not savages,
               | clearly somebody should provide a way to _copy bytes of
               | memory_, Rust is so civilised that even on bare metal
               | (in Rust's "core" library) you get a working
               | sort_unstable() for your arbitrary slice types!
        
               | bonzini wrote:
               | The compiler is free to give a meaning to memcpy if run
               | in the (default) hosted mode. There's -ffreestanding for
               | freestanding environments.
        
               | tialaramex wrote:
               | Right, though I guess I wasn't clear enough about that
               | for the down voters, but whatever.
        
             | mjg59 wrote:
             | The valid inputs to memcpy() are defined by the C
             | specification, so the compiler is free to make assumptions
             | about what valid inputs are even if the library
             | implementation chooses to allow a broader range of inputs
        
             | MindSpunk wrote:
             | Many standard C functions are treated as "magic" by
             | compilers. Malloc is treated as if it has no side effects
             | (which of course it does, it changes allocator state) so
             | the optimiser can elide allocations. If not you wouldn't be
             | able to elide the call because malloc looks like it has
             | side effects, which it does but not ones we care about
             | observing.
        
               | gpderetta wrote:
               | Not only that, malloc is also assumed to return pointers
               | that don't alias anything else.
        
             | int_19h wrote:
             | Per ISO C, the identifiers declared or defined with
             | external linkage by any C standard library header are
             | considered reserved, so the moment you define your own
             | memcpy, you're already in UB land.
        
             | ryao wrote:
             | You can, but gcc may replace it with an equivalent set of
             | instructions as a compiler optimization, so you would have
             | no guarantee it is used unless you hack the compiler.
             | 
             | On a related note, GCC optimizing away things is a problem
             | for memset when zeroing buffers containing sensitive data,
             | as GCC can often tell that the buffers are going to be
             | freed and thus the write is deemed unnecessary. That is a
             | security issue and has to be resolved by breaking the
             | compiler's optimization through a clever trick:
             | 
             | https://github.com/openzfs/zfs/commit/d634d20d1be31dfa8cf06
             | e... 12352
             | 
             | Similarly, GCC may delete a memcpy to a buffer about to be
             | freed, although I have never observed that as you generally
             | don't do that in production code.
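One commonly used way to keep such a wipe from being optimized away is to write through a volatile-qualified pointer, sketched below. This is not necessarily the trick used in the linked OpenZFS commit, and secure_zero is an invented name; where available, explicit_bzero() or C23's memset_explicit() serve the same purpose.

```c
#include <assert.h>
#include <stddef.h>

/* Zero a buffer through a volatile pointer so that each store counts as
   an observable access the compiler is expected to keep.
   secure_zero is a hypothetical helper. */
void secure_zero(void *buf, size_t len) {
    assert(buf != NULL || len == 0);
    volatile unsigned char *p = buf;
    while (len--) {
        *p++ = 0;
    }
}
```

A typical use is secure_zero(key, sizeof key) right before the buffer holding the key is freed or goes out of scope.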
        
             | bonzini wrote:
             | If you do so you have to add -fno-builtin (or just
             | -fno-builtin-memcpy).
        
         | ndesaulniers wrote:
         | > For example, GCC will happily remove the dest == NULL branch
         | in the following code
         | 
         | I think the blog should mention
         | `-fno-delete-null-pointer-checks`
         | 
         | https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#ind...
        
           | AceJohnny2 wrote:
           | > _-fdelete-null-pointer-checks_
           | 
           | > [...]
           | 
           | > _This option is enabled by default on most targets._
           | 
           | What a footgun.
           | 
           | I understand that, in an effort to compete with other
           | compilers for relevance, GCC pursued performance over safety.
           | Has that era passed? Could GCC choose safer over fast?
           | 
           | Alternatively, has someone compiled a list of flags one might
           | want to enable in latest GCC to avoid such kinds of dangerous
           | optimizations?
        
             | ryao wrote:
             | Usually, when one marks an argument as nonnull via a
             | function attribute, one wants NULL checks to be removed.
        
               | ndesaulniers wrote:
               | There's two similar but distinct function attributes for
               | nullability. One affects codegen, one affects diagnostics
               | only.
        
               | AceJohnny2 wrote:
               | Irrelevant, because delete-null-pointer-checks happens
               | even in absence of nonnull function attribute, see GP's
               | godbolt link, and the documentation that omits any
               | reference to that function attribute.
               | 
               | _That's_ what makes it dangerous!
        
               | ryao wrote:
               | That is a side effect of passing the pointer as a
               | function parameter marked nonnull. It implies that the
               | pointer is nonnull and any NULL checks against it can be
               | removed. Pass it to a normal function and you will not
               | see the null check removed.
        
             | comex wrote:
             | Just for the record, that's not the main purpose of
             | -fdelete-null-pointer-checks.
             | 
             | Normally, it only deletes null checks after actual null
             | pointer dereferences. In principle this can't change
             | observable behavior. Null dereferences are guaranteed to
             | trap, so if you don't trap, it means the pointer wasn't
             | null. In other words, _unlike most C compiler
             | optimizations_, -fdelete-null-pointer-checks should be
             | safe even if you do commit undefined behavior.
             | 
             | This once caused a kerfuffle with the Linux kernel. At the
             | time, x86_64 CPUs allowed the kernel to dereference
             | userspace addresses, and the kernel allowed userspace to
             | map address 0. Therefore, it was possible for userspace to
             | arrange for null pointers to _not_ trap when dereferenced
             | in the kernel. Which meant that the null check optimization
             | could actually change observable behavior. Which introduced
             | a security vulnerability. [1]
             | 
             | Since then, Linux has been compiled with
             | `-fno-delete-null-pointer-checks`, but it's not really
             | necessary: Linux
             | systems have long since enforced that userspace can't map
             | address 0, which means that deleting null pointer checks
             | should be safe in both kernel and userspace. (Newer CPU
             | security features also protect the kernel even if userspace
             | _is_ allowed to map address 0.)
             | 
             | But anyway, I didn't know that -fdelete-null-pointer-checks
             | treated "memcpy with potentially-zero size" as a condition
             | to remove subsequent null pointer checks. That means that
             | the optimization actually isn't safe! Once GCC is updated
             | to respect the newly well-defined behavior, though, it
             | should become truly safe. Probably.
             | 
             | The same can't be said for most UB optimizations - most of
             | which can't be turned off.
             | 
             | [1] https://lwn.net/Articles/342330/
        
       | ape4 wrote:
       | Only about 1000 more functions to do this to.
        
       | high_na_euv wrote:
       | >On the one hand, UB can be important for compiler optimizations
       | 
       | e.g?
        
         | cwzwarich wrote:
         | The example in this blurb is a pretty good one:
         | https://www.hboehm.info/c++mm/why_undef.html
        
         | cesarb wrote:
         | The simplest example of a compiler optimization enabled by UB
         | would be the following:
         | 
         |     int my_function() {
         |         int x = 1;
         |         another_function();
         |         return x;
         |     }
         | 
         | The compiler can optimize that to:
         | 
         |     int my_function() {
         |         another_function();
         |         return 1;
         |     }
         | 
         | Because it's UB for another_function() to use an out-of-bounds
         | pointer to access the stack of my_function() and modify the
         | value of x.
         | 
         | And the most important example of a compiler optimization
         | enabled by UB is related to that: being UB to access local
         | variables through out-of-bounds pointers allows the compiler to
         | place them in registers, instead of being forced to go through
         | the stack for every operation.
        
           | alerighi wrote:
           | Does this still matter today? I mean, first, registers are
           | anyway saved on the stack when calling a function, and caches
           | of modern processors are really nearly as fast (if not as
           | fast!) as a register. Registers these days are merely labels,
           | since internally the processor (at least for x86) executes
           | the code in a sort of VM.
           | 
           | To me it seems that all these optimizations were really
           | something useful back in the day, but nowadays we can as well
           | just ignore them and let the processor figure it out without
           | that much loss of performance.
           | 
           | Assuming that the program is "bug free" to me is a terrible
           | idea, since even mitigations that the programmer puts in
           | place to mitigate the effect of bugs (and no program is bug
           | free) are skipped because the compiler can assume the program
           | has no bug. To me security is more important than a 1% more
           | boost in performance.
        
             | gpderetta wrote:
             | Register allocation is one of the most basic optimizations
             | that a compiler can do. Some modern cpus can alias stack
             | memory with internal registers, but it is still not as fast
             | as not spilling at all.
             | 
             | You can enjoy -O0 today and the compiler will happily
             | allocate stack slots for all your variables and keep them
             | up to date (which is useful for debugging). But the
             | difference between -O0 and -O3 is orders of magnitude on
             | many programs.
        
             | wbl wrote:
             | Many calling conventions use registers. And no, loads and
             | stores are extremely complex and not free at all: fewer can
             | issue in each cycle and there's some very expensive
             | hardware spent to maintain the ordering on execution.
        
             | cesarb wrote:
             | > I mean, first registers are anyway saved on the stack
             | when calling a function
             | 
             | No, they aren't. For registers defined in the calling
             | convention as "callee-saved", they don't have to be saved
             | on the stack before calling a function (and the called
             | function only has to save them if it actually uses that
             | register). And for registers defined as "caller-saved",
             | they only have to be saved if their value needs to be kept.
             | The compiler knows all that, and tends to use caller-saved
             | registers as scratch space (which doesn't have to be
             | preserved), and callee-saved registers for longer-lived
             | values.
             | 
             | > and caches of modern processors are really nearly as fast
             | (if not as fast!) as a register.
             | 
             | No, they aren't. For instance, a quick web search tells me
             | that the L1D cache for a modern AMD CPU has at least 4
             | cycles of latency. Which means: even if the value you want
             | to read is already in the L1 cache, the processor has to
             | wait 4 cycles before it has that value.
             | 
             | > Registers these days are merely labels, since internally
             | the processor (at least for x86) executes the code in a
             | sort of VM.
             | 
             | No, they aren't. The register file still exists, even
             | though register renaming means which physical register
             | corresponds to a logical register can change. And there's
             | no VM, most common instructions are decoded directly
             | (without going through microcode) into a single uOp or pair
             | of uOps which is executed directly.
             | 
             | > To me it seems that all these optimizations were really
             | something useful back in the day, but nowadays we can as
             | well just ignore them and let the processor figure it out
             | without that much loss of performance.
             | 
             | It's the opposite: these optimizations are more important
             | nowadays, since memory speeds have not kept up with
             | processor speeds, and power consumption became more
             | relevant.
             | 
             | > To me security is more important than a 1% more boost in
             | performance.
             | 
             | Newer programming languages agree with you, and do things
             | like checking array bounds on every access; they rely on
             | compiler optimizations so that the loss of performance is
             | only that "1%".
        
           | MrMcCall wrote:
           | I don't find those compelling reasons and, to the contrary, I
           | think that kind of semantic circumvention to be a symptom of
           | a poorly developed industry.
           | 
           | How can we have properly functioning programs without
           | clearly-defined, and _sensible_, semantics?
           | 
           | If the developer needs to use registers, then they should
           | choose a dev env/PL that provides them, otherwise such
           | kludges will crash and burn, IMO.
        
             | bagels wrote:
             | We pay for the flexibility of not wearing seatbelts by
             | increasing the consequences of crashes.
        
             | gpderetta wrote:
             | We stopped explicitly declaring locals with the 'register'
             | keyword circa 40 years ago. Register allocation is a low
             | hanging fruit and one of those things that is definitely
             | best left to a compiler for most code.
        
             | wat10000 wrote:
             | Are you saying that C compilers should change every local
             | variable access to read and write to the stack just in case
             | some function intentionally does weird pointer arithmetic
             | to change their values without referring to them in the
             | source code?
        
             | wruza wrote:
             | And now they have to manage register pressure for it to
             | keep being faster. And false dependencies. And some more.
             | It doesn't work like that. Developers can't optimize like
             | compilers do, not with modern CPUs. The compilers do the
             | very heavy lifting in exchange for the complexity of a set
             | of constraints they (and you, as a consequence, must) rely
             | on. The more relaxed these constraints are, the less
             | performant code you get. Modern CPUs run modern
             | interpreters as fast as dumbest-compiled C code basically,
             | so if you want sensible semantics, then Typescript is one
             | of the absolutely non-ironic answers.
        
           | cv5005 wrote:
           | You don't need UB for that.
           | 
           | A simple model for both compilers and programmers to
           | understand:
           | 
           | "A variable whose address has not been taken need not be
           | reachable via a random pointer".
           | 
           | I mean that's how an assembly programmer would think - if I
           | put something in r0 I don't expect a store instruction to
           | clobber it.
        
             | UncleMeat wrote:
             | What you describe there is UB. If you define this in the
             | standard, you are defining a kind of runtime behavior that
             | can never happen in a well formed program and the compiler
             | does not have to make a program that encounters this
             | behavior do anything in particular.
        
         | rwmj wrote:
         | This explanation of why signed int overflow is undefined is
         | interesting (although the behaviour is still very annoying):
         | https://kristerw.blogspot.com/2016/02/how-undefined-signed-o...
         | (HN discussion: https://news.ycombinator.com/item?id=11146384)
         | 
         | More examples here:
         | http://blog.llvm.org/2011/05/what-every-c-programmer-should-...
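         | 
         | A minimal sketch of one commonly cited case, not taken from
         | the linked posts: because signed overflow is undefined, a
         | compiler is allowed to fold the first comparison below to a
         | constant, while the unsigned version has to honor wraparound.
         | The function names are made up for illustration.

```c
#include <limits.h>

/* Signed: x + 1 overflowing would be UB, so an optimizer is allowed
   to assume it never happens and treat this as "always true". */
int signed_next_is_greater(int x) {
    return x + 1 > x;
}

/* Unsigned: wraparound is defined, so this is false for UINT_MAX. */
int unsigned_next_is_greater(unsigned x) {
    return x + 1u > x;
}
```

         | Calling the signed version with INT_MAX is itself UB, so
         | only the unsigned one can be probed at its limit.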
        
         | Arch-TK wrote:
         | http://blog.llvm.org/2011/05/what-every-c-programmer-should-...
         | 
         | In a real world program removing all UB is in some cases
         | impossible without adding new breaking features to the C
         | language. But, taking a real world program and removing all UB
         | which IS possible to remove will introduce an overhead. In some
         | programs this overhead is irrelevant. In others, it is probably
         | the reason why C was picked.
         | 
         | If you want speed without overhead, you need to have more
         | statically checked guarantees. This is what languages such as
         | Rust attempt to achieve (quite successfully).
        
           | uecker wrote:
           | Many real world C programs have no UB.
           | 
           | What Rust attempts to eliminate is the possibility of
           | accidentally introducing UB, by designing the language in a
           | way that makes it impossible to have UB when sticking to the
           | safe subset.
           | 
           | It is also possible to ensure that C programs have no UB,
           | and this does not require any breaking features to C. It
           | usually requires some refactoring of the program.
        
         | GuB-42 wrote:
         | Generally, undefined behavior removes the need for
         | systematically checking for special cases, the most common
         | being out of bounds access.
         | 
         | But it can go further than that. Dereferencing a NULL pointer
         | is undefined behavior, so if a pointer is dereferenced, it can
         | be assumed by the compiler not to be NULL and the code can be
         | optimized. For example:
         | 
         |       void foo(int *p) {
         |         (*p)++;
         |         if (p == NULL) {
         |           printf("val is NULL\n");
         |         } else {
         |           printf("val is %d\n", *p);
         |         }
         |       }
         | 
         | can be optimized to:
         | 
         |       void foo(int *p) {
         |         (*p)++;
         |         printf("val is %d\n", *p);
         |       }
         | 
         | Note that static analyzers will most likely issue a warning
         | here as such a trivial case is most likely a mistake. But the
         | check for NULL may be part of an inline function that is used
         | in many places, and thanks to the undefined behavior, the code
         | that handles the NULL case will only be generated when
         | relevant. The problem, of course, is that it assumes that the
         | programmer knows what he is doing and doesn't make mistakes.
         | 
         | In the case of memcpy(NULL, NULL, 0), there probably isn't much
         | to gain making it undefined. It most likely doesn't help with
         | the memcpy implementation (len=0 is generally a no-op), and
         | inference based on the fact that the arguments can't be NULL is
         | more likely to screw the programmer up than to improve
         | performance.
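         | 
         | A rough sketch of that inference, assuming a hypothetical
         | helper named copy_then_check: under the rules before the
         | change, the memcpy call alone lets a compiler conclude that
         | dst is not NULL, even when n is 0, so the later check may be
         | removed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns -1 if dst is NULL, 0 otherwise. */
int copy_then_check(char *dst, const char *src, size_t n) {
    memcpy(dst, src, n);
    /* Under the old rules, passing NULL to memcpy is UB even when
       n == 0, so a compiler may infer dst != NULL here and drop
       this branch entirely. */
    if (dst == NULL) {
        return -1;
    }
    return 0;
}
```

         | Whether a given compiler actually drops the branch depends
         | on its version and flags; the point is only that the old
         | wording permitted it.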
        
           | high_na_euv wrote:
           | But how much actual performance is gained here?
        
             | bagels wrote:
             | It all adds up. All those instructions you don't have to
             | execute, especially memory access and cache misses from
             | jumps, pipeline stalls from conditionals, not just from
             | this optimization.
        
             | ncruces wrote:
             | Imagine that you created a function GetPixel that reads an
             | RGB pixel at a memory address, and which has a NULL check
             | as a precondition.
             | 
             | If the compiler can "prove" that the pointer is not NULL it
             | can (after inlining the call) remove 20 million checks for
             | a 20 megapixel image.
             | 
             | The silly issue is the compiler using "you accessed it
             | before" (aka "undefined behaviour") to "prove" that the
             | pointer is not NULL.
             | 
             | But I can attest that avoiding 20 million such checks does
             | indeed make a huge difference.
        
               | cv5005 wrote:
               | Just make a non-null-checking version, GetPixelUnsafe(),
               | and put the responsibility on the user to do the null
               | check before the loop.
               | 
               | All of these 'problems' have simple and straightforward
               | workarounds, I'm not convinced these UB are needed at
               | all.
        
               | ncruces wrote:
               | That's a non solution for existing code that already
               | calls GetPixel 20 million times.
               | 
               | It's not like I'm saying C is the best possible way to
               | write new code.
               | 
               | I'm just commenting why this matters for performance, and
               | "remove all undefined behavior" from C compilers is a
               | non-starter.
               | 
               | Now go write Rust for all I care.
        
               | nemothekid wrote:
               | > _All of these 'problems' have simple and straightforward
               | workarounds, I'm not convinced these UB are needed at
               | all._
               | 
               | He gave you a simple and straightforward example, but
               | that example may not be representative of a real world
               | program where complex analysis leads to better performing
               | code.
               | 
               | As a programmer, it's far easier to just insert bounds
               | checks everywhere, and trust the system to remove them
               | when possible. This is what Rust does, and it is safe. The
               | problem isn't the compiler, the problem is the standard.
               | More broadly, the standard wasn't written with optimizing
               | compilers in mind.
        
               | Dylan16807 wrote:
               | If we're inlining the call, then we can hoist the NULL
               | check out of the loop. Now it's 1 check per 20 million
               | operations. There's no need to eliminate it or have UB at
               | that point.
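               | 
               | A sketch of that hoisting, with a made-up GetPixel
               | and made-up summing functions: the per-pixel check
               | and the single check before the loop give the same
               | result, because image never changes inside the loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical GetPixel with a NULL check as its precondition. */
static inline uint32_t GetPixel(const uint32_t *image, size_t i) {
    if (image == NULL) {
        return 0;
    }
    return image[i];
}

/* As written: one NULL check per pixel. */
uint64_t sum_checked(const uint32_t *image, size_t count) {
    uint64_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += GetPixel(image, i);
    }
    return sum;
}

/* After inlining and hoisting: one NULL check for the whole loop. */
uint64_t sum_hoisted(const uint32_t *image, size_t count) {
    if (image == NULL) {
        return 0;
    }
    uint64_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += image[i];
    }
    return sum;
}
```

               | An optimizer can often do this transformation itself
               | (loop unswitching), without relying on any UB.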
        
             | menaerus wrote:
             | It depends on your CPU microarchitectural details, on the
             | complexity and size of your binary executable and the
             | workload of your binary.
             | 
             | So there's no universal answer to your question but it
             | could very well be "much".
        
       | MrMcCall wrote:
       | Isn't it more sensible to just check that the params that are
       | about to be sent to memcpy be reasonable?
       | 
       | That is why I tend to wrap my system calls with my own internal
       | function (which can be inlined in certain PLs), where I can
       | standardize such tests. Otherwise, the resulting code that
       | performs the checks and does the requisite error handling is
       | bloated.
       | 
       | Note that I am also loath to #define such code because C is
       | already rife with them and my perspective is that the fewer of
       | them the better.
       | 
       | At the end of the day, quick and dirty fixes will prove the adage
       | "short cuts make long delays", and OpenBSD's approach is the only
       | really viable long-term solution, where you just have to rewrite
       | your code if it has ill-advised constructs.
       | 
       | For designing libraries such as C's stdlib, I don't believe in
       | 'undefined behavior': clearly define your semantics and say, "If
       | you pass a NULL to memcpy, this is what will happen." Same for
       | passing (n == 0), or when (src == dst).
       | 
       | And if, for some strange reason, fixing the semantics breaks
       | calling code, then I can't imagine that their code wasn't f_cked
       | in the first place.
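       | 
       | A minimal sketch of the kind of wrapper described above. The
       | name checked_memcpy and its return convention are assumptions
       | for illustration, and it does not test for overlapping
       | buffers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: returns 0 on success, -1 on rejected input. */
int checked_memcpy(void *dst, const void *src, size_t n) {
    if (n == 0) {
        return 0;   /* nothing to copy; memcpy is never called */
    }
    if (dst == NULL || src == NULL) {
        return -1;  /* report the bad argument instead of invoking UB */
    }
    memcpy(dst, src, n);
    return 0;
}
```

       | Callers then handle a single error code instead of repeating
       | the same checks at every call site.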
        
         | hwc wrote:
         | > internal function
         | 
         | every time you introduce something nonstandard, you add one
         | little hardship to anyone trying to read or modify your code.
         | 
         | if a programmer is familiar with the language, its standard
         | library, and the normal idioms, then they should be able to
         | just jump in.
        
         | int_19h wrote:
         | As the article points out, all major memcpy implementations
         | already do this check inside memcpy. Sure, the caller can also
         | check, but given that it's both redundant in practice and makes
         | some common patterns harder to use than they would otherwise
         | be, there's no reason to not just standardize what's already
         | happening anyway and make everyone's lives easier in the
         | process.
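         | 
         | One example of such a pattern, sketched with a made-up
         | struct bytes type: an empty buffer is naturally represented
         | as { NULL, 0 }, so appending one needs an explicit guard as
         | long as memcpy with a NULL argument is undefined.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical byte buffer; an empty one is { NULL, 0 }. */
struct bytes {
    unsigned char *data;
    size_t len;
};

/* Appends src to dst. Returns 0 on success, -1 if allocation fails. */
int bytes_append(struct bytes *dst, const struct bytes *src) {
    if (src->len == 0) {
        return 0;   /* guard needed while memcpy(p, NULL, 0) is UB */
    }
    unsigned char *grown = realloc(dst->data, dst->len + src->len);
    if (grown == NULL) {
        return -1;
    }
    memcpy(grown + dst->len, src->data, src->len);
    dst->data = grown;
    dst->len += src->len;
    return 0;
}
```

         | With the zero-length case defined, the early return becomes
         | an optimization rather than a correctness requirement for
         | the memcpy call.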
        
       | badmintonbaseba wrote:
       | I just skimmed through the proposed wording in [N3322]. It looks
       | like it silently fixes a defect too, NULL == NULL was also
       | undefined up until C23. Hilarious.
       | 
       | [N3322] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3322.pdf
        
         | mananaysiempre wrote:
         | This is probably related to the issue with NULL - NULL
         | mentioned in the article.
         | 
         | Imagine you're working in real mode on x86, in the compact or
         | large memory model[1]. This means that a data pointer is
         | basically struct{uint16_t off,seg;} encoding linear address
         | (seg<<4)+off. This makes it annoying to have individual
         | allocations ("objects") >64K in size (because of the weird
         | carries), so these models don't allow that. (The huge model
         | does, and it's significantly slower.) Thus you legitimately
         | have sizeof(size_t) == 2 but sizeof(uintptr_t) == 4 (hi Rust),
         | and God help you if you compare or subtract pointers not within
         | the same allocation. [Also, sizeof(void *) == 4 but sizeof(void
         | (*)(void)) == 2 in the compact model, and the other way around
         | in the medium model.]
         | 
         | Note the addressing scheme is non-bijective. The C standard is
         | generally careful not to require the implementation to
         | canonicalize pointers: if, say, char a[16] happens to be
         | immediately followed by int b[8], an independently declared
         | variable, it may well be that a+16 (legal "one past" pointer)
         | is {16,1} but &b is {0,2}, which refers to the exact same byte,
         | but the compiler doesn't have to do anything special because
         | dereferencing a+16 is UB (duh) and comparing a+16 with
         | (char *)&b or subtracting one from the other is also UB
         | (pointers to different objects).
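         | 
         | To make the arithmetic concrete, here is a small model of
         | that scheme. The struct and function names are invented for
         | illustration; it shows {16,1} and {0,2} mapping to the same
         | linear address while differing bit for bit.

```c
#include <stdint.h>

/* Hypothetical model of a real-mode far pointer: offset and segment. */
struct far_ptr {
    uint16_t off;
    uint16_t seg;
};

/* Linear address as described above: (seg << 4) + off. */
uint32_t linear(struct far_ptr p) {
    return ((uint32_t)p.seg << 4) + p.off;
}

/* Field-by-field comparison, roughly what a compiler that does not
   canonicalize pointers would emit for p == q. */
int same_bits(struct far_ptr p, struct far_ptr q) {
    return p.off == q.off && p.seg == q.seg;
}
```

         | Both pointers give linear address 32, yet same_bits reports
         | them as different, which is why canonical form matters for
         | equality.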
         | 
         | The issue with NULL == NULL and also with NULL - NULL is that
         | now the null pointer is required to be canonical, or these
         | expressions must canonicalize their operands. I don't know why
         | you'd ever make an implementation that has non-canonical NULLs,
         | but I guess the text prior to this change allowed such.
         | 
         | [1]
         | https://devblogs.microsoft.com/oldnewthing/20200728-00/?p=10...
        
           | amluto wrote:
           | > now the null pointer is required to be canonical
           | 
           | Yikes! This particular oddity seems annoying but sort of
           | harmless in x86 real mode, but not necessarily in protected
           | mode. Imagine code that wants to load a pointer into a
           | register: it loads the offset into an ordinary register and
           | the selector portion into a segment register. It's
           | permissible to load the 0 (null) selector, but loading
           | garbage will fault immediately. So, if you allow non
           | canonical NULL, then knowing that a pointer is either valid
           | or NULL does not allow you to hoist a segment load above a
           | condition that might mean you never actually dereference the
           | pointer.
           | 
           | (I have plenty of experience with low-level OS code in all
           | kinds of nasty x86 modes but, thankfully, not so much
           | experience writing ordinary C code targeting protected mode.
           | It sometimes boggles my mind that anyone ever got decent
           | performance with anything involving far data pointers.
           | Segment loads are slow, and there are not a lot of segment
           | registers to go around.)
        
             | bonzini wrote:
             | In real mode assembly days, ES and sometimes DS were just
             | another base register that you could use in a loop. Given
             | the dearth of addressing modes it was quite nice to assume
             | that large arrays started at xxxx0h and therefore that the
             | offset part of the far pointer was zero.
        
         | pm215 wrote:
         | If so, it's one that's been introduced at some point post C99
         | -- the C99 spec explicitly defines the behaviour of NULL ==
         | NULL. Section 6.5.9 para 6 says "Two pointers compare equal if
         | and only if both are null pointers, both are pointers to the
         | same object [etc etc]".
        
           | dwattttt wrote:
           | I don't imagine NULL is defined as "pointing to an object",
           | so I don't expect that clause to apply.
        
             | tsimionescu wrote:
             | You completely skipped over the first part: "Two pointers
             | compare equal if and only if _both are null pointers_ "
        
         | nikic wrote:
         | NULL == NULL was already defined -- but NULL <= NULL wasn't :)
        
         | IWeldMelons wrote:
         | Cannot find any confirmation of your statement. Otoh, "All null
         | pointer values (of compatible type within the same address
         | space) are already required to compare equal." in the linked
         | paper.
        
           | PaulDavisThe1st wrote:
           | NULL is not a single type in any conventional sense (and is
           | actually tricky to define in a way that makes it usable in
           | the way most programmers expect).
           | 
           | Thus:
           | 
           |       T1* a = NULL;
           |       T2* b = NULL;
           |       a == b; /* may be undefined at present, depending on the
           |                  nature of T1 & T2 */
        
       | MuffinFlavored wrote:
       | > because NULL + 0 is undefined behavior in C.
       | 
       | Why? It's 2024. Make it not be? Sure, some older stuff already
       | written might no longer compile and need to be updated. Put it
       | behind a "newer" standard flag/version or whatever.
       | 
       | Or is it that it can't be caught at compile time and only run
       | time... hmm...
        
         | sophiebits wrote:
         | They are making it not be. That's the whole point of the
         | article.
        
       | hwc wrote:
       | Well, that seems like something that should have been there from
       | the beginning.
        
       | nmilo wrote:
       | > However, the most vocal opposition came from a static analysis
       | perspective: Making null pointers well-defined for zero length
       | means that static analyzers can no longer unconditionally report
       | NULL being passed to functions like memcpy--they also need to
       | take the length into account now.
       | 
       | How does this make any sense? We don't want to remove a low
       | hanging footgun because static analyzers can no longer detect it?
        
         | hatthew wrote:
         | My understanding is that with this change, static analyzers
         | have three options:
         | 
         | 1. False positive on code that would have been an issue
         | previously
         | 
         | 2. False negative on a ton of similar footguns
         | 
         | 3. Add complexity to differentiate between these cases
         | 
         | None of these options are fun.
        
       ___________________________________________________________________
       (page generated 2024-12-11 23:01 UTC)