[HN Gopher] Making memcpy(NULL, NULL, 0) well-defined
___________________________________________________________________
Making memcpy(NULL, NULL, 0) well-defined
Author : gslin
Score : 188 points
Date : 2024-12-11 12:19 UTC (10 hours ago)
(HTM) web link (developers.redhat.com)
(TXT) w3m dump (developers.redhat.com)
| voidUpdate wrote:
| I feel like I've misunderstood something here... shouldn't
| memcpy(anything, anything, 0) just do nothing, because you're
| copying 0 bytes?
| mjg59 wrote:
| That's a reasonable intuitive interpretation of how it _should_
| behave, but according to the spec it's undefined behaviour and
| compilers have a great degree of freedom in what happens as a
| result.
| voidUpdate wrote:
| Why didn't they just... define it, back when they wrote it?
| frabert wrote:
| Every time they leave something undefined, they do so to
| leave implementations free to use the underlying platform's
| default behavior, and to allow compilers to use it as an
| optimization point
| jcelerier wrote:
| Here it's more that it allows the implementation to assume this
| is never the case, so no additional check is needed in it,
| I assume?
| lucozade wrote:
| > Every time they leave something undefined, they do so to
| leave implementations free to use the underlying
| platform's default behavior
|
| That's implementation defined (more or less), i.e. the
| compiler can do whatever makes most sense for its
| implementation.
|
| Undefined means (more or less) that the compiler can
| assume the behaviour never happens so can apply
| transforms without taking it into account.
|
| > to allow compilers to use it as an optimization point
|
| That's the main advantage of undefined behaviour ie if
| you can ignore the usage, you may be able to apply
| optimisations that you couldn't if you had to take it
| into account. In the article, for example, GCC eliminated
| what it considered dead code for a NULL check of a
| variable that couldn't be NULL according to the C spec.
|
| That's also probably the most frustrating thing about
| optimisations based on undefined behaviour ie checks that
| prevent undefined behaviour are removed because the
| compiler thinks that the check can't ever succeed
| because, if it did, there must have been undefined
| behaviour. But the way the developer was ensuring defined
| behaviour was with the check!
| frabert wrote:
| AFAIK, something having undefined behavior in the spec
| does not prevent an implementation- (platform-)specific
| behavior being defined.
|
| As to your point about checks being erased, that
| generally happens when the checks happen too late
| (according to the compiler), or in a wrong way. For
| example, checking that `src` is not NULL _after_
| memcpy(dst, src, 0) is called. Or, checking for overflow
| by doing `if(x+y<0) ...` when x and y are nonnegative
| signed ints.
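To make the ordering point concrete, here is a minimal sketch in C. The helper names `add_checked` and `copy_checked` are made up for illustration and do not come from the article or any library. The only idea shown is that the test runs before the operation that would be undefined, so the compiler has no grounds to treat the test as dead code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: adds x and y, reporting failure instead of overflowing.
 * The range test runs before the addition, so no signed overflow ever occurs. */
bool add_checked(int x, int y, int *sum)
{
    assert(sum != NULL);
    if ((y > 0 && x > INT_MAX - y) || (y < 0 && x < INT_MIN - y))
        return false;
    *sum = x + y;
    return true;
}

/* Hypothetical helper: validates the pointers before calling memcpy.
 * Returns false without calling memcpy if either pointer is null. */
bool copy_checked(void *dst, const void *src, size_t n)
{
    if (dst == NULL || src == NULL)
        return false;
    memcpy(dst, src, n);
    return true;
}
```

Writing the overflow test as `x > INT_MAX - y` rather than `x + y < 0` keeps every intermediate value in range, which is what makes the check safe from being optimized away.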
| nephanth wrote:
| I mean, they might not have given thought to that
| particular corner case, they probably wrote something like
|
| > memcpy(void* ptr1, void* ptr2, int n)
|
| Copy n bytes from ptr1 to ptr2. UNDEFINED if ptr1 is NULL
| or ptr2 is NULL
|
| -------
|
| It might also have come from a "explicit better than
| implicit" opinion, as in "it is better to have developers
| explicitly handle cases where the null pointer is involved".
| jbverschoor wrote:
| I think it's more a strategy. C was not created to be
| safe. It's pretty much a tiny wrapper around assembler.
| Every limitation requires extra cycles, compile time or
| runtime, both of which were scarce.
|
| Of course, someone needs to check in the layers of
| abstraction. The user, programmer, compiler, cpu,
| architecture.. They chose for the programmer, who like to
| call themselves "engineers" these days.
| wruza wrote:
| Not sure what your last remark means wrt everything else.
| poincaredisk wrote:
| I disagree with your premise. C was designed to be a high
| level (for its time) language, abstracted from actual
| hardware
|
| >It's pretty much a tiny wrapper around assembler
|
| Assembler has zero problem with adding "null + 4" or
| computing "null-null". C does, because it's not actually
| a tiny wrapper.
| larschdk wrote:
| When C was conceived, CPU architectures and platforms were
| more varied than what we see today. In order to remain
| portable and yet performant, some details were left as
| either implementation defined, or completely undefined
| (i.e. the responsibility of the programmer). Seems archaic
| today, but it was necessary when C compilers had to be two-
| pass and run in mere kilobytes of RAM. Even warnings for
| risky and undefined behavior are a relatively modern concept
| (last 10-20 years) compared to the age of C.
| actionfromafar wrote:
| When C was conceived, it was made for a specific DEC CPU,
| for making an operating system. The idea of a C
| _standard_ was in the future.
|
| If you wanted to know what (for instance) memcpy
| _actually_ did, you looked at the source code, or even
| more likely, the assembler or machine code output. That
| was "the standard".
| anticensor wrote:
| No, K&R's book was the standard.
| actionfromafar wrote:
| First came the language, then a few years later they
| described it in a book.
| da_chicken wrote:
| I think it's reasonable to assume that GP clearly meant
| the C standard being conceived, as, obviously, K&R's C
| implementation of the language was ad hoc rather than
| exhibiting any prescribed specification.
| scoutt wrote:
| > Seems archaic today ... run in mere kilobytes of RAM
|
| There is an entire industry that does pretty much that...
| today. They might run in flash instead of RAM, but still,
| a few kilobytes.
|
| Probably there are more embedded devices out there than
| PCs. PIC, AVR, MSP, ARM, custom archs. There might be one
| of those right now under your hand, in that thing you use
| to move the cursor.
| krisoft wrote:
| > There is an entire industry that does pretty much
| that... today.
|
| Which industry runs C compilers on embedded devices?
| Because that is what the part you elided was
| talking about.
| scoutt wrote:
| Oh... yes. You are right. My bad.
| sitzkrieg wrote:
| many do tho. i have targeted c89 and maybe c99 on
| several embedded devices
| 0xffff2 wrote:
| But you're running the compiler on the device rather than
| cross-compile?
| vlovich123 wrote:
| They cross compile. No one is compiling code on these
| machines.
| hyperman1 wrote:
| memcpy used to be a rep movsb on 8086 DOS compilers. I
| don't remember if rep movsb stops if cx=0 on entry, or
| decrements first and wraps around, copying 64K of data.
| connicpu wrote:
| I know at least MSVC's memcpy on x86_64 still results in
| a rep movsb if the cpuid flag that says rep movsb is fast
| is set, which it should be on all x86 chips from about
| 2011/2012 and onward ;)
| dfox wrote:
| The specification does not explicitly say that, but the
| clear intention is that REP with CX=0 should be no-op
| (you get exactly that situation when REP gets interrupted
| during the last iteration, in that case CX is zero and IP
| points to the REP, not the following instruction).
| bonzini wrote:
| Rep movsb copies 64K if CX=0 (that's actually very
| useful), but memcpy could be implemented as two
| instructions:
|
|         jcxz skip
|         rep movsb
|     skip:
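In C terms, the guarded two-instruction version corresponds to a copy loop whose count is tested before the first byte moves. A rough sketch follows; the name `copy_bytes` is hypothetical, and this is not how any particular libc implements memcpy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte-wise copy. The loop condition is evaluated before the
 * first iteration, playing the role of `jcxz skip`: with n == 0 neither
 * pointer is dereferenced or advanced. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    /* Null pointers are only acceptable when there is nothing to copy. */
    assert(n == 0 || (dst != NULL && src != NULL));

    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n-- > 0)
        *d++ = *s++;
    return dst;
}
```

With this shape, `copy_bytes(NULL, NULL, 0)` simply returns its first argument, which is the behaviour the thread's opening question expected from memcpy.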
| wat10000 wrote:
| The original C standard was more descriptive than
| prescriptive. There was probably an implementation where it
| crashed or misbehaved.
| menaerus wrote:
| Charitable interpretation may be: Back then when the
| contract of this function was standardized, presumably in
| C89 which is ~35 years ago, CPUs but also C compilers were
| not as powerful so wasting an extra couple of CPU cycles to
| check this condition was much more expensive than it is
| today. Because of that contract, and which can be seen in
| the example in the below comments, the compiler is also
| free to eliminate the dead code which also has the effect
| of shaving off some extra CPU cycles.
| lmm wrote:
| Back when they wrote it they were trying to accommodate
| existing compilers, including those who did useful things
| to help people catch errors in their programs (e.g. making
| memcpy trap and send a signal if you called it with NULL).
| The current generation of compilers that use undefined
| behaviour as an excuse to do horrible things that screw
| over regular programmers but increase performance on
| microbenchmarks postdates the standard.
| FartyMcFarter wrote:
| Because the benefit was probably seen as very little, and
| the cost significant.
|
| When you're writing a compiler for an architecture where
| every byte counts you don't make it write extra code for
| little benefit.
|
| Programmers were routinely counting bytes (both in code
| size and data) when writing Assembly code back then, and I
| mean that literally. Some of that carried into higher-level
| languages, and rightly so.
| killerstorm wrote:
| From what I understand:
|
| 1. Initially, they just wanted to give compiler makers more
|    freedom: both in the sense "do whatever is simplest" and
|    "do something platform-specific which dev wants".
| 2. Compiler devs found that they can use UB for optimization:
|    e.g. if we assume that a branch with UB is unreachable we
|    can generate more efficient code.
| 3. Sadly, compiler devs started to exploit every opportunity
|    for optimization, e.g. removing code with a potential
|    segfault.
|
| I.e. people who made a standard thought that compiler would
| remove no-op call to memcpy, but GCC removes the whole
| branch which makes the call as it considers the whole
| branch impossible. Standard makers thought that compiler
| devs would be more reasonable
| kllrnohj wrote:
| > Standard makers thought that compiler devs would be
| more reasonable
|
| This is a bit of a terrible take? Compiler devs never did
| anything "unreasonable", they didn't sit down and go
| "mwahahaha we can exploit the heck out of UB to break
| everything!!!!"
|
| Rather, repeatedly applying a series of targeted
| optimizations, each one in isolation being "reasonable",
| results in an eventual "unreasonable" total
| transformation. But this is more an emergent property of
| modern compilers having hundreds of optimization passes.
|
| At the time the standards were created, the idea of
| compilers applying so many optimization passes was just
| not conceivable. Compilers struggled to just do basic
| compilation. The assumption was a near 1:1 mapping
| between code & assembly, and that just didn't age well at
| all.
| LegionMammal978 wrote:
| One could argue that "optimizing based on signed
| overflow" was an unreasonable step to take, since any
| given platform will have some sane, consistent behavior
| when the underlying instructions cause an overflow. A
| developer using signed operations without poring over the
| standard might have easily expected incorrect values (or
| maybe a trap if the platform likes to use those), but not
| big changes in control flow. In my experience, signed
| overflow is generally the biggest cause of "they're
| putting UB in my reasonable C code!", followed by the
| rules against type punning, which are violated every day
| by ordinary usage of the POSIX socket functions.
| kllrnohj wrote:
| > One could argue that "optimizing based on signed
| overflow" was an unreasonable step to take
|
| That optimization allows using 64-bit registers / offset
| loads for signed ints which it can't do if it has to
| overflow, since that overflow must happen at 32-bits.
| That's not an uncommon thing.
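The kind of loop this argument is about looks roughly like the sketch below. The function name `sum_every` is made up for illustration. Because `i` is a signed `int` and overflow of `i += stride` would be undefined, a compiler targeting a 64-bit machine is allowed to keep `i` in a 64-bit register and never emulate 32-bit wraparound. Whether a given compiler actually does so depends on the target and optimization level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sums every `stride`-th element of a[0..n).
 * The index is a signed int, so the compiler may assume `i += stride`
 * never wraps and may widen `i` to the native pointer width. */
long sum_every(const int *a, int n, int stride)
{
    assert(a != NULL && n >= 0 && stride > 0);

    long total = 0;
    for (int i = 0; i < n; i += stride)
        total += a[i];
    return total;
}
```

If `i` were declared `unsigned`, wraparound at 32 bits would be defined behaviour that the generated code has to preserve, which is the cost being described.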
| uecker wrote:
| I started to like signed overflow rules, because it is
| really easy to find problems using sanitizers.
|
| The strict aliasing rules are not violated by typical
| POSIX socket code as a cast to a different pointer type,
| i.e. `struct sockaddr` by itself is well-defined
| behavior. (and POSIX could of course just define
| something even if ISO C leaves it undefined, but I don't
| think this is needed here)
| UncleMeat wrote:
| There isn't a "find UB branches" pass that is seeking out
| this stuff.
|
| Instead what happens is that you have something like a
| constant folding or value constraint pass that computes a
| set of possible values that a variable can hold at
| various program points by applying constraints of various
| options. Then you have a dead code elimination pass that
| identifies dead branches. This pass doesn't know _why_
| the "dest" variable can't hold the NULL value at the
| branch. It just knows that it can't, so it kills the
| branch.
|
| Imagine the following code:
|
|     int x = abs(get_int());
|     if (x < 0) {
|         // do stuff
|     }
|
| Can the compiler eliminate the branch? Of course. All
| that's happened here is that the constraint propagation
| feels "reasonable" to you in this case and "unreasonable"
| to you in the memcpy case.
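For comparison, the memcpy case has the same shape. The sketch below uses a hypothetical function name, `copy_then_check`, and is only meant to show the pattern: under the rules discussed in the thread, the call tells the optimizer that `dest` is non-null, so a later range-based pass may conclude the check can never be true.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example of a check placed too late.
 * Only call this with valid, non-null pointers. */
int copy_then_check(char *dest, const char *src, size_t n)
{
    memcpy(dest, src, n);   /* old rules: dest and src must be non-null here */
    if (dest == NULL)       /* a value-range pass may treat this as always false */
        return -1;
    return 0;
}
```

Moving the `dest == NULL` test above the memcpy call removes the problem, because the test is then evaluated before any constraint on `dest` has been established.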
| meonukk wrote:
| Why is it allowed to eliminate the branch? In most
| architectures abs(INT_MIN) returns INT_MIN which is
| negative
| plorkyeran wrote:
| Calling abs(INT_MIN) on a two's-complement machine is not
| allowed by the C standard. The behavior of abs() is
| undefined if the result would not fit in the return
| value.
| Sohcahtoa82 wrote:
| I didn't believe this so I looked it up, and yup.
|
| Because of 2's complement limitations, abs(INT_MIN) can't
| actually be represented and it ends up returning INT_MIN.
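A caller that cannot rule out INT_MIN can test for it before calling abs(). This is a minimal sketch; `abs_checked` is a made-up name, not a standard function.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: stores |v| in *out and returns true, or returns
 * false when |v| is not representable as an int. On two's-complement
 * targets the only such value is INT_MIN. */
bool abs_checked(int v, int *out)
{
    assert(out != NULL);
    if (v < -INT_MAX)       /* true only when -v would not fit in an int */
        return false;
    *out = abs(v);
    return true;
}
```

The comparison against `-INT_MAX` avoids negating `v` before it is known to be safe.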
| UncleMeat wrote:
| It's possible that there is an edge case in the output
| bounds here. I'm just using it as an example.
|
| Replace it with "int x = foo() ? 1 : 2;" if you want.
| mjevans wrote:
| More reasonable: Emit a warning or error to make the code
| and human writing it better.
|
| NOT-reasonable: silently 'optimize' a 'gotcha' into
| behavior the programmer(s) didn't intend.
| gpderetta wrote:
| NOT-reasonable: expecting the compiler to read the
| programmer's mind.
| ynik wrote:
| Probably because they did not think of this special case
| when writing the standard, or did not find it important
| enough to consider complicating the standard text for.
|
| In C89, there's just a general provision for all standard
| library functions:
|
| > Each of the following statements applies unless
| explicitly stated otherwise in the detailed descriptions
| that follow. If an argument to a function has an invalid
| value (such as a value outside the domain of the function,
| or a pointer outside the address space of the program, or a
| null pointer), the behavior is undefined. [...]
|
| And then there isn't anything on `memcpy` that would
| explicitly state otherwise. Later versions of the standard
| explicitly clarified that this requirement applies even to
| size 0, but at that point it was only a clarification of an
| existing requirement from the earlier standard.
|
| People like to read a lot more intention into the standard
| than is reasonable. Lots of it is just historical accident,
| really.
| david-gpu wrote:
| More information on this behavior in the link below.
|
| _> Note that, apart from contrived examples with deleted
| null checks, the current rules do not actually help the
| compiler meaningfully optimize code. A memcpy implementation
| cannot rely on pointer validity to speculatively read
| because, even though memcpy(NULL, NULL, 0) is undefined,
| slices at the end of a buffer are fine. [And if the end of
| the buffer] were at the end of a page with nothing allocated
| afterwards, a speculative read from memcpy would break_
|
| https://davidben.net/2024/01/15/empty-slices.html
| Someone wrote:
| > [And if the end of the buffer] were at the end of a page
| with nothing allocated afterwards, a speculative read from
| memcpy would break
|
| 'Only' on platforms that have memory protection hardware.
| Even there, the platform can always allocate an overflow
| page for a process, or have the page fault handler check
| whether the page fault happened due to a speculative read,
| and repair things (I think the latter is hugely, hugely,
| hugely impractical, but the standard cannot rule it out)
| immibis wrote:
| Platforms without memory protection hardware also have no
| problem reading NULL.
| Someone wrote:
| My comment is a reply to (part of) a comment that isn't
| talking about reading from NULL. That's what the _[And if
| the end of the buffer]_ part implies.
|
| Even if it didn't, I don't think the standard should
| assume that _"Platforms without memory protection
| hardware also have no problem reading NULL"_
|
| An OS could, for example, have a very simple memory
| protection feature where the bottom half of the memory
| address range is reserved for the OS, the top half for
| user processes, and any read from an address with the
| high bit clear by code in the top half of the address
| range traps and makes the OS kill the process doing the
| read.
| BenjiWiebe wrote:
| Doesn't it take memory protection hardware to trap on a
| memory read?
| hun3 wrote:
| Not really. MMIO mapped at 0x0 for example.
| david-gpu wrote:
| Yikes! I would love sipping coffee watching the chief
| architect chew up whoever suggested that. That sounds
| awful even on a microcontroller.
| bonzini wrote:
| On s390 the memory at address 0 (low core) has all sorts
| of important stuff. Of course s390 has paging enabled
| pretty much always but still...
| colejohnson66 wrote:
| AVR's registers are mapped to address 0. So reading and
| writing NULL is actually modifying r0.
| formerly_proven wrote:
| AVR's r0 is also a totally normal register, unlike most
| other RISC which typically have r0 == 0.
| kevin_thibedeau wrote:
| They may also expect writes to address 0.
| Zondartul wrote:
| What does "speculative" mean in this case? I understand it
| as CPU-level speculative execution a.k.a. branch mis-
| prediction, but that shouldn't have any real-world effects
| (or else we'd have segfaults all the time due to executing
| code that didn't really happen)
| dwattttt wrote:
| Turns out you can have that kind of speculative failure
| too!
| https://randomascii.wordpress.com/2018/01/07/finding-a-
| cpu-d...
| xbar wrote:
| Upon which some people may rely...
| int_19h wrote:
| People will only rely on UB when it is well defined by a
| particular implementation, either explicitly or because of
| a long history of past use. E.g. using unions for type
| punning in gcc, or allowing methods to be called on null
| pointers in MSVC.
|
| But there's nothing like that here.
| jancsika wrote:
| I get that for the library. But I'm a bit puzzled about the
| optimizations done by a compiler based on this behavior.
|
| E.g., suppose we patch GCC to preserve any conditional
| containing the string 'NULL' in it. Would that have a
| measurable performance impact on Linux/Chromium/Firefox?
| captainmuon wrote:
| I feel strongly they should split undefined behavior into
| behavior that is not defined, and things that the compiler is
| allowed to assume. The former basically already exists as
| "implementation defined behavior". The latter should be
| written out explicitly in the documentation:
|
| > memcpy(dest, src, count)
|
| > Copies count bytes from src to dest. [...] Note this is not
| a plain function, but a special form that applies the
| constraints dest != NULL and src != NULL to the surrounding
| scope. Equivalent to:
|
|     assume(dest != NULL)
|     assume(src != NULL)
|     actual_memcpy(dest, src, count)
|
| The conflation of both concepts breaks the mental model of
| many programmers, especially ones who learned C/C++ in the
| 90s where it was common to write very different code, with
| all kinds of now illegal things like type punning and
| checking this != NULL.
|
| I'd love to have a flag "-fno-surprizing-ub" or "-fhighlevel-
| assembler" _combined_ with the above `assume` function or
| some other syntax to let me help the compiler, so that I can
| write C like in the 90s - close to metal but with less
| surprises.
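Something close to the `assume` in this comment can be approximated today. The sketch below is one possible spelling, not an existing standard facility: `ASSUME` and `memcpy_nonnull` are hypothetical names, and the release-mode branch relies on `__builtin_unreachable()`, which GCC and Clang provide. In debug builds the assumption is checked with `assert` instead of being trusted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if !defined(NDEBUG)
#  define ASSUME(cond) assert(cond)   /* debug build: trap on violation */
#elif defined(__GNUC__)               /* GCC, and Clang which also defines it */
#  define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#  define ASSUME(cond) ((void)0)      /* unknown compiler: no hint given */
#endif

/* Hypothetical wrapper that makes the non-null contract explicit at the
 * call site instead of leaving it implied by the library specification. */
void *memcpy_nonnull(void *dest, const void *src, size_t count)
{
    ASSUME(dest != NULL);
    ASSUME(src != NULL);
    return memcpy(dest, src, count);
}
```

The difference from plain memcpy is only in visibility: a reader of `memcpy_nonnull` can see which facts the optimizer is being handed.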
| Thorrez wrote:
| >Note this is not a plain function, but a special form that
| applies the constraints dest != NULL and src != NULL to the
| surrounding scope.
|
| Plain functions can apply constraints to the surrounding
| code:
|
| https://godbolt.org/z/fP58WGz9f
| rcxdude wrote:
| Purely mechanically, yes, but in terms of the definition of the
| behaviour in the C abstract machine, no, because certain
| operations on null pointers are undefined, even if the obvious
| low-level compilation turns into nothing.
| codedokode wrote:
| Maybe we should get rid of "abstract machine" and treat
| pointers as memory addresses?
| davidt84 wrote:
| Congratulations, you've invented an entirely new language.
|
| Now, who's going to write the compiler for it?
| anticensor wrote:
| No, it's C at -O0.
| davidt84 wrote:
| No, it's not.
|
| Undefined behaviour is undefined behaviour whatever
| optimisation level you use.
|
| Some -f flags may extend the C standard and remove
| undefined behaviour in some cases (e.g. strict aliasing,
| signed integer overflow, writable string constants, etc.)
| gpderetta wrote:
|     int* oracle();
|     int foo() {
|         int x = 1;
|         *oracle() = 42;
|         return x;
|     }
|
|
| Is the above program allowed to return anything other than
| 1 in your language?
| kibwen wrote:
| To elaborate, we treat pointers as more than just
| integers because it gives optimizers the latitude to
| reorder and eliminate pointer operations. In the example
| above we cannot do this, because we cannot prove at
| compile time that x doesn't live at the address returned
| by oracle.
|
| For some high-quality further discussion, see Ralf Jung's
| series of blog posts starting with
| https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html
| shultays wrote:
| > However, given how low-level a language C++ is, we can
| actually break this assumption by setting i to y-x. Since
| &x[i] is the same as x+i, this means we are actually
| writing 23 to &y[0].
|
| But that is undefined, you can't do x + (y - x) ie a
| pointer arithmetic that ends outside of bounds of an
| array. Since it is undefined, shouldn't C++ assume that
| changing x[..] can't change y[0]
|
| edit: welp, if I read a few more lines into article I
| would see that it also tells it is undefined
| gpderetta wrote:
| to be clear, in my example the result of oracle() cannot
| possibly alias with 'x' in C or C++ (and in fact gcc will
| optimize accordingly). In a different language where
| addresses are mere integers, things would be more
| complicated.
| codedokode wrote:
| The result of oracle can point to anything if you write
| it as return (int *)rand();
|
| Note that rand() returns 32-bit value so you have to call
| it twice and merge the results to obtain a 64-bit
| pointer.
| gpderetta wrote:
| The numerical value returned by oracle might physically
| match the address of the stack slot for 'x', assuming
| that it exists, but it doesn't mean that, from a language
| point of view, it is a valid pointer.
|
| If forging pointers had defined behaviour, it would be
| impossible to use the language sanely or perform any kind
| of optimization.
| alerighi wrote:
| Well, even in C it is not guaranteed to return 1, since
| oracle() may return the memory address of
| variable 1.
| gpderetta wrote:
| the literal 1 is not an object in C or C++ hence it does
| not have an address. If you meant 'x', then also no,
| oracle() can't return the address of 'x' because of
| pointer provenance rules.
| shultays wrote:
| Is it allowed to return anything else in C? Is there
| anything in standard C that would allow oracle() to
| access memory address of x?
|
| Sure different compilers might allow inlining assembly or
| some other ways to access x on previous stack perhaps but
| then it is not really "C"
| wat10000 wrote:
| That's the point. C allows this function to be optimized
| to always return 1. A "pointers are addresses, just emit
| reads and writes and stop trying to be so clever" version
| of C would require x to be spilled to the stack, then the
| write, then reload x and return whatever it contained.
| cv5005 wrote:
| Then use the register keyword or just reword the standard
| to assume the register behavior if a variables address
| hasn't been taken.
|
| The majority of useful optimizations can be kept in a
| "Sane C" with either code style changes (cache stuff in
| local vars to avoid aliasing for example) or with minor
| tweaks to the standard.
| wat10000 wrote:
| Register behavior is what you want essentially all of the
| time. So we'd just have to write `register` all over the
| place for no gain.
|
| "Don't optimize this, read and write it even if you think
| it's not necessary" is a very rare case so it shouldn't
| be the default. If you want it, use the volatile keyword.
|
| There's no need to reword the standard to assume the
| register behavior if the variable's address hasn't been
| taken. That's already how it works. In this example, if
| you escape the value of `&x`, it's not legal to optimize
| this function to always return 1.
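The two situations can be put side by side. In this sketch the globals `scratch` and `published` and the function names are invented for illustration; `oracle` is given a concrete body so that both cases are well-defined.

```c
#include <assert.h>
#include <stddef.h>

static int scratch;               /* hypothetical: default target for oracle() */
static int *published = NULL;     /* hypothetical: set when an address escapes */

/* Returns the published pointer if there is one, else a scratch location. */
int *oracle(void)
{
    return published != NULL ? published : &scratch;
}

/* The address of x is never taken, so no valid pointer can refer to it.
 * The compiler may fold the return value to the constant 1. */
int foo_private(void)
{
    int x = 1;
    *oracle() = 42;
    return x;
}

/* Here &x escapes before the store, so the store may legitimately hit x
 * and the compiler has to reload it. This version returns 42. */
int foo_escaped(void)
{
    int x = 1;
    published = &x;
    *oracle() = 42;
    published = NULL;             /* do not leave a dangling pointer behind */
    return x;
}
```

Both functions contain the same store through `oracle()`; only the escape of `&x` changes what the optimizer is permitted to assume.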
| codedokode wrote:
| When using C, this can return anything (or crash if the
| oracle function returns an invalid pointer, or rewrite
| its own code if the code section is writable). So if you
| get rid of "abstract machine", nothing changes - the
| program can return anything or crash.
| wat10000 wrote:
| A conforming C compiler is allowed to emit that function
| to perform the write and then return the constant 1.
| Should that be allowed?
| sixfiveotwo wrote:
| How would you define what a memory address is without first
| defining in which context it has a meaning?
| codedokode wrote:
| C was written as a portable assembly language, so I think
| a memory address is a number that CPU considers to be a
| memory address.
| layer8 wrote:
| That's currently the case in C, in that you can convert
| pointers to and from _uintptr_t_. However, not every
| number representable in that type needs to be valid
| memory (that's true on the assembly level as well), hence
| it's only defined for valid pointers.
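The round trip mentioned here looks like the following sketch. The helper names are hypothetical. `uintptr_t` is an optional type in the C standard, so this assumes an implementation that provides it; where it exists, a valid `void *` converted to `uintptr_t` and back compares equal to the original.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: exposes a valid pointer as an integer. */
uintptr_t ptr_to_bits(const void *p)
{
    return (uintptr_t)p;
}

/* Hypothetical helper: the result is only usable if `bits` came from a
 * valid pointer in the first place; an arbitrary integer gives no such
 * guarantee. */
void *bits_to_ptr(uintptr_t bits)
{
    return (void *)bits;
}
```

Nothing in this guarantees that an integer obtained some other way, for example by arithmetic or from rand(), converts to a pointer that may be dereferenced.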
| sixfiveotwo wrote:
| > I think a memory address is a number that CPU considers
| to be a memory address
|
| I meant to say that, indeed, there must be some concept
| of CPU for a memory address to have a meaning, and for
| this concept of CPU to be as widely applicable as
| possible, surely defining it as abstract as possible is
| the way to go. Ergo, the idea of a C abstract machine.
|
| Anyway, other people in this thread are discussing the
| matter more accurately and in more details than I could
| hope to do, so I'll leave it like that.
| lmm wrote:
| 20 years ago, making a C compiler that provided sane
| behaviour and better guarantees (going beyond the minimum
| defined in the standard) to make code safer and
| programmers' lives easier, even at the cost of some
| performance, might have been a good idea. Today any
| programmer who thinks things like not having security bugs
| are more important than having bigger numbers on
| microbenchmarks has already moved on from C.
| uecker wrote:
| This is certainly not true. Many programmers also learned
| to use the tools available to write reasonably safe code
| in C. I do not personally find this problematic.
| quotemstr wrote:
| > safe code in C
|
| You're like a Japanese holdout in the 60s refusing to
| leave his bunker long after the war is over.
|
| C lost. Memory safety is a huge boon for security. Human
| beings, even the best of them, cannot consistently write
| correct C code. (Look at OpenBSD.) You can keep fighting
| the war your side has already lost or you can move on.
| uecker wrote:
| Well, memory safety is great but it seems Rust
| programmers also manage to create memory safety issues
| just fine:
|
| https://rustsec.org/advisories/RUSTSEC-2024-0401.html
| https://rustsec.org/advisories/RUSTSEC-2024-0400.html
| https://rustsec.org/advisories/RUSTSEC-2024-0377.html
| https://rustsec.org/advisories/RUSTSEC-2024-0374.html
| etc.
| whytevuhuni wrote:
| I think the first one, stack overflow, is technically not
| a memory safety issue, just denial-of-service on resource
| exhaustion. Stack overflow is well defined as far as I
| know.
|
| The other three are definitely memory safety issues.
| quotemstr wrote:
| C++ is a better unsafe language than unsafe Rust, IMHO.
| The thing about the social dynamic of Rust, though, is
| that it keeps unsafe code to a minimum.
| ryao wrote:
| I would consider a stack overflow to be a memory safety
| issue. The C++ language authors likely would too. C++
| famously refused to support variable length stack
| allocated arrays because of memory safety concerns. In
| specific, they were worried that code at runtime would
| make an array so big that it would jump the OS
| guard page, allowing access to unallocated memory that of
| course is not noticed ahead of time during development.
| This is probably easy to do unintentionally if you have
| more stack variables after an enormous stack allocated
| array and touch them before you touch the array. The
| alternative is to force developers to use compiler
| extensions such as alloca(). That makes it easy to pass
| pointers outside of the stack frame where they are valid
| and is a definite safety issue. The C++ nitpicking over
| variable length arrays is silly since it gives us a
| status quo where C++ developers use alloca() anyway, but
| it shows that stack overflows are considered a memory
| safety issue.
| whytevuhuni wrote:
| In the general case, I think you might be right, although
| it's a bit mitigated by the fact that Rust does not have
| support for variable length arrays, alloca, or anything
| that uses them, in the standard library. As you said
| though, it's certainly possible.
|
| I was more referring to that specific linked advisory,
| which is unlikely to use either VLAs or alloca. In that
| case, where stack overflow would be caused by recursion,
| a guard frame will always be enough to catch it, and will
| result in a safe abort [0].
|
| [0] https://github.com/rust-lang/rust/pull/31333
| ryao wrote:
| Use a sound static analyzer like astree and you can
| produce memory safe C code:
|
| https://www.absint.com/astree/index.htm
|
| Note that the key word here is sound. The more common
| static analyzers are unsound tools that will miss cases.
| Sound tools do not, but few people know of them, they are
| rare and they are typically proprietary and expensive.
| quotemstr wrote:
| Sure. I'm also a big fan of what Microsoft has done with
| SAL. And of course you have formally proven C, as used in
| seL4. I'd say that the contortions you have to go through
| to write code with these systems takes you out of the
| domain of "C" and into a domain of a different, safer
| language merely resembling C. Such a language might be a
| fine tool! But it's not arbitrary C.
| NobodyNada wrote:
| If you do this, your C code will run significantly slower
| than, say, Java, Go, or C#, because the compiler is unable
| to apply even the most basic optimizations (which it can do
| still in all those other languages).
|
| So, at that point why even use C at all? Today, C is used
| where the overhead of a managed language is unacceptable.
| If you could just eat the performance cost, you'd probably
| already be using a managed language. There's not much
| desire for a variant of C with what would be at least a 10x
| slowdown in many workloads.
| cv5005 wrote:
| Or it could be made faster because certain manual
| optimizations become possible.
|
| An example would be a table of interned strings that you
| wanna match against (say you're writing a parser). Since
| standard C says thou shall not compare pointers with < or
| > unless they both point into the same 'object' you are
| forbidden from doing the speed of light code:
|
|     char *keywords_begin, *keywords_end;
|     if(some_str >= keywords_begin && some_str < keywords_end) ...
|
| Official standard sanctioned workarounds would require
| extra indirection (using indices for example) which is
| suboptimal.
| gpderetta wrote:
| You can cast them to uintptr_t and compare them to your
| heart's desire.
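|
| A minimal sketch of that suggestion, assuming a flat address
| space. in_range() is a hypothetical helper, not something from
| the standard or from this thread, and the integer values it
| compares are implementation-defined:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: reports whether p lies in [begin, end).
 * The pointers are converted to uintptr_t first, so no relational
 * operator is ever applied to pointers into different objects.
 * The converted values are implementation-defined; the comparison
 * is only meaningful where addresses form one linear range. */
static bool in_range(const char *p, const char *begin, const char *end)
{
    uintptr_t v  = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)begin;
    uintptr_t hi = (uintptr_t)end;

    assert(lo <= hi); /* caller must pass a well-formed range */
    return v >= lo && v < hi;
}
```

| On a segmented implementation the converted integers need not
| be ordered like the addresses, so this is a sketch for
| mainstream targets only.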
| layer8 wrote:
| That would restrict C to memory models with a linear
| address space. That is usually the case nowadays for C
| implementations, but maybe we don't want to set that in
| stone, because it would be virtually impossible to revert
| such a guarantee.
|
| There's also cases like memory address ranges that map to
| non-memory hardware (i.e. that don't behave like "dumb"
| memory), and how would you have the C standard define
| behavior for those?
|
| Lastly, CPU caches require _some_ sort of abstract model as
| soon as you have multi-threading.
| Measter wrote:
| The value of an abstract machine is that it allows you to
| specify how a given program behaves without needing to
| point to a specific piece of hardware. Compilers then have
| this as a target when compiling a program for a specific
| piece of hardware so that they know when the compiler's
| output is correct.
|
| The issue here is that the abstract machine is under-specified
| or badly specified.
| IcePic wrote:
| "man bcopy" on BSD:
|
| 'If len is zero, no bytes are copied.'
|
| Seems reasonable.
| pkhuong wrote:
| It does nothing, but is only defined when the pointers point
| into or one past the end of valid objects (live allocations),
| because that's how the standard defines the C VM, in terms of
| objects, not a flat byte array.
| whytevuhuni wrote:
| What if the objects are non-NULL, but invalid (not actually
| allocated)?
|
| For example, Rust will use address 1 with length 0 for static
| empty strings, because 1 is a properly aligned non-null
| pointer.
|
| I would imagine such strings end up being passed to C code
| sometimes, which may end up calling memcpy with a length of 0
| on them.
| pkhuong wrote:
| also UB according to the spec, but LLVM is free to define
| it. e.g., clang often converts trivial C++ copy
| constructors to memcpy, which is UB for self-assignment,
| but I assume that's fine because the C++ front-end only
| targets LLVM, and LLVM presumably defines the behaviour to
| do what you'd expect.
| whytevuhuni wrote:
| Where I work, it is quite normal to link together C code
| compiled with GCC and Rust code compiled with LLVM, due
| to how the build system is set up.
|
| As far as I know that disables LTO, but the build system
| is so complex, and the C code so large, that nobody
| bothers switching the C side to Clang/LLVM as well.
| creshal wrote:
| > What if the objects are non-NULL, but invalid (not
| actually allocated)?
|
| Still UB, since they're restricted pointers that must be
| valid to begin with.
| bonzini wrote:
| This is wrong. If you do p=malloc(256), p+256 is valid
| even though it does not point to a valid address (it
| might be in an unmapped page; check out ElectricFence).
| Rust's non-null, aligned dangling pointer is the same: memcpy
| can't assume it can be dereferenced if the size is zero.
| The standard text in the linked paper says the same.
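|
| A sketch of where a one-past-the-end pointer reaches memcpy in
| ordinary code, assuming the reading of the standard given in
| this comment. append_bytes() is a hypothetical helper: when the
| buffer is already full (len == cap) and n == 0, buf + len is a
| valid pointer that must never be dereferenced:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends n bytes from src to buf, which
 * currently holds len of its cap bytes. Returns the new length,
 * or the old length unchanged if the bytes do not fit. */
static size_t append_bytes(unsigned char *buf, size_t cap, size_t len,
                           const unsigned char *src, size_t n)
{
    assert(buf != NULL && src != NULL && len <= cap);

    if (n > cap - len)
        return len; /* does not fit: leave the buffer untouched */

    /* With len == cap and n == 0, buf + len is one past the end.
     * The pointer is valid to form and pass, and a zero-length
     * copy never reads or writes through it. */
    memcpy(buf + len, src, n);
    return len + n;
}
```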
| badmintonbaseba wrote:
| Still technically UB according to the proposed wording. The
| proposed wording only deals with allowing null pointers
| explicitly.
| bluetomcat wrote:
| A trivial implementation wouldn't dereference dest or src in
| case the length is 0. That's how a student would write it with
| a for loop (byte-by-byte copy). A non-trivial implementation
| might do something with the pointers before entering the copy
| loop.
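|
| The "student" version described above might look like the
| following sketch. naive_memcpy() is a hypothetical name and not
| any libc's implementation; with n == 0 the loop body never
| runs, so neither pointer is dereferenced:

```c
#include <assert.h>
#include <stddef.h>

/* Byte-by-byte copy. The pointers are only dereferenced inside
 * the loop, so a zero-length call touches no memory at all. */
static void *naive_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Precondition for this sketch: null is only tolerated
     * when there is nothing to copy. */
    assert(n == 0 || (dest != NULL && src != NULL));

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dest;
}
```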
| ryukoposting wrote:
| Yes and no.
|
| No, because ISO never said it _must_ behave this way.
|
| Yes, because every libc I've personally encountered acts this
| way. At a glance, glibc's x86 implementation[1, 2], musl, and
| picolibc all handle 0-length memcpy as you'd expect. I'm sure
| other folks could dig up the code for Newlib, uclibc, and
| others, and they'd see the same thing.
|
| On a related note, ISO C has THREE different things that most
| people tend to lump together as "undefined behavior." They are:
|
| Implementation-defined behavior: ISO doesn't require any
| particular behavior, but they _do_ require implementations to
| consistently apply a particular behavior, and document that
| behavior.
|
| Unspecified behavior: ISO doesn't require any particular
| behavior, but they _do_ require implementations to consistently
| use a particular behavior, but they _don't_ require that
| behavior to be documented.
|
| Undefined behavior: ISO doesn't require any particular
| behavior, and they don't require implementations to define any
| particular behavior either.
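|
| One small illustration of each category, as a sketch. The
| function names are made up for this example, and the category
| labels in the comments follow my understanding of ISO C rather
| than anything quoted in this thread:

```c
#include <assert.h>
#include <limits.h>

static int calls;
static int next_call(void) { return ++calls; }
static int combine(int first, int second) { return first * 10 + second; }

/* Unspecified: the order in which the two arguments are
 * evaluated. Either order is allowed and the implementation
 * does not have to document its choice. */
static int argument_order(void)
{
    calls = 0;
    int result = combine(next_call(), next_call());
    assert(result == 12 || result == 21); /* the only two outcomes */
    return result;
}

/* Implementation-defined: the result of right-shifting a
 * negative signed value. The compiler must document it. */
static int shift_right_once(int x) { return x >> 1; }

/* Undefined: signed overflow. This helper avoids it by
 * refusing to increment INT_MAX. */
static int increment_or_keep(int x) { return x == INT_MAX ? x : x + 1; }
```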
|
| [1]:
| https://github.com/lattera/glibc/blob/master/string/memcpy.c
| [2]:
| https://github.com/lattera/glibc/blob/895ef79e04a953cac14938...
| ryao wrote:
| I have asked this question in the past and was told that
| memcpy() is allowed to preemptively read before it has
| determined it needs to write to make it faster on some CPUs.
| The presumption is that if you are going to be copying data,
| there is at least one cache line there already, so reading can
| start early.
| whytevuhuni wrote:
| How interesting. GCC does indeed remove that branch.
|
| https://godbolt.org/z/aPcr1bfPe
| mjg59 wrote:
| Explanation for the above: passing NULL as the destination
| argument to memcpy() is undefined behaviour at present. gcc
| assumes that the fact that memcpy() is called therefore means
| that the destination argument can't be NULL, so "knows" that
| the dest == NULL check can never be true, and so removes the
| test and the do_thing1() branch entirely.
|
| Interestingly, replacing len with 0 in the memcpy() call results in
| gcc instead removing the memcpy() call and retaining the check
| - presumably a different optimisation routine decides that it's
| a no-op in that case. https://godbolt.org/z/cPdx6v13r is,
| therefore, interesting - despite this only ever calling test()
| with a len of 0, the elision of the dest == NULL check is still
| there, but test() has been inlined _without_ the memcpy
| (because len == 0) but _with_ do_thing2() (because the
| behaviour is undefined and so it can assume dest isn't NULL
| even though there's a NULL literally right there!)
|
| Fucking compilers, man.
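|
| Until compilers implement the new rule, one defensive pattern
| is to keep NULL away from memcpy() when the length is zero, so
| the later check cannot be inferred away. A sketch, with
| copy_then_check() and the enum standing in for the test() /
| do_thing1() / do_thing2() shape discussed above:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum branch { DEST_IS_NULL = 1, DEST_NOT_NULL = 2 };

/* Hypothetical stand-in for the article's test() function.
 * memcpy() is only called when there is something to copy, so
 * the compiler learns nothing about dest on the len == 0 path. */
static enum branch copy_then_check(char *dest, const char *src, size_t len)
{
    assert(len == 0 || (dest != NULL && src != NULL));

    if (len != 0)
        memcpy(dest, src, len);

    if (dest == NULL)
        return DEST_IS_NULL;   /* the do_thing1() branch */
    return DEST_NOT_NULL;      /* the do_thing2() branch */
}
```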
| nayuki wrote:
| > Fucking compilers, man.
|
| They're just acting as agents that derive the logical
| consequences of the code.
|
| The fact that the given example code is "surprising" is
| analogous to this mathematical derivation:
|
|     a = b
|     a*a = b*a
|     a*a - b*b = b*a - b*b
|     (a - b)(a + b) = b(a - b)
|     (a - b)(a + b)/(a - b) = b(a - b)/(a - b)
|     ^ Divide by 0, undefined behavior!
|       Everything below is not necessarily true.
|     a + b = b
|     b + b = b
|     2b = b
|     2 = 1
|     2 - 1 = 1 - 1
|     1 = 0
|
| The source of truth about what is/isn't allowed is the C
| standard, not your personal simplified model of it that may
| contain dangerous misconceptions. The fact that your mental
| model doesn't match the document is an education problem, not
| a problem with the compiler.
| marssaxman wrote:
| > They're just acting as agents that derive the logical
| consequences of the code.
|
| In a particularly pedantic, uptight, and sometimes
| unhelpful way, yes.
|
| Compilers don't _have_ to be designed this way; in fact it
| is a relatively recent development in the history of such
| tools.
| saurik wrote:
| > The fact that your mental model doesn't match the
| document is an education problem, not a problem with the
| compiler.
|
| Or it is a problem with the document, which is the entire
| reason we are having this discussion: N3322 argued the
| document should be fixed, and now it will be for C2y.
| jpollock wrote:
| How does gcc infer anything about memcpy? Can't I replace the
| c-library memcpy with my own, so how does it know that dest
| == NULL can never be true?
| 0xffff2 wrote:
| If I'm understanding the OP correctly, the C standard says
| so, i.e. the semantics of memcpy are defined by the
| standard and the standard says that it's UB to pass NULL.
| tialaramex wrote:
| Unlike in all the more complicated languages, the
| "freestanding" mode of C doesn't even have a memcpy feature,
| so it may not define how one works - maybe you've decided
| to use the name "memcpy" for your function which
| generates a memorandum about large South American
| rodents, and "memo_capybara" was too much typing.
|
| In something like C++ or Rust, even their bare metal
| "What do you mean _Operating System_?" modes quietly
| require memcpy and so on because we're not savages,
| clearly somebody should provide a way to _copy bytes of
| memory_, Rust is so civilised that even on bare metal
| (in Rust's "core" library) you get a working
| sort_unstable() for your arbitrary slice types!
| bonzini wrote:
| The compiler is free to give a meaning to memcpy if run
| in the (default) hosted mode. There's -ffreestanding for
| freestanding environments.
| tialaramex wrote:
| Right, though I guess I wasn't clear enough about that
| for the down voters, but whatever.
| mjg59 wrote:
| The valid inputs to memcpy() are defined by the C
| specification, so the compiler is free to make assumptions
| about what valid inputs are even if the library
| implementation chooses to allow a broader range of inputs
| MindSpunk wrote:
| Many standard C functions are treated as "magic" by
| compilers. Malloc is treated as if it has no side effects
| (which of course it does, it changes allocator state) so
| the optimiser can elide allocations. If not you wouldn't be
| able to elide the call because malloc looks like it has
| side effects, which it does but not ones we care about
| observing.
| gpderetta wrote:
| Not only that, malloc is also assumed to return pointers
| that don't alias anything else.
| int_19h wrote:
| Per ISO C, the identifiers declared or defined with
| external linkage by any C standard library header are
| considered reserved, so the moment you define your own
| memcpy, you're already in UB land.
| ryao wrote:
| You can, but gcc may replace it with an equivalent set of
| instructions as a compiler optimization, so you would have
| no guarantee it is used unless you hack the compiler.
|
| On a related note, GCC optimizing away things is a problem
| for memset when zeroing buffers containing sensitive data,
| as GCC can often tell that the buffers are going to be
| freed and thus the write is deemed unnecessary. That is a
| security issue and has to be resolved by breaking the
| compiler's optimization through a clever trick:
|
| https://github.com/openzfs/zfs/commit/d634d20d1be31dfa8cf06
| e... 12352
|
| Similarly, GCC may delete a memcpy to a buffer about to be
| freed, although I have never observed that as you generally
| don't do that in production code.
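|
| For illustration only, and not the trick used in the linked
| commit: one common way to keep the compiler from discarding
| such a wipe is to store through a volatile-qualified pointer.
| secure_zero() is a hypothetical name; where available, a
| library routine such as C23's memset_explicit() is the better
| choice:

```c
#include <assert.h>
#include <stddef.h>

/* Zeroes n bytes at buf. Each store goes through a volatile
 * lvalue, which the compiler has to treat as an observable
 * access, so the wipe is not dropped as a dead store. */
static void secure_zero(void *buf, size_t n)
{
    assert(buf != NULL || n == 0);

    volatile unsigned char *p = buf;
    while (n-- > 0)
        *p++ = 0;
}
```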
| bonzini wrote:
| If you do so you have to add -fno-builtin (or just
| -fno-builtin-memcpy).
| ndesaulniers wrote:
| > For example, GCC will happily remove the dest == NULL branch
| in the following code
|
| I think the blog should mention
| `-fno-delete-null-pointer-checks`
|
| https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#ind...
| AceJohnny2 wrote:
| > _-fdelete-null-pointer-checks_
|
| > [...]
|
| > _This option is enabled by default on most targets._
|
| What a footgun.
|
| I understand that, in an effort to compete with other
| compilers for relevance, GCC pursued performance over safety.
| Has that era passed? Could GCC choose safer over fast?
|
| Alternatively, has someone compiled a list of flags one might
| want to enable in latest GCC to avoid such kinds of dangerous
| optimizations?
| ryao wrote:
| Usually, when one marks an argument as nonnull via a
| function attribute, one wants NULL checks to be removed.
| ndesaulniers wrote:
| There's two similar but distinct function attributes for
| nullability. One affects codegen, one affects diagnostics
| only.
| AceJohnny2 wrote:
| Irrelevant, because delete-null-pointer-checks happens
| even in absence of nonnull function attribute, see GP's
| godbolt link, and the documentation that omits any
| reference to that function attribute.
|
| _That's_ what makes it dangerous!
| ryao wrote:
| That is a side effect of passing the pointer as a
| function parameter marked nonnull. It implies that the
| pointer is nonnull and any NULL checks against it can be
| removed. Pass it to a normal function and you will not
| see the null check removed.
| comex wrote:
| Just for the record, that's not the main purpose of
| -fdelete-null-pointer-checks.
|
| Normally, it only deletes null checks after actual null
| pointer dereferences. In principle this can't change
| observable behavior. Null dereferences are guaranteed to
| trap, so if you don't trap, it means the pointer wasn't
| null. In other words, _unlike most C compiler
| optimizations_ , -fdelete-null-pointer-checks should be
| safe even if you do commit undefined behavior.
|
| This once caused a kerfuffle with the Linux kernel. At the
| time, x86_64 CPUs allowed the kernel to dereference
| userspace addresses, and the kernel allowed userspace to
| map address 0. Therefore, it was possible for userspace to
| arrange for null pointers to _not_ trap when dereferenced
| in the kernel. Which meant that the null check optimization
| could actually change observable behavior. Which introduced
| a security vulnerability. [1]
|
| Since then, Linux has been compiled with
| `-fno-delete-null-pointer-checks`, but it's not really
| necessary: Linux
| systems have long since enforced that userspace can't map
| address 0, which means that deleting null pointer checks
| should be safe in both kernel and userspace. (Newer CPU
| security features also protect the kernel even if userspace
| _is_ allowed to map address 0.)
|
| But anyway, I didn't know that -fdelete-null-pointer-checks
| treated "memcpy with potentially-zero size" as a condition
| to remove subsequent null pointer checks. That means that
| the optimization actually isn't safe! Once GCC is updated
| to respect the newly well-defined behavior, though, it
| should become truly safe. Probably.
|
| The same can't be said for most UB optimizations - most of
| which can't be turned off.
|
| [1] https://lwn.net/Articles/342330/
| ape4 wrote:
| Only about 1000 more functions to do this to.
| high_na_euv wrote:
| >On the one hand, UB can be important for compiler optimizations
|
| e.g?
| cwzwarich wrote:
| The example in this blurb is a pretty good one:
| https://www.hboehm.info/c++mm/why_undef.html
| cesarb wrote:
| The simplest example of a compiler optimization enabled by UB
| would be the following:
|
|     int my_function() {
|         int x = 1;
|         another_function();
|         return x;
|     }
|
| The compiler can optimize that to:
|
|     int my_function() {
|         another_function();
|         return 1;
|     }
|
| Because it's UB for another_function() to use an out-of-bounds
| pointer to access the stack of my_function() and modify the
| value of x.
|
| And the most important example of a compiler optimization
| enabled by UB is related to that: being UB to access local
| variables through out-of-bounds pointers allows the compiler to
| place them in registers, instead of being forced to go through
| the stack for every operation.
| alerighi wrote:
| Does this still matter today? I mean, first registers are
| anyway saved on the stack when calling a function, and caches
| of modern processors are really nearly as fast (if not as
| fast!) as a register. Registers these days are merely labels,
| since internally the processor (at least for x86) executes
| the code in a sort of VM.
|
| To me it seems that all these optimizations were really
| something useful back in the day, but nowadays we can as well
| just ignore them and let the processor figure it out without
| that much loss of performance.
|
| Assuming that the program is "bug free" to me is a terrible
| idea, since even mitigations that the programmer puts in
| place to mitigate the effect of bugs (and no program is bug
| free) are skipped because the compiler can assume the program
| has no bug. To me security is more important than a 1% more
| boost in performance.
| gpderetta wrote:
| Register allocation is one of the most basic optimizations
| that a compiler can do. Some modern cpus can alias stack
| memory with internal registers, but it is still not as fast
| as not spilling at all.
|
| You can enjoy -O0 today and the compiler will happily
| allocate stack slots for all your variables and keep them
| up to date (which is useful for debugging). But the
| difference between -O0 and -O3 is orders of magnitude on
| many programs.
| wbl wrote:
| Many calling conventions use registers. And no, loads and
| stores are extremely complex and not free at all: fewer can
| issue in each cycle and there's some very expensive
| hardware spent to maintain the ordering on execution.
| cesarb wrote:
| > I mean, first registers are anyway saved on the stack
| when calling a function
|
| No, they aren't. For registers defined in the calling
| convention as "callee-saved", they don't have to be saved
| on the stack before calling a function (and the called
| function only has to save them if it actually uses that
| register). And for registers defined as "caller-saved",
| they only have to be saved if their value needs to be kept.
| The compiler knows all that, and tends to use caller-saved
| registers as scratch space (which doesn't have to be
| preserved), and callee-saved registers for longer-lived
| values.
|
| > and caches of modern processors are really nearly as fast
| (if not as fast!) as a register.
|
| No, they aren't. For instance, a quick web search tells me
| that the L1D cache for a modern AMD CPU has at least 4
| cycles of latency. Which means: even if the value you want
| to read is already in the L1 cache, the processor has to
| wait 4 cycles before it has that value.
|
| > Registers these days are merely labels, since internally
| the processor (at least for x86) executes the code in a
| sort of VM.
|
| No, they aren't. The register file still exists, even
| though register renaming means which physical register
| corresponds to a logical register can change. And there's
| no VM, most common instructions are decoded directly
| (without going through microcode) into a single uOp or pair
| of uOps which is executed directly.
|
| > To me it seems that all these optimizations were really
| something useful back in the day, but nowadays we can as
| well just ignore them and let the processor figure it out
| without that much loss of performance.
|
| It's the opposite: these optimizations are more important
| nowadays, since memory speeds have not kept up with
| processor speeds, and power consumption became more
| relevant.
|
| > To me security is more important than a 1% more boost in
| performance.
|
| Newer programming languages agree with you, and do things
| like checking array bounds on every access; they rely on
| compiler optimizations so that the loss of performance is
| only that "1%".
| MrMcCall wrote:
| I don't find those compelling reasons and, to the contrary, I
| think that kind of semantic circumvention to be a symptom of
| a poorly developed industry.
|
| How can we have properly functioning programs without
| clearly-defined, and _sensible_, semantics?
|
| If the developer needs to use registers, then they should
| choose a dev env/PL that provides them, otherwise such
| kludges will crash and burn, IMO.
| bagels wrote:
| We pay for the flexibility of not wearing seatbelts by
| increasing the consequences of crashes.
| gpderetta wrote:
| We stopped explicitly declaring locals with the 'register'
| keyword circa 40 years ago. Register allocation is a low
| hanging fruit and one of those things that is definitely
| best left to a compiler for most code.
| wat10000 wrote:
| Are you saying that C compilers should change every local
| variable access to read and write to the stack just in case
| some function intentionally does weird pointer arithmetic
| to change their values without referring to them in the
| source code?
| wruza wrote:
| And now they have to manage register pressure for it to
| keep being faster. And false dependencies. And some more.
| It doesn't work like that. Developers can't optimize like
| compilers do, not with modern CPUs. The compilers do the
| very heavy lifting in exchange for the complexity of a set
| of constraints they (and you, as a consequence) must rely
| on. The more relaxed these constraints are, the less
| performant code you get. Modern CPUs run modern
| interpreters as fast as dumbest-compiled C code basically,
| so if you want sensible semantics, then Typescript is one
| of the absolutely non-ironic answers.
| cv5005 wrote:
| You don't need UB for that.
|
| A simple model for both compilers and programmers to
| understand:
|
| "A variable whose address has not been taken need not be
| reachable via a random pointer".
|
| I mean that's how an assembly programmer would think - if I
| put something in r0 I don't expect a store instruction to
| clobber it.
| UncleMeat wrote:
| What you describe there is UB. If you define this in the
| standard, you are defining a kind of runtime behavior that
| can never happen in a well formed program and the compiler
| does not have to make a program that encounters this
| behavior do anything in particular.
| rwmj wrote:
| This explanation of why signed int overflow is undefined is
| interesting (although the behaviour is still very annoying):
| https://kristerw.blogspot.com/2016/02/how-undefined-signed-o...
| (HN discussion: https://news.ycombinator.com/item?id=11146384)
|
| More examples here:
| http://blog.llvm.org/2011/05/what-every-c-programmer-should-...
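|
| Because signed overflow is undefined, an overflow test has to
| run _before_ the addition; testing the sum afterwards is code
| the optimizer may assume is dead. A sketch, with checked_add()
| as a hypothetical helper:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Stores a + b in *out and returns true, or returns false and
 * leaves *out untouched if the sum would not fit in an int.
 * The range test only uses operations that cannot overflow. */
static bool checked_add(int a, int b, int *out)
{
    assert(out != NULL);

    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;

    *out = a + b;
    return true;
}
```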
| Arch-TK wrote:
| http://blog.llvm.org/2011/05/what-every-c-programmer-should-...
|
| In a real world program removing all UB is in some cases
| impossible without adding new breaking features to the C
| language. But, taking a real world program and removing all UB
| which IS possible to remove will introduce an overhead. In some
| programs this overhead is irrelevant. In others, it is probably
| the reason why C was picked.
|
| If you want speed without overhead, you need to have more
| statically checked guarantees. This is what languages such as
| Rust attempt to achieve (quite successfully).
| uecker wrote:
| Many real world C programs have no UB.
|
| What Rust attempts to achieve is to eliminate the possibility
| of accidentally introducing UB, by designing the language in
| a way that makes it impossible to have UB when sticking to
| the safe subset.
|
| It is also possible to ensure that C programs have no UB, and
| this does not require any breaking features in C. It usually
| requires some refactoring of the program.
| GuB-42 wrote:
| Generally, undefined behavior removes the need for
| systematically checking for special cases, the most common
| being out of bounds access.
|
| But it can go further than that. Dereferencing a NULL pointer
| is undefined behavior, so if a pointer is dereferenced, it can
| be assumed by the compiler not to be NULL and the code can be
| optimized. For example:
|
|     void foo(int *p) {
|         (*p)++;
|         if (p == NULL) {
|             printf("val is NULL\n");
|         } else {
|             printf("val is %d\n", *p);
|         }
|     }
|
| can be optimized to:
|
|     void foo(int *p) {
|         (*p)++;
|         printf("val is %d\n", *p);
|     }
|
| Note that static analyzers will most likely issue a warning
| here as such a trivial case is most likely a mistake. But the
| check for NULL may be part of an inline function that is used
| in many places, and thanks to the undefined behavior, the code
| that handles the NULL case will only be generated when
| relevant. The problem, of course, is that it assumes that the
| programmer knows what he is doing and doesn't make mistakes.
|
| In the case of memcpy(NULL, NULL, 0), there probably isn't much
| to gain making it undefined. It most likely doesn't help with
| the memcpy implementation (len=0 is generally a no-op), and
| inference based on the fact that the arguments can't be NULL is
| more likely to screw the programmer up than to improve
| performance.
| high_na_euv wrote:
| But how much actual performance is gained here?
| bagels wrote:
| It all adds up. All those instructions you don't have to
| execute, especially memory access and cache misses from
| jumps, pipeline stalls from conditionals, not just from
| this optimization.
| ncruces wrote:
| Imagine that you created a function GetPixel that reads an
| RGB pixel at a memory address, and which has a NULL check
| as a precondition.
|
| If the compiler can "prove" that the pointer is not NULL it
| can (after inlining the call) remove 20 million checks for
| a 20 megapixel image.
|
| The silly issue is the compiler using "you accessed it
| before" (aka "undefined behaviour") to "prove" that the
| pointer is not NULL.
|
| But I can attest that avoiding 20 million such checks does
| indeed make a huge difference.
| cv5005 wrote:
| Just make a non-null-checking version, GetPixelUnsafe(),
| and put the responsibility on the user to do the null
| check before the loop.
|
| All of these 'problems' have simple and straightforward
| workarounds, I'm not convinced these UB are needed at
| all.
| ncruces wrote:
| That's a non solution for existing code that already
| calls GetPixel 20 million times.
|
| It's not like I'm saying C is the best possible way to
| write new code.
|
| I'm just commenting why this matters for performance, and
| "remove all undefined behavior" from C compilers is a
| non-starter.
|
| Now go write Rust for all I care.
| nemothekid wrote:
| > _All of these 'problems' have simple and straightforward
| workarounds, I'm not convinced these UB are needed at
| all._
|
| He gave you a simple and straightforward example, but
| that example may not be representative of a real world
| program where complex analysis leads to better performing
| code.
|
| As a programmer, it's far easier to just insert bounds
| checks everywhere, and trust the system to remove them
| when possible. This is what Rust does, and it is safe. The
| problem isn't the compiler, the problem is the standard.
| More broadly, the standard wasn't written with optimizing
| compilers in mind.
| Dylan16807 wrote:
| If we're inlining the call, then we can hoist the NULL
| check out of the loop. Now it's 1 check per 20 million
| operations. There's no need to eliminate it or have UB at
| that point.
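|
| A sketch of the hoisted form, done by hand. Pixel,
| get_pixel_unchecked() and sum_red() are hypothetical names for
| this example, not the GetPixel from the comments above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b; } Pixel; /* hypothetical layout */

/* Unchecked accessor: the caller guarantees img is non-null and
 * i is in range. The assert is a debug-build aid only. */
static inline Pixel get_pixel_unchecked(const Pixel *img, size_t i)
{
    assert(img != NULL);
    return img[i];
}

/* Sums the red channel. The null check runs once, before the
 * loop, instead of once per pixel. */
static uint64_t sum_red(const Pixel *img, size_t count)
{
    if (img == NULL)
        return 0;

    uint64_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += get_pixel_unchecked(img, i).r;
    return total;
}
```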
| menaerus wrote:
| It depends on your CPU microarchitectural details, on the
| complexity and size of your binary executable and the
| workload of your binary.
|
| So there's no universal answer to your question but it
| could very well be "much".
| MrMcCall wrote:
| Isn't it more sensible to just check that the params that are
| about to be sent to memcpy be reasonable?
|
| That is why I tend to wrap my system calls with my own internal
| function (which can be inlined in certain PLs), where I can
| standardize such tests. Otherwise, the resulting code that
| performs the checks and does the requisite error handling is
| bloated.
|
| Note that I am also loath to #DEFINE such code because C is
| already rife with them and my perspective is that the less of
| them the better.
|
| At the end of the day, quick and dirty fixes will prove the adage
| "short cuts make long delays", and OpenBSD's approach is the only
| really viable long-term solution, where you just have to rewrite
| your code if it has ill-advised constructs.
|
| For designing libraries such as C's stdlib, I don't believe in
| 'undefined behavior': clearly define your semantics and say, "If
| you pass a NULL to memcpy, this is what will happen." Same for
| passing (n == 0), or (src == dst).
|
| And if, for some strange reason, fixing the semantics breaks
| calling code, then I can't imagine that their code wasn't f_cked
| in the first place.
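|
| A minimal sketch of such a wrapper, under one possible choice
| of semantics: a zero-length copy is always a no-op, and a null
| pointer with a non-zero length is treated as a programming
| error. copy_bytes() is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Wrapper with explicitly chosen semantics:
 *  - n == 0: returns dest without looking at either pointer;
 *  - n  > 0: both pointers must be non-null (checked in debug
 *            builds), then the copy is delegated to memcpy(). */
static void *copy_bytes(void *dest, const void *src, size_t n)
{
    if (n == 0)
        return dest;

    assert(dest != NULL && src != NULL);
    return memcpy(dest, src, n);
}
```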
| hwc wrote:
| > internal function
|
| every time you introduce something nonstandard, you add one
| little hardship to anyone trying to read or modify your code.
|
| if a programmer is familiar with the language, its standard
| library, and the normal idioms, then they should be able to
| just jump in.
| int_19h wrote:
| As the article points out, all major memcpy implementations
| already do this check inside memcpy. Sure, the caller can also
| check, but given that it's both redundant in practice and makes
| some common patterns harder to use than they would otherwise
| be, there's no reason to not just standardize what's already
| happening anyway and make everyone's lives easier in the
| process.
| badmintonbaseba wrote:
| I just skimmed through the proposed wording in [N3322]. It looks
| like it silently fixes a defect too, NULL == NULL was also
| undefined up until C23. Hilarious.
|
| [N3322]
| https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3322.pdf
| mananaysiempre wrote:
| This is probably related to the issue with NULL - NULL
| mentioned in the article.
|
| Imagine you're working in real mode on x86, in the compact or
| large memory model[1]. This means that a data pointer is
| basically struct{uint16_t off,seg;} encoding linear address
| (seg<<4)+off. This makes it annoying to have individual
| allocations ("objects") >64K in size (because of the weird
| carries), so these models don't allow that. (The huge model
| does, and it's significantly slower.) Thus you legitimately
| have sizeof(size_t) == 2 but sizeof(uintptr_t) == 4 (hi Rust),
| and God help you if you compare or subtract pointers not within
| the same allocation. [Also, sizeof(void *) == 4 but sizeof(void
| (*)(void)) == 2 in the compact model, and the other way around
| in the medium model.]
|
| Note the addressing scheme is non-bijective. The C standard is
| generally careful not to require the implementation to
| canonicalize pointers: if, say, char a[16] happens to be
| immediately followed by int b[8], an independently declared
| variable, it may well be that &a+16 (legal "one past" pointer)
| is {16,1} but &b is {0,2}, which refers to the exact same byte,
| but the compiler doesn't have to do anything special because
| dereferencing &a+16 is UB (duh) and comparing (char *)(&a+16)
| with (char *)&b or subtracting one from the other is also UB
| (pointers to different objects).
|
| The issue with NULL == NULL and also with NULL - NULL is that
| now the null pointer is required to be canonical, or these
| expressions must canonicalize their operands. I don't know why
| you'd ever make an implementation that has non-canonical NULLs,
| but I guess the text prior to this change allowed such.
|
| [1]
| https://devblogs.microsoft.com/oldnewthing/20200728-00/?p=10...
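|
| The {16,1} versus {0,2} case above can be checked with a few
| lines of arithmetic. This is a model only: FarPtr and the
| helpers are hypothetical names, and real compilers for these
| memory models use their own far-pointer types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of a real-mode far pointer: 16-bit offset and segment. */
typedef struct { uint16_t off, seg; } FarPtr;

/* Linear address as described above: (seg << 4) + off. */
static uint32_t linear_address(FarPtr p)
{
    return ((uint32_t)p.seg << 4) + p.off;
}

/* Field-by-field comparison, with no canonicalization. */
static bool same_representation(FarPtr a, FarPtr b)
{
    return a.off == b.off && a.seg == b.seg;
}

/* The values from the comment: two different representations
 * that name the same byte. */
static void check_comment_example(void)
{
    FarPtr one_past_a = { .off = 16, .seg = 1 };
    FarPtr start_of_b = { .off = 0,  .seg = 2 };

    assert(linear_address(one_past_a) == linear_address(start_of_b));
    assert(!same_representation(one_past_a, start_of_b));
}
```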
| amluto wrote:
| > now the null pointer is required to be canonical
|
| Yikes! This particular oddity seems annoying but sort of
| harmless in x86 real mode, but not necessarily in protected
| mode. Imagine code that wants to load a pointer into a
| register: it loads the offset into an ordinary register and
| the selector portion into a segment register. It's
| permissible to load the 0 (null) selector, but loading
| garbage will fault immediately. So, if you allow
| non-canonical NULL, then knowing that a pointer is either valid
| or NULL does not allow you to hoist a segment load above a
| condition that might mean you never actually dereference the
| pointer.
|
| (I have plenty of experience with low-level OS code in all
| kinds of nasty x86 modes but, thankfully, not so much
| experience writing ordinary C code targeting protected mode.
| It sometimes boggles my mind that anyone ever got decent
| performance with anything involving far data pointers.
| Segment loads are slow, and there are not a lot of segment
| registers to go around.)
| bonzini wrote:
| In real mode assembly days, ES and sometimes DS were just
| another base register that you could use in a loop. Given
| the dearth of addressing modes it was quite nice to assume
| that large arrays started at xxxx0h and therefore that the
| offset part of the far pointer was zero.
| pm215 wrote:
| If so, it's one that's been introduced at some point post C99
| -- the C99 spec explicitly defines the behaviour of NULL ==
| NULL. Section 6.5.9 para 6 says "Two pointers compare equal if
| and only if both are null pointers, both are pointers to the
| same object [etc etc]".
| dwattttt wrote:
| I don't imagine NULL is defined as "pointing to an object",
| so I don't expect that clause to apply.
| tsimionescu wrote:
| You completely skipped over the first part: "Two pointers
| compare equal if and only if _both are null pointers_ "
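| A minimal sketch of the quoted rule, assuming a hosted C99-or-later
| compiler; the function name is hypothetical. It only uses ==, since
| relational operators such as <= on null pointers are a separate
| question:

```c
#include <stddef.h>

/* Returns 1 when two independently produced null pointers compare equal.
 * C99 6.5.9p6 (quoted above) requires this, so the result is always 1. */
int null_pointers_compare_equal(void)
{
    int *from_macro = NULL;   /* null pointer obtained from the NULL macro */
    int *from_zero = 0;       /* null pointer obtained from the constant 0 */
    return from_macro == from_zero;
}
```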
| nikic wrote:
| NULL == NULL was already defined -- but NULL <= NULL wasn't :)
| IWeldMelons wrote:
| I cannot find any confirmation of your statement. OTOH, the linked
| paper says: "All null pointer values (of compatible type within the
| same address space) are already required to compare equal."
| PaulDavisThe1st wrote:
| NULL is not a single type in any conventional sense (and is
| actually tricky to define in a way that makes it usable in
| the way most programmers expect).
|
| Thus: T1* a = NULL; T2* b = NULL;
| a == b; /* may be undefined at present, depending on the
| nature of T1 & T2 */
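| One way to sidestep the question for unrelated pointer types is to
| convert both operands to void * before comparing. A sketch, where
| int and double are stand-ins for T1 and T2 and the function name is
| hypothetical:

```c
#include <stddef.h>

/* int and double stand in for the unrelated types T1 and T2. */
int unrelated_nulls_equal_via_void(void)
{
    int *a = NULL;
    double *b = NULL;
    /* Converting a null pointer to another pointer type yields a null
     * pointer of that type, and any two null pointers compare equal
     * (C99 6.3.2.3p4), so this comparison is well-defined. */
    return (void *)a == (void *)b;
}
```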
| MuffinFlavored wrote:
| > because NULL + 0 is undefined behavior in C.
|
| Why? It's 2024. Make it not be? Sure, some older stuff already
| written might no longer compile and need to be updated. Put it
| behind a "newer" standard flag/version or whatever.
|
| Or is it that it can't be caught at compile time and only run
| time... hmm...
| sophiebits wrote:
| They are making it not be. That's the whole point of the
| article.
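| Until a given compiler and libc adopt the change, code that can see
| zero-length copies with null pointers is commonly guarded by hand.
| A sketch of that idiom; the wrapper name copy_bytes is hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: skips memcpy entirely when n == 0, so a null
 * dst or src is never passed to the library function. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    if (n == 0) {
        return dst;   /* nothing to copy; dst may legitimately be NULL */
    }
    return memcpy(dst, src, n);
}
```

| With the guard in place, copy_bytes(NULL, NULL, 0) simply returns
| NULL without ever reaching memcpy.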
| hwc wrote:
| Well, that seems like something that should have been there from
| the beginning.
| nmilo wrote:
| > However, the most vocal opposition came from a static analysis
| perspective: Making null pointers well-defined for zero length
| means that static analyzers can no longer unconditionally report
| NULL being passed to functions like memcpy--they also need to
| take the length into account now.
|
| How does this make any sense? We don't want to remove a
| low-hanging footgun because static analyzers can no longer detect it?
| hatthew wrote:
| My understanding is that with this change, static analyzers
| have three options:
|
| 1. False positive on code that would have been an issue
| previously
|
| 2. False negative on a ton of similar footguns
|
| 3. Add complexity to differentiate between these cases
|
| None of these options are fun.
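| To make option 3 concrete, here is a toy model of the extra decision
| a checker would need. It does not describe any real analyzer; the
| enum and function names are hypothetical:

```c
/* What a checker knows about a property at a call site (toy model). */
enum knowledge { KNOWN_NO, UNKNOWN, KNOWN_YES };

/* Hypothetical rule for memcpy(dst, src, n) after the change: warn only
 * if a pointer may be null AND the length is not known to be zero. */
int should_warn_null_memcpy(enum knowledge ptr_is_null,
                            enum knowledge len_is_zero)
{
    if (ptr_is_null == KNOWN_NO) {
        return 0;   /* pointer is definitely valid: nothing to report */
    }
    if (len_is_zero == KNOWN_YES) {
        return 0;   /* zero-length copy: well-defined after the change */
    }
    return 1;       /* possibly-null pointer with possibly-nonzero length */
}
```

| Warning when both facts are UNKNOWN leans toward option 1 (false
| positives); returning 0 there instead would lean toward option 2.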
___________________________________________________________________
(page generated 2024-12-11 23:01 UTC)