[HN Gopher] Clang vs. Clang
___________________________________________________________________
Clang vs. Clang
Author : dchest
Score : 102 points
Date : 2024-08-03 14:45 UTC (8 hours ago)
(HTM) web link (blog.cr.yp.to)
(TXT) w3m dump (blog.cr.yp.to)
| josephcsible wrote:
| > compiler writers refuse to take responsibility for the bugs
| they introduced, even though the compiled code worked fine before
| the "optimizations". The excuse for not taking responsibility is
| that there are "language standards" saying that these bugs should
| be blamed on millions of programmers writing code that bumps into
| "undefined behavior"
|
| But that's not an excuse for having a bug; it's the exact
| evidence that it's not a bug at all. Calling the compiler buggy
| for not doing what you want when you commit Undefined Behavior is
| like calling dd buggy for destroying your data when you call it
| with the wrong arguments.
| nabla9 wrote:
| Optimizing compilers that don't allow disabling all
| optimizations make it impossible to write secure code with
| them. You must do it in assembly.
| exe34 wrote:
| clang::optnone
| nabla9 wrote:
| "Optimizing compilers that don't allow disabling __all__
| optimizations"
| exe34 wrote:
| do these exist? who's using them?
| layer8 wrote:
| It's not well-defined what counts as an optimization. For
| example, should every single source-level read access of
| a memory location go through all cache levels down to
| main memory, instead of, for example, caching values in
| registers? That would be awfully slow. But that question
| is one reason for UB.
| kevingadd wrote:
| Or writing code that relies on inlining and/or tail call
| optimization to successfully run at all without running
| out of stack... We've got some code that doesn't run if
| compiled O0 due to that.
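|
| (A toy version of the pattern, as a sketch: at -O2 gcc/clang
| typically turn the tail call into a loop; at -O0 a large n
| just blows the stack.)
|
|     /* tail-recursive sum; relies on the optimizer rewriting
|      * the tail call into a jump to stay within the stack */
|     long sum_to(long n, long acc)
|     {
|         if (n == 0)
|             return acc;
|         return sum_to(n - 1, acc + n);   /* tail call */
|     }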
| cmeacham98 wrote:
| If your "secure" code is not secure because of a compiler
| optimization it is fundamentally incorrect and broken.
| nabla9 wrote:
| It's secure code we use.
|
| I'm sure you know who DJB is.
| jjuhl wrote:
| Why is knowing who the author is relevant? Either what he
| posts is correct or it is not, who the person is is
| irrelevant.
| hedgehog wrote:
| There is a fundamental difference of priorities between the
| two worlds. For most general application code any
| optimization is fine as long as the output is correct. In
| security-critical code, information leakage from execution
| time and resource usage on the chip matters, but that
| essentially means you need to get away from data-dependent
| memory access patterns and flow control.
| thayne wrote:
| The problem is that preventing timing attacks often means
| you have to implement something in constant time. And most
| language specifications and implementations don't give you
| any guarantees that operations happen in constant time and
| can't be optimized into something that doesn't.
|
| So the only possible way to ensure things like string
| comparison don't have data-dependent timing is often to
| implement it in assembly, which is not great.
|
| What we really need is intrinsics that are guaranteed to
| have the desired timing properties, and/or a way to disable
| optimization, or at least certain kinds of optimization, for
| a region of code.
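|
| For instance, the usual constant-time comparison idiom looks
| something like this (just a sketch; nothing in the standard
| stops a compiler from rewriting it into an early-exit loop):
|
|     #include <stddef.h>
|     #include <stdint.h>
|
|     /* accumulate differences with OR so the source has no
|      * data-dependent branches and no early exit */
|     int ct_equal(const uint8_t *a, const uint8_t *b, size_t n)
|     {
|         uint8_t diff = 0;
|         for (size_t i = 0; i < n; i++)
|             diff |= a[i] ^ b[i];
|         return diff == 0;
|     }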
| plorkyeran wrote:
| Intrinsics which do the right thing seems like so
| obviously the correct answer to me that I've always been
| confused about why the discussion is always about
| disabling optimizations. Even in the absence of compiler
| optimizations (which is not even an entirely meaningful
| concept), writing C code which you hope the compiler will
| decide to translate into the exact assembly you had in
| mind is just a very brittle way to write software. If you
| need the program to have very specific behavior which the
| language doesn't give you the tools to express, you
| should be asking for those tools to be added to the
| language, not complaining about how your attempts at
| tricking the compiler into the thing you want keep
| breaking.
| arp242 wrote:
| The article explains why this is not as simple as that,
| especially in the case of timing attacks. Here it's not
| just the end result that matters, but _how_ it's done. If
| any code can be changed to anything else that gives the same
| results, then this becomes quite hard.
|
| Absolutist statements such as this may give you a glowing
| sense of superiority and cleverness, but they contribute
| nothing and are not as clever as you think.
| plorkyeran wrote:
| The article describes why you can't write code which is
| resistant to timing attacks in portable C, but then
| concludes that actually the code he wrote is correct and
| it's the compiler's fault it didn't work. It's
| _inconvenient_ that anything which cares about timing
| attacks cannot be securely written in C, but that doesn't
| make the code not fundamentally incorrect and broken.
| bluGill wrote:
| If you have ub then you have a bug and there is some system
| that will show it. It isn't hard to write code without ub.
| bigstrat2003 wrote:
| It is, in fact, pretty hard as evidenced by how often
| programmers fail at it. The macho attitude of "it's not
| hard, just write good code" is divorced from observable
| reality.
| bluGill wrote:
| People write buffer overflows and memory leaks because
| they are not careful. The rest of UB are things I have
| never seen despite running sanitizers on a large
| codebase.
| saagarjha wrote:
| Perhaps you're not looking all that hard.
| tbrownaw wrote:
| Staying under the speed limit is, in fact, pretty hard as
| evidenced by how often drivers fail at it.
| kevingadd wrote:
| It's more complex than that for the example of car speed
| limits. Depending on where you live, the law also says
| that driving _too slow_ is illegal because it creates an
| unsafe environment by forcing other drivers on, e.g., the
| freeway to pass you.
|
| But yeah, seeing how virtually everyone on every road is
| constantly speeding, that doesn't give me a lot of faith
| in my fellow programmers' ability to avoid UB...
| pjmlp wrote:
| Only if developers act as grown-ups and use all the static
| analysers they can get hold of, instead of acting as if they
| know better.
|
| The tone of my answer is a reflection of what most surveys
| state about the actual use of such tooling.
| Rusky wrote:
| Disabling all optimizations isn't even enough; fundamentally
| what you need is a much narrower specification for how the
| source language maps to its output. Even -O0 doesn't give you
| that, and in fact will often be counterproductive (e.g.
| you'll get branches in places where the optimizer would have
| removed them).
|
| The problem with this is that no general purpose compiler
| wants to tie its own hands behind its back in this way, for
| the benefit of one narrow use case. It's not just that it
| would cost performance for everyone else, but also that it
| requires a totally different approach to specification and
| backwards compatibility, not to mention deep changes to
| compiler architecture.
|
| You almost may as well just design a new language, at that
| point.
| amluto wrote:
| > You almost may as well just design a new language, at
| that point.
|
| Forget "almost".
|
| Go compile this C code:
|
|     #include <stdlib.h>
|     void foo(int *ptr) {
|         free(ptr);
|         *ptr = 42;
|     }
|
| This is UB. And it has nothing whatsoever to do with
| optimizations -- any sensible translation to machine code
| is a use-after-free, and an attacker can probably find a
| way to exploit that machine code to run arbitrary code and
| format your disk.
|
| If you don't like this, use a language without UB.
|
| But djb wants something different, I think: a way to tell
| the compiler not to introduce timing dependencies on
| certain values. This is a nice idea, but it needs hardware
| support! Your CPU may well implement ALU instructions with
| data-dependent timing. Intel, for example, reserves the
| right to do this unless you set an MSR to tell it not to.
| And you cannot set that MSR from user code, so what exactly
| is a compiler supposed to do?
|
| https://www.intel.com/content/www/us/en/developer/articles/
| t...
| Rusky wrote:
| I am not talking about UB at all. I am talking about the
| same constant-time stuff that djb's post is talking
| about.
| SAI_Peregrinus wrote:
| Execution time is not considered Observable Behavior in
| the C standard. It's entirely outside the semantics of
| the language. It is Undefined Behavior, though not UB
| that necessarily invalidates the program's other
| semantics the way a use-after-free would.
| thayne wrote:
| The problem is that c and c++ have a ridiculous amount of
| undefined behavior, and it is extremely difficult to avoid all
| of it.
|
| One of the advantages of rust is it confines any potential UB
| to unsafe blocks. But even in rust, which has defined behavior
| in a lot of places that are UB in c, if you venture into unsafe
| code, it is remarkably easy to accidentally run into subtle UB
| issues.
| layer8 wrote:
| It's true that UB is not intuitive at first, but "ridiculous
| amount" and "difficult to avoid" is overstating it. You have
| to have a proof-writing mindset when coding, but you do get
| sensitized to the pitfalls once you read up on what the
| language constructs actually guarantee (and don't guarantee),
| and it's not that much more difficult than, say, avoiding
| panics in Rust.
| pjmlp wrote:
| So surely you know by heart the circa 200 use cases
| documented in ISO C, and the even greater list documented
| in the ISO C++ standard documents.
|
| Because, despite knowing both since the 1990s, I'd rather
| leave that to static analysis tools.
| tomjakubowski wrote:
| In my experience it is very easy to accidentally introduce
| iterator invalidation: it starts with calling a callback
| while iterating, then some layers of indirection get added,
| and eventually somebody adds some innocent-looking code
| deep down the call stack which ends up mutating the
| collection while it's being iterated.
| layer8 wrote:
| I can tell you that this happens in Java as well, which
| doesn't have undefined behavior. That's just the nature
| of mutable state in combination with algorithms that only
| work while the state remains unmodified.
| thayne wrote:
| It isn't so much that it is unintuitive, for the most
| part[1], but rather that there are a lot of things to keep
| track of, and a seemingly innocuous change in one part of
| the program can potentially result in UB somewhere far
| away. And usually such bugs are not code that is blatantly
| undefined behavior, but rather code that is well defined
| most of the time, but in some edge case can trigger
| undefined behavior.
|
| It would help if there was better tooling for finding
| places that could result in UB.
|
| [1]: although some of them can be a little surprising, like
| the fact that overflow is defined for unsigned types but
| not signed types
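|
| For example (sketch):
|
|     /* defined: unsigned arithmetic wraps modulo 2^N, so this
|      * yields 0 when x == UINT_MAX */
|     unsigned int u_next(unsigned int x) { return x + 1u; }
|
|     /* undefined when x == INT_MAX: the compiler may assume it
|      * never happens, e.g. folding "x + 1 > x" to true */
|     int s_next(int x) { return x + 1; }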
| js2 wrote:
| I think you're replying to a strawman. Here's the full quote:
|
| > The excuse for not taking responsibility is that there are
| "language standards" saying that these bugs should be blamed on
| millions of programmers writing code that bumps into "undefined
| behavior", rather than being blamed on the much smaller group
| of compiler writers subsequently changing how this code
| behaves. These "language standards" are written by the compiler
| writers.
|
| > Evidently the compiler writers find it more important to
| continue developing "optimizations" than to have computer
| systems functioning as expected. Developing "optimizations"
| seems to be a very large part of what compiler writers are paid
| to do.
|
| The argument is that the compiler writers are themselves the
| ones deciding what is and isn't undefined, and they are
| defining those standards in such a way as to allow themselves
| latitude for further optimizations. Those optimizations then
| break previously working code.
|
| The compiler writers could instead choose to prioritize
| backwards compatibility, but they don't. Further, these
| optimizations don't meaningfully improve the performance of
| real world code, so the trade-off of breaking code isn't even
| worth it.
|
| That's the argument you need to rebut.
| quohort wrote:
| Perhaps the solution is also to rein in the language
| standard to support stricter use cases. For example, what if
| there was a constant-time { ... }; block in the same way you
| have extern "C" { ... }; . Not only would it allow you to
| have optimizations outside of the block, it would also force
| the compiler to ensure that a given block of code is always
| constant-time (as a security check done by the compiler).
| duped wrote:
| I think this is actually a mistake by the author since the rant
| is mostly focused on implementation defined behavior, not
| undefined.
|
| The examples they give are all perfectly valid code. The
| specific bugs they're talking about seem to be compiler
| optimizations that replace bit-twiddling arithmetic with
| branches, which isn't a safe optimization if the bit twiddling
| happens in a cryptographic context because it opens the door
| for timing attacks.
|
| I don't think it's correct to call either the source code or
| the compiler buggy; it's the C standard that is underspecified
| for the author's purposes, and that creates security bugs on
| some targets.
|
| Ultimately though I can agree with the C standard authors that
| they cannot define the behavior of hardware, they can only
| define the semantics for the language itself. Crypto guys will
| have to suffer because the blame is on the hardware for these
| bugs, not the software.
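|
| Concretely, the kind of code at issue is a branchless select,
| something like this sketch; the optimizer is free to
| recognize the pattern and emit a conditional branch (or cmov)
| instead of the mask arithmetic:
|
|     #include <stdint.h>
|
|     /* returns a if bit == 1, b if bit == 0, using a mask
|      * instead of a branch */
|     uint32_t ct_select(uint32_t bit, uint32_t a, uint32_t b)
|     {
|         uint32_t mask = (uint32_t)0 - bit;  /* 0 or 0xffffffff */
|         return (a & mask) | (b & ~mask);
|     }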
| Conscat wrote:
| Fwiw clang has a `clang::optnone` attribute to disable all
| optimizations on a per-function basis, and GCC has the fantastic
| `gnu::optimize` attribute which allows you to add or remove
| optimizations by name, or set the optimization level regardless
| of compiler flags. `gnu::optimize(0)` is similar to that clang
| flag. Clang also has `clang::no_builtins` to disable specifically
| the memcpy and memset optimizations.
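|
| In source form that looks something like this (GNU-style
| spellings; the [[clang::optnone]] / [[gnu::optimize]] forms
| are equivalent where supported, and the function names here
| are just placeholders):
|
|     /* clang: compile this one function as if at -O0 */
|     __attribute__((optnone))
|     int keep_as_written(int x) { return x * 2; }
|
|     /* gcc: per-function optimization level or named options */
|     __attribute__((optimize("O0")))
|     int also_kept(int x) { return x * 2; }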
| leni536 wrote:
| "The optimize attribute should be used for debugging purposes
| only. It is not suitable in production code. "
|
| https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attribute...
| Conscat wrote:
| That's an interesting note. I wonder why they claim this. As
| far as I know, `[[gnu::optimize("-fno-tree-loop-distribute-
| patterns")]]` (or the equivalent #pragma) is required for
| implementing a memcpy function in C unless you do something
| funky with the build system.
| leni536 wrote:
| Maybe that's applied to the TU that defines it? I don't see
| it in the glibc sources.
| ndesaulniers wrote:
| Compile your code with `-O0` and shut up already.
| mananaysiempre wrote:
| Unfortunately GCC's codegen for GCC's x86 intrinsics headers is
| really remarkably awful at -O0, particularly around constant
| loads and broadcasts, because those usually use code that's as
| naive as possible and rely on compiler optimizations to
| actually turn it into a broadcast, immediate, or whatever. (I
| haven't checked Clang.)
| jjuhl wrote:
| "Unfortunately GCC's codegen for GCC's x86 intrinsics headers
| is really remarkably awful at -O0" - but that kind of seems
| to be what is asked for..
| mananaysiempre wrote:
| No. If I say (e.g.) _mm256_set_epi32(a,b,...,c) with
| constant arguments (which is the preferred way to make a
| vector constant), I expect to see 32 aligned bytes in the
| constant pool and a VMOVDQA in the code, not the mess of
| VPINSRDs that I'll get at -O0 and that makes it essentially
| impossible to write decent vectorized code. The same way
| that I don't expect to see a MUL in the assembly when I
| write sizeof(int) * CHAR_BIT in the source (and IIRC I
| won't see one).
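|
| (I.e., for source roughly like the following, built with
| -mavx2, the hope is a single 32-byte constant plus one
| vmovdqa, not eight inserts:)
|
|     #include <immintrin.h>
|
|     __m256i vec_constant(void)
|     {
|         /* constant arguments, highest element first */
|         return _mm256_set_epi32(8, 7, 6, 5, 4, 3, 2, 1);
|     }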
|
| (Brought to you by a two-week investigation of a mysterious
| literally 100x slowdown that was caused by the fact that QA
| always ran a debug build and those are always compiled at
| -O0.)
| jjuhl wrote:
| Seems you want the compiler to do some _optimization_,
| to improve the generated code. Or?
| mananaysiempre wrote:
| In this case, I'd expect constant folding to be the
| absolute minimum performed at all optimization levels. It
| is, in fact -- for integers. For (integer) vectors, it's
| not, even though it's much more important there. That's
| why advising cryptographers who program using vector
| intrinsics (aka "assembly except you get a register
| allocator") to compile with GCC at -O0 is such bad
| advice. (Just checked MSVC and it's better there.)
|
| There are, however, more unambiguous cases, where Intel
| documents an intrinsic to produce an instruction, but GCC
| does not in fact produce said instruction from said
| intrinsic unless optimization is enabled. (I just don't
| remember them because constants in particular were so
| ridiculously bad in the specific case I hit.)
| JonChesterfield wrote:
| If you constant fold and keep things in registers then
| you generally can't look at or change the pieces in a
| debugger. So everything gets written to the stack where
| it's easy to find.
| anonymoushn wrote:
| Clang tends to put everything on the stack at -O0 and
| actually try to do register allocation only as an
| optimization.
| mananaysiempre wrote:
| Generally? Sure, so does GCC, but that's IME less impactful
| than a pessimized vectorized routine. (Corollary that I hit
| literally yesterday: exclusively at -O0, until you step
| past the opening brace--i.e. the function prologue--GDB
| will show stack garbage instead of the function arguments
| passed in registers.)
| ziml77 wrote:
| Why does the code need to rely on hacks to get around
| optimizations? Can't they be disabled per-unit by just compiling
| different files with different optimization flags?
| Someone wrote:
| You can't realistically have a C compiler that doesn't do any
| optimizations.
|
| For one thing, CPU caches are wonders of technology, but a C
| compiler that only uses registers for computations but stores
| all results in memory and issues a load for every read will be
| unbearably slow.
|
| So, you need a register allocator and if you have that, you
| either need (A) an algorithm to spill data to memory if you run
| out of registers, or (B) have to refuse to compile such code.
|
| If you make choice A, any change to the code for spilling back
| to memory can affect timings and that can introduce a timing
| bug in constant-time code that isn't branch-free.
|
| Also, there still is no guarantee that code that is constant-
| time on CPU X also will be on CPU Y. For example, one CPU has
| single-cycle 64-bit multiplication, but another doesn't.
|
| If you make choice B, you don't have a portable language
| anymore. Different CPUs have different numbers of registers,
| and they can have different features, so code that runs fine
| on one CPU may not do so on another one (even if it has the
| exact same number of registers of the same size).
|
| Phrased differently: C isn't a language that supports writing
| constant-time functions. If you want that, you either have to
| try hard to beat a C compiler into submission, and you will
| fail in doing that, or choose a different language, and that
| likely will be one that is a lot like the assembly language of
| the target CPU. You could make it _look_ similar between CPUs,
| but there would be subtle or not so subtle differences in
| semantics or in what programs the language accepts for
| different CPUs.
|
| Having said that: a seriously dumbed-down C compiler (with a
| simple register allocator that programmers can mostly
| understand, no constant folding, no optimizations replacing
| multiplications by bit shifts or divisions by multiplications,
| 100% adherence to 'inline' requests, etc.) probably could get close to
| what people want. It might even have a feature where code that
| requires register spilling triggers a compiler error. I am not
| aware of any compiler with that feature, though.
|
| I wouldn't call that C, though, as programs written in it would
| be a lot less portable.
| pjmlp wrote:
| That would bring us back to the days of 8 and 16 bit home
| computers, where the quality of C compilers outside UNIX was
| hardly anything to be impressed about.
| UncleMeat wrote:
| What is an optimization?
|
| You wrote some code. It doesn't refer to registers. Is register
| allocation that minimized spillage an optimization? How would
| you write a compiler that has "non-optimizing" register
| allocation?
| TNorthover wrote:
| I'm vaguely sympathetic to these crypto people's end goals
| (talking about things like constant time evaluation & secret
| hiding), but it's really not what general purpose compilers are
| even thinking about most of the time so I doubt it'll ever be
| more than a hack that mostly works.
|
| They'll probably need some kind of specialized compiler of their
| own if they want to be serious about it. Or carry on with asm.
| Retr0id wrote:
| The author has written such a compiler:
| https://cr.yp.to/qhasm.html (or at least, a prototype for one)
| jedisct1 wrote:
| Jasmin has largely replaced qhasm.
| pjmlp wrote:
| Jasmin is also an assembler for JVM bytecode, love
| overloaded names.
| kstrauser wrote:
| I can't help but feel we're going to think of these as the bad
| old years, and that at some point we'll have migrated off of C to
| a language with much less UB. It's so easy to express things in C
| that compile but that the compiler couldn't possibly guess the
| _intent_ of because C doesn't have a way to express it.
|
| For instance, in Python you can write something like:
| result = [something(value) for value in set_object]
|
| Because Python's set objects are unordered, it's clear that it
| doesn't matter in which order the items are processed, and that
| the order of the results doesn't matter. That opens a whole lot
| of optimizations at the language level that don't rely on
| brilliant compilers inferring what the author meant. Similar code
| in another language with immutable data can go one step further:
| since something(value1) can't possibly affect something(value2),
| it can execute those in parallel with threads or processes or
| whatever else makes it go fast.
|
| Much of the optimization of C compilers is looking at patterns in
| the code and trying to find faster ways to do what the author
| probably meant. Because C lacks the ability to express much
| intent compared to pretty much any newer language, they have the
| freedom to guess, but also _have_ to make those kinds of
| inferences to get decent performance.
|
| On the plus side, this might be a blessing in disguise like when
| the Hubble telescope needed glasses. We invented brilliant
| techniques to make it work despite its limitations. Once we fixed
| its problems, those same techniques made it perform way better
| than originally expected. All those C compiler optimizations,
| applied to a language that's not C, may give us superpowers.
| unclad5968 wrote:
| While all that makes sense in theory, none of it has actually
| been demonstrated to be faster than C. The compiler doesn't need to
| guess what the programmer is trying to do because C is close
| enough to the actual hardware that the programmer can just tell
| it what to do.
| pjmlp wrote:
| Note that C code was hardly fast outside big iron UNIX;
| during the 1980s and up to the mid 1990s, any half-clever
| developer could easily outperform the generated machine code
| with manually written Assembly code.
|
| Hence why games for 8 and 16 bit home computers were mostly
| written in Assembly, and there were books like the Zen of
| Assembly Programming.
|
| It was the way that optimizing compilers started to exploit
| UB in C that finally made it fast enough for modern times.
|
| Modern hardware has nothing to do with C abstract machine.
| remexre wrote:
| https://research.microsoft.com/en-
| us/um/people/simonpj/paper...
| dathinab wrote:
| Funnily, part of why Python is, well, one of the slowest widely
| used languages is that any AOT compiler has a really hard
| time guessing what it does ("slowest" for pure Python only;
| it's still often more than fast enough).
|
| Through then the "cognitive"/"risk" overhead of large
| complicated C code bases in typical company use cases (*1)
| makes it so that you have to be very strict/careful about
| doing any optimizations in C at all. In which case ironically
| your perf. can easily be below that of e.g. go, Rust, C#,
| Java etc. (depending on use case). I.e. in the typical code
| base the additional optimizations the compiler can do due to
| better understanding as well as less risky but limited/simple
| ad-hoc human optimizations beat out C quite often.
|
| In a certain way it's the same story as back then with ASM,
| in theory in some use-cases it's faster but in practice for a
| lot of real world code with real world constraints of dev-
| hours and dev-expertise writing C was the better business
| choice.
|
| (1) I.e. hardly any resources for optimization for most code.
| Potentially, in general, too few devs for the
| tasks/deadlines. Definitely no time to chase UB bugs.
| o11c wrote:
| It's certainly true that there's room for semantic
| optimization, but my observation is that such optimization is
| largely around memory allocation.
|
| And AFAIK the only languages which tend to implement such
| memory optimizations are Java-like, and the only reason they
| bother is because of their up-front aggressive pessimization
| ... which the optimization can't make up for.
|
| Edit: my point is: yes C sucks, but everybody else is worse
| dzaima wrote:
| On the Python example, the downside is that, even though the
| order is unspecified, people may still rely on some properties,
| and have their code break when an optimizer changes the order.
| Basically the same as UB really, though potentially resulting
| in wrong results, not necessarily safety issues (at least not
| immediately; but wrong results can turn into safety issues
| later on). And, unlike with UB, having a "sanitizer" that
| verifies that your code works on all possible set orders is
| basically impossible.
|
| gcc/clang do have a variety of things for providing low-level
| hints to the compiler that are frequently absent in other
| languages - __builtin_expect/__builtin_unpredictable,
| __builtin_unreachable/__builtin_assume, "#pragma clang loop
| vectorize(assume_safety)"/"#pragma GCC ivdep", more pragmas for
| disabling loop unrolling/vectorizing or choosing specific
| values. Biggest thing imo missing being some "optimization
| fences" to explicitly disallow the compiler to reason about a
| value from its source (__asm__ can, to an extent, do this, but
| has undesired side-effects, and needs platform-specific
| register kind names).
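|
| A quick sketch of what those hints look like in source (the
| empty __asm__ is the usual hand-rolled value "fence"; names
| are just for illustration):
|
|     int process(const int *p, int n)
|     {
|         if (__builtin_expect(p == 0, 0))   /* "unlikely" hint */
|             return -1;
|         if (n < 0)
|             __builtin_unreachable();       /* promise to the optimizer */
|
|         int x = p[0];
|         __asm__("" : "+r"(x));             /* value is now opaque
|                                               to the optimizer */
|         return x + n;
|     }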
|
| There's certainly potential in higher-level intent-based
| optimization though. Things coming to mind being reserving
| space in an arraylist before a loop with n pushes, merging
| hashmap lookups in code doing contains-get-put with the same
| key, simplifying away objects/allocations from ability to
| locally reason about global allocation behavior.
| saagarjha wrote:
| I was already rolling my eyes but then I saw the unironic link to
| "The Death of Optimizing Compilers" and they might as well have
| fallen out of my head. Someone please explain to the crypto people
| that designing a general-purpose language around side-channel
| resistance is actually stupid since most people don't need it,
| optimizations actually do help quite a lot (...if they didn't,
| you wouldn't be begging for them: -O0 exists), and the model of
| UB C(++) has is not going away. If you want to make your own
| dedicated cryptography compiler that does all this stuff I
| honestly think you should and I would support such a thing but
| when you think the whole world is conspiring against your
| attempts to write perfect code maybe it's you.
| JonChesterfield wrote:
| The crypto people really want to write slow code. That's what
| constant time means - your best case is as slow as your worst
| case. No one else wants that, so there's direct tension when they
| also want to work in some dialect of a general purpose
| language.
| johnfn wrote:
| > The bugs admitted in the compiler changelogs are just the tip
| of the iceberg. Whenever possible, compiler writers refuse to
| take responsibility for the bugs they introduced, even though the
| compiled code worked fine before the "optimizations".
|
| This makes it difficult to read the rest of the article. Really?
| All compiler authors, as a blanket statement, act in bad faith?
| Whenever possible?
|
| > As a cryptographic example, benchmarks across many CPUs show
| that the avx2 implementation of kyber768 is about 4 times faster
| than portable code compiled with an "optimizing" compiler.
|
| What? This is an apples to oranges comparison. Compilers optimize
| all code they parse; optimizing a single algorithm will of course
| speed up implementations of that specific algorithm, but what
| about the 99.9999999% of code which is _not_ your particular
| hand-optimized algorithm?
| marcus0x62 wrote:
| > This makes it difficult to read the rest of the article.
| Really? All compiler authors, as a blanket statement, act in
| bad faith? Whenever possible?
|
| When I saw the link was to DJB's site, I figured the post would
| contain a vitriolic and hyperbolic rant. It's pretty on-brand
| for him (although, to be fair, he's usually right.)
| josephcsible wrote:
| > (although, to be fair, he's usually right.)
|
| This is worth emphasizing. I actually can't think of any
| articles of his other than this one that miss the mark.
| jonhohle wrote:
| I'm not sure this one is wrong, especially if you've been
| bitten by underdocumented compiler or framework changes
| that modify behavior of previously working code.
|
| For example, I have a small utility app built against
| SwiftUI for macOS 13. Compiling on macOS 14 while still
| linking against frameworks for 13 results in broken UI
| interaction in a particular critical use case. This was a
| deliberate change made to migrate devs away from a
| particular API, but it fails silently at compile time and
| runtime. Moving the code back to a macOS 13 machine would
| produce the correct result.
|
| As a dev, I can no longer trust that linking against a
| specific library version will produce the same result, and
| now need to think in terms of a tuple of compile host and
| library version.
|
| At what point should working code be considered correct and
| complete when compiler writers change code generation that
| doesn't depend on UB? I'm sure it's worse for JITed
| languages where constant time operations work in test and
| for the first few hundred iterations and then are
| "optimized" into variable time branching instructions on a
| production host somewhere.
| saagarjha wrote:
| No, he's wrong. What you're talking about is completely
| different from what he is describing: your code doesn't work
| because Apple changed the behavior of their frameworks,
| which has nothing to do with what compiler you're using.
| There's a different contract there than what a C compiler
| gives you.
| jonhohle wrote:
| It's not quite that simple in Swift-land. There is a
| significant overlap between what's compiler and what's
| runtime. I'm supposedly linking against the same
| libraries, but changing the host platform changes the
| output binaries. Same code, same linked libraries,
| different host platform, different codegen.
|
| Mine isn't security critical, but the result is similarly
| unexpected.
| tptacek wrote:
| The "debunking NIST's calculation" one was, if I'm
| remembering this right, refuted by Chris Peikert on the PQC
| mailing list immediately after it was posted.
| UncleMeat wrote:
| I don't think DJB is right here, but I do think he is one of
| the few "ugh compilers taking advantage of UB" people who is
| actually serious about it. DJB wants absolute certainty in
| predicting the compiled code so that he can make
| significantly stronger guarantees about his programs than
| almost anybody else needs.
|
| The bad news for him is that the bulk of clang users aren't
| writing core cryptographic primitives and really DJB just
| needs a different language and compiler stack for his
| specific goals.
| g-b-r wrote:
| The bad news for us is that plenty of cryptographic (or
| otherwise critical) code is _already_ written in C or C++,
| and when compiler writers play with their optimizations,
| they cause real-world problems to a good portion of the
| population
| dathinab wrote:
| > [..] whenever possible, compiler writers refuse to take
| responsibility for the bugs they introduced
|
| I have seldom seen someone discredit their expertise that fast
| in a blog post. (Especially if you follow the link and realize
| it's just basic, fundamental C stuff about UB not meaning it
| produces an "arbitrary" value.)
| dataflow wrote:
| No, I think you're just speaking past each other here. You're
| using "bug" in reference to the source code. They're using
| "bug" in reference to the generated program. With UB it's often
| the case that the source code is buggy but the generated
| program is still correct. Later the compiler authors introduce
| a new optimization that generates a buggy program based on UB
| in the source code, and the finger-pointing starts.
|
| Edit: What nobody likes to admit is that _all_ sides share
| responsibility to the users here, and that is _hard_ to deal
| with. People just want a single entity to offload the
| responsibility to, but reality doesn't care. To give an
| extreme analogy to get the point across: if your battery caught
| fire just because your CRUD app dereferenced NULL, nobody
| (well, nobody sane) would point the finger at the app author
| for forgetting to check for NULL. The compiler, OS, and
| hardware vendors would be held accountable for their
| irresponsibly-designed products, "undefined behavior" in the
| standard be damned. Everyone in the supply chain shares a
| responsibility to anticipate how their products can be misused
| and handle them in a reasonable manner. The apportionment of
| the responsibility depends on the situation and isn't something
| you can just determine by just asking "was this UB in the ISO
| standard?"
| RandomThoughts3 wrote:
| > if your battery caught fire just because your CRUD app
| dereferenced NULL, nobody (well, nobody sane) would point the
| finger at the app author for forgetting to check for NULL.
|
| I think pretty much anyone sane would and would be right to
| do so. Incorrect code is, well, incorrect and safety critical
| code shouldn't use UB. Plus, it's your duty as a software
| producer to use an appropriate toolchain and validate the
| application produced. You can't offload the responsibility of
| your failure to do so to a third party (that doesn't stop
| people from trying all the time, with either their toolchains
| or a library they use, but it shouldn't be tolerated and
| should be called out as the failure to properly test and
| validate that it is).
|
| I would be ashamed if fingers were pointed towards a compiler
| provider there, unless said provider certified that its
| compiler wouldn't do that and somehow lied (but even then,
| still a testing failure on the software producer's part).
| dataflow wrote:
| > I think pretty much anyone sane would and would be right
| to do so. Incorrect code is, well, incorrect and safety
| critical code shouldn't use UB
|
| You missed the whole point of the example. I gave CRUD app
| as an example for a reason. We weren't talking safety-
| critical code like battery firmware here.
| RandomThoughts3 wrote:
| Because your example isn't credible. But even then I
| don't think I missed the point, no. You are responsible
| for what your application does (be it a CRUD app or
| anything else). If it causes damage because you failed to
| test properly, it is _your_ responsibility. The fact that
| so many programmers fail to grasp this - which is taken as
| self-evident in pretty much any other domain - is why the
| current quality of the average piece of software is so
| low.
|
| Anyway, I would like to know by which magic you think a
| CRUD app could burn a battery? There is a whole stack of
| systems to prevent that from ever happening.
| pritambaral wrote:
| > There is a whole stack of systems to prevent that from
| ever happening.
|
| You've almost got the point your parent is trying to
| make. That the supply chain shares this responsibility,
| as they said.
|
| > I would like to know by which magic you think a CRUD
| app could burn a battery?
|
| I don't know about batteries, but there was a time when
| Dell refused to honour their warranty on their Inspiron
| series laptops if they found VLC to be installed. Their
| (utterly stupid) reasoning? That VLC allows the user to
| raise the (software) volume higher than 100%. It was
| their own damn fault for using poor quality speakers and
| not limiting allowable current through them in their
| (software or hardware) drivers.
| RandomThoughts3 wrote:
| > You've almost got the point your parent is trying to
| make. That the supply chain shares this responsibility,
| as they said.
|
| Deeply disagree. Failsafe doesn't magically remove your
| responsibility.
|
| I'm so glad I started my career in a safety critical
| environment with other engineers working on the non
| software part. The number of software people who think
| they can somehow absolve themselves of all responsibility
| for shipping garbage still shocks me after 15 years in the
| field.
|
| > It was their own damn fault for using poor quality
| speakers
|
| Yes, exactly, I'm glad to see we actually agree. It's
| Dell's fault - not the speaker manufacturer's fault, not
| the subcontractor who designed the sound part's fault -
| Dell's fault because they are the one who actually
| shipped the final product.
| dathinab wrote:
| > just speaking past each other here
|
| no I'm not
|
| if your program has UB it's broken, and it doesn't matter if
| it currently happens to work correctly under a specific
| compiler version; it's also fully your fault
|
| sure there is shared responsibility through the stack, but
| _one of the most important aspects when you have something
| like a supply chain is to know who supplies what, under which
| guarantees, taking which responsibilities_
|
| and for C/C++ it's clearly communicated that it's solely your
| responsibility to avoid UB (in the same way that for
| batteries it's the battery vendor's responsibility to
| produce batteries which can't randomly catch on fire, the
| firmware vendor's responsibility to use the battery
| driver/charging circuit correctly, and your OS's
| responsibility to ensure that a random program faulting
| can't affect the firmware, etc.)
|
| > be misused and handle them in a reasonable manner
|
| For things provided B2B, that's in general only the case in
| contexts involving end users, likely accidents, and similar.
|
| Instead it's the responsibility of the supplier to be clear
| about what can and cannot be done with the product, and if
| you do something outside of the spec it's your responsibility
| to continuously make sure it's safe (or, in general, to ask
| the supplier for clarifying guarantees wrt. your usage).
|
| E.g. if you buy capacitors rated for up to 50C environmental
| temperature that happen to work up to 80C, then you still
| can't use them at 80C, because there is 0% guarantee that
| even other capacitors from the same batch will also work at
| 80C. In the same way, compilers are only "rated"(1) to behave
| as expected for programs without UB.
|
| If you find it unacceptable because it's too easy to end up
| with accidental UB, then you should do what anyone in a
| supply chain with a too-risky-to-use component would do:
|
| Replace it with something less risky to use.
|
| There is a reason the ONCD urged developers to stop using
| C/C++ and similar where viable, because that is pretty much
| just following standard supply chain management best-
| practice.
|
| (1: just for the sake of wording. Though there are
| certified, i.e. ~rated, compiler revisions.)
| HippoBaro wrote:
| I think the author knows very well what UB is and means. But
| he's thinking critically about the whole system.
|
| UB is meant to add value. It's possible to write a language
| without it, so why do we have any UB at all? We do because of
| portability and because it gives flexibility to compilers
| writers.
|
| The post is all about whether this flexibility is worth it when
| compared with the difficulty of writing programs without UB.
|
| The author makes the case that (1) there seems to be more money
| lost on bugs than money saved on faster code, and (2)
| there's an unwillingness to do something about it because
| compiler writers have a lot of weight when it comes to what
| goes into language standards.
| layer8 wrote:
| The issue is that you'd have to come up with and agree on an
| alternative language specification without (or with less) UB.
| Having the compiler implementation be the specification is
| not a solution. And such a newly agreed specification would
| invariably either turn some previously conforming programs
| nonconforming, or reduce performance in relevant scenarios,
| or both.
|
| That's not to say that it wouldn't be worth it, but given the
| multitude of compiler implementations and vendors, and the
| huge amount of existing code, it's a difficult proposition.
|
| What traditionally has been done, is either to define some
| "safe" subset of C verified by linters, or since you probably
| want to break some compatibility anyway, design a separate
| new language.
| twoodfin wrote:
| Even stipulating that part of the argument, the author then
| goes on a tear about optimizations breaking constant-time
| evaluation, which doesn't have anything to do with UB.
|
| The real argument seems to be that C compilers had it right
| when they really did embody C as portable assembly, and
| everything that's made that mapping less predictable has been
| a regression.
| dathinab wrote:
| But C never has been portable assembly.
|
| Which I think is somewhat the core of the problem: people
| treating things in C in ways they just are not. Whether
| that is "C is portable assembly" or the "it's just bits in
| memory" view of things (which is often doubly wrong,
| ignoring stuff like hardware caching). Or stuff like
| writing constant-time code based on assuming that the
| compiler probably, hopefully, can't figure out that it can
| optimize something.
|
| > The real argument seems to be that C compilers had it
| right when they really did embody C as portable assembly
|
| But why would you use such a C? Such a C would be slow
| compared to its competition while still prone to
| problematic bugs. At the same time people often seem to
| forget that part of UB is rooted in different hardware
| doing different things, including having behavior in some
| cases which isn't just a register/mem address having an
| "arbitrary value" but is more similar to C UB (like e.g.
| when it involves CPU caches).
| mpweiher wrote:
| > But C never had been portable assembly.
|
| The ANSI C standards committee disagrees with you.
|
| "Committee did not want to force programmers into writing
| portably, to preclude the use of C as a "high-level
| assembler:"
|
| https://www.open-std.org/JTC1/SC22/WG14/www/docs/n897.pdf
|
| p 2, line 39. (p10 of the PDF)
|
| "C code can be portable. "
|
| line 30
| pjmlp wrote:
| Back in 1989, when C abstract machine semantics were
| closer to being a portable macro processor, and stuff
| like the _register_ keyword was actually something
| compilers cared about.
| SAI_Peregrinus wrote:
| And _even then_ there was no notion of constant-time
| being observable behavior to the compiler. You cannot
| write reliably constant-time code in C because execution
| time is not a property the C language includes in its
| model of computation.
| mpweiher wrote:
| But having a straightforward/predictable mapping to the
| underlying machine and _its_ semantics is included in the
| C model of computation.
|
| And that is actually not just compatible with the C
| "model of computation" being otherwise quite incomplete,
| these two properties are really just two sides of the
| same coin.
|
| The whole idea of an "abstract C machine" that
| unambiguously and completely specifies behavior is a
| fiction.
| zajio1am wrote:
| > UB is meant to add value. It's possible to write a language
| without it, so why do we have any UB at all? We do because of
| portability and because it gives flexibility to compilers
| writers.
|
| Implementation-defined behavior is here for portability for
| valid code. Undefined behavior is here so that compilers have
| leeway with handling invalid conditions (like null pointer
| dereference, out-of-bounds access, integer overflows,
| division by zero ...).
|
| What does it mean for a language to not have UB? There
| are several ways to handle invalid conditions:
|
| 1) eliminate them at compile time - this is optimal, but
| currently practical just for some classes of errors.
|
| 2) have consistent, well-defined behavior for them -
| platforms may have vastly different way how to handle invalid
| conditions
|
| 3) have consistent, implementation-defined behavior for them
| - usable for some classes of errors (integer overflow,
| division by zero), but for others it would add extensive
| runtime overhead.
|
| 4) have inconsistent behavior (UB) - C way
| leni536 wrote:
| C and C++ are unsuitable for writing algorithms with constant-
| time guarantees. The standards have little to no notion of real
| time, and compilers don't offer additional guarantees as
| extensions.
|
| But blaming the compiler devs for this is just misguided.
| quietbritishjim wrote:
| That was my thought reading this article. If you want to
| produce machine code that performs operations in constant time
| regardless of the branch taken, you need to use a language that
| supports expressing that, which C does not.
| actionfromafar wrote:
| Heck, _CPUs_ themselves aren't suitable for constant time
| operations. At any time, some new CPU can be released which
| changes how quick some operations are.
| quietbritishjim wrote:
| Or microcode updates to existing CPUs!
| quuxplusone wrote:
| The author's Clang patch is interesting, but I wonder if what he
| really wants is, like, a new optimization level "-Obranchless"
| which is like O2/O3 but disables all optimizations which might
| introduce new conditional branches. Presumably optimizations that
| _remove_ branches are fine; it's just that you don't want any
| deliberately branchless subexpression being replaced with a
| branch.
|
| Basically like today's "-Og/-Odebug" or "-fno-omit-frame-
| pointers" but for this specific niche.
|
| I'd be interested to see a post comparing the performance and
| vulnerability of the mentioned crypto code with and without this
| (hypothetical) -Obranchless.
| quuxplusone wrote:
| ... except that even my idea fails to help with software math.
| If the programmer writes `uint128 a, b; ... a /= b` under
| -Obranchless, does that mean they don't want us calling a
| C++-runtime software division routine (__udiv3 or however it's
| spelled) that might contain branches? And if so, then what on
| earth do we do instead? -- well, just give an error at compile
| time, I guess.
| detaro wrote:
| Yes, a compile failure would IMHO be the only useful result
| in that case.
| nolist_policy wrote:
| Not branchless, they just need it to be constant-time. That
| is definitely doable with pure software division.
| o11c wrote:
| Complains about branching, but doesn't even mention
| `__builtin_expect_with_probability`.
| Retr0id wrote:
| Why is that relevant?
| dchest wrote:
| There's no point in mentioning something that doesn't solve the
| issue.
| o11c wrote:
| It's as close an answer as you're going to get while using a
| language that's unsuitable for the issue.
|
| And in practice it's pretty reliable at generating `cmov`s
| ...
| zokier wrote:
| It's free software; they are completely free to fork it and make
| it have whatever semantics they want if they don't like the ISO C
| semantics. They can't really expect someone else to do that for
| them for free, and this sort of post is not exactly the sort of
| thing that would get any of the compiler people to come to djb's
| side.
| Retr0id wrote:
| What I'd really like is a way to express code in a medium/high
| level language, and provide hand-optimized assembly code
| alongside it (for as many target architectures as you need). For
| a first-pass, you could machine-generate that assembly, and then
| manually verify that it's constant time (for example) and perform
| additional optimizations over the top of that, by hand.
|
| The "compiler"'s job would then be to assert that the behaviour
| of the source matches the behaviour of the provided assembly.
| (This is probably a hard/impossible problem to solve in the
| general case, but I think it'd be solvable in enough cases to be
| useful)
|
| To me this would offer the best of both worlds - readable,
| auditable source code, alongside high-performance assembly that
| you know won't randomly break in a future compiler update.
| amluto wrote:
| It's worth noting that, on Intel CPUs, neither clang nor anything
| else can possibly generate correct code, because correct code
| does not exist in user mode.
|
| https://www.intel.com/content/www/us/en/developer/articles/t...
|
| Look at DOITM in that document -- it is simply impossible for a
| userspace crypto library to set the required bit.
| IAmLiterallyAB wrote:
| Couldn't you syscall into the kernel to set the flag, then
| return back into usermode with it set?
| amluto wrote:
| So your compiler is supposed to emit a pair of syscalls around
| each function that does integer math? Never mind that a pair of
| syscalls that do WRMSR may well take longer than whatever
| crypto operation is between them.
|
| I have absolutely nothing good to say about Intel's design
| here.
| pvillano wrote:
| What's the alternative?
| amluto wrote:
| An instruction prefix that makes instructions constant
| time. A code segment bit (ugly but would work). Different
| instructions. Making constant time the default. A control
| register that's a _user_ register.
| wolf550e wrote:
| > It would be interesting to study what percentage of security
| failures can be partly or entirely attributed to compiler
| "optimizations".
|
| I bet it's roughly none.
| g-b-r wrote:
| Oh yeah, because no security failure was ever related to
| undefined behavior
| JonChesterfield wrote:
| Deleting null pointer checks in the Linux kernel is the first
| one to come to mind
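|
| (The shape of that bug, roughly; the struct is a stand-in:
| the early dereference lets the compiler assume the pointer is
| non-null and drop the later check.)
|
|     struct tun_struct { int sk; };
|
|     int poll_like(struct tun_struct *tun)
|     {
|         int sk = tun->sk;        /* UB if tun == NULL ... */
|         if (tun == NULL)         /* ...so this test may be
|                                     optimized away */
|             return -1;
|         return sk;
|     }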
| wolf550e wrote:
| That's one CVE, right? How many other vulnerabilities were
| caused by compiler optimizations, whether they were bugs in
| the compiler or allowed by the spec?
| tomcam wrote:
| UB means undefined behavior
|
| Somehow it took me long minutes to infer this.
| AndyKelley wrote:
| If you don't like C's semantics then how about using a different
| programming language instead of getting angry at compiler
| engineers.
| nrr wrote:
| I'm honestly unsure whether djb would actually find anything
| other than his qhasm tolerable (yes, even Zig). I find this
| particular commentary from him unsurprising.
| krackers wrote:
| Refreshing post that conveys a perspective I haven't seen voiced
| often. See also: https://gavinhoward.com/2023/08/the-scourge-
| of-00ub/
| quohort wrote:
| Very interesting article and much-needed criticism of the current
| standard of heuristic optimization.
|
| Before reading this, I thought that a simple compiler could never
| usefully compete against optimizing compilers (which require more
| manpower to produce), but perhaps there is a niche use-case for a
| compiler with better facilities for manual optimization. This
| article has inspired me to make a simple compiler myself.
| krackers wrote:
| You don't need to get rid of all optimizations though, just the
| "unsafe" ones. And you could always make them opt-in instead of
| opt-out.
|
| Now I'm definitely closer to a noob, but compilers already have
| flags like no-strict-overflow and no-delete-null-pointer-
| checks. I don't see why we can't make these the default
| options. It's already "undefined behavior" per the spec, so why
| not make it do something sensible. The only danger is that some
| pedant comes along and says that with these assumptions what
| you're now writing isn't "portable C" and relies on compiler-
| defined behavior, but in the real world if it does the correct
| thing I don't think anyone would care: just call your dialect
| "boringC" instead of C99 or something (borrowing Gavin Howard's
| term), and the issue disappears.
| quohort wrote:
| > And you could always make them opt-in instead of opt-out.
|
| > The only danger is that some pedant comes along and says
| that with these assumptions what you're now writing isn't
| "portable C" and relies on compiler-defined behavior, but in
| the real world if it does the correct thing I don't think
| anyone would care: just call your dialect "boringC" instead
| of C99 or something (borrowing Gavin Howard's term), and the
| issue disappears.
|
| My idea is to make a new language with some simple syntax
| like S-expressions. Compilation would be (almost entirely)
| done with lisp-like macros, but unlike lisp it would be an
| imperative language rather than a functional language. The
| main data structure would have to be a hierarchy (analogous
| to CONS) to facilitate these macros.
|
| Optimizations (and Specializations) would be opt-in and would
| depend on the intrinsics and macros you allow in compilation.
| For example, you could start writing code with this default
| data structure, and later swap it out for some more specific
| data structure like a linked list or a hashtable. The most
| daunting problem is the issue of how the compiler selects
| what optimization or specializations to use; Optimizing for
| something like code size is straightforward, but optimizing
| for code speed will depend on what branches are taken at
| runtime. Now I suppose that the language should simply allow
| the programmer to manually express their preferences (which
| could be discovered through benchmarks/code studies).
|
| I think that this could have a niche for manually-optimized
| code that requires strict static analysis and straight-
| forward compilation. It also could have a niche in
| decompilation/recompilation/reverse-engineering (I think that
| a similar process can run in reverse to disassemble even
| obfuscated code, because you could easily write a macro to
| reverse an ad-hoc obfuscation mechanism).
|
| Here is another application of the language: By controlling
| the macros and intrinsics available at compilation, you could
| ensure compile-time security of userspace programs. For
| example, you could have a setup such that speculative
| execution vulnerabilities and the like are impossible to
| compile. I think you could safely enforce cooperative
| multitasking between programs.
|
| I'll probably start with a simple assembly language like
| WASM, then LLVM-IR. Eventually it would have JS/C/Rust
| bindings to interoperate with normal libraries.
|
| Lastly, I would like to make it so you can write routines
| that are portable between CPUs, GPUs, and even FPGAs, but
| this would be very difficult and this functionality may be
| better realized with a functional language (e.g. CLASP
| https://github.com/clasp-developers/clasp) or may require
| programmers to work at an uncomfortably high level of
| abstraction.
| gumby wrote:
| I like Bernstein but sometimes he flies off the handle in the
| wrong direction. This is a good example, which he even half-
| heartedly acknowledges at the end!
|
| A big chunk of the essay is about a side point -- how good the
| gains of optimization might be, which, even with data, would be a
| use-case dependent decision.
|
| But the bulk of his complaint is that C compilers fail to take
| into account semantics _that cannot be expressed in the
| language_. Wow, shocker!
|
| At the very end he says "use a language which can express the
| needed semantics". The entire essay could have been replaced with
| that sentence.
| quohort wrote:
| > A big chunk of the essay is about a side point -- how good
| the gains of optimization might be, which, even with data,
| would be a use-case dependent decision.
|
| I think this was useful context, and it was eye-opening to me.
| bhk wrote:
| There's an important point to be made here: those who define
| the semantics of C and C++ shovel an unreasonable amount of
| behavior into the bucket of "undefined behavior". Much of this
| has dubious justifications, while making it more difficult to
| write correct programs.
| duped wrote:
| To be pedantic, I think you're speaking about unspecified
| behavior and implementation defined behavior. Undefined
| behavior specifically refers to things that have no
| meaningful semantics, so the compiler assumes it never
| happens.
|
| Unspecified behavior is anything outside the scope of
| observable behavior for which there are two or more ways the
| implementation can choose.
|
| Since the timing of instructions on machines with speculative
| execution is not observable behavior in C, anything that
| impacts it is unspecified.
|
| There's really no way around this, and I disagree that
| there's an "unreasonable" amount of it. Truly the problem is
| up to the judgement of the compiler developers what choice to
| make and for users to pick implementations based on those
| choices, or work around them as needed.
| bhk wrote:
| I am referring to _undefined_ behavior.
|
| For example, consider the case integer overflow when adding
| two signed numbers. C considers this undefined behavior,
| making the program's behavior undefined. _All_ bets are
| off, even if the program never makes use of the resulting
| value. C compilers are allowed to assume the overflow can
| never happen, which in some cases allows them to infer that
| numbers must fit within certain bounds, which allows them
| to do things like optimize away bounds checks written by
| the programmer.
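|
| A small example of the kind of guard that disappears
| (sketch):
|
|     /* intended overflow check; since signed overflow is UB,
|      * the compiler may assume x + 100 never wraps, fold the
|      * comparison to false, and delete the guard entirely */
|     int add_guarded(int x)
|     {
|         if (x + 100 < x)
|             return -1;
|         return x + 100;
|     }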
|
| A more reasonable language design choice would be to treat
| this as an operation that produces an unspecified integer
| result, or an implementation-defined result.
|
| Edit: The following article helps clear up some common
| confusion about undefined behavior:
|
| https://blog.regehr.org/archives/213
|
| Unfortunately this article, like most on the subject,
| perpetuates the notion that there are significant
| performance benefits to treating simple things like integer
| overflow as UB. E.g.: "I've heard that certain tight loops
| speed up by 30%-50% ..." Where that is true, the compiler
| could still emit the optimized form of the loop without UB-
| based inference, _but_ it would simply have to be guarded
| by a run-time check (outside of the loop) that would fall
| back to the slower code on the rare occasions when the
| assumptions do not hold.
___________________________________________________________________
(page generated 2024-08-03 23:00 UTC)