hngopher.com

       [HN Gopher] GCC always assumes aligned pointer accesses (2020)
       ___________________________________________________________________
        
       GCC always assumes aligned pointer accesses (2020)
        
       Author : luu
       Score  : 102 points
       Date   : 2023-08-20 00:59 UTC (22 hours ago)
        
 (HTM) web link (trust-in-soft.com)
        
       | zajio1am wrote:
       | > The C standards, having to accommodate both target
       | architectures where misaligned accesses worked and target
       | architectures where these violently interrupted the program,
       | applied their universal solution: they classified misaligned
       | access as an undefined behavior.
       | 
       | No. When the C standard wants to accommodate different target
       | architectures, it uses implementation-defined behavior.
       | Undefined behavior is just a polite way to say that the code is
       | buggy.
       | 
       | The C standard simply requires natural alignment, even on
       | architectures that allow unaligned accesses.
        
       | volta87 wrote:
       | The arguments in this blogpost are fundamentally flawed. The fact
       | that they opened a bug based on them but got shut down should
       | have raised all red flags.
       | 
       | When compiling and running a C program, the only thing that
       | matters is "what the C abstract machine does". Programs that
       | exhibit UB in the abstract machine are allowed to do "anything".
       | 
       | Trying to scope that down with arguments of the form "but what
       | the hardware does is X" is fundamentally flawed, because
       | anything means anything, and what the hardware does doesn't
       | change that, and therefore it doesn't matter.
       | 
       | This blogpost "What The Hardware Does is not What Your Program
       | Does" explains this in more detail and with more examples.
       | 
       | https://www.ralfj.de/blog/2019/07/14/uninit.html
        
         | kmeisthax wrote:
         | Great, except no implementation of the C abstract machine
         | actually exists. So you can't test against it. All you have are
         | compilers that use it to justify miscompiling your code.
         | 
         | We need a C interpreter that intentionally implements C machine
         | features that don't correspond to any architectural feature -
         | i.e. pointers are (allocation provenance, offset) pairs,
         | integer overflow panics, every pointer construction is checked,
         | etc. If only to point out how hilariously absurd the ISO C UB
         | rules are and how nobody actually follows them.
         | 
         | My personal opinion is that "undefined behavior" was a spec
         | writing mistake that has been rules-lawyered into absurdity.
         | For example, signed integer overflow being UB was intended to
         | allow compiling C to non-two's-complement machines. This was
         | interpreted to allow inventing _new_ misbehaviors for integer
         | overflow instead of "do whatever the target architecture
         | does."
        
           | dorianh wrote:
           | The blog post company does sell a C interpreter that checks
           | for all undefined behaviors (with provenance and offset).
        
           | codedokode wrote:
           | > For example, signed integer overflow being UB was intended
           | to allow compiling C to non-two's-complement machines.
           | 
           | This is indeed a design mistake, but in another sense.
           | Ordinary arithmetic ops like + or - should throw an exception
           | on overflow (with both signed and unsigned operands) because
           | most of the time you need ordinary math, not math modulo
           | 2^32. For those rare cases where wraparound is desired,
           | there should be a function like add_and_wrap() or a special
           | operator.
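
A minimal sketch of the distinction drawn in the comment above, in C: one helper that reports overflow instead of wrapping, and one that wraps on purpose. The names add_checked and add_and_wrap are hypothetical (the second is borrowed from the comment). __builtin_add_overflow is a GCC/Clang builtin rather than ISO C, so this assumes one of those compilers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores a + b in *out and returns true, or returns
   false (leaving *out untouched) when the sum does not fit in an int. */
static bool add_checked(int a, int b, int *out)
{
    int sum;

    assert(out != NULL);
    /* GCC/Clang builtin: returns true when the mathematically correct
       result does not fit in the destination type. */
    if (__builtin_add_overflow(a, b, &sum))
        return false;
    *out = sum;
    return true;
}

/* Hypothetical helper: explicit wraparound. Unsigned arithmetic is
   defined by the standard to wrap, so this is never undefined. */
static uint32_t add_and_wrap(uint32_t a, uint32_t b)
{
    return a + b;
}
```

A caller that wants "ordinary math" checks the return value of add_checked; a caller that really wants modulo 2^32 arithmetic says so by calling add_and_wrap.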
        
           | plorkyeran wrote:
           | UBSan covers each of those except provenance checking, and
           | ASan mostly catches provenance problems even though that's
           | not directly the goal. There are some dumb forms of UB not
           | caught by any of the sanitizers, but most of them are.
           | 
           | Making your program UBSan-clean is the bare minimum you
           | should do if you're writing C or C++ in 2023, not an absurd
           | goal. I know it'll never happen, but I'm increasingly of the
           | opinion that UBSan should be enabled by default.
        
         | captainmuon wrote:
         | I think it is a good blog post, because it highlights an issue
         | that I was not aware of and that I think many programmers are
         | not. I do think I am a decent C programmer, and I spotted the
         | strict aliasing issue immediately, but I didn't know that
         | unaligned pointer access is UB. Because let's face it, the
         | majority of programmers didn't read the standard, and those who
         | did don't remember all facets.
         | 
         | I first learned many years ago that you _should_ pick apart
         | binary data by casting structs, using pointers to the middle of
         | fields and so on. It was ubiquitous for both speed and
         | convenience. I don't know if it was legal even in the 90s, but
         | it was general practice - MS Office file formats from that time
         | were just dumped structs. Then at some point I learned about
         | pointer alignment - but it was always framed due to
         | performance, and due to the capabilities of exotic platforms,
         | never as a correctness issue. But it's not just important to
         | learn what to do, but also why to do it, which is why we need
         | more articles highlighting these issues.
         | 
         | (And I have to admit, I am one of these misguided people who
         | would love a flag to turn C into "portable assembler" again.
         | Even if it is 10x slower, and even if I had to add annotations
         | to every damn for loop to tell the compiler that I'm not
         | overflowing. There are just cases where understanding what you
         | are actually doing to the hardware trumps performance.)
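
For the "picking apart binary data" case described above, a commonly recommended alternative to casting a pointer into the middle of a buffer is to copy the bytes out. The sketch below assumes a byte buffer holding a 32-bit field at an arbitrary offset; load_u32 and load_u32_le are hypothetical names, not from the blog post or the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a uint32_t stored at any byte offset.
   memcpy has no alignment requirement, so this stays well-defined even
   when buf + offset is not suitably aligned for uint32_t.
   The result is in host byte order. */
static uint32_t load_u32(const unsigned char *buf, size_t offset)
{
    uint32_t value;

    assert(buf != NULL);
    memcpy(&value, buf + offset, sizeof value);
    return value;
}

/* Hypothetical helper: same idea, but assembles a little-endian field
   byte by byte, so the result does not depend on host byte order. */
static uint32_t load_u32_le(const unsigned char *buf, size_t offset)
{
    const unsigned char *p;

    assert(buf != NULL);
    p = buf + offset;
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

On targets that allow unaligned loads, optimizing compilers commonly turn the memcpy into a single load instruction, so the well-defined version need not be slower. UBSan's alignment check (-fsanitize=alignment in GCC and Clang) is meant to flag the cast-based version at run time.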
        
         | hedora wrote:
         | Honestly, I think you are both incorrect.
         | 
         | C has always had a concept of implementation defined behavior,
         | and unaligned memory accesses used to be defined to work
         | correctly on x86.
         | 
         | Intel added instructions that can't handle unaligned access, so
         | they broke that contract. I'd argue that it is an instruction
         | set architecture bug.
         | 
         | Alternatively, Intel could argue that compilers shouldn't emit
         | vector instructions unless they can statically prove the
         | pointer is aligned. That's not feasible in general for
         | languages like C/C++, so that's a pretty weak defense of having
         | the processor pay the overhead of supporting unaligned access
         | on some, but not all, paths.
        
           | bigbillheck wrote:
           | > C has always had a concept of implementation defined
           | behavior,
           | 
           | Surely only after standardization tho?
        
           | armitron wrote:
           | Unaligned memory accesses are undefined behavior in C. If
           | you're writing C, you should be abiding by C rules. "Used to
           | work correctly" is more guesswork and ignorance than "abiding
           | by C rules". In C, playing fast&loose with definitions hurts,
           | BAD.
           | 
           | Frankly, I'd be ashamed to write this blog post since the
           | only thing it accomplishes is exposing its writers as not
           | understanding the very thing they're signaling expertise on.
        
             | iso8859-1 wrote:
             | What makes you think they don't understand it? They
             | acknowledge that it is UB. I read them as realistic, since
             | they know that people rely on C compilers working a certain
             | way. They even wrote an interpreter that detects UB:
             | https://github.com/TrustInSoft/tis-interpreter
             | 
             | I understand why people like the compiler being able to
             | leverage UB. I suspect this philosophy actually makes
             | Trust-In-Soft more money: You could argue that if there was
             | no UB, there would be no need for the tis-interpreter.
             | 
             | So isn't it in fact quite selfless that they encourage the
             | world to optimize a bit less (spending more money on
             | 'compute'), while standing to profit from the unintended
             | behaviour they'd otherwise be contracted to help debug?
        
           | SkiFire13 wrote:
           | > C has always had a concept of implementation defined
           | behavior, and unaligned memory accesses used to be defined to
           | work correctly on x86.
           | 
           | There are a bunch of misconceptions here:
           | 
           | - unaligned loads were never implementation defined, they are
           | undefined;
           | 
           | - even if they were implementation defined, this would give
           | the compiler the choice of how to define them, not the
           | instruction set;
           | 
           | - unaligned memory accesses on x86 for non-vector registers
           | still work fine, so old instructions were not impacted and
           | there's no bug. It's just that the expectations were not
           | fulfilled for the new extension of those instructions.
        
             | chaboud wrote:
             | Note: SIMD on x86 has unaligned instructions that used to
             | be much slower (decoded differently) than their aligned
             | counterparts.
             | 
             | For example, on Pentium 3 and Pentium Core 2, the unaligned
             | instructions took twice as many cycles to execute. On
             | modern x86 family processors, it's the same cycle count
             | either way. The only perf penalty one should account for is
             | crossing of cache lines, generally a much smaller problem.
        
           | JonChesterfield wrote:
           | Loads of architectures can't do misaligned memory access.
           | Even x86 has problems when variables span cache lines. The
           | compiler usually deals with this for the programmer, e.g. by
           | rounding the address down then doing multiple operations and
           | splicing the result together.
        
           | assbuttbuttass wrote:
           | Undefined and implementation defined are different in C. The
           | number of bits in an int is implementation defined. Unaligned
           | access is undefined.
        
         | compiler-guy wrote:
         | If you really are targeting the x86_64 instruction set, you
         | should be writing x86_64 instructions. Then you get exactly
         | what the hardware does and don't get any of those pesky
         | compiler assumptions.
         | 
         | Of course you don't get any of those pleasant optimizations
         | either. But those optimizations are only possible because of
         | the assumptions.
        
         | j16sdiz wrote:
         | That's what the author meant when he said "The shift of the C
         | language from "portable assembly" to "high-level programming
         | language without the safety of high-level programming
         | languages""
         | 
         | Back in the 1980s, C was expected to do what hardware does.
         | There was no "the C abstract machine".
         | 
         | The abstract machine idea was introduced much later.
         | 
         | > The arguments in this blogpost are fundamentally flawed.
         | 
         | The "fundamentally flawed" comment is a revisionist idea.
        
           | wbl wrote:
           | How does C do what hardware does and store things in
           | registers when it can?
        
             | pjmlp wrote:
             | It doesn't, it is up to the compiler and optimizer to
             | decide how to go at it.
             | 
             | Vector instructions, replacing library functions with
             | compiler intrinsics, splitting structs across registers and
             | stack, unrolling loops are all examples absent from the
             | language standard.
        
             | JonChesterfield wrote:
             | Two ways. One is the platform ABI sometimes says specific
             | arguments are passed in specific registers. The second is
             | (essentially) assigning local variables offsets on a
             | machine stack where some offsets are stored in registers.
        
           | bigbillheck wrote:
           | > Back in the 1980s, C was expected to do what hardware does.
           | There was no "the C abstract machine".
           | 
           | There was also a huge variety of compilers that were buggy
           | and incomplete each in their own ways, often with mutually-
           | incompatible extensions, not to mention prone to generating
           | pretty awful code.
        
           | User23 wrote:
           | To the best of my recollection the "abstract machine" is a
           | C++ism that unfortunately crept into C.
        
             | armitron wrote:
             | The "abstract machine" is present in the first C standard,
             | published in 1989.
        
             | zokier wrote:
             | From C89 document:
             | 
             | > 2.1.2.3 Program execution
             | 
             | > The semantic descriptions in this Standard describe the
             | behavior of an abstract machine in which issues of
             | optimization are irrelevant
             | 
             | [...]
             | 
             | > Alternatively, an implementation might perform various
             | optimizations within each translation unit, such that the
             | actual semantics would agree with the abstract semantics
             | only when making function calls across translation unit
             | boundaries. In such an implementation, at the time of each
             | function entry and function return where the calling
             | function and the called function are in different
             | translation units, the values of all externally linked
             | objects and of all objects accessible via pointers therein
             | would agree with the abstract semantics. Furthermore, at
             | the time of each such function entry the values of the
             | parameters of the called function and of all objects
             | accessible via pointers therein would agree with the
             | abstract semantics.
        
           | JonChesterfield wrote:
           | This turns out to be contentious. There are two histories of
           | the C language and which one you get told is true depends on
           | who you ask.
           | 
           | 1/ a way to emit specific assembly with a compiler dealing
           | with register allocation and instruction selection
           | 
           | 2/ an abstract machine specification that permits
           | optimisations and also happens to lower well defined code to
           | some architectures
           | 
           | My working theory is that the language standardisation effort
           | invented the latter. So when people say C was always like
           | this, they mean since ansi c89, and there was no language
           | before that. And when people say C used to be
           | typed/convenient assembly language, they're referring to the
           | language that was called C that existed in reality prior to
           | that standards document.
           | 
           | Members of the WG14 mailing list, some of whom were presumably
           | around at the time, were insistent (in correspondence with me)
           | that C was always like this. A partial counterargument is the
           | semi-infamous message from Dennis Ritchie copied in various
           | places, e.g. https://www.lysator.liu.se/c/dmr-on-noalias.html
           | 
           | An out of context quote from that email to encourage people
           | to read said context and ideally reply here with more
           | information on this historical assessment
           | 
           | "The fundamental problem is that it is not possible to write
           | real programs using the X3J11 definition of C. The committee
           | has created an unreal language that no one can or will
           | actually use."
           | 
           | Regards
        
             | lonjil wrote:
             | > My working theory is that the language standardisation
             | effort invented the latter. So when people say C was always
             | like this, they mean since ansi c89, and there was no
             | language before that. And when people say C used to be
             | typed/convenient assembly language, they're referring to
             | the language that was called C that existed in reality
             | prior to that standards document.
             | 
             | But the committee has always had a lot of C compiler
             | developers in it. The people who wrote the C89 standard
             | were the same people who developed many of the C compilers
             | in use before C89. The people who created the reality prior
             | to C89 created the reality after C89. Any perception of
             | "portable assembly" probably stemmed simply from the fact
             | that optimizers were much less sophisticated.
        
         | eklitzke wrote:
         | The blog post is also kind of unhinged because in the
         | incredibly rare cases where you would want to write code like
         | this you can literally just use the asm keyword.
         | 
         | I think it's also worth considering WHY compilers (and the C
         | standard) make these kinds of assumptions. For starters, not
         | all hardware platforms allow unaligned accesses at all. Even on
         | x86 where it's supported, you want to avoid doing unaligned
         | reads at all costs because they're up to 2x slower than aligned
         | accesses. God forbid you try to use unaligned atomics, because
         | while technically supported by x86 they're 200x slower than
         | using the LOCK prefix with an aligned read.[^1] The fact that
         | you need to go through escape hatches to get the compiler to
         | generate code to do unaligned loads and stores is a good thing,
         | because it helps prevent people from writing code with
         | mysterious slowdowns.
         | 
         | Writing a function that takes two pointers of the same type
         | already has to pessimize loads and stores on the assumption
         | that the pointers could alias. That is to say, if your function
         | takes int *p, int *q then doing a store to p requires reloading
         | q, because p and q could point to the same thing. Thankfully in
         | some situations the compiler can figure out that in a certain
         | context p and q have different addresses and therefore can't
         | alias, this helps the compiler generate faster code (by
         | avoiding redundant loads). If p and q are allowed to alias even
         | when they have different addresses, this would all go out the
         | window and you'd basically need to assume that all pointer
         | types could alias under any situation. This would be TERRIBLE
         | for performance.
         | 
         | [^1]: https://rigtorp.se/split-locks/
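
The aliasing point in the comment above can be sketched in a few lines of C. The function name store_then_sum is hypothetical; the sketch only shows why the second read of *q cannot be replaced by the value read earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Because p and q have the same type, they may point to the same int.
   The compiler therefore has to reload *q after the store through p
   instead of reusing the value it read into `before`. */
static int store_then_sum(int *p, int *q)
{
    int before;

    assert(p != NULL && q != NULL);
    before = *q;
    *p = 10;
    return before + *q;   /* equals 2 * before only if p != q */
}
```

Called with two distinct objects holding 1 and 2, this returns 4; called with the same object (holding 1) for both arguments, it returns 11, which is why the reload is required.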
        
           | vlovich123 wrote:
           | While the sentiment is correct as to why compilers make
           | alignment assumptions, a lot of the details here I think are
           | not quite right.
           | 
           | > For starters, not all hardware platforms allow unaligned
           | accesses at all
           | 
           | If you're dealing with very simple CPUs like the ARM M0,
           | sure. But even the M3/M4 allows unaligned access.
           | 
           | > Even on x86 where it's supported, you want to avoid doing
           | unaligned reads at all costs because they're up to 2x slower
           | than aligned accesses
           | 
           | I believe that information hasn't been true for a long time
           | (since 1995). Unless you're talking about unaligned accesses
           | that also cross a cache line boundary being slower [1]. But I
           | imagine that aligned accesses crossing a cache line boundary
           | are also similarly slower because the slowness is the cache
           | line boundary.
           | 
           | > God forbid you try to use unaligned atomics, because while
           | technically supported by x86 they're 200x slower than using
           | the LOCK prefix with an aligned read
           | 
           | What you're referring to is atomic unaligned access that's
           | also across cache line boundaries. I don't know what it is
           | within a cache line, but I imagine it's not as bad as you
           | make it out to be. Unaligned atomics across cache line
           | boundaries also don't work on ARM and have much spottier
           | support than unaligned access in general.
           | 
           | TLDR: People cargo cult advice about unaligned access but
           | it's more because it's a simpler rule of thumb and there's
           | typically very little benefit to pack things as tightly as
           | possible which is where unaligned accesses generally come up.
           | 
           | [1] https://news.ycombinator.com/item?id=10529947
        
             | macjohnmcc wrote:
             | Yeah even Microsoft's compiler aligns values on appropriate
             | boundaries for performance reasons. DWORDs on DWORD
             | boundaries etc. And if you want to pack the data structure
             | to avoid the gaps in structures there are methods to do so
             | via #pragma options. I think their complaining about what
             | was done for performance reasons shows a great lack of
             | overall understanding. More time researching and less time
             | griping would have served them better.
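
The default padding and the #pragma escape hatch mentioned above can be illustrated as follows. This is a sketch assuming a C11 compiler that accepts #pragma pack(push, 1) (MSVC, GCC and Clang do); the struct names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler inserts padding after `tag` so that
   `value` starts on a boundary suitable for uint32_t. */
struct padded_record {
    uint8_t  tag;
    uint32_t value;
};

/* Packed layout: no padding, so `value` starts at byte offset 1 and is
   misaligned for uint32_t on typical targets. */
#pragma pack(push, 1)
struct packed_record {
    uint8_t  tag;
    uint32_t value;
};
#pragma pack(pop)

static_assert(offsetof(struct padded_record, value) % _Alignof(uint32_t) == 0,
              "default layout keeps value aligned");
static_assert(offsetof(struct packed_record, value) == 1,
              "packed layout removes the gap");
static_assert(sizeof(struct packed_record) == 5,
              "1 + 4 bytes, no padding");
```

Reading r.value through the packed struct is fine, because the compiler knows the member may be misaligned. The hazard is taking its address and passing it around as a plain uint32_t *, which recent GCC and Clang versions warn about with -Waddress-of-packed-member.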
        
             | AshamedCaptain wrote:
             | Your message is more misleading than the GP.
             | 
             | Many architectures sold today still claim unaligned
             | accesses are optional (e.g. all ARM pre-v7, which includes
             | the popular Raspberry Pi Zero). Not to mention that even if
             | they are supported, not all instructions support it (which
             | is the case today on all ARM cores and even on x86).
             | 
             | From the architectures and instructions which may support
             | it, it may have a performance penalty which may range from
             | "somewhat slower" (e.g. Intel still recommends stack
             | alignment, because otherwise many internal store
             | optimizations start giving up) to "ridiculously slower"
             | (e.g. I once had to write a trap handler that software-
             | emulated unaligned accesses on ARM -- on all 32-bit ARMs
             | Linux still does this for all instructions except plain
             | undecorated LDR/STR when the special unaligned ABI is
             | enabled).
             | 
             | And finally, even if the architecture supports it with
             | decent enough performance, it may do it with relaxed
             | atomicity. E.g. even as of today aarch64 makes zero
             | guarantees regarding atomicity of even atomic instructions
             | on unaligned addresses (yes, really). To put it simply
             | because it is a _pain in the ass_ to implement correctly
             | (say programmer does atomic load/store on overlapping
             | addresses with different alignments). This is whether they
             | cross cache lines or not.
             | 
             | i.e. it's as bad as the GP is saying. You can't just put
             | one example of one processor handling each case correctly
             | to dismiss this claim, because the point is that most
             | processors don't bother and those that do bother still have
             | severe crippling limitations that make it infeasible to use
             | in a GP compiler.
             | 
             | And there is still a lot of benefit to packing things up...
             | but it does require way too much care and programmer
             | effort.
        
             | torusle wrote:
             | > If you're dealing with very simple CPUs like the ARM
             | M0, sure. But even the M3/M4 allows unaligned access.
             | 
             | On ARM M3/M4 you have the same issue with LDRD and STRD
             | instructions which do not allow unaligned access. Even the
             | normal load/stores don't allow unaligned access in all
             | cases. Try this in the peripheral memory region for
             | starters. And things get even more complicated when the
             | memory protection unit shakes up things.
        
           | Gibbon1 wrote:
           | > For starters, not all hardware platforms allow unaligned
           | accesses at all.
           | 
           | Yeah, and always and everywhere a mistake. It was a mistake
           | back in the 1970s and it's an increasingly bigger mistake as
           | time goes on. Just like big endian and 'network order'.
        
         | jcranmer wrote:
         | This line in particular really bugs me:
         | 
         | > The present blog post brings bad, and as far as I know,
         | previously undocumented news. Even if you really are targeting
         | an instruction set without any memory access instruction that
         | requires alignment, GCC still applies some sophisticated
         | optimizations that assume aligned pointers.
         | 
         | I could have told you this was true ~20 years ago, and the main
         | reason I'm so conservative in how far back gcc has been doing
         | this is that it's only around that time I started programming--
         | I strongly suspect this dates back to the 90's.
        
           | SAI_Peregrinus wrote:
           | It dates to the first standardization of C in 1989. The "C as
           | portable assembly" view ended when ANSI C got standardized,
           | and K&R's 2nd edition was published.
        
             | circuit10 wrote:
             | For C to be portable this needs to be undefined behaviour
             | because there are CPUs that don't support unaligned access
        
               | astrange wrote:
               | There are some other reasons, but that's one of them.
               | 
               | Another is that you want to guarantee objects are stored
               | aligned in memory because that gives you some free bits
               | in pointers you can hide stuff in. (This has less
               | hardware support than it should.)
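
The "free bits in pointers" idea can be sketched as below. It assumes objects aligned to at least 4 bytes, so the two low address bits are always zero, and it assumes the usual flat mapping between pointers and uintptr_t (that conversion is implementation-defined). All helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Two low bits, free whenever the pointee's alignment is at least 4. */
#define TAG_MASK ((uintptr_t)0x3)

/* Hypothetical helper: hides a small tag in the low bits of an aligned
   pointer. The result must not be dereferenced until the tag is removed. */
static void *tag_pointer(void *ptr, unsigned tag)
{
    uintptr_t bits = (uintptr_t)ptr;

    assert((bits & TAG_MASK) == 0);   /* pointer must be 4-byte aligned */
    assert(tag <= TAG_MASK);
    return (void *)(bits | tag);
}

/* Hypothetical helper: extracts the tag without touching the pointee. */
static unsigned pointer_tag(const void *tagged)
{
    return (unsigned)((uintptr_t)tagged & TAG_MASK);
}

/* Hypothetical helper: recovers the original, dereferenceable pointer. */
static void *strip_tag(void *tagged)
{
    return (void *)((uintptr_t)tagged & ~TAG_MASK);
}
```

The trick only works because alignment is guaranteed; a misaligned object would already have nonzero low bits and trip the first assertion.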
        
               | chaboud wrote:
               | That's why, while much of the linked blog is kind of off
               | the mark (signs of someone knowing less than they think
               | they know), the general conclusion, using aligned
               | pointers is recommended, is one that I typically
               | recommend to developers new to C or C++ anyway.
               | 
               | I'm alright with folks sticking to aligned pointer
               | operations, largely for performance reasons. On some
               | platforms, unaligned operations are really expensive.
        
             | j16sdiz wrote:
             | I would argue it's the modern understanding of the C
             | standard that is flawed.
             | 
             | Back in 89, many of those unspecified behaviors were
             | understood as implementation/hardware dependent, not
             | undefined. Aliasing was the norm, `restrict` was actually a
             | keyword.
             | 
             | Modern C is neither safe nor low-level.
        
               | jcranmer wrote:
               | Ascertaining the state of the mind of the C committee in
               | 1989 is difficult, since only the documents from ~late
               | 1996 are consistently available online (the earlier
               | documents are probably sitting somewhere in a warehouse
               | in Geneva, but they may as well not exist anymore).
               | 
               | But definitely by the time C99 came out, it is clear that
               | optimize-assuming-UB-doesn't-happen was an endorsed
               | viewpoint of the committee [1]. C99 also added restrict
               | to the language (not C89 as you suggest), and restrict
               | was the first standardized feature that was a pure UB-
               | optimization hint [2].
               | 
               | It is important to remember that there isn't just one
               | catch-all category of implementation-varying behavior.
               | There is a difference between unspecified behavior,
               | implementation-defined behavior, and undefined behavior.
               | Undefined behavior has been understood, from its
               | inception, as behavior that doesn't constrain the
               | compiler, and often describes behavior that _can't_ be
               | meaningfully constrained (especially with regards to
               | potentially-trapping operations).
               | 
               | [1] The C99 rationale gives an example of an optimization
               | that compilers can perform that relies on assuming UB
               | can't happen--reassociation of integer addition, on one's
               | complement machines.
               | 
               | [2] The register keyword is I believe even in K&R C and
               | would also be qualified as a compiler hint feature, but I
               | note that it prohibits taking the address of the variable
               | entirely, so it doesn't rely on UB. Whereas restrict has
               | to rely on "if these two variables alias, it's UB" to
               | allow the compiler to optimize assuming nonaliasing.
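
As a small illustration of restrict as a pure optimization hint (C99 or later), consider the sketch below. The function add_scaled is a hypothetical example; the point is only that the qualifier is a promise made by the caller, and breaking that promise is undefined behavior.

```c
#include <assert.h>
#include <stddef.h>

/* With restrict, the caller promises that dst and src do not overlap.
   The compiler may then keep src values in registers or vectorize the
   loop without re-reading src after each store to dst. Passing
   overlapping arrays breaks the promise and is undefined behavior. */
static void add_scaled(int *restrict dst, const int *restrict src,
                       int k, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i];
}
```

Without the qualifiers the function computes the same result for non-overlapping arrays; the compiler just has to be more conservative about reordering loads and stores.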
        
               | Someone wrote:
               | > Back in 89 [...] `restrict` was actually a keyword.
               | 
               | Was it? I thought it's more recent.
               | https://en.wikipedia.org/wiki/Restrict seems to agree (
               | _"In the C programming language, restrict is a keyword,
               | introduced by the C99 standard,[1] that can be used in
               | pointer declarations"_ ), as does
               | https://en.cppreference.com/w/c/language/restrict (
               | _"restrict type qualifier (since C99)"_ )
               | 
               | Was there an older usage?
        
               | rsaxvc wrote:
               | I've used several pre-C99 embedded compilers that
               | supported restrict. IIRC, probably of mid 90s vintage.
        
               | 6D794163636F756 wrote:
               | I haven't gotten to use C in industry, but I was taught
               | that undefined behavior just means that it is defined by
               | the running system and not the compiler. Is that not the
               | general understanding? Maybe I was just taught that way
               | because it was old timers teaching it.
        
               | umanwizard wrote:
               | That's indeed incorrect. Undefined behavior anywhere
               | means that the entirety of your program is undefined and
               | may do anything.
        
               | gdwatson wrote:
               | If the language standard leaves some behavior undefined,
               | other sources (e.g., POSIX, your ABI, your standard
               | library docs, or your compiler docs) are free to define
               | it. If they do, and you are willing to limit your
               | program's portability, you can use that behavior with
               | confidence. But they also leave many behaviors undefined,
               | and you can't rely on those.
               | 
               | For implementation-defined behavior, the language
               | standard lays out a menu of options and your
               | implementation is required to pick one and document it.
               | IMHO, many things in the C standard are undefined that
               | ought to be implementation-defined. But unaligned pointer
               | accesses would be hard to handle that way; at best you
               | could make the compiler explicitly document whether or
               | not it supports them on a given architecture.
        
               | patrakov wrote:
               | What you are talking about is implementation-defined
               | behavior. It exists in the C standard separately from the
               | undefined behavior.
        
               | SAI_Peregrinus wrote:
               | Implementation Defined behavior means the standards
               | authors provided a list of possible behaviors, and
               | compiler authors must pick one and document which they
               | picked.
               | 
               | Unspecified behavior is more what you're thinking of,
               | though in that case the standard still provides a list of
               | possibilities that compiler authors have to pick from,
               | they just don't have to document it or always make the
               | same choice for every program.
               | 
               | There's no allowed subset of behavior where compiler
               | authors are free to pick whatever they want and document
               | it (but must do so). IMO there should be, most "Undefined
               | Behavior" could be specified and documented, even where
               | that choice would be "the compiler assumes such
               | situations are unreachable and optimizes based on that
               | assumption" like much of current UB. At least it'd be
               | explicit!
        
               | dale_glass wrote:
               | No. See this for details on how UB is handled by
               | compilers:
               | 
               | http://blog.llvm.org/2011/05/what-every-c-programmer-should-...
               | 
               | The TL;DR is that compilers compile code based on
               | assumptions that UB won't be invoked. This sometimes
               | produces extremely surprising results which have nothing
               | to do with the hardware/OS.
        
           | noselasd wrote:
           | Indeed, I still have ~20-year-old code that picks up and
           | rectifies unaligned memory so gcc does the right thing. To
           | claim a compiler bugs out on unaligned memory sounds very
           | weird; I assumed that was common knowledge.
        
             | nomel wrote:
             | My first 10 minutes of trying to talk to hardware, and then
             | googling the error message, taught me.
        
         | tom_ wrote:
         | And one of the anythings permitted would be to behave in a
         | documented manner characteristic of the target environment. The
         | program is after all almost certainly being built to run on an
         | actual machine; if you know what that actual machine does, it
         | would sometimes be useful to be able to take advantage of that.
         | We might not be able to demand this on the basis that the
         | standard requires it, but as a quality of implementation issue
         | I think it a reasonable request.
         | 
         | This is such an obvious thing to do that I'm surprised the C
         | standard doesn't include wording along those lines to
         | accommodate it. But I suppose even if it did, people would just
         | ignore it.
        
           | robinsonb5 wrote:
           | The problem is that what the machine does isn't necessarily
           | consistent. If you're using old-as-the-green-hills integer
           | instructions then yes, the CPU supports unaligned access. If
           | you want to benefit from the speedup afforded by the latest
           | vector instructions, now suddenly it doesn't.
           | 
           | Also, to be fair, GCC does appear to back off the
           | optimisations when dealing with, for example, a struct with
           | the packed attribute.
        
       | Dwedit wrote:
       | We need an '__unaligned' modifier for pointers to specify that
       | the pointer will be used for unaligned reads and writes.
        
         | Athas wrote:
         | Why not just use memcpy()?
        
           | greesil wrote:
           | Arrays?
        
           | j16sdiz wrote:
           | Because, in some hardware, unaligned read is ok, and you want
           | to take advantage of the hardware feature?
        
             | dzaima wrote:
             | Both gcc and clang will optimize a memcpy to an unaligned
             | load/store where possible.
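             | 
             | A minimal sketch of the memcpy idiom, for illustration only:
             | load_u32_unaligned is a hypothetical helper name, not taken
             | from any project in this thread. On targets with native
             | unaligned loads, gcc and clang at -O2 typically compile the
             | memcpy call down to a single load instruction.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from an address with no alignment guarantee.
 * memcpy has no alignment requirement, so this is well defined as
 * long as 4 readable bytes exist at p. (Hypothetical helper name.) */
uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* byte-wise copy in the abstract machine */
    return v;
}
```

             | Calling load_u32_unaligned(buf + 1) on a byte buffer is fine
             | even though buf + 1 is not 4-byte aligned.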
        
               | j16sdiz wrote:
               | If the compiler is so smart, I guess it could insert a
               | memcpy when needed?
               | 
               | The standard, you may say... I would argue it's the
               | standard that needs to be changed. Under the modern reading
               | of the standard, C is not useful as a low-level language
               | and is unsafe as a high-level language.
        
               | dzaima wrote:
               | Right, I agree that it would be nice to have some way to
               | request that unaligned load/store be permitted, like
               | -fwrapv for signed int wrapping. But nevertheless the UB
               | is a reasonable option that's beneficial for other
               | things.
        
               | rsaxvc wrote:
               | > If the compiler is so smart, I guess it could insert a
               | memcpy when needed?
               | 
               | If I'm reading your comment and the blog post correctly,
               | the compiler would need a memory like access on every
               | multibyte pointer argument where the compiler cannot
               | otherwise prove alignment. Is that correct?
        
               | rsaxvc wrote:
               | > memory like
               | 
               | I meant memcpy()-like
        
               | dzaima wrote:
               | Internally, the compiler could represent it however it
               | wants. LLVM IR's load/store instructions just have an
               | "align" property, which is usually sizeof(the type), but
               | can be set to 1 to mimic memcpy (and indeed llvm/clang
               | immediately translate a memcpy to such -
               | https://godbolt.org/z/7T46a6aqT).
               | 
               | Though it seems that, independent of this, it assumes
               | that an int* in general will be 4-byte-aligned, so e.g.
               | https://godbolt.org/z/aWTEd4s3K still has an "align 4"
               | despite using memcpy. So one must also cast to a char*
               | before using memcpy() to actually have it work. yay for
               | more footguns!
        
           | IshKebab wrote:
           | That is the solution they recommend.
        
         | mtklein wrote:
         | If you've got control over the type of the pointee and not just
         | the pointer, __attribute__((packed)) can work for this.
         | 
         |     struct foo {                         int x; };
         |     struct bar { __attribute__((packed)) int x; };
         |     _Static_assert(_Alignof(struct foo) == 4, "");
         |     _Static_assert(_Alignof(struct bar) == 1, "");
        
           | loeg wrote:
           | Yeah, I end up using __attribute__((packed)) for this at
           | work. For tortured reasons, part of our codebase allocates
           | memory with only 8-byte alignment, but the buffer is cast to
           | a type that would have 16-byte alignment without
           | __attribute__((packed)). As a result Clang wants to generate
           | VMOVDQAs that require 16-byte alignment, unless you use
           | packed, in which case it generates VMOVDQU.
        
       | shadowofneptune wrote:
       | The big thing seems to be less about GCC, and more a question of,
       | "what should a compiler be?"
       | 
       | He'd be better looking at smaller, less-known compilers, like the
       | Portable C Compiler or the Intel C Compiler. If you want hyper-
       | optimized, better-than-assembly quality, you pretty much have to
       | give up predictability. The best optimizations that are
       | predictable can't be written using modern compiler theory. They
       | instead involve a lot of work, care, and attention that can't be
       | generalized to other architectures. It can require a love for an
       | architecture, even if it's a crap one.
       | 
       | It's a tradeoff. Not every compiler needs to be optimized, and
       | not every compiler needs to embody the spirit of a language.
        
       | dzaima wrote:
       | In a project I'm working on[0], there's an array object type used
       | throughout, which can sometimes point to arbitrary data
       | elsewhere. In a funky edge-case[1], such an array can be built
       | with an unaligned data pointer.
       | 
       | Thus, if gcc/clang started seriously utilizing aligned pointer
       | accesses everywhere, _nearly every single load & store in the
       | entire project_ would have to be replaced with something
       | significantly more verbose. Maybe in a more fancy language you
       | could have ptr<int> vs unaligned_ptr<int> or similar, but in C
       | you kinda just have compiler flags, and maybe __attribute__-s if
       | you can spare some verbosity.
       | 
       | C UB is often genuinely useful, but imo having an opt-out option
       | for certain things is, regardless, a very reasonable request.
       | 
       | [0]: https://github.com/dzaima/CBQN
       | 
       | [1]: Any regularly allocated array has appropriate alignment. But
       | there are some functions that take a slice of the array
       | "virtually" (i.e. pointing to it instead of copying), and another
       | one that bitwise-reinterprets an array to one with a different
       | element type (again, operating virtually). This leads to a
       | problem when e.g. taking i8 elements [3;7) and reinterpreting as
       | an i32 array. A workaround would be to make the reinterpret copy
       | memory if necessary (and this would have to be done if targeting
       | something without unaligned load/store), but that'd ruin it being
       | a nice O(1).
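       | 
       | A rough sketch of the "copy only if necessary" workaround from
       | [1], for illustration. view_as_i32 and is_i32_aligned are
       | hypothetical names, not CBQN code. It assumes that converting a
       | pointer to uintptr_t preserves the low address bits (true on
       | mainstream platforms, not guaranteed by the standard), and the
       | aligned fast path still assumes -fno-strict-aliasing or bytes
       | that were originally stored as int32_t.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Nonzero if p satisfies the alignment requirement of int32_t. */
int is_i32_aligned(const void *p)
{
    return ((uintptr_t)p % _Alignof(int32_t)) == 0;
}

/* View `count` int32_t elements starting at `bytes`.
 * Aligned input: returns the same address (O(1)), *owned is NULL.
 * Misaligned input: copies into a fresh buffer (O(n)), stores it in
 * *owned for the caller to free, and returns the copy.
 * Returns NULL only if the fallback allocation fails. */
const int32_t *view_as_i32(const unsigned char *bytes, size_t count,
                           void **owned)
{
    *owned = NULL;
    if (is_i32_aligned(bytes))
        return (const int32_t *)(const void *)bytes;

    size_t n = count * sizeof(int32_t);
    int32_t *copy = malloc(n ? n : 1);   /* avoid malloc(0) ambiguity */
    if (copy == NULL)
        return NULL;
    memcpy(copy, bytes, n);
    *owned = copy;
    return copy;
}
```

       | The cost is exactly the one described above: the misaligned
       | case stops being O(1).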
        
         | tylerhou wrote:
         | > This leads to a problem when e.g. taking i8 elements [3;7)
         | and reinterpreting as an i32 array.
         | 
         | Even ignoring alignment issues, this is already UB because it
         | violates the strict aliasing rule. You technically need to
         | memcpy and hope that the compiler optimizes the memcpy out. In
         | C++20 you can use std::bit_cast in some circumstances.
         | https://en.cppreference.com/w/cpp/numeric/bit_cast. In C11 you
         | can use a union, but that still requires a "copy" into the
         | union.
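         | 
         | As a small illustration of the memcpy route around strict
         | aliasing, here is a sketch that reinterprets a float's bytes
         | as a uint32_t. The function names are made up for this
         | example, and the bit patterns you get assume IEEE 754
         | single-precision floats.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Reinterpret the object representation of a float as a uint32_t.
 * Copying bytes with memcpy does not violate strict aliasing. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Inverse direction: build a float from a 32-bit pattern. */
float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

         | Optimizing compilers generally turn each of these into a
         | plain register move, so the "copy" tends to cost nothing.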
        
           | dzaima wrote:
           | I'm of course already using -fno-strict-aliasing (primarily
           | because without it it's impossible to implement a custom
           | memory allocator, but it also helps here).
        
         | lights0123 wrote:
         | > in C you kinda just have compiler flags, and maybe
         | __attribute__-s if you can spare some verbosity
         | 
         | Which you can use to wrap the unaligned type as a packed
         | struct, i.e.
         | 
         |     struct __attribute__((__packed__)) unaligned_int { int i; };
         | 
         | which has an alignment of 1.
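         | 
         | A sketch of how such a wrapper might be used for reads and
         | writes, assuming GCC or Clang (the packed attribute is an
         | extension). The names unaligned_i32, read_i32_unaligned and
         | write_i32_unaligned are hypothetical. This only addresses
         | alignment: strict aliasing still applies, so the memory
         | should hold an int32_t or the build should use
         | -fno-strict-aliasing.

```c
#include <stdint.h>

/* GCC/Clang extension: a packed struct has alignment 1, so the
 * compiler may not assume natural alignment for accesses to v. */
struct __attribute__((__packed__)) unaligned_i32 {
    int32_t v;
};

/* Read an int32_t from an arbitrarily aligned address. */
int32_t read_i32_unaligned(const void *p)
{
    return ((const struct unaligned_i32 *)p)->v;
}

/* Write an int32_t to an arbitrarily aligned address. */
void write_i32_unaligned(void *p, int32_t x)
{
    ((struct unaligned_i32 *)p)->v = x;
}
```

         | On x86-64 these normally compile to ordinary mov
         | instructions; on strict-alignment targets the compiler emits
         | byte-wise accesses instead.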
        
       | non-e-moose wrote:
       | Unaligned pointer accesses are for 80386 bozos. Period. End of
       | story. If you want to play in 64-bit land, live by the
       | architectural rules. If you do not, your code will likely die.
       | And you need to "Lurn" a lot.
        
       | dang wrote:
       | Discussed at the time:
       | 
       |  _GCC always assumes aligned pointer accesses_ -
       | https://news.ycombinator.com/item?id=22887685 - April 2020 (91
       | comments)
        
       | JoeAltmaier wrote:
       | Machine language architecture is flawed. The assumption of
       | alignment is in the machine language.
       | 
       | Compilers can only use the instructions that are there. They have
       | a difficult choice: close their eyes and generate aligned-pointer
       | moves, or use a sequence of tests and partial move instructions
       | that is orders of magnitude less efficient.
       | 
       | We've needed machine instructions to load or move memory
       | efficiently regardless of alignment, for decades.
        
         | titzer wrote:
         | In retrospect, too many crazy software dances are due to
         | miserliness on the part of hardware designs. Saving a couple
         | bits in the address lines and not needing to straddle cache
         | lines? We're far past the point where that's a considerable
         | cost, as all modern ISA implementations now attest.
        
       | phkahler wrote:
       | If you're going to read byte-level data you should be using a
       | char pointer.
       | 
       | The author also speculates on how common this "bug" is. I'd say
       | 15000 Debian packages that work properly indicates that just
       | about nobody is relying on this undefined behavior.
        
         | dzaima wrote:
         | For one, the linux kernel itself relies on this UB -
         | https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93031#c5, linked
         | from OP.
         | 
         | And indeed linux would hit UBSAN reports on those, but just
         | disables it if native unaligned load/store is configured - last
         | paragraph of
         | https://github.com/torvalds/linux/blob/706a741595047797872e6...
         | 
         | At present there's almost no reliance on this UB by compilers
         | on something that actually has a chance of affecting real code,
         | so it's not particularly unexpected that software appears to
         | work.
        
         | [deleted]
        
         | unnah wrote:
         | Getting to that point has required years of maintenance work
         | since compiler writers started interpreting more and more
         | undefined behaviour as optimization opportunities. At least now
         | we have UBSAN to actually test for undefined behaviour at
         | runtime.
        
         | assbuttbuttass wrote:
         | Undefined behavior doesn't necessarily mean the program will
         | exhibit an issue. It could silently break with a future version
         | of the compiler, which it sounds like was the case here
        
       | anonymousiam wrote:
       | The SPARC cc compiler has (had) a -misalign flag. Maybe the GCC
       | guys could support the same option?
       | 
       | https://docs.oracle.com/cd/E19957-01/805-4952/6j4mdcegh/inde...
        
       | tedunangst wrote:
       | Compilers should just add a --k&r mode to appease these people,
       | but it should also reject ansi code with parse errors.
        
         | iso8859-1 wrote:
         | If a --k&r mode were to be reliable, wouldn't it need to get
         | specified first? Otherwise people would start relying on some
         | edge case.
         | 
         | If speed is not a requirement for the --k&r mode, you could
         | just take the tis-interpreter and note that if it runs without
         | UB, it is still much faster than an actual computer was when
         | k&r were active.
         | 
         | Would it even be possible to specify a variant of C that
         | contains no UB (e.g. would define exactly what happens on
         | unaligned access), but can compile practical existing C89
         | programs? I wonder if it could be written such that it could
         | actually specify the behaviour consistently across the language
         | intersection supported by both of e.g. GCC 2.95 and Chibicc[0].
         | 
         | Or maybe there are so many bugs in GCC 2.95 that it would
         | simply be infeasible? How much time would it take to specify?
         | 
         | [0]: https://github.com/rui314/chibicc
        
           | tedunangst wrote:
           | It could probably only be used with a neural link. It would
           | read your mind, then emit code that matches your perception
           | of what you imagine old compilers did.
        
             | zokier wrote:
             | You jest, but there was some real effort put into an attempt
             | to define a dialect of C that would have less UB, etc. And
             | indeed the big problem was defining the semantics:
             | 
             | > After publishing the Friendly C Proposal, I spent some
             | time discussing its design with people, and eventually I
             | came to the depressing conclusion that there's no way to
             | get a group of C experts -- even if they are knowledgable,
             | intelligent, and otherwise reasonable -- to agree on the
             | Friendly C dialect. There are just too many variations,
             | each with its own set of performance tradeoffs, for
             | consensus to be possible.
             | 
             | https://blog.regehr.org/archives/1287
        
       | WalterBright wrote:
       | The D programming language does not allow the creation of
       | misaligned pointers in code marked as @safe, and in @safe code
       | assumes they are aligned. In @system code you can do whatever you
       | like, but things need to be aligned that are provided to @safe
       | code.
        
         | dzaima wrote:
         | Doesn't look like that changes anything about actual
         | dereferencing though, which is the primary thing discussed -
         | https://godbolt.org/z/4vW5Ksnab still emits an "align 4", which
         | llvm could still assume as UB if violated (though I don't know
         | if it ever does).
        
       | compiler-guy wrote:
       | A good article on this topic is Chris Lattner's "What Every C
       | Programmer Should Know About Undefined Behavior."
       | 
       | http://blog.llvm.org/2011/05/what-every-c-programmer-should-...
        
       | azakai wrote:
       | An example of a recent compile target that breaks on unaligned
       | pointer accesses was asm.js. There, a 32-bit read turns into a
       | read from a JavaScript Int32Array like this:
       | 
       | HEAP32[ptr >> 2]
       | 
       | The k-th index in the array contains 4 bytes of data, so the
       | pointer to an address must be divided by 4, which is what the >>
       | 2 does. And >> 2 will "break" unaligned pointers because it
       | discards the low bits.
       | 
       | In practice we did run into codebases that broke because of this,
       | but it was fairly rare. We built some tools (SAFE_HEAP) that
       | helped find such issues. In the end it may have added some work
       | to a small number of ports, but very few I think.
       | 
       | asm.js has been superseded by WebAssembly, which allows unaligned
       | accesses, so this is no longer a problem there.
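       | 
       | To make the HEAP32[ptr >> 2] effect concrete, here is a small C
       | model of it. This is not asm.js or Emscripten code; heap32_load
       | is a hypothetical name, and the heap is just an int32_t array
       | indexed by a byte address.

```c
#include <stdint.h>

/* Model of an asm.js-style 32-bit load: the byte address is turned
 * into an array index by shifting out the two low bits, so any
 * misalignment in ptr is silently discarded. */
int32_t heap32_load(const int32_t *heap32, uint32_t ptr)
{
    return heap32[ptr >> 2];
}
```

       | With a heap of {10, 20, 30}, byte addresses 4, 5, 6 and 7 all
       | read the value 20: a misaligned pointer does not trap or read
       | straddling bytes, it quietly reads the enclosing aligned word.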
        
         | titzer wrote:
         | > WebAssembly, which allows unaligned accesses
         | 
         | And I think we made the right call (other than the vestigial
         | alignment bits in load/store immediates, which AFAIK, no engine
         | is making use of).
        
       | banthar wrote:
       | This is how all undefined behavior works. It seems to be working
       | now but breaks with new CPU, GCC version or on wrong moon phase.
       | 
       | "-Wcast-align=strict" will work in this case but not in all
       | cases - that's why we have UBSAN:
       | 
       |     $ gcc -fsanitize=undefined test.c
       |     $ ./a.out
       |     test.c:6:6: runtime error: store to misaligned address 0x55e4007adeb1 for type 'int', which requires 4 byte alignment
       |     0x55e4007adeb1: note: pointer points here
       |      00 00 00  01 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  51 00 00 00 00
        
       | teo_zero wrote:
       | In other words, the result they got is different from the
       | expected one. The keyword here is "expected": if your code
       | contains a part that generates undefined behavior according to the
       | standard, then you should have no expectations. What's worth
       | mentioning in this blog post?
        
       | MUSIC_NERD wrote:
       | [flagged]
        
       | DannyBee wrote:
       | They are confused, and seem not to realize that ABIs exist, and
       | often specify alignment requirements. They seem to believe there
       | are just ISA and architecture specs.
       | 
       | When you compile for Linux x86_64 ABI, gcc assumes that the stack
       | is 16 byte aligned because it's required by the ABI.
       | 
       | Regardless of whether the ISA needs it.
       | 
       | If they want the compiler to make no assumptions about aligned
       | accesses, they would need to define an ABI in GCC that operates
       | that way and compile with it. Such ABIs were historically
       | supported (though it's been years since I looked).
        
       | quelsolaar wrote:
       | Storing pointers unaligned and using memcpy to extract them to an
       | aligned pointer can be a performance gain, if it means less
       | padding taking up valuable cache space.
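       | 
       | A sketch of that trade-off, under the assumption of a record
       | made of a one-byte tag plus a pointer. All names here
       | (padded_rec, PACKED_REC_SIZE, packed_store, packed_load_ptr)
       | are invented for the example. On a typical 64-bit ABI the
       | naturally laid out struct takes 16 bytes, while the hand-packed
       | byte layout takes 9.

```c
#include <stdint.h>
#include <string.h>

/* Natural layout: the compiler pads after `tag` so `ptr` is aligned. */
struct padded_rec {
    uint8_t tag;
    void   *ptr;
};

/* Hand-packed layout: tag byte immediately followed by the pointer. */
enum { PACKED_REC_SIZE = 1 + sizeof(void *) };

/* Store one record into a byte slot of PACKED_REC_SIZE bytes. */
void packed_store(unsigned char *slot, uint8_t tag, void *ptr)
{
    slot[0] = tag;
    memcpy(slot + 1, &ptr, sizeof ptr);   /* unaligned-safe store */
}

/* Extract the pointer into a properly aligned local before use. */
void *packed_load_ptr(const unsigned char *slot)
{
    void *ptr;
    memcpy(&ptr, slot + 1, sizeof ptr);   /* unaligned-safe load */
    return ptr;
}
```

       | Whether the denser layout actually wins depends on the access
       | pattern and the target, so it is worth measuring.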
        
         | DougMerritt wrote:
         | In the right circumstances, you're very right -- but most
         | people will get some aspect of this wrong.
        
       | NotYourLawyer wrote:
       | Yes, compilers take advantage of undefined behavior.
        
       ___________________________________________________________________
       (page generated 2023-08-20 23:01 UTC)