[HN Gopher] GCC always assumes aligned pointer accesses (2020)
___________________________________________________________________
GCC always assumes aligned pointer accesses (2020)
Author : luu
Score : 102 points
Date : 2023-08-20 00:59 UTC (22 hours ago)
(HTM) web link (trust-in-soft.com)
(TXT) w3m dump (trust-in-soft.com)
| zajio1am wrote:
| > The C standards, having to accommodate both target
| architectures where misaligned accesses worked and target
| architectures where these violently interrupted the program,
| applied their universal solution: they classified misaligned
| access as an undefined behavior.
|
| No. If the C standard wants to accommodate different target
| architectures, it uses implementation-defined behavior.
| Undefined behavior is just a polite way to say that the code is
| buggy.
|
| The C standard simply requires natural alignment, even on
| architectures that allow unaligned accesses.
| volta87 wrote:
| The arguments in this blogpost are fundamentally flawed. The fact
| that they opened a bug based on them but got shut down should
| have raised all red flags.
|
| When compiling and running a C program, the only thing that
| matters is "what the C abstract machine does". Programs that
| exhibit UB in the abstract machine are allowed to do "anything".
|
| Trying to scope that down using arguments of the form "but what
| the hardware does is X" is fundamentally flawed, because
| anything means anything, and what the hardware does doesn't
| change that, and therefore it doesn't matter.
|
| This blogpost "What The Hardware Does is not What Your Program
| Does" explains this in more detail and with more examples.
|
| https://www.ralfj.de/blog/2019/07/14/uninit.html
| kmeisthax wrote:
| Great, except no implementation of the C abstract machine
| actually exists. So you can't test against it. All you have are
| compilers that use it to justify miscompiling your code.
|
| We need a C interpreter that intentionally implements C machine
| features that don't correspond to any architectural feature -
| i.e. pointers are (allocation provenance, offset) pairs,
| integer overflow panics, every pointer construction is checked,
| etc. If only to point out how hilariously absurd the ISO C UB
| rules are and how nobody actually follows them.
|
| My personal opinion is that "undefined behavior" was a spec
| writing mistake that has been rules-lawyered into absurdity.
| For example, signed integer overflow being UB was intended to
| allow compiling C to non-two's-complement machines. This was
| interpreted to allow inventing _new_ misbehaviors for integer
| overflow instead of "do whatever the target architecture
| does."
| dorianh wrote:
| The blog post company does sell a C interpreter that checks
| for all undefined behaviors (with provenance and offset).
| codedokode wrote:
| > For example, signed integer overflow being UB was intended
| to allow compiling C to non-two's-complement machines.
|
| This is indeed a design mistake, but in another sense.
| Ordinary arithmetic ops like + or - should throw an exception
| on overflow (with both signed and unsigned operands) because
| most of the time you need ordinary math, not math modulo
| 2^32. For those rare cases where wrap around is desired,
| there should be a function like add_and_wrap() or a special
| operator.
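|
| A minimal sketch of the two operations described above, assuming
| GCC or Clang. add_and_wrap() is the name suggested in the comment;
| add_checked() is a hypothetical name, and __builtin_add_overflow
| is a compiler extension rather than ISO C. Since C has no
| exceptions, the checked version reports failure through its
| return value instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrapping add: unsigned arithmetic is defined to wrap modulo 2^32. */
uint32_t add_and_wrap(uint32_t a, uint32_t b)
{
    return a + b;
}

/* Checked add (hypothetical helper): stores a + b in *out and returns
 * true, or returns false if the mathematical result does not fit.
 * __builtin_add_overflow is a GCC/Clang extension, not ISO C. */
bool add_checked(int32_t a, int32_t b, int32_t *out)
{
    return !__builtin_add_overflow(a, b, out);
}
```

| With these, add_checked(INT32_MAX, 1, &r) returns false, while
| add_and_wrap(UINT32_MAX, 1) returns 0.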
| plorkyeran wrote:
| UBSan covers each of those except provenance checking, and
| ASan mostly catches provenance problems even though that's
| not directly the goal. There are some dumb forms of UB not
| caught by any of the sanitizers, but most of them are.
|
| Making your program UBSan-clean is the bare minimum you
| should do if you're writing C or C++ in 2023, not an absurd
| goal. I know it'll never happen, but I'm increasingly of the
| opinion that UBSan should be enabled by default.
| captainmuon wrote:
| I think it is a good blog post, because it highlights an issue
| that I was not aware of and that I think many programmers are
| not. I do think I am a decent C programmer, and I spotted the
| strict aliasing issue immediately, but I didn't know that
| unaligned pointer access is UB. Because let's face it, the
| majority of programmers didn't read the standard, and those who
| did don't remember all facets.
|
| I first learned many years ago that you _should_ pick apart
| binary data by casting structs, using pointers to the middle of
| fields and so on. It was ubiquitous for both speed and
| convenience. I don't know if it was legal even in the 90s, but
| it was general practice - MS Office file formats from that time
| were just dumped structs. Then at some point I learned about
| pointer alignment - but it was always framed as a matter of
| performance and of the capabilities of exotic platforms,
| never as a correctness issue. But it's not just important to
| learn what to do, but also why to do it, which is why we need
| more articles highlighting these issues.
|
| (And I have to admit, I am one of these misguided people who
| would love a flag to turn C into "portable assembler" again.
| Even if it is 10x slower, and even if I had to add annotations
| to every damn for loop to tell the compiler that I'm not
| overflowing. There are just cases where understanding what you
| are actually doing to the hardware trumps performance.)
| hedora wrote:
| Honestly, I think you are both incorrect.
|
| C has always had a concept of implementation defined behavior,
| and unaligned memory accesses used to be defined to work
| correctly on x86.
|
| Intel added instructions that can't handle unaligned access, so
| they broke that contract. I'd argue that it is an instruction
| set architecture bug.
|
| Alternatively, Intel could argue that compilers shouldn't emit
| vector instructions unless they can statically prove the
| pointer is aligned. That's not feasible in general for
| languages like C/C++, so that's a pretty weak defense of having
| the processor pay the overhead of supporting unaligned access
| on some, but not all, paths.
| bigbillheck wrote:
| > C has always had a concept of implementation defined
| behavior,
|
| Surely only after standardization tho?
| armitron wrote:
| Unaligned memory accesses are undefined behavior in C. If
| you're writing C, you should be abiding by C rules. "Used to
| work correctly" is more guesswork and ignorance than "abiding
| by C rules". In C, playing fast&loose with definitions hurts,
| BAD.
|
| Frankly, I'd be ashamed to write this blog post since the
| only thing it accomplishes is exposing its writers as not
| understanding the very thing they're signaling expertise on.
| iso8859-1 wrote:
| What makes you think they don't understand it? They
| acknowledge that it is UB. I read them as realistic, since
| they know that people rely on C compilers working a certain
| way. They even wrote an interpreter that detects UB:
| https://github.com/TrustInSoft/tis-interpreter
|
| I understand why people like the compiler being able to
| leverage UB. I suspect this philosophy actually makes
| Trust-In-Soft more money: You could argue that if there was
| no UB, there would be no need for the tis-interpreter.
|
| So isn't it in fact quite selfless that they encourage the
| world to optimize a bit less (spending more money on
| 'compute'), while standing to profit from the unintended
| behaviour they'd otherwise be contracted to help debug?
| SkiFire13 wrote:
| > C has always had a concept of implementation defined
| behavior, and unaligned memory accesses used to be defined to
| work correctly on x86.
|
| There are a bunch of misconceptions here:
|
| - unaligned loads were never implementation defined, they are
| undefined;
|
| - even if they were implementation defined, this would give
| the compiler the choice of how to define them, not the
| instruction set;
|
| - unaligned memory accesses on x86 for non-vector registers
| still work fine, so old instructions were not impacted and
| there's no bug. It's just that the expectations were not
| fulfilled for the new extension of those instructions.
| chaboud wrote:
| Note: SIMD on x86 has unaligned instructions that used to
| be much slower (decoded differently) than their aligned
| counterparts.
|
| For example, on Pentium III and Core 2, the unaligned
| instructions took twice as many cycles to execute. On
| modern x86 family processors, it's the same cycle count
| either way. The only perf penalty one should account for is
| crossing of cache lines, generally a much smaller problem.
| JonChesterfield wrote:
| Loads of architectures can't do misaligned memory access.
| Even x86 has problems when variables span cache lines. The
| compiler usually deals with this for the programmer, e.g. by
| rounding the address down then doing multiple operations and
| splicing the result together.
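|
| The round-down-and-splice approach described above can be
| sketched roughly as follows. This is an illustration, not what
| any particular compiler emits: load_u32_spliced() is a
| hypothetical name, and the shifts assume a little-endian layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Read the 32-bit value that starts at byte offset `off` inside an array
 * of aligned 32-bit words, using only aligned loads.
 * Assumptions: little-endian byte order, and words[off / 4 + 1] exists
 * whenever off is not a multiple of 4. */
uint32_t load_u32_spliced(const uint32_t *words, size_t off)
{
    size_t   idx   = off / 4;                 /* round the address down */
    unsigned shift = (unsigned)(off % 4) * 8; /* misalignment in bits   */
    uint32_t lo    = words[idx];              /* first aligned load     */

    if (shift == 0)
        return lo;                            /* already aligned */

    uint32_t hi = words[idx + 1];             /* second aligned load */
    return (lo >> shift) | (hi << (32 - shift));
}
```

| The aligned case is handled separately because shifting a 32-bit
| value by 32 would itself be undefined behavior.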
| assbuttbuttass wrote:
| Undefined and implementation defined are different in C. The
| number of bits in an int is implementation defined. Unaligned
| access is undefined.
| compiler-guy wrote:
| If you really are targeting the x86_64 instruction set, you
| should be writing x86_64 instructions. Then you get exactly
| what the hardware does and don't get any of those pesky
| compiler assumptions.
|
| Of course you don't get any of those pleasant optimizations
| either. But those optimizations are only possible because of
| the assumptions.
| j16sdiz wrote:
| That's what the author meant when he said "The shift of the C
| language from "portable assembly" to "high-level programming
| language without the safety of high-level programming
| languages""
|
| Back in the 1980s, C was expected to do what hardware does.
| There was no "the C abstract machine".
|
| The abstract machine idea was introduced much later.
|
| > The arguments in this blogpost are fundamentally flawed.
|
| The "fundamentally flawed" comment is a revisionist idea.
| wbl wrote:
| How does C do what hardware does and store things in
| registers when it can?
| pjmlp wrote:
| It doesn't, it is up to the compiler and optimizer to
| decide how to go at it.
|
| Vector instructions, replacing library functions with
| compiler intrinsics, splitting structs across registers and
| stack, unrolling loops are all examples absent from the
| language standard.
| JonChesterfield wrote:
| Two ways. One is the platform ABI sometimes says specific
| arguments are passed in specific registers. The second is
| (essentially) assigning local variables offsets on a
| machine stack where some offsets are stored in registers.
| bigbillheck wrote:
| > Back in the 1980s, C was expected to do what hardware does.
| There was no "the C abstract machine".
|
| There was also a huge variety of compilers that were buggy
| and incomplete each in their own ways, often with mutually-
| incompatible extensions, not to mention prone to generating
| pretty awful code.
| User23 wrote:
| To the best of my recollection the "abstract machine" is a
| C++ism that unfortunately crept into C.
| armitron wrote:
| The "abstract machine" is present in the first C standard,
| published in 1989.
| zokier wrote:
| From C89 document:
|
| > 2.1.2.3 Program execution
|
| > The semantic descriptions in this Standard describe the
| behavior of an abstract machine in which issues of
| optimization are irrelevant
|
| [...]
|
| > Alternatively, an implementation might perform various
| optimizations within each translation unit, such that the
| actual semantics would agree with the abstract semantics
| only when making function calls across translation unit
| boundaries. In such an implementation, at the time of each
| function entry and function return where the calling
| function and the called function are in different
| translation units, the values of all externally linked
| objects and of all objects accessible via pointers therein
| would agree with the abstract semantics. Furthermore, at
| the time of each such function entry the values of the
| parameters of the called function and of all objects
| accessible via pointers therein would agree with the
| abstract semantics.
| JonChesterfield wrote:
| This turns out to be contentious. There are two histories of
| the C language and which one you get told is true depends on
| who you ask.
|
| 1/ a way to emit specific assembly with a compiler dealing
| with register allocation and instruction selection
|
| 2/ an abstract machine specification that permits
| optimisations and also happens to lower well defined code to
| some architectures
|
| My working theory is that the language standardisation effort
| invented the latter. So when people say C was always like
| this, they mean since ansi c89, and there was no language
| before that. And when people say C used to be
| typed/convenient assembly language, they're referring to the
| language that was called C that existed in reality prior to
| that standards document.
|
| The WG14 mailing list was insistent (in correspondence to me)
| that C was always like this, some of whom were presumably
| around at the time. A partial counterargument is the semi-
| infamous message from Dennis Ritchie copied in various places,
| e.g. https://www.lysator.liu.se/c/dmr-on-noalias.html
|
| An out of context quote from that email to encourage people
| to read said context and ideally reply here with more
| information on this historical assessment
|
| "The fundamental problem is that it is not possible to write
| real programs using the X3J11 definition of C. The committee
| has created an unreal language that no one can or will
| actually use."
|
| Regards
| lonjil wrote:
| > My working theory is that the language standardisation
| effort invented the latter. So when people say C was always
| like this, they mean since ansi c89, and there was no
| language before that. And when people say C used to be
| typed/convenient assembly language, they're referring to
| the language that was called C that existed in reality
| prior to that standards document.
|
| But the committee has always had a lot of C compiler
| developers in it. The people who wrote the C89 standard
| were the same people who developed many of the C compilers
| in use before C89. The people who created the reality prior
| to C89 created the reality after C89. Any perception of
| "portable assembly" probably stemmed simply from the fact
| that optimizers were much less sophisticated.
| eklitzke wrote:
| The blog post is also kind of unhinged because in the
| incredibly rare cases where you would want to write code like
| this you can literally just use the asm keyword.
|
| I think it's also worth considering WHY compilers (and the C
| standard) make these kinds of assumptions. For starters, not
| all hardware platforms allow unaligned accesses at all. Even on
| x86 where it's supported, you want to avoid doing unaligned
| reads at all costs because they're up to 2x slower than aligned
| accesses. God forbid you try to use unaligned atomics, because
| while technically supported by x86 they're 200x slower than
| using the LOCK prefix with an aligned read.[^1] The fact that
| you need to go through escape hatches to get the compiler to
| generate code to do unaligned loads and stores is a good thing,
| because it helps prevent people from writing code with
| mysterious slowdowns.
|
| A function that takes two pointers of the same type
| already has to pessimize loads and stores on the assumption
| that the pointers could alias. That is to say, if your function
| takes int *p, int *q, then doing a store to p requires reloading
| q, because p and q could point to the same thing. Thankfully in
| some situations the compiler can figure out that in a certain
| context p and q have different addresses and therefore can't
| alias; this helps the compiler generate faster code (by
| avoiding redundant loads). If p and q are allowed to alias even
| when they have different addresses, this would all go out the
| window and you'd basically need to assume that all pointer
| types could alias under any situation. This would be TERRIBLE
| for performance.
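|
| A small sketch of the reload described above. Both function
| names are hypothetical. The second version uses C99 restrict to
| promise the compiler that the pointers do not alias.

```c
/* p and q may alias, so after the store through p the value of *q
 * has to be read again from memory. */
int store_then_read(int *p, int *q)
{
    *q = 1;
    *p = 2;
    return *q;   /* 2 if p == q, otherwise 1 */
}

/* restrict promises that p and q do not alias, so the compiler is free
 * to return the constant 1 without reloading *q.
 * Calling this with p == q is undefined behavior. */
int store_then_read_noalias(int *restrict p, int *restrict q)
{
    *q = 1;
    *p = 2;
    return *q;
}
```

| If two pointers with different addresses were still allowed to
| overlap, as with unaligned int pointers, the compiler could not
| skip the reload in the first function even after proving p != q.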
|
| [^1]: https://rigtorp.se/split-locks/
| vlovich123 wrote:
| While the sentiment is correct as to why compilers make
| alignment assumptions, a lot of the details here I think are
| not quite right.
|
| > For starters, not all hardware platforms allow unaligned
| accesses at all
|
| If you're dealing with very simple CPUs like the ARM M0,
| sure. But even the M3/M4 allows unaligned access.
|
| > Even on x86 where it's supported, you want to avoid doing
| unaligned reads at all costs because they're up to 2x slower
| than aligned accesses
|
| I believe that information hasn't been true for a long time
| (since 1995). Unless you're talking about unaligned accesses
| that also cross a cache line boundary being slower [1]. But I
| imagine that aligned accesses crossing a cache line boundary
| are also similarly slower because the slowness is the cache
| line boundary.
|
| > God forbid you try to use unaligned atomics, because while
| technically supported by x86 they're 200x slower than using
| the LOCK prefix with an aligned read
|
| What you're referring to is atomic unaligned access that's
| also across cache line boundaries. I don't know what it is
| within a cache line, but I imagine it's not as bad as you
| make it out to be. Unaligned atomics across cache line
| boundaries also don't work on ARM and have much spottier
| support than unaligned access in general.
|
| TLDR: People cargo cult advice about unaligned access but
| it's more because it's a simpler rule of thumb and there's
| typically very little benefit to pack things as tightly as
| possible which is where unaligned accesses generally come up.
|
| [1] https://news.ycombinator.com/item?id=10529947
| macjohnmcc wrote:
| Yeah even Microsoft's compiler aligns values on appropriate
| boundaries for performance reasons. DWORDs on DWORD
| boundaries etc. And if you want to pack the data structure
| to avoid the gaps in structures there are methods to do so
| via #pragma options. I think their complaining about what
| was done for performance reasons shows a great lack of
| overall understanding. More time researching and less time
| griping would have served them better.
| AshamedCaptain wrote:
| Your message is more misleading than the GP.
|
| Many architectures sold today still claim unaligned
| accesses are optional (e.g. all ARM pre-v7, which includes
| the popular Raspberry Pi Zero). Not to mention that even if
| they are supported, not all instructions support it (which
| is the case today on all ARM cores and even on x86).
|
| From the architectures and instructions which may support
| it, it may have a performance penalty which may range from
| "somewhat slower" (e.g. Intel still recommends stack
| alignment, because otherwise many internal store
| optimizations start giving up) to "ridiculously slower"
| (e.g. I once had to write a trap handler that software-
| emulated unaligned accesses on ARM -- on all 32-bit ARMs
| Linux still does this for all instructions except plain
| undecorated LDR/STR when the special unaligned ABI is
| enabled).
|
| And finally, even if the architecture supports it with
| decent enough performance, it may do it with relaxed
| atomicity. E.g. even as of today aarch64 makes zero
| guarantees regarding atomicity of even atomic instructions
| on unaligned addresses (yes, really). To put it simply
| because it is a _pain in the ass_ to implement correctly
| (say programmer does atomic load/store on overlapping
| addresses with different alignments). This is whether they
| cross cache lines or not.
|
| i.e. it's as bad as the GP is saying. You can't just put
| forward one example of one processor handling each case correctly
| to dismiss this claim, because the point is that most
| processors don't bother, and those that do bother still have
| severe, crippling limitations that make it unfeasible to use
| in a general-purpose compiler.
|
| And there is still a lot of benefit to packing things up...
| but it does require way too much care and programmer
| effort.
| torusle wrote:
| > If you're dealing with very simple CPUs like the ARM
| M0, sure. But even the M3/M4 allows unaligned access.
|
| On ARM M3/M4 you have the same issue with LDRD and STRD
| instructions which do not allow unaligned access. Even the
| normal load/stores don't allow unaligned access in all
| cases. Try this in the peripheral memory region for
| starters. And things get even more complicated when the
| memory protection unit shakes up things.
| Gibbon1 wrote:
| > For starters, not all hardware platforms allow unaligned
| accesses at all.
|
| Yeah and always everywhere a mistake. It was a mistake back
| in the 1970's and it's an increasingly bigger mistake as time goes
| on. Just like big endian and 'network order'.
| jcranmer wrote:
| This line in particular really bugs me:
|
| > The present blog post brings bad, and as far as I know,
| previously undocumented news. Even if you really are targeting
| an instruction set without any memory access instruction that
| requires alignment, GCC still applies some sophisticated
| optimizations that assume aligned pointers.
|
| I could have told you this was true ~20 years ago, and the main
| reason I'm so conservative in how far back gcc has been doing
| this is that it's only around that time I started programming--
| I strongly suspect this dates back to the 90's.
| SAI_Peregrinus wrote:
| It dates to the first standardization of C in 1989. The "C as
| portable assembly" view ended when ANSI C got standardized,
| and K&R's 2nd edition was published.
| circuit10 wrote:
| For C to be portable this needs to be undefined behaviour
| because there are CPUs that don't support unaligned access
| astrange wrote:
| There are some other reasons, but that's one of them.
|
| Another is that you want to guarantee objects are stored
| aligned in memory because that gives you some free bits
| in pointers you can hide stuff in. (This has less
| hardware support than it should.)
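|
| A minimal sketch of hiding a tag in the low bits of an aligned
| pointer. All names are hypothetical. It assumes the pointee is at
| least 4-byte aligned and that converting between pointers and
| uintptr_t round-trips, which is implementation-defined but holds
| on mainstream platforms.

```c
#include <stdint.h>

/* With 4-byte alignment, the two low address bits are always zero. */
#define TAG_MASK ((uintptr_t)3)

/* Store a 2-bit tag in the unused low bits of an aligned pointer. */
void *tag_ptr(void *p, unsigned tag)
{
    return (void *)((uintptr_t)p | ((uintptr_t)tag & TAG_MASK));
}

/* Extract the tag. */
unsigned ptr_tag(void *p)
{
    return (unsigned)((uintptr_t)p & TAG_MASK);
}

/* Clear the tag; this must be done before dereferencing. */
void *untag_ptr(void *p)
{
    return (void *)((uintptr_t)p & ~TAG_MASK);
}
```

| The trick only works if every object of the type really is
| aligned, which is the guarantee the comment refers to.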
| chaboud wrote:
| That's why, while much of the linked blog is kind of off
| the mark (signs of someone knowing less than they think
| they know), the general conclusion, using aligned
| pointers is recommended, is one that I typically
| recommend to developers new to C or C++ anyway.
|
| I'm alright with folks sticking to aligned pointer
| operations, largely for performance reasons. On some
| platforms, unaligned operations are really expensive.
| j16sdiz wrote:
| I would argue that it's the modern understanding of the C
| standard that is flawed.
|
| Back in 89, many of those unspecified behaviors were
| understood as implementation/hardware dependent, not
| undefined. Aliasing was the norm, `restrict` was actually a
| keyword.
|
| Modern C is neither safe nor low-level.
| jcranmer wrote:
| Ascertaining the state of the mind of the C committee in
| 1989 is difficult, since only the documents from ~late
| 1996 are consistently available online (the earlier
| documents are probably sitting somewhere in a warehouse
| in Geneva, but they may as well not exist anymore).
|
| But definitely by the time C99 came out, it is clear that
| optimize-assuming-UB-doesn't-happen was an endorsed
| viewpoint of the committee [1]. C99 also added restrict
| to the language (not C89 as you suggest), and restrict
| was the first standardized feature that was a pure UB-
| optimization hint [2].
|
| It is important to remember that there isn't just one
| catch-all category of implementation-varying behavior.
| There is a difference between unspecified behavior,
| implementation-defined behavior, and undefined behavior.
| Undefined behavior has been understood, from its
| inception, as behavior that doesn't constrain the
| compiler, and often describes behavior that _can't_ be
| meaningfully constrained (especially with regards to
| potentially-trapping operations).
|
| [1] The C99 rationale gives an example of an optimization
| that compilers can perform that relies on assuming UB
| can't happen--reassociation of integer addition, on one's
| complement machines.
|
| [2] The register keyword is I believe even in K&R C and
| would also be qualified as a compiler hint feature, but I
| note that it prohibits taking the address of the variable
| entirely, so it doesn't rely on UB. Whereas restrict has
| to rely on "if these two variables alias, it's UB" to
| allow the compiler to optimize assuming nonaliasing.
| Someone wrote:
| > Back in 89 [...] `restrict` was actually a keyword.
|
| Was it? I thought it's more recent.
| https://en.wikipedia.org/wiki/Restrict seems to agree (
| _"In the C programming language, restrict is a keyword,
| introduced by the C99 standard,[1] that can be used in
| pointer declarations"_ ), as does
| https://en.cppreference.com/w/c/language/restrict (
| _"restrict type qualifier (since C99)"_ )
|
| Was there an older usage?
| rsaxvc wrote:
| I've used several pre-C99 embedded compilers that
| supported restrict. IIRC, probably of mid 90s vintage.
| 6D794163636F756 wrote:
| I haven't gotten to use C in industry, but I was taught
| that undefined behavior just means that it is defined by
| the running system and not the compiler. Is that not the
| general understanding? Maybe I was just taught that way
| because it was old timers teaching it.
| umanwizard wrote:
| That's indeed incorrect. Undefined behavior anywhere
| means that the entirety of your program is undefined and
| may do anything.
| gdwatson wrote:
| If the language standard leaves some behavior undefined,
| other sources (e.g., POSIX, your ABI, your standard
| library docs, or your compiler docs) are free to define
| it. If they do, and you are willing to limit your
| program's portability, you can use that behavior with
| confidence. But they also leave many behaviors undefined,
| and you can't rely on those.
|
| For implementation-defined behavior, the language
| standard lays out a menu of options and your
| implementation is required to pick one and document it.
| IMHO, many things in the C standard are undefined that
| ought to be implementation-defined. But unaligned pointer
| accesses would be hard to handle that way; at best you
| could make the compiler explicitly document whether or
| not it supports them on a given architecture.
| patrakov wrote:
| What you are talking about is implementation-defined
| behavior. It exists in the C standard separately from the
| undefined behavior.
| SAI_Peregrinus wrote:
| Implementation Defined behavior means the standards
| authors provided a list of possible behaviors, and
| compiler authors must pick one and document which they
| picked.
|
| Unspecified behavior is more what you're thinking of,
| though in that case the standard still provides a list of
| possibilities that compiler authors have to pick from,
| they just don't have to document it or always make the
| same choice for every program.
|
| There's no allowed subset of behavior where compiler
| authors are free to pick whatever they want and document
| it (but must do so). IMO there should be, most "Undefined
| Behavior" could be specified and documented, even where
| that choice would be "the compiler assumes such
| situations are unreachable and optimizes based on that
| assumption" like much of current UB. At least it'd be
| explicit!
| dale_glass wrote:
| No. See this for details on how UB is handled by
| compilers:
|
| http://blog.llvm.org/2011/05/what-every-c-programmer-
| should-...
|
| The TL;DR is that compilers compile code based on
| assumptions that UB won't be invoked. This sometimes
| produces extremely surprising results which have nothing
| to do with the hardware/OS.
| noselasd wrote:
| indeed, I still have ~20 years old code that picks up and
| rectifies unaligned memory so gcc does the right thing. To
| claim a compiler bugs out on unaligned memory sounds very
| weird, I assumed that was common knowledge.
| nomel wrote:
| My first 10 minutes of trying to talk to hardware, and then
| googling the error message, taught me.
| tom_ wrote:
| And one of the anythings permitted would be to behave in a
| documented manner characteristic of the target environment. The
| program is after all almost certainly being built to run on an
| actual machine; if you know what that actual machine does, it
| would sometimes be useful to be able to take advantage of that.
| We might not be able to demand this on the basis that the
| standard requires it, but as a quality of implementation issue
| I think it a reasonable request.
|
| This is such an obvious thing to do that I'm surprised the C
| standard doesn't include wording along those lines to
| accommodate it. But I suppose even if it did, people would just
| ignore it.
| robinsonb5 wrote:
| The problem is that what the machine does isn't necessarily
| consistent. If you're using old-as-the-green-hills integer
| instructions then yes, the CPU supports unaligned access. If
| you want to benefit from the speedup afforded by the latest
| vector instructions, now suddenly it doesn't.
|
| Also, to be fair, GCC does appear to back off the
| optimisations when dealing with, for example, a struct with
| the packed attribute.
| Dwedit wrote:
| We need an '__unaligned' modifier for pointers to specify that
| the pointer will be used for unaligned reads and writes.
| Athas wrote:
| Why not just use memcpy()?
| greesil wrote:
| Arrays?
| j16sdiz wrote:
| Because on some hardware, unaligned reads are OK, and you want
| to take advantage of that hardware feature?
| dzaima wrote:
| Both gcc and clang will optimize a memcpy to an unaligned
| load/store where possible.
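|
| The memcpy() idiom being discussed looks roughly like this.
| load_u32() and store_u32() are hypothetical names. The pointers
| are typed as unsigned char * so that the compiler has no int
| alignment to assume.

```c
#include <stdint.h>
#include <string.h>

/* Load a 32-bit value from any byte address, in host byte order.
 * memcpy has no alignment requirement on its arguments. */
uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Store a 32-bit value to any byte address, in host byte order. */
void store_u32(unsigned char *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

| On targets that allow unaligned access, optimizing compilers
| typically turn each call into a single load or store
| instruction, though that is an optimization and not a guarantee.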
| j16sdiz wrote:
| If the compiler is so smart, I guess it could insert a
| memcpy when needed?
|
| The standard, you may say... I would argue it's the
| standard that needs to be changed. The modern reading of the
| standard is not useful as a low-level language and is
| unsafe as a high-level language.
| dzaima wrote:
| Right, I agree that it would be nice to have some way to
| request unaligned load/store to be permitted, like
| -fwrapv for signed int wrapping. But nevertheless the UB
| behavior is a reasonable option that's beneficial for
| other things.
| rsaxvc wrote:
| > If the compiler is so smart, I guess it could insert a
| memcpy when needed?
|
| If I'm reading your comment and the blog post correctly,
| the compiler would need a memory like access on every
| multibyte pointer argument where the compiler cannot
| otherwise prove alignment. Is that correct?
| rsaxvc wrote:
| > memory like
|
| I meant memcpy()-like
| dzaima wrote:
| Internally, the compiler could represent it however it
| wants. LLVM IR's load/store instructions just have an
| "align" property, which is usually sizeof(the type), but
| can be set to 1 to mimic memcpy (and indeed llvm/clang
| immediately translate a memcpy to such -
| https://godbolt.org/z/7T46a6aqT).
|
| Though it seems that, independent of this, it assumes
| that an int* in general will be 4-byte-aligned, so e.g.
| https://godbolt.org/z/aWTEd4s3K still has an "align 4"
| despite using memcpy. So one must also cast to a char*
| before using memcpy() to actually have it work. yay for
| more footguns!
| IshKebab wrote:
| That is the solution they recommend.
| mtklein wrote:
| If you've got control over the type of the pointee and not just
| the pointer, __attribute__((packed)) can work for this.
|     struct foo { int x; };
|     struct bar { __attribute__((packed)) int x; };
|     _Static_assert(_Alignof(struct foo) == 4, "");
|     _Static_assert(_Alignof(struct bar) == 1, "");
| loeg wrote:
| Yeah, I end up using __attribute__((packed)) for this at
| work. For tortured reasons, part of our codebase allocates
| memory with only 8-byte alignment, but the buffer is cast to
| a type that would have 16-byte alignment without
| __attribute__((packed)). As a result Clang wants to generate
| VMOVDQAs that require 16-byte alignment, unless you use
| packed, in which case it generates VMOVDQU.
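|
| A sketch of using a packed wrapper to read through a possibly
| unaligned pointer, assuming GCC or Clang. The struct and function
| names are hypothetical. may_alias is an additional GCC/Clang
| attribute, added here on the assumption that the buffer was not
| originally declared as a uint32_t.

```c
#include <stdint.h>

/* packed lowers the struct's alignment to 1, so the compiler must not
 * assume 4-byte alignment for accesses through it. may_alias exempts
 * the type from strict-aliasing assumptions. Both are GCC/Clang
 * extensions, not ISO C. */
struct __attribute__((packed, may_alias)) unaligned_u32 {
    uint32_t value;
};

_Static_assert(_Alignof(struct unaligned_u32) == 1,
               "packed struct should have alignment 1");

/* Load a 32-bit value, in host byte order, from any byte address. */
uint32_t load_u32_packed(const void *p)
{
    return ((const struct unaligned_u32 *)p)->value;
}
```

| Unlike the memcpy() approach, this depends on compiler
| extensions, so it is less portable.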
| shadowofneptune wrote:
| The big thing seems to be less about GCC, and more a question of,
| "what should a compiler be?"
|
| He'd be better off looking at smaller, less-known compilers, like the
| Portable C Compiler or the Intel C Compiler. If you want hyper-
| optimized, better-than-assembly quality, you pretty much have to
| give up predictability. The best optimizations that are
| predictable can't be written using modern compiler theory. They
| instead involve a lot of work, care, and attention that can't be
| generalized to other architectures. It can require a love for an
| architecture, even if it's a crap one.
|
| It's a tradeoff. Not every compiler needs to be optimized, and
| not every compiler needs to embody the spirit of a language.
| dzaima wrote:
| In a project I'm working on[0], there's an array object type used
| throughout, which can sometimes point to arbitrary data
| elsewhere. In a funky edge-case[1], such an array can be built
| with an unaligned data pointer.
|
| Thus, if gcc/clang started seriously utilizing aligned pointer
| accesses everywhere, _nearly every single load & store in the
| entire project_ would have to be replaced with something
| significantly more verbose. Maybe in a more fancy language you
| could have ptr<int> vs unaligned_ptr<int> or similar, but in C
| you kinda just have compiler flags, and maybe __attribute__-s if
| you can spare some verbosity.
|
| C UB is often genuinely useful, but imo having an opt-out option
| for certain things is, regardless, a very reasonable request.
|
| [0]: https://github.com/dzaima/CBQN
|
| [1]: Any regularly allocated array has appropriate alignment. But
| there are some functions that take a slice of the array
| "virtually" (i.e. pointing to it instead of copying), and another
| one that bitwise-reinterprets an array to one with a different
| element type (again, operating virtually). This leads to a
| problem when e.g. taking i8 elements [3;7) and reinterpreting as
| an i32 array. A workaround would be to make the reinterpret copy
| memory if necessary (and this would have to be done if targeting
| something without unaligned load/store), but that'd ruin it being
| a nice O(1).
| tylerhou wrote:
| > This leads to a problem when e.g. taking i8 elements [3;7)
| and reinterpreting as an i32 array.
|
| Even ignoring alignment issues, this is already UB because it
| violates the strict aliasing rule. You technically need to
| memcpy and hope that the compiler optimizes the memcpy out. In
| C++20 you can use std::bit_cast in some circumstances.
| https://en.cppreference.com/w/cpp/numeric/bit_cast. In C11 you
| can use a union, but that still requires a "copy" into the
| union.
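|
| A minimal sketch of the union route mentioned above, assuming C
| (not C++) semantics for reading a union member other than the one
| last written. The function name i8x4_as_i32 is hypothetical, and
| the resulting value depends on the target's byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reinterpret the four int8_t elements
 * [start, start + 4) as one int32_t by copying them into a union. */
static int32_t i8x4_as_i32(const int8_t *elems, size_t start)
{
    union {
        int8_t  bytes[4];
        int32_t word;
    } u;

    for (size_t k = 0; k < 4; k++)
        u.bytes[k] = elems[start + k];   /* the "copy" into the union */

    return u.word;                       /* byte order is target-dependent */
}
```

| For the [3;7) case, i8x4_as_i32(elems, 3) reads elements 3, 4, 5
| and 6 without creating a misaligned int32_t pointer.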
| dzaima wrote:
| I'm of course already using -fno-strict-aliasing (primarily
| because without it it's impossible to implement a custom
| memory allocator, but it also helps here).
| lights0123 wrote:
| > in C you kinda just have compiler flags, and maybe
| __attribute__-s if you can spare some verbosity
|
| Which you can use to wrap the unaligned type as a packed
| struct, i.e.
|
| struct __attribute__((__packed__)) unaligned_int { int i; };
|
| which has an alignment of 1.
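|
| A short sketch of how such a wrapper might be used, assuming GCC or
| Clang (the packed attribute is a GNU extension). The record type and
| the load_int/store_int helpers are hypothetical and exist only to
| place the wrapper at an odd offset.

```c
#include <stddef.h>

/* Packed wrapper from the comment above: alignment 1, so accesses
 * through a pointer to it carry no alignment assumption. */
struct __attribute__((__packed__)) unaligned_int { int i; };

/* Hypothetical record that puts the wrapper at offset 1. */
struct __attribute__((__packed__)) record {
    unsigned char tag;
    struct unaligned_int value;
};

_Static_assert(_Alignof(struct unaligned_int) == 1,
               "packed wrapper has alignment 1");
_Static_assert(offsetof(struct record, value) == 1,
               "value sits at an odd offset");

/* The pointer type itself tells the compiler the int may be unaligned. */
static int load_int(const struct unaligned_int *p) { return p->i; }
static void store_int(struct unaligned_int *p, int v) { p->i = v; }
```

| Functions that take struct unaligned_int* instead of int* play
| roughly the role of the unaligned_ptr<int> idea, at the cost of the
| extra ->i at each access.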
| non-e-moose wrote:
| Unaligned pointer accesses are for 80386 bozos. Period. End of
| story. If you want to play in 64-bit land, live by the
| architectural rules. If you do not, your code will likely die.
| And you need to "Lurn" a lot
| dang wrote:
| Discussed at the time:
|
| _GCC always assumes aligned pointer accesses_ -
| https://news.ycombinator.com/item?id=22887685 - April 2020 (91
| comments)
| JoeAltmaier wrote:
| Machine language architecture is flawed. The assumption of
| alignment is in the machine language.
|
| Compilers can only use the instructions that are there. They have
| a difficult choice: close their eyes and generate aligned-pointer
| moves, or use a sequence of tests and partial move instructions
| that is orders of magnitude less efficient.
|
| We've needed machine instructions to load or move memory
| efficiently regardless of alignment, for decades.
| titzer wrote:
| In retrospect, too many crazy software dances are due to
| miserliness on the part of hardware designs. Saving a couple
| bits in the address lines and not needing to straddle cache
| lines? We're far past the point where that's a considerable
| cost, as all modern ISA implementations now attest.
| phkahler wrote:
| If you're going to read byte-level data you should be using a
| char pointer.
|
| The author also speculates on how common this "bug" is. I'd say
| 15000 Debian packages that work properly indicates that just
| about nobody is relying on this undefined behavior.
| dzaima wrote:
| For one, the linux kernel itself relies on this UB -
| https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93031#c5, linked
| from OP.
|
| And indeed linux would hit UBSAN reports on those, but just
| disables it if native unaligned load/store is configured - last
| paragraph of
| https://github.com/torvalds/linux/blob/706a741595047797872e6...
|
| At present there's almost no reliance on this UB by compilers
| on something that actually has a chance of affecting real code,
| so it's not particularly unexpected that software appears to
| work.
| [deleted]
| unnah wrote:
| Getting to that point has required years of maintenance work
| since compiler writers started interpreting more and more
| undefined behaviour as optimization opportunities. At least now
| we have UBSAN to actually test for undefined behaviour at
| runtime.
| assbuttbuttass wrote:
| Undefined behavior doesn't necessarily mean the program will
| exhibit an issue. It could silently break with a future version
| of the compiler, which it sounds like was the case here.
| anonymousiam wrote:
| The SPARC cc compiler has (had) a -misalign flag. Maybe the GCC
| guys could support the same option?
|
| https://docs.oracle.com/cd/E19957-01/805-4952/6j4mdcegh/inde...
| tedunangst wrote:
| Compilers should just add a --k&r mode to appease these people,
| but it should also reject ansi code with parse errors.
| iso8859-1 wrote:
| If a --k&r mode were to be reliable, wouldn't it need to get
| specified first? Otherwise people would start relying on some
| edge case.
|
| If speed is not a requirement for the --k&r mode, you could
| just take the tis-interpreter and note that if it runs without
| UB, it is still much faster than an actual computer was when
| k&r were active.
|
| Would it even be possible to specify a variant of C that
| contains no UB (e.g. would define exactly what happens on
| unaligned access), but can compile practical existing C89
| programs? I wonder if it could be written such that it could
| actually specify the behaviour consistently across the language
| intersection supported by both of e.g. GCC 2.95 and Chibicc[0].
|
| Or maybe there are so many bugs in GCC 2.95 that it would
| simply be infeasible? How much time would it take to specify?
|
| [0]: https://github.com/rui314/chibicc
| tedunangst wrote:
| It could probably only be used with a neural link. It would
| read your mind, then emit code that matches your perception
| of what you imagine old compilers did.
| zokier wrote:
| You jest, but there was some real effort put in to attempt to
| define a dialect of C that would have less UB, etc. And indeed
| the big problem was defining the semantics:
|
| > After publishing the Friendly C Proposal, I spent some
| time discussing its design with people, and eventually I
| came to the depressing conclusion that there's no way to
| get a group of C experts -- even if they are knowledgable,
| intelligent, and otherwise reasonable -- to agree on the
| Friendly C dialect. There are just too many variations,
| each with its own set of performance tradeoffs, for
| consensus to be possible.
|
| https://blog.regehr.org/archives/1287
| WalterBright wrote:
| The D programming language does not allow the creation of
| misaligned pointers in code marked as @safe, and in @safe code
| assumes they are aligned. In @system code you can do whatever you
| like, but things need to be aligned that are provided to @safe
| code.
| dzaima wrote:
| Doesn't look like that changes anything about actual
| dereferencing though, which is the primary thing discussed -
| https://godbolt.org/z/4vW5Ksnab still emits an "align 4", which
| llvm could still assume as UB if violated (though I don't know
| if it ever does).
| compiler-guy wrote:
| A good article on this topic is Chris Lattner's "What Every C
| Programmer Should Know About Undefined Behavior."
|
| http://blog.llvm.org/2011/05/what-every-c-programmer-should-...
| azakai wrote:
| An example of a recent compile target that breaks on unaligned
| pointer accesses was asm.js. There, a 32-bit read turns into a
| read from a JavaScript Int32Array like this:
|
| HEAP32[ptr >> 2]
|
| The k-th index in the array contains 4 bytes of data, so the
| pointer to an address must be divided by 4, which is what the >>
| 2 does. And >> 2 will "break" unaligned pointers because it
| discards the low bits.
|
| In practice we did run into codebases that broke because of this,
| but it was fairly rare. We built some tools (SAFE_HEAP) that
| helped find such issues. In the end it may have added some work
| to a small number of ports, but very few I think.
|
| asm.js has been superseded by WebAssembly, which allows unaligned
| accesses, so this is no longer a problem there.
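|
| To illustrate the HEAP32[ptr >> 2] indexing described above, here
| is a small C model. The heap32 array and the heap32_read function
| are hypothetical stand-ins, not Emscripten APIs.

```c
#include <stdint.h>

/* Hypothetical model of the asm.js heap as a word-indexed view,
 * the way HEAP32 is an Int32Array over the heap buffer. */
static int32_t heap32[4] = {
    0x11111111, 0x22222222, 0x33333333, 0x44444444
};

/* Models HEAP32[ptr >> 2]: the shift drops the two low address bits,
 * so byte addresses 4, 5, 6 and 7 all select word index 1. */
static int32_t heap32_read(uint32_t ptr)
{
    return heap32[ptr >> 2];
}
```

| A read at byte address 5 therefore silently returns the word at
| address 4 rather than the four bytes starting at 5.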
| titzer wrote:
| > WebAssembly, which allows unaligned accesses
|
| And I think we made the right call (other than the vestigial
| alignment bits in load/store immediates, which AFAIK, no engine
| is making use of).
| banthar wrote:
| This is how all undefined behavior works. It seems to be working
| now but breaks with a new CPU, a new GCC version, or on the wrong
| moon phase.
|
| "-Wcast-align=strict" will work in this but not all cases -
| that's why we have UBSAN:
|
| $ gcc -fsanitize=undefined test.c
| $ ./a.out
| test.c:6:6: runtime error: store to misaligned address 0x55e4007adeb1 for type 'int', which requires 4 byte alignment
| 0x55e4007adeb1: note: pointer points here
| 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 51 00 00 00 00
| teo_zero wrote:
| In other words, the result they got is different from the
| expected one. The keyword here is "expected": if your code
| contains a part that generates undefined behavior according to the
| standard, then you should have no expectations. What's worth
| mentioning in this blog post?
| MUSIC_NERD wrote:
| [flagged]
| DannyBee wrote:
| They are confused, and seem not to realize that ABIs exist, and
| often specify alignment requirements. They seem to believe there
| are just ISA and architecture specs.
|
| When you compile for Linux x86_64 ABI, gcc assumes that the stack
| is 16 byte aligned because it's required by the ABI.
|
| Regardless of whether the ISA needs it.
|
| If they want the compiler to make no assumptions about aligned
| accesses, they would need to define an ABI in GCC that operates
| that way and compile with it. They were historically supported
| (though it's been years since I looked).
| quelsolaar wrote:
| Storing pointers unaligned and using memcpy to extract them to an
| aligned pointer can be a performance gain if it means less
| padding taking up valuable cache space.
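|
| A rough sketch of that idea, assuming a table of (tag, pointer)
| entries. The entry layout and all names here are hypothetical;
| whether it actually helps depends on the access pattern and target.

```c
#include <stddef.h>
#include <string.h>

/* Natural layout: padding after `tag` keeps `ptr` aligned. */
struct padded_entry {
    unsigned char tag;
    void *ptr;
};

/* Hypothetical packed layout: one tag byte followed directly by the
 * pointer's bytes, with no padding between entries. */
enum { PACKED_ENTRY_SIZE = 1 + sizeof(void *) };

static void packed_entry_write(unsigned char *buf, size_t idx,
                               unsigned char tag, void *ptr)
{
    unsigned char *e = buf + idx * PACKED_ENTRY_SIZE;
    e[0] = tag;
    memcpy(e + 1, &ptr, sizeof ptr);   /* pointer bytes stored unaligned */
}

static void *packed_entry_ptr(const unsigned char *buf, size_t idx)
{
    void *ptr;                          /* aligned local copy */
    memcpy(&ptr, buf + idx * PACKED_ENTRY_SIZE + 1, sizeof ptr);
    return ptr;
}
```

| On a typical 64-bit target each packed entry takes 9 bytes where
| struct padded_entry takes 16, so more entries fit per cache line.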
| DougMerritt wrote:
| In the right circumstances, you're very right -- but most
| people will get some aspect of this wrong.
| NotYourLawyer wrote:
| Yes, compilers take advantage of undefined behavior.
___________________________________________________________________
(page generated 2023-08-20 23:01 UTC)