hngopher.com

       [HN Gopher] ABI Mistakes
       ___________________________________________________________________
        
       ABI Mistakes
        
       Author : yagizdegirmenci
       Score  : 118 points
       Date   : 2021-05-08 17:34 UTC (5 hours ago)
        
 (HTM) web link (elronnd.net)
 (TXT) w3m dump (elronnd.net)
        
       | gumby wrote:
       | > Unfortunately, that doesn't work anymore; compilers are smart
       | now, and they don't like it when objects alias.
       | 
       | Let's be specific: compilers for languages that support aliasing.
       | For example, FORTRAN does not permit aliasing and therefore has
       | all sorts of optimizations that languages that do permit aliasing
       | cannot have.
       | 
       | It's a tradeoff like any other, and isn't specific to a compiler
       | beyond the fact that a given compiler X can compile language Y.
        
         | varajelle wrote:
         | But Fortran has its own ABI, while I believe this article is
         | only about the C ABI.
        
       | cokernel_hacker wrote:
       | My reading of the C++ standard is that this behavior is
       | effectively mandated and that one can write a program which can
       | tell if an ABI observed the proposed optimization.
       | 
       | [expr.call]: "The lvalue-to-rvalue, array-to-pointer, and
       | function-to-pointer standard conversions are performed on the
       | argument expression."
       | 
       | [conv.lval]: "... if T has a class type, the conversion copy-
       | initializes the result object from the glvalue."
       | 
       | The way a program can tell if a compiler is compliant to the
       | standard is like so:
       | 
       |     struct S { int large[100]; };
       |     int compliant(struct S a, const struct S *b);
       |     int escape(const void *x);
       |     int bad() {
       |       struct S s;
       |       escape(&s);
       |       return compliant(s, &s);
       |     }
       |     int compliant(struct S a, const struct S *b) {
       |       int r = &a != b;
       |       escape(&a);
       |       escape(b);
       |       return r;
       |     }
       | 
       | There are three calls to 'escape'. A programmer may assume that
       | the first and third calls to escape observe a different object
       | than the second call to escape, and they may assume that
       | 'compliant' returns '1'.
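The snippet above only declares `escape`. A minimal runnable sketch, assuming a hypothetical `escape` that simply records each address it is handed (the `seen` array and `n_seen` counter are illustrative names, not from the comment), makes the check concrete:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct S { int large[100]; };

/* Addresses are stored as integers so they can still be compared
 * after the objects they pointed to have gone out of scope. */
static uintptr_t seen[3];
static size_t n_seen;

/* Hypothetical definition of escape: record the address, nothing more. */
int escape(const void *x) {
    assert(n_seen < 3);              /* this sketch expects three calls */
    seen[n_seen++] = (uintptr_t)x;
    return 0;
}

int compliant(struct S a, const struct S *b) {
    int r = &a != b;                 /* is the parameter a distinct object? */
    escape(&a);
    escape(b);
    return r;
}

int bad(void) {
    struct S s = {{0}};
    escape(&s);
    return compliant(s, &s);
}
```

With an implementation that copies the argument as the quoted wording describes, `bad()` returns 1, the first and third recorded addresses are equal (both are `&s`), and the second differs (it is the address of the copy `a`).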
        
         | moonchild wrote:
         | The compiler would be forced to create copies in that case. In
         | general (using my proposed ABI), taking the address of an
         | object will cause this, because it is possible to mutate an
         | object through its address.
         | 
         | It's still a win because 1) you can avoid making copies in many
         | places, and 2) code size decreases because the copy happens one
         | time in the callee rather than many times for every caller.
        
           | gpderetta wrote:
           | So the caller might have to copy if the pointer escapes
           | and the callee might have to copy if it needs to mutate the
           | value. In practice in many cases you might end up with more
           | copies than the "bad" ABI.
        
         | MauranKilom wrote:
         | But similar things can already happen where copy elision is
         | optional. It's one of the niches the C++ standard carves out
         | regarding the as-if rule, and adding one for this new purpose
         | is conceivable.
        
           | gpderetta wrote:
           | Copy elision is very different as it only ever avoids
           | copying objects that are just about to be destroyed.
        
       | stephc_int13 wrote:
       | Am I the only one reading this article as disguised PR for a new
       | trendy language?
        
         | mort96 wrote:
         | Yes. It's an article talking about a better calling convention
         | for C, the opposite of a new trendy language.
        
       | mssundaram wrote:
       | Speaking just on the design of the article, Firefox Reader mode
       | is very useful here
        
         | SixDouble5321 wrote:
         | That's funny. I read this in a bright environment with
         | sunglasses, and I had to use Firefox reader mode to even see it.
        
       | haneefmubarak wrote:
       | Out of curiosity: wouldn't doing this make semantics between C
       | and C++ a lot nastier since in C++ passing a copy of an object by
       | value implicitly calls the copy constructor from the caller
       | function?
       | 
       | Still, definitely an interesting optimization. I can definitely
       | see a place to use this optimization myself in the near future
       | (custom new ABI) either way.
        
         | zabzonk wrote:
         | You can't pass a C++ object with a user-defined copy
         | constructor or a non-trivial default to a C function. If you
         | want to pass an object from C++ to C it has to be POD (plain
         | old data) i.e. effectively a C struct.
        
           | haneefmubarak wrote:
           | Sure, but I was thinking more of how C++ calling conventions
           | typically tend to follow C calling conventions on a given
           | architecture, so AIUI you'd either have to have a
           | differentish calling convention or inefficiently require
           | callees to make copies of already copied objects.
           | 
           | Not that there's anything intrinsically wrong with having a
           | differentish calling convention for C and C++ - it'd just add
           | a smidgen more complexity is all.
        
       | nneonneo wrote:
       | Hmm, I don't agree. If you pass a structure by value, that is
       | _supposed_ to make a copy. The callee is free to modify the copy
       | any way it likes (unless you use const, and that's not much
       | protection in C). Doing as the author suggests, and passing in an
       | immutable parameter, would introduce copy-on-write semantics.
       | 
       | If the concern is the overhead of copying large structures
       | around, the obvious solution is to not do that and simply pass
       | structure pointers (like most APIs do). This also gives the API
       | implementer (i.e. callee) full control over whether copies get
       | made or not.
       | 
       | In fact, I think I'd argue that if you're passing large
       | structures by value, Your API Is Probably Wrong and this is
       | absolutely not the ABI's problem in the first place.
        
         | derefr wrote:
         | > Doing as the author suggests, and passing in an immutable
         | parameter, would introduce copy-on-write semantics.
         | 
         | Yes, and what's wrong with that, if it's transparently done by
         | the compiler?
         | 
         | In this scheme, the caller would pass non-register-sized data
         | "by value" by actually passing a pointer to it. (Importantly,
         | this pointer _is_ register-sized, and so wouldn't necessarily
         | need to be spilled to the stack--unlike caller memory copy,
         | which is always to the stack.)
         | 
         | A callee that can be statically determined to never modify the
         | value, would, under this calling convention, be compiled to
         | code that simply works with the data "through" the pointer
         | that's been placed in its register / on the stack.
         | 
         | A callee that _can't_ be statically determined to never modify
         | the value, would, under this calling convention, generate a
         | memcpy _from_ the pointer _onto_ the stack; where the pointer
         | then goes dead at that point (and so, if the pointer was
         | spilled to the stack by the caller, then the memcpy could be
         | targeted so as to overwrite the pointer on the stack.)
         | 
         | This would be much more efficient under most conditions. Its
         | only inefficiency would come from the situation where the
         | pointer being passed would necessarily be spilled to the stack;
         | and then the callee would be statically guaranteed to make a
         | copy. In that case, you're doing an extra push (for the
         | pointer) that current ABIs avoid by just having the caller do
         | an eager copy.
         | 
         | But this would be quite rare in practice, since this ABI (and
         | the ABIs it replaces) are only for _external symbols_ -- the
         | kind whose linkage is fundamentally dynamic (i.e. where the
         | linker-loader could in theory substitute anything it likes for
         | the external symbol with an LD_PRELOAD shim.)
         | 
         | Internal symbols -- private static functions -- get bespoke
         | compiler-specific ABIs that skip all this enforced
         | caller/callee predetermined role business and just codegen
         | whatever's most efficient on a case-by-case basis, with
         | different callsites getting different monomorphizations or
         | partial inlinings of the callee that distribute the
         | responsibilities differently.
         | 
         | And since these ABIs only matter for external linkage, you have
         | to remember that symbols with external linkage get no "type
         | attribute enforcement" at link-load time, and so your symbol
         | for a function that was originally guaranteed to be a callee-
         | that-always-writes, _might_ be substituted at link-load time
         | for a callee-that-never-writes. In which case, having the
         | pointer on the callee's stack once again becomes handy, rather
         | than useless.
         | 
         | > In fact, I think I'd argue that if you're passing large
         | structures by value
         | 
         | Not necessarily _large_ structures. More often things like
         | UUIDs--values _just_ large-enough to not fit into a (64-bit)
         | machine register. It's these _small_ memory spills, done in
         | hot loops, that add up. There are huge wins for the cleanliness
         | of e.g. RDBMS engine code, if their various 128-bit to 512-bit
         | column types can be passed by value without necessitating an
         | eager copy.
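The two callee cases described above can be hand-lowered into ordinary C to show what the proposed convention would amount to. This is only a sketch under stated assumptions: `uuid128` and all three function names are hypothetical, and a real compiler would make this choice internally rather than in source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit value type: too big for one 64-bit register. */
typedef struct { uint8_t bytes[16]; } uuid128;

/* Source-level signature: int first_byte(uuid128 id);
 * The callee never writes to its parameter, so it can work with the
 * data through the pointer the caller passed. */
int first_byte_lowered(const uuid128 *id) {
    return id->bytes[0];
}

/* Source-level signature: int bump_first_byte(uuid128 id);
 * The callee modifies its parameter, so it first copies from the
 * pointer onto its own stack; the incoming pointer is dead afterwards. */
int bump_first_byte_lowered(const uuid128 *id) {
    uuid128 local;
    memcpy(&local, id, sizeof local);
    local.bytes[0] += 1;
    return local.bytes[0];
}

/* Usage check: neither callee disturbs the caller's value. */
int uuid_demo(void) {
    uuid128 u = {{41}};
    assert(first_byte_lowered(&u) == 41);
    assert(bump_first_byte_lowered(&u) == 42);
    return u.bytes[0];               /* the caller's byte is unchanged */
}
```

Only the second callee pays for a copy, and it pays once in its own body rather than at every call site.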
        
           | shawnz wrote:
           | If you can effectively do that static analysis, why does it
           | matter whether you pass by value or by reference? Can't you
           | do the analysis either way? So why not keep telling people to
           | pass by reference, just in case their compiler is not so
           | advanced, and when you do have access to advanced static
           | analysis, then just use it to improve the pass-by-reference
           | case.
        
           | andi999 wrote:
           | A problem would be that performance becomes brittle. So you
           | change the struct in your function and suddenly the program
           | is slower.
        
             | Negitivefrags wrote:
             | That war was lost decades ago.
        
           | temac wrote:
           | Other parts of the program can modify the object after the
           | callee has started, if the callee contains some synchro there
           | would be no race. At this point maybe you can think of
           | additional tricks like CO"W" before any potential synchro
           | point if some exist, but I'm not sure big advantages would
           | remain except if you very carefully craft your program
           | knowing all of that for the opti to happen, and it seems a
           | lot of complexity in the compiler for limited gains,
           | especially since if you have to massage the code you could as
           | well transform to a reference yourself at source level (and
           | that would better resist to e.g. calling unknown functions
           | from the callee)
        
             | Negitivefrags wrote:
             | In order for another part of the program to do that, you
             | would already have had to generate a _mutable_ pointer to
             | the object somewhere. As the article actually already goes
             | into, this is something the compiler already tracks in
             | order to know if a value needs to be loaded from memory
             | when using it again.
             | 
             | If the ABI specified that objects passed this way were
             | passed by immutable pointer, then just passing the object
             | using this new method doesn't generate a mutable pointer,
             | so the compiler is free to assume nobody else can modify
             | it.
        
               | temac wrote:
               | But that means the caller also has some proofs to do, if
               | a counter example is found or nothing can be proven the
               | caller _also has to do a copy_ , and that's with no
               | guarantee the callee won't do another one...
               | 
               | So not really an optimization anymore.
        
               | Negitivefrags wrote:
               | You say "proofs" but, as I said, this is already
               | something the compiler keeps track of.
               | 
               | The caller is required to do a copy as the ABI is right
               | now so if you can't do this optimisation it's only as bad
               | as it was before.
               | 
               | Yes, there is a case where this creates one extra copy.
               | The case where you a) give away a mutable pointer to a
               | variable before calling another function, and b) the
               | called function modifies its own argument and then c)
               | that function then doesn't use that object as the
               | argument for another function it calls.
               | 
               | That's the only case where it generates an extra copy,
               | and I'm going to go out on a limb and say that I'm almost
               | certain that this will remove significantly more copies
               | than it creates.
        
         | CyberRabbi wrote:
         | > and passing in an immutable parameter, would introduce copy-
         | on-write semantics.
         | 
         | Yes that's the point the author is making. His scheme incurs
         | strictly less overhead than the scheme currently used.
         | 
         | Some people may want to program in a "value oriented" way
         | without pointers at the C level for various reasons but for
         | efficiency reasons they cannot.
        
         | Negitivefrags wrote:
         | The explanation by derefr is correct.
         | 
         | But there is a larger assumption that underlies your post that
         | I think is important to correct.
         | 
         | You seem to be caught up in the idea that the code you write,
         | and the code the compiler generates must have some kind of
         | one-to-one correspondence, but this isn't the case at all.
         | 
         | The code the compiler generates must have the same final
         | result, but it doesn't have to actually work the same way.
         | 
         | So if you pass a structure by value, and the compiler can avoid
         | a copy somehow because it's more efficient, then of course it
         | should do so.
         | 
         | Making an ABI that is amenable to allowing the compiler to
         | make better optimisations is of course also a good idea. And
         | the proposed ABI here achieves that.
        
           | saagarjha wrote:
           | Dropping one-to-one correspondence is fine for most code, but
           | there are certain places where it is unacceptable. An ABI is
           | one of these places, because the interface is designed to be
           | exposed in a way that is observable.
        
           | __jem wrote:
           | > The code the compiler generates must have the same final
           | result, but it doesn't have to actually work the same way.
           | 
           | You're obviously correct, but I think this ignores downstream
           | costs of sufficiently magical compilers. If my compiler
           | tunnels into a parallel universe, executes the code there,
           | and returns the result, that's fine, but it's going to be
           | enormously painful to debug.
           | 
           | Obviously very few developers (including myself) have an
           | appropriate mental model of highly complex, speculative
           | processor architectures that end up executing their code, but
           | I do think there is at least _some_ benefit to having a
           | correspondence between the code you write and the code the
           | compiler generates.
           | 
           | Maybe it's not a problem with the compiler, but a problem
           | with languages. Maybe we just need better languages?
        
             | msbarnett wrote:
             | > You're obviously correct, but I think this ignores
             | downstream costs of sufficiently magical compilers. If my
             | compiler tunnels into a parallel universe, executes the
             | code there, and returns the result, that's fine, but it's
             | going to be enormously painful to debug.
             | 
             | > Obviously very few developers (including myself) have an
             | appropriate mental model of highly complex, speculative
             | processor architectures that end up executing their code,
             | but I do think there is at least some benefit to having a
             | correspondence between the code you write and the code the
             | compiler generates.
             | 
             | It's not really clear to me why you think this? Or at
             | least, it's not clear to me how you can write this without
             | levelling the same complaint against essentially any C
             | compiler written after the mid 1980s?
             | 
             | Because for all the hand-wringing in the comments here
             | about compilers generating code that doesn't do what your C
             | said it would do in the way you wrote it, that ship sailed
             | _decades_ ago.
             | 
             | That quaint little integer addition loop you write in C
             | will, as like as not, be exploded by the compiler into an
             | unrolled stream of SIMD instructions and packed registers
             | that doesn't resemble what you wrote in the least, which
             | the CPU may then further scramble and interleave, resulting
             | in nothing remotely like what a naive person might imagine
             | is a reasonable translation of what you wrote - and yet,
             | debuggers remain broadly useful tools.
             | 
             | And people have for decades now shown a strong desire to
             | enthusiastically and with great speed drop compilers in
             | favour of newer ones that performed better optimizations,
             | which is to say, people have shown a strong preference for
             | compilers that are good at generating code that looks
             | nothing like what you wrote, provided the results are the
             | same and the performance is as fast as possible.
             | 
             | Turning a copy into passing a pointer that may later copy
             | if and only if the compiler sees the possibility of a write
             | coming is, in terms of modern optimizations, an absolutely
             | conservative optimization that doesn't routinely happen
             | only because platform ABIs unnecessarily rule it out. Far
             | from being a bridge too far, in terms of modern compiler
             | behaviour it's simple and low-hanging fruit which isn't
             | exploited even though far more exotic tricks are already
             | the rule of the day.
        
             | cma wrote:
             | Doesn't this hurt forward compatibility to new
             | architectures where there are more registers and it would
             | be more efficient doing it through registers? Seems like
             | automatic choice by the compiler should be default and if
             | explicit control is needed some kind of attribute or
             | something could be used to override the compiler's choice.
        
               | zorgmonkey wrote:
               | Not really, because ABIs are already architecture
               | specific, so they already do things differently for other
               | CPUs. Similarly, there are already a variety of
               | attributes that can be used to explicitly control the ABI.
               | Off the top of my head, things like packed structs and
               | function calling conventions are common, and exist in
               | several languages (C, C++, Rust and Zig).
        
       | gpderetta wrote:
       | #include <cassert>
       | 
       | struct X { int i; };
       | X* P;
       | void bar(X x) {
       |   P->i = 1;
       |   assert(x.i == 0); // fails with transparent pass by ref
       | }
       | void foo() {
       |   X x{0};
       |   P = &x;
       |   bar(x);
       | }
        
         | Negitivefrags wrote:
         | The compiler knows that there is a mutable pointer to x
         | floating around as soon as you did `P=&x`. Therefore it knows
         | it needs to make a new copy of x and then pass an immutable
         | pointer to that value when calling `bar`.
         | 
         | This is something the compiler already keeps track of anyway in
         | order to be able to know if it needs to re-load variables from
         | memory when they are used again after a function call.
         | 
         | Basically the optimisation degrades to exactly what happened
         | before in the case where you do this.
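To make the hazard and the fallback concrete, here is a hand-lowered C sketch of the example above. The names `bar_lowered`, `foo_naive`, `foo_with_copy` and `alias_demo` are illustrative assumptions; the callee returns what it observes instead of asserting, so both outcomes can be inspected.

```c
#include <assert.h>

struct X { int i; };

static struct X *P;   /* global mutable pointer, as in the example above */

/* Hand-lowered callee: the by-value argument arrives as a pointer that
 * the callee does not write through. It returns the value it sees in
 * x->i after the write through P. */
static int bar_lowered(const struct X *x) {
    P->i = 1;
    return x->i;
}

/* Naive caller: passes &x even though &x has already escaped into P. */
int foo_naive(void) {
    struct X x = {0};
    P = &x;
    return bar_lowered(&x);     /* by-value semantics are broken here */
}

/* Careful caller: &x has escaped, so pass a pointer to a fresh copy. */
int foo_with_copy(void) {
    struct X x = {0};
    P = &x;
    struct X tmp = x;
    return bar_lowered(&tmp);   /* the callee sees the original value */
}

/* Usage check for the two callers. */
void alias_demo(void) {
    assert(foo_naive() == 1);
    assert(foo_with_copy() == 0);
}
```

The careful caller does the same work as today's convention: one copy made by the caller, which is the "degrades to exactly what happened before" case.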
        
           | gpderetta wrote:
           | In many cases you might need to make two copies. Also
           | remember, the author proposes to change the ABI to encourage
           | pass by value instead of pass by restricted reference, so
           | in the original code there might be zero copies.
           | 
           | In practice this optimization would be very fragile (like
           | most optimizations relying on escape analysis) and people
           | would keep passing by reference in fear that a small change
           | like taking the address of an object might force a copy of a
           | large struct.
        
         | mort96 wrote:
         | Well, no, that's just one of the situations where the compiler
         | would have to make a copy in the callee; a write to a pointer
         | which might alias the argument.
        
       | temac wrote:
       | For a copy on write const ref ABI trick to work, the referenced
       | area has to be stable while the callee is running. Good luck with
       | guaranteeing that in C or C++ (and no, it would not necessarily be
       | a race for the refed object to be concurrently unstable, the
       | called function could have internal synchro...)
       | 
       | So I think this idea is virtually impossible. Your ABI is
       | probably not wrong, but your blog post probably is :P
        
         | mort96 wrote:
         | Well, no. The callee would just have to make a copy in cases
         | where the memory might change.
         | 
         | The compiler would have to produce a copy in the case of this
         | function:
         | 
         |     void foo1(struct large l) {
         |         bar();
         |         baz(l.x + l.y);
         |     }
         | 
         | Because the call to `bar()` might change the caller's copy, so
         | the callee `foo1` has to make a local copy before calling
         | `bar()`.
         | 
         | But it would not have to produce a copy in this example:
         | 
         |     void foo2(struct large l) {
         |         baz(l.x + l.y);
         |     }
         | 
         | Because nothing can change the caller's copy before the callee
         | `foo2` is done using it.
        
           | Negitivefrags wrote:
           | This isn't correct.
           | 
           | The compiler would not have to produce a copy in either case.
           | 
           | The call to bar() can not change the value of l because the
           | person who called foo1 gave you an immutable pointer to it.
           | 
           | How did foo1's caller know it was immutable?
           | 
           | If the value passed to foo1 was a local variable, or one of
           | its arguments passed to it by immutable pointer, then all
           | the compiler has to do is check if, in that function, it
           | created a mutable pointer to the value and put it somewhere
           | accessible to someone else. If it did not, then it can just
           | pass the pointer right through.
           | 
           | If it did create a mutable pointer, or if the value came from
           | some other place the compiler has no knowledge about, then it
           | would make a copy of the value before passing a pointer to
           | foo1, just like it would do for the old ABI.
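The "pass the pointer right through" case can be sketched in hand-lowered C. All names here (`struct large`'s layout, `sum_lowered`, `forward_lowered`, `caller`, `forward_demo`) are hypothetical, chosen only to show a chain of by-value calls in which no mutable pointer to the value is ever created.

```c
#include <assert.h>

/* Hypothetical large struct: 64 ints. */
struct large { int x, y; int pad[62]; };

/* Leaf callee, lowered: it only reads, so it needs no copy. */
static int sum_lowered(const struct large *l) {
    return l->x + l->y;
}

/* Middle function; at source level it takes `struct large l` by value
 * and passes l by value again. It received l by immutable pointer and
 * never exposed a mutable pointer to it, so it forwards the pointer. */
static int forward_lowered(const struct large *l) {
    return sum_lowered(l);
}

/* Outermost caller: v is a local whose address is never handed out as
 * a mutable pointer, so it can pass &v without copying. */
int caller(void) {
    struct large v = { .x = 2, .y = 3 };
    return forward_lowered(&v);
}

/* Usage check. */
void forward_demo(void) {
    assert(caller() == 5);
}
```

Under a copy-at-every-call convention the same chain would copy the whole struct twice; in this sketch it is never copied.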
        
       | foobiekr wrote:
       | "A correctly-specified ABI should pass large structures by
       | immutable reference"
       | 
       | There is no such thing at the level that the ABI works,
       | especially at a kernel-userland boundary (where the choice is to
       | fully marshal the arguments or accept that you have a TOCTOU
       | issue).
        
         | user-the-name wrote:
         | An ABI is a contract. If it wants to say that a pointer is an
         | immutable reference, all it needs to do is to say "this pointer
         | is an immutable reference", and specify what that means. It is
         | up to those who follow that contract to follow those rules.
         | 
         | Your ABI could say a pointer may only be accessed on Wednesdays
         | if it really wanted to.
        
           | omnicognate wrote:
           | Wednesdays in what timezone and by what clock?
        
             | varajelle wrote:
             | Even if we assume the computer's own internal clock, we
             | would have a race condition if the computer goes to sleep
             | on Wednesday afternoon just after a call to that function
             | is done, and wakes up the next day. Sounds like a really
             | impractical ABI.
        
         | legulere wrote:
         | Callee saved registers kind of work like that. You either don't
         | touch those registers, or you restore them before returning.
        
       | leni536 wrote:
       | I think clang's noescape attribute for pointer function
       | parameters is somewhat related. There are several compiler
       | extensions that allow refining calling conventions, so clearly
       | even compiler vendors think that there are things to explore
       | here.
       | 
       | Having said that I think the suggestion would observably break
       | both the current C and C++ standards, so something like this
       | shouldn't be done without an explicit attribute.
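For reference, a small sketch of how the Clang `noescape` parameter attribute mentioned above is spelled. The `NOESCAPE` macro, `struct large` and `sum_fields` are illustrative assumptions; the attribute is an annotation about the pointer not being retained and does not by itself change how the argument is passed.

```c
#include <assert.h>

/* Use the attribute where the compiler reports support for it and
 * fall back to nothing elsewhere. */
#if defined(__has_attribute)
#  if __has_attribute(noescape)
#    define NOESCAPE __attribute__((noescape))
#  endif
#endif
#ifndef NOESCAPE
#  define NOESCAPE
#endif

/* Hypothetical large struct. */
struct large { int x, y; int pad[62]; };

/* The callee promises not to keep the pointer anywhere that outlives
 * the call. */
int sum_fields(NOESCAPE const struct large *l) {
    return l->x + l->y;
}

/* Usage check. */
void noescape_demo(void) {
    struct large v = { .x = 2, .y = 3 };
    assert(sum_fields(&v) == 5);
}
```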
        
       | hctaw wrote:
       | I'm reminded of Chesterton's Fence in this.
       | 
       | Every major ABI is listed here as containing the same mistakes.
       | I'm inclined to think the people who designed these ABIs were
       | smart enough to understand the consequences of their design
       | decisions.
       | 
       | I don't know whether this author is correct or not, but my gut is
       | there is something missing here with respect to non local control
       | flow (like exception handling, setjmp/longjmp, and fibers).
        
         | lostcolony wrote:
         | I love seeing others bring up Chesterton's fence; it's been a
         | reference that comes to mind with quite a lot of the WTFery
         | I've encountered in my career (usually it remains WTFery even
         | when looking for underlying reasons, but it at least helps
         | remind me to question my instincts).
         | 
         | I don't really know enough to weigh in on this, but I can say
         | that having pursued a lot of WTFish things in my career so far,
         | 90% of the times I've encountered bad decisions, the
         | explanation for it was either "it was done that way because
         | legacy reasons" (i.e., it had to be done that way then, the
         | reason it had to be has changed, and now it would break things
         | to do it 'correctly') or "it was easier" (i.e., at the time the
         | badness wasn't really going to affect anyone, or not
         | measurably, or was very intentional tech debt, and it's only
         | 'now' that anyone is noticing/caring).
        
           | david422 wrote:
           | I've seen people make bad architectural decisions that now
           | the company is stuck with. And it comes down to just the fact
           | that it was a bad decision, no second guessing needed.
           | 
           | I've also seen "bad" decisions made due to outside
           | constraints. These decisions look like bad decisions, except
           | that if you try to "fix" those decisions, it becomes a lot
           | harder than it looks.
        
           | rsj_hn wrote:
           | Yup, there's also time dependence. Perhaps someone wrote some
           | software in COBOL that is hard to maintain now. But rewriting
           | it may not be worth the opportunity cost now, especially for
           | well-tested systems that have been around for a long time and
           | which have critical failure modes. Sometimes it's better to
           | leave things alone and work around them, even if it results
           | in an uglier design.
        
           | derefr wrote:
           | In this case, "it was done that way because legacy reasons"
           | is close, but the real answer is "it was done that way
           | because we hadn't yet invented the parts of compiler theory
           | required to create compilers that enforce this constraint at
           | the type level."
        
         | jcelerier wrote:
         | > A correctly-specified ABI should pass large structures by
         | immutable reference
         | 
         | is just not possible. CPUs don't know about `const`. So you
         | have to work with the assumption that functions that you call
         | can do anything to their arguments. Thus copies cannot be
         | avoided.
        
           | [deleted]
        
           | mhh__ wrote:
           | The CPU also doesn't know what an ABI is
        
           | wizzwizz4 wrote:
           | CPUs actually _do_ know about const; it's called a read-only
           | page.
           | 
           | Besides, that's irrelevant. There's nothing stopping my
           | function from following every pointer on the stack and
           | smashing up its contents; are you going to defend against
           | that, too? If not, how is this any different?
        
           | jacoblambda wrote:
           | An ABI also has a concept of defined and undefined behaviour.
           | You can design an ABI that is fully protected against abuse
           | but often the performance penalty for that will be huge.
           | 
           | Instead what you'll do is specify the constrained inputs and
           | expected output behaviour. From there you can rule out anything
           | that violates those constraints as non-conformant. As long as
           | you maintain those constraints between versions, there's no
           | ABI breakage.
           | 
           | Also you can absolutely have constant references in an ABI.
           | There may be ways of ignoring the const depending on how you
           | design the ABI but they will be obvious abuse.
        
         | gpderetta wrote:
         | Exactly, see my example elsethread. Also in C and derivatives
         | distinct objects are guaranteed to have distinct addresses.
         | Implicit sharing would break this.
        
           | mort96 wrote:
           | It wouldn't. The compiler would just have to generate the
           | copy when the standard demands it (such as if the function
           | body takes the address of the object).
        
             | gpderetta wrote:
             | Yes but then in many cases either (or both!) the caller and
             | the callee might need to make a copy defeating the point of
             | the optimization or even being worse than the original.
        
               | mort96 wrote:
               | In many cases the callee would have to make a copy, yes.
               | However:
               | 
               | 1. In many cases, no copy would have to be made. There
               | are lots of small non-complex functions out there where
               | the compiler can prove that it's safe to not make a copy.
               | 
               | 2. In many other cases, a copy has to be made. But the
               | copy is made by the callee, not by the caller. That means
               | that all the instructions necessary to copy the argument
               | end up in the binary once in the callee, rather than
               | once for every function call, leading to less code bloat
               | (which has its own performance advantages).
               | 
               | In fact, a stupid compiler could just always make a copy
               | without analyzing the function body. This would result in
               | a compiler which generates code that's about as fast as
               | it would be with current ABIs, but with a smaller size.
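
A minimal sketch of the two cases above, hand-lowering the proposed convention in ordinary C, since no existing compiler is claimed to emit it. The names `struct big`, `sum_readonly`, `sum_after_scaling`, and `demo_callee_copy` are hypothetical; the explicit `const struct big *` parameter stands in for the hidden reference the ABI would pass.

```c
#include <assert.h>

struct big { int v[64]; };

/* Case 1: the callee only reads its argument, so it can work directly
 * on the caller's storage. No copy is made anywhere. */
static int sum_readonly(const struct big *arg)
{
    int total = 0;
    for (int i = 0; i < 64; i++)
        total += arg->v[i];
    return total;
}

/* Case 2: the callee wants to mutate its by-value parameter, so it makes
 * the copy itself. The copy code exists once, here, rather than at every
 * call site. */
static int sum_after_scaling(const struct big *arg, int factor)
{
    struct big local = *arg;      /* callee-side copy */
    int total = 0;
    for (int i = 0; i < 64; i++) {
        local.v[i] *= factor;     /* mutation touches only the copy */
        total += local.v[i];
    }
    return total;
}

/* Usage: the caller passes a reference and its object stays unchanged. */
static void demo_callee_copy(void)
{
    struct big b;
    for (int i = 0; i < 64; i++)
        b.v[i] = 1;
    assert(sum_readonly(&b) == 64);
    assert(sum_after_scaling(&b, 3) == 192);
    assert(b.v[0] == 1);          /* caller's object was not modified */
}
```

A "stupid" compiler in the sense above would emit the `local = *arg` copy in every callee without analysis; a smarter one would drop it whenever the body looks like `sum_readonly`.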
        
               | gpderetta wrote:
               | You have to make a copy on the caller or the callee if
               | the address of the object escapes, so you might end up
               | with two extra copies even if nothing in the program
               | mutates the object.
        
         | legulere wrote:
         | How about those explanations:
         | 
         | It didn't matter before, as compilers were not optimizing as
         | much, code had a much closer 1:1 correspondence to assembly (if
         | you are passing it by pointer and not register, you would want
         | to make that clear in code).
         | 
         | It's much easier to implement in simple compilers. On the side
         | of the callee you don't have to check if you manipulate your
         | arguments, which is generally hard. Being able to manipulate
         | your arguments is another shortcut for keeping the compiler
         | simple. On the side of the caller you don't have to check if
         | you hand out a mutable pointer.
         | 
         | Also finally and most importantly: memory access was much
         | cheaper in terms of cpu cycles. Just look at cdecl: all
         | parameters are passed on the stack instead of registers. Our
         | current calling conventions stem from performance hacks like
         | fastcall that were only optimizing for existing code (you pass
         | big structs by pointer by convention).
        
         | brigade wrote:
         | Well for one, the language says a copy is made _at the time of
         | the function call_, and it's perfectly valid to modify the
         | original before the copy is finished being used. So pretty much
         | any potentially aliasing write or function call in the callee
         | would force a copy, and as he notes C's aliasing rules are lax
         | enough that that's most of them.
         | 
         | Then if you care about the possibility of signal handlers
         | modifying the original... you pretty much have to make a copy
         | every time anyway.
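
A small illustration of the "copy at the time of the call" rule, assuming hypothetical names (`struct state`, `g`, `touch_original`, `callee`, `demo_snapshot`). The callee modifies the original object through a side channel after the call has started, and the by-value parameter must still hold the old contents. An ABI that passed `g` by hidden reference would therefore need a callee-side copy before the call to `touch_original`, unless the compiler could prove the two cannot alias.

```c
#include <assert.h>

struct state { int v[64]; };

static struct state g;                 /* the "original" object */

/* Stands in for any potentially aliasing write or opaque function call. */
static void touch_original(void)
{
    g.v[0] = 99;
}

/* `s` is a snapshot of the argument taken when the call was made. */
static int callee(struct state s)
{
    touch_original();                  /* modifies g, never s */
    return s.v[0];                     /* must still be the old value */
}

static int demo_snapshot(void)
{
    g.v[0] = 1;
    int seen = callee(g);
    assert(seen == 1);                 /* parameter kept the old value */
    assert(g.v[0] == 99);              /* original was modified */
    return seen;
}
```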
        
           | temac wrote:
           | Plus any _potential_ concurrency synchro point existing would
           | force a copy, plus using any unknown function, etc.
           | 
           | Using rust and propagating the single writer xor multiple
           | readers requirement in an ABI, this might be interesting. But
           | with C/C++, I'm afraid copies would be forced "all" the time.
        
             | mort96 wrote:
             | There's still a lot of functions which don't call unknown
             | functions before accessing an argument passed by value,
             | don't take the argument's address, etc. There are many
             | simple functions such as this one:
             | 
             |     void print_foo(FILE *outf, struct foo foo) {
             |         fprintf(outf, "foo '%s': %i, %i\n", foo.name, foo.x,
             |                 foo.y);
             |     }
             | 
             | That one would gain a speed-up and code-bloat reduction
             | from the proposed ABI, and there are many like it.
             | 
             | But even if every single function had to fall back to
             | making a copy, the argument is that there's still a
             | significant code bloat saving by putting the copy in the
             | callee rather than in the caller. After all, the
             | instructions necessary to make a copy take some space, and
             | with the proposed ABI, those instructions are put in the
             | called function, rather than in every function call. Most
             | functions are called more than once, and all functions are
             | called at least once (hopefully), so anything which can be
             | changed from O(number of function calls) to O(number of
             | functions) is an improvement.
        
         | jbverschoor wrote:
         | Sometimes a mistake is a decision under the assumption that the
         | people intended to use this are smarter / more careful than
         | they are.
        
         | moonchild wrote:
         | > my gut is there is something missing here with respect to non
         | local control flow (like exception handling, setjmp/longjmp,
         | and fibers)
         | 
         | (Post author.)
         | 
         | Mechanically, what happens is essentially the same as what
         | ms/arm/riscv do: the caller creates a reference and passes it
         | to the callee. The only difference is that the callee is _more_
         | restricted than it would otherwise have been in what it can do
         | with the memory pointed to by that reference. So I don't think
         | that there can possibly be any implications for non-local
         | control flow.
        
           | hctaw wrote:
           | Doesn't the referenced data have to be guaranteed to outlive
           | the callee, which would only be true if the callee is
           | guaranteed to return to the calling scope?
           | 
           | You can get around the immutability of the reference if your
           | compiler implements the ABI with copy on write semantics,
           | which I think is a reasonable compromise. But I'm still not
           | certain how you would handle arbitrary control flow that the
           | compiler may not be able to reason about.
           | 
           | If for example your arguments may be behind const references,
           | how would you implement getcontext/swapcontext for your ABI?
           | If everything is an integral value in registers or on the
           | stack then it's really easy, but I would think it would have
           | to be a compiler intrinsic if it depends on the function
           | signature of the calling context, in order to perform the
           | required copies.
        
       | chjj wrote:
       | DMR on `restrict` (originally proposed for C89 as `noalias`):
       | http://port70.net/~nsz/c/c89/dmr-on-noalias.html
        
       | legulere wrote:
       | Why would that optimization not be possible with a simple const
       | *?
       | 
       | The problem I see with that proposal is that it introduces a new
       | type of pointer, the immutable pointer. That seems equivalent to
       | a const pointer, but it's not. With a const pointer the callee
       | can not mutate the pointee. But it can still be mutated from the
       | outside. That means that any time you handed out a mutable
       | pointer to something you have to make a copy for handing out an
       | immutable pointer to it. An ABI like this would probably be much
       | more complex to implement for a small gain.
       | 
       | You would end up with hard-to-predict behaviour as to whether the
       | struct gets copied or not. C# structs suffer a lot from this,
       | because methods are mutating by default
       | (https://codeblog.jonskeet.uk/2014/07/16/micro-
       | optimization-t...). The biggest problem is simply that this is
       | not explicit.
       | 
       | Also there is a case where a calling convention like this can
       | make things worse, as you will have to make two copies:
       | 
       |     void fn1(A*);
       | 
       |     void fn2(A a) {
       |       // Have to make copy here because mutating
       |       fn1(&a);
       |     }
       | 
       |     void fn3()
       |     {
       |       A a;
       |       fn1(&a);
       |       // Have to make copy here because reference to a might have
       |       // escaped.
       |       fn2(a);
       |     }
        
         | moonchild wrote:
         | > With a const pointer the callee can not mutate the pointee.
         | But it can still be mutated from the outside
         | 
         | That's incorrect. In c semantics, at least, it is legal to take
         | a pointer to non-const, cast it to a pointer to const, pass it
         | back, and mutate the object pointed to. It is only illegal to
         | mutate if the pointer actually points to const memory.
         | 
         | Which means that, in a _general_ sense, if you're given a
         | const pointer, since you can't know if it actually points at
         | const memory, you shouldn't mutate through it. But if you're
         | handing out const pointers to non-const memory, you shouldn't
         | count on the memory not being changed through those pointers.
         | 
         | IOW const is all but useless in c (except as a declaration of
         | intent).
         | 
         | > An ABI like this would probably be much more complex to
         | implement for a small gain
         | 
         | Why? Mechanically, it's almost the same as the ms/arm/riscv
         | abi, except that a copy is made by the callee rather than the
         | caller.
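
The point about const in C can be shown in a few lines. This is only a sketch; `bump` and `demo_const` are hypothetical names. Writing through a cast-away const pointer is defined behaviour as long as the object itself was not declared const, which is why a caller handing out `const int *` cannot assume the pointee stays unchanged.

```c
#include <assert.h>

/* Takes a pointer to const, yet writes through it after a cast.
 * This is defined behaviour only when the pointed-to object was not
 * itself declared const. */
static void bump(const int *p)
{
    *(int *)p += 1;
}

static int demo_const(void)
{
    int x = 1;               /* non-const object */
    const int *view = &x;    /* const-qualified view of it */
    bump(view);
    return *view;            /* must be reloaded from memory */
}
```

Passing the address of an object declared `const int` to `bump` would be undefined behaviour, so the sketch deliberately avoids it.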
        
           | gpderetta wrote:
           | Const matters to the caller, if it calls a function that
           | takes a pointer to const it can generally reasonably expect
           | that the object won't be changed.
           | 
           | It is not about enabling compiler optimisations but about
           | preserving invariants.
        
       | vvanders wrote:
       | One of the things I'm really excited about for Rust is restrict
       | semantics are baked right there into the language with &mut/&. I
       | think there were still a few LLVM bugs to flush out (because very
       | little C/C++ code uses restrict by nature of how it subtly
       | explodes when you get it wrong).
       | 
       | In theory when they sort that out Rust should be able to turn on
       | restrict where it applies globally "for free".
        
         | nathankleyn wrote:
         | Propagating "noalias" metadata for LLVM has actually finally
         | been enabled again recently in nightly [0]. However it has
         | already caused some regressions so it is not clear whether we
         | may go through another revert/fix in llvm/reenable cycle [1].
         | This has happened several times already sadly [2] as, exactly
         | as you say, basically nobody else has forged through these
         | paths in LLVM before.
         | 
         | [0]: https://github.com/rust-lang/rust/pull/82834
         | 
         | [1]: https://github.com/rust-lang/rust/issues/84958
         | 
         | [2]: https://stackoverflow.com/a/57259339
        
           | vvanders wrote:
           | Ah damn that's a real shame, something to be said about how
           | rarely restrict is used (I usually only touched it for
           | particle systems or inner loops of components where I knew I
           | was the only iterator).
        
           | galangalalgol wrote:
           | Seriously, this has been _years_ now. Is this understandable,
           | or does this tell us something bad about llvm?
        
             | wnoise wrote:
             | Or something bad about noalias?
        
             | swsieber wrote:
             | This is understandable. Rust really uses restrict semantics
             | in anger compared to any other language I know of. Have you
             | seen restrict used in a c codebase? The LLVM support for
             | restrict just doesn't get exercised much outside of Rust.
        
               | galangalalgol wrote:
               | Also there are a lot of fortran compilers using llvm now.
               | Fortran has the information for noalias as well.
        
               | galangalalgol wrote:
               | I use it in c++ on most signal processing stuff. I think
               | eigen will use it if you don't let it use intrinsics for
               | simd. I also use g++ so I wouldn't have encountered it
               | anyway.
        
             | khuey wrote:
             | This is what happens when you're trying to add a hairy
             | feature to a big legacy codebase.
        
               | rrdharan wrote:
               | Referring to the LLVM codebase as legacy makes me feel
               | old...
        
             | zozbot234 wrote:
             | > Is this understandable, or does this tell us something
             | bad about llvm?
             | 
             | LLVM is a large project that's mostly written in pre-modern
             | C++, and "noalias" is a highly non-trivial feature that
             | affects many parts of the compiler in 'cross-cutting' ways.
             | It would be surprising if it did _not_ turn up some initial
             | bugs.
        
               | galangalalgol wrote:
               | Initial, yes, but this was first uncovered in Oct 2015.
               | That seems like long enough to fix it.
        
               | saagarjha wrote:
               | Aliasing analysis is a complicated part of the compiler,
               | and it underpins a lot of optimization passes. It's not
               | an easy thing to bolt on.
        
               | masklinn wrote:
               | It's not a single bug, it's a bunch of different bugs in
               | the interactions between noalias and various analysis and
               | optimisation passes.
        
         | Teknoman117 wrote:
         | The other neat thing about Rust is that if you turn on LTO and
         | all of the involved code is Rust, there is no hard ABI. The
         | linker will do all sorts of interesting things with register
         | usage and how to pass data to functions.
        
           | mhh__ wrote:
           | Why is that not the case with any other language and LTO?
        
             | comex wrote:
             | It is. As far as I know, rustc just exposes LLVM's LTO
             | functionality, without performing any Rust-specific
             | optimizations at link time. So you get the same
             | optimizations as you would with LTO in Clang (C compiler),
             | and indeed you can do cross-language LTO between the two.
             | 
             | Also, even without LTO, LLVM can perform the same ABI
             | optimizations on functions that are local to a single
             | translation unit and not exported to other ones. In Rust
             | that means a function not exported across crate boundaries.
             | In C that means a `static` function.
        
             | donkarma wrote:
             | I think MSVC has some weird things when you call a function
             | and you do this, I don't remember but it had a weird
             | calling convention in the assembly IIRC.
        
       | colour wrote:
       | "Well, what's wrong with that? Surely we can just do what we did
       | in the days of dumb compilers and pass structures by pointer.
       | Unfortunately, that doesn't work anymore; compilers are smart
       | now, and they don't like it when objects alias." What does that
       | even mean? Does gcc cry after I make it translate code with
       | pointers? I simply don't get it.
        
         | re wrote:
         | "Doesn't work anymore" is a hyperbole. Aliasing can prevent
         | optimizations, like in the code example immediately after that
         | paragraph where the value of `x` needs to be loaded from memory
         | to be returned.
         | 
         | See also
         | https://en.wikipedia.org/wiki/Aliasing_(computing)#Conflicts...
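
A short sketch of how possible aliasing blocks an optimization. It is not the article's own example, and `store_both`, `store_both_restrict`, and `demo_alias` are hypothetical names. Because `a` and `b` may point at the same object, the compiler has to reload `*a` after the store through `b`; with `restrict` the programmer promises they do not alias, and a compiler is then allowed to return the constant directly.

```c
#include <assert.h>

/* a and b may alias, so the value returned depends on whether they do. */
static int store_both(int *a, int *b)
{
    *a = 1;
    *b = 2;
    return *a;       /* reload needed: 2 if a == b, otherwise 1 */
}

/* restrict promises a and b never alias, so no reload is required.
 * Calling this with a == b would be undefined behaviour. */
static int store_both_restrict(int *restrict a, int *restrict b)
{
    *a = 1;
    *b = 2;
    return *a;
}

static void demo_alias(void)
{
    int x = 0, y = 0;
    assert(store_both(&x, &y) == 1);          /* distinct objects */
    assert(store_both(&x, &x) == 2);          /* aliased: second store wins */
    assert(store_both_restrict(&x, &y) == 1); /* only ever called unaliased */
}
```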
        
       ___________________________________________________________________
       (page generated 2021-05-08 23:00 UTC)