[HN Gopher] No sane compiler would optimize atomics (2105)
___________________________________________________________________
No sane compiler would optimize atomics (2105)
Author : wheresvic4
Score : 51 points
Date : 2021-07-11 19:58 UTC (3 hours ago)
(HTM) web link (www.open-std.org)
(TXT) w3m dump (www.open-std.org)
| andi999 wrote:
| Why is it surprising that atomics can be optimized?
| rurban wrote:
| Likewise no sane compiler would optimize memset. Unluckily we
| don't have sane compilers
| rowanG077 wrote:
| Why not? This makes little sense to me. Why shouldn't a
| compiler optimize it?
| 1ris wrote:
| Because, when a programmer writes a call to memset, she expects,
| if you as a compiler take her seriously at all, the memory to
| actually be set, and not for the call to be a mere suggestion 1).
| They should have added a separate function called
| memsetOrDontIfYouThinkINeverReadThisAgain. Does this make sense?
|
| 1) Yes, I know there are rules, and all that, which are precise
| and do allow this.
| [deleted]
| aqrit wrote:
| https://media.ccc.de/v/35c3-9788-memsad
| halayli wrote:
| Compilers are sane and written by very smart people. They do
| have bugs just like any other software, but in my 20+ years of
| coding I can count the number of compiler bugs I've encountered
| on one hand.
|
| memset tends to be optimized out in the dead-code-elimination
| pass, which relies heavily on SSA. Compilers adhere to the
| abstract machine's specification. If you don't want memset to be
| optimized out, use memset_s, because the spec explicitly
| disallows removing it.
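|
| A minimal sketch of that dead-store case (the function and
| variable names here are invented for illustration):
|       #include <cstring>
|
|       // `secret` is never read after the scrub below, so the call
|       // is a dead store that the compiler may legally remove.
|       // memset_s (C11 Annex K, where an implementation provides
|       // it) is specified so that the write must actually happen.
|       void handle_password() {
|           char secret[32];
|           // ... fill and use secret ...
|           std::memset(secret, 0, sizeof secret);  // may be removed
|       }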
|
| If an optimization relies on UB and surprises the programmer,
| then the programmer was also relying on UB in the first place.
| userbinator wrote:
| If anything I think they are written by people who are _too_
| smart --- theoreticians who care only about the standard and
| can't be bothered exercising common sense when the standard
| doesn't specify what to do.
|
| http://blog.metaobject.com/2014/04/cc-osmartass.html
| beached_whale wrote:
| The example about signed overflow misses the point: in good
| code it results in a performance improvement. Similarly for
| dereferencing null pointers: it allows the compiler to remove
| later redundant checks (e.g. once you dereference *ptr, all
| further uses of ptr can be assumed non-null until it is written
| to). This makes it cheaper to express preconditions while
| letting the compiler make safe code faster.
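|
| A minimal sketch of the null-pointer case (a hypothetical
| function, not taken from the article):
|       // After the first dereference the compiler may assume p is
|       // non-null, so the later check is provably dead and removed.
|       int read_twice(int* p) {
|           int a = *p;           // UB if p is null
|           if (p == nullptr)     // dead under that assumption
|               return 0;
|           return a + *p;        // second load needs no re-check
|       }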
| halayli wrote:
| Your definition of what counts as common sense is based solely
| on your own understanding of computers. It's more like your
| intuition.
|
| Just because you don't understand something fully doesn't
| mean it's not common sense.
|
| It's very common that programmers don't bother reading the
| standard or attempt to understand the concept of an
| abstract machine.
|
| Programmers that know what they are doing tend not to be
| surprised by such nuances.
| 1ris wrote:
| The problem with programming is almost never the lack of
| smartness. It's almost always the opposite: People being too
| clever.
|
| The sane thing in a realistic scenario is to ban all usage of
| memset and add a diagnostic suggesting memset_s, like any
| responsible programmer should do with strcpy. But then, was it
| really wise to make this machinery necessary, when the people
| who need it the most are the ones most likely not to have it? I
| really, really don't think so.
|
| Edit: Here is some code written by a very smart person, Niels
| Ferguson. I think (not sure) it tries to wipe sensitive
| material from memory and gets it wrong. Really not sure, tho.
|
| https://github.com/MacPass/KeePassKit/blob/master/TwoFish/tw.
| ..
| ncmncm wrote:
| Yes, this code gets it wrong.
|
| It demonstrates that there are many different meanings for
| "smart". Nobody qualifies for all of them.
| secondcoming wrote:
| Isn't memset different in that it's an external function that
| compiler writers have chosen to intercept and perform
| themselves? Why is saying 'use memset_s' acceptable when they
| could have stopped treating memset as a special case?
| chrisseaton wrote:
| > Isn't memset different in that it's an external function
| that compiler writers have chosen to intercept and perform
| themselves?
|
| The semantics of memset and memset_s are defined by a standard.
| They're not unknown functions that could do anything, and
| compilers aren't breaking a rule by treating them specially, as
| you're suggesting they are.
| Someone wrote:
| If you include a standard header (using <> brackets) and call a
| function that the standard says gets declared by that header,
| the compiler is allowed to assume you want the standard
| behavior, and may inline the call, replace it with a specialized
| version, etc.
|
| Optimizing _memset_, in particular, can reap huge benefits, for
| example for sizes of 4 or 8 bytes in tight loops, or for zeroing
| cache-line-sized blocks.
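|
| A sketch of the small fixed-size case (hypothetical function
| name; exact codegen is of course target-dependent):
|       #include <cstring>
|
|       // Typically lowered to a single 8-byte store, not a call.
|       void zero_slot(void* p) {
|           std::memset(p, 0, 8);
|       }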
| halayli wrote:
| memset is not a special case. Compiler designers are very
| much against special casing, and rightfully so.
|
| Compilers do have an intrinsic equivalent but that has
| nothing to do with whether it gets optimized out or kept.
|
| Following SSA rules, if you write to non-volatile memory and
| never read from it again for the remainder of its lifetime, then
| the write is dead code.
|
| Using memset_s was a suggestion for when your goal is to scrub
| memory, because it guarantees the write will happen. The
| guarantee comes from the C standard:
|
| > Unlike memset, any call to the memset_s function shall be
| evaluated strictly according to the rules of the abstract
| machine as described in (5.1.2.3). That is, any call to the
| memset_s function shall assume that the memory indicated by
| s and n may be accessible in the future and thus must
| contain the values indicated by c
|
| Which essentially means treat the memory as-if it's
| volatile.
| 1ris wrote:
| Are you allowed to cast the void * to volatile void *, pass
| that to memset and expect the same result, or is the void * ->
| volatile void * -> void * conversion eliminated as well?
| ncmncm wrote:
| You can wave whichever dead chickens you like over the
| keyboard, all to the same effect. The compiler ensures
| only semantics that are observable.
|
| If you pass a pointer to a volatile or atomic object to
| another function the compiler cannot see into, the
| compiler knows that any specified operations _might be_
| observable, so must act accordingly. Then, operations on
| that object will not be optimized away. Or, anyway, not
| until someone builds the program using Link Time
| Optimization ("-flto") and the called function body
| becomes visible.
| 1ris wrote:
| I'm confused. Is the memset to the temporary volatile
| void* observable or not, and is the temporary possibly
| eliminated?
| ncmncm wrote:
| Treating "the memory as-if it's volatile" would _not_
| block such an optimization. Specifically, zeroing a
| volatile or atomic that lives on the stack immediately
| before returning from a function whose scope it lives in
| is freely optimized away, because it is not observable.
|
| Many people mentally model atomic as akin to volatile,
| about which they also frequently harbor superstitions.
|
| I knew a FreeBSD core committer who insisted (and
| convinced our boss!) that volatile was entirely
| sufficient to serialize interactions between threads.
|
| Someone else expected a line with a volatile or atomic
| operation never to be optimized away, and was appalled
| that it happened. Assignment to a volatile stack object
| whose address has not escaped and is not used will, in
| fact, be optimized away, regardless of the "clear
| language of the Standard".
|
| Often, not performing what would appear to be a pointless
| optimization would block a more substantial optimization.
| Compiler writers optimize aggressively in order to break
| through to the substantial ones, which are most usually
| dead-code elimination that can then propagate backward
| and upward.
|
| If you really need for a volatile object and operations
| on it not to be optimized away, you need to allow a
| pointer to it to escape to somewhere the compiler cannot
| see. Have a function in a separate ".o" file, and pass
| the pointer to that.
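|
| A minimal sketch of that escape approach (names invented for
| illustration; `opaque_sink` stands for a function defined in a
| separate .o file the compiler cannot see into, absent LTO):
|       #include <cstddef>
|       #include <cstring>
|
|       void opaque_sink(const void* p, std::size_t n);  // elsewhere
|
|       void handle_key() {
|           char key[32];
|           // ... use key ...
|           std::memset(key, 0, sizeof key);
|           // key escapes here; the compiler must assume opaque_sink
|           // reads it, so the scrub above cannot be removed.
|           opaque_sink(key, sizeof key);
|       }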
| chrisseaton wrote:
| A similar thing is that I often come across people who are well-
| informed but still surprised that compilers will combine two
| observable operations into one, and complain that some other
| thread somehow 'should' be able to observe the intermediate
| state. But I don't understand how they think they would be able
| to tell the difference between an interleaving that _never
| happens to happen_, and one that _will never happen_.
| voidnullnil wrote:
| So what's the motivation for the article? Presumably someone
| complained that the compiler optimizing atomics broke their
| code.
| chrisseaton wrote:
| If they need some interleaving to happen at least n% of the
| time for their program to work... what am I supposed to do?
| Because it was never guaranteed to be interleaved at least n%
| of the time, even when the atomics were not merged. Do they
| want me to bypass the system scheduler and run threads
| concurrently with forced interleavings against a statistical
| distribution? I doubt they want that. So what do they want?
| historyloop wrote:
| It can be tricky in that fuzzing your program to discover race
| conditions can seem to behave perfectly on one compilation
| target, where these atomics are detected and reified, while you
| later discover that on another platform your app surprisingly
| crashes 20% of the time.
|
| Ideally we test our applications on all hardware, with all
| configuration permutations etc. etc. In practice we do rely on
| our compilers translating our intent accurately and sometimes
| such edge cases matter.
|
| Compatibility is a tricky thing. It's kind of like the argument
| over whether adding optional arguments to a function breaks BC.
| It doesn't break BC if you don't pass those parameters. But if
| for some reason you were passing extra parameters hoping they'd
| be ignored (for example, as a handler somewhere else), then
| adding optional parameters WILL break BC and cause your
| software's behavior to be undefined.
| gizmo686 wrote:
| It is not about translating intent accurately. What you need is
| for compilation to be fully specified. Anywhere there are
| multiple correct ways to compile something introduces a risk of
| bugs that only show up with some compilers.
|
| If you are a compiler or library writer, one solution is to
| avoid having useful properties that are not part of the spec.
| For instance, Go does not guarantee any particular iteration
| order for hashmaps, so they go out of their way to randomize the
| iteration order, thereby preventing developers from writing code
| that depends on a deterministic order.
|
| In the case of threading, what you would need is essentially a
| compiler/runtime that goes out of its way to order and time
| operations in a random or adversarial manner.
|
| I've seen research that looks into doing this in a VM
| environment, which would be inhibited by the type of compiler
| optimizations being discussed, and other work that modifies the
| compiler itself to replace the concurrency primitives with
| runtime functions, which can then execute them in a fuzzed
| order.
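|
| A minimal sketch of that runtime-function idea (all names here
| are invented, not taken from the research mentioned):
|       #include <atomic>
|       #include <random>
|       #include <thread>
|
|       // Route an atomic operation through a helper that randomly
|       // yields, so tests explore more interleavings than the
|       // default scheduler tends to produce.
|       inline void maybe_yield() {
|           thread_local std::mt19937 rng{std::random_device{}()};
|           if (rng() % 4 == 0) std::this_thread::yield();
|       }
|
|       inline int fuzzed_fetch_add(std::atomic<int>& a, int v) {
|           maybe_yield();
|           int old = a.fetch_add(v, std::memory_order_relaxed);
|           maybe_yield();
|           return old;
|       }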
|
| Ultimately, fuzzing and testing can only give you confidence
| that what is being tested is mostly correct. It can never
| give you confidence that what is written is entirely correct.
| If you want confidence in the latter, you need to invest in
| some form of static analysis (which could either be built
| into the language, such as a type system, or be an external
| analysis tool). Ultimately, writing even a moderately
| complicated program (by modern standards) with full
| confidence in its correctness would involve advancing the
| state of the art of the field by decades (if not longer).
|
| For the most part, the field just doesn't care about programs
| being fully correct, and accepts it as a fact of life that going
| onto new or untested platforms and configurations will introduce
| or expose bugs.
| b3morales wrote:
| Very interesting. I would love to see tools emerge out of that
| research. In my experience one of the basic problems with
| testing/verifying concurrent code is that it is often
| difficult-to-impossible to perturb a particular part of the
| system to exercise different orderings, and thus find the corner
| cases.
| kccqzy wrote:
| The article has this very nice example here:
| x.fetch_add(1, std::memory_order_relaxed);
| y.fetch_add(1, std::memory_order_relaxed);
| x.fetch_add(1, std::memory_order_relaxed);
| y.fetch_add(1, std::memory_order_relaxed);
|
| Can a different thread observe x and y only incremented once?
| Before the optimization yes, it's possible. After the
| optimization, no, it's not possible. The compiler removes a
| possible observable state in the course of optimization, and
| that occasionally surprises people.
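|
| A sketch of one merged form the compiler is allowed to produce
| (assuming it coalesces the adjacent relaxed increments, as the
| article describes):
|       #include <atomic>
|
|       std::atomic<int> x{0}, y{0};
|
|       // No other thread can now observe x and y each incremented
|       // exactly once by this sequence.
|       void bump() {
|           x.fetch_add(2, std::memory_order_relaxed);
|           y.fetch_add(2, std::memory_order_relaxed);
|       }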
| chrisseaton wrote:
| > and that occasionally surprises people
|
| But what did they want?
|
| They tell me they're not happy with 0% chance of some
| interleaving they need. Are they happy with 0.000000000001%?
| I'm not sure I understand the practical difference between
| that and 0%? Can I tell them it is possible, but rely on the
| death of the universe happening before they have a right to
| complain it didn't turn up yet? If they have some hard
| minimum they require, then what do they expect me to do?
| Completely abstract from the system scheduler to make it
| happen?
|
| Doesn't seem like a reasonable expectation for programmers to
| place on compiler writers, and it likely wouldn't really be what
| they wanted even if I made it happen.
| dataflow wrote:
| It might be 0.000000000001% but it need not be, right?
| That's kind of the point. I'm pretty sure I could write you
| a program where the probability of observing that would be
| (say) 50% without the optimization, and with the
| optimization that would go to 0%, defeating the program.
| Should that be allowed?
|
| This is basically a statistical/probabilistic version of
| another problem: should a compiler be allowed to change the
| time complexity of an algorithm? If I write linear search,
| and the compiler proves the list is sorted, should it be
| allowed to switch to binary search--or vice-versa?
| Traditionally the answer might be 'yes', but it's not
| exactly hard to argue an answer of 'no' might also be
| desirable in a number of situations. Now in this case, it's
| not the time complexity that's changing, but the
| statistical properties of the program... and while in some
| cases it might be negligible (1E-10 or what have you),
| that's not necessarily the case, at which point a
| difference in degree really can become a difference in
| kind.
| chrisseaton wrote:
| But no mechanism was ever guaranteeing the 50% in the
| first place. So their program could already stop working
| correctly without any changes and with the same compiler.
| What I'm saying is I don't know how they'd even tell the
| difference between my compiler being the problem, or them
| just being unlucky and not getting the interleavings they
| want even though they're statistically likely.
| dataflow wrote:
| I'm saying if you made a compiler that reduced that expected
| 50% to roughly 0%, it wouldn't matter to the user if it was
| 1E-10 or exactly 0%. _Both_ could be undesirable for exactly
| the same reason. Again: you can complain that the 1E-10
| compiler is still "correct" because there was never a 50%
| guarantee, but in doing that you're completely missing the
| point. Here's an
| exaggerated example. Imagine your Tesla autopilot's
| accuracy suddenly dropped from >99.9999% to <0.00001%
| overnight because of some stupid optimization (maybe this
| one) that was _technically_ correct. Assume there was
| never a _guarantee_ it would stay this accurate. Do you
| _really_ think you could shift the blame to the user
| because you were "technically" not guaranteeing
| anything?
| overgard wrote:
| > A similar thing is that I often come across people who are
| well-informed but still surprised that compilers will combine
| two observable operations into one
|
| Well, think of it this way: if you're not into compilers,
| wouldn't you expect your code to run as written? Especially if
| you put a breakpoint in there and can see each step happening
| sequentially. I don't really think people who don't know these
| things should be viewed as ignorant, because the compiler tends
| to be doing some very unintuitive things.
| dang wrote:
| One past thread:
|
| _No Sane Compiler Would Optimize Atomics_ -
| https://news.ycombinator.com/item?id=11694325 - May 2016 (56
| comments)
| MauranKilom wrote:
| Could you fix the title to not be almost a century in the
| future please? :)
| elliotpage wrote:
| 2105? I love posts from the future
| tester756 wrote:
| >For Compiler Writers
|
| >Get back to work, there's so much more to optimize... and so
| much code to break! Help users write good code: the compiler
| should provide diagnostics when it detects anti-patterns or
| misuses of atomics.
|
| Is it worth putting effort into compiler optimizations instead
| of putting more effort into providing "diagnostics" and
| information for the programmer?
|
| In the context of this:
| http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.29....
| kccqzy wrote:
| The very paragraph you are quoting asks compiler writers to
| provide better diagnostics.
| tester756 wrote:
| but there's also
|
| >Get back to work, there's so much more to optimize...
|
| My point is that maybe the time spent on code-optimization
| research could be spent better.
| overgard wrote:
| I'd love to have a compiler that just tells me what
| optimizations could be made, but lets me either do them myself
| (if syntactically available in the language) or explicitly mark
| the optimization as OK through a #pragma or something like that.
| I just think having to read the disassembly to figure out what
| happened isn't a great user experience.
___________________________________________________________________
(page generated 2021-07-11 23:00 UTC)