Subj : Re: Memory Barriers, Compiler Optimizations, etc.
To   : comp.programming.threads
From : Joseph Seigh
Date : Thu Feb 03 2005 01:47 pm

On Thu, 3 Feb 2005 09:01:24 -0800, Scott Meyers wrote:

> On Wed, 02 Feb 2005 07:43:15 -0500, Joseph Seigh wrote:
>> On Tue, 1 Feb 2005 20:38:08 -0800, Scott Meyers wrote:
>>> Assuming that x, y, a, and b are all distinct locations, is it
>>> reasonable to assume that no compiler will move the assignment to
>>> y above the barrier, or is it necessary to declare x and y
>>> volatile to prevent such code motion?
>>
>> Hypothetically, yes.  Volatile wouldn't help, as it has no meaning
>> for threads.  If the variables are known only to the local scope,
>> i.e. they're not external and haven't had their address taken, then
>> the compiler can move them wherever it wants, since no other thread
>> can see them.
>
> My concern wrt volatile was that treatments of memory issues refer
> to "program order" as if it's the same as "source code order," but
> with compilers moving stuff around prior to code generation, "source
> code order" may be quite different from "program order."  At least
> in C++, if I want to ensure that the relative order of these reads
> is preserved,
>
>     x = a;   // I want x to be read before y
>     y = b;
>
> declaring x and y volatile will do it.  Compilers can still move the
> reads around wrt reads and writes of non-volatile data, but to
> remain compliant with the C++ standard, x must be read before y in
> the generated code, i.e., in program order.

I guess.  I'm not that familiar with volatile, since it's not that
useful in threading.  If the end of every full expression is a
sequence point, then that should make the end of every statement a
sequence point as well.

> However, if compilers recognize and respect the semantics of
> membars, the need for volatile goes away, because I can just stick a
> membar between the reads (which I need anyway), and the problem is
> solved.

AFAIK they don't, so we have to use the ad hoc solutions that we use
now (there's a sketch of one near the end of this post).

> Incidentally, I understand how compiler intrinsics like Microsoft's
> _ReadWriteBarrier are recognized by compilers, but from what I've
> read in this group, there seems to be the assumption that calling an
> externally defined function containing assembler will prevent code
> motion across calls to the function, because compilers must
> pessimistically assume that calls to the function affect all memory
> locations.  With increasingly aggressive cross-module inlining
> technology available, this seems like a bet that gets worse and
> worse with time.  It's not hard to imagine a build system that can
> see that a called function doesn't affect the value of a global
> variable and thus move a read or write of that variable across the
> call.  Is there a reason this can't happen, or are we just lucky
> that our tools are, for the time being, both conservative and kind
> of dumb?

The latter.  We're just lucky for now.  There seems to be extreme
antipathy towards threading issues, in the C community at least.  Ask
any thread-specific question in the C newsgroups and you get a "C has
nothing to do with threads" response.  There's less of that in the
C++ newsgroups now, since Herb Sutter, Andrei Alexandrescu, and
yourself maybe, have picked up on and started promoting threading.

For example, I never got any authoritative response as to why Linux
assumes int loads and stores are atomic on ia32.  Apparently it's
either some undocumented communication somewhere or, more likely,
someone is just assuming that since gcc does atomic loads and stores
of int in every case they've observed, it must do so in all cases.

It's sort of the same for separately compiled external functions.
You assume that the compiler has to drop optimization for any
variable that has had its address taken by, or passed to, an external
routine, or that has the extern attribute.  That could break at some
point, and we'd have to start writing all the synchronization
functions as external assembler routines.  That would make memory
barriers even more expensive than they already are.

It's not just C and C++ you have to worry about.  Hardware architects
have even less of a clue about multi-threading than compiler writers.
Their sophistication ends at using a test-and-set to implement a
lock.  They have no notion of how people are actually doing
concurrent programming.

With the use of RCU (Read Copy Update) in the Linux kernel, the
kernel developers have adopted dependent load memory barriers to
avoid the more expensive load fence memory barriers.  The dependent
load memory barriers aren't part of any architected memory model, so
hardware architects definitely are not aware that they're being used.
It's a distinct possibility that some hardware vendor will break it,
much to their detriment in the marketplace.  There's a pseudo-op in
Linux for this so they can put in a real memory barrier if needed.
Currently, Alpha processors don't support dependent load memory
ordering.  There was a discussion of this on the Linux kernel mailing
list back during the implementation of RCU in Linux, but there's no
explicit documentation that will carry forward.
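Here's roughly what the dependent load pattern looks like, in the
style of the kernel's RCU publish/read idiom.  This is a sketch, not
actual kernel code: the struct and function names are mine, and the
barrier definitions are shown as they might expand for gcc on ia32
(the kernel provides them per architecture).

    /* Illustrative expansions for gcc on ia32.  ia32 doesn't reorder
     * stores and honors load dependencies, so a compiler barrier (or
     * nothing at all) suffices here; other architectures need real
     * instructions. */
    #define smp_wmb()                  __asm__ __volatile__("" ::: "memory")
    #define smp_read_barrier_depends() do { } while (0)

    struct foo {
        int a;
    };

    struct foo *global_ptr;

    /* Writer: initialize the object, then publish the pointer.  The
     * write barrier keeps the initializing store from sinking below
     * the publishing store. */
    void publish(struct foo *p)
    {
        p->a = 42;
        smp_wmb();
        global_ptr = p;
    }

    /* Reader: the second load depends on the value of the first, so
     * on most processors no fence is needed between them.  The
     * pseudo-op expands to a real read barrier only on Alpha, which
     * doesn't honor dependent load ordering. */
    int reader(void)
    {
        struct foo *p = global_ptr;
        smp_read_barrier_depends();
        return p ? p->a : -1;
    }

The whole point of the pseudo-op is that it costs nothing on the
processors that get dependency ordering for free, which is all of
them except Alpha, as far as anyone knows.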
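And to make the ad hoc solutions I mentioned above concrete: for
Scott's two reads, with gcc on ia32, it's something like this.  The
macro names are mine; only the asm idioms matter.

    /* Compiler-only barrier: the "memory" clobber tells gcc it may
     * not cache memory values across the statement or move loads and
     * stores past it.  It emits no instructions, so it constrains
     * only the compiler, not the processor. */
    #define compiler_barrier() __asm__ __volatile__("" ::: "memory")

    /* Full hardware barrier on ia32: a locked read-modify-write of
     * the top of the stack (mfence on newer processors would also
     * do).  The clobber makes it a compiler barrier as well. */
    #define memory_barrier() \
        __asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory")

    extern int a, b;    /* shared with other threads */
    int x, y;

    void ordered_reads(void)
    {
        x = a;
        compiler_barrier(); /* gcc can't move the read of b above
                               this; ia32 won't reorder the two loads
                               itself, so this is enough here.  Where
                               the processor could reorder, you'd use
                               memory_barrier() instead. */
        y = b;
    }

Note that nothing in the language standards blesses any of this; it
works because gcc documents the "memory" clobber and because we know
what the processor does.  That's exactly why it's ad hoc.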
-- 
Joe Seigh