Subj : Re: Memory Barriers, Compiler Optimizations, etc.
To   : comp.programming.threads
From : Joseph Seigh
Date : Wed Feb 02 2005 07:43 am

On Tue, 1 Feb 2005 20:38:08 -0800, Scott Meyers wrote:

> I've encountered documents (including, but not limited to, postings on
> this newsgroup) suggesting that acquire barriers and release barriers
> are not standalone entities but are instead associated with loads and
> stores, respectively.  Furthermore, making them standalone seems to
> change semantics (see below).  Yet APIs for inserting them via
> languages like C or C++ seem to deal with barriers alone -- there are
> no associated reads or writes.  (See, for example,
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclang/html/vclrf_readwritebarrier.asp.)
>
> So I have two questions about this.  First, are acquire/release
> properly part of loads/stores, or does it make sense for them to be
> standalone?  If the former, how are programmers in languages like
> C/C++ expected to make the association between reads/writes and
> memory barriers?

Memory barriers aren't directly observable, so you have to define them
in terms of their effect on things that are observable, mainly reads
and writes.

> Next, is it reasonable to assume that compilers will recognize memory
> barrier instructions and not perform code motion that is contrary to
> their meaning?  For example:
>
>    x = a;
>    insertAcquireBarrier();   // or x.acquire = 1 if the barrier
>                              // should not be standalone
>    y = b;

s/will/should/

Yes.

> Assuming that x, y, a, and b are all distinct locations, is it
> reasonable to assume that no compiler will move the assignment to y
> above the barrier, or is it necessary to declare x and y volatile to
> prevent such code motion?

Hypothetically, yes.  Volatile wouldn't help, as it has no meaning for
threads.  If the variables are only known to the local scope, i.e.
they're not external and haven't had their address taken, then the
compiler can move them wherever it wants, since no other thread can
see them.

It might be nice to have a new attribute like "shared", rather than
volatile, to start with a clean slate.  "shared" would actually have
to have a real, thread-relevant definition, not whatever the
implementation feels like, which is the case with volatile.
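To make that concrete, here is a minimal sketch of the compiler-barrier
case.  It uses _ReadWriteBarrier from the MSDN page cited above, which
is a full compiler-only barrier (stronger than the acquire-only
constraint in the question, and it emits no fence instruction), so
treat it as a stand-in rather than an exact acquire barrier:

    /* MSVC only */
    #include <intrin.h>

    /* File scope, so the compiler can't prove these are
       thread-private (see the point above about purely local
       variables being freely movable). */
    int a, b, x, y;

    void example(void)
    {
        x = a;
        /* The compiler may not move the accesses to y and b above
           this point, nor the accesses to x and a below it. */
        _ReadWriteBarrier();
        y = b;
    }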
> Finally, is the following reasoning (prepared for another purpose,
> but usable here, I hope) about the semantics of memory barriers
> correct?
>
> Based on the ISCA '90 paper introducing release consistency, I think
> of an acquire barrier as a way of saying "I'm about to enter a
> critical section," and a release barrier as a way of saying "I'm
> about to leave a critical section."  So consider this situation,
> where we want to ensure that memory location 1 is accessed before
> memory location 2:
>
>    Access memory location 1
>    Announce entry to critical section    // acquire barrier
>    Announce exit from critical section   // release barrier
>    Access memory location 2
>
> We have to prevent stuff from moving out of the critical section, but
> there's no reason to keep stuff from moving into it.  That is, if x
> is a shared variable, we need to access it only within a critical
> section, but if y is thread-local, compilers can perform code motion
> to move access of y into the critical section without harm (except
> that the critical section is now going to take longer to execute).
>
> Neither access above is inside the critical section, so both can be
> moved:
>
>    Announce entry to critical section
>    Access memory location 1    // moved this down
>    Access memory location 2    // moved this up
>    Announce exit from critical section
>
> But within a critical section, instructions can be reordered at will,
> as long as they are independent.  So let's assume that the two memory
> locations are independent.  That makes this reordering possible:
>
>    Announce entry to critical section
>    Access memory location 2
>    Access memory location 1
>    Announce exit from critical section
>
> And now we're hosed.
>
> On the other hand, if the memory barriers are part of the
> loads/stores, we have this:
>
>    acquire & access memory location 1
>    access memory location 2
>
> Because you can't move subsequent accesses up above an acquire (i.e.
> you can't move something out of a critical section), you're
> guaranteed that location 1 must be accessed before location 2.

For the thread that executed that critical section, the accesses
always appear to have happened in program order.  Any reordering by
the compiler and processor is supposed to be transparent to that
thread.

As far as what other threads can see, the order of accesses done
without a lock is undefined, and your example has them done while not
holding the lock.  Note that there are two sets of accesses done by
two different threads.  If only one thread uses a lock, you still have
a problem determining the order of accesses by the other threads if
they don't perform their accesses using the same lock.

The problem with discussing what should be happening here is that
POSIX never formally defined semantics for synchronization.  You
develop a fairly good idea after doing threaded programming for a
while, though some still seem to be off a bit.  I made an attempt at a
formal definition here

http://groups.google.com/groups?threadm=3A111C5A.A49B55CA%40genuity.com

which maybe you can take a look at.  It might give you a sense of what
some of the issues are.  I've since redone the memory visibility
definition, so what I have now is substantially different.  It also
attempts to define other synchronization constructs.  It's unfinished
at this point: it takes a lot of concentration to work on, formal
semantics doesn't seem to be a high priority with anyone, and I
already have a good idea of what the semantics probably are.

> Thanks for all clarifications,
>
> Scott
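To make the "same lock" point above concrete, here is a minimal sketch
assuming POSIX threads; the variable names and values are made up.
Both threads take the same mutex, so the writer's unlock (a release)
pairs with the reader's lock (an acquire):

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static int shared_value;   /* hypothetical shared location */
    static int published;     /* hypothetical "data is ready" flag */

    /* Writer: both stores happen under the lock; the unlock acts as
       the release. */
    static void *writer(void *unused)
    {
        pthread_mutex_lock(&lock);
        shared_value = 42;
        published = 1;
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    /* Reader: takes the SAME lock; the lock acts as the acquire, so
       if it sees published == 1 it must also see shared_value == 42.
       Reading these without the lock would leave the ordering
       undefined, per the discussion above. */
    static void *reader(void *unused)
    {
        pthread_mutex_lock(&lock);
        if (published)
            printf("saw %d\n", shared_value);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t w, r;
        pthread_create(&w, NULL, writer, NULL);
        pthread_create(&r, NULL, reader, NULL);
        pthread_join(w, NULL);
        pthread_join(r, NULL);
        /* The reader may run first and print nothing; the guarantee
           is about ordering, not about who wins the race. */
        return 0;
    }

--
Joe Seigh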