Subj : Re: Memory Barriers, Compiler Optimizations, etc.
To   : comp.programming.threads
From : Gianni Mariani
Date : Tue Feb 01 2005 11:14 pm

Scott Meyers wrote:
...
>
>     x = a;
>     insertAcquireBarrier();   // or x.acquire = 1 if the barrier
>                               // should not be standalone
>     y = b;
>
> Assuming that x, y, a, and b are all distinct locations, is it reasonable
> to assume that no compiler will move the assignment to y above the barrier,
> or is it necessary to declare x and y volatile to prevent such code motion?

There is no standard; however, GCC (and I believe MSVC) provide
non-standard mechanisms to ensure that the acquire barrier is not moved
(by the way that the barrier function is defined).

> Finally, is the following reasoning (prepared for another purpose, but
> usable here, I hope) about the semantics of memory barriers correct?
>
> Based on the ISCA '90 paper introducing release consistency, I think of an
> acquire barrier as a way of saying "I'm about to enter a critical section,"
> and a release barrier as a way of saying "I'm about to leave a critical
> section."  So consider this situation, where we want to ensure that
> memory location 1 is accessed before memory location 2:
>
>     Access memory location 1
>     Announce entry to critical section     // acquire barrier
>     Announce exit from critical section    // release barrier
>     Access memory location 2
>
> We have to prevent stuff from moving out of the critical section, but
> there's no reason to keep stuff from moving into it.  That is, if x is a
> shared variable, we need to access it only within a critical section, but
> if y is thread-local, compilers can perform code motion to move accesses
> of y into the critical section without harm (except that the critical
> section is now going to take longer to execute).

I'd be very surprised to see the compiler violate the sequence of the
memory barrier.
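As a sketch of such a non-standard mechanism (assuming GCC-style inline
asm; the macro name COMPILER_BARRIER and the copy_in_order function are
mine, purely illustrative), the empty asm with a "memory" clobber tells
the optimizer it may not move memory accesses across that point, while
emitting no machine instruction.  Note it constrains only the compiler,
not the CPU:

```cpp
#include <cassert>

// GCC extension: an empty asm statement with a "memory" clobber acts as
// a pure compiler barrier -- no instruction is emitted, but the compiler
// may not reorder memory accesses across it.
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

int a = 1, b = 2, x, y;

void copy_in_order() {
    x = a;
    COMPILER_BARRIER();   // the store to y cannot be hoisted above here
    y = b;
}
```

On a CPU with a weak memory model you would still need a hardware
barrier in addition to this; the macro only pins down what the compiler
may emit.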
> Neither access above is inside the critical section, so both can be moved:
>
>     Announce entry to critical section
>     Access memory location 1              // moved this down
>     Access memory location 2              // moved this up
>     Announce exit from critical section
>
> But within a critical section, instructions can be reordered at will, as
> long as they are independent.  So let's assume that the two memory
> locations are independent.  That makes this reordering possible:
>
>     Announce entry to critical section
>     Access memory location 2
>     Access memory location 1
>     Announce exit from critical section
>
> And now we're hosed.

The compiler (or the code) would be broken if it did that.

> On the other hand, if the memory barriers are part of the loads/stores, we
> have this:
>
>     acquire & access memory location 2
>     access memory location 3
>
> Because you can't move subsequent accesses up above an acquire (i.e. you
> can't move something out of a critical section), you're guaranteed that
> location 1 must be accessed before location 2.

All acquire does is guarantee that any load (memory fetch) operations,
possibly many, that were requested before the barrier instruction are
completed before any subsequent memory fetch operations.  Similarly,
release guarantees that all memory store operations before the release
barrier instruction are made visible to other threads (CPUs) before any
memory store operations after the release instruction.  It's more like a
sequence point.

    volatile int  v1   = BAD;
    volatile bool done = false;

    reader:
      a:  bool is_done = done;
      b:  acquire();
      c:  if ( is_done ) play_with( v1 );

    writer:
      x:  v1 = GOOD;
      y:  release();
      z:  done = true;

It's more like synchronizing points where the order of memory
modifications must remain consistent with memory load operations.  In
this case, b: guarantees that the load of v1 (in c:) must happen after
the load of done (a:), and y: guarantees that the store to v1 (x:) must
happen before the store to done (z:).
Hence, the reader thread will never see the value of v1==BAD when done
is true.