Subj: Memory Barriers, Compiler Optimizations, etc.
To:   comp.programming.threads
From: Scott Meyers
Date: Tue Feb 01 2005 08:38 pm

I've encountered documents (including, but not limited to, postings on this newsgroup) suggesting that acquire barriers and release barriers are not standalone entities but are instead associated with loads and stores, respectively. Furthermore, making them standalone seems to change semantics (see below). Yet APIs for inserting them via languages like C or C++ seem to deal with barriers alone -- there are no associated reads or writes. (See, for example, http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclang/html/vclrf_readwritebarrier.asp.)

So I have two questions about this. First, are acquire/release properly part of loads/stores, or does it make sense for them to be standalone? If the former, how are programmers in languages like C/C++ expected to make the association between reads/writes and memory barriers?

Next, is it reasonable to assume that compilers will recognize memory barrier instructions and not perform code motion that is contrary to their meaning? For example:

    x = a;
    insertAcquireBarrier();   // or x.acquire = 1 if the barrier
                              // should not be standalone
    y = b;

Assuming that x, y, a, and b are all distinct locations, is it reasonable to assume that no compiler will move the assignment to y above the barrier, or is it necessary to declare x and y volatile to prevent such code motion?

Finally, is the following reasoning (prepared for another purpose, but usable here, I hope) about the semantics of memory barriers correct? Based on the ISCA '90 paper introducing release consistency, I think of an acquire barrier as a way of saying "I'm about to enter a critical section," and a release barrier as a way of saying "I'm about to leave a critical section."
So consider this situation, where we want to ensure that memory location 1 is accessed before memory location 2:

    Access memory location 1
    Announce entry to critical section    // acquire barrier
    Announce exit from critical section   // release barrier
    Access memory location 2

We have to prevent stuff from moving out of the critical section, but there's no reason to keep stuff from moving into it. That is, if x is a shared variable, we need to access it only within a critical section, but if y is thread-local, compilers can perform code motion to move access of y into the critical section without harm (except that the critical section is now going to take longer to execute). Neither access above is inside the critical section, so both can be moved:

    Announce entry to critical section
    Access memory location 1              // moved this down
    Access memory location 2              // moved this up
    Announce exit from critical section

But within a critical section, instructions can be reordered at will, as long as they are independent. So let's assume that the two memory locations are independent. That makes this reordering possible:

    Announce entry to critical section
    Access memory location 2
    Access memory location 1
    Announce exit from critical section

And now we're hosed. On the other hand, if the memory barriers are part of the loads/stores, we have this:

    Acquire & access memory location 1
    Access memory location 2

Because you can't move subsequent accesses up above an acquire (i.e., you can't move something out of a critical section), you're guaranteed that location 1 must be accessed before location 2.

Thanks for all clarifications,

Scott