Subj : Re: Memory visibility and MS Interlocked instructions
To   : comp.programming.threads
From : Alexander Terekhov
Date : Wed Aug 31 2005 01:09 pm


Mayan Moudgill wrote:
[...]
> Having looked at Andy's answer, it appears that the following is possible:
> 
> Y,Z are initially 0.
> processor 1 writes Y with 2.
> processor 2 reads 2 from Y and writes Z with that value.
> processor 3 reads 2 from Z and 0 from Y.
> 
> This is the "obvious" behavior assuming a processor with in-order stores
> (via a store buffer) using ...

Sure. 
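To make Mayan's example concrete, here is a minimal Python sketch. It is an illustration, not code from this thread. It replays one interleaving on per-processor memory copies, following the conceptual system in the thesis excerpt quoted further down. The names mem, write_sub and read are hypothetical. The schedule is just one that the multiple-copy model permits, and it does not model an actual store buffer.

```python
# Hypothetical sketch: mem[i] is processor i's private copy of memory (M_i).
# Processors P1, P2, P3 from the example map to indices 0, 1, 2.
N = 3
mem = [{"Y": 0, "Z": 0} for _ in range(N)]


def write_sub(target, loc, value):
    """One write sub-operation W(target): update only copy M_target."""
    mem[target][loc] = value


def read(proc, loc):
    """Reads are atomic and served by the reader's own copy."""
    return mem[proc][loc]


# P1's write of Y=2 reaches P1's and P2's copies first.
write_sub(0, "Y", 2)
write_sub(1, "Y", 2)
r1 = read(1, "Y")            # P2 reads 2 from Y
for i in range(N):           # P2's write of Z reaches every copy
    write_sub(i, "Z", r1)
r2 = read(2, "Z")            # P3 reads 2 from Z
r3 = read(2, "Y")            # ...but P3's copy of Y is still stale
write_sub(2, "Y", 2)         # P1's last sub-operation arrives afterwards
print(r1, r2, r3)            # prints: 2 2 0
```

Nothing orders P1's single store relative to P2's store. So P3 may observe Z's new value before Y's.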

Contrast it with (that's what I thought Joe was driving at when he 
alleged that PC is somewhat more relaxed than RC... I mean his claim
that "processor consistency doesn't give you acquire and release as 
they are commonly understood"):

X,Y,Z are initially 0.
processor 1 writes X with 42.
processor 1 writes Y with 2.
processor 2 reads 2 from Y and writes Z with that value.
processor 3 reads 2 from Z and 0 from X.

Boom! PC doesn't allow that behavior because before processor 1 is 
allowed to perform its store to Y *with respect to any other processor*, 
the preceding store to X must be performed *with respect to all other 
processors* ("as if", of course). So by the time processor 2 can read 
2 from Y, the store of 42 to X has already reached processor 3, and 
processor 3 cannot read 0 from X after reading 2 from Z. See the PC 
"conditions" (1990-rc-isca.pdf), read in the "Performing a Memory 
Request" terms of "Extension to Dubois’ Abstraction" (1993-tr-68.pdf).
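The argument can be checked mechanically with a small exhaustive enumerator. The following Python code is a sketch under stated assumptions, not Gharachorloo's formal definition. Memory is replicated per processor, and each write is split into one sub-operation per copy. The PC ordering rules are encoded roughly as described above:

* a read performs only after the same processor's earlier reads have performed;
* no sub-operation of a write may happen until all of that processor's earlier accesses are fully performed.

The function name outcomes and the op tuple format are hypothetical. Coherence between competing writes to one location is not modeled, so each location should have a single writer.

```python
# Hypothetical sketch: ops are ("R", loc, reg) or ("W", loc, src), where
# src is an int or the name of a register read earlier by that processor.
# Competing writes to one location are NOT ordered (no coherence model).


def outcomes(programs, locations):
    n = len(programs)
    idx = {loc: i for i, loc in enumerate(locations)}

    # One event per read; one event per (write, target copy), i.e. W(j).
    events = []
    for p, prog in enumerate(programs):
        for k, op in enumerate(prog):
            if op[0] == "R":
                events.append((p, k, None))
            else:
                events.extend((p, k, j) for j in range(n))

    def performed(done, p, k):
        # A read is performed once it ran; a write once every W(j) ran.
        if programs[p][k][0] == "R":
            return (p, k, None) in done
        return all((p, k, j) in done for j in range(n))

    def enabled(done, ev):
        p, k, _ = ev
        if ev in done:
            return False
        prog = programs[p]
        if prog[k][0] == "R":
            # Reads perform in program order w.r.t. earlier reads only,
            # so a read may bypass an earlier write.
            return all(performed(done, p, i)
                       for i in range(k) if prog[i][0] == "R")
        # Before any sub-operation of a write, all earlier accesses of
        # the same processor must be performed w.r.t. all processors.
        # Simplification: this also gates the writer's own copy.
        return all(performed(done, p, i) for i in range(k))

    results, seen = set(), set()

    def run(done, mem, regs):
        if (done, mem, regs) in seen:
            return
        seen.add((done, mem, regs))
        if len(done) == len(events):
            results.add(tuple(sorted(regs)))
            return
        for ev in events:
            if not enabled(done, ev):
                continue
            p, k, j = ev
            kind, loc, arg = programs[p][k]
            if kind == "R":
                value = mem[p][idx[loc]]          # served by local copy M_p
                run(done | {ev}, mem, regs | {((p, arg), value)})
            else:
                value = dict(regs)[(p, arg)] if isinstance(arg, str) else arg
                row = list(mem[j])
                row[idx[loc]] = value             # W(j) updates only M_j
                new_mem = mem[:j] + (tuple(row),) + mem[j + 1:]
                run(done | {ev}, new_mem, regs)

    init = tuple(tuple(0 for _ in locations) for _ in programs)
    run(frozenset(), init, frozenset())
    return results


# Processor 1 writes X then Y; processor 2 copies Y into Z;
# processor 3 reads Z, then X. Indices 0, 1, 2 stand for processors 1, 2, 3.
progs = [
    [("W", "X", 42), ("W", "Y", 2)],
    [("R", "Y", "r1"), ("W", "Z", "r1")],
    [("R", "Z", "r2"), ("R", "X", "r3")],
]
bad = (((1, "r1"), 2), ((2, "r2"), 2), ((2, "r3"), 0))
print(bad in outcomes(progs, ["X", "Y", "Z"]))  # prints: False
```

Feeding the same function the two-location test quoted at the top of this post (without the store to X) does produce r1 = 2, r2 = 2, r3 = 0. The difference comes entirely from the rule that processor 1's earlier store must be fully performed first.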

http://research.compaq.com/wrl/people/kourosh/papers/1995_thesis.pdf

"The models we have considered up to now all provide the appearance of 
 a single copy of memory to the programmer. Processor consistency (PC) 
 [GLL+90, GGH93b] is the first model that we consider where the 
 multiple-copy aspects of the memory are exposed to the programmer.[1]

 [1] The processor consistency model described here is distinct from the 
 (informal) model proposed by Goodman [Goo89, Goo91].

 [...]
 
 The conceptual system consists of several processors each with their 
 own copy of the entire memory. By modeling memory as being replicated 
 at every processing node, we can capture the non-atomic effects that 
 arise due to presence of multiple copies of a single memory location.
 Since the memory no longer behaves as a single logical copy, we need 
 to extend the notion of read and write memory operations to deal with 
 the presence of multiple copies. Read operations are quite similar to
 before and remain atomic. The only difference is that a read is 
 satisfied by the memory copy at the issuing processor's node (i.e., 
 read from Pi is serviced by Mi). Write operations no longer appear 
 atomic, however. Each write operation conceptually results in all 
 memory copies corresponding to the location to be updated to the new 
 value. Therefore, we model each write as a set of n sub-operations, 
 W(1) ... W(n), where n is the number of processors, and each 
 sub-operation represents the event of updating one of the memory copies 
 (e.g., W(1) updates the location in M1)."

regards,
alexander.
