Subj : Re: Memory visibility and MS Interlocked instructions
To   : comp.programming.threads
From : Peter Dimov
Date : Thu Sep 01 2005 11:37 am

Sean Kelly wrote:
> Alexander Terekhov wrote:
> > Joe Seigh wrote:
> > [...]
> > > I'm finding it impossible to argue with a moving target.  If I subtract
> > > everything you say out, it pretty much sounds like ia32 loads are not
> > > always guaranteed to be "in-order".
> >
> > They are always "in-order" with respect to other loads and subsequent
> > stores. What you can't grok is that ia32 *stores* (being PC stores;
> > i.e. RCpc release stores) are not constrained to ensure "remote write
> > atomicity" (in IA64 formal memory model speak).
>
> Okay now you've got me confused because it sounds like you're arguing
> Joe's case.  If IA-32 stores do not "become remotely visible to all
> processors in the same order" then the assertion that all stores have
> release semantics is only true at a processor level, which would imply
> that a membar is required to globally order writes.  Thus, msync.acq
> and msync.rel operations (to use atomic<> semantics) would both need
> the LOCK prefix.  Is this correct?

ld.acq and st.rel do not guarantee total store ordering. If you have

X = 0, Y = 0

CPU1:

st.rel X 1
ld.acq Y

CPU2:

st.rel Y 1
ld.acq X

It is possible for CPU1 and CPU2 to both load 0, because nothing orders
an acquire load after an earlier release store on the same CPU. This
can't happen in a TSO model (*), because one of the two stores must
execute first.
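
For what it's worth, the same test can be sketched in C++ with
std::atomic (which postdates this thread). count_both_zero is a made-up
helper name, not something from a library. With release stores and
acquire loads the C++ memory model permits both loads to return 0; with
memory_order_seq_cst on all four operations that outcome is forbidden.
Treat it as a rough sketch: whether the (0, 0) result ever shows up in
practice depends on the hardware, the compiler and timing.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs the two-CPU test `iterations` times and
// returns how many runs ended with both loads reading 0.
int count_both_zero(int iterations,
                    std::memory_order store_order,
                    std::memory_order load_order) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};   // X = 0, Y = 0
        int r1 = -1, r2 = -1;

        std::thread cpu1([&] {
            x.store(1, store_order);   // st.rel X 1
            r1 = y.load(load_order);   // ld.acq Y
        });
        std::thread cpu2([&] {
            y.store(1, store_order);   // st.rel Y 1
            r2 = x.load(load_order);   // ld.acq X
        });

        // join() synchronizes, so r1 and r2 are safe to read afterwards.
        cpu1.join();
        cpu2.join();

        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Calling count_both_zero(n, std::memory_order_release,
std::memory_order_acquire) may return a nonzero count; passing
std::memory_order_seq_cst for both orderings must return 0. Starting
fresh threads on every iteration makes the window small, so a count of
0 in the release/acquire case does not show the outcome is impossible.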

--

(*) I don't know for sure whether SPARC-TSO is really TSO, though.
