Subj : Re: Memory visibility and MS Interlocked instructions
To   : comp.programming.threads
From : Peter Dimov
Date : Thu Sep 01 2005 11:37 am

Sean Kelly wrote:
> Alexander Terekhov wrote:
> > Joe Seigh wrote:
> > [...]
> > > I'm finding it impossible to argue with a moving target. If I subtract
> > > everything you say out, it pretty much sounds like ia32 loads are not
> > > always guaranteed to be "in-order".
> >
> > They are always "in-order" with respect to other loads and subsequent
> > stores. What you can't grok is that ia32 *stores* (being PC stores;
> > i.e. RCpc release stores) are not constrained to ensure "remote write
> > atomicity" (in IA64 formal memory model speak).
>
> Okay now you've got me confused because it sounds like you're arguing
> Joe's case. If IA-32 stores do not "become remotely visible to all
> processors in the same order" then the assertion that all stores have
> release semantics is only true at a processor level, which would imply
> that a membar is required to globally order writes. Thus, msync.acq
> and msync.rel operations (to use atomic<> semantics) would both need
> the LOCK prefix. Is this correct?

ld.acq and st.rel do not guarantee total store ordering. If you have

X = 0, Y = 0

CPU1:

st.rel X 1
ld.acq Y

CPU2:

st.rel Y 1
ld.acq X

It is possible for CPU1 and CPU2 to both load 0. This can't happen in a TSO
model (*), because one of the two stores must execute first.

--
(*) I don't know for sure whether SPARC-TSO is really TSO, though.
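
For readers who want to try the two-CPU example above, here is a minimal sketch in C++. It is not part of the original post. It assumes that st.rel and ld.acq can be approximated by std::atomic stores and loads with memory_order_release and memory_order_acquire (C++11 atomics postdate this thread), and the names Counts and run_litmus are hypothetical. Passing memory_order_seq_cst for both operations gives the case where the stores and loads fall into one global order, so both threads cannot load 0.

```cpp
#include <atomic>
#include <thread>

// Outcome counts for the two-CPU example (hypothetical helper type).
struct Counts {
    int both_zero = 0;  // CPU1 and CPU2 both loaded 0
    int other = 0;      // at least one of them loaded 1
};

// Runs the example `iterations` times with the given store/load orderings.
// Assumption: release/acquire stand in for st.rel/ld.acq.
Counts run_litmus(int iterations,
                  std::memory_order store_mo,
                  std::memory_order load_mo) {
    Counts c;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        std::atomic<bool> go{false};
        int r1 = -1, r2 = -1;

        std::thread cpu1([&] {
            // Wait for the start signal so both threads run close together.
            while (!go.load(std::memory_order_acquire)) std::this_thread::yield();
            x.store(1, store_mo);   // st.rel X 1
            r1 = y.load(load_mo);   // ld.acq Y
        });
        std::thread cpu2([&] {
            while (!go.load(std::memory_order_acquire)) std::this_thread::yield();
            y.store(1, store_mo);   // st.rel Y 1
            r2 = x.load(load_mo);   // ld.acq X
        });

        go.store(true, std::memory_order_release);
        cpu1.join();
        cpu2.join();  // join makes r1 and r2 safe to read here

        if (r1 == 0 && r2 == 0) ++c.both_zero;
        else ++c.other;
    }
    return c;
}
```

Calling run_litmus(n, std::memory_order_release, std::memory_order_acquire) may report a nonzero both_zero count, though a harness this simple often fails to hit the window, so a zero count proves nothing. Calling it with std::memory_order_seq_cst for both arguments must always report both_zero == 0. On x86, compilers typically implement the seq_cst store with xchg (implicitly locked) or mov followed by mfence, which is the extra cost discussed in the quoted question. As for the footnote: in store-buffer-based formulations such as SPARC-TSO, a load may complete before an earlier store to a different address becomes visible, so the both-zero outcome is allowed there as well; the ordering that rules it out is sequential consistency.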