Subj : Re: Memory visibility and MS Interlocked instructions
To   : comp.programming.threads
From : Sean Kelly
Date : Mon Sep 05 2005 11:45 am

Alexander Terekhov wrote:
> Sean Kelly wrote:
> >
> > Alexander Terekhov wrote:
> > > Alexander Terekhov wrote:
> > > [...]
> > > > Nah, loops are needed for LR-SC on Power. For x86, it is just a single
> > > > load followed by InterlockedCompareExchange(&addr, temp, temp) [MP
> > >
> > > Silly me. InterlockedCompareExchange(&addr, 42, 42) should work just
> > > fine. I've asked Andy Glew of Intel to confirm it ("Intel x86 memory
> > > model question" thread on comp.arch).
> >
> > A load/store combination would definitely work,
>
> What load/store combination?

Two MOVs: a load followed by a store of the same value.  Though for
contended addresses this could lose data, so perhaps it's not such a
good idea :)

> > Assuming *addr != 42 then we've essentially loaded
> > addr twice in a row.
>
> CMPXCHG on x86 always performs a (hopefully StoreLoad+LoadLoad fenced)
> load followed by a (LoadStore+StoreStore fenced) store (plus trailing
> MFENCE, so to speak). (CMPXCHG is supposed to be "fully fenced".) You
> just need to ensure that "source operand" register has the same value
> as "Accumulator = AL, AX, EAX, or RAX depending on whether a byte,
> word, doubleword, or quadword comparison is being performed". CMPXCHG
> will store the loaded value (if it's different) in the accumulator.
>
> Or am I just reading faked ia32 specs?

You're not; I was accidentally replying to a previous post of yours:

> For x86, it is just a single load followed by InterlockedCompareExchange

Sorry for the confusion.  I guess now it's mostly an issue of how to
build this into an API, as I assume this load method should not be the
default, even for msync::acq tagged ops.

Sean
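
P.S. For reference, the InterlockedCompareExchange(&addr, 42, 42) idea
quoted above can be sketched in portable C++ roughly as follows.  This is
only an illustration under stated assumptions: fenced_load and its guess
parameter are hypothetical names, and std::atomic's
compare_exchange_strong stands in for the Win32 call.  Compilers for x86
typically emit LOCK CMPXCHG for it, but that is an assumption about code
generation, not something the language standard promises.

```cpp
#include <atomic>
#include <cassert>  // only needed for the usage checks, not for fenced_load

// Hypothetical helper (not part of any API discussed in the thread):
// read the current value of `addr` through a compare-and-swap so that
// the read carries the ordering of a sequentially consistent
// read-modify-write operation.
inline long fenced_load(std::atomic<long>& addr, long guess = 42)
{
    long expected = guess;

    // Case 1: addr == guess. The CAS succeeds and stores `guess` back,
    //         so the observable value does not change.
    // Case 2: addr != guess. The CAS fails, nothing is stored, and
    //         `expected` is overwritten with the value currently in addr.
    // Either way `expected` ends up holding the value that was read.
    addr.compare_exchange_strong(expected, guess,
                                 std::memory_order_seq_cst);
    return expected;
}
```

Calling fenced_load(x) returns whatever x held at the time of the CAS
and never changes it.  Since a read-modify-write generally has to own
the cache line, this is likely to cost more than a plain load, which
fits the point above that it should not be the default load.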