Subj : Re: Memory visibility and MS Interlocked instructions
To   : comp.programming.threads
From : Alexander Terekhov
Date : Mon Sep 05 2005 08:23 pm


Sean Kelly wrote:
> 
> Alexander Terekhov wrote:
> > Alexander Terekhov wrote:
> > [...]
> > > Nah, loops are needed for LR-SC on Power. For x86, it is just a single
> > > load followed by InterlockedCompareExchange(&addr, temp, temp) [MP
> >
> > Silly me. InterlockedCompareExchange(&addr, 42, 42) should work just
> > fine. I've asked Andy Glew of Intel to confirm it ("Intel x86 memory
> > model question" thread on comp.arch).
> 
> A load/store combination would definitely work, 

What load/store combination?

>                                                 but if CMPXCHG would
> work as well then so much the better.  Is a separate load even
> necessary then? 

No.
 
>                 Assuming *addr != 42 then we've essentially loaded
> addr twice in a row.

LOCK CMPXCHG on x86 (which is what InterlockedCompareExchange uses)
always performs a (hopefully StoreLoad+LoadLoad fenced) load followed
by a (LoadStore+StoreStore fenced) store (plus trailing MFENCE, so to
speak); it is supposed to be "fully fenced". You just need to ensure
that the "source operand" register has the same value as the
"Accumulator = AL, AX, EAX, or RAX depending on whether a byte, word,
doubleword, or quadword comparison is being performed". If the loaded
value is different, CMPXCHG will store it in the accumulator.
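
A minimal sketch of that trick, assuming C11 <stdatomic.h> as a portable
stand-in for InterlockedCompareExchange (on x86 compilers typically emit
LOCK CMPXCHG for it, but that is an assumption about the toolchain, not
a guarantee). The names fenced_load and fenced_load_demo are
hypothetical, made up for illustration only:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: a "fenced load" built from a compare-and-swap
 * whose comparand and new value are the same dummy (42 here).
 * - If *addr == 42, the CAS succeeds and stores 42 back (no change).
 * - If *addr != 42, the CAS fails and copies the current value into
 *   `expected`, much like CMPXCHG loading it into the accumulator.
 * Either way `expected` ends up holding the value that was observed. */
long fenced_load(atomic_long *addr)
{
    long expected = 42;
    /* Default memory order is seq_cst for both success and failure. */
    atomic_compare_exchange_strong(addr, &expected, 42);
    return expected;
}

/* Usage: the value is returned and the variable is never modified. */
void fenced_load_demo(void)
{
    atomic_long v = 7;
    assert(fenced_load(&v) == 7);   /* CAS fails, observed value returned */
    assert(atomic_load(&v) == 7);   /* variable untouched */

    atomic_store(&v, 42);
    assert(fenced_load(&v) == 42);  /* CAS succeeds, writes 42 over 42 */
    assert(atomic_load(&v) == 42);
}
```

No separate load is needed before the call: whichever branch the
compare takes, the returned value is the one the instruction read.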

Or am I just reading faked ia32 specs?

regards,
alexander.
