Subj : Re: Memory visibility and MS Interlocked instructions
To   : comp.programming.threads
From : Sean Kelly
Date : Mon Sep 05 2005 11:45 am

Alexander Terekhov wrote:
> Sean Kelly wrote:
> >
> > Alexander Terekhov wrote:
> > > Alexander Terekhov wrote:
> > > [...]
> > > > Nah, loops are needed for LR-SC on Power. For x86, it is just a single
> > > > load followed by InterlockedCompareExchange(&addr, temp, temp) [MP
> > >
> > > Silly me. InterlockedCompareExchange(&addr, 42, 42) should work just
> > > fine. I've asked Andy Glew of Intel to confirm it ("Intel x86 memory
> > > model question" thread on comp.arch).
> >
> > A load/store combination would definitely work,
>
> What load/store combination?

Two MOVs: a load, then a store of the same value back.  Though for
contended addresses the store could overwrite another thread's update
and lose data, so perhaps it's not such a good idea :)
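
To make the lost-update problem concrete, here is a rough sketch. It is
not real two-thread code: the "other thread" write is just placed inline
at the point where it would interleave, and std::atomic<int> stands in
for a raw x86 location and for InterlockedCompareExchange. The function
names are made up for illustration.

```cpp
#include <atomic>
#include <cassert>

// "Two MOVs": load the value, then store the same value back.
// Returns true if a write that landed in between was lost.
bool two_movs_lose_update() {
    std::atomic<int> addr{1};
    int seen = addr.load(std::memory_order_relaxed);  // first MOV (load)
    addr.store(2, std::memory_order_relaxed);         // simulated write by another thread
    addr.store(seen, std::memory_order_relaxed);      // second MOV writes the stale 1 back
    return addr.load() != 2;                          // the 2 is gone
}

// CAS(42, 42) used as a load: it only stores when the location already
// holds 42, and then it stores 42 again, so no update can be overwritten.
bool cas_load_keeps_update() {
    std::atomic<int> addr{1};
    addr.store(2);                               // simulated write by another thread
    int expected = 42;                           // arbitrary comparand
    addr.compare_exchange_strong(expected, 42);  // fails here; expected receives the current value
    return addr.load() == 2 && expected == 2;
}

// Small sanity check of both claims.
void self_check() {
    assert(two_movs_lose_update());
    assert(cas_load_keeps_update());
}
```

Because the compare and the store happen as one atomic step, there is no
window between them for another thread's write to fall into, which is
exactly what the plain MOV pair lacks.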

> >                 Assuming *addr != 42 then we've essentially loaded
> > addr twice in a row.
>
> CMPXCHG on x86 always performs a (hopefully StoreLoad+LoadLoad fenced)
> load followed by a (LoadStore+StoreStore fenced) store (plus trailing
> MFENCE, so to speak). (CMPXCHG is supposed to be "fully fenced".) You
> just need to ensure that "source operand" register has the same value
> as "Accumulator = AL, AX, EAX, or RAX depending on whether a byte,
> word, doubleword, or quadword comparison is being performed". CMPXCHG
> will store the loaded value (if it's different) in the accumulator.
>
> Or am I just reading faked ia32 specs?

You're not.  I was accidentally replying to an earlier post of yours:

> For x86, it is just a single load followed by InterlockedCompareExchange

Sorry for the confusion.

I guess now it's mostly an issue of how to build this into an API, since
I assume this CAS-based load should not be the default, even for
msync::acq-tagged ops.
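
One possible shape for such an API, purely as a sketch: the default
(acquire-tagged) load stays an ordinary load, and the CAS-based load is
something the caller has to ask for explicitly. Everything here is an
assumption for illustration: std::atomic<int> stands in for the raw
location, the tag type msync::full_t and the free function load() are
invented names, and only the msync::acq spelling comes from this thread.

```cpp
#include <atomic>
#include <cassert>

namespace msync {
    struct acq_t {};   // acquire tag, after the msync::acq tag discussed here
    struct full_t {};  // hypothetical tag selecting the CAS-based, fully fenced load
    constexpr acq_t  acq{};
    constexpr full_t full{};
}

// Default: an ordinary acquire load, no write traffic on the location.
inline int load(const std::atomic<int>& addr, msync::acq_t = msync::acq) {
    return addr.load(std::memory_order_acquire);
}

// Opt-in: CAS(42, 42) used as a load. It needs a writable, non-const
// location, which is one reason it should not be the default.
inline int load(std::atomic<int>& addr, msync::full_t) {
    int expected = 42;  // arbitrary comparand
    // On success the location held 42 and 42 is stored again; on failure
    // expected is overwritten with the current value. Either way,
    // expected ends up holding the value that was observed.
    addr.compare_exchange_strong(expected, 42, std::memory_order_seq_cst);
    return expected;
}

// Small sanity check: both paths observe the same value and neither
// changes what is stored.
void self_check() {
    std::atomic<int> a{7};
    assert(load(a) == 7);
    assert(load(a, msync::full) == 7);
    assert(a.load() == 7);
}
```

Usage would look like load(x) or load(x, msync::acq) for the cheap path
and load(x, msync::full) where the stronger ordering is actually wanted.
Whether a tag argument is the right way to spell this is an open design
question; the point is only that the expensive path is not taken silently.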


Sean
