Subj : Re: What is the real cost of LOCK on an x86 multiprocessor machine?
To   : comp.programming.threads
From : chris noonan
Date : Mon Aug 01 2005 11:37 am


Joe Seigh wrote:
> chris noonan wrote:
> >
> > Now that memory chips have millions of transistors,
> > a few could be spared for a primitive ALU. Then add some
> > extra transaction types to the memory bus. One
> > such transaction would implement waiting on a
> > semaphore. The memory controller performs a read
> > cycle to get the value of the specified memory word,
> > decrements it in its ALU (unless already zero),
> > performs a write cycle to put the new value back in
> > memory, then returns the old value of the word
> > across the bus to the requesting processor. This
> > sequence would be atomic with respect to other
> > processors, trivially.
>
> This is called fetch-and-op, op being inc(rement), dec(rement),
> etc...  It's an atomic read, modify, write instruction.
> Itanium has it with the fetchadd instruction though it
> only works that way with special uncached memory so you
> can't really use it with normal shared memory.

What I had in mind has already been described in the literature:
"Highly Efficient Synchronization Based on Active Memory
Operations"
Zhang, Fang, Carter
http://www.cs.utah.edu/~retrac/papers/ipdps04.pdf
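
For comparison, here is roughly what the "wait on a semaphore"
transaction quoted above looks like when done in software on the
processor. It is only a sketch using C11 atomics, not the mechanism
from the paper. The function names are made up for illustration, and
the semaphore is assumed to be a plain unsigned counter.

```c
#include <stdatomic.h>

/* Hypothetical name. Software stand-in for the proposed bus
   transaction: decrement *sem unless it is already zero, and
   return the old value. A return of 0 means the caller did not
   acquire the semaphore. */
unsigned sem_fetch_dec_unless_zero(atomic_uint *sem)
{
    unsigned old = atomic_load(sem);

    /* If the compare-exchange fails, it stores the value it
       actually found into 'old', so the loop retries with fresh
       data and gives up once the counter reads zero. */
    while (old != 0 &&
           !atomic_compare_exchange_weak(sem, &old, old - 1)) {
        /* another processor changed the word; try again */
    }
    return old;
}

/* Hypothetical name. The unconditional case is a plain
   fetch-and-add: increment and return the old value. */
unsigned sem_fetch_inc(atomic_uint *sem)
{
    return atomic_fetch_add(sem, 1);
}
```

The conditional decrement needs a retry loop here because the
read, test and write are separate steps from the processor's point
of view. In the memory-side scheme the same three steps would be a
single transaction, so no retry would be needed.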

Chris
