Subj : Re: What are the real costs of LOCK on an x86 multiprocessor machine?
To   : comp.programming.threads
From : Joe Seigh
Date : Mon Aug 01 2005 03:22 pm

chris noonan wrote:
> Joe Seigh wrote:
> 
>>chris noonan wrote:
>>
>>>Now that memory chips have millions of transistors,
>>>a few could be spared for a primitive ALU. Then add some
>>>extra transaction types to the memory bus. One
>>>such transaction would implement waiting on a
>>>semaphore. The memory controller performs a read
>>>cycle to get the value of the specified memory word,
>>>decrements it in its ALU (unless already zero),
>>>performs a write cycle to put the new value back in
>>>memory, then returns the old value of the word
>>>across the bus to the requesting processor. This
>>>sequence would be atomic with respect to other
>>>processors, trivially.
>>
>>This is called fetch-and-op, op being inc(rement), dec(rement),
>>etc...  It's an atomic read, modify, write instruction.
>>Itanium has it with the fetchadd instruction though it
>>only works that way with special uncached memory so you
>>can't really use it with normal shared memory.
> 
> 
> The way I meant it is already in the literature:
> "Highly Efficient Synchronization Based on Active Memory
> Operations"
> Zhang, Fang, Carter
> http://www.cs.utah.edu/~retrac/papers/ipdps04.pdf
> 

It seems to be meant for parallel computation where the
CPUs are dedicated to the computation and it's OK for a
CPU to just wait on the result of the atomic operation.
These would be supercomputers built from individual computers
connected by high-speed links, where the goal is to
reduce network traffic over those links.

I suppose this might be applicable to regular multi-threaded
programming once 50- to 100-core processors are widely available
and you can dispense with time slicing.
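
For comparison, here is a rough sketch of how the semaphore wait
described above (decrement unless already zero, return the old value)
is usually approximated in software today. It is not the active memory
mechanism from the paper: the read-modify-write happens in the
requesting CPU through a compare-and-swap retry loop rather than in the
memory controller. The function names are made up for illustration, and
the memory orderings are one reasonable choice, not the only one.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: software stand-in for the proposed memory-side
// "semaphore wait" transaction. Returns the old value of the word; the
// caller acquired the semaphore only if that value is nonzero.
inline std::uint32_t fetch_and_dec_unless_zero(std::atomic<std::uint32_t>& word) {
    std::uint32_t old = word.load(std::memory_order_relaxed);
    // Retry until we either observe zero or manage to install old - 1.
    while (old != 0 &&
           !word.compare_exchange_weak(old, old - 1,
                                       std::memory_order_acquire,
                                       std::memory_order_relaxed)) {
        // On failure compare_exchange_weak reloads `old`; just loop again.
    }
    return old;
}

// Plain fetch-and-op for contrast: one unconditional atomic add that
// returns the previous value (the semaphore "post" side).
inline std::uint32_t fetch_and_inc(std::atomic<std::uint32_t>& word) {
    return word.fetch_add(1, std::memory_order_release);
}
```

On x86 the fetch_add typically compiles to LOCK XADD and the
compare-exchange to LOCK CMPXCHG, so each call still pays for the locked
operation that this thread is asking about. The conditional decrement
needs the retry loop because there is no single x86 instruction for
"decrement unless zero".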


-- 
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
