Subj : Re: What is the real costs of LOCK on x86 multiprocesor machine?
To   : comp.programming.threads
From : chris noonan
Date : Mon Aug 01 2005 11:37 am

Joe Seigh wrote:
> chris noonan wrote:
> >
> > Now that memory chips have millions of transistors,
> > a few could be spared for a primitive ALU. Then add some
> > extra transaction types to the memory bus. One
> > such transaction would implement waiting on a
> > semaphore. The memory controller performs a read
> > cycle to get the value of the specified memory word,
> > decrements it in its ALU (unless already zero),
> > performs a write cycle to put the new value back in
> > memory, then returns the old value of the word
> > across the bus to the requesting processor. This
> > sequence would be atomic with respect to other
> > processors, trivially.
>
> This is called fetch-and-op, op being inc(rement), dec(rement),
> etc... It's an atomic read, modify, write instruction.
> Itanium has it with the fetchadd instruction though it
> only works that way with special uncached memory so you
> can't really use it with normal shared memory.

The way I meant it is already in the literature:

"Highly Efficient Synchronization Based on Active Memory Operations"
Zhang, Fang, Carter
http://www.cs.utah.edu/~retrac/papers/ipdps04.pdf

Chris
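
For anyone following along, here is a rough software stand-in for the semaphore-wait transaction described in the quoted text: read the word, decrement it unless it is already zero, write it back, and return the old value. It is only a sketch using C11 atomics on the processor side, not the memory-side mechanism from the paper. The names sem_fetch_dec_unless_zero and sem_selfcheck are made up for illustration. A plain fetch-and-op would be atomic_fetch_sub, but the "unless already zero" condition needs a compare-and-swap loop.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: emulates the proposed memory-side operation in
 * software. Returns the value the word held before the call. A return
 * of 0 means the semaphore was unavailable and the word was left alone. */
static unsigned sem_fetch_dec_unless_zero(atomic_uint *word)
{
    unsigned old = atomic_load(word);

    /* Retry until the decrement lands or the word is observed to be zero.
     * On failure, atomic_compare_exchange_weak reloads 'old' with the
     * current value, so the zero check is repeated on fresh data. */
    while (old != 0 &&
           !atomic_compare_exchange_weak(word, &old, old - 1)) {
        /* nothing to do, 'old' has been refreshed */
    }
    return old;
}

/* Hypothetical single-threaded self-check of the semantics above. */
static void sem_selfcheck(void)
{
    atomic_uint sem;
    atomic_init(&sem, 1);

    assert(sem_fetch_dec_unless_zero(&sem) == 1); /* acquired, now 0 */
    assert(sem_fetch_dec_unless_zero(&sem) == 0); /* unavailable */
    assert(atomic_load(&sem) == 0);               /* never goes below 0 */
}
```

On x86, compilers typically turn the compare-exchange into a LOCK-prefixed CMPXCHG, so each attempt pays the cost this thread is about. The proposal in the quoted text would instead do the whole sequence at the memory controller in one bus transaction.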