Subj : Re: What is the real costs of LOCK on x86 multiprocesor machine?
To   : comp.programming.threads
From : Joe Seigh
Date : Mon Aug 01 2005 03:22 pm

chris noonan wrote:
> Joe Seigh wrote:
>
>>chris noonan wrote:
>>
>>>Now that memory chips have millions of transistors,
>>>a few could be spared for a primitive ALU. Then add some
>>>extra transaction types to the memory bus. One
>>>such transaction would implement waiting on a
>>>semaphore. The memory controller performs a read
>>>cycle to get the value of the specified memory word,
>>>decrements it in its ALU (unless already zero),
>>>performs a write cycle to put the new value back in
>>>memory, then returns the old value of the word
>>>across the bus to the requesting processor. This
>>>sequence would be atomic with respect to other
>>>processors, trivially.
>>
>>This is called fetch-and-op, op being inc(rement), dec(rement),
>>etc... It's an atomic read, modify, write instruction.
>>Itanium has it with the fetchadd instruction though it
>>only works that way with special uncached memory so you
>>can't really use it with normal shared memory.
>
>
> The way I meant it is already in the literature:
> "Highly Efficient Synchronization Based on Active Memory
> Operations"
> Zhang, Fang, Carter
> http://www.cs.utah.edu/~retrac/papers/ipdps04.pdf
>

It seems to be meant for parallel computation where the cpus are
dedicated to the computation and it's ok for a cpu to just wait on
the result of the atomic operation. These would be supercomputers
with individual computers connected with high speed links and they're
interested in reducing network traffic over the links.

I suppose this might be applicable to regular multi-threaded
programming once you get 50 to 100 core processors widely available
and you can dispense with time slicing.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
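
For reference, the "decrement unless already zero, return the old value" transaction described in the quoted text can be approximated in software today. The following is only a sketch under stated assumptions: it uses C11 `<stdatomic.h>`, the function names `sem_fetch_dec_unless_zero` and `sem_post` are hypothetical, and the retry loop runs on the requesting CPU rather than in a memory-side ALU, so it is not the mechanism from the cited paper. On x86, compilers typically implement these calls with LOCK-prefixed instructions (LOCK CMPXCHG and LOCK XADD).

```c
#include <stdatomic.h>

/* Hypothetical helper, not from the thread: emulate the proposed
 * memory-side transaction with a compare-and-swap loop.
 * Decrements *sem unless it is already zero and returns the value
 * that was observed before the decrement. */
static unsigned sem_fetch_dec_unless_zero(atomic_uint *sem)
{
    unsigned old = atomic_load_explicit(sem, memory_order_relaxed);

    /* On CAS failure, `old` is refreshed with the current value,
     * so the loop re-checks for zero before retrying. */
    while (old != 0 &&
           !atomic_compare_exchange_weak_explicit(sem, &old, old - 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
        /* retry */
    }
    return old; /* 0 means the semaphore was unavailable */
}

/* Plain fetch-and-op (fetch-and-increment): returns the old value. */
static unsigned sem_post(atomic_uint *sem)
{
    return atomic_fetch_add_explicit(sem, 1, memory_order_release);
}
```

A caller that gets 0 back from `sem_fetch_dec_unless_zero` would have to spin or block in software. The difference from the proposal is where the read-modify-write happens: here the cache line travels to the requesting core for each attempt, whereas the quoted scheme would keep the operation at the memory controller and send back only the old value.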