Subj : Re: What is the real costs of LOCK on x86 multiprocesor machine?
To : comp.programming.threads
From : chris noonan
Date : Thu Aug 04 2005 04:01 am

Joe Seigh wrote:
> chris noonan wrote:
> > The way I meant it is already in the literature:
> > "Highly Efficient Synchronization Based on Active Memory
> > Operations"
> > Zhang, Fang, Carter
> > http://www.cs.utah.edu/~retrac/papers/ipdps04.pdf
> >
>
> It seems to be meant for parallel computation where the
> cpus are dedicated to the computation and it's ok for a
> cpu to just wait on the result of the atomic operation.
> These would be supercomputers with individual computers
> connected with high speed links and they're interested in
> reducing network traffic over the links.
>
> I suppose this might be applicable to regular multi-threaded
> programming once you get 50 to 100 core processors widely available
> and you can dispense with time slicing.

I suspect it would also be valuable for PC-type systems
having (say) four processors. These are currently
architected with a MESI crossbar equalising the processor
data caches. The purpose of this is presumably to provide
*implicit* synchronisation between threads running on
different processors. However, because of the Pentium's
out-of-order execution and write buffering, *explicit*
synchronisation (memory barriers, bus-locking instructions)
is required in any case. So what does cache coherency achieve?
Why not synchronise processors at the physical point where
they are already connected, at main memory?

Chris
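
To illustrate the write-buffering point, here is a minimal C++ sketch of the classic store-buffering test. It is only an illustration under stated assumptions: the names count_both_zero and use_fence are hypothetical, and it assumes a C++11 compiler with std::thread. Each thread stores to its own variable and then loads the other's. Without a barrier, both loads are allowed to return 0 even on cache-coherent hardware, because each store can still be sitting in its processor's write buffer. With a sequentially consistent fence (which compilers typically implement on x86 as MFENCE or a LOCK-prefixed instruction), that outcome is forbidden.

```cpp
#include <atomic>
#include <thread>

// Store-buffering litmus test (illustrative sketch, hypothetical names).
// Returns how many of `iterations` runs ended with BOTH threads reading 0.
// With use_fence == true the C++ memory model guarantees the result is 0.
int count_both_zero(int iterations, bool use_fence) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);      // may linger in the write buffer
            if (use_fence)
                std::atomic_thread_fence(std::memory_order_seq_cst);  // explicit barrier
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            if (use_fence)
                std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);
        });

        t1.join();  // join makes r1 and r2 safely visible to this thread
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Calling count_both_zero(n, true) must return 0 for any n. Calling it with use_fence set to false may return a nonzero count, though in a harness this simple the thread start-up skew usually hides the reordering, so a zero count there proves nothing.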