Subj : Re: What is the real costs of LOCK on x86 multiprocesor machine?
To   : comp.programming.threads
From : chris noonan
Date : Thu Aug 04 2005 04:01 am


Joe Seigh wrote:
> chris noonan wrote:
> > The way I meant it is already in the literature:
> > "Highly Efficient Synchronization Based on Active Memory
> > Operations"
> > Zhang, Fang, Carter
> > http://www.cs.utah.edu/~retrac/papers/ipdps04.pdf
> >
>
> It seems to be meant for parallel computation where the
> cpus are dedicated to the computation and it's ok for a
> cpu to just wait on the result of the atomic operation.
> These would be supercomputers with individual computers
> connected with high speed links and they're interested in
> reducing network traffic over the links.
>
> I suppose this might be applicable to regular multi-threaded
> programming once you get 50 to 100 core processors widely available
> and you can dispense with time slicing.

I suspect it would also be valuable for PC-type systems
having (say) four processors. These are currently built
around a MESI cache-coherency protocol that keeps the
processors' data caches consistent with one another.
The purpose of this is presumably to provide *implicit*
synchronisation between threads running on different
processors.

However, because of the Pentium's out-of-order execution
and write buffering, *explicit* synchronisation (memory
barriers, bus-locking instructions) is required in any
case. So what does cache coherency achieve?
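
To make the point concrete, here is a minimal sketch of the two
kinds of explicit synchronisation I mean, written with C++11
atomics (so it assumes a C++11 compiler). The function names
handoff_once and locked_count are made up for this post. The
first hands a plain value from one thread to another through a
release/acquire flag; the second increments a shared counter with
an atomic read-modify-write, which x86 compilers typically emit
as a LOCK-prefixed instruction. Neither relies on cache coherency
alone to get the ordering right.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical illustration: hand one value between two threads.
// The release store / acquire load pair is the explicit ordering;
// without it, the read of 'payload' would be a data race.
int handoff_once() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag carrying the synchronisation
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // write the data
        ready.store(true, std::memory_order_release);  // then publish it
    });
    std::thread consumer([&] {
        // Spin until the flag is visible; acquire orders the read below.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}

// Hypothetical illustration: several threads bump one counter.
// fetch_add is an atomic read-modify-write, so no increment is lost.
int locked_count(int threads, int iters) {
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, iters] {
            for (int i = 0; i < iters; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter.load();
}
```

Calling handoff_once() should give back the value the producer
wrote, and locked_count(4, 10000) should give 4 * 10000; replace
the fetch_add with a plain ++ on a non-atomic int and the count is
no longer guaranteed, coherent caches or not.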

Why not synchronise processors at the physical point
where they are already connected, at main memory?

Chris
