Subj : Re: What is the real cost of LOCK on x86 multiprocessor machine?
To   : comp.programming.threads
From : Joe Seigh
Date : Sat Jul 30 2005 08:18 am

David Schwartz wrote:
> "Mirek Fidler" <cxl@volny.cz> wrote in message 
> news:3l0uv7F10k8vjU1@individual.net...
> 
>>Unfortunately, at the moment I do not have hardware to test...
> 
> 
>     Sadly, the penalty comes from the fact that the lock must take place 
> during the execution of a single instruction. This means that no other 
> operations can overlap that instruction, so the pipelines all empty, the 
> instruction is processed, then the pipelines fill again. This has a huge 
> cost on the p4.
> 
It doesn't serialize the processor; there are separate instructions that
do that.  It "serializes" the memory accesses, i.e. they complete rather
than just get ordered.  Instruction prefetching can still occur, so the
pipeline isn't completely flushed.  Because of that, self-modifying code
needs to use additional synchronization.
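
To make "complete rather than just get ordered" concrete, here is a minimal
sketch of the classic store-buffer test in C++.  The function name
count_both_zero is made up for illustration.  It assumes a compiler that
lowers a seq_cst store on x86 to an implicitly locked XCHG or to MOV plus
MFENCE, which is typical but not guaranteed by anything in this thread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Store-buffer litmus test (hypothetical helper, not from the thread).
// Each thread stores to its own variable and then loads the other one.
// With seq_cst atomics the store has to be globally visible before the
// following load executes, so both loads can never return 0.
int count_both_zero(int trials) {
    assert(trials > 0);
    int both_zero = 0;
    for (int i = 0; i < trials; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t1([&] {
            x.store(1, std::memory_order_seq_cst);   // store must complete first
            r1 = y.load(std::memory_order_seq_cst);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_seq_cst);   // store must complete first
            r2 = x.load(std::memory_order_seq_cst);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;         // forbidden outcome
    }
    return both_zero;
}
```

Calling count_both_zero(1000) should return 0.  With memory_order_relaxed
on all four accesses the both-zero outcome becomes legal on x86, although
thread startup cost means a run like this may well never show it.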

"Locked operations are atomic with respect to all other memory operations and all externally
visible events. Only instruction fetch and page table accesses can pass locked instructions.
Locked instructions can be used to synchronize data written by one processor and read by
another processor.
For the P6 family processors, locked operations serialize all outstanding load and store operations
(that is, wait for them to complete). This rule is also true for the Pentium 4 and Intel Xeon
processors, with one exception: load operations that reference weakly ordered memory types
(such as the WC memory type) may not be serialized."

I'm not sure what that last bit means.  Interlocked instructions aren't guaranteed to order
loads?  
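
For the ordinary write-back memory case, the manual's "synchronize data
written by one processor and read by another" can be sketched roughly as
below.  The name publish_and_read is hypothetical, and it is an assumption
that the compiler turns fetch_add into a LOCK-prefixed instruction such as
LOCK XADD.  On x86 a plain store to the flag would already be enough for
this pattern; the read-modify-write is used only to mirror the quoted text.
It says nothing about the WC exception.

```cpp
#include <atomic>
#include <thread>

// Message passing through a flag that is updated with an atomic
// read-modify-write (hypothetical helper, not from the thread).
int publish_and_read(int value) {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<int> ready{0};       // flag bumped by the writer
    int seen = -1;

    std::thread writer([&] {
        payload = value;                                // plain store
        ready.fetch_add(1, std::memory_order_release);  // RMW publishes it
    });
    std::thread reader([&] {
        while (ready.load(std::memory_order_acquire) == 0) {
            std::this_thread::yield();                  // wait for the flag
        }
        seen = payload;              // ordered after the flag was observed
    });
    writer.join();
    reader.join();
    return seen;
}
```

publish_and_read(42) returns 42: once the reader sees the flag change, the
earlier plain store to payload is visible to it as well.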


-- 
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
