Subj : Re: What are the real costs of LOCK on x86 multiprocessor machines?
To   : comp.programming.threads
From : Joe Seigh
Date : Sat Jul 30 2005 08:18 am

David Schwartz wrote:
> "Mirek Fidler" wrote in message
> news:3l0uv7F10k8vjU1@individual.net...
>
>>Unfortunately, at the moment I do not have hardware to test...
>
>     Sadly, the penalty comes from the fact that the lock must take place
> during the execution of a single instruction. This means that no other
> operations can overlap that instruction, so the pipelines all empty, the
> instruction is processed, then the pipelines fill again. This has a huge
> cost on the P4.
>

It doesn't serialize the processor. There are separate serializing
instructions, such as CPUID, that do that. It "serializes" the memory
accesses, i.e. outstanding loads and stores actually complete rather than
merely being ordered. Instruction prefetching can still occur, so the
pipeline isn't completely flushed. Self-modifying code needs to use
additional synchronization.

   "Locked operations are atomic with respect to all other memory
   operations and all externally visible events. Only instruction fetch
   and page table accesses can pass locked instructions. Locked
   instructions can be used to synchronize data written by one processor
   and read by another processor.

   For the P6 family processors, locked operations serialize all
   outstanding load and store operations (that is, wait for them to
   complete). This rule is also true for the Pentium 4 and Intel Xeon
   processors, with one exception: load operations that reference weakly
   ordered memory types (such as the WC memory type) may not be
   serialized."

I'm not sure what that last bit means. Interlocked instructions aren't
guaranteed to order loads?

-- 
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
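The manual's point that locked operations are atomic with respect to all other memory operations can be illustrated with a minimal C11 sketch. It assumes a POSIX threads environment and a compiler such as gcc or clang, which typically lowers `atomic_fetch_add` on x86 to a LOCK-prefixed XADD (code generation is compiler-dependent, so check the disassembly). The names `run_locked_counter`, `worker`, and `shared_counter` are hypothetical, chosen only for this example.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical demo: several threads bump one shared counter.
 * On x86 with gcc/clang, atomic_fetch_add on an atomic_long is usually
 * compiled to LOCK XADD, a single locked read-modify-write, so no
 * increments are lost even when threads run on different processors. */

static atomic_long shared_counter;

struct worker_arg {
    long iterations;
};

static void *worker(void *p)
{
    const struct worker_arg *arg = p;
    for (long i = 0; i < arg->iterations; i++) {
        /* Atomic increment (expected to be LOCK XADD on x86).
         * Relaxed ordering is enough for a pure counter; atomicity
         * does not depend on the memory order argument. */
        atomic_fetch_add_explicit(&shared_counter, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Starts nthreads workers, each doing `iterations` increments, and
 * returns the final counter value, or -1 on allocation/thread failure. */
long run_locked_counter(int nthreads, long iterations)
{
    if (nthreads <= 0)
        return -1;

    pthread_t *tids = malloc(sizeof *tids * (size_t)nthreads);
    struct worker_arg arg = { iterations };
    if (tids == NULL)
        return -1;

    atomic_store(&shared_counter, 0);

    int started = 0;
    for (; started < nthreads; started++) {
        if (pthread_create(&tids[started], NULL, worker, &arg) != 0)
            break;
    }
    /* pthread_join makes each worker's increments visible to this thread. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    free(tids);

    if (started != nthreads)
        return -1;
    return atomic_load(&shared_counter);
}
```

Calling `run_locked_counter(4, 100000)` should return 400000. Replacing the atomic add with a plain `long` increment would be a data race, and in practice it usually loses updates under contention. Timing this loop against an unlocked single-threaded increment is one rough way to measure the cost the subject line asks about, though the numbers vary a lot between microarchitectures.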