Subj : Re: What is the real costs of LOCK on x86 multiprocesor machine?
To   : comp.programming.threads
From : David Schwartz
Date : Sat Jul 30 2005 04:21 am

"Mirek Fidler" wrote in message news:3l0uv7F10k8vjU1@individual.net...

> Well, I guess that I should have been more specific:
> If accessed memory is contained exclusively in one CPU cache, is it true
> that LOCK penalty is lower than 100-200 cycles?

No.

> My question is mostly about reference counted shared objects (e.g. in
> C++). What I want to find out is whether penalty is present in all cases,
> or whether when only single thread "owns" shared data (reference count is
> not accessed by other threads), penalty is lower.

It is present in all cases.

> Unfortunately, at the moment I do not have hardware to test...

Sadly, the penalty comes from the fact that the lock must be held for the
duration of a single instruction. This means that no other operations can
overlap that instruction, so the pipelines all drain, the instruction is
processed, and then the pipelines fill again. This has a huge cost on the P4.

DS
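
For anyone who does have the hardware, the single-owner case asked about above can be measured directly. What follows is only a rough sketch, not a rigorous benchmark: it times a plain increment against a std::atomic increment on a counter that a single thread touches. It assumes an x86 target where the compiler turns fetch_add into a LOCK-prefixed instruction, which mainstream compilers typically do. The names LockCost, ns_per_op and measure_lock_cost are made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical result type for this sketch: average cost per operation
// plus the final counter values, so the caller can sanity-check the run.
struct LockCost {
    double plain_ns;             // ns per plain (non-atomic) increment
    double locked_ns;            // ns per atomic increment
    std::uint64_t plain_total;   // final value of the plain counter
    std::uint64_t locked_total;  // final value of the atomic counter
};

// Hypothetical helper: run `op` `iters` times, return average ns per call.
template <typename Op>
double ns_per_op(Op op, std::uint64_t iters) {
    auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iters; ++i) op();
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iters);
}

// Both counters live on this thread's stack and are never shared, which
// models the case where only one thread "owns" the reference count.
LockCost measure_lock_cost(std::uint64_t iters) {
    assert(iters > 0);

    // volatile keeps the compiler from folding the loop into one add.
    volatile std::uint64_t plain = 0;
    // Assumption: on x86 this fetch_add compiles to a LOCK-prefixed
    // read-modify-write (for example lock xadd or lock add).
    std::atomic<std::uint64_t> locked{0};

    double plain_ns = ns_per_op([&] { plain = plain + 1; }, iters);
    double locked_ns = ns_per_op(
        [&] { locked.fetch_add(1, std::memory_order_seq_cst); }, iters);

    return LockCost{plain_ns, locked_ns, plain, locked.load()};
}
```

Call measure_lock_cost with a few million iterations from main and print the two per-operation times. The gap between them is the cost of the atomic operation when no other CPU ever touches the data. The absolute figures depend heavily on the CPU generation, compiler and flags, so treat them as a relative comparison on one machine only.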