Subj : Re: CMPXCHG timing
To   : comp.programming.threads
From : David Schwartz
Date : Mon Apr 04 2005 07:47 pm


"Michael Pryhodko" <mpryhodko@westpac.com.au> wrote in message 
news:1112663578.315153.228560@g14g2000cwa.googlegroups.com...

>>     First of all, you would always put the lock in its own cache line.
>> A processor would have to own the cache line in order to perform the
>> unlocked compare and exchange anyway. Your argument about a 486PC isn't
>> convincing because the pipeline depth is so much shorter that the LOCK
>> prefix hurts each CPU less. Plus, there aren't massively parallel SMP 486
>> systems.

> 1. If m_lock variable is not cached, processor could (or will?) lock
> system bus.

    I'm not sure what you're talking about here. There are quite a few 
cases. On a modern processor, the system bus is not locked for an ordinary 
cacheable variable that fits within one cache line. Instead, the cache line 
is acquired and not released for the duration of the LOCKed operation.

> 2. I do not see any connection with pipeline depth, AFAIK 'sfence' and
> 'LOCK' does not invalidate pipeline.

    They do. The cost of a fence or LOCK is largely determined by the 
pipeline depth. For example, a store fence requires every store to be 
classified as either "before" or "after" the fence. This requires the fence 
to take effect at one specific point in time, not at a different time in 
each of the various pipelines.

> What do you mean by "owning" cache line? if you mean LOCK'ing it (in
> order to avoid bus lock) -- it is not true, because only XCHG
> implicitly locks, not CMPXCHG.

    Whether or not you lock the compare/exchange, the processor must acquire 
the cache line before it can do anything. And whether or not you lock it, 
the bus will not be locked, only the cache line might be. Assuming the 
locked variable is in its own cache line (which is the only sensible way to 
do it), the cost of the LOCK prefix is due to pipeline issues, same as for the 
fence.

    DS
