Subj : Re: CMPXCHG timing
To   : comp.programming.threads
From : Michael Pryhodko
Date : Mon Apr 04 2005 08:11 pm

> > 1. If m_lock variable is not cached, processor could (or will?) lock
> > system bus.
>
>     I'm not sure what you're talking about here. There are quite a few
> cases. On a modern processor, the system bus is never locked. The cache
> line is acquired and not released for the duration of the LOCKed
> operation.

From the IA-32 Manual, Vol. 3, section 7.1.4:
"For the Intel486 and *!*Pentium*!* processors, the LOCK# signal is
always asserted on the bus during a LOCK operation, even if the area of
memory being locked is cached in the processor.
For the Pentium 4, Intel Xeon, and P6 family processors, if the area of
memory being locked during a LOCK operation is *!*cached*!* in the
processor that is performing the LOCK operation as write-back memory
and is *!*completely contained in a cache line*!*, the processor *!*may
not*!* assert the LOCK# signal on the bus. Instead, it will modify the
memory location internally and allow it's cache coherency mechanism
to insure that the operation is carried out atomically. This operation
is called "cache locking." The cache coherency mechanism
automatically prevents two or more processors that have cached the same
area of memory from simultaneously modifying data in that area."

i.e.:
1. the 486 is not the only PC processor that always locks the bus; the
   Pentium does too :)
2. the operand must be cached (as write-back memory) and must not be
   split across cache lines
3. even if all these conditions hold, the processor MAY still lock the
   bus (i.e. Intel gives no guarantee that it will not)
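
Condition 2 is the one the programmer controls. A minimal C11 sketch of
keeping the lock word inside one cache line follows. The 64-byte line
size, the padded_lock type and the fits_in_one_line() helper are my
assumptions for illustration; only the name m_lock comes from this
thread.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. Check the actual CPU before relying on it. */
#define CACHE_LINE 64

/* Hypothetical lock type: aligned to a line boundary and padded to a full
 * line, so m_lock never straddles two lines and shares its line with nothing. */
typedef struct {
    alignas(CACHE_LINE) atomic_int m_lock;
    char pad[CACHE_LINE - sizeof(atomic_int)];
} padded_lock;

/* Returns 1 if the byte range [p, p + size) lies within a single cache line. */
int fits_in_one_line(const void *p, size_t size) {
    uintptr_t first = (uintptr_t)p / CACHE_LINE;
    uintptr_t last  = ((uintptr_t)p + size - 1) / CACHE_LINE;
    return first == last;
}
```

This only rules out the split-line case. Whether the line is actually
cached at the moment of the LOCKed operation, and whether the processor
then skips LOCK#, is still up to the hardware.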


> > 2. I do not see any connection with pipeline depth, AFAIK 'sfence' and
> > 'LOCK' does not invalidate pipeline.
>
>     They do. The cost of a fence or LOCK is controlled by the pipeline
> depth. For example, a store fence requires stores to be classified as
> either "before" or "after" the fence. This requires the fence to be a
> specific time, not a different time in each of various pipelines.

Hmm... Wait a second, I thought that sfence is placed in the pipeline
just like any other instruction, and that when it retires it simply
flushes the store buffers (plus maybe something to do with the cache
coherency mechanism). In that case, anything sitting in the pipeline
behind sfence stays there; nobody removes it. Or maybe I am wrong and it
is processed in a completely different way, for example: whenever sfence
is fetched from memory, the pipeline is flushed, the store buffers are
flushed, sfence is retired immediately, and execution continues as
usual?
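
Whichever of the two pictures is right inside the core, the
architectural effect is the same: stores issued before sfence become
globally visible before stores issued after it. A small sketch of that
guarantee, using the real intrinsics _mm_stream_si32 and _mm_sfence.
The names payload, ready, publish and consume are hypothetical, and the
non-SSE2 branch swaps in a C11 release fence as a stand-in, which is
not the same instruction.

```c
#include <stdatomic.h>
#if defined(__SSE2__)
#include <xmmintrin.h>  /* _mm_sfence */
#include <emmintrin.h>  /* _mm_stream_si32 */
#endif

static int payload;       /* hypothetical data being published */
static atomic_int ready;  /* hypothetical flag: 1 once payload is valid */

/* Writer: store the data, fence, then raise the flag. */
void publish(int value) {
#if defined(__SSE2__)
    _mm_stream_si32(&payload, value);  /* weakly ordered non-temporal store */
    _mm_sfence();                      /* order it before any later store */
#else
    payload = value;                   /* stand-in path, no sfence here */
    atomic_thread_fence(memory_order_release);
#endif
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Reader: returns the payload, or -1 if nothing has been published yet. */
int consume(void) {
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return -1;
    return payload;
}
```

Note that this shows only what sfence guarantees, not how the pipeline
implements it, so it does not settle the question above.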


> > What do you mean by "owning" cache line? if you mean LOCK'ing it (in
> > order to avoid bus lock) -- it is not true, because only XCHG
> > implicitly locks, not CMPXCHG.
>
>     Whether or not you lock the compare/exchange, the processor must
> acquire the cache line before it can do anything. And whether or not you
> lock it, the bus will not be locked, only the cache line might be.
> Assuming the locked variable is in its own cache line (which is the only
> sensible way to do it), the cost of the LOCK prefix is due to pipeline
> issues, same as for the fence.

I agree that from the point of view of ONE GIVEN processor the cost of
LOCK could be similar to the cost of a fence, but for the whole system I
think a fence is cheaper. Run the test app I posted in response to
Chris. I was surprised by the results :).
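
For reference, this is the kind of operation whose cost is being
argued about. It is NOT the test app mentioned above, just a minimal
C11 sketch of a compare-and-swap lock attempt. On x86, compilers
typically emit LOCK CMPXCHG for the compare-exchange and a plain store
for the release; try_lock and unlock are hypothetical names, and the
memory orderings are my choice.

```c
#include <stdatomic.h>

/* 0 = free, 1 = held. The name m_lock is taken from this thread. */
static atomic_int m_lock;

/* One acquisition attempt: returns 1 if the lock was taken, 0 if it was
 * already held. Succeeds only when m_lock changes from 0 to 1 atomically. */
int try_lock(void) {
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &m_lock, &expected, 1,
        memory_order_acquire,   /* ordering on success */
        memory_order_relaxed);  /* ordering on failure */
}

/* Release the lock with a release store; no read-modify-write is needed. */
void unlock(void) {
    atomic_store_explicit(&m_lock, 0, memory_order_release);
}
```

A caller would spin on try_lock() until it returns 1 and call unlock()
when done; every failed attempt is still a LOCKed operation that the
other processors have to observe.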

Bye.
Sincerely yours, Michael.
