Subj : Re: CMPXCHG timing
To   : comp.programming.threads
From : Michael Pryhodko
Date : Mon Apr 04 2005 08:11 pm

> > 1. If m_lock variable is not cached, processor could (or will?) lock
> > system bus.
>
> I'm not sure what you're talking about here. There are quite a few
> cases. On a modern processor, the system bus is never locked. The cache line
> is acquired and not released for the duration of the LOCKed operation.

From IA-32 Manual Vol 3, 7.1.4:

"For the Intel486 and *!*Pentium*!* processors, the LOCK# signal is
always asserted on the bus during a LOCK operation, even if the area of
memory being locked is cached in the processor.
For the Pentium 4, Intel Xeon, and P6 family processors, if the area of
memory being locked during a LOCK operation is *!*cached*!* in the
processor that is performing the LOCK operation as write-back memory and
is *!*completely contained in a cache line*!*, the processor *!*may
not*!* assert the LOCK# signal on the bus. Instead, it will modify the
memory location internally and allow it's cache coherency mechanism to
insure that the operation is carried out atomically. This operation is
called "cache locking." The cache coherency mechanism automatically
prevents two or more processors that have cached the same area of memory
from simultaneously modifying data in that area."

i.e.:
1. the 486 is not the only processor that always locks the bus; the
   Pentium does too :)
2. the operand must be cached (as write-back memory) and must not be
   split across cache lines
3. and even if both conditions hold, the processor only MAY skip the bus
   lock (i.e. Intel does not give a guarantee)

> > 2. I do not see any connection with pipeline depth, AFAIK 'sfence' and
> > 'LOCK' does not invalidate pipeline.
>
> They do. The cost of a fence or LOCK is controlled by the pipeline
> depth. For example, a store fence requires stores to be classified as either
> "before" or "after" the fence. This requires the fence to be a specific
> time, not a different time in each of various pipelines.

Hmm...
Wait a second, I thought that sfence is placed on the pipeline just like
any other instruction, and that when it is retired it simply flushes the
store buffers (plus maybe something to do with the cache coherency
mechanism). In that case, anything that sits on the pipeline behind the
sfence stays there; nobody removes it.
Or maybe I am wrong and it is processed in a completely different way,
for example: whenever sfence is fetched from memory, the pipeline is
flushed, the store buffers are flushed, sfence is retired immediately and
execution continues as usual?

> > What do you mean by "owning" cache line? if you mean LOCK'ing it (in
> > order to avoid bus lock) -- it is not true, because only XCHG
> > implicitly locks, not CMPXCHG.
>
> Whether or not you lock the compare/exchange, the processor must acquire
> the cache line before it can do anything. And whether or not you lock it,
> the bus will not be locked, only the cache line might be. Assuming the
> locked variable is in its own cache line (which is the only sensible way to
> do it), the cost the LOCK prefix is due to pipeline issues, same as for the
> fence.

I agree that from the point of view of ONE GIVEN processor the cost of
LOCK could be similar to the cost of a fence, but for the whole system I
think a fence is cheaper. Run the test app I posted in response to Chris.
I was surprised by the results :).

Bye.
Sincerely yours, Michael.
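
For readers following the thread, here is a minimal sketch of the kind of lock under discussion: a flag that lives alone in its own cache line and is taken with a compare-and-exchange. It is not the test app mentioned above. Only the name m_lock comes from the thread; SpinLock, kCacheLine and run_counter_test are hypothetical, and the 64-byte line size is an assumption that should be checked for the target CPU. On x86, mainstream compilers typically emit LOCK CMPXCHG for the compare-exchange call, but that is a property of the compiler, not something the C++ standard promises.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines (common on x86, not guaranteed).
constexpr std::size_t kCacheLine = 64;

// Hypothetical lock type; only the member name m_lock comes from the thread.
class SpinLock {
public:
    void lock() {
        int expected = 0;
        // Try to swing m_lock from 0 to 1 atomically.
        while (!m_lock.compare_exchange_weak(expected, 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed)) {
            expected = 0;               // a failed exchange stores the observed value here
            std::this_thread::yield();  // let the current holder make progress
        }
    }

    void unlock() {
        assert(m_lock.load(std::memory_order_relaxed) == 1);  // must be held
        m_lock.store(0, std::memory_order_release);
    }

private:
    // alignas keeps the flag inside a single line and pads the object to a
    // full line, so the operand is never split across two cache lines.
    alignas(kCacheLine) std::atomic<int> m_lock{0};
};

// Hypothetical helper: several threads bump a plain counter under the lock.
long run_counter_test(int threads, int iters) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Calling run_counter_test(4, 100000) from main and timing it gives a rough feel for the cost of the locked operation under contention; it says nothing by itself about whether the bus or only the cache line is locked.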