Subj : Re: CMPXCHG timing
To   : comp.programming.threads
From : David Schwartz
Date : Mon Apr 04 2005 12:40 pm

"Michael Pryhodko" wrote in message
news:1112421997.738275.83390@f14g2000cwb.googlegroups.com...

>> I don't see how the redundant store hurts this. If you didn't get the
>> lock, you have to loop and try again anyway, right?

> Right, it does not hurt here. It makes the 'unlock' operation
> impossible.

Not impossible. It just means that after you 'unlock', you have to go
back and make sure you don't still have the lock, just as you did with
the 'lock' operation.

>> > Because LOCK is not so cheap on x86.

>> The fence is about as expensive as the lock.

> LOCK can be much more expensive than a fence. LOCK can affect the
> whole system by locking the system bus (if mem_addr is not in the
> cache, or is split across cache lines, or you have a 486 PC), while a
> fence affects only the given processor. So a fence becomes more and
> more desirable as the number of processors grows. (This assumes the
> cache coherency mechanism does not make 'sfence' take longer when you
> have more processors -- I do not know the details of cache coherency
> mechanism implementations.)

First of all, you would always put the lock in its own cache line. A
processor would have to own the cache line in order to perform the
unlocked compare and exchange anyway. Your argument about a 486 PC
isn't convincing, because the pipeline depth there is so much shorter
that the LOCK prefix hurts each CPU less -- and there aren't massively
parallel SMP 486 systems anyway.

>> Yes, just not useful because it relies upon so many fragile
>> assumptions.

> :) This algorithm is quite interesting by itself. Maybe the way it is
> implemented for x86 is fragile, but there are other platforms that
> could provide the necessary guarantees and benefit from this
> algorithm. Consider my work as research.

That may be true, but isn't it obvious by now that x86 is not the
platform to test/develop this on?

DS
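
P.S. To pin down what I mean by re-checking after 'unlock', here is the
shape of the scheme, written with C11 <stdatomic.h> so the code is at
least well-defined C (the names are mine, and the seq_cst fence stands
in for your SFENCE). This shows only the store/fence/re-check control
flow -- it is NOT a proven mutual-exclusion algorithm; whether anything
like it can be made correct is exactly what we are arguing about.

#include <stdatomic.h>

static atomic_int owner;   /* 0 = free, otherwise the holder's id */

void lock(int my_id)       /* my_id must be nonzero and unique */
{
    for (;;) {
        /* wait until the lock looks free */
        while (atomic_load_explicit(&owner, memory_order_relaxed) != 0)
            ;
        /* claim it with a plain store -- not atomic with the test */
        atomic_store_explicit(&owner, my_id, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);
        /* re-read to verify the claim actually stuck */
        if (atomic_load_explicit(&owner, memory_order_relaxed) == my_id)
            return;
        /* somebody else's store landed on top of ours -- retry */
    }
}

void unlock(int my_id)
{
    atomic_store_explicit(&owner, 0, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    /* same discipline in reverse: a delayed redundant store from a
       losing 'lock' attempt can re-assert our id after we cleared it,
       so verify we no longer appear to hold the lock, clearing again
       until we don't */
    while (atomic_load_explicit(&owner, memory_order_relaxed) == my_id) {
        atomic_store_explicit(&owner, 0, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);
    }
}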
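
And for contrast, the conventional x86 spinlock: acquire with an atomic
compare-and-exchange (LOCK CMPXCHG under the hood), release with a
plain store, no re-check anywhere, and the lock word padded out to its
own cache line as I said above. Again a sketch in the same C11
notation, names mine; 64 bytes is the usual x86 line size.

#include <stdatomic.h>

struct spinlock {
    _Alignas(64) atomic_int word;        /* 0 = free, 1 = held */
    char pad[64 - sizeof(atomic_int)];   /* nothing else shares the line */
};

void spin_lock(struct spinlock *l)
{
    int expected = 0;
    /* LOCK'd compare-and-exchange: atomically 0 -> 1 */
    while (!atomic_compare_exchange_weak_explicit(
               &l->word, &expected, 1,
               memory_order_acquire, memory_order_relaxed)) {
        expected = 0;
        /* spin read-only so contenders don't fight over line ownership */
        while (atomic_load_explicit(&l->word, memory_order_relaxed) != 0)
            ;
    }
}

void spin_unlock(struct spinlock *l)
{
    /* a plain release store suffices -- no LOCK, no fence, no re-check */
    atomic_store_explicit(&l->word, 0, memory_order_release);
}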