Subj : Re: CMPXCHG timing
To   : comp.programming.threads
From : Michael Pryhodko
Date : Fri Apr 01 2005 07:02 am

> > unlock()
> > {
> >     // sanity check
> >     ASSERT(A == P);
> >
>
> You need the sfence here for release semantics.  Strictly speaking
> it has to be mfence (store/store + store/load)

Why? According to the IA manual, stores are observed in program order.
Do you disagree?

> >     // mark flag 'free'
> >     A = 0;
> >
> >     // if the OS interrupts here, this will drain the store buffers
> >     // (i.e. it will do the same as the 'sfence' below)
> >
> >     // make it visible to other threads
> >     // we could skip this instruction, but it speeds up lock release,
> >     // thus increasing overall performance in the high-contention case
> >     sfence
>
> Um, no.  Memory barriers don't make memory visibility faster for coherent
> cache memory.  The opposite actually.

That contradicts everything I have learned so far. Could you prove it?

> I'm not sure why you're concerned with timings.  The membars don't do
> what you think they do.  And if you try to force specific timing
> behavior you are going to slow things down considerably which is
> probably not what you want.
>
> What are you trying to do here?

I am trying to create the cheapest possible "lock". The main idea
(suppose there are no store buffers and the OS cannot interrupt):

rule1. All processors run at the same speed (this is very important) and
       cannot be stopped or slowed down.
rule2. A store is visible to other processors immediately.
rule3. The memory model is SC (sequential consistency).
rule4. Every thread has its own unique number Pi (nonzero, since 0 marks
       the lock as free).

Now consider this simple program:

    // try to lock
    enter:
        if (lock == 0)
            lock = Pi
        else
            goto enter

        // now every thread that tries to lock will fail,
        // but other threads may already have passed the 'if (lock == 0)'
        // check, so we use rule1 and just wait until all of them have
        // finished their 'lock = Pi' store.
        // In my case I do it by storing to a dummy variable:
        dummy = 5

        // now we are guaranteed that every such thread has finished its
        // store, so we just check whether we were the "last" one to write
        // to lock
        if (lock != Pi)
            goto enter

Unlock is simple:

    lock = 0

Now you should understand... Everything else is there to fool the OS, the
store buffers and so on. Unfortunately, cmpxchg performs a redundant store
(it writes the destination even when the comparison fails), which kills
this whole idea :((. And I cannot replace cmpxchg with
'if (lock == 0) lock = Pi', because the OS could interrupt between these
two lines, and I have not found a way to:
  - prevent that, or
  - design a lock that is immune to that effect.

Bye.

Sincerely yours,
Michael.
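To make the role of the dummy store concrete, here is a small deterministic simulation in C of the acquire path under the idealized rules above. It is only a sketch under stated assumptions, and every name in it (simulate, lock_word, use_dummy, MAX_PROC) is hypothetical: each processor executes exactly one instruction per tick, all loads in a tick see memory as the previous tick left it, and stores issued in the same tick land in processor-index order (last writer wins). Nobody ever unlocks, and nothing models the OS or store buffers, so it says nothing about real x86 hardware.

```c
#include <assert.h>

/* Hypothetical model of the idealized machine (rule1..rule4):
 * - every processor executes exactly one instruction per tick,
 * - every load in tick t sees memory as it was at the end of tick t-1,
 * - stores issued in the same tick are applied in processor-index order,
 *   so the last writer wins. This is one sequentially consistent order,
 *   not real x86 behaviour. */

#define MAX_PROC 8

enum step { LOAD, STORE, DUMMY, RECHECK, DONE };

struct proc {
    int id;          /* unique nonzero id Pi; 0 means "free" */
    int start;       /* tick at which this processor starts trying to lock */
    enum step next;  /* instruction this processor executes next */
    int won;         /* set to 1 once the processor believes it holds the lock */
};

static int lock_word; /* 'lock' in the post */
static int dummy;     /* 'dummy' in the post; the store only burns one tick */

/* Runs the acquire protocol for n processors for 'ticks' ticks and returns
 * how many of them believe they hold the lock. With use_dummy == 0 the
 * 'dummy = 5' step is skipped. */
int simulate(const int *starts, int n, int use_dummy, int ticks)
{
    struct proc p[MAX_PROC];
    assert(n >= 1 && n <= MAX_PROC);

    lock_word = 0;
    for (int i = 0; i < n; i++) {
        p[i].id = i + 1;
        p[i].start = starts[i];
        p[i].next = LOAD;
        p[i].won = 0;
    }

    for (int t = 0; t < ticks; t++) {
        int seen = lock_word; /* value every load observes during tick t */
        for (int i = 0; i < n; i++) {
            if (t < p[i].start || p[i].next == DONE)
                continue;
            switch (p[i].next) {
            case LOAD:    /* enter: if (lock == 0) ... else goto enter */
                if (seen == 0)
                    p[i].next = STORE;
                break;
            case STORE:   /* lock = Pi */
                lock_word = p[i].id;
                p[i].next = use_dummy ? DUMMY : RECHECK;
                break;
            case DUMMY:   /* dummy = 5: wait one instruction */
                dummy = 5;
                p[i].next = RECHECK;
                break;
            case RECHECK: /* if (lock != Pi) goto enter */
                if (seen == p[i].id) {
                    p[i].won = 1;
                    p[i].next = DONE;
                } else {
                    p[i].next = LOAD;
                }
                break;
            case DONE:
                break;
            }
        }
    }

    int winners = 0;
    for (int i = 0; i < n; i++)
        winners += p[i].won;
    return winners;
}
```

For example, with two processors starting at ticks 0 and 1, simulate(starts, 2, 1, 20) returns 1, while simulate(starts, 2, 0, 20) returns 2: without the dummy step, the late processor's 'lock = Pi' lands after the first writer has already rechecked, so both believe they own the lock. With the dummy step, every racer that passed the 'lock == 0' check has stored before anyone rechecks, so only the last writer succeeds.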