Subj : Re: CMPXCHG timing
To   : comp.programming.threads
From : Chris Thomasson
Date : Mon Apr 04 2005 03:12 pm

> Somehow I could get to the "we should not be here line" on
> my P4 2.8GHz HT -- which is totally confusing to me, only current
> thread ever writes this specific unique id into m_lock variable and
> since we are at the beginning of lock() -- m_lock should be !=
> curr_thread_id. It look like I am blind.

I haven't totally studied your idea, but from what I can immediately
gather it seems like you're going for a simple spinlock without using
the LOCK prefix. Is that about it?

Your lock function (x86) seems to have:

 1. (store/load)
loop:
 2. non-atomic cas to shared
 3. (store/store)
 4. if ( cas_failed ) {
      // slow path 1
 5.   ( delay )
 6.   goto loop
    }
    // fast path 1
 7. non-atomic cas to local
 8. (store/load)
 9. if ( we_do_not_own ) {
      // slow path 2
10.   ( delay )
11.   goto loop
    }
    // fast path 2
12. done

And your unlock function has:

    store 0 into shared
    ( store/store )

So, the lock function has 3 barriers (two of which are expensive
store/loads) and 2 non-atomic CAS (one is to shared memory) on the
fast path, and the unlock function has one store to shared memory and
1 barrier on the fast path.

Are you sure this is vastly more efficient than just using the simple
LOCK prefix? I have never tried messing around with cmpxchg on an
SMP/HT system without using the LOCK prefix...
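
For comparison, here is a minimal sketch of the "simple LOCK prefix" alternative, written with C11 atomics rather than raw assembly. It is not the code under discussion: the names spinlock_t, spin_trylock, spin_lock and spin_unlock are hypothetical, and the convention that the lock word holds 0 when free and the owner's nonzero id otherwise is an assumption based on the quoted description of m_lock. On x86, mainstream compilers typically turn the compare-exchange into LOCK CMPXCHG and the release store into a plain MOV.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 means free, otherwise the owner's nonzero id. */
typedef struct {
    atomic_uint owner;
} spinlock_t;

/* One attempt: atomically replace 0 with id. Returns true on success. */
static bool spin_trylock(spinlock_t *l, unsigned id)
{
    unsigned expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->owner, &expected, id,
        memory_order_acquire,   /* ordering on success */
        memory_order_relaxed);  /* ordering on failure */
}

static void spin_lock(spinlock_t *l, unsigned id)
{
    while (!spin_trylock(l, id)) {
        /* Slow path: spin on plain loads until the lock looks free,
           then go back and retry the atomic compare-exchange. */
        while (atomic_load_explicit(&l->owner, memory_order_relaxed) != 0) {
            /* a pause or backoff hint would go here */
        }
    }
}

static void spin_unlock(spinlock_t *l, unsigned id)
{
    /* Debug check only: the caller is assumed to be the current owner. */
    assert(atomic_load_explicit(&l->owner, memory_order_relaxed) == id);
    (void)id; /* avoids an unused-parameter warning when NDEBUG is set */
    atomic_store_explicit(&l->owner, 0, memory_order_release);
}
```

A lock would be declared as `spinlock_t l = {0};` and used as `spin_lock(&l, my_id); ... spin_unlock(&l, my_id);`, where my_id is any nonzero per-thread value. In this sketch the uncontended fast path is one atomic compare-exchange in lock and one store in unlock, which is the baseline to measure against the 3 barriers and 2 non-atomic CAS counted above.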