Subj : Re: CMPXCHG timing
To   : comp.programming.threads
From : Chris Thomasson
Date : Mon Apr 04 2005 03:12 pm

> Somehow I could get to the "we should not be here" line on
> my P4 2.8GHz HT -- which is totally confusing to me: only the current
> thread ever writes this specific unique id into the m_lock variable, and
> since we are at the beginning of lock(), m_lock should be !=
> curr_thread_id. It looks like I am blind.

<snip>

I haven't fully studied your idea, but from what I can gather, it seems like 
you're going for a simple spinlock without using the LOCK prefix. Is that 
about it?

Your lock function (x86) seems to have:

1.  (store/load)

loop:
2.  non-atomic cas to shared
3.  (store/store)
4.  if ( cas_failed )
    {
      // slow path 1
5.    ( delay )
6.    goto loop
    }

    // fast path 1
7.  non-atomic cas to local
8.  (store/load)
9.  if ( we_do_not_own )
    {
      // slow path 2
10.   ( delay )
11.   goto loop
    }

    // fast path 2
12. done



And your unlock function has:

store 0 into shared
( store/store )
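To make the barrier count concrete, here is a rough C11 rendering of the lock and unlock steps above. It is a sketch under assumptions, not your code: `m_lock` holds the owner's id (0 when free); the names `sketch_lock`, `sketch_unlock`, `cas_nonatomic` and `backoff` are hypothetical; the unlocked CMPXCHG is modeled as a separate load and store; and the "cas to local" step is left out because its exact form isn't shown. It illustrates the structure only and does not provide mutual exclusion.

```c
#include <stdatomic.h>

/* Hypothetical names throughout. m_lock holds the owner's id; 0 = free. */
static _Atomic unsigned long m_lock = 0;

static void backoff(void) {
    /* placeholder for the "( delay )" steps */
}

/* Models CMPXCHG *without* LOCK as two separate steps. CMPXCHG always
   writes the destination: the new value on success, the old value back
   on failure. Another processor can store to *p between the two steps. */
static int cas_nonatomic(_Atomic unsigned long *p, unsigned long cmp,
                         unsigned long xchg) {
    unsigned long old = atomic_load_explicit(p, memory_order_relaxed);
    atomic_store_explicit(p, old == cmp ? xchg : old, memory_order_relaxed);
    return old == cmp;
}

static void sketch_lock(unsigned long self) {
    atomic_thread_fence(memory_order_seq_cst);        /* 1. store/load */
    for (;;) {
        int ok = cas_nonatomic(&m_lock, 0, self);     /* 2. cas to shared */
        /* 3. store/store: C11 has no store/store-only fence; a release
           fence is the closest match (and is somewhat stronger). */
        atomic_thread_fence(memory_order_release);
        if (!ok) {                                    /* slow path 1 */
            backoff();
            continue;
        }
        /* 7. "cas to local": not detailed in the post, omitted here. */
        atomic_thread_fence(memory_order_seq_cst);    /* 8. store/load */
        if (atomic_load_explicit(&m_lock, memory_order_relaxed) != self) {
            backoff();                                /* slow path 2 */
            continue;
        }
        return;                                       /* fast path 2: done */
    }
}

static void sketch_unlock(void) {
    atomic_store_explicit(&m_lock, 0, memory_order_relaxed); /* store 0 */
    atomic_thread_fence(memory_order_release);               /* store/store */
}
```

On x86, stores are not reordered with other stores, so the store/store fences compile to nothing; the two store/load fences are the ones that need an MFENCE (or a locked instruction). The unlock keeps the post's order (store, then barrier); under C11 rules a release fence only orders the critical section's writes if it comes before the releasing store.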




So, on the fast path your lock function has 3 barriers (two of which are the 
expensive store/load kind) and 2 non-atomic CAS operations (one of them on 
shared memory), and your unlock function has one store to shared memory and 
1 barrier. Are you sure this is vastly more efficient than just using the 
simple LOCK prefix? I have never tried messing around with cmpxchg on an SMP/HT system 
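without using the LOCK prefix, so take this with a grain of salt. One thing 
to keep in mind: without LOCK, CMPXCHG is not atomic with respect to other 
processors, and the two logical processors of an HT core count as separate 
processors here. CMPXCHG also always writes its destination, storing the old 
value back when the compare fails. So if another thread's unlocked CMPXCHG 
reads your id while you hold the lock, and its write-back lands after your 
unlock has stored 0, m_lock ends up holding your id again. That could be how 
you reach the "we should not be here" line.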
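For comparison, a plain LOCK-prefixed spinlock needs no explicit barriers on x86, because a locked instruction is already a full barrier. Below is a minimal C11 sketch, assuming GCC or Clang targeting x86, where `atomic_compare_exchange_weak_explicit` typically compiles to LOCK CMPXCHG and a release store to an ordinary MOV. The names `simple_spinlock`, `simple_lock` and `simple_unlock` are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical names; owner == 0 means the lock is free. */
typedef struct {
    atomic_ulong owner;
} simple_spinlock;

static void simple_lock(simple_spinlock *l, unsigned long self) {
    for (;;) {
        unsigned long expected = 0;
        /* Typically LOCK CMPXCHG on x86; the locked instruction is a full
           barrier, so no separate fence is needed for acquire ordering. */
        if (atomic_compare_exchange_weak_explicit(&l->owner, &expected, self,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return;
        /* Wait with plain loads until the lock looks free, so waiters
           don't keep issuing locked operations on the shared line. */
        while (atomic_load_explicit(&l->owner, memory_order_relaxed) != 0)
            ; /* a real version might add a PAUSE hint here */
    }
}

static void simple_unlock(simple_spinlock *l) {
    /* Release store: typically a plain MOV on x86. */
    atomic_store_explicit(&l->owner, 0, memory_order_release);
}
```

The inner read-only loop (test-and-test-and-set) lets waiting threads spin on a cached copy of the line instead of repeatedly issuing locked operations, so the uncontended fast path costs one LOCK CMPXCHG to lock and one plain store to unlock.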
