Subj : Re: CMPXCHG timing
To   : comp.programming.threads
From : David Schwartz
Date : Fri Apr 01 2005 09:26 pm


"Michael Pryhodko" <mpryhodko@westpac.com.au> wrote in message 
news:1112419034.194401.221390@f14g2000cwb.googlegroups.com...

> 1. first processor who finish his 'cmxchg+sfence' will prevent any
> processor which does not started 'cmpxchg' (or to be more precise which
> does not yet FETCH'ed corresponding cache line, see label CS.1 in
> pseudocode) from entering lock (i.e. cmpxchg wil always fail since
> m_lock variable will have non-zero value)

> 2. But since could have a number of otherprocessors that currently in
> process of executing their 'cmxchg+sfence' instructions we need to wait
> for them to finish. After we finished waiting (whether it is Sleep(100)
> or store to dummy variable) we could simply check value of lock to find
> which processor "wins". Because after that point every processor either
> trying to enter the lock or already finished 'store' part of cmpxchg
> and made his changes visible. Winner is the last processor who writes
> to  lock variable. Other will start process from the beginning.

    I don't see how the redundant store hurts this. If you didn't get the 
lock, you have to loop and try again anyway, right?

>>     If you're waiting for all threads to finish anyway, how does the
> extra
>> write hurt you? You wait for all threads to finish and read the value
> of the
>> lock. So what's the problem?
>
> Run test app and you'll see -- redundand store makes it impossible to
> implement lock. Read my corresponding posting about that.

    I have read it but I don't understand it. If you don't get the lock 
(whether due to the redundant store or due to another processor grabbing 
it), you just loop.

>>     And why are you doing all this anyway when it's clear that a
>> simple LOCK
>> prefix will 100% work.

> Because LOCK is not so cheap on x86.

    The fence is about as expensive as the lock.

> Because this is very interesting
> task.

    Yes, just not useful because it relies upon so many fragile assumptions.

> Because I thought it is possible

    Probably possible.

> :(. Basically only two problems
> left:
> 1. implement waiting process (or prove that my variant is ok).

    That is a key problem. You must wait the pipeline depth, which should 
cost as much as a LOCK.

> 2. get around redundant store in cmpxchg on x86.

    I don't see how that hurts you. After all the threads get through, 
either some thread owns the lock or no thread does, shouldn't they all agree 
on which?

    DS

.