Subj : Re: CMPXCHG timing To : comp.programming.threads From : David Schwartz Date : Fri Apr 01 2005 09:26 pm "Michael Pryhodko" wrote in message news:1112419034.194401.221390@f14g2000cwb.googlegroups.com... > 1. first processor who finish his 'cmxchg+sfence' will prevent any > processor which does not started 'cmpxchg' (or to be more precise which > does not yet FETCH'ed corresponding cache line, see label CS.1 in > pseudocode) from entering lock (i.e. cmpxchg wil always fail since > m_lock variable will have non-zero value) > 2. But since could have a number of otherprocessors that currently in > process of executing their 'cmxchg+sfence' instructions we need to wait > for them to finish. After we finished waiting (whether it is Sleep(100) > or store to dummy variable) we could simply check value of lock to find > which processor "wins". Because after that point every processor either > trying to enter the lock or already finished 'store' part of cmpxchg > and made his changes visible. Winner is the last processor who writes > to lock variable. Other will start process from the beginning. I don't see how the redundant store hurts this. If you didn't get the lock, you have to loop and try again anyway, right? >> If you're waiting for all threads to finish anyway, how does the > extra >> write hurt you? You wait for all threads to finish and read the value > of the >> lock. So what's the problem? > > Run test app and you'll see -- redundand store makes it impossible to > implement lock. Read my corresponding posting about that. I have read it but I don't understand it. If you don't get the lock (whether due to the redundant store or due to another processor grabbing it), you just loop. >> And why are you doing all this anyway when it's clear that a >> simple LOCK >> prefix will 100% work. > Because LOCK is not so cheap on x86. The fence is about as expensive as the lock. > Because this is very interesting > task. Yes, just not useful because it relies upon so many fragile assumptions. > Because I thought it is possible Probably possible. > :(. Basically only two problems > left: > 1. implement waiting process (or prove that my variant is ok). That is a key problem. You must wait the pipeline depth, which should cost as much as a LOCK. > 2. get around redundant store in cmpxchg on x86. I don't see how that hurts you. After all the threads get through, either some thread owns the lock or no thread does, shouldn't they all agree on which? DS .