Subj : Re: CMPXCHG timing To : comp.programming.threads From : Michael Pryhodko Date : Fri Apr 01 2005 09:17 pm > I'm utterly baffled how your algorithm was supposed to work. What keeps > all the processors from reading the zero value and all deciding to exchange > it with the Pi value? Since the operation is not atomic, it's equivalent to > 'read, compare, possibly store'. What stops all the processors from reading > at the same time, comparing at the same time, and then storing in > succession, each thinking it wrote the unique value? Each succeeding with > its compare/exchange. Nothing stops. But there are some tricks: 1. first processor who finish his 'cmxchg+sfence' will prevent any processor which does not started 'cmpxchg' (or to be more precise which does not yet FETCH'ed corresponding cache line, see label CS.1 in pseudocode) from entering lock (i.e. cmpxchg wil always fail since m_lock variable will have non-zero value) 2. But since could have a number of otherprocessors that currently in process of executing their 'cmxchg+sfence' instructions we need to wait for them to finish. After we finished waiting (whether it is Sleep(100) or store to dummy variable) we could simply check value of lock to find which processor "wins". Because after that point every processor either trying to enter the lock or already finished 'store' part of cmpxchg and made his changes visible. Winner is the last processor who writes to lock variable. Other will start process from the beginning. > If you're waiting for all threads to finish anyway, how does the extra > write hurt you? You wait for all threads to finish and read the value of the > lock. So what's the problem? Run test app and you'll see -- redundand store makes it impossible to implement lock. Read my corresponding posting about that. > And why are you doing all this anyway when it's clear that a simple LOCK > prefix will 100% work. Because LOCK is not so cheap on x86. Because this is very interesting task. Because I thought it is possible :(. Basically only two problems left: 1. implement waiting process (or prove that my variant is ok). 2. get around redundant store in cmpxchg on x86. Bye. Sincerely yours, Michael. .