Subj : Re: CMPXCHG timing To : comp.programming.threads From : Michael Pryhodko Date : Sat Apr 02 2005 02:46 am > It's hard to figure out sometimes what code is supposed to do even when > it does work. Agreed. Also it seems that I am very bad teacher since quite often I feel troubles making other undestand smth. :) > On the basis that you're trying to implement a "lock" and seem to not > want to use the interlocked hardware primatives, I would guess at this > point you are trying to implement a lock using a distributed algorithm. Yes, however I do not know what do you mean by "distributed algorithm". I do not algorithms classification. Specifically I tried to create such algorithm for x86 using its features (acquire sematic of read, release -- of load, writes fron one proc observed in the program order by others and so on) but also I found "formalization" of this algorithm and necessary guarantess to implement it on any platform that could provide such guarantees. > Distributed algorithm is a formal term and is where threads read other > thread's storage but not write to it. Known distributed algorithms for > locking include Dekker's algorithm and Peterson's algorithm. > > You should study those algorithms first and learn how they work. Certainly, remember you advised me a while ago to look at them? This was a time when I started working on that algorithm. I stidied these algorithms well, I found their generalization for N threads pretty ineffective and started working on my own algorithm. > Note > that any write up on them will assume a totally ordered memory model > and you'll have to use the proper memory barriers in any implementations > you do. Certainly, I understand different memory models complications well, thanks to Alexander Terekhov and WRL-95-9.pdf :). > Distributed algorithms aren't generally used as they don't perform as well > as the hardware primatives that you are trying to avoid. The only > place they're used is on very large systems with lot's of processors > where the hardware primatives don't work, don't scale, or don't exist. Well, I think this algorithm will perform very well even on one-processor system. :) About algorithm description -- in one post I described it in very simple way (similar to description of Dekker's algorithm). There is no reason to repeat it. I could describe it in more simple terms (e.g. putting bananas in holes and so on :)) if you'll ask me. But what I really want is a: - specific questions - some advise in areas where "I am not sure", such as "wait" part of algorithm and "cmpxchg redundant store" problem Bye. Sincerely yours, Michael. .