Subj : Re: CMPXCHG timing
To   : comp.programming.threads
From : Michael Pryhodko
Date : Sat Apr 02 2005 02:46 am

> It's hard to figure out sometimes what code is supposed to do even
when
> it does work.

Agreed. Also it seems that I am very bad teacher since quite often I
feel troubles making other undestand smth. :)


> On the basis that you're trying to implement a "lock" and seem to not
> want to use the interlocked hardware primatives, I would guess at
this
> point you are trying to implement a lock using a distributed
algorithm.

Yes, however I do not know what do you mean by "distributed algorithm".
I do not algorithms classification. Specifically I tried to create such
algorithm for x86 using its features (acquire sematic of read, release
-- of load, writes fron one proc observed in the program order by
others and so on) but also I found "formalization" of this algorithm
and necessary guarantess to implement it on any platform that could
provide such guarantees.


> Distributed algorithm is a formal term and is where threads read
other
> thread's storage but not write to it.  Known distributed algorithms
for
> locking include Dekker's algorithm and Peterson's algorithm.
>
> You should study those algorithms first and learn how they work.

Certainly, remember you advised me a while ago to look at them? This
was a time when I started working on that algorithm. I stidied these
algorithms well, I found their generalization for N threads pretty
ineffective and started working on my own algorithm.


> Note
> that any write up on them will assume a totally ordered memory model
> and you'll have to use the proper memory barriers in any
implementations
> you do.

Certainly, I understand different memory models complications well,
thanks to Alexander Terekhov and WRL-95-9.pdf :).


> Distributed algorithms aren't generally used as they don't perform as
well
> as the hardware primatives that you are trying to avoid.  The only
> place they're used is on very large systems with lot's of processors
> where the hardware primatives don't work, don't scale, or don't
exist.

Well, I think this algorithm will perform very well even on
one-processor system. :)


About algorithm description -- in one post I described it in very
simple way (similar to description of Dekker's algorithm). There is no
reason to repeat it. I could describe it in more simple terms (e.g.
putting bananas in holes and so on :)) if you'll ask me. But what I
really want is a:
- specific questions
- some advise in areas where "I am not sure", such as "wait" part of
algorithm and "cmpxchg redundant store" problem


Bye.
Sincerely yours, Michael.

.