Subj : Re: C++ desired features
To : comp.lang.c++.moderated,comp.programming.threads
From : SenderX
Date : Sat Feb 26 2005 12:47 am

> In my experience, lock-free solutions are great, especially when
> someone else has already dealt with the tricky correctness issues. :)

Lock-free programmers can be hit from all sides: the compiler, the
CPU, intricate algorithms, portability, etc... I believe the compiler
is the worst one of all. Those tricky bastards can reorder part of a
"critical-sequence" of instructions and cause a fu@k$ng race-condition
that usually sneaks under testing, and hits ten months down the road.
Then you find yourself scouring through disassembled code only to find
that the demon seed compiler reordered one little thing!

This is why I code all of my lock-free stuff in externally assembled
functions:

http://groups-beta.google.com/group/comp.programming.threads/msg/423df394a0370fa6

;)

> In particular, they can safely be used from signal/interrupt handlers,
> etc.

Yes. That's a very important feature.

> But if you look simply at performance, in my experience at least, they
> can win or lose, depending on implementation details, hardware
> characteristics, etc. For example, I reported a small experiment in
> http://portal.acm.org/citation.cfm?doid=1011767.1011774 (or
> http://www.hpl.hp.com/techreports/2004/HPL-2004-105.html). Beside the
> actual content of the paper, the last section compares a simple
> lock-free stack vs. a solution based on simple spin-sleep locks. The
> winners are different for a 2xPII/266 and a 4xPPro machine. And that's
> pretty similar hardware. (There were also OS differences, but I'm not
> sure they matter for this particular comparison. If so, I don't
> understand the effect.)
>
> As many people have pointed out, the real problem with contention is
> typically that a critical cache line (either holding the lock, or the
> lock-free data structure) is transferred back and forth between
> processor caches for every access. That's inherently not cheap.

Yes. That can, and will, happen with "loop-based" lock-free
algorithms; a lock-free stack requires a CAS loop on both the producer
and consumer sides (the first sketch at the end of this post shows
that retry loop).

In all of my experiments I find that "loop-less" lock-free solutions
can beat basically any lock-based solution you throw at them. For
instance, take a simple single-producer/single-consumer queue. A
portable algorithm for this data-structure does not require any
mutexes at all. It doesn't require any atomic read-modify-write
operations (CAS, XADD, etc.). Instead, it relies on simple loads and
stores coupled with producer/consumer-side memory barriers. Since it's
loop-less, you can't get any CAS failures that need to retry... (a
minimal sketch of this kind of queue is also appended at the end of
this post)

You might want to take a look at the lock-free queue implementation on
my site. I really can't seem to design a lock-based solution that
beats it. If you can, please post it and I will put up a link to it on
my site.

> (Inappropriate lock implementations may of course make it much worse,
> as you can also see in the above test.)

Indeed. That said, lock-based solutions can be used to help lock-free
algorithms out a great deal. Lock-free reads with lock-based writes
are a prime example of this (roughly sketched at the end of this post
as well). Once I get my library posted, there will be extensive
examples of how a marriage between lock-free and lock-based algorithms
can be extremely useful...

--
http://appcore.home.comcast.net/
(portable lock-free data-structures)

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
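
For reference, here is what that "loop-based" shape looks like: a
classic Treiber-style stack push, where every thread retries a CAS
against the same head pointer. This is a minimal sketch only; it is
not code from this post or from AppCore, it uses C++11 std::atomic
(which postdates this thread) purely for illustration, and the names
are made up.

#include <atomic>

struct Node {
    int   value;
    Node* next;
};

class TreiberStack {
    std::atomic<Node*> head_{nullptr};

public:
    // Every pusher spins on a CAS against the shared head pointer.
    // Under contention the cache line holding head_ ping-pongs
    // between processors on each retry.
    void push(Node* n) {
        Node* old = head_.load(std::memory_order_relaxed);
        do {
            n->next = old;   // link the new node to the current head
        } while (!head_.compare_exchange_weak(
                     old, n,
                     std::memory_order_release,
                     std::memory_order_relaxed));
    }
    // (pop needs a similar CAS loop, plus ABA protection such as
    //  tagged pointers or safe memory reclamation; omitted here.)
};

Every failed CAS means another trip for the head_ cache line, which is
exactly the contention cost the quoted paragraph describes.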
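
And here is a minimal sketch of the loop-less
single-producer/single-consumer queue described above. It is an
illustration under assumptions, not the AppCore implementation: C++11
acquire/release atomics stand in for the producer/consumer-side memory
barriers (std::atomic postdates this thread), and the type and member
names are invented.

#include <atomic>
#include <cstddef>

// Bounded SPSC ring buffer. N must be a power of two so the index
// arithmetic stays correct across unsigned wraparound.
template<typename T, std::size_t N>
class spsc_queue {
    T buf_[N];
    std::atomic<std::size_t> head_{0};   // advanced by the consumer only
    std::atomic<std::size_t> tail_{0};   // advanced by the producer only

public:
    // Producer side: plain store into the slot, then a release store
    // of the new tail so the consumer sees the element before the index.
    bool push(const T& v) {
        std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t - head_.load(std::memory_order_acquire) == N)
            return false;                       // full
        buf_[t % N] = v;
        tail_.store(t + 1, std::memory_order_release);
        return true;
    }

    // Consumer side: acquire load of the tail, plain load of the slot,
    // then a release store of the new head to hand the slot back.
    bool pop(T& out) {
        std::size_t h = head_.load(std::memory_order_relaxed);
        if (tail_.load(std::memory_order_acquire) == h)
            return false;                       // empty
        out = buf_[h % N];
        head_.store(h + 1, std::memory_order_release);
        return true;
    }
};

Each push and pop is a fixed handful of loads and stores; only the
producer writes tail_ and only the consumer writes head_, so no CAS
(and therefore no retry loop) is ever needed.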
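
Finally, a rough sketch of one shape the "lock-free reads with
lock-based writes" idea can take: readers follow an atomic pointer to
an immutable snapshot while writers serialize through an ordinary
mutex. The names are made up, and safe reclamation of old snapshots
(the genuinely hard part) is deliberately left out; old snapshots are
simply leaked to keep the sketch short.

#include <atomic>
#include <mutex>
#include <string>

struct Config {              // treated as immutable once published
    std::string server;
    int         port;
};

class ConfigHolder {
    std::atomic<const Config*> current_;
    std::mutex                 write_lock_;   // taken by writers only

public:
    explicit ConfigHolder(const Config* initial) : current_(initial) {}

    // Reader path: a single acquire load, no lock taken, no spinning.
    const Config* read() const {
        return current_.load(std::memory_order_acquire);
    }

    // Writer path: writers exclude each other with the mutex, build a
    // fresh snapshot, and publish it with a release store. Readers are
    // never blocked.
    void update(const std::string& server, int port) {
        std::lock_guard<std::mutex> guard(write_lock_);
        const Config* next = new Config{server, port};
        current_.store(next, std::memory_order_release);
        // The previous snapshot is leaked on purpose; real code needs
        // hazard pointers, RCU, or similar to reclaim it safely.
    }
};

The fast read path stays lock-free while the rare, complicated write
path gets to use a plain mutex, which is the kind of marriage between
the two styles described above.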