Subj : Re: C++ desired features
To : comp.lang.c++.moderated,comp.programming.threads
From : SenderX
Date : Sat Feb 26 2005 12:47 am

> In my experience, lock-free solutions are great, especially when
> someone else has already dealt with the tricky correctness issues. :)

Lock-free programmers can be hit from all sides: the compiler, the
CPU, intricate algorithms, portability, etc... I believe the compiler
is the worst one of all. Those tricky bastards can reorder part of a
"critical-sequence" of instructions and cause a fu@k$ng race-condition
that usually sneaks under testing, and hits ten months down the road.
Then you find yourself scouring through disassembled code only to find
that the demon seed compiler reordered one little thing!

This is why I code all of my lock-free stuff in externally assembled
functions:

http://groups-beta.google.com/group/comp.programming.threads/msg/423df394a0370fa6

;)

> In particular, they can safely be used from signal/interrupt handlers,
> etc.

Yes. That's a very important feature.

> But if you look simply at performance, in my experience at least, they
> can win or lose, depending on implementation details, hardware
> characteristics, etc. For example, I reported a small experiment in
> http://portal.acm.org/citation.cfm?doid=1011767.1011774 (or
> http://www.hpl.hp.com/techreports/2004/HPL-2004-105.html). Beside the
> actual content of the paper, the last section compares a simple
> lock-free stack vs. a solution based on simple spin-sleep locks. The
> winners are different for a 2xPII/266 and a 4xPPro machine. And that's
> pretty similar hardware. (There were also OS differences, but I'm not
> sure they matter for this particular comparison. If so, I don't
> understand the effect.)
>
> As many people have pointed out, the real problem with contention is
> typically that a critical cache line (either holding the lock, or the
> lock-free data structure) is transferred back and forth between
> processor caches for every access. That's inherently not cheap.

Yes. That can, and will, happen with "loop-based" lock-free
algorithms; a lock-free stack requires a CAS loop on both the producer
and consumer sides (the first sketch at the end of this post shows
that retry loop).

In all of my experiments I find that "loop-less" lock-free solutions
can beat basically any lock-based solution you throw at them. For
instance, take a simple single-producer/single-consumer queue. A
portable algorithm for this data-structure does not require any
mutexes at all. It doesn't require any atomic read-modify-write
operations (CAS, XADD, etc.). Instead, it relies on simple loads and
stores coupled with producer/consumer-side memory barriers. Since it's
loop-less, you can't get any CAS failures that need to retry... (a
minimal sketch of this kind of queue is also appended at the end of
this post)

You might want to take a look at the lock-free queue implementation on
my site. I really can't seem to design a lock-based solution that
beats it. If you can, please post it and I will put up a link to it on
my site.

> (Inappropriate lock implementations may of course make it much worse,
> as you can also see in the above test.)

Indeed. That said, lock-based solutions can be used to help lock-free
algorithms out a great deal. Lock-free reads with lock-based writes
are a prime example of this (roughly sketched at the end of this post
as well). Once I get my library posted, there will be extensive
examples of how a marriage between lock-free and lock-based algorithms
can be extremely useful...

--
http://appcore.home.comcast.net/
(portable lock-free data-structures)

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
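
For reference, here is what that "loop-based" shape looks like: a
classic Treiber-style stack push, where every thread retries a CAS
against the same head pointer. This is a minimal sketch only; it is
not code from this post or from AppCore, it uses C++11 std::atomic
(which postdates this thread) purely for illustration, and the names
are made up.

#include <atomic>

struct Node {
    int   value;
    Node* next;
};

class TreiberStack {
    std::atomic<Node*> head_{nullptr};

public:
    // Every pusher spins on a CAS against the shared head pointer.
    // Under contention the cache line holding head_ ping-pongs
    // between processors on each retry.
    void push(Node* n) {
        Node* old = head_.load(std::memory_order_relaxed);
        do {
            n->next = old;   // link the new node to the current head
        } while (!head_.compare_exchange_weak(
                     old, n,
                     std::memory_order_release,
                     std::memory_order_relaxed));
    }
    // (pop needs a similar CAS loop, plus ABA protection such as
    //  tagged pointers or safe memory reclamation; omitted here.)
};

Every failed CAS means another trip for the head_ cache line, which is
exactly the contention cost the quoted paragraph describes.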
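
And here is a minimal sketch of the loop-less
single-producer/single-consumer queue described above. It is an
illustration under assumptions, not the AppCore implementation: C++11
acquire/release atomics stand in for the producer/consumer-side memory
barriers (std::atomic postdates this thread), and the type and member
names are invented.

#include <atomic>
#include <cstddef>

// Bounded SPSC ring buffer. N must be a power of two so the index
// arithmetic stays correct across unsigned wraparound.
template<typename T, std::size_t N>
class spsc_queue {
    T buf_[N];
    std::atomic<std::size_t> head_{0};   // advanced by the consumer only
    std::atomic<std::size_t> tail_{0};   // advanced by the producer only

public:
    // Producer side: plain store into the slot, then a release store
    // of the new tail so the consumer sees the element before the index.
    bool push(const T& v) {
        std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t - head_.load(std::memory_order_acquire) == N)
            return false;                       // full
        buf_[t % N] = v;
        tail_.store(t + 1, std::memory_order_release);
        return true;
    }

    // Consumer side: acquire load of the tail, plain load of the slot,
    // then a release store of the new head to hand the slot back.
    bool pop(T& out) {
        std::size_t h = head_.load(std::memory_order_relaxed);
        if (tail_.load(std::memory_order_acquire) == h)
            return false;                       // empty
        out = buf_[h % N];
        head_.store(h + 1, std::memory_order_release);
        return true;
    }
};

Each push and pop is a fixed handful of loads and stores; only the
producer writes tail_ and only the consumer writes head_, so no CAS
(and therefore no retry loop) is ever needed.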
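
Finally, a rough sketch of one shape the "lock-free reads with
lock-based writes" idea can take: readers follow an atomic pointer to
an immutable snapshot while writers serialize through an ordinary
mutex. The names are made up, and safe reclamation of old snapshots
(the genuinely hard part) is deliberately left out; old snapshots are
simply leaked to keep the sketch short.

#include <atomic>
#include <mutex>
#include <string>

struct Config {              // treated as immutable once published
    std::string server;
    int         port;
};

class ConfigHolder {
    std::atomic<const Config*> current_;
    std::mutex                 write_lock_;   // taken by writers only

public:
    explicit ConfigHolder(const Config* initial) : current_(initial) {}

    // Reader path: a single acquire load, no lock taken, no spinning.
    const Config* read() const {
        return current_.load(std::memory_order_acquire);
    }

    // Writer path: writers exclude each other with the mutex, build a
    // fresh snapshot, and publish it with a release store. Readers are
    // never blocked.
    void update(const std::string& server, int port) {
        std::lock_guard<std::mutex> guard(write_lock_);
        const Config* next = new Config{server, port};
        current_.store(next, std::memory_order_release);
        // The previous snapshot is leaked on purpose; real code needs
        // hazard pointers, RCU, or similar to reclaim it safely.
    }
};

The fast read path stays lock-free while the rare, complicated write
path gets to use a plain mutex, which is the kind of marriage between
the two styles described above.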