Subj : Re: Lock Free -- where to start
To   : comp.programming.threads
From : Chris Thomasson
Date : Wed Sep 28 2005 04:15 pm

> Let me just give you my general caveat about lock free code. First,
> lock free code is generally non-portable and extremely complex. Such code
> can easily contain bugs that are extremely difficult even for experts to
> spot. The complexity is almost always unnecessary.

The complexity can be hidden behind a simple and portable API that
anybody can use -- like the pthreads API. We all know that a particular
pthread implementation may be very complex, and may very well use some
lock-free algorithms internally. But does the average user actually care
about this kind of stuff? My point, again: "The complexity can be hidden
behind a simple API." Really.

Would there be bugs? Well, maybe; it depends on the implementer. There
may be very subtle bugs in your pthread implementation as well...

> But worse, in most realistic cases, lock free tremendously *reduces*
> performance. The myth is that locks are expensive.

Humm... Well, it really depends on the type of lock-free algorithm you
compare against locks. I would agree that "CAS-loop" based lock-free
algorithms can thrash the cache:

http://groups.google.com/group/comp.lang.c++.moderated/msg/6b2ccf76ba145c9c?hl=en
http://www.hpl.hp.com/techreports/2004/HPL-2004-105.html

However, contended locks can adversely affect performance, and can
simply prevent scalability in some situations. Take a global read-write
lock protecting a "read-mostly" data structure. Under heavy read
contention, forward progress can be stalled waiting for the updates to
the internal lock state to be "shuttled around the CPUs' caches". If the
read-side critical section is of moderate size, the cost of moving lock
updates from one CPU's cache to another can actually rival the cost of
the critical section itself; this can prevent readers from truly
executing in parallel. The more processors you add, the worse it gets.
This is not true for some special high-performance reader/writer
solutions; one of them is sketched below...
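To make that concrete, here is a minimal sketch of one such solution: a
distributed, "big reader" style read-write lock (brlock). All the names
are hypothetical and the slot count is arbitrary. The idea is simply
that each reader slot lives on its own cache line, so readers stop
fighting over a single shared lock word; a writer pays the full price by
taking every slot, which is fine when writes are rare:

  #include <pthread.h>

  #define BR_SLOTS 16

  /* one reader slot per cache line to avoid false sharing */
  struct br_slot {
      pthread_mutex_t lock;
      char pad[64];
  };

  struct brlock {
      struct br_slot slot[BR_SLOTS];
  };

  void br_init(struct brlock *b)
  {
      int i;
      for (i = 0; i < BR_SLOTS; ++i)
          pthread_mutex_init(&b->slot[i].lock, NULL);
  }

  /* a reader only ever touches its own slot's cache line */
  void br_read_lock(struct brlock *b, int self)
  {
      pthread_mutex_lock(&b->slot[self % BR_SLOTS].lock);
  }

  void br_read_unlock(struct brlock *b, int self)
  {
      pthread_mutex_unlock(&b->slot[self % BR_SLOTS].lock);
  }

  /* a writer must take every slot, in a fixed order to avoid
     writer-vs-writer deadlock; writes are assumed to be rare */
  void br_write_lock(struct brlock *b)
  {
      int i;
      for (i = 0; i < BR_SLOTS; ++i)
          pthread_mutex_lock(&b->slot[i].lock);
  }

  void br_write_unlock(struct brlock *b)
  {
      int i;
      for (i = BR_SLOTS - 1; i >= 0; --i)
          pthread_mutex_unlock(&b->slot[i].lock);
  }

Note that readers here still execute atomic operations, just on private
cache lines; the schemes I hint at further down get rid of even those.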
> The truth is that *contention* is expensive. Lock free algorithms allow
> code that is contending to run in parallel. Locks allow other code to run
> that probably won't contend.

Think of a portable algorithm that can totally avoid contention for very
high numbers of "reader threads"; NO atomic operations and NO memory
barriers required on the read side... How can a mutex compete with that?
Think frequent iteration over large database-like "read-mostly" data
structures before you answer. (A rough sketch of that kind of scheme
closes this post.)

> For example, consider three threads, A, B, and C. A and B both want to
> manipulate a queue, C wants to do something else entirely. Assume two
> CPUs. With a lock, either A or B will grab the queue lock. A or B will run
> with C and both threads will run at full speed with minimal contention.
> With a lock-free queue, A can run concurrently with B. C will have to
> wait, while A and B fight over the cache lines that contain the queue and
> its structures.

IMHO, this is an example of poor application engineering/design. I would
have C queue a request and wait for something else to do... Then A or B
could do the work; after all, they do sound like they are "worker
threads" of some sort... ;)

> Lock free algorithms work great in kernels and low level threading
> libraries for a variety of reasons. Portability is not that important.

Lock-free algorithms can "easily" be hidden behind a portable API.

> But for application code, or even most libraries on top of threading
> libraries, it's a pure loss.

Application code would not use raw lock-free structures directly. It
would use a library that exports a "portable API" -- something like the
queue interface sketched below.

> ... it's a pure loss.

Yeah... The ability of read-only iterations (frequent database
searches?) to achieve a substantial increase in the number of reads per
second, per reader thread, would most certainly be a loss... ;)
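To make the "portable API" point concrete, here is a minimal sketch of
what such a wrapper might look like. The interface and all of the names
(xqueue, etc.) are hypothetical; the point is that nothing in the header
reveals whether the guts are a lock-free algorithm on one platform and a
plain mutex on another:

  /* queue.h -- all the application ever sees */

  typedef struct xqueue xqueue;                /* opaque handle */

  xqueue *xqueue_create(void);
  void    xqueue_destroy(xqueue *q);
  int     xqueue_push(xqueue *q, void *item);  /* 0 on success */
  void   *xqueue_pop(xqueue *q);               /* NULL if empty */

  /* queue_mutex.c -- portable reference implementation; a build for
     a specific target could swap in a lock-free version behind the
     exact same header */

  #include <pthread.h>
  #include <stdlib.h>

  struct xnode { struct xnode *next; void *item; };

  struct xqueue {
      pthread_mutex_t lock;
      struct xnode *head, *tail;
  };

  xqueue *xqueue_create(void)
  {
      xqueue *q = calloc(1, sizeof *q);
      if (q) pthread_mutex_init(&q->lock, NULL);
      return q;
  }

  int xqueue_push(xqueue *q, void *item)
  {
      struct xnode *n = malloc(sizeof *n);
      if (! n) return -1;
      n->item = item; n->next = NULL;
      pthread_mutex_lock(&q->lock);
      if (q->tail) q->tail->next = n; else q->head = n;
      q->tail = n;
      pthread_mutex_unlock(&q->lock);
      return 0;
  }

  void *xqueue_pop(xqueue *q)
  {
      struct xnode *n;
      void *item = NULL;
      pthread_mutex_lock(&q->lock);
      n = q->head;
      if (n) {
          q->head = n->next;
          if (! q->head) q->tail = NULL;
          item = n->item;
      }
      pthread_mutex_unlock(&q->lock);
      if (n) free(n);
      return item;
  }

  void xqueue_destroy(xqueue *q)
  {
      while (xqueue_pop(q)) { /* drain */ }
      pthread_mutex_destroy(&q->lock);
      free(q);
  }

The caller cannot tell the difference between the two builds -- which is
precisely the pthreads model I described at the top of this post.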
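And since I keep waving my hands about readers that execute NO atomic
operations and NO memory barriers: here is a very rough sketch of the
general shape of such a scheme, in the spirit of RCU-like
quiescent-state-based reclamation. Every name is hypothetical, the
reader table is fixed-size for brevity, writers are assumed to be
serialized by an ordinary mutex elsewhere, and I am assuming a
cache-coherent system where aligned word loads/stores are atomic (and,
for the plain stores shown, x86-like store ordering). A production
version also has to handle idle/offline readers and membars on
weakly-ordered hardware:

  #include <sched.h>
  #include <stdlib.h>

  #define MAX_READERS 64

  struct node { struct node *next; int payload; };

  /* one counter per reader, padded onto its own cache line;
     only its owner ever writes it */
  struct reader_state {
      volatile unsigned long quiescent;
      char pad[64 - sizeof(unsigned long)];
  };

  static struct reader_state g_reader[MAX_READERS];

  static void process(struct node *n) { (void)n; /* read-only work */ }

  /* Read side: plain loads only -- no CAS, no membars, no lock. */
  void reader_iterate(int self, struct node *volatile *phead)
  {
      struct node *n;
      for (n = *phead; n != NULL; n = n->next)
          process(n);
      g_reader[self].quiescent++;  /* announce a quiescent point */
  }

  /* Write side: once a node has been unlinked and is unreachable
     to new readers, wait for every reader to pass a quiescent
     point, then reclaim it. */
  void writer_retire(struct node *dead)
  {
      unsigned long snap[MAX_READERS];
      int i;
      for (i = 0; i < MAX_READERS; ++i)
          snap[i] = g_reader[i].quiescent;
      for (i = 0; i < MAX_READERS; ++i)
          while (g_reader[i].quiescent == snap[i])
              sched_yield();  /* reader may still see the old node */
      free(dead);
  }

All of the cost lands on the write side; each reader's counter lives in
a cache line that only its owner writes, so frequent read-only iteration
scales with the number of CPUs instead of being throttled by a shared
lock word. That is the "substantial increase in reads per second, per
reader thread" I am talking about.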