Subj : Re: Lock Free -- where to start
To   : comp.programming.threads
From : Chris Thomasson
Date : Wed Sep 28 2005 04:15 pm

> Let me just give you my general caveat about lock free code. First,
> lock free code is generally non-portable and extremely complex. Such code
> can easily contain bugs that are extremely difficult even for experts to
> spot. The complexity is almost always unnecessary.

The complexity can be hidden behind a simple and portable API that
anybody can use -- like the pthreads API. We all know that a particular
pthread implementation may be very complex, and may very well use some
lock-free algorithms internally. But does the average user actually care
about this kind of stuff? My point, again: "The complexity can be hidden
behind a simple API." Really.

Would there be bugs? Well, maybe; it depends on the implementer. There
may be very subtle bugs in your pthread implementation as well...

> But worse, in most realistic cases, lock free tremendously *reduces*
> performance. The myth is that locks are expensive.

Humm... Well, it really depends on the type of lock-free algorithm you
compare against locks. I would agree that "CAS-loop" based lock-free
algorithms can thrash the cache:

http://groups.google.com/group/comp.lang.c++.moderated/msg/6b2ccf76ba145c9c?hl=en
http://www.hpl.hp.com/techreports/2004/HPL-2004-105.html

However, contended locks can adversely affect performance, and can
simply prevent scalability in some situations. Take a global read-write
lock protecting a "read-mostly" data structure. Under heavy read
contention, forward progress can be stalled waiting for the updates to
the internal lock state to be "shuttled around the CPUs' caches". If the
read-side critical section is of moderate size, the cost of moving lock
updates from one CPU's cache to another can actually rival the cost of
the critical section itself; this can prevent readers from truly
executing in parallel. The more processors you add, the worse it gets.
This is not true for some special high-performance reader/writer
solutions; one of them is sketched below...
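To make that concrete, here is a minimal sketch of one such solution: a
distributed, "big reader" style read-write lock (brlock). All the names
are hypothetical and the slot count is arbitrary. The idea is simply
that each reader slot lives on its own cache line, so readers stop
fighting over a single shared lock word; a writer pays the full price by
taking every slot, which is fine when writes are rare:

  #include <pthread.h>

  #define BR_SLOTS 16

  /* one reader slot per cache line to avoid false sharing */
  struct br_slot {
      pthread_mutex_t lock;
      char pad[64];
  };

  struct brlock {
      struct br_slot slot[BR_SLOTS];
  };

  void br_init(struct brlock *b)
  {
      int i;
      for (i = 0; i < BR_SLOTS; ++i)
          pthread_mutex_init(&b->slot[i].lock, NULL);
  }

  /* a reader only ever touches its own slot's cache line */
  void br_read_lock(struct brlock *b, int self)
  {
      pthread_mutex_lock(&b->slot[self % BR_SLOTS].lock);
  }

  void br_read_unlock(struct brlock *b, int self)
  {
      pthread_mutex_unlock(&b->slot[self % BR_SLOTS].lock);
  }

  /* a writer must take every slot, in a fixed order to avoid
     writer-vs-writer deadlock; writes are assumed to be rare */
  void br_write_lock(struct brlock *b)
  {
      int i;
      for (i = 0; i < BR_SLOTS; ++i)
          pthread_mutex_lock(&b->slot[i].lock);
  }

  void br_write_unlock(struct brlock *b)
  {
      int i;
      for (i = BR_SLOTS - 1; i >= 0; --i)
          pthread_mutex_unlock(&b->slot[i].lock);
  }

Note that readers here still execute atomic operations, just on private
cache lines; the schemes I hint at further down get rid of even those.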
> The truth is that *contention* is expensive. Lock free algorithms allow
> code that is contending to run in parallel. Locks allow other code to run
> that probably won't contend.

Think of a portable algorithm that can totally avoid contention for very
high numbers of "reader threads"; NO atomic operations and NO memory
barriers required on the read side... How can a mutex compete with that?
Think frequent iteration over large database-like "read-mostly" data
structures before you answer. (A rough sketch of that kind of scheme
closes this post.)

> For example, consider three threads, A, B, and C. A and B both want to
> manipulate a queue, C wants to do something else entirely. Assume two
> CPUs. With a lock, either A or B will grab the queue lock. A or B will run
> with C and both threads will run at full speed with minimal contention.
> With a lock-free queue, A can run concurrently with B. C will have to
> wait, while A and B fight over the cache lines that contain the queue and
> its structures.

IMHO, this is an example of poor application engineering/design. I would
have C queue a request and wait for something else to do... Then A or B
could do the work; after all, they do sound like they are "worker
threads" of some sort... ;)

> Lock free algorithms work great in kernels and low level threading
> libraries for a variety of reasons. Portability is not that important.

Lock-free algorithms can "easily" be hidden behind a portable API.

> But for application code, or even most libraries on top of threading
> libraries, it's a pure loss.

Application code would not use raw lock-free structures directly. It
would use a library that exports a "portable API" -- something like the
queue interface sketched below.

> ... it's a pure loss.

Yeah... The ability of read-only iterations (frequent database
searches?) to achieve a substantial increase in the number of reads per
second, per reader thread, would most certainly be a loss... ;)
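To make the "portable API" point concrete, here is a minimal sketch of
what such a wrapper might look like. The interface and all of the names
(xqueue, etc.) are hypothetical; the point is that nothing in the header
reveals whether the guts are a lock-free algorithm on one platform and a
plain mutex on another:

  /* queue.h -- all the application ever sees */

  typedef struct xqueue xqueue;                /* opaque handle */

  xqueue *xqueue_create(void);
  void    xqueue_destroy(xqueue *q);
  int     xqueue_push(xqueue *q, void *item);  /* 0 on success */
  void   *xqueue_pop(xqueue *q);               /* NULL if empty */

  /* queue_mutex.c -- portable reference implementation; a build for
     a specific target could swap in a lock-free version behind the
     exact same header */

  #include <pthread.h>
  #include <stdlib.h>

  struct xnode { struct xnode *next; void *item; };

  struct xqueue {
      pthread_mutex_t lock;
      struct xnode *head, *tail;
  };

  xqueue *xqueue_create(void)
  {
      xqueue *q = calloc(1, sizeof *q);
      if (q) pthread_mutex_init(&q->lock, NULL);
      return q;
  }

  int xqueue_push(xqueue *q, void *item)
  {
      struct xnode *n = malloc(sizeof *n);
      if (! n) return -1;
      n->item = item; n->next = NULL;
      pthread_mutex_lock(&q->lock);
      if (q->tail) q->tail->next = n; else q->head = n;
      q->tail = n;
      pthread_mutex_unlock(&q->lock);
      return 0;
  }

  void *xqueue_pop(xqueue *q)
  {
      struct xnode *n;
      void *item = NULL;
      pthread_mutex_lock(&q->lock);
      n = q->head;
      if (n) {
          q->head = n->next;
          if (! q->head) q->tail = NULL;
          item = n->item;
      }
      pthread_mutex_unlock(&q->lock);
      if (n) free(n);
      return item;
  }

  void xqueue_destroy(xqueue *q)
  {
      while (xqueue_pop(q)) { /* drain */ }
      pthread_mutex_destroy(&q->lock);
      free(q);
  }

The caller cannot tell the difference between the two builds -- which is
precisely the pthreads model I described at the top of this post.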
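And since I keep waving my hands about readers that execute NO atomic
operations and NO memory barriers: here is a very rough sketch of the
general shape of such a scheme, in the spirit of RCU-like
quiescent-state-based reclamation. Every name is hypothetical, the
reader table is fixed-size for brevity, writers are assumed to be
serialized by an ordinary mutex elsewhere, and I am assuming a
cache-coherent system where aligned word loads/stores are atomic (and,
for the plain stores shown, x86-like store ordering). A production
version also has to handle idle/offline readers and membars on
weakly-ordered hardware:

  #include <sched.h>
  #include <stdlib.h>

  #define MAX_READERS 64

  struct node { struct node *next; int payload; };

  /* one counter per reader, padded onto its own cache line;
     only its owner ever writes it */
  struct reader_state {
      volatile unsigned long quiescent;
      char pad[64 - sizeof(unsigned long)];
  };

  static struct reader_state g_reader[MAX_READERS];

  static void process(struct node *n) { (void)n; /* read-only work */ }

  /* Read side: plain loads only -- no CAS, no membars, no lock. */
  void reader_iterate(int self, struct node *volatile *phead)
  {
      struct node *n;
      for (n = *phead; n != NULL; n = n->next)
          process(n);
      g_reader[self].quiescent++;  /* announce a quiescent point */
  }

  /* Write side: once a node has been unlinked and is unreachable
     to new readers, wait for every reader to pass a quiescent
     point, then reclaim it. */
  void writer_retire(struct node *dead)
  {
      unsigned long snap[MAX_READERS];
      int i;
      for (i = 0; i < MAX_READERS; ++i)
          snap[i] = g_reader[i].quiescent;
      for (i = 0; i < MAX_READERS; ++i)
          while (g_reader[i].quiescent == snap[i])
              sched_yield();  /* reader may still see the old node */
      free(dead);
  }

All of the cost lands on the write side; each reader's counter lives in
a cache line that only its owner writes, so frequent read-only iteration
scales with the number of CPUs instead of being throttled by a shared
lock word. That is the "substantial increase in reads per second, per
reader thread" I am talking about.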