Subj: Re: Lock Free -- where to start
To: comp.programming.threads
From: David Schwartz
Date: Sun Oct 02 2005 07:31 pm

"chris noonan" wrote in message
news:1128289952.883814.291520@o13g2000cwo.googlegroups.com...

> David Schwartz wrote (in various posts):
>> Blocking on a lock is going to help performance because useful
>> work will be done until it's possible for the blocked thread
>> to run without contention. Blocking on a lock is good because it
>> allows contention to be avoided. Contention is bad. Lock-free
>> algorithms are about maximizing contention.

> If locks are efficient, how is it that many programs on a quad
> processor machine give more throughput if only one thread is
> started, as opposed to four?

Probably because the threads are fighting each other. If they used
locks more, four threads would perform at least as well as one. ;)

> If locks are efficient, why does the technique called I/O Completion
> Port exist?

I don't understand what one has to do with the other. IOCP is about
optimizing the number of concurrently running threads without the risk
of thread starvation.

> This thread (in the other sense of the word) has been vague
> about the mechanism by which contention between threads
> for a lock can affect performance. Allow me to rectify that:
>
> Imagine a uniprocessor machine running 1000 threads (not
> performing I/O). Contrary to popular belief, performance
> is not affected in the slightest by this number of threads.
> One context switch takes place every timeslice (say 20
> milliseconds) whether there are two threads or 1000.
> This frequency (50 Hz) of context switching reduces
> throughput slightly but not greatly. How do I know?
> Because the operating system timeslice is determined
> as a tradeoff between throughput and latency.

Exactly. Context switches are only expensive if you somehow force a
large number of them.
> Now introduce a lock, contended for by the threads,
> such that the lock is held for 1% of a thread's
> running time. Every hundredth timeslice, a thread
> is switched out while it is holding the lock. During
> the subsequent 20ms timeslice, all the other threads
> run up against the lock, leading to 999 context
> switches before the original thread gets the processor
> again. Accordingly, every hundred timeslices there are
> 1099 switches instead of 100, an eleven-fold increase.
> That *is* going to sap performance.

Umm, no. Some thread will get the lock, and unless the thread library
is poorly designed, it will be allowed to do useful work at full speed
without context switches. If other threads want the lock, they'll have
to wait.

> That example had threads holding a lock 1% of the
> time. Things can be much worse. The literature suggests
> that typical programs may spend 20% of their running
> time inside the allocator (heap manager) routines.
> It's lucky there are lock-free allocators available ;)

Yes, as I pointed out, the allocator is one place where lock-free
algorithms can be a big win, because there may not be any other useful
work for a blocked thread to do.

> If it seems possible by clever design to reduce the time
> spent holding a lock to almost nothing, bear in mind
> that on recent Pentiums just acquiring the lock is going
> to take over a hundred machine cycles; there is a limit.

The lock-free algorithms generally increase the number of such
expensive atomic operations, and they add extra cache misses.

DS