982
Subj : Re: Lock Free -- where to start
To   : comp.programming.threads
From : chris noonan
Date : Sun Oct 02 2005 03:52 pm

David Schwartz wrote (in various posts):

>Blocking on a lock is going to help performance because useful
>work will be done until it's possible for the blocked thread
>to run without contention. Blocking on a lock is good because it allows
>contention to be avoided.

>Contention is bad. Lock free algorithms are about maximizing contention

If locks are efficient, how is it that many programs on a quad
processor machine give more throughput if only one thread is
started, as opposed to four?

If locks are efficient, why does the technique called I/O Completion
Port exist?

This thread (in the other sense of the word) has been vague
about the mechanism by which contention between threads
for a lock can affect performance. Allow me to rectify
that:

Imagine a uniprocessor machine running 1000 threads (not
performing I/O). Contrary to popular belief, performance
is not affected in the slightest by this number of threads.
One context switch takes place every timeslice (say 20
milliseconds) whether there are two threads or 1000.
This frequency (50 Hz) of context switching reduces
throughput slightly but not greatly. How do I know?
Because the operating system timeslice is determined
as a tradeoff between throughput and latency.

Now introduce a lock, contended for by the threads,
such that the lock is held for 1% of a thread's
running time. Every hundredth timeslice, a thread
is switched out when it is holding the lock. During
the subsequent 20ms timeslice, all the other threads
run up against the lock, leading to 999 context
switches before the original thread gets the processor
again. Accordingly, every hundred timeslices there are
1099 switches instead of 100, an eleven-fold increase.
That *is* going to sap performance.

That example had threads holding a lock 1% of the
time. Things can be much worse. The literature suggests
that typical programs may spend 20% of their running
time inside the allocator (heap manager) routines.
It's lucky there are lock-free allocators available ;)

If it seems possible by clever design to reduce the time
spent holding a lock to almost nothing, bear in mind
that on recent Pentiums just acquiring the lock is going
to take over a hundred machine cycles; there is a
limit.

Chris

.

0