982 Subj : Re: Lock Free -- where to start To : comp.programming.threads From : chris noonan Date : Sun Oct 02 2005 03:52 pm David Schwartz wrote (in various posts): >Blocking on a lock is going to help performance because useful >work will be done until it's possible for the blocked thread >to run without contention. Blocking on a lock is good because it allows >contention to be avoided. >Contention is bad. Lock free algorithms are about maximizing contention If locks are efficient, how is it that many programs on a quad processor machine give more throughput if only one thread is started, as opposed to four? If locks are efficient, why does the technique called I/O Completion Port exist? This thread (in the other sense of the word) has been vague about the mechanism by which contention between threads for a lock can affect performance. Allow me to rectify that: Imagine a uniprocessor machine running 1000 threads (not performing I/O). Contrary to popular belief, performance is not affected in the slightest by this number of threads. One context switch takes place every timeslice (say 20 milliseconds) whether there are two threads or 1000. This frequency (50 Hz) of context switching reduces throughput slightly but not greatly. How do I know? Because the operating system timeslice is determined as a tradeoff between throughput and latency. Now introduce a lock, contended for by the threads, such that the lock is held for 1% of a thread's running time. Every hundredth timeslice, a thread is switched out when it is holding the lock. During the subsequent 20ms timeslice, all the other threads run up against the lock, leading to 999 context switches before the original thread gets the processor again. Accordingly, every hundred timeslices there are 1099 switches instead of 100, an eleven-fold increase. That *is* going to sap performance. That example had threads holding a lock 1% of the time. Things can be much worse. The literature suggests that typical programs may spend 20% of their running time inside the allocator (heap manager) routines. It's lucky there are lock-free allocators available ;) If it seems possible by clever design to reduce the time spent holding a lock to almost nothing, bear in mind that on recent Pentiums just acquiring the lock is going to take over a hundred machine cycles; there is a limit. Chris . 0