Subj : Re: Lock Free -- where to start
To : comp.programming.threads
From : David Schwartz
Date : Fri Oct 14 2005 01:41 pm

"chris noonan" wrote in message
news:1129307992.361217.132980@g14g2000cwa.googlegroups.com...

> David Schwartz wrote:
>> The optimum number of concurrent threads is one per processor if the
>> threads are CPU bound. I still don't understand what you think the
>> connection between IOCP and lock efficiency is. Maybe it's obvious to
>> you, but I don't see it. I'm genuinely curious.

> I agree in principle about the optimum number of threads. However
> when there is no synchronisation between the threads the
> performance loss suffered by having a large number of threads
> is so tiny as not to be measurable.

True, but there's still a loss of control. What you are doing is
transferring to the scheduler the job of deciding what work should be
done. You really want to keep that under application control.

> My IOCP point was more
> philosophical than technical: a "one thread good, many
> threads bad" mantra has evolved through experience of
> operating systems that take so many locks that multi-
> threaded performance dives (that is my guess, I can't
> be sure). IOCP is a means of reducing the number of
> threads running, at the cost of software complexity.

IOCP doesn't increase software complexity. In fact, in many cases it
reduces it, because the implementation of a thread pool is simpler.
Perhaps you're subconsciously thinking something like "I have a library
that does I/O and thread pools, and if I want IOCP, I'd have to code it
myself". Well, once you have a library that does IOCP thread pools and
socket I/O, you have it forever.

>> In any event, if you don't have 1000 CPUs, it's pretty dumb for you to
>> have 1000 ready-to-run threads. If you do have 1000 CPUs, it's really
>> just the time it takes to do *one* context switch.

> It's dumb to run 1000 threads if you expect performance
> deterioration. If I were programming a thousand-client-server
> under Microsoft Windows, I'd use IOCP. What I am interested
> in is the reason for the performance deterioration.

It is, of course, a lot of things, and performance deterioration isn't
the only issue. For one thing, many schedulers don't handle large
numbers of ready-to-run threads well. If you want code that performs
well on all platforms, you can't rely on the scheduler having particular
optimizations.

As I understand it, the main problem with the "use 1,000 threads to do
1,000 jobs" approach is that each job will likely be very small, and
thus you will need 1,000 context switches to get 1,000 jobs done. If you
have 1,000 large jobs (each of which will take many timeslices to
complete), the penalty for having 1,000 threads doing them is much
smaller.

Consider, for example, a chat server. We have a message we need to send
to 1,000 clients. If each client has its own thread, we'll need 1,000
context switches in rapid succession to get all the data sent. If we use
a thread pool, we'll wind up with one thread on each CPU that will run
until all 1,000 messages are sent or it uses up a full timeslice. As a
result, we'll need maybe 3 or 4 context switches to get the messages
sent instead of 1,000.

DS
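
A minimal illustrative sketch of the kind of thread pool described
above, assuming Win32 and using an I/O completion port as the pool's
work queue for the chat-server example: roughly one worker per CPU pulls
"send the message to client N" jobs off the port, so a handful of
threads drain all 1,000 sends instead of 1,000 per-client threads each
waking for one tiny job. The send_to_client() stub and the use of a zero
completion key as a shutdown signal are hypothetical choices for the
sketch; error handling and the real overlapped socket I/O are omitted.

#include <windows.h>

#define NUM_CLIENTS 1000

/* Hypothetical stub: a real chat server would do an overlapped
   WSASend() of the message to this client here. */
static void send_to_client(ULONG_PTR client_id)
{
    (void)client_id;
}

static DWORD WINAPI worker(LPVOID param)
{
    HANDLE port = (HANDLE)param;
    DWORD bytes;
    ULONG_PTR key;
    OVERLAPPED *ov;

    /* Each worker keeps pulling jobs until it sees the shutdown key,
       so it runs until the queue is drained or its timeslice expires,
       rather than costing one context switch per client. */
    for (;;) {
        if (!GetQueuedCompletionStatus(port, &bytes, &key, &ov, INFINITE))
            break;
        if (key == 0)               /* hypothetical shutdown signal */
            break;
        send_to_client(key);
    }
    return 0;
}

int main(void)
{
    SYSTEM_INFO si;
    HANDLE port;
    HANDLE threads[MAXIMUM_WAIT_OBJECTS];
    DWORD i, nworkers;

    GetSystemInfo(&si);
    nworkers = si.dwNumberOfProcessors;
    if (nworkers > MAXIMUM_WAIT_OBJECTS)
        nworkers = MAXIMUM_WAIT_OBJECTS;

    /* One completion port, concurrency capped at roughly one running
       thread per CPU. */
    port = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, nworkers);

    for (i = 0; i < nworkers; i++)
        threads[i] = CreateThread(NULL, 0, worker, port, 0, NULL);

    /* Queue one "send" job per client (keys 1..NUM_CLIENTS). */
    for (i = 1; i <= NUM_CLIENTS; i++)
        PostQueuedCompletionStatus(port, 0, (ULONG_PTR)i, NULL);

    /* Then one shutdown packet (key 0) per worker. */
    for (i = 0; i < nworkers; i++)
        PostQueuedCompletionStatus(port, 0, 0, NULL);

    WaitForMultipleObjects(nworkers, threads, TRUE, INFINITE);
    CloseHandle(port);
    return 0;
}

Because completion packets are dequeued in FIFO order, the shutdown
packets posted last are only seen after the 1,000 send jobs have been
drained, and the port's concurrency value keeps the number of running
workers near the number of CPUs.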