Subj : Re: A multithreading benchmark
To : comp.programming.threads
From : doug
Date : Wed Jun 08 2005 09:50 am

"Uenal Mutlu" <520001085531-0001@t-online.de> wrote in message
news:d85dgl$n4g$00$1@news.t-online.com...
> "doug" wrote
>> "Uenal Mutlu" wrote
>> > "chris noonan" wrote
>> >> I am designing a benchmark program for investigating
>> >> multithreaded performance. I will use the program
>> >> for comparing various heap managers, but it is bound
>> >> to be useful for other purposes.
>> >>
>> >> These are the design criteria I have so far:
>> >>
>> >> 1. The benchmark will be semi-synthetic, in that
>> >> it will be constructed for the stated purpose,
>> >> but will perform some useful task, and could be
>> >> mistaken for a 'real' application.
>> >>
>> >> 2. The program will take the server role of a
>> >> client-server system, and will run on a
>> >> thread-per-connection basis.
>> >>
>> >> 3. There will be a fixed amount of work, the
>> >> number of threads will be parameterisable for
>> >> each run of the benchmark, and the workload
>> >> will be divided between however many threads
>> >> there are.
>> >>
>> >> 4. The program will be coded in C++ in natural
>> >> style.
>> >>
>> >> 5. The dynamic memory profile of the running
>> >> benchmark will conform as far as possible to
>> >> empirical measurements of average allocation
>> >> size, duration etc. to be found in the
>> >> literature.
>> >>
>> >> Anything else worth considering?
>> >>
>> >> Initially the program will be targeted at
>> >> Microsoft Windows platforms, as there is a
>> >> dearth of suitable benchmarks.
>> >>
>> >> Is there some way of simulating a number of
>> >> clients without the complication of having
>> >> multiple machines and real network connections?
>> >
>> > A benchmark for thread performance should be done entirely on one
>> > local machine: once on a single-CPU machine and then on a
>> > multiple-CPU machine. Adding other factors like the network would
>> > disturb the benchmark.
>>
>> It's a tricky one, eh?
>>
>> Normally you'd want your benchmark to be useful in real-life
>> scenarios. But...
>> - if you don't do real network I/O, then you will miss how the
>> threading behaves in the presence of e.g. the TCP/IP stack (this
>> has bitten me before)
>> - but you don't really want to count the cost of network transit
>> and the clients' own processing, so maybe you shouldn't do it
>> - or maybe putting the clients on the local machine would work, but
>> you don't want to have to account for their activity
>>
>> How do other people tune things like thread pools, etc.? We
>> normally go for option 1 above - use real clients and real network
>> traffic.
>
> You can simulate both server and clients on the same machine. Then
> using network functions (i.e. local address(es) and ports) is even
> ok.

This isn't really great:
- you have extra processes running on your box (the test clients)
that will affect CPU and memory access patterns, threading, the
network stack, and a thousand other things. Basically, a whole load
of code will be running that won't be running on a production system,
so you're not getting a true picture of how the system will run in
real life. (One way to avoid the extra processes is sketched below.)

> The next best way would be using machines within 1 hop.

Indeed. This is what we do, with a representative range of client
types and behaviours.
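
If you do want everything on one box without the extra client
processes, one compromise is to keep the "clients" inside the server's
own process and feed the worker threads through an in-memory queue
instead of a socket. You lose the TCP/IP stack effects, but the heap
manager and the scheduler still see a realistic multithreaded load.
A rough sketch only - Request, RequestQueue and handleRequest are all
invented for illustration, and it assumes a compiler with std::thread
(on Windows, _beginthreadex would do the same job):

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// A request and a per-"connection" queue; both stand in for whatever
// the real protocol would carry over a socket.
struct Request {
    std::string payload;
    bool shutdown;
};

class RequestQueue {
    std::queue<Request> q_;
    std::mutex m_;
    std::condition_variable cv_;
public:
    void push(Request r) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(r)); }
        cv_.notify_one();
    }
    Request pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        Request r = std::move(q_.front());
        q_.pop();
        return r;
    }
};

// Invented stand-in for the real per-request work; the string copy
// gives the heap manager something to do.
void handleRequest(const Request& r) {
    std::string copy(r.payload);
    (void)copy;
}

int main() {
    const int kThreads = 8;             // parameterise per run
    const int kTotalRequests = 400000;  // fixed amount of work
    const int kPerThread = kTotalRequests / kThreads;

    std::vector<RequestQueue> queues(kThreads);
    std::vector<std::thread> servers, clients;

    // One server thread per "connection", as in a thread-per-connection
    // design, draining its queue until it sees the shutdown marker.
    for (int i = 0; i < kThreads; ++i) {
        servers.emplace_back([&queues, i] {
            for (;;) {
                Request r = queues[i].pop();
                if (r.shutdown) break;
                handleRequest(r);
            }
        });
    }

    // In-process "clients": no sockets, no extra processes.
    for (int i = 0; i < kThreads; ++i) {
        clients.emplace_back([&queues, i, kPerThread] {
            for (int n = 0; n < kPerThread; ++n)
                queues[i].push(Request{std::string(128, 'x'), false});
            queues[i].push(Request{std::string(), true});
        });
    }

    for (auto& t : clients) t.join();
    for (auto& t : servers) t.join();
    return 0;
}

Note the fixed total amount of work divided by the thread count, as
per criterion 3, so runs with different kThreads values stay
comparable.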
> Of course it depends on what exactly you want to measure. For
> measuring thread performance I would not go on the wire, but using
> the IP address(es) of the local machine would be ok (plus the same
> or different ports, depending on the protocol, on that machine).

Of course, the biggest pain about testing multithreaded performance
is measuring it: the simple act of measuring it will change it. A
good way to measure is to get numbers for throughput and latency
(under extremely high load) from the client. E.g. in a VoIP system,
you might measure the time until the first packet of audio arrives,
and calculate the distribution over all the clients (there's a sketch
of this at the end of the post).

If you find you have problems, or are not getting the performance
benefits you expect from your testing and enhancing, then you move to
tools that run locally and inspect the machine as you go, e.g. a
profiler or some kind of homemade debug profiling code. Rinse and
repeat until happy.

Weird, I can't see my posts... Never mind.
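
P.S. Here's the shape of the client-side latency bookkeeping I mean
above - a sketch only, where sendAndWaitForFirstByte is an invented
placeholder for the real client I/O (a sleep stands in for the round
trip so it runs standalone):

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

// Placeholder for the real round trip: send the request, then block
// until the first byte (or first audio packet) of the reply arrives.
void sendAndWaitForFirstByte() {
    std::this_thread::sleep_for(std::chrono::microseconds(200));
}

// Nearest-rank percentile over the collected samples.
double percentile(std::vector<double>& samples, double p) {
    std::sort(samples.begin(), samples.end());
    size_t idx = static_cast<size_t>(p * (samples.size() - 1));
    return samples[idx];
}

int main() {
    const int kRequests = 1000;
    std::vector<double> samples;    // milliseconds, one per request
    samples.reserve(kRequests);     // per-request cost stays a push_back

    for (int i = 0; i < kRequests; ++i) {
        auto t0 = Clock::now();
        sendAndWaitForFirstByte();
        auto t1 = Clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    std::printf("median %6.2f ms   p99 %6.2f ms   worst %6.2f ms\n",
                percentile(samples, 0.50),
                percentile(samples, 0.99),
                percentile(samples, 1.00));
    return 0;
}

In a real multithreaded client you'd keep one sample vector per
client thread and merge them after the run, so the bookkeeping itself
adds no contention - the per-request cost is a clock read and a
push_back, which keeps the observer effect small.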