Subj : Re: Any overhead to pthread getspecific() (a.k.a. thread-local variables)?
To   : comp.programming.threads
From : Chris Thomasson
Date : Tue May 10 2005 02:43 am

>I am curious about what sort of overhead I should expect from using
> getspecific() frequently in a program?

You can't portably rely on any performance numbers being consistent with 
respect to the overhead of "any" pthread_ calls. As for your per-thread data, 
you "should" give users the ability to pass a pointer directly to the various 
APIs that use your memory management scheme. If you do it this way, you don't 
really have to worry about the performance of pthread_set/getspecific( ... ), 
because nearly all of the calls to pthread_set/getspecific can be eliminated. 
I use a scheme like this in my library...




> I am implementing multithreaded
> low-contention (nearly lock-free) memory management in C++ using
> thread-local memory pools and so this will be getting called frequently to
> get the current thread-local memory pool.

I have implemented a similar setup that works on Intel-based Linux/Windows 
systems. It uses a simple yet effective three-tier "node cache" algorithm 
that allocates and frees nodes in the following order:

per-thread->lock-free_lifo->malloc/free
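The three-tier order above can be sketched roughly as follows. This is a 
hypothetical illustration, not the appcore code: the names (`cache_alloc`, 
`cache_free`, `struct local_cache`) and the limits `NODE_SIZE`, `LOCAL_MAX` and 
`SHARED_MAX` are assumptions, and the middle tier is a mutex-protected stack 
standing in for the lock-free LIFO, so the fallback chain is visible without 
the ABA hazards of a naive lock-free stack.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical three-tier node cache (all names and limits are assumptions):
 *   tier 1: per-thread free list, no synchronization
 *   tier 2: shared LIFO; a mutex-protected stack stands in here for the
 *           lock-free LIFO described in the post
 *   tier 3: malloc/free
 */
#define NODE_SIZE  64   /* assumed fixed node size, >= sizeof(struct cnode) */
#define LOCAL_MAX  32   /* assumed per-thread cache capacity */
#define SHARED_MAX 256  /* assumed shared LIFO capacity */

struct cnode { struct cnode *next; };

/* One per thread; the owning thread passes it to every call. */
struct local_cache {
    struct cnode *head;
    unsigned count;
};

static struct cnode *shared_head = NULL;
static unsigned shared_count = 0;
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

void *cache_alloc(struct local_cache *lc) {
    /* Tier 1: pop from the per-thread cache. */
    if (lc->head != NULL) {
        struct cnode *n = lc->head;
        lc->head = n->next;
        lc->count--;
        return n;
    }
    /* Tier 2: pop from the shared LIFO. */
    pthread_mutex_lock(&shared_lock);
    struct cnode *n = shared_head;
    if (n != NULL) {
        shared_head = n->next;
        shared_count--;
    }
    pthread_mutex_unlock(&shared_lock);
    if (n != NULL)
        return n;
    /* Tier 3: fall back to the system allocator. */
    return malloc(NODE_SIZE);
}

void cache_free(struct local_cache *lc, void *p) {
    struct cnode *n = p;
    /* Tier 1: keep the node locally while there is room. */
    if (lc->count < LOCAL_MAX) {
        n->next = lc->head;
        lc->head = n;
        lc->count++;
        return;
    }
    /* Tier 2: overflow goes to the shared LIFO for other threads. */
    pthread_mutex_lock(&shared_lock);
    if (shared_count < SHARED_MAX) {
        n->next = shared_head;
        shared_head = n;
        shared_count++;
        n = NULL;
    }
    pthread_mutex_unlock(&shared_lock);
    /* Tier 3: both caches are full, so release the node (free(NULL) is a no-op). */
    free(n);
}
```

Each thread keeps its own struct local_cache and passes it explicitly, in the 
same spirit as passing the pool pointer instead of calling pthread_getspecific 
on every allocation.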


This setup is very well known, portable, and compatible with all sorts of 
lock-free algorithms. The algorithm also performs extremely well if it is 
implemented "correctly". Unfortunately, you will have to take "special care" 
if you want this type of cache algorithm to perform well on SMP systems, quite 
apart from any performance problems the pthread APIs "can" have. The relevant 
source code for my node cache implementation is contained in the following 
files:

http://appcore.home.comcast.net/appcore/include/ac_thread_h.html
( inline fast-path cache functions )


http://appcore.home.comcast.net/appcore/src/cpu/i686/ac_i686_c.html
( cpu specific slow-path cache functions )


http://appcore.home.comcast.net/appcore/src/cpu/i686/ac_i686_gcc_asm.html
( cpu specific lock-free cache support )


See all of the APIs prefixed with:

ac_thread_cpu_node_
ac_i686_node_cache_
ac_i686_stack_




Also, notice the use of CPU-specific code for a simple node cache. It is more 
or less necessary in order to efficiently support the case where a per-thread 
cache runs out of nodes. Just another "small" example of how adding 
high-performance support for lock-free algorithms in a multi-processor 
environment can be very tedious and difficult...

;)
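One reason CPU-specific code creeps in is the shared LIFO: when a per-thread 
cache runs dry, nodes come from a lock-free stack whose pop is exposed to the 
ABA problem, and a common fix pairs the head pointer with a version counter that 
must be swapped by a single double-width CAS (cmpxchg8b on i686). The sketch 
below is not the appcore code; it swaps in a different, portable technique under 
assumed names (`struct lifo`, `lifo_init`, `lifo_push`, `lifo_pop`, 
`LIFO_CAPACITY`): nodes live in a fixed array, and the head packs a 32-bit tag 
with a 32-bit index so that one 64-bit C11 CAS suffices.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical ABA-resistant lock-free LIFO over a fixed node array.
 * head packs (tag << 32) | (index + 1); index + 1 == 0 means "empty".
 * Every successful push/pop bumps the tag, so a stale head fails the CAS. */
#define LIFO_CAPACITY 1024

struct lifo_node {
    _Atomic uint32_t next;   /* index + 1 of the next node, 0 = none */
    /* node payload would follow here */
};

struct lifo {
    _Atomic uint64_t head;
    struct lifo_node nodes[LIFO_CAPACITY];
};

static uint64_t pack(uint32_t tag, uint32_t idx1) {
    return ((uint64_t)tag << 32) | idx1;
}

void lifo_init(struct lifo *s) {
    atomic_init(&s->head, 0);
    for (uint32_t i = 0; i < LIFO_CAPACITY; i++)
        atomic_init(&s->nodes[i].next, 0);
}

/* Push the node at 0-based index i. */
void lifo_push(struct lifo *s, uint32_t i) {
    uint64_t old = atomic_load(&s->head);
    uint64_t desired;
    do {
        atomic_store(&s->nodes[i].next, (uint32_t)old);
        desired = pack((uint32_t)(old >> 32) + 1, i + 1);
    } while (!atomic_compare_exchange_weak(&s->head, &old, desired));
}

/* Pop a 0-based node index, or return -1 if the stack is empty. */
int64_t lifo_pop(struct lifo *s) {
    uint64_t old = atomic_load(&s->head);
    uint64_t desired;
    do {
        uint32_t idx1 = (uint32_t)old;
        if (idx1 == 0)
            return -1;
        /* May read a stale value if another thread raced us; the tag
         * check in the CAS below rejects that case. */
        uint32_t next = atomic_load(&s->nodes[idx1 - 1].next);
        desired = pack((uint32_t)(old >> 32) + 1, next);
    } while (!atomic_compare_exchange_weak(&s->head, &old, desired));
    return (int64_t)(uint32_t)old - 1;
}
```

The tag wraps after 2^32 updates, so ABA becomes very unlikely rather than 
impossible, and on targets without a native 64-bit CAS the C11 atomics may fall 
back to a lock (atomic_is_lock_free() can check that). Those caveats are 
exactly where per-CPU code tends to take over.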


-- 
http://appcore.home.comcast.net/
(portable lock-free data-structures)
