Subj : Re: Any overhead to pthread getspecific() (a.k.a. thread-local variables)?
To : comp.programming.threads
From : Chris Thomasson
Date : Tue May 10 2005 02:43 am

>I am curious about what sort of overhead I should expect from using
> getspecific() frequently in a program?

You can't portably rely on consistent performance numbers for the overhead of "any" pthread_ call. As for your per-thread data, you "should" give users the ability to pass a pointer to it directly to the various APIs that use your memory management scheme. You don't really have to worry about the performance of pthread_set/getspecific( ... ) if you do it this way, because nearly all the calls to pthread_set/getspecific can be eliminated. I use a scheme like this in my library...

> I am implementing multithreaded
> low-contention (nearly lock-free) memory management in C++ using
> thread-local memory pools and so this will be getting called frequently to
> get the current thread-local memory pool.

I have implemented a similar setup that works on Intel-based Linux/Windows systems. It uses a simple, yet effective three-tier "node cache" algorithm that allocates and frees nodes in the following order:

per-thread->lock-free_lifo->malloc/free

This setup is very well-known, portable and compatible with all sorts of lock-free algorithms. The algorithm also performs extremely well if implemented "correctly". Unfortunately, you will have to take "special care" if you want this type of cache algorithm to perform well on SMP systems, regardless of the potential performance problems the pthread APIs "can" have.
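To make the two ideas above concrete, here is a rough sketch in C. It is not the AppCore code: every name in it (node_cache_t, node_cache_self, node_alloc, node_free, CACHE_MAX, SHARED_MAX) is made up for illustration, and the limits are arbitrary. The shared middle tier is a mutex-protected LIFO standing in for the lock-free one, since a correct lock-free pop needs ABA protection that is beyond a short example. The thing to notice is that pthread_getspecific() is called in exactly one place, and everything else takes the cache pointer as an argument.

```c
#include <pthread.h>
#include <stdlib.h>

/* All names are hypothetical; none are taken from AppCore. */
typedef struct node { struct node *next; } node_t; /* real nodes carry a payload */

typedef struct {
    node_t *head;  /* tier 1: owned by one thread, no synchronization */
    int count;
} node_cache_t;

#define CACHE_MAX  64    /* assumed cap on the per-thread tier */
#define SHARED_MAX 1024  /* assumed cap on the shared tier */

/* Tier 2: shared LIFO. A mutex stands in for the lock-free stack. */
static node_t *g_shared_head;
static int g_shared_count;
static pthread_mutex_t g_shared_lock = PTHREAD_MUTEX_INITIALIZER;

static pthread_key_t g_key;
static int g_key_ok;
static pthread_once_t g_once = PTHREAD_ONCE_INIT;

/* Runs at thread exit: release whatever the thread still caches. */
static void cache_destroy(void *p)
{
    node_cache_t *c = p;
    while (c->head != NULL) {
        node_t *n = c->head;
        c->head = n->next;
        free(n);
    }
    free(c);
}

static void key_init(void)
{
    g_key_ok = (pthread_key_create(&g_key, cache_destroy) == 0);
}

/* The only function that touches pthread_getspecific(). Call it once
   per operation (or once per thread) and pass the result around. */
node_cache_t *node_cache_self(void)
{
    node_cache_t *c;
    pthread_once(&g_once, key_init);
    if (!g_key_ok) return NULL;
    c = pthread_getspecific(g_key);
    if (c == NULL) {
        c = calloc(1, sizeof *c);
        if (c != NULL && pthread_setspecific(g_key, c) != 0) {
            free(c);
            c = NULL;
        }
    }
    return c;
}

/* Allocate: per-thread -> shared LIFO -> malloc. */
node_t *node_alloc(node_cache_t *c)
{
    node_t *n = c->head;
    if (n != NULL) {              /* fast path, no locks, no TSD lookup */
        c->head = n->next;
        c->count--;
        return n;
    }
    pthread_mutex_lock(&g_shared_lock);
    n = g_shared_head;
    if (n != NULL) {
        g_shared_head = n->next;
        g_shared_count--;
    }
    pthread_mutex_unlock(&g_shared_lock);
    return (n != NULL) ? n : malloc(sizeof *n);
}

/* Free: per-thread -> shared LIFO -> free. */
void node_free(node_cache_t *c, node_t *n)
{
    int pushed = 0;
    if (c->count < CACHE_MAX) {   /* fast path */
        n->next = c->head;
        c->head = n;
        c->count++;
        return;
    }
    pthread_mutex_lock(&g_shared_lock);
    if (g_shared_count < SHARED_MAX) {
        n->next = g_shared_head;
        g_shared_head = n;
        g_shared_count++;
        pushed = 1;
    }
    pthread_mutex_unlock(&g_shared_lock);
    if (!pushed) free(n);
}
```

A caller would do something like "node_cache_t *c = node_cache_self();" at the top of an operation and then use node_alloc(c) / node_free(c, n) as many times as it likes, so the thread-specific lookup cost is paid once instead of on every allocation.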

The actual relevant source code for my node cache implementation is contained in the following files:

http://appcore.home.comcast.net/appcore/include/ac_thread_h.html
( inline fast-path cache functions )

http://appcore.home.comcast.net/appcore/src/cpu/i686/ac_i686_c.html
( cpu specific slow-path cache functions )

http://appcore.home.comcast.net/appcore/src/cpu/i686/ac_i686_gcc_asm.html
( cpu specific lock-free cache support )

under all APIs prefixed with:

ac_thread_cpu_node_
ac_i686_node_cache_
ac_i686_stack_

Also, notice the use of CPU specific code for a simple node cache? This is sort of necessary in order to provide efficient support for the "per-thread cache out of nodes" case. Just another "small" example of how adding high-performance support for lock-free algorithms in a multi-processor environment can be very tedious and difficult... ;)

--
http://appcore.home.comcast.net/
(portable lock-free data-structures)