Subj : pthread_cond_timedwait() wastes CPU time To : comp.programming.threads From : Laurent Birtz Date : Thu Feb 17 2005 08:35 pm Hi, I'm running into a strange problem related to pthread on Linux. I have a program that contacts a server regularly, and its operations should take very little CPU. The program uses several threads, but all are blocked save one. Using top(1), I monitor the program execution. Typically the program consumes very little CPU, so the value displayed by top for the CPU time consumed by the thread doesn't change. However, after a random period of time, say 50 seconds, the active thread suddently begins to take lots of CPU time for 2 to 10 seconds. The CPU usage reported by top is around 4% for the thread when this happens. As a result, the CPU time reported by top increases rapidly, by several tenths of milliseconds every second. Such CPU bursts happen seemingly randomly, as far as I can tell. When investigating, I made sure that all other threads were blocked, except the thread that contacts the server 20 times per second. I traced my program with printf() and gdb, I profiled it using gprof and gcov, and I'm fairly certain that the program is not doing something out of the ordinary during the CPU bursts. I then located the functions responsible for the waste of time by parsing /proc/pid/stat at the beginning and the end of those functions. I identified one of the culprits: pthread_cond_timedwait(). Here's a snippet of my code: global_set_mark("", &global_start_4a, &global_start_4b); int error = pthread_cond_timedwait(cond_ptr, mutex_ptr, spec_ptr); global_get_mark_diff("PTHREAD TIMED WAIT", global_start_4a, global_start_4b); global_set_mark() and global_set_mark_diff() are the functions that parse /proc/pid/stat to obtain the CPU time before and after the call. global_set_mark_diff() prints a message when the user or system time has increased after the mark was set. Here's a trace I get: Mark PTHREAD TIMED WAIT diff: 1, 0. Mark PTHREAD TIMED WAIT diff: 1, 0. Mark PTHREAD TIMED WAIT diff: 1, 0. Mark PTHREAD TIMED WAIT diff: 1, 0. Mark PTHREAD TIMED WAIT diff: 1, 0. Mark PTHREAD TIMED WAIT diff: 0, 1. Mark PTHREAD TIMED WAIT diff: 0, 1. Of course it's normal that the CPU time increases from time to time during the function call, but it's abnormal that the message occurs frequently and several times in a row. pthread_cond_timewait() is not the only place where the time is lost - I suppose I could detect the same behavior around mutex lock calls and such. The program doesn't receive signals. The thread consuming the time mostly does read(), write() and poll() calls on a socket before going back to sleep with pthread_cond_timedwait(). So what's happening in pthread_cond_timedwait()? I'd appreciate any help on the issue. The problem is not critical, but it makes my program waste 3 seconds of CPU time per hour on average. Something is wrong and I'd like to know what it is. Laurent Birtz .