Subj : Re: Problem with system() calls in a multithreaded program on HPUX 11
To   : comp.programming.threads,comp.sys.hp.hpux
From : Joe Seigh
Date : Fri Feb 11 2005 07:48 am

On 11 Feb 2005 03:46:23 -0800, Mahesh Kumar wrote:
> Hello,
>
> I am porting a multithreaded program to HPUX 11 from Solaris, in
> which threads make calls to system(). The program basically
> creates a number of threads and runs them a specified number of times.
> Each thread performs some task and creates a trace file. The threads
> then verify the trace file against standard ones to check whether the
> run was successful or not. The number of threads is variable.
>
> Problem:
> ========
> As I increase the number of threads, the program hangs while trying
> to run some system() call. If I remove the system() calls altogether,
> the program runs fine. Below is an explanation of the problem followed
> by the code. The program uses pthreads and runs fine on Solaris.
>
>
> There may be some variable names used in the explanation. These names
> appear in the code given after it.
>
> Explanation:
> ============
> If I increase the value of noOfThreads to, say, 3, 4 and so on, the
> program hangs at around noOfThreads = 6 or 7. When the problem
> occurs, two or three defunct processes are created. I ran the
> "ps -f -u" command and the output was something like this (mtreg is
> the name of the above program):
> -bash-2.05b$ ps -f -u mkumar
>      UID   PID  PPID  C    STIME TTY       TIME COMMAND
>   mkumar  1726  1190  0 00:06:12 pts/ta    0:10 mtreg
>   mkumar  1190  1189  0 23:04:02 pts/ta    0:01 -bash
>   mkumar  1731  1726  0 00:06:20 pts/ta    0:00
>   mkumar  1730  1726  2 00:06:20 pts/ta    0:00
>   mkumar  1743     0  0 00:06:20 pts/ta    0:00 mtreg
>   mkumar  1741  1726  0 00:06:21 pts/ta    0:00 sh -c perl strip.pl > /export/home/configdev/tmp/FAAa01726mod0
>   mkumar  1742  1741  0 00:06:21 pts/ta    0:00 perl strip.pl > /export/home/configdev/tmp/FAAa01726mod0456a.m
>   mkumar  1751  1190  5 00:07:48 pts/ta    0:00 ps -f -u mkumar
> [...]
>
> Can anyone please tell me why system() calls are causing problems on
> HPUX 11 whereas the same thing runs fine on Solaris? It would be
> really great if you could suggest a possible solution.
>

Probably the SIGCHLD signal handling got messed up. You have defunct
processes: children that have finished but have not had their exit
status collected yet. Even though system() is supposed to be thread
safe, it's far too sensitive to signal disposition to be used in
anything but a single-threaded program.

Change your program to be single threaded and use fork(), exec(), and
wait(). See the Unix programming books by Stevens on how to do it.

Using system() from threads was a major violation of the KISS rule, and
you should expect to have problems when that happens. And you should
expect that we aren't going to try to make programs work that are much
more complicated than they should be.

--
Joe Seigh

Lock-free synchronization primitives
http://atomic-ptr-plus.sourceforge.net/