Subj : Problem with system() calls in a multithreaded program on HPUX 11
To   : comp.programming.threads,comp.sys.hp.hpux
From : maheshkumarjha
Date : Fri Feb 11 2005 03:46 am

Hello,

I am porting a multithreaded program from Solaris to HPUX 11. Its threads make calls to the system() function. The program creates a number of threads and runs each of them a specified number of times. Each thread performs some task and creates a trace file, then verifies that trace file against a standard one to check whether the run was successful. The number of threads is variable.

Problem:
========
As I increase the number of threads, the program hangs while trying to run some system() call. If I remove the system() calls altogether, the program runs fine. The program uses pthreads and runs fine on Solaris.

Below is an explanation of the problem, followed by the code. Some variable names used in the explanation appear in the code given after it.

Explanation:
============
If I increase the value of noOfThreads to 3, 4 and so on, the program hangs at around noOfThreads = 6 or 7. When the problem occurs, two or three defunct processes are created.
I ran the "ps -f -u" command and the output was something like this (mtreg is the name of the above program):

-bash-2.05b$ ps -f -u mkumar
     UID   PID  PPID  C    STIME TTY       TIME COMMAND
  mkumar  1726  1190  0 00:06:12 pts/ta    0:10 mtreg
  mkumar  1190  1189  0 23:04:02 pts/ta    0:01 -bash
  mkumar  1731  1726  0 00:06:20 pts/ta    0:00
  mkumar  1730  1726  2 00:06:20 pts/ta    0:00
  mkumar  1743     0  0 00:06:20 pts/ta    0:00 mtreg
  mkumar  1741  1726  0 00:06:21 pts/ta    0:00 sh -c perl strip.pl /export/home/configdev/tmp/FAAa01726mod0
  mkumar  1742  1741  0 00:06:21 pts/ta    0:00 perl strip.pl /export/home/configdev/tmp/FAAa01726mod0456a.m
  mkumar  1751  1190  5 00:07:48 pts/ta    0:00 ps -f -u mkumar

Before hanging, the output at the console was:
================================================================================
Running perl strip.pl /export/home/configdev/tmp/EAAa07614mod0456a.myt
Running perl strip.pl /export/home/configdev/tmp/DAAa07614mod0456a.myt
Running perl strip.pl /export/home/configdev/tmp/AAAa07614mod0456a.myt
Running perl strip.pl /export/home/configdev/tmp/CAAa07614mod0456a.myt
Finished running: perl strip.pl /export/home/configdev/tmp/EAAa07614mod0456a.myt
Running diff -w mod0456a.trc /export/home/configdev/tmp/EAAa07614mod0456a.myt > /export/home/configdev/tmp/EAAa07614mod0456a.myt.diff
Finished running diff -w mod0456a.trc /export/home/configdev/tmp/EAAa07614mod0456a.myt > /export/home/configdev/tmp/EAAa07614mod0456a.myt.diff
Running perl strip.pl /export/home/configdev/tmp/BAAa07614mod0456a.myt
Finished running: perl strip.pl /export/home/configdev/tmp/DAAa07614mod0456a.myt
Running diff -w mod0456a.trc /export/home/configdev/tmp/DAAa07614mod0456a.myt > /export/home/configdev/tmp/DAAa07614mod0456a.myt.diff
Finished running: perl strip.pl /export/home/configdev/tmp/AAAa07614mod0456a.myt
Running perl strip.pl /export/home/configdev/tmp/FAAa01726mod0456a.myt
Running diff -w mod0456a.trc /export/home/configdev/tmp/AAAa07614mod0456a.myt > /export/home/configdev/tmp/AAAa07614mod0456a.myt.diff
Finished running diff -w mod0456a.trc /export/home/configdev/tmp/DAAa07614mod0456a.myt > /export/home/configdev/tmp/DAAa07614mod0456a.myt.diff
Running diff -w mod0456a.trc /export/home/configdev/tmp/CAAa07614mod0456a.myt > /export/home/configdev/tmp/CAAa07614mod0456a.myt.diff
================================================================================

Some things that I observed:

1. I started only one mtreg process (PID 1726). But when the program hung, there was one more mtreg process, with PPID 0. It was idle.
2. Each time the program hangs, there are one or more defunct processes.
3. I am unable to kill the program once it hangs, and the system has to be rebooted.
4. The number of threads at which the program hangs is not fixed. It can hang at 5, 6, 7 or 8 threads. It even hung once with only 4 threads.
5. Although the last statement printed is Running "diff", that diff has not actually started.
6. I tried an experiment in which I removed all the system() function calls and instead placed fclose(fopen(diffFileName, "w")), which just creates the file without doing anything else. This time I was able to run the program even with 10 threads each doing 10 iterations. It seems that the program might run fine for any number of threads (I checked up to 15 threads).

================================================================================
CODE: (The code is representative of the whole code. It may not be compilable.)

#include
#include
#include
#include

#define noOfThreads 1
#define noOfIterations 1

char outFileName[512];
char standardTraceFile[512];

void * threadStartRoutine(void* p);
void doOneIterationOfThread();

/*
 * Creates a number of threads and runs them. Waits for their completion and then exits.
 */
void createThreadsAndRun()
{
    pthread_t threadList[noOfThreads];
    for(int i=0; i
[...]
 %s", standardTraceFile, newTraceFile, diffFileName);
    cerr<<"Running "<
[...]
" actually brings the file into a normalized form.
 * That is, it changes the values in the trace file that are run dependent, such
 * as time stamps and some other info, to a predecided normal value (for example,
 * time stamps may be converted to 0x0). This makes the new trace file and the
 * standard trace file comparable. strip.pl (a perl script) performs this task by
 * substituting regular expressions.
 * Note Ends *************
 */
================================================================================

Can anyone please tell me why system() calls are causing a problem on HPUX 11 when the same thing runs fine on Solaris? It would be really great if you could suggest a possible solution.

Thanks and regards,
Mahesh Kumar
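
P.S. In case it helps anyone try this, here is a minimal sketch of a stripped-down test case for the pattern described above: several threads, each calling system() a number of times. It is only an assumption that a program this small would hang in the same way. The names runSystemStress, WorkerArgs and kCommand are hypothetical and do not appear in the real program, and the command "true" is just a stand-in for the perl and diff commands.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for the "perl strip.pl ..." and "diff -w ..." commands.
static const char* kCommand = "true";

struct WorkerArgs {
    int iterations;  // how many times this thread calls system()
    int failures;    // number of calls that returned a non-zero status
};

// Thread body: call system() repeatedly and count non-zero results.
static void* worker(void* p)
{
    WorkerArgs* args = static_cast<WorkerArgs*>(p);
    for (int i = 0; i < args->iterations; ++i) {
        if (std::system(kCommand) != 0) {
            ++args->failures;
        }
    }
    return nullptr;
}

// Starts `threads` threads, waits for all that were started, and returns the
// total number of failed system() calls, or -1 if a thread could not be created.
int runSystemStress(int threads, int iterations)
{
    assert(threads > 0 && iterations >= 0);

    // Both vectors are sized up front, so element addresses stay valid
    // for the lifetime of the threads.
    std::vector<pthread_t> ids(threads);
    std::vector<WorkerArgs> args(threads);

    int started = 0;
    for (; started < threads; ++started) {
        args[started].iterations = iterations;
        args[started].failures = 0;
        if (pthread_create(&ids[started], nullptr, worker, &args[started]) != 0) {
            break;
        }
    }

    int total = 0;
    for (int i = 0; i < started; ++i) {
        pthread_join(ids[i], nullptr);
        total += args[i].failures;
    }
    return started == threads ? total : -1;
}
```

Calling runSystemStress(8, 10) from main() would correspond to noOfThreads = 8 and noOfIterations = 10 in the code above.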