Subj : Re: Problem with system() calls in a multithreaded program on HPUX 11
To   : comp.programming.threads,comp.sys.hp.hpux
From : Dan Koren
Date : Tue Feb 15 2005 02:16 am

The briefest answer to your question is that system() is not MT-safe,
and is almost guaranteed to break if invoked from a multi-threaded
program.

Replace 'system(cmd)' with 'pclose(popen(cmd, "w"))' and things should
work as long as 'cmd' is reasonably well behaved.

dk

"Mahesh Kumar" wrote in message
news:5bf55c06.0502110346.e055886@posting.google.com...
> Hello,
>
> I am porting a multithreaded program to HPUX 11 from Solaris, in
> which threads make calls to the system() function. The program
> basically creates a number of threads and runs them a specified
> number of times. Each thread performs some task and creates a trace
> file. The threads then verify the trace file against standard ones to
> check whether the run was successful or not. The number of threads is
> variable.
>
> Problem:
> ========
> As I increase the number of threads, the program hangs while trying
> to run some system() call. If I remove the system() calls altogether,
> the program runs fine. Below is an explanation of the problem followed
> by the code. The program uses pthreads and runs fine on Solaris.
>
> Some variable names used in the explanation appear in the code given
> after it.
>
> Explanation:
> ============
> If I increase the value of noOfThreads to, say, 3, 4 and so on, the
> program hangs at around noOfThreads = 6 or 7. When the problem
> occurs, two or three defunct processes are created.
> I ran the "ps -f -u" command, and the output was something like this
> (mtreg is the name of the above program):
>
> -bash-2.05b$ ps -f -u mkumar
>      UID   PID  PPID  C    STIME TTY       TIME COMMAND
>   mkumar  1726  1190  0 00:06:12 pts/ta    0:10 mtreg
>   mkumar  1190  1189  0 23:04:02 pts/ta    0:01 -bash
>   mkumar  1731  1726  0 00:06:20 pts/ta    0:00 <defunct>
>   mkumar  1730  1726  2 00:06:20 pts/ta    0:00 <defunct>
>   mkumar  1743     0  0 00:06:20 pts/ta    0:00 mtreg
>   mkumar  1741  1726  0 00:06:21 pts/ta    0:00 sh -c perl strip.pl /export/home/configdev/tmp/FAAa01726mod0
>   mkumar  1742  1741  0 00:06:21 pts/ta    0:00 perl strip.pl /export/home/configdev/tmp/FAAa01726mod0456a.m
>   mkumar  1751  1190  5 00:07:48 pts/ta    0:00 ps -f -u mkumar
>
> Before hanging, the output at the console was:
> ================================================================================
> Running perl strip.pl /export/home/configdev/tmp/EAAa07614mod0456a.myt
> Running perl strip.pl /export/home/configdev/tmp/DAAa07614mod0456a.myt
> Running perl strip.pl /export/home/configdev/tmp/AAAa07614mod0456a.myt
> Running perl strip.pl /export/home/configdev/tmp/CAAa07614mod0456a.myt
> Finished running: perl strip.pl /export/home/configdev/tmp/EAAa07614mod0456a.myt
> Running diff -w mod0456a.trc /export/home/configdev/tmp/EAAa07614mod0456a.myt > /export/home/configdev/tmp/EAAa07614mod0456a.myt.diff
> Finished running diff -w mod0456a.trc /export/home/configdev/tmp/EAAa07614mod0456a.myt > /export/home/configdev/tmp/EAAa07614mod0456a.myt.diff
> Running perl strip.pl /export/home/configdev/tmp/BAAa07614mod0456a.myt
> Finished running: perl strip.pl /export/home/configdev/tmp/DAAa07614mod0456a.myt
> Running diff -w mod0456a.trc /export/home/configdev/tmp/DAAa07614mod0456a.myt > /export/home/configdev/tmp/DAAa07614mod0456a.myt.diff
> Finished running: perl strip.pl /export/home/configdev/tmp/AAAa07614mod0456a.myt
> Running perl strip.pl /export/home/configdev/tmp/FAAa01726mod0456a.myt
> Running diff -w mod0456a.trc
> /export/home/configdev/tmp/AAAa07614mod0456a.myt > /export/home/configdev/tmp/AAAa07614mod0456a.myt.diff
> Finished running diff -w mod0456a.trc /export/home/configdev/tmp/DAAa07614mod0456a.myt > /export/home/configdev/tmp/DAAa07614mod0456a.myt.diff
> Running diff -w mod0456a.trc /export/home/configdev/tmp/CAAa07614mod0456a.myt > /export/home/configdev/tmp/CAAa07614mod0456a.myt.diff
> ================================================================================
>
> Some things that I observed:
> 1. I started only one mtreg process (PID 1726). But when the program
>    hung, there was one more mtreg process, with PPID 0. It was idle.
> 2. Each time the program hangs, there are one or more defunct
>    processes.
> 3. I am unable to kill the program once it hangs, and the system has
>    to be rebooted.
> 4. The number of threads at which the program hangs is not fixed. It
>    can hang at 5, 6, 7 or 8 threads. It even hung once with only 4
>    threads.
> 5. Although the last line printed is "Running diff ...", the diff has
>    not actually started yet.
> 6. I tried an experiment in which I removed all the system() function
>    calls and instead placed fclose(fopen(diffFileName, "w")), which
>    just creates the file without doing anything else. This time I was
>    able to run the program even with 10 threads each doing 10
>    iterations. It seems that the program might run fine for any
>    number of threads (I checked up to 15 threads).
>
> ================================================================================
>
> CODE: (The code is representative of the whole code. It may not be
> compilable.)
>
> #include
> #include
> #include
> #include
>
> #define noOfThreads 1
> #define noOfIterations 1
>
> char outFileName[512];
> char standardTraceFile[512];
>
> void * threadStartRoutine(void* p);
>
> int doOneIterationOfThread();
>
> /*
>  * Creates a number of threads and runs them. Waits for their
>  * completion and then exits.
>  */
> void createThreadsAndRun()
> {
>     pthread_t threadList[noOfThreads];
>     for(int i=0; i<noOfThreads; i++)
>     {
>         pthread_attr_t attr;
>         pthread_attr_init(&attr);
>         pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM );
>         pthread_create(&threadList[i],&attr, threadStartRoutine, nextReq());
>         cerr << "start thread " << i << endl;
>     }
>     for (int i = 0; i < noOfThreads; i++)
>     {
>         pthread_join(threadList[i], NULL);
>         cerr << "finish thread " << i << endl;
>     }
> }
>
> int main(int argc, char* argv[])
> {
>     // The arguments are not shown, as the functions are just
>     // representative of the work they are intended to perform.
>     setOutFileName();           // depending on argc and argv, sets outFileName,
>                                 // the filename for the trace file
>     setStandardTraceFileName(); // obtained from one of the arguments;
>                                 // sets the standard trace file name
>     createThreadsAndRun();
> }
>
> void * threadStartRoutine(void* p)
> {
>     char* prefix = tempnam(NULL,"");
>     sprintf(newTraceFile, "%s%s", prefix, outFileName); // set the tracefile name
>     for(int i = 0; i<noOfIterations; i++)
>     {
>         //do some initializations
>         if(!doOneIterationOfThread())
>         {
>             cerr<<"Run failed: "<
>         }
>         else
>         {
>             cerr<<"Run successful: "<
>         }
>     }
>     return NULL;
> }
>
> int doOneIterationOfThread()
> {
>     doCoreWork(); // writes trace into the tracefile with actual values
>     if(verifyTrace(standardTraceFile, newTraceFile) != 0) // to verify this run of the thread
>     {
>         return false;
>     }
>     else
>     {
>         return true;
>     }
> }
>
> /*
>  * tracefile names are with full path
>  */
> int verifyTrace(char* standardTraceFile, char* newTraceFile)
> {
>     char cmd[512];
>     char diffFileName[512];
>     sprintf(diffFileName, "%s.diff", newTraceFile);
>
>     sprintf(cmd, "perl strip.pl %s", newTraceFile);
>     cerr<<"Running "<<cmd<<endl;
>     system(cmd);
>     cerr<<"Finished running: "<<cmd<<endl;
>
>     sprintf(cmd, "diff -w %s %s > %s", standardTraceFile, newTraceFile, diffFileName);
>     cerr<<"Running "<<cmd<<endl;
>     system(cmd);
>     cerr<<"Finished running: "<<cmd<<endl;
>
>     struct stat buf;
>     stat(diffFileName, &buf);
>
>     unlink(diffFileName);
>
>     if(buf.st_size == 0)
>         return true;
>     else
>         return false;
> }
>
> /*
>  * Note ******************
>  * "perl strip.pl " actually brings the file into a normalized form.
>  * That is, it changes the values in the trace file that are run
>  * dependent, like time stamps and some other info, to a predetermined
>  * normal value. (Time stamps, for example, may be converted to 0x0.)
>  * This makes the new trace file and the standard trace file
>  * comparable. strip.pl (a perl script) performs this task by
>  * substituting regular expressions.
>  * Note Ends *************
>  */
>
> ================================================================================
>
> Can anyone please tell me why system() calls are causing problems on
> HPUX 11 whereas the same thing runs fine on Solaris? It would be
> really great if you could suggest a possible solution.
>
> Thanks and regards,
>
> Mahesh Kumar
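
For reference, here is a minimal sketch of the substitution suggested at the top of this reply, wrapped in a helper that also checks for failures. The name runCommand is hypothetical (it does not appear in the original program), and the sketch assumes a POSIX popen()/pclose(). It only illustrates the call pattern; it does not by itself show that the hang on HPUX 11 goes away.

```cpp
#include <cassert>     // assert
#include <stdio.h>     // popen, pclose (POSIX)
#include <sys/wait.h>  // WIFEXITED, WEXITSTATUS

// Hypothetical helper (not part of the original program): runs cmd through
// the shell using popen()/pclose() instead of system().
// Returns the command's exit status, or -1 if the command could not be
// started, could not be waited for, or did not exit normally.
int runCommand(const char* cmd)
{
    assert(cmd != NULL);

    // "w" mode: the child's stdin is connected to the pipe; nothing is written.
    FILE* pipe = popen(cmd, "w");
    if (pipe == NULL)
        return -1;  // pipe or process creation failed

    // pclose() closes the pipe and waits for the child to terminate.
    int status = pclose(pipe);
    if (status == -1)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In verifyTrace() each 'system(cmd);' line would then become something like 'runCommand(cmd);', with the return value checked. Keep in mind that diff exits with status 1 when the files differ, so a non-zero status from the diff command is not necessarily an error.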