From dmlb@ragnet.demon.co.uk Thu Apr  1 15:12:42 1999
Return-Path: <dmlb@ragnet.demon.co.uk>
Received: from finch-post-11.mail.demon.net (finch-post-11.mail.demon.net [194.217.242.39])
	by hub.freebsd.org (Postfix) with ESMTP id 639DB155F7
	for <FreeBSD-gnats-submit@freebsd.org>; Thu,  1 Apr 1999 15:12:28 -0800 (PST)
	(envelope-from dmlb@ragnet.demon.co.uk)
Received: from [158.152.46.40] (helo=ragnet.demon.co.uk)
	by finch-post-11.mail.demon.net with smtp (Exim 2.12 #1)
	id 10Sqd6-000Eus-0B
	for FreeBSD-gnats-submit@freebsd.org; Thu, 1 Apr 1999 23:12:08 +0000
Received: from dmlb by ragnet.demon.co.uk with local (Exim 1.82 #1)
	id 10Spfw-0000Je-00; Thu, 1 Apr 1999 23:11:00 +0100
Message-Id: <E10Spfw-0000Je-00@ragnet.demon.co.uk>
Date: Thu, 1 Apr 1999 23:11:00 +0100
From: dmlb@ragnet.demon.co.uk
Reply-To: dmlb@ragnet.demon.co.uk
To: FreeBSD-gnats-submit@freebsd.org
Cc: dmlb@ragnet.demon.co.uk
Subject: Problems with job control in bin/sh and fix.
X-Send-Pr-Version: 3.2

>Number:         10912
>Category:       bin
>Synopsis:       /bin/sh: Fix to prevent infinite loops on missing children
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    mikeh
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Apr  1 15:20:01 PST 1999
>Closed-Date:    Sat Jun 23 04:17:31 PDT 2001
>Last-Modified:  Sat Jun 23 04:19:18 PDT 2001
>Originator:     Duncan Barclay
>Release:        FreeBSD 2.2.6-RELEASE i386
>Organization:
>Environment:

	3.1 release used for testing. Full CVS repository available
	locally.

>Description:

	Code in jobs.c dowait() fails when a child is reaped while breakwaitcmd
	is set.  The shell thinks there is a job waiting to be reaped which
	has already exited.  It then loops forever calling dowait() trying to
	find its lost child.
	 
	This occurs when a sub-shell is invoked by a trap: the sub-shell has
	breakwaitcmd set on entry to dowait(), but the wait also reaps the
	child.  dowait() returns because of breakwaitcmd and does not update
	the job status.
	
	/bin/sh before 1998/09/07 does not exhibit this behaviour; that is when
	background execution of traps was added.


	Example:
	

	In .env I have
	
	  _winch(){
		foo=$(stty -a | sed ....)
		....
	  }
	  trap _winch 28
	
	I turned on DEBUG and observed the trace output below (I've added a
	getpid() call to trace() so I can follow this).  I have also changed
	onsig() in trap.c to set breakwaitcmd to the signal number so I can
	see it happen.
	
	PID 8679 is the backquote sub-shell.
	PID 8680 is the stty process.
	
	[8679] In parent shell:  child = 8680
	[8679] searchexec "sed" returns "/usr/bin/sed"
	[8679] forkshell(%0, 0x80aa4dc, 0) called
	[8679] In parent shell:  child = 8681
	[8679] waitforjob(%1) called				*** 8679 waits for stty
								*** to finish
	[8679] dowait(1) called, breakwaitcmd = 128		*** trouble brewing
	  [8680] Child shell 8680
	  [8680] evaltree(0x80aa4a4: 1) called
	  [8680] evalcommand(0x80aa4a4, 1) called
	  [8680] evalcommand arg: stty
	  [8680] evalcommand arg: -a				*** stty completes
	[8679] wait returns 8680, status=0			*** stty is reaped
	[8679] dowait returning because breakwaitcmd = 128	*** oh s**t!!!!
	
	I've "fixed" this by checking that in_waitcmd is set in the signal
	handler, but this may not be "right". I'm not entirely sure that the
	breakwaitcmd code is right in dowait() as it doesn't check that the wait
	returned due to a signal to this process. 
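
	To illustrate that last point, here is a minimal standalone sketch
	(not the jobs.c code, and not the patch below) of a wait loop that
	only honours a break flag when wait() actually failed with EINTR,
	and that always records a reaped child first.  The names break_flag,
	tracked_pid, tracked_done, wait_one() and demo() are hypothetical
	stand-ins invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for shell state; these are not the real jobs.c names. */
static volatile sig_atomic_t break_flag;	/* plays the role of breakwaitcmd */
static pid_t tracked_pid;			/* the single child being tracked */
static int tracked_done;			/* set once that child is recorded as reaped */

/*
 * Wait for one child.  Returns the pid reaped, or -1 if the wait was
 * interrupted by a signal while break_flag was set, or on any other error.
 */
static pid_t
wait_one(void)
{
	int status;
	pid_t pid;

	for (;;) {
		pid = wait(&status);
		if (pid > 0) {
			/* Record the reaped child before looking at break_flag. */
			if (pid == tracked_pid)
				tracked_done = 1;
			return (pid);
		}
		if (errno == EINTR) {
			/* Only an interrupted wait is allowed to break out early. */
			if (break_flag)
				return (-1);
			continue;
		}
		return (-1);	/* ECHILD or another error */
	}
}

/*
 * Fork a child that exits at once, with break_flag already set, and
 * report whether the child's exit was recorded.
 */
static int
demo(void)
{
	pid_t child, reaped;

	break_flag = 1;		/* pretend a trapped signal arrived earlier */
	tracked_done = 0;
	child = fork();
	if (child == -1)
		return (-1);
	if (child == 0)
		_exit(0);
	tracked_pid = child;
	reaped = wait_one();
	assert(reaped == child);
	return (tracked_done);
}
```

	With no signal handlers installed, demo() is expected to return 1:
	the child's exit is recorded even though the break flag was set on
	entry, which is the case the trace above shows going wrong.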
	
	Patches below to add pid to trace logging and the in_waitcmd check.

Index: sh/show.c
===================================================================
RCS file: /ide0.e/ncvs/src/bin/sh/show.c,v
retrieving revision 1.9
diff -u -r1.9 show.c
--- show.c	1998/05/18 06:44:19	1.9
+++ show.c	1999/04/01 22:25:05
@@ -316,6 +316,7 @@
 	fmt = va_arg(va, char *);
 #endif
 	if (tracefile != NULL) {
+	        (void) fprintf(tracefile, "[%d] ", getpid());
 		(void) vfprintf(tracefile, fmt, va);
 		if (strchr(fmt, '\n'))
 			(void) fflush(tracefile);
Index: sh/trap.c
===================================================================
RCS file: /ide0.e/ncvs/src/bin/sh/trap.c,v
retrieving revision 1.17
diff -u -r1.17 trap.c
--- trap.c	1998/09/10 22:09:11	1.17
+++ trap.c	1999/04/01 22:37:20
@@ -362,15 +362,15 @@
 
 	/* If we are currently in a wait builtin, prepare to break it */
 	if ((signo == SIGINT || signo == SIGQUIT) && in_waitcmd != 0)
-		breakwaitcmd = 1;
+		breakwaitcmd = signo;
 	/* 
 	 * If a trap is set, not ignored and not the null command, we need 
 	 * to make sure traps are executed even when a child blocks signals.
 	 */
-	if (trap[signo] != NULL && 
+	if (in_waitcmd != 0 && trap[signo] != NULL && 
 	    ! trap[signo][0] == '\0' &&
 	    ! (trap[signo][0] == ':' && trap[signo][1] == '\0'))
-		breakwaitcmd = 1;
+		breakwaitcmd = 1000*signo+in_waitcmd;
 }
 
Duncan
 

>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback 
State-Changed-By: mikeh 
State-Changed-When: Fri Jun 22 22:01:35 PDT 2001 
State-Changed-Why:  
This behavior appears to have been reverted and put under the -T 
option. Does this fix the problem for you? 


Responsible-Changed-From-To: freebsd-bugs->mikeh 
Responsible-Changed-By: mikeh 
Responsible-Changed-When: Fri Jun 22 22:01:35 PDT 2001 
Responsible-Changed-Why:  
I'll handle feedback. 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=10912 
State-Changed-From-To: feedback->closed 
State-Changed-By: mikeh 
State-Changed-When: Sat Jun 23 04:17:31 PDT 2001 
State-Changed-Why:  
The code has been moved out of the default into the -T 
option. Submitter reports that this PR can be closed. 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=10912 
>Unformatted:
