From mitya@mitya.mitya.static.dol.ru  Tue May  4 02:03:44 2004
Return-Path: <mitya@mitya.mitya.static.dol.ru>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id E9B0816A4CE
	for <FreeBSD-gnats-submit@freebsd.org>; Tue,  4 May 2004 02:03:44 -0700 (PDT)
Received: from mitya.mitya.static.dol.ru (mitya.mitya.static.dol.ru [194.87.5.172])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 4A86E43D1F
	for <FreeBSD-gnats-submit@freebsd.org>; Tue,  4 May 2004 02:03:42 -0700 (PDT)
	(envelope-from mitya@mitya.mitya.static.dol.ru)
Received: from mitya.mitya.static.dol.ru (localhost [127.0.0.1])
	by mitya.mitya.static.dol.ru (8.12.11/8.12.11) with ESMTP id i448RvpW000698
	for <FreeBSD-gnats-submit@freebsd.org>; Tue, 4 May 2004 12:27:58 +0400 (MSD)
	(envelope-from mitya@mitya.mitya.static.dol.ru)
Received: (from mitya@localhost)
	by mitya.mitya.static.dol.ru (8.12.11/8.12.11/Submit) id i448Rvum000697;
	Tue, 4 May 2004 12:27:57 +0400 (MSD)
	(envelope-from mitya)
Message-Id: <200405040827.i448Rvum000697@mitya.mitya.static.dol.ru>
Date: Tue, 4 May 2004 12:27:57 +0400 (MSD)
From: Dmitry Sivachenko <mitya@demos.su>
Reply-To: Dmitry Sivachenko <mitya@demos.su>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: endless loop in sh(1)
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         66242
>Category:       bin
>Synopsis:       endless loop in sh(1)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    maxim
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue May 04 02:10:25 PDT 2004
>Closed-Date:    Fri Apr 14 10:43:28 GMT 2006
>Last-Modified:  Fri Apr 14 10:43:28 GMT 2006
>Originator:     Dmitry Sivachenko
>Release:        FreeBSD 5.2-CURRENT i386
>Organization:
>Environment:
System: FreeBSD mitya.mitya.static.dol.ru 5.2-CURRENT FreeBSD 5.2-CURRENT #1: Sun Apr 18 17:57:01 MSD 2004 mitya@mitya.mitya.static.dol.ru:/usr/obj/usr/src/sys/CAVIA i386


The following sh(1) behaviour can be observed on both -CURRENT and -STABLE.

>Description:
Consider the following script:

#!/bin/sh -T

trap 'echo TRAP!; ps; exit 1' HUP;

echo 'Started...'
read a


Run it and send a HUP signal to sh(1) while it is waiting in the 'read' command.
The trap handler runs and the ps(1) output appears, but the script does NOT exit
and the sh(1) process starts to consume 100% of the CPU.

Here is the truss output:
wait4(0xffffffff,0xbfbfe9d8,0x2,0x0)             ERR#10 'No child processes'
wait4(0xffffffff,0xbfbfe9d8,0x2,0x0)             ERR#10 'No child processes'
wait4(0xffffffff,0xbfbfe9d8,0x2,0x0)             ERR#10 'No child processes'
wait4(0xffffffff,0xbfbfe9d8,0x2,0x0)             ERR#10 'No child processes'
wait4(0xffffffff,0xbfbfe9d8,0x2,0x0)             ERR#10 'No child processes'
wait4(0xffffffff,0xbfbfe9d8,0x2,0x0)             ERR#10 'No child processes'
......

Here is the backtrace:

(gdb) bt
#0  0x80763fc in wait4 ()
#1  0x8075941 in wait3 ()
#2  0x8051f8a in waitproc (block=1, status=0xbfbffa0c)
    at /mnt/backup/releng_4/src/bin/sh/jobs.c:1025
#3  0x8051cbd in dowait (block=1, job=0x80c6000)
    at /mnt/backup/releng_4/src/bin/sh/jobs.c:926
#4  0x8051b8a in waitforjob (jp=0x80c6000, origstatus=0xbfbffa88)
    at /mnt/backup/releng_4/src/bin/sh/jobs.c:870
#5  0x804be33 in evalcommand (cmd=0x80b6d6c, flags=0, backcmd=0x0)
    at /mnt/backup/releng_4/src/bin/sh/eval.c:904
#6  0x804acc0 in evaltree (n=0x80b6d6c, flags=0)
    at /mnt/backup/releng_4/src/bin/sh/eval.c:281
#7  0x804aafa in evaltree (n=0x80b6e04, flags=0)
    at /mnt/backup/releng_4/src/bin/sh/eval.c:199
#8  0x804aafa in evaltree (n=0x80b6e38, flags=0)
    at /mnt/backup/releng_4/src/bin/sh/eval.c:199
#9  0x804aa73 in evalstring (
    s=0x80c5100 "rm -f /tmp/st28742.box221.zecke.demos.su; _clean SIGHUP /dev/ttyph.28742.zecke.demos.su 28742;  exit")
    at /mnt/backup/releng_4/src/bin/sh/eval.c:171
#10 0x80598da in dotrap () at /mnt/backup/releng_4/src/bin/sh/trap.c:401
#11 0x804acf6 in evaltree (n=0x80b6d00, flags=0)
    at /mnt/backup/releng_4/src/bin/sh/eval.c:290
#12 0x80528f4 in cmdloop (top=1) at /mnt/backup/releng_4/src/bin/sh/main.c:250

The waitproc() call at jobs.c:926 returns -1 and sets errno to ECHILD (because
the child no longer exists at that time).
Since the (pid <= 0) condition is true at jobs.c:935, -1 is returned and we
enter dotrap() at jobs.c:870.  dotrap() never alters the 'state' field
of struct job, so the wait loop never terminates.
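
The loop described above can be modelled in isolation.  The following is a
minimal sketch, not the sh(1) source: model_waitforjob() and the JOB_*
constants are hypothetical names, and the iteration cap exists only so that
the demonstration terminates.  It assumes the calling process has no
children, in which case waitpid(2) fails at once with ECHILD and nothing
ever marks the job as done.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical stand-ins for the job state kept by the shell. */
enum { JOB_RUNNING = 0, JOB_DONE = 1 };

/*
 * Model of a "wait until the job is done" loop.  Returns how many of the
 * waitpid() calls failed with ECHILD before the loop gave up.  The cap
 * max_iter is an addition for the demo; the loop in the report has none.
 */
int
model_waitforjob(int max_iter)
{
	int state = JOB_RUNNING;
	int echild = 0;
	int status;

	assert(max_iter >= 0);
	for (int i = 0; state == JOB_RUNNING && i < max_iter; i++) {
		pid_t pid = waitpid(-1, &status, 0);

		if (pid > 0)
			state = JOB_DONE;	/* a child was really reaped */
		else if (pid == -1 && errno == ECHILD)
			echild++;		/* no child: state stays unchanged */
	}
	return echild;
}
```

Called as model_waitforjob(1000) from a process without children, every
iteration should fail with ECHILD, which mirrors the repeated wait4()
errors in the truss output.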

>How-To-Repeat:

See above.
>Fix:

	


>Release-Note:
>Audit-Trail:

From: Bruce Evans <bde@zeta.org.au>
To: Dmitry Sivachenko <mitya@demos.su>
Cc: FreeBSD-gnats-submit@freebsd.org, freebsd-bugs@freebsd.org
Subject: Re: bin/66242: endless loop in sh(1)
Date: Wed, 5 May 2004 03:35:23 +1000 (EST)

 On Tue, 4 May 2004, Dmitry Sivachenko wrote:
 
 > The following sh(1) behaviour can be observed on both -CURRENT and -STABLE.
 >
 > >Description:
 > Consider the following script:
 >
 > #!/bin/sh -T
 >
 > trap 'echo TRAP!; ps; exit 1' HUP;
 >
 > echo 'Started...'
 > read a
 >
 >
 > Run it and send HUP signal to sh(1) while it is waiting for 'read' command.
 > You reach trap handler, ps(1) output appears but the script does NOT exit
 > and sh(1) process starts to eat 100% of CPU.
 >
 > Here is truss output:
 > wait4(0xffffffff,0xbfbfe9d8,0x2,0x0)             ERR#10 'No child processes'
 > wait4(0xffffffff,0xbfbfe9d8,0x2,0x0)             ERR#10 'No child processes'
 > wait4(0xffffffff,0xbfbfe9d8,0x2,0x0)             ERR#10 'No child processes'
 > wait4(0xffffffff,0xbfbfe9d8,0x2,0x0)             ERR#10 'No child processes'
 > wait4(0xffffffff,0xbfbfe9d8,0x2,0x0)             ERR#10 'No child processes'
 > wait4(0xffffffff,0xbfbfe9d8,0x2,0x0)             ERR#10 'No child processes'
 > ......
 
 I've seen this behaviour for makeworld, and just today for making a kernel.
 It is hard to reproduce for makeworld.  At first I thought it might have
 been caused by a recent commit to the wait loop.  It wasn't exactly that.
 Next I thought that it was a kernel bug in my version of exit1().  Moving
 things back to nearer where they were seemed to reduce the problem but
 didn't fix it.  I'm happy that it is not my bug and can easily be reproduced
 :-).
 
 Bruce

From: Zherdev Anatoly <tolyar@mx.ru>
To: FreeBSD-gnats-submit@FreeBSD.org
Cc:  
Subject: Re: bin/66242: endless loop in sh(1)
Date: Thu, 6 May 2004 11:37:40 +0400

 This patch solves the problem for me:
 
 --- jobs.c.orig Fri Apr 30 15:10:48 2004
 +++ jobs.c      Thu May  6 10:50:17 2004
 @@ -928,6 +928,8 @@
         } while ((pid == -1 && errno == EINTR && breakwaitcmd == 0) ||
             (WIFSTOPPED(status) && !iflag));
         in_dowait--;
 +       if (pid == -1 && errno == ECHILD)
 +               job->state= JOBDONE;
         if (breakwaitcmd != 0) {
                 breakwaitcmd = 0;
                 return -1;
 
 
 -- 
 Zherdev Anatoly.

From: Eugene Grosbein <eugen@grosbein.pp.ru>
To: bug-followup@freebsd.org
Cc: Dmitry Sivachenko <mitya@demos.su>,
	Bruce Evans <bde@zeta.org.au>, Zherdev Anatoly <tolyar@mx.ru>
Subject: Re: bin/66242: endless loop in sh(1)
Date: Thu, 6 May 2004 22:52:08 +0800

 This patch also fixes http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/58195
 which is still a problem in 4.10-PRERELEASE.
 
 Eugene Grosbein
Responsible-Changed-From-To: freebsd-bugs->maxim 
Responsible-Changed-By: maxim 
Responsible-Changed-When: Thu May 6 11:24:39 PDT 2004 
Responsible-Changed-Why:  
Punish myself. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=66242 

From: Tor Egge <Tor.Egge@cvsup.no.freebsd.org>
To: tolyar@mx.ru
Cc: freebsd-bugs@freebsd.org, FreeBSD-gnats-submit@freebsd.org
Subject: Re: bin/66242: endless loop in sh(1)
Date: Fri, 07 May 2004 23:39:27 +0000 (GMT)

 >  This patch solves the problem for me:
 
 An alternate patch is:
 
 Index: jobs.c
 ===================================================================
 RCS file: /home/ncvs/src/bin/sh/jobs.c,v
 retrieving revision 1.67
 diff -u -r1.67 jobs.c
 --- jobs.c	6 Apr 2004 20:06:51 -0000	1.67
 +++ jobs.c	7 May 2004 22:57:27 -0000
 @@ -926,7 +926,8 @@
  	in_dowait--;
  	if (breakwaitcmd != 0) {
  		breakwaitcmd = 0;
 -		return -1;
 +		if (pid <= 0) 
 +			return -1;
  	}
  	if (pid <= 0)
  		return pid;
 
 
 By not returning early when breakwaitcmd is nonzero and pid is positive,
 dowait() can record that the process has exited, allowing the shell to exit
 normally instead of going into an infinite loop for the script provided in the
 PR.
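
The effect of the alternate patch can be sketched with a small model of the
code that follows the wait loop.  This is an illustration under stated
assumptions, not the jobs.c source: struct model_job, tail_before() and
tail_after() are hypothetical names, and setting "done" stands in for
whatever bookkeeping dowait() performs once it has a valid pid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the shell's job record. */
struct model_job {
	bool done;
};

/* Before the patch: a pending breakwaitcmd returns early even if pid > 0. */
int
tail_before(int pid, int *breakwaitcmd, struct model_job *jp)
{
	assert(breakwaitcmd != NULL && jp != NULL);
	if (*breakwaitcmd != 0) {
		*breakwaitcmd = 0;
		return -1;		/* the reaped child is never recorded */
	}
	if (pid <= 0)
		return pid;
	jp->done = true;		/* stand-in for recording the exit */
	return pid;
}

/* After the patch: return early only when no child was reaped. */
int
tail_after(int pid, int *breakwaitcmd, struct model_job *jp)
{
	assert(breakwaitcmd != NULL && jp != NULL);
	if (*breakwaitcmd != 0) {
		*breakwaitcmd = 0;
		if (pid <= 0)
			return -1;
	}
	if (pid <= 0)
		return pid;
	jp->done = true;		/* the exit is recorded, so waiting can end */
	return pid;
}
```

With a positive pid and breakwaitcmd set, tail_before() drops the result
while tail_after() falls through and marks the job done, which is the
difference the paragraph above describes.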
 
 The shell can go into a similar loop if it's waiting for a child process that
 another process is tracing.  Your patch would cause the shell to believe the
 child dead, while it's only temporarily the child of another process (the
 tracer).  A short sleep if errno was ECHILD would limit the resource usage.
 
 The shell can also go into a similar loop if the child was killed by signal
 127, since the shell would believe the child to have only stopped (the
 WIFSTOPPED() macro returns a nonzero value).  Disallowing signals 127 and 128
 will fix that problem.
 
 - Tor Egge
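
The remark about signal 127 can be checked with a simplified model of the
wait-status encoding.  The MODEL_* macros below are hypothetical and only
imitate the traditional BSD layout, in which the low seven bits of the
status hold the terminating signal and the value 0177 is reserved to mean
"stopped"; the real macros live in <sys/wait.h> and may differ between
systems.

```c
#include <assert.h>

/* Simplified model of the traditional encoding (assumption, see above). */
#define MODEL_WSTATUS(x)	((x) & 0177)
#define MODEL_WSTOPPED		0177
#define MODEL_WIFSTOPPED(x)	(MODEL_WSTATUS(x) == MODEL_WSTOPPED)
#define MODEL_WIFSIGNALED(x)	\
	(MODEL_WSTATUS(x) != MODEL_WSTOPPED && MODEL_WSTATUS(x) != 0)

/* Status word a wait call would report for "killed by sig" in this model. */
int
status_for_signal(int sig)
{
	assert(sig > 0 && sig < 128);
	return sig & 0177;
}

/* Nonzero if the model classifies the status as "stopped". */
int
looks_stopped(int status)
{
	return MODEL_WIFSTOPPED(status);
}

/* Nonzero if the model classifies the status as "killed by a signal". */
int
looks_signaled(int status)
{
	return MODEL_WIFSIGNALED(status);
}
```

In this model a child killed by signal 9 is reported as signaled, while a
child killed by signal 127 yields the bit pattern 0177 and is therefore
indistinguishable from a stopped child, so a loop that keeps waiting on
stopped children would not terminate.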

From: Eugene Grosbein <eugen@grosbein.pp.ru>
To: cracauer@freebsd.org
Cc: bug-followup@freebsd.org
Subject: Re: bin/66242: endless loop in sh(1)
Date: Wed, 10 Nov 2004 22:18:12 +0700

 > Commit fix sent by Tor Egge <Tor.Egge@cvsup.no.freebsd.org>
 > Only use return value from system call if system call succeeded.
 > 
 > Tested with `make world` and some of my own scripts.
 > This should be MFCed soon.  While /bin/sh is hard to test the fix is
 > obviously correct and can be assumed not to break something else
 > (famous last words...).
 
 Would you like to MFC this to RELENG_4?
 Nine months was a nice test period :-)
 
 Eugene Grosbein

From: Martin Cracauer <cracauer@cons.org>
To: Eugene Grosbein <eugen@grosbein.pp.ru>
Cc: cracauer@FreeBSD.ORG, bug-followup@FreeBSD.ORG
Subject: Re: bin/66242: endless loop in sh(1)
Date: Wed, 10 Nov 2004 10:23:04 -0500

 Not sure, I think Bruce found a problem with that one.
 
 Let me check my archives.
 
 Martin
 
 Eugene Grosbein wrote on Wed, Nov 10, 2004 at 10:18:12PM +0700: 
 > 
 > > Commit fix sent by Tor Egge <Tor.Egge@cvsup.no.freebsd.org>
 > > Only use return value from system call if system call succeeded.
 > > 
 > > Tested with `make world` and some of my own scripts.
 > > This should be MFCed soon.  While /bin/sh is hard to test the fix is
 > > obviously correct and can be assumed not to break something else
 > > (famous last words...).
 > 
 > Would you like to MFC this to RELENG_4?
 > Nine months was nice test period :-)
 > 
 > Eugene Grosbein
 
 -- 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 Martin Cracauer <cracauer@cons.org>   http://www.cons.org/cracauer/
  No warranty.    This email is probably produced by one of my cats 
  stepping on the keys. No, I don't have an infinite number of cats.

From: Maxim Konovalov <maxim@FreeBSD.org>
To: Eugene Grosbein <eugen@grosbein.pp.ru>
Cc: bug-followup@FreeBSD.org
Subject: Re: bin/66242: endless loop in sh(1)
Date: Wed, 10 Nov 2004 18:35:38 +0300 (MSK)

 On Wed, 10 Nov 2004, 15:20-0000, Eugene Grosbein wrote:
 
 > The following reply was made to PR bin/66242; it has been noted by GNATS.
 >
 > From: Eugene Grosbein <eugen@grosbein.pp.ru>
 > To: cracauer@freebsd.org
 > Cc: bug-followup@freebsd.org
 > Subject: Re: bin/66242: endless loop in sh(1)
 > Date: Wed, 10 Nov 2004 22:18:12 +0700
 >
 >  > Commit fix sent by Tor Egge <Tor.Egge@cvsup.no.freebsd.org>
 >  > Only use return value from system call if system call succeeded.
 >  >
 >  > Tested with `make world` and some of my own scripts.
 >  > This should be MFCed soon.  While /bin/sh is hard to test the fix is
 >  > obviously correct and can be assumed not to break something else
 >  > (famous last words...).
 >
 >  Would you like to MFC this to RELENG_4?
 
 I need to test/commit the bug fix to HEAD first.
 
 >  Nine months was nice test period :-)
 
 It is/was not a test period at all.  My FreeBSD time is very limited
 these days.  I have no problem if SomeOne(tm) makes all tests/commits.
 
 -- 
 Maxim Konovalov
State-Changed-From-To: open->patched 
State-Changed-By: maxim 
State-Changed-When: Thu Dec 2 13:14:48 GMT 2004 
State-Changed-Why:  
Fixed in -CURRENT, thanks! 

http://www.freebsd.org/cgi/query-pr.cgi?pr=66242 

From: Matteo Riondato <rionda@gufi.org>
To: Gnats PR Database <freebsd-gnats-submit@freebsd.org>
Cc: maxim@freebsd.org
Subject: Re: bin/66242: endless loop in sh(1)
Date: Sat, 9 Apr 2005 14:16:01 +0200

 This has not been MFCed to RELENG_5 yet.
 Thanks.
 Best Regards
 -- 
 Rionda aka Matteo Riondato
 Disinformato per default
 G.U.F.I. Staff Member (http://www.gufi.org)
 FreeSBIE Developer (http://www.freesbie.org)
 
State-Changed-From-To: patched->open 
State-Changed-By: maxim 
State-Changed-When: Mon May 30 13:43:25 GMT 2005 
State-Changed-Why:  
src/bin/sh/jobs.c rev. 1.68 was merged to RELENG_5.  bde@ still 
has some concerns regarding this PR but I failed to reproduce them. 


Responsible-Changed-From-To: maxim->freebsd-bugs 
Responsible-Changed-By: maxim 
Responsible-Changed-When: Mon May 30 13:43:25 GMT 2005 
Responsible-Changed-Why:  


http://www.freebsd.org/cgi/query-pr.cgi?pr=66242 

From: Maxim Konovalov <maxim@macomnet.ru>
To: Dmitry Sivachenko <mitya@demos.su>
Cc: bug-followup@freebsd.org, bde@freebsd.org,
        Eugene Grosbein <eugen@grosbein.pp.ru>
Subject: bin/66242
Date: Fri, 14 Apr 2006 01:22:36 +0400 (MSD)

 Hello,
 
 I've lost track of this PR.  Gentlemen, is it still an issue in
 RELENG_6 and HEAD?
 
 -- 
 Maxim Konovalov

From: Bruce Evans <bde@zeta.org.au>
To: Maxim Konovalov <maxim@macomnet.ru>
Cc: Dmitry Sivachenko <mitya@demos.su>, bug-followup@FreeBSD.org,
        bde@FreeBSD.org, Eugene Grosbein <eugen@grosbein.pp.ru>
Subject: Re: bin/66242
Date: Fri, 14 Apr 2006 20:14:00 +1000 (EST)

 On Fri, 14 Apr 2006, Maxim Konovalov wrote:
 
 > Hello,
 >
 > I've lost the track of the PR.  Gentlemen, is it still an issue in
 > RELENG_6 and HEAD?
 
 You committed a fix that I'm happy with.  I threw away all my local
 patches for this problem.
 
 Bruce

From: Dmitry Sivachenko <mitya@demos.su>
To: Maxim Konovalov <maxim@macomnet.ru>
Cc: bug-followup@freebsd.org, bde@freebsd.org,
	Eugene Grosbein <eugen@grosbein.pp.ru>
Subject: Re: bin/66242
Date: Fri, 14 Apr 2006 14:28:26 +0400

 On Fri, Apr 14, 2006 at 01:22:36AM +0400, Maxim Konovalov wrote:
 > Hello,
 > 
 > I've lost the track of the PR.  Gentlemen, is it still an issue in
 > RELENG_6 and HEAD?
 > 
 
 
 It seems the problem is fixed.
State-Changed-From-To: open->closed 
State-Changed-By: maxim 
State-Changed-When: Fri Apr 14 10:42:25 UTC 2006 
State-Changed-Why:  
It seems the bug has finally been fixed. 


Responsible-Changed-From-To: freebsd-bugs->maxim 
Responsible-Changed-By: maxim 
Responsible-Changed-When: Fri Apr 14 10:42:25 UTC 2006 
Responsible-Changed-Why:  
I have committed the code. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=66242 
>Unformatted:
