From olli@lurza.secnetix.de  Tue Nov 19 04:44:00 2002
Return-Path: <olli@lurza.secnetix.de>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 3E60D37B401
	for <FreeBSD-gnats-submit@freebsd.org>; Tue, 19 Nov 2002 04:44:00 -0800 (PST)
Received: from lurza.secnetix.de (lurza.secnetix.de [212.66.1.130])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 7323443E75
	for <FreeBSD-gnats-submit@freebsd.org>; Tue, 19 Nov 2002 04:43:59 -0800 (PST)
	(envelope-from olli@lurza.secnetix.de)
Received: from lurza.secnetix.de (localhost [IPv6:::1])
	by lurza.secnetix.de (8.12.6/8.12.5) with ESMTP id gAJChadK073556;
	Tue, 19 Nov 2002 13:43:36 +0100 (CET)
	(envelope-from oliver.fromme@secnetix.de)
Received: (from olli@localhost)
	by lurza.secnetix.de (8.12.6/8.12.5/Submit) id gAJChZLU073555;
	Tue, 19 Nov 2002 13:43:35 +0100 (CET)
Message-Id: <200211191243.gAJChZLU073555@lurza.secnetix.de>
Date: Tue, 19 Nov 2002 13:43:35 +0100 (CET)
From: Oliver Fromme <olli@secnetix.de>
Reply-To: Oliver Fromme <olli@secnetix.de>
To: FreeBSD-gnats-submit@freebsd.org
Cc: Oliver Fromme <olli@fromme.com>
Subject: /bin/sh coredump
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         45478
>Category:       bin
>Synopsis:       /bin/sh coredump
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    stefanf
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Nov 19 04:50:01 PST 2002
>Closed-Date:    Mon Dec 26 18:19:13 GMT 2005
>Last-Modified:  Mon Dec 26 18:19:13 GMT 2005
>Originator:     Oliver Fromme
>Release:        FreeBSD 4.7-RELEASE i386
>Organization:
secnetix GmbH & Co KG, http://www.secnetix.de/
>Environment:

System: FreeBSD 4.7-RELEASE

I could also reproduce the very same problem on 4.6 and
even on 4.4, so it seems to be a long-standing problem.

>Description:

$ /bin/sh
$ while for true; do false; done; do true; done
^C
$ set -E
sh in malloc(): warning: recursive call
sh in malloc(): warning: recursive call
Segmentation fault (core dumped)

>How-To-Repeat:

See above.  The problem is not 100% reproducible.
Sometimes it happens immediately, sometimes it takes
20+ attempts, but sooner or later it'll crash.
I guess it depends on when exactly you interrupt the
endless while loop.
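The timing dependence and the "recursive call" warning are consistent with the diagnosis recorded later in the audit trail. The sketch below shows one plausible mechanism. It assumes that the allocator marks itself active on entry. The toy_malloc() below is hypothetical and far simpler than FreeBSD's real malloc. If a SIGINT handler longjmp()s out before that mark is cleared, every later call looks recursive. Whether the shell breaks therefore depends on whether ^C lands inside that window.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical toy allocator with a re-entry guard; not FreeBSD's malloc. */
static int malloc_active;
static sigjmp_buf toplevel;

static void *toy_malloc(size_t n, int interrupt_midway)
{
    void *p;

    if (malloc_active++) {
        fprintf(stderr, "toy_malloc: warning: recursive call\n");
        malloc_active--;
        return NULL;
    }
    if (interrupt_midway)
        raise(SIGINT);          /* ^C arrives while the guard is held */
    p = malloc(n);
    malloc_active--;
    assert(malloc_active >= 0);
    return p;
}

/* Unprotected handler: unwinds straight back to the "prompt". */
static void longjmp_on_sigint(int signo)
{
    (void)signo;
    siglongjmp(toplevel, 1);
}

/* Interrupts one allocation, then tries an ordinary one.  Returns the
 * second result; NULL means the allocator is now wedged. */
static void *alloc_after_interrupt(void)
{
    struct sigaction sa, old;

    sa.sa_handler = longjmp_on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, &old);

    if (sigsetjmp(toplevel, 1) == 0)  /* 1: restore the signal mask */
        (void)toy_malloc(32, 1);      /* does not return normally */
    sigaction(SIGINT, &old, NULL);

    return toy_malloc(32, 0);         /* guard still set: "recursive call" */
}
```

In this model, alloc_after_interrupt() prints the warning and returns NULL, and malloc_active stays at 1 from then on. A wrapper that treats NULL as failure, as ckmalloc() does with error("Out of space"), would report a bogus out-of-memory condition.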

I can also provide the core dump if necessary.

>Fix:

Sorry, none known.

I suggest someone familiar with the innards of our sh
looks at it.  It's probably easy to track down, using
/etc/malloc.conf and the core dumps ...

>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->tjr 
Responsible-Changed-By: tjr 
Responsible-Changed-When: Sat Nov 30 05:17:53 PST 2002 
Responsible-Changed-Why:  
I believe this is caused by the SIGINT handler longjmp()'ing 
out when it's in the middle of a malloc() call. Calls to malloc() 
and free() should be bracketed by INTOFF and INTON. 

I haven't had much luck tracking this down in the past, but 
I'll try again to find the missing INTON/INTOFF. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=45478 
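tjr's proposed fix can be illustrated with a small standalone program. This is a minimal sketch that assumes a single-threaded process and uses sigsetjmp()/siglongjmp() for the unwind. The counter, flag and macro names are modelled on sh's INTOFF/INTON idea but are not its source. run_interruptible(), alloc_with_ctrl_c() and last_block are hypothetical helpers for the example. A SIGINT that arrives while the counter is nonzero only sets a flag. The unwind happens at the matching INTON, after malloc() has returned.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>

/* Names modelled on sh's INTOFF/INTON; this is not the sh source. */
static volatile sig_atomic_t suppressint;  /* INTOFF nesting depth */
static volatile sig_atomic_t intpending;   /* SIGINT arrived while suppressed */
static sigjmp_buf *handler;                /* where an interrupt unwinds to */

static void onint(void)
{
    intpending = 0;
    siglongjmp(*handler, 1);               /* abandon the current command */
}

static void sigint_handler(int signo)
{
    (void)signo;
    if (suppressint > 0)
        intpending = 1;    /* maybe inside malloc(): only take a note */
    else
        onint();           /* no critical section active: unwind now */
}

#define INTOFF (suppressint++)
#define INTON                                          \
    do {                                               \
        assert(suppressint > 0);                       \
        if (--suppressint == 0 && intpending)          \
            onint();   /* deliver the deferred ^C */   \
    } while (0)

/* Runs fn() with SIGINT routed to sigint_handler; returns 1 if it was
 * interrupted, 0 if it ran to completion. */
static int run_interruptible(void (*fn)(void))
{
    sigjmp_buf jb;
    sigjmp_buf *volatile saved = handler;
    struct sigaction sa, old;
    int interrupted = 0;

    sa.sa_handler = sigint_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, &old);

    if (sigsetjmp(jb, 1) == 0) {   /* 1: restore the signal mask on jump */
        handler = &jb;
        fn();
    } else {
        interrupted = 1;
        suppressint = 0;           /* reset, as a top-level catcher must */
    }
    sigaction(SIGINT, &old, NULL);
    handler = saved;
    return interrupted;
}

static void *last_block;           /* lets the caller free the block */

/* An allocation protected by INTOFF/INTON while ^C "arrives" midway. */
static void alloc_with_ctrl_c(void)
{
    INTOFF;
    last_block = malloc(64);
    raise(SIGINT);   /* handler only sets intpending; we keep running */
    INTON;           /* heap is consistent again: the interrupt is taken here */
}
```

Here run_interruptible(alloc_with_ctrl_c) returns 1, and last_block still points to a valid block. The interrupt is honoured, but only after the allocator's state is consistent again.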
Responsible-Changed-From-To: tjr->freebsd-bugs 
Responsible-Changed-By: tjr 
Responsible-Changed-When: Sat Feb 14 23:36:58 PST 2004 
Responsible-Changed-Why:  
Unassign due to lack of time and interest. Perhaps someone else 
will pick this up. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=45478 

From: Giorgos Keramidas <keramida@freebsd.org>
To: Oliver Fromme <olli@secnetix.de>
Cc: bug-followup@freebsd.org, Oliver Fromme <olli@fromme.com>
Subject: Re: bin/45478: /bin/sh coredump
Date: Fri, 15 Apr 2005 18:52:32 +0300

 On 2002-11-19 13:43, Oliver Fromme <olli@secnetix.de> wrote:
 > Responsible-Changed-By: tjr
 > Responsible-Changed-Why:
 > I believe this is caused by the SIGINT handler longjmp()'ing
 > out when it's in the middle of a malloc() call. Calls to malloc()
 > and free() should be bracketed by INTOFF and INTON.
 >
 > I haven't had much luck tracking this down in the past, but
 > I'll try again to find the missing INTON/INTOFF.
 
 I just happened to stumble upon this bug today.  It's still with us in
 FreeBSD 6.0-CURRENT.  It seems that the inner for loop in the following:
 
 	while for true; do false; done; do true; done
 
 is not stopped by sh(1) when ^C is hit.  Even after the interrupt is
 received, sh still consumes 5-15% of CPU or more in my tests, while it
 appears to be sitting at a PS1 prompt, waiting for more input.
 
   PID USERNAME   THR PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
  2352 keramida     1   5    0  1668K  1192K ttyin    0:03 25.48% 10.79% sh
 
 After a few of these commands have been run, sh's CPU utilization
 may climb even higher:
 
   PID USERNAME   THR PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
  2352 keramida     1 123    0  1672K  1196K RUN      1:11 63.21% 63.18% sh
 

From: Giorgos Keramidas <keramida@freebsd.org>
To: Oliver Fromme <olli@secnetix.de>
Cc: bug-followup@freebsd.org, Oliver Fromme <olli@fromme.com>
Subject: Re: bin/45478: /bin/sh coredump
Date: Fri, 15 Apr 2005 19:13:31 +0300

 On 2005-04-15 18:52, Giorgos Keramidas <keramida@freebsd.org> wrote:
 > I just happened to stumble upon this bug today.
 
 I managed to get sh to print "Out of space" after a few more
 invocations, and here's the backtrace I get either with gcore or by
 sending a SEGV to the process (there's no other way to stop it from
 printing an endless stream of "Out of space" error messages):
 
 : (gdb) bt
 : #0  0x2811f2e3 in write () at write.S:2
 : #1  0x0805733d in xwrite (fd=2, buf=0x806a000 "Out of space\namida/Mailbox", nbytes=13)
 :     at output.c:318
 : #2  0x080573b4 in flushout (dest=0x806132c) at output.c:206
 : #3  0x08057418 in flushall () at output.c:196
 : #4  0x0804c733 in exverror (cond=1, msg=0x805eb68 "Out of space", 
 :     ap=0xbfbfe7f4 "4迿\031\222\006(\a(\021(\001") at error.c:156
 : #5  0x0804c787 in error (msg=0x806a000 "Out of space\namida/Mailbox") at error.c:166
 : #6  0x0805555c in ckmalloc (nbytes=500) at memalloc.c:61
 : #7  0x0805560d in stalloc (nbytes=496) at memalloc.c:132
 : #8  0x080557ad in growstackblock () at memalloc.c:247
 : #9  0x0804e1f1 in padvance (path=0xbfbfe8ac, name=0x806320c "") at exec.c:192
 : #10 0x08054d38 in chkmail (silent=0) at mail.c:88
 : #11 0x08054f92 in cmdloop (top=1) at main.c:213
 : #12 0x08055138 in main (argc=1, argv=0xbfbfea40) at main.c:183
 
 I don't know if this helps track down the problem though.  If anyone
 with more sh-clue wants me to send the core file or post more data out
 of it, please ask.
 

From: Nate Eldredge <nge@cs.hmc.edu>
To: bug-followup@FreeBSD.org, olli@secnetix.de
Cc: keramida@freebsd.org
Subject: Re: bin/45478: /bin/sh coredump
Date: Thu, 13 Oct 2005 14:24:07 -0700 (PDT)

 I have reproduced this and have a fix.  It does seem to have been a
 missing INTOFF/INTON pair.  To find where the signal was being handled
 inside malloc, in gdb you can do
 
 handle SIGINT nostop pass
 (answer yes)
 break exraise
 command
  	where
  	continue
  	end
 run
 
 which will print a stack trace every time it handles the signal.  Then you 
 look for malloc inside the trace.
 
 I found the following offenders:
 
 #11 0x0805bba7 in redirect (redir=0x0, flags=1) at redir.c:111
 #11 0x0805e1a3 in setvar (name=0x80dd76e "_", val=0x810d0ac "true", 
 flags=0) at var.c:246
 
 However, a casual glance revealed several other un-wrapped calls.  90% of 
 these are made via ckmalloc/ckrealloc/ckfree.  Thus I think the easiest 
 and safest thing to do is to put INTOFF/INTON inside these functions. 
 They are safe to use recursively and have trivial overhead (incrementing a 
 counter).  Here is a patch which does this.  It is against CURRENT but I 
 imagine it will apply to other versions as well.  After this patch I 
 cannot make it crash anymore.
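The nesting property Nate relies on can be checked without any signals. This sketch uses a plain counter with hypothetical names: intoff(), inton(), fake_sigint() and ckmalloc_sketch(). In the model, a caller that already holds INTOFF calls the wrapped allocator. The inner INTON only decrements the counter, so a pending interrupt is taken exactly once, at the outermost INTON.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, signal-free model of the suppression counter, used only
 * to show nesting; names are illustrative, not sh's. */
static int suppressint;        /* INTOFF nesting depth */
static int intpending;         /* an interrupt arrived while suppressed */
static int interrupts_taken;   /* times real sh would have longjmp()'d */

static void intoff(void) { suppressint++; }

static void inton(void)
{
    assert(suppressint > 0);
    if (--suppressint == 0 && intpending) {
        intpending = 0;
        interrupts_taken++;    /* real sh would unwind here */
    }
}

/* Stand-in for SIGINT delivery at an arbitrary point. */
static void fake_sigint(void)
{
    if (suppressint > 0)
        intpending = 1;
    else
        interrupts_taken++;
}

/* ckmalloc() in the style of the patch: protection lives in the wrapper. */
static void *ckmalloc_sketch(size_t n)
{
    void *p;

    intoff();
    p = malloc(n);
    fake_sigint();             /* ^C lands "inside" the allocation */
    if (p == NULL)
        abort();               /* sh would call error("Out of space") */
    inton();
    return p;
}

/* A caller that already holds INTOFF, e.g. while updating two fields. */
static void *pair[2];

static void caller_holding_intoff(void)
{
    intoff();
    pair[0] = ckmalloc_sketch(16);   /* inner inton(): depth 2 -> 1 */
    pair[1] = ckmalloc_sketch(16);   /* still deferred */
    inton();                         /* depth 1 -> 0: taken exactly once */
}
```

The overhead per call is an increment and a decrement. Because the counter nests, the ckmalloc()/ckfree() wrappers can be called from code that is already protected.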
 
 I found only one unsafe use of malloc/free directly: in the umask builtin. 
 umaskcmd() calls setmode(), which allocates its result with malloc(), and
 later releases it with free().  My patch puts INTOFF/INTON around this as well.
 
 By the way, I think the CPU usage thing is a red herring.  I assume the 
 numbers are from top, and its CPU field is an average.  It definitely 
 decays slowly to 0 after a process has been running, so it can report 
 nonzero even for a totally idle process.  sh gets commands from the user 
 with a blocking read, so it must be idle while waiting for input.
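Nate's point that the displayed percentage is a decaying average can be made concrete. The sketch below is minimal and assumes the estimate is multiplied by exp(-1/20) once per second while the process sleeps. That is the value of the 4BSD scheduler's ccpu constant, which decays the estimate by about 95% in 60 seconds. The kernel's actual per-tick fixed-point arithmetic differs, and decayed_pct() is a hypothetical helper.

```c
#include <assert.h>

/* Per-second decay factor for a process's %CPU estimate; assumed here to
 * be exp(-1/20), the value of the 4BSD scheduler's ccpu constant. */
#define CCPU 0.95122942450071400909

/* %CPU that an average-based display would still show after `idle_secs`
 * seconds of sleeping, starting from `pct`. */
static double decayed_pct(double pct, int idle_secs)
{
    assert(pct >= 0.0 && idle_secs >= 0);
    while (idle_secs-- > 0)
        pct *= CCPU;          /* no new run time while blocked in read() */
    return pct;
}
```

Under this assumption, a process that was fully busy still shows about 5% a full minute after it blocks. Nonzero readings at a prompt are therefore expected and do not by themselves indicate a runaway loop.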
 
 diff -ur /usr/src/bin/sh/memalloc.c ./src/bin/sh/memalloc.c
 --- /usr/src/bin/sh/memalloc.c	Tue Apr  6 13:06:51 2004
 +++ ./src/bin/sh/memalloc.c	Thu Oct 13 14:02:30 2005
 @@ -57,8 +57,10 @@
   {
   	pointer p;
 
 +	INTOFF;
   	if ((p = malloc(nbytes)) == NULL)
   		error("Out of space");
 +	INTON;
   	return p;
   }
 
 @@ -70,11 +72,20 @@
   pointer
   ckrealloc(pointer p, int nbytes)
   {
 +	INTOFF;
   	if ((p = realloc(p, nbytes)) == NULL)
   		error("Out of space");
 +	INTON;
   	return p;
   }
 
 +void
 +ckfree(pointer p)
 +{
 +	INTOFF;
 +	free(p);
 +	INTON;
 +}
 
   /*
    * Make a copy of a string in safe storage.
 diff -ur /usr/src/bin/sh/memalloc.h ./src/bin/sh/memalloc.h
 --- /usr/src/bin/sh/memalloc.h	Tue Apr  6 13:06:51 2004
 +++ ./src/bin/sh/memalloc.h	Thu Oct 13 14:01:27 2005
 @@ -48,6 +48,7 @@
 
   pointer ckmalloc(int);
   pointer ckrealloc(pointer, int);
 +void ckfree(pointer);
   char *savestr(char *);
   pointer stalloc(int);
   void stunalloc(pointer);
 @@ -72,5 +73,3 @@
   #define STTOPC(p)	p[-1]
   #define STADJUST(amount, p)	(p += (amount), sstrnleft -= (amount))
   #define grabstackstr(p)	stalloc(stackblocksize() - sstrnleft)
 -
 -#define ckfree(p)	free((pointer)(p))
 diff -ur /usr/src/bin/sh/miscbltin.c ./src/bin/sh/miscbltin.c
 --- /usr/src/bin/sh/miscbltin.c	Fri Sep  9 12:59:41 2005
 +++ ./src/bin/sh/miscbltin.c	Thu Oct 13 14:00:13 2005
 @@ -274,12 +274,14 @@
   			umask(mask);
   		} else {
   			void *set;
 +			INTOFF;
   			if ((set = setmode (ap)) == 0)
   				error("Illegal number: %s", ap);
 
   			mask = getmode (set, ~mask & 0777);
   			umask(~mask & 0777);
   			free(set);
 +			INTON;
   		}
   	}
   	return 0;
 
 -- 
 Nate Eldredge
 nge@cs.hmc.edu
Responsible-Changed-From-To: freebsd-bugs->stefanf 
Responsible-Changed-By: stefanf 
Responsible-Changed-When: Fri Oct 14 08:51:22 GMT 2005 
Responsible-Changed-Why:  
I'll try to fix that one. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=45478 

From: Oliver Fromme <olli@lurza.secnetix.de>
To: bug-followup@FreeBSD.org, nge@cs.hmc.edu
Cc:  
Subject: Re: bin/45478: /bin/sh coredump
Date: Fri, 14 Oct 2005 14:42:04 +0200 (CEST)

 I have applied Nate Eldredge's patch.
 With that patch, I couldn't reproduce the problem anymore.
 
 Thank you very much, Nate!
 
 I would vote for committing this.
 
 -- 
State-Changed-From-To: open->patched 
State-Changed-By: stefanf 
State-Changed-When: Fri Oct 28 11:02:38 GMT 2005 
State-Changed-Why:  
I committed a modified version of this patch to HEAD.  Thanks. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=45478 
State-Changed-From-To: patched->closed 
State-Changed-By: stefanf 
State-Changed-When: Mon Dec 26 18:19:02 UTC 2005 
State-Changed-Why:  
Also fixed in RELENG_{5,6}. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=45478 
>Unformatted:
