From mteterin@250-217.customer.cloud9.net  Thu Oct 21 20:51:13 2004
Return-Path: <mteterin@250-217.customer.cloud9.net>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 737A616A4CE
	for <FreeBSD-gnats-submit@freebsd.org>; Thu, 21 Oct 2004 20:51:13 +0000 (GMT)
Received: from corbulon.video-collage.com (aldan.algebra.com [216.254.65.224])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D36D243D4C
	for <FreeBSD-gnats-submit@freebsd.org>; Thu, 21 Oct 2004 20:51:12 +0000 (GMT)
	(envelope-from mteterin@250-217.customer.cloud9.net)
Received: from 250-217.customer.cloud9.net (195-11.customer.cloud9.net [168.100.195.11])
	by corbulon.video-collage.com (8.12.11/8.12.11) with ESMTP id i9LKpAIw029900
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK)
	for <FreeBSD-gnats-submit@freebsd.org>; Thu, 21 Oct 2004 16:51:11 -0400 (EDT)
	(envelope-from mteterin@250-217.customer.cloud9.net)
Received: from 250-217.customer.cloud9.net (mteterin@localhost [127.0.0.1])
	by 250-217.customer.cloud9.net (8.13.1/8.13.1) with ESMTP id i9LKp4hn058118
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <FreeBSD-gnats-submit@freebsd.org>; Thu, 21 Oct 2004 16:51:04 -0400 (EDT)
	(envelope-from mteterin@250-217.customer.cloud9.net)
Received: (from mteterin@localhost)
	by 250-217.customer.cloud9.net (8.13.1/8.13.1/Submit) id i9LKp4vF058117;
	Thu, 21 Oct 2004 16:51:04 -0400 (EDT)
	(envelope-from mteterin)
Message-Id: <200410212051.i9LKp4vF058117@250-217.customer.cloud9.net>
Date: Thu, 21 Oct 2004 16:51:04 -0400 (EDT)
From: Mikhail Teterin <mi@aldan.algebra.com>
To: FreeBSD-gnats-submit@freebsd.org
Subject: unkillable process(es) stuck in `STOP' state
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         72979
>Category:       kern
>Synopsis:       unkillable process(es) stuck in `STOP' state
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Oct 21 21:00:44 GMT 2004
>Closed-Date:    Fri Nov 11 13:14:27 GMT 2005
>Last-Modified:  Tue Jan  2 11:10:18 GMT 2007
>Originator:     Mikhail Teterin
>Release:        FreeBSD 6.0-CURRENT i386
>Organization:
Virtual Estates, Inc.
>Environment:
System: FreeBSD mi 6.0-CURRENT FreeBSD 6.0-CURRENT #1: Wed Oct 20 12:08:24 EDT 2004 mteterin@mi:/meow/obj/misha/src/sys/Gigabyte i386

	debug.mpsafenet and safevm both set to 0

>Description:

Somehow, it is possible to get a process into a state where it cannot be
killed. The state is reported as `T' by ps and as `STOP' by top. There seems
to be a zombie child of the process when this happens:

  UID   PID  PPID CPU PRI NI   VSZ  RSS MWCHAN STAT  TT       TIME COMMAND
 1042  1096     1  57   8  0 68044 50228 -      T     ??    0:27,88 kontact
 1042  4903  1096   0 -84  0     0    0 -      Z     ??    0:03,07 <defunct>

But neither -CONT nor -KILL can get rid of the process. Attempts to ktrace
it result in an empty ktrace.out.

>How-To-Repeat:

	It is unclear what exactly is causing this. So far, I have seen it
	twice, both times with KDE programs (KMail and Kontact), which attach
	a debugger to themselves when crashing (a frequent occurrence).

	Whatever mistake KDE may be making in their error-handling, it does
	not explain an unkillable process.

	Maybe this has something to do with the threading library (libpthread)?

>Fix:

	Don't know...
>Release-Note:
>Audit-Trail:

From: Stefan Walter <sw@gegenunendlich.de>
To: freebsd-gnats-submit@FreeBSD.org, mi@aldan.algebra.com
Cc:  
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Mon, 25 Oct 2004 14:46:20 +0200

 I have just seen the same thing on a RELENG_5 system as of Oct 24th. In
 this case it was Umbrello, which comes with devel/kdesdk3; Umbrello
 crashed (application bug, AFAICS), and when I closed the window that
 showed the backtrace, the process remained in state 'STOP', according to
 top(1).
 
 I don't know if the application's backtrace can be of any help here, but
 just in case:
 
 [Switching to LWP 100139]
 0x296474d7 in wait4 () from /lib/libc.so.5
 #0  0x296474d7 in wait4 () from /lib/libc.so.5
 #1  0x296388df in waitpid () from /lib/libc.so.5
 #2  0x294b4089 in waitpid () from /usr/lib/libpthread.so.1
 #3  0x28b2c7ed in KCrash::defaultCrashHandler ()
    from /usr/local/lib/libkdecore.so.6
 #4  0x294ba6ef in sigaction () from /usr/lib/libpthread.so.1
 #5  0xbfbfff94 in ?? ()
 #6  0x0000000b in ?? ()
 #7  0xbfbfdf40 in ?? ()
 #8  0xbfbfdc80 in ?? ()
 #9  0x00000000 in ?? ()
 #10 0x294ba41c in sigaction () from /usr/lib/libpthread.so.1
 #11 0x0828fd78 in JavaCodeClassField::~JavaCodeClassField ()
 #12 0x28f5fd0b in QObject::~QObject () from /usr/X11R6/lib/libqt-mt.so.3
 #13 0x0818698d in UMLObject::~UMLObject ()
 #14 0x0818a8cd in UMLRole::~UMLRole ()
 #15 0x28f5fd0b in QObject::~QObject () from /usr/X11R6/lib/libqt-mt.so.3
 #16 0x0818698d in UMLObject::~UMLObject ()
 #17 0x080fa535 in UMLAssociation::~UMLAssociation ()
 #18 0x28f606a5 in QObject::event () from /usr/X11R6/lib/libqt-mt.so.3
 #19 0x28f0a29d in QApplication::internalNotify ()
    from /usr/X11R6/lib/libqt-mt.so.3
 #20 0x28f096ca in QApplication::notify () from /usr/X11R6/lib/libqt-mt.so.3
 #21 0x28aa1945 in KApplication::notify () from /usr/local/lib/libkdecore.so.6
 #22 0x28f0afd5 in QApplication::sendPostedEvents ()
    from /usr/X11R6/lib/libqt-mt.so.3
 #23 0x28f0adec in QApplication::sendPostedEvents ()
    from /usr/X11R6/lib/libqt-mt.so.3
 #24 0x28eba528 in QEventLoop::processEvents ()
    from /usr/X11R6/lib/libqt-mt.so.3
 #25 0x28f1b44f in QEventLoop::enterLoop () from /usr/X11R6/lib/libqt-mt.so.3
 #26 0x28f1b3a8 in QEventLoop::exec () from /usr/X11R6/lib/libqt-mt.so.3
 #27 0x28f0a3f4 in QApplication::exec () from /usr/X11R6/lib/libqt-mt.so.3
 #28 0x08144ee3 in main ()

From: Mikhail Teterin <mi+mx@aldan.algebra.com>
To: davidxu@freebsd.org
Cc: Michael Nottebrock <michaelnottebrock@gmx.net>, re@freebsd.org,
	davidxu@t2t2.com, davidxu@viatech.com.cn,
	freebsd-gnats-submit@freebsd.org
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Mon, 25 Oct 2004 22:38:56 -0400

 David, would you take a look at this? Some apps sometimes end up unkillable:
 
  http://www.freebsd.org/cgi/query-pr.cgi?pr=72979
 
 If you provide an encouraging answer quick enough, we may even be able to 
 persuade re@ to hold the release long enough for you to fix it. Thanks!
 
  -mi

From: Ken Smith <kensmith@cse.Buffalo.EDU>
To: Mikhail Teterin <mi+mx@aldan.algebra.com>
Cc: davidxu@freebsd.org,
	Michael Nottebrock <michaelnottebrock@gmx.net>, re@freebsd.org,
	davidxu@t2t2.com, davidxu@viatech.com.cn,
	freebsd-gnats-submit@freebsd.org
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Mon, 25 Oct 2004 23:21:29 -0400

 
 On Mon, Oct 25, 2004 at 10:38:56PM -0400, Mikhail Teterin wrote:
 > David, would you take a look at this? Some apps sometimes end up unkillable:
 >
 >  http://www.freebsd.org/cgi/query-pr.cgi?pr=72979
 >
 > If you provide an encouraging answer quick enough, we may even be able to
 > persuade re@ to hold the release long enough for you to fix it. Thanks!
 >
 >  -mi
 
 David,
 
 Scott thought you might have had some work done on this already, and
 perhaps even some testable code.  If that's true can you let us know
 please?  Scott needs to head to Europe for the conference but I can
 try to push this stuff along while he's occupied with the conference.
 If you do have some time invested in this can you let us know what
 state it's in please?  We'll take it from there to decide what to
 do with it.
 
 Thanks.
 
 -- 
 						Ken Smith
 - From there to here, from here to      |       kensmith@cse.buffalo.edu
   there, funny things are everywhere.   |
                       - Theodore Geisel |
 

From: David Xu <davidxu@freebsd.org>
To: Mikhail Teterin <mi+mx@aldan.algebra.com>
Cc: Michael Nottebrock <michaelnottebrock@gmx.net>, re@freebsd.org,
	davidxu@t2t2.com, davidxu@viatech.com.cn,
	freebsd-gnats-submit@freebsd.org
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Tue, 26 Oct 2004 19:11:54 +0800

 Mikhail Teterin wrote:
 
 >David, would you take a look at this? Some apps sometimes end up unkillable:
 >
 > http://www.freebsd.org/cgi/query-pr.cgi?pr=72979
 >
 >If you provide an encouraging answer quick enough, we may even be able to 
 >persuade re@ to hold the release long enough for you to fix it. Thanks!
 >
 > -mi
 I committed a change about 3 days ago; it might fix the problem. Can you
 try it?
 
 http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_exit.c.diff?r1=1.250&r2=1.251&f=h
 
 David Xu
 

From: Ken Smith <kensmith@cse.Buffalo.EDU>
To: David Xu <davidxu@freebsd.org>
Cc: Mikhail Teterin <mi+mx@aldan.algebra.com>,
	Michael Nottebrock <michaelnottebrock@gmx.net>, re@freebsd.org,
	davidxu@t2t2.com, davidxu@viatech.com.cn,
	freebsd-gnats-submit@freebsd.org
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Tue, 26 Oct 2004 10:00:32 -0400

 On Tue, Oct 26, 2004 at 07:11:54PM +0800, David Xu wrote:
 > Mikhail Teterin wrote:
 > 
 > >David, would you take a look at this? Some apps sometimes end up 
 > >unkillable:
 > >
 > >http://www.freebsd.org/cgi/query-pr.cgi?pr=72979
 > >
 > >If you provide an encouraging answer quick enough, we may even be able to 
 > >persuade re@ to hold the release long enough for you to fix it. Thanks!
 > >
 > >-mi
 > >
 > I had committed a change about 3 days ago, this might fix the problem, 
 > can you
 > try ?
 > 
 > http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_exit.c.diff?r1=1.250&r2=1.251&f=h
 > 
 
 That looks both promising (given symptoms) and low-risk.
 
 Mikhail et al., will you have any problems testing that?
 
 -- 
 						Ken Smith

From: Mikhail Teterin <mi+mx@aldan.algebra.com>
To: Ken Smith <kensmith@cse.buffalo.edu>
Cc: David Xu <davidxu@freebsd.org>,
	Michael Nottebrock <michaelnottebrock@gmx.net>, re@freebsd.org,
	davidxu@t2t2.com, davidxu@viatech.com.cn,
	freebsd-gnats-submit@freebsd.org
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Tue, 26 Oct 2004 12:07:47 -0400

 =On Tue, Oct 26, 2004 at 07:11:54PM +0800, David Xu wrote:
 =That looks both promising (given symptoms) and low-risk.
 =
 =Mikhail et. al., will you have any problems testing that?
 
 This problem was not easy to reproduce to begin with :-(
 
 I don't even know how it could be positively claimed to be gone. Shall we merge
 David's fix into RELENG_5 and hope for the best?
 
  -mi

From: Scott Long <scottl@freebsd.org>
To: Mikhail Teterin <mi+mx@aldan.algebra.com>
Cc: Ken Smith <kensmith@cse.buffalo.edu>,
	David Xu <davidxu@freebsd.org>,
	Michael Nottebrock <michaelnottebrock@gmx.net>, re@freebsd.org,
	davidxu@t2t2.com, davidxu@viatech.com.cn,
	freebsd-gnats-submit@freebsd.org
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Tue, 26 Oct 2004 10:14:48 -0600

 Mikhail Teterin wrote:
 > =On Tue, Oct 26, 2004 at 07:11:54PM +0800, David Xu wrote:
 > =That looks both promising (given symptoms) and low-risk.
 > =
 > =Mikhail et. al., will you have any problems testing that?
 > 
 > This problem was not easy to reproduce to begin with :-(
 > 
 > I don't even know, how it can be positively claimed gone. Shall we merge 
 > David's fix into RELENG_5 and hope for the best?
 > 
 >  -mi
 
 One of the emails in the thread was by Sam Leffler and he seemed to have
 a very easy procedure for reproducing the problem.  It only required
 attaching GDB to any running threaded process.  Can you look that up and
 see if you get similar results?
 
 Scott

From: Ken Smith <kensmith@cse.Buffalo.EDU>
To: David Xu <davidxu@freebsd.org>
Cc: Mikhail Teterin <mi+mx@aldan.algebra.com>,
	Michael Nottebrock <michaelnottebrock@gmx.net>, re@freebsd.org,
	davidxu@t2t2.com, davidxu@viatech.com.cn,
	freebsd-gnats-submit@freebsd.org
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Tue, 26 Oct 2004 22:53:05 -0400

 
 On Tue, Oct 26, 2004 at 07:11:54PM +0800, David Xu wrote:
 > Mikhail Teterin wrote:
 >
 > >David, would you take a look at this? Some apps sometimes end up
 > >unkillable:
 > >
 > >http://www.freebsd.org/cgi/query-pr.cgi?pr=72979
 > >
 > >If you provide an encouraging answer quick enough, we may even be able to
 > >persuade re@ to hold the release long enough for you to fix it. Thanks!
 > >
 > >-mi
 > >
 > I had committed a change about 3 days ago, this might fix the problem,
 > can you
 > try ?
 >
 > http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_exit.c.diff?r1=1.250&r2=1.251&f=h
 >
 > David Xu
 
 David,
 
 I think I have successfully tested your fix.  Even if it doesn't wind
 up fixing these problems it does fix something that shouldn't happen.
 
 Please commit to RELENG_5 and RELENG_5_3.  If you are too busy to do
 it can you let me know please? I'll do it if you can't.  We're making
 progress on the other big show-stopper problem and might have a resolution
 to that as early as tomorrow (if a patch doesn't work at least we know
 the root cause of a hang and can tell people how to avoid it; but a
 patch that might fix the cause of the hang is being tested).  We're
 thinking about doing a mini-RC2 just to test out these things and
 hopefully be ready to re-roll the release when the conference is over
 and people return home.
 
 Knowing the fix, it's trivially easy to trigger something that should not
 happen:
 
 	% gdb prog	# can be anything
 	gdb> break main
 	gdb> run
 	gdb> ^Z
 	% ps
 	% kill <pid of gdb>
 
 The process for prog will be left behind and can't be killed.  With David's
 patch, the process for prog dies along with gdb.
 
 Thanks...
 
 -- 
 						Ken Smith
 

From: Ken Smith <kensmith@cse.Buffalo.EDU>
To: Ken Smith <kensmith@cse.Buffalo.EDU>
Cc: David Xu <davidxu@freebsd.org>,
	Mikhail Teterin <mi+mx@aldan.algebra.com>,
	Michael Nottebrock <michaelnottebrock@gmx.net>, re@freebsd.org,
	davidxu@t2t2.com, davidxu@viatech.com.cn,
	freebsd-gnats-submit@freebsd.org
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Tue, 26 Oct 2004 23:54:19 -0400

 On Tue, Oct 26, 2004 at 10:53:05PM -0400, Ken Smith wrote:
 
 > David,
 > 
 > I think I have successfully tested your fix.  Even if it doesn't wind
 > up fixing these problems it does fix something that shouldn't happen.
 > 
 > Please commit to RELENG_5 and RELENG_5_3.  If you are too busy to do
 > it can you let me know please?
 
 Sigh.  Hold off a wee bit please.
 
 That test I sent earlier was 100% reproducible and your patch fixed
 that case.  But I found someone else who was having similar problems
 and he reports he can still get stuck processes.  I'll spend some
 time trying to reproduce his case to see if I can reliably make
 it happen somehow.  He said this patch does change what happens:
 the process being debugged notices that it gets hit with a SIGHUP
 when he exits gdb, but it doesn't die.
 
 -- 
 						Ken Smith

From: Mikhail Teterin <mi+kde@aldan.algebra.com>
To: Ken Smith <kensmith@cse.buffalo.edu>
Cc: David Xu <davidxu@freebsd.org>,
	Mikhail Teterin <mi+mx@aldan.algebra.com>,
	Michael Nottebrock <michaelnottebrock@gmx.net>, re@freebsd.org,
	davidxu@t2t2.com, davidxu@viatech.com.cn,
	freebsd-gnats-submit@freebsd.org
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Wed, 27 Oct 2004 07:58:44 -0400

 On Tuesday 26 October 2004 10:53 pm, Ken Smith wrote:
  
 =  % gdb prog # can be anything
 =  gdb> break main
 =  gdb> run
 =  gdb> ^Z
 =  % ps
 =  % kill <pid of gdb>
 = 
 = Process for prog will be left behind and can't be killed. With David's
 = patch process for prog dies along with gdb.
 
 Shouldn't only gdb be killed in the above scenario, since prog is never
 explicitly killed?
 
 Or does killing the tracer always imply killing the tracee? Thanks!
 
  -mi
 

From: David Xu <davidxu@freebsd.org>
To: Mikhail Teterin <mi+kde@aldan.algebra.com>
Cc: Ken Smith <kensmith@cse.buffalo.edu>,
	Mikhail Teterin <mi+mx@aldan.algebra.com>,
	Michael Nottebrock <michaelnottebrock@gmx.net>, re@freebsd.org,
	davidxu@t2t2.com, davidxu@viatech.com.cn,
	freebsd-gnats-submit@freebsd.org
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Wed, 27 Oct 2004 20:13:11 +0800

 Mikhail Teterin wrote:
 
 >On Tuesday 26 October 2004 10:53 pm, Ken Smith wrote:
 > 
 >=  % gdb prog # can be anything
 >=  gdb> break main
 >=  gdb> run
 >=  gdb> ^Z
 >=  % ps
 >=  % kill <pid of gdb>
 >= 
 >= Process for prog will be left behind and can't be killed. With David's
 >= patch process for prog dies along with gdb.
 >
 >Should not just the gdb be killed in the above scenario -- without the
 >explicit killing of prog?
 >
 >Or does killing the tracer always imply killing the tracee? Thanks!
 >
 > -mi
 Killing the tracer should always kill the tracee as well, because the tracer
 may have changed the tracee's code or data; if the tracer dies and we let the
 tracee continue, it could otherwise crash with a segmentation fault.
 David Xu
 

From: Ken Smith <kensmith@cse.Buffalo.EDU>
To: Mikhail Teterin <mi+mx@aldan.algebra.com>
Cc: Ken Smith <kensmith@cse.Buffalo.EDU>,
	David Xu <davidxu@freebsd.org>,
	Michael Nottebrock <michaelnottebrock@gmx.net>, re@freebsd.org,
	davidxu@t2t2.com, davidxu@viatech.com.cn,
	freebsd-gnats-submit@freebsd.org
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Wed, 27 Oct 2004 14:23:40 -0400

 On Tue, Oct 26, 2004 at 12:07:47PM -0400, Mikhail Teterin wrote:
 > =On Tue, Oct 26, 2004 at 07:11:54PM +0800, David Xu wrote:
 > =That looks both promising (given symptoms) and low-risk.
 > =
 > =Mikhail et. al., will you have any problems testing that?
 > 
 > This problem was not easy to reproduce to begin with :-(
 > 
 > I don't even know, how it can be positively claimed gone. Shall we merge 
 > David's fix into RELENG_5 and hope for the best?
 > 
 >  -mi
 
 Has anyone had time to test David's patch?  If yes have you had
 any problems since?
 
 I should know later today if the other person who had been reporting
 problems has had them stop.  He reported last night he was still able
 to reproduce the problem but he hasn't had any "luck" doing it today.
 
 David, my only real question, I guess, is whether P_STOPPED_SINGLE
 should also be added to the flags that are being turned off by your patch.
 If I'm following the code right, what made P_STOPPED_TRACE special
 was that it's one of the three flags lumped into P_STOPPED, and the code
 in do_tdsignal() looks for that, as, I'm sure, do the schedulers...
 
 -- 
 						Ken Smith

From: David Xu <davidxu@freebsd.org>
To: Ken Smith <kensmith@cse.Buffalo.EDU>
Cc: Mikhail Teterin <mi+mx@aldan.algebra.com>,
	Michael Nottebrock <michaelnottebrock@gmx.net>, re@freebsd.org,
	davidxu@t2t2.com, davidxu@viatech.com.cn,
	freebsd-gnats-submit@freebsd.org
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Thu, 28 Oct 2004 07:19:57 +0800

 Ken Smith wrote:
 
 >On Tue, Oct 26, 2004 at 12:07:47PM -0400, Mikhail Teterin wrote:
 >  
 >
 >>=On Tue, Oct 26, 2004 at 07:11:54PM +0800, David Xu wrote:
 >>=That looks both promising (given symptoms) and low-risk.
 >>=
 >>=Mikhail et. al., will you have any problems testing that?
 >>
 >>This problem was not easy to reproduce to begin with :-(
 >>
 >>I don't even know, how it can be positively claimed gone. Shall we merge 
 >>David's fix into RELENG_5 and hope for the best?
 >>
 >> -mi
 >
 >Has anyone had time to test David's patch?  If yes have you had
 >any problems since?
 >
 >I should know later today if the other person who had been reporting
 >problems has had them stop.  He reported last night he was still able
 >to reproduce the problem but he hasn't had any "luck" doing it today.
 >
 >David, my only real question I guess is whether perhaps	P_STOPPED_SINGLE
 >should be added to the flags that are being turned off by your patch.
 >If I'm following the code right what had made P_STOPPED_TRACE special
 >was it's one of the three signals lumped into P_STOPPED and the code
 >in do_tdsignal() looks for that, as well as I'm sure the schedulers...
 >
 P_STOPPED_SINGLE needn't be removed here. In fact, the flag should
 already be turned off at this point; if it is still set, there must be a
 bug, because thread_single(SINGLE_EXIT) should turn it off if it returns,
 or else the thread will exit inside that function.
 P_STOPPED_TRACE is cleared by the debugger via PT_DETACH in sys_process.c, but
 if the debugger dies without a chance to call ptrace(PT_DETACH), then exit1()
 must clear it.
 
 David Xu
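For readers following the flag discussion above, here is a toy user-space model in C. It assumes, as Ken's question implies, that r1.251 adds P_STOPPED_TRACE to the flags exit1() clears on a tracee whose debugger exits, and it reduces signal delivery to one simplified rule: while P_STOPPED_TRACE is set, SIGKILL only stays pending. The flag names mirror sys/proc.h, but their values, struct toy_proc, toy_sigkill() and toy_tracer_exit() are hypothetical; this is not kernel code.

```c
/* Toy model only: flag names follow FreeBSD's sys/proc.h, but the
 * values and the delivery rule below are simplified for illustration. */
#define P_TRACED          0x00800
#define P_STOPPED_SIG     0x20000
#define P_STOPPED_TRACE   0x40000
#define P_STOPPED_SINGLE  0x80000
#define P_STOPPED (P_STOPPED_SIG | P_STOPPED_SINGLE | P_STOPPED_TRACE)

struct toy_proc {
    int p_flag;
    int killed;          /* 1 once SIGKILL has actually taken effect */
};

/* Simplified SIGKILL delivery: while the process is stopped for its
 * debugger, the signal is merely left pending for the debugger to see.
 * Any other kind of stop is overridden and the process dies. */
static void toy_sigkill(struct toy_proc *q)
{
    if (q->p_flag & P_STOPPED_TRACE)
        return;                      /* stays in STOP: the reported hang */
    q->p_flag &= ~P_STOPPED;
    q->killed = 1;
}

/* Hypothetical stand-in for what exit1() does to a tracee when its
 * debugger exits without calling ptrace(PT_DETACH).
 * clear_trace_stop = 0 models the old behaviour, 1 the patched one. */
static void toy_tracer_exit(struct toy_proc *q, int clear_trace_stop)
{
    if (q->p_flag & P_TRACED) {
        q->p_flag &= ~P_TRACED;
        if (clear_trace_stop)
            q->p_flag &= ~P_STOPPED_TRACE;
        toy_sigkill(q);
    }
}
```

With clear_trace_stop = 0 the tracee keeps P_STOPPED_TRACE and killed stays 0, which matches the unkillable `STOP' state in this PR; with 1 the SIGKILL takes effect. P_STOPPED_SINGLE is left alone, in line with the point above that thread_single() should already have cleared it.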
 

From: Ken Smith <kensmith@cse.Buffalo.EDU>
To: David Xu <davidxu@freebsd.org>
Cc: Ken Smith <kensmith@cse.Buffalo.EDU>,
	Mikhail Teterin <mi+mx@aldan.algebra.com>,
	Michael Nottebrock <michaelnottebrock@gmx.net>, re@freebsd.org,
	davidxu@t2t2.com, davidxu@viatech.com.cn,
	freebsd-gnats-submit@freebsd.org
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Wed, 27 Oct 2004 20:24:02 -0400

 
 On Thu, Oct 28, 2004 at 07:19:57AM +0800, David Xu wrote:
 
 > P_STOPPED_SINGLE needn't be removed here, in fact, the flag should
 > already be
 > turned off at this point, if the flag is still there, then there must be a
 > bug, thread_single(SINGLE_EXIT) should turn it off if it returns or thread
 > will exit in the function.
 > P_STOPPED_TRACE is cleared by debugger via PT_DETACH in sys_process.c, but
 > if debugger dies without chance to call ptrace(PT_DETACH), then exit1() must
 > clear it.
 
 Thank you for the explanation.
 
 Please go ahead with the MFC of your patch as-is.  I'm sorry to have
 been such a pain in the butt about this.  If you have any questions
 let re@ know please.  It can go to both RELENG_5 and RELENG_5_3 at
 this point.
 
 Thanks for all your help with this.
 
 -- 
 						Ken Smith
 

From: Mark Wolgemuth <mark@employease.com>
To: freebsd-gnats-submit@FreeBSD.org
Cc:  
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Tue, 12 Apr 2005 12:34:58 -0400

 I just ran into this unkillable-process behavior, with threads stuck in
 'STOP', on this setup:
 
 5.3-RELEASE-p7
 
 My kernel is GENERIC + SMP option + ipfw options.
 
 I run these packages inside a jail. The jail is a clone of the host 
 system, plus the packages.
 The host system has only the cyrus-sasl pkgs to support smtp/auth, 
 nothing else. The jail has these:
 
 pkg: spamass-milter:
 spamass-milter-0.2.0_5
 
 pkgs for spamassassin:
 p5-Digest-HMAC-1.01 Perl5 interface to HMAC Message-Digest Algorithms
 p5-Digest-SHA1-2.10 Perl interface to the SHA-1 Algorithm
 p5-HTML-Parser-3.38 Perl5 module for parsing HTML documents
 p5-HTML-Tagset-3.04 Some useful data table in parsing HTML
 p5-Mail-SpamAssassin-3.0.2 A highly efficient mail filter for identifying spam
 p5-Mail-Tools-1.66  Perl5 modules for dealing with Internet e-mail messages
 p5-Net-DNS-0.48     Perl5 interface to the DNS resolver, and dynamic updates
 p5-URI-1.35         Perl5 interface to Uniform Resource Identifier (URI) refere
 perl-5.8.6_2        Practical Extraction and Report Language
 
 The error occurred between "spamass-milter", which uses libkse for
 threads, and spamc, a child process that communicates with spamd. They
 share a socket file.
 It appears that a thread spawned a spamc process that hung.
 
 Attempting to kill the process for spamass-milter left all threads in 
 "STOP" state. Attempting to kill spamc proc left a <defunct>. At this 
 point I was stuck and (stupidly) tried attaching gdb to spamass-milter, 
 which locked the entire system.
 

From: David Xu <davidxu@freebsd.org>
To: bug-followup@freebsd.org, mi@aldan.algebra.com
Cc:  
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Fri, 11 Nov 2005 17:10:33 +0800

 You should find and kill the zombie process's parent; if the parent
 is also a zombie, you should find its grandparent, and so on.
 It is normal UNIX behavior that you cannot kill a zombie process; you
 have to kill its ancestor instead. I would think that you have encountered
 some buggy programs which forgot to reap their children.
 
 David Xu
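The zombie behaviour described above can be reproduced with plain POSIX calls. In this sketch (zombie_demo() is a hypothetical helper, not part of any FreeBSD tool), the parent waits for its child's exit with waitid() and WNOWAIT, which leaves the child unreaped, i.e. a zombie; a SIGKILL sent to it has no effect, and only the parent's waitpid() removes it. Note that in the original report the process stuck in `T' was itself the zombie's parent.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for waitid() and WNOWAIT */
#endif
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a zombie, show that SIGKILL does not remove it, then reap it.
 * Returns the child's exit status, or -1 on error. */
static int zombie_demo(int code)
{
    siginfo_t info;
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                 /* child exits at once */

    /* Wait for the exit but leave the child unreaped (WNOWAIT): from
     * here on it is a zombie, shown as Z / <defunct> by ps. */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return -1;

    /* The signal is accepted but changes nothing: a zombie has no code
     * left to run, only an exit status waiting for its parent. */
    (void)kill(pid, SIGKILL);

    /* Only the parent's wait removes the zombie from the process table. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, zombie_demo(7) returns 7: the exit status recorded before the SIGKILL is what the parent finally collects.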
 

From: David Xu <davidxu@freebsd.org>
To: bug-followup@freebsd.org, mi@aldan.algebra.com
Cc:  
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Fri, 11 Nov 2005 17:16:14 +0800

 I think I should close the PR; the problem was fixed.
State-Changed-From-To: open->closed 
State-Changed-By: davidxu 
State-Changed-When: Fri Nov 11 13:12:25 GMT 2005 
State-Changed-Why:  
Fixed. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=72979 

From: "Andrew Pantyukhin" <infofarmer@FreeBSD.org>
To: bug-followup@FreeBSD.org, "David Xu" <davidxu@freebsd.org>, 
	"Mikhail Teterin" <mi@freebsd.org>
Cc:  
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Mon, 1 Jan 2007 22:12:15 +0300

 I've got picard (a very fat python app) in this very
 state, on latest current. I tried killing its parent
 (zsh) which only reassigned it to ppid 1. I won't try
 to kill that one :-)
 
 Tell me I'm on crack or I'll reopen this PR.
 
 Thanks!

From: David Xu <davidxu@freebsd.org>
To: "Andrew Pantyukhin" <infofarmer@freebsd.org>
Cc: bug-followup@freebsd.org,
 "Mikhail Teterin" <mi@freebsd.org>
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Tue, 2 Jan 2007 10:41:53 +0800

 On Tuesday 02 January 2007 03:12, Andrew Pantyukhin wrote:
 > I've got picard (a very fat python app) in this very
 > state, on latest current. I tried killing its parent
 > (zsh) which only reassigned it to ppid 1. I won't try
 > to kill that one :-)
 >
 > Tell me I'm on crack or I'll reopen this PR.
 >
 > Thanks!
 
 Can you reproduce it on 6.2 RC?
 
 David Xu

From: "Andrew Pantyukhin" <infofarmer@FreeBSD.org>
To: "David Xu" <davidxu@freebsd.org>
Cc: bug-followup@freebsd.org, "Mikhail Teterin" <mi@freebsd.org>
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Tue, 2 Jan 2007 05:55:50 +0300

 On 1/2/07, David Xu <davidxu@freebsd.org> wrote:
 > On Tuesday 02 January 2007 03:12, Andrew Pantyukhin wrote:
 > > I've got picard (a very fat python app) in this very
 > > state, on latest current. I tried killing its parent
 > > (zsh) which only reassigned it to ppid 1. I won't try
 > > to kill that one :-)
 > >
 > > Tell me I'm on crack or I'll reopen this PR.
 > >
 > > Thanks!
 >
 > Can you reproduce it on 6.2 RC ?
 
 It's hard to reproduce it even as it is. Actually, I think
 it's something current-ish. I've been experiencing some
 instabilities for a week or two I think. I'll keep updating
 and tell you if I stumble over it again.
 
 Thanks!

From: David Xu <davidxu@freebsd.org>
To: "Andrew Pantyukhin" <infofarmer@freebsd.org>
Cc: bug-followup@freebsd.org,
 "Mikhail Teterin" <mi@freebsd.org>
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Tue, 2 Jan 2007 12:13:05 +0800

 On Tuesday 02 January 2007 10:55, Andrew Pantyukhin wrote:
 
 > It's hard to reproduce it even as it is. Actually, I think
 > it's something current-ish. I've been experiencing some
 > instabilities for a week or two I think. I'll keep updating
 > and tell you if I stumble over it again.
 >
 > Thanks!
 
 Do you know whether the python process is multi-threaded?
 

From: "Andrew Pantyukhin" <infofarmer@FreeBSD.org>
To: "David Xu" <davidxu@freebsd.org>
Cc: bug-followup@freebsd.org, "Mikhail Teterin" <mi@freebsd.org>
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Tue, 2 Jan 2007 09:26:02 +0300

 On 1/2/07, David Xu <davidxu@freebsd.org> wrote:
 > On Tuesday 02 January 2007 10:55, Andrew Pantyukhin wrote:
 >
 > > It's hard to reproduce it even as it is. Actually, I think
 > > it's something current-ish. I've been experiencing some
 > > instabilities for a week or two I think. I'll keep updating
 > > and tell you if I stumble over it again.
 > >
 > > Thanks!
 >
 > Do you know the python process is multi-threaded or not ?
 
 It is. Actually, before rebooting back then I tried to
 kill another process (the audacious music player), which
 ended up in the very same state, so I believe it was a
 temporary system-wide condition, though all other processes
 ended successfully.

From: David Xu <davidxu@freebsd.org>
To: "Andrew Pantyukhin" <infofarmer@freebsd.org>
Cc: bug-followup@freebsd.org,
 "Mikhail Teterin" <mi@freebsd.org>
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Tue, 2 Jan 2007 17:17:10 +0800

 On Tuesday 02 January 2007 14:26, Andrew Pantyukhin wrote:
 
 > >
 > > Do you know the python process is multi-threaded or not ?
 >
 > It is. Actually, before rebooting back then I tried to
 > kill another process (the audacious music player), which
 > ended up in the very same state, so I believe it was a
 > temporary system-wide condition, though all other processes
 > ended successfully.
 
 Can you try the following patch?
 
 Index: kern_exit.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/kern/kern_exit.c,v
 retrieving revision 1.294
 diff -u -r1.294 kern_exit.c
 --- kern_exit.c	25 Oct 2006 06:18:04 -0000	1.294
 +++ kern_exit.c	2 Jan 2007 09:15:10 -0000
 @@ -413,8 +413,12 @@
  	 */
  	sx_xlock(&proctree_lock);
  	q = LIST_FIRST(&p->p_children);
 -	if (q != NULL)		/* only need this if any child is S_ZOMB */
 +	if (q != NULL) {	/* only need this if any child is S_ZOMB */
 +		PROC_LOCK(initproc);
 +		initproc->p_flag |= P_STATCHILD;
  		wakeup(initproc);
 +		PROC_UNLOCK(initproc);
 +	}
  	for (; q != NULL; q = nq) {
  		nq = LIST_NEXT(q, p_sibling);
  		PROC_LOCK(q);

From: David Xu <davidxu@freebsd.org>
To: "Andrew Pantyukhin" <infofarmer@freebsd.org>
Cc: bug-followup@freebsd.org,
 "Mikhail Teterin" <mi@freebsd.org>
Subject: Re: kern/72979: unkillable process(es) stuck in `STOP' state
Date: Tue, 2 Jan 2007 19:09:15 +0800

 I have updated the patch, please use this patch instead:
 
 Index: kern_exit.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/kern/kern_exit.c,v
 retrieving revision 1.294
 diff -u -r1.294 kern_exit.c
 --- kern_exit.c	25 Oct 2006 06:18:04 -0000	1.294
 +++ kern_exit.c	2 Jan 2007 11:06:56 -0000
 @@ -413,8 +413,12 @@
  	 */
  	sx_xlock(&proctree_lock);
  	q = LIST_FIRST(&p->p_children);
 -	if (q != NULL)		/* only need this if any child is S_ZOMB */
 +	if (q != NULL) {	/* only need this if any child is S_ZOMB */
 +		PROC_LOCK(initproc);
  		wakeup(initproc);
 +		psignal(initproc, SIGCHLD);
 +		PROC_UNLOCK(initproc);
 +	}
  	for (; q != NULL; q = nq) {
  		nq = LIST_NEXT(q, p_sibling);
  		PROC_LOCK(q);
 @@ -479,13 +483,16 @@
  	} else
  		mtx_unlock(&p->p_pptr->p_sigacts->ps_mtx);
  
 -	if (p->p_pptr == initproc)
 -		psignal(p->p_pptr, SIGCHLD);
 -	else if (p->p_sigparent != 0) {
 +	if (p->p_pptr == initproc) {
 +		wakeup(initproc);
 +		psignal(initproc, SIGCHLD);
 +	} else if (p->p_sigparent != 0) {
  		if (p->p_sigparent == SIGCHLD)
  			childproc_exited(p);
 -		else	/* LINUX thread */
 +		else {	/* LINUX thread */
 +			wakeup(p->p_pptr);
  			psignal(p->p_pptr, p->p_sigparent);
 +		}
  	}
  	PROC_UNLOCK(p->p_pptr);
  	PROC_UNLOCK(p);
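The updated patch makes exit1() both wake init and post SIGCHLD to it, both when the dying process has children to hand over to init and when the dying process is itself a child of init. As a rough user-space analogue, not init's actual code, the sketch below shows the usual way a parent turns SIGCHLD notifications into reaping: the handler drains every exited child with waitpid(-1, ..., WNOHANG), because several exits can be coalesced into one pending SIGCHLD. The names on_sigchld(), reap_with_sigchld() and reaped are hypothetical.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped;   /* children reaped by the handler */

/* One pending SIGCHLD can stand for several exits, so keep calling
 * waitpid() until no more exited children are left. */
static void on_sigchld(int sig)
{
    int saved_errno = errno;
    (void)sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/* Fork n children that exit immediately and sleep until the SIGCHLD
 * handler has reaped all of them. Returns the number reaped, or -1. */
static int reap_with_sigchld(int n)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask;
    int spawned = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDSTOP;          /* exits only, not stops */
    if (sigaction(SIGCHLD, &sa, &old_sa) != 0)
        return -1;

    /* Keep SIGCHLD blocked except inside sigsuspend(), so no
     * notification can slip in between testing `reaped` and sleeping. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);

    reaped = 0;
    for (; spawned < n; spawned++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0)
            _exit(0);
    }
    while (reaped < spawned)
        sigsuspend(&wait_mask);  /* unblock SIGCHLD and wait atomically */

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);
    return spawned == n ? (int)reaped : -1;
}
```

Blocking SIGCHLD around the check-then-sleep step is the design choice that matters here: a notification that is lost, or never sent, leaves zombies nobody collects, which is the class of problem the patch guards against for init.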
>Unformatted:
