From ps@mu.org  Sun Jul  3 00:38:19 2005
Return-Path: <ps@mu.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 8523516A547;
	Sun,  3 Jul 2005 00:38:17 +0000 (GMT)
	(envelope-from ps@mu.org)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
	by mx1.FreeBSD.org (Postfix) with ESMTP id E81FD4411C;
	Sun,  3 Jul 2005 00:23:32 +0000 (GMT)
	(envelope-from ps@mu.org)
Received: by elvis.mu.org (Postfix, from userid 1000)
	id 8A52F62608; Sat,  2 Jul 2005 17:20:20 -0700 (PDT)
Received: from mx2.freebsd.org (mx2.freebsd.org [216.136.204.119])
	by elvis.mu.org (Postfix) with ESMTP id 461A25C900
	for <ps@mu.org>; Sat,  5 Mar 2005 17:21:08 -0800 (PST)
Received: from hub.freebsd.org (hub.freebsd.org [216.136.204.18])
	by mx2.freebsd.org (Postfix) with ESMTP
	id C472D567D5; Sun,  6 Mar 2005 01:20:50 +0000 (GMT)
	(envelope-from owner-freebsd-current@freebsd.org)
Received: from hub.freebsd.org (localhost [127.0.0.1])
	by hub.freebsd.org (Postfix) with ESMTP
	id 8302716A513; Sun,  6 Mar 2005 01:20:45 +0000 (GMT)
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 02E4816A4D6; Sun,  6 Mar 2005 01:20:41 +0000 (GMT)
Received: from bloodwood.hunterlink.net.au (smtp-local.hunterlink.net.au
	[203.12.144.17])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 4F9FD43D58; Sun,  6 Mar 2005 01:20:37 +0000 (GMT)
	(envelope-from boris@brooknet.com.au)
Received: from localhost (ppp2DA6.dyn.pacific.net.au [61.8.45.166])
	j261KTlt010181;	Sun, 6 Mar 2005 12:20:30 +1100
Received: by localhost (Postfix, from userid 1001)
	id 701FB17D8; Sun,  6 Mar 2005 12:21:46 +1100 (EST)
Message-Id: <20050306012146.701FB17D8@localhost>
Date: Sun,  6 Mar 2005 12:21:46 +1100 (EST)
From: Sam Lawrance <boris@brooknet.com.au>
Sender: owner-freebsd-current@freebsd.org
Reply-To: Sam Lawrance <boris@brooknet.com.au>
To: FreeBSD-gnats-submit@freebsd.org
Cc: current@freebsd.org
Subject: Swapped out procs not brought in immediately after child exits
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         82910
>Category:       kern
>Synopsis:       Swapped out procs not brought in immediately after child exits
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Jul 03 00:40:25 GMT 2005
>Closed-Date:    Sun Jul 03 02:40:02 GMT 2005
>Last-Modified:  Sun Jul 03 02:40:02 GMT 2005
>Originator:     Sam Lawrance
>Release:        FreeBSD 5.4-PRERELEASE i386
>Organization:
>Environment:
System: FreeBSD dirk.no.domain 5.4-PRERELEASE FreeBSD 5.4-PRERELEASE #10: Sun Mar 6 10:45:13 EST 2005 root@dirk.no.domain:/usr/testbuild/src5/sys/i386/compile/GENERIC i386


>Description:

I run -stable on my lonely box, but AFAICS this also affects -current.

This problem is similar in flavour to one that I reported a while ago,
which has since been fixed.

Here's an example. Below we have a login, shell and su which have
swapped out, and a shell which is active:

root 4291  0.0  0.0  1664     0  v3  IWs  -         0:00.00 login [pam] (login)
sam  4298  0.0  0.0  2260     0  v3  IW   -         0:00.00 -bash (bash)
root 4299  0.0  0.0  1644     0  v3  IW   -         0:00.00 su
root 4300  0.0  0.4  2952  1132  v3  S+    3:23PM   0:00.66 su (bash)

When 4300 exits, it will sit in the zombie state for a long
time, waiting for 4299 to be swapped in.  The same then happens in
turn when 4299 and 4298 exit.

The kernel call stack for 4300 would be something like

	exit1
	  kern_exit
	    wakeup (parent process as wait channel)
	      sleepq_broadcast
	        sleepq_resume_thread (on parent process)
	          setrunnable

In setrunnable, curthread->td_pflags is flagged with TDP_WAKEPROC0 to
indicate the vm scheduler should be awoken to do its thing.

David Xu's original change was to check for TDP_WAKEPROC0 in
critical_exit() and wakeup(&proc0) from there. Things were arranged
this way in order to prevent an LOR between sched_lock and sleepqueue
locks.

That scheme doesn't take into account that exit1() does a
critical_enter() that has no corresponding critical_exit() in that
thread (because the exiting thread grabs sched_lock which is held across
cpu_throw).

So the wakeup is not done, and we just have to wait for the vm's tsleep
on proc0 to time out. The same thing might occur across other exit
points, but I don't know what they are.
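
To make the sequence above concrete, here is a rough userland model of
it.  It is only a sketch, not kernel code: every model_* name is made
up for illustration, wake_pending stands in for TDP_WAKEPROC0, and the
model assumes the deferred wakeup is delivered only when the outermost
critical section is left.  The second critical_enter in the exit path
stands in for the nesting held while sched_lock is taken, which is also
an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one thread's critical-section state. */
struct model_thread {
	int  critnest;      /* critical section nesting depth */
	bool wake_pending;  /* stands in for TDP_WAKEPROC0 */
};

static int proc0_wakeups;  /* counts simulated wakeup(&proc0) calls */

void model_wakeup_proc0(void) { proc0_wakeups++; }

void model_critical_enter(struct model_thread *td) { td->critnest++; }

/* Assumption: the deferred wakeup fires only when nesting drops to 0. */
void model_critical_exit(struct model_thread *td)
{
	if (--td->critnest == 0 && td->wake_pending) {
		td->wake_pending = false;
		model_wakeup_proc0();
	}
}

/* setrunnable() on a swapped-out process: only flag the wakeup. */
void model_setrunnable(struct model_thread *td) { td->wake_pending = true; }

/* Ordinary path: enter and exit are balanced, so the wakeup arrives. */
int model_normal_path(void)
{
	struct model_thread td = { 0, false };

	proc0_wakeups = 0;
	model_critical_enter(&td);
	model_setrunnable(&td);
	model_critical_exit(&td);
	return proc0_wakeups;
}

/* Exit path: the thread is thrown away while nesting is still nonzero. */
int model_exit_path(bool direct_wakeup)
{
	struct model_thread td = { 0, false };

	proc0_wakeups = 0;
	model_critical_enter(&td);     /* exit1()'s critical_enter */
	model_setrunnable(&td);        /* wakeup(parent) -> setrunnable */
	if (direct_wakeup)
		model_wakeup_proc0();  /* explicit wakeup(&proc0) */
	model_critical_enter(&td);     /* nesting held with sched_lock */
	model_critical_exit(&td);      /* depth 2 -> 1: nothing delivered */
	/* cpu_throw would happen here; depth never reaches 0 */
	return proc0_wakeups;
}

/* Checks the three outcomes the description predicts. */
void model_self_check(void)
{
	assert(model_normal_path() == 1);
	assert(model_exit_path(false) == 0);  /* wakeup is lost */
	assert(model_exit_path(true) == 1);   /* explicit wakeup gets through */
}
```

In this model the exit path without the explicit call never wakes
proc0, which matches the long zombie wait seen above; calling the
wakeup directly right after waking the parent gets it through.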

>How-To-Repeat:

Run a shell somewhere (first). Su or run another shell or similar (second).
Wait until the first shell has swapped out (might require running some other
memory hogs). Exit the second shell. Notice that the second shell takes a
long time to exit.

>Fix:

A possible solution might be to wakeup(&proc0) after waking the parent
and before grabbing sched_lock:

Index: kern_exit.c
===================================================================
RCS file: /home/ncvs/FreeBSD/src/sys/kern/kern_exit.c,v
retrieving revision 1.256
diff -u -r1.256 kern_exit.c
--- kern_exit.c	29 Jan 2005 14:03:41 -0000	1.256
+++ kern_exit.c	6 Mar 2005 01:17:35 -0000
@@ -503,6 +503,7 @@
 	mtx_unlock_spin(&sched_lock);
 	wakeup(p->p_pptr);
 	PROC_UNLOCK(p->p_pptr);
+	wakeup(&proc0);
 	mtx_lock_spin(&sched_lock);
 	critical_exit();
 

>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->closed 
State-Changed-By: lawrance 
State-Changed-When: Sun Jul 3 02:38:03 GMT 2005 
State-Changed-Why:  
Hmm, I didn't send this.  Dupe of 78474. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=82910 
>Unformatted:
