From nobody@FreeBSD.org  Mon Jan 22 06:35:31 2007
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 0CE9616A400
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 22 Jan 2007 06:35:31 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [69.147.83.33])
	by mx1.freebsd.org (Postfix) with ESMTP id F1D7513C45A
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 22 Jan 2007 06:35:30 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.13.1/8.13.1) with ESMTP id l0M6ZUoM001699
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 22 Jan 2007 06:35:30 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.13.1/8.13.1/Submit) id l0M6ZUZC001698;
	Mon, 22 Jan 2007 06:35:30 GMT
	(envelope-from nobody)
Message-Id: <200701220635.l0M6ZUZC001698@www.freebsd.org>
Date: Mon, 22 Jan 2007 06:35:30 GMT
From: Craig Rodrigues <rodrigc@crodrigues.org>
To: freebsd-gnats-submit@FreeBSD.org
Subject: MOKB testcase for kqueue can cause kernel panic
X-Send-Pr-Version: www-3.0

>Number:         108201
>Category:       kern
>Synopsis:       [kqueue] MOKB testcase for kqueue can cause kernel panic
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kib
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jan 22 06:40:17 GMT 2007
>Closed-Date:    Thu Jul 24 13:48:40 UTC 2008
>Last-Modified:  Thu Jul 24 13:48:40 UTC 2008
>Originator:     Craig Rodrigues
>Release:        CURRENT
>Organization:
>Environment:
FreeBSD 7.0-CURRENT FreeBSD 7.0-CURRENT #35: Sun Jan 21 23:32:23 EST 2007
>Description:
The attached testcase from "Month of Kernel Bugs"
 http://projects.info-pull.com/mokb/MOKB-24-11-2006.html

causes the following panic on my system:

panic: mutex kqueue own at /usr/src/sys/kern/kern_event.c: 1069

I cannot get a proper gdb backtrace.  The ddb stack trace looks like:

kqueue_expand()
kqueue_register()
filt_proc()
knote()
fork()
fork()
syscall()
>How-To-Repeat:

>Fix:


Patch attached with submission follows:

/*
 *  Obtained from:
 *  http://projects.info-pull.com/mokb/MOKB-24-11-2006.html
 */

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <stdio.h>
#include <unistd.h>

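int main(void) {
	struct kevent ke;
	int kq;

	kq = kqueue();
	if (kq == -1) {
		perror("kqueue");
		return (1);
	}

	/* Watch our own process; NOTE_TRACK asks to follow children across fork(). */
	EV_SET(&ke, getpid(), EVFILT_PROC, EV_ADD,
	    NOTE_EXIT|NOTE_EXEC|NOTE_TRACK, 0, NULL);

	if (kevent(kq, &ke, 1, NULL, 0, NULL) == -1) {
		perror("kevent");
		return (1);
	}

	/* fork() delivers NOTE_FORK to the tracked knote; the parent then waits for one event. */
	if (fork() != 0)
		kevent(kq, NULL, 0, &ke, 1, NULL);

	return (0);
}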

>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->jmg 
Responsible-Changed-By: jmg 
Responsible-Changed-When: Thu Jan 25 02:27:42 UTC 2007 
Responsible-Changed-Why:  
I'll take this kqueue PR... 

http://www.freebsd.org/cgi/query-pr.cgi?pr=108201 

From: John-Mark Gurney <gurney_j@resnet.uoregon.edu>
To: freebsd-gnats-submit@FreeBSD.org, Craig Rodrigues <rodrigc@crodrigues.org>
Cc:  
Subject: PR/108201
Date: Sat, 30 Jun 2007 00:39:06 -0700

 --azLHFNyN32YCQGCU
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 
 Could you try the attached patch?  It turns out that knote locks the
 kq, and kqueue_expand is called to ensure there is enough space to add
 a new kevent.  kqueue_expand was asserting that the KQ lock was not
 held, instead of first checking whether any expansion is needed and
 returning immediately.  Since we already have the space allocated due
 to the original knote, the new test will return early.
 
 We could limit the test to just the (!fops->f_isfd && kq->kq_knhashmask !=
 0) case, but I decided to add the other case for completeness.
 
 Thanks.
 
 -- 
   John-Mark Gurney				Voice: +1 415 225 5579
 
      "All that I will do, has been done, All that I have, has not."
 
 --azLHFNyN32YCQGCU
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename="108201.patch"
 
 Index: kern_event.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/kern/kern_event.c,v
 retrieving revision 1.111
 diff -u -r1.111 kern_event.c
 --- kern_event.c	28 May 2007 17:15:05 -0000	1.111
 +++ kern_event.c	30 Jun 2007 07:38:26 -0000
 @@ -1059,6 +1059,15 @@
  	int fd;
  	int mflag = waitok ? M_WAITOK : M_NOWAIT;
  
 +	/*
 +	 * knote locks the KQ and filt_proc calls kqueue_register if _TRACK
 +	 * is set.  Return early so we don't assert KQ_NOTOWNED in this
 +	 * case.  We have a knote in the hash, so we have the table.
 +	 */
 +	if ((fops->f_isfd && kq->kq_knlistsize > ident) ||
 +	    (!fops->f_isfd && kq->kq_knhashmask != 0))
 +		return 0;
 +
  	KQ_NOTOWNED(kq);
  
  	if (fops->f_isfd) {
 
 --azLHFNyN32YCQGCU--
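
The early-return idea in the patch above can be illustrated outside the kernel. The following is only a minimal userland sketch, assuming a toy table with a boolean in place of the KQ lock; `toy_table` and `toy_expand` are hypothetical names and are not part of kern_event.c. It shows why checking "is there already enough space?" before the lock assertion lets a caller that holds the lock get through when no growth is needed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the kqueue tables; hypothetical, not the kernel struct. */
struct toy_table {
	bool	lock_held;	/* stand-in for the KQ lock state */
	size_t	capacity;	/* slots currently allocated */
};

/*
 * Hypothetical analogue of kqueue_expand() with the early return.
 * Returns 0 if enough space exists or was allocated, and -1 if growth
 * is needed while the lock is held.
 */
static int
toy_expand(struct toy_table *t, size_t ident)
{
	/* Early check: nothing to do, so the lock state does not matter. */
	if (t->capacity > ident)
		return (0);

	/* Growth path: the lock must not be held from here on. */
	if (t->lock_held)
		return (-1);	/* the kernel would panic in KQ_NOTOWNED() */

	t->capacity = ident + 1;	/* pretend to reallocate */
	return (0);
}
```

In this sketch a locked caller succeeds only when the table is already large enough, which mirrors the situation described in the message: the original knote guarantees the table exists.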

From: Craig Rodrigues <rodrigc@crodrigues.org>
To: John-Mark Gurney <gurney_j@resnet.uoregon.edu>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/108201: [kqueue] MOKB testcase for kqueue can cause kernel panic
Date: Sat, 7 Jul 2007 15:29:55 -0400

 On Sat, Jun 30, 2007 at 12:39:06AM -0700, John-Mark Gurney wrote:
 > Could you try the attached patch?  It turns out that knote locks the
 > kq, and kqueue_expand is called to ensure there is enough space to add
 
 Hi,
 
 The attached testcase at:
 http://www.freebsd.org/cgi/query-pr.cgi?pr=108201
 
 still causes a kernel panic with your patch,
 but the panic is now in a different place.
 
 Specifically I am getting this error:
 
 panic: _mtx_lock_sleep: recursed on non-recursive mutex kqueue @ /usr/src/sys/kern/kern_event.c: 851
 
 I can provide a fuller stack trace or kernel config if you want.
 
 -- 
 Craig Rodrigues        
 rodrigc@crodrigues.org
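
The "recursed on non-recursive mutex" failure reported above can be modelled in a few lines of userland C. This is a hedged, single-threaded sketch: `toy_mtx`, `toy_register` and `toy_notify` are hypothetical stand-ins for the kqueue mutex, kqueue_register() and the knote()/filt_proc() path, and the toy lock returns an error where the kernel would panic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a non-recursive mutex; not the kernel's struct mtx. */
struct toy_mtx {
	bool held;
};

/* Returns 0 on success, -1 if the lock is already held (the "recursed" case). */
static int
toy_lock(struct toy_mtx *m)
{
	if (m->held)
		return (-1);
	m->held = true;
	return (0);
}

static void
toy_unlock(struct toy_mtx *m)
{
	m->held = false;
}

/* Hypothetical analogue of kqueue_register(): takes the queue lock itself. */
static int
toy_register(struct toy_mtx *m)
{
	if (toy_lock(m) != 0)
		return (-1);
	/* ... a new entry would be added here ... */
	toy_unlock(m);
	return (0);
}

/*
 * Hypothetical analogue of knote() calling filt_proc(): the filter runs
 * with the queue lock held and re-enters toy_register(), which fails.
 */
static int
toy_notify(struct toy_mtx *m)
{
	int error;

	if (toy_lock(m) != 0)
		return (-1);
	error = toy_register(m);	/* re-entry while the lock is held */
	toy_unlock(m);
	return (error);
}
```

Calling `toy_register()` directly succeeds, while calling it through `toy_notify()` fails, which is the shape of the second panic.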

From: Kostik Belousov <kostikbel@gmail.com>
To: bug-followup@FreeBSD.org, rodrigc@crodrigues.org
Cc:  
Subject: Re: kern/108201: [kqueue] MOKB testcase for kqueue can cause kernel panic
Date: Wed, 7 May 2008 15:25:51 +0300

 --Se4vaFMzNav43fb+
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 Content-Transfer-Encoding: quoted-printable
 
 Yes, the patch only allows the code to proceed somewhat further. The
 underlying problem is the recursive call to kqueue_register() from
 filt_proc(). I tried to temporarily drop the kq lock around the call, but
 that only exposed lock order reversals (LORs), since knote() is called
 from fork() while holding the process lock.
 
 I believe the proper solution is to remove the offending call from
 filt_proc(), and instead register the event for the child process in a
 safe context.
 
 My attempt to implement it is below. At least, it does not panic for me.
 
 diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
 index aa3d4a8..0d536fc 100644
 --- a/sys/kern/kern_event.c
 +++ b/sys/kern/kern_event.c
 @@ -403,29 +403,62 @@ filt_proc(struct knote *kn, long hint)
  		return (1);
  	}
  
 -	/*
 -	 * process forked, and user wants to track the new process,
 -	 * so attach a new knote to it, and immediately report an
 -	 * event with the parent's pid.
 -	 */
 -	if ((event == NOTE_FORK) && (kn->kn_sfflags & NOTE_TRACK)) {
 -		struct kevent kev;
 -		int error;
 +	return (kn->kn_fflags != 0);
 +}
  
 -		/*
 -		 * register knote with new process.
 -		 */
 -		kev.ident = hint & NOTE_PDATAMASK;	/* pid */
 -		kev.filter = kn->kn_filter;
 -		kev.flags = kn->kn_flags | EV_ADD | EV_ENABLE | EV_FLAG1;
 -		kev.fflags = kn->kn_sfflags;
 -		kev.data = kn->kn_id;			/* parent */
 -		kev.udata = kn->kn_kevent.udata;	/* preserve udata */
 -		error = kqueue_register(kn->kn_kq, &kev, NULL, 0);
 -		if (error)
 -			kn->kn_fflags |= NOTE_TRACKERR;
 -	}
 +/*
 + * Called when the process forked. For each knote attached to the
 + * parent, check whether user wants to track the new process. If so
 + * attach a new knote to it, and immediately report an event with the
 + * parent's pid.
 + */
  
 -	return (kn->kn_fflags != 0);
 +void
 +knote_fork(struct knlist *list, int pid)
 +{
 +	struct kqueue *kq;
 +	struct knote *kn;
 +	struct kevent kev;
 +	int error;
 +
 +	if (list == NULL)
 +		return;
 +	list->kl_lock(list->kl_lockarg);
 +
 +	SLIST_FOREACH(kn, &list->kl_list, kn_selnext) {
 +		kq = kn->kn_kq;
 +		if ((kn->kn_status & KN_INFLUX) != KN_INFLUX) {
 +			KQ_LOCK(kq);
 +			if ((kn->kn_status & KN_INFLUX) != KN_INFLUX &&
 +			    (kn->kn_sfflags & NOTE_TRACK)) {
 +				kn->kn_status |= KN_INFLUX;
 +				KQ_UNLOCK(kq);
 +				list->kl_unlock(list->kl_lockarg);
 +
 +				/*
 +				 * register knote with new process.
 +				 */
 +				kev.ident = pid;
 +				kev.filter = kn->kn_filter;
 +				kev.flags = kn->kn_flags | EV_ADD | EV_ENABLE |
 +				    EV_FLAG1;
 +				kev.fflags = kn->kn_sfflags;
 +				kev.data = kn->kn_id;		/* parent */
 +				kev.udata = kn->kn_kevent.udata;/* preserve udata */
 +				error = kqueue_register(kq, &kev, NULL, 0);
 +				if (kn->kn_fop->f_event(kn, NOTE_FORK | pid))
 +					KNOTE_ACTIVATE(kn, 1);
 +				if (error)
 +					kn->kn_fflags |= NOTE_TRACKERR;
 +				KQ_LOCK(kq);
 +				kn->kn_status &= ~KN_INFLUX;
 +				KQ_UNLOCK_FLUX(kq);
 +				list->kl_lock(list->kl_lockarg);
 +			} else
 +				KQ_UNLOCK(kq);
 +		}
 +		kq = NULL;
 +	}
 +	list->kl_unlock(list->kl_lockarg);
  }
  
 diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c
 index 223f78a..38c4e91 100644
 --- a/sys/kern/kern_fork.c
 +++ b/sys/kern/kern_fork.c
 @@ -715,6 +715,8 @@ again:
  
  	PROC_UNLOCK(p1);
  
 +	knote_fork(&p1->p_klist, p2->p_pid);
 +
  	/*
  	 * Preserve synchronization semantics of vfork.  If waiting for
  	 * child to exec or exit, set P_PPWAIT on child, and sleep on our
 diff --git a/sys/sys/event.h b/sys/sys/event.h
 index 167056c..99f6bca 100644
 --- a/sys/sys/event.h
 +++ b/sys/sys/event.h
 @@ -205,6 +205,7 @@ struct proc;
  struct knlist;
  
  extern void	knote(struct knlist *list, long hint, int islocked);
 +extern void	knote_fork(struct knlist *list, int pid);
  extern void	knlist_add(struct knlist *knl, struct knote *kn, int islocked);
  extern void	knlist_remove(struct knlist *knl, struct knote *kn, int islocked);
  extern void	knlist_remove_inevent(struct knlist *knl, struct knote *kn);
 
 --Se4vaFMzNav43fb+--
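
The approach proposed in this message, moving the child registration out of the filter and into a hook that the fork path calls after dropping its locks, might be sketched in userland as follows. This is an illustration under stated assumptions rather than the kernel code: `toy_queue`, `toy_note`, `toy_register`, `toy_fork_hook` and `TOY_TRACK` are hypothetical names standing in for the kqueue, the knote, kqueue_register(), knote_fork() and NOTE_TRACK.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_TRACK	0x1	/* hypothetical stand-in for NOTE_TRACK */

struct toy_queue {
	bool	locked;		/* toy non-recursive lock */
	int	registered;	/* number of successful registrations */
};

struct toy_note {
	struct toy_queue *q;
	unsigned	  flags;
};

/* Must be entered with the queue lock free, like a top-level syscall path. */
static int
toy_register(struct toy_queue *q, int pid)
{
	(void)pid;		/* a real implementation would key on the pid */
	if (q->locked)
		return (-1);
	q->locked = true;
	q->registered++;
	q->locked = false;
	return (0);
}

/*
 * Hypothetical analogue of knote_fork(): called by the fork path after it
 * has dropped its own locks, so toy_register() is never re-entered.
 * Returns the number of tracked notes that failed to register.
 */
static int
toy_fork_hook(struct toy_note *notes, size_t n, int child_pid)
{
	int failed = 0;

	for (size_t i = 0; i < n; i++) {
		if ((notes[i].flags & TOY_TRACK) == 0)
			continue;	/* this observer did not ask to track */
		if (toy_register(notes[i].q, child_pid) != 0)
			failed++;
	}
	return (failed);
}
```

The design point is that registration happens from the caller's context with no queue lock held, so the registration routine can take whatever locks it needs in its usual order.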

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/108201: commit references a PR
Date: Mon,  7 Jul 2008 09:31:28 +0000 (UTC)

 kib         2008-07-07 09:30:11 UTC
 
   FreeBSD src repository
 
   Modified files:
     sys/kern             kern_event.c kern_fork.c 
     sys/sys              event.h 
   Log:
   SVN rev 180340 on 2008-07-07 09:30:11Z by kib
   
   The kqueue_register() function assumes that it is called from the top of
   the syscall code and acquires various event subsystem locks as needed.
   The handling of the NOTE_TRACK for EVFILT_PROC is currently done by
   calling the kqueue_register() from filt_proc() filter, causing recursive
   entrance of the kqueue code. This results in the LORs and recursive
   acquisition of the locks.
   
   Implement the variant of the knote() function designed to only handle
   the fork() event. It mostly copies the knote() body, but also handles
   the NOTE_TRACK, removing the handling from the filt_proc(), where it
   causes problems described above. The function is called from the fork1()
   instead of knote().
   
   When encountering NOTE_TRACK knote, it marks the knote as influx
   and drops the knlist and kqueue lock. In this context call to
   kqueue_register is safe from the problems.
   
   An error from the kqueue_register() is reported to the observer as
   NOTE_TRACKERR fflag.
   
   PR:     108201
   Reviewed by:    jhb, Pramod Srinivasan <pramod juniper net> (previous version)
   Discussed with: jmg
   Tested by:      pho
   MFC after:      2 weeks
   
   Revision  Changes    Path
   1.122     +67 -15    src/sys/kern/kern_event.c
   1.294     +2 -4      src/sys/kern/kern_fork.c
   1.39      +1 -0      src/sys/sys/event.h
 _______________________________________________
 cvs-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/cvs-all
 To unsubscribe, send any mail to "cvs-all-unsubscribe@freebsd.org"
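
The "mark the knote as influx, drop the locks, then call kqueue_register()" sequence from the commit log can be sketched in userland C. This is a simplified single-threaded model, not the committed code: `toy_kq`, `toy_knote`, `toy_track`, `TOY_INFLUX` and the two example callbacks are hypothetical, and the `trackerr` field only imitates how an error is reported as NOTE_TRACKERR.

```c
#include <stdbool.h>

#define TOY_INFLUX	0x1	/* hypothetical stand-in for KN_INFLUX */

struct toy_knote {
	unsigned status;
	int	 trackerr;	/* set when registration fails, like NOTE_TRACKERR */
};

struct toy_kq {
	bool locked;
};

static void toy_kq_lock(struct toy_kq *kq)   { kq->locked = true; }
static void toy_kq_unlock(struct toy_kq *kq) { kq->locked = false; }

/* Example callbacks: the first succeeds only if the queue lock is not held. */
static int toy_register_ok(struct toy_kq *kq)   { return (kq->locked ? -1 : 0); }
static int toy_register_fail(struct toy_kq *kq) { (void)kq; return (-1); }

/*
 * Returns 1 if the note was processed, 0 if it was skipped because another
 * path already had it in flux.  register_fn runs with the queue lock dropped.
 */
static int
toy_track(struct toy_kq *kq, struct toy_knote *kn,
    int (*register_fn)(struct toy_kq *))
{
	int error;

	toy_kq_lock(kq);
	if (kn->status & TOY_INFLUX) {
		toy_kq_unlock(kq);
		return (0);
	}
	kn->status |= TOY_INFLUX;	/* claim the note */
	toy_kq_unlock(kq);		/* drop the lock before re-entering */

	error = register_fn(kq);
	if (error != 0)
		kn->trackerr = 1;	/* report the failure to the observer */

	toy_kq_lock(kq);
	kn->status &= ~TOY_INFLUX;	/* release the claim */
	toy_kq_unlock(kq);
	return (1);
}
```

The in-flux flag is what keeps the note from being processed or freed by another path while the lock is dropped; the sketch models only the skip, since it is single-threaded.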
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/108201: commit references a PR
Date: Mon, 21 Jul 2008 10:00:26 +0000 (UTC)

 kib         2008-07-21 09:59:40 UTC
 
   FreeBSD src repository
 
   Modified files:        (Branch: RELENG_7)
     sys/kern             kern_event.c kern_fork.c 
     sys/sys              event.h 
   Log:
   SVN rev 180653 on 2008-07-21 09:59:40Z by kib
   
   MFC r180340:
   
   The kqueue_register() function assumes that it is called from the top of
   the syscall code and acquires various event subsystem locks as needed.
   The handling of the NOTE_TRACK for EVFILT_PROC is currently done by
   calling the kqueue_register() from filt_proc() filter, causing recursive
   entrance of the kqueue code. This results in the LORs and recursive
   acquisition of the locks.
   
   Implement the variant of the knote() function designed to only handle
   the fork() event. It mostly copies the knote() body, but also handles
   the NOTE_TRACK, removing the handling from the filt_proc(), where it
   causes problems described above. The function is called from the fork1()
   instead of knote().
   
   When encountering NOTE_TRACK knote, it marks the knote as influx
   and drops the knlist and kqueue lock. In this context call to
   kqueue_register is safe from the problems.
   
   An error from the kqueue_register() is reported to the observer as
   NOTE_TRACKERR fflag.
   
   PR:     108201
   
   Revision   Changes    Path
   1.113.2.4  +67 -15    src/sys/kern/kern_event.c
   1.282.2.4  +2 -4      src/sys/kern/kern_fork.c
   1.37.2.2   +1 -0      src/sys/sys/event.h
 
Responsible-Changed-From-To: jmg->kib 
Responsible-Changed-By: kib 
Responsible-Changed-When: Thu Jul 24 13:47:30 UTC 2008 
Responsible-Changed-Why:  
Take from jmg@. 

State-Changed-From-To: open->closed 
State-Changed-By: kib 
State-Changed-When: Thu Jul 24 13:47:55 UTC 2008 
State-Changed-Why:  
Patches are committed to HEAD and RELENG_7. 

>Unformatted:
Do you think you'll have time to look at this again before the release? It would be nice to not have a known 
panic present.
