From nobody@FreeBSD.org  Wed Mar  3 21:14:46 2010
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 70AC21065670
	for <freebsd-gnats-submit@FreeBSD.org>; Wed,  3 Mar 2010 21:14:46 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 44CC78FC0A
	for <freebsd-gnats-submit@FreeBSD.org>; Wed,  3 Mar 2010 21:14:46 +0000 (UTC)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id o23LEjYK082344
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 3 Mar 2010 21:14:45 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id o23LEj6h082341;
	Wed, 3 Mar 2010 21:14:45 GMT
	(envelope-from nobody)
Message-Id: <201003032114.o23LEj6h082341@www.freebsd.org>
Date: Wed, 3 Mar 2010 21:14:45 GMT
From: Alexander Sack <asack@niksun.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: bpf(4) can panic due to a race condition on descriptor destruction
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         144453
>Category:       kern
>Synopsis:       bpf(4) can panic due to a race condition on descriptor destruction
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    jkim
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Mar 03 21:20:03 UTC 2010
>Closed-Date:    Mon Mar 22 21:01:41 UTC 2010
>Last-Modified:  Mon Mar 22 21:01:41 UTC 2010
>Originator:     Alexander Sack
>Release:        CURRENT, 7.2-amd64
>Organization:
Niksun
>Environment:
SA5000PAL Intel board with 8GB of RAM, em(4) network interface card
>Description:
When an application polls on a bpf descriptor, a timeout handler, bpf_timed_out(), is scheduled via callout_reset().  If a buffer does not become available within the poll period, bpf_timed_out() fires, changes the bpf_d state, and wakes up any threads waiting for an event.  When bpf_timed_out() runs, it attempts to acquire the descriptor lock.

Now suppose a process is in the middle of a poll/select and exits (gracefully or otherwise).  When the bpf descriptor is closed, bpf_dtor() is called.  It acquires the descriptor lock and calls callout_stop() if the bpf state is BPF_WAITING (i.e. select was called and callout_reset() has finished scheduling the callout).  After calling callout_stop() it releases the descriptor lock, and at that point a race condition can occur.

If callout_stop() can't stop bpf_timed_out() from firing (say it has already fired), then bpf_timed_out() is sitting waiting on the descriptor lock. When bpf_dtor() drops the lock, bpf_timed_out() is allowed to continue. But bpf_dtor() goes on to free the descriptor that bpf_timed_out() is currently changing.  This can lead to a panic.

The attached patch addresses this by checking callout_active() and, if the callout is still active, calling callout_drain(), which waits until bpf_timed_out() has finished.  This allows bpf_dtor() to safely free the descriptor during the close operation.
>How-To-Repeat:
Run a large number of pollers on a descriptor under high load during a shutdown.
>Fix:
See the attached patch.  I tested this on my Intel machine by running 200 tcpdump processes with zerocopy both disabled and enabled (even though libpcap doesn't poll on the descriptor with zerocopy), capturing gigabit Ethernet traffic at 100% utilization.  No panic occurred during shutdown.  We also saw this panic with our own custom packet capture application, which is where I discovered and fixed the problem.

Patch attached with submission follows:

? bpf.patch
Index: bpf.c
===================================================================
RCS file: /home/ncvs/src/sys/net/bpf.c,v
retrieving revision 1.219
diff -u -r1.219 bpf.c
--- bpf.c	20 Feb 2010 00:19:21 -0000	1.219
+++ bpf.c	3 Mar 2010 21:04:48 -0000
@@ -614,6 +614,15 @@
 	mac_bpfdesc_destroy(d);
 #endif /* MAC */
 	knlist_destroy(&d->bd_sel.si_note);
+	/*
+	 * If we could not stop the callout above, 
+	 * then when we release the descriptor lock, 
+	 * there is a race between when bpf_timed_out() 
+	 * finishes and descriptor tear down.  Check
+	 * for it and drain.
+	 */
+	if (callout_active(&d->bd_callout))
+		callout_drain(&d->bd_callout);
 	bpf_freed(d);
 	free(d, M_BPF);
 }


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->jkim 
Responsible-Changed-By: jkim 
Responsible-Changed-When: Wed Mar 3 22:29:33 UTC 2010 
Responsible-Changed-Why:  
I will take it. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=144453 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/144453: commit references a PR
Date: Fri, 12 Mar 2010 19:15:12 +0000 (UTC)

 Author: jkim
 Date: Fri Mar 12 19:14:58 2010
 New Revision: 205092
 URL: http://svn.freebsd.org/changeset/base/205092
 
 Log:
   Tidy up callout for select(2) and read timeout.
   
   - Add a missing callout_drain(9) before the descriptor deallocation.[1]
   - Prefer callout_init_mtx(9) over callout_init(9) and let the callout
   subsystem handle the mutex for callout function.
   
   PR:		kern/144453
   Submitted by:	Alexander Sack (asack at niksun dot com)[1]
   MFC after:	1 week
 
 Modified:
   head/sys/net/bpf.c
 
 Modified: head/sys/net/bpf.c
 ==============================================================================
 --- head/sys/net/bpf.c	Fri Mar 12 18:41:41 2010	(r205091)
 +++ head/sys/net/bpf.c	Fri Mar 12 19:14:58 2010	(r205092)
 @@ -614,6 +614,7 @@ bpf_dtor(void *data)
  	mac_bpfdesc_destroy(d);
  #endif /* MAC */
  	knlist_destroy(&d->bd_sel.si_note);
 +	callout_drain(&d->bd_callout);
  	bpf_freed(d);
  	free(d, M_BPF);
  }
 @@ -651,7 +652,7 @@ bpfopen(struct cdev *dev, int flags, int
  	mac_bpfdesc_create(td->td_ucred, d);
  #endif
  	mtx_init(&d->bd_mtx, devtoname(dev), "bpf cdev lock", MTX_DEF);
 -	callout_init(&d->bd_callout, CALLOUT_MPSAFE);
 +	callout_init_mtx(&d->bd_callout, &d->bd_mtx, 0);
  	knlist_init_mtx(&d->bd_sel.si_note, &d->bd_mtx);
  
  	return (0);
 @@ -807,13 +808,15 @@ bpf_timed_out(void *arg)
  {
  	struct bpf_d *d = (struct bpf_d *)arg;
  
 -	BPFD_LOCK(d);
 +	BPFD_LOCK_ASSERT(d);
 +
 +	if (callout_pending(&d->bd_callout) || !callout_active(&d->bd_callout))
 +		return;
  	if (d->bd_state == BPF_WAITING) {
  		d->bd_state = BPF_TIMED_OUT;
  		if (d->bd_slen != 0)
  			bpf_wakeup(d);
  	}
 -	BPFD_UNLOCK(d);
  }
  
  static int
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 
State-Changed-From-To: open->feedback 
State-Changed-By: jkim 
State-Changed-When: Fri Mar 12 19:29:59 UTC 2010 
State-Changed-Why:  
This should be fixed on HEAD now.  Please re-test and let me know.  Thanks! 

http://www.freebsd.org/cgi/query-pr.cgi?pr=144453 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/144453: commit references a PR
Date: Mon, 22 Mar 2010 19:59:20 +0000 (UTC)

 Author: jkim
 Date: Mon Mar 22 19:59:00 2010
 New Revision: 205463
 URL: http://svn.freebsd.org/changeset/base/205463
 
 Log:
   MFC:	r205092
   
   Tidy up callout for select(2) and read timeout.
   
   - Add a missing callout_drain(9) before the descriptor deallocation.[1]
   - Prefer callout_init_mtx(9) over callout_init(9) and let the callout
   subsystem handle the mutex for callout function.
   
   PR:		kern/144453
   Submitted by:	Alexander Sack (asack at niksun dot com)[1]
 
 Modified:
   stable/8/sys/net/bpf.c
 Directory Properties:
   stable/8/sys/   (props changed)
   stable/8/sys/amd64/include/xen/   (props changed)
   stable/8/sys/cddl/contrib/opensolaris/   (props changed)
   stable/8/sys/contrib/dev/acpica/   (props changed)
   stable/8/sys/contrib/pf/   (props changed)
   stable/8/sys/dev/xen/xenpci/   (props changed)
 
 Modified: stable/8/sys/net/bpf.c
 ==============================================================================
 --- stable/8/sys/net/bpf.c	Mon Mar 22 19:52:06 2010	(r205462)
 +++ stable/8/sys/net/bpf.c	Mon Mar 22 19:59:00 2010	(r205463)
 @@ -611,6 +611,7 @@ bpf_dtor(void *data)
  	mac_bpfdesc_destroy(d);
  #endif /* MAC */
  	knlist_destroy(&d->bd_sel.si_note);
 +	callout_drain(&d->bd_callout);
  	bpf_freed(d);
  	free(d, M_BPF);
  }
 @@ -648,7 +649,7 @@ bpfopen(struct cdev *dev, int flags, int
  	mac_bpfdesc_create(td->td_ucred, d);
  #endif
  	mtx_init(&d->bd_mtx, devtoname(dev), "bpf cdev lock", MTX_DEF);
 -	callout_init(&d->bd_callout, CALLOUT_MPSAFE);
 +	callout_init_mtx(&d->bd_callout, &d->bd_mtx, 0);
  	knlist_init_mtx(&d->bd_sel.si_note, &d->bd_mtx);
  
  	return (0);
 @@ -804,13 +805,15 @@ bpf_timed_out(void *arg)
  {
  	struct bpf_d *d = (struct bpf_d *)arg;
  
 -	BPFD_LOCK(d);
 +	BPFD_LOCK_ASSERT(d);
 +
 +	if (callout_pending(&d->bd_callout) || !callout_active(&d->bd_callout))
 +		return;
  	if (d->bd_state == BPF_WAITING) {
  		d->bd_state = BPF_TIMED_OUT;
  		if (d->bd_slen != 0)
  			bpf_wakeup(d);
  	}
 -	BPFD_UNLOCK(d);
  }
  
  static int
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/144453: commit references a PR
Date: Mon, 22 Mar 2010 20:12:22 +0000 (UTC)

 Author: jkim
 Date: Mon Mar 22 20:12:10 2010
 New Revision: 205464
 URL: http://svn.freebsd.org/changeset/base/205464
 
 Log:
   MFC:	r205092
   
   Tidy up callout for select(2) and read timeout.
   
   - Add a missing callout_drain(9) before the descriptor deallocation.[1]
   - Prefer callout_init_mtx(9) over callout_init(9) and let the callout
   subsystem handle the mutex for callout function.
   
   PR:		kern/144453
   Submitted by:	Alexander Sack (asack at niksun dot com)[1]
 
 Modified:
   stable/7/sys/net/bpf.c
 Directory Properties:
   stable/7/sys/   (props changed)
   stable/7/sys/cddl/contrib/opensolaris/   (props changed)
   stable/7/sys/contrib/dev/acpica/   (props changed)
   stable/7/sys/contrib/pf/   (props changed)
 
 Modified: stable/7/sys/net/bpf.c
 ==============================================================================
 --- stable/7/sys/net/bpf.c	Mon Mar 22 19:59:00 2010	(r205463)
 +++ stable/7/sys/net/bpf.c	Mon Mar 22 20:12:10 2010	(r205464)
 @@ -424,7 +424,7 @@ bpfopen(struct cdev *dev, int flags, int
  	mac_create_bpfdesc(td->td_ucred, d);
  #endif
  	mtx_init(&d->bd_mtx, devtoname(dev), "bpf cdev lock", MTX_DEF);
 -	callout_init(&d->bd_callout, CALLOUT_MPSAFE);
 +	callout_init_mtx(&d->bd_callout, &d->bd_mtx, 0);
  	knlist_init_mtx(&d->bd_sel.si_note, &d->bd_mtx);
  
  	return (0);
 @@ -455,6 +455,7 @@ bpfclose(struct cdev *dev, int flags, in
  	mac_destroy_bpfdesc(d);
  #endif /* MAC */
  	knlist_destroy(&d->bd_sel.si_note);
 +	callout_drain(&d->bd_callout);
  	bpf_freed(d);
  	dev->si_drv1 = NULL;
  	free(d, M_BPF);
 @@ -615,13 +616,15 @@ bpf_timed_out(void *arg)
  {
  	struct bpf_d *d = (struct bpf_d *)arg;
  
 -	BPFD_LOCK(d);
 +	BPFD_LOCK_ASSERT(d);
 +
 +	if (callout_pending(&d->bd_callout) || !callout_active(&d->bd_callout))
 +		return;
  	if (d->bd_state == BPF_WAITING) {
  		d->bd_state = BPF_TIMED_OUT;
  		if (d->bd_slen != 0)
  			bpf_wakeup(d);
  	}
 -	BPFD_UNLOCK(d);
  }
  
  static int
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/144453: commit references a PR
Date: Mon, 22 Mar 2010 20:21:32 +0000 (UTC)

 Author: jkim
 Date: Mon Mar 22 20:21:22 2010
 New Revision: 205465
 URL: http://svn.freebsd.org/changeset/base/205465
 
 Log:
   MFC:	r205092
   
   Tidy up callout for select(2) and read timeout.
   
   - Add a missing callout_drain(9) before the descriptor deallocation.[1]
   - Prefer callout_init_mtx(9) over callout_init(9) and let the callout
   subsystem handle the mutex for callout function.
   
   PR:		kern/144453
   Submitted by:	Alexander Sack (asack at niksun dot com)[1]
 
 Modified:
   stable/6/sys/net/bpf.c
 Directory Properties:
   stable/6/sys/   (props changed)
   stable/6/sys/contrib/pf/   (props changed)
   stable/6/sys/dev/cxgb/   (props changed)
 
 Modified: stable/6/sys/net/bpf.c
 ==============================================================================
 --- stable/6/sys/net/bpf.c	Mon Mar 22 20:12:10 2010	(r205464)
 +++ stable/6/sys/net/bpf.c	Mon Mar 22 20:21:22 2010	(r205465)
 @@ -398,7 +398,7 @@ bpfopen(struct cdev *dev, int flags, int
  	mac_create_bpfdesc(td->td_ucred, d);
  #endif
  	mtx_init(&d->bd_mtx, devtoname(dev), "bpf cdev lock", MTX_DEF);
 -	callout_init(&d->bd_callout, NET_CALLOUT_MPSAFE);
 +	callout_init_mtx(&d->bd_callout, &d->bd_mtx, 0);
  	knlist_init(&d->bd_sel.si_note, &d->bd_mtx, NULL, NULL, NULL);
  
  	return (0);
 @@ -429,6 +429,7 @@ bpfclose(struct cdev *dev, int flags, in
  	mac_destroy_bpfdesc(d);
  #endif /* MAC */
  	knlist_destroy(&d->bd_sel.si_note);
 +	callout_drain(&d->bd_callout);
  	bpf_freed(d);
  	dev->si_drv1 = NULL;
  	free(d, M_BPF);
 @@ -577,13 +578,15 @@ bpf_timed_out(void *arg)
  {
  	struct bpf_d *d = (struct bpf_d *)arg;
  
 -	BPFD_LOCK(d);
 +	BPFD_LOCK_ASSERT(d);
 +
 +	if (callout_pending(&d->bd_callout) || !callout_active(&d->bd_callout))
 +		return;
  	if (d->bd_state == BPF_WAITING) {
  		d->bd_state = BPF_TIMED_OUT;
  		if (d->bd_slen != 0)
  			bpf_wakeup(d);
  	}
 -	BPFD_UNLOCK(d);
  }
  
  static int
 
State-Changed-From-To: feedback->closed 
State-Changed-By: jkim 
State-Changed-When: Mon Mar 22 20:59:40 UTC 2010 
State-Changed-Why:  
Feedback received privately and the fixes were MFC'd to 6, 7, and 8. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=144453 
>Unformatted:
