From dan@kulesh.obluda.cz  Sun Sep  2 22:27:07 2007
Return-Path: <dan@kulesh.obluda.cz>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D605616A418
	for <FreeBSD-gnats-submit@freebsd.org>; Sun,  2 Sep 2007 22:27:07 +0000 (UTC)
	(envelope-from dan@kulesh.obluda.cz)
Received: from smtp1.kolej.mff.cuni.cz (smtp1.kolej.mff.cuni.cz [78.128.192.4])
	by mx1.freebsd.org (Postfix) with ESMTP id 8C18813C46B
	for <FreeBSD-gnats-submit@freebsd.org>; Sun,  2 Sep 2007 22:27:07 +0000 (UTC)
	(envelope-from dan@kulesh.obluda.cz)
Received: from kulesh.obluda.cz (openvpn.ms.mff.cuni.cz [195.113.20.87])
	by smtp1.kolej.mff.cuni.cz (8.13.8/8.13.8) with ESMTP id l82MQlZi085049
	for <FreeBSD-gnats-submit@freebsd.org>; Mon, 3 Sep 2007 00:26:53 +0200 (CEST)
	(envelope-from dan@kulesh.obluda.cz)
Received: from kulesh.obluda.cz (localhost. [127.0.0.1])
	by kulesh.obluda.cz (8.14.1/8.14.1) with ESMTP id l82MQlT3002156
	for <FreeBSD-gnats-submit@freebsd.org>; Mon, 3 Sep 2007 00:26:47 +0200 (CEST)
	(envelope-from dan@kulesh.obluda.cz)
Received: (from root@localhost)
	by kulesh.obluda.cz (8.14.1/8.13.8/Submit) id l82MQlSu002155;
	Mon, 3 Sep 2007 00:26:47 +0200 (CEST)
	(envelope-from dan)
Message-Id: <200709022226.l82MQlSu002155@kulesh.obluda.cz>
Date: Mon, 3 Sep 2007 00:26:47 +0200 (CEST)
From: Dan Lukes <dan@obluda.cz>
Reply-To: Dan Lukes <dan@obluda.cz>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: Giant not owned at /usr/src/sys/netinet/tcp_sack.c:271=tcp_clean_sackreport(tp)
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         116034
>Category:       kern
>Synopsis:       Giant not owned at /usr/src/sys/netinet/tcp_sack.c:271=tcp_clean_sackreport(tp)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    rwatson
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Sep 02 22:30:02 GMT 2007
>Closed-Date:    Sat Jan 12 22:30:16 UTC 2008
>Last-Modified:  Sat Jan 12 22:30:16 UTC 2008
>Originator:     Dan Lukes
>Release:        FreeBSD 6.2-STABLE i386
>Organization:
Obludarium
>Environment:
System: FreeBSD 6.2-STABLE (sources from Sep 2 21:30:25 CEST 2007)
src/sys/netinet/tcp_sack.c,v 1.26.2.3 2007/06/12 19:21:54 jhb
src/sys/netinet/tcp_subr.c,v 1.228.2.14 2006/12/30 17:58:46 jhb
src/sys/kern/kern_mbuf.c,v 1.9.2.9 2007/02/11 03:31:18 mohans
src/sys/netinet/tcp_input.c,v 1.281.2.13 2007/06/12 18:53:32 jhb

Kernel compiled with IPSEC (=> MPSAFE network stack forced disabled; IPSEC require Giant)
Kernel compiled with INVARIANTS and INVARIANT_SUPPORT so missing lock trigger panic

NOTE: This doesn't apply for CURRENT, but may be significant for 6.3-RELEASE unless
Giant dependency completely MFCed-out from network stack code

>Description:

tcp_clean_sackreport() called from tcp_drain() without Giant but calee require it when
not on MPSAFE network stack 


The relevant code fragments (the INP_LOCK_ASSERT call is the point of panic):

 -----------------------
tcp_clean_sackreport(tp)
        struct tcpcb *tp;
{
        int i;

        INP_LOCK_ASSERT(tp->t_inpcb);
        tp->rcv_numsacks = 0;
        for (i = 0; i < MAX_SACK_BLKS; i++)
                tp->sackblks[i].start = tp->sackblks[i].end=0;
}

 -----------------------

#define INP_LOCK_ASSERT(inp)    do {                                    \
        mtx_assert(&(inp)->inp_mtx, MA_OWNED);                          \
        NET_ASSERT_GIANT();                                             \
} while (0)

 -----------------------

#define NET_ASSERT_GIANT() do {                                         \
        if (!debug_mpsafenet)                                           \
                mtx_assert(&Giant, MA_OWNED);                           \
} while (0)

 -----------------------


1. debug->mpsafenet is forced to off because IPSEC compiled in kernel. 
2. The INP_LOCK_ASSERT(tp->t_inpcb) test for Giant when mpsafenet off. 
3. tcp_clean_sackreport called from tcp_drain called from mb_reclaim without Giant

4. ABEND on kernel compiled with INVARIANTS/INVARIANT_SUPPORT; possible race condition 
	when compiled without INVARIANT



 ===================================================

Kernel code backtrace:

#2  0xc04b9b7d in panic (fmt=0xc064cd1e "mutex %s not owned at %s:%d")
    at /usr/src/sys/kern/kern_shutdown.c:565
#3  0xc04b1e4f in _mtx_assert (m=0xc06a8880, what=0,
    file=0xc065a3bb "/usr/src/sys/netinet/tcp_sack.c", line=271)
    at /usr/src/sys/kern/kern_mutex.c:768
#4  0xc05480bb in tcp_clean_sackreport (tp=0xc49aa910)
    at /usr/src/sys/netinet/tcp_sack.c:271
#5  0xc0549927 in tcp_drain () at /usr/src/sys/netinet/tcp_subr.c:891
#6  0xc04b0a69 in mb_reclaim (junk=0x0) at /usr/src/sys/kern/kern_mbuf.c:557
#7  0xc05d6cd4 in vm_pageout_scan (pass=0) at /usr/src/sys/vm/vm_pageout.c:724
#8  0xc05d7eaf in vm_pageout () at /usr/src/sys/vm/vm_pageout.c:1554

>How-To-Repeat:
	Compile kernel with INVARIANTS/INVARIANT_SUPPORT + (IPSEC or set mpsafenet to 0)

	Do something that trigger mb_reclaim
>Fix:

	Workaround is simple:
net.inet.tcp.do_tcpdrain=0

	Fix:

add NET_LOCK_GIANT() / NET_UNLOCK_GIANT() into tcp_drain()
unless someone smarter than me claim that Giant is NOT required here
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback 
State-Changed-By: kmacy 
State-Changed-When: Fri Nov 16 21:09:15 UTC 2007 
State-Changed-Why:  

Does this still occur on RELENG_7? IPSEC has been replaced with FAST_IPSEC. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=116034 
State-Changed-From-To: feedback->open 
State-Changed-By: kmacy 
State-Changed-When: Sun Nov 18 08:24:04 UTC 2007 
State-Changed-Why:  

Feedback received. Probably still an issue, will follow up when I have time to set up IPSEC. 


Responsible-Changed-From-To: freebsd-bugs->kmacy 
Responsible-Changed-By: kmacy 
Responsible-Changed-When: Sun Nov 18 08:24:04 UTC 2007 
Responsible-Changed-Why:  

Feedback received. Probably still an issue, will follow up when I have time to set up IPSEC 
to test.  

http://www.freebsd.org/cgi/query-pr.cgi?pr=116034 
Responsible-Changed-From-To: kmacy->rwatson 
Responsible-Changed-By: rwatson 
Responsible-Changed-When: Sat Nov 24 22:20:05 UTC 2007 
Responsible-Changed-Why:  
Grab ownership of this as debug.mpsafenet is something I've spent some time 
working with.  A casual glance seems to confirm that we should be 
conditionally acquiring Giant in the VM drain routine for the network 
stack.  Not having Giant here probably isn't a big problem in practice 
(other than the INVARIANTS panics, of course) since the drain routine 
shouldn't lead to transmission paths via IPSEC, but it's definitely worth 
fixing. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=116034 

From: Robert Watson <rwatson@FreeBSD.org>
To: Dan Lukes <dan@obluda.cz>
Cc: FreeBSD-gnats-submit@FreeBSD.org, freebsd-bugs@FreeBSD.org
Subject: Re: kern/116034: Giant not owned at	/usr/src/sys/netinet/tcp_sack.c:271=tcp_clean_sackreport(tp)
Date: Tue, 4 Dec 2007 16:26:26 +0000 (GMT)

 On Mon, 3 Sep 2007, Dan Lukes wrote:
 
 > add NET_LOCK_GIANT() / NET_UNLOCK_GIANT() into tcp_drain() unless someone 
 > smarter than me claim that Giant is NOT required here
 
 There's actually a slightly more general problem here--could you try this more 
 general patch?  I've requested to merge this to 6.3.
 
 Robert N M Watson
 Computer Laboratory
 University of Cambridge
 
 Index: kern_mbuf.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/kern/kern_mbuf.c,v
 retrieving revision 1.9.2.9
 diff -u -r1.9.2.9 kern_mbuf.c
 --- kern_mbuf.c	11 Feb 2007 03:31:18 -0000	1.9.2.9
 +++ kern_mbuf.c	4 Dec 2007 15:53:52 -0000
 @@ -550,9 +550,11 @@
   	WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK | WARN_PANIC, NULL,
   	    "mb_reclaim()");
 
 +	NET_LOCK_GIANT();
   	mbstat.m_drain++;
   	for (dp = domains; dp != NULL; dp = dp->dom_next)
   		for (pr = dp->dom_protosw; pr < dp->dom_protoswNPROTOSW; pr++)
   			if (pr->pr_drain != NULL)
   				(*pr->pr_drain)();
 +	NET_UNLOCK_GIANT();
   }

From: Eugene Grosbein <eugen@grosbein.pp.ru>
To: bug-followup@freebsd.org
Cc: Dan Lukes <dan@obluda.cz>, Robert Watson <rwatson@freebsd.org>
Subject: Re: kern/116034: Giant not owned at /usr/src/sys/netinet/tcp_sack.c:271=tcp_clean_sackreport(tp)
Date: Wed, 5 Dec 2007 01:00:46 +0700

 Robert Watson wrote:
 
 > There's actually a slightly more general problem here--could you try this more
 > general patch? I've requested to merge this to 6.3.
 
 Hi!
 
 I'm experiencing the same panic for INVARIANTS-enabled kernel now.
 Really, it panices every time just after boot. Your patch solves
 the problem, thank you.
 
 Eugene Grosbein

From: Dan Lukes <dan@obluda.cz>
To: Robert Watson <rwatson@FreeBSD.org>
Cc: FreeBSD-gnats-submit@FreeBSD.org, freebsd-bugs@FreeBSD.org
Subject: Re: kern/116034: Giant not owned at	/usr/src/sys/netinet/tcp_sack.c:271=tcp_clean_sackreport(tp)
Date: Tue, 04 Dec 2007 20:40:56 +0100

 Robert Watson napsal/wrote, On 12/04/07 17:26:
 > There's actually a slightly more general problem here--could you try 
 > this more general patch?
 
 	It should resolve the problem.
 
 	I tried to test it, the system is going to panic, but I'm not sure the 
 mb_reclaim() has been called. I inserted a printf just after 
 NET_LOCK_GIANT() but never seen the text on console even the do_drain 
 sysctl has been set to 1.
 
 	Do you know a reliable way how to trigger vm_lowmem event ? I tried to 
 allocate as much as possible memory, i tried to call EVENTHANDLER_INVOKE 
 from kernel module, but it seems the mb_reclaim has not been invoked.
 
 
 						Dan
 
 
 
 -- 
 Dan Lukes                                               SISAL MFF UK
 AKA: dan at obluda.cz, dan at freebsd.cz, dan at (kolej.)mff.cuni.cz

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/116034: commit references a PR
Date: Wed,  5 Dec 2007 00:00:21 +0000 (UTC)

 rwatson     2007-12-05 00:00:10 UTC
 
   FreeBSD src repository
 
   Modified files:        (Branch: RELENG_6)
     sys/kern             kern_mbuf.c 
   Log:
   Call NET_LOCK_GIANT/NET_UNLOCK_GIANT around calls to protocol drain
   methods in mb_reclaim().  This is not an MFC, as debug.mpsafenet is
   not present in 7.x or 8.x.  In practice, this likely resulted in
   instability only on kernels with INVARIANTS, as the protocol drain
   paths are generally MPSAFE.
   
   PR:             116034
   Reported by:    Dan Lukes <dan@obluda.cz>
   Discussed with: kmacy, alc
   Approved by:    re (kensmith)
   Tested by:      Eugene Grosbein <eugen at grosbein dot pp dot ru>
   
   Revision  Changes    Path
   1.9.2.10  +2 -0      src/sys/kern/kern_mbuf.c
 _______________________________________________
 cvs-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/cvs-all
 To unsubscribe, send any mail to "cvs-all-unsubscribe@freebsd.org"
 
State-Changed-From-To: open->patched 
State-Changed-By: rwatson 
State-Changed-When: Wed Dec 5 00:11:31 UTC 2007 
State-Changed-Why:  
Merged as kern_mbuf.c:1.9.2.10. 


http://www.freebsd.org/cgi/query-pr.cgi?pr=116034 

From: Robert Watson <rwatson@FreeBSD.org>
To: Dan Lukes <dan@obluda.cz>
Cc: bug-followup@freebsd.org
Subject: Re: kern/116034: Giant not owned at	/usr/src/sys/netinet/tcp_sack.c:271=tcp_clean_sackreport(tp)
Date: Wed, 5 Dec 2007 00:20:54 +0000 (GMT)

 On Tue, 4 Dec 2007, Dan Lukes wrote:
 
 > The following reply was made to PR kern/116034; it has been noted by GNATS.
 >
 > From: Dan Lukes <dan@obluda.cz>
 > To: Robert Watson <rwatson@FreeBSD.org>
 > Cc: FreeBSD-gnats-submit@FreeBSD.org, freebsd-bugs@FreeBSD.org
 > Subject: Re: kern/116034: Giant not owned at	/usr/src/sys/netinet/tcp_sack.c:271=tcp_clean_sackreport(tp)
 > Date: Tue, 04 Dec 2007 20:40:56 +0100
 >
 > Robert Watson napsal/wrote, On 12/04/07 17:26:
 > > There's actually a slightly more general problem here--could you try
 > > this more general patch?
 >
 > 	It should resolve the problem.
 >
 > 	I tried to test it, the system is going to panic, but I'm not sure the 
 > mb_reclaim() has been called. I inserted a printf just after 
 > NET_LOCK_GIANT() but never seen the text on console even the do_drain sysctl 
 > has been set to 1.
 >
 > 	Do you know a reliable way how to trigger vm_lowmem event ? I tried to 
 > allocate as much as possible memory, i tried to call EVENTHANDLER_INVOKE 
 > from kernel module, but it seems the mb_reclaim has not been invoked.
 
 Hmm.  I'm not sure I've ever tried to cause it to occur directly.  I guess the 
 problem is that this is not easily tested by just hooking up mb_reclaim to a 
 proc sysctl, as sysctl will hold Giant.  You could have a sysctl proc that 
 kicks off mb_reclaim in a task queue however.
 
 Robert N M Watson
 Computer Laboratory
 University of Cambridge

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/116034: commit references a PR
Date: Wed,  5 Dec 2007 00:47:55 +0000 (UTC)

 rwatson     2007-12-05 00:47:48 UTC
 
   FreeBSD src repository
 
   Modified files:        (Branch: RELENG_6_3)
     sys/kern             kern_mbuf.c 
   Log:
   Merge kern_mbuf.c:1.9.2.10 from RELENG_6 to RELENG_6_3:
   
     Call NET_LOCK_GIANT/NET_UNLOCK_GIANT around calls to protocol drain
     methods in mb_reclaim().  This is not an MFC, as debug.mpsafenet is
     not present in 7.x or 8.x.  In practice, this likely resulted in
     instability only on kernels with INVARIANTS, as the protocol drain
     paths are generally MPSAFE.
   
     PR:             116034
     Reported by:    Dan Lukes <dan@obluda.cz>
     Discussed with: kmacy, alc
     Approved by:    re (kensmith)
     Tested by:      Eugene Grosbein <eugen at grosbein dot pp dot ru>
   
   Revision     Changes    Path
   1.9.2.9.2.1  +2 -0      src/sys/kern/kern_mbuf.c
 _______________________________________________
 cvs-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/cvs-all
 To unsubscribe, send any mail to "cvs-all-unsubscribe@freebsd.org"
 
State-Changed-From-To: patched->closed 
State-Changed-By: rwatson 
State-Changed-When: Sat Jan 12 22:29:11 UTC 2008 
State-Changed-Why:  
Close PR -- the patch has been committed to 6-STABLE and will appear in 
FreeBSD 6.3-RELEASE.  I've had an additional success report from Eugene 
Grosbein as well. 


http://www.freebsd.org/cgi/query-pr.cgi?pr=116034 
>Unformatted:
