From dan@kulesh.obluda.cz  Sat Dec 15 13:19:32 2007
Return-Path: <dan@kulesh.obluda.cz>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0F5BA16A420
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 15 Dec 2007 13:19:32 +0000 (UTC)
	(envelope-from dan@kulesh.obluda.cz)
Received: from smtp1.kolej.mff.cuni.cz (smtp1.kolej.mff.cuni.cz [78.128.192.4])
	by mx1.freebsd.org (Postfix) with ESMTP id A488813C442
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 15 Dec 2007 13:19:31 +0000 (UTC)
	(envelope-from dan@kulesh.obluda.cz)
Received: from kulesh.obluda.cz (openvpn.ms.mff.cuni.cz [195.113.20.87])
	by smtp1.kolej.mff.cuni.cz (8.13.8/8.13.8) with ESMTP id lBFDJCpP020958
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 15 Dec 2007 14:19:16 +0100 (CET)
	(envelope-from dan@kulesh.obluda.cz)
Received: from kulesh.obluda.cz (localhost. [127.0.0.1])
	by kulesh.obluda.cz (8.14.2/8.14.2) with ESMTP id lBFDJAXU001392
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 15 Dec 2007 14:19:10 +0100 (CET)
	(envelope-from dan@kulesh.obluda.cz)
Received: (from dan@localhost)
	by kulesh.obluda.cz (8.14.2/8.14.1/Submit) id lBFDJAcF001391;
	Sat, 15 Dec 2007 14:19:10 +0100 (CET)
	(envelope-from dan)
Message-Id: <200712151319.lBFDJAcF001391@kulesh.obluda.cz>
Date: Sat, 15 Dec 2007 14:19:10 +0100 (CET)
From: Dan Lukes <dan@obluda.cz>
Reply-To: Dan Lukes <dan@obluda.cz>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: 'Giant not owned at ...' with 're' interface
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         118719
>Category:       kern
>Synopsis:       [re] 'Giant not owned at ...' with 're' interface
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    rwatson
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Dec 15 13:20:01 UTC 2007
>Closed-Date:    Sun Mar 02 15:01:39 UTC 2008
>Last-Modified:  Sun Mar 02 15:01:39 UTC 2008
>Originator:     Dan Lukes
>Release:        FreeBSD 6.3-PRERELEASE i386
>Organization:
Obludarium
>Environment:
FreeBSD 6.3-PRERELEASE i386
src/sys/net/if.c,v 1.234.2.21 2007/07/13 01:26:44 thompsa
src/sys/security/mac/mac_net.c,v 1.117 2005/07/05 23:39:50 rwatson
sys/net/bpf.c,v 1.153.2.12 2007/11/03 17:13:16 csjp
src/sys/net/if_ethersubr.c,v 1.193.2.15 2007/09/17 17:50:49 julian
src/sys/net/bpfdesc.h,v 1.29.2.3 2007/01/19 23:01:31 jhb
src/sys/sys/mutex.h,v 1.79.2.4 2006/08/01 18:38:35 jhb
src/sys/dev/re/if_re.c,v 1.46.2.36 2007/12/06 06:01:47 yongari
src/sys/kern/subr_bus.c,v 1.184.2.6 2007/11/05 11:49:44 phk
src/sys/kern/kern_intr.c,v 1.124.2.8 2007/10/29 21:10:03 emaste

Kernel compiled with IPSEC (=> MPSAFE network stack forcibly disabled; IPSEC requires Giant)
Kernel compiled with INVARIANTS and INVARIANT_SUPPORT, so a missing lock triggers a panic

 ****  ****  ****  ****  ****  ****  ****  ****
NOTE: This doesn't apply to CURRENT, but may be significant for 6.3-RELEASE / 6-STABLE
 ****  ****  ****  ****  ****  ****  ****  ****

>Description:
A panic is triggered every time (a packet arrives on an 're' interface) AND (a bpf
listener is active; for example, dhclient uses bpf)

panic: mutex Giant not owned at 
#if MAC option compiled in
	mac_net.c:382
#else
	bpf.c:1345
#endif

In both places it is the BPFD_LOCK_ASSERT macro that calls panic().

NOTE - there seem to be no changes to if_re.c in the past few weeks
that could cause this problem, so the problem may affect all
MPSAFE network card drivers - but I didn't test that.

Unfortunately, the kernel locks up during the memory dump, so I have no exact backtrace, but I hope I can reconstruct the probable flow myself.

mac_net.c:382 = mac_check_bpfdesc_receive()
bpf.c:1345 = catchpacket()

they are called from bpf_mtap() or bpf_mtap2(),
which are called from ether_vlan_mtap(), part of the ETHER_BPF_MTAP macro;
the macro is used within the device driver's if_input method,
which is called from the receive interrupt service routine

in the 're' driver the interrupt is declared MPSAFE, so Giant is not
locked by the core. If it isn't acquired later as well, the panic will be
triggered.
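
The reconstructed call chain can be modelled in user space to show where the check fires. This is a sketch, not kernel code: struct model_mtx, the simulated curthread and every *_model function are hypothetical stand-ins, assuming (as described above) that the only relevant check is whether Giant is owned when bpf's catchpacket() runs with debug.mpsafenet at 0.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * User-space model only (not kernel code): a "mutex" just records which
 * simulated thread owns it, so ownership can be checked the way
 * mtx_assert(..., MA_OWNED) would check it under INVARIANTS.
 */
struct model_mtx {
	int owner;		/* 0 = unowned, otherwise a simulated thread id */
};

static struct model_mtx Giant;	/* stands in for the kernel's Giant */
static int curthread = 1;	/* id of the (simulated) running thread */
static int debug_mpsafenet = 0;	/* 0 models a kernel built with IPSEC */

static bool
model_mtx_owned(const struct model_mtx *m)
{
	return (m->owner == curthread);
}

/*
 * Stand-in for the Giant part of BPFD_LOCK_ASSERT in catchpacket():
 * returns false where an INVARIANTS kernel would panic with
 * "mutex Giant not owned".
 */
static bool
catchpacket_model(void)
{
	if (!debug_mpsafenet && !model_mtx_owned(&Giant))
		return (false);
	return (true);
}

/* Each level of the reconstructed chain simply forwards to the next. */
static bool bpf_mtap_model(void) { return (catchpacket_model()); }
static bool ether_bpf_mtap_model(void) { return (bpf_mtap_model()); }
static bool if_input_model(void) { return (ether_bpf_mtap_model()); }

/* Receive path entry; giant_taken models whether anyone acquired Giant. */
static bool
rx_isr_model(bool giant_taken)
{
	bool ok;

	if (giant_taken)
		Giant.owner = curthread;
	ok = if_input_model();
	if (giant_taken)
		Giant.owner = 0;
	return (ok);
}
```

With debug_mpsafenet at 0, rx_isr_model(false) returns false (the panic case) and rx_isr_model(true) returns true; once debug_mpsafenet is 1 the Giant check is skipped entirely.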

>How-To-Repeat:
Compile a kernel with INVARIANTS/INVARIANT_SUPPORT + (IPSEC, or debug.mpsafenet set to 0)

Use dhclient or tcpdump on an 're' interface. Wait for a packet.

>Fix:

The workaround is simple:
don't use IPSEC or BPF, or use only IFF_NEEDSGIANT devices

You may also remove INVARIANTS to avoid the panic,
but a race condition may occur if Giant is really needed here

Fix:
we need to have Giant acquired in the packet input path, unless someone smarter
than me can show it's not necessary to have it - in that case we need to correct the
lock assertions within bpf's catchpacket() and MAC's mac_check_bpfdesc_receive() routines

The re driver calls bus_setup_intr() with the INTR_MPSAFE | INTR_FAST flags.
INTR_MPSAFE is cleared by bus_setup_intr()'s logic, but INTR_FAST remains active.

The re ISR is then called without Giant.

Unfortunately, I don't know exactly where the problem lies - if mpsafenet is inactive, we need either
 -----------
[1] bus_setup_intr() should clear INTR_FAST as well

* or *

[2] Giant should be acquired before calling the driver's ISR, even if IH_FAST is active

* or *

[3] the driver's ISR should be corrected, as it is responsible for acquiring Giant itself
 -----------

in the latter case, note that the msk and em drivers may have the same problem,
as they also use INTR_FAST, but I haven't tried them
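
The interaction between the two flags, and fix option [1], can be sketched as a small model. The M_INTR_* values and both functions are hypothetical: they encode the behaviour described in this report (INTR_MPSAFE cleared when mpsafenet is off, fast handlers called without Giant), not the actual code in subr_bus.c or kern_intr.c.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical flag values for this model only; the real INTR_MPSAFE and
 * INTR_FAST constants live in <sys/bus.h> and may differ.
 */
#define	M_INTR_MPSAFE	0x1
#define	M_INTR_FAST	0x2

/*
 * Models the described bus_setup_intr() logic: with mpsafenet off,
 * INTR_MPSAFE is cleared; clear_fast models fix option [1].
 */
static int
setup_flags_model(int flags, int mpsafenet, bool clear_fast)
{
	if (!mpsafenet) {
		flags &= ~M_INTR_MPSAFE;
		if (clear_fast)
			flags &= ~M_INTR_FAST;
	}
	return (flags);
}

/*
 * Models the dispatcher: a non-MPSAFE ordinary handler runs under Giant,
 * but a fast handler is called directly, without Giant.
 */
static bool
isr_runs_with_giant(int flags)
{
	if (flags & M_INTR_FAST)
		return (false);
	return ((flags & M_INTR_MPSAFE) == 0);
}
```

For the re flags (M_INTR_MPSAFE | M_INTR_FAST) with mpsafenet off, the handler still runs without Giant unless clear_fast is set. Options [2] and [3] would instead move the Giant acquisition into the dispatcher or into the driver; the fix eventually committed for this PR changed only if_re.c.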

>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->rwatson 
Responsible-Changed-By: rwatson 
Responsible-Changed-When: Sat Dec 15 16:30:11 UTC 2007 
Responsible-Changed-Why:  
Grab this PR to take a look. 


http://www.freebsd.org/cgi/query-pr.cgi?pr=118719 

From: Robert Watson <rwatson@FreeBSD.org>
To: Dan Lukes <dan@obluda.cz>
Cc: FreeBSD-gnats-submit@FreeBSD.org
Subject: Re: [Fwd: Re: kern/118719: 'Giant not owned at ...' with 're'
 interface]
Date: Sat, 12 Jan 2008 22:48:43 +0000 (GMT)

 On Sat, 15 Dec 2007, Dan Lukes wrote:
 
 > 	I hit another mpsafenet-related panic you may be interested in.
 >
 > 	Please don't worry, I'm not going to notify you personally about
 > all my PR submissions. This is an exception, as 6.3-RELEASE is on the way and you did
 > most of the mpsafenet changes.
 
 This appears to be a device driver bug -- if_re on 6.x needs to conditionally 
 acquire Giant in any path that may reach the network stack when 
 debug.mpsafenet is set to 0.  It doesn't do this for its deferred tasks and 
 needs to.  The attached untested patch (or one very like it) might correct 
 this problem.  A fix won't make 6.3-RELEASE, but if this works for you, I can 
 merge it to 6-STABLE.  As you observe, the problem doesn't affect 7.x, as 
 there is no conditional Giant acquisition there.  Unfortunately, I have no 
 if_re hardware, but if you are able to test this, that would be very helpful.
 
 Robert N M Watson
 Computer Laboratory
 University of Cambridge
 
 Index: if_re.c
 ===================================================================
 RCS file: /data/fbsd-cvs/ncvs/src/sys/dev/re/if_re.c,v
 retrieving revision 1.46.2.36
 diff -u -r1.46.2.36 if_re.c
 --- if_re.c	6 Dec 2007 06:01:47 -0000	1.46.2.36
 +++ if_re.c	12 Jan 2008 22:46:01 -0000
 @@ -1991,6 +1991,7 @@
   	sc = arg;
   	ifp = sc->rl_ifp;
 
 +	NET_LOCK_GIANT();
   	RL_LOCK(sc);
 
   	status = CSR_READ_2(sc, RL_ISR);
 @@ -1998,12 +1999,14 @@
 
   	if (sc->suspended || !(ifp->if_flags & IFF_UP)) {
   		RL_UNLOCK(sc);
 +		NET_UNLOCK_GIANT();
   		return;
   	}
 
   #ifdef DEVICE_POLLING
   	if  (ifp->if_capenable & IFCAP_POLLING) {
   		RL_UNLOCK(sc);
 +		NET_UNLOCK_GIANT();
   		return;
   	}
   #endif
 @@ -2033,6 +2036,7 @@
   		taskqueue_enqueue_fast(taskqueue_fast, &sc->rl_txtask);
 
   	RL_UNLOCK(sc);
 +	NET_UNLOCK_GIANT();
 
           if ((CSR_READ_2(sc, RL_ISR) & RL_INTRS_CPLUS) || rval) {
   		taskqueue_enqueue_fast(taskqueue_fast, &sc->rl_inttask);
 @@ -2200,7 +2204,9 @@
   	struct ifnet		*ifp;
 
   	ifp = arg;
 +	NET_LOCK_GIANT();
   	re_start(ifp);
 +	NET_UNLOCK_GIANT();
 
   	return;
   }
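
The NET_LOCK_GIANT()/NET_UNLOCK_GIANT() pair used in the patch takes and drops Giant only when debug.mpsafenet is 0, so it should cost nothing on a normal MPSAFE stack. Below is a user-space sketch of that conditional pattern: the macros are re-created for the model (Giant reduced to a nesting counter), and task_model() is a hypothetical stand-in for the patched task, showing why every early return needs its own NET_UNLOCK_GIANT().

```c
#include <assert.h>

/* User-space model: Giant is reduced to a nesting counter. */
struct model_mtx {
	int depth;
};

static struct model_mtx Giant;
static int debug_mpsafenet = 0;	/* forced to 0 in an IPSEC kernel */

/* Re-created for this model: only touch Giant when the stack is not MPSAFE. */
#define	NET_LOCK_GIANT() do {			\
	if (!debug_mpsafenet)			\
		Giant.depth++;			\
} while (0)
#define	NET_UNLOCK_GIANT() do {			\
	if (!debug_mpsafenet)			\
		Giant.depth--;			\
} while (0)

/* Same shape as the patched task: Giant is dropped on every return path. */
static int
task_model(int suspended, int *saw_giant)
{
	NET_LOCK_GIANT();
	if (suspended) {
		NET_UNLOCK_GIANT();	/* without this, Giant would leak */
		return (0);
	}
	/* Where the real task would hand packets to the network stack. */
	*saw_giant = (Giant.depth > 0);
	NET_UNLOCK_GIANT();
	return (1);
}
```

In both the early-return and the normal path Giant.depth is back to 0 afterwards; with debug_mpsafenet set to 1 the task never touches Giant at all.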

From: Dan Lukes <dan@obluda.cz>
To: Robert Watson <rwatson@FreeBSD.org>
Cc: FreeBSD-gnats-submit@FreeBSD.org
Subject: Re: [Fwd: Re: kern/118719: 'Giant not owned at ...' with 're' interface]
Date: Sun, 13 Jan 2008 10:48:21 +0100

 Robert Watson napsal/wrote, On 01/12/08 23:48:
 > This appears to be a device driver bug -- if_re on 6.x needs to 
 > conditionally acquire Giant in any path that may reach the network stack 
 ...
 > Unfortunately, I have no if_re hardware, but if you are able to test 
 > this, that would be very helpful.
 
 Unfortunately, I have if_re hardware ... ;-)
 
 The patch seems to work.
 
 Thank you
 
 > A fix won't make 6.3-RELEASE, but if this works for you, I can merge it to 6-STABLE.
 
 	Hm. I don't know whether the missing locks cause a real race condition or
 not. If they do, the 6.3 branch may be unstable with 're'.
 
 	But it's no problem for me personally. I have my own modified system for
 building kernel & world. It uses csup, but incorporates my own patches
 from a local repository into the downloaded source tree.
 
 	I have a lot of patches already so I will add another one.
 
 	
 	Sincerely
 
 				Dan Lukes
 				University of Charles

From: Robert Watson <rwatson@FreeBSD.org>
To: Dan Lukes <dan@obluda.cz>
Cc: FreeBSD-gnats-submit@FreeBSD.org
Subject: Re: [Fwd: Re: kern/118719: 'Giant not owned at ...' with 're'
 interface]
Date: Sun, 13 Jan 2008 12:08:38 +0000 (GMT)

 On Sun, 13 Jan 2008, Dan Lukes wrote:
 
 > Robert Watson napsal/wrote, On 01/12/08 23:48:
 >> This appears to be a device driver bug -- if_re on 6.x needs to 
 >> conditionally acquire Giant in any path that may reach the network stack 
 > ...
 >> Unfortunately, I have no if_re hardware, but if you are able to test this, 
 >> that would be very helpful.
 >
 > Unfortunately, I have if_re hardware ... ;-)
 >
 > The patch seems to work.
 
 Thanks -- I'll merge it to 6-STABLE once 6.3 is done.
 
 >> A fix won't make 6.3-RELEASE, but if this works for you, I can merge it to 
 >> 6-STABLE.
 >
 > 	Hm. I don't know whether the missing locks cause a real race condition
 > or not. If they do, the 6.3 branch may be unstable with 're'.
 
 Potentially there could be issues with IPSEC stability on 6.3 when combined 
 with if_re interfaces unless this patch is in place.  I'm not sure this will 
 be worth an errata patch as we've not had any actual reports of instability, 
 and we generally reserve errata patches for cases where there are moderately 
 widespread reports.  However, getting it in 6-STABLE is no problem following
 the release.
 
 Thanks for the report,
 
 Robert N M Watson
 Computer Laboratory
 University of Cambridge

From: Dan Lukes <dan@obluda.cz>
To: Robert Watson <rwatson@FreeBSD.org>
Cc: FreeBSD-gnats-submit@FreeBSD.org
Subject: Re: [Fwd: Re: kern/118719: 'Giant not owned at ...' with 're' interface]
Date: Sun, 13 Jan 2008 13:56:42 +0100

 Robert Watson napsal/wrote, On 01/13/08 13:08:
 > I'm not sure this will be worth an errata patch as we've not had any actual 
 > reports of instability
 
 	Of course. 6.2 has no problem and 6.3 has not been released yet, so
 computers in production environments are not affected (yet). I'm not sure how
 many computers with IPSEC are dedicated to development or
 beta-testing.
 
 	On the other hand, I hope no production computer has a Realtek NIC, as
 they are not very good, low-end hardware.
 
 	Well, we can wait and see whether anyone else complains.
 
 	Thank you for your cooperation.
 
 	Sincerely
 
 					Dan Lukes
 					University of Charles
 
 

From: Robert Watson <rwatson@FreeBSD.org>
To: Dan Lukes <dan@obluda.cz>
Cc: FreeBSD-gnats-submit@FreeBSD.org
Subject: Re: [Fwd: Re: kern/118719: 'Giant not owned at ...' with 're'
 interface]
Date: Sun, 13 Jan 2008 13:53:22 +0000 (GMT)

 On Sun, 13 Jan 2008, Dan Lukes wrote:
 
 > Robert Watson napsal/wrote, On 01/13/08 13:08:
 >> I'm not sure this will be worth an errata patch as we've not had any actual 
 >> reports of instability
 >
 > Of course. 6.2 has no problem and 6.3 has not been released yet, so
 > computers in production environments are not affected (yet). I'm not sure how
 > many computers with IPSEC are dedicated to development or
 > beta-testing.
 
 Are you sure that 6.2 has no problems in this regard?  The taskqueue
 construct, which leads to entering the network stack from a task without Giant
 when debug.mpsafenet is set to 0 (e.g., when IPSEC is compiled in), was introduced
 between FreeBSD 6.1 and 6.2, so should have appeared in 6.2-RELEASE.
 
 I'm not arguing this isn't a serious bug -- rather that the threshold for 
 doing an errata patch is not just that a bug be serious, but also that the 
 effects be widely felt, justifying the change going out on FreeBSD update, 
 announcements to announce@ mailing lists with patches, etc.  If we do see 
 wide-spread reports of problems, then we can and should do an errata patch, 
 but right now it doesn't seem to meet the bar: there is only one known report, 
 and that's based on running with INVARIANTS and seeing the assertion fail 
 rather than an actual symptom of the bug in the form of instability.
 
 Robert N M Watson
 Computer Laboratory
 University of Cambridge
 

From: Dan Lukes <dan@obluda.cz>
To: Robert Watson <rwatson@FreeBSD.org>
Cc: FreeBSD-gnats-submit@FreeBSD.org
Subject: Re: [Fwd: Re: kern/118719: 'Giant not owned at ...' with 're' interface]
Date: Sun, 13 Jan 2008 15:42:20 +0100

 Robert Watson napsal/wrote, On 01/13/08 14:53:
 > Are you sure that 6.2 has no problems in this regard? 
 
 	Mea culpa. You're right.
 
 	Although the test computer has had this specific OS configuration
 (IPSEC+INVARIANTS) for a long time, the Realtek is a new player
 in the game (after a mainboard change). I forgot about that.
 
 	I have tens of production 6.2-R machines, a few of them with Realteks, but
 they have neither IPSEC nor INVARIANTS configured.
 
 	Sorry for the misinformation. It seems you are right; 6.2 should have the
 problem as well.
 
 
 				Dan Lukes
 				University of Charles
 
State-Changed-From-To: open->analyzed 
State-Changed-By: rwatson 
State-Changed-When: Sun Jan 13 18:04:55 UTC 2008 
State-Changed-Why:  
Change to analyzed state--problem is understood, proposed patch appears 
to work, but waiting for RELENG_6 to re-open for general merging before 
committing. 


http://www.freebsd.org/cgi/query-pr.cgi?pr=118719 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/118719: commit references a PR
Date: Sun,  2 Mar 2008 14:54:54 +0000 (UTC)

 rwatson     2008-03-02 14:54:48 UTC
 
   FreeBSD src repository
 
   Modified files:        (Branch: RELENG_6)
     sys/dev/re           if_re.c 
   Log:
   Conditionally acquire Giant based on debug.mpsafenet around entry points
   from if_re taskqueue and other potentially Giant-free spots.  If we don't
   do this, Giant may not be held entering KAME IPSEC, etc.
   
   This problem appeared in FreeBSD 6.2 as a result of a move to fast
   interrupts, and does not exist in 7.x due to not having debug.mpsafenet.
   
   PR:             118719
   Reported by:    Dan Lukes <dan at obluda dot cz>
   Reviewed by:    yongari
   
   Revision   Changes    Path
   1.46.2.39  +6 -0      src/sys/dev/re/if_re.c
 
State-Changed-From-To: analyzed->closed 
State-Changed-By: rwatson 
State-Changed-When: Sun Mar 2 15:01:07 UTC 2008 
State-Changed-Why:  
Close PR, as fix has now been merged to RELENG_6 and should appear in 
FreeBSD 6.4. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=118719 
>Unformatted:
