From yar@behemoth.ramtel.ru  Mon Oct  3 00:37:43 2005
Return-Path: <yar@behemoth.ramtel.ru>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id CF48F16A41F;
	Mon,  3 Oct 2005 00:37:43 +0000 (GMT)
	(envelope-from yar@behemoth.ramtel.ru)
Received: from behemoth.ramtel.ru (behemoth.ramtel.ru [81.19.64.118])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 5E1B743D45;
	Mon,  3 Oct 2005 00:37:43 +0000 (GMT)
	(envelope-from yar@behemoth.ramtel.ru)
Received: from behemoth.ramtel.ru (localhost [127.0.0.1])
	by behemoth.ramtel.ru (8.13.4/8.13.4) with ESMTP id j930bfxc025925;
	Mon, 3 Oct 2005 04:37:41 +0400 (MSD)
	(envelope-from yar@behemoth.ramtel.ru)
Received: (from yar@localhost)
	by behemoth.ramtel.ru (8.13.4/8.13.3/Submit) id j930bfsV025924;
	Mon, 3 Oct 2005 04:37:41 +0400 (MSD)
	(envelope-from yar)
Message-Id: <200510030037.j930bfsV025924@behemoth.ramtel.ru>
Date: Mon, 3 Oct 2005 04:37:41 +0400 (MSD)
From: Yar Tikhiy <yar@comp.chem.msu.su>
To: FreeBSD-gnats-submit@freebsd.org
Cc: mlaier@freebsd.org, glebius@freebsd.org
Subject: [pf][multicast] destroying active syncdev leads to panic
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         86848
>Category:       kern
>Synopsis:       [pf][multicast] destroying active syncdev leads to panic
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bms
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Oct 03 00:40:17 GMT 2005
>Closed-Date:    Sun Apr 29 20:40:10 GMT 2007
>Last-Modified:  Sun Apr 29 20:40:10 GMT 2007
>Originator:     Yar Tikhiy
>Release:        FreeBSD 7.0-CURRENT i386
>Organization:
MSU
>Environment:
	CURRENT
	
>Description:
	If you destroy an interface that is acting as syncdev
	for pfsync0, the system will panic as soon as the next
	pfsync update is sent out, i.e., in a moment.

	This looks like another case of the IP multicast code dereferencing
	a cached pointer to a now-dead interface.  Such cases should be
	dealt with all at once, IMHO, instead of rolling a particular
	solution for each of them.

	The panic is reproducible.  The stack trace is as follows:

	(kgdb) bt
	#0  doadump () at pcpu.h:165
	#1  0xc04daa04 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
	#2  0xc04dacaf in panic (fmt=0xc06439b5 "from debugger")
	    at /usr/src/sys/kern/kern_shutdown.c:555
	#3  0xc045b549 in db_panic (addr=-1068253756, have_addr=0, count=-1, modif=0xcc7f4918 "")
	    at /usr/src/sys/ddb/db_command.c:434
	#4  0xc045b4e0 in db_command (last_cmdp=0xc06a1504, cmd_table=0x0,
	    aux_cmd_tablep=0xc066fd40, aux_cmd_tablep_end=0xc066fd44)
	    at /usr/src/sys/ddb/db_command.c:403
	#5  0xc045b5a8 in db_command_loop () at /usr/src/sys/ddb/db_command.c:454
	#6  0xc045d19d in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:221
	#7  0xc04f26d3 in kdb_trap (type=12, code=0, tf=0xcc7f4ab0)
	    at /usr/src/sys/kern/subr_kdb.c:473
	#8  0xc062b984 in trap_fatal (frame=0xcc7f4ab0, eva=3735929054)
	    at /usr/src/sys/i386/i386/trap.c:822
	#9  0xc062b6f3 in trap_pfault (frame=0xcc7f4ab0, usermode=0, eva=3735929054)
	    at /usr/src/sys/i386/i386/trap.c:742
	#10 0xc062b33d in trap (frame=
	      {tf_fs = 8, tf_es = 40, tf_ds = 40, tf_edi = 0, tf_esi = -1067127111, tf_ebp = -864072976, tf_isp = -864072996, tf_ebx = -1067127084, tf_edx = -559038242, tf_ecx = 0, tf_eax = -559038242, tf_trapno = 12, tf_err = 0, tf_eip = -1068253756, tf_cs = 32, tf_eflags = 582, tf_esp = -864072776, tf_ss = -1068545840}) at /usr/src/sys/i386/i386/trap.c:432
	#11 0xc061e21a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
	#12 0xc053bdc4 in strlen (str=0xdeadc0de <Address 0xdeadc0de out of bounds>)
	    at /usr/src/sys/libkern/strlen.c:41
	#13 0xc04f48d0 in kvprintf (fmt=0xc064eed4 " @ %s:%d", func=0xc04f41f8 <snprintf_func>,
	    arg=0xcc7f4bd4, radix=10, ap=0xcc7f4c1c "le(\001")
	    at /usr/src/sys/kern/subr_prf.c:679
	#14 0xc04f4195 in vsnprintf (str=0xdeadc0de <Address 0xdeadc0de out of bounds>,
	    size=3735929054, format=0xc064eeb9 "mtx_lock() of spin mutex %s @ %s:%d",
	    ap=0xcc7f4c18 "le(\001") at /usr/src/sys/kern/subr_prf.c:413
	#15 0xc04dabfb in panic (fmt=0xc064eeb9 "mtx_lock() of spin mutex %s @ %s:%d")
	    at /usr/src/sys/kern/kern_shutdown.c:522
	#16 0xc04d308b in _mtx_lock_flags (m=0xc1284660, opts=0,
	    file=0xc065b26c "/usr/src/sys/netinet/ip_output.c", line=296)
	    at /usr/src/sys/kern/kern_mutex.c:269
	#17 0xc0554447 in ip_output (m=0xc14b2100, opt=0xc1394aa8, ro=0xcc7f4c5c, flags=2,
	    imo=0xc122f208, inp=0x0) at /usr/src/sys/netinet/ip_output.c:296
	#18 0xc0435fa2 in pfsync_senddef (arg=0xc122f200)
	    at /usr/src/sys/contrib/pf/net/if_pfsync.c:1836
	#19 0xc04e5d9d in softclock (dummy=0x0) at /usr/src/sys/kern/kern_timeout.c:290
	#20 0xc04c873c in ithread_loop (arg=0xc1104600) at /usr/src/sys/kern/kern_intr.c:547
	#21 0xc04c7bac in fork_exit (callout=0xc04c85f8 <ithread_loop>, arg=0xc1104600,
	    frame=0xcc7f4d38) at /usr/src/sys/kern/kern_fork.c:789
	#22 0xc061e27c in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:208
	(kgdb) frame 17
	#17 0xc0554447 in ip_output (m=0xc14b2100, opt=0xc1394aa8, ro=0xcc7f4c5c, flags=2,
	    imo=0xc122f208, inp=0x0) at /usr/src/sys/netinet/ip_output.c:296
	296                     IN_LOOKUP_MULTI(ip->ip_dst, ifp, inm);
	(kgdb) frame 18
	#18 0xc0435fa2 in pfsync_senddef (arg=0xc122f200)
	    at /usr/src/sys/contrib/pf/net/if_pfsync.c:1836
	1836                    if (ip_output(m, NULL, NULL, IP_RAWOUTPUT, &sc->sc_imo, NULL))

>How-To-Repeat:
	ifconfig vlan1 ... up
	ifconfig pfsync0 syncdev vlan1 up
	ifconfig vlan1 unplumb
	[panic!]

>Fix:
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback 
State-Changed-By: bms 
State-Changed-When: Fri Feb 9 02:29:05 UTC 2007 
State-Changed-Why:  
Do you still get this with HEAD? 
It may have been fixed; carp has needed fixes for problems like this. 


Responsible-Changed-From-To: freebsd-bugs->bms 
Responsible-Changed-By: bms 
Responsible-Changed-When: Fri Feb 9 02:29:05 UTC 2007 
Responsible-Changed-Why:  
I'll take this 

http://www.freebsd.org/cgi/query-pr.cgi?pr=86848 
State-Changed-From-To: feedback->analyzed 
State-Changed-By: bms 
State-Changed-When: Fri Feb 9 13:09:20 UTC 2007 
State-Changed-Why:  
Reproduced on HEAD. 
This is an architectural problem: pfsync is not notified when its 
child interfaces detach, so it could not deal with that condition 
anyway, although it is the in_multi rejig that has triggered this bug. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=86848 

From: Bruce M Simpson <bms@incunabulum.net>
To: freebsd-gnats-submit@FreeBSD.org
Cc: Gleb Smirnoff <glebius@FreeBSD.org>, Yar Tikhiy <yar@FreeBSD.org>, 
 net@FreeBSD.org
Subject: Re: kern/86848: [pf][multicast] destroying active syncdev leads to
 panic
Date: Sun, 25 Feb 2007 16:15:37 +0000

 This is a multi-part message in MIME format.
 --------------040203040900030206070206
 Content-Type: text/plain; charset=ISO-8859-1; format=flowed
 Content-Transfer-Encoding: 7bit
 
 
   Hi,
 
 
 Please try the attached patch which should hopefully fix this issue 
 (untested).
 
 Regards,
 BMS
 
 
 --------------040203040900030206070206
 Content-Type: text/x-patch;
  name="pfsyncdev-inmulti.patch"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: inline;
  filename="pfsyncdev-inmulti.patch"
 
 ? .swp
 Index: if_pfsync.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/contrib/pf/net/if_pfsync.c,v
 retrieving revision 1.32
 diff -u -p -r1.32 if_pfsync.c
 --- if_pfsync.c	29 Dec 2006 13:59:47 -0000	1.32
 +++ if_pfsync.c	25 Feb 2007 16:11:03 -0000
 @@ -170,6 +170,9 @@ void	pfsync_timeout(void *);
  void	pfsync_send_bus(struct pfsync_softc *, u_int8_t);
  void	pfsync_bulk_update(void *);
  void	pfsync_bulkfail(void *);
 +#ifdef __FreeBSD__
 +static void	pfsync_ifdetach(void *, struct ifnet *);
 +#endif
  
  int	pfsync_sync_ok;
  #ifndef __FreeBSD__
 @@ -191,6 +194,9 @@ pfsync_clone_destroy(struct ifnet *ifp)
          struct pfsync_softc *sc;
  
  	sc = ifp->if_softc;
 +#ifdef __FreeBSD__
 +	EVENTHANDLER_DEREGISTER(ifnet_departure_event, sc->sc_detachtag);
 +#endif
  	callout_stop(&sc->sc_tmo);
  	callout_stop(&sc->sc_bulk_tmo);
  	callout_stop(&sc->sc_bulkfail_tmo);
 @@ -225,6 +231,16 @@ pfsync_clone_create(struct if_clone *ifc
  		return (ENOSPC);
  	}
  
 +#ifdef __FreeBSD__
 +	sc->sc_detachtag = EVENTHANDLER_REGISTER(ifnet_departure_event,
 +	    pfsync_ifdetach, sc, EVENTHANDLER_PRI_ANY);
 +	if (sc->sc_detachtag == NULL) {
 +		if_free(ifp);
 +		free(sc, M_PFSYNC);
 +		return (ENOSPC);
 +	}
 +#endif
 +
  	pfsync_sync_ok = 1;
  	sc->sc_mbuf = NULL;
  	sc->sc_mbuf_net = NULL;
 @@ -1870,6 +1886,35 @@ pfsync_sendout(sc)
  
  #ifdef __FreeBSD__
  static void
 +pfsync_ifdetach(void *arg, struct ifnet *ifp)
 +{
 +	struct pfsync_softc *sc = (struct pfsync_softc *)arg;
 +	struct ip_moptions *imo;
 +
 +	if (sc == NULL || sc->sc_sync_ifp != ifp)
 +		return;		/* not for us; unlocked read */
 +
 +	PF_LOCK();
 +
 +	/* Deal with detaching an interface which went away. */
 +	sc->sc_sync_ifp = NULL;
 +	if (sc->sc_mbuf_net != NULL) {
 +		s = splnet();
 +		m_freem(sc->sc_mbuf_net);
 +		sc->sc_mbuf_net = NULL;
 +		sc->sc_statep_net.s = NULL;
 +		splx(s);
 +	}
 +	imo = &sc->sc_imo;
 +	if (imo->imo_num_memberships > 0) {
 +		in_delmulti(imo->imo_membership[--imo->imo_num_memberships]);
 +		imo->imo_multicast_ifp = NULL;
 +	}
 +
 +	PF_UNLOCK();
 +}
 +
 +static void
  pfsync_senddef(void *arg)
  {
  	struct pfsync_softc *sc = (struct pfsync_softc *)arg;
 @@ -1879,6 +1924,14 @@ pfsync_senddef(void *arg)
  		IF_DEQUEUE(&sc->sc_ifq, m);
  		if (m == NULL)
  			break;
 +#if 1
 +		/* XXX: paranoia */
 +		if (sc->sc_sync_ifp == NULL) {
 +			pfsyncstats.pfsyncs_oerrors++;
 +			m_freem(m);
 +			continue;
 +		}
 +#endif
  		if (ip_output(m, NULL, NULL, IP_RAWOUTPUT, &sc->sc_imo, NULL))
  			pfsyncstats.pfsyncs_oerrors++;
  	}
 Index: if_pfsync.h
 ===================================================================
 RCS file: /home/ncvs/src/sys/contrib/pf/net/if_pfsync.h,v
 retrieving revision 1.7
 diff -u -p -r1.7 if_pfsync.h
 --- if_pfsync.h	10 Jun 2005 17:23:49 -0000	1.7
 +++ if_pfsync.h	25 Feb 2007 16:11:03 -0000
 @@ -181,6 +181,7 @@ struct pfsync_softc {
  	int			 sc_maxupdates;	/* number of updates/state */
  #ifdef __FreeBSD__
  	LIST_ENTRY(pfsync_softc) sc_next;
 +	eventhandler_tag	 sc_detachtag;
  #endif
  };
  #endif
 
 --------------040203040900030206070206--

From: "Bruce M. Simpson" <bms@FreeBSD.org>
To: Bruce M Simpson <bms@incunabulum.net>
Cc: freebsd-gnats-submit@FreeBSD.org, Yar Tikhiy <yar@FreeBSD.org>, 
 Gleb Smirnoff <glebius@FreeBSD.org>,
  net@FreeBSD.org
Subject: Re: kern/86848: [pf][multicast] destroying active syncdev leads to
 panic
Date: Sun, 25 Feb 2007 16:44:37 +0000

 Whups. That needs 'int s' or the spl calls removed.
 I am under the weather today (dry flu type virus)...

From: Yar Tikhiy <yar@comp.chem.msu.su>
To: Bruce M Simpson <bms@incunabulum.net>
Cc: freebsd-gnats-submit@FreeBSD.org, Gleb Smirnoff <glebius@FreeBSD.org>,
        net@FreeBSD.org
Subject: Re: kern/86848: [pf][multicast] destroying active syncdev leads to panic
Date: Mon, 12 Mar 2007 20:00:39 +0300

 On Sun, Feb 25, 2007 at 04:15:37PM +0000, Bruce M Simpson wrote:
 > 
 > Please try the attached patch which should hopefully fix this issue 
 > (untested).
 
 I'm sorry to come up with bad news, but the patch resulted in a
 different panic:
 
 -- 
 Yar
 
 Kernel page fault with the following non-sleepable locks held:
 exclusive sleep mutex pf task mtx r = 0 (0xc0679ee4) locked @ contrib/pf/net/if_pfsync.c:1897
 KDB: stack backtrace:
 db_trace_self_wrapper(c0625b0a) at db_trace_self_wrapper+0x25
 kdb_backtrace(1,0,c,cccbcadc,cccbcad0,...) at kdb_backtrace+0x29
 witness_warn(5,0,c0641504) at witness_warn+0x192
 trap(cccbcadc) at trap+0x144
 calltrap() at calltrap+0x6
 --- trap 0xc, eip = 0xc056a4ab, esp = 0xcccbcb1c, ebp = 0xcccbcb24 ---
 in_delmulti(c1982ae0,c1a1e4a0,cccbcb60,c055ae35,c19d5300,...) at in_delmulti+0xb
 pfsync_ifdetach(c19d5300,c19d0c00,c1a1a00c,0,c062ec24,...) at pfsync_ifdetach+0x99
 if_detach(c19d0c00,c19d0c00,c19d0c00,cccbcb8c,c1b12be8,...) at if_detach+0x2cd
 ether_ifdetach(c19d0c00,c19d0c00,c1b14840,2d,cccbcbc4,...) at ether_ifdetach+0x3a
 vlan_clone_destroy(c1b14840,c19d0c00,c19d0c00,c1b133ee,c1b14870,0,c062f063,d6) at vlan_clone_destroy+0x14
 if_clone_destroyif(c1b14840,c19d0c00,80206979,c19aa720,cccbcbf4,...) at if_clone_destroyif+0xc7
 if_clone_destroy(c19aa720,80206979,c1abee44,c19aa720,cccbcc1c,...) at if_clone_destroy+0x81
 ifioctl(c1abee44,80206979,c19aa720,c1ab36c0,0,...) at ifioctl+0xbe
 soo_ioctl(c1aa2480,80206979,c19aa720,c18f0d00,c1ab36c0) at soo_ioctl+0x2db
 kern_ioctl(c1ab36c0,3,80206979,c19aa720) at kern_ioctl+0x296
 ioctl(c1ab36c0,cccbcd00) at ioctl+0xf1
 syscall(cccbcd38) at syscall+0x256
 Xint0x80_syscall() at Xint0x80_syscall+0x20
 --- syscall (54, FreeBSD ELF32, ioctl), eip = 0x281582df, esp = 0xbfbfe52c, ebp = 0xbfbfe548 ---
 
 
 Fatal trap 12: page fault while in kernel mode
 fault virtual address   = 0xdeadc138
 fault code              = supervisor read, page not present
 instruction pointer     = 0x20:0xc056a4ab
 stack pointer           = 0x28:0xcccbcb1c
 frame pointer           = 0x28:0xcccbcb24
 code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, def32 1, gran 1
 processor eflags        = interrupt enabled, resume, IOPL = 0
 current process         = 144 (ifconfig)
 panic: from debugger
 Uptime: 1m52s
 Physical memory: 251 MB
 Dumping 16 MB: 1
 
 #0  doadump () at pcpu.h:147
 147     pcpu.h: No such file or directory.
         in pcpu.h
 (kgdb) bt
 #0  doadump () at pcpu.h:147
 #1  0xc04e73d0 in boot (howto=260) at ../../../kern/kern_shutdown.c:409
 #2  0xc04e767b in panic (fmt=0xc06150e1 "from debugger") at ../../../kern/kern_shutdown.c:563
 #3  0xc045640e in db_panic (addr=-1068063573, have_addr=0, count=-1, modif=0xcccbc91c "") at ../../../ddb/db_command.c:433
 #4  0xc04563a7 in db_command (last_cmdp=0xc067d804, cmd_table=0x0) at ../../../ddb/db_command.c:401
 #5  0xc0456462 in db_command_loop () at ../../../ddb/db_command.c:453
 #6  0xc04580ad in db_trap (type=12, code=0) at ../../../ddb/db_main.c:222
 #7  0xc0505889 in kdb_trap (type=12, code=0, tf=0x0) at ../../../kern/subr_kdb.c:502
 #8  0xc05fcc1d in trap_fatal (frame=0xcccbcadc, eva=3735929144) at ../../../i386/i386/trap.c:859
 #9  0xc05fc2cc in trap (frame=0xcccbcadc) at ../../../i386/i386/trap.c:276
 #10 0xc05e8ddb in calltrap () at ../../../i386/i386/exception.s:139
 #11 0xc056a4ab in in_delmulti (inm=0xc1982ae0) at ../../../netinet/in.c:1052
 #12 0xc0433f99 in pfsync_ifdetach (arg=0xc19d5300, ifp=0xc19cee00) at ../../../contrib/pf/net/if_pfsync.c:1908
 #13 0xc055ae35 in if_detach (ifp=0xc19d0c00) at ../../../net/if.c:709
 #14 0xc055fb9a in ether_ifdetach (ifp=0xc19d0c00) at ../../../net/if_ethersubr.c:917
 #15 0xc1b12be8 in vlan_clone_destroy (ifc=0xc1b14840, ifp=0xc19d0c00) at /usr/src/sys/modules/if_vlan/../../net/if_vlan.c:766
 #16 0xc055e27f in if_clone_destroyif (ifc=0xc1b14840, ifp=0xc19d0c00) at ../../../net/if_clone.c:218
 #17 0xc055e1b1 in if_clone_destroy (name=0xc19aa720 "vlan0") at ../../../net/if_clone.c:196
 #18 0xc055d03a in ifioctl (so=0xc1abee44, cmd=2149607801, data=0xc19aa720 "vlan0", td=0xc1ab36c0) at ../../../net/if.c:1810
 #19 0xc0518237 in soo_ioctl (fp=0xc19cee00, cmd=2149607801, data=0xc19aa720, active_cred=0xc18f0d00, td=0xc1ab36c0)
     at ../../../kern/sys_socket.c:202
 #20 0xc0512f7a in kern_ioctl (td=0xc1ab36c0, fd=3, com=2149607801, data=0xc19aa720 "vlan0") at file.h:266
 #21 0xc0512c9d in ioctl (td=0xc1ab36c0, uap=0xcccbcd00) at ../../../kern/sys_generic.c:544
 #22 0xc05fcf12 in syscall (frame=0xcccbcd38) at ../../../i386/i386/trap.c:1008
 #23 0xc05e8e40 in Xint0x80_syscall () at ../../../i386/i386/exception.s:196
 #24 0x00000033 in ?? ()
 Previous frame inner to this frame (corrupt stack?)
 (kgdb) frame 11
 #11 0xc056a4ab in in_delmulti (inm=0xc1982ae0) at ../../../netinet/in.c:1052
 1052            ifp = inm->inm_ifp;
 (kgdb) p inm
 $4 = (struct in_multi *) 0xc1982ae0
 (kgdb) p *inm
 $5 = {inm_link = {le_next = 0xdeadc0de, le_prev = 0xdeadc0de}, inm_addr = {s_addr = 3735929054}, inm_ifp = 0xdeadc0de,
   inm_ifma = 0xdeadc0de, inm_timer = 3735929054, inm_state = 3735929054, inm_rti = 0xc0665040}
 
 %%% EOF %%%

From: "Bruce M. Simpson" <bms@FreeBSD.org>
To: Yar Tikhiy <yar@comp.chem.msu.su>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/86848: [pf][multicast] destroying active syncdev leads to
 panic
Date: Thu, 15 Mar 2007 19:48:29 +0000

 We're on the right track.
 It looks like the fix I gave you in the PR should work if refcounting is 
 implemented properly.
 I will do the work in p4 and push patches out when ready.

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/86848: commit references a PR
Date: Mon, 19 Mar 2007 17:52:26 +0000 (UTC)

 bms         2007-03-19 17:52:15 UTC
 
   FreeBSD src repository
 
   Modified files:
     sys/contrib/pf/net   if_pfsync.c if_pfsync.h 
   Log:
   Teach pfsync(4) that its member interfaces may go away.
   
   This change partially resolves the issue in the PR. Further architectural
   fixes, in the form of reference counting, are needed.
   
   PR:             86848
   Reviewed by:    yar
   MFC after:      1 month
   
   Revision  Changes    Path
   1.33      +49 -0     src/sys/contrib/pf/net/if_pfsync.c
   1.8       +1 -0      src/sys/contrib/pf/net/if_pfsync.h
 _______________________________________________
 cvs-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/cvs-all
 To unsubscribe, send any mail to "cvs-all-unsubscribe@freebsd.org"
 
State-Changed-From-To: analyzed->patched 
State-Changed-By: bms 
State-Changed-When: Tue Mar 20 00:39:42 UTC 2007 
State-Changed-Why:  
Architectural changes in -CURRENT should address this 

http://www.freebsd.org/cgi/query-pr.cgi?pr=86848 
State-Changed-From-To: patched->feedback 
State-Changed-By: bms 
State-Changed-When: Sun Apr 1 22:03:36 UTC 2007 
State-Changed-Why:  
I believe this issue is now resolved in -current, and the multicast 
refcounting changes have had a chance to settle in. 
Can you confirm if the issue is resolved? 
If so, I will try to MFC the changes, provided they don't break the ABI. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=86848 

From: Yar Tikhiy <yar@comp.chem.msu.su>
To: Bruce M Simpson <bms@FreeBSD.org>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/86848: [pf][multicast] destroying active syncdev leads to panic
Date: Fri, 6 Apr 2007 15:36:53 +0400

 On Sun, Apr 01, 2007 at 10:04:29PM +0000, Bruce M Simpson wrote:
 >
 > I believe this issue is now resolved in -current, and the multicast
 > refcounting changes have had a chance to settle in.
 > Can you confirm if the issue is resolved?
 
 Sorry, but the panic is still there.  CVSup'd and rebuilt the whole
 system just an hour ago.
 
 -- 
 Yar
 
 Unread portion of the kernel message buffer:
 Kernel page fault with the following non-sleepable locks held:
 exclusive sleep mutex pf task mtx r = 0 (0xc067a244) locked @ contrib/pf/net/if_pfsync.c:1897
 KDB: stack backtrace:
 db_trace_self_wrapper(c0625a52) at db_trace_self_wrapper+0x25
 kdb_backtrace(1,0,c,cccccaf4,cccccae8,...) at kdb_backtrace+0x29
 witness_warn(5,0,c0641592) at witness_warn+0x192
 trap(cccccaf4) at trap+0x140
 calltrap() at calltrap+0x6
 --- trap 0xc, eip = 0xc0568c4b, esp = 0xcccccb34, ebp = 0xcccccb3c ---
 in_delmulti(c1a66c80,c19d21e0,cccccb78,c0559385,c19ca900,...) at in_delmulti+0x23
 pfsync_ifdetach(c19ca900,c19a1400,c1a1978c,0,c062ebf2,...) at pfsync_ifdetach+0x99
 if_detach(c19a1400,c19a1400,c19a1400,cccccba4,c1d34b2c,...) at if_detach+0x2d5
 ether_ifdetach(c19a1400,c19a1400,c1d367a0,2d,cccccbdc,...) at ether_ifdetach+0x3a
 vlan_clone_destroy(c1d367a0,c19a1400,c19a1400,c1d352d5,c1d367d0,0,c062f073,d6) at vlan_clone_destroy+0x14
 if_clone_destroyif(c1d367a0,c19a1400) at if_clone_destroyif+0xc7
 if_clone_detach(c1d367a0,cccccc1c,c04dd306,c1ca8e80,1,...) at if_clone_detach+0xa5
 vlan_modevent(c1ca8e80,1,0) at vlan_modevent+0x1b
 module_unload(c1ca8e80,0,c0691ad0,c0620b09,24d,c1ca8e80) at module_unload+0x4e
 linker_file_unload(c1acf100,0) at linker_file_unload+0x99
 kern_kldunload(c1accbd0,3,0,cccccd2c,c05fc8aa,...) at kern_kldunload+0x95
 kldunloadf(c1accbd0,cccccd00) at kldunloadf+0x1e
 syscall(cccccd38) at syscall+0x252
 Xint0x80_syscall() at Xint0x80_syscall+0x20
 --- syscall (444, FreeBSD ELF32, kldunloadf), eip = 0x280bfa6f, esp = 0xbfbfe7dc, ebp = 0xbfbfec48 ---
 
 
 Fatal trap 12: page fault while in kernel mode
 fault virtual address   = 0xdeadc0ee
 fault code              = supervisor read, page not present
 instruction pointer     = 0x20:0xc0568c4b
 stack pointer           = 0x28:0xcccccb34
 frame pointer           = 0x28:0xcccccb3c
 code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, def32 1, gran 1
 processor eflags        = interrupt enabled, resume, IOPL = 0
 current process         = 852 (kldunload)
 panic: from debugger
 Uptime: 15m51s
 Physical memory: 251 MB
 Dumping 23 MB: 8
 
 #0  doadump () at pcpu.h:172
 172     pcpu.h: No such file or directory.
         in pcpu.h
 (kgdb) bt
 #0  doadump () at pcpu.h:172
 #1  0xc04e60f8 in boot (howto=260) at ../../../kern/kern_shutdown.c:409
 #2  0xc04e63a3 in panic (fmt=0xc0614d81 "from debugger")
     at ../../../kern/kern_shutdown.c:563
 #3  0xc045674a in db_panic (addr=-1068069813, have_addr=0, count=-1,
     modif=0xccccc934 "") at ../../../ddb/db_command.c:433
 #4  0xc04566e3 in db_command (last_cmdp=0xc067db64, cmd_table=0x0)
     at ../../../ddb/db_command.c:401
 #5  0xc045679e in db_command_loop () at ../../../ddb/db_command.c:453
 #6  0xc04583e9 in db_trap (type=12, code=0) at ../../../ddb/db_main.c:222
 #7  0xc0504aa9 in kdb_trap (type=12, code=0, tf=0x0)
     at ../../../kern/subr_kdb.c:502
 #8  0xc05fc5b9 in trap_fatal (frame=0xcccccaf4, eva=3735929070)
     at ../../../i386/i386/trap.c:859
 #9  0xc05fbc68 in trap (frame=0xcccccaf4) at ../../../i386/i386/trap.c:276
 #10 0xc05e849b in calltrap () at ../../../i386/i386/exception.s:139
 #11 0xc0568c4b in in_delmulti (inm=0xc1a66c80) at ../../../netinet/in.c:1099
 #12 0xc04342d5 in pfsync_ifdetach (arg=0xc19ca900, ifp=0xdeadc0de)
     at ../../../contrib/pf/net/if_pfsync.c:1908
 #13 0xc0559385 in if_detach (ifp=0xc19a1400) at ../../../net/if.c:734
 #14 0xc055e2ce in ether_ifdetach (ifp=0xc19a1400)
     at ../../../net/if_ethersubr.c:923
 #15 0xc1d34b2c in vlan_clone_destroy (ifc=0xc1d367a0, ifp=0xc19a1400)
     at /usr/src/sys/modules/if_vlan/../../net/if_vlan.c:761
 #16 0xc055c987 in if_clone_destroyif (ifc=0xc1d367a0, ifp=0xc19a1400)
     at ../../../net/if_clone.c:218
 #17 0xc055cced in if_clone_detach (ifc=0xc1d367a0)
     at ../../../net/if_clone.c:283
 #18 0xc1d33c07 in vlan_modevent (mod=0xc1ca8e80, type=-559038242, data=0x0)
     at /usr/src/sys/modules/if_vlan/../../net/if_vlan.c:547
 #19 0xc04dd306 in module_unload (mod=0xc1ca8e80, flags=0)
     at ../../../kern/kern_module.c:244
 #20 0xc04d7869 in linker_file_unload (file=0xc1acf100, flags=0)
     at ../../../kern/kern_linker.c:594
 #21 0xc04d8199 in kern_kldunload (td=0xc1accbd0, fileid=3, flags=0)
     at ../../../kern/kern_linker.c:942
 #22 0xc04d824a in kldunloadf (td=0xc1accbd0, uap=0x0)
     at ../../../kern/kern_linker.c:971
 #23 0xc05fc8aa in syscall (frame=0xcccccd38) at ../../../i386/i386/trap.c:1008
 #24 0xc05e8500 in Xint0x80_syscall () at ../../../i386/i386/exception.s:196
 #25 0x00000033 in ?? ()
 Previous frame inner to this frame (corrupt stack?)
 (kgdb) frame 11
 #11 0xc0568c4b in in_delmulti (inm=0xc1a66c80) at ../../../netinet/in.c:1099
 1099            ifp = inm->inm_ifma->ifma_ifp;
 (kgdb) p inm->inm_ifma
 $1 = (struct ifmultiaddr *) 0xdeadc0de
 (kgdb) p *inm
 $2 = {inm_link = {le_next = 0xdeadc0de, le_prev = 0xdeadc0de}, inm_addr = {
     s_addr = 3735929054}, inm_ifp = 0xdeadc0de, inm_ifma = 0xdeadc0de,
   inm_timer = 3735929054, inm_state = 3735929054, inm_rti = 0xdeadc0de,
   inm_refcount = 3735929054}
 

From: Yar Tikhiy <yar@comp.chem.msu.su>
To: Bruce M Simpson <bms@FreeBSD.org>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/86848: [pf][multicast] destroying active syncdev leads to panic
Date: Fri, 6 Apr 2007 15:46:54 +0400

 On Fri, Apr 06, 2007 at 03:36:53PM +0400, Yar Tikhiy wrote:
 > On Sun, Apr 01, 2007 at 10:04:29PM +0000, Bruce M Simpson wrote:
 > >
 > > I believe this issue is now resolved in -current, and the multicast
 > > refcounting changes have had a chance to settle in.
 > > Can you confirm if the issue is resolved?
 > 
 > Sorry, but the panic is still there.  CVSup'd and rebuilt the whole
 > system just an hour ago.
 
 Just realized that it must be a different panic, now it happens
 instantly, on the if_clone destruction path.  It's still multicast
 related though.
 
 -- 
 Yar

From: "Bruce M. Simpson" <bms@FreeBSD.org>
To: Yar Tikhiy <yar@comp.chem.msu.su>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/86848: [pf][multicast] destroying active syncdev leads to
 panic
Date: Sat, 07 Apr 2007 03:59:16 +0100

 Yar Tikhiy wrote:
 > On Fri, Apr 06, 2007 at 03:36:53PM +0400, Yar Tikhiy wrote:
 >   
 >> On Sun, Apr 01, 2007 at 10:04:29PM +0000, Bruce M Simpson wrote:
 >>     
 >>> I believe this issue is now resolved in -current, and the multicast
 >>> refcounting changes have had a chance to settle in.
 >>> Can you confirm if the issue is resolved?
 >>>       
 >> Sorry, but the panic is still there.  CVSup'd and rebuilt the whole
 >> system just an hour ago.
 >>     
 >
 > Just realized that it must be a different panic, now it happens
 > instantly, on the if_clone destruction path.  It's still multicast
 > related though.
 >   
 That may be an improvement... I think the ordering of the detach 
 operations may need to be rejigged now. From the looks of your 
 backtrace, it appears the panic with vlan+pfsync now happens at netinet 
 level, rather than link layer level, and after the in_ifma got freed 
 with the ifp still present, which is something I haven't accounted for.
 
 Do you know where the 0xdeadc0de is coming from? There is an assertion 
 to catch that sort of thing in in_delmulti(), but only if the pointer is 
 NULL.
 
 My gut feeling about this one is that something somewhere didn't bump 
 the refcount on the underlying link-layer membership, because the code 
 currently assumes that the ifma object refcount is correct.
 
 BMS
 
 
 
State-Changed-From-To: feedback->analyzed 
State-Changed-By: bms 
State-Changed-When: Tue Apr 10 11:30:39 UTC 2007 
State-Changed-Why:  
additional work needed, but we're on the right track. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=86848 

From: "Bruce M. Simpson" <bms@FreeBSD.org>
To: "Bruce M. Simpson" <bms@FreeBSD.org>
Cc: Yar Tikhiy <yar@comp.chem.msu.su>,  bug-followup@FreeBSD.org
Subject: Re: kern/86848: [pf][multicast] destroying active syncdev leads to
 panic
Date: Thu, 12 Apr 2007 13:59:03 +0100

 This is a multi-part message in MIME format.
 --------------040801000307010406040700
 Content-Type: text/plain; charset=ISO-8859-1; format=flowed
 Content-Transfer-Encoding: 7bit
 
 Bruce M. Simpson wrote:
 > Yar Tikhiy wrote:
 >>
 >> Just realized that it must be a different panic, now it happens
 >> instantly, on the if_clone destruction path.  It's still multicast
 >> related though.
 >>   
 
 I've finally had a chance to sit down and resolve this.
 
 Because the group which pfsync joins on behalf of its member interface 
 only has a refcount of 1 (it is the only consumer), the netinet code 
 will free this membership when it is detached. pfsync's detach event 
 handler is always called after in_purgemaddrs(). It then tries to call 
 in_delmulti(), which is a use-after-free bug. This triggers a panic 
 because the underlying ifma object has been freed.
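 
 The sequence described above can be modelled in userspace C. This is
 only a sketch under stated assumptions, not the kernel code: every name
 in it (struct membership, struct sync_opts, membership_join(),
 membership_release(), protocol_purge(), sync_detach(),
 simulate_detach()) is hypothetical. They stand in, loosely, for
 in_multi, ip_moptions, in_delmulti(), in_purgemaddrs() and the pfsync
 detach handler. The point it tries to show is that once the protocol
 layer has dropped the only reference, the handler may only forget its
 cached pointer; releasing it again would touch freed memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct in_multi: one refcounted membership. */
struct membership {
	int refcount;
};

/* Hypothetical stand-in for the ip_moptions slots that pfsync keeps. */
struct sync_opts {
	struct membership *slot[1];
	int num_memberships;
};

static int live_memberships;	/* memberships allocated and not yet freed */

/* Join a group: the caller ends up holding the single reference. */
static struct membership *
membership_join(void)
{
	struct membership *m = malloc(sizeof(*m));

	if (m == NULL)
		abort();
	m->refcount = 1;
	live_memberships++;
	return (m);
}

/* Drop one reference and free the object when the last one goes away. */
static void
membership_release(struct membership *m)
{
	if (--m->refcount == 0) {
		free(m);
		live_memberships--;
	}
}

/* Models the protocol layer purging memberships when the ifnet detaches. */
static void
protocol_purge(struct membership *m)
{
	membership_release(m);
}

/*
 * Models the detach handler running after the purge: the object is
 * already gone, so only clear the cached pointer. Calling
 * membership_release() here would be a use-after-free.
 */
static void
sync_detach(struct sync_opts *o)
{
	if (o->num_memberships > 0) {
		assert(o->num_memberships == 1);
		o->slot[--o->num_memberships] = NULL;
	}
}

/* Run join, purge, detach in order; returns memberships still allocated. */
int
simulate_detach(void)
{
	struct sync_opts o = { { NULL }, 0 };

	o.slot[o.num_memberships++] = membership_join();
	protocol_purge(o.slot[0]);	/* frees the only reference */
	sync_detach(&o);		/* must not release again */
	assert(o.slot[0] == NULL && o.num_memberships == 0);
	return (live_memberships);
}
```

 In this model the handler never dereferences the stale pointer, which
 is the property the kernel handler needs as long as it is guaranteed
 to run after the purge.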
 
 Try this patch. It fixes the symptoms of the problem but I'm not sure 
 that it's the right fix.
 
 One alternative is to split up the in_multi refcounting in such a way 
 that the stack can deal with the underlying state having been freed by 
 an earlier call. This could be done by having the in_delmulti_locked() 
 function accept a 'detaching' argument, which it then uses to determine 
 whether or not it should free the lower-level state and free the 
 object, just as now happens for struct ifmultiaddr. The 
 in_purgemaddrs() function could then perhaps be retired, and/or taught 
 to signal an error if the refcount on an object is 1 (protocols should 
 clean up after themselves) and to clean up only the 224.0.0.1 
 membership which is allocated when an ifnet is first attached to 
 netinet.
 
 If this is good for you I'll commit it to -CURRENT.
 
 Regards,
 BMS
 
 --------------040801000307010406040700
 Content-Type: text/x-patch;
  name="pfsync.diff"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: inline;
  filename="pfsync.diff"
 
 ==== //depot/user/bms/netdev/sys/contrib/pf/net/if_pfsync.c#4 - /home/bms/p4/netdev/sys/contrib/pf/net/if_pfsync.c ====
 --- /tmp/tmp.1169.0	Thu Apr 12 13:51:04 2007
 +++ /home/bms/p4/netdev/sys/contrib/pf/net/if_pfsync.c	Thu Apr 12 13:41:03 2007
 @@ -1908,7 +1908,15 @@
  	}
  	imo = &sc->sc_imo;
  	if (imo->imo_num_memberships > 0) {
 -		in_delmulti(imo->imo_membership[--imo->imo_num_memberships]);
 +		KASSERT(imo->imo_num_memberships == 1,
 +			("%s: imo_num_memberships != 1", __func__)); 
 +		/*
 +		 * Our event handler is always called after protocol
 +		 * domains have been detached from the underlying ifnet.
 +		 * Do not call in_delmulti(); we held a single reference
 +		 * which the protocol domain has purged in in_purgemaddrs().
 +		 */
 +		imo->imo_membership[--imo->imo_num_memberships] = NULL;
  		imo->imo_multicast_ifp = NULL;
  	}
  
 
 --------------040801000307010406040700--
State-Changed-From-To: analyzed->patched 
State-Changed-By: bms 
State-Changed-When: Sat Apr 14 01:09:04 UTC 2007 
State-Changed-Why:  
A possibly final fix for the issue has been committed to -CURRENT. 
It is unlikely that this fix will be MFCed as it is cumulative with 
significant changes in architecture. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=86848 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/86848: commit references a PR
Date: Sat, 14 Apr 2007 01:01:55 +0000 (UTC)

 bms         2007-04-14 01:01:46 UTC
 
   FreeBSD src repository
 
   Modified files:
     sys/contrib/pf/net   if_pfsync.c 
   Log:
   In member interface detach event handler, do not attempt to free state
   which has already been freed by in_ifdetach(). With this cumulative change,
   the removal of a member interface will not cause a panic in pfsync(4).
   
   Requested by:   yar
   PR:             86848
   
   Revision  Changes    Path
   1.34      +9 -1      src/sys/contrib/pf/net/if_pfsync.c
 

From: Yar Tikhiy <yar@comp.chem.msu.su>
To: Bruce M Simpson <bms@FreeBSD.org>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/86848: [pf][multicast] destroying active syncdev leads to panic
Date: Thu, 19 Apr 2007 00:34:27 +0400

 On Sat, Apr 14, 2007 at 01:09:46AM +0000, Bruce M Simpson wrote:
 >
 > A possibly final fix for the issue has been committed to -CURRENT.
 > It is unlikely that this fix will be MFCed as it is cumulative with
 > significant changes in architecture.
 
 CURRENT has finally stopped panicking in my case.  Thanks a lot!
 
 -- 
 Yar
State-Changed-From-To: patched->closed 
State-Changed-By: bms 
State-Changed-When: Sun Apr 29 20:39:47 UTC 2007 
State-Changed-Why:  
Closed with submitter's acknowledgement -- this fix will not be merged 
to -STABLE. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=86848 
>Unformatted:
