From nobody@FreeBSD.org  Tue Dec 28 16:36:53 2010
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B66F3106564A
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 28 Dec 2010 16:36:53 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (unknown [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id 6D93F8FC08
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 28 Dec 2010 16:36:53 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id oBSGarvh053636
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 28 Dec 2010 16:36:53 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id oBSGaros053635;
	Tue, 28 Dec 2010 16:36:53 GMT
	(envelope-from nobody)
Message-Id: <201012281636.oBSGaros053635@red.freebsd.org>
Date: Tue, 28 Dec 2010 16:36:53 GMT
From: mike tancsa <mike@sentex.net>
To: freebsd-gnats-submit@FreeBSD.org
Subject: netgraph panic with ipv6 enabled
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         153497
>Category:       kern
>Synopsis:       [netgraph] netgraph panic due to race conditions
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-net
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Dec 28 16:40:09 UTC 2010
>Closed-Date:    
>Last-Modified:  Mon May  9 20:40:08 UTC 2011
>Originator:     mike tancsa
>Release:        RELENG_8
>Organization:
Sentex Communications
>Environment:
FreeBSD 8.2-PRERELEASE #6: Sun Dec 12 16:25:12 EST 2010
>Description:
Using mpd5 as an LNS with approximately 600 sessions over IPv4 only is nice and stable. However, if I enable IPv6 in mpd5 and on the system, the box panics anywhere from a few days to a couple of weeks later. Enabling WITNESS in the kernel seems to make the issue more acute. The panic appears to be in the same place each time.

I have the core dump files available.




panic: page fault

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x24
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc5f02dc5
stack pointer           = 0x28:0xc4f5a928
frame pointer           = 0x28:0xc4f5a93c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1006 (ng_queue0)
trap number             = 12
panic: page fault
cpuid = 1
Uptime: 15d18h26m46s
Physical memory: 2036 MB
Dumping 281 MB: 266panic: bufwrite: buffer is not busy???
cpuid = 1
 250
<110>ipfw: 10 Deny UDP 192.168.1.4:19008 64.7.156.213:1204 in via em1
 234 218 202 186 170 154 138 122 106 90 74 58 42 26 10


#0  doadump () at pcpu.h:231
231     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) #0  doadump () at pcpu.h:231
#1  0xc068cee3 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:419
#2  0xc068d147 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:592
#3  0xc08f082c in trap_fatal (frame=0xc4f5a8e8, eva=36)
    at /usr/src/sys/i386/i386/trap.c:946
#4  0xc08f0a90 in trap_pfault (frame=0xc4f5a8e8, usermode=0, eva=36)
    at /usr/src/sys/i386/i386/trap.c:859
#5  0xc08f0f39 in trap (frame=0xc4f5a8e8) at /usr/src/sys/i386/i386/trap.c:532
#6  0xc08d825c in calltrap () at /usr/src/sys/i386/i386/exception.s:166
#7  0xc5f02dc5 in ng_address_hook (here=0x0, item=0xc61fd9c0, 
    hook=0xcd018980, retaddr=0)
    at /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:3504
#8  0xc60a6bdd in ng_ppp_bypass (node=Variable "node" is not available.
)
    at /usr/src/sys/modules/netgraph/ppp/../../../netgraph/ng_ppp.c:901
#9  0xc60a79c5 in ng_ppp_rcvdata (hook=0xcbf6d880, item=0xc61fd9c0)
    at /usr/src/sys/modules/netgraph/ppp/../../../netgraph/ng_ppp.c:1524
#10 0xc5f04774 in ng_apply_item (node=0xc60e2e00, item=0xc61fd9c0, rw=0)
    at /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:2336
#11 0xc5f0374f in ng_snd_item (item=0xc61fd9c0, flags=Variable "flags" is not available.
)
    at /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:2253
#12 0xc5f04774 in ng_apply_item (node=0xc645d180, item=0xc61fd9c0, rw=0)
    at /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:2336
#13 0xc5f0374f in ng_snd_item (item=0xc61fd9c0, flags=Variable "flags" is not available.
)
    at /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:2253
#14 0xc5f04774 in ng_apply_item (node=0xc63afe00, item=0xc61fd9c0, rw=0)
    at /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:2336
#15 0xc5f0374f in ng_snd_item (item=0xc61fd9c0, flags=Variable "flags" is not available.
)
    at /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:2253
#16 0xc6094a4c in ng_ksocket_incoming2 (node=0xc6195000, hook=0x0, 
    arg1=0xc618180c, arg2=0)
    at /usr/src/sys/modules/netgraph/ksocket/../../../netgraph/ng_ksocket.c:1153
#17 0xc5f048a9 in ng_apply_item (node=0xc6195000, item=0xc5f0ad40, rw=1)
    at /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:2407
#18 0xc5f059f6 in ngthread (arg=0x0)
    at /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:3351
#19 0xc06627d1 in fork_exit (callout=0xc5f05890 <ngthread>, arg=0x0, 
    frame=0xc4f5ad28) at /usr/src/sys/kern/kern_fork.c:845
#20 0xc08d82d4 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:273
(kgdb) 
(kgdb) up 7
#7  0xc5f02dc5 in ng_address_hook (here=0x0, item=0xc61fd9c0, hook=0xcd018980, retaddr=0) at /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:3504
3504            if ((hook == NULL) ||
(kgdb) list
3499             * Quick sanity check..
3500             * Since a hook holds a reference on it's node, once we know
3501             * that the peer is still connected (even if invalid,) we know
3502             * that the peer node is present, though maybe invalid.
3503             */
3504            if ((hook == NULL) ||
3505                NG_HOOK_NOT_VALID(hook) ||
3506                NG_HOOK_NOT_VALID(peer = NG_HOOK_PEER(hook)) ||
3507                NG_NODE_NOT_VALID(peernode = NG_PEER_NODE(hook))) {
3508                  

(kgdb) p *hook
$1 = {hk_name = "bypass", '\0' <repeats 25 times>, hk_private = 0x1, hk_flags = 0, hk_type = 0, hk_peer = 0xcb58e180, hk_node = 0xc60e2e00, hk_hooks = {le_next = 0x0, 
    le_prev = 0xcbf6d8b4}, hk_rcvmsg = 0, hk_rcvdata = 0xc60a9d00 <ng_ppp_rcvdata_bypass>, hk_refs = 2}
(kgdb) p *peer
$2 = {hk_name = "\b\000\000\000 \000\000\000\005\000\000\000\000\000\000\000\035X)\004\036QCcmd5\000\000\000", hk_private = 0x0, hk_flags = 0, hk_type = 0, hk_peer = 0x0, 
  hk_node = 0x0, hk_hooks = {le_next = 0xc613e000, le_prev = 0x7466656c}, hk_rcvmsg = 0x67697232, hk_rcvdata = 0x7468, hk_refs = 0}
(kgdb) p *item
$4 = {el_flags = 5, el_next = {stqe_next = 0x0}, el_dest = 0x0, el_hook = 0x0, body = {da_m = 0xcd160a00, msg = {msg_msg = 0xcd160a00, msg_retaddr = 0}, fn = {fn_fn = {
        fn_fn = 0xcd160a00, fn_fn2 = 0xcd160a00}, fn_arg1 = 0x0, fn_arg2 = 0}}, apply = 0x0, depth = 3}
(kgdb) p *peernode
Cannot access memory at address 0x0
(kgdb) 
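The fault virtual address 0x24 and the failed `p *peernode` at 0x0 are consistent with NG_PEER_NODE(hook) returning NULL (the peer hook in $2 has hk_node = 0x0) and NG_NODE_NOT_VALID() then reading the node's flags field through that NULL pointer. Note that the check at line 3504 has no NULL test on peernode. The minimal C sketch below only checks that arithmetic. It assumes, without verifying against the 8.2 headers, that struct ng_node starts with a 32-byte nd_name, then the nd_type pointer, then nd_flags; `node_prefix` and `null_node_flags_address()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical mirror of the first fields of struct ng_node, ASSUMING a
 * 32-byte name, the type pointer, and then the flags word that
 * NG_NODE_NOT_VALID() reads.
 */
#define NODE_NAME_SIZE 32

struct node_prefix {
    char  nd_name[NODE_NAME_SIZE];
    void *nd_type;
    int   nd_flags;
};

/*
 * Address touched by reading nd_flags through a NULL node pointer:
 * 0 + offsetof(struct node_prefix, nd_flags).
 */
size_t
null_node_flags_address(void)
{
    return offsetof(struct node_prefix, nd_flags);
}
```

Built for i386 (4-byte pointers) the function returns 36, i.e. 0x24, matching the reported fault address; a 64-bit build returns 40.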
>How-To-Repeat:
Set up an IPv6-enabled LNS with ~500 or more sessions and wait a week or two.
>Fix:


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-net 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Mon Jan 3 20:48:20 UTC 2011 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=153497 

From: Mike Tancsa <mike@sentex.net>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/153497: [netgraph] netgraph panic with ipv6 enabled
Date: Thu, 06 Jan 2011 08:30:08 -0500

 Another panic
 
 
 
 Fatal trap 12: page fault while in kernel mode
 cpuid = 0; apic id = 00
 fault virtual address   = 0x537b6
 fault code              = supervisor read, page not present
 instruction pointer     = 0x20:0xc5f29e79
 stack pointer           = 0x28:0xc4e8f9b4
 frame pointer           = 0x28:0xc4e8f9d0
 code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, def32 1, gran 1
 processor eflags        = interrupt enabled, resume, IOPL = 0
 current process         = 0 (em1 taskq)
 trap number             = 12
 panic: page fault
 cpuid = 0
 Uptime: 6d7h49m4s
 Physical memory: 2036 MB
 Dumping 273 MB: 258 242 226 210 194 178 162 146 130 114 98 82 66 50 34 18 2
 
 
 (kgdb) bt
 #0  doadump () at pcpu.h:231
 #1  0xc068cee3 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:419
 #2  0xc068d147 in panic (fmt=Variable "fmt" is not available.
 ) at /usr/src/sys/kern/kern_shutdown.c:592
 #3  0xc08f082c in trap_fatal (frame=0xc4e8f974, eva=341942)
     at /usr/src/sys/i386/i386/trap.c:946
 #4  0xc08f0a90 in trap_pfault (frame=0xc4e8f974, usermode=0, eva=341942)
     at /usr/src/sys/i386/i386/trap.c:859
 #5  0xc08f0f39 in trap (frame=0xc4e8f974) at /usr/src/sys/i386/i386/trap.c:532
 #6  0xc08d825c in calltrap () at /usr/src/sys/i386/i386/exception.s:166
 #7  0xc5f29e79 in ng_address_hook (here=0x0, item=0xc5f56180,
     hook=0xca5d1300, retaddr=0)
     at /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:3525
 #8  0xc5f6777d in ng_iface_send (ifp=0xcbad4800, m=0xcabfa200,
     sa=Variable "sa" is not available.
 ) at /usr/src/sys/modules/netgraph/iface/../../../netgraph/ng_iface.c:475
 #9  0xc5f67bd8 in ng_iface_output (ifp=0xcbad4800, m=0xcabfa200,
     dst=0xc4e8fafc, ro=0xc4e8faf4)
     at /usr/src/sys/modules/netgraph/iface/../../../netgraph/ng_iface.c:410
 #10 0xc075f58e in ip_output (m=0xcabfa200, opt=0x0, ro=0xc4e8faf4,
     flags=Variable "flags" is not available.
 ) at /usr/src/sys/netinet/ip_output.c:634
 #11 0xc075c3f9 in ip_forward (m=0xcabfa200, srcrt=0)
     at /usr/src/sys/netinet/ip_input.c:1521
 #12 0xc075da02 in ip_input (m=0xcabfa200)
     at /usr/src/sys/netinet/ip_input.c:729
 #13 0xc073ffc9 in netisr_dispatch_src (proto=1, source=0, m=0xcabfa200)
     at /usr/src/sys/net/netisr.c:917
 #14 0xc0740260 in netisr_dispatch (proto=1, m=0xcabfa200)
     at /usr/src/sys/net/netisr.c:1004
 #15 0xc0737111 in ether_demux (ifp=0xc5275400, m=0xcabfa200)
     at /usr/src/sys/net/if_ethersubr.c:894
 #16 0xc073767f in ether_input (ifp=0xc5275400, m=0xcabfa200)
     at /usr/src/sys/net/if_ethersubr.c:753
 #17 0xc052e9aa in em_rxeof (rxr=0xc520c400, count=98, done=0x0)
     at /usr/src/sys/dev/e1000/if_em.c:4283
 #18 0xc052ebcd in em_handle_que (context=0xc5277000, pending=1)
     at /usr/src/sys/dev/e1000/if_em.c:1482
 #19 0xc06c6a8a in taskqueue_run_locked (queue=0xc5270000)
     at /usr/src/sys/kern/subr_taskqueue.c:250
 #20 0xc06c6c1c in taskqueue_thread_loop (arg=0xc527b568)
     at /usr/src/sys/kern/subr_taskqueue.c:387
 #21 0xc06627d1 in fork_exit (callout=0xc06c6b60 <taskqueue_thread_loop>,
     arg=0xc527b568, frame=0xc4e8fd28) at /usr/src/sys/kern/kern_fork.c:845
 #22 0xc08d82d4 in fork_trampoline ()
     at /usr/src/sys/i386/i386/exception.s:273
 (kgdb) up 7
 #7  0xc5f29e79 in ng_address_hook (here=0x0, item=0xc5f56180,
     hook=0xca5d1300, retaddr=0)
     at /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:3525
 3525            if (peernode == NULL) {
 (kgdb) list
 3520            if (NG_HOOK_NOT_VALID(peer)) {
 3521                    BZXXXPRINTF("NG_HOOK_NOT_VALID(peer)");
 3522                    goto outahere;
 3523            }
 3524            peernode = NG_PEER_NODE(hook);
 3525            if (peernode == NULL) {
 3526                    BZXXXPRINTF("peernode == NULL");
 3527                    goto outahere;
 3528            }
 3529            if (NG_NODE_NOT_VALID(peernode)) {
 (kgdb)
 (kgdb) p *hook
 $1 = {hk_name = "inet", '\0' <repeats 27 times>, hk_private = 0x0,
 hk_flags = 48, hk_type = 0, hk_peer = 0xcb7eb100, hk_node = 0xc6226c00,
 hk_hooks = {le_next = 0x0,
     le_prev = 0xc6226c34}, hk_rcvmsg = 0, hk_rcvdata = 0, hk_refs = 2}
 (kgdb) p *peer
 $2 = {hk_name = "ng381", '\0' <repeats 26 times>, hk_private =
 0xc5f69160, hk_flags = 0, hk_type = 1, hk_peer = 0xcb51c6c0, hk_node =
 0x53792, hk_hooks = {le_next = 0xca6d5e00,
     le_prev = 0xcb4a3480}, hk_rcvmsg = 0xc5f30904 <ng_name_hash+452>,
 hk_rcvdata = 0xcb720c00, hk_refs = -973929112}
 (kgdb) p *peernode
 Cannot access memory at address 0x53792
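The check order in the debugging patch's listing (lines 3520-3529) can be modelled roughly as below. This is a simplified sketch, not the real ng_base.c code: `struct hook`, `struct node`, `F_INVALID`, `enum addr_result` and `check_path()` are hypothetical stand-ins, and the `peer == NULL` guard is an addition for the sketch. It also shows the limit of the approach: the added test only catches hk_node == NULL. In this dump the peer's hk_node is the non-NULL value 0x53792, which passes the test and is then dereferenced; the fault address 0x537b6 is exactly 0x53792 + 0x24, which fits a stale or corrupted peer hook rather than a clean NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct ng_hook / struct ng_node. */
#define F_INVALID 0x1

struct node { int flags; };
struct hook { int flags; struct hook *peer; struct node *node; };

enum addr_result {
    ADDR_OK,
    ADDR_HOOK_NULL,
    ADDR_HOOK_INVALID,
    ADDR_PEER_INVALID,
    ADDR_PEERNODE_NULL,
    ADDR_PEERNODE_INVALID
};

/* Each early return corresponds to one "goto outahere" in the debug patch. */
enum addr_result
check_path(const struct hook *hook)
{
    const struct hook *peer;
    const struct node *peernode;

    if (hook == NULL)
        return ADDR_HOOK_NULL;
    if (hook->flags & F_INVALID)
        return ADDR_HOOK_INVALID;
    peer = hook->peer;
    /* peer == NULL is a guard added for this sketch only. */
    if (peer == NULL || (peer->flags & F_INVALID))
        return ADDR_PEER_INVALID;
    peernode = peer->node;
    if (peernode == NULL)          /* the test added by the debug patch */
        return ADDR_PEERNODE_NULL;
    /* A garbage non-NULL pointer would be dereferenced right here. */
    if (peernode->flags & F_INVALID)
        return ADDR_PEERNODE_INVALID;
    return ADDR_OK;
}
```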

From: "Bjoern A. Zeeb" <bz@FreeBSD.org>
To: Mike Tancsa <mike@sentex.net>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/153497: [netgraph] netgraph panic with ipv6 enabled
Date: Thu, 6 Jan 2011 14:21:39 +0000 (UTC)

 On Thu, 6 Jan 2011, Mike Tancsa wrote:
 
 I would just like to add two things:
 
 a) This isn't really IPv6-related; it's just a matter of timing, and
     depending on the options and modules used it is more or less likely.
     It is basically races in the netgraph code.
 
 b) The BZXXXPRINTF printfs were added where he was previously seeing a
     NULL pointer dereference; the last case looks like random memory
     corruption, or indeed just another race.
     The debugging patch was:
     http://people.freebsd.org/~bz/20101228-01-ng-base-race-pr153497.diff
     with the printf in the hook == NULL case removed again, as that case
     was hit a lot, which I think isn't right either, but ...
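Point (a) describes a check-then-use race: one thread validates the hook, its peer and the peer node, while another thread disconnects or frees them before the first thread uses those pointers. A common generic remedy is to perform the checks and take a reference on the peer under the same lock that serializes disconnection. The sketch below shows that pattern with a single pthread mutex. It is an illustration under stated assumptions, not the change eventually committed to netgraph, and every type and function in it (`struct hook`, `struct node`, `topo_lock`, `peer_acquire()`, `peer_release()`, `disconnect_hooks()`) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for netgraph's hook and node types. */
struct node {
    bool valid;
    int  refs;
};

struct hook {
    bool         valid;
    int          refs;
    struct hook *peer;
    struct node *node;
};

/* One lock protecting the topology: peer/node pointers and refcounts. */
static pthread_mutex_t topo_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Validate hook -> peer -> peer node and take a reference on the peer
 * while still holding the lock, so the checks and the reference are
 * atomic with respect to disconnect_hooks().
 */
struct hook *
peer_acquire(struct hook *hook)
{
    struct hook *peer = NULL;

    pthread_mutex_lock(&topo_lock);
    if (hook != NULL && hook->valid &&
        hook->peer != NULL && hook->peer->valid &&
        hook->peer->node != NULL && hook->peer->node->valid) {
        peer = hook->peer;
        peer->refs++;          /* caller now owns a reference */
    }
    pthread_mutex_unlock(&topo_lock);
    return peer;               /* NULL: treat as "network down" */
}

void
peer_release(struct hook *peer)
{
    pthread_mutex_lock(&topo_lock);
    peer->refs--;              /* a real implementation would free at zero */
    pthread_mutex_unlock(&topo_lock);
}

/* Tear down a connection under the same lock used by peer_acquire(). */
void
disconnect_hooks(struct hook *a, struct hook *b)
{
    pthread_mutex_lock(&topo_lock);
    a->valid = false;
    b->valid = false;
    a->peer = NULL;
    b->peer = NULL;
    pthread_mutex_unlock(&topo_lock);
}
```

In this model, as long as objects are only freed when their refcount drops to zero, the peer cannot disappear between the validity checks and its use, which plain unlocked checks cannot guarantee.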
 
 -- 
 Bjoern A. Zeeb                                 You have to have visions!
          <ks> Going to jail sucks -- <bz> All my daemons like it!
    http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/jails.html

From: Eugene Grosbein <egrosbein@rdtc.ru>
To: bug-followup@FreeBSD.ORG
Cc: mike tancsa <mike@sentex.net>
Subject: Re: kern/153497: [netgraph] netgraph panic due to race conditions
Date: Tue, 10 May 2011 03:15:52 +0700

 Hi!
 
 Could you check out the latest RELENG_8?
 
 glebius@ made several changes eliminating races, and they made my mpd servers pretty stable.
 The changes have been MFC'd.
 
 Eugene Grosbein

From: Mike Tancsa <mike@sentex.net>
To: Eugene Grosbein <egrosbein@rdtc.ru>
Cc: bug-followup@FreeBSD.ORG
Subject: Re: kern/153497: [netgraph] netgraph panic due to race conditions
Date: Mon, 09 May 2011 16:34:49 -0400

 On 5/9/2011 4:15 PM, Eugene Grosbein wrote:
 > Hi!
 > 
 > Could you check out latest RELENG_8?
 > 
 > glebius@ made several changes eliminating races that made my mpd servers pretty stable.
 > Changes have been MFC'd.
 
 I am no longer seeing any netgraph panics, but the box is still crashing
 after about 15 days of uptime. As soon as I track down more
 information, I will open a new ticket.
 
 	---Mike
 
 > 
 > Eugene Grosbein
 > 
 > 
 
 
 -- 
 -------------------
 Mike Tancsa, tel +1 519 651 3400
 Sentex Communications, mike@sentex.net
 Providing Internet services since 1994 www.sentex.net
 Cambridge, Ontario Canada   http://www.tancsa.com/
>Unformatted:
