From nobody@FreeBSD.org  Mon Dec  8 20:52:31 2008
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 180BF106564A
	for <freebsd-gnats-submit@FreeBSD.org>; Mon,  8 Dec 2008 20:52:31 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 048278FC16
	for <freebsd-gnats-submit@FreeBSD.org>; Mon,  8 Dec 2008 20:52:31 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id mB8KqUi0007813
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 8 Dec 2008 20:52:30 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id mB8KqUST007812;
	Mon, 8 Dec 2008 20:52:30 GMT
	(envelope-from nobody)
Message-Id: <200812082052.mB8KqUST007812@www.freebsd.org>
Date: Mon, 8 Dec 2008 20:52:30 GMT
From: Boris Kochergin <spawk@acm.poly.edu>
To: freebsd-gnats-submit@FreeBSD.org
Subject: Kernel panic with EtherIP (may be related to SVN commit 178025)
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         129508
>Category:       kern
>Synopsis:       [carp] [panic] Kernel panic with EtherIP (may be related to SVN commit 178025)
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    freebsd-net
>State:          feedback
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Dec 08 21:00:04 UTC 2008
>Closed-Date:    
>Last-Modified:  Sat Oct 15 08:50:05 UTC 2011
>Originator:     Boris Kochergin
>Release:        7.1-PRERELEASE from October 22, 2008
>Organization:
Polytechnic Institute of NYU
>Environment:
FreeBSD wireless-master 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #0: Wed Oct 22 22:55:52 UTC 2008     boris@wireless-master:/usr/obj/usr/src/sys/WIRELESS-MASTER  i386
>Description:
Background is collected in three freebsd-net threads, in chronological order:

http://lists.freebsd.org/pipermail/freebsd-net/2008-February/016967.html
http://lists.freebsd.org/pipermail/freebsd-net/2008-September/019425.html
http://lists.freebsd.org/pipermail/freebsd-net/2008-October/019789.html

The panic referenced in the last thread occurred on a 7.0-RELEASE system
running with the patches, so I refrained from filing a PR. I have since
moved the setup from the original machine to two machines running
7.1-PRERELEASE, using CARP and pfsync for failover. Yesterday, after 16
days of uptime, the panic occurred on one of the new machines. Here is
the backtrace:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xc
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc059a2b3
stack pointer           = 0x28:0xcc49e688
frame pointer           = 0x28:0xcc49e6b0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 19 (irq5: xl0)
trap number             = 12
panic: page fault
Uptime: 16d4h0m52s
Physical memory: 246 MB
Dumping 58 MB: 43 27 11

Reading symbols from /boot/kernel/acpi.ko...Reading symbols from /boot/kernel/acpi.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/acpi.ko
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:244

warning: Source file is more recent than executable.

244             dumptid = curthread->td_tid;
(kgdb) where
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:244
#1  0xc0537739 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xc0537af5 in panic (fmt=Could not find the frame base for "panic".
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0xc0748ff1 in trap_fatal (frame=0xcc49e648, eva=12) at /usr/src/sys/i386/i386/trap.c:939
#4  0xc0748be0 in trap_pfault (frame=0xcc49e648, usermode=0, eva=12) at /usr/src/sys/i386/i386/trap.c:852
#5  0xc07484aa in trap (frame=0xcc49e648) at /usr/src/sys/i386/i386/trap.c:530
#6  0xc07325eb in calltrap () at /usr/src/sys/i386/i386/exception.s:159
#7  0xc059a2b3 in m_copym (m=0x0, off0=1500, len=16, wait=1) at /usr/src/sys/kern/uipc_mbuf.c:538
#8  0xc0637da6 in ip_fragment (ip=0xc1f1a6e8, m_frag=0xcc49e7b8, mtu=1500, if_hwassist_flags=0, sw_csum=769) at /usr/src/sys/netinet/ip_output.c:729
#9  0xc06377a4 in ip_output (m=0xc1f1a600, opt=0x0, ro=0xc1e8e624, flags=0, imo=0x0, inp=0x0) at /usr/src/sys/netinet/ip_output.c:568
#10 0xc0627d95 in in_gif_output (ifp=0xc1e30000, family=18, m=0xc1f1a600) at /usr/src/sys/netinet/in_gif.c:230
#11 0xc060e5bd in gif_output (ifp=0xc1e30000, m=0xc1e1e300, dst=0xc1e35630, rt=0x0) at /usr/src/sys/net/if_gif.c:455
#12 0xc060e1f9 in gif_start (ifp=0xc1e30000) at /usr/src/sys/net/if_gif.c:351
#13 0xc0605e29 in bridge_enqueue (sc=0xc1ec4800, dst_ifp=0xc1e30000, m=0x0) at /usr/src/sys/net/if_bridge.c:1742
#14 0xc0607f8c in bridge_broadcast (sc=0xc1ec4800, src_if=0xc1e30c00, m=0xc1e1e300, runfilt=1) at /usr/src/sys/net/if_bridge.c:2386
#15 0xc0606b41 in bridge_forward (sc=0xc1ec4800, sbif=0xc1f61000, m=0xc1e1e300) at /usr/src/sys/net/if_bridge.c:2046
#16 0xc06070c5 in bridge_input (ifp=0xc1e30c00, m=0xc1ed2b00) at /usr/src/sys/net/if_bridge.c:2168
#17 0xc060e8ca in gif_input (m=0xc1ed2b00, af=18, ifp=0xc1e30c00) at /usr/src/sys/net/if_gif.c:563
#18 0xc06280f4 in in_gif_input (m=0xc1ed2b00, off=20) at /usr/src/sys/netinet/in_gif.c:328
#19 0xc062f815 in encap4_input (m=0xc1ed2b00, off=20) at /usr/src/sys/netinet/ip_encap.c:191
#20 0xc063381f in ip_input (m=0xc1ed2b00) at /usr/src/sys/netinet/ip_input.c:665
#21 0xc061081a in netisr_dispatch (num=2, m=0xc1ed2b00) at /usr/src/sys/net/netisr.c:185
#22 0xc060cf06 in ether_demux (ifp=0xc1d4fc00, m=0xc1ed2b00) at /usr/src/sys/net/if_ethersubr.c:834
#23 0xc060cce2 in ether_input (ifp=0xc1d4fc00, m=0xc1ed2b00) at /usr/src/sys/net/if_ethersubr.c:692
#24 0xc06994c4 in xl_rxeof (sc=0xc1d70000) at /usr/src/sys/pci/if_xl.c:2022
#25 0xc0699f6e in xl_intr (arg=0xc1d70000) at /usr/src/sys/pci/if_xl.c:2257
#26 0xc050fc70 in ithread_execute_handlers (p=0xc1d43000, ie=0xc1cc3980) at /usr/src/sys/kern/kern_intr.c:1088
#27 0xc050fe37 in ithread_loop (arg=0xc1d6cca0) at /usr/src/sys/kern/kern_intr.c:1175
#28 0xc050dcf3 in fork_exit (callout=0xc050fdb0 <ithread_loop>, arg=0xc1d6cca0, frame=0xcc49ed38) at /usr/src/sys/kern/kern_fork.c:804
#29 0xc0732660 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:264
(kgdb)
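
A note on the fault address, offered as one reading of the trace rather
than a confirmed diagnosis: 0xc matches the offset of m_len in the mbuf
header when pointers are 4 bytes wide, as on i386. A read of m->m_len
through a NULL mbuf pointer would therefore fault at exactly that address,
which fits frame #7 (m=0x0, off0=1500) if m_copym() ran off the end of a
chain holding fewer than 1500 bytes while skipping to the offset. The
Python sketch below models both points. The field order is assumed from
struct m_hdr in sys/mbuf.h, the loop is assumed to have the shape of the
offset-skip loop in m_copym(), and the names MbufHdrI386 and skip_offset
are illustrative only, not kernel code.

```python
import ctypes


class MbufHdrI386(ctypes.Structure):
    """Model of the leading fields of struct m_hdr with 4-byte pointers.

    Assumption: field order as in sys/mbuf.h for 7.x; pointers are
    represented as 32-bit integers to mimic the i386 layout.
    """
    _fields_ = [
        ("mh_next", ctypes.c_uint32),     # struct mbuf *, next buffer in chain
        ("mh_nextpkt", ctypes.c_uint32),  # struct mbuf *, next chain in queue
        ("mh_data", ctypes.c_uint32),     # caddr_t, location of data
        ("mh_len", ctypes.c_int32),       # int, amount of data in this mbuf
    ]


def skip_offset(chain_lengths, off):
    """Skip `off` bytes into a chain described by per-mbuf lengths.

    Returns the index of the mbuf reached, or None when bytes remain to be
    skipped but the chain has ended. In a kernel without INVARIANTS that
    is the point where m == NULL and the read of m->m_len would fault.
    """
    index = 0
    while off > 0:
        if index >= len(chain_lengths):
            return None
        if off < chain_lengths[index]:
            break
        off -= chain_lengths[index]
        index += 1
    return index


if __name__ == "__main__":
    print(hex(MbufHdrI386.mh_len.offset))
    print(skip_offset([100, 1400, 36], 1500))  # chain long enough
    print(skip_offset([100, 1300], 1500))      # chain too short
```

If this reading holds, the question for the first panic is why
ip_fragment() believed the packet extended past 1500 bytes when the mbuf
chain handed to it was shorter.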

Here is the output of ifconfig on the afflicted machine:

sis0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 00:02:e3:05:bf:bd
        inet 10.0.0.1 netmask 0xffffff00 broadcast 10.0.0.255
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
xl0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9<RXCSUM,VLAN_MTU>
        ether 00:04:76:2a:4f:d2
        inet 128.238.9.197 netmask 0xffffff00 broadcast 128.238.9.255
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        inet 127.0.0.1 netmask 0xff000000 
pfsync0: flags=41<UP,RUNNING> metric 0 mtu 1460
        pfsync: syncdev: sis0 syncpeer: 224.0.0.240 maxupd: 128
carp0: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
        inet 128.238.9.199 netmask 0xffffff00 
        carp: MASTER vhid 1 advbase 1 advskew 0
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether de:ad:be:ef:ca:fe
        inet 192.168.0.1 netmask 0xffffff00 broadcast 192.168.0.255
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 100 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: gif5 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 12 priority 128 path cost 55
        member: gif4 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 11 priority 128 path cost 55
        member: gif3 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 10 priority 128 path cost 55
        member: gif2 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 9 priority 128 path cost 55
        member: gif1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 8 priority 128 path cost 55
        member: gif0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 7 priority 128 path cost 55
gif0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1280
        tunnel inet 128.238.9.199 --> 128.238.9.196
        inet 10.0.0.1 --> 10.0.0.2 netmask 0xffffff00 
gif1: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1280
        tunnel inet 128.238.9.199 --> 128.238.49.216
        inet 10.0.0.1 --> 10.0.0.3 netmask 0xffffff00 
gif2: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1280
        tunnel inet 128.238.9.199 --> 128.238.38.179
        inet 10.0.0.1 --> 10.0.0.4 netmask 0xffffff00 
gif3: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1280
        tunnel inet 128.238.9.199 --> 128.238.197.3
        inet 10.0.0.1 --> 10.0.0.5 netmask 0xffffff00 
gif4: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1280
        tunnel inet 128.238.9.199 --> 128.238.140.238
        inet 10.0.0.1 --> 10.0.0.6 netmask 0xffffff00 
gif5: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1280
        tunnel inet 128.238.9.199 --> 128.238.23.2
        inet 10.0.0.1 --> 10.0.0.7 netmask 0xffffff00

As I have automatic failover in place, the machine can be taken down to
test patches, etc.
>How-To-Repeat:
Set up a network like the one documented in the first thread listed above,
use it, and wait. Reproduction is tricky at best, since weeks can go by
between panics.
>Fix:


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-net 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Mon Dec 8 22:04:09 UTC 2008 
Responsible-Changed-Why:  
Possibly net-related. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=129508 

From: Boris Kochergin <spawk@acm.poly.edu>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/129508: [panic] Kernel panic with EtherIP (may be related
 to SVN commit 178025)
Date: Thu, 26 Feb 2009 23:34:06 -0500

 For anyone who was unenthusiastic about this due to the infrequency of 
 the problem, this has become less of a debugging nightmare. Due to 
 increased network load, the panic occurs about once a day, on average, 
 now with 7.1-RELEASE-p2.

From: Boris Kochergin <spawk@acm.poly.edu>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/129508: [panic] Kernel panic with EtherIP (may be related
 to SVN commit 178025)
Date: Wed, 13 May 2009 15:38:47 -0400

 As a workaround, lowering the MTU of the 802.11 interfaces on all of the 
 access points made the panic go away, until one of the machines that was 
 panicking was upgraded to 7.2-RELEASE. Afterward, a panic with a 
 different backtrace started occurring:
 
 Unread portion of the kernel message buffer:
 in_cksum_skip: out of data by 21295
 delayed m_pullup, m->len: 22  off: 28410  p: 97
 
 
 Fatal trap 12: page fault while in kernel mode
 fault virtual address   = 0x44032cbf
 fault code              = supervisor read, page not present
 instruction pointer     = 0x20:0xc05a9b54
 stack pointer           = 0x28:0xc1f274cc
 frame pointer           = 0x28:0xc1f274d4
 code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, def32 1, gran 1
 processor eflags        = interrupt enabled, resume, IOPL = 0
 current process         = 19 (irq5: xl0)
 trap number             = 12
 panic: page fault
 Uptime: 7d15h55m48s
 Physical memory: 246 MB
 Dumping 58 MB: 43 27 11
 
 Reading symbols from /boot/kernel/acpi.ko...Reading symbols from /boot/kernel/acpi.ko.symbols...done.
 done.
 Loaded symbols for /boot/kernel/acpi.ko
 #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:244
 244             dumptid = curthread->td_tid;
 (kgdb) where
 #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:244
 #1  0xc0542599 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
 #2  0xc0542955 in panic (fmt=Could not find the frame base for "panic".) at /usr/src/sys/kern/kern_shutdown.c:574
 #3  0xc0754ee1 in trap_fatal (frame=0xc1f2748c, eva=1141058751) at /usr/src/sys/i386/i386/trap.c:939
 #4  0xc0754ad0 in trap_pfault (frame=0xc1f2748c, usermode=0, eva=1141058751) at /usr/src/sys/i386/i386/trap.c:852
 #5  0xc075439a in trap (frame=0xc1f2748c) at /usr/src/sys/i386/i386/trap.c:530
 #6  0xc073c3fb in calltrap () at /usr/src/sys/i386/i386/exception.s:159
 #7  0xc05a9b54 in m_tag_locate (m=0xc3004300, cookie=0, type=21, t=0x0) at /usr/src/sys/kern/uipc_mbuf2.c:391
 #8  0xc043ef81 in m_tag_find (m=0xc3004300, type=21, start=0x0) at mbuf.h:957
 #9  0xc043eee1 in pf_get_mtag (m=0xc3004300) at pf_mtag.h:70
 #10 0xc044c44e in pf_test (dir=2, ifp=0xc20b6c00, m0=0xc1f276f0, eh=0x0, inp=0x0) at /usr/src/sys/contrib/pf/net/pf.c:6776
 #11 0xc0459f3f in pf_check_out (arg=0x0, m=0xc1f276f0, ifp=0xc20b6c00, dir=2, inp=0x0) at /usr/src/sys/contrib/pf/net/pf_ioctl.c:3687
 #12 0xc061da71 in pfil_run_hooks (ph=0xc07dee60, mp=0xc1f277b4, ifp=0xc20b6c00, dir=2, inp=0x0) at /usr/src/sys/net/pfil.c:78
 #13 0xc064557e in ip_output (m=0xc3004300, opt=0x0, ro=0xc21b1aa4, flags=0, imo=0x0, inp=0x0) at /usr/src/sys/netinet/ip_output.c:443
 #14 0xc06359d3 in in_gif_output (ifp=0xc2103400, family=18, m=0xc3004300) at /usr/src/sys/netinet/in_gif.c:244
 #15 0xc061b3cd in gif_output (ifp=0xc2103400, m=0xc238c100, dst=0xc219c560, rt=0x0) at /usr/src/sys/net/if_gif.c:455
 #16 0xc061b009 in gif_start (ifp=0xc2103400) at /usr/src/sys/net/if_gif.c:351
 #17 0xc0612c39 in bridge_enqueue (sc=0xc222c800, dst_ifp=0xc2103400, m=0x0) at /usr/src/sys/net/if_bridge.c:1742
 #18 0xc0614d9c in bridge_broadcast (sc=0xc222c800, src_if=0xc2196000, m=0xc238c100, runfilt=1) at /usr/src/sys/net/if_bridge.c:2386
 #19 0xc0613951 in bridge_forward (sc=0xc222c800, sbif=0xc22c8400, m=0xc238c100) at /usr/src/sys/net/if_bridge.c:2046
 #20 0xc0613ed5 in bridge_input (ifp=0xc2196000, m=0xc3068600) at /usr/src/sys/net/if_bridge.c:2168
 #21 0xc061b6da in gif_input (m=0xc3068600, af=18, ifp=0xc2196000) at /usr/src/sys/net/if_gif.c:563
 #22 0xc0635d34 in in_gif_input (m=0xc3068600, off=20) at /usr/src/sys/netinet/in_gif.c:342
 #23 0xc063d905 in encap4_input (m=0xc3068600, off=20) at /usr/src/sys/netinet/ip_encap.c:191
 #24 0xc064190f in ip_input (m=0xc3068600) at /usr/src/sys/netinet/ip_input.c:664
 #25 0xc061d61a in netisr_dispatch (num=2, m=0xc3068600) at /usr/src/sys/net/netisr.c:185
 #26 0xc0619d16 in ether_demux (ifp=0xc20b6c00, m=0xc3068600) at /usr/src/sys/net/if_ethersubr.c:834
 #27 0xc0619af2 in ether_input (ifp=0xc20b6c00, m=0xc3068600) at /usr/src/sys/net/if_ethersubr.c:692
 #28 0xc06a1924 in xl_rxeof (sc=0xc20d7000) at /usr/src/sys/pci/if_xl.c:2022
 #29 0xc06a23ce in xl_intr (arg=0xc20d7000) at /usr/src/sys/pci/if_xl.c:2257
 #30 0xc0518ca0 in ithread_execute_handlers (p=0xc20aa000, ie=0xc2009900) at /usr/src/sys/kern/kern_intr.c:1088
 #31 0xc0518e67 in ithread_loop (arg=0xc20d3c80) at /usr/src/sys/kern/kern_intr.c:1175
 #32 0xc0516d23 in fork_exit (callout=0xc0518de0 <ithread_loop>, arg=0xc20d3c80, frame=0xc1f27d38) at /usr/src/sys/kern/kern_fork.c:810
 #33 0xc073c470 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:264
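 
 A possible explanation for why lowering the MTU on the access points
 helped, sketched under stated assumptions: each frame bridged onto a gif
 tunnel gains an inner Ethernet header (14 bytes), an EtherIP header
 (2 bytes, per RFC 3378), and an outer IPv4 header (20 bytes without
 options). With a 1500-byte MTU on xl0, any bridged payload larger than
 1464 bytes would then have to be fragmented, which is the ip_fragment()
 path seen in the original backtrace. The constant and function names
 below are illustrative.

```python
ETHER_HDR_LEN = 14    # inner Ethernet header carried inside the tunnel
ETHERIP_HDR_LEN = 2   # EtherIP header (RFC 3378)
IPV4_HDR_LEN = 20     # outer IPv4 header, assuming no IP options

TUNNEL_OVERHEAD = ETHER_HDR_LEN + ETHERIP_HDR_LEN + IPV4_HDR_LEN


def outer_packet_len(inner_payload_len):
    """Size of the outer IPv4 packet that carries one bridged frame."""
    return inner_payload_len + TUNNEL_OVERHEAD


def max_inner_payload(outer_mtu):
    """Largest bridged payload that fits in outer_mtu without fragmenting."""
    return outer_mtu - TUNNEL_OVERHEAD


if __name__ == "__main__":
    print(outer_packet_len(1500))   # full-size payload from a 1500-MTU segment
    print(max_inner_payload(1500))  # ceiling for the access points' MTU
```

 Under these assumptions a full-size 1500-byte payload becomes a 1536-byte
 outer packet, so it is always fragmented on xl0, while a payload of 1464
 bytes or less is not. This does not account for the second backtrace,
 which faults in pf_test() before any fragmentation takes place.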
State-Changed-From-To: open->feedback 
State-Changed-By: glebius 
State-Changed-When: Sat Oct 15 08:41:16 UTC 2011 
State-Changed-Why:  
Feedback requested. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=129508 

From: Gleb Smirnoff <glebius@glebius.int.ru>
To: Boris Kochergin <spawk@acm.poly.edu>
Cc: bug-followup@FreeBSD.org
Subject: kern/129508: [carp] [panic] Kernel panic with EtherIP (may be
 related to SVN commit 178025)
Date: Sat, 15 Oct 2011 12:41:07 +0400

   Boris,
 
   I am now going through all CARP-related PRs, since I'm working
 on a major CARP rewrite.
 
   Your PR doesn't look directly related to CARP, but I am still
 interested in a status update for it.
 
   Did you manage to find a workaround for your problem? Have you
 done any operating system upgrades since your last report? Do you
 have any additional info?
 
 -- 
 Totus tuus, Glebius.
>Unformatted:
