From nobody@FreeBSD.org  Wed Mar 28 10:06:49 2001
Return-Path: <nobody@FreeBSD.org>
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by hub.freebsd.org (Postfix) with ESMTP id 0F6B337B724
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 28 Mar 2001 10:06:41 -0800 (PST)
	(envelope-from nobody@FreeBSD.org)
Received: (from nobody@localhost)
	by freefall.freebsd.org (8.11.1/8.11.1) id f2SI6fi39811;
	Wed, 28 Mar 2001 10:06:41 -0800 (PST)
	(envelope-from nobody)
Message-Id: <200103281806.f2SI6fi39811@freefall.freebsd.org>
Date: Wed, 28 Mar 2001 10:06:41 -0800 (PST)
From: gunther@gusw.net
To: freebsd-gnats-submit@FreeBSD.org
Subject: Kernel panic when using IPsec on high loads
X-Send-Pr-Version: www-1.0

>Number:         26176
>Category:       kern
>Synopsis:       Kernel panic when using IPsec on high loads
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Mar 28 10:10:01 PST 2001
>Closed-Date:    Tue Jun 5 12:52:21 PDT 2001
>Last-Modified:  Tue Jun 05 12:57:25 PDT 2001
>Originator:     Gunther Schadow
>Release:        4.2-RELEASE
>Organization:
Regenstrief Institute for Health Care
>Environment:
PicoBSD (no uname) but: it's FreeBSD 4.2-RELEASE on i386 built recently

Hardware: recent DELL "Dimension 4100" PC.

boot log with devices etc:

Copyright (c) 1992-2000 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 4.2-RELEASE #12: Thu Mar 15 22:14:04 EST 2001
    schadow@aurora.regenstrief.org:/usr/src/sys-4.2/compile/PICOBSD-ngigw1
Timecounter "i8254"  frequency 1193182 Hz
Timecounter "TSC"  frequency 930319864 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (930.32-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x686  Stepping = 6
  Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory  = 133955584 (130816K bytes)
avail memory = 126279680 (123320K bytes)
pnpbios: Bad PnP BIOS data checksum
Preloaded elf kernel "kernel" at 0xc0427000.
Preloaded mfs_root "/fs.PICOBSD" at 0xc0427084.
Pentium Pro MTRR support enabled
md0: Preloaded image </fs.PICOBSD> 2048000 bytes at 0xc02313e8
md1: Malloc disk
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Host to PCI bridge> on motherboard
pci0: <PCI bus> on pcib0
pcib1: <PCI to PCI bridge (vendor=8086 device=1131)> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pci1: <ATI model 5046 graphics accelerator> at 0.0 irq 11
pcib2: <PCI to PCI bridge (vendor=8086 device=244e)> at device 30.0 on pci0
pci2: <PCI bus> on pcib2
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xdc00-0xdc7f mem 0xff9ffc00-0xff9ffc7f irq 3 at device 9.0 on pci2
xl0: Ethernet address: 00:01:03:d6:24:fd
miibus0: <MII bus> on xl0
xlphy0: <3c905C 10/100 internal PHY> on miibus0
xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pci2: <unknown card> (vendor=0x1274, dev=0x1371) at 12.0 irq 9
isab0: <PCI to ISA bridge (vendor=8086 device=2440)> at device 31.0 on pci0
isa0: <ISA bus> on isab0
pci0: <Unknown PCI ATA controller> at 31.1
pci0: <UHCI USB controller> at 31.2 irq 10
pci0: <unknown card> (vendor=0x8086, dev=0x2443) at 31.3 irq 9
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A, console
DUMMYNET initialized (000608)
IP packet filtering initialized, divert enabled, rule-based forwarding disabled, default to accept, logging disabled
IPsec: Initialized Security Association Processing.

>Description:
Kernel panic (see messages below) when the machine is used as a
router for videoconferencing (i.e. under moderate to high load of
UDP streaming, 1.5 to 2 Mb/s) with IPsec tunnel mode enabled
(static keying, *no* IKE/racoon) and with IPFW in use (with or
without any firewall rules). The kernel panics approximately 5
minutes after the streaming starts.

Both input and output to the gateway go through the same 
ethernet device (an "xl" device.)

I will try to reproduce the problem tomorrow using the kernel
debugger and more recent STABLE versions. Some feedback that
narrows the search space for my trial-and-error attempts would
be welcome.

Two incidents:

Incident 1:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xb5c0a612
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc014dcf8
stack pointer           = 0x10:0xc0201d8c
frame pointer           = 0x10:0xc0201d98
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = Idle
interrupt mask          = net tty
trap number             = 12
panic: page fault

syncing disks...
done
Uptime: 1h48m1s


Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xb6c03812
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc014dcf8
stack pointer           = 0x10:0xc0201b08
frame pointer           = 0x10:0xc0201b14
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = Idle
interrupt mask          = net tty
trap number             = 12
panic: page fault
Uptime: 1h48m2s


-------------------
Incident 2:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xb4c08a00
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0197484
stack pointer           = 0x10:0xc0201ab8
frame pointer           = 0x10:0xc0201ac8
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = Idle
interrupt mask          = net tty
trap number             = 12
panic: page fault

syncing disks...
done
Uptime: 36m55s
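
To compare the traps above, the fields of each dump can be extracted
and grouped by instruction pointer. This is a minimal Python sketch,
assuming the console output has been saved as text. The names
parse_traps and group_by_ip are illustrative only and are not part of
any FreeBSD tool.

```python
import re
from collections import defaultdict

# Matches "name = value" lines of a "Fatal trap" dump, e.g.
# "instruction pointer     = 0x8:0xc014dcf8".
# Continuation lines that start with "=" do not match.
FIELD_RE = re.compile(r"^\s*([a-z][a-z ]*?)\s*=\s*(.+?)\s*$")


def parse_traps(text):
    """Split console text into one dict of fields per 'Fatal trap' block."""
    traps = []
    current = None
    for line in text.splitlines():
        if line.strip().startswith("Fatal trap"):
            current = {}
            traps.append(current)
            continue
        if current is None:
            continue
        m = FIELD_RE.match(line)
        if m:
            current[m.group(1)] = m.group(2)
    return traps


def group_by_ip(traps):
    """Map each faulting instruction pointer to its fault addresses."""
    groups = defaultdict(list)
    for t in traps:
        # Drop the segment selector ("0x8:") and keep the offset.
        ip = t["instruction pointer"].split(":")[-1]
        groups[ip].append(t["fault virtual address"])
    return dict(groups)


# Excerpt of the three dumps from incidents 1 and 2 in this report.
SAMPLE = """\
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xb5c0a612
instruction pointer     = 0x8:0xc014dcf8

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xb6c03812
instruction pointer     = 0x8:0xc014dcf8

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xb4c08a00
instruction pointer     = 0x8:0xc0197484
"""

if __name__ == "__main__":
    for ip, addrs in group_by_ip(parse_traps(SAMPLE)).items():
        print(ip, addrs)
```

Grouped this way, both traps of incident 1 share the instruction
pointer 0xc014dcf8, while incident 2 faults at 0xc0197484. All three
fault addresses are numerically below 0xc0000000.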

>How-To-Repeat:
The problem consistently occurs about 5 minutes after the high
load begins. One affected machine is at a remote site now, but I
will try to reproduce this in a laboratory setting with KDB tomorrow.

PLEASE let me know if this problem (or a similar problem) is
known and whether a workaround or fix of any kind exists.
>Fix:

>Release-Note:
>Audit-Trail:

From: Gunther Schadow <gunther@aurora.regenstrief.org>
To: freebsd-gnats-submit@FreeBSD.org
Cc:  
Subject: Re: kern/26176: Kernel panic when using IPsec on high loads
Date: Thu, 29 Mar 2001 18:14:05 +0000

 Here is more information:
 
 Fatal trap 12: page fault while in kernel mode
 fault virtual address   = 0xb2c04400
 fault code              = supervisor read, page not present
 instruction pointer     = 0x8:0xc0199fa0
 stack pointer           = 0x10:0xc020c218
 frame pointer           = 0x10:0xc020c268
 code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, def32 1, gran 1
 processor eflags        = interrupt enabled, resume, IOPL = 0
 current process         = Idle
 interrupt mask          = net tty
 kernel: type 12 trap, code=0
 Stopped at      esp_hdrsiz+0x498:       movl    0(%edx),%eax
 
 So, the problem seems to be in the IPsec code, sys/netinet6/esp_output.c
 called from sys/netinet6/ipsec.c. Here is the stack trace:
 
 esp_hdrsiz(c0b48500,c0b485f5,c0b3f400,c0ceb800,2) at esp_hdrsiz+0x498
 esp4_output(c0b48500,c0ceb800,c0ceba00,0,1) at esp4_output+0x48
 ipsec4_output(c020c418,c0ceba00,1,c0ceef00,c0b5af00) at ipsec4_output+0x2e3
 ip_output(c0b1be00,0,c0229a50,1,0) at ip_output+0x762
 ip_stripoptions(c0b1be00,0,c0b1be00,0,ffffffff) at ip_stripoptions+0x211
 ip_input(c0b1be00) at ip_input+0x462
 ip_input(c01d374f,0,d0f0010,10,c7a50010) at ip_input+0x7b7
 doreti_popl_fs_fault() at doreti_popl_fs_fault+0x91
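
The debugger prints each address as a symbol name plus an offset. The
lookup behind such a label can be sketched as a search for the nearest
symbol at or below the address. This is a minimal Python sketch,
assuming a sorted symbol table such as the one `nm -n` prints for a
kernel image. Only the esp_hdrsiz address is derived from the trace
above (0xc0199fa0 - 0x498 = 0xc0199b08); the other two entries are
hypothetical placeholders.

```python
import bisect

# Sorted (address, name) pairs, in the order `nm -n` would list them.
# Only esp_hdrsiz is derived from the trace: 0xc0199fa0 - 0x498.
# The other two entries are hypothetical placeholders.
SYMBOLS = [
    (0xC0199000, "hypothetical_prev"),
    (0xC0199B08, "esp_hdrsiz"),
    (0xC019A400, "hypothetical_next"),
]


def resolve(addr, symbols=SYMBOLS):
    """Return (name, offset) for the nearest symbol at or below addr."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        raise ValueError("address below first symbol")
    base, name = symbols[i]
    return name, addr - base


if __name__ == "__main__":
    name, off = resolve(0xC0199FA0)
    print(f"{name}+{off:#x}")
```

With a lookup of this kind, a function that follows esp_hdrsiz in the
image but is missing from the symbol table would be reported under the
same name, so a large offset such as 0x498 may be worth checking
against the source before concluding which function faulted.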
 
 I assume that upgrading to a more current version of the IPsec
 code might fix the problem, but I am not sure. I will report
 more later.
 
 thanks
 -- 
 Gunther Schadow, M.D., Ph.D.                    gschadow@regenstrief.org
 Medical Information Scientist      Regenstrief Institute for Health Care
 Adjunct Assistant Professor        Indiana University School of Medicine
 tel:1(317)630-7960                         http://aurora.regenstrief.org
State-Changed-From-To: open->feedback 
State-Changed-By: iedowse 
State-Changed-When: Thu Apr 12 09:31:02 PDT 2001 
State-Changed-Why:  
I think this may have been fixed in revision 1.130.2.21 of  
src/sys/netinet/ip_input.c. Could you try updating to a more recent 
-stable to see if this problem still exists? 


http://www.freebsd.org/cgi/query-pr.cgi?pr=26176 
State-Changed-From-To: feedback->closed 
State-Changed-By: iedowse 
State-Changed-When: Tue Jun 5 12:52:21 PDT 2001 
State-Changed-Why:  

This bug (icmp_error mbuf corruption) has been fixed. Thanks 
for the bug report! 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=26176 
>Unformatted:
