From andre.albsmeier@mchp.siemens.de  Sat Aug 18 04:33:29 2001
Return-Path: <andre.albsmeier@mchp.siemens.de>
Received: from goliath.siemens.de (goliath.siemens.de [194.138.37.131])
	by hub.freebsd.org (Postfix) with ESMTP id 7DD1637B406
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 18 Aug 2001 04:33:24 -0700 (PDT)
	(envelope-from andre.albsmeier@mchp.siemens.de)
Received: from mail2.siemens.de (mail2.siemens.de [139.25.208.11])
	by goliath.siemens.de (8.11.1/8.11.1) with ESMTP id f7IBXMb07598
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 18 Aug 2001 13:33:23 +0200 (MET DST)
Received: from curry.mchp.siemens.de (curry.mchp.siemens.de [139.25.42.7])
	by mail2.siemens.de (8.11.4/8.11.4) with ESMTP id f7IBXMn05594
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 18 Aug 2001 13:33:22 +0200 (MET DST)
Received: (from localhost)
	by curry.mchp.siemens.de (8.11.3/8.11.3) id f7IBXMK26812
	for FreeBSD-gnats-submit@freebsd.org; Sat, 18 Aug 2001 13:33:22 +0200 (CEST)
Message-Id: <200108181133.f7IBXMj95078@curry.mchp.siemens.de>
Date: Sat, 18 Aug 2001 13:33:22 +0200 (CEST)
From: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: 4.4-PRERELEASE crashes under heavy net I/O
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         29845
>Category:       kern
>Synopsis:       4.4-PRERELEASE crashes under heavy net I/O
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Aug 18 04:40:02 PDT 2001
>Closed-Date:    Sun Aug 26 07:36:41 PDT 2001
>Last-Modified:  Sun Aug 26 07:45:10 PDT 2001
>Originator:     Andre Albsmeier
>Release:        FreeBSD 4.4-PRERELEASE i386
>Organization:
>Environment:

FreeBSD 4.4-PRERELEASE #26: Wed Aug 15 17:04:18 CEST 2001
on a Siemens 510 AGP laptop:

Copyright (c) 1992-2001 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 4.4-RC #81: Sat Aug 18 10:46:50 CEST 2001
    root@voyager.home.albsmeier.net:/src/obj-4/src/src-4/sys/schlappy
Timecounter "i8254"  frequency 1193143 Hz
CPU: Pentium II/Pentium II Xeon/Celeron (366.66-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x66a  Stepping = 10
  Features=0x183f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
real memory  = 134152192 (131008K bytes)
avail memory = 127582208 (124592K bytes)
Preloaded elf kernel "kernel" at 0xc0301000.
Pentium Pro MTRR support enabled
Using $PIR table, 7 entries at 0xc00fdf50
apm0: <APM BIOS> on motherboard
apm: found APM BIOS v1.2, connected at v1.2
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Intel 82443BX (440 BX) host to PCI bridge> on motherboard
pci0: <PCI bus> on pcib0
pcib1: <Intel 82443BX (440 BX) PCI-PCI (AGP) bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pci1: <NeoMagic MagicMedia 256AV SVGA controller> at 0.0 irq 10
chip1: <NeoMagic MagicMedia 256AX Audio controller> mem 0xfea00000-0xfeafffff,0xf7800000-0xf7bfffff irq 11 at device 0.1 on pci1
isab0: <Intel 82371AB PCI to ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX4 ATA33 controller> port 0xfcd0-0xfcdf at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
pci0: <Intel 82371AB/EB (PIIX4) USB controller> at 7.2 irq 9
chip2: <Intel 82371AB Power management controller> port 0x2180-0x218f at device 7.3 on pci0
pcic0: <TI PCI-1225 PCI-CardBus Bridge> irq 10 at device 10.0 on pci0
pcic0: PCI Memory allocated: 0x44000000
pcic0: TI12XX PCI Config Reg: [ring enable][speaker enable][pwr save][pci only]
pccard0: <PC Card bus (classic)> on pcic0
pcic1: <TI PCI-1225 PCI-CardBus Bridge> irq 11 at device 10.1 on pci0
pcic1: PCI Memory allocated: 0x44001000
pcic1: TI12XX PCI Config Reg: [ring enable][speaker enable][pwr save][pci only]
pccard1: <PC Card bus (classic)> on pcic1
orm0: <Option ROM> at iomem 0xc0000-0xcbfff on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model Generic PS/2 mouse, device ID 0
vga0: <Generic ISA VGA> at port 0x3b0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> on isa0
sc0: VGA <9 virtual consoles, flags=0x200>
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
pcm0: <OPL3-SAx (YMF719)> at port 0x530-0x537,0x370-0x371,0xf8c-0xf94,0xe0e irq 5 drq 1 flags 0xc113 on isa0
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
pcic0: Event mask 0xf
pcic1: Event mask 0xf
ata0-master: DMA limited to UDMA33, non-ATA66 compliant cable
ad0: 28615MB <IBM-DJSA-230> [58140/16/63] at ata0-master UDMA33
acd0: DVD-ROM <MATSHITADVD-ROM SR-8171> at ata0-slave using PIO4
Mounting root from ufs:/dev/ad0s2a
pccard: card inserted, slot 0
pccard: card inserted, slot 1
pcic0: debounced state is 0x30000419
pcic1: debounced state is 0x30000459
WARNING: / was not properly dismounted
pccard: card inserted, slot 0
pccard: card inserted, slot 1
xe0 at port 0x240-0x24f iomem 0xcc000-0xccfff irq 10 slot 0 on pccard0
xe0: Intel CE3, bonding version 0x45, 100Mbps capable
xe0: DingoID = 0x444b, RevisionID = 0x1, VendorID = 0
xe0: Ethernet address 00:a0:c9:bb:80:26
sio2 at port 0x2e8-0x2ef irq 11 slot 1 on pccard1
sio2: type 16550A

>Description:

Coming from 4.3-STABLE as of May 18th I tried to test 4.4-PRERELEASE
on the above machine. I can reliably crash the box when doing heavy
net I/O, otherwise it runs fine. I replaced the Intel NIC with a 3COM
589D but this didn't help.

It runs stable under Win98 (as stable as Win98 can run :-)).

I have other machines (non laptops) here which run perfectly.
_I_ would assume it is something laptop specific -- might
be related to Warner's new pccard code. I have also reconfigured
the BIOS to change/share various interrupt combinations without
success.

I have saved the crashdumps for further examination. As you can see,
the box crashes in whichever process it wants...

********************************************************************************

root@schlappy:/var/crash>gdb -k kernel vmcore.6
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
IdlePTD 3276800
initial pcb at 293180
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xc08befd6
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc0191cf3
stack pointer           = 0x10:0xc8367de8
frame pointer           = 0x10:0xc8367e0c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 239 (nfsiod)
interrupt mask          = net 
trap number             = 12
panic: page fault

syncing disks... 10 9 8 5 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 
done
Uptime: 12m29s

dumping to dev #ad/1, offset 150076
dump ata0: resetting devices .. ata0-slave: timeout waiting for command=ef s=00 e=00
done
127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 
---
#0  dumpsys () at /src/src-4/sys/kern/kern_shutdown.c:472
472             if (dumping++) {
(kgdb) where
#0  dumpsys () at /src/src-4/sys/kern/kern_shutdown.c:472
#1  0xc013bd7f in boot (howto=256) at /src/src-4/sys/kern/kern_shutdown.c:312
#2  0xc013c165 in panic (fmt=0xc02574ec "%s") at /src/src-4/sys/kern/kern_shutdown.c:580
#3  0xc02203ef in trap_fatal (frame=0xc8367da8, eva=3230396374) at /src/src-4/sys/i386/i386/trap.c:956
#4  0xc022009d in trap_pfault (frame=0xc8367da8, usermode=0, eva=3230396374) at /src/src-4/sys/i386/i386/trap.c:849
#5  0xc021fc43 in trap (frame={tf_fs = 16, tf_es = -935985136, tf_ds = 16, tf_edi = 0, tf_esi = -1066136064, tf_ebp = -935952884, tf_isp = -935952940, tf_ebx = -1066136064, 
      tf_edx = -1066254336, tf_ecx = -1913498372, tf_eax = 1683414, tf_trapno = 12, tf_err = 2, tf_eip = -1072096013, tf_cs = 8, tf_eflags = 66051, tf_esp = -1066121172, tf_ss = -935952684})
    at /src/src-4/sys/i386/i386/trap.c:448
#6  0xc0191cf3 in nfsm_uiotombuf (uiop=0xc8367ed4, mq=0xc8367e74, siz=8192, bpos=0xc8367e78) at /src/src-4/sys/nfs/nfs_subs.c:892
#7  0xc0199bb3 in nfs_writerpc (vp=0xc83a3d00, uiop=0xc8367ed4, cred=0xc0afaf80, iomode=0xc8367ec4, must_commit=0xc8367ec0) at /src/src-4/sys/nfs/nfs_vnops.c:1183
#8  0xc018e034 in nfs_doio (bp=0xc32f8264, cr=0xc0afaf80, p=0x0) at /src/src-4/sys/nfs/nfs_bio.c:1518
#9  0xc019303a in nfssvc_iod (p=0xc75e15a0) at /src/src-4/sys/nfs/nfs_syscalls.c:970
#10 0xc0192e5c in nfssvc (p=0xc75e15a0, uap=0xc8367f80) at /src/src-4/sys/nfs/nfs_syscalls.c:166
#11 0xc02206a1 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = -1077936680, tf_esi = 0, tf_ebp = -1077936776, tf_isp = -935952428, tf_ebx = 2, tf_edx = 1, tf_ecx = 19, 
      tf_eax = 155, tf_trapno = 12, tf_err = 2, tf_eip = 134515664, tf_cs = 31, tf_eflags = 643, tf_esp = -1077936852, tf_ss = 47}) at /src/src-4/sys/i386/i386/trap.c:1155
#12 0xc02144c5 in Xint0x80_syscall ()
#13 0x8048135 in ?? ()
(kgdb) 

********************************************************************************

root@schlappy:/var/crash>gdb -k kernel vmcore.7
IdlePTD 3276800
initial pcb at 293180
panicstr: sf_buf_ref: referencing a free sf_buf
panic messages:
---
panic: sf_buf_ref: referencing a free sf_buf

syncing disks... 3 3 
done
Uptime: 3m31s

dumping to dev #ad/1, offset 150076
dump ata0: resetting devices .. ata0-slave: timeout waiting for command=ef s=00 e=00
done
127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 
---
#0  dumpsys () at /src/src-4/sys/kern/kern_shutdown.c:472
472             if (dumping++) {
(kgdb) where
#0  dumpsys () at /src/src-4/sys/kern/kern_shutdown.c:472
#1  0xc013bd7f in boot (howto=256) at /src/src-4/sys/kern/kern_shutdown.c:312
#2  0xc013c165 in panic (fmt=0xc0239060 "sf_buf_ref: referencing a free sf_buf") at /src/src-4/sys/kern/kern_shutdown.c:580
#3  0xc015e6ef in sf_buf_ref (addr=0xc78ee000 <Address 0xc78ee000 out of bounds>, size=4096) at /src/src-4/sys/kern/uipc_syscalls.c:1469
#4  0xc015792e in m_copym (m=0xc0741f00, off0=4344, len=1448, wait=1) at /src/src-4/sys/kern/uipc_mbuf.c:713
#5  0xc0188a58 in tcp_output (tp=0xc79270c0) at /src/src-4/sys/netinet/tcp_output.c:592
#6  0xc0187bc1 in tcp_input (m=0xc0741000, off0=20, proto=6) at /src/src-4/sys/netinet/tcp_input.c:2316
#7  0xc01829f5 in ip_input (m=0xc0741000) at /src/src-4/sys/netinet/ip_input.c:820
#8  0xc0182a53 in ipintr () at /src/src-4/sys/netinet/ip_input.c:848
#9  0xc02158b5 in swi_net_next ()
#10 0xc02206a1 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 134152192, tf_esi = 0, tf_ebp = -1077940616, tf_isp = -935239724, tf_ebx = 0, 
      tf_edx = 6, tf_ecx = 134152192, tf_eax = 336, tf_trapno = 22, tf_err = 2, tf_eip = 672045052, tf_cs = 31, tf_eflags = 518, tf_esp = -1077940692, tf_ss = 47})
    at /src/src-4/sys/i386/i386/trap.c:1155
#11 0xc02144c5 in Xint0x80_syscall ()
#12 0x804c8c2 in ?? ()
#13 0x8050a7a in ?? ()
#14 0x804b1d5 in ?? ()
#15 0x804a76d in ?? ()
(kgdb) 

********************************************************************************

root@schlappy:/var/crash>gdb -k kernel vmcore.8 
IdlePTD 3276800
initial pcb at 293180
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x371f4
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0157545
stack pointer           = 0x10:0xc843bda0
frame pointer           = 0x10:0xc843bdac
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 416 (xeyes)
interrupt mask          = net 
trap number             = 12
panic: page fault

syncing disks... 7 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
done
Uptime: 6m36s

dumping to dev #ad/1, offset 150076
dump ata0: resetting devices .. ata0-slave: timeout waiting for command=ef s=00 e=00
done
127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 
---
#0  dumpsys () at /src/src-4/sys/kern/kern_shutdown.c:472
472             if (dumping++) {
(kgdb) where
#0  dumpsys () at /src/src-4/sys/kern/kern_shutdown.c:472
#1  0xc013bd7f in boot (howto=256) at /src/src-4/sys/kern/kern_shutdown.c:312
#2  0xc013c165 in panic (fmt=0xc02574ec "%s") at /src/src-4/sys/kern/kern_shutdown.c:580
#3  0xc02203ef in trap_fatal (frame=0xc843bd60, eva=225780) at /src/src-4/sys/i386/i386/trap.c:956
#4  0xc022009d in trap_pfault (frame=0xc843bd60, usermode=0, eva=225780) at /src/src-4/sys/i386/i386/trap.c:849
#5  0xc021fc43 in trap (frame={tf_fs = -1063387120, tf_es = 16, tf_ds = 16, tf_edi = 6684672, tf_esi = 225764, tf_ebp = -935084628, tf_isp = -935084660, 
      tf_ebx = 225764, tf_edx = 6684672, tf_ecx = 28, tf_eax = 6684672, tf_trapno = 12, tf_err = 0, tf_eip = -1072335547, tf_cs = 8, tf_eflags = 66054, 
      tf_esp = -1066149888, tf_ss = -1066149888}) at /src/src-4/sys/i386/i386/trap.c:448
#6  0xc0157545 in m_freem (m=0x371e4) at /src/src-4/sys/kern/uipc_mbuf.c:618
#7  0xc015746d in m_free (m=0xc073d800) at /src/src-4/sys/kern/uipc_mbuf.c:605
#8  0xc015c638 in sbcompress (sb=0xc78f388c, m=0xc073aa00, n=0x0) at /src/src-4/sys/kern/uipc_socket2.c:718
#9  0xc015c257 in sbappend (sb=0xc78f388c, m=0xc073aa00) at /src/src-4/sys/kern/uipc_socket2.c:506
#10 0xc015f5c9 in uipc_send (so=0xc78f3900, flags=0, m=0xc073aa00, nam=0x0, control=0x0, p=0xc8415f60) at /src/src-4/sys/kern/uipc_usrreq.c:344
#11 0xc0159f53 in sosend (so=0xc78f3900, addr=0x0, uio=0xc843bed8, top=0xc073aa00, control=0x0, flags=0, p=0xc8415f60) at /src/src-4/sys/kern/uipc_socket.c:611
#12 0xc014dbe8 in soo_write (fp=0xc0a2b480, uio=0xc843bed8, cred=0xc0afc080, flags=0, p=0xc8415f60) at /src/src-4/sys/kern/sys_socket.c:81
#13 0xc014a74d in dofilewrite (p=0xc8415f60, fp=0xc0a2b480, fd=3, buf=0x8054800, nbyte=8, offset=-1, flags=0) at /src/src-4/sys/sys/file.h:162
#14 0xc014a606 in write (p=0xc8415f60, uap=0xc843bf80) at /src/src-4/sys/kern/sys_generic.c:329
#15 0xc02206a1 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = -1078001617, tf_edi = 134561792, tf_esi = 134563840, tf_ebp = -1077938904, tf_isp = -935084076, 
      tf_ebx = 672709204, tf_edx = 672694464, tf_ecx = 29360136, tf_eax = 4, tf_trapno = 7, tf_err = 2, tf_eip = 673326260, tf_cs = 31, tf_eflags = 647, 
      tf_esp = -1077938948, tf_ss = 47}) at /src/src-4/sys/i386/i386/trap.c:1155
#16 0xc02144c5 in Xint0x80_syscall ()
#17 0x281388d3 in ?? ()
#18 0x2811d84e in ?? ()
#19 0x2811ec0a in ?? ()
#20 0x28115d36 in ?? ()
#21 0x8049833 in ?? ()
#22 0x2809bb7c in ?? ()
#23 0x2809bdf1 in ?? ()
#24 0x280919be in ?? ()
#25 0x80491ac in ?? ()
#26 0x8048f59 in ?? ()
(kgdb) 

********************************************************************************

root@schlappy:/var/crash>gdb -k kernel vmcore.9 
IdlePTD 3276800
initial pcb at 293180
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0x0
stack pointer           = 0x10:0xc833bc8c
frame pointer           = 0x10:0xc833bd1c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 3
current process         = 357 (Xserver)
interrupt mask          = none
trap number             = 12
panic: page fault

syncing disks... 5 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
done
Uptime: 4m31s

dumping to dev #ad/1, offset 150076
dump ata0: resetting devices .. ata0-slave: timeout waiting for command=ef s=00 e=00
done
127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 
---
#0  dumpsys () at /src/src-4/sys/kern/kern_shutdown.c:472
472             if (dumping++) {
(kgdb) where
#0  dumpsys () at /src/src-4/sys/kern/kern_shutdown.c:472
#1  0xc013bd7f in boot (howto=256) at /src/src-4/sys/kern/kern_shutdown.c:312
#2  0xc013c165 in panic (fmt=0xc02574ec "%s") at /src/src-4/sys/kern/kern_shutdown.c:580
#3  0xc02203ef in trap_fatal (frame=0xc833bc4c, eva=0) at /src/src-4/sys/i386/i386/trap.c:956
#4  0xc022009d in trap_pfault (frame=0xc833bc4c, usermode=0, eva=0) at /src/src-4/sys/i386/i386/trap.c:849
#5  0xc021fc43 in trap (frame={tf_fs = -1071054832, tf_es = -936181744, tf_ds = -936181744, tf_edi = -1071020724, tf_esi = -946937856, tf_ebp = -936133348, 
      tf_isp = -936133512, tf_ebx = -950131232, tf_edx = -936140800, tf_ecx = -936133432, tf_eax = 14, tf_trapno = 12, tf_err = 0, tf_eip = 0, tf_cs = 8, 
      tf_eflags = 78466, tf_esp = -65536, tf_ss = 0}) at /src/src-4/sys/i386/i386/trap.c:448
#6  0x0 in ?? ()
(kgdb) 

>How-To-Repeat:

Not difficult here, e.g., by pulling the crashdumps from the box onto
another machine with ftp :-)

>Fix:

Unknown. I have the dumps for further investigation. I can even
upload them somewhere if needed. I can also do tests if desired.
>Release-Note:
>Audit-Trail:

From: David Malone <dwmalone@maths.tcd.ie>
To: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
Cc: FreeBSD-gnats-submit@freebsd.org,
	Ian Dowse <iedowse@maths.tcd.ie>
Subject: Re: kern/29845: 4.4-PRERELEASE crashes under heavy net I/O
Date: Sat, 18 Aug 2001 14:49:23 +0100

 On Sat, Aug 18, 2001 at 01:33:22PM +0200, Andre Albsmeier wrote:
 > I have saved the crashdumps for further examination. As you can see,
 > the box crashes in whichever process it wants...
 
 I'd guess that something is freeing an mbuf while it is still in
 use.  This would result in either a panic when the mbuf is corrupted
 while in use or a double freeing of the mbuf. This could plausibly
 explain the panics you included tracebacks for.
 
 I think Ian Dowse has some tools for examining the mbuf free lists
 in kernel dumps. He did also have some patches for catching writes
 to shared or free mbuf clusters, which might help figure out what's
 going on here.
 
 The only thing that doesn't tally is that this is only affecting
 your laptop and not all your machines.
 
 	David.

From: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
To: David Malone <dwmalone@maths.tcd.ie>
Cc: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>,
	FreeBSD-gnats-submit@freebsd.org, Ian Dowse <iedowse@maths.tcd.ie>
Subject: Re: kern/29845: 4.4-PRERELEASE crashes under heavy net I/O
Date: Sat, 18 Aug 2001 19:55:37 +0200

 On Sat, 18-Aug-2001 at 14:49:23 +0100, David Malone wrote:
 > On Sat, Aug 18, 2001 at 01:33:22PM +0200, Andre Albsmeier wrote:
 > > I have saved the crashdumps for further examination. As you can see,
 > > the box crashes in whichever process it wants...
 > 
 > I'd guess that something is freeing an mbuf while it is still in
 > use.  This would result in either a panic when the mbuf is corrupted
 > while in use or a double freeing of the mbuf. This could plausibly
 > explain the panics you included tracebacks for.
 > 
 > I think Ian Dowse has some tools for examining the mbuf free lists
 > in kernel dumps. He did also have some patches for catching writes
 > to shared or free mbuf clusters, which might help figure out what's
 > going on here.
 
 As I said: I am glad to try anything.
 
 > The only thing that doesn't tally is that this is only affecting
 > your laptop and not all your machines.
 
 The first thing I thought of was a hardware problem. But the old
 version ran fine as does Win98 :-). But:
 
 <wild and amateurish speculation on>
 I am using the Intel Etherexpress 100MBit PCMCIA card with the xe
 driver. The driver is somewhat inefficient: when doing heavy net I/O
 with it, the load gets up to 4 and higher. It has always been like
 this. Maybe some changes with the mbuf handling and Warner's recent
 pccard commits cause these problems under load now.
 </wild and amateurish speculation on>
 
 Sometimes I can ftp the crashdumps to another machine, sometimes not.
 Hmm, I have an identical box at work. On Monday I will swap the
 harddrives and see how this behaves...
 
 	-Andre

From: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
To: David Malone <dwmalone@maths.tcd.ie>
Cc: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>,
	FreeBSD-gnats-submit@freebsd.org, Ian Dowse <iedowse@maths.tcd.ie>
Subject: Re: kern/29845: 4.4-PRERELEASE crashes under heavy net I/O
Date: Tue, 21 Aug 2001 09:43:14 +0200

 On Sat, 18-Aug-2001 at 14:49:23 +0100, David Malone wrote:
 > On Sat, Aug 18, 2001 at 01:33:22PM +0200, Andre Albsmeier wrote:
 > > I have saved the crashdumps for further examination. As you can see,
 > > the box crashes in whichever process it wants...
 > 
 > I'd guess that something is freeing an mbuf while it is still in
 > use.  This would result in either a panic when the mbuf is corrupted
 > while in use or a double freeing of the mbuf. This could plausibly
 > explain the panics you included tracebacks for.
 > 
 > I think Ian Dowse has some tools for examining the mbuf free lists
 > in kernel dumps. He did also have some patches for catching writes
 > to shared or free mbuf clusters, which might help figure out what's
 > going on here.
 > 
 > The only thing that doesn't tally is that this is only affecting
 > your laptop and not all your machines.
 
 OK, I have some news here:
 
 1.) I put the harddisk into another machine of the same type (Siemens
     Mobile 510 AGP). Same bad effects here. So we can be quite sure
     it is no problem with RAM/CPU ...
 
 2.) I tried the newest 4.4-RC1. Same problems.
 
 
 Now it comes:
 
 3.) I put the box into a docking station which has an Intel
     Etherexpress PRO 100 sitting on the PCI bus. Now I can
     stress the machine as much as I want... no problems.
     As soon as I go back using the pccard stuff for networking
     my problems are back.
 
 
 It really seems to be somehow pccard related...
 
 	-Andre
 

From: Ian Dowse <iedowse@maths.tcd.ie>
To: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
Cc: Warner Losh <imp@harmony.village.org>,
	freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/29845: 4.4-PRERELEASE crashes under heavy net I/O 
Date: Tue, 21 Aug 2001 14:42:54 +0100

 [ Andre tried adding the kludge from if_sl.c that merges the tty and
 net interrupt masks - not as a solution - but just to determine if
 these crashes are caused by an spl problem. It seems they are. ]
 
 In message <20010821143627.A26964@curry.mchp.siemens.de>, Andre Albsmeier writes:
 >On Tue, 21-Aug-2001 at 11:52:57 +0100, Ian Dowse wrote:
 >> The fact that it is only pccard cards that have problems really
 >> suggests a problem there, but the crashes are so random that it
 >> has to be some kind of spl problem. Previously the cards got their
 >> own IRQ, so they would set it up with the right interrupt mask.
 >> Now all pccard interrupt handlers are called from the pcic one,
 >> so I don't think splimp() is blocking these interrupts.
 >
 >Yes, that seems to do it! I have my dd from /dev/zero to /dev/null via
 >rsh running in both directions for about 5 minutes now... No problems
 >so far.
 
 So at the moment, when the network code calls splimp(), it does
 not block NIC interrupts that come in via the pccard code. That
 certainly explains all the odd crashes.
 
 I'm not sure how to solve this problem properly, but it seems that
 pcic_pci_setup_intr() needs to call bus_generic_setup_intr() to
 properly update the interrupt masks. I assume there is a reason
 for not just using bus_generic_setup_intr() as the pcic_pci
 bus_setup_intr method?
 
 Thanks for trying out that kludge Andre! Hopefully there's enough
 information now to get it fixed properly.
 
 Ian
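
The masking gap described in the message above can be modelled in a few
lines of user-space C. This is only a sketch under stated assumptions: every
`model_*` name below is hypothetical and merely stands in for the kernel's
per-class interrupt masks; none of them is an actual FreeBSD 4.x symbol.
It shows that an IRQ is blocked inside an splimp()-style section only if it
was added to the network mask when the handler was registered.

```c
#include <stdint.h>

/* Toy model: one bit per IRQ line. All names here are illustrative. */
typedef uint32_t model_mask_t;

static model_mask_t model_net_mask; /* IRQs that the model's splimp() blocks */
static model_mask_t model_cpl;      /* IRQs blocked right now */

static void model_reset(void)
{
    model_net_mask = 0;
    model_cpl = 0;
}

/* Registering a NIC handler "properly" adds its IRQ to the net mask. */
static void model_register_net_irq(int irq)
{
    model_net_mask |= (model_mask_t)1 << irq;
}

/* Enter a network critical section; returns the previous level. */
static model_mask_t model_splimp(void)
{
    model_mask_t old = model_cpl;
    model_cpl |= model_net_mask;
    return old;
}

/* Leave the critical section by restoring the previous level. */
static void model_splx(model_mask_t old)
{
    model_cpl = old;
}

static int model_irq_blocked(int irq)
{
    return (model_cpl & ((model_mask_t)1 << irq)) != 0;
}

/*
 * Returns 1 if `irq` is blocked while inside an splimp() section.
 * `registered` selects whether the NIC's IRQ was added to the net mask
 * (the generic path) or not (the behaviour discussed in this PR).
 */
static int model_blocked_inside_splimp(int irq, int registered)
{
    int blocked;
    model_mask_t s;

    model_reset();
    if (registered)
        model_register_net_irq(irq);
    s = model_splimp();
    blocked = model_irq_blocked(irq);
    model_splx(s);
    return blocked;
}
```

Calling `model_blocked_inside_splimp(10, 0)` corresponds to the failing
setup (the NIC interrupt arrives through the bridge on IRQ 10 but was never
added to the mask), while `model_blocked_inside_splimp(10, 1)` corresponds
to a handler registered through the mask-updating path.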

From: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
To: Ian Dowse <iedowse@maths.tcd.ie>
Cc: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>,
	Warner Losh <imp@harmony.village.org>,
	freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/29845: 4.4-PRERELEASE crashes under heavy net I/O
Date: Tue, 21 Aug 2001 16:17:49 +0200

 On Tue, 21-Aug-2001 at 14:42:54 +0100, Ian Dowse wrote:
 > 
 > [ Andre tried adding the kludge from if_sl.c that merges the tty and
 > net interrupt masks - not as a solution - but just to determine if
 > these crashes are caused by an spl problem. It seems they are. ]
 > 
 > In message <20010821143627.A26964@curry.mchp.siemens.de>, Andre Albsmeier writes:
 > >On Tue, 21-Aug-2001 at 11:52:57 +0100, Ian Dowse wrote:
 > >> The fact that it is only pccard cards that have problems really
 > >> suggests a problem there, but the crashes are so random that it
 > >> has to be some kind of spl problem. Previously the cards got their
 > >> own IRQ, so they would set it up with the right interrupt mask.
 > >> Now all pccard interrupt handlers are called from the pcic one,
 > >> so I don't think splimp() is blocking these interrupts.
 > >
 > >Yes, that seems to do it! I have my dd from /dev/zero to /dev/null via
 > >rsh running in both directions for about 5 minutes now... No problems
 > >so far.
 > 
 > So at the moment, when the network code calls splimp(), it does
 > not block NIC interrupts that come in via the pccard code. That
 > certainly explains all the odd crashes.
 > 
 > I'm not sure how to solve this problem properly, but it seems that
 > pcic_pci_setup_intr() needs to call bus_generic_setup_intr() to
 > properly update the interrupt masks. I assume there is a reason
 > for not just using bus_generic_setup_intr() as the pcic_pci
 > bus_setup_intr method?
 > 
 > Thanks for trying out that kludge Andre! Hopefully there's enough
 > information now to get it fixed properly.
 
 Well I was only whining about the problem, you fixed it (or at least
 isolated it) :-)
 
 Anyway, I am looking forward to testing other suggestions. It seems
 that I have an environment that triggers the problem easily.
 
 Thanks,
 
 	-Andre

From: Warner Losh <imp@harmony.village.org>
To: Ian Dowse <iedowse@maths.tcd.ie>
Cc: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>,
	freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/29845: 4.4-PRERELEASE crashes under heavy net I/O 
Date: Tue, 21 Aug 2001 09:35:51 -0600

 In message <200108211442.aa32071@salmon.maths.tcd.ie> Ian Dowse writes:
 : I'm not sure how to solve this problem properly, but it seems that
 : pcic_pci_setup_intr() needs to call bus_generic_setup_intr() to
 : properly update the interrupt masks. I assume there is a reason
 : for not just using bus_generic_setup_intr() as the pcic_pci
 : bus_setup_intr method?
 
 I wanted the ability to intercept the interrupt.  I can do that easily 
 enough with a second function...  I'm still not sure the proper way to 
 handle this.  But if I'm understanding you correctly, we're not
 blocking splnet interrupts.  But in this case, when there's only one
 network card, wouldn't the net spl mask only have one bit, which is
 the IRQ that we're in?
 
 Warner

From: Warner Losh <imp@harmony.village.org>
To: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
Cc: Ian Dowse <iedowse@maths.tcd.ie>,
	freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/29845: 4.4-PRERELEASE crashes under heavy net I/O 
Date: Tue, 21 Aug 2001 09:39:50 -0600

 In message <20010821161749.A29621@curry.mchp.siemens.de> Andre Albsmeier writes:
 : Well I was only whining about the problem, you fixed it (or at least
 : isolated it) :-)
 
 Here's a simple fix you can try.  I don't see how this would help, but 
 if it does, we know what the problem is.  Ian suggested this a while
 ago, and I'm still not sure how this could be a problem, but if it is,
 Ian's suggestions are right.
 
 Warner
 
 Index: pcic_pci.c
 ===================================================================
 RCS file: /home/imp/FreeBSD/CVS/src/sys/pccard/pcic_pci.c,v
 retrieving revision 1.54.2.7
 diff -u -r1.54.2.7 pcic_pci.c
 --- pcic_pci.c	2001/08/21 09:06:25	1.54.2.7
 +++ pcic_pci.c	2001/08/21 15:38:29
 @@ -522,8 +522,11 @@
  	 * interrupt handler for it.  Since multifunction cards aren't
  	 * supported, this shouldn't cause a problem in practice.
  	 */
 -	if (sc->cd_present && sp->intr != NULL)
 +	if (sc->cd_present && sp->intr != NULL) {
 +		s = splhigh();
  		sp->intr(sp->argp);
 +		splx(s);
 +	}
  }
  
  /*

From: Ian Dowse <iedowse@maths.tcd.ie>
To: Warner Losh <imp@harmony.village.org>
Cc: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>,
	freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/29845: 4.4-PRERELEASE crashes under heavy net I/O 
Date: Tue, 21 Aug 2001 17:13:26 +0100

 In message <200108211539.f7LFdoW65851@harmony.village.org>, Warner Losh writes:
 >Here's a simple fix you can try.  I don't see how this would help, but 
 >if it does, we know what the problem is.  Ian suggested this a while
 >ago, and I'm still not sure how this could be a problem, but if it is
 >Ian's suggestions are right.
 
 No, I was confused when I suggested this to you :-) It is too late
 when pcic_pci_intr() is called, because at that point a critical
 section of some network code has already been interrupted. Once a
 NIC has registered a net interrupt on IRQ X, splimp() should mask
 IRQ X, but here the pcic code never changes the interrupt mask when
 a NIC registers its interrupt.
 
 e.g. consider some network code that does splimp():
 
 	s = splimp();
 
 	(critical stuff where no net interrupts should occur)
 
 	<pcic interrupt occurs>
 		pcic_pci_intr() called
 			s = splhigh();
 			(this blocks further interrupts)
 
 			NIC ISR called
 				(messes with splimp-protected state)
 
 			splx(s);
 		pcic_pci_intr() returns
 	<pcic interrupt end>
 
 	(network code finds its state messed up)
 
 	splx(s);
 
 When the pccard NIC sets up its interrupt, it needs to go through
 all the mask adjustment behind bus_generic_setup_intr() to ensure
 that the first splimp() call above actually blocks the pcic interrupts
 too. That's why I'm suggesting using bus_generic_setup_intr() either
 within or instead of pcic_pci_setup_intr().
 
 Ian
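
The timeline in the message above can be replayed as a small user-space
simulation. It is a sketch, not kernel code: all `sim_*` names are
hypothetical, the "interrupt" is just a function call made in the middle of
the critical section, and the protected state is reduced to a single
counter. It illustrates why raising the level inside the bridge handler
comes too late, and why putting the IRQ into the network mask at
registration time closes the window.

```c
#include <stdint.h>

typedef uint32_t sim_mask_t;

#define SIM_IRQ_BIT(n) ((sim_mask_t)1 << (n))
#define SIM_ALL_IRQS   ((sim_mask_t)~(sim_mask_t)0)

struct sim_state {
    sim_mask_t cpl;       /* IRQs currently blocked */
    sim_mask_t net_mask;  /* IRQs blocked by the model's splimp() */
    int        queue_len; /* state that splimp() is meant to protect */
    int        isr_runs;  /* times the NIC ISR ran inside the section */
};

static sim_mask_t sim_splimp(struct sim_state *st)
{
    sim_mask_t old = st->cpl;
    st->cpl |= st->net_mask;
    return old;
}

static sim_mask_t sim_splhigh(struct sim_state *st)
{
    sim_mask_t old = st->cpl;
    st->cpl = SIM_ALL_IRQS;
    return old;
}

static void sim_splx(struct sim_state *st, sim_mask_t old)
{
    st->cpl = old;
}

/* NIC interrupt routine: touches the protected state. */
static void sim_nic_isr(struct sim_state *st)
{
    st->queue_len++;
    st->isr_runs++;
}

/* Bridge handler with the splhigh() wrapper from the proposed patch. */
static void sim_bridge_intr(struct sim_state *st)
{
    sim_mask_t s = sim_splhigh(st); /* blocks further interrupts only */
    sim_nic_isr(st);
    sim_splx(st, s);
}

/* Hardware raises an IRQ; it is delivered only if not masked. */
static void sim_raise_irq(struct sim_state *st, int irq)
{
    if ((st->cpl & SIM_IRQ_BIT(irq)) == 0)
        sim_bridge_intr(st);
}

/*
 * Runs one critical section with an interrupt arriving in the middle.
 * Returns 1 if the section completed undisturbed, 0 if the ISR ran
 * inside it and its update to queue_len was lost.
 */
static int sim_critical_section_ok(int irq, int irq_in_net_mask)
{
    struct sim_state st = { 0, 0, 0, 0 };
    sim_mask_t s;
    int snapshot;

    if (irq_in_net_mask)
        st.net_mask |= SIM_IRQ_BIT(irq);

    s = sim_splimp(&st);
    snapshot = st.queue_len;     /* read protected state */
    sim_raise_irq(&st, irq);     /* interrupt arrives mid-update */
    st.queue_len = snapshot + 1; /* write back, clobbering any ISR update */
    sim_splx(&st, s);

    return st.isr_runs == 0;
}
```

With `irq_in_net_mask` set to 0 the ISR runs inside the section even though
the bridge handler raises the level itself; with it set to 1 the interrupt
is held off, which is the effect Ian expects from routing the setup through
bus_generic_setup_intr().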
 

From: Warner Losh <imp@harmony.village.org>
To: Ian Dowse <iedowse@maths.tcd.ie>
Cc: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>,
	freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/29845: 4.4-PRERELEASE crashes under heavy net I/O 
Date: Tue, 21 Aug 2001 10:33:13 -0600

 In message <200108211713.aa61585@salmon.maths.tcd.ie> Ian Dowse writes:
 : No, I was confused when I suggested this to you :-) It is too late
 : when pcic_pci_intr() is called, because at that point a critical
 : section of some network code has already been interrupted. Once a
 : NIC has registered a net interrupt on IRQ X, splimp() should mask
 : IRQ X, but here the pcic code never changes the interrupt mask when
 : a NIC registers its interrupt.
 
 Lightbulb.  I completely understand now.  NEWCARD has exactly the same 
 problem.
 
 : When the pccard NIC sets up its interrupt, it needs to go through
 : all the mask adjustment behind bus_generic_setup_intr() to ensure
 : that the first splimp() call above actually blocks the pcic interrupts
 : too. That's why I'm suggesting using bus_generic_setup_intr() either
 : within or instead of pcic_pci_setup_intr().
 
 I think we need to use it within pcic_pci_setup_intr so our own
 function gets called and we only call the ISR if the card is in
 place.
 
 My splhigh() changes have 0 chance of working.
 
 Warner
State-Changed-From-To: open->feedback 
State-Changed-By: iedowse 
State-Changed-When: Sun Aug 26 04:08:22 PDT 2001 
State-Changed-Why:  

I think this has been resolved now - may I close the PR? 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=29845 
State-Changed-From-To: feedback->closed 
State-Changed-By: iedowse 
State-Changed-When: Sun Aug 26 07:36:41 PDT 2001 
State-Changed-Why:  

Fixed by a number of pccard and interrupt changes over the last 
week. I think pcic_pci.c rev 1.54.2.8 solved the main problem, 
which was that NIC interrupts were not set up to be blocked by 
splimp(). Thanks for the bug report! 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=29845 
>Unformatted:
