From nobody@FreeBSD.org  Tue Oct 24 18:36:01 2006
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2B93B16A412
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 24 Oct 2006 18:36:01 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 4CB5D43DC0
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 24 Oct 2006 18:35:46 +0000 (GMT)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.13.1/8.13.1) with ESMTP id k9OIZj8W048674
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 24 Oct 2006 18:35:45 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.13.1/8.13.1/Submit) id k9OIZjBR048673;
	Tue, 24 Oct 2006 18:35:45 GMT
	(envelope-from nobody)
Message-Id: <200610241835.k9OIZjBR048673@www.freebsd.org>
Date: Tue, 24 Oct 2006 18:35:45 GMT
From: Kai Gallasch<gallasch@free.de>
To: freebsd-gnats-submit@FreeBSD.org
Subject: kernel panic 6.2 prerelease-20061017 amd64 
X-Send-Pr-Version: www-3.0

>Number:         104765
>Category:       kern
>Synopsis:       kernel panic 6.2 prerelease-20061017 amd64
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    rwatson
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Oct 24 18:40:17 GMT 2006
>Closed-Date:    Wed Dec 06 12:44:45 GMT 2006
>Last-Modified:  Wed Dec 06 12:44:45 GMT 2006
>Originator:     Kai Gallasch
>Release:        6.2 prerelease (checkout 20061017)
>Organization:
FREE!
>Environment:
FreeBSD geldkraft.free.de 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #0: Sun Oct 22 13:36:38 CEST 2006     houdini@geldkraft.free.de:/usr/obj/usr/src/sys/SMP  amd64
>Description:
Kernel panics after 1-3 days uptime with trap number 12 - page fault.



kernel config:
--------------
GENERIC (SMP) with "makeoptions DEBUG=-g"
$FreeBSD: src/sys/amd64/conf/GENERIC,v 1.439.2.14 2006/10/09 18:41:36 simon Exp $


Hardware:
---------

HP/Compaq DL385, dual Opteron (dual core), with Smart Array 6i (RAID 5) and 1 GB RAM.


dmesg:
------

Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-PRERELEASE #0: Sun Oct 22 13:36:38 CEST 2006
    houdini@geldkraft.free.de:/usr/obj/usr/src/sys/SMP
ACPI APIC Table: <HP     00000083>
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Opteron(tm) Processor 280 (2405.47-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x20f12  Stepping = 2
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x1<SSE3>
  AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow+,3DNow>
  AMD Features2=0x2<CMP>
  Cores per package: 2
real memory  = 1073709056 (1023 MB)
avail memory = 1023938560 (976 MB)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-23 on motherboard
ioapic1 <Version 1.1> irqs 24-27 on motherboard
ioapic2 <Version 1.1> irqs 28-31 on motherboard
ioapic3 <Version 1.1> irqs 32-35 on motherboard
ioapic4 <Version 1.1> irqs 36-39 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.17.2 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: <HP A05> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x908-0x90b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
pcib0: <ACPI Host-PCI bridge> on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 3.0 on pci0
pci1: <ACPI PCI bus> on pcib1
ohci0: <OHCI (generic) USB controller> mem 0xf7df0000-0xf7df0fff irq 19 at device 0.0 on pci1
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 3 ports with 3 removable, self powered
ohci1: <OHCI (generic) USB controller> mem 0xf7de0000-0xf7de0fff irq 19 at device 0.1 on pci1
ohci1: [GIANT-LOCKED]
usb1: OHCI version 1.0, legacy support
usb1: SMM does not respond, resetting
usb1: <OHCI (generic) USB controller> on ohci1
usb1: USB revision 1.0
uhub1: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 3 ports with 3 removable, self powered
pci1: <base peripheral> at device 2.0 (no driver attached)
pci1: <base peripheral> at device 2.2 (no driver attached)
pci1: <display, VGA> at device 3.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 4.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <AMD 8111 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x2000-0x200f at device 4.1 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
pci0: <bridge> at device 4.3 (no driver attached)
pcib2: <ACPI PCI-PCI bridge> at device 7.0 on pci0
pci2: <ACPI PCI bus> on pcib2
ciss0: <HP Smart Array 6i> port 0x5000-0x50ff mem 0xf7ef0000-0xf7ef1fff,0xf7e80000-0xf7ebffff irq 24 at device 4.0 on pci2
ciss0: [GIANT-LOCKED]
pci0: <base peripheral, interrupt controller> at device 7.1 (no driver attached)
pcib3: <ACPI PCI-PCI bridge> at device 8.0 on pci0
pci3: <ACPI PCI bus> on pcib3
bge0: <Broadcom BCM5704 B0, ASIC rev. 0x2100> mem 0xf7ff0000-0xf7ffffff irq 28 at device 6.0 on pci3
miibus0: <MII bus> on bge0
brgphy0: <BCM5704 10/100/1000baseTX PHY> on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge0: Ethernet address: 00:17:a4:8f:27:68
bge1: <Broadcom BCM5704 B0, ASIC rev. 0x2100> mem 0xf7fe0000-0xf7feffff irq 29 at device 6.1 on pci3
miibus1: <MII bus> on bge1
brgphy1: <BCM5704 10/100/1000baseTX PHY> on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge1: Ethernet address: 00:17:a4:8f:27:67
pci0: <base peripheral, interrupt controller> at device 8.1 (no driver attached)
pcib4: <ACPI Host-PCI bridge> on acpi0
pci4: <ACPI PCI bus> on pcib4
pcib5: <ACPI PCI-PCI bridge> at device 9.0 on pci4
pci5: <ACPI PCI bus> on pcib5
pci4: <base peripheral, interrupt controller> at device 9.1 (no driver attached)
pcib6: <ACPI PCI-PCI bridge> at device 10.0 on pci4
pci6: <ACPI PCI bus> on pcib6
pci4: <base peripheral, interrupt controller> at device 10.1 (no driver attached)
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse, device ID 3
sio0: <Standard PC COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A, console
fdc0: <floppy drive controller (FDE)> port 0x3f2-0x3f5 irq 6 drq 2 on acpi0
fdc0: [FAST]
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcbfff,0xee000-0xeffff on isa0
ppc0: cannot reserve I/O port range
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
acd0: CDROM <HL-DT-ST GCR-8240N/2.03> at ata0-master PIO4
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #2 Launched!
da0 at ciss0 bus 0 target 0 lun 0
da0: <COMPAQ RAID 5  VOLUME OK> Fixed Direct Access SCSI-0 device 
da0: 135.168MB/s transfers
da0: 17200MB (35226720 512 byte sectors: 255H 32S/T 4317C)
da1 at ciss0 bus 0 target 1 lun 0
da1: <COMPAQ RAID 5  VOLUME OK> Fixed Direct Access SCSI-0 device 
da1: 135.168MB/s transfers
da1: 17200MB (35226720 512 byte sectors: 255H 32S/T 4317C)
da2 at ciss0 bus 0 target 2 lun 0
da2: <COMPAQ RAID 5  VOLUME OK> Fixed Direct Access SCSI-0 device 
da2: 135.168MB/s transfers
da2: 69499MB (142334880 512 byte sectors: 255H 32S/T 17443C)
da3 at ciss0 bus 0 target 3 lun 0
da3: <COMPAQ RAID 5  VOLUME OK> Fixed Direct Access SCSI-0 device 
da3: 135.168MB/s transfers
da3: 69499MB (142334880 512 byte sectors: 255H 32S/T 17443C)
da4 at ciss0 bus 0 target 4 lun 0
da4: <COMPAQ RAID 5  VOLUME OK> Fixed Direct Access SCSI-0 device 
da4: 135.168MB/s transfers
da4: 139799MB (286309920 512 byte sectors: 255H 32S/T 35087C)


backtrace:
----------

[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:
d, page not present
instruction pointer     = 0x8:0xffffffff803eea47
stack pointer           = 0x10:0xffffffffa814a8b0
frame pointer           = 0x10:0x4
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 27596 (tcpserver)
trap number             = 12
panic: page fault
cpuid = 3
Uptime: 2h12m0s
Dumping 1023 MB (2 chunks)
  chunk 0: 1MB (156 pages) ... ok
  chunk 1: 1023MB (261880 pages) 1008 992 976 960 944 928 912 896 880 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16

#0  doadump () at pcpu.h:172
172     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) quit
geldkraft:/etc # mount /usr/src/
geldkraft:/etc # cd /usr/src/sys/amd64/conf/
geldkraft:/usr/src/sys/amd64/conf # kgdb SMP  /var/crash/vmcore.0
kgdb: bad namelist - no kernbase
geldkraft:/usr/src/sys/amd64/conf # kgdb /boot/kernel/kernel /var/crash/vmcore.0
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:
d, page not present
instruction pointer     = 0x8:0xffffffff803eea47
stack pointer           = 0x10:0xffffffffa814a8b0
frame pointer           = 0x10:0x4
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 27596 (tcpserver)
trap number             = 12
panic: page fault
cpuid = 3
Uptime: 2h12m0s
Dumping 1023 MB (2 chunks)
  chunk 0: 1MB (156 pages) ... ok
  chunk 1: 1023MB (261880 pages) 1008 992 976 960 944 928 912 896 880 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16

#0  doadump () at pcpu.h:172
172     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) list *0xffffffff803eea47
0xffffffff803eea47 is in _mtx_lock_sleep (/usr/src/sys/kern/kern_mutex.c:548).
543                      * If the current owner of the lock is executing on another
544                      * CPU, spin instead of blocking.
545                      */
546                     owner = (struct thread *)(v & MTX_FLAGMASK);
547     #ifdef ADAPTIVE_GIANT
548                     if (TD_IS_RUNNING(owner)) {
549     #else
550                     if (m != &Giant && TD_IS_RUNNING(owner)) {
551     #endif
552                             turnstile_release(&m->mtx_object);
(kgdb) backtrace
#0  doadump () at pcpu.h:172
#1  0x0000000000000004 in ?? ()
#2  0xffffffff803f8fd7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#3  0xffffffff803f9671 in panic (fmt=0xffffff0002116980 "X?J:") at /usr/src/sys/kern/kern_shutdown.c:565
#4  0xffffffff80618b3f in trap_fatal (frame=0xffffff0002116980, eva=18446742975175902040) at /usr/src/sys/amd64/amd64/trap.c:660
#5  0xffffffff80619066 in trap (frame=
      {tf_rdi = 11, tf_rsi = -1099476932224, tf_rdx = 6, tf_rcx = 0, tf_r8 = 4, tf_r9 = -1098475933086, tf_rax = 1, tf_rbx = -1099415090280, tf_rbp = 4, tf_r10 = 4, tf_r11 = 4, tf_r12 = -1099476932224, tf_r13 = -1098728017152, tf_r14 = 0, tf_r15 = 1, tf_trapno = 12, tf_addr = 396, tf_flags = -2141616351, tf_err = 0, tf_rip = -2143360441, tf_cs = 8, tf_rflags = 65538, tf_rsp = -1475041088, tf_ss = 16}) at /usr/src/sys/amd64/amd64/trap.c:238
#6  0xffffffff8060442b in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168
#7  0xffffffff803eea47 in _mtx_lock_sleep (m=0xffffff0005c10b98, tid=18446742974232619392, opts=6, file=0x0, line=4)
    at /usr/src/sys/kern/kern_mutex.c:546
#8  0xffffffff804bb51d in ip_ctloutput (so=0xb, sopt=0xffffffffa814ab30) at /usr/src/sys/netinet/ip_output.c:1193
#9  0xffffffff804ccad5 in tcp_ctloutput (so=0xffffff0024a0d268, sopt=0xffffffffa814ab30) at /usr/src/sys/netinet/tcp_usrreq.c:1038
#10 0xffffffff804416b8 in sosetopt (so=0xffffff0024a0d268, sopt=0xffffffffa814ab30) at /usr/src/sys/kern/uipc_socket.c:1563
#11 0xffffffff80447b93 in kern_setsockopt (td=0xffffff0002116980, s=616888072, level=4, name=0, val=0x4, valseg=1035694690, valsize=11)
    at /usr/src/sys/kern/uipc_syscalls.c:1351
#12 0xffffffff80447bfe in setsockopt (td=0xb, uap=0xffffff0002116980) at /usr/src/sys/kern/uipc_syscalls.c:1307
#13 0xffffffff80619991 in syscall (frame=
      {tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 0, tf_r8 = 0, tf_r9 = 140737488350072, tf_rax = 105, tf_rbx = 0, tf_rbp = 3, tf_r10 = -3689348814741910323, tf_r11 = 514, tf_r12 = 140737488350480, tf_r13 = 34368406752, tf_r14 = 0, tf_r15 = 0, tf_trapno = 12, tf_addr = 5283944, tf_flags = 12, tf_err = 2, tf_rip = 34366834188, tf_cs = 43, tf_rflags = 518, tf_rsp = 140737488350184, tf_ss = 35}) at /usr/src/sys/amd64/amd64/trap.c:792
#14 0xffffffff806045c8 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:270
#15 0x00000008006c460c in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) 









>How-To-Repeat:
The problem occurs after 1 to 3 days of server uptime.
>Fix:
Raising some sysctl values seems to lengthen the intervals between crashes,
although I may be mistaken that tweaking them has any effect on the problem.

# default was 12328
#kern.maxfiles=80000

# default 128
#kern.ipc.somaxconn=384

# default was 11095
#kern.maxfilesperproc=50000

 
>Release-Note:
>Audit-Trail:

From: Kai Gallasch <gallasch@free.de>
To: bug-followup@FreeBSD.org,  gallasch@free.de
Cc:  
Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64
Date: Wed, 25 Oct 2006 11:49:33 +0200

 Here 1*) is another backtrace from a new kernel panic. It looks very similar
 to the one I submitted previously - even the same current process,
 "tcpserver", which shows up every time the kernel panics.
 
 At first I thought that it is always 'tcpserver' simply because this process
 is quite active on a busy mailserver running qmail, but maybe the panics
 that I have with my 6.2-PRE are related to the following thread on
 freebsd-stable
 
 http://lists.freebsd.org/pipermail/freebsd-stable/2006-October/029433.html
 
 and especially (in this thread)
 
 http://lists.freebsd.org/pipermail/freebsd-stable/2006-October/029487.html
 
 Snippet 2*) may also be helpful to some: there I tried to follow
 what Gleb Smirnoff advised in
 
 http://lists.freebsd.org/pipermail/freebsd-stable/2006-October/029452.html
 
 Cheers,
 K.
 
 
 
 
 --- 1*) backtrace - 20061025 ---
 
 Unread portion of the kernel message buffer:
 sor read, page not present
 instruction pointer     = 0x8:0xffffffff803eea47
 stack pointer           = 0x10:0xffffffffa7e548b0
 frame pointer           = 0x10:0x4
 code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
 processor eflags        = resume, IOPL = 0
 current process         = 8013 (tcpserver)
 trap number             = 12
 panic: page fault
 cpuid = 2
 Uptime: 10h10m5s
 Dumping 1023 MB (2 chunks)
   chunk 0: 1MB (156 pages) ... ok
   chunk 1: 1023MB (261880 pages) 1008 992 976 960 944 928 912 896 880
 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592
 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304
 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16
 
 #0  doadump () at pcpu.h:172
 172     pcpu.h: No such file or directory.
         in pcpu.h
 (kgdb) list *0xffffffff803eea47
 0xffffffff803eea47 is in _mtx_lock_sleep
 (/usr/src/sys/kern/kern_mutex.c:548).
 543                      * If the current owner of the lock is executing
 on another
 544                      * CPU, spin instead of blocking.
 545                      */
 546                     owner = (struct thread *)(v & MTX_FLAGMASK);
 547     #ifdef ADAPTIVE_GIANT
 548                     if (TD_IS_RUNNING(owner)) {
 549     #else
 550                     if (m != &Giant && TD_IS_RUNNING(owner)) {
 551     #endif
 552                             turnstile_release(&m->mtx_object);
 (kgdb) bt
 #0  doadump () at pcpu.h:172
 #1  0x0000000000000004 in ?? ()
 #2  0xffffffff803f8fd7 in boot (howto=260) at
 /usr/src/sys/kern/kern_shutdown.c:409
 #3  0xffffffff803f9671 in panic (fmt=0xffffff0010624720 "?\226\230\017")
 at /usr/src/sys/kern/kern_shutdown.c:565
 #4  0xffffffff80618b3f in trap_fatal (frame=0xffffff0010624720,
 eva=18446742974459582128) at /usr/src/sys/amd64/amd64/trap.c:660
 #5  0xffffffff80619066 in trap (frame=
       {tf_rdi = 123, tf_rsi = -1099236751584, tf_rdx = 6, tf_rcx = 0,
 tf_r8 = 0, tf_r9 = 0, tf_rax = 1, tf_rbx = -1099331437672, tf_rbp = 4,
 tf_r10 = -2050201464, tf_r11 = -1099236751584, tf_r12 = -1099236751584,
 tf_r13 = -1098723105024, tf_r14 = 0, tf_r15 = 1, tf_trapno = 12, tf_addr
 = 396, tf_flags = -2141616351, tf_err = 0, tf_rip = -2143360441, tf_cs =
 8, tf_rflags = 65538, tf_rsp = -1478145856, tf_ss = 16}) at
 /usr/src/sys/amd64/amd64/trap.c:238
 #6  0xffffffff8060442b in calltrap () at
 /usr/src/sys/amd64/amd64/exception.S:168
 #7  0xffffffff803eea47 in _mtx_lock_sleep (m=0xffffff000abd7b98,
 tid=18446742974472800032, opts=6, file=0x0, line=0) at
 /usr/src/sys/kern/kern_mutex.c:546
 #8  0xffffffff804bb51d in ip_ctloutput (so=0x7b,
 sopt=0xffffffffa7e54b30) at /usr/src/sys/netinet/ip_output.c:1193
 #9  0xffffffff804ccad5 in tcp_ctloutput (so=0xffffff0033fe14d0,
 sopt=0xffffffffa7e54b30) at /usr/src/sys/netinet/tcp_usrreq.c:1038
 #10 0xffffffff804416b8 in sosetopt (so=0xffffff0033fe14d0,
 sopt=0xffffffffa7e54b30) at /usr/src/sys/kern/uipc_socket.c:1563
 #11 0xffffffff80447b93 in kern_setsockopt (td=0xffffff0010624720,
 s=586531656, level=-2050201464, name=0, val=0x0, valseg=UIO_USERSPACE,
 valsize=123)
     at /usr/src/sys/kern/uipc_syscalls.c:1351
 #12 0xffffffff80447bfe in setsockopt (td=0x7b, uap=0xffffff0010624720)
 at /usr/src/sys/kern/uipc_syscalls.c:1307
 #13 0xffffffff80619991 in syscall (frame=
       {tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 0, tf_r8 = 0, tf_r9
 = 140737488350072, tf_rax = 105, tf_rbx = 0, tf_rbp = 3, tf_r10 =
 -3689348814741910323, tf_r11 = 514, tf_r12 = 140737488350480, tf_r13 =
 34368406752, tf_r14 = 0, tf_r15 = 0, tf_trapno = 12, tf_addr = 5283944,
 tf_flags = 12, tf_err = 2, tf_rip = 34366834188, tf_cs = 43, tf_rflags =
 518, tf_rsp = 140737488350184, tf_ss = 35}) at
 /usr/src/sys/amd64/amd64/trap.c:792
 #14 0xffffffff806045c8 in Xfast_syscall () at
 /usr/src/sys/amd64/amd64/exception.S:270
 #15 0x00000008006c460c in ?? ()
 Previous frame inner to this frame (corrupt stack?)
 (kgdb)
 
 
 
 
 --- 2*) kgdb session on latest crashdump - 20061025 ---
 
 instruction pointer     = 0x8:0xffffffff803eea47
 stack pointer           = 0x10:0xffffffffa7e548b0
 frame pointer           = 0x10:0x4
 code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
 processor eflags        = resume, IOPL = 0
 current process         = 8013 (tcpserver)
 trap number             = 12
 panic: page fault
 cpuid = 2
 Uptime: 10h10m5s
 Dumping 1023 MB (2 chunks)
   chunk 0: 1MB (156 pages) ... ok
   chunk 1: 1023MB (261880 pages) 1008 992 976 960 944 928 912 896 880
 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592
 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304
 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16
 
 #0  doadump () at pcpu.h:172
 172     pcpu.h: No such file or directory.
         in pcpu.h
 (kgdb) where
 #0  doadump () at pcpu.h:172
 #1  0x0000000000000004 in ?? ()
 #2  0xffffffff803f8fd7 in boot (howto=260) at
 /usr/src/sys/kern/kern_shutdown.c:409
 #3  0xffffffff803f9671 in panic (fmt=0xffffff0010624720 "?\226\230\017")
 at /usr/src/sys/kern/kern_shutdown.c:565
 #4  0xffffffff80618b3f in trap_fatal (frame=0xffffff0010624720,
 eva=18446742974459582128) at /usr/src/sys/amd64/amd64/trap.c:660
 #5  0xffffffff80619066 in trap (frame=
       {tf_rdi = 123, tf_rsi = -1099236751584, tf_rdx = 6, tf_rcx = 0,
 tf_r8 = 0, tf_r9 = 0, tf_rax = 1, tf_rbx = -1099331437672, tf_rbp = 4,
 tf_r10 = -2050201464, tf_r11 = -1099236751584, tf_r12 = -1099236751584,
 tf_r13 = -1098723105024, tf_r14 = 0, tf_r15 = 1, tf_trapno = 12, tf_addr
 = 396, tf_flags = -2141616351, tf_err = 0, tf_rip = -2143360441, tf_cs =
 8, tf_rflags = 65538, tf_rsp = -1478145856, tf_ss = 16}) at
 /usr/src/sys/amd64/amd64/trap.c:238
 #6  0xffffffff8060442b in calltrap () at
 /usr/src/sys/amd64/amd64/exception.S:168
 #7  0xffffffff803eea47 in _mtx_lock_sleep (m=0xffffff000abd7b98,
 tid=18446742974472800032, opts=6, file=0x0, line=0) at
 /usr/src/sys/kern/kern_mutex.c:546
 #8  0xffffffff804bb51d in ip_ctloutput (so=0x7b,
 sopt=0xffffffffa7e54b30) at /usr/src/sys/netinet/ip_output.c:1193
 #9  0xffffffff804ccad5 in tcp_ctloutput (so=0xffffff0033fe14d0,
 sopt=0xffffffffa7e54b30) at /usr/src/sys/netinet/tcp_usrreq.c:1038
 #10 0xffffffff804416b8 in sosetopt (so=0xffffff0033fe14d0,
 sopt=0xffffffffa7e54b30) at /usr/src/sys/kern/uipc_socket.c:1563
 #11 0xffffffff80447b93 in kern_setsockopt (td=0xffffff0010624720,
 s=586531656, level=-2050201464, name=0, val=0x0, valseg=UIO_USERSPACE,
 valsize=123)
     at /usr/src/sys/kern/uipc_syscalls.c:1351
 #12 0xffffffff80447bfe in setsockopt (td=0x7b, uap=0xffffff0010624720)
 at /usr/src/sys/kern/uipc_syscalls.c:1307
 #13 0xffffffff80619991 in syscall (frame=
       {tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 0, tf_r8 = 0, tf_r9
 = 140737488350072, tf_rax = 105, tf_rbx = 0, tf_rbp = 3, tf_r10 =
 -3689348814741910323, tf_r11 = 514, tf_r12 = 140737488350480, tf_r13 =
 34368406752, tf_r14 = 0, tf_r15 = 0, tf_trapno = 12, tf_addr = 5283944,
 tf_flags = 12, tf_err = 2, tf_rip = 34366834188, tf_cs = 43, tf_rflags =
 518, tf_rsp = 140737488350184, tf_ss = 35}) at
 /usr/src/sys/amd64/amd64/trap.c:792
 #14 0xffffffff806045c8 in Xfast_syscall () at
 /usr/src/sys/amd64/amd64/exception.S:270
 #15 0x00000008006c460c in ?? ()
 Previous frame inner to this frame (corrupt stack?)
 (kgdb) frame 12
 #12 0xffffffff80447bfe in setsockopt (td=0x7b, uap=0xffffff0010624720)
 at /usr/src/sys/kern/uipc_syscalls.c:1307
 1307            return (kern_setsockopt(td, uap->s, uap->level, uap->name,
 (kgdb) p *sopt
 No symbol "sopt" in current context.
 (kgdb) p *kern_setsockopt
 $1 = {int (struct thread *, int, int, int, void *, enum uio_seg,
 socklen_t)} 0xffffffff80447a80 <kern_setsockopt>
 (kgdb) frame 12
 #12 0xffffffff80447bfe in setsockopt (td=0x7b, uap=0xffffff0010624720)
 at /usr/src/sys/kern/uipc_syscalls.c:1307
 1307            return (kern_setsockopt(td, uap->s, uap->level, uap->name,
 (kgdb) p td->td_proc->p_comm
 Cannot access memory at address 0x7b
 

----------
Remko:
Adding to PR (misfiled from 105389):
The kernel panics at random, always while running tcpserver, a component used
with qmail. This is the same issue as kern/104765 and possibly "freebsd panic
on HP Proliant DL360"; however, this machine is i386. The affected mail server
handles 8000 emails a day. Panics occur every few hours to every few days.
Please see 104765 for traces.

Nov  9 12:50:33 whitepine kernel: Fatal trap 12: page fault while in kernel mode
Nov  9 12:50:33 whitepine kernel: fault virtual address = 0x78
Nov  9 12:50:33 whitepine kernel: fault code            = supervisor read, page not present
Nov  9 12:50:33 whitepine kernel: instruction pointer   = 0x20:0xc06807e1
Nov  9 12:50:33 whitepine kernel: stack pointer         = 0x28:0xeaf2aab8
Nov  9 12:50:33 whitepine kernel: frame pointer         = 0x28:0xeaf2aabc
Nov  9 12:50:33 whitepine kernel: code segment          = base 0x0, limit 0xfffff, type 0x1b
Nov  9 12:50:33 whitepine kernel: = DPL 0, pres 1, def32 1, gran 1
Nov  9 12:50:33 whitepine kernel: processor eflags      = resume, IOPL = 0
Nov  9 12:50:33 whitepine kernel: current process               = 15690 (tcpserver)
Nov  9 12:50:33 whitepine kernel: trap number           = 12
Nov  9 12:50:33 whitepine kernel: panic: page fault
Nov  9 12:50:33 whitepine kernel: Uptime: 51m25s
Nov  9 12:50:33 whitepine kernel: Cannot dump. No dump device defined.


From: Kai Gallasch <gallasch@free.de>
To: bug-followup@FreeBSD.org,  gallasch@free.de
Cc:  
Subject: Re: kern/104765: [hp] kernel panic 6.2 prerelease-20061017 amd64
Date: Mon, 13 Nov 2006 12:49:12 +0100

 The server has now been running stable for about 10 days (no crash) with
 FreeBSD 6.2-BETA3 (cvs checkout 2006/11/01) and the GENERIC SMP kernel.
 
 We set the following in /boot/loader.conf, which seems to help here:
 
 debug.mpsafenet=0
 
 Subject of PR 104765 has been changed from
 
 "kern/104765: kernel panic 6.2 prerelease-20061017 amd64"
 
 to
 
 "kern/104765: [hp] kernel panic 6.2 prerelease-20061017 amd64"
 
 Does this mean the bug is HP hardware related? No feedback in the PR?
 
Responsible-Changed-From-To: freebsd-bugs->rwatson 
Responsible-Changed-By: rwatson 
Responsible-Changed-When: Tue Nov 14 10:05:50 UTC 2006 
Responsible-Changed-Why:  
Claim ownership, since I've been looking at issues similar or identical 
to this.  Some questions: 

(1) Could you let me know what versions of ip_output.c and tcp_usrreq.c 
you're running with? 

(2) Could you try the most recent patch attached to PR 102412?  This is 
a patch to ip_ctloutput().  I've attached it below, but the chances 
are good that GNATS will mangle the patch. 

Index: ip_output.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/ip_output.c,v
retrieving revision 1.242.2.16
diff -u -r1.242.2.16 ip_output.c
--- ip_output.c	24 Oct 2006 13:23:03 -0000	1.242.2.16
+++ ip_output.c	26 Oct 2006 18:20:55 -0000
@@ -1155,6 +1155,7 @@
 	struct sockopt *sopt;
 {
 	struct	inpcb *inp = sotoinpcb(so);
+	struct	inpcbinfo *pcbinfo = inp->inp_pcbinfo;
 	int	error, optval;
 
 	error = optval = 0;
@@ -1190,12 +1191,15 @@
 				m_free(m);
 				break;
 			}
+			INP_INFO_WLOCK(pcbinfo);
 			if (so->so_pcb == NULL) {
+				INP_INFO_WUNLOCK(pcbinfo);
 				m_free(m);
 				error = EINVAL;
 				break;
 			}
 			INP_LOCK(inp);
+			INP_INFO_WUNLOCK(pcbinfo);
 			error = ip_pcbopts(inp, sopt->sopt_name, m);
 			INP_UNLOCK(inp);
 			return (error);


http://www.freebsd.org/cgi/query-pr.cgi?pr=104765 

From: Robert Watson <rwatson@FreeBSD.org>
To: Kai Gallasch <gallasch@free.de>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64
Date: Tue, 14 Nov 2006 15:49:07 +0000 (GMT)

 On Tue, 14 Nov 2006, Kai Gallasch wrote:
 
 > Robert Watson wrote:
 >> Synopsis: kernel panic 6.2 prerelease-20061017 amd64
 >>
 >> Responsible-Changed-From-To: freebsd-bugs->rwatson
 >> Responsible-Changed-By: rwatson
 >> Responsible-Changed-When: Tue Nov 14 10:05:50 UTC 2006
 >> Responsible-Changed-Why:
 >> Claim ownership, since I've been looking at issues similar or identical
 >> to this.  Some questions:
 >>
 >> (1) Could you let me know what versions of ip_output.c and tcp_usrreq.c
 >>     you're running with?
 >
 > /usr/src/sys/netinet/ip_output.c
 >
 > * @(#)ip_output.c 8.3 (Berkeley) 1/21/94
 > * $FreeBSD: src/sys/netinet/ip_output.c,v 1.242.2.16 2006/10/24 13:23:03
 > rwatson Exp $
 
 Sounds good.  I particularly wanted to make sure you had the most recent 
 revision of this file.
 
 > /usr/src/sys/netinet/tcp_usrreq.c
 >
 > * From: @(#)tcp_usrreq.c  8.2 (Berkeley) 1/3/94
 > * $FreeBSD: src/sys/netinet/tcp_usrreq.c,v 1.124.2.3 2006/09/27 09:24:44
 > mux Exp $
 >
 >> (2) Could you try the most recent patch attached to PR 102412?  This is
 >>     a patch to ip_ctloutput().  I've attached it below, but the chances
 >>     are good that GNATS will mangle the patch.
 >
 > ok, I will apply the patch and rebuild.
 >
 > # cat /boot/loader.conf
 > debug.mpsafenet=0
 >
 > If I recompile and reboot, should I set debug.mpsafenet=1 (which is its 
 > default value)? Since I set this value to 0 the server hasn't crashed and 
 > has reached 10 days uptime.
 
 Yes, please.  The race in question does exist with debug.mpsafenet=0, but it 
 would only occur during very heavy paging, in which case Giant gets dropped 
 during copyin/copyout.  Otherwise, Giant stays held and the race does not 
 occur.
 
 Thanks,
 
 Robert N M Watson
 Computer Laboratory
 University of Cambridge

From: Kai Gallasch <gallasch@free.de>
To: Robert Watson <rwatson@FreeBSD.org>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64
Date: Thu, 23 Nov 2006 00:34:19 +0100

 Robert Watson wrote:
 > 
 > On Tue, 14 Nov 2006, Kai Gallasch wrote:
 > 
 >> Robert Watson wrote:
 >>> Synopsis: kernel panic 6.2 prerelease-20061017 amd64
 >>>
 >>> Responsible-Changed-From-To: freebsd-bugs->rwatson
 >>> Responsible-Changed-By: rwatson
 >>> Responsible-Changed-When: Tue Nov 14 10:05:50 UTC 2006
 >>> Responsible-Changed-Why:
 >>> Claim ownership, since I've been looking at issues similar or identical
 >>> to this.  Some questions:
 >>>
 >>> (1) Could you let me know what versions of ip_output.c and tcp_usrreq.c
 >>>     you're running with?
 >>
 >> /usr/src/sys/netinet/ip_output.c
 >>
 >> * @(#)ip_output.c 8.3 (Berkeley) 1/21/94
 >> * $FreeBSD: src/sys/netinet/ip_output.c,v 1.242.2.16 2006/10/24 13:23:03
 >> rwatson Exp $
 > 
 > Sounds good.  I particularly wanted to make sure you had the most recent
 > revision of this file.
 > 
 >> /usr/src/sys/netinet/tcp_usrreq.c
 >>
 >> * From: @(#)tcp_usrreq.c  8.2 (Berkeley) 1/3/94
 >> * $FreeBSD: src/sys/netinet/tcp_usrreq.c,v 1.124.2.3 2006/09/27 09:24:44
 >> mux Exp $
 >>
 >>> (2) Could you try the most recent patch attached to PR 102412?  This is
 >>>     a patch to ip_ctloutput().  I've attached it below, but the chances
 >>>     are good that GNATS will mangle the patch.
 >>
 >> ok, I will apply the patch and rebuild.
 
 
 Another crash (following the previous two crashes after applying your
 patch). Here is the output of kgdb.
 
 To keep bug-followup@freebsd.org for kern/104765 up to date, I am also
 attaching the output of the previous two crashdumps.
 
 -K.
 
 
 --- kgdb output, kernel panic 20061123 - kern/104765 ---
 
 panic: page fault
 cpuid = 3
 Uptime: 21h6m5s
 Dumping 1023 MB (2 chunks)
   chunk 0: 1MB (156 pages) ... ok
   chunk 1: 1023MB (261880 pages) 1008 992 976 960 944 928 912 896 880
 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592
 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304
 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16
 
 #0  doadump () at pcpu.h:172
 172     pcpu.h: No such file or directory.
         in pcpu.h
 (kgdb) bt
 #0  doadump () at pcpu.h:172
 #1  0x0000000000000004 in ?? ()
 #2  0xffffffff803f9557 in boot (howto=260) at
 /usr/src/sys/kern/kern_shutdown.c:409
 #3  0xffffffff803f9bf1 in panic (fmt=0xffffff00011924c0 "?6\236 ") at
 /usr/src/sys/kern/kern_shutdown.c:565
 #4  0xffffffff8061935f in trap_fatal (frame=0xffffff00011924c0,
 eva=18446742974745163440) at /usr/src/sys/amd64/amd64/trap.c:660
 #5  0xffffffff8061967f in trap_pfault (frame=0xffffffffa7ff6820,
 usermode=0) at /usr/src/sys/amd64/amd64/trap.c:573
 #6  0xffffffff80619933 in trap (frame=
       {tf_rdi = -1099441690208, tf_rsi = -1476433104, tf_rdx =
 -2138554176, tf_rcx = -2138553872, tf_r8 = -1099493202752, tf_r9 =
 -1098475947368, tf_rax = 22, tf_rbx = -1476433104, tf_rbp =
 -1098734529648, tf_r10 = -2138030408, tf_r11 = 0, tf_r12 =
 -1099441690208, tf_r13 = 0, tf_r14 = 0, tf_r15 = 1, tf_trapno = 12,
 tf_addr = 88, tf_flags = -2143224709, tf_err = 0, tf_rip = -2142524282,
 tf_cs = 8, tf_rflags = 66118, tf_rsp = -1476433696, tf_ss = 16}) at
 /usr/src/sys/amd64/amd64/trap.c:352
 #7  0xffffffff80604b2b in calltrap () at
 /usr/src/sys/amd64/amd64/exception.S:168
 #8  0xffffffff804bac86 in ip_ctloutput (so=0xffffff00042b29a0,
 sopt=0xffffffffa7ff6b30) at /usr/src/sys/netinet/ip_output.c:1157
 #9  0xffffffff804cd1c5 in tcp_ctloutput (so=0xffffff00042b29a0,
 sopt=0xffffffffa7ff6b30) at /usr/src/sys/netinet/tcp_usrreq.c:1038
 #10 0xffffffff80441c38 in sosetopt (so=0xffffff00042b29a0,
 sopt=0xffffffffa7ff6b30) at /usr/src/sys/kern/uipc_socket.c:1563
 #11 0xffffffff80448113 in kern_setsockopt (td=0xffffff00011924c0,
 s=774342528, level=-2138030408, name=-2138553872,
 val=0xffffff00011924c0, valseg=1035680408,
     valsize=69937568) at /usr/src/sys/kern/uipc_syscalls.c:1351
 #12 0xffffffff8044817e in setsockopt (td=0xffffff00042b29a0,
 uap=0xffffffffa7ff6b30) at /usr/src/sys/kern/uipc_syscalls.c:1307
 #13 0xffffffff8061a1b1 in syscall (frame=
       {tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 0, tf_r8 = 0, tf_r9
 = 140737488350072, tf_rax = 105, tf_rbx = 0, tf_rbp = 3, tf_r10 =
 -3689348814741910323, tf_r11 = 514, tf_r12 = 140737488350480, tf_r13 =
 34368406752, tf_r14 = 0, tf_r15 = 0, tf_trapno = 12, tf_addr = 5283944,
 tf_flags = 12, tf_err = 2, tf_rip = 34366834188, tf_cs = 43, tf_rflags =
 518, tf_rsp = 140737488350184, tf_ss = 35}) at
 /usr/src/sys/amd64/amd64/trap.c:792
 #14 0xffffffff80604cc8 in Xfast_syscall () at
 /usr/src/sys/amd64/amd64/exception.S:270
 #15 0x00000008006c460c in ?? ()
 Previous frame inner to this frame (corrupt stack?)
 
 
 
 --- kgdb output, kernel panic 20061120 - kern/104765 ---
 
 #0  doadump () at pcpu.h:172
 172     pcpu.h: No such file or directory.
         in pcpu.h
 (kgdb) bt
 #0  doadump () at pcpu.h:172
 #1  0x0000000000000004 in ?? ()
 #2  0xffffffff803f9557 in boot (howto=260) at
 /usr/src/sys/kern/kern_shutdown.c:409
 #3  0xffffffff803f9bf1 in panic (fmt=0xffffff000f456980 "\b??\"") at
 /usr/src/sys/kern/kern_shutdown.c:565
 #4  0xffffffff8061935f in trap_fatal (frame=0xffffff000f456980,
 eva=18446742974782290440) at /usr/src/sys/amd64/amd64/trap.c:660
 #5  0xffffffff8061967f in trap_pfault (frame=0xffffffffa7d4e820,
 usermode=0) at /usr/src/sys/amd64/amd64/trap.c:573
 #6  0xffffffff80619933 in trap (frame=
       {tf_rdi = -1098782579504, tf_rsi = -1479218384, tf_rdx =
 -2138554176, tf_rcx = -2138553872, tf_r8 = -1099255420544, tf_r9 =
 -1098475947368, tf_rax = 22, tf_rbx = -1479218384, tf_rbp =
 -1099106545056, tf_r10 = -2138030408, tf_r11 = 0, tf_r12 =
 -1098782579504, tf_r13 = 0, tf_r14 = 0, tf_r15 = 1, tf_trapno = 12,
 tf_addr = 88, tf_flags = -2143224709, tf_err = 0, tf_rip = -2142524282,
 tf_cs = 8, tf_rflags = 66118, tf_rsp = -1479218976, tf_ss = 16}) at
 /usr/src/sys/amd64/amd64/trap.c:352
 #7  0xffffffff80604b2b in calltrap () at
 /usr/src/sys/amd64/amd64/exception.S:168
 #8  0xffffffff804bac86 in ip_ctloutput (so=0xffffff002b7464d0,
 sopt=0xffffffffa7d4eb30) at /usr/src/sys/netinet/ip_output.c:1157
 #9  0xffffffff804cd1c5 in tcp_ctloutput (so=0xffffff002b7464d0,
 sopt=0xffffffffa7d4eb30) at /usr/src/sys/netinet/tcp_usrreq.c:1038
 #10 0xffffffff80441c38 in sosetopt (so=0xffffff002b7464d0,
 sopt=0xffffffffa7d4eb30) at /usr/src/sys/kern/uipc_socket.c:1563
 #11 0xffffffff80448113 in kern_setsockopt (td=0xffffff000f456980,
 s=232190912, level=-2138030408, name=-2138553872,
 val=0xffffff000f456980, valseg=1035680408,
     valsize=729048272) at /usr/src/sys/kern/uipc_syscalls.c:1351
 #12 0xffffffff8044817e in setsockopt (td=0xffffff002b7464d0,
 uap=0xffffffffa7d4eb30) at /usr/src/sys/kern/uipc_syscalls.c:1307
 #13 0xffffffff8061a1b1 in syscall (frame=
       {tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 0, tf_r8 = 0, tf_r9
 = 140737488350072, tf_rax = 105, tf_rbx = 0, tf_rbp = 3, tf_r10 =
 -2138030408, tf_r11 = 518, tf_r12 = 140737488350480, tf_r13 =
 34368406752, tf_r14 = 0, tf_r15 = 0, tf_trapno = 12, tf_addr =
 34366834176, tf_flags = 12, tf_err = 2, tf_rip = 34366834188, tf_cs =
 43, tf_rflags = 518, tf_rsp = 140737488350184, tf_ss = 35}) at
 /usr/src/sys/amd64/amd64/trap.c:792
 #14 0xffffffff80604cc8 in Xfast_syscall () at
 /usr/src/sys/amd64/amd64/exception.S:270
 #15 0x00000008006c460c in ?? ()
 Previous frame inner to this frame (corrupt stack?)
 
 
 --- kgdb output, kernel panic 20061116 - kern/104765 ---
 
 
 (kgdb) bt
 #0  doadump () at pcpu.h:172
 #1  0x0000000000000004 in ?? ()
 #2  0xffffffff803f9557 in boot (howto=260) at
 /usr/src/sys/kern/kern_shutdown.c:409
 #3  0xffffffff803f9bf1 in panic (fmt=0xffffff0022251260 "") at
 /usr/src/sys/kern/kern_shutdown.c:565
 #4  0xffffffff8061935f in trap_fatal (frame=0xffffff0022251260,
 eva=18446742975006236672) at /usr/src/sys/amd64/amd64/trap.c:660
 #5  0xffffffff8061967f in trap_pfault (frame=0xffffffffa7c8b820,
 usermode=0) at /usr/src/sys/amd64/amd64/trap.c:573
 #6  0xffffffff80619933 in trap (frame=
       {tf_rdi = -1099263329888, tf_rsi = -1480017104, tf_rdx =
 -2138554176, tf_rcx = -2138553872, tf_r8 = -1098938772896, tf_r9 =
 -1098475947368, tf_rax = 22, tf_rbx = -1480017104, tf_rbp =
 -1099024018240, tf_r10 = -2138030408, tf_r11 = 0, tf_r12 =
 -1099263329888, tf_r13 = 0, tf_r14 = 0, tf_r15 = 1, tf_trapno = 12,
 tf_addr = 88, tf_flags = -2143224709, tf_err = 0, tf_rip = -2142524282,
 tf_cs = 8, tf_rflags = 66118, tf_rsp = -1480017696, tf_ss = 16}) at
 /usr/src/sys/amd64/amd64/trap.c:352
 #7  0xffffffff80604b2b in calltrap () at
 /usr/src/sys/amd64/amd64/exception.S:168
 #8  0xffffffff804bac86 in ip_ctloutput (so=0xffffff000eccb9a0,
 sopt=0xffffffffa7c8bb30) at /usr/src/sys/netinet/ip_output.c:1157
 #9  0xffffffff804cd1c5 in tcp_ctloutput (so=0xffffff000eccb9a0,
 sopt=0xffffffffa7c8bb30) at /usr/src/sys/netinet/tcp_usrreq.c:1038
 #10 0xffffffff80441c38 in sosetopt (so=0xffffff000eccb9a0,
 sopt=0xffffffffa7c8bb30) at /usr/src/sys/kern/uipc_socket.c:1563
 #11 0xffffffff80448113 in kern_setsockopt (td=0xffffff0022251260,
 s=421725600, level=-2138030408, name=-2138553872,
 val=0xffffff0022251260, valseg=1035680408,
     valsize=248297888) at /usr/src/sys/kern/uipc_syscalls.c:1351
 #12 0xffffffff8044817e in setsockopt (td=0xffffff000eccb9a0,
 uap=0xffffffffa7c8bb30) at /usr/src/sys/kern/uipc_syscalls.c:1307
 #13 0xffffffff8061a1b1 in syscall (frame=
       {tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 0, tf_r8 = 0, tf_r9
 = 140737488350072, tf_rax = 105, tf_rbx = 0, tf_rbp = 3, tf_r10 =
 -3689348814741910323, tf_r11 = 514, tf_r12 = 140737488350480, tf_r13 =
 34368406752, tf_r14 = 0, tf_r15 = 0, tf_trapno = 12, tf_addr = 5283944,
 tf_flags = 12, tf_err = 2, tf_rip = 34366834188, tf_cs = 43, tf_rflags =
 518, tf_rsp = 140737488350184, tf_ss = 35}) at
 /usr/src/sys/amd64/amd64/trap.c:792
 #14 0xffffffff80604cc8 in Xfast_syscall () at
 /usr/src/sys/amd64/amd64/exception.S:270
 #15 0x00000008006c460c in ?? ()
 Previous frame inner to this frame (corrupt stack?)
 
 
 

From: Robert Watson <rwatson@FreeBSD.org>
To: FreeBSD-gnats-submit@FreeBSD.org
Cc:  
Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64 (fwd)
Date: Fri, 24 Nov 2006 14:05:14 +0000 (GMT)

 Append follow-up to PR.
 
 
 
 Robert N M Watson
 Computer Laboratory
 University of Cambridge
 
 ---------- Forwarded message ----------
 Date: Thu, 16 Nov 2006 11:47:29 +0100
 From: Kai Gallasch <gallasch@free.de>
 To: Robert Watson <rwatson@FreeBSD.org>
 Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64
 
 Robert Watson wrote:
 >
 > On Tue, 14 Nov 2006, Kai Gallasch wrote:
 >
 >> Robert Watson wrote:
 >>> Synopsis: kernel panic 6.2 prerelease-20061017 amd64
 >>>
 >>> Responsible-Changed-From-To: freebsd-bugs->rwatson
 >>> Responsible-Changed-By: rwatson
 >>> Responsible-Changed-When: Tue Nov 14 10:05:50 UTC 2006
 >>> Responsible-Changed-Why:
 >>> Claim ownership, since I've been looking at issues similar or identical
 >>> to this.  Some questions:
 >>>
 >>> (1) Could you let me know what versions of ip_output.c and tcp_usrreq.c
 >>>     you're running with?
 >>
 >> /usr/src/sys/netinet/ip_output.c
 >>
 >> * @(#)ip_output.c 8.3 (Berkeley) 1/21/94
 >> * $FreeBSD: src/sys/netinet/ip_output.c,v 1.242.2.16 2006/10/24 13:23:03
 >> rwatson Exp $
 >
 > Sounds good.  I particularly wanted to make sure you had the most recent
 > revision of this file.
 >
 >> /usr/src/sys/netinet/tcp_usrreq.c
 >>
 >> * From: @(#)tcp_usrreq.c  8.2 (Berkeley) 1/3/94
 >> * $FreeBSD: src/sys/netinet/tcp_usrreq.c,v 1.124.2.3 2006/09/27 09:24:44
 >> mux Exp $
 >>
 >>> (2) Could you try the most recent patch attached to PR 102412?  This is
 >>>     a patch to ip_ctloutput().  I've attached it below, but the chances
 >>>     are good that GNATS will mangle the patch.
 >>
 >> ok, I will apply the patch and rebuild.
 >>
 >> # cat /boot/loader.conf
 >> debug.mpsafenet=0
 >>
 >> If I recompile and reboot - Should I set debug.mpsafenet=1 ?(which is
 >> its default value) Since I set this value to 0 the server didn't crash
 >> and reached 10 days uptime.
 >
 > Yes, please.  The race in question does exist with debug.mpsafenet=0,
 > but it would only occur during very heavy paging, in which case Giant
 > gets dropped during copyin/copyout. Otherwise, it doesn't.
 
 Hi Robert.
 After 1d 7h the server crashed again. Here is the backtrace.
 
 
 # kgdb /usr/obj/usr/src/sys/SMP/kernel.debug /var/crash/vmcore.4
 
 [GDB will not be able to debug user-mode threads:
 /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
 GNU gdb 6.1.1 [FreeBSD]
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain
 conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as "amd64-marcel-freebsd".
 
 Unread portion of the kernel message buffer:
 
 panic: page fault
 cpuid = 3
 Uptime: 1d7h39m57s
 Dumping 1023 MB (2 chunks)
    chunk 0: 1MB (156 pages) ... ok
    chunk 1: 1023MB (261880 pages) 1008 992 976 960 944 928 912 896 880
 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592
 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304
 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16
 
 #0  doadump () at pcpu.h:172
 172             __asm __volatile("movq %%gs:0,%0" : "=r" (td));
 
 (kgdb) bt
 #0  doadump () at pcpu.h:172
 #1  0x0000000000000004 in ?? ()
 #2  0xffffffff803f9557 in boot (howto=260) at
 /usr/src/sys/kern/kern_shutdown.c:409
 #3  0xffffffff803f9bf1 in panic (fmt=0xffffff0022251260 "") at
 /usr/src/sys/kern/kern_shutdown.c:565
 #4  0xffffffff8061935f in trap_fatal (frame=0xffffff0022251260,
 eva=18446742975006236672) at /usr/src/sys/amd64/amd64/trap.c:660
 #5  0xffffffff8061967f in trap_pfault (frame=0xffffffffa7c8b820,
 usermode=0) at /usr/src/sys/amd64/amd64/trap.c:573
 #6  0xffffffff80619933 in trap (frame=
        {tf_rdi = -1099263329888, tf_rsi = -1480017104, tf_rdx =
 -2138554176, tf_rcx = -2138553872, tf_r8 = -1098938772896, tf_r9 =
 -1098475947368, tf_rax = 22, tf_rbx = -1480017104, tf_rbp =
 -1099024018240, tf_r10 = -2138030408, tf_r11 = 0, tf_r12 =
 -1099263329888, tf_r13 = 0, tf_r14 = 0, tf_r15 = 1, tf_trapno = 12,
 tf_addr = 88, tf_flags = -2143224709, tf_err = 0, tf_rip = -2142524282,
 tf_cs = 8, tf_rflags = 66118, tf_rsp = -1480017696, tf_ss = 16}) at
 /usr/src/sys/amd64/amd64/trap.c:352
 #7  0xffffffff80604b2b in calltrap () at
 /usr/src/sys/amd64/amd64/exception.S:168
 #8  0xffffffff804bac86 in ip_ctloutput (so=0xffffff000eccb9a0,
 sopt=0xffffffffa7c8bb30) at /usr/src/sys/netinet/ip_output.c:1157
 #9  0xffffffff804cd1c5 in tcp_ctloutput (so=0xffffff000eccb9a0,
 sopt=0xffffffffa7c8bb30) at /usr/src/sys/netinet/tcp_usrreq.c:1038
 #10 0xffffffff80441c38 in sosetopt (so=0xffffff000eccb9a0,
 sopt=0xffffffffa7c8bb30) at /usr/src/sys/kern/uipc_socket.c:1563
 #11 0xffffffff80448113 in kern_setsockopt (td=0xffffff0022251260,
 s=421725600, level=-2138030408, name=-2138553872, val=0xffffff0022251260,
      valseg=1035680408, valsize=248297888) at
 /usr/src/sys/kern/uipc_syscalls.c:1351
 #12 0xffffffff8044817e in setsockopt (td=0xffffff000eccb9a0,
 uap=0xffffffffa7c8bb30) at /usr/src/sys/kern/uipc_syscalls.c:1307
 #13 0xffffffff8061a1b1 in syscall (frame=
        {tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 0, tf_r8 = 0, tf_r9
 = 140737488350072, tf_rax = 105, tf_rbx = 0, tf_rbp = 3, tf_r10 =
 -3689348814741910323, tf_r11 = 514, tf_r12 = 140737488350480, tf_r13 =
 34368406752, tf_r14 = 0, tf_r15 = 0, tf_trapno = 12, tf_addr = 5283944,
 tf_flags = 12, tf_err = 2, tf_rip = 34366834188, tf_cs = 43, tf_rflags =
 518, tf_rsp = 140737488350184, tf_ss = 35}) at
 /usr/src/sys/amd64/amd64/trap.c:792
 #14 0xffffffff80604cc8 in Xfast_syscall () at
 /usr/src/sys/amd64/amd64/exception.S:270
 #15 0x00000008006c460c in ?? ()
 Previous frame inner to this frame (corrupt stack?)
 (kgdb)
 
 
 BTW, make.conf:
 
 # cat /etc/make.conf
 
 CFLAGS=         -O -pipe
 CPUTYPE=        opteron
 NO_PROFILE=     true
 
 
 

From: Robert Watson <rwatson@FreeBSD.org>
To: FreeBSD-gnats-submit@FreeBSD.org
Cc:  
Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64 (fwd)
Date: Fri, 24 Nov 2006 14:05:43 +0000 (GMT)

 Append followup to the PR.
 
 Robert N M Watson
 Computer Laboratory
 University of Cambridge
 
 ---------- Forwarded message ----------
 Date: Mon, 20 Nov 2006 17:06:23 +0100
 From: Kai Gallasch <gallasch@free.de>
 To: Robert Watson <rwatson@FreeBSD.org>
 Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64
 
 Robert Watson wrote:
 >
 > On Tue, 14 Nov 2006, Kai Gallasch wrote:
 >
 >> Robert Watson wrote:
 >>> Synopsis: kernel panic 6.2 prerelease-20061017 amd64
 >>>
 >>> Responsible-Changed-From-To: freebsd-bugs->rwatson
 >>> Responsible-Changed-By: rwatson
 >>> Responsible-Changed-When: Tue Nov 14 10:05:50 UTC 2006
 >>> Responsible-Changed-Why:
 >>> Claim ownership, since I've been looking at issues similar or identical
 >>> to this.  Some questions:
 >>>
 >>> (1) Could you let me know what versions of ip_output.c and tcp_usrreq.c
 >>>     you're running with?
 >>
 >> /usr/src/sys/netinet/ip_output.c
 >>
 >> * @(#)ip_output.c 8.3 (Berkeley) 1/21/94
 >> * $FreeBSD: src/sys/netinet/ip_output.c,v 1.242.2.16 2006/10/24 13:23:03
 >> rwatson Exp $
 >
 > Sounds good.  I particularly wanted to make sure you had the most recent
 > revision of this file.
 >
 >> /usr/src/sys/netinet/tcp_usrreq.c
 >>
 >> * From: @(#)tcp_usrreq.c  8.2 (Berkeley) 1/3/94
 >> * $FreeBSD: src/sys/netinet/tcp_usrreq.c,v 1.124.2.3 2006/09/27 09:24:44
 >> mux Exp $
 >>
 >>> (2) Could you try the most recent patch attached to PR 102412?  This is
 >>>     a patch to ip_ctloutput().  I've attached it below, but the chances
 >>>     are good that GNATS will mangle the patch.
 >>
 >> ok, I will apply the patch and rebuild.
 >>
 >> # cat /boot/loader.conf
 >> debug.mpsafenet=0
 >>
 >> If I recompile and reboot - Should I set debug.mpsafenet=1 ?(which is
 >> its default value) Since I set this value to 0 the server didn't crash
 >> and reached 10 days uptime.
 >
 > Yes, please.  The race in question does exist with debug.mpsafenet=0,
 > but it would only occur during very heavy paging, in which case Giant
 > gets dropped during copyin/copyout. Otherwise, it doesn't.
 
 Hi.
 
 Again a kernel panic. This is the second one with your patch applied.
 Attached is the backtrace of the crash. Must the debug kernel
 "kernel.debug" be installed as running kernel on the server, or is it
 sufficient for debugging purposes and kgdb usage to have it available in
 /usr/obj/usr/src/sys/SMP/kernel.debug ?
 
 Cheers,
 Kai.
 
 -- backtrace crash 20061120 --
 
 
 # kgdb /usr/obj/usr/src/sys/SMP/kernel.debug /var/crash/vmcore.5
 [GDB will not be able to debug user-mode threads:
 /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
 GNU gdb 6.1.1 [FreeBSD]
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain
 conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as "amd64-marcel-freebsd".
 
 Unread portion of the kernel message buffer:
 
 panic: page fault
 cpuid = 3
 Uptime: 4d8h19m7s
 Dumping 1023 MB (2 chunks)
    chunk 0: 1MB (156 pages) ... ok
    chunk 1: 1023MB (261880 pages) 1008 992 976 960 944 928 912 896 880
 864 848 832 816 800 784 768 752 736 720 704 688 672 656 640 624 608 592
 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304
 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16
 
 #0  doadump () at pcpu.h:172
 172     pcpu.h: No such file or directory.
          in pcpu.h
 (kgdb) bt
 #0  doadump () at pcpu.h:172
 #1  0x0000000000000004 in ?? ()
 #2  0xffffffff803f9557 in boot (howto=260) at
 /usr/src/sys/kern/kern_shutdown.c:409
 #3  0xffffffff803f9bf1 in panic (fmt=0xffffff000f456980 "\b??\"") at
 /usr/src/sys/kern/kern_shutdown.c:565
 #4  0xffffffff8061935f in trap_fatal (frame=0xffffff000f456980,
 eva=18446742974782290440) at /usr/src/sys/amd64/amd64/trap.c:660
 #5  0xffffffff8061967f in trap_pfault (frame=0xffffffffa7d4e820,
 usermode=0) at /usr/src/sys/amd64/amd64/trap.c:573
 #6  0xffffffff80619933 in trap (frame=
        {tf_rdi = -1098782579504, tf_rsi = -1479218384, tf_rdx =
 -2138554176, tf_rcx = -2138553872, tf_r8 = -1099255420544, tf_r9 =
 -1098475947368, tf_rax = 22, tf_rbx = -1479218384, tf_rbp =
 -1099106545056, tf_r10 = -2138030408, tf_r11 = 0, tf_r12 =
 -1098782579504, tf_r13 = 0, tf_r14 = 0, tf_r15 = 1, tf_trapno = 12,
 tf_addr = 88, tf_flags = -2143224709, tf_err = 0, tf_rip = -2142524282,
 tf_cs = 8, tf_rflags = 66118, tf_rsp = -1479218976, tf_ss = 16})
      at /usr/src/sys/amd64/amd64/trap.c:352
 #7  0xffffffff80604b2b in calltrap () at
 /usr/src/sys/amd64/amd64/exception.S:168
 #8  0xffffffff804bac86 in ip_ctloutput (so=0xffffff002b7464d0,
 sopt=0xffffffffa7d4eb30) at /usr/src/sys/netinet/ip_output.c:1157
 #9  0xffffffff804cd1c5 in tcp_ctloutput (so=0xffffff002b7464d0,
 sopt=0xffffffffa7d4eb30) at /usr/src/sys/netinet/tcp_usrreq.c:1038
 #10 0xffffffff80441c38 in sosetopt (so=0xffffff002b7464d0,
 sopt=0xffffffffa7d4eb30) at /usr/src/sys/kern/uipc_socket.c:1563
 #11 0xffffffff80448113 in kern_setsockopt (td=0xffffff000f456980,
 s=232190912, level=-2138030408, name=-2138553872,
 val=0xffffff000f456980, valseg=1035680408,
      valsize=729048272) at /usr/src/sys/kern/uipc_syscalls.c:1351
 #12 0xffffffff8044817e in setsockopt (td=0xffffff002b7464d0,
 uap=0xffffffffa7d4eb30) at /usr/src/sys/kern/uipc_syscalls.c:1307
 #13 0xffffffff8061a1b1 in syscall (frame=
        {tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 0, tf_r8 = 0, tf_r9
 = 140737488350072, tf_rax = 105, tf_rbx = 0, tf_rbp = 3, tf_r10 =
 -2138030408, tf_r11 = 518, tf_r12 = 140737488350480, tf_r13 =
 34368406752, tf_r14 = 0, tf_r15 = 0, tf_trapno = 12, tf_addr =
 34366834176, tf_flags = 12, tf_err = 2, tf_rip = 34366834188, tf_cs =
 43, tf_rflags = 518, tf_rsp = 140737488350184, tf_ss = 35}) at
 /usr/src/sys/amd64/amd64/trap.c:792
 #14 0xffffffff80604cc8 in Xfast_syscall () at
 /usr/src/sys/amd64/amd64/exception.S:270
 #15 0x00000008006c460c in ?? ()
 Previous frame inner to this frame (corrupt stack?)
 
 
 -- backtrace crash 20061120 --
 
 
 
 

From: Robert Watson <rwatson@FreeBSD.org>
To: Kai Gallasch <gallasch@free.de>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64
Date: Fri, 24 Nov 2006 14:16:50 +0000 (GMT)

 On Thu, 23 Nov 2006, Kai Gallasch wrote:
 
 > Another crash. (following the previous two crashes after applying your 
 > patch) Here is the output of kgdb.
 >
 > To keep bug-followup@freebsd.org for kern/104765 up to date I am attaching 
 > output of the previous two crashdumps also.
 
 Hmm.  This is unfortunate, as it suggests that finding a non-disruptive fix 
 for this will be difficult.  I'm not sure how you feel about running a 
 -CURRENT kernel, but the architectural change that fixes this whole class of 
 race conditions is present there.  There have been some recent hitches in 
 7-CURRENT due to introducing MSI support, so if you are willing to give this a 
 try you may also want to add the following to your /boot/loader.conf before 
 starting:
 
     hw.pci.enable_msi="0"
     hw.pci.enable_msix="0"
 
 Otherwise the 7-CURRENT kernel is in quite good shape.  Running with it for a 
 few days to see if the crash problem "goes away" would be quite useful.  In 
 the meantime I'll explore another workaround to use as a substitute for the 
 architectural fix during the release cycle.
 
 Robert N M Watson
 Computer Laboratory
 University of Cambridge

From: Robert Watson <rwatson@FreeBSD.org>
To: Kai Gallasch <gallasch@free.de>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64
Date: Sat, 25 Nov 2006 00:57:11 +0000 (GMT)

 On Thu, 23 Nov 2006, Kai Gallasch wrote:
 
 > Another crash. (following the previous two crashes after applying your 
 > patch) Here is the output of kgdb.
 >
 > To keep bug-followup@freebsd.org for kern/104765 up to date I am attaching 
 > output of the previous two crashdumps also.
 
 The attached patch may provide a more substantive solution for this problem, 
 at least until 7.x.  I've booted and tested this, but since I don't have a 
 reproduction scenario for the specific bug you're running into right now, I've 
 not managed to test those particular cases.  If GNATS/etc mangle the patch, 
 you can also download it from:
 
    http://www.watson.org/~robert/freebsd/netperf/20061124-ip_ctloutput.diff
 
 It appears to apply without problems against a stock RELENG_6 src/sys/netinet 
 directory, so you may need to remove the current patch you're running with 
 before proceeding.
 
 Robert N M Watson
 Computer Laboratory
 University of Cambridge
 
 Index: ip_output.c
 ===================================================================
 RCS file: /data/fbsd-cvs/ncvs/src/sys/netinet/ip_output.c,v
 retrieving revision 1.242.2.16
 diff -u -r1.242.2.16 ip_output.c
 --- ip_output.c	24 Oct 2006 13:23:03 -0000	1.242.2.16
 +++ ip_output.c	24 Nov 2006 15:47:13 -0000
 @@ -1148,15 +1148,29 @@
 
   /*
    * IP socket option processing.
 + *
 + * There are two versions of this call in order to work around a race
 + * condition in TCP in FreeBSD 6.x.  In the TCP implementation, so->so_pcb
 + * can become NULL if the pcb or pcbinfo lock isn't held.  However, when
 + * entering ip_ctloutput(), neither lock is held, and finding the pointer to
 + * either lock requires following so->so_pcb, which may be NULL.
 + * ip_ctloutput_pcbinfo() accepts the pcbinfo pointer so that the lock can be
 + * safely acquired.  This is not required in FreeBSD 7.x because the
 + * invariants on so->so_pcb are much stronger, so it cannot become NULL
 + * while the socket is in use.
    */
   int
 -ip_ctloutput(so, sopt)
 +ip_ctloutput_pcbinfo(so, sopt, pcbinfo)
   	struct socket *so;
   	struct sockopt *sopt;
 +	struct inpcbinfo *pcbinfo;
   {
   	struct	inpcb *inp = sotoinpcb(so);
   	int	error, optval;
 
 +	if (pcbinfo == NULL)
 +		pcbinfo = inp->inp_pcbinfo;
 +
   	error = optval = 0;
   	if (sopt->sopt_level != IPPROTO_IP) {
   		return (EINVAL);
 @@ -1190,12 +1204,15 @@
   				m_free(m);
   				break;
   			}
 +			INP_INFO_WLOCK(pcbinfo);
   			if (so->so_pcb == NULL) {
 +				INP_INFO_WUNLOCK(pcbinfo);
   				m_free(m);
   				error = EINVAL;
   				break;
   			}
   			INP_LOCK(inp);
 +			INP_INFO_WUNLOCK(pcbinfo);
   			error = ip_pcbopts(inp, sopt->sopt_name, m);
   			INP_UNLOCK(inp);
   			return (error);
 @@ -1217,10 +1234,14 @@
   			if (error)
   				break;
 
 +			INP_INFO_WLOCK(pcbinfo);
   			if (so->so_pcb == NULL) {
 +				INP_INFO_WUNLOCK(pcbinfo);
   				error = EINVAL;
   				break;
   			}
 +			INP_LOCK(inp);
 +			INP_INFO_WUNLOCK(pcbinfo);
   			switch (sopt->sopt_name) {
   			case IP_TOS:
   				inp->inp_ip_tos = optval;
 @@ -1277,6 +1298,7 @@
   				OPTSET(INP_DONTFRAG);
   				break;
   			}
 +			INP_UNLOCK(inp);
   			break;
   #undef OPTSET
 
 @@ -1295,11 +1317,13 @@
   			if (error)
   				break;
 
 +			INP_INFO_WLOCK(pcbinfo);
   			if (so->so_pcb == NULL) {
   				error = EINVAL;
   				break;
   			}
   			INP_LOCK(inp);
 +			INP_INFO_WUNLOCK(pcbinfo);
   			switch (optval) {
   			case IP_PORTRANGE_DEFAULT:
   				inp->inp_flags &= ~(INP_LOWPORT);
 @@ -1480,6 +1504,15 @@
   	return (error);
   }
 
 +int
 +ip_ctloutput(so, sopt)
 +	struct socket *so;
 +	struct sockopt *sopt;
 +{
 +
 +	return (ip_ctloutput_pcbinfo(so, sopt, NULL));
 +}
 +
   /*
    * Set up IP options in pcb for insertion in output packets.
    * Store in mbuf with pointer in pcbopt, adding pseudo-option
 Index: ip_var.h
 ===================================================================
 RCS file: /data/fbsd-cvs/ncvs/src/sys/netinet/ip_var.h,v
 retrieving revision 1.95
 diff -u -r1.95 ip_var.h
 --- ip_var.h	2 Jul 2005 23:13:31 -0000	1.95
 +++ ip_var.h	24 Nov 2006 15:32:53 -0000
 @@ -144,6 +144,7 @@
 
   struct ip;
   struct inpcb;
 +struct inpcbinfo;
   struct route;
   struct sockopt;
 
 @@ -164,6 +165,8 @@
   extern struct	pr_usrreqs rip_usrreqs;
 
   int	 ip_ctloutput(struct socket *, struct sockopt *sopt);
 +int	 ip_ctloutput_pcbinfo(struct socket *, struct sockopt *sopt,
 +	    struct inpcbinfo *pcbinfo);
   void	 ip_drain(void);
   void	 ip_fini(void *xtp);
   int	 ip_fragment(struct ip *ip, struct mbuf **m_frag, int mtu,
 Index: tcp_usrreq.c
 ===================================================================
 RCS file: /data/fbsd-cvs/ncvs/src/sys/netinet/tcp_usrreq.c,v
 retrieving revision 1.124.2.3
 diff -u -r1.124.2.3 tcp_usrreq.c
 --- tcp_usrreq.c	27 Sep 2006 09:24:44 -0000	1.124.2.3
 +++ tcp_usrreq.c	24 Nov 2006 14:59:41 -0000
 @@ -1035,7 +1035,7 @@
   			error = ip6_ctloutput(so, sopt);
   		else
   #endif /* INET6 */
 -		error = ip_ctloutput(so, sopt);
 +		error = ip_ctloutput_pcbinfo(so, sopt, &tcbinfo);
   		return (error);
   	}
   	tp = intotcpcb(inp);
State-Changed-From-To: open->feedback 
State-Changed-By: rwatson 
State-Changed-When: Sat Nov 25 11:01:09 UTC 2006 
State-Changed-Why:  
Change to feedback state, awaiting feedback on a new patch. 


http://www.freebsd.org/cgi/query-pr.cgi?pr=104765 

From: Kai Gallasch <gallasch@free.de>
To: Robert Watson <rwatson@FreeBSD.org>
Cc: Johannes 5 Joemann <joemann@beefree.free.de>
Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64
Date: Sun, 26 Nov 2006 03:23:54 +0100

 Robert Watson schrieb:
 > 
 > On Thu, 23 Nov 2006, Kai Gallasch wrote:
 > 
 >> Another crash. (following the previous two crashes after applying your
 >> patch) Here is the output of kgdb.
 >>
 >> To keep bug-followup@freebsd.org for kern/104765 up to date I am
 >> attaching output of the previous two crashdumps also.
 > 
 > The attached patch may provide a more substantive solution for this
 > problem, at least until 7.x.  I've booted and tested this, but since I
 > don't have a reproduction scenario for the specific bug you're running
 > into right now, I've not managed to test those particular cases.  If
 > GNATS/etc mangle the patch, you can also download it from:
 > 
 >   http://www.watson.org/~robert/freebsd/netperf/20061124-ip_ctloutput.diff
 > 
 > It appears to apply without problems against a stock RELENG_6
 > src/sys/netinet directory, so you may need to remove the current patch
 > you're running with before proceeding.
 
 Hi.
 
 I just rebuilt the server with your patch 20061124-ip_ctloutput.diff
 applied to a fresh checkout of RELENG_6.
 
 Thanks for all your effort and time debugging this problem, especially
 with an upcoming 6.2 release in the queue.
 
 --Kai.
 
 
 > 
 > Robert N M Watson
 > Computer Laboratory
 > University of Cambridge
 > 
 > Index: ip_output.c
 > ===================================================================
 > RCS file: /data/fbsd-cvs/ncvs/src/sys/netinet/ip_output.c,v
 > retrieving revision 1.242.2.16
 > diff -u -r1.242.2.16 ip_output.c
 > --- ip_output.c    24 Oct 2006 13:23:03 -0000    1.242.2.16
 > +++ ip_output.c    24 Nov 2006 15:47:13 -0000
 > @@ -1148,15 +1148,29 @@

From: Robert Watson <rwatson@FreeBSD.org>
To: Kai Gallasch <gallasch@free.de>
Cc: Johannes 5 Joemann <joemann@beefree.free.de>, bug-followup@FreeBSD.org
Subject: Re: kern/104765: kernel panic 6.2 prerelease-20061017 amd64
Date: Tue, 28 Nov 2006 15:01:10 +0000 (GMT)

 On Sun, 26 Nov 2006, Kai Gallasch wrote:
 
 >> The attached patch may provide a more substantive solution for this 
 >> problem, at least until 7.x.  I've booted and tested this, but since I 
 >> don't have a reproduction scenario for the specific bug you're running into 
 >> right now, I've not managed to test those particular cases.  If GNATS/etc 
 >> mangle the patch, you can also download it from:
 >>
 >>   http://www.watson.org/~robert/freebsd/netperf/20061124-ip_ctloutput.diff
 >>
 >> It appears to apply without problems against a stock RELENG_6 
 >> src/sys/netinet directory, so you may need to remove the current patch 
 >> you're running with before proceeding.
 >
 > I just rebuilt the server with your patch 20061124-ip_ctloutput.diff applied 
 > to a fresh checkout of RELENG_6.
 >
 > Thanks for all your effort and time debugging this problem, especially with 
 > an upcoming 6.2 release in the queue.
 
 Any luck with this patch?  I'd love to get this fix merged into the stable 
 and release branches, but don't want to do that without confirmation it helps.
 
 Thanks,
 
 Robert N M Watson
 Computer Laboratory
 University of Cambridge

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/104765: commit references a PR
Date: Tue, 28 Nov 2006 21:41:32 +0000 (UTC)

 rwatson     2006-11-28 21:41:12 UTC
 
   FreeBSD src repository
 
   Modified files:        (Branch: RELENG_6)
     sys/netinet          ip_output.c ip_var.h tcp_usrreq.c 
   Log:
   Reformulate ip_ctloutput() and tcp_ctloutput() to work around the fact
   that so_pcb can be invalidated at any time due to an untimely reset.
   Move the body of ip_ctloutput() to ip_ctloutput_pcbinfo(), which
   accepts a pcbinfo argument, and wrap it with ip_ctloutput(), which
   passes a NULL.  Modify tcp_ctloutput() to directly invoke
   ip_ctloutput_pcbinfo() and pass tcbinfo.  Hold the pcbinfo lock when
   dereferencing so_pcb and acquiring the inpcb lock in order to prevent
   the inpcb from being freed; the pcbinfo lock is then immediately
   dropped.  This is required as TCP may free the inpcb and invalidate
   so_pcb due to a reset at any time in the RELENG_6 network stack, which
   otherwise leads to a panic.
   
   This panic might be frequently seen on highly loaded IRC and Samba
   servers, which have long-lasting TCP connections, query socket options
   frequently, and see a significant number of reset connections.
   
   This change has been merged directly to RELENG_6 as the problem does
   not exist in HEAD, where the invariants for so_pcb are much stronger;
   the architectural changes in HEAD avoid the need to acquire a global
   lock in the socket option path.  This change will be merged to
   RELENG_6_2.
   
   PR:             102412, 104765
   Reviewed by:    Diane Bruce <db at db.net>
   Tested by:      Daniel Austin <daniel at kewlio dot net>,
                   Kai Gallasch <gallasch at free dot de>
   
   Revision    Changes    Path
   1.242.2.17  +34 -1     src/sys/netinet/ip_output.c
   1.95.2.1    +3 -0      src/sys/netinet/ip_var.h
   1.124.2.4   +1 -1      src/sys/netinet/tcp_usrreq.c
 _______________________________________________
 cvs-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/cvs-all
 To unsubscribe, send any mail to "cvs-all-unsubscribe@freebsd.org"
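
The locking order described in the log above (take the pcbinfo lock,
check so_pcb, take the inpcb lock, then drop the pcbinfo lock) can be
sketched in userland with pthread mutexes.  This is an analogy under
stated assumptions, not the kernel code: struct sock, struct pcb,
info_lock and the sock_*() functions are hypothetical stand-ins for the
socket, the inpcb, the pcbinfo lock and the option/reset paths.  The
sketch releases the global lock on every exit path; the so_pcb == NULL
branch of the IP_PORTRANGE hunk in the diff posted earlier in this
thread appears to lack the matching unlock.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Stand-in for the inpcb: per-connection state with its own lock. */
struct pcb {
	pthread_mutex_t	lock;		/* protects tos */
	int		tos;
};

/* Stand-in for the socket: so_pcb may be cleared by a reset. */
struct sock {
	struct pcb	*so_pcb;	/* protected by info_lock */
};

/* Stand-in for the global pcbinfo lock. */
static pthread_mutex_t info_lock = PTHREAD_MUTEX_INITIALIZER;

/* Attach a fresh pcb to the socket.  Returns 0 or ENOMEM. */
int
sock_attach(struct sock *so)
{
	struct pcb *p;

	p = calloc(1, sizeof(*p));
	if (p == NULL)
		return (ENOMEM);
	pthread_mutex_init(&p->lock, NULL);
	pthread_mutex_lock(&info_lock);
	so->so_pcb = p;
	pthread_mutex_unlock(&info_lock);
	return (0);
}

/* Reset path: detach and free the pcb while holding the global lock. */
void
sock_reset(struct sock *so)
{
	struct pcb *p;

	pthread_mutex_lock(&info_lock);
	p = so->so_pcb;
	if (p != NULL) {
		/* Wait for any option setter still holding the pcb lock. */
		pthread_mutex_lock(&p->lock);
		so->so_pcb = NULL;
		pthread_mutex_unlock(&p->lock);
		pthread_mutex_destroy(&p->lock);
		free(p);
	}
	pthread_mutex_unlock(&info_lock);
}

/* Option path: returns 0 on success, EINVAL if the pcb is gone. */
int
sock_set_tos(struct sock *so, int tos)
{
	struct pcb *p;

	pthread_mutex_lock(&info_lock);
	p = so->so_pcb;
	if (p == NULL) {
		/* Early exit must drop the global lock too. */
		pthread_mutex_unlock(&info_lock);
		return (EINVAL);
	}
	pthread_mutex_lock(&p->lock);		/* pcb cannot be freed now */
	pthread_mutex_unlock(&info_lock);	/* global lock dropped early */
	p->tos = tos;
	pthread_mutex_unlock(&p->lock);
	return (0);
}
```

Because the pcb lock is only ever acquired while the global lock is
held, the reset path can safely destroy it: no setter can be blocked on
the pcb lock while the reset holds the global one.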
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/104765: commit references a PR
Date: Tue, 28 Nov 2006 23:19:35 +0000 (UTC)

 rwatson     2006-11-28 23:19:18 UTC
 
   FreeBSD src repository
 
   Modified files:        (Branch: RELENG_6_2)
     sys/netinet          ip_output.c ip_var.h tcp_usrreq.c 
   Log:
   Merge ip_output.c:1.242.2.17, ip_var.h:1.95.2.1, tcp_usrreq.c:1.124.2.4
   from RELENG_6 to RELENG_6_2:
   
     Reformulate ip_ctloutput() and tcp_ctloutput() to work around the fact
     that so_pcb can be invalidated at any time due to an untimely reset.
     Move the body of ip_ctloutput() to ip_ctloutput_pcbinfo(), which
     accepts a pcbinfo argument, and wrap it with ip_ctloutput(), which
     passes a NULL.  Modify tcp_ctloutput() to directly invoke
     ip_ctloutput_pcbinfo() and pass tcbinfo.  Hold the pcbinfo lock when
     dereferencing so_pcb and acquiring the inpcb lock in order to prevent
     the inpcb from being freed; the pcbinfo lock is then immediately
     dropped.  This is required as TCP may free the inpcb and invalidate
     so_pcb due to a reset at any time in the RELENG_6 network stack, which
     otherwise leads to a panic.
   
     This panic might be frequently seen on highly loaded IRC and Samba
     servers, which have long-lasting TCP connections, query socket options
     frequently, and see a significant number of reset connections.
   
     This change has been merged directly to RELENG_6 as the problem does
     not exist in HEAD, where the invariants for so_pcb are much stronger;
     the architectural changes in HEAD avoid the need to acquire a global
     lock in the socket option path.  This change will be merged to
     RELENG_6_2.
   
     PR:             102412, 104765
     Reviewed by:    Diane Bruce <db at db.net>
     Tested by:      Daniel Austin <daniel at kewlio dot net>,
                     Kai Gallasch <gallasch at free dot de>
   
   Approved by:    re (kensmith)
   
   Revision        Changes    Path
   1.242.2.16.2.1  +34 -1     src/sys/netinet/ip_output.c
   1.95.8.1        +3 -0      src/sys/netinet/ip_var.h
   1.124.2.3.2.1   +1 -1      src/sys/netinet/tcp_usrreq.c
 
State-Changed-From-To: feedback->closed 
State-Changed-By: rwatson 
State-Changed-When: Wed Dec 6 12:42:54 UTC 2006 
State-Changed-Why:  
As there have been no further reports of panic after a week and patches 
have been merged to appropriate branches, assume that the problem is 
resolved.  If this is not the case, please let me know.  Thanks for the 
report, patch testing, and patience! 


http://www.freebsd.org/cgi/query-pr.cgi?pr=104765 
>Unformatted:
