From venglin@freebsd.lublin.pl  Mon Jun 16 10:50:25 2003
Return-Path: <venglin@freebsd.lublin.pl>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6C5F437B401
	for <freebsd-gnats-submit@freebsd.org>; Mon, 16 Jun 2003 10:50:25 -0700 (PDT)
Received: from mailhost.freebsd.lublin.pl (mailhost.freebsd.lublin.pl [193.138.118.4])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 5C82C43FBF
	for <freebsd-gnats-submit@freebsd.org>; Mon, 16 Jun 2003 10:50:23 -0700 (PDT)
	(envelope-from venglin@freebsd.lublin.pl)
Received: (from root@localhost)
	by mailhost.freebsd.lublin.pl (8.12.9/8.12.6) id h5GHoLQR034371
	for freebsd-gnats-submit@freebsd.org; Mon, 16 Jun 2003 19:50:21 +0200 (CEST)
	(envelope-from venglin@freebsd.lublin.pl)
Received: from lagoon.freebsd.lublin.pl (qmailr@lagoon.freebsd.lublin.pl [193.138.118.3])
	by mailhost.freebsd.lublin.pl (8.12.9/8.12.9) with SMTP id h5GHnfQv031646
	for <FreeBSD-gnats-submit@freebsd.org>; Mon, 16 Jun 2003 19:49:42 +0200 (CEST)
	(envelope-from venglin@freebsd.lublin.pl)
Received: (qmail 31565 invoked by uid 1001); 16 Jun 2003 17:49:41 -0000
Message-Id: <20030616174941.31554.qmail@lagoon.freebsd.lublin.pl>
Date: 16 Jun 2003 17:49:41 -0000
From: Przemyslaw Frasunek <venglin@freebsd.lublin.pl>
Reply-To: Przemyslaw Frasunek <venglin@freebsd.lublin.pl>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: Repetable panics in ffs_vget() on Proliant ML350 with SMP/HTT enabled
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         53382
>Category:       i386
>Synopsis:       Repetable panics in ffs_vget() on Proliant ML350 with SMP/HTT enabled
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    remko
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jun 16 11:00:34 PDT 2003
>Closed-Date:    Sun Sep 03 13:04:19 GMT 2006
>Last-Modified:  Sun Sep 03 13:04:19 GMT 2006
>Originator:     Przemyslaw Frasunek
>Release:        FreeBSD 4.8-RELEASE i386
>Organization:
ATM S.A.
>Environment:
System: FreeBSD riot.atman.pl 4.8-RELEASE FreeBSD 4.8-RELEASE #0: Mon Jun 16 18:06:45 CEST 2003     root@riot.atman.pl:/usr/src/sys/compile/RIOT  i386

	Compaq Proliant ML350; problem repeatable on other ML350s with
	SMP/HTT enabled.

Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 4.8-RELEASE #0: Mon Jun 16 18:06:45 CEST 2003
    root@riot.atman.pl:/usr/src/sys/compile/RIOT
Timecounter "i8254"  frequency 1193182 Hz
Timecounter "TSC"  frequency 2392260632 Hz
CPU: Intel(R) Xeon(TM) CPU 2.40GHz (2392.26-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf27  Stepping = 7
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Hyperthreading: 2 logical CPUs
real memory  = 1073717248 (1048552K bytes)
avail memory = 1041403904 (1016996K bytes)
Preloaded elf kernel "kernel" at 0xc0308000.
Pentium Pro MTRR support enabled
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Host to PCI bridge> on motherboard
pci0: <PCI bus> on pcib0
ahc0: <Adaptec (Compaq OEM) 3960D Ultra160 SCSI adapter> port 0x2400-0x24ff mem 0xf7cf0000-0xf7cf0fff irq 10 at device 2.0 on pci0
aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
ahc1: <Adaptec (Compaq OEM) 3960D Ultra160 SCSI adapter> port 0x2800-0x28ff mem 0xf7ce0000-0xf7ce0fff irq 10 at device 2.1 on pci0
aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
pci0: <ATI Mach64-GR graphics accelerator> at 3.0
bge0: <Broadcom BCM5702X Gigabit Ethernet, ASIC rev. 0x1002> mem 0xf5fe0000-0xf5feffff irq 3 at device 4.0 on pci0
bge0: Ethernet address: 00:0b:cd:4e:17:f7
miibus0: <MII bus> on bge0
brgphy0: <BCM5703 10/100/1000baseTX PHY> on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
pci0: <unknown card> (vendor=0x0e11, dev=0xa0f0) at 5.0 irq 5
isab0: <PCI to ISA bridge (vendor=1166 device=0201)> at device 15.0 on pci0
isa0: <ISA bus> on isab0
pci0: <Unknown PCI ATA controller> at 15.1
pcib1: <Host to PCI bridge> on motherboard
pci1: <PCI bus> on pcib1
pcib2: <Host to PCI bridge> on motherboard
pci2: <PCI bus> on pcib2
ciss0: <Compaq Smart Array 532> port 0x3000-0x30ff mem 0xf7df0000-0xf7df3fff,0xf7ec0000-0xf7efffff irq 15 at device 1.0 on pci2
ciss0: using 256 of 1024 available commands
ciss0:   0 logical drives configured
ciss0:   firmware 2.20
ciss0:   2 SCSI channels
ciss0:   signature 'CISS'
ciss0:   valence 1
ciss0:   supported I/O methods 0xe<simple,performant,MEMQ>
ciss0:   active I/O method 0x3<simple>
ciss0:   4G page base 0x00000000
ciss0:   interrupt coalesce delay 1000us
ciss0:   interrupt coalesce count 16
ciss0:   max outstanding commands 1024
ciss0:   bus types 0x2<ultra3>
ciss0:   server name ''
ciss0:   heartbeat 0x3000004a
ciss0: 0 logical drive
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0x3400-0x347f mem 0xf7eb0000-0xf7eb007f irq 11 at device 2.0 on pci2
xl0: reset didn't complete
xl0: Ethernet address: 00:04:75:f2:2b:e1
miibus1: <MII bus> on xl0
ukphy0: <Generic IEEE 802.3u media interface> on miibus1
ukphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pcib3: <Host to PCI bridge> on motherboard
pci3: <PCI bus> on pcib3
pcib4: <ServerWorks host to PCI bridge(unknown chipset)> on motherboard
pci4: <PCI bus> on pcib4
pcib5: <ServerWorks host to PCI bridge(unknown chipset)> on motherboard
pci5: <PCI bus> on pcib5
xl1: <3Com 3c905C-TX Fast Etherlink XL> port 0x4000-0x407f mem 0xf7ff0000-0xf7ff007f irq 10 at device 1.0 on pci5
xl0: reset didn't complete
xl1: Ethernet address: 00:04:75:f2:2b:dd
miibus2: <MII bus> on xl1
ukphy1: <Generic IEEE 802.3u media interface> on miibus2
ukphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl2: <3Com 3c980C Fast Etherlink XL> port 0x4080-0x40ff mem 0xf7fe0000-0xf7fe007f irq 15 at device 2.0 on pci5
xl2: Ethernet address: 00:04:75:db:fa:9c
miibus3: <MII bus> on xl2
xlphy0: <3c905C 10/100 internal PHY> on miibus3
xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcbfff,0xcc000-0xcc7ff,0xcc800-0xccfff,0xee000-0xeffff on isa0
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model IntelliMouse Explorer, device ID 4
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
DUMMYNET initialized (011031)
IP packet filtering initialized, divert disabled, rule-based forwarding enabled, default to accept, logging disabled
IP Filter: v3.4.31 initialized.  Default = pass all, Logging = enabled
Waiting 15 seconds for SCSI devices to settle
pt0 at ahc0 bus 0 target 15 lun 0
pt0: <COMPAQ PROLIANT 4L6I 1.78> Fixed Processor SCSI-2 device 
pt0: 3.300MB/s transfers
da2 at ahc0 bus 0 target 2 lun 0
da2: <COMPAQ BD03664545 B20B> Fixed Direct Access SCSI-2 device 
da2: 160.000MB/s transfers (80.000MHz, offset 127, 16bit), Tagged Queueing Enabled
da2: 34732MB (71132000 512 byte sectors: 255H 63S/T 4427C)
da3 at ahc0 bus 0 target 3 lun 0
da3: <COMPAQ BD03664545 B20B> Fixed Direct Access SCSI-2 device 
da3: 160.000MB/s transfers (80.000MHz, offset 127, 16bit), Tagged Queueing Enabled
da3: 34732MB (71132000 512 byte sectors: 255H 63S/T 4427C)
da1 at ahc0 bus 0 target 1 lun 0
da1: <COMPAQ BD0366349C 3B06> Fixed Direct Access SCSI-2 device 
da1: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da1: 34732MB (71132000 512 byte sectors: 255H 63S/T 4427C)
da0 at ahc0 bus 0 target 0 lun 0
da0: <COMPAQ BD0186349B 3B11> Fixed Direct Access SCSI-2 device 
da0: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da0: 17365MB (35565080 512 byte sectors: 255H 63S/T 2213C)
da5 at ahc0 bus 0 target 5 lun 0
da5: <COMPAQ BD03685A24 HPB3> Fixed Direct Access SCSI-3 device 
da5: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da5: 34732MB (71132000 512 byte sectors: 255H 63S/T 4427C)
da4 at ahc0 bus 0 target 4 lun 0
da4: <COMPAQ BD03685A24 HPB3> Fixed Direct Access SCSI-3 device 
da4: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da4: 34732MB (71132000 512 byte sectors: 255H 63S/T 4427C)
Mounting root from ufs:/dev/da0s1a


machine		i386
cpu		I686_CPU
ident		RIOT
maxusers	256

options 	INET
options 	INET6
options 	FFS
options 	FFS_ROOT
options 	SOFTUPDATES
options 	UFS_DIRHASH
options 	COMPAT_43
options 	SCSI_DELAY=15000
options 	USERCONFIG
options 	SYSVSHM
options 	SYSVMSG
options 	SYSVSEM
options         MAXDSIZ="(512*1024*1024)"
options         MAXSSIZ="(512*1024*1024)"
options         DFLDSIZ="(512*1024*1024)"
options		NMBCLUSTERS=131070
options		PMAP_SHPGPERPROC=400
options		SMP
options		APIC_IO
options		HTT
options 	P1003_1B
options 	_KPOSIX_PRIORITY_SCHEDULING
options		ICMP_BANDLIM
options 	KBD_INSTALL_CDEV

options         IPFILTER
options		IPFILTER_LOG

options		IPFIREWALL
options		IPFIREWALL_DEFAULT_TO_ACCEPT
options		DUMMYNET

device		isa
device		pci

device		fdc0	at isa? port IO_FD1 irq 6 drq 2
device		fd0	at fdc0 drive 0
device		fd1	at fdc0 drive 1

device		scbus
device		da
device		sa
device		cd
device		pass
device		pt
device		ses

device		ahc
device		ciss

device		atkbdc0	at isa? port IO_KBD
device		atkbd0	at atkbdc? irq 1 flags 0x1
device		psm0	at atkbdc? irq 12

device		vga0	at isa?

device		sc0	at isa? flags 0x100

device		npx0	at nexus? port IO_NPX irq 13

device		miibus		# MII bus support
device		xl
device		bge

pseudo-device	loop		# Network loopback
pseudo-device	ether		# Ethernet support
pseudo-device	pty		# Pseudo-ttys (telnet etc)
pseudo-device	bpf		#Berkeley packet filter
pseudo-device	tun
pseudo-device	gif

>Description:
	After a short period of heavy disk activity, most I/O
	operations fail with EBADF. A page fault then follows within
	one minute:

SMP 2 cpus
IdlePTD at phsyical address 0x0033a000
initial pcb at physical address 0x002a54e0
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
mp_lock = 01000002; cpuid = 1; lapic.id = 07000000
fault virtual address   = 0x0
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc023ffb3
stack pointer           = 0x10:0xff6e8be0
frame pointer           = 0x10:0xff6e8c14
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 4837 (squid)
interrupt mask          = bio  <- SMP: XXX
trap number             = 12
panic: page fault
mp_lock = 01000002; cpuid = 1; lapic.id = 07000000
boot() called on cpu#1
syncing disks... 109 109 109 109 109 109 109 32 32 32 32 32 32 32 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19
giving up on 19 buffers
Uptime: 5m31s
xl0: reset didn't complete
xl1: reset didn't complete

dumping to dev #da/0x20001, offset 2097200
[...]
(kgdb) bt
#0  0xc0175a26 in dumpsys ()
#1  0xc01757f7 in boot ()
#2  0xc0175c50 in poweroff_wait ()
#3  0xc02415e0 in trap_fatal ()
#4  0xc0241271 in trap_pfault ()
#5  0xc0240e0f in trap ()
#6  0xc023ffb3 in generic_bzero ()
#7  0xc0201ae3 in ffs_vget ()
#8  0xc01f6795 in ffs_valloc ()
#9  0xc0208fa3 in ufs_makeinode ()
#10 0xc02069a8 in ufs_create ()
#11 0xc02092d9 in ufs_vnoperate ()
#12 0xc01aa4d4 in vn_open ()
#13 0xc01a66d0 in open ()
#14 0xc02418b1 in syscall2 ()
#15 0xc022eefb in Xint0x80_syscall ()
cannot read proc at 0
(kgdb) info all
eax            0x0      0
ecx            0x0      0
edx            0x0      0
ebx            0x0      0
esp            0xff6e8ab0       0xff6e8ab0
ebp            0xff6e8abc       0xff6e8abc
esi            0x0      0
edi            0x68000040       1744830528
eip            0xc0175a26       0xc0175a26
eflags         0x0      0
cs             0x0      0
ss             0x0      0
ds             0x0      0
es             0x0      0
fs             cannot read u area ptr for proc at 0

Sometimes a panic in pmap-related functions also occurs:

Fatal trap 12: page fault while in kernel mode
mp_lock = 00000002; cpuid = 0; lapic.id = 06000000
fault virtual address   = 0xbfc00000
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc023d461
stack pointer           = 0x10:0xff685e30
frame pointer           = 0x10:0xff685e3c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 8523 (cpp0)
interrupt mask          = none <- SMP: XXX
trap number             = 12
panic: page fault
mp_lock = 00000002; cpuid = 0; lapic.id = 06000000
boot() called on cpu#0

syncing disks... 96 87 87 72 71 69 67 66 66 56 53 52 50 48 48 32 31 31 22 21 19 18 17 17 14 12 11 11 3 3
done
Uptime: 5m33s
xl0: reset didn't complete
xl1: reset didn't complete

dumping to dev #da/0x20001, offset 2097200
[...]
(kgdb) bt
#0  0xc0175a26 in dumpsys ()
#1  0xc01757f7 in boot ()
#2  0xc0175c50 in poweroff_wait ()
#3  0xc02415e0 in trap_fatal ()
#4  0xc0241271 in trap_pfault ()
#5  0xc0240e0f in trap ()
#6  0xc023d461 in pmap_qenter ()
#7  0xc0185d56 in pipe_build_write_buffer ()
#8  0xc0185f28 in pipe_direct_write ()
#9  0xc01862ca in pipe_write ()
#10 0xc0184723 in dofilewrite ()
#11 0xc018461a in write ()
#12 0xc02418b1 in syscall2 ()
#13 0xc022eefb in Xint0x80_syscall ()
#14 0x804e900 in ?? ()
#15 0x804a696 in ?? ()
#16 0x804813e in ?? ()
(kgdb) info all
eax            0x0      0
ecx            0x0      0
edx            0x0      0
ebx            0x0      0
esp            0xff685d00       0xff685d00
ebp            0xff685d0c       0xff685d0c
esi            0x0      0
edi            0x0      0
eip            0xc0175a26       0xc0175a26
eflags         0x0      0
cs             0x0      0
ss             0x0      0
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x2f     47

>How-To-Repeat:
	Heavy I/O activity on Proliant ML350.
>Fix:
	Turn off SMP. 
>Release-Note:
>Audit-Trail:

From: Przemyslaw Frasunek <venglin@freebsd.lublin.pl>
To: freebsd-gnats-submit@freebsd.org
Cc:  
Subject: Re: i386/53382: Repetable panics in ffs_vget() on Proliant ML350
 with SMP/HTT enabled
Date: Tue, 17 Jun 2003 10:42:42 +0200

 OK, this looks a little more mysterious. I had another panic with SMP
 disabled, after about 6 hours of heavy I/O activity. Bear in mind that
 all the panics I have had are null pointer dereferences.
 
 IdlePTD at phsyical address 0x00327000
 initial pcb at physical address 0x0029a560
 panicstr: page fault
 panic messages:
 ---
 Fatal trap 12: page fault while in kernel mode
 fault virtual address   = 0x18
 fault code              = supervisor write, page not present
 instruction pointer     = 0x8:0xc0210175
 stack pointer           = 0x10:0xff5bdc60
 frame pointer           = 0x10:0xff5bdc64
 code segment            = base 0x0, limit 0xfffff, type 0x1b
                          = DPL 0, pres 1, def32 1, gran 1
 processor eflags        = interrupt enabled, resume, IOPL = 0
 current process         = 1092 (cp)
 interrupt mask          = none
 trap number             = 12
 panic: page fault
 syncing disks... 470 328 203 196 192 182 179 170 163 157 143 129 119 113 
 101 88 76 66 59 54 43 36 28 23 18 16 13 10 8 6 4 4
 done
 Uptime: 4h7m2s
 xl0: reset didn't complete
 xl1: reset didn't complete
 
 dumping to dev #da/0x20001, offset 2097200
 [...]
 (kgdb) bt
 #0  dumpsys () at ../../kern/kern_shutdown.c:487
 #1  0xc0173ec7 in boot (howto=256) at ../../kern/kern_shutdown.c:316
 #2  0xc01742ec in poweroff_wait (junk=0xc027036c, howto=-1071186289)
      at ../../kern/kern_shutdown.c:595
 #3  0xc02381ce in trap_fatal (frame=0xff5bdc20, eva=24)
      at ../../i386/i386/trap.c:974
 #4  0xc0237ea1 in trap_pfault (frame=0xff5bdc20, usermode=0, eva=24)
      at ../../i386/i386/trap.c:867
 #5  0xc0237a8b in trap (frame={tf_fs = -11468784, tf_es = -1070989296,
        tf_ds = -1070989296, tf_edi = 1, tf_esi = 0, tf_ebp = -10757020,
        tf_isp = -10757044, tf_ebx = 2, tf_edx = 0, tf_ecx = 1, tf_eax = 2,
        tf_trapno = 12, tf_err = 2, tf_eip = -1071578763, tf_cs = 8,
        tf_eflags = 66118, tf_esp = 2, tf_ss = -10756984})
      at ../../i386/i386/trap.c:466
 #6  0xc0210175 in _vm_object_allocate (type=2, size=1, object=0x0)
      at ../../vm/vm_object.c:157
 #7  0xc0210304 in vm_object_allocate (type=2, size=1)
      at ../../vm/vm_object.c:240
 #8  0xc0215759 in vnode_pager_alloc (handle=0xff676180, size=235, prot=0,
      offset=0) at ../../vm/vnode_pager.c:145
 #9  0xc019ef90 in vop_stdcreatevobject (ap=0xff5bdd74)
      at ../../kern/vfs_default.c:539
 #10 0xc019ec79 in vop_defaultop (ap=0xff5bdd74) at ../../kern/vfs_default.c:152
 #11 0xc0206c09 in ufs_vnoperate (ap=0xff5bdd74)
      at ../../ufs/ufs/ufs_vnops.c:2376
 #12 0xc01a2d86 in vfs_object_create (vp=0xff676180, p=0xff51fc20,
      cred=0xd9fb3180) at vnode_if.h:1383
 #13 0xc019fa4a in namei (ndp=0xff5bded4) at ../../kern/vfs_lookup.c:171
 #14 0xc01a830b in vn_open (ndp=0xff5bded4, fmode=1, cmode=0)
      at ../../kern/vfs_vnops.c:138
 #15 0xc01a4440 in open (p=0xff51fc20, uap=0xff5bdf80)
      at ../../kern/vfs_syscalls.c:1028
 #16 0xc02383f2 in syscall2 (frame={tf_fs = 47, tf_es = 47,
        tf_ds = -1078001617, tf_edi = 134577114, tf_esi = 51,
        tf_ebp = -1077937964, tf_isp = -10756140, tf_ebx = 1,
        tf_edx = 134689024, tf_ecx = 2, tf_eax = 5, tf_trapno = 7, tf_err = 2,
        tf_eip = 134531468, tf_cs = 31, tf_eflags = 659, tf_esp = -1077938024,
        tf_ss = 47}) at ../../i386/i386/trap.c:1175
 #17 0xc022c5d5 in Xint0x80_syscall ()
 #18 0x8048a7d in ?? ()
 #19 0x8048556 in ?? ()
 #20 0x804813e in ?? ()
 
 Then I tried running the GENERIC kernel instead of the custom one. It works
 OK (7 hours of uptime). Even with SMP/HTT re-enabled on GENERIC, it still
 works OK.
 
 I've managed to prepare an SMP-enabled config that doesn't cause panics, but
 I'm still not sure which particular option introduces the instability.
 
 machine         i386
 cpu             I686_CPU
 ident           KBWFW
 maxusers        0
 
 options         INET
 options         INET6
 options         FFS
 options         FFS_ROOT
 options         SOFTUPDATES
 options         UFS_DIRHASH
 options         PROCFS
 options         COMPAT_43
 options         UCONSOLE
 options         USERCONFIG
 options         VISUAL_USERCONFIG
 options         SYSVSHM
 options         SYSVMSG
 options         SYSVSEM
 options         P1003_1B
 options         _KPOSIX_PRIORITY_SCHEDULING
 options         KBD_INSTALL_CDEV
 options         NO_F00F_HACK
 options         NMBCLUSTERS=32768
 options         IPSEC
 options         IPSEC_ESP
 options         MAXDSIZ="(1024*1024*1024)"
 options         MAXSSIZ="(1024*1024*1024)"
 options         DFLDSIZ="(1024*1024*1024)"
 options         IPFILTER
 options         IPFILTER_LOG
 options         IPFIREWALL
 options         DUMMYNET
 options         IPFIREWALL_DEFAULT_TO_ACCEPT
 options         ICMP_BANDLIM
 options         RANDOM_IP_ID
 options         SMP                     # Symmetric MultiProcessor Kernel
 options         APIC_IO                 # Symmetric (APIC) I/O
 options         HTT                     # HyperThreading Technology
 
 device          isa
 device          pci
 
 device          fdc0    at isa? port IO_FD1 irq 6 drq 2
 device          fd0     at fdc0 drive 0
 
 device          ata
 device          atadisk
 device          atapicd
 options         ATA_STATIC_ID
 
 device          ahc
 device          ciss
 
 device          scbus
 device          ch
 device          da
 device          sa
 device          cd
 device          pass
 device          pt
 device          ses
 
 device          atkbdc0 at isa? port IO_KBD
 device          atkbd0  at atkbdc? irq 1 flags 0x1
 
 device          vga0    at isa?
 
 device          sc0     at isa? flags 0x100
 
 device          npx0    at nexus? port IO_NPX irq 13
 
 device          miibus
 device          xl
 device          bge
 
 pseudo-device   loop
 pseudo-device   ether
 pseudo-device   pty
 pseudo-device   bpf
 pseudo-device   gif
 
 
 -- 
 * Fido: 2:480/124 ** WWW: http://www.frasunek.com/ ** NIC-HDL: PMF9-RIPE *
 * Inet: przemyslaw@frasunek.com ** keyId: 2578FCAD | C0613BE3 | EC78FAB5 *
 

From: "=?big5?B?Q2h1bi1UaWVuIENoYW5nIFwosWmnZ6TRXCk=?=" <tcs@kitty.2y.net>
To: <freebsd-gnats-submit@FreeBSD.org>, <venglin@freebsd.lublin.pl>
Cc:  
Subject: Re: i386/53382: Repetable panics in ffs_vget() on Proliant ML350 with SMP/HTT enabled
Date: Thu, 17 Jul 2003 09:44:00 +0800

 The same problem can be found in ASUS P4C800 Deluxe Motherboard.
 

From: Mark Tinguely <tinguely@casselton.net>
To: freebsd-gnats-submit@FreeBSD.org, venglin@freebsd.lublin.pl
Cc: tinguely@casselton.net
Subject: Re: i386/53382: Repetable panics in ffs_vget() on Proliant ML350 with SMP/HTT enabled
Date: Wed, 19 May 2004 07:39:19 -0500 (CDT)

 Another person experienced this problem. He has a Tyan dual-Xeon motherboard
 with two CPUs, an Adaptec 2100, quality RAM, and a quality power supply. About
 every other night, while under heavy disk I/O, the computer panicked with
 tracebacks.
 
 				---
 MALLOC/malloc/kmem_malloc can still return a NULL pointer if the Kernel
 Virtual Memory is depleted or fragmented enough that vm_map_findspace()
 fails. I am mostly sending this email for people who search the bug
 database after encountering the problem, to suggest that they increase
 the Kernel Virtual Memory in their custom kernel configuration to work
 around the problem:
  
 	options		KVA_PAGES=XXX
 
 and possibly increase the kmem_map size (default MAX size is 200 MB):
 
 	options		VM_KMEM_SIZE_MAX="(YYY*1024*1024)"
 (or set it using the kern.vm.kmem.size setting in /boot/loader.conf)
 
 Here XXX is an appropriately larger value for the KVA; I suggested a
 value of 384-512 (1.5-2 GB). YYY increases the kernel malloc area,
 perhaps to a value of 256.
 
 				---
 I know changing malloc() to wait for KVM when called with M_WAITOK
 has lots of ramifications. But to close this problem report, I would
 suggest that if malloc() must return NULL in the M_WAITOK situation,
 FreeBSD should either deny the ffs_vget() request (as the sample patch
 below illustrates -- this may happen elsewhere as well) or assert a
 panic. Letting bzero() page fault on a NULL pointer and panic
 indirectly is not the correct thing to do.
 				---
 
 *** ffs_vfsops.c	Sun Jun 23 17:34:52 2002
 --- ffs_vfsops.c.fix	Fri May 14 09:46:06 2004
 ***************
 *** 1105,1110 ****
 --- 1105,1118 ----
   	MALLOC(ip, struct inode *, sizeof(struct inode), 
   	    ump->um_malloctype, M_WAITOK);
   
 + 	/*
 + 	 * fail if MALLOC failed.
 + 	 */
 + 	if (ip == NULL) {
 + 		printf("ffs_vget: malloc of inode failed\n");
 + 		return (ENOMEM);
 + 	}
 + 
   	/* Allocate a new vnode/inode. */
   	error = getnewvnode(VT_UFS, mp, ffs_vnodeop_p, &vp);
   	if (error) {
 				---
 
 --Mark Tinguely			tinguely@casselton.net
State-Changed-From-To: open->feedback 
State-Changed-By: remko 
State-Changed-When: Sun Sep 3 10:11:03 UTC 2006 
State-Changed-Why:  
Hello, 

Is this problem still present in later FreeBSD releases? 
A lot has changed in the meantime and I would like 
to know whether this still needs our attention. 


Responsible-Changed-From-To: freebsd-i386->remko 
Responsible-Changed-By: remko 
Responsible-Changed-When: Sun Sep 3 10:11:03 UTC 2006 
Responsible-Changed-Why:  
Grab the PR 

http://www.freebsd.org/cgi/query-pr.cgi?pr=53382 
State-Changed-From-To: feedback->closed 
State-Changed-By: remko 
State-Changed-When: Sun Sep 3 13:03:49 UTC 2006 
State-Changed-Why:  
Close as per request of the submitter. Thanks for the feedback! 

>Unformatted:
