From nobody@FreeBSD.org  Fri Nov  6 17:28:41 2009
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0FF8D106568D
	for <freebsd-gnats-submit@FreeBSD.org>; Fri,  6 Nov 2009 17:28:41 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id F17048FC0C
	for <freebsd-gnats-submit@FreeBSD.org>; Fri,  6 Nov 2009 17:28:40 +0000 (UTC)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id nA6HSeZE044891
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 6 Nov 2009 17:28:40 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id nA6HSeV4044890;
	Fri, 6 Nov 2009 17:28:40 GMT
	(envelope-from nobody)
Message-Id: <200911061728.nA6HSeV4044890@www.freebsd.org>
Date: Fri, 6 Nov 2009 17:28:40 GMT
From: Kai Gallasch <gallasch@free.de>
To: freebsd-gnats-submit@FreeBSD.org
Subject: FreeBSD 8.0 RC2 with vm.pmap.pg_ps_enabled=1 kernel panic with makeworld
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         140338
>Category:       kern
>Synopsis:       [vm][panic] FreeBSD 8.0 RC2 with vm.pmap.pg_ps_enabled=1 kernel panic with makeworld
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Nov 06 17:30:01 UTC 2009
>Closed-Date:    Mon Jul 05 21:03:32 UTC 2010
>Last-Modified:  Mon Jul 05 21:03:32 UTC 2010
>Originator:     Kai Gallasch
>Release:        8.0 RC2 amd64
>Organization:
>Environment:
Copyright (c) 1992-2009 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.0-RC2 #0: Tue Nov  3 20:24:06 CET 2009
    root@sonnenkraft.free.de:/usr/obj/usr/src/sys/GENERIC amd64
WARNING: WITNESS option enabled, expect reduced performance.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Quad-Core AMD Opteron(tm) Processor 2352 (2100.09-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x100f23  Stepping = 3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee400800<SYSCALL,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x7ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS>
  TSC: P-state invariant
real memory  = 21474836480 (20480 MB)
avail memory = 20701110272 (19742 MB)
ACPI APIC Table: <HP     ProLiant>
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 2 package(s) x 4 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
 cpu4 (AP): APIC ID:  4
 cpu5 (AP): APIC ID:  5
 cpu6 (AP): APIC ID:  6
 cpu7 (AP): APIC ID:  7
ioapic0 <Version 1.1> irqs 0-15 on motherboard
ioapic1 <Version 1.1> irqs 16-31 on motherboard
ioapic2 <Version 1.1> irqs 32-47 on motherboard
kbd1 at kbdmux0
acpi0: <HP ProLiant> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x920-0x923 on acpi0
acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 900
pcib0: <ACPI Host-PCI bridge> on acpi0
pci0: <ACPI PCI bus> on pcib0
vgapci0: <VGA-compatible display> port 0x1000-0x10ff mem 0xe8000000-0xefffffff,0xf7ff0000-0xf7ffffff irq 44 at device 3.0 on pci0
pci0: <base peripheral> at device 4.0 (no driver attached)
pci0: <base peripheral> at device 4.2 (no driver attached)
uhci0: <UHCI (generic) USB controller> port 0x1800-0x181f irq 45 at device 4.4 on pci0
uhci0: [ITHREAD]
usbus0: <UHCI (generic) USB controller> on uhci0
pci0: <serial bus> at device 4.6 (no driver attached)
pcib1: <ACPI PCI-PCI bridge> at device 5.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 13.0 on pci1
pci2: <ACPI PCI bus> on pcib2
atapci0: <ServerWorks HT1000 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x500-0x50f at device 6.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
isab0: <PCI-ISA bridge> at device 6.2 on pci0
isa0: <ISA bus> on isab0
ohci0: <OHCI (generic) USB controller> port 0x1c00-0x1cff mem 0xf7ee0000-0xf7ee0fff irq 5 at device 7.0 on pci0
ohci0: [ITHREAD]
usbus1: <OHCI (generic) USB controller> on ohci0
ohci1: <OHCI (generic) USB controller> port 0x3000-0x30ff mem 0xf7ed0000-0xf7ed0fff irq 5 at device 7.1 on pci0
ohci1: [ITHREAD]
usbus2: <OHCI (generic) USB controller> on ohci1
ehci0: <EHCI (generic) USB 2.0 controller> port 0x3400-0x34ff mem 0xf7ec0000-0xf7ec0fff irq 5 at device 7.2 on pci0
ehci0: [ITHREAD]
usbus3: EHCI version 1.0
usbus3: <EHCI (generic) USB 2.0 controller> on ehci0
pcib3: <ACPI PCI-PCI bridge> irq 42 at device 15.0 on pci0
pci5: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> irq 38 at device 16.0 on pci0
pci8: <ACPI PCI bus> on pcib4
pcib5: <PCI-PCI bridge> irq 39 at device 17.0 on pci0
pci14: <PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> irq 40 at device 18.0 on pci0
pci11: <ACPI PCI bus> on pcib6
pcib7: <ACPI PCI-PCI bridge> irq 41 at device 19.0 on pci0
pci3: <ACPI PCI bus> on pcib7
pcib8: <PCI-PCI bridge> at device 0.0 on pci3
pci4: <PCI bus> on pcib8
bce0: <HP NC373i Multifunction Gigabit Server Adapter (B2)> mem 0xf8000000-0xf9ffffff irq 41 at device 0.0 on pci4
miibus0: <MII bus> on bce0
brgphy0: <BCM5708C 10/100/1000baseTX PHY> PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bce0: Ethernet address: 00:1b:78:38:dd:02
bce0: [ITHREAD]
bce0: ASIC (0x57081020); Rev (B2); Bus (PCI-X, 64-bit, 133MHz); B/C (1.9.6); Flags (MSI|MFW); MFW ()
pcib9: <ACPI Host-PCI bridge> on acpi0
pci64: <ACPI PCI bus> on pcib9
pcib10: <ACPI PCI-PCI bridge> irq 36 at device 15.0 on pci64
pci67: <ACPI PCI bus> on pcib10
pcib11: <ACPI PCI-PCI bridge> irq 32 at device 16.0 on pci64
pci70: <ACPI PCI bus> on pcib11
ciss0: <HP Smart Array P400> port 0x4000-0x40ff mem 0xfdf00000-0xfdffffff,0xfdef0000-0xfdef0fff irq 32 at device 0.0 on pci70
ciss0: PERFORMANT Transport
ciss0: [ITHREAD]
pcib12: <PCI-PCI bridge> irq 33 at device 17.0 on pci64
pci73: <PCI bus> on pcib12
pcib13: <ACPI PCI-PCI bridge> irq 34 at device 18.0 on pci64
pci65: <ACPI PCI bus> on pcib13
pcib14: <PCI-PCI bridge> at device 0.0 on pci65
pci66: <PCI bus> on pcib14
bce1: <HP NC373i Multifunction Gigabit Server Adapter (B2)> mem 0xfa000000-0xfbffffff irq 34 at device 0.0 on pci66
miibus1: <MII bus> on bce1
brgphy1: <BCM5708C 10/100/1000baseTX PHY> PHY 1 on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bce1: Ethernet address: 00:1b:78:38:dd:00
bce1: [ITHREAD]
bce1: ASIC (0x57081020); Rev (B2); Bus (PCI-X, 64-bit, 133MHz); B/C (1.9.6); Flags (MSI|MFW); MFW ()
pcib15: <PCI-PCI bridge> irq 35 at device 19.0 on pci64
pci74: <PCI bus> on pcib15
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: [ITHREAD]
psm0: model IntelliMouse, device ID 3
atrtc0: <AT realtime clock> port 0x70-0x71 on acpi0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: [FILTER]
uart0: console (9600,n,8,1)
cpu0: <ACPI CPU> on acpi0
hwpstate0: <Cool`n'Quiet 2.0> on cpu0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
cpu4: <ACPI CPU> on acpi0
cpu5: <ACPI CPU> on acpi0
cpu6: <ACPI CPU> on acpi0
cpu7: <ACPI CPU> on acpi0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xcafff,0xcb000-0xcefff,0xe5000-0xe6fff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: cannot reserve I/O port range
uart1: <Non-standard ns8250 class UART with FIFOs> at port 0x2f8-0x2ff irq 3 on isa0
uart1: [FILTER]
Timecounters tick every 1.000 msec
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 12Mbps Full Speed USB v1.0
usbus2: 12Mbps Full Speed USB v1.0
usbus3: 480Mbps High Speed USB v2.0
acd0: CDRW <TSSTcorpCDW/DVD TS-L462D/HG01> at ata0-master UDMA33
ugen0.1: <(0x103c)> at usbus0
uhub0: <(0x103c) UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <(0x1166)> at usbus1
uhub1: <(0x1166) OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
ugen2.1: <(0x1166)> at usbus2
uhub2: <(0x1166) OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
ugen3.1: <(0x1166)> at usbus3
uhub3: <(0x1166) EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
uhub1: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
uhub0: 2 ports with 2 removable, self powered
ugen0.2: <HP> at usbus0
ukbd0: <Virtual Keyboard> on usbus0
kbd2 at ukbd0
ums0: <Virtual Mouse> on usbus0
ums0: 3 buttons and [XY] coordinates ID=0
uhub3: 4 ports with 4 removable, self powered
ugen0.3: <HP> at usbus0
uhub4: <Virtual Hub> on usbus0
ugen3.2: <vendor 0x04b4> at usbus3
uhub5: <vendor 0x04b4 product 0x6560, class 9/0, rev 2.00/0.0b, addr 2> on usbus3
uhub5: 4 ports with 4 removable, self powered
uhub4: 7 ports with 7 removable, self powered
da0 at ciss0 bus 0 target 0 lun 0
da0: <COMPAQ RAID 5  VOLUME OK> Fixed Direct Access SCSI-5 device 
da0: 135.168MB/s transfers
da0: Command Queueing enabled
da0: 36863MB (75496320 512 byte sectors: 255H 32S/T 9252C)
da1 at ciss0 bus 0 target 1 lun 0
da1: <COMPAQ RAID 5  VOLUME OK> Fixed Direct Access SCSI-5 device 
da1: 135.168MB/s transfers
da1: Command Queueing enabled
da1: 243098MB (497866080 512 byte sectors: 255H 32S/T 61013C)
da2 at ciss0 bus 0 target 2 lun 0
da2: <COMPAQ RAID 0  VOLUME OK> Fixed Direct Access SCSI-5 device 
da2: 135.168MB/s transfers
da2: Command Queueing enabled
da2: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)
da3 at ciss0 bus 0 target 3 lun 0
da3: <COMPAQ RAID 0  VOLUME OK> Fixed Direct Access SCSI-5 device 
da3: 135.168MB/s transfers
da3: Command Queueing enabled
da3: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)
da4 at ciss0 bus 0 target 4 lun 0
da4: <COMPAQ RAID 0  VOLUME OK> Fixed Direct Access SCSI-5 device 
da4: 135.168MB/s transfers
da4: Command Queueing enabled
da4: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)
da5 at ciss0 bus 0 target 5 lun 0
da5: <COMPAQ RAID 0  VOLUME OK> Fixed Direct Access SCSI-5 device 
da5: 135.168MB/s transfers
da5: Command Queueing enabled
da5: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)
SMP: AP CPU #1 Launched!
SMP: AP CPU #7 Launched!
SMP: AP CPU #6 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #4 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #2 Launched!
WARNING: WITNESS option enabled, expect reduced performance.
GEOM: da0: partition 3 does not start on a track boundary.
GEOM: da0: partition 3 does not end on a track boundary.
GEOM: da0: partition 2 does not start on a track boundary.
GEOM: da0: partition 2 does not end on a track boundary.
GEOM: da0: partition 1 does not start on a track boundary.
GEOM: da0: partition 1 does not end on a track boundary.
GEOM: da0s1: geometry does not match label (255h,63s != 255h,32s).
GEOM: da0s2: geometry does not match label (255h,63s != 255h,32s).
GEOM: da0s3: geometry does not match label (255h,63s != 255h,32s).
Trying to mount root from ufs:/dev/da0s1a
ZFS filesystem version 13
ZFS storage pool version 13
bce0: link state changed to UP

>Description:
I installed 8.0RC2-amd64 on an 8-core Opteron server a few days ago, when 8.0 RC2 came out.

When I tried to run make buildworld or make buildkernel, the server
rebooted without leaving any message in the logs. The same happened
when building bigger ports (for example ruby18 or perl58).

After this I installed 7.2-STABLE on this same server and did a "make
buildworld" and "make buildkernel" which completed without any problem.

Then I installed 8.0-BETA4, which also crashes when doing makeworld.

Finally, I reinstalled 8.0RC2-amd64 on the server and built an 8.0RC2 debug kernel for it on another AMD server.

I also:

- ran several passes with diagnostic software from the server manufacturer
- reset BIOS settings to default
- upgraded BIOS to newest release
- booted server from 2 year old backup BIOS
- took out the only pair of RAM modules that was different from the
rest of the modules
- ran memtest86 on the server (no problems found)

The server kept on crashing under load, when running buildworld.
Although dumpdev + dumpdir were correctly defined, the server just rebooted without writing a crashdump!

- In about 80% of the cases, running makeworld crashes the server
without a crashdump being written to dumpdir. The server just reboots.

- In about 20% of the cases, makeworld gets stuck in a non-terminating
process that uses 100% CPU. This process cannot be killed. When
makeworld is restarted, the server reboots again.

- It makes no difference whether makeworld runs with -j1 or -j8; the result is the same.

Finally, I followed a hint I got on the freebsd-current list, set vm.pmap.pg_ps_enabled=0 in /boot/loader.conf, and rebooted. The problem was gone!

After a successful buildworld and buildkernel I rebooted the server
again with vm.pmap.pg_ps_enabled=0 commented out, and the problem
was back. Then I set vm.pmap.pg_ps_enabled=0 in loader.conf again,
rebooted, and ran make buildworld: no problem.

This seems to be deterministic. With vm.pmap.pg_ps_enabled=1 the server
crashes without being able to write crashdumps to dumpdev (at least on
this specific HP ProLiant DL385G2 server with 20G RAM).




>How-To-Repeat:
Install FreeBSD 8.0 RC2 amd64 with sources, then do a makeworld.
>Fix:
Workaround: Set vm.pmap.pg_ps_enabled=0 in /boot/loader.conf and reboot.
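
The workaround can be applied by hand or scripted. A minimal sketch in POSIX shell, assuming the usual name="value" layout of loader.conf; set_tunable is a hypothetical helper name, not a FreeBSD tool, and the file path is passed explicitly so the function can be tried on a copy first:

```shell
#!/bin/sh
# set_tunable FILE NAME VALUE
# Hypothetical helper (not part of FreeBSD): make sure FILE ends up with
# exactly one uncommented NAME="VALUE" line, as used in /boot/loader.conf.
set_tunable() {
    file=$1
    name=$2
    value=$3
    touch "$file" || return 1
    tmp=$(mktemp) || return 1
    # Copy every line except existing uncommented assignments of NAME.
    # index() compares literally, so the dots in NAME are not wildcards.
    awk -v n="$name=" '{
        line = $0
        sub(/^[ \t]+/, "", line)
        if (index(line, n) != 1) print
    }' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Append the new assignment and write the result back in place.
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"
    cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

For example, run set_tunable /boot/loader.conf vm.pmap.pg_ps_enabled 0 as root and reboot; afterwards, sysctl vm.pmap.pg_ps_enabled should report 0.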

>Release-Note:
>Audit-Trail:

From: Patrick Lamaiziere <patfbsd@davenulle.org>
To: bug-followup@FreeBSD.org <bug-followup@FreeBSD.org>
Cc: Kai Gallasch <gallasch@free.de>
Subject: Re: kern/140338: FreeBSD 8.0 RC2 with vm.pmap.pg_ps_enabled=1
 kernel panic with makeworld
Date: Fri, 6 Nov 2009 19:36:54 +0100

 On Fri, 6 Nov 2009 17:28:40 GMT,
 Kai Gallasch <gallasch@free.de> wrote:
 
 Hello,
 
 > ZFS filesystem version 13
 > ZFS storage pool version 13
 
 It seems you are using ZFS on this box?
 
 Well, I saw a similar issue with 8.0 BETA 4/i386, but only with ZFS.
 Here it is 100% reproducible when I copy my /usr to a ZFS pool:
 
 tar cf - -C /usr . | tar xpvf - -C /pool
 
 I was not able to dump the panic, but the trace was:
 
 panic vm_fault : fault on no fault entry
 
 free()
 zfs_acl_node_free()
 zfs_acl_release_nodes()
 zfs_acl_free()
 zfs_zaccess()
 zfs_freebsd_create()
 VOP_CREATE_APV()
 vn_open_read()
 vn_open()
 kern_openat()
 kern_open()
 open()
 syscall (open())
 
 I saw this panic in the write() syscall too, always when freeing
 something.
 
 On the same box, I've used super-pages for a long time on FreeBSD 7.2 and
 with 8.0/BETA without any problem (but without ZFS, too). Since
 I turned off super-pages, ZFS has been stable.
 
 Regards.
 (I'm sorry not to be able to provide more useful information)
Responsible-Changed-From-To: freebsd-bugs->freebsd-fs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Sat Nov 7 03:10:45 UTC 2009 
Responsible-Changed-Why:  
Seems to happen with a combination of vm and zfs settings.  Since I have 
to pick an assignee, use the fs@ one. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=140338 

From: Kai Gallasch <gallasch@free.de>
To: bug-followup@FreeBSD.org <bug-followup@FreeBSD.org>
Cc: Patrick Lamaiziere <patfbsd@davenulle.org>, linimon@FreeBSD.org
Subject: Re: kern/140338: FreeBSD 8.0 RC2 with vm.pmap.pg_ps_enabled=1
 kernel panic with makeworld
Date: Sat, 7 Nov 2009 09:09:22 +0100

 On Fri, 6 Nov 2009 19:36:54 +0100,
 Patrick Lamaiziere <patfbsd@davenulle.org> wrote:
 
 > On Fri, 6 Nov 2009 17:28:40 GMT,
 > Kai Gallasch <gallasch@free.de> wrote:
 > 
 > Hello,
 > 
 > > ZFS filesystem version 13
 > > ZFS storage pool version 13
 > 
 > It seems you are using ZFS on this box?
 
 No. The server is not in production and never was with FreeBSD 8.
 Only the kernel module is loaded, due to zfs_enable="YES" in rc.conf as
 a preparation for using ZFS with 8.0-RELEASE
 
 The problem described in the PR occurred when zfs_enable="YES" was not
 set in rc.conf.
 
 I see no direct connection with ZFS.
 
 > 
 > On the same box, I've used super-pages for a longtime on FreeBSD 7.2
 > and with 8.0/BETA without any problem (but without ZFS too). Since
 > I've turned off super-pages, ZFS is stable.
 
 I tested superpages on 7.2-STABLE with ZFS and had to deactivate them
 after the server became unstable.
 
 --Kai
Responsible-Changed-From-To: freebsd-fs->freebsd-bugs 
Responsible-Changed-By: gavin 
Responsible-Changed-When: Sat Nov 7 14:25:06 UTC 2009 
Responsible-Changed-Why:  
Not ZFS related after all 

http://www.freebsd.org/cgi/query-pr.cgi?pr=140338 

From: Kai Gallasch <gallasch@free.de>
To: bug-followup@FreeBSD.org <bug-followup@FreeBSD.org>
Cc:  
Subject: kern/140338: FreeBSD 8.0 RC2 with vm.pmap.pg_ps_enabled=1 kernel
 panic with makeworld
Date: Thu, 12 Nov 2009 20:09:36 +0100

 Update:
 
 Following suggestions on the freebsd-current list today, I set
 hw.mca.enabled="1" and vm.pmap.pg_ps_enabled="1" in /boot/loader.conf
 on my Opteron ProLiant server, which reboots spontaneously under load.
 
 The server was upgraded to FreeBSD 8.0-PRERELEASE today (Nov. 12th 2009).
 
 This is what happened:
 
 
 ---- machine check trap, first run ----
 
 sonnenkraft:/usr/obj # MCA: CPU 5 UNCOR PCC OVER DTLB L1 error
 MCA: Address 0x80e5c8000
 
 
 Fatal trap 28: machine check trap while in user mode
 cpuid = 5; apic id = 05
 instruction pointer     = 0x43:0x691688
 stack pointer           = 0x3b:0x7fffffffdf90
 frame pointer           = 0x3b:0x6a2
 code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 3, pres 1, long 1, def32 0, gran 1
 processor eflags        = interrupt enabled, IOPL = 0
 current process         = 29319 (cc1)
 [thread pid 29319 tid 100086 ]
 Stopped at      0x691688:       leal    0x1(%rax),%edx
 db> where    
 Tracing pid 29319 tid 100086 td 0xffffff000e065390
 WAKEUP_cpu() at 0x691688
 *** error reading from address 6aa ***
 db> bt    
 Tracing pid 29319 tid 100086 td 0xffffff000e065390
 WAKEUP_cpu() at 0x691688
 *** error reading from address 6aa ***
 db> call doadump    
 Cannot dump. Device not defined or unavailable.
 = 0x30
 
 
 ---- machine check trap, second run - this
                     time with dumpdev defined ----
 
 sonnenkraft:~ # MCA: CPU 2 UNCOR PCC OVER DTLB L1 error
 MCA: Address 0x8011d3000
 
 
 Fatal trap 28: machine check trap while in user mode
 cpuid = 2; apic id = 02
 instruction pointer     = 0x43:0x6b1241
 stack pointer           = 0x3b:0x7fffffffe200
 frame pointer           = 0x3b:0x7fffffffe240
 code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 3, pres 1, long 1, def32 0, gran 1
 processor eflags        = interrupt enabled, IOPL = 0
 current process         = 69498 (cc1)
 [thread pid 69498 tid 100338 ]
 Stopped at      0x6b1241:       call    0x6af140
 db> where    
 Tracing pid 69498 tid 100338 td 0xffffff000ef75720
 WAKEUP_cpu() at 0x6b1241
 db> bt    
 Tracing pid 69498 tid 100338 td 0xffffff000ef75720
 WAKEUP_cpu() at 0x6b1241
 db> call doadump    
 Physical memory: 20462 MB
 Dumping 2303 MB: 2288 2272 2256 2240 2224 2208 2192 2176 2160 2144 2128
 2112 2096 2080 2064 2048 2032 2016 2000 1984 1968 1952 1936 1920 1904
 1888 1872 1856 1840 1824 1808 1792 1776 1760 1744 1728 1712 1696 1680
 1664 1648 1632 1616 1600 1584 1568 1552 1536 1520 1504 1488 1472 1456
 1440 1424 1408 1392 1376 1360 1344 1328 1312 1296 1280 1264 1248 1232
 1216 1200 1184 1168 1152 1136 1120 1104 1088 1072 1056 1040 1024 1008
 992 976 960 944 928 912 896 880 864 848 832 816 800 784 768 752 736 720
 704 688 672 656 640 624 608 592 576 560 544 528 512 496 480 464 448 432
 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144
 128 112 96 80 64 48 32 16
 Dump complete
 = 0
 db> reboot    
 cpu_reset: Restarting BSP
 cpu_reset_proxy: Stopped CPU 2
 
 
 ---- machine check trap, third run - BIOS: static low
                power mode enabled, to rule out power/heat issue ----
 
 sonnenkraft:~ # MCA: CPU 4 UNCOR PCC OVER DTLB L1 error
 MCA: Address 0x8011fd000
 
 
 Fatal trap 28: machine check trap while in user mode
 cpuid = 4; apic id = 04
 instruction pointer     = 0x43:0x76127d
 stack pointer           = 0x3b:0x7fffffffe068
 frame pointer           = 0x3b:0x7fffffffe090
 code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 3, pres 1, long 1, def32 0, gran 1
 processor eflags        = interrupt enabled, IOPL = 0
 current process         = 73135 (cc1)
 [thread pid 73135 tid 100146 ]
 Stopped at      0x76127d:       xorl    %edx,%edx
 db> where    
 Tracing pid 73135 tid 100146 td 0xffffff00071caab0
 WAKEUP_cpu() at 0x76127d
 db> bt    
 Tracing pid 73135 tid 100146 td 0xffffff00071caab0
 WAKEUP_cpu() at 0x76127d
 db> call doadump    
 Physical memory: 20462 MB
 Dumping 2335 MB: 2320 2304 2288 2272 2256 2240 2224 2208 2192 2176 2160
 2144 2128 2112 2096 2080 2064 2048 2032 2016 2000 1984 1968 1952 1936
 1920 1904 1888 1872 1856 1840 1824 1808 1792 1776 1760 1744 1728 1712
 1696 1680 1664 1648 1632 1616 1600 1584 1568 1552 1536 1520 1504 1488
 1472 1456 1440 1424 1408 1392 1376 1360 1344 1328 1312 1296 1280 1264
 1248 1232 1216 1200 1184 1168 1152 1136 1120 1104 1088 1072 1056 1040
 1024 1008 992 976 960 944 928 912 896 880 864 848 832 816 800 784 768
 752 736 720 704 688 672 656 640 624 608 592 576 560 544 528 512 496 480
 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192
 176 160 144 128 112 96 80 64 48 32 16
 Dump complete
 = 0
 db> reboot    
 cpu_reset: Restarting BSP
 cpu_reset_proxy: Stopped CPU 4
 
 ---- END: ----
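
The three traces above report the same error class on different CPUs (5, 2 and 4). A small sketch for pulling those lines out of a saved console log, assuming the "MCA: CPU <n> <flags> error" layout seen in this report; mca_summary is a hypothetical name, and other kernel versions may format the line differently:

```shell
#!/bin/sh
# mca_summary: read console output on stdin and print one line per
# "MCA: CPU <n> ..." record, giving the CPU number and the error text.
# Hypothetical helper; the line layout is taken from the traces above.
mca_summary() {
    awk '
        /MCA: CPU [0-9]+ / {
            # Drop anything before the record, such as a shell prompt.
            sub(/^.*MCA: CPU /, "")
            cpu = $1
            # Remove the CPU number; the rest of the line is the error text.
            $1 = ""
            sub(/^ /, "")
            print "cpu=" cpu " error=\"" $0 "\""
        }
    '
}
```

It could be run as mca_summary < console.log, where console.log stands for a captured serial console session (the file name is only an example).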
 
 
State-Changed-From-To: open->closed 
State-Changed-By: alc 
State-Changed-When: Mon Jul 5 20:51:26 UTC 2010 
State-Changed-Why:  
Ultimately, it was found that the root cause of these crashes was a 
hardware bug in AMD Family 10h processors.  In January, AMD documented 
this bug as Errata 383.  AMD's recommended workaround is implemented 
in FreeBSD 8.1-RELEASE and 7.3-STABLE. 

As a workaround in earlier releases, either hw.mca.enabled or 
vm.pmap.pg_ps_enabled must be disabled.  In some cases, such as
FreeBSD running as a virtual machine, the only option is to
disable vm.pmap.pg_ps_enabled because the hypervisor controls 
the machine check hardware. 
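
The boot log in this report gives the CPU signature as Id = 0x100f23. As a cross-check of the Family 10h statement, the display family can be derived from that value with the standard CPUID rule (base family plus extended family when the base family is 0xF). A minimal sketch in POSIX shell; cpu_family is a hypothetical helper name:

```shell
#!/bin/sh
# cpu_family ID
# Print the display family (in hex) encoded in a CPUID signature such as
# the "Id = 0x100f23" value from dmesg. Hypothetical helper for illustration.
cpu_family() {
    id=$(( $1 ))
    base=$(( (id >> 8) & 0xf ))      # bits 8-11: base family
    ext=$(( (id >> 20) & 0xff ))     # bits 20-27: extended family
    if [ "$base" -eq 15 ]; then
        # The extended family is only added when the base family is 0xF.
        printf '%xh\n' $(( base + ext ))
    else
        printf '%xh\n' "$base"
    fi
}
```

cpu_family 0x100f23 prints 10h, which matches the processor family named in the closing note.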


http://www.freebsd.org/cgi/query-pr.cgi?pr=140338 
>Unformatted:
