From jonathan@sproggit.fluent.ltd.uk  Wed Nov 26 12:44:21 2003
Return-Path: <jonathan@sproggit.fluent.ltd.uk>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id ACA2B16A4CE
	for <FreeBSD-gnats-submit@freebsd.org>; Wed, 26 Nov 2003 12:44:21 -0800 (PST)
Received: from sproggit.fluent.ltd.uk (host-210-240-27-217.pobox.net.uk [217.27.240.210])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 232E143FE3
	for <FreeBSD-gnats-submit@freebsd.org>; Wed, 26 Nov 2003 12:44:19 -0800 (PST)
	(envelope-from jonathan@sproggit.fluent.ltd.uk)
Received: from sproggit.fluent.ltd.uk (localhost [127.0.0.1])
	by sproggit.fluent.ltd.uk (8.12.10/8.12.10) with ESMTP id hAQKk69U005872;
	Wed, 26 Nov 2003 20:46:06 GMT
	(envelope-from jonathan@sproggit.fluent.ltd.uk)
Received: (from root@localhost)
	by sproggit.fluent.ltd.uk (8.12.10/8.12.10/Submit) id hAQKk4Ai005871;
	Wed, 26 Nov 2003 20:46:04 GMT
	(envelope-from jonathan)
Message-Id: <200311262046.hAQKk4Ai005871@sproggit.fluent.ltd.uk>
Date: Wed, 26 Nov 2003 20:46:04 GMT
From: Jonathan.Gilpin@sproggit.fluent.ltd.uk
Reply-To: jonathan@gilpin.org
To: FreeBSD-gnats-submit@freebsd.org
Cc: jonathan@fluent.ltd.uk
Subject: FreeBSD 4.9 Crashes on SuperMicro with SMP enabled and dual Xeons	
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         59719
>Category:       i386
>Synopsis:       [crash] 4.9 Crashes on SuperMicro with SMP enabled and dual Xeons
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    remko
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Nov 26 12:50:21 PST 2003
>Closed-Date:    Tue Sep 12 08:54:46 GMT 2006
>Last-Modified:  Tue Sep 12 08:54:46 GMT 2006
>Originator:     Jonathan Gilpin
>Release:        FreeBSD 4.9-STABLE i386
>Organization:
>Environment:
System: FreeBSD sproggit.fluent.ltd.uk 4.9-STABLE FreeBSD 4.9-STABLE #8: Wed Nov 26 19:48:53 GMT 2003 root@sproggit.fluent.ltd.uk:/usr/src/sys/compile/SPROGGIT i386

Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 4.9-STABLE #8: Wed Nov 26 19:48:53 GMT 2003
    root@sproggit.fluent.ltd.uk:/usr/src/sys/compile/SPROGGIT
Timecounter "i8254"  frequency 1193182 Hz
CPU: Intel(R) Xeon(TM) CPU 2.40GHz (2400.11-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf27  Stepping = 7
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
real memory  = 2147418112 (2097088K bytes)
avail memory = 2088071168 (2039132K bytes)
Programming 16 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 16 pins in IOAPIC #1
Programming 16 pins in IOAPIC #2
Programming 16 pins in IOAPIC #3
FreeBSD/SMP: Multiprocessor motherboard: 2 CPUs
 cpu0 (BSP): apic id:  0, version: 0x00050014, at 0xfee00000
 cpu1 (AP):  apic id:  1, version: 0x00050014, at 0xfee00000
 io0 (APIC): apic id:  8, version: 0x000f0011, at 0xfec00000
 io1 (APIC): apic id:  9, version: 0x000f0011, at 0xfec01000
 io2 (APIC): apic id: 10, version: 0x000f0011, at 0xfec02000
 io3 (APIC): apic id: 11, version: 0x000f0011, at 0xfec03000
Preloaded elf kernel "kernel" at 0xc0388000.
Warning: Pentium 4 CPU: PSE disabled
Pentium Pro MTRR support enabled
Using $PIR table, 11 entries at 0xc00f4fd0
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <ServerWorks host to PCI bridge(unknown chipset)> on motherboard
IOAPIC #1 intpin 12 -> irq 2
IOAPIC #1 intpin 10 -> irq 9
IOAPIC #1 intpin 13 -> irq 10
IOAPIC #1 intpin 1 -> irq 11
pci0: <PCI bus> on pcib0
em0: <Intel(R) PRO/1000 Network Connection, Version - 1.7.19> port 0xe000-0xe03f mem 0xfeb60000-0xfeb7ffff irq 2 at device 8.0 on pci0
em0:  Speed:N/A  Duplex:N/A
em1: <Intel(R) PRO/1000 Network Connection, Version - 1.7.19> port 0xe400-0xe43f mem 0xfeba0000-0xfebbffff irq 9 at device 9.0 on pci0
em1:  Speed:N/A  Duplex:N/A
pci0: <ATI Mach64-GR graphics accelerator> at 11.0 irq 10
atapci0: <Generic PCI ATA controller> port 0xffa0-0xffaf,0x374-0x377,0x170-0x177,0x3f4-0x3f7,0x1f0-0x1f7 at device 15.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
ohci0: <OHCI (generic) USB controller> mem 0xfebfe000-0xfebfefff irq 11 at device 15.2 on pci0
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: (0x1166) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
isab0: <PCI to ISA bridge (vendor=1166 device=0227)> at device 15.3 on pci0
isa0: <ISA bus> on isab0
pcib255: <ServerWorks host to PCI bridge(unknown chipset)> on motherboard
pci255: <PCI bus> on pcib255
pcib1: <Host to PCI bridge> on motherboard
pci1: <PCI bus> on pcib1
orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc97ff,0xc9800-0xcafff on isa0
pmtimer0 on isa0
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: failed to get data.
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model IntelliMouse Explorer, device ID 4
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
SMP: AP CPU #1 Launched!
ad1: 32253MB <Maxtor 6E040L0> [65531/16/63] at ata0-slave BIOSDMA
ad3: 32253MB <Maxtor 6E040L0> [65531/16/63] at ata1-slave BIOSDMA
Mounting root from ufs:/dev/ad1s1a
WARNING: / was not properly dismounted
em0: Link is up 100 Mbps Full Duplex
Limiting closed port RST response from 201 to 200 packets per second

>Description:

	This box crashes with SMP enabled. It is a dual Xeon system with 2GB of RAM.
	The error message is:

panic: vm_page_remove(): page not found in hash
mp_lock = 01000001; cpuid = 1; lapic.id = 01000000
boot() called on cpu#1
 
syncing disks...

	The system works fine with SMP disabled.

>How-To-Repeat:
	Simply let the box run
>Fix:

	


>Release-Note:
>Audit-Trail:

From: Marc Olzheim <marcolz@mozilla.experimental.net>
To: freebsd-gnats-submit@FreeBSD.org, jonathan@gilpin.org
Cc:  
Subject: Re: kern/59719: FreeBSD 4.9 Crashes on SuperMicro with SMP enabled
 and dual Xeons
Date: Fri, 28 Nov 2003 15:22:14 +0100

 I've got multiple Supermicros running here... What's the system's type ?
 

From: "Jonathan Gilpin" <jonathan@fluent.ltd.uk>
To: "Marc Olzheim" <marcolz@mozilla.experimental.net>,
	<freebsd-gnats-submit@FreeBSD.org>
Cc:  
Subject: Re: kern/59719: FreeBSD 4.9 Crashes on SuperMicro with SMP enabled and dual Xeons
Date: Fri, 28 Nov 2003 14:33:59 -0000

 It's a 6013-I
 
 with 2 x 2.4GHz Xeons and 2GB of RAM.
 
 Have you got any running SMP OK?
 
 Jonathan
 

From: Marc Olzheim <marcolz@mozilla.experimental.net>
To: freebsd-gnats-submit@FreeBSD.org, jonathan@gilpin.org
Cc:  
Subject: Re: kern/59719: FreeBSD 4.9 Crashes on SuperMicro with SMP enabled
 and dual Xeons
Date: Fri, 28 Nov 2003 16:12:50 +0100

 We've got multiple 6012L-6, 6022L-6 and SS6013P-8 all running SMP, some 
 with HTT, some without...
 

From: Marc Olzheim <marcolz@mozilla.experimental.net>
To: freebsd-gnats-submit@FreeBSD.org, jonathan@gilpin.org
Cc:  
Subject: Re: kern/59719: FreeBSD 4.9 Crashes on SuperMicro with SMP enabled
 and dual Xeons
Date: Fri, 28 Nov 2003 16:18:14 +0100

 Hmm, so the main difference is SCSI versus IDE.
 I get this in the dmesg for atapci0 on a 6013P-8:
 
 atapci0: <Intel ICH3 ATA100 controller> port 0x2060-0x206f,0x374-0x377,0x170-0x177,0x3f4-0x3f7,0x1f0-0x1f7 irq 0 at device 31.1 on pci0
 

From: "Jonathan Gilpin" <jonathan@fluent.ltd.uk>
To: "David Malone" <dwmalone@maths.tcd.ie>,
	<freebsd-gnats-submit@FreeBSD.org>
Cc: <freebsd-bugs@freebsd.org>, <freebsd-stable@freebsd.org>,
	"Don Bowman" <don@sandvine.com>
Subject: Re: kern/59719 Re: 4.9 Stable Crashes on SuperMicro with SMP
Date: Sat, 29 Nov 2003 01:22:21 -0000

 I've run memtest (memtest86.com) kindly provided by Don and it passed all
 the tests. I've also installed a kernel module to test for memory
 errors and again no memory errors were found... So this means it's
 either a problem with the CPUs or a genuine bug in the kernel. (bugger!)
 
 I'm going to switch the CPUs around (we don't have any spare) and then try
 and get spares from my supplier to test with. Today I also enabled
 Hyperthreading in the BIOS and disabled MPS 1.4. This had no effect and the
 box continued to crash...
 
 The strange thing is that this box ran fine using Suse...
 
 The box BTW is a SuperMicro 6013-I (some of you have asked).
 
 Is it worth enabling any debug stuff in the kernel? I'm not familiar with
 gdb but can follow instructions to provide more info to anyone investigating
 possible bugs such as these...
 
 Unless switching the CPUs around works I'm going to have to go back
 to 1 CPU for stability...
 
 Jonathan
 
 
 
 ----- Original Message ----- 
 From: "David Malone" <dwmalone@maths.tcd.ie>
 To: "Jonathan Gilpin" <jonathan@fluent.ltd.uk>
 Cc: <freebsd-bugs@freebsd.org>; <freebsd-stable@freebsd.org>
 Sent: Thursday, November 27, 2003 2:09 PM
 Subject: Re: 4.9 Stable Crashes on SuperMicro with SMP
 
 
 > On Wed, Nov 26, 2003 at 10:35:49PM -0000, Jonathan Gilpin wrote:
 > > Further Crashes as reported before:
 >
 > Both of these crashes could potentially be caused by hardware or
 > memory problems. While it is possible it's a bug of some sort, I'd
 > start by checking out my hardware, if I were you...
 >
 > David.
 >
 

From: Uwe Doering <gemini@geminix.org>
To: freebsd-gnats-submit@FreeBSD.org
Cc: freebsd-bugs@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: kern/59719 Re: 4.9 Stable Crashes on SuperMicro with SMP
Date: Sat, 29 Nov 2003 09:28:06 +0100

 Jonathan Gilpin wrote:
 > I've run memtest (memtest86.com) kindly provided by Don and it passed all
 > the tests. I've installed a kernel module to test for memory
 > errors and found that again no memory errors are found... So this means it's
 > either a problem with the CPUs or a genuine bug in the kernel. (bugger!)
 
 No, that's unfortunately not what it means.  If a memory test fails you 
 can draw the conclusion that you have bad memory, but this doesn't work 
 the other way round.  If a memory test passes there is still a 
 possibility that a memory chip is the culprit since memory test software 
 cannot find all errors.
 
 Also, there is the chip set on the mainboard that coordinates bus access 
 etc. for the two CPUs.  Mainboard and chip set developers are known to 
 make errors, too.  In this case you would have to swap the entire 
 mainboard, possibly with one from a different manufacturer.  I can tell 
 you from my own experience that it is really hard to find reliable PC 
 hardware these days, in light of ever shorter and faster product release 
 cycles.
 
     Uwe
 -- 
 Uwe Doering         |  EscapeBox - Managed On-Demand UNIX Servers
 gemini@geminix.org  |  http://www.escapebox.net
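
The asymmetry described in the message above (a failing memory test proves bad memory, a passing one proves little) can be illustrated with a toy simulation. This is only a sketch under stated assumptions: `FlakyMemory`, `simple_test` and `workload_hits_fault` are hypothetical names, the coupling fault is invented for illustration, and nothing here models the reporter's hardware or how memtest86 actually works.

```python
class FlakyMemory:
    """Simulated RAM with one hypothetical coupling fault.

    Assumption for illustration: writing 0xFF to address `aggressor`
    flips bit 0 of the byte stored at address `victim`.
    """

    def __init__(self, size, aggressor, victim):
        self.cells = [0] * size
        self.aggressor = aggressor
        self.victim = victim

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF
        if addr == self.aggressor and value == 0xFF:
            # The fault: a neighbouring cell is silently corrupted.
            self.cells[self.victim] ^= 0x01

    def read(self, addr):
        return self.cells[addr]


def simple_test(mem, size, patterns=(0x00, 0x55, 0xAA, 0xFF)):
    """Write each pattern to a cell and read it straight back.

    Returns True when every read matches what was just written.
    """
    for pattern in patterns:
        for addr in range(size):
            mem.write(addr, pattern)
            if mem.read(addr) != pattern:
                return False
    return True


def workload_hits_fault(mem, aggressor, victim):
    """Store a value, write 0xFF to the aggressor, then re-read the value.

    Returns True when the stored value was corrupted in between.
    """
    mem.write(victim, 0x10)
    mem.write(aggressor, 0xFF)
    return mem.read(victim) != 0x10
```

With `FlakyMemory(16, aggressor=3, victim=7)`, `simple_test` reports a pass because it only ever checks the cell it has just written, while `workload_hits_fault` still observes corruption on the same simulated memory.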
 

From: "Jonathan Gilpin" <jonathan@fluent.ltd.uk>
To: <freebsd-gnats-submit@FreeBSD.org>
Cc: <freebsd-bugs@freebsd.org>, <freebsd-stable@freebsd.org>
Subject: Re: kern/59719 Re: 4.9 Stable Crashes on SuperMicro with SMP
Date: Sat, 29 Nov 2003 13:41:39 -0000

 Having disabled SMP in the kernel, the box is running with
 Hyperthreading turned on, thus using 2 virtual CPUs.
 
 Would this indicate that the problem is with the second CPU? Or does this
 prove nothing...
 
 Jonathan
 
 
 
 

From: Don Bowman <don@sandvine.com>
To: 'Uwe Doering' <gemini@geminix.org>,
	freebsd-gnats-submit@FreeBSD.org
Cc: freebsd-bugs@freebsd.org, freebsd-stable@freebsd.org
Subject: RE: kern/59719 Re: 4.9 Stable Crashes on SuperMicro with SMP
Date: Sat, 29 Nov 2003 11:33:58 -0500

 From: Uwe Doering [mailto:gemini@geminix.org]
 > Jonathan Gilpin wrote:
 > > I've run memtest (memtest86.com) kindly provided by Don and it passed all
 > > the tests. I've installed a kernel module to test for memory
 > > errors and found that again no memory errors are found... So this means it's
 > > either a problem with the CPUs or a genuine bug in the kernel. (bugger!)
 > 
 > No, that's unfortunately not what it means.  If a memory test fails you 
 > can draw the conclusion that you have bad memory, but this doesn't work 
 > the other way round.  If a memory test passes there is still a 
 > possibility that a memory chip is the culprit since memory test software 
 > cannot find all errors.
 > 
 > Also, there is the chip set on the mainboard that coordinates bus access 
 > etc. for the two CPUs.  Mainboard and chip set developers are known to 
 > make errors, too.  In this case you would have to swap the entire 
 > mainboard, possibly with one from a different manufacturer.  I can tell 
 > you from my own experience that it is really hard to find reliable PC 
 > hardware these days, in light of ever shorter and faster product release 
 > cycles.
 
 I have several hundred of the motherboards the poster is using,
 and they work reliably in MP operation with 4.X.
 The memtest86 that I sent him understands the ECC registers
 on the e7501 MCH, so it should find all correctable and uncorrectable
 errors.
 
 --don
State-Changed-From-To: open->feedback 
State-Changed-By: remko 
State-Changed-When: Mon Sep 11 11:18:11 UTC 2006 
State-Changed-Why:  
Hello, 

We are at release 6.1 now. Can you tell me whether the problem 
is still there in that release? The reason I am asking is that this 
PR is rather old and perhaps is not up to date any more. 

Thanks in advance 


Responsible-Changed-From-To: freebsd-i386->remko 
Responsible-Changed-By: remko 
Responsible-Changed-When: Mon Sep 11 11:18:11 UTC 2006 
Responsible-Changed-Why:  
Grab the PR 

http://www.freebsd.org/cgi/query-pr.cgi?pr=59719 

From: "Jonathan Gilpin" <fluentltd@gmail.com>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: i386/59719: [crash] 4.9 Crashes on SuperMicro with SMP enabled and dual Xeons
Date: Mon, 11 Sep 2006 15:52:40 +0100

 
 I believe this is fixed in 6.1
 
 Jonathan
 
State-Changed-From-To: feedback->closed 
State-Changed-By: remko 
State-Changed-When: Tue Sep 12 08:54:45 UTC 2006 
State-Changed-Why:  
The submitter reports that this got solved in 6.1, thanks for the 
feedback! => solved 

http://www.freebsd.org/cgi/query-pr.cgi?pr=59719 
>Unformatted:
