From nobody@FreeBSD.org  Sat Nov 20 12:23:06 2004
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 30EBD16A4CE
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 20 Nov 2004 12:23:06 +0000 (GMT)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
	by mx1.FreeBSD.org (Postfix) with ESMTP id E0AB243D5D
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 20 Nov 2004 12:23:05 +0000 (GMT)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.12.11/8.12.11) with ESMTP id iAKCN5Rl094202
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 20 Nov 2004 12:23:05 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.12.11/8.12.11/Submit) id iAKCN5ww094201;
	Sat, 20 Nov 2004 12:23:05 GMT
	(envelope-from nobody)
Message-Id: <200411201223.iAKCN5ww094201@www.freebsd.org>
Date: Sat, 20 Nov 2004 12:23:05 GMT
From: "O. Hartmann" <ohartman@web.de>
To: freebsd-gnats-submit@FreeBSD.org
Subject: SMP crashes
X-Send-Pr-Version: www-2.3

>Number:         74156
>Category:       kern
>Synopsis:       SMP crashes
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Nov 20 12:30:28 GMT 2004
>Closed-Date:    Mon Mar 21 07:25:48 GMT 2005
>Last-Modified:  Mon Mar 21 07:25:48 GMT 2005
>Originator:     O. Hartmann
>Release:        FreeBSD 5.3-RELEASE/FreeBSD 5.3-STABLE
>Organization:
Department of Geophysics, Johannes Gutenberg-Universitaet Mainz
>Environment:
FreeBSD edda.geo.uni-mainz.de 5.3-RELEASE-p1 FreeBSD 5.3-RELEASE-p1 #74: Fri Nov 19 17:05:11 UTC 2004 root@edda.geo.uni-mainz.de:/usr/obj/usr/src/sys/EDDA  i386

>Description:
While running in SMP mode with two 1 GHz Intel PIII CPUs, FreeBSD crashes after
a while. I have reported this kind of crash several times before and was advised
to supply more information about the error, so here is a full report again.

The crash only occurs when both CPUs are in use on the same hardware. Disabling
SMP in /boot/loader.conf.local via kern.smp.disabled="1" keeps the system stable
for days and weeks (longest uptime: 13 days under load with FreeBSD 5.3-RELEASE).
My first reports of this crash involved two 866 MHz CPUs with different steppings;
switching to two 1 GHz PIIIs with the same stepping results in the same crash
behaviour. The output of mptable -verbose -dmesg is appended below.

This is the crash message I caught:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic = 00
fault virtual address   =   0x1c
fault code              =   supervisor write, page not present
instruction pointer     =   0x8:0xc062ac76
stack pointer           =   0x10:0xe4e2d7ac
frame pointer           =   0x10:0xe4e2d7c4
code segment            =   base 0x0, limit 0xfffff, type 0x1b
                        =   DPL 0, pres 1, def32 1, gran 1
processor eflags        =   interrupt enabled, resume, IOPL = 0
current process         =   44 (swi5: clock sio)
[thread 100042]
Stopped at                  ref +0x16: lock cmpxchgl %edx, 0x1c(%edx)
 
mptable -verbose -dmesg:


===============================================================================

MPTable, version 2.0.15

 looking for EBDA pointer @ 0x040e, found, searching EBDA @ 0x0009f000
 searching CMOS 'top of mem' @ 0x0009ec00 (635K)
 searching default 'top of mem' @ 0x0009fc00 (639K)
 searching BIOS @ 0x000f0000

 MP FPS found in BIOS @ physical addr: 0x000f5270

-------------------------------------------------------------------------------
                        
MP Floating Pointer Structure:

  location:                     BIOS
  physical address:             0x000f5270
  signature:                    '_MP_'
  length:                       16 bytes
  version:                      1.4
  checksum:                     0xe3
  mode:                         Virtual Wire

-------------------------------------------------------------------------------

MP Config Table Header:
 
  physical address:             0x000f4e60
  signature:                    'PCMP'
  base table length:            276
  version:                      1.4
  checksum:                     0x0d
  OEM ID:                       'OEM00000'
  Product ID:                   'PROD00000000'
  OEM table pointer:            0x00000000
  OEM table size:               0
  entry count:                  26  
  local APIC address:           0xfee00000
  extended table length:        124
  extended table checksum:      198

-------------------------------------------------------------------------------

MP Config Base Table Entries:

--
Processors:     APIC ID Version State           Family  Model   Step    Flags
                 3       0x11    BSP, usable     6       8       6       0x387fbff
                 0       0x11    AP, usable      6       8       6       0x387fbff
--
Bus:            Bus ID  Type
                 0       PCI
                 1       PCI
                 2       ISA
--
I/O APICs:      APIC ID Version State           Address
                 2       0x11    usable          0xfec00000
                 3       0x11    usable          0xfec01000
--
I/O Ints:       Type    Polarity    Trigger     Bus ID   IRQ    APIC ID PIN#
                ExtINT   conforms    conforms        2     0          2    0
                INT      conforms    conforms        2     1          2    1
                INT      conforms    conforms        2     0          2    2
                INT      conforms    conforms        2     3          2    3
                INT      conforms    conforms        2     4          2    4
                INT      conforms    conforms        2     6          2    6
                INT      conforms    conforms        2     7          2    7
                INT      conforms    conforms        2     8          2    8
                INT      conforms    conforms        2    12          2   12
                INT      conforms    conforms        2    13          2   13
                INT      conforms    conforms        2    14          2   14
                INT      conforms    conforms        2    15          2   15
                INT     active-lo       level        0  15:A          3   14
                INT     active-lo       level        2     9          2    9
                INT     active-lo       level        1   3:A          3    6
                INT     active-lo       level        1   5:A          3    8
                INT     active-lo       level        1   5:B          3    9
--
Local Ints:     Type    Polarity    Trigger     Bus ID   IRQ    APIC ID PIN#
                ExtINT  active-hi        edge        2     0        255    0
                NMI     active-hi        edge        2     0        255    1

-------------------------------------------------------------------------------

MP Config Extended Table Entries:

--
System Address Space
 bus ID: 0 address type: I/O address
 address base: 0x0
 address range: 0x10000
--
System Address Space
 bus ID: 0 address type: memory address
 address base: 0x40000000
 address range: 0xbebe0000
--
System Address Space
 bus ID: 0 address type: prefetch address
 address base: 0xfebe0000   
 address range: 0xe9420000  
--
System Address Space
 bus ID: 0 address type: memory address
 address base: 0xe8000000
 address range: 0x18000000
--
System Address Space
 bus ID: 0 address type: memory address
 address base: 0xa0000
 address range: 0x20000
--
Bus Heirarchy
 bus ID: 2 bus info: 0x01 parent bus ID: 0
--
Compatibility Bus Address
 bus ID: 0 address modifier: add
 predefined range: 0x00000000
--
Compatibility Bus Address
 bus ID: 0 address modifier: add
 predefined range: 0x00000001

-------------------------------------------------------------------------------

dmesg output:

Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 5.3-RELEASE-p1 #74: Fri Nov 19 17:05:11 UTC 2004
    root@edda.geo.uni-mainz.de:/usr/obj/usr/src/sys/EDDA
ACPI APIC Table: <ASUS   CUR-DLS >
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel Pentium III (1000.04-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x686  Stepping = 6
  Features=0x387fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,PN,MMX,FXSR,SSE>
real memory  = 1073721344 (1023 MB)
avail memory = 1041166336 (992 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  3
 cpu1 (AP): APIC ID:  0
ioapic0 <Version 1.1> irqs 0-15 on motherboard
ioapic1 <Version 1.1> irqs 16-31 on motherboard
netsmb_dev: loaded
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <ASUS CUR-DLS> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
acpi_timer0: <32-bit timer at 3.579545MHz> port 0xe408-0xe40b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pci0: <display, VGA> at device 7.0 (no driver attached)
isab0: <PCI-ISA bridge> port 0xe800-0xe80f at device 15.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <ServerWorks ROSB4 UDMA33 controller> port 0xd400-0xd40f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 15.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
ohci0: <OHCI (generic) USB controller> mem 0xfc000000-0xfc000fff irq 9 at device 15.2 on pci0
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: (0x1166) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
ugen0: OmniVision OV511+ Camera, rev 1.00/1.00, addr 2
pcib1: <ACPI Host-PCI bridge> on acpi0
pci1: <ACPI PCI bus> on pcib1
em0: <Intel(R) PRO/1000 Network Connection, Version - 1.7.35> port 0xd000-0xd03f mem 0xfb800000-0xfb81ffff irq 22 at device 3.0 on pci1
em0: Ethernet address: 00:07:e9:14:8f:7b
em0:  Speed:N/A  Duplex:N/A
sym0: <1010-33> port 0xb800-0xb8ff mem 0xfa800000-0xfa801fff,0xfb000000-0xfb0003ff irq 24 at device 5.0 on pci1
sym0: Symbios NVRAM, ID 7, Fast-80, LVD, parity checking
sym0: open drain IRQ line driver, using on-chip SRAM
sym0: using LOAD/STORE-based firmware.
sym0: handling phase mismatch from SCRIPTS.
sym0: [GIANT-LOCKED]
sym1: <1010-33> port 0xb400-0xb4ff mem 0xf9800000-0xf9801fff,0xfa000000-0xfa0003ff irq 25 at device 5.1 on pci1
sym1: Symbios NVRAM, ID 7, Fast-80, LVD, parity checking
sym1: open drain IRQ line driver, using on-chip SRAM
sym1: using LOAD/STORE-based firmware.
sym1: handling phase mismatch from SCRIPTS.
sym1: [GIANT-LOCKED]
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse, device ID 3
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0: <ECP parallel printer port> port 0x778-0x77a,0x378-0x37f irq 7 drq 3 flags 0x8 on acpi0
ppc0: Generic chipset (ECP-only) in ECP mode
ppc0: FIFO with 16/16/8 bytes threshold
ppbus0: <Parallel port bus> on ppc0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
orm0: <ISA Option ROMs> at iomem 0xd0000-0xd3fff,0xc0000-0xca7ff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <8 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
fb0 at vga0
Timecounters tick every 2.000 msec
acd0: DVDR <NEC DVD RW ND-3500AG/2.16> at ata0-master UDMA33
Waiting 5 seconds for SCSI devices to settle
(noperiph:sym0:0:-1:-1): SCSI BUS reset delivered.
(noperiph:sym1:0:-1:-1): SCSI BUS reset delivered.
da0 at sym0 bus 0 target 0 lun 0
da0: <IBM IC35L018UWD210-0 S5BS> Fixed Direct Access SCSI-3 device
da0: 160.000MB/s transfers (80.000MHz, offset 62, 16bit), Tagged Queueing Enabled
da0: 17501MB (35843670 512 byte sectors: 255H 63S/T 2231C)
da1 at sym0 bus 0 target 1 lun 0
da1: <IBM DDYS-T18350N S96H> Fixed Direct Access SCSI-3 device
da1: 160.000MB/s transfers (80.000MHz, offset 62, 16bit), Tagged Queueing Enabled
da1: 17501MB (35843670 512 byte sectors: 255H 63S/T 2231C)
da2 at sym0 bus 0 target 2 lun 0
da2: <FUJITSU MAJ3182MP 5207> Fixed Direct Access SCSI-3 device
da2: 160.000MB/s transfers (80.000MHz, offset 62, 16bit), Tagged Queueing Enabled
da2: 17429MB (35694904 512 byte sectors: 255H 63S/T 2221C)
cd0 at ata0 bus 0 target 0 lun 0
cd0: <_NEC DVD_RW ND-3500AG 2.16> Removable CD-ROM SCSI-0 device
cd0: 33.000MB/s transfers
cd0: Attempt to query device size failed: NOT READY, Medium not present
SMP: AP CPU #1 Launched!
Mounting root from ufs:/dev/da0s1a
em0: Link is up 100 Mbps Full Duplex
pflog0: promiscuous mode enabled

===============================================================================




>How-To-Repeat:
Use an ASUS CUR-DLS mainboard with FreeBSD 5.3, enable both CPUs, and run Xorg 6.7.0 on the built-in VGA (ATI Rage XL) at 16-bit colour depth.
>Fix:
      
>Release-Note:
>Audit-Trail:

From: Robert Watson <rwatson@FreeBSD.org>
To: freebsd-gnats-submit@FreeBSD.org
Cc:  
Subject: Re: kern/74156: SMP crashes
Date: Sat, 20 Nov 2004 17:58:40 +0000 (GMT)

 On Sat, 20 Nov 2004, O. Hartmann wrote:
 
 > Fatal trap 12: page fault while in kernel mode
 > cpuid = 1; apic = 00
 > fault virtual address   =   0x1c
 > fault code              =   supervisor write, page not present
 > instruction pointer     =   0x8:0xc062ac76
 > stack pointer           =   0x10:0xe4e2d7ac
 > frame pointer           =   0x10:0xe4e2d7c4
 > code segment            =   base 0x0, limit 0xfffff, type 0x1b
 >                         =   DPL 0, pres 1, def32 1, gran 1
 > processor eflags        =   interrupt enabled, resume, IOPL = 0
 > current process         =   44 (swi5: clock sio)
 > [thread 100042]
 > Stopped at                  ref +0x16: lock cmpxchgl %edx, 0x1c(%edx)
 
 Could I get you to run the following commands in DDB and include the
 output here:
 
 - show pcpu
 - "show pcpu X" for each CPU X
 - trace
 - trace on each thread active on a cpu shown using "show pcpu"
 - Use addr2line or gdb on a kernel with debug symbols to convert the
   symbol+offset locations into source file and line numbers
 
 I've seen some similar reports of a page fault out of the clock/sio
 thread, but it may well be a specific callout.
 
 Thanks!
 
 Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
 robert@fledge.watson.org      Principal Research Scientist, McAfee Research
 
 
State-Changed-From-To: open->feedback 
State-Changed-By: linimon 
State-Changed-When: Mon Nov 29 08:01:24 GMT 2004 
State-Changed-Why:  
Set to feedback to note that submitter has been asked for more information. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=74156 

Adding to audit trail from related PR kern/74182 filed Sun Nov 21
02:00:48 GMT 2004:

I found out that PCI VGA graphics cards seem to trigger these problems.

In single-user mode (for update and upgrade purposes) the console freezes
after a lot of output (while showing what is being compiled or installed).
This freeze does not occur with a Matrox Millennium II PCI card! Single-user
mode works well and is stable, but I can no longer boot the machine into
multi-user mode: the console gets stuck with a blinking cursor. That is
really weird and seems to be a software problem. Disabling SMP keeps the
system running for days with the built-in ATI Rage XL PCI graphics, but
single-user mode still breaks! I ran into serious trouble while doing a
buildworld: the system died and got stuck while the libraries were being
installed! With SMP, heavy graphics output brings the machine down very
quickly! Switching to a Matrox Millennium II 4 MB PCI graphics card keeps
single-user mode working, but the box freezes when coming up into multi-user
mode! The machine gets stuck with a blinking green caret but is still
responsive via the network.

State-Changed-From-To: feedback->closed 
State-Changed-By: linimon 
State-Changed-When: Mon Mar 21 07:24:55 GMT 2005 
State-Changed-Why:  
Feedback timeout (> 4 months). 

Also note that submitter had been having hardware problems; see 
http://www.freebsd.org/cgi/query-pr.cgi?pr=72866. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=74156 
>Unformatted:
