From sigxcpu@oppenheimer.ccc-offenbach.org  Mon Sep 27 06:32:07 2004
Return-Path: <sigxcpu@oppenheimer.ccc-offenbach.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6A36316A4CE
	for <FreeBSD-gnats-submit@freebsd.org>; Mon, 27 Sep 2004 06:32:07 +0000 (GMT)
Received: from oppenheimer.ccc-offenbach.org (reverse-213-146-118-58.dialin.kamp-dsl.de [213.146.118.58])
	by mx1.FreeBSD.org (Postfix) with SMTP id 28C5943D41
	for <FreeBSD-gnats-submit@freebsd.org>; Mon, 27 Sep 2004 06:32:05 +0000 (GMT)
	(envelope-from sigxcpu@oppenheimer.ccc-offenbach.org)
Received: (qmail 3835 invoked by uid 1001); 27 Sep 2004 06:32:03 -0000
Message-Id: <20040927063203.2622.qmail@oppenheimer.ccc-offenbach.org>
Date: 27 Sep 2004 06:32:03 -0000
From: Jens Binnewies <sigxcpu@ccc-offenbach.org>
Reply-To: Jens Binnewies <sigxcpu@ccc-offenbach.org>
To: FreeBSD-gnats-submit@freebsd.org
Cc: g-c-n@gmx.de
Subject: "APIC: Previous IPI is stuck" on Siemens Primergy SMP
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         72123
>Category:       kern
>Synopsis:       [smp] "APIC: Previous IPI is stuck" on Siemens Primergy SMP
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    ups
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Sep 27 06:40:10 GMT 2004
>Closed-Date:    Tue Mar 08 22:16:20 GMT 2005
>Last-Modified:  Tue Mar 08 22:16:20 GMT 2005
>Originator:     Jens Binnewies
>Release:        FreeBSD 6.0-CURRENT i386
>Organization:
>Environment:

Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 6.0-CURRENT #0: Sun Sep 26 19:29:34 CEST 2004
    sigxcpu@oppenheimer.cccom.bsd:/data.2.1/src/sys/i386/compile/OPPENHEIMER
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Pentium Pro (199.99-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x619  Stepping = 9
  Features=0xfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV>
real memory  = 1073741824 (1024 MB)
avail memory = 1045397504 (996 MB)
MPTable: <SNI D887 PRIMERGY P6 >
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID:  3
 cpu1 (AP): APIC ID:  0
 cpu2 (AP): APIC ID:  1
 cpu3 (AP): APIC ID:  2
ioapic0: Assuming intbase of 0
ioapic1: Assuming intbase of 16
ioapic0 <Version 1.1> irqs 0-15 on motherboard
ioapic1 <Version 1.1> irqs 16-27 on motherboard
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Intel 82454KX/GX (Orion) host to PCI bridge> pcibus 0 on motherboard
pci0: <PCI bus> on pcib0
eisab0: <PCI-EISA bridge> at device 1.0 on pci0
eisa0: <EISA bus> on eisab0
mainboard0: <SNIfc11 (System Board)> on eisa0 slot 0
isa0: <ISA bus> on eisab0
pci0: <unknown> at device 2.0 (no driver attached)
pci0: <unknown> at device 2.1 (no driver attached)
pci0: <display, VGA> at device 3.0 (no driver attached)
pci0: <memory, RAM> at device 20.0 (no driver attached)
pcib1: <MPTable Host-PCI bridge> pcibus 1 on motherboard
pci1: <PCI bus> on pcib1
pci1: <unknown> at device 1.0 (no driver attached)
mlx0: <Mylex version 3 RAID interface> port 0xf800-0xf87f mem 0xfcd00000-0xfcd0007f irq 16 at device 8.0 on pci1
mlx0: [GIANT-LOCKED]
mlx0: DAC960P/PD, 3 channels, firmware 3.52-0-02, 16MB RAM
mlxd0: <Mylex System Drive> on mlx0
mlxd0: 26040MB (53329920 sectors) RAID 5 (online)
ahc0: <Adaptec 2940 Ultra SCSI adapter> port 0xf400-0xf4ff mem 0xfcc00000-0xfcc00fff irq 18 at device 10.0 on pci1
ahc0: [GIANT-LOCKED]
aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
xl0: <3Com 3c905-TX Fast Etherlink XL> port 0xf000-0xf03f irq 19 at device 11.0 on pci1
miibus0: <MII bus> on xl0
nsphy0: <DP83840 10/100 media interface> on miibus0
nsphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: Ethernet address: 00:10:4b:af:c3:cb
cpu0 on motherboard
cpu1 on motherboard
cpu2 on motherboard
cpu3 on motherboard
orm0: <ISA Option ROMs> at iomem 0xca800-0xd07ff,0xc0000-0xc7fff on isa0
pmtimer0 on isa0
ata0 at port 0x3f6,0x1f0-0x1f7 irq 14 on isa0
ata1 at port 0x376,0x170-0x177 irq 15 on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x64,0x60 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
ppc0: parallel port not found.
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
unknown: <PNP0303> can't assign resources (port)
unknown: <PNP0001> can't assign resources (irq)
unknown: <PNP0501> can't assign resources (port)
ahc1: No resources allocated.
ppc1: parallel port not found.
ahc1: No resources allocated.
ahc1: No resources allocated.
fdc1: cannot allocate I/O port (6 ports)
ahc1: No resources allocated.
Timecounters tick every 10.000 msec
Waiting 15 seconds for SCSI devices to settle
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!
cd0 at ahc0 bus 0 target 5 lun 0
cd0: <NEC CD-ROM DRIVE:463 1.15> Removable CD-ROM SCSI-2 device 
cd0: 10.000MB/s transfers (10.000MHz, offset 15)
cd0: Attempt to query device size failed: NOT READY, Medium not present
da3 at ahc0 bus 0 target 3 lun 0
da3: <COMPAQ ST15150W 6216> Fixed Direct Access SCSI-2 device 
da3: 20.000MB/s transfers (10.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da3: 4094MB (8386000 512 byte sectors: 255H 63S/T 522C)
da2 at ahc0 bus 0 target 2 lun 0
da2: <COMPAQ ST15150W 6216> Fixed Direct Access SCSI-2 device 
da2: 20.000MB/s transfers (10.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da2: 4094MB (8386000 512 byte sectors: 255H 63S/T 522C)
da1 at ahc0 bus 0 target 1 lun 0
da1: <HP C3728S 6039> Fixed Direct Access SCSI-2 device 
da1: 10.000MB/s transfers (10.000MHz, offset 15)
da1: 2047MB (4194058 512 byte sectors: 255H 63S/T 261C)
da0 at ahc0 bus 0 target 0 lun 0
da0: <COMPAQ ST15150W 6216> Fixed Direct Access SCSI-2 device 
da0: 20.000MB/s transfers (10.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da0: 4094MB (8386000 512 byte sectors: 255H 63S/T 522C)
Mounted root from ufs:/dev/mlxd0s1a.

debug.mpsafevm: 1
debug.mpsafenet: 1 

Scheduler: SCHED_4BSD with PREEMPTION enabled (same result with SCHED_ULE, which was built without PREEMPTION)
Userland is in sync with the kernel


>Description:
After an unspecified amount of time (at most 5 hours) with SMP enabled, my Siemens Primergy crashes.
I tried this several times, always with the same result: a panic with "APIC: Previous IPI is stuck".

Good dump found on device /dev/da0s1b
  Architecture: i386
  Architecture version: 1
  Dump length: 1073741824B (1024 MB)
  Blocksize: 512
  Dumptime: Mon Sep 27 02:34:11 2004
  Hostname: oppenheimer.cccom.bsd
  Versionstring: FreeBSD 6.0-CURRENT #0: Sun Sep 26 19:29:34 CEST 2004
    sigxcpu@oppenheimer.cccom.bsd:/data.2.1/src/sys/i386/compile/OPPENHEIMER
  Panicstring: APIC: Previous IPI is stuck
  Bounds: 3

[root@oppenheimer: crash] (0) # kgdb kernel.debug vmcore.3
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
doadump () at pcpu.h:159
(kgdb) bt
#0  doadump () at pcpu.h:159
#1  0xc0504871 in boot (howto=260) at ../../../kern/kern_shutdown.c:385
#2  0xc0504c93 in panic (fmt=0xc06a7a15 "APIC: Previous IPI is stuck") at ../../../kern/kern_shutdown.c:541
#3  0xc065d5d6 in lapic_ipi_vectored (vector=243, dest=0) at ../../../i386/i386/local_apic.c:730
#4  0xc0662775 in ipi_selected (cpus=2, ipi=243) at ../../../i386/i386/mp_machdep.c:1163
#5  0xc0519798 in forward_wakeup (cpunum=255) at ../../../kern/sched_4bsd.c:939
#6  0xc05198b1 in sched_add (td=0xc2565900, flags=0) at ../../../kern/sched_4bsd.c:1011
#7  0xc0519f89 in setrunqueue (td=0xc2565900, flags=0) at kern_switch.c:350
#8  0xc052cf0b in turnstile_unpend (ts=0x0) at ../../../kern/subr_turnstile.c:739
#9  0xc04f9925 in _mtx_unlock_sleep (m=0xc06e1ee0, opts=0, file=0x0, line=0) at ../../../kern/kern_mutex.c:673
#10 0xc057c290 in vn_statfile (fp=0x0, sb=0x0, active_cred=0x0, td=0xc2957180) at ../../../kern/vfs_vnops.c:637
#11 0xc04da581 in fstat (td=0xc2957180, uap=0xe716ad14) at file.h:280
#12 0xc066c450 in syscall (frame=
      {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = -1077942988, tf_esi = 671544384,
 tf_ebp = -1077943016, tf_isp = -417944204, tf_ebx = 671538328, tf_edx = -1, tf_ecx = 671544384,
 tf_eax = 189, tf_trapno = 12, tf_err = 2, tf_eip = 671442255, tf_cs = 31, tf_eflags = 514, 
 tf_esp = -1077943156, tf_ss = 47}) at ../../../i386/i386/trap.c:1001
#13 0xc065679f in Xint0x80_syscall () at ../../../i386/i386/exception.s:201
#14 0x0000002f in ?? ()
#15 0x0000002f in ?? ()
#16 0x0000002f in ?? ()
#17 0xbfbfe534 in ?? ()
#18 0x2806f440 in ?? ()
#19 0xbfbfe518 in ?? ()
#20 0xe716ad74 in ?? ()
#21 0x2806dc98 in ?? ()
#22 0xffffffff in ?? ()
#23 0x2806f440 in ?? ()
#24 0x000000bd in ?? ()
#25 0x0000000c in ?? ()
#26 0x00000002 in ?? ()
#27 0x2805654f in ?? ()
#28 0x0000001f in ?? ()
#29 0x00000202 in ?? ()
#30 0xbfbfe48c in ?? ()
#31 0x0000002f in ?? ()
#32 0x00000000 in ?? ()
#33 0x00000000 in ?? ()
#34 0x00000000 in ?? ()
#35 0x00000000 in ?? ()
#36 0x2d658000 in ?? ()
#37 0xc2b4f380 in ?? ()
#38 0xc2957180 in ?? ()
#39 0xe716ab80 in ?? ()
#40 0xe716ab64 in ?? ()
#41 0xc1e79780 in ?? ()
#42 0xc05195b0 in sched_switch (td=0x2806f440, newtd=0x2806dc98, flags=Cannot access memory at address 0xbfbfe528
) at ../../../kern/sched_4bsd.c:841
Previous frame inner to this frame (corrupt stack?)
(kgdb)

I hope this helps a little in tracking down the problem.
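
When reading the ipi_selected() frames in the backtraces, it can help to expand the cpus argument into individual CPU numbers. The following is only a minimal sketch: it assumes that cpus is a bitmask in which bit n stands for logical CPU n (an assumption about the kernel's convention, not something this report states), and the helper name cpus_in_mask is hypothetical.

```python
def cpus_in_mask(mask):
    """Return the CPU ids whose bits are set in an IPI target mask.

    Assumption: bit n of the mask corresponds to logical CPU n.
    """
    cpus = []
    cpu = 0
    while mask:
        if mask & 1:
            cpus.append(cpu)
        mask >>= 1
        cpu += 1
    return cpus


# Mask values copied from the ipi_selected() frames in this report.
for label, mask in (("cpus=2", 2), ("cpus=12", 12)):
    print(label, "->", cpus_in_mask(mask))
```

Under that assumption, cpus=2 in the trace above targets CPU 1 only, and cpus=12 in the follow-up trace targets CPUs 2 and 3.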

>How-To-Repeat:
Boot the machine with SMP enabled and wait.

>Fix:
>Release-Note:
>Audit-Trail:

From: Jens Binnewies <sigxcpu@ccc-offenbach.org>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/72123: "APIC: Previous IPI is stuck" on Siemens Primergy SMP
Date: Fri, 12 Nov 2004 14:49:20 +0100

 The following crash occurs under this environment:
 
 debug.mpsafenet: 1
 debug.mpsafevm: 1
 -
 SCHED_4BSD with and without PREEMPTION enabled 
 SCHED_ULE without PREEMPTION
 MPSAFE turned on and off on both kernels
 No 'special' kernel options
 
 
 -START
 
  Architecture: i386
  Architecture version: 1
  Dump length: 1073741824B (1024 MB)
  Blocksize: 512
  Dumptime: Fri Nov 12 13:40:00 2004
  Hostname: oppenheimer.cccom.bsd
  Versionstring: FreeBSD 6.0-CURRENT #0: Fri Nov 12 01:53:44 CET 2004
    sigxcpu@oppenheimer.cccom.bsd:/data.2.1/src/sys/i386/compile/OPPENHEIMER
  Panicstring: APIC: Previous IPI is stuck
  Bounds: 4
 
  #1  0xc0504d0c in boot (howto=260) at ../../../kern/kern_shutdown.c:401
  #2  0xc0505133 in panic (fmt=0xc06a3b6b "APIC: Previous IPI is stuck") at ../../../kern/kern_shutdown.c:557
  #3  0xc065de66 in lapic_ipi_vectored (vector=251, dest=0) at ../../../i386/i386/local_apic.c:730
  #4  0xc0663185 in ipi_selected (cpus=12, ipi=251) at ../../../i386/i386/mp_machdep.c:1178
  #5  0xc0663149 in forward_hardclock () at ../../../i386/i386/mp_machdep.c:1162
  #6  0xc067001a in clkintr (frame=0xe2f46c98) at ../../../i386/isa/clock.c:194
  #7  0xc065aecb in intr_execute_handlers (isrc=0xc06d3cc0, iframe=0xe2f46c98) at ../../../i386/i386/intr_machdep.c:201
  #8  0xc066fddf in atpic_handle_intr (iframe=
        {if_vec = 0, if_fs = 24, if_es = 16, if_ds = 267780112, if_edi = 1, if_esi = 4,
   if_ebp = -487297828, if_ebx = -1041651072, if_edx = -1066549056, if_ecx = -1041651072,
   if_eax = 0, if_eip = -1067059467, if_cs = 8, if_eflags = 582, if_esp = -487297820,
   if_ss = -1067059416}) at ../../../i386/isa/atpic.c:562
  #9  0xc0657090 in Xatpic_intr0 () at atpic_vector.s:70
  #10 0x00000000 in ?? ()
  #11 0x00000018 in ?? ()
  #12 0x00000010 in ?? ()
  #13 0x0ff60010 in ?? ()
  #14 0x00000001 in ?? ()
  #15 0x00000004 in ?? ()
  #16 0xe2f46cdc in ?? ()
  #17 0xe2f46cc8 in ?? ()
  #18 0xc1e9aa80 in ?? ()
  #19 0xc06dc0c0 in runq ()
  #20 0xc1e9aa80 in ?? ()
  #21 0x00000000 in ?? ()
  #22 0x00000000 in ?? ()
  #23 0x00000000 in ?? ()
  #24 0xc065f6f5 in cpu_idle_default () at ../../../i386/i386/machdep.c:1062
  #25 0xc065f728 in cpu_idle () at ../../../i386/i386/machdep.c:1085
  #26 0xc04e9cd5 in idle_proc (dummy=0x0) at ../../../kern/kern_idle.c:118
  #27 0xc04e98c0 in fork_exit (callout=0xc04e9c20 <idle_proc>, arg=0x0, frame=0x0) at ../../../kern/kern_fork.c:801
  #28 0xc065705c in fork_trampoline () at ../../../i386/i386/exception.s:209
 
 -END
 
 This occurs in the same situations as above.

From: Stephan Uphoff <ups@tree.com>
To: Jens Binnewies <sigxcpu@ccc-offenbach.org>
Cc: FreeBSD-gnats-submit@FreeBSD.org, g-c-n@gmx.de
Subject: Re: kern/72123: "APIC: Previous IPI is stuck" on Siemens Primergy
	SMP
Date: Mon, 15 Nov 2004 17:29:12 -0500

 Can you try the patch from:
 
 http://lists.freebsd.org/pipermail/freebsd-current/2004-November/043156.html
 
 Thanks
 	Stephan
 
State-Changed-From-To: open->patched 
State-Changed-By: ups 
State-Changed-When: Tue Dec 7 20:50:28 GMT 2004 
State-Changed-Why:  
Should be fixed in current. 
Needs to be MFCed. 


Responsible-Changed-From-To: freebsd-bugs->ups 
Responsible-Changed-By: ups 
Responsible-Changed-When: Tue Dec 7 20:50:28 GMT 2004 
Responsible-Changed-Why:  
MFC reminder. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=72123 
State-Changed-From-To: patched->closed 
State-Changed-By: ups 
State-Changed-When: Tue Mar 8 22:14:56 GMT 2005 
State-Changed-Why:  
The fix was MFCed to RELENG_5 in January by Xin LI. (Thanks again) 

http://www.freebsd.org/cgi/query-pr.cgi?pr=72123 
>Unformatted:
