From nobody@FreeBSD.org  Sun Aug 23 23:48:29 2009
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5F2D8106568C
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 23 Aug 2009 23:48:29 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 4CCA58FC14
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 23 Aug 2009 23:48:29 +0000 (UTC)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id n7NNmStQ033700
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 23 Aug 2009 23:48:28 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id n7NNmSc1033699;
	Sun, 23 Aug 2009 23:48:28 GMT
	(envelope-from nobody)
Message-Id: <200908232348.n7NNmSc1033699@www.freebsd.org>
Date: Sun, 23 Aug 2009 23:48:28 GMT
From: Terrence Koeman <root@mediamonks.net>
To: freebsd-gnats-submit@FreeBSD.org
Subject: spin lock held too long
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         138117
>Category:       kern
>Synopsis:       [kernel] [panic] spin lock held too long
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Aug 23 23:50:01 UTC 2009
>Closed-Date:    Wed May 02 15:54:31 UTC 2012
>Last-Modified:  Wed May 02 15:54:31 UTC 2012
>Originator:     Terrence Koeman
>Release:        7.2-STABLE
>Organization:
>Environment:
FreeBSD persephone 7.2-STABLE FreeBSD 7.2-STABLE #15: Mon Aug 24 01:00:54 CEST 2009     terrence@persephone:/usr/obj/usr/src/sys/PERSEPHONE-SMP-PAE  i386
>Description:
When rebooting the kernel panics with 'spin lock held too long' (full
text below).

This happens every reboot, not incidentally as I've seen it reported by
others.

I tried the following:

-Removed PAE (revert to GENERIC), this decreased the panics from every
 reboot to about 50% of reboots.
-Changed SCHED_ULE to SCHED_4BSD, did not change anything.
-Added KDB_UNATTENDED, but the server hangs on the panic.

Also the server hangs on boot, after the 10s countdown it hangs for
about 2-3 minutes before either booting normally or (1 out of 10 boots)
hangs forever (apparently, I've let it sit for hours).

I did notice that when the server is in the hung state on booting, I can
sometimes make it continue to boot by pressing and holding return on the
(usb) keyboard. I've tried other keys, but only the return key sometimes
works. It's not a coincidence, I can let it sit 10m or an hour and when
then holding return it will continue to boot within 3-5 seconds.

Aside from booting the server is rock-solid, I had it running overnight
with a concurrent 'make -j 20 buildkernel && make -j 20 buildworld',
'portupgrade -afr' and '/usr/libexec/locate.updatedb' in a continous
loop while running apachebench with 1500 connections on it from another
machine on the LAN.

Aside from an ACPI warning at boot (see below for dmesg) there are no
errors whatsoever and also no dumps or possibility to get into the debugger.

When ACPI is disabled the server never boots, it hangs at the same point
after the 10s countdown but never hangs and does not respond to return.

I'd be happy to get more info, but with no debugger or dump I don't know
where to start :) Please advise.

Panic:

---
Rebooting...
cpu_reset: Stopping other CPUs
spin lock 0xc05a8c00 (sched lock 0) held by 0xd3087000 (tid 100011) too long
panic: spin lock held too long
cpuid = 0
---
(with about half the panics the 5 lines text above the panic message are
 garbled with what looks like random ASCII printable data)

dmesg:
(Server is an IBM xseries 445 with 4x 3Ghz Xeon and 12Gb memory.

---
Copyright (c) 1992-2009 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.2-STABLE #15: Mon Aug 24 01:00:54 CEST 2009
    terrence@persephone:/usr/obj/usr/src/sys/PERSEPHONE-SMP-PAE
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 3.00GHz (2993.89-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf29  Stepping = 9
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x4400<CNXT-ID,xTPR>
  Logical CPUs per core: 2
real memory  = 13421772800 (12800 MB)
avail memory = 12619841536 (12035 MB)
ACPI APIC Table: <IBM    SERVIGIL>
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP/HT): APIC ID:  1
 cpu2 (AP): APIC ID: 18
 cpu3 (AP/HT): APIC ID: 19
 cpu4 (AP): APIC ID: 32
 cpu5 (AP/HT): APIC ID: 33
 cpu6 (AP): APIC ID: 50
 cpu7 (AP/HT): APIC ID: 51
ACPI Warning (tbfadt-0505): Optional field "Gpe1Block" has zero address or length:        0       0/4 [20070320]
MADT: Forcing active-low polarity and level trigger for SCI
ioapic1 <Version 1.1> irqs 44-87 on motherboard
ioapic0 <Version 1.1> irqs 0-43 on motherboard
kbd1 at kbdmux0
acpi0: <IBM SERVIGIL> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 400, 100 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x508-0x50b on acpi0
pcib0: <ACPI Host-PCI bridge> on acpi0
pci0: <ACPI PCI bus> on pcib0
vgapci0: <VGA-compatible display> port 0x1800-0x18ff mem 0xe0000000-0xe7ffffff,0xf0a20000-0xf0a2ffff irq 16 at device 4.0 on pci0
isab0: <PCI-ISA bridge> at device 5.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <VIA 82C686B UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x700-0x70f at device 5.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
uhci0: <VIA 83C572 USB controller> port 0x1900-0x191f irq 18 at device 5.2 on pci0
uhci0: [GIANT-LOCKED]
uhci0: [ITHREAD]
usb0: <VIA 83C572 USB controller> on uhci0
usb0: USB revision 1.0
uhub0: <VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 2 ports with 2 removable, self powered
uhci1: <VIA 83C572 USB controller> port 0x1920-0x193f irq 18 at device 5.3 on pci0
uhci1: [GIANT-LOCKED]
uhci1: [ITHREAD]
usb1: <VIA 83C572 USB controller> on uhci1
usb1: USB revision 1.0
uhub1: <VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb1
uhub1: 2 ports with 2 removable, self powered
pci0: <serial bus, SMBus> at device 5.4 (no driver attached)
pcib1: <ACPI Host-PCI bridge> on acpi0
pci1: <ACPI PCI bus> on pcib1
bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x2002> mem 0xf0b00000-0xf0b0ffff,0xf0b10000-0xf0b1ffff irq 42 at device 4.0 on pci1
miibus0: <MII bus> on bge0
brgphy0: <BCM5704 10/100/1000baseTX PHY> PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge0: Ethernet address: 00:09:6b:e6:39:7f
bge0: [ITHREAD]
bge1: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x2002> mem 0xf0b20000-0xf0b2ffff,0xf0b30000-0xf0b3ffff irq 11 at device 4.1 on pci1
miibus1: <MII bus> on bge1
brgphy1: <BCM5704 10/100/1000baseTX PHY> PHY 1 on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge1: Ethernet address: 00:09:6b:16:39:7f
bge1: [ITHREAD]
pcib2: <ACPI Host-PCI bridge> on acpi0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI Host-PCI bridge> on acpi0
pci5: <ACPI PCI bus> on pcib3
pcib4: <ACPI Host-PCI bridge> on acpi0
pci7: <ACPI PCI bus> on pcib4
ips0: <IBM ServeRAID Adapter> mem 0xf9c00000-0xf9c01fff irq 60 at device 3.0 on pci7
ips0: [ITHREAD]
pcib5: <ACPI Host-PCI bridge> on acpi0
pci9: <ACPI PCI bus> on pcib5
fdc0: <floppy drive controller> port 0x3f0-0x3f5 irq 6 drq 2 on acpi0
fdc0: [FILTER]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
cpu0: <ACPI CPU> on acpi0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
cpu1: <ACPI CPU> on acpi0
p4tcc1: <CPU Frequency Thermal Control> on cpu1
cpu2: <ACPI CPU> on acpi0
p4tcc2: <CPU Frequency Thermal Control> on cpu2
cpu3: <ACPI CPU> on acpi0
p4tcc3: <CPU Frequency Thermal Control> on cpu3
cpu4: <ACPI CPU> on acpi0
p4tcc4: <CPU Frequency Thermal Control> on cpu4
cpu5: <ACPI CPU> on acpi0
p4tcc5: <CPU Frequency Thermal Control> on cpu5
cpu6: <ACPI CPU> on acpi0
p4tcc6: <CPU Frequency Thermal Control> on cpu6
cpu7: <ACPI CPU> on acpi0
p4tcc7: <CPU Frequency Thermal Control> on cpu7
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xcafff,0xcb000-0xce7ff pnpid ORM0000 on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ukbd0: <Microsoft Natural\M-. Ergonomic Keyboard 4000, class 0/0, rev 2.00/1.73, addr 2> on uhub1
kbd2 at ukbd0
Timecounters tick every 1.000 msec
ipfw2 (+ipv6) initialized, divert loadable, nat loadable, rule-based forwarding disabled, default to accept, logging limited to 2000 packets/entry by default
acd0: DVDROM <MATSHITADVD-ROM SR-8177/NB21> at ata0-master UDMA33
ips0: adapter type: ServeRAID 4Lx (neo lite)
ips0: logical drives: 1
ips0: Logical Drive 0: RAID1 sectors: 286748672, state OK
ipsd0: <Logical Drive> on ips0
ipsd0: Logical Drive  (140014MB)
SMP: AP CPU #7 Launched!
SMP: AP CPU #6 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #4 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #2 Launched!
Trying to mount root from ufs:/dev/ipsd0s1a
bge0: link state changed to UP
---

>How-To-Repeat:
reboot
>Fix:


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-i386->freebsd-bugs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Mon Aug 24 04:11:50 UTC 2009 
Responsible-Changed-Why:  
This may not be i386-specific. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=138117 
State-Changed-From-To: open->feedback 
State-Changed-By: jh 
State-Changed-When: Sat Mar 10 08:55:53 UTC 2012 
State-Changed-Why:  
Can you still reproduce this on recent FreeBSD versions? 

http://www.freebsd.org/cgi/query-pr.cgi?pr=138117 
State-Changed-From-To: feedback->closed 
State-Changed-By: jh 
State-Changed-When: Wed May 2 15:54:30 UTC 2012 
State-Changed-Why:  
Feedback timeout. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=138117 
>Unformatted:
