From nobody@FreeBSD.org  Sat Jan  6 08:38:20 2007
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 85E1D16A492
	for <freebsd-gnats-submit@FreeBSD.org>; Sat,  6 Jan 2007 08:38:20 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [69.147.83.33])
	by mx1.freebsd.org (Postfix) with ESMTP id 6624313C45D
	for <freebsd-gnats-submit@FreeBSD.org>; Sat,  6 Jan 2007 08:38:20 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.13.1/8.13.1) with ESMTP id l068cK3E064041
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 6 Jan 2007 08:38:20 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.13.1/8.13.1/Submit) id l068cK4h064040;
	Sat, 6 Jan 2007 08:38:20 GMT
	(envelope-from nobody)
Message-Id: <200701060838.l068cK4h064040@www.freebsd.org>
Date: Sat, 6 Jan 2007 08:38:20 GMT
From: Jens Kleister<kleister@1a-infosysteme.de>
To: freebsd-gnats-submit@FreeBSD.org
Subject: Raid Problem beim Zugriff auf Raid
X-Send-Pr-Version: www-3.0

>Number:         107608
>Category:       kern
>Synopsis:       [twa] [hang] RAID problem when accessing the RAID
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Jan 06 08:40:11 GMT 2007
>Closed-Date:    
>Last-Modified:  Mon Apr 14 19:48:34 UTC 2008
>Originator:     Jens Kleister
>Release:        FreeBSD 6.2-PRERELEASE (FILESERVER)
>Organization:
1A Infosysteme GmbH
>Environment:
[root@fileserver1 ~]# uname -a
FreeBSD fileserver1.1a-infosysteme.de 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #0: Thu Dec 28 16:18:40 UTC 2006     root@fileserver1.1a-infosysteme.de:/usr/src/sys/i386/compile/FILESERVER  i386

>Description:
Hello, we have the following problem.

The server runs for about a week without problems, even under heavy load.
Then, at some point, when swap is accessed, the whole system
hangs.

The following error message appears:
swap_pager: indefinite wait buffer: bufobj: 0, blkno:2, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno:18, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno:5, size: 4096
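
When comparing such messages across machines, it can help to extract the block numbers and sizes from a captured console log. The following is a minimal Python sketch and not part of the original report; the function name `parse_swap_pager_lines` and the `SAMPLE` text are illustrative assumptions, and the pattern only covers the exact message format quoted above.

```python
import re

# Matches console lines of the form quoted in this report, e.g.
# "swap_pager: indefinite wait buffer: bufobj: 0, blkno:2, size: 4096"
PATTERN = re.compile(
    r"swap_pager: indefinite wait buffer: "
    r"bufobj:\s*(?P<bufobj>\d+),\s*blkno:\s*(?P<blkno>\d+),\s*size:\s*(?P<size>\d+)"
)


def parse_swap_pager_lines(text):
    """Return a list of (bufobj, blkno, size) tuples found in text."""
    return [
        (int(m.group("bufobj")), int(m.group("blkno")), int(m.group("size")))
        for m in PATTERN.finditer(text)
    ]


# Sample input: the three messages from this report.
SAMPLE = """\
swap_pager: indefinite wait buffer: bufobj: 0, blkno:2, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno:18, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno:5, size: 4096
"""

if __name__ == "__main__":
    for bufobj, blkno, size in parse_swap_pager_lines(SAMPLE):
        print(f"bufobj={bufobj} blkno={blkno} size={size}")
```

Lines that do not match the pattern are simply ignored, so the function can be fed a whole log capture.
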

After that nothing works any more and the system can only be reset.

According to Google, this problem has already occurred on other systems,
but apparently only in combination with RAID cards.

The RAID itself is not damaged when this happens.

tw_cli /c0 show all
/c0 Driver Version = 3.60.02.012
/c0 Model = 9500S-8
/c0 Memory Installed  = 112MB
/c0 Firmware Version = FE9X 2.08.00.006
/c0 Bios Version = BE9X 2.03.01.052
/c0 Monitor Version = BL9X 2.02.00.001
/c0 Serial Number = L19404A5400107
/c0 PCB Version = Rev 019
/c0 PCHIP Version = 1.50
/c0 ACHIP Version = 3.20
/c0 Number of Ports = 8
/c0 Number of Units = 3
/c0 Number of Drives = 8
/c0 Total Optimal Units = 3
/c0 Not Optimal Units = 0
/c0 JBOD Export Policy = off
/c0 Disk Spinup Policy = 1
/c0 Spinup Stagger Time Policy (sec) = 2
/c0 Auto-Carving Policy = off
/c0 Auto-Carving Size = 2048 GB
/c0 Cache on Degrade Policy = Cache Off

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   OK             -       -       64K     130.365   ON     OFF
u1    RAID-5    OK             -       -       64K     465.641   ON     OFF
u2    SPARE     OK             -       -       -       232.877   -      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     69.25 GB    145226112     WD-WMAKE1901350
p1     OK               u0     69.25 GB    145226112     WD-WMAKE1901550
p2     OK               u0     69.25 GB    145226112     WD-WMAKE1902083
p3     OK               u0     69.25 GB    145226112     WD-WMAKE1901530
p4     OK               u1     232.88 GB   488397168     WD-WMANY1038026
p5     OK               u1     232.88 GB   488397168     WD-WCANK2644746
p6     OK               u1     232.88 GB   488397168     WD-WCANK3893478
p7     OK               u2     232.88 GB   488397168     WD-WCAL76951640



Note: the problem did not occur with FreeBSD 5.5.




>How-To-Repeat:
Only with a RAID card installed. I have the same problem on
other systems with RAID.
I rule out a hardware defect.



>Fix:
Disable swap, but that is not a good long-term solution.
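
As a sketch of this workaround: swap can be kept off across reboots by commenting out the swap entries in /etc/fstab, and on a running system `swapoff -a` (see swapon(8)) should turn off the devices listed there. The Python snippet below is only an illustration that works on a string, not on the real file; the function name `disable_swap_entries` and the sample device names are assumptions, not taken from the affected server.

```python
def disable_swap_entries(fstab_text):
    """Comment out every active fstab line whose filesystem type is 'swap'."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        # fstab fields: device, mountpoint, fstype, options, dump, pass
        if not is_comment and len(fields) >= 3 and fields[2] == "swap":
            out.append("#" + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical fstab contents, for illustration only.
SAMPLE_FSTAB = """\
# Device        Mountpoint  FStype  Options  Dump  Pass#
/dev/da0s1b     none        swap    sw       0     0
/dev/da0s1a     /           ufs     rw       1     1
"""

if __name__ == "__main__":
    print(disable_swap_entries(SAMPLE_FSTAB), end="")
```

Running without swap only avoids the code path that hangs; it does not address the underlying problem.
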
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback 
State-Changed-By: remko 
State-Changed-When: Sun Jan 7 11:22:22 UTC 2007 
State-Changed-Why:  
We provide English-only lists. Please translate your potential
problem into the English language.

Thanks! 

http://www.freebsd.org/cgi/query-pr.cgi?pr=107608 

From: Lars Engels <lars.engels@0x20.net>
To: bug-followup@FreeBSD.org, kleister@1a-infosysteme.de
Cc:  
Subject: Re: kern/107608: Raid Problem beim Zugriff auf Raid
Date: Sun, 7 Jan 2007 22:52:14 +0100

 --Pbi2MkfjSbXbI4MN
 Content-Type: text/plain; charset=iso-8859-15
 Content-Disposition: inline
 
 Remko asked me to translate the PR from German to (bad) English.
 
 ------------------------------------------------------------------
 
 Hello, following problem:
 
 The server runs under heavy load without any problem for about a week.
 Then, when the swap is accessed, the whole system freezes with this
 error message:
 
 swap_pager: indefinite wait buffer: bufobj: 0, blkno:2, size: 4096
 swap_pager: indefinite wait buffer: bufobj: 0, blkno:18, size: 4096
 swap_pager: indefinite wait buffer: bufobj: 0, blkno:5, size: 4096
 
 The system hangs and has to be reset.
 
 I already googled the problem and found other people having the same
 problem, but apparently only in combination with RAID cards.
 The RAID itself is not damaged when this happens.
 
 tw_cli /c0 show all
 /c0 Driver Version = 3.60.02.012
 /c0 Model = 9500S-8
 /c0 Memory Installed = 112MB
 /c0 Firmware Version = FE9X 2.08.00.006
 /c0 Bios Version = BE9X 2.03.01.052
 /c0 Monitor Version = BL9X 2.02.00.001
 /c0 Serial Number = L19404A5400107
 /c0 PCB Version = Rev 019
 /c0 PCHIP Version = 1.50
 /c0 ACHIP Version = 3.20
 /c0 Number of Ports = 8
 /c0 Number of Units = 3
 /c0 Number of Drives = 8
 /c0 Total Optimal Units = 3
 /c0 Not Optimal Units = 0
 /c0 JBOD Export Policy = off
 /c0 Disk Spinup Policy = 1
 /c0 Spinup Stagger Time Policy (sec) = 2
 /c0 Auto-Carving Policy = off
 /c0 Auto-Carving Size = 2048 GB
 /c0 Cache on Degrade Policy = Cache Off
 
 Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
 ------------------------------------------------------------------------------
 u0    RAID-10   OK             -       -       64K     130.365   ON     OFF
 u1    RAID-5    OK             -       -       64K     465.641   ON     OFF
 u2    SPARE     OK             -       -       -       232.877   -      OFF
 
 Port   Status           Unit   Size        Blocks        Serial
 ---------------------------------------------------------------
 p0     OK               u0     69.25 GB    145226112     WD-WMAKE1901350
 p1     OK               u0     69.25 GB    145226112     WD-WMAKE1901550
 p2     OK               u0     69.25 GB    145226112     WD-WMAKE1902083
 p3     OK               u0     69.25 GB    145226112     WD-WMAKE1901530
 p4     OK               u1     232.88 GB   488397168     WD-WMANY1038026
 p5     OK               u1     232.88 GB   488397168     WD-WCANK2644746
 p6     OK               u1     232.88 GB   488397168     WD-WCANK3893478
 p7     OK               u2     232.88 GB   488397168     WD-WCAL76951640
 
 
 
 Remark: The problem didn't occur in FreeBSD 5.5.
 
 How-To-Repeat:
 Only with RAID card. I also have the same problem on other RAID
 machines.
 I rule out hardware issues.
 
 Fix:
 Deactivate swap, which isn't that nice...
 
 --Pbi2MkfjSbXbI4MN--
State-Changed-From-To: feedback->open 
State-Changed-By: linimon 
State-Changed-When: Sun May 13 04:37:50 UTC 2007 
State-Changed-Why:  
Feedback was received. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=107608 

From: Remco Bressers <remco@signet.nl>
To: bug-followup@FreeBSD.org,  kleister@1a-infosysteme.de
Cc: kris@obsecurity.org, Piet van der Steen <pvdsteen@signet.nl>
Subject: Re: kern/107608: Raid Problem beim Zugriff auf Raid
Date: Thu, 21 Jun 2007 01:03:44 +0200

 Hi,
 
 This very same thing is also filed in kern/103455.
 I'm in the middle of a discussion with Kris Kennaway about this issue.
 
 I see you're also using a 3ware card, just as I did, but that could be a 
 coincidence. I'll CC Kris on this.
 
 Regards,
 
 Remco Bressers
 Signet B.V.

From: Michael Dosser <mic@strg.at>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/107608: Raid Problem beim Zugriff auf Raid
Date: Mon, 13 Aug 2007 11:45:40 +0200

 Hi,
 
 I had exactly the same messages (lots of them) on the screen of one of 
 our servers last Saturday:
 
 swap_pager: indefinite wait buffer: [couldn't type from the terminal]
 
 The machine still answered pings, but login was impossible. All 
 services were unresponsive. This machine acts as a web, database and 
 mail server. Before the machine became unresponsive, I saw about 300 MB 
 of swap used. Now I see the following:
 
 # swapinfo
 Device          1K-blocks     Used    Avail Capacity
 /dev/da0s1b       4096380     2288  4094092     0%
 
 # uname -a
 FreeBSD xxx.xxxx.xx 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Fri Feb  2 
 15:05:26 CET 2007     root@xxx.xxxx.xx:/usr/obj/usr/src/sys/XXX  i386
 
 The machine has been running in production since February 2007 and 
 this had never happened before. I can provide munin statistics of page 
 in/out since February if needed. Judging from the graphs, there was far 
 more swap activity before that day.
 
 This machine is an "Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz" with four 
 gigabytes of RAM installed.
 
 # tw_cli /c0 show all
 /c0 Driver Version = 3.60.02.012
 /c0 Model = 9550SXU-4LP
 /c0 Memory Installed  = 112MB
 /c0 Firmware Version = FE9X 3.02.00.016
 /c0 Bios Version = BE9X 3.01.00.027
 /c0 Monitor Version = BL9X 3.02.00.001
 /c0 Serial Number = L320909A6200350
 /c0 PCB Version = Rev 032
 /c0 PCHIP Version = 1.60
 /c0 ACHIP Version = 1.90
 /c0 Number of Ports = 4
 /c0 Number of Units = 2
 /c0 Number of Drives = 4
 /c0 Total Optimal Units = 2
 /c0 Not Optimal Units = 0
 /c0 JBOD Export Policy = off
 /c0 Disk Spinup Policy = 1
 /c0 Spinup Stagger Time Policy (sec) = 2
 /c0 Auto-Carving Policy = off
 /c0 Auto-Carving Size = 2048 GB
 /c0 Auto-Rebuild Policy = on
 /c0 Controller Bus Type = PCIX
 /c0 Controller Bus Width = 64 bits
 /c0 Controller Bus Speed = 133 Mhz
 
 Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
 ------------------------------------------------------------------------------
 u0    RAID-1    OK             -       -       -       298.013   ON     OFF
 u1    RAID-1    OK             -       -       -       149.001   ON     OFF
 
 Port   Status           Unit   Size        Blocks        Serial
 ---------------------------------------------------------------
 p0     OK               u0     298.09 GB   625142448     WD-WCAPD3473759
 p1     OK               u0     298.09 GB   625142448     WD-WCAPD3473574
 p2     OK               u1     149.01 GB   312500000     Y4D9GSGE
 p3     OK               u1     149.01 GB   312500000     Y4D9GTCE
 
 
 Unit "u0" is the array where swap is located (along with "/", "/tmp", 
 "/usr", "/usr/home" and "/var"); "u1" is a backup array ("/mnt").
 
 The disks were checked via S.M.A.R.T. and they are OK. The controller 
 also reports the disks as functional. After a hard reset the machine 
 was up again, and no rebuild of the RAID-1 arrays was necessary. Of 
 course background fsck was running.
 
 Is there some other information I can provide?
 
 Thanks for your help!
 
 Michael Dosser
>Unformatted:
