From nobody@FreeBSD.org  Tue Sep 20 08:26:08 2005
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A4AB816A41F
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 20 Sep 2005 08:26:08 +0000 (GMT)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 3401043D45
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 20 Sep 2005 08:26:08 +0000 (GMT)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.13.1/8.13.1) with ESMTP id j8K8Q7g7014359
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 20 Sep 2005 08:26:07 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.13.1/8.13.1/Submit) id j8K8Q7jO014354;
	Tue, 20 Sep 2005 08:26:07 GMT
	(envelope-from nobody)
Message-Id: <200509200826.j8K8Q7jO014354@www.freebsd.org>
Date: Tue, 20 Sep 2005 08:26:07 GMT
From: Thede Loder <thede@loder.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: ATA woes, SATA controller: failed writes, FS corruption, and system hang under heavy loads
X-Send-Pr-Version: www-2.3

>Number:         86364
>Category:       i386
>Synopsis:       [ata] ATA woes, SATA controller: failed writes, FS corruption, and system hang under heavy loads
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    remko
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Sep 20 08:30:18 GMT 2005
>Closed-Date:    Sat Nov 11 09:13:22 GMT 2006
>Last-Modified:  Sat Nov 11 09:13:22 GMT 2006
>Originator:     Thede Loder
>Release:        FreeBSD-6.0BETA5
>Organization:
Paritive, Inc. 
>Environment:
FreeBSD davros.loder.com 6.0-BETA5 FreeBSD 6.0-BETA5 #21: Sun Sep 18 22:34:20 PDT 2005    root@davros.loder.com:/usr/src/sys/i386/compile/DAVROS  i386

>Description:
Hi all.  A little ATA trouble.  I've been running an NFS client-driven
stress test on an NFS-exported file system.

It seems just fine unless the FS is hosted on a drive attached to a PCI
SATA controller, which is a Promise SATAII150 TX2plus.  After a short
period of time under the stress test (as little as a few seconds, as long
as a minute or two), the exported drive simply hangs, eventually causing
writes to time out on the NFS client.  The device node /dev/ad4
remains visible in /dev, but calls to access the drive do not seem to
return.  Running 'umount' on the filesystem on the hung drive freezes all
ATA devices and hangs the system (I am not overclocked).

A hard reboot is required to bring things back to normal.  I am not sure
whether data is being lost, but fsck always finds FS errors, and a clean
reboot is not possible: the console reports failed buffer writes.

I have repeated the stress test using a filesystem on ATA100 drives hosted
by the mainboard's VIA 8235 without any problems, so it seems to be specific
to the PCI Promise controller and its drives.

The drive itself is a Western Digital (WDC WD2500JD-50GBB0 02.05D02).  
Motherboard is a KT3 Ultra 2 with an AMD 1800+ on it.  

I'm happy to dig into it further and provide more specifics,
but I need some experienced advice on where to instrument.

>How-To-Repeat:
Export, via NFS, a filesystem that is on a drive hosted by the SATA
controller.  Stress the filesystem (I used an import of mp3 files using
iTunes).  After a minute or two (repeatably), the kernel outputs
"ad4: FAILURE - SETFEATURES SET TRANSFER MODE timed out", and the drive
becomes unresponsive, halting the NFS activity.  A subsequent 'umount'
of the file system hangs all ATA devices on the system, preventing login
or logout.  If "reboot" is issued before the 'umount', the reboot
process starts but hangs while flushing buffers.
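
A rough stand-in for the iTunes import, for anyone reproducing this
without it.  This is only a sketch under assumptions, not the load used
in this report: the function name, file names, file count, and sizes are
all hypothetical, and it has not been confirmed to trigger the hang.  Run
on the NFS client, it writes a batch of multi-megabyte files into a
directory on the mounted export and fsyncs each one so the data is
pushed out to the server.

```python
import os


def stress_writes(target_dir, file_count=200, file_size=4 * 1024 * 1024,
                  chunk_size=64 * 1024):
    """Write file_count files of file_size bytes each into target_dir.

    target_dir is assumed to be a directory on the NFS-mounted export.
    Returns the total number of bytes written.
    """
    chunk = os.urandom(chunk_size)  # one random buffer, reused for every write
    total = 0
    for i in range(file_count):
        # Hypothetical file naming; any unique names would do.
        path = os.path.join(target_dir, "stress_%05d.bin" % i)
        with open(path, "wb") as f:
            remaining = file_size
            while remaining > 0:
                n = min(remaining, chunk_size)
                f.write(chunk[:n])
                remaining -= n
                total += n
            # Flush and fsync so each file is sent to the server
            # before the next one is started.
            f.flush()
            os.fsync(f.fileno())
    return total
```

Calling stress_writes("/path/to/nfs/mount") from a Python prompt on the
client, while watching the server console for the ad4 SETFEATURES
message, should approximate the original procedure; the path is a
placeholder for wherever the export is mounted.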
>Fix:
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback 
State-Changed-By: remko 
State-Changed-When: Mon Sep 11 11:38:02 UTC 2006 
State-Changed-Why:  
Hello, 

We have been through a lot of revisions for both 6.0 and 6.1, and
6.1 is now declared stable with 6.2 upcoming. Can you tell me
whether the problem you were facing is still present? If so, we
need to look further into the current code to see what is causing
this.

thanks 


Responsible-Changed-From-To: freebsd-i386->remko 
Responsible-Changed-By: remko 
Responsible-Changed-When: Mon Sep 11 11:38:02 UTC 2006 
Responsible-Changed-Why:  
grab the PR 

http://www.freebsd.org/cgi/query-pr.cgi?pr=86364 
State-Changed-From-To: feedback->closed 
State-Changed-By: remko 
State-Changed-When: Sat Nov 11 09:12:11 UTC 2006 
State-Changed-Why:  
Closing the PR; I did not receive any feedback so far. If you have feedback, please
respond to 103435 to keep the information central.

Closed at:	EuroBSDCon 2006 

http://www.freebsd.org/cgi/query-pr.cgi?pr=86364 
>Unformatted:
