From nobody@FreeBSD.org  Mon Apr 26 10:24:47 2004
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 55EEB16A4CF
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 26 Apr 2004 10:24:47 -0700 (PDT)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 4B84343D64
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 26 Apr 2004 10:24:47 -0700 (PDT)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.12.11/8.12.11) with ESMTP id i3QHOlGb075670
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 26 Apr 2004 10:24:47 -0700 (PDT)
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.12.11/8.12.11/Submit) id i3QHOl4x075669;
	Mon, 26 Apr 2004 10:24:47 -0700 (PDT)
	(envelope-from nobody)
Message-Id: <200404261724.i3QHOl4x075669@www.freebsd.org>
Date: Mon, 26 Apr 2004 10:24:47 -0700 (PDT)
From: Patrick Mackinlay <patrick@spacesurfer.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: ATA driver does not recover from READ_DMA TIMEOUT
X-Send-Pr-Version: www-2.3

>Number:         66001
>Category:       kern
>Synopsis:       ATA driver does not recover from READ_DMA TIMEOUT
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    sos
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Apr 26 10:30:21 PDT 2004
>Closed-Date:    Mon Aug 16 11:27:10 GMT 2004
>Last-Modified:  Mon Aug 16 11:27:10 GMT 2004
>Originator:     Patrick Mackinlay
>Release:        5.2.1-RELEASE-p5
>Organization:
>Environment:
>Description:
When transferring data from one drive to another, I get the following errors:
ad7: TIMEOUT - READ_DMA retrying (2 retries left) LBA=19030399
ad7: WARNING - READ_DMA interrupt was seen but timeout fired LBA=19030399
ad7: WARNING - READ_DMA interrupt was seen but taskqueue stalled LBA=19030399
The process that triggered the error (cp or mv in my case) can be interrupted or killed, but otherwise blocks. From this point on it is no longer possible to access ad7: every process that tries to read from, write to or unmount the drive blocks and cannot be killed, interrupted or stopped. Since ad7 cannot be unmounted, it becomes useless. Eventually the entire machine simply hangs (presumably once enough processes have tried to access ad7).
The following lines from dmesg are also relevant:

atapci1: <HighPoint HPT370 UDMA100 controller> port 0xc000-0xc0ff,0xbc00-0xbc03,0xb800-0xb807,0xb400-0xb403,0xb000-0xb007 irq 11 at device 19.0 on pci0
atapci1: [MPSAFE]

ata3: at 0xb800 on atapci1
ata3: [MPSAFE]

GEOM: create disk ad7 dp=0xc637da60
ad7: 14649MB <IBM-DTLA-307015> [29765/16/63] at ata3-slave UDMA100

ad7: TIMEOUT - READ_DMA retrying (2 retries left) LBA=19030399
ad7: WARNING - READ_DMA interrupt was seen but timeout fired LBA=19030399
ad7: WARNING - READ_DMA interrupt was seen but taskqueue stalled LBA=19030399
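
To check whether the failing sector can be read at all, outside of any file system, one can read the reported LBA directly from the raw device. The following is a minimal Python sketch, assuming 512-byte sectors and assuming that the LBA in the messages is an absolute sector number on ad7; the helper name read_lba is hypothetical. Under those assumptions LBA 19030399 corresponds to byte offset 19030399 * 512 = 9743564288. On a driver with the hang described above, the read may block just as cp did, so only run it against a disk you can afford to lose access to.

```python
import os

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def read_lba(device_path, lba, count=1, sector_size=SECTOR_SIZE):
    """Read `count` sectors starting at absolute sector number `lba`.

    Works on a raw disk device (e.g. /dev/ad7) or on a disk image file.
    A sector the drive cannot read should surface as an OSError (EIO);
    with the driver hang described in this report, the call may block instead.
    """
    fd = os.open(device_path, os.O_RDONLY)
    try:
        # Offset and length are whole sectors, as raw disk devices require.
        return os.pread(fd, count * sector_size, lba * sector_size)
    finally:
        os.close(fd)
```

For example, read_lba("/dev/ad7", 19030399) would attempt to read the single sector named in the warnings.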

Please let me know if you require further details.
>How-To-Repeat:
      
>Fix:
      
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->sos 
Responsible-Changed-By: simon 
Responsible-Changed-When: Mon Apr 26 13:38:54 PDT 2004 
Responsible-Changed-Why:  
Sounds like an ata(4) issue, so over to the ata maintainer. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=66001 

From: David Kelly <dkelly@HiWAAY.net>
To: patrick@spacesurfer.com, freebsd-gnats-submit@FreeBSD.org
Cc:  
Subject: Re: kern/66001: ATA driver does not recover from READ_DMA TIMEOUT
Date: Fri, 30 Jul 2004 12:23:06 -0500

 I believe I am having the same problem, and that kern/62897 is
 probably the same thing too.
 
 I bought a brand new Dell 400SC, then a pair of Hitachi HDS722516VLSA80
 160G SATA drives. The base Seagate ST340014A 40G drive is on parallel
 ATA, partitioned with sysinstall's "auto" defaults. The system withstood
 a couple of days of abuse, including "make world", before I installed
 the SATA drives; the PATA 40G drive remains the boot disk, running
 FreeBSD 5.2.1-p9.
 
 I partitioned the 160G drives with 1G of swap at the start and the
 remainder as native FreeBSD. I have not used the swap partitions.
 
 I striped the two large partitions with vinum and started filling them
 via ftp. The machine locked up instantly and needed a power cycle to
 recover.
 
 I have since removed vinum, newfs'ed the bare partitions ad[46]s1d and
 tried using them directly. cp from PATA to the fs on ad6s1d works just
 great. cp of files on the fs at ad6s1d to the fs on ad4s1d gets a
 READ_DMA timeout 1349058560 bytes into the first file. The cp process
 is stuck: it is not making progress and does not respond to kill.
 Apparently all I/O to ad4 is blocked until this clears.
 
 Shutdown gave up on syncing 22 buffers. Fsck reports "bad inode number 
 1083392 to nextinode" now on ad4s1d. Time for newfs.
 
 The CPU is a P4-2.8G 512k with Hyperthreading enabled. Disabling HT
 appears to have ended the problem, and the machine is now reliable.
 

From: Patrick Mackinlay <patrick@spacesurfer.com>
To: David Kelly <dkelly@HiWAAY.net>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/66001: ATA driver does not recover from READ_DMA TIMEOUT
Date: Fri, 30 Jul 2004 17:40:40 +0000

 -----BEGIN PGP SIGNED MESSAGE-----
 Hash: SHA1
 
 Hello,
 
 I can reproduce this every time. I finally identified the file that
 occupies the disk sectors causing the fault and renamed it to
 "/file_system_mount_point/broken". This workaround works; however, what
 really needs to be fixed is the ata driver, which quite clearly does not
 handle hard disk failures properly.
 
 Patrick
 
 - --
 Patrick Mackinlay                              patrick@spacesurfer.com
 http://patrick.spacesurfer.com/                    tel: +44.7050699851
 Yahoo messenger: patrick00_uk                      fax: +44.7050699852
 SpaceSurfer Limited                           http://www.spacereg.com/
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.2.4 (GNU/Linux)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
 
 iD8DBQFBCogYD97IpyzY3RIRAi6kAKCKUho4Tx/vJfnxks+lXsu2m5RDCgCcDIh7
 CeCO1LrgwWYUGPUFQ2lnBdw=
 =T48C
 -----END PGP SIGNATURE-----

From: David Kelly <dkelly@HiWAAY.net>
To: Patrick Mackinlay <patrick@spacesurfer.com>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/66001: ATA driver does not recover from READ_DMA TIMEOUT
Date: Fri, 30 Jul 2004 12:57:35 -0500

 On Jul 30, 2004, at 12:40 PM, Patrick Mackinlay wrote:
 
 > I can reproduce this every time. I finally identified the file that
 > occupies the disk sectors causing the fault and renamed it to
 > "/file_system_mount_point/broken". This workaround works; however, what
 > really needs to be fixed is the ata driver, which quite clearly does not
 > handle hard disk failures properly.
 
 That cause sounds different from my problem, although it's likely we
 are both hanging on the same error-handling problem. Perhaps Patrick
 has a bad block on the media? See badsect(8) for something that might
 help create a band-aid.
 
 Since posting earlier, I have disabled the CPU's hyperthreading in the
 BIOS and have written over 50G to each of the "problem" SATA drives,
 reading from one or the other.
 
 I am now confident that hyperthreading (SMP) was the root of my problem,
 and I am ready to put the machine to the tasks it was purchased for,
 with HT disabled.
 
State-Changed-From-To: open->closed 
State-Changed-By: sos 
State-Changed-When: Mon Aug 16 11:23:00 GMT 2004 
State-Changed-Why:  
You should try -current (or the soon-to-be-released 5.3), as I've fixed a couple of races 
that could provoke this. 
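
The warning "READ_DMA interrupt was seen but timeout fired" suggests two completion paths, the interrupt handler and the timeout handler, competing for the same request. The sketch below is a generic illustration of one common way to close such a race; it is not the ata(4) code and not the actual fix committed to -current. Whichever path claims the request first, under a lock, finishes it, and the other backs off. Request, try_complete, interrupt_path and timeout_path are hypothetical names, and Python threads merely stand in for kernel interrupt and callout contexts.

```python
import threading


class Request:
    """Hypothetical I/O request that the interrupt path and the timeout path
    may both try to finish; only one of them is allowed to."""

    def __init__(self):
        self._lock = threading.Lock()
        self._done = False
        self.completed_by = None

    def try_complete(self, source):
        """Atomically claim the request; returns True only for the first caller."""
        with self._lock:
            if self._done:
                return False  # the other path already claimed it: back off
            self._done = True
            self.completed_by = source
            return True


def interrupt_path(req):
    # In a real driver: collect status and hand the buffer back upstream.
    return req.try_complete("interrupt")


def timeout_path(req):
    # In a real driver: reset the channel and requeue the request for a retry.
    return req.try_complete("timeout")
```

The design choice is that completion becomes a single atomic state transition, so a late interrupt arriving after the timeout has already retried the request (or the reverse) cannot finish it a second time.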

http://www.freebsd.org/cgi/query-pr.cgi?pr=66001 
>Unformatted:
