From nobody@FreeBSD.org  Thu Nov 18 12:43:16 2004
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 8236B16A4CE
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 18 Nov 2004 12:43:16 +0000 (GMT)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 6D55043D31
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 18 Nov 2004 12:43:16 +0000 (GMT)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.12.11/8.12.11) with ESMTP id iAIChFFn053206
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 18 Nov 2004 12:43:15 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.12.11/8.12.11/Submit) id iAIChFFn053205;
	Thu, 18 Nov 2004 12:43:15 GMT
	(envelope-from nobody)
Message-Id: <200411181243.iAIChFFn053205@www.freebsd.org>
Date: Thu, 18 Nov 2004 12:43:15 GMT
From: Tuure Laurinolli <tuure@laurinolli.net>
To: freebsd-gnats-submit@FreeBSD.org
Subject: DMA problems with large disks and HPT370
X-Send-Pr-Version: www-2.3

>Number:         74070
>Category:       kern
>Synopsis:       DMA problems with large disks and HPT370
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    sos
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Nov 18 12:50:34 GMT 2004
>Closed-Date:    Mon Apr 11 11:14:53 GMT 2005
>Last-Modified:  Mon Apr 11 11:14:53 GMT 2005
>Originator:     Tuure Laurinolli
>Release:        FreeBSD 5.3-RC1 i386
>Organization:
>Environment:
FreeBSD vortex.home.lan 5.3-RC1 FreeBSD 5.3-RC1 #6: Thu Oct 21 19:26:48 EEST 2004 root@vortex.home.lan:/usr/obj/usr/src/sys/VORTEX i386

Abit VP6 motherboard with the latest BIOS, i.e. HPT370 RAID controller
Two Seagate HDDs, model <ST3200822A/3.01>, i.e. 200 GB models
>Description:
I get DMA errors when trying to access sector 268435455 (2^28 - 1), i.e. the 2^28th sector from the beginning of the disk.
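
The sector number is significant because 28-bit LBA addressing ends exactly there. A minimal sketch of the arithmetic, assuming 512-byte sectors (the constant and function names are illustrative only, not taken from any driver):

```python
SECTOR_SIZE = 512   # bytes per sector (assumption: standard ATA sector size)
LBA28_BITS = 28     # width of a 28-bit LBA address


def lba28_limits(sector_size=SECTOR_SIZE):
    """Return (sector count, highest LBA, capacity in bytes) for 28-bit LBA."""
    sectors = 1 << LBA28_BITS
    return sectors, sectors - 1, sectors * sector_size


sectors, max_lba, capacity = lba28_limits()
print(sectors)                  # 268435456
print(max_lba)                  # 268435455
print(capacity)                 # 137438953472
print(200 * 10**9 > capacity)   # True: a 200 GB disk extends past the 28-bit range
```

So the failing sector is the last one that a 28-bit command can address, and the 200 GB disks in this report are large enough to have sectors on both sides of that limit.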

I guess this is a controller problem; however, I don't have any real proof, because this is the only controller available to me that supports disks this large. I will try to find another controller to test with. I think it would be very unlikely for two new disks to both have the same problem on the same sector.

With a single disk, dd if=/dev/ad6 of=/tmp/test6 produces these errors:

ad6: TIMEOUT - READ_DMA retrying (2 retries left) LBA=268435455
ad6: TIMEOUT - READ_DMA retrying (1 retries left) LBA=268435455
ad6: FAILURE - READ_DMA timed out

With an HPT-native RAID1 setup the results are worse. I don't have the exact error messages, but there are DMA timeouts on both disks (ad4 and ad6) that result in the array (ar0) being torn apart and in a kernel panic (maybe because the array is also the root disk).
>How-To-Repeat:
[14:30:23][tazle@vortex][/var/run]% sudo dd if=/dev/ad6 of=/tmp/test6 skip=268435450 count=10
dd: /dev/ad6: Input/output error
5+0 records in
5+0 records out
2560 bytes transferred in 15.645115 secs (164 bytes/sec)


The system console shows the errors given in the full description.
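
The failing sector can be worked out from the dd invocation and its record counts. A small sketch of that arithmetic, assuming dd's default 512-byte block size (the function name is hypothetical):

```python
BLOCK_SIZE = 512  # dd's default block size in bytes (assumption: no bs= was given)


def first_failed_block(skip, records_in):
    """Block number dd was reading when it stopped.

    dd skips `skip` input blocks, then reads sequentially, so after
    `records_in` complete records the next block is skip + records_in.
    """
    return skip + records_in


failed = first_failed_block(skip=268435450, records_in=5)
print(failed)                      # 268435455
print(failed == (1 << 28) - 1)     # True
print(5 * BLOCK_SIZE)              # 2560, matching "2560 bytes transferred"
```

Blocks 268435450 through 268435454 were read successfully, and the read of block 268435455 is the one that failed.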
>Fix:
      
>Release-Note:
>Audit-Trail:

From: Ilya Pizik <polzun@scar.jinr.ru>
To: freebsd-gnats-submit@FreeBSD.org, tuure@laurinolli.net
Cc:  
Subject: Re: kern/74070: DMA problems with large disks and HPT370
Date: Fri, 19 Nov 2004 11:21:41 +0300

 I have the same problem:
 RELENG_5 from 16.11
 There are 5 HDDs in my PC:
 60 GB (Seagate connected via Intel ICH2 UDMA100 controller)
 120 GB (Seagate connected via Intel ICH2 UDMA100 controller)
 250 GB (WD connected via Intel ICH2 UDMA100 controller)
 250 GB (WD connected via Promise PDC20268 UDMA100 controller)
 120 GB (Seagate connected via Promise PDC20268 UDMA100 controller)
 
 Messages like these appear in the log when the PC is heavily loaded:
 kernel: ad3: TIMEOUT - READ_DMA retrying (2 retries left) LBA=369792831
 kernel: ad3: FAILURE - READ_DMA timed out
 kernel: ad3: TIMEOUT - READ_DMA retrying (2 retries left) LBA=421053375
 kernel: ad3: FAILURE - READ_DMA timed out
 kernel: ad2: TIMEOUT - READ_DMA retrying (2 retries left) LBA=355563327
 kernel: ad2: FAILURE - READ_DMA timed out
 kernel: ad2: TIMEOUT - READ_DMA retrying (2 retries left) LBA=11109887
 kernel: ad2: WARNING - removed from configuration
 kernel: ata1-slave: FAILURE - READ_DMA timed out
 kernel: ad2: TIMEOUT - READ_DMA retrying (2 retries left) LBA=488374591
 kernel: ad2: WARNING - removed from configuration
 kernel: ata1-slave: FAILURE - READ_DMA timed out
 
 ...
 
 I tried detaching and attaching the devices with atacontrol; the result is DMA errors
 only with the 250 GB HDDs.
 
 
 -- 
 With respect, Pizik Ilya.
Responsible-Changed-From-To: freebsd-bugs->sos 
Responsible-Changed-By: glebius 
Responsible-Changed-When: Mon Nov 22 08:24:57 GMT 2004 
Responsible-Changed-Why:  
Over to ATA maintainer. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=74070 

From: Tuure Laurinolli <tuure@laurinolli.net>
To: freebsd-gnats-submit@FreeBSD.org, tuure@laurinolli.net
Cc:  
Subject: Re: kern/74070: DMA problems with large disks and HPT370
Date: Sun, 09 Jan 2005 02:03:00 +0200

 I tested the same machine with Linux and had no problems reading sector 
 268435455. After digging around in the sources for a while, it seemed that 
 Linux uses 48-bit operations whenever they're available, while FreeBSD uses 
 them only for sectors > 268435455. Maybe this is the source of the failure here?
 
 The Linux driver also seems to reset the HPT state machine before each 
 command is run, though it's hard to see how this would cause problems 
 with one specific sector, independent of previous commands.
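
The difference between the two addressing policies described above can be sketched as follows. This is only an illustration of the reasoning, not code from either driver: the function names and the `supports_48bit` flag are hypothetical, and the third function is an assumed variant that switches to 48-bit commands as soon as a transfer touches sector 268435455.

```python
MAX_28BIT_LBA = (1 << 28) - 1  # 268435455, highest sector a 28-bit command can address


def use_48bit_always(lba, count, supports_48bit):
    """Policy attributed to Linux above: 48-bit whenever the drive supports it."""
    return supports_48bit


def use_48bit_above_limit(lba, count, supports_48bit):
    """Policy attributed to FreeBSD above: 48-bit only for sectors > 268435455."""
    return supports_48bit and lba > MAX_28BIT_LBA


def use_48bit_at_limit(lba, count, supports_48bit):
    """Hypothetical variant: 48-bit once the transfer reaches the last 28-bit sector."""
    return supports_48bit and lba + count > MAX_28BIT_LBA


# A one-sector read of the failing sector from this report.
lba, count = 268435455, 1
for policy in (use_48bit_always, use_48bit_above_limit, use_48bit_at_limit):
    print(policy.__name__, policy(lba, count, supports_48bit=True))
```

Under these assumptions only the second policy still issues a 28-bit command for sector 268435455, which is the one sector that fails in this report.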

From: Tuure Laurinolli <tuure@laurinolli.net>
To: freebsd-gnats-submit@FreeBSD.org, tuure@laurinolli.net
Cc:  
Subject: Re: kern/74070: DMA problems with large disks and HPT370
Date: Fri, 14 Jan 2005 11:09:33 +0200

 The problem was indeed solved by the LBA tripover changes in -CURRENT, 
 so this can now be closed as far as I'm concerned.

From: "Przemek Syta" <psyta@koelner.com.pl>
To: <freebsd-gnats-submit@FreeBSD.org>, <tuure@laurinolli.net>
Cc:  
Subject: Re: kern/74070: DMA problems with large disks and HPT370
Date: Tue, 15 Mar 2005 09:47:28 +0100

 Same problems on FreeBSD 5.3-RC2 and -RELEASE.
 HPT370 with no RAID option enabled.
 
 Mar 15 05:00:25 obcy4 kernel: ad7: TIMEOUT - READ_DMA retrying (2 retries left) LBA=378610367
 Mar 15 05:00:26 obcy4 kernel: ad7: WARNING - removed from configuration
 Mar 15 05:00:26 obcy4 kernel: ata3-slave: FAILURE - READ_DMA timed out
 Mar 15 05:06:35 obcy4 kernel: ad7: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=303905343
 Mar 15 05:06:39 obcy4 kernel: ad7: WARNING - removed from configuration
 Mar 15 05:06:39 obcy4 kernel: ata3-slave: FAILURE - WRITE_DMA timed out
 Mar 15 05:06:39 obcy4 kernel: ad4: TIMEOUT - READ_DMA retrying (2 retries left) LBA=15923711
 Mar 15 05:06:39 obcy4 kernel: ad4: FAILURE - READ_DMA timed out
 
 Then panic. :/
 
 
 

From: Tuure Laurinolli <tuure@laurinolli.net>
To: Przemek Syta <psyta@koelner.com.pl>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/74070: DMA problems with large disks and HPT370
Date: Tue, 15 Mar 2005 14:09:41 +0200

 
 Przemek Syta wrote:
 
 > Mar 15 05:00:25 obcy4 kernel: ad7: TIMEOUT - READ_DMA retrying (2 retries left) LBA=378610367
 > Mar 15 05:06:35 obcy4 kernel: ad7: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=303905343
 > Mar 15 05:06:39 obcy4 kernel: ad4: TIMEOUT - READ_DMA retrying (2 retries left) LBA=15923711
 
 
 Notice that your failing sectors are different from my original ones, 
 which were caused by the combination of a drive firmware bug and FreeBSD.
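
One way to check whether a logged timeout falls on the sector from the original report is to extract the LBA from each message and compare it with 268435455. A rough sketch, assuming the log line format shown in this PR (the regular expression and helper names are illustrative only):

```python
import re

MAX_28BIT_LBA = (1 << 28) - 1  # 268435455, the sector from the original report

# Matches e.g. "ad7: TIMEOUT - READ_DMA retrying (2 retries left) LBA=378610367"
TIMEOUT_RE = re.compile(r"(\w+): TIMEOUT - (\w+) retrying .* LBA=(\d+)")


def parse_timeout(line):
    """Return (device, command, lba) for a TIMEOUT line, or None if it does not match."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2), int(match.group(3))


def at_28bit_boundary(lba):
    """True only for the last sector addressable with 28-bit LBA."""
    return lba == MAX_28BIT_LBA


log_lines = [
    "ad6: TIMEOUT - READ_DMA retrying (2 retries left) LBA=268435455",
    "Mar 15 05:00:25 obcy4 kernel: ad7: TIMEOUT - READ_DMA retrying (2 retries left) LBA=378610367",
    "Mar 15 05:06:35 obcy4 kernel: ad7: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=303905343",
    "Mar 15 05:06:39 obcy4 kernel: ad4: TIMEOUT - READ_DMA retrying (2 retries left) LBA=15923711",
]

for line in log_lines:
    parsed = parse_timeout(line)
    if parsed is not None:
        device, command, lba = parsed
        print(device, command, lba, at_28bit_boundary(lba))
```

Only the ad6 line from the original report lands on the boundary sector; the three later timeouts are at unrelated addresses.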
 
State-Changed-From-To: open->closed 
State-Changed-By: sos 
State-Changed-When: Mon Apr 11 11:14:10 GMT 2005 
State-Changed-Why:  
Fixed in both 5.x-stable and -current 

http://www.freebsd.org/cgi/query-pr.cgi?pr=74070 
>Unformatted:
