From nobody@FreeBSD.org  Fri Mar 24 22:19:06 2000
Return-Path: <nobody@FreeBSD.org>
Received: from freefall.freebsd.org (freefall.FreeBSD.ORG [204.216.27.21])
	by hub.freebsd.org (Postfix) with ESMTP id 30BCE37B5F8
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 24 Mar 2000 22:19:06 -0800 (PST)
	(envelope-from nobody@FreeBSD.org)
Received: (from nobody@localhost)
	by freefall.freebsd.org (8.9.3/8.9.2) id WAA61574;
	Fri, 24 Mar 2000 22:19:06 -0800 (PST)
	(envelope-from nobody@FreeBSD.org)
Message-Id: <200003250619.WAA61574@freefall.freebsd.org>
Date: Fri, 24 Mar 2000 22:19:06 -0800 (PST)
From: dforste@uswest.net
Sender: nobody@FreeBSD.org
To: freebsd-gnats-submit@FreeBSD.org
Subject: ata READ/WRITE command timeouts
X-Send-Pr-Version: www-1.0

>Number:         17592
>Category:       kern
>Synopsis:       ata READ/WRITE command timeouts
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    sos
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Mar 24 22:20:01 PST 2000
>Closed-Date:    Tue Nov 14 00:40:04 PST 2000
>Last-Modified:  Mon Dec  4 10:10:02 PST 2000
>Originator:     David Forster
>Release:        4.0-RELEASE
>Organization:
>Environment:
FreeBSD maridia 4.0-RELEASE FreeBSD 4.0-RELEASE #0: Fri Mar 24 02:20:42 MST 2000     root@maridia:/usr/src/sys/compile/MARIDIA  i386
>Description:
Periodically the ata driver will report:
ad0: READ command timeout - resetting
ata0: resetting devices .. done

This happens fairly consistently after the hard drive has been sitting
idle for a bit (~20 sec?) and an I/O request then arrives (a similar
message is reported for WRITE requests). There is a short pause (4-5 sec)
and then everything works fine.

I'm running FreeBSD 4.0-RELEASE (installed from original ISO) on my Sony
PCG-748 laptop.  
The kernel reports:
atapci0: <Intel PIIX4 ATA33 controller> port 0xfcd0-0xfcdf at device 7.1 on pci0
...
ad0: 3909MB <FUJITSU MHC2040AT> [7944/16/63] at ata0-master using UDMA33
acd0: CDROM <TOSHIBA CD-ROM XM-1802B> at ata1-master using WDMA2

Basically a GENERIC kernel plus APM and sound, minus extraneous hardware.
>How-To-Repeat:
Wait a bit (~20 seconds) with no disk activity and then perform a
READ/WRITE request on the IDE drive.
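
The idle-then-access step can be scripted. The following is only a rough sketch, not part of the original report: the function name, the default 20-second idle period, and the path argument are assumptions for illustration. A file that is already in the buffer cache will not touch the disk, so the path should point at data that has not been read recently.

```python
import os
import time


def timed_read_after_idle(path, idle_seconds=20.0, nbytes=4096):
    """Sleep for idle_seconds, then time one small read from path.

    Returns (elapsed_seconds, bytes_read). The name and defaults are
    hypothetical; they are not taken from the ata driver or the report.
    """
    time.sleep(idle_seconds)          # let the drive sit idle
    start = time.monotonic()
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.read(fd, nbytes)    # the I/O request that follows the idle period
    finally:
        os.close(fd)
    return time.monotonic() - start, len(data)
```

A read that takes several seconds, together with a "command timeout - resetting" line in the kernel log, would match the behavior described above.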
>Fix:


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->sos 
Responsible-Changed-By: dan 
Responsible-Changed-When: Sat Mar 25 08:02:48 PST 2000 
Responsible-Changed-Why:  
This is Soren's area.

From: Cy Schubert - ITSD Open Systems Group <Cy.Schubert@uumail.gov.bc.ca>
To: freebsd-gnats-submit@FreeBSD.org, sos@FreeBSD.org
Cc: dforste@uswest.net
Subject: Re: kern/17592: ata READ/WRITE command timeouts
Date: Sun, 13 Aug 2000 09:05:36 -0700

 I seem to have a similar problem, e.g. same message, with a Western 
 Digital 2.5 GB disk.
 
 Aug 13 08:41:16 cwsys /kernel: ata1-master: timeout waiting to give 
 command=c8 s=d0 e=00
 Aug 13 08:41:16 cwsys /kernel: ad2: error executing command - resetting
 Aug 13 08:41:16 cwsys /kernel: ata1: resetting devices .. done
 
 I'm not exactly sure whether the original cause of the problem, the 
 timeout itself, is a FreeBSD bug (PR 17592) or a drive problem.  This 
 drive has suffered timeouts under FreeBSD using DMA mode ever since it 
 was new about 5 years ago, yet the Western Digital diagnostics see no 
 problem, nor does PIO mode have any problem (or no flags when previous 
 versions of FreeBSD were installed).
 
 Relevant dmesg output:
 
 atapci0: <Intel PIIX3 ATA controller> port 0xf000-0xf00f at device 7.1 
 on pci0
 ata0: at 0x1f0 irq 14 on atapci0
 ata1: at 0x170 irq 15 on atapci0
 ad0: 2014MB <WDC AC22100H> [4092/16/63] at ata0-master using WDMA2
 ad2: 2441MB <WDC AC22500L> [4960/16/63] at ata1-master using WDMA2
 
 uname -a output:
 
 FreeBSD cwsys 4.1-RELEASE FreeBSD 4.1-RELEASE #5: Sun Aug 13 08:36:00 
 PDT 2000     root@cwsys:/usr/opt/cvs-410r/src/sys/compile/CWSYS  i386
 
 ad0 has no problems, yet ad2 has had random timeouts ever since it was 
 new, which can be recreated by doing a stat of all the files in a 
 large directory.
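
A minimal sketch of that "stat every file" workload, assuming nothing beyond the description above; the function name and the choice of lstat() are illustrative, not something stated in this follow-up:

```python
import os


def stat_all(directory):
    """Call lstat() on every entry in directory and return how many were visited."""
    count = 0
    for name in os.listdir(directory):
        # Each lstat() forces an inode lookup, which generates many small
        # disk reads when the directory is large and not yet cached.
        os.lstat(os.path.join(directory, name))
        count += 1
    return count
```

Running it against a large, uncached directory on the affected drive while watching the kernel log would show whether the timeouts coincide with the metadata reads.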
 
 
 Regards,                       Phone:  (250)387-8437
 Cy Schubert                      Fax:  (250)387-5766
 Team Leader, Sun/DEC Team   Internet:  Cy.Schubert@osg.gov.bc.ca
 Open Systems Group, ITSD, ISTA
 Province of BC            
 
 
 
 

From: "Wildph" <wildph@wildph.club24.co.uk>
To: <freebsd-gnats-submit@FreeBSD.org>, <dforste@uswest.net>
Cc:  
Subject: Re: kern/17592: ata READ/WRITE command timeouts
Date: Sat, 30 Sep 2000 22:30:15 +0100

 hi there,
 
 I'm getting this sort of thing too..  Using a WD Caviar HDD Drive.
 relevant messages are below.. ask me if you need more.. The motherboard
 chipset is intel 430hx pentium based.
 
 Sometimes the ata0:resetting devices appears to work.. Most times the
 unix box locks up and needs to be physically rebooted (off/on/reset
 switch).
 
 I've CVSup'd to 4.1.1STABLE last night.  keeping an eye on it.
 
 Cheers
 
 Graeme.
 
 <at boot>
 /kernel: ad0: 3020MB <WDC AC33100H> [6136/16/63] at ata0-master using WDMA2
 
 <during operation /var/log/messages>
 Sep 16 13:39:33 p200 /kernel: FreeBSD 4.1-STABLE #1: Sun Aug 13 21:05:36 BST 2000
 -------
 Sep 16 16:49:38 p200 /kernel: ad0: WRITE command timeout - resetting
 Sep 16 16:49:38 p200 /kernel: ata0: resetting devices .. done
 Sep 16 17:36:31 p200 /kernel: ad0: READ command timeout - resetting
 Sep 16 17:36:31 p200 /kernel: ata0: resetting devices .. done
 <reboot>
 Sep 23 00:00:10 p200 /kernel: ad0: WRITE command timeout - resetting
 Sep 23 00:00:10 p200 /kernel: ata0: resetting devices .. done
 <reboot>
 --------
 
 
State-Changed-From-To: open->closed 
State-Changed-By: sos 
State-Changed-When: Tue Nov 14 00:40:04 PST 2000 
State-Changed-Why:  
Upgrade to 4.2; that should solve your problem.


http://www.freebsd.org/cgi/query-pr.cgi?pr=17592 

From: Chris Hardie <chris@summersault.com>
To: <freebsd-gnats-submit@FreeBSD.org>
Cc: <dforste@uswest.net>, <sos@freebsd.org>
Subject: Re: kern/17592: ata READ/WRITE command timeouts
Date: Wed, 29 Nov 2000 22:40:36 -0500 (EST)

 Greetings.  We were having problems very similar to the ones described in
 kern/17592.  The recommended fix was to upgrade to 4.2, which we did.  We
 continue to have these problems, so I thought I'd submit a follow-up
 report in case this PR needs re-opening.
 
 uname -a:
 
 FreeBSD nollie.summersault.com 4.2-RELEASE FreeBSD 4.2-RELEASE #0: Tue Nov
 21 13:04:31 EST 2000
 root@nollie.summersault.com:/usr/src/sys/compile/NOLLIE.112100 i386
 
 From dmesg:
 
 CPU: Pentium III/Pentium III Xeon/Celeron (651.48-MHz 686-class CPU)
   Origin = "GenuineIntel"  Id = 0x681  Stepping = 1
   Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
 <snip>
 ad0: 19574MB <WDC WD205BA> [39770/16/63] at ata0-master UDMA66
 ata1-master: timeout waiting for command=ef s=00 e=00
 (null): MODE_SENSE_BIG command timeout - resetting
 ata1: resetting devices .. done
 (null): MODE_SENSE_BIG command timeout - resetting
 ata1: resetting devices .. done
 (null): MODE_SENSE_BIG command timeout - resetting
 ata1: resetting devices .. done
 (null): MODE_SENSE_BIG command timeout - resetting
 ata1: resetting devices .. done
 
 We used to get these kinds of messages in /var/log/messages right before a
 crash:
 
 Nov 10 18:53:52 nollie /kernel: acd0: PREVENT_ALLOW command timeout - resetting
 Nov 10 18:53:52 nollie /kernel: ata1: resetting devices .. done
 
 but with 4.2 we no longer do.
 
 The basic behavior is that the system just freezes up and can only be
 recovered with a push of the reset button.
 
 Other random notes:
   -The crashes seem to be happening on a semi-regular basis, about every 4
 days, but not at the same time for each crash.
   -The drive is only a few months old, and has been tested several times.
   -Possibly unrelated: the kernel displays a message "14: not found" right
 after displaying "WARNING: / was not properly dismounted", but this
 doesn't show up in dmesg.  I've seen this on a few other FreeBSD boxes we
 have.
 
 I'm compiling a kernel with debug symbols now, so hopefully I'll have more
 information to offer soon.
 
 Anyone else reporting continued problems since upgrading to 4.2?
 
 Thanks,
 Chris
 
 -- Chris Hardie -----------------------------
 ----- mailto:chris@summersault.com ----------
 -------- http://www.summersault.com/chris/ --
 
 

From: Chris Hardie <chris@summersault.com>
To: <sos@freebsd.org>
Cc: <freebsd-gnats-submit@FreeBSD.org>
Subject: Re: kern/17592: ata READ/WRITE command timeouts
Date: Mon, 4 Dec 2000 13:06:49 -0500 (EST)

 Soren,
 
 We continue to have problems with our system locking up on a regular basis
 as described in my 29 Nov 2000 followup to this PR.  I ran the debug
 kernel and no core dump was produced.  This suggests either a problem with
 the hardware itself, or a problem with FreeBSD's interaction with the
 hardware.  Upgrading to 4.2 didn't seem to help, so I'm wondering what you
 would suggest as our next step in figuring out what's wrong and how to
 proceed with fixing it.  At this point I'm at a loss for figuring out
 what's happening at the moment of the crash.
 
 Hopefully unrelated is the note that this is the same box where we had
 problems described in PR#16740.  I hope to avoid the dismissal of the
 problem as a hardware defect because we've tested and re-tested the
 motherboard several times, but I thought I should mention it.
 
 Thanks,
 Chris
 
 
 -- Chris Hardie -----------------------------
 ----- mailto:chris@summersault.com ----------
 -------- http://www.summersault.com/chris/ --
 
 
 
 
>Unformatted:
