From apeiron@coitusmentis.info  Fri Sep  2 00:20:12 2005
Return-Path: <apeiron@coitusmentis.info>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 957EE16A41F;
	Fri,  2 Sep 2005 00:20:12 +0000 (GMT)
	(envelope-from apeiron@coitusmentis.info)
Received: from coitusmentis.info (pcp01649268pcs.levtwn01.pa.comcast.net [68.32.0.126])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 3212143D45;
	Fri,  2 Sep 2005 00:20:12 +0000 (GMT)
	(envelope-from apeiron@coitusmentis.info)
Received: by prophecy.velum (Postfix, from userid 1000)
	id B7D311CC19; Thu,  1 Sep 2005 20:20:32 -0400 (EDT)
Message-Id: <20050902002032.GA1016@prophecy.dyndns.org>
Date: Thu, 1 Sep 2005 20:20:32 -0400
From: Christopher Nehren <apeiron+usenet@coitusmentis.info>
To: Philip Paeps <philip@freebsd.org>
Cc: FreeBSD-gnats-submit@freebsd.org
In-Reply-To: <200509012243.j81MhQDY035598@fasolt.home.paeps.cx>
Subject: Re: FS corruption and 'uncorrectable' DMA errors on ATA disks after unclean shutdown
References: <200509012243.j81MhQDY035598@fasolt.home.paeps.cx>

>Number:         85613
>Category:       kern
>Synopsis:       Re: FS corruption and 'uncorrectable' DMA errors on ATA disks after unclean shutdown
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-ports-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Sep 02 00:20:21 GMT 2005
>Closed-Date:    Fri Sep 02 00:40:09 GMT 2005
>Last-Modified:  Fri Sep 16 23:20:10 GMT 2005
>Originator:     
>Release:        
>Organization:
>Environment:
>Description:
 --6c2NcOVqGQ03X4Wi
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 Content-Transfer-Encoding: quoted-printable
 
 On Thu, Sep 01, 2005 at 18:43:26 EDT, Philip Paeps scribbled these
 curious markings:
 > >Environment:
 > System: FreeBSD fasolt.home.paeps.cx 7.0-CURRENT FreeBSD 7.0-CURRENT #39: Sun
 > Aug 21 15:52:38 CEST 2005
 > philip@fasolt.home.paeps.cx:/usr/obj/usr/src/sys/FASOLT i386
 
 Here's my environment:
 
 FreeBSD prophecy.dyndns.org 5.4-STABLE FreeBSD 5.4-STABLE #0:
 Sat Jul 16 20:37:57 EDT 2005
 root@prophecy.dyndns.org:/usr/obj/usr/src/sys/PROPHECY  i386
 
 
 > >Description:
 >
 > Recently, after a power failure, I experienced some inexplicable problems
 > with an ATA disk, which could quite possibly be due to hardware.  However,
 > after having experienced the same problems on a second disk, and discovering,
 > in a discussion on comp.unix.bsd.freebsd.misc, that others have seen the same
 > sort of issue, I've begun to suspect a kernel issue.
 
 I don't remember if a power failure was involved in my disk
 issues, but I do remember having some isolated power issues
 around the time that my disk started to misbehave.
 
 > The first time I saw the problem, the machine initially came up fine, and I
 > could dirty-mount the filesystem and let bgfsck take care of things.  Soon
 > after the fsck began, the kernel started spewing out errors along the lines
 > of 'uncorrectable' and 'dma_read'.  Unfortunately, I've not managed to
 > reproduce the problem with a loggable console.
 
 Mine manifested itself differently, with DMA_READ / DMA_WRITE
 errors during normal use.
 
 > After a reboot, the filesystem on the disk refused to mount again.  Manually
 > forcing an fsck produced complaints about unreadable sectors.  Again, the
 > kernel spewed out the 'uncorrectable' and 'dma_read' errors.
 
 I could fsck mine just fine, but it gave the same error
 repeatedly.
 
 > According to SMART, the disk is quite healthy, though some errors were logged
 > in the log:
 
 SMART (both at boot-up and via smartctl) says my drive is fine. A
 just-now-performed `smartctl -a` shows no logged errors.
 
 > The funny thing is, after newfsing the disk, and restoring the data, all seems
 > to be working well and happy on the disk.  The first disk I had this problem
 > with, has now been under medium heavy use again for over a month, the second
 > disk (see below) has been in use again for two weeks.
 
 I too can say that a newfs curiously enough seems to cure the
 problem. I was able to restore approximately 10 GB of backups to
 the disk and wrote a GB or two more to it the day that I did the
 restoration without any difficulties, and today has seen light
 reads and writes.
 
 > I'm happy to help debug this further, if indeed it's a software bug, and not
 > something with flaky hardware.  Cc: Christopher Nehren who reported similar
 > issues on Usenet and suggested a PR be filed.  He might be able to add more
 > useful information.
 
 If anything more is requested, please advise. I (*gasp*) work
 now, though, so I don't have quite so many tuits as I used to
 have.
 
 > For what it's worth, the disks were Maxtor 6Y200P0 and Maxtor 6E040L0 on a
 > VIA 8235 UDMA133 controller and a VIA 8231 UDMA100 controller in my case.
 
 My experience is with a Seagate ST3120026A on a SiS 730 UDMA100
 controller.
 
 > >How-To-Repeat:
 >
 > Lose power or panic the machine with a filesystem on an ATA disk and wait for
 > phase of moon and other elements of faith to be properly aligned.  I have been
 > able to reproduce the problem (and the 'working well after newfs') three times
 > by accident, never yet by force.
 
 I can't provide any more definitive information on how to
 reproduce this phenomenon other than seconding the notion that
 it does seem fairly random. Someone else from Usenet might be
 able to do so; Philip (thanks!) mentioned it in the Usenet
 thread and volunteered to corral some others into contributing.
 
 Best regards,
 Christopher Nehren
 
 -- 
 I abhor a system designed for the "user", if that word is a coded
 pejorative meaning "stupid and unsophisticated". -- Ken Thompson
 If you ask questions of idiots, you get "Joel on Software".
 Unix is user friendly. However, it isn't idiot friendly.
 
 --6c2NcOVqGQ03X4Wi
 Content-Type: application/pgp-signature
 Content-Disposition: inline
 
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.1 (FreeBSD)
 
 iD8DBQFDF5rQk/lo7zvzJioRAq8VAJ9kvjRubIulj49V4BDmigS3LDMkRACfa2NE
 rvKdAA8dFSmp8y2iOcqQAI4=
 =wEqD
 -----END PGP SIGNATURE-----
 
 --6c2NcOVqGQ03X4Wi--
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->closed 
State-Changed-By: linimon 
State-Changed-When: Fri Sep 2 00:38:55 GMT 2005 
State-Changed-Why:  
Yet another case of email mangling.  This is a misfiled followup to 
kern/85603; I have copied over the source. 

I may live long enough to kill all the quoted-printable stuff that 
gets into GNATS.  Or maybe not. 
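
For readers unfamiliar with the mangling mentioned above: the stray `=` at
line ends and the `=09` sequences in the quoted text are quoted-printable
escapes that were stored without being decoded. The following is only a
minimal sketch of decoding such text with Python's standard `quopri` module;
the helper name `decode_qp` and the sample input are hypothetical and are not
part of GNATS or of this PR.

```python
import quopri


def decode_qp(raw: bytes) -> str:
    """Decode quoted-printable bytes into text.

    Hypothetical helper for illustration; assumes the payload is
    us-ascii, as the Content-Type header of the message declares.
    """
    return quopri.decodestring(raw).decode("us-ascii")


# A soft line break ("=" followed by a newline) joins two physical lines,
# and "=09" is an encoded tab character.
sample = b"inexplicable problems =\nwith an ATA disk"
print(decode_qp(sample))
```

Text stored in the PR database carries an extra leading space on every line,
so a real cleanup would have to strip that prefix before decoding; the sketch
does not attempt this.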


Responsible-Changed-From-To: gnats-admin->freebsd-ports-bugs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Fri Sep 2 00:38:55 GMT 2005 
Responsible-Changed-Why:  

http://www.freebsd.org/cgi/query-pr.cgi?pr=85613 

From: Christopher Nehren <apeiron@coitusmentis.info>
To: bug-followup@freebsd.org,
 apeiron+usenet@coitusmentis.info
Cc:  
Subject: Re: kern/85613: Re: FS corruption and 'uncorrectable' DMA errors on ATA disks after unclean shutdown
Date: Fri, 16 Sep 2005 19:16:36 -0400

 Seems that running newfs didn't quite fix it for me. The day 
 after I said it worked fine, it started giving errors.
 
 I used the follow-up link on the Web interface to the PR for 
 this bug, so I really hope it's going to the right place ... 
 if it's not, I'm going to cry. :(
>Unformatted:
