From nobody@FreeBSD.ORG  Sun Dec 12 18:40:28 1999
Return-Path: <nobody@FreeBSD.ORG>
Received: by hub.freebsd.org (Postfix, from userid 32767)
	id 7522114D48; Sun, 12 Dec 1999 18:40:28 -0800 (PST)
Message-Id: <19991213024028.7522114D48@hub.freebsd.org>
Date: Sun, 12 Dec 1999 18:40:28 -0800 (PST)
From: klh@netcom.com
Sender: nobody@FreeBSD.ORG
To: freebsd-gnats-submit@freebsd.org
Subject: Seagate ST32550 (Barracuda 2LP) may be a broken tagged queueing drive?
X-Send-Pr-Version: www-1.0

>Number:         15447
>Category:       kern
>Synopsis:       Seagate ST32550 (Barracuda 2LP) may be a broken tagged queueing drive?
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    ken
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Dec 12 18:50:01 PST 1999
>Closed-Date:    Tue Dec 14 23:31:11 PST 1999
>Last-Modified:  Tue Dec 14 23:32:04 PST 1999
>Originator:     Ken Harrenstien
>Release:        3.1-RELEASE
>Organization:
>Environment:
FreeBSD <hostname> 3.1-RELEASE FreeBSD 3.1-RELEASE #<n>: <buildstring>  i386
>Description:
A separate problem is causing my system to sometimes boot up with
tagged queueing enabled and sometimes not.  I've recently been stressing
the disk significantly more than usual and have encountered user-level
I/O errors that I traced back to the enabling of tagged queueing.

With tagged queueing off, everything always works.  With it on,
a heavy load of seeks will cause reads and writes to start failing.
I was able to verify this by running a test case on two ST32550s, both
on line during the same kernel boot and both identical in all respects
except that one had tagged queueing enabled and the other didn't (the
randomness of this enabling is a separate problem).  The drive without
tagging always works perfectly; the drive with tagging always fails
at random places during the test.  I verified that it is not specific
to the individual drives by doing reboots until the formerly tag-enabled
drive booted up tag-disabled -- whereupon it then performed perfectly
again.  I also verified that the filesystems were identical by doing
a complete track-by-track copy of one to the other prior to testing.

The ST32550 is not in the latest quirks table in cam_xpt.c, although
several other Seagates are.  The Barracuda 2LP was at one time fairly
popular so I'm a little surprised this hasn't shown up before, but
who knows.  Maybe most FreeBSD users have IDE drives.




>How-To-Repeat:
My test case consisted of saving a tar of /usr/src/sys in a partition
about 1 GB away from /usr on the same drive, and attempting to
restore to /usr from that.  Tar encounters I/O errors attempting to
restore perhaps 1% to 2% of the files.  Once in a while the kernel
will get a swap error and complain, but otherwise no diagnostics are
shown on the console.

>Fix:
Obviously the ST32550 can be added to the quirks table in cam_xpt.c.
I just hope this does not reflect some underlying problem with
tagged queueing support of Seagates in general.
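For illustration only, a quirk-table entry boils down to a wildcard match on the drive's INQUIRY vendor and product strings that yields a per-model tag limit. The sketch below is a hypothetical, simplified stand-in and not the actual cam_xpt.c code: the struct, the function name quirk_maxtags(), the "SEAGATE" vendor string, and the use of POSIX fnmatch() for the wildcard match are all assumptions. It also assumes trailing blanks in the INQUIRY fields have already been stripped. A maxtags of 0 turns tagged queueing off for matching drives.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for a CAM-style quirk entry.
 * Field names are illustrative; the real table in cam_xpt.c uses
 * its own structures and matching code.
 */
struct quirk_entry {
	const char *vendor;	/* shell-style pattern, e.g. "SEAGATE" */
	const char *product;	/* shell-style pattern, e.g. "ST32550*" */
	unsigned int maxtags;	/* 0 means: never use tagged queueing */
};

#define DEFAULT_MAXTAGS 255	/* kernel default cited in this PR */

static const struct quirk_entry quirk_table[] = {
	/* Hypothetical entry: disable tagged queueing on the ST32550. */
	{ "SEAGATE", "ST32550*", 0 },
	/* Catch-all default entry; must stay last. */
	{ "*", "*", DEFAULT_MAXTAGS },
};

/* Return the tag limit of the first entry whose patterns both match. */
static unsigned int
quirk_maxtags(const char *vendor, const char *product)
{
	size_t i;

	assert(vendor != NULL && product != NULL);
	for (i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++) {
		if (fnmatch(quirk_table[i].vendor, vendor, 0) == 0 &&
		    fnmatch(quirk_table[i].product, product, 0) == 0)
			return quirk_table[i].maxtags;
	}
	return DEFAULT_MAXTAGS;
}
```

Because the first matching entry wins, the catch-all default has to stay at the end of the table; the trailing "*" in "ST32550*" lets one entry cover model-suffix variants.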



>Release-Note:
>Audit-Trail:

From: "Kenneth D. Merry" <ken@kdm.org>
To: klh@netcom.com
Cc: freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: kern/15447: Seagate ST32550 (Barracuda 2LP) may be a broken tagged queueing drive?
Date: Sun, 12 Dec 1999 21:29:20 -0700

 On Sun, Dec 12, 1999 at 06:40:28PM -0800, klh@netcom.com wrote:
 > A separate problem is causing my system to sometimes boot up with
 > tagged queueing enabled and sometimes not.  I've recently been stressing
 > the disk significantly more than usual and have encountered user-level
 > I/O errors that I traced back to the enabling of tagged queueing.
 > 
 > With tagged queueing off, everything always works.  With it on,
 > a heavy load of seeks will cause reads and writes to start failing.
 > I was able to verify this by running a test case on two ST32550s, both
 > on line during the same kernel boot and both identical in all respects
 > except that one had tagged queueing enabled and the other didn't (the
 > randomness of this enabling is a separate problem).  The drive without
 > tagging always works perfectly; the drive with tagging always fails
 > at random places during the test.  I verified that it is not specific
 > to the individual drives by doing reboots until the formerly tag-enabled
 > drive booted up tag-disabled -- whereupon it then performed perfectly
 > again.  I also verified that the filesystems were identical by doing
 > a complete track-by-track copy of one to the other prior to testing.
 > 
 > The ST32550 is not in the latest quirks table in cam_xpt.c, although
 > several other Seagates are.  The Barracuda 2LP was at one time fairly
 > popular so I'm a little surprised this hasn't shown up before, but
 > who knows.  Maybe most FreeBSD users have IDE drives.
 
 No, there are many, many people using Seagate drives (including me)
 successfully in FreeBSD systems.  I think this problem is most likely
 peculiar to your particular system and/or drives.
 
 > >Fix:
 > Obviously the ST32550 can be added to the quirks table in cam_xpt.c.
 > I just hope this does not reflect some underlying problem with
 > tagged queueing support of Seagates in general.
 
 Nope, it reflects a problem either with your drives or your cabling and
 termination setup.
 
 You need to supply some more information before we can make any sort of
 guess at what is going on.  So, please send (and make sure you do a "group"
 reply to this mail, so it winds up in the PR database) full 'dmesg' output
 from your system, including any kernel messages that have shown up while
 doing your tests.
 
 Please don't send the output of /var/log/messages, unless it is necessary
 to show problems that happened in a previous boot.  The output of dmesg(8)
 is easier to read.
 
 Also, please send a description of your cabling and termination setup.
 
 Ken
 -- 
 Kenneth Merry
 ken@kdm.org
 

From: Ken Harrenstien <klh@netcom.com>
To: "Kenneth D. Merry" <ken@kdm.org>
Cc: klh@netcom.com, freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: kern/15447: Seagate ST32550 (Barracuda 2LP) may be a broken
        tagged queueing drive?
Date: Tue, 14 Dec 99 4:58:33 PST

 > On Sun, Dec 12, 1999 at 06:40:28PM -0800, klh@netcom.com wrote:
 > > A separate problem is causing my system to sometimes boot up with
 > > tagged queueing enabled and sometimes not.  I've recently been stressing
 > > the disk significantly more than usual and have encountered user-level
 > > I/O errors that I traced back to the enabling of tagged queueing.
 > > 
 > > With tagged queueing off, everything always works.  With it on,
 > > a heavy load of seeks will cause reads and writes to start failing.
 > > I was able to verify this by running a test case on two ST32550s, both
 > > on line during the same kernel boot and both identical in all respects
 > > except that one had tagged queueing enabled and the other didn't (the
 > > randomness of this enabling is a separate problem).  The drive without
 > > tagging always works perfectly; the drive with tagging always fails
 > > at random places during the test.  I verified that it is not specific
 > > to the individual drives by doing reboots until the formerly tag-enabled
 > > drive booted up tag-disabled -- whereupon it then performed perfectly
 > > again.  I also verified that the filesystems were identical by doing
 > > a complete track-by-track copy of one to the other prior to testing.
 > > 
 > > The ST32550 is not in the latest quirks table in cam_xpt.c, although
 > > several other Seagates are.  The Barracuda 2LP was at one time fairly
 > > popular so I'm a little surprised this hasn't shown up before, but
 > > who knows.  Maybe most FreeBSD users have IDE drives.
 > 
 > No, there are many, many people using Seagate drives (including me)
 > successfully in FreeBSD systems.  I think this problem is most likely
 > peculiar to your particular system and/or drives.
 
 Agreed.  These are surplus drives that appear to be Sun OEM but those,
 also, are in wide use.  If any of the hardware is to be suspected, I
 would squint at the AM53C974 or more properly its driver.  Read on.
 
 > > >Fix:
 > > Obviously the ST32550 can be added to the quirks table in cam_xpt.c.
 > > I just hope this does not reflect some underlying problem with
 > > tagged queueing support of Seagates in general.
 > 
 > Nope, it reflects a problem either with your drives or your cabling and
 > termination setup.
 
 I think the cabling and termination are highly unlikely to be the problem
 in this case; I am familiar with the SCSI requirements and use
 high-quality components, active termination, etc.  In any case, it's
 only a Fast-10 bus and there have been no other signs of trouble.
 It's *only* when the kernel thinks that tagged queueing is enabled
 that I start to get user-mode I/O errors, and then only when doing a
 lot of long-distance seeks (i.e., when commands would start piling up).
 
 One of the main things I've been trying to pin down is whether this
 problem is specific to the ST32550s or also happens with other drives.
 Finally, after several hours of reboots far into the night with
 various versions and flags, I struck paydirt and enticed the
 system to come up with the Fujitsu M2952 TQ-enabled.
 
 Guess what?  It behaves just like the ST32550s, meaning that it causes
 the same problems with Tagged-Queueing enabled, but works fine
 otherwise.
 
 I wondered if perhaps the problem might be a queue-full condition;
 the ST32550 manual says it can handle up to 64 commands, while the
 kernel's default maxtags is 255 (implying it relies on the drive to
 return QUEUE FULL).  So I tried adding a quirk entry limiting the
 Seagate to a maxtags of 63.  No luck.  Tried 32.  Still no change.  Now
 I'm using 0, which disables tagged queueing altogether, and things are
 now safe.
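 The queue-full reasoning above can be checked with a toy model. The following is a hypothetical simulation, not CAM's actual QUEUE FULL handling: the function and struct names are invented, the host issues a burst of tagged commands with no completions, the drive holds at most 64 commands (the figure from the ST32550 manual), and on QUEUE FULL the host, in this model, lowers its limit to the number of commands the drive has already accepted. With maxtags at 255 the drive has to return QUEUE FULL once; with 63 or 32 it never does, which suggests the failures that persisted at those limits were not caused by queue-full handling.

```c
#include <assert.h>

/* Outcome of one simulated burst (hypothetical model). */
struct tq_result {
	unsigned int queue_full_count;	/* QUEUE FULL responses seen */
	unsigned int final_openings;	/* host's tag limit afterwards */
};

/*
 * Issue up to `burst` tagged commands with no completions.  The host
 * stops when it runs out of tags (openings); the drive rejects any
 * command beyond `drive_depth` with QUEUE FULL, after which this model
 * shrinks the host's openings to what the drive already accepted.
 */
static struct tq_result
simulate_burst(unsigned int maxtags, unsigned int drive_depth,
    unsigned int burst)
{
	struct tq_result r = { 0, maxtags };
	unsigned int outstanding = 0;
	unsigned int i;

	assert(maxtags > 0);	/* maxtags == 0 means untagged: not modeled */
	for (i = 0; i < burst; i++) {
		if (outstanding >= r.final_openings)
			break;		/* no free tags: the rest wait in the host */
		if (outstanding >= drive_depth) {
			r.queue_full_count++;
			r.final_openings = outstanding;
			break;		/* rejected command is requeued by the host */
		}
		outstanding++;
	}
	return r;
}
```

 For example, simulate_burst(255, 64, 100) reports one QUEUE FULL and a final limit of 64, while simulate_burst(63, 64, 100) reports none and keeps the limit at 63.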
 
 > You need to supply some more information before we can make any sort of
 > guess at what is going on.  So, please send (and make sure you do a "group"
 > reply to this mail, so it winds up in the PR database) full 'dmesg' output
 > from your system, including any kernel messages that have shown up while
 > doing your tests.
 > 
 > Please don't send the output of /var/log/messages, unless it is necessary
 > to show problems that happened in a previous boot.  The output of dmesg(8)
 > is easier to read.
 
 Done; see response to kern/15446.
 
 > Also, please send a description of your cabling and termination setup.
 
 #1 ---- #7 ---- #0 ---- #2 ---- #3 ---- TERM
 DPES    amd0    ST32550 ST32550 M2952
 (term)
 
 #1 is internal; #0, #2, and #3 are external.
 
 Because of your statement that the ST32550 is known to work, and the
 fact that my Fujitsu was failing in the same way, I don't think the
 drives are at fault.  So we're left with either the controller, or
 FreeBSD 3.1's support of it, or something else.  The controller seems
 unlikely since Tagged Queueing is a higher-level protocol and there's
 no reason to suspect either the physical bus or the link-level
 protocol (otherwise many more problems would have evinced themselves).
 
 One more data point.  I use the same kernel source base in another
 system (NCR 53c895, 3 IBM drives) where all drives are TQ-enabled and
 have never had problems despite much heavier usage.
 
 I'm starting to think that whatever is causing the kernel to be
 spastic about whether or not to use Tagged Queueing (cf kern/15446)
 may also be responsible for its failure to operate properly.  In any
 case, since the ST32550 is no longer a suspect, I suggest that this
 bug (kern/15447) be closed and the above information made a follow-up
 to kern/15446.
 
 --Ken
 

From: "Kenneth D. Merry" <ken@kdm.org>
To: Ken Harrenstien <klh@netcom.com>
Cc: freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: kern/15447: Seagate ST32550 (Barracuda 2LP) may be a broken tagged queueing drive?
Date: Wed, 15 Dec 1999 00:13:04 -0700

 On Tue, Dec 14, 1999 at 04:58:33 -0800, Ken Harrenstien wrote:
 > Because of your statement that the ST32550 is known to work, and the
 > fact that my Fujitsu was failing in the same way, I don't think the
 > drives are at fault.  So we're left with either the controller, or
 > FreeBSD 3.1's support of it, or something else.  The controller seems
 > unlikely since Tagged Queueing is a higher-level protocol and there's
 > no reason to suspect either the physical bus or the link-level
 > protocol (otherwise many more problems would have evinced themselves).
 > 
 > One more data point.  I use the same kernel source base in another
 > system (NCR 53c895, 3 IBM drives) where all drives are TQ-enabled and
 > have never had problems despite much heavier usage.
 > 
 > I'm starting to think that whatever is causing the kernel to be
 > spastic about whether or not to use Tagged Queueing (cf kern/15446)
 > may also be responsible for its failure to operate properly.  In any
 > case, since the ST32550 is no longer a suspect, I suggest that this
 > bug (kern/15447) be closed and the above information made a follow-up
 > to kern/15446.
 
 Thanks for all the detailed information.  Based on PR kern/15447, I think
 we can close this and assume for now that this is a problem with Tekram's
 amd driver.
 
 Ken
 -- 
 Kenneth Merry
 ken@kdm.org
 
State-Changed-From-To: open->closed 
State-Changed-By: ken 
State-Changed-When: Tue Dec 14 23:31:11 PST 1999 
State-Changed-Why:  
Closed at the request of the submitter.  See PR kern/15446 for additional 
followup information on this problem. 


Responsible-Changed-From-To: freebsd-bugs->ken 
Responsible-Changed-By: ken 
Responsible-Changed-When: Tue Dec 14 23:31:11 PST 1999 
Responsible-Changed-Why:  
I'm handling this. 
>Unformatted:
