From dawes@rf900.physics.usyd.edu.au  Wed Apr 30 03:21:17 1997
Received: from rf900.physics.usyd.edu.au (rf900.physics.usyd.edu.au [129.78.129.109])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id DAA20217
          for <FreeBSD-gnats-submit@freebsd.org>; Wed, 30 Apr 1997 03:21:14 -0700 (PDT)
Received: (from dawes@localhost) by rf900.physics.usyd.edu.au (8.8.5/8.8.2) id UAA29575; Wed, 30 Apr 1997 20:21:09 +1000 (EST)
Message-Id: <199704301021.UAA29575@rf900.physics.usyd.edu.au>
Date: Wed, 30 Apr 1997 20:21:09 +1000 (EST)
From: dawes@rf900.physics.usyd.edu.au
Reply-To: dawes@rf900.physics.usyd.edu.au
To: FreeBSD-gnats-submit@freebsd.org
Subject: Timeouts too low in st.c
X-Send-Pr-Version: 3.2

>Number:         3428
>Category:       kern
>Synopsis:       Some timeouts in st.c are too low when a DAT retries
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    steve
>State:          closed
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Apr 30 03:30:01 PDT 1997
>Closed-Date:    Sat Aug 23 09:34:12 PDT 1997
>Last-Modified:  Sat Aug 23 09:35:05 PDT 1997
>Originator:     David Dawes
>Release:        FreeBSD 2.2-STABLE i386
>Organization:
>Environment:

	Pentium 120 with NCR SCSI, DAT drive:

	ncr0 <ncr 53c810 scsi> rev 2 int a irq 10 on pci0:11
	(ncr0:4:0): "SONY SDT-4000 2.09" type 1 removable SCSI 2
	st0(ncr0:4:0): Sequential-Access 
	st0(ncr0:4:0): 5.0 MB/s (200 ns, offset 7)
	density code 0x13,  drive empty

>Description:

	When the DAT drive I'm currently using (Sony SDT4000) does many
	retries on read/write (because of media and/or head problems),
	the command times out.  The NCR driver doesn't handle timeouts
	very well, and the result is that the machine becomes unusable
	until rebooted.

>How-To-Repeat:

	Read or write a tape that requires the drive to do many retries.

>Fix:
	
	I found that increasing the appropriate SCSI command timeouts
	in st.c was a suitable way of avoiding the problem.  It would
	be nice if all the SCSI hardware drivers recovered gracefully
	from timeouts, but they don't, and that is a more difficult
	problem to fix.  I don't know much about how SCSI command timeouts
	"should" be set, but it seems to me that they should be high
	enough to accommodate the behaviour of the devices.  In the case
	of this particular drive, its internal read/write retries can
	take longer than 100 seconds.
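
	As a rough illustration of the arithmetic, the sketch below assumes
	the timeout argument is in milliseconds, which is what the
	"1000 secs" comment in the patch implies.  The macro and function
	names are hypothetical and do not appear in st.c, and the 120
	second retry time in the note after the code is only an example
	figure.

```c
#include <assert.h>

/*
 * Assumption: st.c passes SCSI command timeouts in milliseconds.
 * These names are hypothetical and are not taken from st.c.
 */
#define ST_IO_TIMEOUT_OLD_MS	100000L		/* value before the patch */
#define ST_IO_TIMEOUT_NEW_MS	1000000L	/* value after the patch */

/* Convert a millisecond timeout to whole seconds. */
static long
st_timeout_secs(long timeout_ms)
{
	assert(timeout_ms >= 0);
	return (timeout_ms / 1000);
}

/*
 * Return 1 if a command that takes `secs` seconds to complete would
 * run past the given timeout, 0 otherwise.
 */
static int
st_would_time_out(long timeout_ms, long secs)
{
	return (secs > st_timeout_secs(timeout_ms));
}
```

	Under that assumption the old value allows 100 seconds and the
	new one 1000 seconds, so st_would_time_out(ST_IO_TIMEOUT_OLD_MS, 120)
	is 1 while st_would_time_out(ST_IO_TIMEOUT_NEW_MS, 120) is 0.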

	The following patch is what I'm successfully using:


*** st.c	Sat Sep  7 09:09:20 1996
--- sys/scsi/st.c	Sun Apr 13 23:01:41 1997
***************
*** 969,975 ****
  			(u_char *) bp->b_un.b_addr,
  			bp->b_bcount,
  			0,	/* can't retry a read on a tape really */
! 			100000,
  			bp,
  			flags) == SUCCESSFULLY_QUEUED) {
  		} else {
--- 969,975 ----
  			(u_char *) bp->b_un.b_addr,
  			bp->b_bcount,
  			0,	/* can't retry a read on a tape really */
! 			1000000,
  			bp,
  			flags) == SUCCESSFULLY_QUEUED) {
  		} else {
***************
*** 1214,1220 ****
  		(u_char *) buf,
  		size,
  		0,		/* not on io commands */
! 		100000,
  		NULL,
  		flags | SCSI_DATA_IN));
  }
--- 1214,1220 ----
  		(u_char *) buf,
  		size,
  		0,		/* not on io commands */
! 		1000000,
  		NULL,
  		flags | SCSI_DATA_IN));
  }
***************
*** 1609,1615 ****
  		0,
  		0,
  		0,		/* no retries, just fail */
! 		100000,		/* 10 secs.. (may need to repos head ) */
  		NULL,
  		flags);
  }
--- 1609,1615 ----
  		0,
  		0,
  		0,		/* no retries, just fail */
! 		1000000,	/* 1000 secs.. (may need to repos head ) */
  		NULL,
  		flags);
  }
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->closed 
State-Changed-By: steve 
State-Changed-When: Sat Aug 23 09:34:12 PDT 1997 
State-Changed-Why:  
PR should be closed as noted in misc/4028. 
>Unformatted:
