From jin@iss-p1.lbl.gov  Wed Mar 12 10:52:39 1997
Received: from iss-p1.lbl.gov (iss-p1.lbl.gov [131.243.2.47])
          by freefall.freebsd.org (8.8.5/8.8.5) with ESMTP id KAA02276
          for <FreeBSD-gnats-submit@freebsd.org>; Wed, 12 Mar 1997 10:52:39 -0800 (PST)
Received: (from jin@localhost)
	by iss-p1.lbl.gov (8.8.5/8.8.5) id KAA06337;
	Wed, 12 Mar 1997 10:52:38 -0800 (PST)
Message-Id: <199703121852.KAA06337@iss-p1.lbl.gov>
Date: Wed, 12 Mar 1997 10:52:38 -0800 (PST)
From: "Jin Guojun[ITG]" <jin@iss-p1.lbl.gov>
Reply-To: jin@iss-p1.lbl.gov
To: FreeBSD-gnats-submit@freebsd.org
Subject: st0 hang/fail on reading 4mm DAT tape for larger files
X-Send-Pr-Version: 3.2

>Number:         2965
>Category:       kern
>Synopsis:       st0 hang/fail on reading 4mm DAT tape for larger files
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Mar 12 11:00:02 PST 1997
>Closed-Date:    Fri Nov 19 13:50:39 PST 1999
>Last-Modified:  Fri Nov 19 13:51:09 PST 1999
>Originator:     Jin Guojun[ITG]
>Release:        FreeBSD 2.2-970310-GAMMA i386
>Organization:
>Environment:

	2.2-SNAP(s) with HP C1536A SCSI 4mm DAT tape drive:
		(ncr0:4:0): "HP HP35480A T503" type 1 removable SCSI 2
		st0(ncr0:4:0): Sequential-Access 
		st0(ncr0:4:0): 5.0 MB/s (200 ns, offset 8)

>Description:

	tar -c some.files # writing OK
	tar -t
	or
	tar -xv
	will hang when listing or reading a file larger than about 6090 bytes:

tar: read error on /dev/nrst0 : Input/output error

	The tar process hangs at this point and the tape drive stops.

	system errors:
ncr0:4: ERROR (0:48) (1-21-1e) (88/13) @ (c2c:19000200).
        script cmd = 89030000
        reg:     da 10 80 13 47 88 04 1f 01 01 84 21 80 01 99 00.
ncr0: have to clear fifos.
ncr0: restart (fatal error).
st0(ncr0:4:0): COMMAND FAILED (9 ff) @f2136c00.
ncr0: timeout ccb=f2136c00 (skip)

	The tar process cannot be killed. The only remedy is to power
	cycle the tape drive.

	The same hardware worked with 2.1.{6-7} without any problem,
	so this looks like a software problem somewhere in the kernel.

>How-To-Repeat:

	Attach an HP C1536A SCSI DAT tape drive (dmesg reports HP35480A)
	to a 2.2 system. The file can be either text or binary.
	truncat is not a standard UNIX command; you may use the tail
	command to produce the test files instead.
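
Because truncat is a local tool, the following is a minimal sketch of the same step using dd. It assumes that "truncat 0 N" keeps the first N bytes of its input, which the listing below suggests but the report does not state. The helper name first_bytes, the scratch directory, and the zero-filled source file are made up for illustration; the contents of the original files x and y are not known.

```shell
# Stand-in for the local truncat tool (assumption: "truncat 0 N < in > out"
# keeps bytes 0..N-1 of the input).
first_bytes() {
    # usage: first_bytes N < input > output
    dd bs=1 count="$1" 2>/dev/null
}

# Work in a scratch directory so nothing in the current directory is touched.
tmp=$(mktemp -d)

# Build an 8192-byte source file (zero-filled here; the real x is unknown).
dd if=/dev/zero of="$tmp/x" bs=8192 count=1 2>/dev/null

# The two sizes used in the report: 6100 bytes failed on read-back,
# 6090 bytes read back fine.
first_bytes 6100 < "$tmp/x" > "$tmp/z_big"
first_bytes 6090 < "$tmp/x" > "$tmp/z_small"
```

Either file can then be written with "tar -cv" and listed with "tar -t" as in the transcript that follows.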

146 /home/users/jin/src/unix/FreeBSD: truncat 0 6100 < y > z
147 /home/users/jin/src/unix/FreeBSD: ll
total 54
-rw-r--r--   1 jin  advdev  8192 Mar 12 10:33 x
-rw-r--r--   1 jin  advdev  6550 Mar 12 10:35 y
-rw-r--r--   1 jin  advdev  6100 Mar 12 10:39 z
148 /home/users/jin/src/unix/FreeBSD: mt rew
149 /home/users/jin/src/unix/FreeBSD: tar -cv z
z
150 /home/users/jin/src/unix/FreeBSD: mt rew
151 /home/users/jin/src/unix/FreeBSD: tar -t
tar: read error on /dev/nrst0 : Input/output error

####### power cycle tape drive HERE #########

152 /home/users/jin/src/unix/FreeBSD: truncat -l 0 6090 < x > z
153 /home/users/jin/src/unix/FreeBSD: ll
total 54
-rw-r--r--   1 jin  advdev  8192 Mar 12 10:33 x
-rw-r--r--   1 jin  advdev  6550 Mar 12 10:35 y
-rw-r--r--   1 jin  advdev  6090 Mar 12 10:41 z
154 /home/users/jin/src/unix/FreeBSD: mt rew
155 /home/users/jin/src/unix/FreeBSD: tar -cv z
z
mt rew
tar -t
156 /home/users/jin/src/unix/FreeBSD: z
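
For reference, the archive sizes for the two test files can be estimated with a little arithmetic. This is only a sketch under stated assumptions: a tar archive of one file consists of one 512-byte header block, the data rounded up to whole 512-byte blocks, and two zero blocks marking end of archive, written with a blocking factor of 20 (10240-byte records). The report does not say which tar options were in effect, and the function names are hypothetical.

```shell
# Number of 512-byte blocks in a single-file tar archive:
# 1 header block + data rounded up to whole blocks + 2 end-of-archive blocks.
tar_blocks() {
    echo $(( 1 + ($1 + 511) / 512 + 2 ))
}

# Number of tape records, assuming a blocking factor of 20 (20 * 512 bytes).
tar_records() {
    echo $(( ($(tar_blocks "$1") + 19) / 20 ))
}

# Sizes taken from the transcript above.
tar_blocks 6090; tar_records 6090
tar_blocks 6100; tar_records 6100
```

Under these assumptions both files yield a 15-block archive that fits in a single record, so the record count alone would not distinguish the failing case from the working one.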


>Fix:
	
	

>Release-Note:
>Audit-Trail:

From: Stefan Esser <se@freebsd.org>
To: jin@iss-p1.lbl.gov
Cc: FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/2965: st0 hang/fail on reading 4mm DAT tape for larger files
Date: Wed, 12 Mar 1997 21:42:40 +0100

 On Mar 12, "Jin Guojun[ITG]" <jin@iss-p1.lbl.gov> wrote:
 > >Synopsis:       st0 hang/fail on reading 4mm DAT tape for larger files
 
 > 	2.2-SNAP(s) with HP C1536A SCSI 4mm DAT tape drive:
 > 		(ncr0:4:0): "HP HP35480A T503" type 1 removable SCSI 2
 > 		st0(ncr0:4:0): Sequential-Access 
 > 		st0(ncr0:4:0): 5.0 MB/s (200 ns, offset 8)
 > 
 > >Description:
 > 
 > 	tar -c some.files # writing OK
 > 	tar -t
 > 	or
 > 	tar -xv
 > 	will hang when looking/reading a file larger than about 6090 Bytes;
 > 
 > tar: read error on /dev/nrst0 : Input/output error
 > 
 > 	tar process is hanging at here and tape drive stopped.
 > 
 > 	system errors:
 > ncr0:4: ERROR (0:48) (1-21-1e) (88/13) @ (c2c:19000200).
 
 The error code (0x48) signals a GROSS ERROR, which in most
 cases is an overflow or underflow on the SCSI bus during
 a synchronous transfer. This means that a byte was
 acknowledged either not at all or twice.
 
 This is a reported hardware error.
 
 >         script cmd = 89030000
 >         reg:     da 10 80 13 47 88 04 1f 01 01 84 21 80 01 99 00.
 > ncr0: have to clear fifos.
 > ncr0: restart (fatal error).
 > st0(ncr0:4:0): COMMAND FAILED (9 ff) @f2136c00.
 > ncr0: timeout ccb=f2136c00 (skip)
 > 
 > 	The tar process cannot be killed. The only solution is power
 > 	cycle the tape drive.
 > 
 > 	The same hardware worked with 2.1.{6-7} without any problem.
 > 	So, it looks like software problem in the kernel somewhere.
 
 Well, it looks like the error recovery fails, and there
 is indeed a difference between 2.1 and 2.2, although
 2.2 survives a number of scenarios where 2.1 hung.
 
 I know that the error recovery code needs quite some work,
 but the primary cause of your problem is more likely
 a SCSI bus problem, which only becomes visible through
 the failed recovery procedure.
 
 I'm using an HP1533A DDS-2 DAT drive for my backups and
 to exchange data, and never had a single occurrence of a
 hang under any -current (i.e. post-2.1) kernel.
 
 Please make sure that there is no problem with your 
 controller or SCSI bus cable/terminators/terminator
 power ...
 
 Regards, STefan
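
The reading of the 0x48 value in the message above can be illustrated with a small bit-decoding sketch. It assumes that the second number in "ERROR (0:48)" is the low byte of the controller's SCSI interrupt status register, and that the bit names follow the NCR 53C8xx SIST0 layout (SGE being the gross error bit). Both points are assumptions to be checked against the driver's register header or the chip data manual, and decode_sist is a hypothetical helper, not part of any driver.

```shell
# Decode the low byte of the SCSI interrupt status value into bit names.
# Assumption: bit names and masks follow the NCR 53C8xx SIST0 layout.
decode_sist() {
    val=$(( $1 ))
    out=""
    for pair in 0x80:MA 0x40:CMP 0x20:SEL 0x10:RSL \
                0x08:SGE 0x04:UDC 0x02:RST 0x01:PAR; do
        mask=${pair%%:*}
        name=${pair##*:}
        # Append the name when its bit is set in the value.
        if [ $(( val & $mask )) -ne 0 ]; then
            out="$out${out:+ }$name"
        fi
    done
    echo "$out"
}

decode_sist 0x48
```

With the masks assumed here, 0x48 has the SGE bit set, which matches the GROSS ERROR diagnosis.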

From: "Jin Guojun[ITG]" <jin@george.lbl.gov>
To: jin@iss-p1.lbl.gov, se@freebsd.org
Cc: FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/2965: st0 hang/fail on reading 4mm DAT tape for larger files
Date: Wed, 12 Mar 1997 13:43:55 -0800

 } Please make sure that there is no problem with your 
 } controller or SCSI bus cable/terminators/terminator
 } power ...
 } 
 } Regards, STefan
 
 This is one machine that runs both 2.1.x and 2.2-SNAP, so there is no
 hardware problem at all.
 I ran tar -cv files under 2.2-SNAP and rebooted into 2.1.x right away;
 2.1.x read the tape perfectly.
 However, a tape written by either 2.1.x or 2.2-SNAP cannot be read
 by 2.2-SNAP. That is why I think it is a 2.2 problem and nothing else.
 
 -Jin

From: "Jin Guojun[ITG]" <jin@george.lbl.gov>
To: se@freebsd.org
Cc: FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/2965: st0 hang/fail on reading 4mm DAT tape for larger files
Date: Wed, 12 Mar 1997 18:28:41 -0800

 } Please make sure that there is no problem with your 
 } controller or SCSI bus cable/terminators/terminator
 } power ...
 } 
 } Regards, STefan
 
 Here is more accurate information. Even though both 2.1.7 and 2.2-SNAP
 run on the same hardware, they are installed on different hard drives.
 2.2-SNAP is on an IDE drive, and the 2.1.7 disk was attached to the SCSI bus
 when 2.1.7 was booted. So having a SCSI disk drive on the SCSI chain helped
 the SCSI tape drive to work.
 
 I then put 2.1.7 on an IDE drive to test the tape drive, and it got similar
 errors. However, 2.1.7 does not hang the process, and 2.2 does.
 
 Thanks,
 
 -Jin
 

From: "Jordan K. Hubbard" <jkh@time.cdrom.com>
To: jin@iss-p1.lbl.gov
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/2965: st0 hang/fail on reading 4mm DAT tape for larger files 
Date: Thu, 13 Mar 1997 00:26:57 -0800

 > 	2.2-SNAP(s) with HP C1536A SCSI 4mm DAT tape drive:
 > 		(ncr0:4:0): "HP HP35480A T503" type 1 removable SCSI 2
 
 Do you have any other non-NCR SCSI controllers around there?  It would
 be very instructive to know if the same error occurs with all the same
 hardware except an Adaptec 2940 instead of the NCR.  If not, that would
 greatly narrow down the number of places to search!
 
 						Jordan

From: "Jin Guojun[ITG]" <jin@george.lbl.gov>
To: jkh@time.cdrom.com
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/2965: st0 hang/fail on reading 4mm DAT tape for larger files
Date: Thu, 13 Mar 1997 08:34:03 -0800

 } >       2.2-SNAP(s) with HP C1536A SCSI 4mm DAT tape drive:
 } >               (ncr0:4:0): "HP HP35480A T503" type 1 removable SCSI 2
 } 
 } Do you have any other non-NCR SCSI controllers around there?  It would
 } be very instructive to know if the same error occurs with all the same
 } hardware except an Adaptec 2940 instead of the NCR.  If not, that would
 } greatly narrow down the number of places to search!
 } 
 }                                                 Jordan
 
 Unfortunately, I have only NCR SCSI controllers that work with FreeBSD.
 I do have a SCSI controller with a QLogic chipset, but it does not seem to
 work under FreeBSD. The controller is made by DEC, and it also has a 21040
 Ethernet chipset, which does work with FreeBSD. If someone knows how to make
 this SCSI controller work under FreeBSD, I will use it for further testing.
 
 Thanks,
 
 -Jin
 

From: "Jordan K. Hubbard" <jkh@time.cdrom.com>
To: "Jin Guojun[ITG]" <jin@george.lbl.gov>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/2965: st0 hang/fail on reading 4mm DAT tape for larger files 
Date: Thu, 13 Mar 1997 21:46:50 -0800

 > I do have a SCSI controller with QLogic chipset, but It seems not working
 > under FreeBSD. The controller is made by DEC, and it has 21040 ethernet
 
 No, the QLogic has never been supported, I'm afraid. :(
 
 					Jordan
State-Changed-From-To: open->closed 
State-Changed-By: phk 
State-Changed-When: Fri Nov 19 13:50:39 PST 1999 
State-Changed-Why:  
The drive is no longer in the system (pre-CAM SCSI).
>Unformatted:
