From nobody@FreeBSD.org  Thu Sep 13 13:59:40 2001
Return-Path: <nobody@FreeBSD.org>
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by hub.freebsd.org (Postfix) with ESMTP id 2B7EC37B406
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 13 Sep 2001 13:59:40 -0700 (PDT)
Received: (from nobody@localhost)
	by freefall.freebsd.org (8.11.4/8.11.4) id f8DKxem61171;
	Thu, 13 Sep 2001 13:59:40 -0700 (PDT)
	(envelope-from nobody)
Message-Id: <200109132059.f8DKxem61171@freefall.freebsd.org>
Date: Thu, 13 Sep 2001 13:59:40 -0700 (PDT)
From: Jeremy Chadwick <jdc@best.net>
To: freebsd-gnats-submit@FreeBSD.org
Subject: Intense SCSI tape access results in controller errors
X-Send-Pr-Version: www-1.0

>Number:         30559
>Category:       kern
>Synopsis:       Intense SCSI tape access results in controller errors
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Sep 13 14:00:00 PDT 2001
>Closed-Date:    Mon Sep 17 11:45:22 PDT 2001
>Last-Modified:  Mon Sep 17 11:47:09 PDT 2001
>Originator:     Jeremy Chadwick
>Release:        4.4-RC
>Organization:
Best Internet/Verio/NTT
>Environment:
FreeBSD 4.4-RC #0: Tue Sep 11 03:10:20 PDT 2001
    root@backup2.ba.best.net:/usr/obj/usr/src/sys/BEST-43-SMP
Timecounter "i8254"  frequency 1193182 Hz
CPU: Pentium II/Pentium II Xeon/Celeron (400.91-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x652  Stepping = 2
  Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
real memory  = 268435456 (262144K bytes)
avail memory = 257765376 (251724K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfee00000
 cpu1 (AP):  apic id:  1, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id:  2, version: 0x00170011, at 0xfec00000
Preloaded elf kernel "kernel" at 0xc0320000.
ccd0-7: Concatenated disk drivers
Pentium Pro MTRR support enabled
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Intel 82443BX (440 BX) host to PCI bridge> on motherboard
IOAPIC #0 intpin 16 -> irq 2
IOAPIC #0 intpin 17 -> irq 9
IOAPIC #0 intpin 18 -> irq 10
pci0: <PCI bus> on pcib0
pcib1: <Intel 82443BX (440 BX) PCI-PCI (AGP) bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pci1: <Trident model 9750 VGA-compatible display device> at 0.0 irq 0
isab0: <Intel 82371AB PCI to ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
pci0: <Intel PIIX4 ATA controller> at 7.1
pci0: <Intel 82371AB/EB (PIIX4) USB controller> at 7.2
intpm0: <Intel 82371AB Power management controller> port 0x440-0x44f irq 9 at device 7.3 on pci0
intpm0: I/O mapped 440
intpm0: intr IRQ 9 enabled revision 0
smbus0: <System Management Bus> on intsmb0
smb0: <SMBus general purpose I/O> on smbus0
intpm0: PM I/O mapped 400 
ahc0: <Adaptec 2940 Ultra SCSI adapter> port 0xe800-0xe8ff mem 0xfebff000-0xfebfffff irq 2 at device 16.0 on pci0
aic7880: Ultra Wide Channel A, SCSI Id=7, 16/255 SCBs
xl0: <3Com 3c905B-TX Fast Etherlink XL> port 0xec00-0xec7f mem 0xfebfef80-0xfebfefff irq 9 at device 17.0 on pci0
xl0: Ethernet address: 00:10:5a:18:4d:0a
miibus0: <MII bus> on xl0
xlphy0: <3Com internal media interface> on miibus0
xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ahc1: <Adaptec 2940 Ultra SCSI adapter> port 0xe400-0xe4ff mem 0xfebfd000-0xfebfdfff irq 10 at device 18.0 on pci0
aic7880: Ultra Wide Channel A, SCSI Id=7, 16/255 SCBs
orm0: <Option ROM> at iomem 0xc0000-0xcbfff on isa0
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> on isa0
sc0: VGA <16 virtual consoles, flags=0x0>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A, console
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: routing 8254 via IOAPIC #0 intpin 2
IP packet filtering initialized, divert disabled, rule-based forwarding disabled, default to deny, unlimited logging
Waiting 5 seconds for SCSI devices to settle
SMP: AP CPU #1 Launched!
sa0 at ahc1 bus 0 target 1 lun 0
sa0: <SONY SDX-500C 0102> Removable Sequential Access SCSI-2 device 
sa0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit)
sa1 at ahc1 bus 0 target 2 lun 0
sa1: <SONY SDX-500C 0102> Removable Sequential Access SCSI-2 device 
sa1: 40.000MB/s transfers (20.000MHz, offset 8, 16bit)
sa2 at ahc1 bus 0 target 3 lun 0
sa2: <SONY SDX-500C 0107> Removable Sequential Access SCSI-2 device 
sa2: 40.000MB/s transfers (20.000MHz, offset 8, 16bit)
da0 at ahc0 bus 0 target 0 lun 0
da0: <SEAGATE ST34573W 6244> Fixed Direct Access SCSI-2 device 
da0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da0: 4340MB (8888924 512 byte sectors: 255H 63S/T 553C)
da2 at ahc0 bus 0 target 2 lun 0
da2: <SEAGATE ST318275LW 0001> Fixed Direct Access SCSI-2 device 
da2: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da2: 17366MB (35566480 512 byte sectors: 255H 63S/T 2213C)
da1 at ahc0 bus 0 target 1 lun 0
da1: <SEAGATE ST318275LW 0001> Fixed Direct Access SCSI-2 device 
da1: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da1: 17366MB (35566480 512 byte sectors: 255H 63S/T 2213C)
ch0 at ahc1 bus 0 target 0 lun 0
ch0: <QUALSTAR TLS-46120 1.31> Removable Changer SCSI-2 device 
ch0: 3.300MB/s transfers
ch0: 126 slots, 3 drives, 1 picker, 1 portal
Mounting root from ufs:/dev/da0s1a

>Description:
  Under heavy SCSI tape access, our system spits out the following on the console.  Please note this applies to the ahc1 controller.

(sa0:ahc1:0:1:0): SCB 0x7 - timed out
ahc1: Dumping Card State in Data-out phase, at SEQADDR 0x6c
ACCUM = 0x0, SINDEX = 0x8, DINDEX = 0x8f, ARG_2 = 0x1
HCNT = 0x0
SCSISEQ = 0x12, SBLKCTL = 0x2
 DFCNTRL = 0x3c, DFSTATUS = 0x6d
LASTPHASE = 0x0, SCSISIGI = 0x4, SXFRCTL0 = 0xa0
SSTAT0 = 0x0, SSTAT1 = 0x2
STACK == 0x83, 0x188, 0x147, 0x0
SCB count = 20
Kernel NEXTQSCB = 9
Card NEXTQSCB = 9
QINFIFO entries: 
Waiting Queue entries: 
Disconnected Queue entries: 
QOUTFIFO entries: 
Sequencer Free SCB List: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
Pending list: 7
Kernel Free SCB list: 14 6 8 15 16 17 18 19 0 1 2 3 4 5 13 12 11 10 
Untagged Q(1): 7 
sg[0] - Addr 0x48fc000 : Length 4096
sg[1] - Addr 0x315d000 : Length 4096
sg[2] - Addr 0x7be000 : Length 4096
sg[3] - Addr 0x3fdf000 : Length 4096
sg[4] - Addr 0xd4c0000 : Length 4096
sg[5] - Addr 0xb001000 : Length 4096
sg[6] - Addr 0x63e2000 : Length 4096
sg[7] - Addr 0x38a3000 : Length 4096
sg[8] - Addr 0x6a04000 : Length 4096
sg[9] - Addr 0x2de5000 : Length 4096
sg[10] - Addr 0x46e6000 : Length 4096
sg[11] - Addr 0x52c7000 : Length 4096
sg[12] - Addr 0x6ee8000 : Length 4096
sg[13] - Addr 0xa6c9000 : Length 4096
sg[14] - Addr 0x5d2a000 : Length 4096
sg[15] - Addr 0x3b0b000 : Length 4096
(sa0:ahc1:0:1:0): BDR message in message buffer
(sa0:ahc1:0:1:0): SCB 0x7 - timed out
ahc1: Dumping Card State in Data-out phase, at SEQADDR 0x6d
ACCUM = 0x0, SINDEX = 0x8, DINDEX = 0x8f, ARG_2 = 0x1
HCNT = 0x0
SCSISEQ = 0x12, SBLKCTL = 0x2
 DFCNTRL = 0x3c, DFSTATUS = 0x6d
LASTPHASE = 0x0, SCSISIGI = 0x14, SXFRCTL0 = 0xa0
SSTAT0 = 0x0, SSTAT1 = 0x2
STACK == 0x83, 0x188, 0x147, 0x0
SCB count = 20
Kernel NEXTQSCB = 9
Card NEXTQSCB = 9
QINFIFO entries: 
Waiting Queue entries: 
Disconnected Queue entries: 
QOUTFIFO entries: 
Sequencer Free SCB List: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
Pending list: 7
Kernel Free SCB list: 14 6 8 15 16 17 18 19 0 1 2 3 4 5 13 12 11 10 
Untagged Q(1): 7 
sg[0] - Addr 0x48fc000 : Length 4096
sg[1] - Addr 0x315d000 : Length 4096
sg[2] - Addr 0x7be000 : Length 4096
sg[3] - Addr 0x3fdf000 : Length 4096
sg[4] - Addr 0xd4c0000 : Length 4096
sg[5] - Addr 0xb001000 : Length 4096
sg[6] - Addr 0x63e2000 : Length 4096
sg[7] - Addr 0x38a3000 : Length 4096
sg[8] - Addr 0x6a04000 : Length 4096
sg[9] - Addr 0x2de5000 : Length 4096
sg[10] - Addr 0x46e6000 : Length 4096
sg[11] - Addr 0x52c7000 : Length 4096
sg[12] - Addr 0x6ee8000 : Length 4096
sg[13] - Addr 0xa6c9000 : Length 4096
sg[14] - Addr 0x5d2a000 : Length 4096
sg[15] - Addr 0x3b0b000 : Length 4096
(sa0:ahc1:0:1:0): no longer in timeout, status = 34b
ahc1: Issued Channel A Bus Reset. 1 SCBs aborted
(sa0:ahc1:0:1:0): failed to write terminating filemark(s)
(sa0:ahc1:0:1:0): tape is now frozen- use an OFFLINE, REWIND or MTEOM command to clear this state.
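
Since the card state is dumped twice, it can help to compare the two dumps mechanically. The following is a minimal sketch in Python, assuming only the text layout shown above; parse_registers, changed_registers and sg_total_bytes are hypothetical helper names, not part of any FreeBSD tool.

```python
import re

# "NAME = 0x.." pairs, as printed on the register lines of the dump.
REG_RE = re.compile(r"\b([A-Z][A-Z0-9_]*) = (0x[0-9a-fA-F]+)")
# Sequencer address from the "Dumping Card State ... at SEQADDR 0x.." line.
SEQADDR_RE = re.compile(r"at SEQADDR (0x[0-9a-fA-F]+)")
# Scatter/gather entries: "sg[N] - Addr 0x... : Length NNNN".
SG_RE = re.compile(r"sg\[(\d+)\] - Addr (0x[0-9a-fA-F]+) : Length (\d+)")


def parse_registers(dump):
    """Return {register name: integer value} for one card state dump."""
    regs = {}
    for line in dump.splitlines():
        seq = SEQADDR_RE.search(line)
        if seq:
            regs["SEQADDR"] = int(seq.group(1), 16)
        for name, value in REG_RE.findall(line):
            regs[name] = int(value, 16)
    return regs


def changed_registers(first, second):
    """Return {name: (value in first, value in second)} where they differ."""
    names = sorted(set(first) | set(second))
    return {n: (first.get(n), second.get(n)) for n in names
            if first.get(n) != second.get(n)}


def sg_total_bytes(dump):
    """Sum the Length fields of the scatter/gather list in a dump."""
    return sum(int(length) for _, _, length in SG_RE.findall(dump))
```

Read by eye, the two dumps above differ only in SEQADDR (0x6c vs 0x6d) and SCSISIGI (0x4 vs 0x14), and each scatter/gather list totals 16 x 4096 = 65536 bytes.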

  Our SCSI bus is terminated properly.  The drives are not LVD.  Cables do not "run too close to the power supply."  Cable length does not exceed specification.  Cable quality is high -- replacing cables made no difference.  Decreasing speed from 40MB/sec to 20MB/sec made no difference.  Disabling SMP (via sysctl MIB) made no difference.

  The only thing I haven't tried is removing the drive from the library/changer system itself, and throwing it right off the main SCSI cable.

  We have no problems with the other Adaptec controller (although used for hard disks).  Both controllers use the same BIOS version.
>How-To-Repeat:
$ tar -b 512 -vpcf /dev/nsa0 shell2.la.best.com__sd*
shell2.la.best.com__sd0a.gz
shell2.la.best.com__sd0d.gz
shell2.la.best.com__sd0e.gz
shell2.la.best.com__sd0f.gz
shell2.la.best.com__sd0g.gz
shell2.la.best.com__sd0h.gz
shell2.la.best.com__sd1d.gz
shell2.la.best.com__sd1e.gz
shell2.la.best.com__sd2d.gz
tar: can't write to /dev/nsa0 : Input/output error

  Where the files in question total 1023875396 bytes (~1GB).

  Using a smaller blocksize results in the operation getting further, but still errors out:

$ tar -b 20 -vpcf /dev/nsa0 shell2.la.best.com__sd*
shell2.la.best.com__sd0a.gz
shell2.la.best.com__sd0d.gz
shell2.la.best.com__sd0e.gz
shell2.la.best.com__sd0f.gz
shell2.la.best.com__sd0g.gz
shell2.la.best.com__sd0h.gz
shell2.la.best.com__sd1d.gz
shell2.la.best.com__sd1e.gz
shell2.la.best.com__sd2d.gz
shell2.la.best.com__sd2e.gz
tar: can't write to /dev/nsa0 : Input/output error
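
For reference, tar's -b option is a blocking factor counted in 512-byte blocks, so the two runs above write very different record sizes. A small sketch of the arithmetic in Python follows; the function names are illustrative only, and the record count ignores tar's own headers and padding.

```python
TAR_BLOCK_BYTES = 512  # tar counts its blocking factor in 512-byte blocks


def record_bytes(blocking_factor):
    """Size in bytes of one tar record for a given -b value."""
    return blocking_factor * TAR_BLOCK_BYTES


def records_needed(payload_bytes, blocking_factor):
    """Records needed for payload_bytes, rounding the last one up.

    Ignores tar's per-file headers and padding, so this is a lower bound.
    """
    size = record_bytes(blocking_factor)
    return -(-payload_bytes // size)  # ceiling division


if __name__ == "__main__":
    total = 1023875396  # combined size of the files in the report
    for factor in (512, 20):
        print(factor, record_bytes(factor), records_needed(total, factor))
```

With -b 512 each record is 262144 bytes (256 KB); with -b 20 it is 10240 bytes (10 KB).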

  Blocksize set via mt is 512 bytes:

$ mt -f /dev/sa0.ctl status
Mode      Density              Blocksize      bpi      Compression
Current:  0x31                 512 bytes      0        0x3
---------available modes---------
0:        0x31                 512 bytes      0        0x3
1:        0x31                 512 bytes      0        0x3
2:        0x31                 512 bytes      0        0x3
3:        0x31                 512 bytes      0        0x3
---------------------------------
Current Driver State: at rest.
---------------------------------
File Number: 0  Record Number: 0    Residual Count 0

  Disabling hardware compression (mt comp off) makes no difference.

  The problem is 100% repeatable.
>Fix:
Fix unknown.
>Release-Note:
>Audit-Trail:

From: "Justin T. Gibbs" <gibbs@scsiguy.com>
To: Jeremy Chadwick <jdc@best.net>
Cc: freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: kern/30559: Intense SCSI tape access results in controller errors 
Date: Mon, 17 Sep 2001 11:16:07 -0600

 >>Description:
 >  Under heavy SCSI tape access, our system spits out the following on the
 > console.  Please note this applies to the ahc1 controller.
 
 This essentially tells us that the controller is waiting for the target to
 REQ the last bits of data on this transfer.  Either the target failed to see
 an ACK from the initiator, or the initiator failed to see a REQ from the target.
 
 >  Our SCSI bus is terminated properly.  The drives are not LVD.  Cables do
 > not "run too close to the power supply."  Cable length does not exceed
 > specification.  Cable quality is high -- replacing cables made no difference.
 > Decreasing speed from 40MB/sec to 20MB/sec made no difference.  Disabling SMP
 > (via sysctl MIB) made no difference.
 >
 >  The only thing I haven't tried is removing the drive from the library/changer
 >  system itself, and throwing it right off the main SCSI cable.
 
 Nonetheless, this is an "environmental" problem.  Perhaps your changer has
 a bad power supply.  Perhaps the changer design does not allow you to run
 with anything other than a very short cable (well below the maximum length
 allowed by the SCSI spec), etc.
 
 If you bootverbose, does the controller report the termination values
 you expect?
 
 --
 Justin
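
One way to act on the bootverbose suggestion above is to boot verbosely and then filter the saved boot messages for the controller's termination lines. The sketch below is in Python and rests on two assumptions: that the relevant lines start with the controller name (for example "ahc1:") and contain the word "termination", and that the boot messages are saved in /var/run/dmesg.boot. The exact wording printed by the driver is not reproduced here and may differ.

```python
import re


def termination_lines(boot_log, controller):
    """Return boot-log lines for `controller` that mention termination.

    Assumption: such lines begin with "<controller>:" and contain the
    word "termination" (matched case-insensitively).
    """
    prefix = re.compile(r"^" + re.escape(controller) + r":")
    return [line for line in boot_log.splitlines()
            if prefix.match(line) and "termination" in line.lower()]


def read_boot_log(path="/var/run/dmesg.boot"):
    """Read the saved boot messages; the default path is an assumption."""
    with open(path) as log:
        return log.read()
```

Calling termination_lines(read_boot_log(), "ahc1") after a verbose boot would list the candidate lines to compare against the expected termination settings.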

From: Jeremy Chadwick <jdc@best.net>
To: "Justin T. Gibbs" <gibbs@scsiguy.com>
Cc: freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: kern/30559: Intense SCSI tape access results in controller errors
Date: Mon, 17 Sep 2001 11:29:03 -0700

 On Mon, Sep 17, 2001 at 11:16:07AM -0600, gibbs@scsiguy.com wrote:
 > >>Description:
 > >  Under heavy SCSI tape access, our system spits out the following on the
 > > console.  Please note this applies to the ahc1 controller.
 > 
 > This essentially tells us that the controller is waiting for the target to
 > REQ the last bits of data on this transfer.  Either the target failed to see
 > an ACK from the initiator, or the initiator failed to see a REQ from the target.
 > 
 > >  Our SCSI bus is terminated properly.  The drives are not LVD.  Cables do
 > > not "run too close to the power supply."  Cable length does not exceed
 > > specification.  Cable quality is high -- replacing cables made no difference.
 > > Decreasing speed from 40MB/sec to 20MB/sec made no difference.  Disabling SMP
 > > (via sysctl MIB) made no difference.
 > >
 > >  The only thing I haven't tried is removing the drive from the library/changer
 > >  system itself, and throwing it right off the main SCSI cable.
 > 
 > Nonetheless, this is an "environmental" problem.  Perhaps your changer has
 > a bad power supply.  Perhaps the changer design does not allow you to run
 > with anything other than a very short cable (well below the maximum length
 > allowed by the SCSI spec), etc.
 > 
 > If you bootverbose, does the controller report the termination values
 > you expect?
 
         Thanks for getting back to me.
 
         In a "last attempt" to figure out the problem, we flipped our
         1st and 3rd SDX-500C drives (in the library system).  Oddly
         enough, the problem went away.
 
         This leads me to believe the problem relates to either a flakey
         SCSI port on one of the SDX-500C drives (the one reporting errors
         in my bug report), or possibly bad cabling within the library
         system itself.
 
         Since swapping the drive order, we've seen fantastic results in
         speed and stability from the drives.  Hence my diagnosis.
 
         This bug report can be closed.
 
 -- 
 | Jeremy Chadwick                                         jdc@best.net |
 | Best Internet/Verio Pacific                                 ext 8251 |
 | UNIX Systems Administrator                    Mountain View, CA, USA |
 | Verio - "the new world of business"                                  |
 
State-Changed-From-To: open->closed 
State-Changed-By: dwmalone 
State-Changed-When: Mon Sep 17 11:45:22 PDT 2001 
State-Changed-Why:  
Closed as submitter is happy with Justin's explanation. 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=30559 
>Unformatted:
