From daniels@borg.mit.edu  Mon Oct 28 19:47:44 1996
Received: from borg.mit.edu (BORG.MIT.EDU [18.239.1.69])
          by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id TAA21085
          for <FreeBSD-gnats-submit@freebsd.org>; Mon, 28 Oct 1996 19:46:40 -0800 (PST)
Received: (from root@localhost) by borg.mit.edu (8.7.5/8.7.3) id WAA00402; Mon, 28 Oct 1996 22:45:40 -0500 (EST)
Message-Id: <199610290345.WAA00402@borg.mit.edu>
Date: Mon, 28 Oct 1996 22:45:40 -0500 (EST)
From: "Daniel C. Stevenson" <daniels@borg.mit.edu>
Reply-To: daniels@borg.mit.edu
To: FreeBSD-gnats-submit@freebsd.org
Subject: NCR PCI error
X-Send-Pr-Version: 3.2

>Number:         1919
>Category:       kern
>Synopsis:       access to files/directories fails, gives NCR PCI error
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    se
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Oct 28 19:50:03 PST 1996
>Closed-Date:    Fri Feb 19 16:14:01 PST 1999
>Last-Modified:  Sat Feb 15 11:11:00 PST 2003
>Originator:     Daniel C. Stevenson
>Release:        FreeBSD 2.1-STABLE i386
>Organization:
>Environment:

The system has an NCR PCI controller. It has a 2GB SCSI hard
disk, 48MB of memory, and an ISA-based Ethernet card (3C509).

>Description:

The problem: access to various directories and files fails, giving an
error to the shell of "Input/output error" for the affected files or
directories. The console displays the following message repeatedly:

assertion "cp" failed: file "../../pci/ncr.c", line 5563
sd0(ncr0:0:0):COMMAND FAILED (4 28) @f0d7c400

(I'm not sure which of these 2 lines actually goes first, but the
pattern repeats continuously)

This problem seems to happen at random times. It has happened when the
system has been up for over a week (since the last reboot) or just a
few days after the last reboot. The system currently maintains a
moderate Web server load and 1 or 2 users. The problem seems to happen
independent of server load or any other discernible influences.

When the error happens, some but not all partitions of the disk
are affected, and only parts of them are affected.

>How-To-Repeat:

Wait for it to happen again.

>Fix:
	
A hard reboot (cycling the power). "reboot" doesn't work
("Input/output error") and using Ctrl-Alt-Del doesn't work either; in
the latter case, it results in a hung system at the "Boot:" prompt.
>Release-Note:
>Audit-Trail:

From: se@zpr.uni-koeln.de (Stefan Esser)
To: daniels@borg.mit.edu
Cc: FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/1919: NCR PCI error
Date: Tue, 29 Oct 1996 19:44:15 +0100

 Daniel C. Stevenson writes:
 > The system has an NCR PCI controller. It has a 2GB SCSI hard
 > disk, 48MB of memory, and an ISA-based Ethernet card (3C509).
 
 What model of disk drive is that ?
 
 > >Description:
 > 
 > The problem: access to various directories and files fails, giving an
 > error to the shell of "Input/output error" for the affected files or
 > directories. The console displays the following message repeatedly:
 > 
 > assertion "cp" failed: file "../../pci/ncr.c", line 5563
 > sd0(ncr0:0:0):COMMAND FAILED (4 28) @f0d7c400
 
 This is a secondary effect. Please send the lines ABOVE those.
 They should start with the word ERROR in capital letters ...
 
 > (I'm not sure which of these 2 lines actually goes first, but the
 > pattern repeats continuously)
 > 
 > This problem seems to happen at random times. It has happened when the
 > system has been up for over a week (since the last reboot) or just a
 > few days after the last reboot. The system currently maintains a
 > moderate Web server load and 1 or 2 users. The problem seems to happen
 > independent of server load or any other discernible influences.
 > 
 > When the error happens, some but not all partitions of the disk
 > are affected, and only parts of them are affected.
 
 This is quite interesting. Something like that might happen if you
 have tagged command queuing enabled, but the drive does not (always)
 support as many tags. The driver default is 4 tags, which should be
 safe, since most drives easily support at least 32.
 But I've got some reports that seem to indicate that certain drives
 don't support tags while doing error recovery after a failed read or write.
 
 I'm not certain about this, but you may want to
 
 1) make sure that automatic bad block replacement is enabled
    (see the "scsi" command, mode page 1, ARRE and AWRE)
 
 2) check whether not using tags does help ("ncrcontrol -s tags=0")
 
 > >How-To-Repeat:
 > 
 > Wait for it to happen again.
 
 Well, you should tell me how I might repeat it. Hmmm, just send me
 your system and I'll wait :)
 
 > >Fix:
 > 	
 > A hard reboot (cycling the power). "reboot" doesn't work
 > ("Input/output error") and using Ctrl-Alt-Del doesn't work either; in
 > the latter case, it results in a hung system at the "Boot:" prompt.
 
 Ctrl-Alt-Del does not work ???
 Hmmm ... Seems that the drive did lock up and does not even recover
 if faced with a SCSI bus reset ...
 
 I'll look into this again after I receive some more detailed information
 about your disk drive and whether the system works reliably without tags.
 
 Regards, STefan
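
Note on step 1 above: in the SCSI-2 Read-Write Error Recovery page (mode
page 1), AWRE and ARRE are the top two bits of byte 2. The following is a
minimal sketch, not part of the original exchange, of decoding those two
bits from a raw copy of the page. The function name and the sample bytes
are hypothetical, and obtaining the raw page from the drive is left out.

```python
# Bit masks for byte 2 of the SCSI-2 Read-Write Error Recovery page (page code 0x01).
AWRE = 0x80  # Automatic Write Reallocation Enabled
ARRE = 0x40  # Automatic Read Reallocation Enabled


def reallocation_flags(page: bytes) -> dict:
    """Decode AWRE/ARRE from a raw mode page that starts at its page-code byte.

    Hypothetical helper: byte 0 is PS plus page code, byte 1 is the page
    length, byte 2 carries the error recovery flags.
    """
    if len(page) < 3 or (page[0] & 0x3F) != 0x01:
        raise ValueError("not a Read-Write Error Recovery page")
    flags = page[2]
    return {"AWRE": bool(flags & AWRE), "ARRE": bool(flags & ARRE)}


# Made-up sample page with both reallocation bits set.
sample = bytes([0x81, 0x0A, 0xC0, 0x08, 0, 0, 0, 0, 0, 0, 0, 0])
print(reallocation_flags(sample))
```

If either flag comes back False, automatic bad block replacement is not
fully enabled for that direction (write or read).
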
State-Changed-From-To: open->feedback
State-Changed-By: se 
State-Changed-When: Tue Oct 29 11:33:40 PST 1996 
State-Changed-Why:  
More information on the disk drive and the exact error message is required. 
Responsible-Changed-From-To: freebsd-bugs->se 
Responsible-Changed-By: mpp 
Responsible-Changed-When: Tue Mar 25 18:29:35 PST 1997 
Responsible-Changed-Why:  
NCR/PCI failure, and the audit trail shows that Stefan was waiting 
for some feedback on this, and here it is: 

From Daniel C. Stevenson: 

Thanks for asking. Setting the tags with ncrcontrols worked, and I've had
no problems since. Is that a typo above, or is there a 3.0 close to 
 

Grr...guess I'll have to type it in by hand...here is what Daniel 
really sent to me: 

Thanks for asking.  Setting the tags with ncrcontrol worked, and I've had 
no problems since. ...[trimmed] 

From: Studded <Studded@san.rr.com>
To: freebsd-gnats-submit@freebsd.org, daniels@borg.mit.edu
Cc: 
Subject: Re: kern/1919: access to files/directories fails, gives NCR PCI error
Date: Wed, 08 Apr 1998 23:22:02 -0700

 Mail ping fails with:
 
    ----- The following addresses had permanent fatal errors -----
 <daniels@borg.mit.edu>
 
    ----- Transcript of session follows -----
 <daniels@borg.mit.edu>... Deferred: Connection refused by borg.mit.edu.
 Message could not be delivered for 5 days
 Message will be deleted from queue
State-Changed-From-To: feedback->closed 
State-Changed-By: se 
State-Changed-When: Fri Feb 19 16:14:01 PST 1999 
State-Changed-Why:  
This appears to have been another drive that required tags to be disabled. 
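
Note on the console message: if the second number in "COMMAND FAILED (4 28)"
is read as a hexadecimal SCSI status byte (an assumption; this report does
not say how the driver formats that line), 0x28 would be QUEUE FULL in
SCSI-2, which would fit the tag-related conclusion above. A minimal sketch
of that lookup follows; the function name and the regular expression are
hypothetical, and the status table lists the SCSI-2 status codes.

```python
import re

# SCSI-2 status byte values.
SCSI2_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x04: "CONDITION MET",
    0x08: "BUSY",
    0x10: "INTERMEDIATE",
    0x14: "INTERMEDIATE-CONDITION MET",
    0x18: "RESERVATION CONFLICT",
    0x22: "COMMAND TERMINATED",
    0x28: "QUEUE FULL",
}

# Assumed layout: "COMMAND FAILED (<first> <second>)", both numbers in hex.
FAILED = re.compile(r"COMMAND FAILED \(([0-9a-fA-F]+) ([0-9a-fA-F]+)\)")


def scsi_status_name(console_line):
    """Return the SCSI-2 status name for the second number, or None if no match."""
    match = FAILED.search(console_line)
    if match is None:
        return None
    status = int(match.group(2), 16)
    return SCSI2_STATUS.get(status, "unknown (0x%02x)" % status)


print(scsi_status_name("sd0(ncr0:0:0):COMMAND FAILED (4 28) @f0d7c400"))
```
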
>Unformatted:
