From nobody@FreeBSD.ORG  Fri Jan 14 03:56:17 2000
Return-Path: <nobody@FreeBSD.ORG>
Received: by hub.freebsd.org (Postfix, from userid 32767)
	id E013515294; Fri, 14 Jan 2000 03:56:16 -0800 (PST)
Message-Id: <20000114115616.E013515294@hub.freebsd.org>
Date: Fri, 14 Jan 2000 03:56:16 -0800 (PST)
From: borki@xs.use.ch
Sender: nobody@FreeBSD.ORG
To: freebsd-gnats-submit@FreeBSD.org
Subject: Unexpected busfree with AIC 7890/91 (ASUS P2B-DS)
X-Send-Pr-Version: www-1.0

>Number:         16121
>Category:       kern
>Synopsis:       Unexpected busfree with AIC 7890/91 (ASUS P2B-DS)
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    ken
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Jan 14 04:00:02 PST 2000
>Closed-Date:    Mon Apr 3 14:42:49 PDT 2000
>Last-Modified:  Mon Apr  3 14:43:41 PDT 2000
>Originator:     Reto Burkhalter
>Release:        3.3-RELEASE
>Organization:
>Environment:
FreeBSD magnum.bumbacher.ch 3.3-RELEASE FreeBSD 3.3-RELEASE #0: Sat Sep 18 18:25:16 CEST 1999     root@magnum2.xs.use.ch:/usr/src/sys/compile/MAGNUM  i386
>Description:
We have recently had some problems with one of our FreeBSD 3.3-RELEASE boxes.

We had an uptime of over 100 days without any problems, even though the
machine was sometimes quite heavily loaded.

I have syslogd set up to also log to my home machine. The last thing
I received there is this (it is not in the /var/log/messages file of the
affected machine):

Jan 11 16:25:40 <server> /kernel: Unexpected busfree.  LASTPHASE == 0x0

We first thought it might be a problem with the uptime or the load of
the machine, but two days later it happened again (in the afternoon
when the system is quite idle).
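
For reference, the LASTPHASE value in the kernel message above can be read
as a SCSI bus phase. The sketch below is only an illustration. The constants
are assumed to mirror the phase encoding used in the aic7xxx register
definitions (C/D = 0x80, I/O = 0x40, MSG = 0x20, with 0x01 marking bus free),
and `phase_name` is a hypothetical helper, not part of any driver. Under that
assumption, 0x0 would correspond to the DATA OUT phase, meaning the host was
sending data to the target.

```python
# Assumed phase encoding (top three bits of the LASTPHASE value).
PHASE_MASK = 0xE0
P_BUSFREE = 0x01  # assumed special value for "bus free"

PHASES = {
    0x00: "DATA OUT",     # host -> target data
    0x40: "DATA IN",      # target -> host data
    0x80: "COMMAND",
    0xA0: "MESSAGE OUT",
    0xC0: "STATUS",
    0xE0: "MESSAGE IN",
}


def phase_name(lastphase: int) -> str:
    """Map a LASTPHASE register value to a SCSI bus phase name."""
    if lastphase == P_BUSFREE:
        return "BUS FREE"
    # Any other bit pattern (0x20, 0x60) is a reserved phase.
    return PHASES.get(lastphase & PHASE_MASK, "reserved")


if __name__ == "__main__":
    # Value taken from the kernel message in this report.
    print(phase_name(0x0))
```

This would be consistent with a write being in flight when the bus went
free, but the value alone does not identify the faulty component.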

The remote staff gave me some kernel debug output that was repeating
on the console. The system was halted and reported
something like this:

swap_pager: indefinit wait buffer: device 0x2047, blk_no, 3496, size 4096


The machine specs from 'dmesg':

Copyright (c) 1992-1999 FreeBSD Inc.
Copyright (c) 1982, 1986, 1989, 1991, 1993
	The Regents of the University of California. All rights reserved.
FreeBSD 3.3-RELEASE #0: Sat Sep 18 18:25:16 CEST 1999
    root@magnum2.xs.use.ch:/usr/src/sys/compile/MAGNUM
Timecounter "i8254"  frequency 1193182 Hz
CPU: Pentium III (686-class CPU)
  Origin = "GenuineIntel"  Id = 0x672  Stepping = 2
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,<b25>>
real memory  = 805306368 (786432K bytes)
avail memory = 780619776 (762324K bytes)
Programming 24 pins in IOAPIC #0
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id:  1, version: 0x00040011, at 0xfee00000
 cpu1 (AP):  apic id:  0, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id:  2, version: 0x00170011, at 0xfec00000
Preloaded elf kernel "kernel" at 0xc0299000.
Pentium Pro MTRR support enabled
Probing for devices on PCI bus 0:
chip0: <Intel 82443BX host to PCI bridge> rev 0x03 on pci0.0.0
chip1: <Intel 82443BX host to AGP bridge> rev 0x03 on pci0.1.0
chip2: <Intel 82371AB PCI to ISA bridge> rev 0x02 on pci0.4.0
chip3: <Intel 82371AB Power management controller> rev 0x02 on pci0.4.3
ahc0: <Adaptec aic7890/91 Ultra2 SCSI adapter> rev 0x00 int a irq 19 on pci0.6.0
ahc0: aic7890/91 Wide Channel A, SCSI Id=7, 16/255 SCBs
fxp0: <Intel EtherExpress Pro 10/100B Ethernet> rev 0x05 int a irq 17 on pci0.11.0
fxp0: Ethernet address 00:90:27:3c:44:cc
vga0: <ATI model 4756 graphics accelerator> rev 0x7a int a irq 16 on pci0.12.0
Probing for devices on PCI bus 1:
Probing for devices on the ISA bus:
sc0 on isa
sc0: VGA color <16 virtual consoles, flags=0x0>
atkbdc0 at 0x60-0x6f on motherboard
atkbd0 irq 1 on isa
sio0 at 0x3f8-0x3ff irq 4 flags 0x10 on isa
sio0: type 16550A
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
vga0 at 0x3b0-0x3df maddr 0xa0000 msize 131072 on isa
npx0 on motherboard
npx0: INT 16 interface
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: routing 8254 via pin 2
Waiting 5 seconds for SCSI devices to settle
SMP: AP CPU #1 Launched!
sa0 at ahc0 bus 0 target 1 lun 0
sa0: <HP C1533A A708> Removable Sequential Access SCSI-2 device 
sa0: 10.000MB/s transfers (10.000MHz, offset 32)
da0 at ahc0 bus 0 target 0 lun 0
da0: <CMD TECH CRD-5440-1 C1-9> Fixed Direct Access SCSI-2 device 
da0: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da0: 17496MB (35831808 512 byte sectors: 255H 63S/T 2230C)
changing root device to da0s1a
WARNING: / was not properly dismounted

---

The motherboard is an ASUS P2B-DS. To the onboard SCSI controller we
attached a CMD CRD-5440 RAID controller. This controller had
these two messages in its event log when the system died:

Host Error      | Connection timed out by Host Port
Host Error      | DEVICE RESET

At the bottom of this report, I have attached the exact screens from
the event logger.


Because this machine is in production and we cannot power-cycle
it remotely, we need a definitive solution to this problem as
soon as possible.

My questions are:

Is this error caused by one of the following:

a) a defective onboard AIC chip?
b) a defective RAID controller?
c) a software bug in the AIC driver of FreeBSD 3.3-RELEASE?
d) something else?

What can I do to prevent this error from happening again?


Thanks for any help!!!!

Reto Burkhalter
borki@xs.use.ch



Attachment: Exact log entries from CMD CRD 5440:
------------------------------------------------------------------------------
System Information:

                           CRD-5440-1 Monitor Utility                01-11-00   
                            MANUFACTURING INFORMATION                18:58:06   

      +---------------+--------------+------------------+----------------+
      | Model Number  | 5440         | Processor Type   | 33310          |
      | Serial Number | 00006507     | Processor Clock  | 40 MHz         |
      | Date Of Mfg.  | 08-13-99     | Processor Memory | Processor DRAM |
      | DRAM 0 ID     | A2           | SIMM 0A Size     | 32 MB          |
      | DRAM 1 ID     | 00           | SIMM 0B Size     |                |
      | XOR ID        | A2           | SIMM 1A Size     |                |
      | PIC ID        | A2           | SIMM 1B Size     |                |
      +---------------+--------------+------------------+----------------+
      | Channel 0     | 16-Bit Single-Ended Ultra Host Module, ID A2, 57 |
      | Channel 1     | 16-Bit Single-Ended Ultra Disk Module, ID A2, 57 |
      | Channel 2     | 16-Bit Single-Ended Ultra Disk Module, ID A2, 57 |
      | Channel 3     | 16-Bit Single-Ended Ultra Disk Module, ID A2, 57 |
      | Channel 4     | No Slot                                          |
      | Channel 5     | No Slot                                          |
      | Channel 6     | No Slot                                          |
      | Channel 7     | No Slot                                          |
      | Channel 8     | No Slot                                          |
      +---------------+--------------------------------------------------+

 CTRL-Z: EXIT |                                                                

----------------------------------------------------------------------
Error Log (Event Log):

                           CRD-5440-1 Monitor Utility                01-11-00
                                    EVENT LOG                        18:50:24

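+-----------------+-----------------------+------------------+----------------+
| Sequence Number | 0                     | Date             | 01-11-00       |
| Recorded Event  | HOST                  | Time             | 16:25:41       |
+-----------------+-----------------------+------------------+----------------+
| Host Channel    | 00                    | Tag Type         | Simple         |
| Host LUN        | 00                    | Tag Number       | 0E             |
| Host Initiator  | 07                    | Host SCSI Status | 00             |
+-----------------+-----------------------+------------------+----------------+
| SCSI Command    | 0A 1A C7 5F 80 00 00 00 00 00 00 00 00 00 00 00           |
| SCSI Sense Data | F0 00 00 00 00 00 00 0A 00 00 00 00 00 00 00 00 00 00     |
| SCSI Chip Stat. | 9A                                                        |
| SCSI Chip Intr. | 02                                                        |
| CDRP            | 80120508                                                  |
| CDRP Flags      | 0000                                                      |
| HTCB Flags      | 19                                                        |
| CDRP Host Flags | 0028                                                      |
| Host Error      | Connection timed out by Host Port                         |
|                 |                                                           |
+-----------------+-----------------------------------------------------------+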
                                                                                
UP ARROW: NEXT EVL | DOWN ARROW: PREV EVL | F: FLTR | C: CLR LOG | CTRL-Z: EXIT


                           CRD-5440-1 Monitor Utility                01-11-00
                                    EVENT LOG                        18:52:19

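+-----------------+-----------------------+------------------+----------------+
| Sequence Number | 1                     | Date             | 01-11-00       |
| Recorded Event  | HOST                  | Time             | 16:25:41       |
+-----------------+-----------------------+------------------+----------------+
| Host Channel    | 00                    | Tag Type         | Simple         |
| Host LUN        | 00                    | Tag Number       | 0B             |
| Host Initiator  | 07                    | Host SCSI Status | 00             |
+-----------------+-----------------------+------------------+----------------+
| SCSI Command    | 0A 1A C4 DF 80 00 00 00 00 00 00 00 00 00 00 00           |
| SCSI Sense Data | F0 00 06 00 00 00 00 0A 00 00 00 00 29 00 00 00 00 00     |
| SCSI Chip Stat. | 86                                                        |
| SCSI Chip Intr. | 08                                                        |
| CDRP            | 80122728                                                  |
| CDRP Flags      | 000C                                                      |
| HTCB Flags      | 02                                                        |
| CDRP Host Flags | 0008                                                      |
| Host Error      | DEVICE RESET                                              |
|                 |                                                           |
+-----------------+-----------------------------------------------------------+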
                                                                                
UP ARROW: NEXT EVL | DOWN ARROW: PREV EVL | F: FLTR | C: CLR LOG | CTRL-Z: EXIT


==============================================================================
(The date/time from syslog need not exactly match the date/time from the
controller's event log. The two are very close, but I cannot determine which
event really came first.)
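
The SCSI Command and SCSI Sense Data fields in the two event log screens
above can be decoded by hand. The following is a minimal sketch, assuming
the controller logs the CDB left-aligned in its 16-byte field (opcode 0x0A
is the six-byte WRITE(6) command in SCSI-2) and the sense data in fixed
format (response code 0x70). The function names are hypothetical, and the
small lookup tables cover only the values that occur in this log.

```python
# Lookup tables limited to the values seen in this event log.
SENSE_KEYS = {0x0: "NO SENSE", 0x6: "UNIT ATTENTION"}
ASC_TEXT = {
    (0x00, 0x00): "no additional sense information",
    (0x29, 0x00): "power on, reset, or bus device reset occurred",
}


def decode_cdb6(raw: bytes) -> dict:
    """Decode a SCSI-2 six-byte CDB such as WRITE(6), opcode 0x0A."""
    lun = raw[1] >> 5                                   # top 3 bits of byte 1
    lba = ((raw[1] & 0x1F) << 16) | (raw[2] << 8) | raw[3]  # 21-bit LBA
    blocks = raw[4] or 256                              # 0 means 256 blocks
    return {"opcode": raw[0], "lun": lun, "lba": lba, "blocks": blocks}


def decode_fixed_sense(raw: bytes) -> dict:
    """Decode the fields of interest from fixed-format sense data."""
    key = raw[2] & 0x0F
    asc, ascq = raw[12], raw[13]
    return {
        "valid": bool(raw[0] & 0x80),
        "response_code": raw[0] & 0x7F,
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "asc_ascq": (asc, ascq),
        "meaning": ASC_TEXT.get((asc, ascq), "not in table"),
    }


# First six command bytes and the sense bytes, copied from the two events.
EVENTS = [
    ("0A 1A C7 5F 80 00",
     "F0 00 00 00 00 00 00 0A 00 00 00 00 00 00 00 00 00 00"),
    ("0A 1A C4 DF 80 00",
     "F0 00 06 00 00 00 00 0A 00 00 00 00 29 00 00 00 00 00"),
]

if __name__ == "__main__":
    for cdb_hex, sense_hex in EVENTS:
        print(decode_cdb6(bytes.fromhex(cdb_hex)))
        print(decode_fixed_sense(bytes.fromhex(sense_hex)))
```

Under these assumptions, both commands are 128-block writes to LUN 0 (LBAs
1754975 and 1754335). The second event carries UNIT ATTENTION with ASC/ASCQ
29h/00h, which matches the DEVICE RESET entry in the log.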

---

>How-To-Repeat:
Don't know... Waiting for it to happen.
>Fix:
-

>Release-Note:
>Audit-Trail:

From: "Kenneth D. Merry" <ken@kdm.org>
To: borki@xs.use.ch
Cc: freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: kern/16121: Unexpected busfree with AIC 7890/91 (ASUS P2B-DS)
Date: Sat, 15 Jan 2000 15:26:09 -0700

 On Fri, Jan 14, 2000 at 03:56:16 -0800, borki@xs.use.ch wrote:
 > >Environment:
 > FreeBSD magnum.bumbacher.ch 3.3-RELEASE FreeBSD 3.3-RELEASE #0: Sat Sep 18 18:25:16 CEST 1999     root@magnum2.xs.use.ch:/usr/src/sys/compile/MAGNUM  i386
 > >Description:
 > We recently have some problems with one of our FreeBSD 3.3-RELEASE boxes.
 > 
 > We had an uptime of over 100 days without any problems, the machine
 > was sometimes quite heavily loaded.
 > 
 > I have syslogd set up to log also to my home machine, the last thing
 > I receive is this (it's not in the /var/log/messages file of this
 > particular machine):
 > 
 > Jan 11 16:25:40 <server> /kernel: Unexpected busfree.  LASTPHASE == 0x0
 > 
 > We first thought it might be a problem with the uptime or the load of
 > the machine, but two days later it happened again (in the afternoon
 > when the system is quite idle).
 
 In general, an Unexpected busfree means that the device the Adaptec driver
 was talking to went off the bus when it was instead expected to transmit
 data.
 
 That often indicates a problem with the device in question.  That may or
 may not be the problem in your case.  See below.
 
 > The remote staff gave me some kernel debug output information, that
 > was on the console (repeating), the system was halted and reported
 > something like this:
 > 
 > swap_pager: indefinit wait buffer: device 0x2047, blk_no, 3496, size 4096
 
 [ ... ]
 
 > ahc0: <Adaptec aic7890/91 Ultra2 SCSI adapter> rev 0x00 int a irq 19 on pci0.6.0
 > ahc0: aic7890/91 Wide Channel A, SCSI Id=7, 16/255 SCBs
 
 [ ... ]
 
 There is a bug in the Adaptec 7890 that wasn't worked around until after
 FreeBSD 3.3 was released.  To get the fix, you'll need a version of the
 Adaptec driver from after September 20th, 1999.
 
 > Motherboard is an ASUS-P2B-DS, to the onboard SCSI Controller we
 > attached an CMD CRD-5440 RAID Controller. This controller had
 > this two messages in the event log, when the system died.
 > 
 > Host Error      | Connection timed out by Host Port
 > Host Error      | DEVICE RESET
 > 
 > At the bottom of this file, I will attach the exact screen from
 > the event logger.
 > 
 > 
 > Because this machine is in production and we can not power-cycle
 > it remotely, we need an ultimate solution to this problem as
 > fast as possible.
 > 
 > My questions are:
 > 
 > Is this error caused by either 
 > 
 > a) a defective onboard AIC chip?
 
 Yes, but the bug has been worked around.  See above.
 
 > b) a defective RAID controller?
 > c) a software bug in the AIC driver of FreeBSD 3.3-RELEASE?
 
 Yes and no.  You probably need the fixed Adaptec driver, which wasn't
 checked in until September 20th.
 
 > d) something else?
 > 
 > What may I do to prevent this error from happening again?
 
 I recommend that you upgrade to 3.4, or a more recent 3.4-stable snapshot
 release.
 
 There is no guarantee that an updated driver will fix your problem, of
 course.  But you can at least avoid the problem with the Adaptec 7890.
 
 Ken
 -- 
 Kenneth Merry
 ken@kdm.org
 

From: Reto Burkhalter <borki@xs.use.ch>
To: freebsd-gnats-submit@FreeBSD.org, borki@xs.use.ch
Cc:  
Subject: Re: kern/16121: Unexpected busfree with AIC 7890/91 (ASUS P2B-DS)
Date: Mon, 03 Apr 2000 23:12:02 +0200

 The problem seems to be solved. A disk in the drive array had failed and
 possibly caused timeouts that the RAID controller couldn't handle.
 There have been no problems since the new disk was put into the array.
 
 Thanks for all your help, especially Ken!
 -Reto
 
 
State-Changed-From-To: open->closed 
State-Changed-By: ken 
State-Changed-When: Mon Apr 3 14:42:49 PDT 2000 
State-Changed-Why:  
Submitter reports that the bug is fixed. 


Responsible-Changed-From-To: freebsd-bugs->ken 
Responsible-Changed-By: ken 
Responsible-Changed-When: Mon Apr 3 14:42:49 PDT 2000 
Responsible-Changed-Why:  
I'm closing this one. 
>Unformatted:
