From luigi@iet.unipi.it  Fri Apr  4 06:54:00 1997
Received: from prova.iet.unipi.it (prova.iet.unipi.it [131.114.9.236])
          by freefall.freebsd.org (8.8.5/8.8.5) with SMTP id GAA11199
          for <FreeBSD-gnats-submit@freebsd.org>; Fri, 4 Apr 1997 06:53:28 -0800 (PST)
Received: (from luigi@localhost) by prova.iet.unipi.it (8.6.12/8.6.12) id QAA00589; Fri, 4 Apr 1997 16:45:24 +0200
Message-Id: <199704041445.QAA00589@prova.iet.unipi.it>
Date: Fri, 4 Apr 1997 16:45:24 +0200
From: Luigi Rizzo <luigi@iet.unipi.it>
Reply-To: luigi@iet.unipi.it
To: FreeBSD-gnats-submit@freebsd.org
Subject: ahc panic
X-Send-Pr-Version: 3.2

>Number:         3195
>Category:       i386
>Synopsis:       ahc panic
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    gibbs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Apr  4 07:00:01 PST 1997
>Closed-Date:    Sat May 23 01:06:29 PDT 1998
>Last-Modified:  Sat May 23 01:06:55 PDT 1998
>Originator:     Luigi Rizzo
>Release:        FreeBSD 2.2.1-RELEASE
>Organization:
DEIT
>Environment:

	PPro, ASUS MB (Natoma), Adaptec2940U (PCI),
	unknown SCSI CD-WRITER (T.YUDEN CD-WO EW-50)

	$Id: aic7xxx.c,v 1.81.2.17 1997/03/24 19:17:33 gibbs Exp $

>Description:

	Trying to access /dev/worm0 causes the following message (from
	memory, it was on the console...):

	    worm0: ahc(0:4:0) SCB 0x0 timed out in command phase, SCSISIGI
		== 0x84, SEQADDR == 0x42
	    worm0: ahc(0:4:0) abort message in message buffer
	    worm0: ahc(0:4:0) SCB 0x0 abort completed

	    panic: Couldn't find busy SCB


	The relevant file for the error message is /sys/i386/scsi/aic7xxx.c
	tagged

	    $Id: aic7xxx.c,v 1.81.2.17 1997/03/24 19:17:33 gibbs Exp $

	While I have no interest in this particular Worm drive (except
	that it costs 2/3 the price of a Philips) I do have an
	interest in having a 2940 working...

>How-To-Repeat:

	Just access /dev/worm0, e.g. dd if=/dev/worm0

	This causes the panic, in a very repeatable way.

>Fix:
>Release-Note:
>Audit-Trail:

From: Petr Lampa <lampa@fee.vutbr.cz>
To: freebsd-gnats-submit@freebsd.org, luigi@iet.unipi.it
Cc:
Subject: Re: i386/3195: ahc panic
Date: Fri, 04 Apr 1997 17:47:07 +0200

 I have similar problems on Adaptec 3940W and 3940U controllers with
 2 disks per channel under FreeBSD-GAMMA, FreeBSD-2.2, and
 FreeBSD-2.2.1. The system runs for only a few days (sometimes
 only hours) and then one of the following errors occurs:
 
 no active SCB for reconnection target - issuing ABORT
 SAVED_TCL == 0x0 (0x10)
 SCB 0x3 - timed out in message in phase, SCSISIGI == 0xf4
 SEQADDR == 0x42
 Yucky Immediate reset. Flags == 0x1
 
 OR
 
 SCB 0x1 timed out in datain phase, SCSISIGI == 0x44
 SEQADDR == 0x126
 abort message in message buff
 SCB 0x3 timed out in datain phase, SCSISIGI == 0x54
 
 and the system hangs. When I break into the kernel debugger, the
 trace is always:
 
 ahc_scsi_cmd()
 sdstart()
 free_xs()
 scsi_done()
 ahc_done()
 ahc_run_done_queue()
 ahc_reset_channel()
 ahc_timeout()
 
 It looks as if the bus reset never succeeds and loops forever.
 Both systems ran 2.1.6 without problems for months.
 The AHC driver is compiled without any options (no tag queueing,
 no memio, no scb paging).
 
 						Petr Lampa

From: "Justin T. Gibbs" <gibbs@plutotech.com>
To: luigi@iet.unipi.it
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: i386/3195: ahc panic 
Date: Fri, 04 Apr 1997 09:53:00 -0700

 Have you tried tracking the 2.2 branch and picking up recent changes
 to the driver?
 
 What AHC options are you using?
 
 --
 Justin T. Gibbs
 ===========================================
   FreeBSD: Turning PCs into workstations
 ===========================================
 
 

From: Petr Lampa <lampa@fee.vutbr.cz>
To: luigi@labinfo.iet.unipi.it (Luigi Rizzo)
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: i386/3195: ahc panic
Date: Mon, 7 Apr 1997 09:13:19 +0200 (MET DST)

 > > From: Petr Lampa <lampa@fee.vutbr.cz>
 > > To: freebsd-gnats-submit@freebsd.org, luigi@iet.unipi.it
 > > Cc:  Subject: Re: i386/3195: ahc panic
 > > Date: Fri, 04 Apr 1997 17:47:07 +0200
 > > 
 > >  I have got similar problems on Adaptec 3940W and 3940U with
 > >  2 disks per channel and FreeBSD-GAMME, FreeBSD-2.2, and 
 > >  FreeBSD-2.2.1. System works only for several days (sometimes
 > 
 > Have you tried updating aic7xxx.seq (the microcode) to the latest version?
 > There is one dated 4/4/97 (tonight!); I haven't tried it yet.
 > 
 
 Still the same problems; after some time this error appears:
 
 ahc1:A:1: no active SCB for reconnecting target - issuing ABORT
 SAVED_TCL == 0x10
 ahc1:A:1: Target did not send an IDENTIFY message
 LASTPHAS = 0x0, SAVED_TCL = 0x10
 
 and the system loops in the ahc driver; every time I break into the kernel debugger, the stack is:
 
 ahc_scsi_cmd
 scsi_scsi_cmd
 sdstart
 free_xs
 scsi_done
 ahc_done
 ahc_run_done_queue
 ahc_reset_channel
 
 I tried changing the bit of code that probably causes the hang:
 
 
 *** aic7xxx.c.old	Mon Apr  7 09:05:34 1997
 --- aic7xxx.c	Sun Apr  6 21:47:23 1997
 ***************
 *** 3668,3673 ****
 --- 3668,3674 ----
   	printf("Clearing 'in-reset' flag\n");
   	ahc->in_reset &= (args->bus == 'A' ? ~CHANNEL_A_RESET
   					   : ~CHANNEL_B_RESET);
 + 	ahc_run_done_queue(ahc);
   	splx(s);
   }
   
 ***************
 *** 3753,3759 ****
   		ahc_clear_intstat(ahc);
   		restart_sequencer(ahc);
   	}
 ! 	ahc_run_done_queue(ahc);
   	return found;
   }
   
 --- 3754,3760 ----
   		ahc_clear_intstat(ahc);
   		restart_sequencer(ahc);
   	}
 ! 	if (!initiate_reset) ahc_run_done_queue(ahc);
   	return found;
   }
   
 
 After that, the system no longer loops in the ahc driver, but it cannot
 recover from an ahc timeout. Here is the syslog:
 
 Apr  7 05:00:29 boco /kernel: ahc1:A:0: no active SCB for reconnecting target - issuing ABORT
 Apr  7 05:00:29 boco /kernel: SAVED_TCL == 0x0
 Apr  7 05:00:39 boco /kernel: sd4(ahc1:1:0): SCB 0x1 - timed out in message in phase, SCSISIGI == 0xf4
 Apr  7 05:00:39 boco /kernel: SEQADDR == 0x42
 Apr  7 05:00:39 boco /kernel: sd3(ahc1:0:0): abort message in message buffer
 Apr  7 05:00:39 boco /kernel: sd4(ahc1:1:0): SCB 0x3 timedout while recovery in progress
 Apr  7 05:00:39 boco /kernel: sd3(ahc1:0:0): SCB 0x2 - timed out in message in phase, SCSISIGI == 0xf4
 Apr  7 05:00:39 boco /kernel: SEQADDR == 0x42
 Apr  7 05:00:39 boco /kernel: ahc1: Issued Channel A Bus Reset. 4 SCBs aborted
 Apr  7 05:00:39 boco /kernel: Clearing bus reset
 Apr  7 05:00:39 boco /kernel: Timedout SCB handled by another timeout
 Apr  7 05:00:39 boco /kernel: Clearing 'in-reset' flag
 Apr  7 05:00:39 boco /kernel: sd3(ahc1:0:0): no longer in timeout
 Apr  7 05:00:39 boco /kernel: sd3(ahc1:0:0): UNIT ATTENTION asc:29,0
 Apr  7 05:00:39 boco /kernel: sd3(ahc1:0:0):  Power on, reset, or bus device reset occurred field replaceable unit: 14
 Apr  7 05:00:39 boco /kernel: , retries:3
 Apr  7 05:00:49 boco /kernel: sd4(ahc1:1:0): SCB 0x3 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
 Apr  7 05:00:49 boco /kernel: SEQADDR == 0x5
 Apr  7 05:00:49 boco /kernel: sd4(ahc1:1:0): SCB 3: Immediate reset.  Flags = 0x401
 Apr  7 05:00:49 boco /kernel: ahc1: Issued Channel A Bus Reset. 2 SCBs aborted
 Apr  7 05:00:49 boco /kernel: Timedout SCB handled by another timeout
 Apr  7 05:00:49 boco /kernel: Clearing bus reset
 Apr  7 05:00:49 boco /kernel: Clearing 'in-reset' flag
 Apr  7 05:00:49 boco /kernel: sd4(ahc1:1:0): no longer in timeout
 Apr  7 05:00:59 boco /kernel: sd4(ahc1:1:0): SCB 0x3 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
 Apr  7 05:00:59 boco /kernel: SEQADDR == 0x5
 Apr  7 05:00:59 boco /kernel: sd4(ahc1:1:0): SCB 3: Immediate reset.  Flags = 0x401
 Apr  7 05:00:59 boco /kernel: ahc1: Issued Channel A Bus Reset. 2 SCBs aborted
 Apr  7 05:00:59 boco /kernel: Timedout SCB handled by another timeout
 Apr  7 05:00:59 boco /kernel: Clearing bus reset
 Apr  7 05:00:59 boco /kernel: Clearing 'in-reset' flag
 Apr  7 05:00:59 boco /kernel: sd4(ahc1:1:0): no longer in timeout
 Apr  7 05:01:09 boco /kernel: sd4(ahc1:1:0): SCB 0x3 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
 Apr  7 05:01:09 boco /kernel: SEQADDR == 0x7
 Apr  7 05:01:09 boco /kernel: sd4(ahc1:1:0): SCB 3: Immediate reset.  Flags = 0x401
 Apr  7 05:01:09 boco /kernel: ahc1: Issued Channel A Bus Reset. 2 SCBs aborted
 Apr  7 05:01:09 boco /kernel: Timedout SCB handled by another timeout
 Apr  7 05:01:09 boco /kernel: Clearing bus reset
 Apr  7 05:01:09 boco /kernel: Clearing 'in-reset' flag
 Apr  7 05:01:09 boco /kernel: sd4(ahc1:1:0): no longer in timeout
 Apr  7 05:01:19 boco /kernel: sd4(ahc1:1:0): SCB 0x3 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
 Apr  7 05:01:19 boco /kernel: SEQADDR == 0x5
 Apr  7 05:01:19 boco /kernel: sd4(ahc1:1:0): SCB 3: Immediate reset.  Flags = 0x401
 Apr  7 05:01:19 boco /kernel: ahc1: Issued Channel A Bus Reset. 2 SCBs aborted
 Apr  7 05:01:19 boco /kernel: Timedout SCB handled by another timeout
 Apr  7 05:01:19 boco /kernel: Clearing bus reset
 Apr  7 05:01:19 boco /kernel: Clearing 'in-reset' flag
 Apr  7 05:01:19 boco /kernel: sd4(ahc1:1:0): no longer in timeout
 Apr  7 05:01:20 boco /kernel: sd3(ahc1:0:0): UNIT ATTENTION asc:29,0
 Apr  7 05:01:20 boco /kernel: sd3(ahc1:0:0):  Power on, reset, or bus device reset occurred field replaceable unit: 14
 Apr  7 05:01:20 boco /kernel: , retries:4
 Apr  7 05:01:29 boco /kernel: sd4(ahc1:1:0): SCB 0x3 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
 Apr  7 05:01:29 boco /kernel: SEQADDR == 0x4
 Apr  7 05:01:29 boco /kernel: sd4(ahc1:1:0): SCB 3: Immediate reset.  Flags = 0x401
 Apr  7 05:01:30 boco /kernel: ahc1: Issued Channel A Bus Reset. 2 SCBs aborted
 Apr  7 05:01:30 boco /kernel: Timedout SCB handled by another timeout
 Apr  7 05:01:30 boco /kernel: Clearing bus reset
 Apr  7 05:01:30 boco /kernel: Clearing 'in-reset' flag
 Apr  7 05:01:30 boco /kernel: sd4(ahc1:1:0): no longer in timeout
 Apr  7 05:01:40 boco /kernel: sd4(ahc1:1:0): SCB 0x3 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
 ....
 
 							Petr Lampa
 
 -- 
 Department of Computer Science and Engineering  E-mail: lampa@fee.vutbr.cz
 Faculty of El. Engineering and Comp. Science	Phone: (+420 5) 7275/225,111
 Technical University of Brno			Fax:  (+420 5) 41211141
 Bozetechova 2, 612 66 Brno, Czech Republic
State-Changed-From-To: open->feedback 
State-Changed-By: joerg 
State-Changed-When: Sat Aug 23 16:17:12 MEST 1997 
State-Changed-Why:  

Please update to the most recent version of 2.2-stable, and see 
whether the problem still persists.  A number of bugs have been 
fixed in ahc(4) since. 


Responsible-Changed-From-To: freebsd-bugs->gibbs 
Responsible-Changed-By: joerg 
Responsible-Changed-When: Sat Aug 23 16:17:12 MEST 1997 
Responsible-Changed-Why:  
Justin's field. 

From: "Greg A. Woods" <woods@zeus.leitch.com>
To: freebsd-gnats-submit@freebsd.org, lampa@fee.vutbr.cz
Cc: woods@zeus.leitch.com, tholmes@zeus.leitch.com
Subject: Re: i386/3195: ahc panic
Date: Thu, 12 Feb 1998 12:40:51 -0500

 I've just managed to hang a system running RELENG_2_2 cvsup'ed on
 1998/01/26.
 
 We have a P2 w/ 128MB RAM, Adaptec AHC-2940UW connected to an Adaptec
 UltraRAID using their AEC-4312A controller.
 
 I ran two simultaneous pax copies, one from an NFS mount over 100Mb
 local Ethernet, and the other from the IDE system disk, both targeted
 at the array filesystem.
 
 Soon there were literally zillions of kernel messages from both
 ahc0 and sd0 about timeouts, bus resets, etc.  Now the SCSI bus
 is hung (the RAID array shows the host active light steady), and
 virtual console switching is frozen (though numlock/capslock et al
 still work).
 
 The last kernel message is:
 
 sd0(ahc0:0:0): SCB 0x - timed out while idle, LASTPHASE == 0x1 SCSISIGI
 == 0xf7
 SEQADDR = 0x58 SCSISEQ = 0x12 SSTAT0 = 0x5 SSTAT1 = 0x2
 sd0(ahc0:0:0): Queueing an Abort SCB
 
 The array works fine if activity is kept to a minimum, such as typing
 single commands to create, copy, remove, fsck, etc.
 
 So, I'd say there are still lots of timing problems in ahc.
 
 -- 
 							Greg A. Woods
 
 +1 416 443-1734      VE3TCP      <gwoods@acm.org>      <robohack!woods>
 Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>
State-Changed-From-To: feedback->closed 
State-Changed-By: phk 
State-Changed-When: Sat May 23 01:06:29 PDT 1998 
State-Changed-Why:  
timed out.  CAM is hopefully able to solve this problem. 
>Unformatted:
