From dillon@flea.best.net  Tue Jul 28 00:51:49 1998
Received: from flea.best.net (root@flea.best.net [206.184.139.131])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id AAA25575
          for <FreeBSD-gnats-submit@freebsd.org>; Tue, 28 Jul 1998 00:51:48 -0700 (PDT)
          (envelope-from dillon@flea.best.net)
Received: (from dillon@localhost)
	by flea.best.net (8.9.0/8.9.0/best.fl) id AAA15626;
	Tue, 28 Jul 1998 00:51:17 -0700 (PDT)
Message-Id: <199807280751.AAA15626@flea.best.net>
Date: Tue, 28 Jul 1998 00:51:17 -0700 (PDT)
From: Matt Dillon <dillon@best.net>
Reply-To: dillon@best.net
To: FreeBSD-gnats-submit@freebsd.org
Subject: biodone: buffer not busy panics
X-Send-Pr-Version: 3.2

>Number:         7424
>Category:       kern
>Synopsis:       Machine crashes do not occur very often, but when they do occur it's usually a panic on biodone: buffer not busy.
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    dillon
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Jul 28 01:00:01 PDT 1998
>Closed-Date:    Mon Dec 10 14:47:28 PST 2001
>Last-Modified:  Mon Dec 10 14:49:32 PST 2001
>Originator:     Matt Dillon
>Release:        FreeBSD 2.2.6-STABLE i386
>Organization:
Best Internet Communications
>Environment:

	FreeBSD-stable from CVS (somewhere between 2.2.6 and 2.2.7).
	FreeBSD-current

    All of our boxes use Adaptec 2940UW boards, and nearly all of them have
    single-CPU PPro-200 motherboards.  The boot messages are similar to
    those shown below.  Tagged queueing is enabled with these options:

    options         AHC_TAGENABLE
    options         AHC_ALLOW_MEMIO


    ahc0 <Adaptec 2940 Ultra SCSI host adapter> rev 0 int a irq 11 on pci0:12:0
    ahc0: aic7880 Wide Channel, SCSI Id=7, 16 SCBs
    ahc0 waiting for scsi devices to settle
    ahc0: target 0 Tagged Queuing Device
    (ahc0:0:0): "SEAGATE ST34371W 0484" type 0 fixed SCSI 2
    sd0(ahc0:0:0): Direct-Access 4148MB (8496884 512 byte sectors)
    sd0(ahc0:0:0): with 5172 cyls, 10 heads, and an average 164 sectors/track
    ahc0: target 1 Tagged Queuing Device
    (ahc0:1:0): "SEAGATE ST19171W 0023" type 0 fixed SCSI 2
    sd1(ahc0:1:0): Direct-Access 8683MB (17783112 512 byte sectors)
    sd1(ahc0:1:0): with 5268 cyls, 20 heads, and an average 168 sectors/track
    ahc0: target 2 Tagged Queuing Device
    (ahc0:2:0): "SEAGATE ST19171W 0023" type 0 fixed SCSI 2
    sd2(ahc0:2:0): Direct-Access 8683MB (17783112 512 byte sectors)
    sd2(ahc0:2:0): with 5268 cyls, 20 heads, and an average 168 sectors/track



>Description:

    I'm submitting this bug report even though I don't have a hard
    backtrace.  Unfortunately, as the console log below shows, the nature
    of the panic generally prevents us from getting a dump.  I think it's
    worth keeping the PR in the bug list.  I've set the severity to serious
    since it is a crash, but the priority to low because it only happens
    about once a month per machine.  It is still an important stability
    issue, because 'biodone: buffer not busy' panics are responsible for
    most of the crashes we see these days.  Fixing it would considerably
    increase machine reliability.

    Our -stable machines, around 40 of them, each tend to crash about
    once a month, so across all of them we see roughly one crash a day.
    Individually, they do not crash very often, but when they do, many
    of the crashes are 'biodone: buffer not busy' panics.  These crashes
    are sometimes preceded by kernel printf messages from the SCSI
    subsystem.

    We have also seen this crash on our FreeBSD-current test box.

					    -Matt

(FROM CONSOLE LOGS)

ahc0: WARNING no command for scb 4 (cmdcmplt)
QOUTCNT == 8
panic: biodone: buffer not busy
Debugger("panic")


db> trace
_Debugger(f0113258) at _Debugger+0x35
_panic(f012ebe9,f1adb080,f10a7c00,f3b51114,f1aabfd0) at _panic+0x5a
_biodone(f3b51114,f1adb080,f10a7c00,1,f1adb080) at _biodone+0x30
_scsi_done(f1adb080,f1aafa60,f1ab0800,40000,f01db4e1) at _scsi_done+0x84
_ahc_done(f1ab0800,f1aafa60) at _ahc_done+0x155
_ahc_intr(f1ab0800,0,27,efbf0027,40000) at _ahc_intr+0x1c7
Xresume11() at Xresume11+0x2b
--- interrupt, eip = 0xa5c4, esp = 0xefbffff0, ebp = 0xefbfb8b8 ---
db> pani
panic: from debugger

dumping to dev 401, offset 786432
dump panic: biodone: buffer not busy

dumping to dev 401, offset 786432
dump device not ready
Automatic reboot in 15 seconds - press a key on the console to abort
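The trace shows the completion path ahc_intr -> ahc_done -> scsi_done -> biodone.
The following is a simplified user-space model, not kernel code, assuming that
biodone() panics when the buffer's B_BUSY flag is clear and that the flag is
cleared when the consumer releases the buffer (as brelse() does). It shows how
one request reported complete twice by the controller ends in exactly this
panic. struct buf_model, the MODEL_* flags and the *_model functions are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for struct buf flag bits (not the kernel's values). */
#define MODEL_B_BUSY 0x01u  /* buffer owned by an in-flight I/O */
#define MODEL_B_DONE 0x02u  /* I/O has completed */

struct buf_model {
    unsigned flags;
};

/* Hand the buffer to the driver: mark it busy for the duration of the I/O. */
static void start_io_model(struct buf_model *bp)
{
    bp->flags = MODEL_B_BUSY;
}

/*
 * Completion, as reached via ahc_done -> scsi_done -> biodone in the trace.
 * Returns 0 on success, or -1 where the kernel would instead
 * panic("biodone: buffer not busy").
 */
static int biodone_model(struct buf_model *bp)
{
    assert(bp != NULL);
    if ((bp->flags & MODEL_B_BUSY) == 0) {
        fprintf(stderr, "model: biodone: buffer not busy\n");
        return -1;
    }
    bp->flags |= MODEL_B_DONE;
    return 0;
}

/* The consumer releases the buffer (assumed to mirror brelse()), clearing busy. */
static void release_model(struct buf_model *bp)
{
    bp->flags &= ~MODEL_B_BUSY;
}

/*
 * Replay one request whose completion the controller reports `reports` times.
 * Returns how many of those completions would have panicked.
 */
static int replay_completions(int reports)
{
    struct buf_model b;
    int failures = 0;

    start_io_model(&b);
    for (int i = 0; i < reports; i++) {
        if (biodone_model(&b) != 0)
            failures++;
        else
            release_model(&b);
    }
    return failures;
}
```

With a single report the request completes cleanly; with two (one genuine
completion plus one stale one) the second call hits the "buffer not busy"
check. In the model the buffer is never reused in between; in the kernel a
reused, busy buffer could instead be completed silently for the wrong I/O.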

>How-To-Repeat:

	The panics cannot be deterministically reproduced, but occur
	around once a month per machine.

>Fix:
	


>Release-Note:
>Audit-Trail:

From: Tor.Egge@fast.no
To: dillon@best.net
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/7424: biodone: buffer not busy panics
Date: Tue, 28 Jul 1998 18:09:00 +0200

 >     We have also seen this crash on our FreeBSD-current test box.
 
 I'm using some patches to the SCSI driver in an attempt to
 track down this bug.
 
 ---------
 Index: scsi/scsi_base.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/scsi/scsi_base.c,v
 retrieving revision 1.54
 diff -u -r1.54 scsi_base.c
 --- scsi_base.c	1998/02/20 13:37:39	1.54
 +++ scsi_base.c	1998/02/20 21:48:09
 @@ -110,6 +110,7 @@
  	struct scsi_link *sc_link;	/* who to credit for returning it */
  	u_int32_t flags;
  {
 +	xs->flags &= ~INUSE;
  	xs->next = next_free_xs;
  	next_free_xs = xs;
  
 Index: i386/scsi/aic7xxx.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/i386/scsi/aic7xxx.c,v
 retrieving revision 1.126
 diff -u -r1.126 aic7xxx.c
 --- aic7xxx.c	1997/09/27 19:38:27	1.126
 +++ aic7xxx.c	1998/04/07 01:24:50
 @@ -769,6 +769,14 @@
  						qoutcnt);
  					continue;
  				}
 +				if ((scb->flags & SCB_ON_CMPLETE_QUEUE) != 0) {
 +					printf("%s: WARNING "
 +					       "scb %d already "
 +					       "on cmplete queue\n",
 +					       ahc_name(ahc), scb_index);
 +					continue;
 +				}
 +				scb->flags |= SCB_ON_CMPLETE_QUEUE;
  				STAILQ_INSERT_TAIL(&ahc->cmplete_scbs, scb,
  						   links);
  			}
 @@ -786,6 +794,8 @@
  			}
  			while((scb = ahc->cmplete_scbs.stqh_first) != NULL) {
  				STAILQ_REMOVE_HEAD(&ahc->cmplete_scbs, links);
 +				if ((scb->flags & SCB_ON_CMPLETE_QUEUE) == 0)
 +					panic("ahc_intr: scb not on cmplete_scbs queue");
  				/*
  				 * Save off the residual if there is one.
  				 */
 @@ -816,6 +826,9 @@
  							 scb->xs->error);
  					ahc_run_done_queue(ahc);
  				}
 +				if ((scb->flags & SCB_ON_CMPLETE_QUEUE) == 0)
 +					panic("ahc_intr: scb not on cmplete_scbs queue");
 +				scb->flags &= ~SCB_ON_CMPLETE_QUEUE;
  				ahc_done(ahc, scb);
  			}
  			ahc_outb(ahc, CLRINT, CLRCMDINT);
 @@ -1883,6 +1896,12 @@
  	 * (SCSI_ERR_OK in FreeBSD), we don't have to care this case.
  	 */
  #endif
 +	if ((xs->flags & ITSDONE) != 0)
 +	  panic("ahc_done: scsi_xfer already done");
 +	if ((xs->flags & INUSE) == 0)
 +	  panic("ahc_done: scsi_xfer is unused");
 +	if ((scb->flags & SCB_ON_CMPLETE_QUEUE) != 0)
 +	  panic("ahc_done: scb on cmplete_scbs queue");
  	xs->flags |= ITSDONE;
  #ifdef AHC_TAGENABLE
  	/*
 @@ -2583,6 +2602,8 @@
  	opri = splbio();
  
  	/* Clean up for the next user */
 +	if (scb->flags & SCB_ON_CMPLETE_QUEUE)
 +		panic("ahc_free_scb: scb on cmplete_scbs queue");
  	scb->flags = SCB_FREE;
  	hscb->control = 0;
  	hscb->status = 0;
 @@ -2858,7 +2879,8 @@
  		DELAY(1000);
  		if (ahc_inb(ahc, INTSTAT) & INT_PEND)
  			break;
 -	} if (wait == 0) {
 +	}
 +	if (wait == 0) {
  		printf("%s: board is not responding\n", ahc_name(ahc));
  		return (EIO);
  	}
 Index: i386/scsi/aic7xxx.h
 ===================================================================
 RCS file: /home/ncvs/src/sys/i386/scsi/aic7xxx.h,v
 retrieving revision 1.43
 diff -u -r1.43 aic7xxx.h
 --- aic7xxx.h	1997/08/15 19:27:43	1.43
 +++ aic7xxx.h	1997/10/26 18:13:38
 @@ -164,7 +164,8 @@
  	SCB_MSGOUT_SDTR		= 0x0400,
  	SCB_MSGOUT_WDTR		= 0x0800,
  	SCB_ABORT		= 0x1000,
 -	SCB_QUEUED_ABORT	= 0x2000
 +	SCB_QUEUED_ABORT	= 0x2000,
 +	SCB_ON_CMPLETE_QUEUE	= 0x4000
  } scb_flag;
  
  /*
 ---------
 
 The result is an occasional burst of syslog messages:
 ------
 ahc1: WARNING scb 4 already on cmplete queue
 ahc1: WARNING scb 4 already on cmplete queue
 ahc1: WARNING scb 4 already on cmplete queue
 ahc1: WARNING scb 4 already on cmplete queue
 ahc1: WARNING scb 4 already on cmplete queue
 ahc1: WARNING scb 4 already on cmplete queue
 ahc1: WARNING scb 4 already on cmplete queue
 ahc1: WARNING scb 4 already on cmplete queue
 ahc1: WARNING scb 4 already on cmplete queue
 ahc1: WARNING scb 4 already on cmplete queue
 ahc1: WARNING scb 4 already on cmplete queue
 ahc1: WARNING scb 4 already on cmplete queue
 ahc1: WARNING scb 4 already on cmplete queue
 ahc1: WARNING scb 4 already on cmplete queue
 ahc1: WARNING no command for scb 4 (cmdcmplt)
 QOUTCNT == 1
 ------
 
 This indicates a problem with the FIFO of completed requests: the same
 SCB index is reported as complete several times in a row.
 
 I've not had problems getting memory dumps when this problem occurs,
 but this is due to the dump partition being located on a disk
 connected to ahc0, not ahc1.
 
 - Tor Egge
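The detection technique in Tor Egge's patch (a per-SCB flag recording
membership in cmplete_scbs, checked before insertion and cleared before
ahc_done() runs) can be modeled in user space. This is a sketch under stated
assumptions, not the driver code: struct scb_model, the hand-rolled tail queue,
drain_fifo() and init_scbs() are hypothetical, and the hardware FIFO is modeled
as a plain array of SCB indices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NSCB 16  /* assumed: the boot log above reports 16 SCBs for ahc0 */

/* Hypothetical flag bit mirroring SCB_ON_CMPLETE_QUEUE from the patch. */
#define MODEL_SCB_ON_CMPLETE_QUEUE 0x4000u

struct scb_model {
    int index;
    unsigned flags;
    struct scb_model *next;  /* link for the completion queue */
};

struct cmplete_queue {
    struct scb_model *head;
    struct scb_model *tail;
};

static void init_scbs(struct scb_model scbs[NSCB])
{
    for (int i = 0; i < NSCB; i++) {
        scbs[i].index = i;
        scbs[i].flags = 0;
        scbs[i].next = NULL;
    }
}

static void enqueue_tail(struct cmplete_queue *q, struct scb_model *scb)
{
    scb->next = NULL;
    if (q->tail != NULL)
        q->tail->next = scb;
    else
        q->head = scb;
    q->tail = scb;
}

static struct scb_model *dequeue_head(struct cmplete_queue *q)
{
    struct scb_model *scb = q->head;

    if (scb != NULL) {
        q->head = scb->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    return scb;
}

/*
 * Drain `n` SCB indices read from the modeled FIFO.  An SCB already on the
 * completion queue is reported and skipped rather than queued twice.
 * Returns the number of SCBs completed; *dups receives the warning count.
 */
static int drain_fifo(struct scb_model scbs[NSCB], const int *fifo, int n,
                      int *dups)
{
    struct cmplete_queue q = { NULL, NULL };
    int completed = 0;

    *dups = 0;
    for (int i = 0; i < n; i++) {
        assert(fifo[i] >= 0 && fifo[i] < NSCB);
        struct scb_model *scb = &scbs[fifo[i]];
        if (scb->flags & MODEL_SCB_ON_CMPLETE_QUEUE) {
            printf("model: WARNING scb %d already on cmplete queue\n",
                   scb->index);
            (*dups)++;
            continue;
        }
        scb->flags |= MODEL_SCB_ON_CMPLETE_QUEUE;
        enqueue_tail(&q, scb);
    }

    struct scb_model *scb;
    while ((scb = dequeue_head(&q)) != NULL) {
        /* Invariant that the patch enforces with panic(). */
        assert(scb->flags & MODEL_SCB_ON_CMPLETE_QUEUE);
        scb->flags &= ~MODEL_SCB_ON_CMPLETE_QUEUE;
        completed++;  /* ahc_done() would run here, once per SCB */
    }
    return completed;
}
```

Feeding it a FIFO of {4, 4, 4, 2} completes SCBs 4 and 2 once each and prints
two duplicate warnings for SCB 4, similar to the burst in the syslog output.
The flag only keeps a duplicate index from reaching ahc_done() twice; it does
not explain why the FIFO reports the same index repeatedly.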
Responsible-Changed-From-To: freebsd-bugs->dillon 
Responsible-Changed-By: johan 
Responsible-Changed-When: Thu Aug 10 23:43:30 PDT 2000 
Responsible-Changed-Why:  
Let Matt handle his own PRs. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=7424 
State-Changed-From-To: open->closed 
State-Changed-By: dillon 
State-Changed-When: Mon Dec 10 14:47:28 PST 2001 
State-Changed-Why:  
Closed For Winter. 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=7424 
>Unformatted:
