From marijn@gewis.win.tue.nl Fri Feb 26 09:15:30 1999
Return-Path: <marijn@gewis.win.tue.nl>
Received: from mailhost.tue.nl (mailhost.tue.nl [131.155.2.5])
	by hub.freebsd.org (Postfix) with ESMTP id 991D514FB3
	for <FreeBSD-gnats-submit@freebsd.org>; Fri, 26 Feb 1999 09:15:21 -0800 (PST)
	(envelope-from marijn@gewis.win.tue.nl)
Received: from gewis.win.tue.nl [131.155.71.116] by mailhost.tue.nl (8.9.0)
	  for <FreeBSD-gnats-submit@freebsd.org>
	  id SAA08829 (ESMTP). Fri, 26 Feb 1999 18:15:01 +0100 (MET)
Received: (from marijn@localhost)
	by gewis.win.tue.nl (8.9.2/8.9.2) id SAA41198;
	Fri, 26 Feb 1999 18:15:00 +0100 (CET)
	(envelope-from marijn)
Message-Id: <199902261715.SAA41198@gewis.win.tue.nl>
Date: Fri, 26 Feb 1999 18:15:00 +0100 (CET)
From: marijn@gewis.win.tue.nl
Reply-To: marijn@gewis.win.tue.nl
To: FreeBSD-gnats-submit@freebsd.org
Subject: Crash of 3.1-STABLE system due to scsi error. 
X-Send-Pr-Version: 3.2

>Number:         10281
>Category:       kern
>Synopsis:       Crash of 3.1-STABLE system due to scsi error.
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Feb 26 10:10:08 PST 1999
>Closed-Date:    Wed May 23 14:22:57 PDT 2001
>Last-Modified:  Wed May 23 14:23:16 PDT 2001
>Originator:     Marijn Meijles
>Release:        FreeBSD 3.1-STABLE i386
>Organization:
Student association GEWIS
>Environment:

Feb 24 22:17:06 gewis /kernel: Copyright (c) 1992-1999 FreeBSD Inc.
Feb 24 22:17:06 gewis /kernel: Copyright (c) 1982, 1986, 1989, 1991, 1993
Feb 24 22:17:06 gewis /kernel: The Regents of the University of California. All rights reserved.
Feb 24 22:17:06 gewis /kernel: FreeBSD 3.1-STABLE #0: Wed Feb 24 20:34:15 CET 1999
Feb 24 22:17:06 gewis /kernel: marijn@gewis.win.tue.nl:/usr/src/sys/compile/GEWIS
Feb 24 22:17:06 gewis /kernel: Timecounter "i8254"  frequency 1193182 Hz
Feb 24 22:17:06 gewis /kernel: Timecounter "TSC"  frequency 199739596 Hz
Feb 24 22:17:06 gewis /kernel: CPU: Pentium/P55C (199.74-MHz 586-class CPU)
Feb 24 22:17:06 gewis /kernel: Origin = "GenuineIntel"  Id = 0x543  Stepping=3
Feb 24 22:17:06 gewis /kernel: Features=0x8001bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,MMX>
Feb 24 22:17:06 gewis /kernel: real memory  = 67108864 (65536K bytes)
Feb 24 22:17:06 gewis /kernel: avail memory = 62799872 (61328K bytes)
Feb 24 22:17:06 gewis /kernel: Preloaded elf kernel "kernel" at 0xf0254000.
Feb 24 22:17:06 gewis /kernel: Probing for devices on PCI bus 0:
Feb 24 22:17:06 gewis /kernel: chip0: <VIA 82C585 (Apollo VP1/VPX) system controller> rev 0x23 on pci0.0.0
Feb 24 22:17:07 gewis /kernel: chip1: <VIA 82C586 PCI-ISA bridge> rev 0x27 on pci0.7.0
Feb 24 22:17:07 gewis /kernel: ide_pci0: <VIA 82C586x (Apollo) Bus-master IDE controller> rev 0x06 on pci0.7.1
Feb 24 22:17:07 gewis /kernel: vga0: <S3 Trio 64 graphics accelerator> rev 0x16 int a irq 9 on pci0.13.0
Feb 24 22:17:07 gewis /kernel: ed1: <NE2000 PCI Ethernet (RealTek 8029)> rev 0x00 int a irq 12 on pci0.14.0
Feb 24 22:17:07 gewis /kernel: ed1: address 00:4f:49:04:6f:c7, type NE2000 (16 bit)
Feb 24 22:17:07 gewis /kernel: Probing for devices on the ISA bus:
Feb 24 22:17:07 gewis /kernel: sc0 on isa
Feb 24 22:17:07 gewis /kernel: sc0: VGA color <16 virtual consoles, flags=0x0>
Feb 24 22:17:07 gewis /kernel: atkbdc0 at 0x60-0x6f on motherboard
Feb 24 22:17:07 gewis /kernel: atkbd0 irq 1 on isa
Feb 24 22:17:07 gewis /kernel: sio0 at 0x3f8-0x3ff irq 4 on isa
Feb 24 22:17:07 gewis /kernel: sio0: type 16550A
Feb 24 22:17:07 gewis /kernel: sio1 at 0x2f8-0x2ff irq 3 on isa
Feb 24 22:17:07 gewis /kernel: sio1: type 16550A
Feb 24 22:17:07 gewis /kernel: lpt0 at 0x378-0x37f irq 7 on isa
Feb 24 22:17:07 gewis /kernel: lpt0: Interrupt-driven port
Feb 24 22:17:07 gewis /kernel: lp0: TCP/IP capable interface
Feb 24 22:17:07 gewis /kernel: lpt-266371084: this driver is deprecated; use ppbus instead.
Feb 24 22:17:07 gewis /kernel: fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
Feb 24 22:17:07 gewis /kernel: fdc0: FIFO enabled, 8 bytes threshold
Feb 24 22:17:07 gewis /kernel: fd0: 1.44MB 3.5in
Feb 24 22:17:07 gewis /kernel: wdc0 at 0x1f0-0x1f7 irq 14 on isa
Feb 24 22:17:07 gewis /kernel: wdc0: unit 0 (wd0): <WDC AC36400L>
Feb 24 22:17:07 gewis /kernel: wd0: 6149MB (12594960 sectors), 13328 cyls, 15 heads, 63 S/T, 512 B/S
Feb 24 22:17:07 gewis /kernel: wdc0: unit 1 (atapi): <TOSHIBA CD-ROM XM-6102B/1106>, removable, accel, ovlap, dma, iordy
Feb 24 22:17:07 gewis /kernel: wdc0: ATAPI CD-ROMs not configured
Feb 24 22:17:07 gewis /kernel: wdc1 at 0x170-0x177 irq 15 on isa
Feb 24 22:17:07 gewis /kernel: wdc1: unit 0 (wd2): <WDC AC24300L>
Feb 24 22:17:07 gewis /kernel: wd2: 4112MB (8421840 sectors), 8912 cyls, 15 heads, 63 S/T, 512 B/S
Feb 24 22:17:07 gewis /kernel: aha0 at 0x330-0x333 irq 11 drq 5 on isa
Feb 24 22:17:07 gewis /kernel: aha0: AHA-1542CF FW Rev. B.0 (ID=45) SCSI Host Adapter, SCSI ID 7, 16 CCBs
Feb 24 22:17:07 gewis /kernel: vga0 at 0x3b0-0x3df maddr 0xa0000 msize 131072 on isa
Feb 24 22:17:07 gewis /kernel: npx0 flags 0x1 on motherboard
Feb 24 22:17:07 gewis /kernel: npx0: INT 16 interface
Feb 24 22:17:07 gewis /kernel: Intel Pentium detected, installing workaround for F00F bug
Feb 24 22:17:07 gewis /kernel: IP packet filtering initialized, divert disabled, rule-based forwarding disabled, logging limited to 100 packets/entry
Feb 24 22:17:07 gewis /kernel: Waiting 5 seconds for SCSI devices to settle
Feb 24 22:17:07 gewis /kernel: sa0 at aha0 bus 0 target 0 lun 0
Feb 24 22:17:07 gewis /kernel: sa0: <HP T4000s 1.08> Removable Sequential Access SCSI-2 device 
Feb 24 22:17:07 gewis /kernel: sa0: 3.300MB/s transfers
Feb 24 22:17:07 gewis /kernel: da0 at aha0 bus 0 target 2 lun 0
Feb 24 22:17:07 gewis /kernel: da0: <FUJITSU M2694ES-512 812A> Fixed Direct Access SCSI-2 device 
Feb 24 22:17:07 gewis /kernel: aha0: ahafetchtransinfo - Inquire Setup Info Failed
Feb 24 22:17:07 gewis /kernel: da0: 3.300MB/s transfers
Feb 24 22:17:07 gewis /kernel: da0: 1033MB (2117025 512 byte sectors: 64H 32S/T 1033C)
Feb 24 22:17:07 gewis /kernel: da1 at aha0 bus 0 target 3 lun 0
Feb 24 22:17:07 gewis /kernel: da1: <FUJITSU M2266S-512 0014> Fixed Direct Access SCSI-CCS device 
Feb 24 22:17:07 gewis /kernel: da1: 3.300MB/s transfers
Feb 24 22:17:07 gewis /kernel: da1: 1030MB (2111018 512 byte sectors: 64H 32S/T 1030C)
Feb 24 22:17:07 gewis /kernel: da2 at aha0 bus 0 target 4 lun 0
Feb 24 22:17:07 gewis /kernel: da2: <FUJITSU M2266S-512 0020> Fixed Direct Access SCSI-CCS device 
Feb 24 22:17:07 gewis /kernel: da2: 3.300MB/s transfers
Feb 24 22:17:07 gewis /kernel: da2: 1030MB (2111018 512 byte sectors: 64H 32S/T 1030C)
Feb 24 22:17:07 gewis /kernel: da3 at aha0 bus 0 target 5 lun 0
Feb 24 22:17:07 gewis /kernel: da3: <FUJITSU M2624F-512 0405> Fixed Direct Access SCSI-2 device 
Feb 24 22:17:07 gewis /kernel: da3: 3.300MB/s transfers
Feb 24 22:17:07 gewis /kernel: da3: 496MB (1015812 512 byte sectors: 64H 32S/T 496C)
Feb 24 22:17:07 gewis /kernel: changing root device to wd0s1a
Feb 24 22:17:07 gewis /kernel: (cd0:aha0:0:1:0): CCB 0xf3e0333c - timed out
Feb 24 22:17:07 gewis /kernel: (cd0:aha0:0:1:0): CCB 0xf3e0333c - timed out
Feb 24 22:17:07 gewis /kernel: aha0: No longer in timeout
Feb 24 22:17:07 gewis /kernel: cd0 at aha0 bus 0 target 1 lun 0
Feb 24 22:17:07 gewis /kernel: cd0: <HP CD-Writer 6020 1.07> Removable CD-ROM SCSI-2 device 
Feb 24 22:17:07 gewis /kernel: cd0: 3.300MB/s transfers
Feb 24 22:17:07 gewis /kernel: cd0: Attempt to query device size failed: NOT READY, Logical unit not ready, cause not reportable

>Description:
 
 While making a backup using afio, we got the following log entries:
 
 Feb 26 10:11:07 gewis /kernel: (sa0:aha0:0:0:0): SPACE. CDB: 11 1 0 0 1 0 
 Feb 26 10:11:07 gewis /kernel: (sa0:aha0:0:0:0): BLANK CHECK req sz: 1 (decimal) asc:0,5
 Feb 26 10:11:07 gewis /kernel: (sa0:aha0:0:0:0): End-of-data detected
 
 The entries above appeared while reading an empty tape, so there is no problem here.
 
 
 Feb 26 10:19:47 gewis /kernel: (sa0:aha0:0:0:0): SPACE. CDB: 11 1 ff ff ff 0 
 Feb 26 10:19:47 gewis /kernel: (sa0:aha0:0:0:0): ILLEGAL REQUEST asc:2c,0
 Feb 26 10:19:47 gewis /kernel: (sa0:aha0:0:0:0): Command sequence error
 Feb 26 10:19:47 gewis /kernel: (sa0:aha0:0:0:0): unable to backspace over one of double filemarks at EOD- opting for safety
 Feb 26 14:15:36 gewis /kernel: (sa0:aha0:0:0:0): SPACE. CDB: 11 1 ff ff ff 0 
 Feb 26 14:15:36 gewis /kernel: (sa0:aha0:0:0:0): ILLEGAL REQUEST asc:2c,0
 Feb 26 14:15:36 gewis /kernel: (sa0:aha0:0:0:0): Command sequence error
 Feb 26 14:15:36 gewis /kernel: (sa0:aha0:0:0:0): unable to backspace over one of double filemarks at EOD- opting for safety
 
 These two were generated during the backup. After the last entry there are
 no more log entries from the tape drive until the reboot. At 14:33 it
 crashed with a page fault in CAM. I'm sorry I don't have the details:
 I wasn't at the console at the time, and the guy who was remembers only
 this.
 Oh, it tried to sync the filesystems, but that failed with 'giving up'.
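
 The SPACE CDBs in the log entries above can be decoded by hand. The
 following is a minimal sketch, assuming the standard SCSI-2 SPACE(6)
 layout (opcode 0x11, code in the low three bits of byte 1, signed 24-bit
 count in bytes 2-4). The struct and function names are illustrative only
 and are not taken from the FreeBSD tape driver.

```c
#include <stdint.h>

/* Decoded fields of a 6-byte SCSI SPACE command (illustrative type). */
struct space_cdb {
    uint8_t opcode;  /* 0x11 for SPACE(6) */
    uint8_t code;    /* 0 = blocks, 1 = filemarks, 3 = end-of-data */
    int32_t count;   /* signed count; negative means space backwards */
};

/* Decode the six CDB bytes as they are printed in the kernel log. */
static struct space_cdb decode_space(const uint8_t cdb[6])
{
    struct space_cdb d;
    uint32_t raw = ((uint32_t)cdb[2] << 16) |
                   ((uint32_t)cdb[3] << 8)  |
                    (uint32_t)cdb[4];

    d.opcode = cdb[0];
    d.code = cdb[1] & 0x07;
    /* Sign-extend the 24-bit two's-complement count. */
    d.count = (raw & 0x800000) ? (int32_t)raw - 0x1000000 : (int32_t)raw;
    return d;
}
```

 Under these assumptions, "11 1 0 0 1 0" decodes to a forward space over
 one filemark, and "11 1 ff ff ff 0" decodes to a count of -1, that is, a
 backspace over one filemark. That agrees with the driver's own message
 about being unable to backspace over a filemark at EOD.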
 
>How-To-Repeat:

I don't know. I think it will repeat when I make another backup, but I'd
rather not try...


>Fix:
	
	


>Release-Note:
>Audit-Trail:

From: "Kenneth D. Merry" <ken@plutotech.com>
To: marijn@gewis.win.tue.nl
Cc: FreeBSD-gnats-submit@FreeBSD.ORG, mjacob@feral.com
Subject: Re: kern/10281: Crash of 3.1-STABLE system due to scsi error.
Date: Sat, 27 Feb 1999 21:45:20 -0700 (MST)

 marijn@gewis.win.tue.nl wrote...
 > Feb 24 22:17:07 gewis /kernel: Waiting 5 seconds for SCSI devices to settle
 > Feb 24 22:17:07 gewis /kernel: sa0 at aha0 bus 0 target 0 lun 0
 > Feb 24 22:17:07 gewis /kernel: sa0: <HP T4000s 1.08> Removable Sequential Access SCSI-2 device 
 > Feb 24 22:17:07 gewis /kernel: sa0: 3.300MB/s transfers
 > Feb 24 22:17:07 gewis /kernel: da0 at aha0 bus 0 target 2 lun 0
 > Feb 24 22:17:07 gewis /kernel: da0: <FUJITSU M2694ES-512 812A> Fixed Direct Access SCSI-2 device 
 > Feb 24 22:17:07 gewis /kernel: aha0: ahafetchtransinfo - Inquire Setup Info Failed
 > Feb 24 22:17:07 gewis /kernel: da0: 3.300MB/s transfers
 > Feb 24 22:17:07 gewis /kernel: da0: 1033MB (2117025 512 byte sectors: 64H 32S/T 1033C)
 > Feb 24 22:17:07 gewis /kernel: da1 at aha0 bus 0 target 3 lun 0
 > Feb 24 22:17:07 gewis /kernel: da1: <FUJITSU M2266S-512 0014> Fixed Direct Access SCSI-CCS device 
 > Feb 24 22:17:07 gewis /kernel: da1: 3.300MB/s transfers
 > Feb 24 22:17:07 gewis /kernel: da1: 1030MB (2111018 512 byte sectors: 64H 32S/T 1030C)
 > Feb 24 22:17:07 gewis /kernel: da2 at aha0 bus 0 target 4 lun 0
 > Feb 24 22:17:07 gewis /kernel: da2: <FUJITSU M2266S-512 0020> Fixed Direct Access SCSI-CCS device 
 > Feb 24 22:17:07 gewis /kernel: da2: 3.300MB/s transfers
 > Feb 24 22:17:07 gewis /kernel: da2: 1030MB (2111018 512 byte sectors: 64H 32S/T 1030C)
 > Feb 24 22:17:07 gewis /kernel: da3 at aha0 bus 0 target 5 lun 0
 > Feb 24 22:17:07 gewis /kernel: da3: <FUJITSU M2624F-512 0405> Fixed Direct Access SCSI-2 device 
 > Feb 24 22:17:07 gewis /kernel: da3: 3.300MB/s transfers
 > Feb 24 22:17:07 gewis /kernel: da3: 496MB (1015812 512 byte sectors: 64H 32S/T 496C)
 > Feb 24 22:17:07 gewis /kernel: changing root device to wd0s1a
 > Feb 24 22:17:07 gewis /kernel: (cd0:aha0:0:1:0): CCB 0xf3e0333c - timed out
 > Feb 24 22:17:07 gewis /kernel: (cd0:aha0:0:1:0): CCB 0xf3e0333c - timed out
 > Feb 24 22:17:07 gewis /kernel: aha0: No longer in timeout
 > Feb 24 22:17:07 gewis /kernel: cd0 at aha0 bus 0 target 1 lun 0
 > Feb 24 22:17:07 gewis /kernel: cd0: <HP CD-Writer 6020 1.07> Removable CD-ROM SCSI-2 device 
 > Feb 24 22:17:07 gewis /kernel: cd0: 3.300MB/s transfers
 > Feb 24 22:17:07 gewis /kernel: cd0: Attempt to query device size failed: NOT READY, Logical unit not ready, cause not reportable
 > 
 > >Description:
 >  
 >  While making a backup using afio, we got the following log entries:
 >  
 >  Feb 26 10:11:07 gewis /kernel: (sa0:aha0:0:0:0): SPACE. CDB: 11 1 0 0 1 0 
 >  Feb 26 10:11:07 gewis /kernel: (sa0:aha0:0:0:0): BLANK CHECK req sz: 1 (decimal) asc:0,5
 >  Feb 26 10:11:07 gewis /kernel: (sa0:aha0:0:0:0): End-of-data detected
 >  
 >  The one above is while reading an empty tape, so no problem here
 >  
 >  
 >  Feb 26 10:19:47 gewis /kernel: (sa0:aha0:0:0:0): SPACE. CDB: 11 1 ff ff ff 0 
 >  Feb 26 10:19:47 gewis /kernel: (sa0:aha0:0:0:0): ILLEGAL REQUEST asc:2c,0
 >  Feb 26 10:19:47 gewis /kernel: (sa0:aha0:0:0:0): Command sequence error
 >  Feb 26 10:19:47 gewis /kernel: (sa0:aha0:0:0:0): unable to backspace over one of double filemarks at EOD- opting for safety
 >  Feb 26 14:15:36 gewis /kernel: (sa0:aha0:0:0:0): SPACE. CDB: 11 1 ff ff ff 0 
 >  Feb 26 14:15:36 gewis /kernel: (sa0:aha0:0:0:0): ILLEGAL REQUEST asc:2c,0
 >  Feb 26 14:15:36 gewis /kernel: (sa0:aha0:0:0:0): Command sequence error
 >  Feb 26 14:15:36 gewis /kernel: (sa0:aha0:0:0:0): unable to backspace over one of double filemarks at EOD- opting for safety
 >  
 >  These two were generated during the backup. After the last entry there are
 >  no more log entries from the tape drive until the reboot. At 14:33 it
 >  crashed with a page fault in CAM. I'm sorry I don't have the details:
 >  I wasn't at the console at the time, and the guy who was remembers only
 >  this.
 >  Oh, it tried to sync the filesystems, but that failed with 'giving up'.
 
 Without a stack trace, it's going to be extremely difficult to say what
 caused the panic, and without some cause, we can't come up with a solution.
 
 So, without some more information, there's very little we can do to help
 with the panic.
 
 As for the two errors above, I'm CCing this to Matthew Jacob
 <mjacob@feral.com> who is the current maintainer of the CAM tape driver.
 He may be interested in them.
 
 Ken
 -- 
 Kenneth Merry
 ken@plutotech.com
 

From: Matthew Jacob <mjacob@feral.com>
To: marijn@gewis.win.tue.nl
Cc: "Kenneth D. Merry" <ken@plutotech.com>,
	FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/10281: Crash of 3.1-STABLE system due to scsi error.
Date: Thu, 4 Mar 1999 08:00:02 -0800 (PST)

 So, what ever happened with this?
 
 

From: "Kenneth D. Merry" <ken@plutotech.com>
To: mjacob@feral.com
Cc: marijn@gewis.win.tue.nl, FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/10281: Crash of 3.1-STABLE system due to scsi error.
Date: Thu, 4 Mar 1999 09:19:05 -0700 (MST)

 Matthew Jacob wrote...
 > 
 > 
 > So, what ever happened with this?
 
 I don't think I've heard anything back.
 
 Ken
 -- 
 Kenneth Merry
 ken@plutotech.com
 

From: Marijn Meijles <marijn@gewis.win.tue.nl>
To: "Kenneth D. Merry" <ken@plutotech.com>
Cc: mjacob@feral.com, freebsd-gnats-submit@freebsd.org
Subject: Re: kern/10281: Crash of 3.1-STABLE system due to scsi error.
Date: Thu, 4 Mar 1999 20:05:30 +0100

 You wrote:
 > Matthew Jacob wrote...
 > > 
 > > 
 > > So, what ever happened with this?
 > 
 > I don't think I've heard anything back.
 > 
 Well, I did a successful backup last night without any errors, but
 today the system crashed again. At the university I spoke with a
 friend of Guido (ex core member) and he told me that there is
 a bug in the scsi-system which brings down cdrom.com every three
 days. I think we have the same problem, do you know more about this?
 The error message is:
 devstat_end_transaction: HELP!! busy_count for da2 < 0 (-245774)
 biodone: buffer already done
 
 The busy count gets more and more negative.
 
 -- 
 Marijn
 

From: "Kenneth D. Merry" <ken@plutotech.com>
To: marijn@gewis.win.tue.nl (Marijn Meijles)
Cc: mjacob@feral.com, freebsd-gnats-submit@freebsd.org
Subject: Re: kern/10281: Crash of 3.1-STABLE system due to scsi error.
Date: Thu, 4 Mar 1999 12:47:22 -0700 (MST)

 Marijn Meijles wrote...
 > You wrote:
 > > Matthew Jacob wrote...
 > > > 
 > > > 
 > > > So, what ever happened with this?
 > > 
 > > I don't think I've heard anything back.
 > > 
 > Well, I did a successful backup last night without any errors, but
 > today the system crashed again. At the university I spoke with a
 > friend of Guido (ex core member) and he told me that there is
 > a bug in the scsi-system which brings down cdrom.com every three
 > days.
 
 Umm, I know nothing about this.  If there is a bug in the SCSI subsystem
 that causes crashes, why haven't I heard about it?
 
 And which machine at "cdrom.com" is crashing?
 
 > I think we have the same problem, do you know more about this?
 > The error message is:
 > devstat_end_transaction: HELP!! busy_count for da2 < 0 (-245774)
 > biodone: buffer already done
 > 
 > The busy count gets more and more negative.
 
 Now that is interesting.  That indicates that, somehow or another,
 transactions are getting done more than once.
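
 To illustrate how completing a transaction twice drives such a counter
 negative, here is a minimal sketch. It is a simplified model, not the
 FreeBSD devstat code: struct dev_stats, stats_start() and stats_end() are
 hypothetical names made up for this example.

```c
#include <stdio.h>

/* Hypothetical stand-in for a per-device statistics record. */
struct dev_stats {
    const char *name;
    int busy_count;  /* transactions started but not yet finished */
};

/* Called once when a transaction is queued to the device. */
static void stats_start(struct dev_stats *ds)
{
    ds->busy_count++;
}

/*
 * Called when a transaction completes. Returns 0 normally, or -1 if the
 * counter underflows, which can only happen when more completions than
 * starts have been recorded.
 */
static int stats_end(struct dev_stats *ds)
{
    ds->busy_count--;
    if (ds->busy_count < 0) {
        fprintf(stderr, "stats_end: busy_count for %s < 0 (%d)\n",
                ds->name, ds->busy_count);
        return -1;
    }
    return 0;
}
```

 In this model one stats_start() followed by two stats_end() calls for the
 same transaction leaves busy_count at -1. Every further duplicate
 completion lowers it again, which would match a count that keeps getting
 more negative.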
 
 Which version of sys/cam/scsi/scsi_da.c do you have?  Can you mail me the
 file?
 
 If you're getting crashes, it would help immensely if you could provide a
 stack trace from the panic.
 
 Ken
 -- 
 Kenneth Merry
 ken@plutotech.com
 

From: Marijn Meijles <marijn@gewis.win.tue.nl>
To: "Kenneth D. Merry" <ken@plutotech.com>
Cc: mjacob@feral.com, freebsd-gnats-submit@freebsd.org
Subject: Re: kern/10281: Crash of 3.1-STABLE system due to scsi error.
Date: Thu, 4 Mar 1999 21:03:34 +0100

 You wrote:
 > 
 > Umm, I know nothing about this.  If there is a bug in the SCSI subsystem
 > that causes crashes, why haven't I heard about it?
 > 
 > And which machine at "cdrom.com" is crashing?
 > 
 I think wcarchive, but it was during a party that I spoke with the guy,
 so it was all a bit 'clouded'. I'll mail him and get back to you with
 some more details.
 
 > > I think we have the same problem, do you know more about this?
 > > The error message is:
 > > devstat_end_transaction: HELP!! busy_count for da2 < 0 (-245774)
 > > biodone: buffer already done
 > > 
 > > The busy count gets more and more negative.
 > 
 > Now that is interesting.  That indicates that, somehow or another,
 > transactions are getting done more than once.
 > 
 > Which version of sys/cam/scsi/scsi_da.c do you have?  Can you mail me the
 > file?
 > 
 OK, I'll mail it separately.
 
 > If you're getting crashes, it would help immensely if you could provide a
 > stack trace from the panic.
 > 
 Yup, I know. I've instructed the guys at the university to write down
 everything, but today's crash just showed this and no panic. You
 could only ping the machine, and it had to be rebooted.
 
 -- 
 Marijn
 

From: "Kenneth D. Merry" <ken@plutotech.com>
To: marijn@gewis.win.tue.nl (Marijn Meijles)
Cc: mjacob@feral.com, freebsd-gnats-submit@freebsd.org
Subject: Re: kern/10281: Crash of 3.1-STABLE system due to scsi error.
Date: Thu, 4 Mar 1999 13:16:08 -0700 (MST)

 Marijn Meijles wrote...
 > You wrote:
 > > 
 > > Umm, I know nothing about this.  If there is a bug in the SCSI subsystem
 > > that causes crashes, why haven't I heard about it?
 > > 
 > > And which machine at "cdrom.com" is crashing?
 > > 
 > I think wcarchive, but it was during a party that I spoke with the guy,
 > so it was all a bit 'clouded'. I'll mail him and get back to you with
 > some more details.
 
 Yeah, I'd like to hear about it.  If it is wcarchive, David Greenman has
 been uncharacteristically silent on it.
 
 > > > I think we have the same problem, do you know more about this?
 > > > The error message is:
 > > > devstat_end_transaction: HELP!! busy_count for da2 < 0 (-245774)
 > > > biodone: buffer already done
 > > > 
 > > > The busy count gets more and more negative.
 > > 
 > > Now that is interesting.  That indicates that, somehow or another,
 > > transactions are getting done more than once.
 > > 
 > > Which version of sys/cam/scsi/scsi_da.c do you have?  Can you mail me the
 > > file?
 > > 
 > OK, I'll mail it separately.
 
 It looks normal enough.  I'm not sure what could be going on.  I haven't
 seen this sort of problem before.
 
 > > If you're getting crashes, it would help immensely if you could provide a
 > > stack trace from the panic.
 > > 
 > Yup, I know. I've instructed the guys at the university to write down
 > everything, but today's crash just showed this and no panic. You
 > could only ping the machine, and it had to be rebooted.
 
 Hmm, sounds like it may just be a hang and not a panic.
 
 It's possible that it could be a bug in the aha driver or something.  That
 driver hasn't been tested extensively.  Some portion of the system is
 causing transactions to get completed more than once.
 
 Ken
 -- 
 Kenneth Merry
 ken@plutotech.com
 

From: Matthew Jacob <mjacob@feral.com>
To: Marijn Meijles <marijn@gewis.win.tue.nl>
Cc: "Kenneth D. Merry" <ken@plutotech.com>,
	freebsd-gnats-submit@freebsd.org
Subject: Re: kern/10281: Crash of 3.1-STABLE system due to scsi error.
Date: Tue, 16 Mar 1999 12:28:37 -0800 (PST)

 Did this ever get any more info?
 
 
 
 

From: Marijn Meijles <marijn@gewis.win.tue.nl>
To: mjacob@feral.com
Cc: "Kenneth D. Merry" <ken@plutotech.com>,
	freebsd-gnats-submit@freebsd.org
Subject: Re: kern/10281: Crash of 3.1-STABLE system due to scsi error.
Date: Tue, 16 Mar 1999 21:48:20 +0100

 You wrote:
 > 
 > Did this ever get any more info?
 > 
 Nope. The last thing we had was a reboot without a trace, after 4 days of
 uptime with a kernel from March 5. Yesterday I built a new kernel because
 there have been some serious CAM updates. Now all I can do is wait and see.
 Next week, when my exams are over, if it still crashes/reboots, I'll
 look more deeply into it.
 
 > 
 > 
 
 -- 
 Marijn
 
State-Changed-From-To: open->suspended 
State-Changed-By: mjacob 
State-Changed-When: Sat May 8 11:45:15 PDT 1999 
State-Changed-Why:  
We haven't heard back about this problem in some time. Moving to suspended state. 
State-Changed-From-To: suspended->closed 
State-Changed-By: phk 
State-Changed-When: Wed May 23 14:22:57 PDT 2001 
State-Changed-Why:  
Overtaken by events. 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=10281 
>Unformatted:
