From bob@smtp.whitebarn.com  Thu Apr  6 21:04:12 2000
Return-Path: <bob@smtp.whitebarn.com>
Received: from smtp.whitebarn.com (Spin.WhiteBarn.Com [216.0.13.113])
	by hub.freebsd.org (Postfix) with ESMTP id 5F37A37BA30
	for <FreeBSD-gnats-submit@freebsd.org>; Thu,  6 Apr 2000 21:04:01 -0700 (PDT)
	(envelope-from bob@smtp.whitebarn.com)
Received: (from bob@localhost)
	by smtp.whitebarn.com (8.9.3/8.9.3) id XAA80210;
	Thu, 6 Apr 2000 23:03:42 -0500 (CDT)
	(envelope-from bob)
Message-Id: <200004070403.XAA80210@smtp.whitebarn.com>
Date: Thu, 6 Apr 2000 23:03:42 -0500 (CDT)
From: Bob@WhiteBarn.Com
Sender: bob@smtp.whitebarn.com
Reply-To: Bob@WhiteBarn.Com
To: FreeBSD-gnats-submit@freebsd.org
Subject: ad driver and SMP kernel panic (vinum may be involved)
X-Send-Pr-Version: 3.2

>Number:         17839
>Category:       kern
>Synopsis:       ad driver and SMP kernel panic (vinum may be involved)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    sos
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Apr  6 21:10:01 PDT 2000
>Closed-Date:    Tue Nov 14 00:42:13 PST 2000
>Last-Modified:  Tue Nov 14 00:43:04 PST 2000
>Originator:     Bob Van Valzah
>Release:        FreeBSD 4.0-RELEASE i386, FreeBSD 4.0-STABLE
>Organization:
>Environment:

	ASUS P2B-D Dual 500 MHz P-III 256M
	3x Seagate 7200 RPM ATAPI UDMA33 drives

>Description:

	"Page fault while in kernel mode" panic whenever /etc/security
	is run. (This is very inconvenient since /etc/periodic/daily
	runs it daily.)

	The instruction pointer at the time of the page fault is bogus,
	apparently since biodone() is jumping off the deep end.

	I suspect the ad driver since a traceback shows ad_interrupt()
	calling biodone() leading up to the panic.  See below.

	I suspect the biodone() code is failing in an SMP environment
	since it runs fine on my uniprocessor hardware.

	I am running some of my filesystems under vinum so it may have a
	role in this too.

	It gave up trying to sync 492 dirty buffers following the panic.
	That seems like a lot to me so perhaps that's related to the
	panic? FWIW, crash dumps work fine through the ata driver once
	it is reset following the panic.

	(kgdb) where
	#0  0xc0151390 in boot ()
	#1  0xc0151748 in poweroff_wait ()
	#2  0xc028670f in trap_fatal ()
	#3  0xc02863a5 in trap_pfault ()
	#4  0xc0285f77 in trap ()
	#5  0xc0ef8ef4 in ?? ()	<-- instruction pointer at time of page fault
	#6  0xc01752d7 in biodone ()
	#7  0xc0258a5e in ad_interrupt ()
	#8  0xc025544e in ata_intr ()

	This line printed by panic() may be helpful:
		interrupt mask          = bio  <- SMP: XXX
	since it seems to have something to do with the SMP code.

>How-To-Repeat:

	Just run /etc/security on the above hardware.  Blamo.

	The panic happens about 5 minutes into /etc/security, perhaps
	since I have several filesystems.  Some use vinum and some don't.

	Other activity may well also cause the panic, but /etc/security
	kills it every time.

>Fix:
	
	Don't have one yet, but I'll tell you what I've tried:

	I tried running a kernel with INVARIANTS set and that didn't catch
	anything.

	I tried running a kernel with VFS_BIO_DEBUG set and that didn't
	even make it through /etc/rc before panicking.

	I don't know what to try next, but I'll happily run any further
	tests you can propose on my hardware.  Should I try -CURRENT?

>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->sos 
Responsible-Changed-By: sheldonh 
Responsible-Changed-When: Wed Apr 12 04:25:13 PDT 2000 
Responsible-Changed-Why:  
Over to the ata driver's maintainer. 
State-Changed-From-To: open->analyzed 
State-Changed-By: sos 
State-Changed-When: Thu Apr 13 00:04:26 PDT 2000 
State-Changed-Why:  
Hmm, this looks exactly like the problem others are having with vinum
in both 4.0-RELEASE and -CURRENT. Greg and I have been spending a lot of
time lately trying to corner what appears to be random memory corruption.

One thing that has helped for me so far is to not use an fxp netcard,
and to use the patches that Greg has put in -current, but it is still
not 100% stable yet.

Could you send me the verbose boot messages from that machine, so we
can use them in our search, as hw combinations and drivers might be
an issue here?

State-Changed-From-To: analyzed->closed 
State-Changed-By: sos 
State-Changed-When: Tue Nov 14 00:42:13 PST 2000 
State-Changed-Why:  
Please upgrade to 4.2; lots of problems have been fixed in both
vinum and the ata driver.


http://www.freebsd.org/cgi/query-pr.cgi?pr=17839 
>Unformatted:
