From nobody@FreeBSD.org  Thu Sep 20 17:37:15 2012
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 36311106566C
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 20 Sep 2012 17:37:15 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id 170578FC12
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 20 Sep 2012 17:37:15 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.5/8.14.5) with ESMTP id q8KHbE0W029253
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 20 Sep 2012 17:37:14 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.5/8.14.5/Submit) id q8KHbEZk029252;
	Thu, 20 Sep 2012 17:37:14 GMT
	(envelope-from nobody)
Message-Id: <201209201737.q8KHbEZk029252@red.freebsd.org>
Date: Thu, 20 Sep 2012 17:37:14 GMT
From: Paul Procacci <pprocacci@gmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [panic] bioq_init or bioq_remove (unsure which)
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         171814
>Category:       amd64
>Synopsis:       [panic] bioq_init or bioq_remove (unsure which)
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-amd64
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Sep 20 17:40:03 UTC 2012
>Closed-Date:    
>Last-Modified:  Tue Sep 25 22:03:43 UTC 2012
>Originator:     Paul Procacci
>Release:        9.0-RELEASE-p3
>Organization:
Datapipe
>Environment:
FreeBSD db1.xxxxxxxxxxxxx.com 9.0-RELEASE-p3 FreeBSD 9.0-RELEASE-p3 #0: Tue Jun 12 02:52:29 UTC 2012     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
>Description:
cpuid = 5; apic id = 13
fault virtual address      = 0x20
fault code                 = supervisor read data, page not present
instruction pointer        = 0x20 :0xffffffff80865023
stack pointer              = 0x28 :0xffffff80002b3b30
frame pointer              = 0x28 :0xffffff80002b3b50
code segment               = base 0x0, limit 0xfffff, type 0x1b
                           = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags           = interrupt enabled, resume, IOPL = 0
current process            = 13 (g_event)
trap number                = 12
panic: page fault
cpuid = 5
KDB: stack backtrace:
#0 0xffffffff808680fe at kdb_backtrace+0x5e
#1 0xffffffff80832cb7 at panic+0x187
#2 0xffffffff80b185a0 at trap_fatal+0x290
#3 0xffffffff80b188e9 at trap_pfault+0x1f9
#4 0xffffffff80b18daf at trap+0x3df
#5 0xffffffff80b0324f at calltrap+0x8
#6 0xffffffff807d165c at g_destroy_consumer+0x4c
#7 0xffffffff807ce6cc at g_run_events+0x1ec
#8 0xffffffff8080682f at fork_exit+0x11f
#9 0xffffffff80b0377e at fork_trampoline+0xe

#############################################################

- I'm using a GENERIC kernel.
- Following the instructions here: http://www.freebsd.org/doc/faq/advanced.html
  I'm able to ascertain that the problem exists in one of the following two functions:

db1# nm -n /boot/kernel/kernel | fgrep ffffffff808650
ffffffff80865080 T bioq_init
ffffffff808650b0 T bioq_remove

#############################################################

I'm using zfs over gmultipath over an isp device.

Here are the last errors from /var/log/messages leading up to the panic:
#############################################################
Sep 18 22:48:57 db1 kernel: (da3:isp1:0:0:1): lost device - 4 outstanding
Sep 18 22:48:57 db1 kernel: (ses2:isp1:0:0:254): lost device
Sep 18 22:48:57 db1 kernel: (ses2:isp1:0:0:254): removing device entry
Sep 18 22:48:57 db1 kernel: (da3:isp1:0:0:1): oustanding 3
Sep 18 22:48:57 db1 kernel: GEOM_MULTIPATH: da3 failed in PG
Sep 18 22:48:57 db1 kernel: (da3:GEOM_MULTIPATH: da1 now active path in PG
Sep 18 22:48:57 db1 kernel: isp1:0:0:1): oustanding 2
Sep 18 22:48:57 db1 kernel: (da3:isp1:0:0:1): oustanding 1
Sep 18 22:48:57 db1 kernel: (da3:isp1:0:0:1): oustanding 0
Sep 18 22:48:57 db1 kernel: (da3:isp1:0:0:1): removing device entry
Sep 18 22:48:57 db1 kernel: GEOM_MULTIPATH: da3 removed from PG
Sep 18 22:48:57 db1 kernel:
Sep 18 22:48:57 db1 kernel:
Sep 18 22:48:57 db1 kernel: Fatal trap 12: page fault while in kernel mode
Sep 18 22:48:57 db1 kernel: cpuid = 5; apic id = 13
Sep 18 22:48:57 db1 kernel: fault virtual address       = 0x20
Sep 18 22:48:57 db1 kernel: fault code          = supervisor read data, page not present
#############################################################
>How-To-Repeat:
I cannot repeat this problem on demand.  It's happened twice in the past couple of months, but I do not have a test case in which it can be reproduced.
>Fix:
Unknown.

>Release-Note:
>Audit-Trail:

From: John Baldwin <jhb@freebsd.org>
To: freebsd-ia64@freebsd.org
Cc: Paul Procacci <pprocacci@gmail.com>,
 freebsd-gnats-submit@freebsd.org
Subject: Re: ia64/171814: [panic] bioq_init or bioq_remove (unsure which)
Date: Tue, 25 Sep 2012 08:45:59 -0400

 On Thursday, September 20, 2012 1:37:14 pm Paul Procacci wrote:
 > 
 > >Number:         171814
 > >Category:       ia64
 > >Synopsis:       [panic] bioq_init or bioq_remove (unsure which)
 > >Confidential:   no
 > >Severity:       non-critical
 > >Priority:       low
 > >Responsible:    freebsd-ia64
 > >State:          open
 > >Quarter:        
 > >Keywords:       
 > >Date-Required:
 > >Class:          sw-bug
 > >Submitter-Id:   current-users
 > >Arrival-Date:   Thu Sep 20 17:40:03 UTC 2012
 > >Closed-Date:
 > >Last-Modified:
 > >Originator:     Paul Procacci
 > >Release:        9.0-RELEASE-p3
 > >Organization:
 > Datapipe
 > >Environment:
 > FreeBSD db1.xxxxxxxxxxxxx.com 9.0-RELEASE-p3 FreeBSD 9.0-RELEASE-p3 #0: Tue Jun 12 02:52:29 UTC 2012     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
 > >Description:
 > cpuid = 5; apic id = 13
 > fault virtual address      = 0x20
 > fault code                 = supervisor read data, page not present
 > instruction pointer        = 0x20 :0xffffffff80865023
 > stack pointer              = 0x28 :0xffffff80002b3b30
 > frame pointer              = 0x28 :0xffffff80002b3b50
 > code segment               = base 0x0, limit 0xfffff, type 0x1b
 >                            = DPL 0, pres 1, long 1, def32 0, gran 1
 > processor eflags           = interrupt enabled, resume, IOPL = 0
 > current process            = 13 (g_event)
 > trap number                = 12
 > panic: page fault
 > cpuid = 5
 > KDB: stack backtrace:
 > #0 0xffffffff808680fe at kdb_backtrace+0x5e
 > #1 0xffffffff80832cb7 at panic+0x187
 > #2 0xffffffff80b185a0 at trap_fatal+0x290
 > #3 0xffffffff80b188e9 at trap_pfault+0x1f9
 > #4 0xffffffff80b18daf at trap+0x3df
 > #5 0xffffffff80b0324f at calltrap+0x8
 > #6 0xffffffff807d165c at g_destroy_consumer+0x4c
 > #7 0xffffffff807ce6cc at g_run_events+0x1ec
 > #8 0xffffffff8080682f at fork_exit+0x11f
 > #9 0xffffffff80b0377e at fork_trampoline+0xe
 > 
 > #############################################################
 > 
 > - I'm using a GENERIC kernel.
 > - Following the instructions here: http://www.freebsd.org/doc/faq/advanced.html
 >   I'm able to ascertain that the problem exists in one of the following two functions:
 > 
 > db1# nm -n /boot/kernel/kernel | fgrep ffffffff808650
 > ffffffff80865080 T bioq_init
 > ffffffff808650b0 T bioq_remove
 
 No, I think it occurred in some other routine.  Note that 5023 < 5080, so the 
 PC is before the start of 'bioq_init()'.  It's probably in some static 
 function called by g_destroy_consumer() such as g_do_wither().  Do you have
 a kernel.symbols file?  If so, doing 'gdb /boot/kernel/kernel' followed by
 'l *0xffffffff80865023' would be very helpful.
 
 -- 
 John Baldwin
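
The lookup described above can be sketched in a few lines of Python. This is
only an illustration and not part of the original exchange: the helper names
parse_nm and containing_symbol are hypothetical, and the symbol table holds
just the two nm lines quoted in the report. A faulting PC belongs to the last
symbol whose start address is at or below it, whereas grepping for an address
prefix lists nearby symbols that may all start after the PC.

```python
import bisect

# The two lines of `nm -n` output quoted in the report (sorted by address).
NM_OUTPUT = """\
ffffffff80865080 T bioq_init
ffffffff808650b0 T bioq_remove
"""


def parse_nm(text):
    """Return a list of (address, name) tuples sorted by address."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:  # address, type, name
            symbols.append((int(fields[0], 16), fields[2]))
    return sorted(symbols)


def containing_symbol(symbols, pc):
    """Return (name, offset) for the last symbol starting at or below pc.

    Returns None when pc lies before every symbol in the table.
    """
    addresses = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addresses, pc) - 1
    if i < 0:
        return None
    addr, name = symbols[i]
    return name, pc - addr


symbols = parse_nm(NM_OUTPUT)

# The instruction pointer from the panic precedes both listed symbols.
print(containing_symbol(symbols, 0xffffffff80865023))  # None

# An address inside bioq_init, for comparison.
print(containing_symbol(symbols, 0xffffffff80865090))  # ('bioq_init', 16)
```

Fed the complete output of nm -n instead of the two-line excerpt, the same
function would be expected to return the routine that actually contains the
PC. With only the excerpt it returns None, because neither listed symbol
starts at or below 0xffffffff80865023.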

From: Paul Procacci <pprocacci@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Cc: freebsd-ia64@freebsd.org, freebsd-gnats-submit@freebsd.org
Subject: Re: ia64/171814: [panic] bioq_init or bioq_remove (unsure which)
Date: Tue, 25 Sep 2012 12:11:17 -0500

 --047d7b66f839532c0a04ca89cbf7
 Content-Type: text/plain; charset=ISO-8859-1
 
 Thanks for your response, John.
 
 Here is the output of the commands you suggested:
 
 
 0xffffffff80865023 is in devstat_remove_entry (/usr/src/sys/kern/subr_devstat.c:193).
 188
 189             /* Remove this entry from the devstat queue */
 190             atomic_add_acq_int(&ds->sequence1, 1);
 191             if (ds->id == NULL) {
 192                     devstat_num_devs--;
 193                     STAILQ_REMOVE(devstat_head, ds, devstat, dev_links);
 194             }
 195             devstat_free(ds);
 196             devstat_generation++;
 197             mtx_unlock(&devstat_mutex);
 
 
 
 -- 
 __________________
 
 :(){ :|:& };:
 
 --047d7b66f839532c0a04ca89cbf7--
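
One possible reading of the fault address, offered as an assumption rather
than a diagnosis: STAILQ_REMOVE in sys/queue.h walks the list from the head
until it finds the element whose next pointer is the entry being removed. If
that entry is not on the list, the walk runs off the end and reads the link
field of a NULL element, that is, the address equal to the offset of dev_links
within struct devstat. The sketch below models the leading fields of
struct devstat with ctypes to compute that offset. The field layout is written
from memory of sys/sys/devicestat.h and should be checked against the
9.0-RELEASE sources; the Python class and field names are hypothetical.

```python
import ctypes


class Bintime(ctypes.Structure):
    # Assumed amd64 layout of struct bintime: 64-bit seconds, 64-bit fraction.
    _fields_ = [
        ("sec", ctypes.c_int64),
        ("frac", ctypes.c_uint64),
    ]


class Devstat(ctypes.Structure):
    # Assumed leading fields of struct devstat, modelled only up to dev_links.
    _fields_ = [
        ("sequence0", ctypes.c_uint32),
        ("allocated", ctypes.c_int32),
        ("start_count", ctypes.c_uint32),
        ("end_count", ctypes.c_uint32),
        ("busy_from", Bintime),
        # STAILQ_ENTRY holds a single next pointer (64 bits on amd64).
        ("dev_links_next", ctypes.c_uint64),
    ]


# Offset of the list link inside the structure.
link_offset = Devstat.dev_links_next.offset

# Reading the link field through a NULL element pointer touches this address.
null_element = 0
faulting_address = null_element + link_offset

print(hex(faulting_address))  # 0x20
```

If the assumed layout is right, the computed offset matches the reported fault
virtual address of 0x20, which would be consistent with ds not being on
devstat_head when line 193 ran. It does not show how the list reached that
state.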
Responsible-Changed-From-To: freebsd-ia64->freebsd-amd64 
Responsible-Changed-By: marcel 
Responsible-Changed-When: Tue Sep 25 22:03:08 UTC 2012 
Responsible-Changed-Why:  
Change category to match environment. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=171814 
>Unformatted:
