From nobody@FreeBSD.org  Fri Dec 21 13:21:00 2007
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 10EC316A41A
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 21 Dec 2007 13:21:00 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id E72CF13C448
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 21 Dec 2007 13:20:59 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.2/8.14.2) with ESMTP id lBLDKZ6L054989
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 21 Dec 2007 13:20:35 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.2/8.14.1/Submit) id lBLDKZFh054988;
	Fri, 21 Dec 2007 13:20:35 GMT
	(envelope-from nobody)
Message-Id: <200712211320.lBLDKZFh054988@www.freebsd.org>
Date: Fri, 21 Dec 2007 13:20:35 GMT
From: Barkley Vowk <bvowk@math.ualberta.ca>
To: freebsd-gnats-submit@FreeBSD.org
Subject: 7.0-BETA4 from yesterday panics when nfs server is mounted
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         118928
>Category:       kern
>Synopsis:       [nfs] 7.0-BETA4 from yesterday panics when nfs server is mounted
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    mohans
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Dec 21 13:30:01 UTC 2007
>Closed-Date:    Sat Dec 29 10:36:02 UTC 2007
>Last-Modified:  Sat Dec 29 10:40:01 UTC 2007
>Originator:     Barkley Vowk
>Release:        7.0-BETA4
>Organization:
>Environment:
FreeBSD tethys 7.0-BETA4 FreeBSD 7.0-BETA4 #0: Thu Dec 20 15:57:56 EET 2007     bvowk@tethys:/usr/obj/usr/src/sys/GENERIC  amd64

>Description:
tethys# kgdb kernel.debug /var/crash/vmcore.0
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address	= 0x30
fault code		= supervisor read data, page not present
instruction pointer	= 0x8:0xffffffff804a9d1f
stack pointer	        = 0x10:0xffffffffb12ef820
frame pointer	        = 0x10:0xffffff0003b43680
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= resume, IOPL = 0
current process		= 810 (nfsd)
trap number		= 12
panic: page fault
cpuid = 2
Uptime: 1m27s
Physical memory: 8179 MB
Dumping 555 MB: 540 524 508 492 476 460 444 428 412 396 380 364 348 332 316 300 284 268 252 236 220 204 188 172 156 140 124 108 92 76 60 44 28 12

#0  doadump () at pcpu.h:194
194		__asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:194
#1  0x0000000000000004 in ?? ()
#2  0xffffffff804775c9 in boot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:409
#3  0xffffffff804779cd in panic (fmt=0x104 <Address 0x104 out of bounds>)
    at /usr/src/sys/kern/kern_shutdown.c:563
#4  0xffffffff8074c254 in trap_fatal (frame=0xffffff0003b43680, 
    eva=18446742974260342784) at /usr/src/sys/amd64/amd64/trap.c:724
#5  0xffffffff8074cecf in trap (frame=0xffffffffb12ef770)
    at /usr/src/sys/amd64/amd64/trap.c:251
#6  0xffffffff80732bbe in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:169
#7  0xffffffff804a9d1f in turnstile_broadcast (ts=0x0, queue=0)
    at /usr/src/sys/kern/subr_turnstile.c:835
#8  0xffffffff8046c01a in _mtx_unlock_sleep (m=0xffffffff80a709e0, opts=Variable "opts" is not available.
)
    at /usr/src/sys/kern/kern_mutex.c:605
#9  0xffffffff80603dd2 in nfsrv3_access (nfsd=0xffffff00059e0200, 
    slp=0xffffff0003adbc00, td=0xffffff0003b43680, mrq=0xffffffffb12efaf0)
    at /usr/src/sys/nfsserver/nfs_serv.c:253
#10 0xffffffff8061549d in nfssvc (td=Variable "td" is not available.
)
    at /usr/src/sys/nfsserver/nfs_syscalls.c:461
#11 0xffffffff8074c8a7 in syscall (frame=0xffffffffb12efc70)
    at /usr/src/sys/amd64/amd64/trap.c:852
#12 0xffffffff80732dcb in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:290
#13 0x00000008006874fc in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) 

>How-To-Repeat:
I have a group of 7.0-BETA4 boxes that export /var to a Debian etch
machine. If I enable nfsd and attempt to read the directory, the machine
instantly panics and reboots. It then keeps rebooting and panicking within
seconds of nfsd being restarted.


>Fix:


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->mohans 
Responsible-Changed-By: remko 
Responsible-Changed-When: Fri Dec 21 18:41:28 UTC 2007 
Responsible-Changed-Why:  
reassign to maintainer 

http://www.freebsd.org/cgi/query-pr.cgi?pr=118928 

From: Kris Kennaway <kris@FreeBSD.org>
To: Barkley Vowk <bvowk@math.ualberta.ca>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/118928: 7.0-BETA4 from yesterday panics when nfs server
 is mounted
Date: Tue, 25 Dec 2007 15:06:30 +0100

 Barkley Vowk wrote:
 
 > tethys# kgdb kernel.debug /var/crash/vmcore.0
 > [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
 > GNU gdb 6.1.1 [FreeBSD]
 > Copyright 2004 Free Software Foundation, Inc.
 > GDB is free software, covered by the GNU General Public License, and you are
 > welcome to change it and/or distribute copies of it under certain conditions.
 > Type "show copying" to see the conditions.
 > There is absolutely no warranty for GDB.  Type "show warranty" for details.
 > This GDB was configured as "amd64-marcel-freebsd".
 > 
 > Unread portion of the kernel message buffer:
 > kernel trap 12 with interrupts disabled
 > 
 > 
 > Fatal trap 12: page fault while in kernel mode
 > cpuid = 2; apic id = 02
 > fault virtual address	= 0x30
 > fault code		= supervisor read data, page not present
 > instruction pointer	= 0x8:0xffffffff804a9d1f
 > stack pointer	        = 0x10:0xffffffffb12ef820
 > frame pointer	        = 0x10:0xffffff0003b43680
 > code segment		= base 0x0, limit 0xfffff, type 0x1b
 > 			= DPL 0, pres 1, long 1, def32 0, gran 1
 > processor eflags	= resume, IOPL = 0
 > current process		= 810 (nfsd)
 > trap number		= 12
 > panic: page fault
 > cpuid = 2
 > Uptime: 1m27s
 > Physical memory: 8179 MB
 > Dumping 555 MB: 540 524 508 492 476 460 444 428 412 396 380 364 348 332 316 300 284 268 252 236 220 204 188 172 156 140 124 108 92 76 60 44 28 12
 > 
 > #0  doadump () at pcpu.h:194
 > 194		__asm __volatile("movq %%gs:0,%0" : "=r" (td));
 > (kgdb) bt
 > #0  doadump () at pcpu.h:194
 > #1  0x0000000000000004 in ?? ()
 > #2  0xffffffff804775c9 in boot (howto=260)
 >     at /usr/src/sys/kern/kern_shutdown.c:409
 > #3  0xffffffff804779cd in panic (fmt=0x104 <Address 0x104 out of bounds>)
 >     at /usr/src/sys/kern/kern_shutdown.c:563
 > #4  0xffffffff8074c254 in trap_fatal (frame=0xffffff0003b43680, 
 >     eva=18446742974260342784) at /usr/src/sys/amd64/amd64/trap.c:724
 > #5  0xffffffff8074cecf in trap (frame=0xffffffffb12ef770)
 >     at /usr/src/sys/amd64/amd64/trap.c:251
 > #6  0xffffffff80732bbe in calltrap ()
 >     at /usr/src/sys/amd64/amd64/exception.S:169
 > #7  0xffffffff804a9d1f in turnstile_broadcast (ts=0x0, queue=0)
 >     at /usr/src/sys/kern/subr_turnstile.c:835
 > #8  0xffffffff8046c01a in _mtx_unlock_sleep (m=0xffffffff80a709e0, opts=Variable "opts" is not available.
 > )
 >     at /usr/src/sys/kern/kern_mutex.c:605
 > #9  0xffffffff80603dd2 in nfsrv3_access (nfsd=0xffffff00059e0200, 
 >     slp=0xffffff0003adbc00, td=0xffffff0003b43680, mrq=0xffffffffb12efaf0)
 >     at /usr/src/sys/nfsserver/nfs_serv.c:253
 > #10 0xffffffff8061549d in nfssvc (td=Variable "td" is not available.
 > )
 >     at /usr/src/sys/nfsserver/nfs_syscalls.c:461
 > #11 0xffffffff8074c8a7 in syscall (frame=0xffffffffb12efc70)
 >     at /usr/src/sys/amd64/amd64/trap.c:852
 > #12 0xffffffff80732dcb in Xfast_syscall ()
 >     at /usr/src/sys/amd64/amd64/exception.S:290
 > #13 0x00000008006874fc in ?? ()
 > Previous frame inner to this frame (corrupt stack?)
 > (kgdb) 
 > 
 >> How-To-Repeat:
 > I have a group of 7.0-B4 boxes that export /var to a debian etch machine. If I enable nfsd and attempt to read the directory, the machine instantly panics and reboots. And will keep booting and panic'ing seconds after NFSD is restarted. 
 
 Can you provide more information about your configuration? 
 nfs_serv.c:253 is
 
          VFS_UNLOCK_GIANT(vfslocked);
 
 which should be a NOP in the case of a UFS export.  What is the 
 filesystem being exported?  It looks like you might be using a 
 Giant-locked filesystem as your /var.
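
To illustrate the point about line 253, here is a minimal user-space sketch of the conditional-Giant pattern, assuming the lock call returns a token that the unlock call later tests. All names in it (model_mount, model_lock_giant, model_unlock_giant, model_giant_held) are hypothetical stand-ins, and a plain counter takes the place of the real mutex; this is not the kernel's definition of VFS_LOCK_GIANT or VFS_UNLOCK_GIANT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a mounted filesystem (not struct mount). */
struct model_mount {
	bool mpsafe;	/* true: filesystem does its own locking, e.g. UFS */
};

/* Counter standing in for the global lock; 0 means "not held". */
static int model_giant_held;

/*
 * Take the global lock only for a filesystem that is not MP-safe.
 * The return value is the token the caller hands back at unlock time.
 */
static int
model_lock_giant(const struct model_mount *mp)
{
	if (mp != NULL && !mp->mpsafe) {
		model_giant_held++;
		return (1);
	}
	return (0);
}

/* Release the global lock only if the matching lock call took it. */
static void
model_unlock_giant(int vfslocked)
{
	if (!vfslocked)
		return;		/* the no-op path for an MP-safe mount */
	assert(model_giant_held > 0);
	model_giant_held--;
}
```

In this model an MP-safe mount yields a token of 0, so the unlock call returns without touching the lock. Reaching the mutex unlock path would require a nonzero token, which is why the type of the exported filesystem matters here.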
 
 Can you enable INVARIANTS and INVARIANT_SUPPORT in your kernel and 
 reproduce the panic?  It should give a better debugging panic.
 
 Kris
 

From: Barkley Vowk <bvowk@math.ualberta.ca>
To: Kris Kennaway <kris@FreeBSD.org>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/118928: 7.0-BETA4 from yesterday panics when nfs server is
 mounted
Date: Fri, 28 Dec 2007 01:15:23 -0700 (MST)

 I can't reproduce it with any kernel except the ones that bail. If I 
 recompile the code with the same GENERIC as the one used to build the ones
 that bail, it seems to work just fine. But if I install a broken one, 
 it's an instant panic with exactly the same dump. This is frustrating, 
 because I repeatably compiled a panicking kernel before I submitted the PR. 
 Other than some reboots and some single-user fsck runs, nothing has changed!
 
 I'm betting this is either random user or hardware error. Sorry, I thought 
 I had eliminated that by recompiling a broken kernel more than once on 
 hardware that had been in production.
 
 -----------------------------------------------------------
 Barkley C. Vowk -- Systems Analyst -- University of Alberta
 Math Sciences Department - Barkley.Vowk@math.ualberta.ca
 Office: CAB642A, 780-492-4064
 
 Opinions expressed are the responsibility of the author and
 may not reflect the opinions of others or reality.
 
State-Changed-From-To: open->closed 
State-Changed-By: kris 
State-Changed-When: Sat Dec 29 10:35:35 UTC 2007 
State-Changed-Why:  
Submitter is not able to reproduce 

http://www.freebsd.org/cgi/query-pr.cgi?pr=118928 

From: Kris Kennaway <kris@FreeBSD.org>
To: Barkley Vowk <bvowk@math.ualberta.ca>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/118928: 7.0-BETA4 from yesterday panics when nfs server
 is mounted
Date: Sat, 29 Dec 2007 11:36:13 +0100

 Barkley Vowk wrote:
 > I can't reproduce it with any kernel except the ones that bail. If I 
 > recompile the code with the same GENERIC as the one used to build the ones
 > that bail, it seems to work just fine. But if I install a broken one, 
 > it's an instant panic with exactly the same dump. This is frustrating, 
 > because I repeatably compiled a panicing kernel before I submitted the 
 > PR. Other than some reboots and some single user fsck, nothing has changed!
 > 
 > I'm betting this is either random user or hardware error. Sorry, I 
 > thought I had eliminated that by recompiling broken more than once on 
 > hardware that had been in production.
 
 OK, I'll close the PR for now, but please follow up again if it recurs. 
 Thanks!
 
 Kris
 
>Unformatted:
