From nobody@FreeBSD.org  Mon Apr  4 07:05:30 2011
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5CE7F106564A
	for <freebsd-gnats-submit@FreeBSD.org>; Mon,  4 Apr 2011 07:05:30 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id 498A48FC0C
	for <freebsd-gnats-submit@FreeBSD.org>; Mon,  4 Apr 2011 07:05:30 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id p3475UKo075761
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 4 Apr 2011 07:05:30 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id p3475UBs075760;
	Mon, 4 Apr 2011 07:05:30 GMT
	(envelope-from nobody)
Message-Id: <201104040705.p3475UBs075760@red.freebsd.org>
Date: Mon, 4 Apr 2011 07:05:30 GMT
From: Nikita <niakrisn@gmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: Kernel panic under concurrent access over NFS
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         156168
>Category:       kern
>Synopsis:       [nfs] [panic] Kernel panic under concurrent access over NFS
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    rmacklem
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Apr 04 07:10:10 UTC 2011
>Closed-Date:    Mon Nov 21 16:19:30 UTC 2011
>Last-Modified:  Tue Nov 22 01:40:06 UTC 2011
>Originator:     Nikita
>Release:        FreeBSD 8.2-RELEASE #0
>Organization:
>Environment:
FreeBSD achilles.hstg.ru 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Fri Feb 18 02:24:46 UTC 2011     root@almeida.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  i386
>Description:
I have three apache22-itk web servers with DOCUMENT_ROOT shared over NFS.
Sometimes I get a kernel panic:

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x10
fault code              = supervisor write, page not present
instruction pointer     = 0x20:0xc0aa3236
stack pointer           = 0x28:0xea1ae528
frame pointer           = 0x28:0xea1ae5f4
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 78988 (httpd)
trap number             = 12
panic: page fault
cpuid = 3
KDB: stack backtrace:
#0 0xc08e0d07 at kdb_backtrace+0x47
#1 0xc08b1dc7 at panic+0x117
#2 0xc0be4b43 at trap_fatal+0x323
#3 0xc0be4dc0 at trap_pfault+0x270
#4 0xc0be5305 at trap+0x465
#5 0xc0bcbebc at calltrap+0x6
#6 0xc0aa89c7 at clnt_call_private+0xf7
#7 0xc0a97dcb at nlm_get_rpc+0x19b
#8 0xc0a98379 at nlm_host_get_rpc+0x169
#9 0xc0a949eb at nlm_clearlock+0xeb
#10 0xc0a95d2a at nlm_advlock_internal+0x9ca
#11 0xc0a9651a at nlm_advlock+0x3a
#12 0xc0a80239 at nfs_advlock+0xa9
#13 0xc0c038c7 at VOP_ADVLOCK_APV+0x47
#14 0xc0875dee at closef+0xfe
#15 0xc087653f at kern_close+0x17f
#16 0xc087661a at close+0x1a
#17 0xc08eca39 at syscallenter+0x329
Uptime: 6d22h54m7s
Physical memory: 3059 MB
Dumping 335 MB: 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16
>How-To-Repeat:

>Fix:


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-fs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Sat Apr 9 19:59:33 UTC 2011 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=156168 

From: Mark Saad <nonesuch@longcount.org>
To: bug-followup@FreeBSD.org, niakrisn@gmail.com
Cc:  
Subject: Re: kern/156168: [nfs] [panic] Kernel panic under concurrent access
 over NFS
Date: Thu, 29 Sep 2011 11:32:12 -0400

 All,
   I am seeing a similar crash on 7.3-RELEASE-p2 amd64 when using
 apache-1.3.34 with accf_http and an NFS docroot.
 The servers that have crashed are all FreeBSD 7.3-RELEASE amd64.
 The hardware is HP DL145 G2.
 Each server has 2G of RAM and 2G of swap with one single-core Opteron CPU.
 
 
 We are using the following sysctls:
 
 kern.ipc.maxsockbuf=2097152
 kern.ipc.nmbclusters=32768
 kern.ipc.somaxconn=1024
 kern.maxfiles=131072
 kern.maxfilesperproc=32768
 net.inet.tcp.inflight.enable=0
 net.inet.tcp.path_mtu_discovery=0
 net.inet.tcp.recvbuf_inc=524288
 net.inet.tcp.recvbuf_max=8388608
 net.inet.tcp.recvspace=32768
 net.inet.tcp.sendbuf_inc=16384
 net.inet.tcp.sendbuf_max=8388608
 net.inet.tcp.sendspace=32768
 net.inet.udp.recvspace=42080
 net.isr.direct=1
 vm.pmap.shpgperproc=600
 
 
 Uptime prior to the crash varied: the other system was up for 11 days,
 this one for 6 days.
 
 Here are the contents of my crash dump:
 
 
 [root@web29 /var/crash]# kgdb /boot/kernel/kernel /var/crash/vmcore.0
 GNU gdb 6.1.1 [FreeBSD]
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as "amd64-marcel-freebsd"...
 
 Unread portion of the kernel message buffer:
 
 
 Fatal trap 12: page fault while in kernel mode
 cpuid = 0; apic id = 00
 fault virtual address   = 0x258
 fault code              = supervisor read data, page not present
 instruction pointer     = 0x8:0xffffffff8051a66d
 stack pointer           = 0x10:0xffffff803e69b1c0
 frame pointer           = 0x10:0xffffff0001b50ae0
 code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
 processor eflags        = interrupt enabled, resume, IOPL = 0
 current process         = 9336 (libhttpd.ep)
 trap number             = 12
 panic: page fault
 cpuid = 0
 Uptime: 6d5h18m39s
 Physical memory: 2034 MB
 Dumping 1451 MB: 1436 1420 1404 1388 1372 1356 1340 1324 1308 1292
 1276 1260 1244 1228 1212 1196 1180 1164 1148 1132 1116 1100 1084 1068
 1052 1036 1020 1004 988 972 956 940 924 908 892 876 860 844 828 812
 796 780 764 748 732 716 700 684 668 652 636 620 604 588 572 556 540
 524 508 492 476 460 444 428 412 396 380 364 348 332 316 300 284 268
 252 236 220 204 188 172 156 140 124 108 92 76 60 44 28 12
 
 Reading symbols from /boot/kernel/accf_http.ko...Reading symbols from
 /boot/kernel/accf_http.ko.symbols...done.
 done.
 Loaded symbols for /boot/kernel/accf_http.ko
 #0  doadump () at pcpu.h:195
 195     pcpu.h: No such file or directory.
         in pcpu.h
 (kgdb) bt
 #0  doadump () at pcpu.h:195
 #1  0x0000000000000004 in ?? ()
 #2  0xffffffff805285f9 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
 #3  0xffffffff80528a02 in panic (fmt=0x104 <Address 0x104 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:574
 #4  0xffffffff807ec813 in trap_fatal (frame=0xffffff0001b50ae0, eva=Variable "eva" is not available.
 ) at /usr/src/sys/amd64/amd64/trap.c:777
 #5  0xffffffff807ecbe5 in trap_pfault (frame=0xffffff803e69b110, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:693
 #6  0xffffffff807ed50c in trap (frame=0xffffff803e69b110) at /usr/src/sys/amd64/amd64/trap.c:464
 #7  0xffffffff807d614e in calltrap () at /usr/src/sys/amd64/amd64/exception.S:218
 #8  0xffffffff8051a66d in _mtx_lock_sleep (m=0xffffff002f3d7a80, tid=18446742974226565856, opts=Variable "opts" is not available.
 )
     at /usr/src/sys/kern/kern_mutex.c:339
 #9  0xffffffff80701f60 in clnt_dg_create (so=0xffffff00017755a0, svcaddr=0xffffff803e69b310, program=100000, version=4, sendsz=Variable "sendsz" is not available.
 )
     at /usr/src/sys/rpc/clnt_dg.c:259
 #10 0xffffffff806e97c9 in nlm_get_rpc (sa=Variable "sa" is not available.
 ) at /usr/src/sys/nlm/nlm_prot_impl.c:327
 #11 0xffffffff806e9d39 in nlm_host_get_rpc (host=0xffffff0001705000) at /usr/src/sys/nlm/nlm_prot_impl.c:1199
 #12 0xffffffff806e680f in nlm_clearlock (host=0xffffff0001705000, ext=0xffffff803e69b9a0, vers=4, timo=0xffffff803e69b9d0,
     retries=2147483647, vp=0xffffff004881edc8, op=2, fl=0xffffff803e69bac0, flags=64, svid=9336, fhlen=32, fh=0xffffff803e69b750,
     size=689) at /usr/src/sys/nlm/nlm_advlock.c:943
 #13 0xffffffff806e7801 in nlm_advlock_internal (vp=0xffffff004881edc8, id=Variable "id" is not available.
 ) at /usr/src/sys/nlm/nlm_advlock.c:355
 #14 0xffffffff806e8166 in nlm_advlock (ap=Variable "ap" is not available.
 ) at /usr/src/sys/nlm/nlm_advlock.c:392
 #15 0xffffffff806ced28 in nfs_advlock (ap=0xffffff803e69ba90) at /usr/src/sys/nfsclient/nfs_vnops.c:3153
 #16 0xffffffff804f40e2 in closef (fp=0xffffff0073716d80, td=0xffffff0001b50ae0) at vnode_if.h:1036
 #17 0xffffffff804f462b in kern_close (td=0xffffff0001b50ae0, fd=Variable "fd" is not available.
 ) at /usr/src/sys/kern/kern_descrip.c:1125
 #18 0xffffffff807ece67 in syscall (frame=0xffffff803e69bc80) at /usr/src/sys/amd64/amd64/trap.c:920
 #19 0xffffffff807d635b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:339
 #20 0x00000008009c5b1c in ?? ()
 Previous frame inner to this frame (corrupt stack?)
 
 -- 
 mark saad | nonesuch@longcount.org
State-Changed-From-To: open->feedback 
State-Changed-By: rmacklem 
State-Changed-When: Thu Oct 20 22:42:03 UTC 2011 
State-Changed-Why:  

I have sent the person that reported this a patch to test 
and am waiting for feedback. I've taken responsibility for this. 


Responsible-Changed-From-To: freebsd-fs->rmacklem 
Responsible-Changed-By: rmacklem 
Responsible-Changed-When: Thu Oct 20 22:42:03 UTC 2011 
Responsible-Changed-Why:  

I have sent the person that reported this a patch for testing 
and will update the status when I hear back from them. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=156168 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/156168: commit references a PR
Date: Thu,  3 Nov 2011 14:38:17 +0000 (UTC)

 Author: rmacklem
 Date: Thu Nov  3 14:38:03 2011
 New Revision: 227059
 URL: http://svn.freebsd.org/changeset/base/227059
 
 Log:
   Both a crash reported on freebsd-current on Oct. 18 under the
   subject heading "mtx_lock() of destroyed mutex on NFS" and
   PR# 156168 appear to be caused by clnt_dg_destroy() closing
   down the socket prematurely. When to close down the socket
   is controlled by a reference count (cs_refs), but clnt_dg_create()
   checks for sb_upcall being non-NULL to decide if a new socket
   is needed. I believe the crashes were caused by the following race:
     clnt_dg_destroy() finds cs_refs == 0 and decides to delete socket
     clnt_dg_destroy() then loses race with clnt_dg_create() for
       acquisition of the SOCKBUF_LOCK()
     clnt_dg_create() finds sb_upcall != NULL and increments cs_refs to 1
     clnt_dg_destroy() then acquires SOCKBUF_LOCK(), sets sb_upcall to
       NULL and destroys socket
   
   This patch fixes the above race by changing clnt_dg_destroy() so
   that it acquires SOCKBUF_LOCK() before testing cs_refs.
   
   Tested by:	bz
   PR:		156168
   Reviewed by:	dfr
   MFC after:	2 weeks
 
 Modified:
   head/sys/rpc/clnt_dg.c
 
 Modified: head/sys/rpc/clnt_dg.c
 ==============================================================================
 --- head/sys/rpc/clnt_dg.c	Thu Nov  3 14:36:56 2011	(r227058)
 +++ head/sys/rpc/clnt_dg.c	Thu Nov  3 14:38:03 2011	(r227059)
 @@ -1001,12 +1001,12 @@ clnt_dg_destroy(CLIENT *cl)
  	cs = cu->cu_socket->so_rcv.sb_upcallarg;
  	clnt_dg_close(cl);
  
 +	SOCKBUF_LOCK(&cu->cu_socket->so_rcv);
  	mtx_lock(&cs->cs_lock);
  
  	cs->cs_refs--;
  	if (cs->cs_refs == 0) {
  		mtx_unlock(&cs->cs_lock);
 -		SOCKBUF_LOCK(&cu->cu_socket->so_rcv);
  		soupcall_clear(cu->cu_socket, SO_RCV);
  		clnt_dg_upcallsdone(cu->cu_socket, cs);
  		SOCKBUF_UNLOCK(&cu->cu_socket->so_rcv);
 @@ -1015,6 +1015,7 @@ clnt_dg_destroy(CLIENT *cl)
  		lastsocketref = TRUE;
  	} else {
  		mtx_unlock(&cs->cs_lock);
 +		SOCKBUF_UNLOCK(&cu->cu_socket->so_rcv);
  		lastsocketref = FALSE;
  	}
  
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 
State-Changed-From-To: feedback->closed 
State-Changed-By: rmacklem 
State-Changed-When: Mon Nov 21 16:17:48 UTC 2011 
State-Changed-Why:  

I believe this bug is fixed by r227059 which has been MFC'd 
to stable/8 r227601. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=156168 
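
The race described in the r227059 commit log can be modelled in userland.
What follows is only a minimal sketch under stated assumptions, not the
kernel code: struct sock_model and every model_* function are hypothetical
names, pthread mutexes stand in for SOCKBUF_LOCK() and cs_lock, and a bool
stands in for "sb_upcall != NULL". The pre-fix release path is split into
two halves so that a caller can place an attach between them
deterministically instead of relying on thread timing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland stand-in for the per-socket state shared by clients. */
struct sock_model {
	pthread_mutex_t sockbuf_lock;	/* plays the role of SOCKBUF_LOCK() */
	pthread_mutex_t cs_lock;	/* plays the role of cs->cs_lock */
	int refs;			/* plays the role of cs_refs */
	bool upcall_set;		/* plays the role of sb_upcall != NULL */
};

void
model_init(struct sock_model *s, int refs)
{
	pthread_mutex_init(&s->sockbuf_lock, NULL);
	pthread_mutex_init(&s->cs_lock, NULL);
	s->refs = refs;
	s->upcall_set = true;
}

/*
 * Models the check attributed to clnt_dg_create() in the log: reuse the
 * existing state when the upcall is still set. Returns true if reused.
 */
bool
model_attach(struct sock_model *s)
{
	bool reused = false;

	pthread_mutex_lock(&s->sockbuf_lock);
	if (s->upcall_set) {
		pthread_mutex_lock(&s->cs_lock);
		s->refs++;
		pthread_mutex_unlock(&s->cs_lock);
		reused = true;
	}
	pthread_mutex_unlock(&s->sockbuf_lock);
	return (reused);
}

/* Pre-fix ordering, first half: test refs without the sockbuf lock held. */
bool
model_release_old_decide(struct sock_model *s)
{
	bool last;

	pthread_mutex_lock(&s->cs_lock);
	assert(s->refs > 0);
	last = (--s->refs == 0);
	pthread_mutex_unlock(&s->cs_lock);
	return (last);
}

/* Pre-fix ordering, second half: tear down based on the earlier decision. */
void
model_release_old_teardown(struct sock_model *s)
{
	pthread_mutex_lock(&s->sockbuf_lock);
	s->upcall_set = false;
	pthread_mutex_unlock(&s->sockbuf_lock);
}

/* Post-fix ordering: the sockbuf lock covers both the test and the teardown. */
bool
model_release_fixed(struct sock_model *s)
{
	bool last;

	pthread_mutex_lock(&s->sockbuf_lock);
	pthread_mutex_lock(&s->cs_lock);
	assert(s->refs > 0);
	last = (--s->refs == 0);
	pthread_mutex_unlock(&s->cs_lock);
	if (last)
		s->upcall_set = false;
	pthread_mutex_unlock(&s->sockbuf_lock);
	return (last);
}
```

On a model initialised with one reference, calling
model_release_old_decide(), then model_attach(), then
model_release_old_teardown() leaves refs at 1 with the upcall cleared,
which corresponds to the stale reference the log describes. With
model_release_fixed() the same attach finds the upcall already cleared
and takes no reference.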

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/156168: commit references a PR
Date: Tue, 22 Nov 2011 01:33:17 +0000 (UTC)

 Author: rmacklem
 Date: Tue Nov 22 01:32:57 2011
 New Revision: 227810
 URL: http://svn.freebsd.org/changeset/base/227810
 
 Log:
   MFC: r227059
   Both a crash reported on freebsd-current on Oct. 18 under the
   subject heading "mtx_lock() of destroyed mutex on NFS" and
   PR# 156168 appear to be caused by clnt_dg_destroy() closing
   down the socket prematurely. When to close down the socket
   is controlled by a reference count (cs_refs), but clnt_dg_create()
   checks for sb_upcall being non-NULL to decide if a new socket
   is needed. I believe the crashes were caused by the following race:
     clnt_dg_destroy() finds cs_refs == 0 and decides to delete socket
     clnt_dg_destroy() then loses race with clnt_dg_create() for
       acquisition of the SOCKBUF_LOCK()
     clnt_dg_create() finds sb_upcall != NULL and increments cs_refs to 1
     clnt_dg_destroy() then acquires SOCKBUF_LOCK(), sets sb_upcall to
       NULL and destroys socket
   
   This patch fixes the above race by changing clnt_dg_destroy() so
   that it acquires SOCKBUF_LOCK() before testing cs_refs.
   This is a slightly modified patch for stable/7. It fixes the
   above race, although others still exist, since some patches
   such as r193272 cannot be MFC'd.
   
   Tested by:	nonesuch at longcount.org (Mark Saad)
   PR:		kern/156168
 
 Modified:
   stable/7/sys/rpc/clnt_dg.c
 Directory Properties:
   stable/7/sys/   (props changed)
   stable/7/sys/cddl/contrib/opensolaris/   (props changed)
   stable/7/sys/contrib/dev/acpica/   (props changed)
   stable/7/sys/contrib/pf/   (props changed)
 
 Modified: stable/7/sys/rpc/clnt_dg.c
 ==============================================================================
 --- stable/7/sys/rpc/clnt_dg.c	Tue Nov 22 00:35:30 2011	(r227809)
 +++ stable/7/sys/rpc/clnt_dg.c	Tue Nov 22 01:32:57 2011	(r227810)
 @@ -811,18 +811,22 @@ clnt_dg_destroy(CLIENT *cl)
  	while (cu->cu_threads)
  		msleep(cu, &cs->cs_lock, 0, "rpcclose", 0);
  
 +	mtx_unlock(&cs->cs_lock);
 +	SOCKBUF_LOCK(&cu->cu_socket->so_rcv);
 +	mtx_lock(&cs->cs_lock);
  	cs->cs_refs--;
  	if (cs->cs_refs == 0) {
 -		mtx_destroy(&cs->cs_lock);
 -		SOCKBUF_LOCK(&cu->cu_socket->so_rcv);
 +		mtx_unlock(&cs->cs_lock);
  		cu->cu_socket->so_upcallarg = NULL;
  		cu->cu_socket->so_upcall = NULL;
  		cu->cu_socket->so_rcv.sb_flags &= ~SB_UPCALL;
  		SOCKBUF_UNLOCK(&cu->cu_socket->so_rcv);
 +		mtx_destroy(&cs->cs_lock);
  		mem_free(cs, sizeof(*cs));
  		lastsocketref = TRUE;
  	} else {
  		mtx_unlock(&cs->cs_lock);
 +		SOCKBUF_UNLOCK(&cu->cu_socket->so_rcv);
  		lastsocketref = FALSE;
  	}
  
 
>Unformatted:
