From garry@NetworkPhysics.COM  Fri Aug 19 20:51:05 2005
Return-Path: <garry@NetworkPhysics.COM>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 0729A16A421
	for <FreeBSD-gnats-submit@freebsd.org>; Fri, 19 Aug 2005 20:51:05 +0000 (GMT)
	(envelope-from garry@NetworkPhysics.COM)
Received: from NetworkPhysics.COM (fw.networkphysics.com [205.158.104.176])
	by mx1.FreeBSD.org (Postfix) with ESMTP id A919043D45
	for <FreeBSD-gnats-submit@freebsd.org>; Fri, 19 Aug 2005 20:51:04 +0000 (GMT)
	(envelope-from garry@NetworkPhysics.COM)
Received: from focus5.fractal.networkphysics.com (focus5.fractal.networkphysics.com [10.10.0.112])
	by NetworkPhysics.COM (8.12.10/8.12.10) with ESMTP id j7JKo1gb027018
	for <FreeBSD-gnats-submit@freebsd.org>; Fri, 19 Aug 2005 13:51:04 -0700 (PDT)
	(envelope-from garry@NetworkPhysics.COM)
Received: (from garry@localhost)
	by focus5.fractal.networkphysics.com (8.13.1/8.13.1/Submit) id j7INPEnG075206;
	Thu, 18 Aug 2005 16:25:14 -0700 (PDT)
	(envelope-from garry)
Message-Id: <200508182325.j7INPEnG075206@focus5.fractal.networkphysics.com>
Date: Thu, 18 Aug 2005 16:25:14 -0700 (PDT)
From: Garry Belka <garry@NetworkPhysics.COM>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: [patch] pseudofs: a panic due to sleep with held mutex
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         85137
>Category:       kern
>Synopsis:       [pseudofs] [patch] panic due to sleep with held mutex
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    des
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Aug 19 21:00:33 GMT 2005
>Closed-Date:    Tue Nov 09 18:31:26 UTC 2010
>Last-Modified:  Tue Nov 09 18:31:26 UTC 2010
>Originator:     Garry Belka
>Release:        FreeBSD 5.4-RELEASE i386
>Organization:
Network Physics
>Environment:
System: FreeBSD tempo.fractal.networkphysics.com NP-5.4-20050728 FreeBSD NP-5.4-20050728 #1: Thu Jul 28 15:28:58 PDT 2005     garry@focus5.fractal.networkphysics.com:/u1/k5/bb/FreeBSD/sys/i386/compile/NPUNI  i386

>Description:
We saw several panics of the same kind on different systems running 5.4-STABLE. The panic occurred in propagate_priority() and was ultimately traced to the vget() call in pfs_vncache_alloc(), which is made while the global pfs mutex (pfs_vncache_mutex) is held. When vget() sleeps,
propagate_priority() in a different thread comes across a sleeping thread that owns a mutex other threads are blocked on, and that causes a panic.
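
The locking rule involved can be sketched in userland. This is only an illustration, assuming POSIX threads as a stand-in for kernel mutexes and vnode locks; cache_mtx, struct obj and obj_get() are hypothetical names, not pseudofs code. The idea is to try the sleepable lock without blocking while the short-term mutex is held, and to drop the short-term mutex before any blocking acquisition.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical stand-ins: cache_mtx plays the role of a short-term
 * mutex such as pfs_vncache_mutex; obj.lock plays the role of a
 * sleepable per-object lock such as a vnode lock.
 */
static pthread_mutex_t cache_mtx = PTHREAD_MUTEX_INITIALIZER;

struct obj {
	pthread_mutex_t	lock;	/* may be held for a long time */
	int		refs;	/* hold count, protected by cache_mtx */
};

/*
 * Return 0 with o->lock held.  The function never blocks on o->lock
 * while cache_mtx is held.
 */
static int
obj_get(struct obj *o)
{
	pthread_mutex_lock(&cache_mtx);
	if (pthread_mutex_trylock(&o->lock) == 0) {
		/* Fast path: non-blocking attempt under the mutex. */
		pthread_mutex_unlock(&cache_mtx);
		return (0);
	}
	/* Slow path: pin the object, then drop the short-term mutex. */
	o->refs++;
	pthread_mutex_unlock(&cache_mtx);

	/* May block for a long time; cache_mtx is not held here. */
	pthread_mutex_lock(&o->lock);

	pthread_mutex_lock(&cache_mtx);
	assert(o->refs > 0);
	o->refs--;
	pthread_mutex_unlock(&cache_mtx);
	return (0);
}
```

The hold count is what keeps the object from being freed between dropping cache_mtx and blocking on the object lock; in the kernel that role would be played by something like vhold()/vdrop().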

>How-To-Repeat:
	We saw these panics about once every several days per machine during intensive testing under our load. I do not know of an easy way to reproduce them.

>Fix:
A tentative patch for 5.4-STABLE is attached. In addition to the fix for the panic, it
switches to the LIST_*() macros instead of manipulating queue pointers directly.
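
For readers unfamiliar with the LIST_*() macros from <sys/queue.h>, here is a minimal userland sketch of the same conversion. struct entry, cache and the cache_*() helpers are hypothetical and much smaller than the real pfs_vdata code; locking is omitted to keep the list handling visible.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>
#include <sys/types.h>

/* Hypothetical, trimmed-down cache entry; only the linkage matters. */
struct entry {
	pid_t			e_pid;
	LIST_ENTRY(entry)	e_link;	/* replaces hand-rolled prev/next */
};

static LIST_HEAD(, entry) cache = LIST_HEAD_INITIALIZER(cache);
static int cache_entries;

/* Insert at the head and keep the counter in step with the list. */
static void
cache_insert(struct entry *e)
{
	LIST_INSERT_HEAD(&cache, e, e_link);
	++cache_entries;
}

/* Unlink an entry; no special case is needed for the first element. */
static void
cache_remove(struct entry *e)
{
	assert(cache_entries > 0);
	LIST_REMOVE(e, e_link);
	--cache_entries;
}

/* Linear search by pid; returns NULL if nothing matches. */
static struct entry *
cache_find(pid_t pid)
{
	struct entry *e;

	LIST_FOREACH(e, &cache, e_link) {
		if (e->e_pid == pid)
			return (e);
	}
	return (NULL);
}
```

The macros hide the head and tail special cases that the open-coded pvd_prev/pvd_next manipulation had to spell out by hand.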

I would be most interested to hear from people experienced with VFS
whether this patch is suitable, or what problems they see with it.

To apply it to 6.0, I think it might be sufficient to
uncomment the lines marked XXX--, but I have not checked that. The patch also includes some 6.0 fixes from Isilon backported to 5.4; some of those depend on other 6.0 VFS changes and would fail on 5.4, so they are partially commented out to make the patch work on 5.4.

--- pseudofs_vncache.patch begins here ---
diff -Naur ../../b54/sys/fs/pseudofs/pseudofs_internal.h sys/fs/pseudofs/pseudofs_internal.h
--- ../../b54/sys/fs/pseudofs/pseudofs_internal.h	2001-10-01 04:26:33.000000000 +0000
+++ sys/fs/pseudofs/pseudofs_internal.h	2005-08-05 20:50:42.000000000 +0000
@@ -43,9 +43,10 @@
 	struct pfs_node	*pvd_pn;
 	pid_t		 pvd_pid;
 	struct vnode	*pvd_vnode;
-	struct pfs_vdata*pvd_prev, *pvd_next;
+        LIST_ENTRY(pfs_vdata)  pvd_link;
 };
 
+
 /*
  * Vnode cache
  */
diff -Naur ../../b54/sys/fs/pseudofs/pseudofs_vncache.c sys/fs/pseudofs/pseudofs_vncache.c
--- ../../b54/sys/fs/pseudofs/pseudofs_vncache.c	2004-08-15 21:58:02.000000000 +0000
+++ sys/fs/pseudofs/pseudofs_vncache.c	2005-08-05 20:50:42.000000000 +0000
@@ -38,6 +38,7 @@
 #include <sys/proc.h>
 #include <sys/sysctl.h>
 #include <sys/vnode.h>
+#include <sys/queue.h>
 
 #include <fs/pseudofs/pseudofs.h>
 #include <fs/pseudofs/pseudofs_internal.h>
@@ -45,7 +46,8 @@
 static MALLOC_DEFINE(M_PFSVNCACHE, "pfs_vncache", "pseudofs vnode cache");
 
 static struct mtx pfs_vncache_mutex;
-static struct pfs_vdata *pfs_vncache;
+static LIST_HEAD(, pfs_vdata) pfs_vncache_list =
+    LIST_HEAD_INITIALIZER(&pfs_vncache_list);
 static eventhandler_tag pfs_exit_tag;
 static void pfs_exit(void *arg, struct proc *p);
 
@@ -106,6 +108,7 @@
 		  struct pfs_node *pn, pid_t pid)
 {
 	struct pfs_vdata *pvd;
+        struct vnode *vnp;
 	int error;
 
 	/*
@@ -113,10 +116,10 @@
 	 * XXX linear search is not very efficient.
 	 */
 	mtx_lock(&pfs_vncache_mutex);
-	for (pvd = pfs_vncache; pvd; pvd = pvd->pvd_next) {
+        LIST_FOREACH(pvd, &pfs_vncache_list, pvd_link) {
 		if (pvd->pvd_pn == pn && pvd->pvd_pid == pid &&
 		    pvd->pvd_vnode->v_mount == mp) {
-			if (vget(pvd->pvd_vnode, 0, curthread) == 0) {
+			if (vget(pvd->pvd_vnode, LK_NOWAIT, curthread) == 0) {
 				++pfs_vncache_hits;
 				*vpp = pvd->pvd_vnode;
 				mtx_unlock(&pfs_vncache_mutex);
@@ -127,6 +130,20 @@
 				return (0);
 			}
 			/* XXX if this can happen, we're in trouble */
+                        /* the vnode is being cleaned.
+                         * need to wait until it's gone
+                         */
+			vnp = pvd->pvd_vnode;
+                        vhold(vnp);
+			mtx_unlock(&pfs_vncache_mutex);
+                        /*XXX-- VOP_LOCK(vnp, LK_EXCLUSIVE, curthread); */
+                        if (vget(vnp, 0, curthread) == 0) {
+                                /* XXX shouldn't happen.  */
+                                vrele(vnp);
+                        }
+                        /*XXX-- VOP_UNLOCK(vnp, 0, curthread); */
+                        vdrop(vnp);
+			mtx_lock(&pfs_vncache_mutex);
 			break;
 		}
 	}
@@ -135,8 +152,6 @@
 
 	/* nope, get a new one */
 	MALLOC(pvd, struct pfs_vdata *, sizeof *pvd, M_PFSVNCACHE, M_WAITOK);
-	if (++pfs_vncache_entries > pfs_vncache_maxentries)
-		pfs_vncache_maxentries = pfs_vncache_entries;
 	error = getnewvnode("pseudofs", mp, pfs_vnodeop_p, vpp);
 	if (error) {
 		FREE(pvd, M_PFSVNCACHE);
@@ -176,12 +191,13 @@
 	if ((pn->pn_flags & PFS_PROCDEP) != 0)
 		(*vpp)->v_vflag |= VV_PROCDEP;
 	pvd->pvd_vnode = *vpp;
+
 	mtx_lock(&pfs_vncache_mutex);
-	pvd->pvd_prev = NULL;
-	pvd->pvd_next = pfs_vncache;
-	if (pvd->pvd_next)
-		pvd->pvd_next->pvd_prev = pvd;
-	pfs_vncache = pvd;
+
+        LIST_INSERT_HEAD(&pfs_vncache_list, pvd, pvd_link);
+	if (++pfs_vncache_entries > pfs_vncache_maxentries)
+		pfs_vncache_maxentries = pfs_vncache_entries;
+
 	mtx_unlock(&pfs_vncache_mutex);
         (*vpp)->v_vnlock->lk_flags |= LK_CANRECURSE;
 	vn_lock(*vpp, LK_RETRY | LK_EXCLUSIVE, curthread);
@@ -199,15 +215,10 @@
 	mtx_lock(&pfs_vncache_mutex);
 	pvd = (struct pfs_vdata *)vp->v_data;
 	KASSERT(pvd != NULL, ("pfs_vncache_free(): no vnode data\n"));
-	if (pvd->pvd_next)
-		pvd->pvd_next->pvd_prev = pvd->pvd_prev;
-	if (pvd->pvd_prev)
-		pvd->pvd_prev->pvd_next = pvd->pvd_next;
-	else
-		pfs_vncache = pvd->pvd_next;
+        LIST_REMOVE(pvd, pvd_link);
+	--pfs_vncache_entries;
 	mtx_unlock(&pfs_vncache_mutex);
 
-	--pfs_vncache_entries;
 	FREE(pvd, M_PFSVNCACHE);
 	vp->v_data = NULL;
 	return (0);
@@ -222,6 +233,8 @@
 	struct pfs_vdata *pvd;
 	struct vnode *vnp;
 
+        if (LIST_EMPTY(&pfs_vncache_list))
+            return;
 	mtx_lock(&Giant);
 	/*
 	 * This is extremely inefficient due to the fact that vgone() not
@@ -237,16 +250,18 @@
 	 * this particular case would be a BST sorted by PID.
 	 */
 	mtx_lock(&pfs_vncache_mutex);
-	pvd = pfs_vncache;
-	while (pvd != NULL) {
+    restart:
+        LIST_FOREACH(pvd, &pfs_vncache_list, pvd_link) {
 		if (pvd->pvd_pid == p->p_pid) {
 			vnp = pvd->pvd_vnode;
+                        vhold(vnp);
 			mtx_unlock(&pfs_vncache_mutex);
+                        /*XXX-- VOP_LOCK(vnp, LK_EXCLUSIVE, curthread); */
 			vgone(vnp);
+                        /*XXX-- VOP_UNLOCK(vnp, 0, curthread); */
+                        vdrop(vnp);
 			mtx_lock(&pfs_vncache_mutex);
-			pvd = pfs_vncache;
-		} else {
-			pvd = pvd->pvd_next;
+                        goto restart;
 		}
 	}
 	mtx_unlock(&pfs_vncache_mutex);
@@ -267,16 +282,19 @@
 	pn->pn_flags |= PFS_DISABLED;
 	/* XXX see comment above nearly identical code in pfs_exit() */
 	mtx_lock(&pfs_vncache_mutex);
-	pvd = pfs_vncache;
-	while (pvd != NULL) {
+    restart:
+        LIST_FOREACH(pvd, &pfs_vncache_list, pvd_link) {
 		if (pvd->pvd_pn == pn) {
 			vnp = pvd->pvd_vnode;
+                        vhold(vnp);
 			mtx_unlock(&pfs_vncache_mutex);
+                        /*XXX-- VOP_LOCK(vnp, LK_EXCLUSIVE, curthread); */
 			vgone(vnp);
+                        /*XXX-- VOP_UNLOCK(vnp, 0, curthread); */
+                        vdrop(vnp);
+
 			mtx_lock(&pfs_vncache_mutex);
-			pvd = pfs_vncache;
-		} else {
-			pvd = pvd->pvd_next;
+                        goto restart;
 		}
 	}
 	mtx_unlock(&pfs_vncache_mutex);
--- pseudofs_vncache.patch ends here ---
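
The restart loops in pfs_exit() and the similar loop after it follow a common pattern: once the list mutex has been dropped, the iterator can no longer be trusted, so the scan starts again from the head. A userland sketch of that pattern follows, assuming POSIX threads and <sys/queue.h>; struct node, node_destroy() and purge_key() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

struct node {
	int			n_key;
	LIST_ENTRY(node)	n_link;
};

static LIST_HEAD(, node) nodes = LIST_HEAD_INITIALIZER(nodes);
static pthread_mutex_t nodes_mtx = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical teardown routine.  It takes nodes_mtx itself to
 * unlink the node, so callers must not hold the mutex.
 */
static void
node_destroy(struct node *n)
{
	assert(n != NULL);
	pthread_mutex_lock(&nodes_mtx);
	LIST_REMOVE(n, n_link);
	pthread_mutex_unlock(&nodes_mtx);
}

/* Destroy every node with the given key; return how many were found. */
static int
purge_key(int key)
{
	struct node *n;
	int purged = 0;

	pthread_mutex_lock(&nodes_mtx);
restart:
	LIST_FOREACH(n, &nodes, n_link) {
		if (n->n_key == key) {
			/* Drop the mutex before the slow operation. */
			pthread_mutex_unlock(&nodes_mtx);
			node_destroy(n);
			++purged;
			pthread_mutex_lock(&nodes_mtx);
			/* The list changed under us: rescan from the head. */
			goto restart;
		}
	}
	pthread_mutex_unlock(&nodes_mtx);
	return (purged);
}
```

Restarting makes the scan quadratic in the worst case, which matches the inefficiency the existing comments in the file already point out. The sketch also does not pin the node between the unlock and node_destroy(), which is the gap the patch closes with vhold()/vdrop().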


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->des 
Responsible-Changed-By: rwatson 
Responsible-Changed-When: Sat Mar 8 20:10:48 UTC 2008 
Responsible-Changed-Why:  
Assign to pseudofs maintainer; this is a fairly well-aged PR so it could 
be that this is already resolved in more recent versions. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=85137 
State-Changed-From-To: open->feedback 
State-Changed-By: emaste 
State-Changed-When: Tue Jan 26 14:52:59 UTC 2010 
State-Changed-Why:  
Feedback requested.  I believe this does not occur anymore. 


From: Ed Maste <emaste@freebsd.org>
To: bug-followup@FreeBSD.org, garry@NetworkPhysics.COM
Cc:  
Subject: Re: kern/85137: [pseudofs] [patch] panic due to sleep with held mutex
Date: Tue, 26 Jan 2010 09:52:36 -0500

 Has anyone seen this issue on more recent releases?  I ran into it
 several times at work on various 5.x trees, but did not see it again
 on any of our 6.1 or later trees.  I suspect it should be closed.
 
 -Ed

From: Dag-Erling Smørgrav <des@des.no>
To: Ed Maste <emaste@freebsd.org>
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: kern/85137: [pseudofs] [patch] panic due to sleep with held mutex
Date: Tue, 26 Jan 2010 16:16:15 +0100

 It was most likely fixed last spring along with a slew of other locking
 issues.
 
 DES
 -- 
 Dag-Erling Smørgrav - des@des.no
State-Changed-From-To: feedback->closed 
State-Changed-By: emaste 
State-Changed-When: Tue Nov 9 18:30:20 UTC 2010 
State-Changed-Why:  
des@ thinks this was likely fixed, and there has been no followup stating 
that it is still observed. 

>Unformatted:
