From joshua@green.shallow.net  Fri Apr 19 21:30:35 2002
Return-Path: <joshua@green.shallow.net>
Received: from green.shallow.net (c16486.smelb1.vic.optusnet.com.au [210.49.224.105])
	by hub.freebsd.org (Postfix) with ESMTP id 0308E37B41E
	for <FreeBSD-gnats-submit@freebsd.org>; Fri, 19 Apr 2002 21:30:34 -0700 (PDT)
Received: by green.shallow.net (Postfix, from userid 1001)
	id C873A3E2A; Sat, 20 Apr 2002 14:30:30 +1000 (EST)
Message-Id: <20020420043030.C873A3E2A@green.shallow.net>
Date: Sat, 20 Apr 2002 14:30:30 +1000 (EST)
From: Joshua Goodall <joshua@roughtrade.net>
Reply-To: Joshua Goodall <joshua@roughtrade.net>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: nullfs broken by locking changes in -current
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         37270
>Category:       kern
>Synopsis:       nullfs broken by locking changes in -current
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    tjr
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Apr 19 21:40:01 PDT 2002
>Closed-Date:    Mon Jul 07 15:57:43 PDT 2003
>Last-Modified:  Mon Jul 07 15:57:43 PDT 2003
>Originator:     Joshua Goodall
>Release:        FreeBSD 5.0-CURRENT i386
>Organization:
>Environment:
System: FreeBSD toxic.myrrh.net 5.0-CURRENT FreeBSD 5.0-CURRENT #1: Sat Apr 20 03:25:54 EST 2002     joshua@green.shallow.net:/usr/obj/usr/current/sys/TOXIC  i386
>Description:

The change to default to shared locks during a namei lookup causes
a deadlock in nullfs, which doesn't propagate the lookup flags to the
functions in null_subr.c.

Thus nullfs tries to get an exclusive recursive lock on a v_vnlock
that is already locked shared during the same traversal, and hangs.

>How-To-Repeat:

# mount -t nullfs /var /null
# cd /null/tmp
(never returns, deadlocked in wchan "inode")

Reproduced on (a) a Sony Vaio and (b) VMware.

>Fix:

This was fun[tm] to debug.

The ugly workaround is "options LOOKUP_EXCLUSIVE".

A suggested fix is below. It passes the flags down during appropriate
operations, defaulting to exclusive (flags = 0). This fixes the
problem on both crashboxes and survives a basic fs stressing (multiple
parallel finds, cp -R's and postmarks to/from a given nullmount for
an hour).
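
The lock-type decision that the patch makes in null_node_find() can be
sketched on its own as below. This is a standalone illustration, not
part of the diff: the EX_-prefixed constants are placeholder values
(the real ISLASTCN, LOCKSHARED and LK_* definitions live in the kernel
headers) and null_lock_type() is a hypothetical helper name.

```c
#include <assert.h>

/* Placeholder bit values, for illustration only. */
#define	EX_ISLASTCN	0x01	/* stands in for ISLASTCN */
#define	EX_LOCKSHARED	0x02	/* stands in for LOCKSHARED */
#define	EX_LK_SHARED	0x10	/* stands in for LK_SHARED */
#define	EX_LK_EXCLUSIVE	0x20	/* stands in for LK_EXCLUSIVE */
#define	EX_LK_CANRECURSE 0x40	/* stands in for LK_CANRECURSE */

/*
 * Pick the lock request from the lookup flags: shared only when this
 * is the last path component and the caller asked for a shared lock,
 * otherwise exclusive and recursive as before.  flags == 0 therefore
 * keeps the old exclusive behaviour.
 */
int
null_lock_type(int flags)
{
	if ((flags & EX_ISLASTCN) && (flags & EX_LOCKSHARED))
		return (EX_LK_SHARED);
	return (EX_LK_EXCLUSIVE | EX_LK_CANRECURSE);
}
```

Callers that have no lookup flags to hand (the mount path, for
example) pass 0 and get the exclusive request.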

A similar fix works for my WIP morphfs layer.

Joshua

--- nullfs-locking.diff begins here ---
Index: null.h
===================================================================
RCS file: /cvs/src/sys/fs/nullfs/null.h,v
retrieving revision 1.15
diff -u -r1.15 null.h
--- null.h	5 Sep 2000 09:02:07 -0000	1.15
+++ null.h	19 Apr 2002 16:05:12 -0000
@@ -63,7 +63,8 @@
 
 int nullfs_init(struct vfsconf *vfsp);
 int nullfs_uninit(struct vfsconf *vfsp);
-int null_node_create(struct mount *mp, struct vnode *target, struct vnode **vpp);
+int null_node_create(struct mount *mp, struct vnode *target,
+			struct vnode **vpp, int flags);
 int null_bypass(struct vop_generic_args *ap);
 
 #ifdef DIAGNOSTIC
Index: null_subr.c
===================================================================
RCS file: /cvs/src/sys/fs/nullfs/null_subr.c,v
retrieving revision 1.32
diff -u -r1.32 null_subr.c
--- null_subr.c	12 Sep 2001 08:37:19 -0000	1.32
+++ null_subr.c	19 Apr 2002 17:50:08 -0000
@@ -46,6 +46,7 @@
 #include <sys/mount.h>
 #include <sys/proc.h>
 #include <sys/vnode.h>
+#include <sys/namei.h>
 
 #include <fs/nullfs/null.h>
 
@@ -71,9 +72,9 @@
 MALLOC_DEFINE(M_NULLFSNODE, "NULLFS node", "NULLFS vnode private part");
 
 static int	null_node_alloc(struct mount *mp, struct vnode *lowervp,
-				     struct vnode **vpp);
+				     struct vnode **vpp, int flags);
 static struct vnode *
-		null_node_find(struct mount *mp, struct vnode *lowervp);
+		null_node_find(struct mount *mp, struct vnode *lowervp, int flags);
 
 /*
  * Initialise cache headers
@@ -106,14 +107,16 @@
  * Lower vnode should be locked on entry and will be left locked on exit.
  */
 static struct vnode *
-null_node_find(mp, lowervp)
+null_node_find(mp, lowervp, flags)
 	struct mount *mp;
 	struct vnode *lowervp;
+	int flags;
 {
 	struct thread *td = curthread;	/* XXX */
 	struct null_node_hashhead *hd;
 	struct null_node *a;
 	struct vnode *vp;
+	int error;
 
 	/*
 	 * Find hash base, and then search the (two-way) linked
@@ -133,7 +136,15 @@
 			 * stuff, but we don't want to lock
 			 * the lower node.
 			 */
-			if (vget(vp, LK_EXCLUSIVE | LK_CANRECURSE, td)) {
+#ifndef LOOKUP_EXCLUSIVE
+			if ((flags & ISLASTCN) && (flags & LOCKSHARED))
+				error = vget(vp, LK_SHARED, td);
+			else
+				error = vget(vp, LK_EXCLUSIVE | LK_CANRECURSE, td);
+#else
+			error = vget(vp, LK_EXCLUSIVE | LK_CANRECURSE, td);
+#endif
+			if (error) {
 				printf ("null_node_find: vget failed.\n");
 				goto loop;
 			};
@@ -157,10 +168,11 @@
  * Maintain a reference to (lowervp).
  */
 static int
-null_node_alloc(mp, lowervp, vpp)
+null_node_alloc(mp, lowervp, vpp, flags)
 	struct mount *mp;
 	struct vnode *lowervp;
 	struct vnode **vpp;
+	int flags;
 {
 	struct thread *td = curthread;	/* XXX */
 	struct null_node_hashhead *hd;
@@ -192,7 +204,7 @@
 	 * check to see if someone else has beaten us to it.
 	 * (We could have slept in MALLOC.)
 	 */
-	othervp = null_node_find(mp, lowervp);
+	othervp = null_node_find(mp, lowervp, flags);
 	if (othervp) {
 		vp->v_data = NULL;
 		FREE(xp, M_NULLFSNODE);
@@ -213,7 +225,14 @@
 
 	lockmgr(&null_hashlock, LK_EXCLUSIVE, NULL, td);
 	vp->v_vnlock = lowervp->v_vnlock;
+#ifndef LOOKUP_EXCLUSIVE
+	if ((flags & ISLASTCN) && (flags & LOCKSHARED))
+		error = VOP_LOCK(vp, LK_SHARED | LK_THISLAYER, td);
+	else
+		error = VOP_LOCK(vp, LK_EXCLUSIVE | LK_THISLAYER, td);
+#else
 	error = VOP_LOCK(vp, LK_EXCLUSIVE | LK_THISLAYER, td);
+#endif
 	if (error)
 		panic("null_node_alloc: can't lock new vnode\n");
 
@@ -231,14 +250,15 @@
  * vnode which contains a reference to the lower vnode.
  */
 int
-null_node_create(mp, lowervp, newvpp)
+null_node_create(mp, lowervp, newvpp, flags)
 	struct mount *mp;
 	struct vnode *lowervp;
 	struct vnode **newvpp;
+	int flags;
 {
 	struct vnode *aliasvp;
 
-	aliasvp = null_node_find(mp, lowervp);
+	aliasvp = null_node_find(mp, lowervp, flags);
 	if (aliasvp) {
 		/*
 		 * null_node_find has taken another reference
@@ -259,7 +279,7 @@
 		/*
 		 * Make new vnode reference the null_node.
 		 */
-		error = null_node_alloc(mp, lowervp, &aliasvp);
+		error = null_node_alloc(mp, lowervp, &aliasvp, flags);
 		if (error)
 			return error;
 
Index: null_vfsops.c
===================================================================
RCS file: /cvs/src/sys/fs/nullfs/null_vfsops.c,v
retrieving revision 1.51
diff -u -r1.51 null_vfsops.c
--- null_vfsops.c	17 Mar 2002 01:25:41 -0000	1.51
+++ null_vfsops.c	19 Apr 2002 15:59:32 -0000
@@ -171,7 +171,7 @@
 	 * Save reference.  Each mount also holds
 	 * a reference on the root vnode.
 	 */
-	error = null_node_create(mp, lowerrootvp, &vp);
+	error = null_node_create(mp, lowerrootvp, &vp, 0);
 	/*
 	 * Unlock the node (either the lower or the alias)
 	 */
@@ -356,7 +356,7 @@
 	if (error)
 		return (error);
 
-	return (null_node_create(mp, *vpp, vpp));
+	return (null_node_create(mp, *vpp, vpp, flags));
 }
 
 static int
@@ -370,7 +370,7 @@
 	if (error)
 		return (error);
 
-	return (null_node_create(mp, *vpp, vpp));
+	return (null_node_create(mp, *vpp, vpp, 0));
 }
 
 static int
Index: null_vnops.c
===================================================================
RCS file: /cvs/src/sys/fs/nullfs/null_vnops.c,v
retrieving revision 1.50
diff -u -r1.50 null_vnops.c
--- null_vnops.c	12 Sep 2001 08:37:19 -0000	1.50
+++ null_vnops.c	19 Apr 2002 17:00:42 -0000
@@ -346,7 +346,7 @@
 		vppp = VOPARG_OFFSETTO(struct vnode***,
 				 descp->vdesc_vpp_offset,ap);
 		if (*vppp)
-			error = null_node_create(old_vps[0]->v_mount, **vppp, *vppp);
+			error = null_node_create(old_vps[0]->v_mount, **vppp, *vppp, 0);
 	}
 
  out:
@@ -400,7 +400,7 @@
 			VREF(dvp);
 			vrele(lvp);
 		} else {
-			error = null_node_create(dvp->v_mount, lvp, &vp);
+			error = null_node_create(dvp->v_mount, lvp, &vp, flags);
 			if (error == 0)
 				*ap->a_vpp = vp;
 		}
--- nullfs-locking.diff ends here ---


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->jeffr 
Responsible-Changed-By: alfred 
Responsible-Changed-When: Fri Apr 19 21:53:32 PDT 2002 
Responsible-Changed-Why:  


http://www.freebsd.org/cgi/query-pr.cgi?pr=37270 
Responsible-Changed-From-To: jeffr->jeff 
Responsible-Changed-By: alfred 
Responsible-Changed-When: Fri Apr 19 21:54:10 PDT 2002 
Responsible-Changed-Why:  
Jeff did the "lookup shared" thing. 


From: Joshua Goodall <joshua@roughtrade.net>
To: FreeBSD-gnats-submit@FreeBSD.org, jeff@FreeBSD.org
Cc:  
Subject: Re: kern/37270: nullfs broken by locking changes in -current
Date: Sun, 21 Apr 2002 02:58:50 +1000

 Somehow I was working against r1.50 of null_vnops.c rather than
 r1.51.  There are further issues in r1.51 in particular with draining
 locks... ugh.  Consider this stalled whilst I work on it.
 
 joshua

From: Jeff Roberson <jroberson@chesapeake.net>
To: Joshua Goodall <joshua@roughtrade.net>
Cc: FreeBSD-gnats-submit@FreeBSD.org, <jeff@FreeBSD.org>
Subject: Re: kern/37270: nullfs broken by locking changes in -current
Date: Fri, 3 May 2002 23:13:33 -0400 (EDT)

 On Sun, 21 Apr 2002, Joshua Goodall wrote:
 
 > Somehow I was working against r1.50 of null_vnops.c rather than
 > r1.51.  There are further issues in r1.51 in particular with draining
 > locks... ugh.  Consider this stalled whilst I work on it.
 >
 > joshua
 >
 
 What is the current status of this?
 
 Thanks,
 Jeff
 

From: Joshua Goodall <joshua@roughtrade.net>
To: Jeff Roberson <jroberson@chesapeake.net>
Cc: FreeBSD-gnats-submit@FreeBSD.org
Subject: Re: kern/37270: nullfs broken by locking changes in -current
Date: Sat, 4 May 2002 13:48:28 +1000

 On Fri, May 03, 2002 at 11:13:33PM -0400, Jeff Roberson wrote:
 > On Sun, 21 Apr 2002, Joshua Goodall wrote:
 > 
 > > Somehow I was working against r1.50 of null_vnops.c rather than
 > > r1.51.  There are further issues in r1.51 in particular with draining
 > > locks... ugh.  Consider this stalled whilst I work on it.
 > >
 > > joshua
 > >
 > 
 > What is the current status of this?
 
 The patch in the PR makes nullfs usable but not stable. Executing the following
 will still consistently deadlock it after about ten minutes on my vaio:
 
 mount -t nullfs /tmp /null
 cd /tmp
 ( while true; do cp -Rp /etc a >& /dev/null; done ) &
 ( while true; do cp -Rp /etc b >& /dev/null; done ) &
 ( while true; do find . >& /dev/null; done ) &
 ( while true; do find . >& /dev/null; done ) &
 cd /null
 ( while true; do cp -Rp /etc a >& /dev/null; done ) &
 ( while true; do cp -Rp /etc b >& /dev/null; done ) &
 ( while true; do find . >& /dev/null; done ) &
 ( while true; do find . >& /dev/null; done ) &
 cd /
 
 After about ten minutes they'll hang in "inode". An analysis of
 several coredumps indicated that in each case, the deadlocked lock
 graph is rooted in a UFS vnode that has a corresponding nullfs vnode
 (with which it shares a v_vnlock), is locked shared (with lockholder
 = LK_NOPROC) and is always flagged VONWORKLST, which points at the
 syncer.
 
 I can't reproduce the deadlock if
 
 a) I omit the operations in /tmp, or
 b) I omit the operations in /null, or
 c) I only do find, or only do cp
 
 Although obviously that's inconclusive.
 
 At this point I am still quarrying for more VFS & syncer clue.  I
 keep upgrading the amount of logging I get via KTR in order to see
 more, but that's where I'm at with this. Bear in mind that I had
 basically zero VFS exposure until last week.
 
 I've since reproduced it with both r1.50 and r1.51 of null_vnops.c,
 so that's probably not at fault.
 
 Regards
 Joshua.
State-Changed-From-To: open->feedback 
State-Changed-By: tjr 
State-Changed-When: Tue Jun 17 06:21:18 PDT 2003 
State-Changed-Why:  
Could you please check whether the bug still exists in -current? 


Responsible-Changed-From-To: jeff->tjr 
Responsible-Changed-By: tjr 
Responsible-Changed-When: Tue Jun 17 06:21:18 PDT 2003 
Responsible-Changed-Why:  
I'll handle this. 

State-Changed-From-To: feedback->closed 
State-Changed-By: tjr 
State-Changed-When: Mon Jul 7 15:57:08 PDT 2003 
State-Changed-Why:  
Feedback timeout. I think this has already been fixed. 

>Unformatted:
