From bms@spc.org  Mon Jul 21 01:37:56 2003
Return-Path: <bms@spc.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A86ED37B401
	for <FreeBSD-gnats-submit@freebsd.org>; Mon, 21 Jul 2003 01:37:56 -0700 (PDT)
Received: from bigboy.spc.org (bigboy.spc.org [195.206.69.225])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D443643FAF
	for <FreeBSD-gnats-submit@freebsd.org>; Mon, 21 Jul 2003 01:37:55 -0700 (PDT)
	(envelope-from bms@spc.org)
Received: from saboteur.dek.spc.org (unknown [81.3.72.68])
	(using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits))
	(No client certificate requested)
	by bigboy.spc.org (Postfix) with ESMTP id 99B5C316A
	for <FreeBSD-gnats-submit@freebsd.org>; Mon, 21 Jul 2003 09:38:09 +0100 (BST)
Received: by saboteur.dek.spc.org (Postfix, from userid 1001)
	id 5073D5CB; Mon, 21 Jul 2003 09:37:47 +0100 (BST)
Message-Id: <20030721083747.5073D5CB@saboteur.dek.spc.org>
Date: Mon, 21 Jul 2003 09:37:47 +0100 (BST)
From: Bruce M Simpson <bms@spc.org>
Reply-To: Bruce M Simpson <bms@spc.org>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: [PATCH] wrap swap_pager's swhash with a mutex
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         54690
>Category:       kern
>Synopsis:       [PATCH] wrap swap_pager's swhash with a mutex
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    alc
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jul 21 01:40:18 PDT 2003
>Closed-Date:    Mon Nov 10 22:54:44 PST 2003
>Last-Modified:  Mon Nov 10 22:54:44 PST 2003
>Originator:     Bruce M Simpson
>Release:        FreeBSD 5.1-RELEASE i386
>Organization:
>Environment:
System: FreeBSD saboteur.dek.spc.org 5.1-RELEASE FreeBSD 5.1-RELEASE #3: Mon Jun 23 06:55:01 BST 2003 root@saboteur.dek.spc.org:/usr/src/sys/i386/compile/SABOTEUR i386


	
>Description:
	Requested and reviewed by alc@freebsd.org -- wrap the swap block
	hash table in the swap pager with a fine-grained lock.
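	For readers unfamiliar with the data structure, here is a minimal
	userspace sketch of the idea. It assumes POSIX threads in place of the
	kernel mtx(9) API. The names struct blk, table, table_mtx, bucket_of()
	and table_contains() are hypothetical, as are the constants 16 and 256.
	Only the bucket formula is modeled on swp_pager_hash(). A single mutex
	covers every bucket head and every chain link, mirroring the single
	swhash_mtx that the patch adds.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define META_PAGES 16u                  /* assumed pages per record */
#define META_MASK  (META_PAGES - 1u)
#define NBUCKETS   256u                 /* assumed table size, power of two */
#define HASH_MASK  (NBUCKETS - 1u)

struct blk {                            /* hypothetical stand-in for struct swblock */
	struct blk *next;               /* hash chain, like swb_hnext */
	const void *object;             /* owner, like swb_object */
	uint64_t    index;              /* cluster-aligned page index */
};

/* One lock protects every bucket head and every chain link. */
static pthread_mutex_t table_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct blk *table[NBUCKETS];

/* Round the page index down to its cluster, then mix in the object address. */
static unsigned
bucket_of(const void *object, uint64_t pindex)
{
	pindex &= ~(uint64_t)META_MASK;
	return (unsigned)((pindex ^ (uint64_t)(uintptr_t)object) & HASH_MASK);
}

/*
 * Read-only query: the chain is walked entirely under table_mtx, and
 * only a plain yes/no answer leaves the critical section.
 */
static int
table_contains(const void *object, uint64_t pindex)
{
	struct blk *b;
	int found = 0;

	pindex &= ~(uint64_t)META_MASK;
	pthread_mutex_lock(&table_mtx);
	for (b = table[bucket_of(object, pindex)]; b != NULL; b = b->next) {
		if (b->object == object && b->index == pindex) {
			found = 1;
			break;
		}
	}
	pthread_mutex_unlock(&table_mtx);
	return (found);
}
```

	All pages of one 16-page cluster land in the same bucket, so a single
	record describes the whole cluster.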
>How-To-Repeat:
	
>Fix:

	

--- swhash.col.patch begins here ---
Generated by diffcoll on Sat 19 Jul 2003 20:35:54 BST

diff -uN src/sys/vm/swap_pager.c.orig src/sys/vm/swap_pager.c
--- /usr/src/sys/vm/swap_pager.c.orig	Sat Jul 19 00:56:10 2003
+++ /usr/src/sys/vm/swap_pager.c	Sat Jul 19 20:35:47 2003
@@ -113,8 +113,11 @@
 static int nsw_cluster_max;	/* maximum VOP I/O allowed		*/
 
 struct blist *swapblist;
+
+static struct mtx swhash_mtx;	/* protect hash table */
 static struct swblock **swhash;
 static int swhash_mask;
+
 static int swap_async_max = 4;	/* maximum in-progress async I/O's	*/
 static struct sx sw_alloc_sx;
 
@@ -256,6 +259,8 @@
 	 */
 	dmmax = SWB_NPAGES * 2;
 	dmmax_mask = ~(dmmax - 1);
+
+	mtx_init(&swhash_mtx, "swap_pager swhash mutex", NULL, MTX_DEF);
 }
 
 /*
@@ -1752,6 +1757,7 @@
 
 full_rescan:
 	waitobj = NULL;
+	mtx_lock(&swhash_mtx);
 	for (i = 0; i <= swhash_mask; i++) { /* '<=' is correct here */
 restart:
 		pswap = &swhash[i];
@@ -1763,7 +1769,9 @@
                                         break;
                         }
 			if (j < SWAP_META_PAGES) {
+				mtx_unlock(&swhash_mtx);
 				swp_pager_force_pagein(swap, j);
+				mtx_lock(&swhash_mtx);
 				goto restart;
 			} else if (swap->swb_object->paging_in_progress) {
 				if (!waitobj)
@@ -1772,6 +1780,8 @@
 			pswap = &swap->swb_hnext;
 		}
 	}
+	mtx_unlock(&swhash_mtx);
+
 	if (waitobj && *sw_used) {
 	    /*
 	     * We wait on an arbitrary object to clock our rescans
@@ -1782,6 +1792,7 @@
 	    VM_OBJECT_UNLOCK(waitobj);
 	    goto full_rescan;
 	}
+
 	if (*sw_used)
 	    panic("swapoff: failed to locate %d swap blocks", *sw_used);
 }
@@ -1817,6 +1828,8 @@
 	struct swblock *swap;
 
 	index &= ~(vm_pindex_t)SWAP_META_MASK;
+
+	mtx_lock(&swhash_mtx);
 	pswap = &swhash[(index ^ (int)(intptr_t)object) & swhash_mask];
 	while ((swap = *pswap) != NULL) {
 		if (swap->swb_object == object &&
@@ -1826,6 +1839,8 @@
 		}
 		pswap = &swap->swb_hnext;
 	}
+	mtx_unlock(&swhash_mtx);
+
 	return (pswap);
 }
 
--- swhash.col.patch ends here ---
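The swapoff hunk above drops swhash_mtx around swp_pager_force_pagein(),
which can sleep while paging data back in. It then restarts the bucket scan,
because the chain may have changed while the lock was released. The following
is a hedged userspace sketch of that drop-and-restart pattern, with pthreads
standing in for mtx(9). The names struct item, find_locked(), slow_work() and
scan_all() are hypothetical. Unlike the patch, which passes the swblock pointer
itself, the sketch copies a key before unlocking and looks the entry up again
afterwards. This is one way to avoid touching a record that could be freed
during the unlocked window.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NBUCKETS 4u

struct item {                   /* hypothetical hash-chain entry */
	struct item *next;
	int key;                /* non-negative lookup key */
	int needs_work;         /* e.g. "still holds blocks on the device" */
};

static pthread_mutex_t table_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct item *table[NBUCKETS];

/* Caller must hold table_mtx. */
static struct item *
find_locked(int key)
{
	struct item *it;

	for (it = table[(unsigned)key % NBUCKETS]; it != NULL; it = it->next)
		if (it->key == key)
			break;
	return (it);
}

/*
 * Hypothetical slow step standing in for swp_pager_force_pagein().
 * Entered WITHOUT table_mtx because real work here may sleep; it
 * re-looks the entry up because it may have vanished meanwhile.
 */
static void
slow_work(int key)
{
	struct item *it;

	/* ... blocking I/O would happen here ... */
	pthread_mutex_lock(&table_mtx);
	if ((it = find_locked(key)) != NULL)
		it->needs_work = 0;
	pthread_mutex_unlock(&table_mtx);
}

static void
scan_all(void)
{
	struct item *it;
	unsigned i;
	int key;

	pthread_mutex_lock(&table_mtx);
	for (i = 0; i < NBUCKETS; i++) {
restart:
		for (it = table[i]; it != NULL; it = it->next) {
			if (it->needs_work) {
				key = it->key;  /* copy before unlocking */
				pthread_mutex_unlock(&table_mtx);
				slow_work(key);
				pthread_mutex_lock(&table_mtx);
				goto restart;   /* chain may have changed */
			}
		}
	}
	pthread_mutex_unlock(&table_mtx);
}
```

The restart only terminates because slow_work() clears the flag it was
called for. A real caller needs the same guarantee, or it will spin.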


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->alc 
Responsible-Changed-By: mtm 
Responsible-Changed-When: Mon Jul 21 04:46:48 PDT 2003 
Responsible-Changed-Why:  
This seems to be in your area, Alan. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=54690 

From: Bruce M Simpson <bms@spc.org>
To: "Alan L. Cox" <alc@imimic.com>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/54690: [PATCH] wrap swap_pager's swhash with a mutex
Date: Wed, 23 Jul 2003 05:40:46 +0100

 On Tue, Jul 22, 2003 at 03:21:08PM -0500, Alan L. Cox wrote:
 > Briefly, the pointer returned by swp_pager_hash() is only valid for as
 > long as the hash table mutex is held.  Thus, for insertion and deletion,
 > swp_pager_hash() will need to return with the mutex held.
 > 
 > In other words, any place you see "*pswap" or "->swb_hnext", the mutex
 > must be held.
 
 No problem - to be perfectly honest I wasn't 100% sure about it, but it
 looked OK. I'll walk through the pointer accesses by hand and arrive
 at a correct result.
 
 BMS
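 Alan's rule can be illustrated with a small userspace sketch that
 uses pthreads in place of mtx(9). The hypothetical lookup_link() returns a
 pointer to the link (the bucket head or a next field) where a record
 lives or would be inserted, as swp_pager_hash() does with *pswap. It
 asserts that the caller already holds the lock. The hypothetical
 get_or_insert() keeps the lock held across both the lookup and the store
 through that link. The owner tracking in table_lock() is only a rough
 stand-in for mtx_assert(9).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64u            /* assumed table size */

struct blk {                    /* hypothetical stand-in for struct swblock */
	struct blk *next;
	const void *object;
	uint64_t    index;
};

static struct blk *table[NBUCKETS];
static pthread_mutex_t table_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_t table_owner;   /* debugging aid only */
static int table_owned;

static void
table_lock(void)
{
	pthread_mutex_lock(&table_mtx);
	table_owner = pthread_self();
	table_owned = 1;
}

static void
table_unlock(void)
{
	table_owned = 0;
	pthread_mutex_unlock(&table_mtx);
}

/* Rough userspace analogue of mtx_assert(..., MA_OWNED). */
#define TABLE_ASSERT_OWNED() \
	assert(table_owned && pthread_equal(table_owner, pthread_self()))

/* Returned link is only meaningful while the caller keeps table_mtx. */
static struct blk **
lookup_link(const void *object, uint64_t index)
{
	struct blk **link;

	TABLE_ASSERT_OWNED();
	link = &table[((uintptr_t)object ^ index) % NBUCKETS];
	while (*link != NULL &&
	    ((*link)->object != object || (*link)->index != index))
		link = &(*link)->next;
	return (link);
}

/* Lookup and insertion in ONE critical section, so *link cannot go stale. */
static struct blk *
get_or_insert(const void *object, uint64_t index, struct blk *fresh)
{
	struct blk **link, *b;

	table_lock();
	link = lookup_link(object, index);
	if ((b = *link) == NULL) {
		fresh->next = NULL;
		fresh->object = object;
		fresh->index = index;
		*link = b = fresh;
	}
	table_unlock();
	return (b);
}
```

 If table_unlock() were moved into lookup_link(), as in the first
 version of the patch, another thread could unlink or free *link before
 the caller writes through it.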

From: Bruce M Simpson <bms@spc.org>
To: "Alan L. Cox" <alc@imimic.com>
Cc: freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/54690: [PATCH] wrap swap_pager's swhash with a mutex
Date: Wed, 23 Jul 2003 10:49:08 +0100

 --ZfOjI3PrQbgiZnxM
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 
 On Tue, Jul 22, 2003 at 03:21:08PM -0500, Alan L. Cox wrote:
 > In other words, any place you see "*pswap" or "->swb_hnext", the mutex
 > must be held.
 
 Please see the attached patch.
 
 There is a problem with this patch. When a process exits,
 swap_pager_freespace() is called, and WITNESS reports a
 lock order reversal between the vm_object lock and the swhash
 mutex. I've attached the DDB backtrace.
 
 I don't quite understand what's going on, because I've looked at these
 functions, and they all acquire the vm object lock first:
 
 swap_pager_getpages()  -> does not lock the vm_object, but references it
 swap_pager_dealloc()   -> entered with the vm_object lock held, then takes the swhash lock
 swap_pager_freespace() -> entered with the vm_object lock held
 
 swp_pager_meta_build() -> not called with the vm_object lock held
 
 swp_pager_meta_ctl()   -> not called with the vm_object lock held
    - does not attempt to obtain the vm_object lock
    - GIANT_REQUIRED
 
 swp_pager_meta_free()  -> entered with the vm_object lock held
    - called from swap_pager_freespace()
    - the first call triggers the 'lock order reversal' error in WITNESS
 
 BMS
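 WITNESS can complain even when each function above looks consistent in
 isolation. This is because it remembers the lock orders it has observed
 anywhere in the system. A reversal is therefore reported as soon as one
 path has taken a vm object while holding swhash and another path later
 takes swhash while holding a vm object. The toy checker below is a
 hypothetical, single-threaded model of that bookkeeping, not WITNESS's
 actual implementation. Lock classes are small integers, and
 note_acquire() returns 1 on a reversal.

```c
#include <assert.h>

#define NLOCKS 8        /* hypothetical: small fixed set of lock classes */

/* order[a][b] != 0 means "b was acquired while a was held". */
static unsigned char order[NLOCKS][NLOCKS];
static int held[NLOCKS];        /* locks currently held (one thread) */
static int nheld;

/* Record acquiring class l; return 1 if it reverses a known order. */
static int
note_acquire(int l)
{
	int i, reversal = 0;

	assert(l >= 0 && l < NLOCKS && nheld < NLOCKS);
	for (i = 0; i < nheld; i++) {
		if (order[l][held[i]])
			reversal = 1;   /* l was seen BEFORE held[i] earlier */
		order[held[i]][l] = 1;
	}
	held[nheld++] = l;
	return (reversal);
}

/* Forget that class l is held; the order graph is kept. */
static void
note_release(int l)
{
	int i;

	for (i = 0; i < nheld; i++) {
		if (held[i] == l) {
			held[i] = held[--nheld];
			return;
		}
	}
}
```

 In the sequence used in the test, the first path establishes
 swhash -> vm object. The second path, which mirrors
 swap_pager_freespace() calling swp_pager_meta_free(), is then flagged.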
 
 --ZfOjI3PrQbgiZnxM
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename="swhash.col.patch"
 
 Generated by diffcoll on Wed 23 Jul 2003 07:57:34 BST
 
 diff -uN src/sys/vm/swap_pager.c.orig src/sys/vm/swap_pager.c
 --- /usr/src/sys/vm/swap_pager.c.orig	Sat Jul 19 00:56:10 2003
 +++ /usr/src/sys/vm/swap_pager.c	Wed Jul 23 07:57:13 2003
 @@ -113,8 +113,11 @@
  static int nsw_cluster_max;	/* maximum VOP I/O allowed		*/
  
  struct blist *swapblist;
 +
 +static struct mtx swhash_mtx;	/* protect hash table */
  static struct swblock **swhash;
  static int swhash_mask;
 +
  static int swap_async_max = 4;	/* maximum in-progress async I/O's	*/
  static struct sx sw_alloc_sx;
  
 @@ -256,6 +259,8 @@
  	 */
  	dmmax = SWB_NPAGES * 2;
  	dmmax_mask = ~(dmmax - 1);
 +
 +	mtx_init(&swhash_mtx, "swap_pager swhash mutex", NULL, MTX_DEF);
  }
  
  /*
 @@ -1648,6 +1653,7 @@
  	int i;
  
  	VM_OBJECT_LOCK_ASSERT(object, MA_OWNED);
 +	mtx_lock(&swhash_mtx);
  	for (bcount = 0; bcount < object->un_pager.swp.swp_bcount; bcount++) {
  		struct swblock *swap;
  
 @@ -1655,8 +1661,10 @@
  			for (i = 0; i < SWAP_META_PAGES; ++i) {
  				daddr_t v = swap->swb_pages[i];
  				if (v != SWAPBLK_NONE &&
 -				    BLK2DEVIDX(v) == devidx)
 +				    BLK2DEVIDX(v) == devidx) {
 +					mtx_unlock(&swhash_mtx);
  					return 1;
 +				}
  			}
  		}
  
 @@ -1664,6 +1672,7 @@
  		if (index > 0x20000000)
  			panic("swap_pager_isswapped: failed to locate all swap meta blocks");
  	}
 +	mtx_unlock(&swhash_mtx);
  	return 0;
  }
  
 @@ -1752,6 +1761,7 @@
  
  full_rescan:
  	waitobj = NULL;
 +	mtx_lock(&swhash_mtx);
  	for (i = 0; i <= swhash_mask; i++) { /* '<=' is correct here */
  restart:
  		pswap = &swhash[i];
 @@ -1763,7 +1773,9 @@
                                          break;
                          }
  			if (j < SWAP_META_PAGES) {
 +				mtx_unlock(&swhash_mtx);
  				swp_pager_force_pagein(swap, j);
 +				mtx_lock(&swhash_mtx);
  				goto restart;
  			} else if (swap->swb_object->paging_in_progress) {
  				if (!waitobj)
 @@ -1772,6 +1784,8 @@
  			pswap = &swap->swb_hnext;
  		}
  	}
 +	mtx_unlock(&swhash_mtx);
 +
  	if (waitobj && *sw_used) {
  	    /*
  	     * We wait on an arbitrary object to clock our rescans
 @@ -1782,6 +1796,7 @@
  	    VM_OBJECT_UNLOCK(waitobj);
  	    goto full_rescan;
  	}
 +
  	if (*sw_used)
  	    panic("swapoff: failed to locate %d swap blocks", *sw_used);
  }
 @@ -1809,6 +1824,8 @@
   *	find a swapblk.
   *
   *	This routine must be called at splvm().
 + *	It is the caller's responsibility to obtain the swhash_mtx lock
 + *	before calling.
   */
  static __inline struct swblock **
  swp_pager_hash(vm_object_t object, vm_pindex_t index)
 @@ -1817,6 +1834,7 @@
  	struct swblock *swap;
  
  	index &= ~(vm_pindex_t)SWAP_META_MASK;
 +
  	pswap = &swhash[(index ^ (int)(intptr_t)object) & swhash_mask];
  	while ((swap = *pswap) != NULL) {
  		if (swap->swb_object == object &&
 @@ -1826,6 +1844,7 @@
  		}
  		pswap = &swap->swb_hnext;
  	}
 +
  	return (pswap);
  }
  
 @@ -1883,16 +1902,20 @@
  	 * and, since the hash table may have changed, retry.
  	 */
  retry:
 +	mtx_lock(&swhash_mtx);
  	pswap = swp_pager_hash(object, pindex);
  
  	if ((swap = *pswap) == NULL) {
  		int i;
  
 -		if (swapblk == SWAPBLK_NONE)
 +		if (swapblk == SWAPBLK_NONE) {
 +			mtx_unlock(&swhash_mtx);
  			return;
 +		}
  
  		swap = *pswap = uma_zalloc(swap_zone, M_NOWAIT);
  		if (swap == NULL) {
 +			mtx_unlock(&swhash_mtx);
  			VM_WAIT;
  			goto retry;
  		}
 @@ -1924,6 +1947,8 @@
  	swap->swb_pages[idx] = swapblk;
  	if (swapblk != SWAPBLK_NONE)
  		++swap->swb_count;
 +
 +	mtx_unlock(&swhash_mtx);
  }
  
  /*
 @@ -1946,6 +1971,7 @@
  	if (object->type != OBJT_SWAP)
  		return;
  
 +	mtx_lock(&swhash_mtx);
  	while (count > 0) {
  		struct swblock **pswap;
  		struct swblock *swap;
 @@ -1973,6 +1999,7 @@
  			index += n;
  		}
  	}
 +	mtx_unlock(&swhash_mtx);
  }
  
  /*
 @@ -1993,6 +2020,7 @@
  	if (object->type != OBJT_SWAP)
  		return;
  
 +	mtx_lock(&swhash_mtx);
  	while (object->un_pager.swp.swp_bcount) {
  		struct swblock **pswap;
  		struct swblock *swap;
 @@ -2018,6 +2046,7 @@
  		if (index > 0x20000000)
  			panic("swp_pager_meta_free_all: failed to locate all swap meta blocks");
  	}
 +	mtx_unlock(&swhash_mtx);
  }
  
  /*
 @@ -2062,6 +2091,7 @@
  		return (SWAPBLK_NONE);
  
  	r1 = SWAPBLK_NONE;
 +	mtx_lock(&swhash_mtx);
  	pswap = swp_pager_hash(object, pindex);
  
  	if ((swap = *pswap) != NULL) {
 @@ -2083,6 +2113,7 @@
  			} 
  		}
  	}
 +	mtx_unlock(&swhash_mtx);
  	return (r1);
  }
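 The retry path added to swp_pager_meta_build() above follows a common
 pattern. It first tries a non-blocking allocation while holding the table
 lock. If that fails, it releases the lock before sleeping (VM_WAIT) and
 redoes the lookup, because the link found earlier is stale once the lock is
 dropped. A hedged userspace sketch follows. try_alloc() and wait_for_memory()
 are hypothetical stand-ins for uma_zalloc(..., M_NOWAIT) and VM_WAIT.
 alloc_failures and waits exist only to make the behavior observable.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define NBUCKETS 16u

struct rec { struct rec *next; int key; int value; };

static pthread_mutex_t table_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct rec *table[NBUCKETS];
static int alloc_failures;      /* hypothetical: fail the next N allocations */
static int waits;               /* test instrumentation: times we backed off */

/* Non-blocking allocation; safe to call with table_mtx held. */
static struct rec *
try_alloc(void)
{
	if (alloc_failures > 0) {
		alloc_failures--;
		return (NULL);
	}
	return (malloc(sizeof(struct rec)));
}

/* Hypothetical stand-in for VM_WAIT: may sleep, never call it locked. */
static void
wait_for_memory(void)
{
	waits++;
}

static void
set_value(int key, int value)
{
	struct rec **link, *r;

retry:
	pthread_mutex_lock(&table_mtx);
	link = &table[(unsigned)key % NBUCKETS];
	while ((r = *link) != NULL && r->key != key)
		link = &r->next;
	if (r == NULL) {
		if ((r = try_alloc()) == NULL) {
			/* Drop the lock before sleeping; *link is stale afterwards. */
			pthread_mutex_unlock(&table_mtx);
			wait_for_memory();
			goto retry;
		}
		r->next = NULL;
		r->key = key;
		*link = r;
	}
	r->value = value;
	pthread_mutex_unlock(&table_mtx);
}
```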
  
 
 
 --ZfOjI3PrQbgiZnxM
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename="foo.txt"
 
 process exception: page fault, fault VA = 0x17f4b000
 lock order reversal
  1st 0xc27abb88 vm object (vm object) @ vm/vm_map.c:2224
  2nd 0xc03e2ae0 swap_pager swhash mutex (swap_pager swhash mutex) @ vm/swap_pager.c:1974
 Stack backtrace:
 backtrace(c0341300,c03e2ae0,c035555a,c035555a,c035557b) at backtrace+0x17
 witness_lock(c03e2ae0,8,c035557b,7b6,d5d1eba0) at witness_lock+0x697
 _mtx_lock_flags(c03e2ae0,0,c0355572,7b6,d5d1eba0) at _mtx_lock_flags+0xb1
 swp_pager_meta_free(c27abb88,0,0,1,0) at swp_pager_meta_free+0x70
 swap_pager_freespace(c27abb88,0,0,1,0) at swap_pager_freespace+0x58
 vm_map_delete(c0eb9bdc,0,bfc00000,c0eb9bdc,c2677200) at vm_map_delete+0x3c6
 vm_map_remove(c0eb9bdc,0,bfc00000,111,d5d1ec88) at vm_map_remove+0x58
 exit1(c25dd390,8b,1d1,c25dcf00,0) at exit1+0x626
 sigexit(c25dd390,b,c033e8c8,865,0) at sigexit+0x1a7
 postsig(b,0,c0340dbd,f8,30800) at postsig+0x164
 ast(d5d1ed48) at ast+0x46f
 doreti_ast() at doreti_ast+0x17
 Debugger("witness_lock")
 Stopped at      Debugger+0x54:  xchgl   %ebx,in_Debugger.0
 db> show witness
 ...
 3   swap_pager swhash mutex -- last acquired @ vm/swap_pager.c:2094
 7    vm object -- (already displayed)
 ...
 
 --ZfOjI3PrQbgiZnxM--
State-Changed-From-To: open->closed 
State-Changed-By: alc 
State-Changed-When: Mon Nov 10 22:52:53 PST 2003 
State-Changed-Why:  
A variation of this patch has been applied to -CURRENT.  Thanks, Bruce. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=54690 
>Unformatted:
