From dillon@flea.best.net  Thu Apr  9 17:41:30 1998
Received: from flea.best.net (root@flea.best.net [206.184.139.131])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id RAA10544
          for <FreeBSD-gnats-submit@freebsd.org>; Thu, 9 Apr 1998 17:41:30 -0700 (PDT)
          (envelope-from dillon@flea.best.net)
Received: (from dillon@localhost) by flea.best.net (8.8.8/8.7.3) id RAA01574; Thu, 9 Apr 1998 17:40:57 -0700 (PDT)
Message-Id: <199804100040.RAA01574@flea.best.net>
Date: Thu, 9 Apr 1998 17:40:57 -0700 (PDT)
From: Matt Dillon <dillon@best.net>
Reply-To: dillon@best.net
To: FreeBSD-gnats-submit@freebsd.org
Subject: FreeBSD-2.2.6 VM lockup on kernel map due to brelse calling bfreekva(), dump lockup in getnewbuf() due to fragmented buffer_map
X-Send-Pr-Version: 3.2

>Number:         6258
>Category:       kern
>Synopsis:       A fix required to prevent kernel lockups in brelse causes the dump program to lock up in 'newbuf' [2.2 ISSUE]
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Apr  9 17:50:00 PDT 1998
>Closed-Date:    Sun Apr 26 14:01:32 PDT 1998
>Last-Modified:  Sun Apr 26 14:01:52 PDT 1998
>Originator:     Matt Dillon
>Release:        FreeBSD 2.2.6-STABLE i386
>Organization:
Best Internet Communications, Inc.
>Environment:

	Heavily loaded shell machine, PPro 200, 128MB of ram.

>Description:

	Problem #1:  Kernel locks up in kernel map due to brelse() calling
	bfreekva() from a SCSI interrupt while kernel map is already locked
	(there was another bugtrack on this problem).  The fix for this was
	to defer calling bfreekva().

	Problem #2:  Unfortunately, that fix appears to create a new problem.
	Because bfreekva() is no longer called when a buffer is released,
	the buffer_map can become fragmented, which prevents large
	getnewbuf() allocations from succeeding.

	The dump program attempts to allocate a 64K buffer to load the
	disklabel.  About once a week the program locks up in a 'newbuf'
	wait state but still eats CPU: it is constantly woken up and then
	(as far as I can tell) fails to allocate a bp of sufficient size.
	This becomes a permanent condition.
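
	To illustrate how a map can have plenty of free space in total and
	still fail a large request, here is a minimal user-space sketch.  It
	is not kernel code: toy_map, toy_findspace() and the 16-page size are
	hypothetical stand-ins for buffer_map and vm_map_findspace(), and the
	alternating allocation pattern is a deliberately worst-case
	assumption.

```c
#include <assert.h>
#include <stddef.h>

#define MAP_PAGES 16	/* hypothetical map size, in pages */

/* Toy stand-in for buffer_map: used[i] != 0 means page i is allocated. */
struct toy_map {
	unsigned char used[MAP_PAGES];
};

/* Count all free pages, contiguous or not. */
size_t
toy_free_pages(const struct toy_map *m)
{
	size_t i, n = 0;

	for (i = 0; i < MAP_PAGES; i++)
		if (!m->used[i])
			n++;
	return n;
}

/*
 * First-fit search for npages contiguous free pages.  Returns 0 and
 * stores the first page index in *start on success, nonzero on failure.
 */
int
toy_findspace(const struct toy_map *m, size_t npages, size_t *start)
{
	size_t i, run = 0;

	if (npages == 0 || npages > MAP_PAGES)
		return 1;
	for (i = 0; i < MAP_PAGES; i++) {
		run = m->used[i] ? 0 : run + 1;
		if (run == npages) {
			*start = i + 1 - npages;
			return 0;
		}
	}
	return 1;
}

/* Allocate every other page: half the map is free, in 1-page holes. */
void
toy_fragment(struct toy_map *m)
{
	size_t i;

	for (i = 0; i < MAP_PAGES; i++)
		m->used[i] = (i % 2 == 0);
}
```

	After toy_fragment(), toy_free_pages() reports half of the map free,
	yet toy_findspace() fails for any request larger than one page.  A
	waiter retrying such a request would be woken repeatedly without ever
	making progress, which matches the symptom described above.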

	Since we can't put the bfreekva() back into brelse(), my solution
	is to put code in getnewbuf().  If the vm_map_findspace() call
	fails, my proposed code (the second set of changes below) wipes
	the kvm mappings for all EMPTY bp's in an attempt to defragment it
	then retries the vm_map_findspace() call.
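
	The reclaim-and-retry idea can be sketched in user space as follows.
	This is a simplified model under stated assumptions, not the kernel
	implementation: toy_buf, kva_used, bufs and every toy_*() function
	are hypothetical, the empty queue is modelled as a flag on a fixed
	array, and the map is a simple page bitmap.

```c
#include <assert.h>
#include <stddef.h>

#define KVA_PAGES 8	/* hypothetical size of the address range, in pages */
#define NBUF      5	/* hypothetical number of buffer headers */

struct toy_buf {
	int	on_empty_queue;	/* stands in for b_qindex == QUEUE_EMPTY */
	size_t	kva_start;	/* first page held by this buffer */
	size_t	kva_pages;	/* stands in for b_kvasize; 0 = none held */
};

unsigned char kva_used[KVA_PAGES];	/* nonzero = page allocated */
struct toy_buf bufs[NBUF];

/* First-fit search; returns 0 on success, nonzero on failure. */
int
toy_findspace(size_t npages, size_t *start)
{
	size_t i, run = 0;

	for (i = 0; i < KVA_PAGES; i++) {
		run = kva_used[i] ? 0 : run + 1;
		if (npages > 0 && run == npages) {
			*start = i + 1 - npages;
			return 0;
		}
	}
	return 1;
}

/* Give back the pages held by bp. */
void
toy_freekva(struct toy_buf *bp)
{
	size_t i;

	for (i = 0; i < bp->kva_pages; i++)
		kva_used[bp->kva_start + i] = 0;
	bp->kva_pages = 0;
}

/* Deferred release: cheap, and never touches the page map. */
void
toy_release(struct toy_buf *bp)
{
	bp->on_empty_queue = 1;
}

/*
 * Allocate npages for bp.  If the first search fails, free the pages
 * still held by released buffers and search once more.  The scan uses
 * its own index so that bp keeps pointing at the buffer being set up.
 */
int
toy_allockva(struct toy_buf *bp, size_t npages)
{
	size_t start, i;

	if (toy_findspace(npages, &start) != 0) {
		for (i = 0; i < NBUF; i++)
			if (bufs[i].on_empty_queue && bufs[i].kva_pages)
				toy_freekva(&bufs[i]);
		if (toy_findspace(npages, &start) != 0)
			return 1;	/* reclaiming did not help */
	}
	for (i = start; i < start + npages; i++)
		kva_used[i] = 1;
	bp->kva_start = start;
	bp->kva_pages = npages;
	bp->on_empty_queue = 0;
	return 0;
}
```

	In this model the reclaim pass costs nothing until an allocation
	actually fails.  It can still fail afterwards if the buffers that
	remain in use are themselves scattered across the range.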

	I'm running this code now, but the new code path has not been
	triggered yet.

>How-To-Repeat:

	I can't reproduce it reliably.  The problem happens once a week
	or so on our admin machine.  However, I believe the general problem
	is important enough to be flagged critical, since the original
	#ifdef notdef patch to remove bfreekva() apparently did not make it
	into 2.2.6.  I don't know why.  brelse() is a critical kernel call
	that can occur in an interrupt and should not do anything complex...
	certainly not call bfreekva().  Manipulating the kernel map and
	the sundry associated activity is much safer to do in getnewbuf()
	than in brelse().
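
	The general pattern of keeping the release path trivial and doing
	the map work later can be sketched like this.  It is a user-space
	illustration only: map_locked, pending_head and the toy_*()
	functions are hypothetical, and a real kernel would also have to
	block interrupts around the list manipulation.

```c
#include <assert.h>
#include <stddef.h>

struct toy_buf {
	struct toy_buf *next_pending;	/* link on the deferred-free list */
	int has_kva;			/* nonzero while address space is held */
};

int map_locked;			/* nonzero while "the map" is being modified */
struct toy_buf *pending_head;	/* buffers whose free was deferred */

/* Modifies the map, so it must never be entered while the map is locked. */
void
toy_freekva(struct toy_buf *bp)
{
	assert(!map_locked);
	map_locked = 1;
	bp->has_kva = 0;
	map_locked = 0;
}

/* Release path usable at interrupt time: O(1), never touches the map. */
void
toy_release_deferred(struct toy_buf *bp)
{
	bp->next_pending = pending_head;
	pending_head = bp;
}

/*
 * Run later, from a context that is allowed to take the map lock.
 * Returns the number of buffers whose address space was freed.
 */
int
toy_drain_pending(void)
{
	int n = 0;

	while (pending_head != NULL) {
		struct toy_buf *bp = pending_head;

		pending_head = bp->next_pending;
		bp->next_pending = NULL;
		if (bp->has_kva) {
			toy_freekva(bp);
			n++;
		}
	}
	return n;
}
```

	Calling toy_release_deferred() while map_locked is set is harmless
	in this model, whereas calling toy_freekva() at that point would
	trip the assertion, which is the analogue of the lockup in
	Problem #1.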

>Fix:


--- LINK/vfs_bio.c	Fri Mar 13 13:13:57 1998
+++ vfs_bio.c	Thu Apr  9 17:38:01 1998
@@ -597,10 +597,12 @@
 		LIST_REMOVE(bp, b_hash);
 		LIST_INSERT_HEAD(&invalhash, bp, b_hash);
 		bp->b_dev = NODEV;
+#ifdef notdef
 		/*
 		 * Get rid of the kva allocation *now*
 		 */
 		bfreekva(bp);
+#endif
 		if (needsbuffer) {
 			wakeup(&needsbuffer);
 			needsbuffer=0;
@@ -986,9 +988,34 @@
 		 */
 		if (vm_map_findspace(buffer_map,
 			vm_map_min(buffer_map), maxsize, &addr)) {
-			bp->b_flags |= B_INVAL;
-			brelse(bp);
-			goto trytofreespace;
+			struct buf *ebp;	/* scan cursor; bp must stay intact */
+
+			/*
+			 * Matt hack.  Since we can't call bfreekva() in
+			 * brelse(), the bp's on the EMPTY list may all
+			 * still have allocated KVM.  If we can't find
+			 * unused space in the buffer_map, we should try
+			 * to defragment the map by freeing as much from
+			 * the empty list as possible.
+			 */
+			printf("vm_map_findspace() failed, defragmenting freelist\n");
+			for (ebp = TAILQ_FIRST(&bufqueues[QUEUE_EMPTY]);
+				ebp;
+				ebp = TAILQ_NEXT(ebp, b_freelist)
+			) {
+			    if (ebp->b_kvasize)
+				bfreekva(ebp);
+			    if (ebp->b_qindex != QUEUE_EMPTY)
+				break;
+			}
+			addr = 0;
+			if (vm_map_findspace(buffer_map,
+				vm_map_min(buffer_map), maxsize, &addr)) {
+
+				bp->b_flags |= B_INVAL;
+				brelse(bp);
+				goto trytofreespace;
+			}
 		}
 	}
 
>Release-Note:
>Audit-Trail:

From: "Justin T. Gibbs" <gibbs@plutotech.com>
To: dillon@best.net
Cc: FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/6258: FreeBSD-2.2.6 VM lockup on kernel map due to brelse calling bfreekva(), dump lockup in getnewbuf() due to fragmented buffer_map 
Date: Thu, 09 Apr 1998 22:26:18 -0600

 >	Problem #1:  Kernel locks up in kernel map due to brelse() calling
 >	bfreekva() from a SCSI interrupt while kernel map is already locked
 >	(there was another bugtrack on this problem).  The fix for this was
 >	to defer calling bfreekva().
 
 Perhaps this can be addressed by simply delaying the call only until the vm
 SWI can run.  I added a vm SWI for the bus dma stuff, but it could easily
 be expanded to perform this task as well.  I would expect us to hit the 
 SWI rapidly in most cases, thereby preventing fragmentation.
 
 --
 Justin
 
 

From: David Greenman <dg@root.com>
To: dillon@best.net
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/6258: FreeBSD-2.2.6 VM lockup on kernel map due to brelse calling bfreekva(), dump lockup in getnewbuf() due to fragmented buffer_map 
Date: Thu, 09 Apr 1998 22:04:25 -0700

 >	Since we can't put the bfreekva() back into brelse(), my solution
 >	is to put code in getnewbuf().  If the vm_map_findspace() call
 >	fails, my proposed code (the second set of changes below) wipes
 >	the kvm mappings for all EMPTY bp's in an attempt to defragment it
 >	then retries the vm_map_findspace() call.
 
    This is similar to, but different from, the way it was fixed in -current.
 The fix from -current should be adopted for -stable.
 
 -DG
 
 David Greenman
 Core-team/Principal Architect, The FreeBSD Project
State-Changed-From-To: open->analyzed 
State-Changed-By: phk 
State-Changed-When: Wed Apr 15 10:08:26 PDT 1998 
State-Changed-Why:  
merge to 2.2 issue 
State-Changed-From-To: analyzed->closed 
State-Changed-By: phk 
State-Changed-When: Sun Apr 26 14:01:32 PDT 1998 
State-Changed-Why:  
fixed in current 
>Unformatted:
