From chuck@gs1.research.att.com  Wed Jul 30 12:58:51 2003
Return-Path: <chuck@gs1.research.att.com>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6988E37B401
	for <FreeBSD-gnats-submit@freebsd.org>; Wed, 30 Jul 2003 12:58:51 -0700 (PDT)
Received: from gs1.research.att.com (H-135-207-14-45.research.att.com [135.207.14.45])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 4BA4143FB1
	for <FreeBSD-gnats-submit@freebsd.org>; Wed, 30 Jul 2003 12:58:50 -0700 (PDT)
	(envelope-from chuck@gs1.research.att.com)
Received: from gs1.research.att.com (localhost.research.att.com [127.0.0.1])
	by gs1.research.att.com (8.12.6/8.12.6) with ESMTP id h6UJwjdF055735;
	Wed, 30 Jul 2003 19:58:49 GMT
	(envelope-from chuck@gs1.research.att.com)
Received: (from chuck@localhost)
	by gs1.research.att.com (8.12.6/8.12.6/Submit) id h6UJt1aN055545;
	Wed, 30 Jul 2003 19:55:01 GMT
Message-Id: <200307301955.h6UJt1aN055545@gs1.research.att.com>
Date: Wed, 30 Jul 2003 19:55:01 GMT
From: Chuck Cranor <chuck@research.att.com>
Reply-To: Chuck Cranor <chuck@research.att.com>
To: FreeBSD-gnats-submit@freebsd.org
Cc: chuck@research.att.com
Subject: contigmalloc API semantics inadequate --- forces KVM mapping
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         55081
>Category:       kern
>Synopsis:       contigmalloc API semantics inadequate --- forces KVM mapping
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    green
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Jul 30 13:00:28 PDT 2003
>Closed-Date:    Mon Aug 16 11:16:11 GMT 2004
>Last-Modified:  Mon Aug 16 11:16:11 GMT 2004
>Originator:     Chuck Cranor
>Release:        FreeBSD 4.7-RELEASE i386
>Organization:
AT&T Labs--Research
>Environment:

System: FreeBSD gs1.research.att.com 4.7-RELEASE FreeBSD 4.7-RELEASE #2: Mon Mar 3 15:40:37 GMT 2003 chuck@gs1.research.att.com:/usr/home/chuck/src/sys/compile/RESEARCH47MP i386

>Description:

    We have a PCI card that communicates with its user-level
applications through a physically contiguous, memory-mapped buffer
[mapped via mmap(2)].  Currently the card's driver allocates the
buffer using contigmalloc() during autoconfig.

    We would like to make this buffer quite large (all our systems
have 4GB RAM), but unfortunately the semantics of contigmalloc()
cannot support this!

    The problem with the contigmalloc() API is that in addition to
allocating physically contiguous memory, it also insists on mapping
the allocated memory into the kernel virtual address space!  Thus,
when we try to allocate large buffers for our PCI device, the
allocation fails because we run out of kernel virtual memory.  This is
bad, since the PCI card we are using needs large blocks of contiguous
memory, but it does not need _any_ kernel mappings of those blocks.


    To solve the problem, I have broken contigmalloc() up into two
functions:
	   vm_contig_pg_alloc() - allocates physically contig memory;
				  returns the index of the first page
				  in vm_page_array, or -1 on error
	   vm_contig_pg_kmapin() - maps a set of physically contig
	   			  pages into KVM

and now contigmalloc() is just a wrapper around calls to these two
functions.  I also have a vm_contig_pg_free() function that undoes a
vm_contig_pg_alloc() operation.
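
    (As an illustrative aside, not kernel code: the return convention
of vm_contig_pg_alloc() -- an index into the page array on success,
-1 on failure -- can be modeled in user-space C with a simple
free-page bitmap standing in for vm_page_array.  contig_find() and
free_map are hypothetical names used only for this sketch.)

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of vm_contig_pg_alloc()'s contract: scan a page-state
 * array for a run of 'npages' consecutive free pages and return the
 * index of the first page in the run, or -1 on failure (mirroring
 * how the real function indexes vm_page_array[]).
 */
static int
contig_find(const char *free_map, size_t total, size_t npages)
{
	size_t i, run = 0;

	if (npages == 0 || npages > total)
		return (-1);
	for (i = 0; i < total; i++) {
		/* extend the current run of free pages, or restart it */
		run = free_map[i] ? run + 1 : 0;
		if (run == npages)
			return ((int)(i - npages + 1));
	}
	return (-1);
}
```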

    For our device, we only need to call vm_contig_pg_alloc()...
that will save us from running out of KVM.
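
    (Also illustrative only: the placement tests the allocator applies
to each candidate physical address -- range, power-of-2 alignment, and
the boundary-crossing check that appears in the diff below -- can be
modeled in plain user-space C.  range_ok() is a hypothetical name.)

```c
#include <assert.h>

/*
 * Model of the candidate test in vm_contig_pg_alloc(): phys must lie
 * in [low, high - size], be aligned, and the run [phys, phys + size)
 * must not cross a 'boundary' line.  alignment and boundary must be
 * powers of 2; 0 means "no constraint".
 */
static int
range_ok(unsigned long phys, unsigned long size, unsigned long low,
    unsigned long high, unsigned long alignment, unsigned long boundary)
{
	assert((alignment & (alignment - 1)) == 0);
	assert((boundary & (boundary - 1)) == 0);
	if (phys < low || phys + size > high)
		return 0;
	if (alignment != 0 && (phys & (alignment - 1)) != 0)
		return 0;
	/*
	 * XOR of the first and last byte addresses shows which address
	 * bits differ across the run; if any differing bit is at or
	 * above the boundary bit, the run crosses a boundary line.
	 */
	if (boundary != 0 &&
	    ((phys ^ (phys + size - 1)) & ~(boundary - 1)) != 0)
		return 0;
	return 1;
}
```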


    I would like to get some sort of change along these lines
committed to the FreeBSD 4 and 5 branches.  I'm willing to rework the
patch a bit if needed.

>How-To-Repeat:

	Attempt to allocate a large (e.g. 1GB) physically contiguous
	buffer in the FreeBSD kernel.

>Fix:

Here is a context diff that makes my proposed changes.
Also, after the diff is a simple KLD module that I used
for basic testing of the APIs (it seems to work).


*** vm_page.h_47	Thu Jul 24 13:43:16 2003
--- vm_page.h	Thu Jul 24 17:03:33 2003
***************
*** 393,398 ****
--- 393,402 ----
  #define	VM_ALLOC_ZERO		3
  #define	VM_ALLOC_RETRY		0x80
  
+ int vm_contig_pg_alloc(u_long, u_long, u_long, u_long, u_long);
+ vm_offset_t vm_contig_pg_kmapin(int, u_long, vm_map_t);
+ void vm_contig_pg_free(int, u_long);
+ 
  void vm_page_unhold(vm_page_t mem);
  
  void vm_page_activate (vm_page_t);
*** vm_page.c_47	Thu Jul 24 13:33:20 2003
--- vm_page.c	Thu Jul 24 18:26:06 2003
***************
*** 1774,1815 ****
  }
  
  /*
!  * This interface is for merging with malloc() someday.
!  * Even if we never implement compaction so that contiguous allocation
!  * works after initialization time, malloc()'s data structures are good
!  * for statistics and for allocations of less than a page.
   */
! void *
! contigmalloc1(
! 	unsigned long size,	/* should be size_t here and for malloc() */
! 	struct malloc_type *type,
! 	int flags,
! 	unsigned long low,
! 	unsigned long high,
! 	unsigned long alignment,
! 	unsigned long boundary,
! 	vm_map_t map)
! {
! 	int i, s, start;
! 	vm_offset_t addr, phys, tmp_addr;
! 	int pass;
  	vm_page_t pga = vm_page_array;
  
  	size = round_page(size);
  	if (size == 0)
! 		panic("contigmalloc1: size must not be 0");
  	if ((alignment & (alignment - 1)) != 0)
! 		panic("contigmalloc1: alignment must be a power of 2");
  	if ((boundary & (boundary - 1)) != 0)
! 		panic("contigmalloc1: boundary must be a power of 2");
  
  	start = 0;
  	for (pass = 0; pass <= 1; pass++) {
  		s = splvm();
  again:
  		/*
! 		 * Find first page in array that is free, within range, aligned, and
! 		 * such that the boundary won't be crossed.
  		 */
  		for (i = start; i < cnt.v_page_count; i++) {
  			int pqtype;
--- 1774,1806 ----
  }
  
  /*
!  * vm_contig_pg_alloc: allocate a set of physically contig pages in a
!  * given range.  we return the index (in the vm_page_array[]) of the
!  * first page allocated, or return -1 on error.
   */
! int
! vm_contig_pg_alloc(u_long size, u_long low, u_long high, u_long alignment,
! 		   u_long boundary) {
! 
! 	int i, s, start, pass;
! 	vm_offset_t phys;
  	vm_page_t pga = vm_page_array;
  
  	size = round_page(size);
  	if (size == 0)
! 		panic("vm_contig_pg_alloc: size must not be 0");
  	if ((alignment & (alignment - 1)) != 0)
! 		panic("vm_contig_pg_alloc: alignment must be a power of 2");
  	if ((boundary & (boundary - 1)) != 0)
! 		panic("vm_contig_pg_alloc: boundary must be a power of 2");
  
  	start = 0;
  	for (pass = 0; pass <= 1; pass++) {
  		s = splvm();
  again:
  		/*
! 		 * Find first page in array that is free, within range, 
! 		 * aligned, and such that the boundary won't be crossed.
  		 */
  		for (i = start; i < cnt.v_page_count; i++) {
  			int pqtype;
***************
*** 1835,1841 ****
  				m = next) {
  
  				KASSERT(m->queue == PQ_INACTIVE,
! 					("contigmalloc1: page %p is not PQ_INACTIVE", m));
  
  				next = TAILQ_NEXT(m, pageq);
  				if (vm_page_sleep_busy(m, TRUE, "vpctw0"))
--- 1826,1832 ----
  				m = next) {
  
  				KASSERT(m->queue == PQ_INACTIVE,
! 					("vm_contig_pg_alloc: page %p is not PQ_INACTIVE", m));
  
  				next = TAILQ_NEXT(m, pageq);
  				if (vm_page_sleep_busy(m, TRUE, "vpctw0"))
***************
*** 1862,1868 ****
  				m = next) {
  
  				KASSERT(m->queue == PQ_ACTIVE,
! 					("contigmalloc1: page %p is not PQ_ACTIVE", m));
  
  				next = TAILQ_NEXT(m, pageq);
  				if (vm_page_sleep_busy(m, TRUE, "vpctw1"))
--- 1853,1859 ----
  				m = next) {
  
  				KASSERT(m->queue == PQ_ACTIVE,
! 					("vm_contig_pg_alloc: page %p is not PQ_ACTIVE", m));
  
  				next = TAILQ_NEXT(m, pageq);
  				if (vm_page_sleep_busy(m, TRUE, "vpctw1"))
***************
*** 1885,1891 ****
  			}
  
  			splx(s);
! 			continue;
  		}
  		start = i;
  
--- 1876,1882 ----
  			}
  
  			splx(s);
! 			continue;		/* next pass */
  		}
  		start = i;
  
***************
*** 1917,1963 ****
  			if (m->flags & PG_ZERO)
  				vm_page_zero_count--;
  			m->flags = 0;
! 			KASSERT(m->dirty == 0, ("contigmalloc1: page %p was dirty", m));
  			m->wire_count = 0;
  			m->busy = 0;
  			m->object = NULL;
  		}
  
  		/*
! 		 * We've found a contiguous chunk that meets are requirements.
! 		 * Allocate kernel VM, unfree and assign the physical pages to it and
! 		 * return kernel VM pointer.
  		 */
! 		vm_map_lock(map);
! 		if (vm_map_findspace(map, vm_map_min(map), size, &addr) !=
! 		    KERN_SUCCESS) {
! 			/*
! 			 * XXX We almost never run out of kernel virtual
! 			 * space, so we don't make the allocated memory
! 			 * above available.
! 			 */
! 			vm_map_unlock(map);
! 			splx(s);
! 			return (NULL);
! 		}
! 		vm_object_reference(kernel_object);
! 		vm_map_insert(map, kernel_object, addr - VM_MIN_KERNEL_ADDRESS,
! 		    addr, addr + size, VM_PROT_ALL, VM_PROT_ALL, 0);
! 		vm_map_unlock(map);
  
! 		tmp_addr = addr;
! 		for (i = start; i < (start + size / PAGE_SIZE); i++) {
! 			vm_page_t m = &pga[i];
! 			vm_page_insert(m, kernel_object,
! 				OFF_TO_IDX(tmp_addr - VM_MIN_KERNEL_ADDRESS));
! 			tmp_addr += PAGE_SIZE;
! 		}
! 		vm_map_pageable(map, addr, addr + size, FALSE);
  
  		splx(s);
! 		return ((void *)addr);
  	}
! 	return NULL;
  }
  
  void *
--- 1908,2041 ----
  			if (m->flags & PG_ZERO)
  				vm_page_zero_count--;
  			m->flags = 0;
! 			KASSERT(m->dirty == 0, ("vm_contig_pg_alloc: page %p was dirty", m));
  			m->wire_count = 0;
  			m->busy = 0;
  			m->object = NULL;
  		}
  
  		/*
! 		 * success!
  		 */
! 		splx(s);
! 		return(start);
  
! 	}	/* end of pass loop */
! 
! 	/*
! 	 * failed...
! 	 */
! 	splx(s);
! 	return(-1);
! }
! 
! /*
!  * vm_contig_pg_free: undo a vm_contig_pg_alloc.  we assume that all
!  * references to the pages have been removed and that it is OK to add
!  * them back to the free list.
!  */
! void
! vm_contig_pg_free(int start, u_long size) {
! 
! 	vm_page_t pga = vm_page_array;
! 	int i;
! 
! 	size = round_page(size);
! 	if (size == 0)
! 		panic("vm_contig_pg_free: size must not be 0");
! 
! 	for (i = start; i < (start + size / PAGE_SIZE); i++) {
! 		vm_page_free(&pga[i]);
! 	}
! }
! 
! 
! /*
!  * vm_contig_pg_kmapin: map a previously allocated set of contig pages
!  * from the vm_page_array[] into the kernel address space.   once mapped,
!  * the pages become part of the kernel object and should be freed with
!  * kmem_free(kernel_map, address, size).
!  */
! vm_offset_t
! vm_contig_pg_kmapin(int start, u_long size, vm_map_t map) {
! 
! 	int i, s;
! 	vm_offset_t addr, tmp_addr;
! 	vm_page_t pga = vm_page_array;
! 
! 	size = round_page(size);
! 	if (size == 0)
! 		panic("vm_contig_pg_kmapin: size must not be 0");
  
+ 	s = splvm();	/* XXX: is this really needed? */
+ 	/*
+ 	 * We've found a contiguous chunk that meets our requirements.
+ 	 * Allocate kernel VM, unfree and assign the physical pages to it and
+ 	 * return kernel VM pointer.
+ 	 */
+ 	vm_map_lock(map);
+ 	if (vm_map_findspace(map, vm_map_min(map), size, &addr) !=
+ 	    KERN_SUCCESS) {
+ 		vm_map_unlock(map);
  		splx(s);
! 		return (0);
! 	}
! 	vm_object_reference(kernel_object);
! 	vm_map_insert(map, kernel_object, addr - VM_MIN_KERNEL_ADDRESS,
! 	    addr, addr + size, VM_PROT_ALL, VM_PROT_ALL, 0);
! 	vm_map_unlock(map);
! 
! 	tmp_addr = addr;
! 	for (i = start; i < (start + size / PAGE_SIZE); i++) {
! 		vm_page_t m = &pga[i];
! 		vm_page_insert(m, kernel_object,
! 			OFF_TO_IDX(tmp_addr - VM_MIN_KERNEL_ADDRESS));
! 		tmp_addr += PAGE_SIZE;
  	}
! 	vm_map_pageable(map, addr, addr + size, FALSE);
! 
! 	splx(s);
! 	return (addr);
! }
! 
! /*
!  * This interface is for merging with malloc() someday.
!  * Even if we never implement compaction so that contiguous allocation
!  * works after initialization time, malloc()'s data structures are good
!  * for statistics and for allocations of less than a page.
!  */
! void *
! contigmalloc1(
! 	unsigned long size,	/* should be size_t here and for malloc() */
! 	struct malloc_type *type,
! 	int flags,
! 	unsigned long low,
! 	unsigned long high,
! 	unsigned long alignment,
! 	unsigned long boundary,
! 	vm_map_t map)
! {
! 	int index;
! 	void *rv;
! 
! 	size = round_page(size);
! 	if (size == 0)
! 		panic("contigmalloc1: size must not be 0");
! 	if ((alignment & (alignment - 1)) != 0)
! 		panic("contigmalloc1: alignment must be a power of 2");
! 	if ((boundary & (boundary - 1)) != 0)
! 		panic("contigmalloc1: boundary must be a power of 2");
! 
! 	index = vm_contig_pg_alloc(size, low, high, alignment, boundary);
! 	if (index < 0)
! 		return(NULL);
! 
! 	rv = (void *) vm_contig_pg_kmapin(index, size, map);
! 	if (!rv) {
! 		vm_contig_pg_free(index, size);
! 	}
! 
! 	return(rv);
  }
  
  void *



# This is a shell archive.  Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file".  Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
#	Makefile
#	vm_contig_test.c
#
echo x - Makefile
sed 's/^X//' >Makefile << 'END-of-Makefile'
X# $FreeBSD$
X
XKMOD	= vm_contig_test
XSRCS	= vm_contig_test.c
X
X.include <bsd.kmod.mk>
END-of-Makefile
echo x - vm_contig_test.c
sed 's/^X//' >vm_contig_test.c << 'END-of-vm_contig_test.c'
X/* $FreeBSD$ */
X
X/*
X * vm_contig_test.c  test vm_contig page allocation functions
X * 30-Jul-2003  chuck@research.att.com
X */
X
X#include <sys/param.h>
X#include <sys/systm.h>
X#include <sys/kernel.h>
X#include <sys/module.h>
X#include <sys/malloc.h>
X
X#include <vm/vm.h>
X#include <vm/pmap.h>
X#include <vm/vm_page.h>
X
X/*
X * module glue, as per module(9)
X */
Xstatic int vmct_handler(module_t mod, int what, void *arg);
Xstatic void doit(void);
X
Xstatic moduledata_t mod_data = {
X	"vm_contig_test",	/* name */
X	vmct_handler,		/* handler */
X	0,			/* private */
X};
X
XMODULE_VERSION(vm_contig_test, 1);
X
XDECLARE_MODULE(vm_contig_test, mod_data, SI_SUB_EXEC, SI_ORDER_ANY);
X
X/*
X * vmct_handler: module function to perform some testing of the
X * vm_contig page allocation API.
X */
X
Xstatic int vmct_handler(module_t mod, int what, void *arg) {
X
X	int err = 0;
X
X	switch (what) {
X	case MOD_LOAD:
X		uprintf("vmct_handler: loading contig test module\n");
X		doit();
X		uprintf("vmct_handler: test done!  now kldunload me.\n");
X		break;
X	case MOD_UNLOAD:
X		uprintf("vmct_handler: unloading contig test module\n");
X		break;
X	default:
X		uprintf("unknown contig test module command (%d)\n", what);
X		break;
X
X	}
X	return(err);
X}
X
X/*
X * doit: do the tests
X */
X
Xvoid doit() {
X	int ntp = 8;		/* number of test pages */
X	int psz = PAGE_SIZE;	/* page size */
X	int n, span, idx, i;
X	char *b1, *p;
X	vm_page_t pga = vm_page_array;
X
X	uprintf("doit: testing VM contig API...  PAGE_SIZE=%d\n", psz);
X
X	b1 = contigmalloc(psz * ntp, M_DEVBUF, M_NOWAIT, 0, 0xffffffff, psz, 0);
X	uprintf("contigmalloc %d pages: %p\n", ntp, b1);
X
X	if (b1) {
X		for (p = b1, n = 0 ; p < b1 + (psz * ntp) ; p += psz, n++) {
X			if (p == b1) {
X				uprintf("first page: va=%p, pa=%x\n", 
X							p, vtophys(p));
X			} else {
X				span = vtophys(p) - vtophys(p - psz);
X				uprintf("page %d: span=%d %s\n", n, span,
X					(span == psz) ? "OK" : "BAD!!!!");
X			}
X		}
X		contigfree(b1, psz * ntp, M_DEVBUF);
X		uprintf("freed the pages\n");
X	}
X
X	idx = vm_contig_pg_alloc(psz * ntp, 0, 0xffffffff, psz, 0);
X	if (idx < 0) {
X		uprintf("vm_contig_pg_alloc: API failed %d\n", idx);
X	} else { 
X		for (i = idx, n = 0 ; i < idx + ntp ; i++, n++) { 
X			if (n == 0) {
X				uprintf("first page: pa=%x\n", 
X					pga[i].phys_addr);
X			} else {
X				span = pga[i].phys_addr - pga[i-1].phys_addr;
X				uprintf("page %d: span=%d %s\n", n, span,
X					(span == psz) ? "OK" : "BAD!!!!");
X			}
X		}
X		uprintf("freeing the pages\n");
X		vm_contig_pg_free(idx, psz * ntp);
X	}
X}
END-of-vm_contig_test.c
exit

>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->green 
Responsible-Changed-By: bms 
Responsible-Changed-When: Wed Jun 23 09:15:07 GMT 2004 
Responsible-Changed-Why:  
green@ is rewriting contigmalloc 

http://www.freebsd.org/cgi/query-pr.cgi?pr=55081 

From: Brian Fundakowski Feldman <green@FreeBSD.org>
To: freebsd-bugs@FreeBSD.org
Cc:  
Subject: Re: kern/55081: contigmalloc API semantics inadequate --- forces KVM mapping
Date: Wed, 23 Jun 2004 11:26:38 -0400

 Indeed I have a reimplementation of contigmalloc(9) in the waiting.
 As you suggested it should be, it is split apart into functions that
 can be called without mapping the memory into kernel space.  However,
 it's also a reimplementation of the actual acquisition of pages such
 that it should be far more reliable than before: instead of paging
 out everything and hoping to get a contiguous region, it pages out
 a page at a time and should only fail due to running completely out
 of swap space or having so much wired memory that it fragments the
 part of the physical address space you want to use.
 
 Please try this patch and see how well it works for you.  You do not
 have to change vm.old_contigmalloc to be able to use the new
 vm_page_alloc_contig() and vm_page_release_contig() functions.
 You should also notice the allocated memory showing up in vmstat -m
 output for calls to contigmalloc(); you can report your driver's
 memory to these pools, too, using the malloc_type_*() functions.
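 
 (Illustrative sketch, not the patched kernel code: the per-type
 accounting that malloc_type_allocated()/malloc_type_freed() maintain
 in the patch below -- ks_memuse, ks_inuse, ks_maxused, ks_calls --
 modeled in user space without the ks_mtx locking.  struct
 malloc_type_model is a hypothetical stand-in for struct malloc_type.)

```c
#include <assert.h>

/*
 * Toy model of the malloc_type counters the patch maintains.
 * Field names follow struct malloc_type; locking is omitted.
 */
struct malloc_type_model {
	unsigned long ks_memuse;	/* bytes currently allocated */
	unsigned long ks_inuse;		/* outstanding allocations */
	unsigned long ks_maxused;	/* high-water mark of ks_memuse */
	unsigned long ks_calls;		/* total allocation calls */
};

static void
model_allocated(struct malloc_type_model *ksp, unsigned long size)
{
	ksp->ks_calls++;
	if (size != 0) {		/* size 0 records a failed call */
		ksp->ks_memuse += size;
		ksp->ks_inuse++;
		if (ksp->ks_memuse > ksp->ks_maxused)
			ksp->ks_maxused = ksp->ks_memuse;
	}
}

static void
model_freed(struct malloc_type_model *ksp, unsigned long size)
{
	assert(size <= ksp->ks_memuse);	/* mirrors the KASSERT */
	ksp->ks_memuse -= size;
	ksp->ks_inuse--;
}
```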
 
 Index: sys/malloc.h
 ===================================================================
 RCS file: /usr/ncvs/src/sys/sys/malloc.h,v
 retrieving revision 1.76
 diff -u -r1.76 malloc.h
 --- sys/malloc.h	7 Apr 2004 04:19:49 -0000	1.76
 +++ sys/malloc.h	19 Jun 2004 05:20:30 -0000
 @@ -105,6 +105,8 @@
  void	*malloc(unsigned long size, struct malloc_type *type, int flags);
  void	malloc_init(void *);
  int	malloc_last_fail(void);
 +void	malloc_type_allocated(struct malloc_type *type, unsigned long size);
 +void	malloc_type_freed(struct malloc_type *type, unsigned long size);
  void	malloc_uninit(void *);
  void	*realloc(void *addr, unsigned long size, struct malloc_type *type,
  	    int flags);
 Index: kern/kern_malloc.c
 ===================================================================
 RCS file: /usr/ncvs/src/sys/kern/kern_malloc.c,v
 retrieving revision 1.133
 diff -u -r1.133 kern_malloc.c
 --- kern/kern_malloc.c	31 May 2004 21:46:04 -0000	1.133
 +++ kern/kern_malloc.c	19 Jun 2004 05:22:40 -0000
 @@ -175,6 +175,47 @@
  }
  
  /*
 + * Add this to the informational malloc_type bucket.
 + */
 +static void
 +malloc_type_zone_allocated(struct malloc_type *ksp, unsigned long size,
 +    int zindx)
 +{
 +	mtx_lock(&ksp->ks_mtx);
 +	ksp->ks_calls++;
 +	if (zindx != -1)
 +		ksp->ks_size |= 1 << zindx;
 +	if (size != 0) {
 +		ksp->ks_memuse += size;
 +		ksp->ks_inuse++;
 +		if (ksp->ks_memuse > ksp->ks_maxused)
 +			ksp->ks_maxused = ksp->ks_memuse;
 +	}
 +	mtx_unlock(&ksp->ks_mtx);
 +}
 +
 +void
 +malloc_type_allocated(struct malloc_type *ksp, unsigned long size)
 +{
 +	malloc_type_zone_allocated(ksp, size, -1);
 +}
 +
 +/*
 + * Remove this allocation from the informational malloc_type bucket.
 + */
 +void
 +malloc_type_freed(struct malloc_type *ksp, unsigned long size)
 +{
 +	mtx_lock(&ksp->ks_mtx);
 +	KASSERT(size <= ksp->ks_memuse,
 +		("malloc(9)/free(9) confusion.\n%s",
 +		 "Probably freeing with wrong type, but maybe not here."));
 +	ksp->ks_memuse -= size;
 +	ksp->ks_inuse--;
 +	mtx_unlock(&ksp->ks_mtx);
 +}
 +
 +/*
   *	malloc:
   *
   *	Allocate a block of memory.
 @@ -195,7 +236,6 @@
  #ifdef DIAGNOSTIC
  	unsigned long osize = size;
  #endif
 -	register struct malloc_type *ksp = type;
  
  #ifdef INVARIANTS
  	/*
 @@ -241,29 +281,16 @@
  		krequests[size >> KMEM_ZSHIFT]++;
  #endif
  		va = uma_zalloc(zone, flags);
 -		mtx_lock(&ksp->ks_mtx);
 -		if (va == NULL) 
 -			goto out;
 -
 -		ksp->ks_size |= 1 << indx;
 -		size = keg->uk_size;
 +		if (va != NULL)
 +			size = keg->uk_size;
 +		malloc_type_zone_allocated(type, va == NULL ? 0 : size, indx);
  	} else {
  		size = roundup(size, PAGE_SIZE);
  		zone = NULL;
  		keg = NULL;
  		va = uma_large_malloc(size, flags);
 -		mtx_lock(&ksp->ks_mtx);
 -		if (va == NULL)
 -			goto out;
 +		malloc_type_allocated(type, va == NULL ? 0 : size);
  	}
 -	ksp->ks_memuse += size;
 -	ksp->ks_inuse++;
 -out:
 -	ksp->ks_calls++;
 -	if (ksp->ks_memuse > ksp->ks_maxused)
 -		ksp->ks_maxused = ksp->ks_memuse;
 -
 -	mtx_unlock(&ksp->ks_mtx);
  	if (flags & M_WAITOK)
  		KASSERT(va != NULL, ("malloc(M_WAITOK) returned NULL"));
  	else if (va == NULL)
 @@ -288,7 +315,6 @@
  	void *addr;
  	struct malloc_type *type;
  {
 -	register struct malloc_type *ksp = type;
  	uma_slab_t slab;
  	u_long size;
  
 @@ -296,7 +322,7 @@
  	if (addr == NULL)
  		return;
  
 -	KASSERT(ksp->ks_memuse > 0,
 +	KASSERT(type->ks_memuse > 0,
  		("malloc(9)/free(9) confusion.\n%s",
  		 "Probably freeing with wrong type, but maybe not here."));
  	size = 0;
 @@ -333,13 +359,7 @@
  		size = slab->us_size;
  		uma_large_free(slab);
  	}
 -	mtx_lock(&ksp->ks_mtx);
 -	KASSERT(size <= ksp->ks_memuse,
 -		("malloc(9)/free(9) confusion.\n%s",
 -		 "Probably freeing with wrong type, but maybe not here."));
 -	ksp->ks_memuse -= size;
 -	ksp->ks_inuse--;
 -	mtx_unlock(&ksp->ks_mtx);
 +	malloc_type_freed(type, size);
  }
  
  /*
 Index: vm/vm_page.h
 ===================================================================
 RCS file: /usr/ncvs/src/sys/vm/vm_page.h,v
 retrieving revision 1.131
 diff -u -r1.131 vm_page.h
 --- vm/vm_page.h	5 Jun 2004 21:06:42 -0000	1.131
 +++ vm/vm_page.h	19 Jun 2004 13:55:08 -0000
 @@ -342,6 +342,9 @@
  
  void vm_page_activate (vm_page_t);
  vm_page_t vm_page_alloc (vm_object_t, vm_pindex_t, int);
 +vm_page_t vm_page_alloc_contig (vm_pindex_t, vm_paddr_t, vm_paddr_t,
 +	    vm_offset_t, vm_offset_t);
 +void vm_page_release_contig (vm_page_t, vm_pindex_t);
  vm_page_t vm_page_grab (vm_object_t, vm_pindex_t, int);
  void vm_page_cache (register vm_page_t);
  int vm_page_try_to_cache (vm_page_t);
 Index: vm/vm_contig.c
 ===================================================================
 RCS file: /usr/ncvs/src/sys/vm/vm_contig.c,v
 retrieving revision 1.35
 diff -u -r1.35 vm_contig.c
 --- vm/vm_contig.c	15 Jun 2004 01:02:00 -0000	1.35
 +++ vm/vm_contig.c	19 Jun 2004 13:56:18 -0000
 @@ -68,6 +68,9 @@
  #include <sys/malloc.h>
  #include <sys/mutex.h>
  #include <sys/proc.h>
 +#include <sys/kernel.h>
 +#include <sys/linker_set.h>
 +#include <sys/sysctl.h>
  #include <sys/vmmeter.h>
  #include <sys/vnode.h>
  
 @@ -83,49 +86,63 @@
  #include <vm/vm_extern.h>
  
  static int
 -vm_contig_launder(int queue)
 +vm_contig_launder_page(vm_page_t m)
  {
  	vm_object_t object;
 -	vm_page_t m, m_tmp, next;
 +	vm_page_t m_tmp;
  	struct vnode *vp;
  
 +	if (!VM_OBJECT_TRYLOCK(m->object))
 +		return (EAGAIN);
 +	if (vm_page_sleep_if_busy(m, TRUE, "vpctw0")) {
 +		VM_OBJECT_UNLOCK(m->object);
 +		vm_page_lock_queues();
 +		return (EBUSY);
 +	}
 +	vm_page_test_dirty(m);
 +	if (m->dirty == 0 && m->busy == 0 && m->hold_count == 0)
 +		pmap_remove_all(m);
 +	if (m->dirty) {
 +		object = m->object;
 +		if (object->type == OBJT_VNODE) {
 +			vm_page_unlock_queues();
 +			vp = object->handle;
 +			VM_OBJECT_UNLOCK(object);
 +			vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, curthread);
 +			VM_OBJECT_LOCK(object);
 +			vm_object_page_clean(object, 0, 0, OBJPC_SYNC);
 +			VM_OBJECT_UNLOCK(object);
 +			VOP_UNLOCK(vp, 0, curthread);
 +			vm_page_lock_queues();
 +			return (0);
 +		} else if (object->type == OBJT_SWAP ||
 +			   object->type == OBJT_DEFAULT) {
 +			m_tmp = m;
 +			vm_pageout_flush(&m_tmp, 1, VM_PAGER_PUT_SYNC);
 +			VM_OBJECT_UNLOCK(object);
 +			return (0);
 +		}
 +	} else if (m->busy == 0 && m->hold_count == 0)
 +		vm_page_cache(m);
 +	VM_OBJECT_UNLOCK(m->object);
 +	return (0);
 +}
 +
 +static int
 +vm_contig_launder(int queue)
 +{
 +	vm_page_t m, next;
 +	int error;
 +
  	for (m = TAILQ_FIRST(&vm_page_queues[queue].pl); m != NULL; m = next) {
  		next = TAILQ_NEXT(m, pageq);
  		KASSERT(m->queue == queue,
  		    ("vm_contig_launder: page %p's queue is not %d", m, queue));
 -		if (!VM_OBJECT_TRYLOCK(m->object))
 -			continue;
 -		if (vm_page_sleep_if_busy(m, TRUE, "vpctw0")) {
 -			VM_OBJECT_UNLOCK(m->object);
 -			vm_page_lock_queues();
 +		error = vm_contig_launder_page(m);
 +		if (error == 0)
  			return (TRUE);
 -		}
 -		vm_page_test_dirty(m);
 -		if (m->dirty == 0 && m->busy == 0 && m->hold_count == 0)
 -			pmap_remove_all(m);
 -		if (m->dirty) {
 -			object = m->object;
 -			if (object->type == OBJT_VNODE) {
 -				vm_page_unlock_queues();
 -				vp = object->handle;
 -				VM_OBJECT_UNLOCK(object);
 -				vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, curthread);
 -				VM_OBJECT_LOCK(object);
 -				vm_object_page_clean(object, 0, 0, OBJPC_SYNC);
 -				VM_OBJECT_UNLOCK(object);
 -				VOP_UNLOCK(vp, 0, curthread);
 -				vm_page_lock_queues();
 -				return (TRUE);
 -			} else if (object->type == OBJT_SWAP ||
 -				   object->type == OBJT_DEFAULT) {
 -				m_tmp = m;
 -				vm_pageout_flush(&m_tmp, 1, VM_PAGER_PUT_SYNC);
 -				VM_OBJECT_UNLOCK(object);
 -				return (TRUE);
 -			}
 -		} else if (m->busy == 0 && m->hold_count == 0)
 -			vm_page_cache(m);
 -		VM_OBJECT_UNLOCK(m->object);
 +		if (error == EBUSY)
 +			return (FALSE);
  	}
  	return (FALSE);
  }
 @@ -308,6 +325,193 @@
  	return (NULL);
  }
  
 +static void
 +vm_page_release_contigl(vm_page_t m, vm_pindex_t count)
 +{
 +	mtx_lock_spin(&vm_page_queue_free_mtx);
 +	while (count--) {
 +		vm_pageq_enqueue(PQ_FREE + m->pc, m);
 +		m++;
 +	}
 +	mtx_unlock_spin(&vm_page_queue_free_mtx);
 +}
 +
 +void
 +vm_page_release_contig(vm_page_t m, vm_pindex_t count)
 +{
 +	vm_page_lock_queues();
 +	vm_page_release_contigl(m, count);
 +	vm_page_unlock_queues();
 +}
 +
 +static void
 +vm_contig_unqueue_free(vm_page_t m)
 +{
 +
 +	KASSERT((m->queue - m->pc) == PQ_FREE,
 +	    ("contigmalloc2: page %p not freed", m));
 +	mtx_lock_spin(&vm_page_queue_free_mtx);
 +	vm_pageq_remove_nowakeup(m);
 +	mtx_unlock_spin(&vm_page_queue_free_mtx);
 +	m->valid = VM_PAGE_BITS_ALL;
 +	if (m->flags & PG_ZERO)
 +		vm_page_zero_count--;
 +	/* Don't clear the PG_ZERO flag; we'll need it later. */
 +	m->flags = PG_UNMANAGED | (m->flags & PG_ZERO);
 +	KASSERT(m->dirty == 0,
 +	    ("contigmalloc2: page %p was dirty", m));
 +	m->wire_count = 0;
 +	m->busy = 0;
 +	m->object = NULL;
 +}
 +
 +vm_page_t
 +vm_page_alloc_contig(vm_pindex_t npages, vm_paddr_t low, vm_paddr_t high,
 +	    vm_offset_t alignment, vm_offset_t boundary)
 +{
 +	vm_object_t object;
 +	vm_offset_t size;
 +	vm_paddr_t phys;
 +	vm_page_t pga = vm_page_array;
 +	int i, pass, pqtype, start;
 +
 +	size = npages << PAGE_SHIFT;
 +	if (size == 0)
 +		panic("vm_page_alloc_contig: size must not be 0");
 +	if ((alignment & (alignment - 1)) != 0)
 +		panic("vm_page_alloc_contig: alignment must be a power of 2");
 +	if ((boundary & (boundary - 1)) != 0)
 +		panic("vm_page_alloc_contig: boundary must be a power of 2");
 +
 +	for (pass = 0; pass < 2; pass++) {
 +		start = cnt.v_page_count;
 +		vm_page_lock_queues();
 +retry:
 +		start--;
 +		/*
 +		 * Find last page in array that is free, within range,
 +		 * aligned, and such that the boundary won't be crossed.
 +		 */
 +		for (i = start; i >= 0; i--) {
 +			phys = VM_PAGE_TO_PHYS(&pga[i]);
 +			pqtype = pga[i].queue - pga[i].pc;
 +			if (pass == 0) {
 +				if (pqtype != PQ_FREE && pqtype != PQ_CACHE)
 +					continue;
 +			} else if (pqtype != PQ_FREE && pqtype != PQ_CACHE &&
 +				    pga[i].queue != PQ_ACTIVE &&
 +				    pga[i].queue != PQ_INACTIVE)
 +				continue;
 +			if (phys >= low && phys + size <= high &&
 +			    ((phys & (alignment - 1)) == 0) &&
 +			    ((phys ^ (phys + size - 1)) & ~(boundary - 1)) == 0)
 +			break;
 +		}
 +		/* There are no candidates at all. */
 +		if (i == -1) {
 +			vm_page_unlock_queues();
 +			continue;
 +		}
 +		start = i;
 +		/*
 +		 * Check successive pages for contiguous and free.
 +		 */
 +		for (i = start + 1; i < start + npages; i++) {
 +			pqtype = pga[i].queue - pga[i].pc;
 +			if (VM_PAGE_TO_PHYS(&pga[i]) !=
 +			    VM_PAGE_TO_PHYS(&pga[i - 1]) + PAGE_SIZE)
 +				goto retry;
 +			if (pass == 0) {
 +				if (pqtype != PQ_FREE && pqtype != PQ_CACHE)
 +					goto retry;
 +			} else if (pqtype != PQ_FREE && pqtype != PQ_CACHE &&
 +				    pga[i].queue != PQ_ACTIVE &&
 +				    pga[i].queue != PQ_INACTIVE)
 +				goto retry;
 +		}
 +		for (i = start; i < start + npages; i++) {
 +			vm_page_t m = &pga[i];
 +
 +			pqtype = m->queue - m->pc;
 +			if (pass != 0 && pqtype != PQ_FREE &&
 +			    pqtype != PQ_CACHE) {
 +				switch (m->queue) {
 +				case PQ_ACTIVE:
 +				case PQ_INACTIVE:
 +					if (vm_contig_launder_page(m) != 0)
 +						goto cleanup_freed;
 +					pqtype = m->queue - m->pc;
 +					if (pqtype == PQ_FREE ||
 +					    pqtype == PQ_CACHE)
 +						break;
 +				default:
 +cleanup_freed:
 +					vm_page_release_contigl(&pga[start],
 +					    i - start);
 +					goto retry;
 +				}
 +			}
 +			if (pqtype == PQ_CACHE) {
 +				object = m->object;
 +				if (!VM_OBJECT_TRYLOCK(object))
 +					goto retry;
 +				vm_page_busy(m);
 +				vm_page_free(m);
 +				VM_OBJECT_UNLOCK(object);
 +			}
 +			vm_contig_unqueue_free(m);
 +		}
 +		vm_page_unlock_queues();
 +		/*
 +		 * We've found a contiguous chunk that meets our requirements.
 +		 */
 +		return (&pga[start]);
 +	}
 +	return (NULL);
 +}
 +
 +static void *
 +contigmalloc2(vm_page_t m, vm_pindex_t npages, int flags)
 +{
 +	vm_object_t object = kernel_object;
 +	vm_map_t map = kernel_map;
 +	vm_offset_t addr, tmp_addr;
 +	vm_pindex_t i;
 + 
 +	/*
 +	 * Allocate kernel VM, unfree and assign the physical pages to
 +	 * it and return kernel VM pointer.
 +	 */
 +	vm_map_lock(map);
 +	if (vm_map_findspace(map, vm_map_min(map), npages << PAGE_SHIFT, &addr)
 +	    != KERN_SUCCESS) {
 +		vm_map_unlock(map);
 +		return (NULL);
 +	}
 +	vm_object_reference(object);
 +	vm_map_insert(map, object, addr - VM_MIN_KERNEL_ADDRESS,
 +	    addr, addr + (npages << PAGE_SHIFT), VM_PROT_ALL, VM_PROT_ALL, 0);
 +	vm_map_unlock(map);
 +	tmp_addr = addr;
 +	VM_OBJECT_LOCK(object);
 +	for (i = 0; i < npages; i++) {
 +		vm_page_insert(&m[i], object,
 +		    OFF_TO_IDX(tmp_addr - VM_MIN_KERNEL_ADDRESS));
 +		if ((flags & M_ZERO) && !(m->flags & PG_ZERO))
 +			pmap_zero_page(&m[i]);
 +		tmp_addr += PAGE_SIZE;
 +	}
 +	VM_OBJECT_UNLOCK(object);
 +	vm_map_wire(map, addr, addr + (npages << PAGE_SHIFT),
 +	    VM_MAP_WIRE_SYSTEM | VM_MAP_WIRE_NOHOLES);
 +	return ((void *)addr);
 +}
 +
 +static int vm_old_contigmalloc = 1;
 +SYSCTL_INT(_vm, OID_AUTO, old_contigmalloc,
 +    CTLFLAG_RW, &vm_old_contigmalloc, 0, "Use the old contigmalloc algorithm");
 +TUNABLE_INT("vm.old_contigmalloc", &vm_old_contigmalloc);
 +
  void *
  contigmalloc(
  	unsigned long size,	/* should be size_t here and for malloc() */
 @@ -319,17 +523,37 @@
  	unsigned long boundary)
  {
  	void * ret;
 +	vm_page_t pages;
 +	vm_pindex_t npgs;
  
 +	npgs = round_page(size) >> PAGE_SHIFT;
  	mtx_lock(&Giant);
 -	ret = contigmalloc1(size, type, flags, low, high, alignment, boundary,
 -	    kernel_map);
 +	if (vm_old_contigmalloc) {
 +		ret = contigmalloc1(size, type, flags, low, high, alignment,
 +		    boundary, kernel_map);
 +	} else {
 +		pages = vm_page_alloc_contig(npgs, low, high,
 +		    alignment, boundary);
 +		if (pages == NULL) {
 +			ret = NULL;
 +		} else {
 +			ret = contigmalloc2(pages, npgs, flags);
 +			if (ret == NULL)
 +				vm_page_release_contig(pages, npgs);
 +		}
 +		
 +	}
  	mtx_unlock(&Giant);
 +	malloc_type_allocated(type, ret == NULL ? 0 : npgs << PAGE_SHIFT);
  	return (ret);
  }
  
  void
  contigfree(void *addr, unsigned long size, struct malloc_type *type)
  {
 +	vm_pindex_t npgs;
  
 +	npgs = round_page(size) >> PAGE_SHIFT;
  	kmem_free(kernel_map, (vm_offset_t)addr, size);
 +	malloc_type_freed(type, npgs << PAGE_SHIFT);
  }
 
 -- 
 Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
   <> green@FreeBSD.org                               \  The Power to Serve! \
  Opinions expressed are my own.                       \,,,,,,,,,,,,,,,,,,,,,,\
State-Changed-From-To: open->closed 
State-Changed-By: green 
State-Changed-When: Mon Aug 16 11:14:14 GMT 2004 
State-Changed-Why:  
This exists in -CURRENT now for you to use; I hope you like the API. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=55081 
>Unformatted:
