From karl@fs.denninger.net  Fri Mar 14 20:41:42 2014
Return-Path: <karl@fs.denninger.net>
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
	(using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by hub.freebsd.org (Postfix) with ESMTPS id 773719D6
	for <FreeBSD-gnats-submit@freebsd.org>; Fri, 14 Mar 2014 20:41:42 +0000 (UTC)
Received: from fs.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mx1.freebsd.org (Postfix) with ESMTPS id 25574C85
	for <FreeBSD-gnats-submit@freebsd.org>; Fri, 14 Mar 2014 20:41:41 +0000 (UTC)
Received: from fs.denninger.net (localhost [127.0.0.1])
	by fs.denninger.net (8.14.8/8.14.8) with ESMTP id s2EKfb0q011032
	for <FreeBSD-gnats-submit@freebsd.org>; Fri, 14 Mar 2014 15:41:38 -0500 (CDT)
	(envelope-from karl@fs.denninger.net)
Received: from fs.denninger.net (TLS/SSL) [127.0.0.1] by Spamblock-sys (LOCAL/AUTH);
	Fri Mar 14 15:41:37 2014
Received: (from karl@localhost)
	by fs.denninger.net (8.14.8/8.14.8/Submit) id s2EKfW0M011029;
	Fri, 14 Mar 2014 15:41:32 -0500 (CDT)
	(envelope-from karl)
Message-Id: <201403142041.s2EKfW0M011029@fs.denninger.net>
Date: Fri, 14 Mar 2014 15:41:32 -0500 (CDT)
From: Karl Denninger <karl@fs.denninger.net>
Reply-To: Karl Denninger <karl@fs.denninger.net>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: REPLACES PR187572 - ZFS ARC behavior problem and fix
X-Send-Pr-Version: 3.114
X-GNATS-Notify: swills@FreeBSD.org

>Number:         187594
>Category:       kern
>Synopsis:       [zfs] [patch] ZFS ARC behavior problem and fix
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-fs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Mar 14 20:50:00 UTC 2014
>Closed-Date:    
>Last-Modified:  Thu May 15 15:30:00 UTC 2014
>Originator:     Karl Denninger
>Release:        FreeBSD 10.0-STABLE amd64
>Organization:
Karls Sushi and Packet Smashers
>Environment:
System: FreeBSD NewFS.denninger.net 10.0-STABLE FreeBSD 10.0-STABLE #13 r263037M: Fri Mar 14 14:58:11 CDT 2014 karl@NewFS.denninger.net:/usr/obj/usr/src/sys/KSD-SMP amd64

Note: also likely impacts previous versions of FreeBSD with ZFS.

>Description:
ZFS can be convinced to engage in pathological behavior due to a bad
low-memory test in arc.c.

The offending file is
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c; it allegedly
checks for 25% free memory and, if less than that is free, asks the cache
to shrink.

(snippet from around line 2494 of arc.c in 10-STABLE; path
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs)

#else /* !sun */
if (kmem_used() > (kmem_size() * 3) / 4)
return (1);
#endif /* sun */

Unfortunately these two functions do not return what the authors thought
they did.  It's clear what they're trying to do from the Solaris-specific
code up above this test.

The result is that the cache only shrinks when vm_paging_needed() tests
true, but by that time the system is already in serious memory trouble,
and triggering only at that point actually drives the system further into
paging, because the pager will not recall pages from swap until they are
next executed.  This leads the ARC to try to fill all available RAM even
though pages have been pushed off onto swap.  Not good.


>How-To-Repeat:
	Set up a cache-heavy workload on large (~terabyte sized or bigger)
	ZFS filesystems and note that free RAM drops to the point that
	starvation occurs, while "wired" memory pins at the maximum ARC
	cache size, even though you have other demands for RAM that should
	cause the ARC memory congestion control algorithm to evict some of
	the cache as demand rises.  It does not.

>Fix:

	The following context diff corrects the problem.  If NEWRECLAIM is
	defined (turned on by default once the patch is applied) we declare
	and export a new tunable:

	vfs.zfs.arc_freepage_percent_target

	The default at zfs load time is 25 percent, as was the intent
	of the original code.  You may tune this at runtime with
	sysctl to suit your workload and machine's installed RAM; unlike
	"arc_max", which can only be set at boot, the target
	RAM consumption percentage is adaptive.

	Instead of the above code we then read the wired, active, inactive,
	cache and free page counts and sum them to get a total.  If free
	pages, as a percentage of that total, are below the target we
	declare the system memory constrained; otherwise we declare that
	it is not.

	When this option is enabled we retain the paging check but none
	of the others.

	A debugging flag called "NEWRECLAIM_DEBUG" is present in the code;
	if changed from "undef" to "define" at compile time it will cause
	printing of status changes (constrained vs. not) along with any
	picked-up changes in the target, in real time.  This should not
	be used in production.



*** /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Fri Mar 14 15:36:17 2014
--- arc.c.original	Thu Mar 13 09:18:48 2014
***************
*** 18,85 ****
   *
   * CDDL HEADER END
   */
- 
- /* Karl Denninger (karl@denninger.net), 3/13/2014, FreeBSD-specific
-  * 
-  * If "NEWRECLAIM" is defined, change the "low memory" warning that causes
-  * the ARC cache to be pared down.  The reason for the change is that the
-  * apparent attempted algorithm is to start evicting ARC cache when free
-  * pages fall below 25% of installed RAM.  This maps reasonably well to how
-  * Solaris is documented to behave; when "lotsfree" is invaded ZFS is told 
-  * to pare down.  
-  *
-  * The problem is that on FreeBSD machines the system doesn't appear to be 
-  * getting what the authors of the original code thought they were looking at
-  * with its test and as a result that test never triggers.  That leaves the 
-  * only reclaim trigger as the "paging needed" status flag, and by the time 
-  * that trips the system is already in low-memory trouble.  This can lead to 
-  * severe pathological behavior under the following scenario:
-  * - The system starts to page and ARC is evicted.
-  * - The system stops paging as ARC's eviction drops wired RAM a bit.
-  * - ARC starts increasing its allocation again, and wired memory grows.
-  * - A new image is activated, and the system once again attempts to page.
-  * - ARC starts to be evicted again.
-  * - Back to #2
-  * 
-  * Note that ZFS's ARC default (unless you override it in /boot/loader.conf)
-  * is to allow the ARC cache to grab nearly all of free RAM, provided nobody
-  * else needs it.  That would be ok if we evicted cache when required.
-  * 
-  * Unfortunately the system can get into a state where it never
-  * manages to page anything of materiality back in, as if there is active
-  * I/O the ARC will start grabbing space once again as soon as the memory 
-  * contention state drops.  For this reason the "paging is occurring" flag 
-  * should be the **last resort** condition for ARC eviction; you want to 
-  * (as Solaris does) start when there is material free RAM left in the hope 
-  * of never getting into the condition where you're potentially paging off 
-  * executables in favor of leaving disk cache allocated.  That's a recipe 
-  * for terrible overall system performance.
-  *
-  * To fix this we instead grab four OIDs out of the sysctl status
-  * messages -- wired pages, active pages, inactive pages and cache (vnodes?)
-  * pages, sum those and compare against the free page count from the
-  * VM sysctl status OID, giving us a percentage of pages free.  This
-  * is checked against a new tunable "vfs.zfs.arc_freepage_percent_target"
-  * and if less, we declare the system low on memory.
-  * 
-  * Note that this sysctl variable is runtime tunable if you have reason
-  * to change it (e.g. you want more or less RAM free to be the "clean up"
-  * threshold.)
-  *
-  * If we're using this check for low memory we are replacing the previous
-  * ones, including the oddball "random" reclaim that appears to fire far
-  * more often than it should.  We still trigger if the system pages.
-  *
-  * If you turn on NEWRECLAIM_DEBUG then the kernel will print on the console
-  * status messages when the reclaim status trips on and off, along with the
-  * page count aggregate that triggered it (and the free space) for each
-  * event. 
-  */
- 
- #define	NEWRECLAIM
- #undef	NEWRECLAIM_DEBUG
- 
- 
  /*
   * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
   * Copyright (c) 2013 by Delphix. All rights reserved.
--- 18,23 ----
***************
*** 201,212 ****
  
  #include <vm/vm_pageout.h>
  
- #ifdef	NEWRECLAIM
- #ifdef	__FreeBSD__
- #include <sys/sysctl.h>
- #endif
- #endif	/* NEWRECLAIM */
- 
  #ifdef illumos
  #ifndef _KERNEL
  /* set with ZFS_DEBUG=watch, to enable watchpoints on frozen buffers */
--- 139,144 ----
***************
*** 271,303 ****
  int zfs_arc_shrink_shift = 0;
  int zfs_arc_p_min_shift = 0;
  int zfs_disable_dup_eviction = 0;
- #ifdef	NEWRECLAIM
- #ifdef  __FreeBSD__
- static	int percent_target = 25;
- #endif
- #endif
  
  TUNABLE_QUAD("vfs.zfs.arc_max", &zfs_arc_max);
  TUNABLE_QUAD("vfs.zfs.arc_min", &zfs_arc_min);
  TUNABLE_QUAD("vfs.zfs.arc_meta_limit", &zfs_arc_meta_limit);
- #ifdef	NEWRECLAIM
- #ifdef  __FreeBSD__
- TUNABLE_INT("vfs.zfs.arc_freepage_percent_target", &percent_target);
- #endif
- #endif
- 
  SYSCTL_DECL(_vfs_zfs);
  SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, arc_max, CTLFLAG_RDTUN, &zfs_arc_max, 0,
      "Maximum ARC size");
  SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, arc_min, CTLFLAG_RDTUN, &zfs_arc_min, 0,
      "Minimum ARC size");
  
- #ifdef	NEWRECLAIM
- #ifdef  __FreeBSD__
- SYSCTL_INT(_vfs_zfs, OID_AUTO, arc_freepage_percent_target, CTLFLAG_RWTUN, &percent_target, 0, "ARC Free RAM Target percentage");
- #endif
- #endif
- 
  /*
   * Note that buffers can be in one of 6 states:
   *	ARC_anon	- anonymous (discussed below)
--- 203,218 ----
***************
*** 2523,2544 ****
  {
  
  #ifdef _KERNEL
- #ifdef	NEWRECLAIM
- #ifdef  __FreeBSD__
-         u_int	vmwire = 0;
- 	u_int	vmactive = 0;
- 	u_int	vminactive = 0;
- 	u_int	vmcache = 0;
- 	u_int	vmfree = 0;
- 	u_int	vmtotal = 0;
- 	int	percent = 25;
- 	size_t	vmsize;
- #ifdef	NEWRECLAIM_DEBUG
- 	static	int	xval = -1;
- 	static	int	oldpercent = 0;
- #endif	/* NEWRECLAIM_DEBUG */
- #endif	/* NEWRECLAIM */
- #endif
  
  	if (needfree)
  		return (1);
--- 2438,2443 ----
***************
*** 2577,2583 ****
  		return (1);
  
  #if defined(__i386)
- 
  	/*
  	 * If we're on an i386 platform, it's possible that we'll exhaust the
  	 * kernel heap space before we ever run out of available physical
--- 2476,2481 ----
***************
*** 2594,2659 ****
  		return (1);
  #endif
  #else	/* !sun */
- 
- #ifdef	NEWRECLAIM
- #ifdef  __FreeBSD__
- /*
-  * Implement the new tunable free RAM algorithm.  We check the various page
-  * VM stats and add them up, then check the free count percentage against
-  * the specified target.  If we're under the target we are memory constrained
-  * and ask for ARC cache shrinkage.  If this is defined on a FreeBSD system
-  * the older checks are not performed.
-  */
- 	vmsize = sizeof(vmwire);
-         kernel_sysctlbyname(curthread, "vm.stats.vm.v_wire_count", &vmwire, &vmsize, NULL, 0, NULL, 0);
- 	vmsize = sizeof(vmactive);
-         kernel_sysctlbyname(curthread, "vm.stats.vm.v_active_count", &vmactive, &vmsize, NULL, 0, NULL, 0);
- 	vmsize = sizeof(vminactive);
-         kernel_sysctlbyname(curthread, "vm.stats.vm.v_inactive_count", &vminactive, &vmsize, NULL, 0, NULL, 0);
- 	vmsize = sizeof(vmcache);
-         kernel_sysctlbyname(curthread, "vm.stats.vm.v_cache_count", &vmcache, &vmsize, NULL, 0, NULL, 0);
- 	vmsize = sizeof(vmfree);
-         kernel_sysctlbyname(curthread, "vm.stats.vm.v_free_count", &vmfree, &vmsize, NULL, 0, NULL, 0);
- 	vmsize = sizeof(percent);
-         kernel_sysctlbyname(curthread, "vfs.zfs.arc_freepage_percent_target", &percent, &vmsize, NULL, 0, NULL, 0);
- 	vmtotal = vmwire + vmactive + vminactive + vmcache + vmfree;
- #ifdef	NEWRECLAIM_DEBUG
- 	if (percent != oldpercent) {
- 		printf("ZFS ARC: Reservation change to [%d], [%d] pages, [%d] free\n", percent, vmtotal, vmfree);
- 		oldpercent = percent;
- 	}
- #endif
- 
- 	if (!vmtotal) {
- 		vmtotal = 1;	/* Protect against divide by zero */
- 				/* (should be impossible, but...) */
- 	}
- 
- 	if (((vmfree * 100) / vmtotal) < percent) {
- #ifdef	NEWRECLAIM_DEBUG
- 		if (xval != 1) {
- 			printf("ZFS ARC: RECLAIM total %u, free %u, free pct (%u), target pct (%u)\n", vmtotal, vmfree, ((vmfree * 100) / vmtotal), percent);
- 			xval = 1;
- 		}
- #endif	/* NEWRECLAIM_DEBUG */
- 		return(1);
- 	} else {
- #ifdef	NEWRECLAIM_DEBUG
- 		if (xval != 0) {
- 			printf("ZFS ARC: NORMAL total %u, free %u, free pct (%u), target pct (%u)\n", vmtotal, vmfree, ((vmfree * 100) / vmtotal), percent);
- 			xval = 0;
- 		}
- #endif
- 		return(0);
- 	}
- 
- #endif	/* __FreeBSD__ */
- #endif	/* NEWRECLAIM */
- 
  	if (kmem_used() > (kmem_size() * 3) / 4)
  		return (1);
  #endif	/* sun */
  
  	if (spa_get_random(100) == 0)
  		return (1);
  #endif
--- 2492,2502 ----
  		return (1);
  #endif
  #else	/* !sun */
  	if (kmem_used() > (kmem_size() * 3) / 4)
  		return (1);
  #endif	/* sun */
  
+ #else
  	if (spa_get_random(100) == 0)
  		return (1);
  #endif


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-fs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Sat Mar 15 21:21:18 UTC 2014 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=187594 

From: Adam McDougall <mcdouga9@egr.msu.edu>
To: bug-followup@FreeBSD.org, karl@fs.denninger.net
Cc:  
Subject: Re: kern/187594: [zfs] [patch] ZFS ARC behavior problem and fix
Date: Sat, 15 Mar 2014 23:42:00 -0400

 This is generally working well for me so far; I've been running it over a
 day on my desktop at home with only 4G RAM and I have not needlessly
 swapped.  I generally have 1GB or more free RAM now, although I also
 decreased vfs.zfs.arc_freepage_percent_target to 15 because my ARC total
 was pretty low.  At the moment I have 406M ARC and 1070M free with
 Thunderbird and over a dozen Chromium tabs open.  Thanks for working on
 a patch!

From: Andriy Gapon <avg@FreeBSD.org>
To: bug-followup@FreeBSD.org, karl@fs.denninger.net
Cc:  
Subject: Re: kern/187594: [zfs] [patch] ZFS ARC behavior problem and fix
Date: Tue, 18 Mar 2014 17:15:05 +0200

 Karl Denninger <karl@fs.denninger.net> wrote:
 > ZFS can be convinced to engage in pathological behavior due to a bad
 > low-memory test in arc.c
 > 
 > The offending file is at
 > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c; it allegedly
 > checks for 25% free memory, and if it is less asks for the cache to shrink.
 > 
 > (snippet from arc.c around line 2494 of arc.c in 10-STABLE; path
 > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs)
 > 
 > #else /* !sun */
 > if (kmem_used() > (kmem_size() * 3) / 4)
 > return (1);
 > #endif /* sun */
 > 
 > Unfortunately these two functions do not return what the authors thought
 > they did. It's clear what they're trying to do from the Solaris-specific
 > code up above this test.
 
 No, these functions do return what the authors think they do.
 The check is for KVA usage (kernel virtual address space), not for physical memory.
 
 > The result is that the cache only shrinks when vm_paging_needed() tests
 > true, but by that time the system is in serious memory trouble and by
 
 No, it is not.
 The description and numbers here are a little bit outdated but they should give
 an idea of how paging works in general:
 https://wiki.freebsd.org/AvgPageoutAlgorithm
 
 > triggering only there it actually drives the system further into paging,
 
 How does ARC eviction drive the system further into paging?
 
 > because the pager will not recall pages from the swap until they are next
 > executed. This leads the ARC to try to fill in all the available RAM even
 > though pages have been pushed off onto swap. Not good.
 
 Unused physical memory is a waste.  It is true that ARC tries to use as
 much memory as it is allowed.  The same applies to the page cache (Active, Inactive).
 Memory management is a dynamic system and there are a few competing agents.
 
 It is hard to correctly tune that system using a large hammer such as your
 patch.  I believe that with your patch ARC will get shrunk to its minimum size
 in due time.  Active + Inactive will grow to use the memory that you are denying
 to ARC, driving Free below a threshold, which will reduce ARC.  Repeated enough
 times this will drive ARC to its minimum.
 
 Also, there are a few technical problems with the patch:
 - you don't need to use the sysctl interface in the kernel; the values you
 need are available directly, just take a look at e.g. the implementation
 of vm_paging_needed()
 - similarly, querying the vfs.zfs.arc_freepage_percent_target value via
 kernel_sysctlbyname is just bogus; you can use percent_target directly
 - you don't need to sum various page counters to get a total count, there is
 v_page_count
 
 Lastly, can you try to test reverting your patch and instead setting
 vm.lowmem_period=0 ?
 
 -- 
 Andriy Gapon

From: Karl Denninger <karl@denninger.net>
To: avg@FreeBSD.org
Cc: freebsd-fs@freebsd.org, bug-followup@FreeBSD.org
Subject: Re: kern/187594: [zfs] [patch] ZFS ARC behavior problem and fix
Date: Wed, 19 Mar 2014 09:18:40 -0500

 
 
 On 3/18/2014 12:19 PM, Karl Denninger wrote:
 >
 > On 3/18/2014 10:20 AM, Andriy Gapon wrote:
 >> The following reply was made to PR kern/187594; it has been noted by
 >> GNATS.
 >>
 >> From: Andriy Gapon <avg@FreeBSD.org>
 >> To: bug-followup@FreeBSD.org, karl@fs.denninger.net
 >> Cc:
 >> Subject: Re: kern/187594: [zfs] [patch] ZFS ARC behavior problem and fix
 >> Date: Tue, 18 Mar 2014 17:15:05 +0200
 >>
 >>   Karl Denninger <karl@fs.denninger.net> wrote:
 >>   > ZFS can be convinced to engage in pathological behavior due to a bad
 >>   > low-memory test in arc.c
 >>   >
 >>   > The offending file is at
 >>   > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c; it
 >> allegedly
 >>   > checks for 25% free memory, and if it is less asks for the cache
 >> to shrink.
 >>   >
 >>   > (snippet from arc.c around line 2494 of arc.c in 10-STABLE; path
 >>   > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs)
 >>   >
 >>   > #else /* !sun */
 >>   > if (kmem_used() > (kmem_size() * 3) / 4)
 >>   > return (1);
 >>   > #endif /* sun */
 >>   >
 >>   > Unfortunately these two functions do not return what the authors
 >> thought
 >>   > they did. It's clear what they're trying to do from the
 >> Solaris-specific
 >>   > code up above this test.
 >>     No, these functions do return what the authors think they do.
 >>   The check is for KVA usage (kernel virtual address space), not for
 >> physical memory.
 > I understand, but that's nonsensical in the context of the Solaris
 > code.  "lotsfree" is *not* a declaration of free kvm space, it's a
 > declaration of when the system has "lots" of free *physical* memory.
 >
 > Further it makes no sense at all to allow the ARC cache to force
 > things into virtual (e.g. swap-space backed) memory.  But that's the
 > behavior that has been observed, and it fits with the code as
 > originally written.
 >
 >>     > The result is that the cache only shrinks when
 >> vm_paging_needed() tests
 >>   > true, but by that time the system is in serious memory trouble
 >> and by
 >>     No, it is not.
 >>   The description and numbers here are a little bit outdated but they
 >> should give
 >>   an idea of how paging works in general:
 >>   https://wiki.freebsd.org/AvgPageoutAlgorithm
 >>     > triggering only there it actually drives the system further
 >> into paging,
 >>     How does ARC eviction drive the system further into paging?
 > 1. System gets low on physical memory but the ARC cache is looking at
 > available kvm (of which there is plenty.)  The ARC cache continues to
 > expand.
 >
 > 2. vm_paging_needed() returns true and the system begins to page off
 > to the swap.  At the same time the ARC cache is pared down because
 > arc_reclaim_needed has returned "1".
 >
 > 3. As the ARC cache shrinks and paging occurs vm_paging_needed()
 > returns false.  Paging out ceases but inactive pages remain on the
 > swap.  They are not recalled until and unless they are scheduled to
 > execute.  Arc_reclaim_needed again returns "0".
 >
 > 4. The hold-down timer expires in the ARC cache code
 > ("arc_grow_retry", declared as 60 seconds) and the ARC cache begins to
 > expand again.
 >
 > Go back to #2 until the system's performance starts to deteriorate
 > badly enough due to the paging that you notice it, which occurs when
 > something that is actually consuming CPU time has to be called in from
 > swap.
 >
 > This is consistent with what I and others have observed on both 9.2
 > and 10.0; the ARC will expand until it hits the maximum configured
 > even at the expense of forcing pages onto the swap.  In this specific
 > machine's case left to defaults it will grab nearly all physical
 > memory (over 20GB of 24) and wire it down.
 >
 > Limiting arc_max to 16GB sorta fixes it.  I say "sorta" because it
 > turns out that 16GB is still too much for the workload; it prevents
 > the pathological behavior where system "stalls" happen, but only in
 > the extreme.  It turns out that with the patch in, my ARC cache
 > stabilizes at about 13.5GB during the busiest part of the day, growing
 > to about 16 off-hours.
 >
 > One of the problems with just limiting it in /boot/loader.conf is that
 > you have to guess and the system doesn't reasonably adapt to changing
 > memory loads.  The code is clearly intended to do that but it doesn't
 > end up working that way in practice.
 >>     > because the pager will not recall pages from the swap until
 >> they are next
 >>   > executed. This leads the ARC to try to fill in all the available
 >> RAM even
 >>   > though pages have been pushed off onto swap. Not good.
 >>     Unused physical memory is a waste.  It is true that ARC tries to
 >> use as much of
 >>   memory as it is allowed.  The same applies to the page cache
 >> (Active, Inactive).
 >>   Memory management is a dynamic system and there are a few competing
 >> agents.
 > That's true.  However, what the stock code does is force working set
 > out of memory and into the swap.  The ideal situation is one in which
 > there is no free memory because cache has sized itself to consume
 > everything *not* necessary for the working set of the processes that
 > are running.  Unfortunately we cannot determine this presciently
 > because a new process may come along and we do not necessarily know
 > for how long a process that is blocked on an event will remain blocked
 > (e.g. something waiting on network I/O, etc.)
 >
 > However, it is my contention that you do not want to evict a process
 > that is scheduled to run (or is going to be) in favor of disk cache
 > because you're defeating yourself by doing so.  The point of the disk
 > cache is to avoid going to the physical disk for I/O, but if you page
 > something you have ditched a physical I/O for data in favor of having
 > to go to physical disk *twice* -- first to write the paged-out data to
 > swap, and then to retrieve it when it is to be executed.  This also
 > appears to be consistent with what is present for Solaris machines.
 >
 > From the Sun code:
 >
 > #ifdef sun
 >         /*
 >          * take 'desfree' extra pages, so we reclaim sooner, rather
 >          * than later
 >          */
 >         extra = desfree;
 >
 >         /*
 >          * check that we're out of range of the pageout scanner. It
 >          * starts to schedule paging if freemem is less than lotsfree
 >          * and needfree.  lotsfree is the high-water mark for pageout,
 >          * and needfree is the number of needed free pages.  We add
 >          * extra pages here to make sure the scanner doesn't start up
 >          * while we're freeing memory.
 >          */
 >         if (freemem < lotsfree + needfree + extra)
 >                 return (1);
 >
 >         /*
 >          * check to make sure that swapfs has enough space so that anon
 >          * reservations can still succeed. anon_resvmem() checks that
 >          * the availrmem is greater than swapfs_minfree, and the number
 >          * of reserved swap pages.  We also add a bit of extra here just
 >          * to prevent circumstances from getting really dire.
 >          */
 >         if (availrmem < swapfs_minfree + swapfs_reserve + extra)
 >                 return (1);
 >
 > "freemem" is not virtual memory, it's actual memory.  "Lotsfree" is
 > the point where the system considers free RAM to be "ample";
 > "needfree" is the "desperation" point and "extra" is the margin
 > (presumably for image activation.)
 >
 > The base code on FreeBSD doesn't look at physical memory at all; it
 > looks at kvm space instead.
 >
 >>   It is hard to correctly tune that system using a large hammer such
 >> as your
 >>   patch.  I believe that with your patch ARC will get shrunk to its
 >> minimum size
 >>   in due time.  Active + Inactive will grow to use the memory that
 >> you are denying
 >>   to ARC driving Free below a threshold, which will reduce ARC.
 >> Repeated enough
 >>   times this will drive ARC to its minimum.
 > I disagree both in design theory and based on the empirical evidence
 > of actual operation.
 >
 > First, I don't (ever) want to give memory to the ARC cache that
 > otherwise would go to "active", because any time I do that I'm going
 > to force two page events, which is double the amount of I/O I would
 > take on a cache *miss*, and even with the ARC at minimum I get a
 > reasonable hit percentage.  If I therefore prefer ARC over "active"
 > pages I am going to take *at least* a 200% penalty on physical I/O and
 > if I get an 80% hit ratio with the ARC at a minimum the penalty is
 > closer to 800%!
 >
 > For inactive pages it's a bit more complicated as those may not be
 > reactivated.  However, I am trusting FreeBSD's VM subsystem to demote
 > those that are unlikely to be reactivated to the cache bucket and then
 > to "free", where they are able to be re-used. This is consistent with
 > what I actually see on a running system -- the "inact" bucket is
 > typically fairly large (often on a busy machine close to that of
 > "active") but pages demoted to "cache" don't stay there long - they
 > either get re-promoted back up or they are freed and go on the free list.
 >
 > The only time I see "inact" get out of control is when there's a
 > kernel memory leak somewhere (such as what I ran into the other day
 > with the in-kernel NAT subsystem on 10-STABLE.)  But that's a bug and
 > if it happens you're going to get bit anyway.
 >
 > For example right now on one of my very busy systems with 24GB of
 > installed RAM and many terabytes of storage across three ZFS pools I'm
 > seeing 17GB wired of which 13.5 is ARC cache.  That's the adaptive
 > figure it currently is running at, with a maximum of 22.3 and a
 > minimum of 2.79 (8:1 ratio.)  The remainder is wired down for other
 > reasons (there's a fairly large Postgres server running on that box,
 > among other things, and it has a big shared buffer declaration --
 > that's most of the difference.)  Cache hit efficiency is currently 97.8%.
 >
 > Active is 2.26G right now, and inactive is 2.09G.  Both are stable.
 > Overnight inactive will drop to about 1.1GB while active will not
 > change all that much since most of it is postgres and the middleware
 > that talks to it along with apache, which leaves most of its processes
 > present even when they go idle.  Peak load times are about right now
 > (mid-day), and again when the system is running backups nightly.
 >
 > Cache is 7448, in other words, insignificant.  Free memory is 2.6G.
 >
 > The tunable is set to 10%, which is almost exactly what free memory
 > is.  I find that when the system gets under 1G free transient image
 > activation can drive it into paging and performance starts to suffer
 > for my particular workload.
 >
 >>     Also, there are a few technical problems with the patch:
 >>   - you don't need to use sysctl interface in kernel, the values you
 >> need are
 >>   available directly, just take a look at e.g. implementation of
 >> vm_paging_needed()
 > That's easily fixed.  I will look at it.
 >>   - similarly, querying vfs.zfs.arc_freepage_percent_target value via
 >>   kernel_sysctlbyname is just bogus; you can use percent_target directly
 > I did not know if during setup of the OID the value was copied (and
 > thus you had to reference it later on) or the entry simply took the
 > pointer and stashed that.  Easily corrected.
 >>   - you don't need to sum various page counters to get a total count,
 >> there is
 >>   v_page_count
 > Fair enough as well.
 >>   Lastly, can you try to test reverting your patch and instead setting
 >>   vm.lowmem_period=0 ?
 > Yes.  By default it's 10; I have not tampered with that default.
 >
 > Let me do a bit of work and I'll post back with a revised patch.
 > Perhaps a tunable for percentage free + a free reserve that is a
 > "floor"?  The problem with that is where to put the defaults.  One
 > option would be to grab total size at init time and compute something
 > similar to what "lotsfree" is for Solaris, allowing that to be tuned
 > with the percentage if desired.  I selected 25% because that's what
 > the original test was expressing and it should be reasonable for
 > modest RAM configurations.  It's clearly too high for moderately large
 > (or huge) memory machines unless they have a lot of RAM-hungry
 > processes running on them.
 >
 > The percentage test, however, is an easy knob to twist that is
 > unlikely to severely harm you if you dial it too far in either
 > direction; anyone setting it to zero obviously knows what they're
 > getting into, and if you crank it too high all you end up doing is
 > limiting the ARC to the minimum value.
 >
 
 Responsive to the criticisms, and in an attempt to better track what the
 VM system does, I offer this update to the patch.  The following changes
 have been made:
 
 1. There are now two tunables:
 vfs.zfs.arc_freepages -- the number of free pages below which we declare
 low memory and ask for ARC paring.
 vfs.zfs.arc_freepage_percent -- the additional free RAM to reserve in
 percent of total, if any (added to freepages)
 
 2. vfs.zfs.arc_freepages, if zero (as is the default at boot), defaults
 to "vm.stats.vm.v_free_target" less 20%.  This allows the system to get
 into the page-stealing paradigm before the ARC cache is invaded.  While
 I do not run into a situation of unbridled inact page growth here, the
 criticism that the original patch could allow this appears to be
 well-founded.  Setting the low memory alert here should prevent this, as
 the system will now allow the ARC to grow to the point that
 page-stealing takes place.
 
 3. The previous option to reserve either a hard amount of RAM or a
 percentage of RAM remains.
 
 4. The defaults should auto-tune for any particular RAM configuration to
 reasonable values that prevent stalls, yet if you have circumstances
 that argue for reserving more memory you may do so.
 
 Updated patch follows:
 
*** arc.c.original	Thu Mar 13 09:18:48 2014
--- arc.c	Wed Mar 19 07:44:01 2014
***************
*** 18,23 ****
--- 18,99 ----
    *
    * CDDL HEADER END
    */
+
+ /* Karl Denninger (karl@denninger.net), 3/18/2014, FreeBSD-specific
+  *
+  * If "NEWRECLAIM" is defined, change the "low memory" warning that causes
+  * the ARC cache to be pared down.  The reason for the change is that the
+  * apparent attempted algorithm is to start evicting ARC cache when free
+  * pages fall below 25% of installed RAM.  This maps reasonably well to how
+  * Solaris is documented to behave; when "lotsfree" is invaded ZFS is told
+  * to pare down.
+  *
+  * The problem is that on FreeBSD machines the system doesn't appear to be
+  * getting what the authors of the original code thought they were looking at
+  * with its test -- or at least not what Solaris did -- and as a result that
+  * test never triggers.  That leaves the only reclaim trigger as the "paging
+  * needed" status flag, and by the time that trips the system is already
+  * in low-memory trouble.  This can lead to severe pathological behavior
+  * under the following scenario:
+  * - The system starts to page and ARC is evicted.
+  * - The system stops paging as ARC's eviction drops wired RAM a bit.
+  * - ARC starts increasing its allocation again, and wired memory grows.
+  * - A new image is activated, and the system once again attempts to page.
+  * - ARC starts to be evicted again.
+  * - Back to #2
+  *
+  * Note that ZFS's ARC default (unless you override it in /boot/loader.conf)
+  * is to allow the ARC cache to grab nearly all of free RAM, provided nobody
+  * else needs it.  That would be ok if we evicted cache when required.
+  *
+  * Unfortunately the system can get into a state where it never
+  * manages to page anything of materiality back in, as if there is active
+  * I/O the ARC will start grabbing space once again as soon as the memory
+  * contention state drops.  For this reason the "paging is occurring" flag
+  * should be the **last resort** condition for ARC eviction; you want to
+  * (as Solaris does) start when there is material free RAM left BUT the
+  * vm system thinks it needs to be active to steal pages back in the attempt
+  * to never get into the condition where you're potentially paging off
+  * executables in favor of leaving disk cache allocated.
+  *
+  * To fix this we change how we look at low memory, declaring two new
+  * runtime tunables.
+  *
+  * The new sysctls are:
+  * vfs.zfs.arc_freepages (free pages required to call RAM "sufficient")
+  * vfs.zfs.arc_freepage_percent (additional reservation percentage, default 0)
+  *
+  * vfs.zfs.arc_freepages is initialized from vm.stats.vm.v_free_target,
+  * less 20% if we find that it is zero.  Note that vm.stats.vm.v_free_target
+  * is not initialized at boot -- the system has to be running first, so we
+  * cannot initialize this in arc_init.  So we check during runtime; this
+  * also allows the user to return to defaults by setting it to zero.
+  *
+  * This should insure that we allow the VM system to steal pages first,
+  * but pare the cache before we suspend processes attempting to get more
+  * memory, thereby avoiding "stalls."  You can set this higher if you wish,
+  * or force a specific percentage reservation as well, but doing so may
+  * cause the cache to pare back while the VM system remains willing to
+  * allow "inactive" pages to accumulate.  The challenge is that image
+  * activation can force things into the page space on a repeated basis
+  * if you allow this level to be too small (the above pathological
+  * behavior); the defaults should avoid that behavior but the sysctls
+  * are exposed should your workload require adjustment.
+  *
+  * If we're using this check for low memory we are replacing the previous
+  * ones, including the oddball "random" reclaim that appears to fire far
+  * more often than it should.  We still trigger if the system pages.
+  *
+  * If you turn on NEWRECLAIM_DEBUG then the kernel will print on the console
+  * status messages when the reclaim status trips on and off, along with the
+  * page count aggregate that triggered it (and the free space) for each
+  * event.
+  */
+
+ #define	NEWRECLAIM
+ #undef	NEWRECLAIM_DEBUG
+
+
   /*
    * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
    * Copyright (c) 2013 by Delphix. All rights reserved.
***************
*** 139,144 ****
--- 215,226 ----
  
   #include <vm/vm_pageout.h>
  
+ #ifdef	NEWRECLAIM
+ #ifdef	__FreeBSD__
+ #include <sys/sysctl.h>
+ #endif
+ #endif	/* NEWRECLAIM */
+
   #ifdef illumos
   #ifndef _KERNEL
   /* set with ZFS_DEBUG=watch, to enable watchpoints on frozen buffers */
***************
*** 203,218 ****
--- 285,320 ----
   int zfs_arc_shrink_shift = 0;
   int zfs_arc_p_min_shift = 0;
   int zfs_disable_dup_eviction = 0;
+ #ifdef	NEWRECLAIM
+ #ifdef  __FreeBSD__
+ static	int freepages = 0;	/* This much memory is considered critical */
+ static	int percent_target = 0;	/* Additionally reserve "X" percent free RAM */
+ #endif	/* __FreeBSD__ */
+ #endif	/* NEWRECLAIM */
  
   TUNABLE_QUAD("vfs.zfs.arc_max", &zfs_arc_max);
   TUNABLE_QUAD("vfs.zfs.arc_min", &zfs_arc_min);
   TUNABLE_QUAD("vfs.zfs.arc_meta_limit", &zfs_arc_meta_limit);
+ #ifdef	NEWRECLAIM
+ #ifdef  __FreeBSD__
+ TUNABLE_INT("vfs.zfs.arc_freepages", &freepages);
+ TUNABLE_INT("vfs.zfs.arc_freepage_percent", &percent_target);
+ #endif	/* __FreeBSD__ */
+ #endif	/* NEWRECLAIM */
+
   SYSCTL_DECL(_vfs_zfs);
   SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, arc_max, CTLFLAG_RDTUN, &zfs_arc_max, 0,
       "Maximum ARC size");
   SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, arc_min, CTLFLAG_RDTUN, &zfs_arc_min, 0,
       "Minimum ARC size");
  
+ #ifdef	NEWRECLAIM
+ #ifdef  __FreeBSD__
+ SYSCTL_INT(_vfs_zfs, OID_AUTO, arc_freepages, CTLFLAG_RWTUN, &freepages, 0, "ARC Free RAM Pages Required");
+ SYSCTL_INT(_vfs_zfs, OID_AUTO, arc_freepage_percent, CTLFLAG_RWTUN, &percent_target, 0, "ARC Free RAM Target percentage");
+ #endif	/* __FreeBSD__ */
+ #endif	/* NEWRECLAIM */
+
   /*
    * Note that buffers can be in one of 6 states:
    *	ARC_anon	- anonymous (discussed below)
***************
*** 2438,2443 ****
--- 2540,2557 ----
   {
  
   #ifdef _KERNEL
+ #ifdef	NEWRECLAIM
+ #ifdef  __FreeBSD__
+ 	u_int	vmfree = 0;
+ 	u_int	vmtotal = 0;
+ 	size_t	vmsize;
+ #ifdef	NEWRECLAIM_DEBUG
+ 	static	int	xval = -1;
+ 	static	int	oldpercent = 0;
+ 	static	int	oldfreepages = 0;
+ #endif	/* NEWRECLAIM_DEBUG */
+ #endif	/* __FreeBSD__ */
+ #endif	/* NEWRECLAIM */
  
   	if (needfree)
   		return (1);
***************
*** 2476,2481 ****
--- 2590,2596 ----
   		return (1);
  
   #if defined(__i386)
+
   	/*
   	 * If we're on an i386 platform, it's possible that we'll exhaust the
   	 * kernel heap space before we ever run out of available physical
***************
*** 2492,2502 ****
   		return (1);
   #endif
   #else	/* !sun */
   	if (kmem_used() > (kmem_size() * 3) / 4)
   		return (1);
   #endif	/* sun */
  
- #else
   	if (spa_get_random(100) == 0)
   		return (1);
   #endif
--- 2607,2680 ----
   		return (1);
   #endif
   #else	/* !sun */
+
+ #ifdef	NEWRECLAIM
+ #ifdef  __FreeBSD__
+ /*
+  * Implement the new tunable free RAM algorithm.  We check the free pages
+  * against the minimum specified target and the percentage that should be
+  * free.  If we're low we ask for ARC cache shrinkage.  If this is defined
+  * on a FreeBSD system the older checks are not performed.
+  *
+  * Check first to see if we need to init freepages, then test.
+  */
+ 	if (!freepages) {		/* If zero then (re)init */
+ 		vmsize = sizeof(vmtotal);
+ 		kernel_sysctlbyname(curthread, "vm.stats.vm.v_free_target", &vmtotal, &vmsize, NULL, 0, NULL, 0);
+ 		freepages = vmtotal - (vmtotal / 5);
+ #ifdef	NEWRECLAIM_DEBUG
+ 		printf("ZFS ARC: Default vfs.zfs.arc_freepages to [%u] [%u less 20%%]\n", freepages, vmtotal);
+ #endif	/* NEWRECLAIM_DEBUG */
+ 	}
+
+ 	vmsize = sizeof(vmtotal);
+ 	kernel_sysctlbyname(curthread, "vm.stats.vm.v_page_count", &vmtotal, &vmsize, NULL, 0, NULL, 0);
+ 	vmsize = sizeof(vmfree);
+ 	kernel_sysctlbyname(curthread, "vm.stats.vm.v_free_count", &vmfree, &vmsize, NULL, 0, NULL, 0);
+ #ifdef	NEWRECLAIM_DEBUG
+ 	if (percent_target != oldpercent) {
+ 		printf("ZFS ARC: Reservation percent change to [%d], [%d] pages, [%d] free\n", percent_target, vmtotal, vmfree);
+ 		oldpercent = percent_target;
+ 	}
+ 	if (freepages != oldfreepages) {
+ 		printf("ZFS ARC: Low RAM page change to [%d], [%d] pages, [%d] free\n", freepages, vmtotal, vmfree);
+ 		oldfreepages = freepages;
+ 	}
+ #endif	/* NEWRECLAIM_DEBUG */
+ 	if (!vmtotal) {
+ 		vmtotal = 1;	/* Protect against divide by zero */
+ 				/* (should be impossible, but...) */
+ 	}
+ /*
+  * Now figure out how much free RAM we require to call the ARC cache status
+  * "ok".  Add the percentage specified of the total to the base requirement.
+  */
+
+ 	if (vmfree < freepages + ((vmtotal / 100) * percent_target)) {
+ #ifdef	NEWRECLAIM_DEBUG
+ 		if (xval != 1) {
+ 			printf("ZFS ARC: RECLAIM total %u, free %u, free pct (%u), reserved (%u), target pct (%u)\n", vmtotal, vmfree, ((vmfree * 100) / vmtotal), freepages, percent_target);
+ 			xval = 1;
+ 		}
+ #endif	/* NEWRECLAIM_DEBUG */
+ 		return (1);
+ 	} else {
+ #ifdef	NEWRECLAIM_DEBUG
+ 		if (xval != 0) {
+ 			printf("ZFS ARC: NORMAL total %u, free %u, free pct (%u), reserved (%u), target pct (%u)\n", vmtotal, vmfree, ((vmfree * 100) / vmtotal), freepages, percent_target);
+ 			xval = 0;
+ 		}
+ #endif	/* NEWRECLAIM_DEBUG */
+ 		return (0);
+ 	}
+
+ #endif	/* __FreeBSD__ */
+ #endif	/* NEWRECLAIM */
+
   	if (kmem_used() > (kmem_size() * 3) / 4)
   		return (1);
   #endif	/* sun */
  
   	if (spa_get_random(100) == 0)
   		return (1);
   #endif
 
 
-- 
-- Karl
karl@denninger.net
 
 
 
 
 

From: Karl Denninger <karl@denninger.net>
To: bug-followup@FreeBSD.org, karl@fs.denninger.net
Cc:  
Subject: Re: kern/187594: [zfs] [patch] ZFS ARC behavior problem and fix
Date: Wed, 19 Mar 2014 13:03:30 -0500

The 20% invasion of the first-level paging regime looks too aggressive
under very heavy load.  I have changed my system here to 10% and obtain
a better response profile.

At 20% the system will still occasionally page recently-used executable
code to disk before cache is released, which is undesirable.  10% looks
better but may STILL be too aggressive (in other words, 5% might be
"just right").
 
 Being able to tune this in real time is a BIG help!
 
Adjusted patch follows (only a couple of lines have changed):
 
*** arc.c.original	Thu Mar 13 09:18:48 2014
--- arc.c	Wed Mar 19 13:01:48 2014
***************
*** 18,23 ****
--- 18,99 ----
    *
    * CDDL HEADER END
    */
+
+ /* Karl Denninger (karl@denninger.net), 3/18/2014, FreeBSD-specific
+  *
+  * If "NEWRECLAIM" is defined, change the "low memory" warning that causes
+  * the ARC cache to be pared down.  The reason for the change is that the
+  * apparent attempted algorithm is to start evicting ARC cache when free
+  * pages fall below 25% of installed RAM.  This maps reasonably well to how
+  * Solaris is documented to behave; when "lotsfree" is invaded ZFS is told
+  * to pare down.
+  *
+  * The problem is that on FreeBSD machines the system doesn't appear to be
+  * getting what the authors of the original code thought they were looking at
+  * with its test -- or at least not what Solaris did -- and as a result that
+  * test never triggers.  That leaves the only reclaim trigger as the "paging
+  * needed" status flag, and by the time that trips the system is already
+  * in low-memory trouble.  This can lead to severe pathological behavior
+  * under the following scenario:
+  * - The system starts to page and ARC is evicted.
+  * - The system stops paging as ARC's eviction drops wired RAM a bit.
+  * - ARC starts increasing its allocation again, and wired memory grows.
+  * - A new image is activated, and the system once again attempts to page.
+  * - ARC starts to be evicted again.
+  * - Back to #2
+  *
+  * Note that ZFS's ARC default (unless you override it in /boot/loader.conf)
+  * is to allow the ARC cache to grab nearly all of free RAM, provided nobody
+  * else needs it.  That would be ok if we evicted cache when required.
+  *
+  * Unfortunately the system can get into a state where it never
+  * manages to page anything of materiality back in, as if there is active
+  * I/O the ARC will start grabbing space once again as soon as the memory
+  * contention state drops.  For this reason the "paging is occurring" flag
+  * should be the **last resort** condition for ARC eviction; you want to
+  * (as Solaris does) start when there is material free RAM left BUT the
+  * vm system thinks it needs to be active to steal pages back in the attempt
+  * to never get into the condition where you're potentially paging off
+  * executables in favor of leaving disk cache allocated.
+  *
+  * To fix this we change how we look at low memory, declaring two new
+  * runtime tunables.
+  *
+  * The new sysctls are:
+  * vfs.zfs.arc_freepages (free pages required to call RAM "sufficient")
+  * vfs.zfs.arc_freepage_percent (additional reservation percentage, default 0)
+  *
+  * vfs.zfs.arc_freepages is initialized from vm.stats.vm.v_free_target,
+  * less 10% if we find that it is zero.  Note that vm.stats.vm.v_free_target
+  * is not initialized at boot -- the system has to be running first, so we
+  * cannot initialize this in arc_init.  So we check during runtime; this
+  * also allows the user to return to defaults by setting it to zero.
+  *
+  * This should insure that we allow the VM system to steal pages first,
+  * but pare the cache before we suspend processes attempting to get more
+  * memory, thereby avoiding "stalls."  You can set this higher if you wish,
+  * or force a specific percentage reservation as well, but doing so may
+  * cause the cache to pare back while the VM system remains willing to
+  * allow "inactive" pages to accumulate.  The challenge is that image
+  * activation can force things into the page space on a repeated basis
+  * if you allow this level to be too small (the above pathological
+  * behavior); the defaults should avoid that behavior but the sysctls
+  * are exposed should your workload require adjustment.
+  *
+  * If we're using this check for low memory we are replacing the previous
+  * ones, including the oddball "random" reclaim that appears to fire far
+  * more often than it should.  We still trigger if the system pages.
+  *
+  * If you turn on NEWRECLAIM_DEBUG then the kernel will print on the console
+  * status messages when the reclaim status trips on and off, along with the
+  * page count aggregate that triggered it (and the free space) for each
+  * event.
+  */
+
+ #define	NEWRECLAIM
+ #undef	NEWRECLAIM_DEBUG
+
+
   /*
    * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
    * Copyright (c) 2013 by Delphix. All rights reserved.
***************
*** 139,144 ****
--- 215,226 ----
  
   #include <vm/vm_pageout.h>
  
+ #ifdef	NEWRECLAIM
+ #ifdef	__FreeBSD__
+ #include <sys/sysctl.h>
+ #endif
+ #endif	/* NEWRECLAIM */
+
   #ifdef illumos
   #ifndef _KERNEL
   /* set with ZFS_DEBUG=watch, to enable watchpoints on frozen buffers */
***************
*** 203,218 ****
--- 285,320 ----
   int zfs_arc_shrink_shift = 0;
   int zfs_arc_p_min_shift = 0;
   int zfs_disable_dup_eviction = 0;
+ #ifdef	NEWRECLAIM
+ #ifdef  __FreeBSD__
+ static	int freepages = 0;	/* This much memory is considered critical */
+ static	int percent_target = 0;	/* Additionally reserve "X" percent free RAM */
+ #endif	/* __FreeBSD__ */
+ #endif	/* NEWRECLAIM */
  
   TUNABLE_QUAD("vfs.zfs.arc_max", &zfs_arc_max);
   TUNABLE_QUAD("vfs.zfs.arc_min", &zfs_arc_min);
   TUNABLE_QUAD("vfs.zfs.arc_meta_limit", &zfs_arc_meta_limit);
+ #ifdef	NEWRECLAIM
+ #ifdef  __FreeBSD__
+ TUNABLE_INT("vfs.zfs.arc_freepages", &freepages);
+ TUNABLE_INT("vfs.zfs.arc_freepage_percent", &percent_target);
+ #endif	/* __FreeBSD__ */
+ #endif	/* NEWRECLAIM */
+
   SYSCTL_DECL(_vfs_zfs);
   SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, arc_max, CTLFLAG_RDTUN, &zfs_arc_max, 0,
       "Maximum ARC size");
   SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, arc_min, CTLFLAG_RDTUN, &zfs_arc_min, 0,
       "Minimum ARC size");
  
+ #ifdef	NEWRECLAIM
+ #ifdef  __FreeBSD__
+ SYSCTL_INT(_vfs_zfs, OID_AUTO, arc_freepages, CTLFLAG_RWTUN, &freepages, 0, "ARC Free RAM Pages Required");
+ SYSCTL_INT(_vfs_zfs, OID_AUTO, arc_freepage_percent, CTLFLAG_RWTUN, &percent_target, 0, "ARC Free RAM Target percentage");
+ #endif	/* __FreeBSD__ */
+ #endif	/* NEWRECLAIM */
+
   /*
    * Note that buffers can be in one of 6 states:
    *	ARC_anon	- anonymous (discussed below)
***************
*** 2438,2443 ****
--- 2540,2557 ----
   {
  
   #ifdef _KERNEL
+ #ifdef	NEWRECLAIM
+ #ifdef  __FreeBSD__
+ 	u_int	vmfree = 0;
+ 	u_int	vmtotal = 0;
+ 	size_t	vmsize;
+ #ifdef	NEWRECLAIM_DEBUG
+ 	static	int	xval = -1;
+ 	static	int	oldpercent = 0;
+ 	static	int	oldfreepages = 0;
+ #endif	/* NEWRECLAIM_DEBUG */
+ #endif	/* __FreeBSD__ */
+ #endif	/* NEWRECLAIM */
  
   	if (needfree)
   		return (1);
***************
*** 2476,2481 ****
--- 2590,2596 ----
   		return (1);
  
   #if defined(__i386)
+
   	/*
   	 * If we're on an i386 platform, it's possible that we'll exhaust the
   	 * kernel heap space before we ever run out of available physical
***************
*** 2492,2502 ****
   		return (1);
   #endif
   #else	/* !sun */
   	if (kmem_used() > (kmem_size() * 3) / 4)
   		return (1);
   #endif	/* sun */
  
- #else
   	if (spa_get_random(100) == 0)
   		return (1);
   #endif
--- 2607,2680 ----
   		return (1);
   #endif
   #else	/* !sun */
+
+ #ifdef	NEWRECLAIM
+ #ifdef  __FreeBSD__
+ /*
+  * Implement the new tunable free RAM algorithm.  We check the free pages
+  * against the minimum specified target and the percentage that should be
+  * free.  If we're low we ask for ARC cache shrinkage.  If this is defined
+  * on a FreeBSD system the older checks are not performed.
+  *
+  * Check first to see if we need to init freepages, then test.
+  */
+ 	if (!freepages) {		/* If zero then (re)init */
+ 		vmsize = sizeof(vmtotal);
+ 		kernel_sysctlbyname(curthread, "vm.stats.vm.v_free_target", &vmtotal, &vmsize, NULL, 0, NULL, 0);
+ 		freepages = vmtotal - (vmtotal / 10);
+ #ifdef	NEWRECLAIM_DEBUG
+ 		printf("ZFS ARC: Default vfs.zfs.arc_freepages to [%u] [%u less 10%%]\n", freepages, vmtotal);
+ #endif	/* NEWRECLAIM_DEBUG */
+ 	}
+
+ 	vmsize = sizeof(vmtotal);
+ 	kernel_sysctlbyname(curthread, "vm.stats.vm.v_page_count", &vmtotal, &vmsize, NULL, 0, NULL, 0);
+ 	vmsize = sizeof(vmfree);
+ 	kernel_sysctlbyname(curthread, "vm.stats.vm.v_free_count", &vmfree, &vmsize, NULL, 0, NULL, 0);
+ #ifdef	NEWRECLAIM_DEBUG
+ 	if (percent_target != oldpercent) {
+ 		printf("ZFS ARC: Reservation percent change to [%d], [%d] pages, [%d] free\n", percent_target, vmtotal, vmfree);
+ 		oldpercent = percent_target;
+ 	}
+ 	if (freepages != oldfreepages) {
+ 		printf("ZFS ARC: Low RAM page change to [%d], [%d] pages, [%d] free\n", freepages, vmtotal, vmfree);
+ 		oldfreepages = freepages;
+ 	}
+ #endif	/* NEWRECLAIM_DEBUG */
+ 	if (!vmtotal) {
+ 		vmtotal = 1;	/* Protect against divide by zero */
+ 				/* (should be impossible, but...) */
+ 	}
+ /*
+  * Now figure out how much free RAM we require to call the ARC cache status
+  * "ok".  Add the percentage specified of the total to the base requirement.
+  */
+
+ 	if (vmfree < freepages + ((vmtotal / 100) * percent_target)) {
+ #ifdef	NEWRECLAIM_DEBUG
+ 		if (xval != 1) {
+ 			printf("ZFS ARC: RECLAIM total %u, free %u, free pct (%u), reserved (%u), target pct (%u)\n", vmtotal, vmfree, ((vmfree * 100) / vmtotal), freepages, percent_target);
+ 			xval = 1;
+ 		}
+ #endif	/* NEWRECLAIM_DEBUG */
+ 		return (1);
+ 	} else {
+ #ifdef	NEWRECLAIM_DEBUG
+ 		if (xval != 0) {
+ 			printf("ZFS ARC: NORMAL total %u, free %u, free pct (%u), reserved (%u), target pct (%u)\n", vmtotal, vmfree, ((vmfree * 100) / vmtotal), freepages, percent_target);
+ 			xval = 0;
+ 		}
+ #endif	/* NEWRECLAIM_DEBUG */
+ 		return (0);
+ 	}
+
+ #endif	/* __FreeBSD__ */
+ #endif	/* NEWRECLAIM */
+
   	if (kmem_used() > (kmem_size() * 3) / 4)
   		return (1);
   #endif	/* sun */
  
   	if (spa_get_random(100) == 0)
   		return (1);
   #endif
 
 
-- 
-- Karl
karl@denninger.net
 
 
 
 
 

From: Karl Denninger <karl@denninger.net>
To: bug-followup@FreeBSD.org, karl@fs.denninger.net
Cc:  
Subject: Re: kern/187594: [zfs] [patch] ZFS ARC behavior problem and fix
Date: Thu, 20 Mar 2014 07:26:39 -0500

 
I am increasingly convinced, with increasing runtime now on both
synthetic and real loads in production, that the proper default value
for vfs.zfs.arc_freepages is vm.stats.vm.v_free_target less "just a
bit."  Five percent appears to be ok for most workloads with RAM
configurations ranging from 4GB to the 24GB area (configurations that I
can easily test under both synthetic and real environments).

Larger invasions of the free target increasingly risk provoking the
behavior that prompted me to get involved in working on this part of
the code in the first place, including short-term (~5-10 second)
"stalls" during which the system appears to be locked up, but is not.

It appears that the key to avoiding that behavior is to not allow the
ARC to continue to take RAM when a material invasion of that target
space has occurred.
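For anyone wanting to experiment without rebuilding the kernel default, the suggested "v_free_target less five percent" floor can be applied by hand.  This is a configuration sketch using the standard sysctl(8) utility; the arithmetic mirrors the patch, but the exact value on any given machine depends on its v_free_target:

```shell
# Read the VM system's free-page target, derive a floor five percent
# below it, and set it as the ARC low-memory threshold.
target=$(sysctl -n vm.stats.vm.v_free_target)
floor=$((target - target / 20))
sysctl vfs.zfs.arc_freepages="$floor"
# Setting the tunable back to zero lets the in-kernel default re-derive
# itself on the next reclaim check:
#   sysctl vfs.zfs.arc_freepages=0
```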
 
-- 
-- Karl
karl@denninger.net
 
 
 

From: Andriy Gapon <avg@FreeBSD.org>
To: bug-followup@FreeBSD.org, karl@fs.denninger.net
Cc:  
Subject: Re: kern/187594: [zfs] [patch] ZFS ARC behavior problem and fix
Date: Thu, 20 Mar 2014 16:56:18 +0200

 I think that you are gradually approaching a correct solution to the problem,
 but from quite a different angle compared with how I would approach it.
 
 In fact, I think that it was this commit
 http://svnweb.freebsd.org/changeset/base/254304 that broke the balance between
 the page cache and the ZFS ARC.
 
 On the technical side, I see that you are still using kernel_sysctlbyname in
 your patches.  As I've said before, this is not needed and in a certain sense
 incorrect.
 
 -- 
 Andriy Gapon

From: Karl Denninger <karl@denninger.net>
To: bug-followup@FreeBSD.org, karl@fs.denninger.net
Cc:  
Subject: Re: kern/187594: [zfs] [patch] ZFS ARC behavior problem and fix
Date: Thu, 20 Mar 2014 12:00:54 -0500

 Responsive to avg's comment, and with another overnight and daytime load
 of testing on multiple machines with varying memory configs from 4-24GB
 of RAM, here is another version of the patch.
 
 The differences are:
 
 1. No longer use kernel_sysctlbyname; include the VM header file and get
 the values directly (less overhead.)  Remove the variables no longer
 needed.
 
 2. Set the default free RAM level for ARC shrinkage to v_free_target
 less 3%.  I was able to provoke a stall once with it set to a 5%
 reservation, was able to provoke it with the parameter set to 10% with a
 lot of work, and was able to do so "on demand" with it set to 20%.  With
 a 5% invasion, initiating a scrub with very heavy I/O and image load
 (hundreds of web and database processes) provoked a ~10 second system
 stall.  With it set to 3% I have not been able to reproduce the stall,
 yet the inactive page count remains stable even under extremely heavy
 load, indicating that page-stealing remains effective when required.
 Note that for my workload, even with this level set above v_free_target,
 which would imply no page stealing by the VM system before ARC expansion
 is halted, I do not get unbridled inactive page growth.
 
 As before, vfs.zfs.arc_freepages and vfs.zfs.arc_freepage_percent remain
 accessible knobs if you wish to twist them for some reason to
 compensate for an unusual load profile or machine configuration.
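 The decision the patch makes can be sketched in user space as follows.
 This is a hedged illustration with made-up page counts; `arc_reclaim_needed`
 here is a hypothetical stand-in for the kernel-side check, not the actual
 function:

 ```c
 #include <assert.h>
 #include <stdio.h>

 /* Ask for ARC shrinkage when free pages fall below arc_freepages plus an
  * optional percentage of total pages (the vfs.zfs.arc_freepage_percent
  * reservation).  All counts are example values, not real vmmeter reads. */
 static int
 arc_reclaim_needed(unsigned v_free_count, unsigned v_page_count,
     unsigned freepages, unsigned percent_target)
 {
 	return (v_free_count <
 	    freepages + ((v_page_count / 100) * percent_target));
 }

 int
 main(void)
 {
 	unsigned total = 2000000;	/* e.g. ~8GB of 4KB pages */
 	unsigned freepages = 54000;	/* e.g. v_free_target less 3% */

 	/* Plenty of free RAM: no shrink requested. */
 	assert(arc_reclaim_needed(200000, total, freepages, 0) == 0);
 	/* Free RAM under the floor: shrink requested. */
 	assert(arc_reclaim_needed(50000, total, freepages, 0) == 1);
 	/* A 5% extra reservation raises the bar by total/100*5 pages. */
 	assert(arc_reclaim_needed(120000, total, freepages, 5) == 1);
 	printf("ok\n");
 	return (0);
 }
 ```

 Note that the percentage term uses integer division of the total page
 count, so very small percent settings round in whole-page-percent steps.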
 
 *** arc.c.original	Thu Mar 13 09:18:48 2014
 --- arc.c	Thu Mar 20 11:51:48 2014
 ***************
 *** 18,23 ****
 --- 18,94 ----
     *
     * CDDL HEADER END
     */
 +
 + /* Karl Denninger (karl@denninger.net), 3/20/2014, FreeBSD-specific
 +  *
 +  * If "NEWRECLAIM" is defined, change the "low memory" warning that causes
 +  * the ARC cache to be pared down.  The reason for the change is that the
 +  * apparent attempted algorithm is to start evicting ARC cache when free
 +  * pages fall below 25% of installed RAM.  This maps reasonably well to how
 +  * Solaris is documented to behave; when "lotsfree" is invaded ZFS is told
 +  * to pare down.
 +  *
 +  * The problem is that on FreeBSD machines the system doesn't appear to be
 +  * getting what the authors of the original code thought they were looking at
 +  * with its test -- or at least not what Solaris did -- and as a result that
 +  * test never triggers.  That leaves the only reclaim trigger as the "paging
 +  * needed" status flag, and by the time that trips the system is already
 +  * in low-memory trouble.  This can lead to severe pathological behavior
 +  * under the following scenario:
 +  * - The system starts to page and ARC is evicted.
 +  * - The system stops paging as ARC's eviction drops wired RAM a bit.
 +  * - ARC starts increasing its allocation again, and wired memory grows.
 +  * - A new image is activated, and the system once again attempts to page.
 +  * - ARC starts to be evicted again.
 +  * - Back to #2
 +  *
 +  * Note that ZFS's ARC default (unless you override it in /boot/loader.conf)
 +  * is to allow the ARC cache to grab nearly all of free RAM, provided nobody
 +  * else needs it.  That would be ok if we evicted cache when required.
 +  *
 +  * Unfortunately the system can get into a state where it never
 +  * manages to page anything of materiality back in, as if there is active
 +  * I/O the ARC will start grabbing space once again as soon as the memory
 +  * contention state drops.  For this reason the "paging is occurring" flag
 +  * should be the **last resort** condition for ARC eviction; you want to
 +  * (as Solaris does) start when there is material free RAM left BUT the
 +  * vm system thinks it needs to be active to steal pages back in the attempt
 +  * to never get into the condition where you're potentially paging off
 +  * executables in favor of leaving disk cache allocated.
 +  *
 +  * To fix this we change how we look at low memory, declaring two new
 +  * runtime tunables.
 +  *
 +  * The new sysctls are:
 +  * vfs.zfs.arc_freepages (free pages required to call RAM "sufficient")
 +  * vfs.zfs.arc_freepage_percent (additional reservation percentage, default 0)
 +  *
 +  * vfs.zfs.arc_freepages is initialized from vm.v_free_target, less 3%.
 +  * This should insure that we allow the VM system to steal pages first,
 +  * but pare the cache before we suspend processes attempting to get more
 +  * memory, thereby avoiding "stalls."  You can set this higher if you wish,
 +  * or force a specific percentage reservation as well, but doing so may
 +  * cause the cache to pare back while the VM system remains willing to
 +  * allow "inactive" pages to accumulate.  The challenge is that image
 +  * activation can force things into the page space on a repeated basis
 +  * if you allow this level to be too small (the above pathological
 +  * behavior); the defaults should avoid that behavior but the sysctls
 +  * are exposed should your workload require adjustment.
 +  *
 +  * If we're using this check for low memory we are replacing the previous
 +  * ones, including the oddball "random" reclaim that appears to fire far
 +  * more often than it should.  We still trigger if the system pages.
 +  *
 +  * If you turn on NEWRECLAIM_DEBUG then the kernel will print on the console
 +  * status messages when the reclaim status trips on and off, along with the
 +  * page count aggregate that triggered it (and the free space) for each
 +  * event.
 +  */
 +
 + #define	NEWRECLAIM
 + #undef	NEWRECLAIM_DEBUG
 +
 +
    /*
     * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
     * Copyright (c) 2013 by Delphix. All rights reserved.
 ***************
 *** 139,144 ****
 --- 210,222 ----
   
    #include <vm/vm_pageout.h>
   
 + #ifdef	NEWRECLAIM
 + #ifdef	__FreeBSD__
 + #include <sys/sysctl.h>
 + #include <sys/vmmeter.h>
 + #endif
 + #endif	/* NEWRECLAIM */
 +
    #ifdef illumos
    #ifndef _KERNEL
    /* set with ZFS_DEBUG=watch, to enable watchpoints on frozen buffers */
 ***************
 *** 203,218 ****
 --- 281,316 ----
    int zfs_arc_shrink_shift = 0;
    int zfs_arc_p_min_shift = 0;
    int zfs_disable_dup_eviction = 0;
 + #ifdef	NEWRECLAIM
 + #ifdef  __FreeBSD__
 + static	int freepages = 0;	/* This much memory is considered critical */
 + static	int percent_target = 0;	/* Additionally reserve "X" percent free RAM */
 + #endif	/* __FreeBSD__ */
 + #endif	/* NEWRECLAIM */
   
    TUNABLE_QUAD("vfs.zfs.arc_max", &zfs_arc_max);
    TUNABLE_QUAD("vfs.zfs.arc_min", &zfs_arc_min);
    TUNABLE_QUAD("vfs.zfs.arc_meta_limit", &zfs_arc_meta_limit);
 + #ifdef	NEWRECLAIM
 + #ifdef  __FreeBSD__
 + TUNABLE_INT("vfs.zfs.arc_freepages", &freepages);
 + TUNABLE_INT("vfs.zfs.arc_freepage_percent", &percent_target);
 + #endif	/* __FreeBSD__ */
 + #endif	/* NEWRECLAIM */
 +
    SYSCTL_DECL(_vfs_zfs);
    SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, arc_max, CTLFLAG_RDTUN, &zfs_arc_max, 0,
        "Maximum ARC size");
    SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, arc_min, CTLFLAG_RDTUN, &zfs_arc_min, 0,
        "Minimum ARC size");
   
 + #ifdef	NEWRECLAIM
 + #ifdef  __FreeBSD__
 + SYSCTL_INT(_vfs_zfs, OID_AUTO, arc_freepages, CTLFLAG_RWTUN, &freepages, 0, "ARC Free RAM Pages Required");
 + SYSCTL_INT(_vfs_zfs, OID_AUTO, arc_freepage_percent, CTLFLAG_RWTUN, &percent_target, 0, "ARC Free RAM Target percentage");
 + #endif	/* __FreeBSD__ */
 + #endif	/* NEWRECLAIM */
 +
    /*
     * Note that buffers can be in one of 6 states:
     *	ARC_anon	- anonymous (discussed below)
 ***************
 *** 2438,2443 ****
 --- 2536,2546 ----
    {
   
    #ifdef _KERNEL
 + #ifdef	NEWRECLAIM_DEBUG
 + 	static	int	xval = -1;
 + 	static	int	oldpercent = 0;
 + 	static	int	oldfreepages = 0;
 + #endif	/* NEWRECLAIM_DEBUG */
   
    	if (needfree)
    		return (1);
 ***************
 *** 2476,2481 ****
 --- 2579,2585 ----
    		return (1);
   
    #if defined(__i386)
 +
    	/*
    	 * If we're on an i386 platform, it's possible that we'll exhaust the
    	 * kernel heap space before we ever run out of available physical
 ***************
 *** 2492,2502 ****
    		return (1);
    #endif
    #else	/* !sun */
    	if (kmem_used() > (kmem_size() * 3) / 4)
    		return (1);
    #endif	/* sun */
   
 - #else
    	if (spa_get_random(100) == 0)
    		return (1);
    #endif
 --- 2596,2658 ----
    		return (1);
    #endif
    #else	/* !sun */
 +
 + #ifdef	NEWRECLAIM
 + #ifdef  __FreeBSD__
 + /*
 +  * Implement the new tunable free RAM algorithm.  We check the free pages
 +  * against the minimum specified target and the percentage that should be
 +  * free.  If we're low we ask for ARC cache shrinkage.  If this is defined
 +  * on a FreeBSD system the older checks are not performed.
 +  *
 +  * Check first to see if we need to init freepages, then test.
 +  */
 + 	if (!freepages) {		/* If zero then (re)init */
 + 		freepages = cnt.v_free_target - (cnt.v_free_target / 33);
 + #ifdef	NEWRECLAIM_DEBUG
 + 		printf("ZFS ARC: Default vfs.zfs.arc_freepages to [%u] [%u less 3%%]\n", freepages, cnt.v_free_target);
 + #endif	/* NEWRECLAIM_DEBUG */
 + 	}
 + #ifdef	NEWRECLAIM_DEBUG
 + 	if (percent_target != oldpercent) {
 + 		printf("ZFS ARC: Reservation percent change to [%d], [%d] pages, [%d] free\n", percent_target, cnt.v_page_count, cnt.v_free_count);
 + 		oldpercent = percent_target;
 + 	}
 + 	if (freepages != oldfreepages) {
 + 		printf("ZFS ARC: Low RAM page change to [%d], [%d] pages, [%d] free\n", freepages, cnt.v_page_count, cnt.v_free_count);
 + 		oldfreepages = freepages;
 + 	}
 + #endif	/* NEWRECLAIM_DEBUG */
 + /*
 +  * Now figure out how much free RAM we require to call the ARC cache status
 +  * "ok".  Add the percentage specified of the total to the base requirement.
 +  */
 +
 + 	if (cnt.v_free_count < freepages + ((cnt.v_page_count / 100) * percent_target)) {
 + #ifdef	NEWRECLAIM_DEBUG
 + 		if (xval != 1) {
 + 			printf("ZFS ARC: RECLAIM total %u, free %u, free pct (%u), reserved (%u), target pct (%u)\n", cnt.v_page_count, cnt.v_free_count, ((cnt.v_free_count * 100) / cnt.v_page_count), freepages, percent_target);
 + 			xval = 1;
 + 		}
 + #endif	/* NEWRECLAIM_DEBUG */
 + 		return(1);
 + 	} else {
 + #ifdef	NEWRECLAIM_DEBUG
 + 		if (xval != 0) {
 + 			printf("ZFS ARC: NORMAL total %u, free %u, free pct (%u), reserved (%u), target pct (%u)\n", cnt.v_page_count, cnt.v_free_count, ((cnt.v_free_count * 100) / cnt.v_page_count), freepages, percent_target);
 + 			xval = 0;
 + 		}
 + #endif	/* NEWRECLAIM_DEBUG */
 + 		return(0);
 + 	}
 +
 + #endif	/* __FreeBSD__ */
 + #endif	/* NEWRECLAIM */
 +
    	if (kmem_used() > (kmem_size() * 3) / 4)
    		return (1);
    #endif	/* sun */
   
    	if (spa_get_random(100) == 0)
    		return (1);
    #endif
 
 
 -- 
 -- Karl
 karl@denninger.net
 
 
 

From: Karl Denninger <karl@denninger.net>
To: bug-followup@FreeBSD.org, karl@fs.denninger.net
Cc:  
Subject: Re: kern/187594: [zfs] [patch] ZFS ARC behavior problem and fix
Date: Mon, 24 Mar 2014 06:41:16 -0500

 Update:
 
 1. Patch is still good against the latest arc.c change (associated with
 new flags on the pool).
 2. Change the default low memory warning level for the ARC to
 cnt.v_free_target; no margin.  This appears to provide the best
 performance and does not cause problems with inactive pages or other
 misbehavior on my test systems.
 3. Expose the return flag (arc_shrink_needed) so if you care to watch it
 for some reason, you can.
 *** arc.c.original	Sun Mar 23 14:56:01 2014
 --- arc.c	Sun Mar 23 15:12:15 2014
 ***************
 *** 18,23 ****
 --- 18,95 ----
     *
     * CDDL HEADER END
     */
 +
 + /* Karl Denninger (karl@denninger.net), 3/20/2014, FreeBSD-specific
 +  *
 +  * If "NEWRECLAIM" is defined, change the "low memory" warning that causes
 +  * the ARC cache to be pared down.  The reason for the change is that the
 +  * apparent attempted algorithm is to start evicting ARC cache when free
 +  * pages fall below 25% of installed RAM.  This maps reasonably well to how
 +  * Solaris is documented to behave; when "lotsfree" is invaded ZFS is told
 +  * to pare down.
 +  *
 +  * The problem is that on FreeBSD machines the system doesn't appear to be
 +  * getting what the authors of the original code thought they were looking at
 +  * with its test -- or at least not what Solaris did -- and as a result that
 +  * test never triggers.  That leaves the only reclaim trigger as the "paging
 +  * needed" status flag, and by the time that trips the system is already
 +  * in low-memory trouble.  This can lead to severe pathological behavior
 +  * under the following scenario:
 +  * - The system starts to page and ARC is evicted.
 +  * - The system stops paging as ARC's eviction drops wired RAM a bit.
 +  * - ARC starts increasing its allocation again, and wired memory grows.
 +  * - A new image is activated, and the system once again attempts to page.
 +  * - ARC starts to be evicted again.
 +  * - Back to #2
 +  *
 +  * Note that ZFS's ARC default (unless you override it in /boot/loader.conf)
 +  * is to allow the ARC cache to grab nearly all of free RAM, provided nobody
 +  * else needs it.  That would be ok if we evicted cache when required.
 +  *
 +  * Unfortunately the system can get into a state where it never
 +  * manages to page anything of materiality back in, as if there is active
 +  * I/O the ARC will start grabbing space once again as soon as the memory
 +  * contention state drops.  For this reason the "paging is occurring" flag
 +  * should be the **last resort** condition for ARC eviction; you want to
 +  * (as Solaris does) start when there is material free RAM left BUT the
 +  * vm system thinks it needs to be active to steal pages back in the attempt
 +  * to never get into the condition where you're potentially paging off
 +  * executables in favor of leaving disk cache allocated.
 +  *
 +  * To fix this we change how we look at low memory, declaring two new
 +  * runtime tunables and one status.
 +  *
 +  * The new sysctls are:
 +  * vfs.zfs.arc_freepages (free pages required to call RAM "sufficient")
 +  * vfs.zfs.arc_freepage_percent (additional reservation percentage, default 0)
 +  * vfs.zfs.arc_shrink_needed (shows "1" if we're asking for shrinking the ARC)
 +  *
 +  * vfs.zfs.arc_freepages is initialized from vm.v_free_target.
 +  * This should insure that we allow the VM system to steal pages,
 +  * but pare the cache before we suspend processes attempting to get more
 +  * memory, thereby avoiding "stalls."  You can set this higher if you wish,
 +  * or force a specific percentage reservation as well, but doing so may
 +  * cause the cache to pare back while the VM system remains willing to
 +  * allow "inactive" pages to accumulate.  The challenge is that image
 +  * activation can force things into the page space on a repeated basis
 +  * if you allow this level to be too small (the above pathological
 +  * behavior); the defaults should avoid that behavior but the sysctls
 +  * are exposed should your workload require adjustment.
 +  *
 +  * If we're using this check for low memory we are replacing the previous
 +  * ones, including the oddball "random" reclaim that appears to fire far
 +  * more often than it should.  We still trigger if the system pages.
 +  *
 +  * If you turn on NEWRECLAIM_DEBUG then the kernel will print on the console
 +  * status messages when the reclaim status trips on and off, along with the
 +  * page count aggregate that triggered it (and the free space) for each
 +  * event.
 +  */
 +
 + #define	NEWRECLAIM
 + #undef	NEWRECLAIM_DEBUG
 +
 +
    /*
     * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
     * Copyright (c) 2013 by Delphix. All rights reserved.
 ***************
 *** 139,144 ****
 --- 211,223 ----
   
    #include <vm/vm_pageout.h>
   
 + #ifdef	NEWRECLAIM
 + #ifdef	__FreeBSD__
 + #include <sys/sysctl.h>
 + #include <sys/vmmeter.h>
 + #endif
 + #endif	/* NEWRECLAIM */
 +
    #ifdef illumos
    #ifndef _KERNEL
    /* set with ZFS_DEBUG=watch, to enable watchpoints on frozen buffers */
 ***************
 *** 203,218 ****
 --- 282,320 ----
    int zfs_arc_shrink_shift = 0;
    int zfs_arc_p_min_shift = 0;
    int zfs_disable_dup_eviction = 0;
 + #ifdef	NEWRECLAIM
 + #ifdef  __FreeBSD__
 + static	int freepages = 0;	/* This much memory is considered critical */
 + static	int percent_target = 0;	/* Additionally reserve "X" percent free RAM */
 + static	int shrink_needed = 0;	/* Shrinkage of ARC cache needed?	*/
 + #endif	/* __FreeBSD__ */
 + #endif	/* NEWRECLAIM */
   
    TUNABLE_QUAD("vfs.zfs.arc_max", &zfs_arc_max);
    TUNABLE_QUAD("vfs.zfs.arc_min", &zfs_arc_min);
    TUNABLE_QUAD("vfs.zfs.arc_meta_limit", &zfs_arc_meta_limit);
 + #ifdef	NEWRECLAIM
 + #ifdef  __FreeBSD__
 + TUNABLE_INT("vfs.zfs.arc_freepages", &freepages);
 + TUNABLE_INT("vfs.zfs.arc_freepage_percent", &percent_target);
 + TUNABLE_INT("vfs.zfs.arc_shrink_needed", &shrink_needed);
 + #endif	/* __FreeBSD__ */
 + #endif	/* NEWRECLAIM */
 +
    SYSCTL_DECL(_vfs_zfs);
    SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, arc_max, CTLFLAG_RDTUN, &zfs_arc_max, 0,
        "Maximum ARC size");
    SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, arc_min, CTLFLAG_RDTUN, &zfs_arc_min, 0,
        "Minimum ARC size");
   
 + #ifdef	NEWRECLAIM
 + #ifdef  __FreeBSD__
 + SYSCTL_INT(_vfs_zfs, OID_AUTO, arc_freepages, CTLFLAG_RWTUN, &freepages, 0, "ARC Free RAM Pages Required");
 + SYSCTL_INT(_vfs_zfs, OID_AUTO, arc_freepage_percent, CTLFLAG_RWTUN, &percent_target, 0, "ARC Free RAM Target percentage");
 + SYSCTL_INT(_vfs_zfs, OID_AUTO, arc_shrink_needed, CTLFLAG_RD, &shrink_needed, 0, "ARC Memory Constrained (0 = no, 1 = yes)");
 + #endif	/* __FreeBSD__ */
 + #endif	/* NEWRECLAIM */
 +
    /*
     * Note that buffers can be in one of 6 states:
     *	ARC_anon	- anonymous (discussed below)
 ***************
 *** 2438,2443 ****
 --- 2540,2550 ----
    {
   
    #ifdef _KERNEL
 + #ifdef	NEWRECLAIM_DEBUG
 + 	static	int	xval = -1;
 + 	static	int	oldpercent = 0;
 + 	static	int	oldfreepages = 0;
 + #endif	/* NEWRECLAIM_DEBUG */
   
    	if (needfree)
    		return (1);
 ***************
 *** 2476,2481 ****
 --- 2583,2589 ----
    		return (1);
   
    #if defined(__i386)
 +
    	/*
    	 * If we're on an i386 platform, it's possible that we'll exhaust the
    	 * kernel heap space before we ever run out of available physical
 ***************
 *** 2492,2502 ****
    		return (1);
    #endif
    #else	/* !sun */
    	if (kmem_used() > (kmem_size() * 3) / 4)
    		return (1);
    #endif	/* sun */
   
 - #else
    	if (spa_get_random(100) == 0)
    		return (1);
    #endif
 --- 2600,2664 ----
    		return (1);
    #endif
    #else	/* !sun */
 +
 + #ifdef	NEWRECLAIM
 + #ifdef  __FreeBSD__
 + /*
 +  * Implement the new tunable free RAM algorithm.  We check the free pages
 +  * against the minimum specified target and the percentage that should be
 +  * free.  If we're low we ask for ARC cache shrinkage.  If this is defined
 +  * on a FreeBSD system the older checks are not performed.
 +  *
 +  * Check first to see if we need to init freepages, then test.
 +  */
 + 	if (!freepages) {		/* If zero then (re)init */
 + 		freepages = cnt.v_free_target;
 + #ifdef	NEWRECLAIM_DEBUG
 + 		printf("ZFS ARC: Default vfs.zfs.arc_freepages to [%u]\n", freepages);
 + #endif	/* NEWRECLAIM_DEBUG */
 + 	}
 + #ifdef	NEWRECLAIM_DEBUG
 + 	if (percent_target != oldpercent) {
 + 		printf("ZFS ARC: Reservation percent change to [%d], [%d] pages, [%d] free\n", percent_target, cnt.v_page_count, cnt.v_free_count);
 + 		oldpercent = percent_target;
 + 	}
 + 	if (freepages != oldfreepages) {
 + 		printf("ZFS ARC: Low RAM page change to [%d], [%d] pages, [%d] free\n", freepages, cnt.v_page_count, cnt.v_free_count);
 + 		oldfreepages = freepages;
 + 	}
 + #endif	/* NEWRECLAIM_DEBUG */
 + /*
 +  * Now figure out how much free RAM we require to call the ARC cache status
 +  * "ok".  Add the percentage specified of the total to the base requirement.
 +  */
 +
 + 	if (cnt.v_free_count < (freepages + ((cnt.v_page_count / 100) * percent_target))) {
 + #ifdef	NEWRECLAIM_DEBUG
 + 		if (xval != 1) {
 + 			printf("ZFS ARC: RECLAIM total %u, free %u, free pct (%u), reserved (%u), target pct (%u)\n", cnt.v_page_count, cnt.v_free_count, ((cnt.v_free_count * 100) / cnt.v_page_count), freepages, percent_target);
 + 			xval = 1;
 + 		}
 + #endif	/* NEWRECLAIM_DEBUG */
 + 		shrink_needed = 1;
 + 		return(1);
 + 	} else {
 + #ifdef	NEWRECLAIM_DEBUG
 + 		if (xval != 0) {
 + 			printf("ZFS ARC: NORMAL total %u, free %u, free pct (%u), reserved (%u), target pct (%u)\n", cnt.v_page_count, cnt.v_free_count, ((cnt.v_free_count * 100) / cnt.v_page_count), freepages, percent_target);
 + 			xval = 0;
 + 		}
 + #endif	/* NEWRECLAIM_DEBUG */
 + 		shrink_needed = 0;
 + 		return(0);
 + 	}
 +
 + #endif	/* __FreeBSD__ */
 + #endif	/* NEWRECLAIM */
 +
    	if (kmem_used() > (kmem_size() * 3) / 4)
    		return (1);
    #endif	/* sun */
   
    	if (spa_get_random(100) == 0)
    		return (1);
    #endif
 
 -- 
 -- Karl
 karl@denninger.net
 
 
 
 
 

From: Karl Denninger <karl@denninger.net>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/187594: [zfs] [patch] ZFS ARC behavior problem and fix
Date: Wed, 26 Mar 2014 07:20:25 -0500

 
 Updated to handle the change in <sys/vmmeter.h> that was recently
 committed to HEAD, and slightly tweaked the default reservation to be
 equal to the VM system's "wakeup" level.
 
 This appears, after extensive use in multiple environments, to be the
 ideal default setting.  The knobs remain if you wish to twist them, and
 I have also exposed the return flag for "shrinking needed" should you
 want to monitor it for some reason.
 
 This change to arc.c has made a tremendous (and positive) difference in
 system behavior, and others who are running it have made similar
 comments.
 
 For those having problems with the PR system mangling these patches, you
 can fetch the patch below directly at
 http://www.denninger.net/FreeBSD-Patches/arc-patch
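 As a standalone sketch (userland, with illustrative values standing in
 for the kernel's vm_cnt counters and the sysctl-backed knobs), the
 low-memory test the patch introduces amounts to:
 
 ```c
 #include <stdio.h>
 
 /* Sketch of the patch's low-memory test.  The real code compares
  * vm_cnt.v_free_count against vfs.zfs.arc_freepages plus the requested
  * percentage of vm_cnt.v_page_count; all values here are made up. */
 static int
 arc_shrink_wanted(unsigned v_free_count, unsigned v_page_count,
     unsigned freepages, unsigned percent_target)
 {
 	/* Shrink when free pages fall below the base reservation plus
 	 * the requested percentage of total RAM. */
 	return (v_free_count <
 	    (freepages + ((v_page_count / 100) * percent_target)));
 }
 
 int
 main(void)
 {
 	/* 4 GiB of 4 KiB pages; a vm.v_free_target-like reservation. */
 	unsigned pages = 1048576, target = 21000;
 
 	printf("%d\n", arc_shrink_wanted(30000, pages, target, 0)); /* prints 0: ample free RAM */
 	printf("%d\n", arc_shrink_wanted(15000, pages, target, 0)); /* prints 1: below target */
 	printf("%d\n", arc_shrink_wanted(30000, pages, target, 5)); /* prints 1: 5% extra reserve */
 	return (0);
 }
 ```
 
 Note how a nonzero vfs.zfs.arc_freepage_percent raises the threshold
 well above the base reservation, which is why the default is 0.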
 
 *** arc.c.original	Sun Mar 23 14:56:01 2014
 --- arc.c	Tue Mar 25 09:24:14 2014
 ***************
 *** 18,23 ****
 --- 18,95 ----
     *
     * CDDL HEADER END
     */
 +
 + /* Karl Denninger (karl@denninger.net), 3/25/2014, FreeBSD-specific
 +  *
 +  * If "NEWRECLAIM" is defined, change the "low memory" warning that causes
 +  * the ARC cache to be pared down.  The reason for the change is that the
 +  * apparent attempted algorithm is to start evicting ARC cache when free
 +  * pages fall below 25% of installed RAM.  This maps reasonably well to how
 +  * Solaris is documented to behave; when "lotsfree" is invaded ZFS is told
 +  * to pare down.
 +  *
 +  * The problem is that on FreeBSD machines the system doesn't appear to be
 +  * getting what the authors of the original code thought they were looking at
 +  * with its test -- or at least not what Solaris did -- and as a result that
 +  * test never triggers.  That leaves the only reclaim trigger as the "paging
 +  * needed" status flag, and by the time that trips the system is already
 +  * in low-memory trouble.  This can lead to severe pathological behavior
 +  * under the following scenario:
 +  * - The system starts to page and ARC is evicted.
 +  * - The system stops paging as ARC's eviction drops wired RAM a bit.
 +  * - ARC starts increasing its allocation again, and wired memory grows.
 +  * - A new image is activated, and the system once again attempts to page.
 +  * - ARC starts to be evicted again.
 +  * - Back to #2
 +  *
 +  * Note that ZFS's ARC default (unless you override it in /boot/loader.conf)
 +  * is to allow the ARC cache to grab nearly all of free RAM, provided nobody
 +  * else needs it.  That would be ok if we evicted cache when required.
 +  *
 +  * Unfortunately the system can get into a state where it never
 +  * manages to page anything of materiality back in, as if there is active
 +  * I/O the ARC will start grabbing space once again as soon as the memory
 +  * contention state drops.  For this reason the "paging is occurring" flag
 +  * should be the **last resort** condition for ARC eviction; you want to
 +  * (as Solaris does) start when there is material free RAM left BUT the
 +  * vm system thinks it needs to be active to steal pages back in the attempt
 +  * to never get into the condition where you're potentially paging off
 +  * executables in favor of leaving disk cache allocated.
 +  *
 +  * To fix this we change how we look at low memory, declaring two new
 +  * runtime tunables and one status.
 +  *
 +  * The new sysctls are:
 +  * vfs.zfs.arc_freepages (free pages required to call RAM "sufficient")
 +  * vfs.zfs.arc_freepage_percent (additional reservation percentage, default 0)
 +  * vfs.zfs.arc_shrink_needed (shows "1" if we're asking for shrinking the ARC)
 +  *
 +  * vfs.zfs.arc_freepages is initialized from vm.v_free_target.
 +  * This should ensure that we allow the VM system to steal pages,
 +  * but pare the cache before we suspend processes attempting to get more
 +  * memory, thereby avoiding "stalls."  You can set this higher if you wish,
 +  * or force a specific percentage reservation as well, but doing so may
 +  * cause the cache to pare back while the VM system remains willing to
 +  * allow "inactive" pages to accumulate.  The challenge is that image
 +  * activation can force things into the page space on a repeated basis
 +  * if you allow this level to be too small (the above pathological
 +  * behavior); the defaults should avoid that behavior but the sysctls
 +  * are exposed should your workload require adjustment.
 +  *
 +  * If we're using this check for low memory we are replacing the previous
 +  * ones, including the oddball "random" reclaim that appears to fire far
 +  * more often than it should.  We still trigger if the system pages.
 +  *
 +  * If you turn on NEWRECLAIM_DEBUG then the kernel will print on the console
 +  * status messages when the reclaim status trips on and off, along with the
 +  * page count aggregate that triggered it (and the free space) for each
 +  * event.
 +  */
 +
 + #define	NEWRECLAIM
 + #undef	NEWRECLAIM_DEBUG
 +
 +
    /*
     * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
     * Copyright (c) 2013 by Delphix. All rights reserved.
 ***************
 *** 139,144 ****
 --- 211,230 ----
   
    #include <vm/vm_pageout.h>
   
 + #ifdef	NEWRECLAIM
 + #ifdef	__FreeBSD__
 + #include <sys/sysctl.h>
 + #include <sys/vmmeter.h>
 + /*
 +  * Struct cnt was renamed in -head (11-current) at version 1100016; check for it
 +  */
 + #if __FreeBSD_version < 1100016
 + #define	vm_cnt	cnt
 + #endif	/* __FreeBSD_version */
 +
 + #endif	/* __FreeBSD__ */
 + #endif	/* NEWRECLAIM */
 +
    #ifdef illumos
    #ifndef _KERNEL
    /* set with ZFS_DEBUG=watch, to enable watchpoints on frozen buffers */
 ***************
 *** 203,218 ****
 --- 289,327 ----
    int zfs_arc_shrink_shift = 0;
    int zfs_arc_p_min_shift = 0;
    int zfs_disable_dup_eviction = 0;
 + #ifdef	NEWRECLAIM
 + #ifdef  __FreeBSD__
 + static	int freepages = 0;	/* This much memory is considered critical */
 + static	int percent_target = 0;	/* Additionally reserve "X" percent free RAM */
 + static	int shrink_needed = 0;	/* Shrinkage of ARC cache needed?	*/
 + #endif	/* __FreeBSD__ */
 + #endif	/* NEWRECLAIM */
   
    TUNABLE_QUAD("vfs.zfs.arc_max", &zfs_arc_max);
    TUNABLE_QUAD("vfs.zfs.arc_min", &zfs_arc_min);
    TUNABLE_QUAD("vfs.zfs.arc_meta_limit", &zfs_arc_meta_limit);
 + #ifdef	NEWRECLAIM
 + #ifdef  __FreeBSD__
 + TUNABLE_INT("vfs.zfs.arc_freepages", &freepages);
 + TUNABLE_INT("vfs.zfs.arc_freepage_percent", &percent_target);
 + TUNABLE_INT("vfs.zfs.arc_shrink_needed", &shrink_needed);
 + #endif	/* __FreeBSD__ */
 + #endif	/* NEWRECLAIM */
 +
    SYSCTL_DECL(_vfs_zfs);
    SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, arc_max, CTLFLAG_RDTUN, &zfs_arc_max, 0,
        "Maximum ARC size");
    SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, arc_min, CTLFLAG_RDTUN, &zfs_arc_min, 0,
        "Minimum ARC size");
   
 + #ifdef	NEWRECLAIM
 + #ifdef  __FreeBSD__
 + SYSCTL_INT(_vfs_zfs, OID_AUTO, arc_freepages, CTLFLAG_RWTUN, &freepages, 0, "ARC Free RAM Pages Required");
 + SYSCTL_INT(_vfs_zfs, OID_AUTO, arc_freepage_percent, CTLFLAG_RWTUN, &percent_target, 0, "ARC Free RAM Target percentage");
 + SYSCTL_INT(_vfs_zfs, OID_AUTO, arc_shrink_needed, CTLFLAG_RD, &shrink_needed, 0, "ARC Memory Constrained (0 = no, 1 = yes)");
 + #endif	/* __FreeBSD__ */
 + #endif	/* NEWRECLAIM */
 +
    /*
     * Note that buffers can be in one of 6 states:
     *	ARC_anon	- anonymous (discussed below)
 ***************
 *** 2438,2443 ****
 --- 2547,2557 ----
    {
   
    #ifdef _KERNEL
 + #ifdef	NEWRECLAIM_DEBUG
 + 	static	int	xval = -1;
 + 	static	int	oldpercent = 0;
 + 	static	int	oldfreepages = 0;
 + #endif	/* NEWRECLAIM_DEBUG */
   
    	if (needfree)
    		return (1);
 ***************
 *** 2476,2481 ****
 --- 2590,2596 ----
    		return (1);
   
    #if defined(__i386)
 +
    	/*
    	 * If we're on an i386 platform, it's possible that we'll exhaust the
    	 * kernel heap space before we ever run out of available physical
 ***************
 *** 2492,2502 ****
    		return (1);
    #endif
    #else	/* !sun */
    	if (kmem_used() > (kmem_size() * 3) / 4)
    		return (1);
    #endif	/* sun */
   
 - #else
    	if (spa_get_random(100) == 0)
    		return (1);
    #endif
 --- 2607,2671 ----
    		return (1);
    #endif
    #else	/* !sun */
 +
 + #ifdef	NEWRECLAIM
 + #ifdef  __FreeBSD__
 + /*
 +  * Implement the new tunable free RAM algorithm.  We check the free pages
 +  * against the minimum specified target and the percentage that should be
 +  * free.  If we're low we ask for ARC cache shrinkage.  If this is defined
 +  * on a FreeBSD system the older checks are not performed.
 +  *
 +  * Check first to see if we need to init freepages, then test.
 +  */
 + 	if (!freepages) {		/* If zero then (re)init */
 + 		freepages = vm_cnt.v_free_target;
 + #ifdef	NEWRECLAIM_DEBUG
 + 		printf("ZFS ARC: Default vfs.zfs.arc_freepages to [%u]\n", freepages);
 + #endif	/* NEWRECLAIM_DEBUG */
 + 	}
 + #ifdef	NEWRECLAIM_DEBUG
 + 	if (percent_target != oldpercent) {
 + 		printf("ZFS ARC: Reservation percent change to [%d], [%d] pages, [%d] free\n", percent_target, vm_cnt.v_page_count, vm_cnt.v_free_count);
 + 		oldpercent = percent_target;
 + 	}
 + 	if (freepages != oldfreepages) {
 + 		printf("ZFS ARC: Low RAM page change to [%d], [%d] pages, [%d] free\n", freepages, vm_cnt.v_page_count, vm_cnt.v_free_count);
 + 		oldfreepages = freepages;
 + 	}
 + #endif	/* NEWRECLAIM_DEBUG */
 + /*
 +  * Now figure out how much free RAM we require to call the ARC cache status
 +  * "ok".  Add the percentage specified of the total to the base requirement.
 +  */
 +
 + 	if (vm_cnt.v_free_count < (freepages + ((vm_cnt.v_page_count / 100) * percent_target))) {
 + #ifdef	NEWRECLAIM_DEBUG
 + 		if (xval != 1) {
 + 			printf("ZFS ARC: RECLAIM total %u, free %u, free pct (%u), reserved (%u), target pct (%u)\n", vm_cnt.v_page_count, vm_cnt.v_free_count, ((vm_cnt.v_free_count * 100) / vm_cnt.v_page_count), freepages, percent_target);
 + 			xval = 1;
 + 		}
 + #endif	/* NEWRECLAIM_DEBUG */
 + 		shrink_needed = 1;
 + 		return(1);
 + 	} else {
 + #ifdef	NEWRECLAIM_DEBUG
 + 		if (xval != 0) {
 + 			printf("ZFS ARC: NORMAL total %u, free %u, free pct (%u), reserved (%u), target pct (%u)\n", vm_cnt.v_page_count, vm_cnt.v_free_count, ((vm_cnt.v_free_count * 100) / vm_cnt.v_page_count), freepages, percent_target);
 + 			xval = 0;
 + 		}
 + #endif	/* NEWRECLAIM_DEBUG */
 + 		shrink_needed = 0;
 + 		return(0);
 + 	}
 +
 + #endif	/* __FreeBSD__ */
 + #endif	/* NEWRECLAIM */
 +
    	if (kmem_used() > (kmem_size() * 3) / 4)
    		return (1);
    #endif	/* sun */
   
    	if (spa_get_random(100) == 0)
    		return (1);
    #endif
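 For reference, the knobs the patch adds are loader tunables as well as
 runtime sysctls (TUNABLE_INT/CTLFLAG_RWTUN above), so they can be preset
 at boot.  A hypothetical /boot/loader.conf fragment, with illustrative
 values only (the default, taken from vm.v_free_target, is usually right):
 
 ```
 # Illustrative values -- names come from the patch above.
 vfs.zfs.arc_freepages=65536
 vfs.zfs.arc_freepage_percent=5
 ```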
 
 -- 
 -- Karl
 karl@denninger.net
 
 
 
 
 

From: <dteske@FreeBSD.org>
To: <bug-followup@FreeBSD.org>, <karl@fs.denninger.net>
Cc: <dteske@FreeBSD.org>
Subject: Re: kern/187594: [zfs] [patch] ZFS ARC behavior problem and fix
Date: Thu, 27 Mar 2014 14:55:58 -0700

 Hi,
 
 I can't seem to find the code where you mention in your
 previous post:
 
 `...and slightly tweak the default reservation to be equal 
 to the VM system's "wakeup" level.'
 
 Comparing Mar 26th's patch to Mar 24th's patch yields no
 such change. Did you post the latest patch?
 -- 
 Devin
 

From: Karl Denninger <karl@denninger.net>
To: bug-followup@FreeBSD.org, karl@fs.denninger.net
Cc:  
Subject: Re: kern/187594: [zfs] [patch] ZFS ARC behavior problem and fix
Date: Thu, 27 Mar 2014 20:32:17 -0500

 
 The last change was the "cnt" structure rename; the previous rev was the
 one where freepages was set to cnt.v_free_target (the margin was removed
 in the rev sent on the morning of 24 March) -- there was no logic change
 made in the 26 March followup vs. the previous one from 24 March.
 
 The latest patch that I and others are running is what is on the PR (and
 at the link, which is identical -- a couple of people had said the PR
 followup inclusions were problematic when copied down to be applied.)
 
 Apologies for the confusion.
 
 -- 
 -- Karl
 karl@denninger.net
 
 
 
 
 

From: Karl Denninger <karl@denninger.net>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/187594: [zfs] [patch] ZFS ARC behavior problem and fix
Date: Thu, 03 Apr 2014 11:57:50 -0500

 
 After more than a week of operation without any changes on a very busy
 production server, this is what the status looks like at this particular
 moment in time (caught it being pretty quiet at the moment... slow day):
 
 [karl@NewFS ~]$ uptime
 11:56AM  up 10 days, 20:37, 1 user, load averages: 0.80, 0.59, 0.58
 
 [karl@NewFS ~]$ uname -v
 FreeBSD 10.0-STABLE #22 r263665:263671M: Sun Mar 23 15:00:48 CDT 2014
   karl@NewFS.denninger.net:/usr/obj/usr/src/sys/KSD-SMP
 
 
      1 users    Load  0.50  0.57  0.58                  Apr  3 11:52
 
 Mem:KB    REAL            VIRTUAL                       VN PAGER   SWAP PAGER
          Tot   Share      Tot    Share    Free           in   out     in   out
 Act 4503936   32680  9319616    54908  701712  count
 All  17598k   42312 10162228   293268          pages
 Proc:                                                            Interrupts
    r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt        ioflt  2635 total
    2         245   3  9936 4302  12k  990  442 2878   1161 cow      11 uart0 4
                                                       1460 zfod     53 uhci0 16
   0.6%Sys   0.1%Intr  1.5%User  0.0%Nice 97.8%Idle         ozfod       pcm0 17
 |    |    |    |    |    |    |    |    |    |           %ozfod       ehci0 uhci
 >                                                         daefr       uhci1 21
                                             dtbuf     1779 prcfr   532 uhci3 ehci
 Namei     Name-cache   Dir-cache    485888 desvn     3862 totfr    44 twa0 30
     Calls    hits   %    hits   %    145761 numvn          react   989 cpu0:timer
     18611   18549 100                121467 frevn          pdwak    69 mps0 256
                                                        909 pdpgs    24 em0:rx 0
 Disks   da0   da1   da2   da3   da4   da5   da6           intrn    32 em0:tx 0
 KB/t  10.30 10.39  0.00  0.00 22.61 24.69 24.39  19017980 wire        em0:link
 tps      21    21     0     0    10    16    16   2197580 act     118 em1:rx 0
 MB/s   0.22  0.22  0.00  0.00  0.22  0.39  0.39   2544544 inact   107 em1:tx 0
 %busy    19    19     0     0     0     1     1      3276 cache       em1:link
                                                     698064 free        ahci0:ch0
                                                            buf      32 cpu1:timer
                                                                     24 cpu10:time
                                                                     50 cpu6:timer
                                                                     26 cpu12:time
                                                                     37 cpu7:timer
                                                                     45 cpu14:time
                                                                     41 cpu4:timer
                                                                     35 cpu15:time
                                                                     25 cpu5:timer
                                                                     45 cpu9:timer
                                                                     45 cpu2:timer
                                                                    102 cpu11:time
                                                                     63 cpu3:timer
                                                                     41 cpu13:time
                                                                     45 cpu8:timer
 
 [karl@NewFS ~]$ zpool list
 NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
 media   2.72T  2.12T   616G    77%  1.00x  ONLINE  -
 zroot    234G  18.8G   215G     8%  1.36x  ONLINE  -
 zstore  3.63T  2.50T  1.13T    68%  1.00x  ONLINE  -
 
 [karl@NewFS ~]$ zfs-stats -A
 
 ------------------------------------------------------------------------
 ZFS Subsystem Report                            Thu Apr  3 11:53:42 2014
 ------------------------------------------------------------------------
 
 ARC Summary: (HEALTHY)
          Memory Throttle Count:                  0
 
 ARC Misc:
          Deleted:                                27.84m
          Recycle Misses:                         1.12m
          Mutex Misses:                           2.65k
          Evict Skips:                            39.26m
 
 ARC Size:                               59.13%  13.20   GiB
          Target Size: (Adaptive)         59.14%  13.20   GiB
          Min Size (Hard Limit):          12.50%  2.79    GiB
          Max Size (High Water):          8:1     22.33   GiB
 
 ARC Size Breakdown:
          Recently Used Cache Size:       81.41%  10.75   GiB
          Frequently Used Cache Size:     18.59%  2.46    GiB
 
 ARC Hash Breakdown:
          Elements Max:                           2.69m
          Elements Current:               63.22%  1.70m
          Collisions:                             95.13m
          Chain Max:                              24
          Chains:                                 413.62k
 
 ------------------------------------------------------------------------
 
 [karl@NewFS ~]$ zfs-stats -E
 
 ------------------------------------------------------------------------
 ZFS Subsystem Report                            Thu Apr  3 11:53:59 2014
 ------------------------------------------------------------------------
 
 ARC Efficiency:                                 1.28b
          Cache Hit Ratio:                98.37%  1.26b
          Cache Miss Ratio:               1.63%   20.80m
          Actual Hit Ratio:               60.07%  766.91m
 
          Data Demand Efficiency:         99.15%  435.02m
          Data Prefetch Efficiency:       20.45%  17.49m
 
          CACHE HITS BY CACHE LIST:
            Anonymously Used:             38.72%  486.24m
            Most Recently Used:           3.74%   46.94m
            Most Frequently Used:         57.33%  719.97m
            Most Recently Used Ghost:     0.06%   792.68k
            Most Frequently Used Ghost:   0.16%   1.97m
 
          CACHE HITS BY DATA TYPE:
            Demand Data:                  34.34%  431.32m
            Prefetch Data:                0.28%   3.58m
            Demand Metadata:              23.72%  297.92m
            Prefetch Metadata:            41.65%  523.09m
 
          CACHE MISSES BY DATA TYPE:
            Demand Data:                  17.75%  3.69m
            Prefetch Data:                66.88%  13.91m
            Demand Metadata:              5.78%   1.20m
            Prefetch Metadata:            9.60%   2.00m
 
 ------------------------------------------------------------------------
 
 Grinnin' big, in short.
 
 I have no reason to make further changes to the code or defaults.
 
 -- 
 -- Karl
 karl@denninger.net
 
 
 

From: Karl Denninger <karl@denninger.net>
To: bug-followup@FreeBSD.org, karl@fs.denninger.net
Cc:  
Subject: Re: kern/187594: [zfs] [patch] ZFS ARC behavior problem and fix
Date: Mon, 14 Apr 2014 10:40:49 -0500

 Follow-up:
 
 21 days at this point of uninterrupted uptime: inact pages are stable, as
 is the free list; wired and free are appropriate and vary with load as
 expected; ZERO swapping; and performance is and has remained excellent,
 all on a very heavily used, fairly beefy (~24GB RAM, dual Xeon CPUs)
 production system under 10-STABLE.
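Figures like the ones described above can also be pulled non-interactively rather than eyeballed in top(1). A minimal sketch that parses a top-style one-line "Mem:" summary; the sample line is illustrative only, not taken from this system:

```shell
# Sketch: extract the Wired and Free figures from a top(1)-style "Mem:"
# summary line.  The sample line below is hypothetical, not from this box.
mem_line="Mem: 1715M Active, 9680M Inact, 11G Wired, 1169M Free"
echo "$mem_line" | awk -F'[:,] ?' '{
    for (i = 2; i <= NF; i++) {
        split($i, f, " ")              # f[1] = amount, f[2] = category
        if (f[2] == "Wired") print "wired=" f[1]
        if (f[2] == "Free")  print "free=" f[1]
    }
}'
```

Sampling this in a loop (e.g. from a cron job) gives a record of how wired memory tracks ARC growth over time.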
 
 --
 -- Karl
 karl@denninger.net
 
 
 

From: <dteske@FreeBSD.org>
To: <bug-followup@FreeBSD.org>, <karl@fs.denninger.net>
Cc:  
Subject: Re: kern/187594: [zfs] [patch] ZFS ARC behavior problem and fix
Date: Mon, 14 Apr 2014 10:58:09 -0700

 Been running this on stable/8 for a week now on 3 separate machines.
 All appears stable, and under heavy load we can clearly see the new
 reclaim firing early and appropriately when needed (no longer do we
 have programs getting swapped out).
 
 Interestingly, in our testing we've found that we can force the old
 reclaim (the code state prior to applying Karl's patch) to fire by sapping
 the few remaining pages of unallocated memory. I do this by exploiting a
 little-known bug in the Bourne shell to leak memory (command below).
 
 	sh -c 'f(){ while :;do local b;done;};f'
 
 Watching "top" in the un-patched state, we can see Wired memory grow
 from ARC usage but not drop. I then run the above command, and "top"
 shows an "sh" process with a fast-growing SIZE, quickly eating up about
 100MB per second. When "top" shows Free memory drop to mere KB
 (single pages), we see the original (again, unpatched) reclaim algorithm
 fire, and the Wired memory finally starts to drop.
 
 After applying this patch, we no longer have to play the game of "eat
 all my remaining memory to force the original reclaim event to free up
 pages"; rather, the ARC waxes and wanes with normal application usage.
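The leak trigger quoted earlier runs until killed. A bounded variant (my sketch, not Devin's command) exercises the same `local`-in-a-loop construct but exits on its own, so it can be run without a watchdog:

```shell
# Bounded variant of the leak trigger (sketch): the same `local`
# declaration inside a function's loop, capped at 100000 iterations so
# it terminates by itself.  On the affected FreeBSD /bin/sh, each pass
# through `local b` leaked a little memory; modern shells do not leak here.
sh -c 'f(){ i=0; while [ "$i" -lt 100000 ]; do local b; i=$((i+1)); done; }; f; echo done'
```

The original unbounded form is only useful when you actually want to drive Free memory toward zero, as described above.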
 
 However, I must say that on stable/8 the problem of applications going to
 sleep is not nearly as bad as I have experienced it in 9 or 10.
 
 We are happy to report that the patch seems to be a win for stable/8 as
 well because, in our case, we do like to have a bit of free memory and the
 old reclaim was not providing that. It's nice to not have to resort to
 tricks to get the ARC to pare down.
 -- 
 Cheers,
 Devin
 

From: Karl Denninger <karl@denninger.net>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/187594: [zfs] [patch] ZFS ARC behavior problem and fix
Date: Thu, 15 May 2014 10:05:34 -0500

 
 I have now been running the latest delta as posted 26 March -- it is
 coming up on two months now, has been stable here, and I've seen several
 positive reports and no negative ones on impact for others. Performance
 continues to be "as expected."
 
 Is there an expectation on this being merged forward and/or MFC'd?
 
 --
 -- Karl
 karl@denninger.net
 
 
 
>Unformatted:
