From pmc@citylink.dinoex.sub.de  Sun Sep 13 22:43:23 2009
Return-Path: <pmc@citylink.dinoex.sub.de>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 50E5B106566C
	for <FreeBSD-gnats-submit@freebsd.org>; Sun, 13 Sep 2009 22:43:23 +0000 (UTC)
	(envelope-from pmc@citylink.dinoex.sub.de)
Received: from uucp.dinoex.sub.de (uucp.dinoex.sub.de [194.45.71.2])
	by mx1.freebsd.org (Postfix) with ESMTP id BCDD98FC0C
	for <FreeBSD-gnats-submit@freebsd.org>; Sun, 13 Sep 2009 22:43:22 +0000 (UTC)
Received: from uucp.dinoex.sub.de (uucp@uucp.dinoex.sub.de [194.45.71.2] (may be forged))
	by uucp.dinoex.sub.de (8.14.3/8.14.2) with ESMTP id n8DMCaRY014210
	for <FreeBSD-gnats-submit@freebsd.org>; Mon, 14 Sep 2009 00:12:37 +0200 (CEST)
	(envelope-from pmc@citylink.dinoex.sub.de)
Received: from citylink.dinoex.sub.de (uucp@localhost)
	by uucp.dinoex.sub.de (8.14.3/8.14.2/Submit) with UUCP id n8DMCa57014209
	for FreeBSD-gnats-submit@freebsd.org; Mon, 14 Sep 2009 00:12:36 +0200 (CEST)
	(envelope-from pmc@citylink.dinoex.sub.de)
Received: from gate.oper.dinoex.org (gate-e [192.168.98.2])
	by citylink.dinoex.sub.de (8.14.3/8.14.3) with ESMTP id n8DLMptC086064
	for <FreeBSD-gnats-submit@freebsd.org>; Sun, 13 Sep 2009 23:22:52 +0200 (CEST)
	(envelope-from pmc@disp.oper.dinoex.org)
Received: from disp.oper.dinoex.org (disp-fe.oper.dinoex.org [192.168.96.5])
	by gate.oper.dinoex.org (8.14.3/8.14.3) with ESMTP id n8DLLiT6085901
	for <FreeBSD-gnats-submit@freebsd.org>; Sun, 13 Sep 2009 23:21:44 +0200 (CEST)
	(envelope-from pmc@disp.oper.dinoex.org)
Received: from disp.oper.dinoex.org (localhost [127.0.0.1])
	by disp.oper.dinoex.org (8.14.3/8.14.3) with ESMTP id n8DLLcso065516
	for <FreeBSD-gnats-submit@freebsd.org>; Sun, 13 Sep 2009 23:21:38 +0200 (CEST)
	(envelope-from pmc@disp.oper.dinoex.org)
Received: (from pmc@localhost)
	by disp.oper.dinoex.org (8.14.3/8.14.3/Submit) id n8DLLcxT065515;
	Sun, 13 Sep 2009 23:21:38 +0200 (CEST)
	(envelope-from pmc)
Message-Id: <200909132121.n8DLLcxT065515@disp.oper.dinoex.org>
Date: Sun, 13 Sep 2009 23:21:38 +0200 (CEST)
From: Peter Much <pmc@citylink.dinoex.sub.org>
Reply-To: Peter Much <pmc@citylink.dinoex.sub.org>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: ZFS ceases caching when mem demand is high
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         138790
>Category:       kern
>Synopsis:       [zfs] ZFS ceases caching when mem demand is high
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    avg
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Sun Sep 13 22:50:05 UTC 2009
>Closed-Date:    Sat Apr 02 08:24:04 UTC 2011
>Last-Modified:  Sat Apr 02 08:24:04 UTC 2011
>Originator:     Peter Much
>Release:        FreeBSD 7.2-STABLE i386
>Organization:
n/a
>Environment:
System: FreeBSD disp.oper.dinoex.org 7.2-STABLE FreeBSD 7.2-STABLE #0: Mon Aug 10 20:38:48 CEST 2009 root@disp.oper.dinoex.org:/usr/src/sys/i386/compile/D1R72V1 i386

7.2-STABLE as of July, with ZFS version 13

>Description:

The system was originally equipped with 256 MB of memory and runs a
lot of (seldom used) processes, so about 350 MB was paged out and
there was some ongoing competition for memory, which the VM pager
handled well.

I activated ZFS for a selected few filesystems where I need
journalling/CoW. The intention was not to increase performance, but 
to increase crash-safety, for a database only.

I noticed that the actual size of the ARCache shrank to ~1 MB and
that only metadata was still being cached. This makes write
performance incredibly bad, because every write of, say, 1 kB has to
be preceded by a 128 kB read.

As a remedy I increased memory to 768 MB, but the only effect was
the ARCache now shrinking to 2 MB instead of 1 MB.

Reading the source showed me that ZFS tries to shrink the ARCache
as soon as Free+Cache memory gets anywhere near the uppermost
threshold (vm.v_free_target + vm.v_cache_min).
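
As an illustration only, the unpatched check can be modelled in user
space roughly as follows. The struct, the function names and the
formula used for the paging target are assumptions made for this
sketch, not kernel code; only the comparison against -2048 is taken
from the arc.c context shown in the patch. With 4 kB pages, 2048 pages
correspond to 8 MB of headroom above the target.

```c
#include <stdbool.h>

/*
 * Hypothetical snapshot of the VM counters involved, in pages.
 * Field names mirror the sysctl names (vm.v_free_target etc.).
 */
struct vm_snapshot {
	int v_free_target;
	int v_cache_min;
	int v_free_count;
	int v_cache_count;
	bool pages_needed;	/* stands in for the kernel's vm_pages_needed */
};

/*
 * Assumed model of vm_paging_target(): how many pages Free+Cache is
 * short of (v_free_target + v_cache_min). Negative means a surplus.
 */
static int
model_paging_target(const struct vm_snapshot *vm)
{
	return ((vm->v_free_target + vm->v_cache_min) -
	    (vm->v_free_count + vm->v_cache_count));
}

/*
 * Model of the unpatched test: reclaim is requested as soon as the
 * surplus above the target drops below 2048 pages.
 */
static bool
model_arc_reclaim_needed(const struct vm_snapshot *vm)
{
	return (vm->pages_needed || model_paging_target(vm) > -2048);
}
```

In this model a machine that sits 1000 pages above the target already
asks the ARCache to shrink, although the pager has nothing to do yet.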

So some modification seems necessary here, as it appears
unacceptable that one has to increase the installed memory by maybe
4 or 5 times only to enable ZFS on a machine that would otherwise
function suitably.
I do not currently know how this behaves on big machines with a
couple of GB of RAM, but I think the free list will also run low
there if enough processes compete for memory.

>How-To-Repeat:

Start some processes that use up the available memory. The best
choice might be ruby processes, which do garbage collection and are
therefore considered active and not candidates for swapping (otherwise
"sysctl vm.swap_idle_enabled=1" would be another solution).

Then watch the difference
kstat.zfs.misc.arcstats.size - vfs.zfs.arc_meta_used
(which should be the amount of payload data currently being cached)
approach zero.
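
A minimal sketch of that check, in case it helps to automate it. The
helper names are hypothetical, and the sketch assumes that both
sysctls export 64-bit values; the sysctl reader is only compiled on
FreeBSD.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/*
 * Payload data in the ARC: total size minus metadata. Clamped at zero
 * because the two counters are sampled at slightly different times.
 */
static uint64_t
arc_payload_bytes(uint64_t arc_size, uint64_t arc_meta_used)
{
	return (arc_size > arc_meta_used ? arc_size - arc_meta_used : 0);
}

#ifdef __FreeBSD__
/*
 * Read one sysctl that is assumed to be a 64-bit integer.
 * Returns 0 on success, -1 on error or unexpected size.
 */
static int
read_u64_sysctl(const char *name, uint64_t *out)
{
	size_t len = sizeof(*out);

	if (sysctlbyname(name, out, &len, NULL, 0) != 0 ||
	    len != sizeof(*out))
		return (-1);
	return (0);
}
#endif
```

On FreeBSD one would call read_u64_sysctl() for
"kstat.zfs.misc.arcstats.size" and "vfs.zfs.arc_meta_used" and pass
both results to arc_payload_bytes(); a result close to zero under
memory pressure reproduces the problem.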

>Fix:

Since ZFS write performance becomes horribly bad when no caching at
all is available, I suggest that a certain minimum of caching be
preserved even if the free list is quite low.

Therefore I changed the code as follows (see attached patch), and
the results are now OK for my purpose.

Experiments showed that there is a certain risk that the machine may
freeze or lock up when these parameters are modified.
A crashdump analysis gave me the hint that this seems to happen when
the amount of arc_anon buffers gets too high and requests for
further buffers are declined. The machine then seems to block all
activity and steadily increase
kstat.zfs.misc.arcstats.memory_throttle_count until the watchdog
reboots it.
I have not yet experienced this effect with the attached patch,
but further evaluation seems necessary.


*** sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c.orig	Wed Aug  5 20:45:41 2009
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Sat Sep  5 00:19:30 2009
***************
*** 1821,1831 ****
  
  #ifdef _KERNEL
  
  	/*
  	 * If pages are needed or we're within 2048 pages 
  	 * of needing to page need to reclaim
  	 */
! 	if (vm_pages_needed || (vm_paging_target() > -2048))
  		return (1);
  
  	if (needfree)
--- 1821,1835 ----
  
  #ifdef _KERNEL
  
+ 	if (vm_page_count_min())
+ 		return (1);
+ 
  	/*
  	 * If pages are needed or we're within 2048 pages 
  	 * of needing to page need to reclaim
  	 */
! 	if ((vm_pages_needed || (vm_paging_target() > -2048)) &&
! 			(arc_size > arc_c_min))
  		return (1);
  
  	if (needfree)
***************
*** 3338,3344 ****
  		available_memory += MIN(evictable_memory, arc_size - arc_c_min);
  	}
  
! 	if (inflight_data > available_memory / 4) {
  		ARCSTAT_INCR(arcstat_memory_throttle_count, 1);
  		return (ERESTART);
  	}
--- 3342,3348 ----
  		available_memory += MIN(evictable_memory, arc_size - arc_c_min);
  	}
  
! 	if ((inflight_data > available_memory / 4) && (arc_size > arc_c_min)) {
  		ARCSTAT_INCR(arcstat_memory_throttle_count, 1);
  		return (ERESTART);
  	}
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-fs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Tue Sep 15 00:46:39 UTC 2009 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=138790 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/138790: commit references a PR
Date: Fri, 17 Sep 2010 07:14:16 +0000 (UTC)

 Author: avg
 Date: Fri Sep 17 07:14:07 2010
 New Revision: 212780
 URL: http://svn.freebsd.org/changeset/base/212780
 
 Log:
   zfs arc_reclaim_needed: more reasonable threshold for available pages
   
   vm_paging_target() is not a trigger of any kind for pagedaemon, but
   rather a "soft" target for it when it's already triggered.
   Thus, trying to keep 2048 pages above that level at the expense of ARC
   was simply driving ARC size into the ground even with normal memory
   loads.
   Instead, use a threshold at which a pagedaemon scan is triggered, so
   that ARC reclaiming helps with pagedaemon's task, but the latter still
   recycles active and inactive pages.
   
   PR:		kern/146410, kern/138790
   MFC after:	3 weeks
 
 Modified:
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
 
 Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
 ==============================================================================
 --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Fri Sep 17 04:55:01 2010	(r212779)
 +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Fri Sep 17 07:14:07 2010	(r212780)
 @@ -2161,10 +2161,10 @@ arc_reclaim_needed(void)
  		return (0);
  
  	/*
 -	 * If pages are needed or we're within 2048 pages
 -	 * of needing to page need to reclaim
 +	 * Cooperate with pagedaemon when it's time for it to scan
 +	 * and reclaim some pages.
  	 */
 -	if (vm_pages_needed || (vm_paging_target() > -2048))
 +	if (vm_paging_need())
  		return (1);
  
  #if 0
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/138790: commit references a PR
Date: Fri, 17 Sep 2010 07:34:57 +0000 (UTC)

 Author: avg
 Date: Fri Sep 17 07:34:50 2010
 New Revision: 212783
 URL: http://svn.freebsd.org/changeset/base/212783
 
 Log:
   zfs arc_reclaim_needed: fix typo in mismerge in r212780
   
   PR:		kern/146410, kern/138790
   MFC after:	3 weeks
   X-MFC with:	r212780
 
 Modified:
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
 
 Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
 ==============================================================================
 --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Fri Sep 17 07:20:20 2010	(r212782)
 +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Fri Sep 17 07:34:50 2010	(r212783)
 @@ -2160,7 +2160,7 @@ arc_reclaim_needed(void)
  	 * Cooperate with pagedaemon when it's time for it to scan
  	 * and reclaim some pages.
  	 */
 -	if (vm_paging_need())
 +	if (vm_paging_needed())
  		return (1);
  
  #if 0
 
State-Changed-From-To: open->patched 
State-Changed-By: eadler 
State-Changed-When: Tue Mar 1 10:16:05 EST 2011 
State-Changed-Why:  
committed in head (r212783) 

http://www.freebsd.org/cgi/query-pr.cgi?pr=138790 
Responsible-Changed-From-To: freebsd-fs->avg 
Responsible-Changed-By: eadler 
Responsible-Changed-When: Tue Mar 1 10:23:16 EST 2011 
Responsible-Changed-Why:  
same as above 

http://www.freebsd.org/cgi/query-pr.cgi?pr=138790 
State-Changed-From-To: patched->closed 
State-Changed-By: avg 
State-Changed-When: Sat Apr 2 08:23:17 UTC 2011 
State-Changed-Why:  
I think that this has been actually resolved for all branches 
where active ZFS development/maintenance takes place. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=138790 
>Unformatted:
