From juhis@chernobyl.jmtilli.iki.fi  Sat May  8 20:51:24 2010
Return-Path: <juhis@chernobyl.jmtilli.iki.fi>
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 1682A106566B
	for <FreeBSD-gnats-submit@freebsd.org>; Sat,  8 May 2010 20:51:24 +0000 (UTC)
	(envelope-from juhis@chernobyl.jmtilli.iki.fi)
Received: from smtp03.tky.fi (smtp03.tky.fi [82.130.63.73])
	by mx1.freebsd.org (Postfix) with SMTP id 944E48FC17
	for <FreeBSD-gnats-submit@freebsd.org>; Sat,  8 May 2010 20:51:23 +0000 (UTC)
Received: from manmutt.jmtilli.iki.fi ([82.130.23.70])
 by smtp03.tky.fi (SMSSMTP 4.1.9.35) with SMTP id M2010050823343506652
 for <FreeBSD-gnats-submit@freebsd.org>; Sat, 08 May 2010 23:34:35 +0300
Received: from chernobyl.jmtilli.iki.fi (chernobyl.jmtilli.iki.fi [172.16.0.7])
	by manmutt.jmtilli.iki.fi (Postfix) with ESMTP id 0EB62C
	for <FreeBSD-gnats-submit@freebsd.org>; Sat,  8 May 2010 23:34:36 +0300 (EEST)
Received: by chernobyl.jmtilli.iki.fi (Postfix, from userid 2001)
	id C065497; Sat,  8 May 2010 23:34:35 +0300 (EEST)
Message-Id: <20100508203435.C065497@chernobyl.jmtilli.iki.fi>
Date: Sat,  8 May 2010 23:34:35 +0300 (EEST)
From: Juha-Matti Tilli <jtilli@cc.hut.fi>
Reply-To: Juha-Matti Tilli <jtilli@cc.hut.fi>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: [PATCH] bad file copy performance from UFS to ZFS
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         146410
>Category:       kern
>Synopsis:       [zfs] [patch] bad file copy performance from UFS to ZFS
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    pjd
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat May 08 21:00:10 UTC 2010
>Closed-Date:    
>Last-Modified:  Tue Feb  8 18:00:27 UTC 2011
>Originator:     Juha-Matti Tilli
>Release:        FreeBSD 8.0-STABLE amd64
>Organization:
>Environment:
System: FreeBSD chernobyl.jmtilli.iki.fi 8.0-STABLE FreeBSD 8.0-STABLE #0: Sat May 8 18:10:57 EEST 2010 root@chernobyl:/usr/obj/usr/src/sys/CHERNOBYL amd64

% dmesg|grep ' MB'   
real memory  = 4294967296 (4096 MB)
avail memory = 3829776384 (3652 MB)

In loader.conf:
vfs.zfs.arc_min=671088640
vfs.zfs.arc_max=2684354560
vfs.zfs.prefetch_disable=1
vm.kmem_size=3221225472

>Description:

When copying files from UFS to ZFS, throughput quickly drops to a small
fraction of the disks' transfer rate. The problem seems to be that caching of
files on UFS reduces the available memory so much that ZFS shrinks the ARC to
its minimum size and starts to throttle writes.

>How-To-Repeat:

Mount a ZFS filesystem and a UFS filesystem. Copy large files from UFS to ZFS.
Wait until top(1) shows that there is little memory free and that
kstat.zfs.misc.arcstats.size has decreased to vfs.zfs.arc_min. Observe the bad
performance and the continuous rapid increase of
kstat.zfs.misc.arcstats.memory_throttle_count.

>Fix:

I've managed to get better performance by modifying arc_memory_throttle to
count v_cache_count in addition to v_free_count towards the number of available
pages, and by increasing vm.v_cache_min and vm.v_cache_max so that more cached
file data is accounted in v_cache_count instead of v_inactive_count.

Here's a patch to sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:

--- arc.c.orig  2010-05-08 17:53:38.343964308 +0300
+++ arc.c       2010-05-08 17:57:34.756952644 +0300
@@ -3516,7 +3516,8 @@
 {
 #ifdef _KERNEL
        uint64_t inflight_data = arc_anon->arcs_size;
-       uint64_t available_memory = ptoa((uintmax_t)cnt.v_free_count);
+       uint64_t available_memory = ptoa((uintmax_t)cnt.v_free_count +
+                                        (uintmax_t)cnt.v_cache_count);
        static uint64_t page_load = 0;
        static uint64_t last_txg = 0;

With this patch, vfs.zfs.arc_min=671088640, vm.v_cache_min=300000 and
vm.v_cache_max=400000, performance is adequate and I haven't seen a single
increase of kstat.zfs.misc.arcstats.memory_throttle_count.

This still doesn't solve the problem that ZFS decreases ARC size when reading
from UFS. One way to fix that would be to implement a configurable maximum
limit for the amount of cached file data for UFS, like vfs.zfs.arc_max for ZFS.

Another way to prevent ARC from decreasing is increasing vfs.zfs.arc_min when
copying data from a UFS partition and decreasing it to the original value after
all data is copied. However, vfs.zfs.arc_min can't be changed with sysctl, so
that approach requires two reboots. Perhaps it should be made possible to
change vfs.zfs.arc_min without requiring a reboot?
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-fs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Sun May 9 22:08:29 UTC 2010 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=146410 
Responsible-Changed-From-To: freebsd-fs->pjd 
Responsible-Changed-By: pjd 
Responsible-Changed-When: Mon May 10 10:03:51 UTC 2010 
Responsible-Changed-Why:  
I'll take this one. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=146410 

From: Peter Jeremy <peter.jeremy@alcatel-lucent.com>
To: bug-followup@freebsd.org, Juha-Matti Tilli <jtilli@cc.hut.fi>
Cc:  
Subject: Re: kern/146410: [zfs] [patch] bad file copy performance from UFS
 to ZFS
Date: Mon, 19 Jul 2010 10:19:35 +1000

 --kA1LkgxZ0NN7Mz3A
 Content-Type: multipart/mixed; boundary="11Y7aswkeuHtSBEs"
 Content-Disposition: inline
 
 
 --11Y7aswkeuHtSBEs
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 Content-Transfer-Encoding: quoted-printable
 
 Just reading or writing UFS files is sufficient to trigger this
 problem.  Accessing large UFS files via mmap(2) seems to be worse than
 using read(2)/write(2).
 
 I have been using the attached patch for some time and have recently
 done some stress testing - which showed that it survived when the
 stock ZFS wedged (even with r210214).  It is based on a patch written
 by Artem Belevich <fbsdlist@src.cx> (see http://pastebin.com/ZCkzkWcs).
 I am not convinced that this is the correct fix but it appears to be
 a reasonable work-around.
 
 -- 
 Peter Jeremy
 
 --11Y7aswkeuHtSBEs
 Content-Type: text/x-diff; charset=us-ascii
 Content-Disposition: attachment; filename="arc.patch"
 Content-Transfer-Encoding: quoted-printable
 
 Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
 ===================================================================
 RCS file: /usr/ncvs/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c,v
 retrieving revision 1.22.2.6
 diff -u -r1.22.2.6 arc.c
 --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	24 May 2010 20:09:40 -0000	1.22.2.6
 +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	12 Jul 2010 09:21:31 -0000
 @@ -183,10 +183,15 @@
  int zfs_arc_shrink_shift = 0;
  int zfs_arc_p_min_shift = 0;
  
 +uint64_t zfs_arc_bp_active;
 +uint64_t zfs_arc_bp_inactive;
 +
  TUNABLE_QUAD("vfs.zfs.arc_max", &zfs_arc_max);
  TUNABLE_QUAD("vfs.zfs.arc_min", &zfs_arc_min);
  TUNABLE_QUAD("vfs.zfs.arc_meta_limit", &zfs_arc_meta_limit);
  TUNABLE_INT("vfs.zfs.mdcomp_disable", &zfs_mdcomp_disable);
 +TUNABLE_QUAD("vfs.zfs.arc_bp_active", &zfs_arc_bp_active);
 +TUNABLE_QUAD("vfs.zfs.arc_bp_inactive", &zfs_arc_bp_inactive);
  SYSCTL_DECL(_vfs_zfs);
  SYSCTL_QUAD(_vfs_zfs, OID_AUTO, arc_max, CTLFLAG_RDTUN, &zfs_arc_max, 0,
      "Maximum ARC size");
 @@ -195,6 +200,11 @@
  SYSCTL_INT(_vfs_zfs, OID_AUTO, mdcomp_disable, CTLFLAG_RDTUN,
      &zfs_mdcomp_disable, 0, "Disable metadata compression");
  
 +SYSCTL_QUAD(_vfs_zfs, OID_AUTO, arc_bp_active, CTLFLAG_RW|CTLFLAG_TUN, &zfs_arc_bp_active, 0,
 +    "Start ARC backpressure if active memory is below this limit");
 +SYSCTL_QUAD(_vfs_zfs, OID_AUTO, arc_bp_inactive, CTLFLAG_RW|CTLFLAG_TUN, &zfs_arc_bp_inactive, 0,
 +    "Start ARC backpressure if inactive memory is below this limit");
 +
  /*
   * Note that buffers can be in one of 6 states:
   *	ARC_anon	- anonymous (discussed below)
 @@ -2103,7 +2113,6 @@
  }
  
  static int needfree = 0;
 -
  static int
  arc_reclaim_needed(void)
  {
 @@ -2112,20 +2121,58 @@
  #endif
  
  #ifdef _KERNEL
 -	if (needfree)
 -		return (1);
 +	/* We've grown too much, */
  	if (arc_size > arc_c_max)
  		return (1);
 +
 +	/* Pagedaemon is stuck, let's free something right away */
 +	if (vm_pageout_pages_needed)
 +		return 1;
 +
 +	/* Check if inactive list have grown too much */
 +	if ( zfs_arc_bp_inactive
 +	     && (ptoa((uintmax_t)cnt.v_inactive_count) > zfs_arc_bp_inactive)) {
 +		/* tell pager to reap 1/2th of inactive queue*/
 +		atomic_add_int(&vm_pageout_deficit, cnt.v_inactive_count/2);
 +		pagedaemon_wakeup();
 +		return needfree;
 +	}
 +
 +	/* Same for active list... */
 +	if ( zfs_arc_bp_active
 +	     && (ptoa((uintmax_t)cnt.v_active_count) > zfs_arc_bp_active)) {
 +		atomic_add_int(&vm_pageout_deficit, cnt.v_active_count/2);
 +		pagedaemon_wakeup();
 +		return needfree;
 +	}
 +
 +	
 +	/* Old style behavior -- ARC gives up memory whenever page daemon asks.. */
 +	if (needfree)
 +		return 1;
 +
 +	/*
 +	  We got here either because active/inactive lists are
 +	  getting short or because we've been called during voluntary
 +	  ARC size checks. Kind of gray area...
 +	*/
 +
 +	/* If we didn't reach our minimum yet, don't rush to give memory up..*/
  	if (arc_size <= arc_c_min)
  		return (0);
  
 +	/* If we're really short on memory now, give it up. */
 +	if (vm_page_count_min()) {
 +		return (1);
 +	}
 +	
  	/*
 -	 * If pages are needed or we're within 2048 pages
 -	 * of needing to page need to reclaim
 +	 * If we're within 2048 pages of pagedaemon start, reclaim...
  	 */
 -	if (vm_pages_needed || (vm_paging_target() > -2048))
 +	if (vm_pages_needed && (vm_paging_target() > -2048))
  		return (1);
  
 +
  #if 0
  	/*
  	 * take 'desfree' extra pages, so we reclaim sooner, rather than later
 @@ -2169,8 +2216,6 @@
  		return (1);
  #endif
  #else
 -	if (kmem_used() > (kmem_size() * 3) / 4)
 -		return (1);
  #endif
  
  #else
 @@ -2279,7 +2324,7 @@
  		if (arc_eviction_list != NULL)
  			arc_do_user_evicts();
  
 -		if (arc_reclaim_needed()) {
 +		if (needfree) {
  			needfree = 0;
  #ifdef _KERNEL
  			wakeup(&needfree);
 @@ -3611,10 +3656,17 @@
  {
  #ifdef _KERNEL
  	uint64_t inflight_data = arc_anon->arcs_size;
 -	uint64_t available_memory = ptoa((uintmax_t)cnt.v_free_count);
 +	uint64_t available_memory;
  	static uint64_t page_load = 0;
  	static uint64_t last_txg = 0;
  
 +        /* How much memory is potentially available */
 +	available_memory = (uint64_t)cnt.v_free_count + cnt.v_cache_count;
 +	if (available_memory > cnt.v_free_min)
 +		available_memory = ptoa(available_memory - cnt.v_free_min);
 +	else
 +		available_memory = 0;
 +
  #if 0
  #if defined(__i386)
  	available_memory =
 
 --11Y7aswkeuHtSBEs--
 
 --kA1LkgxZ0NN7Mz3A
 Content-Type: application/pgp-signature
 Content-Disposition: inline
 
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.15 (FreeBSD)
 
 iEYEARECAAYFAkxDmhcACgkQ/opHv/APuIfROwCfbVD5XHjDa9ov9jvU8eCITRP8
 0y4AoMFiyPyD29878yJ6xfZtF7+tlpkD
 =CRd/
 -----END PGP SIGNATURE-----
 
 --kA1LkgxZ0NN7Mz3A--

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/146410: commit references a PR
Date: Fri, 17 Sep 2010 07:14:15 +0000 (UTC)

 Author: avg
 Date: Fri Sep 17 07:14:07 2010
 New Revision: 212780
 URL: http://svn.freebsd.org/changeset/base/212780
 
 Log:
   zfs arc_reclaim_needed: more reasonable threshold for available pages
   
   vm_paging_target() is not a trigger of any kind for pageademon, but
   rather a "soft" target for it when it's already triggered.
   Thus, trying to keep 2048 pages above that level at the expense of ARC
   was simply driving ARC size into the ground even with normal memory
   loads.
   Instead, use a threshold at which a pagedaemon scan is triggered, so
   that ARC reclaiming helps with pagedaemon's task, but the latter still
   recycles active and inactive pages.
   
   PR:		kern/146410, kern/138790
   MFC after:	3 weeks
 
 Modified:
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
 
 Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
 ==============================================================================
 --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Fri Sep 17 04:55:01 2010	(r212779)
 +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Fri Sep 17 07:14:07 2010	(r212780)
 @@ -2161,10 +2161,10 @@ arc_reclaim_needed(void)
  		return (0);
  
  	/*
 -	 * If pages are needed or we're within 2048 pages
 -	 * of needing to page need to reclaim
 +	 * Cooperate with pagedaemon when it's time for it to scan
 +	 * and reclaim some pages.
  	 */
 -	if (vm_pages_needed || (vm_paging_target() > -2048))
 +	if (vm_paging_need())
  		return (1);
  
  #if 0
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/146410: commit references a PR
Date: Fri, 17 Sep 2010 07:34:57 +0000 (UTC)

 Author: avg
 Date: Fri Sep 17 07:34:50 2010
 New Revision: 212783
 URL: http://svn.freebsd.org/changeset/base/212783
 
 Log:
   zfs arc_reclaim_needed: fix typo in mismerge in r212780
   
   PR:		kern/146410, kern/138790
   MFC after:	3 weeks
   X-MFC with:	r212780
 
 Modified:
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
 
 Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
 ==============================================================================
 --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Fri Sep 17 07:20:20 2010	(r212782)
 +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Fri Sep 17 07:34:50 2010	(r212783)
 @@ -2160,7 +2160,7 @@ arc_reclaim_needed(void)
  	 * Cooperate with pagedaemon when it's time for it to scan
  	 * and reclaim some pages.
  	 */
 -	if (vm_paging_need())
 +	if (vm_paging_needed())
  		return (1);
  
  #if 0
 

From: "Chris" <chris@chrysalisnet.org>
To: <bug-followup@FreeBSD.org>,
	<jtilli@cc.hut.fi>
Cc:  
Subject: Re: kern/146410: [zfs] [patch] bad file copy performance from UFS to ZFS
Date: Tue, 8 Feb 2011 17:15:34 -0000

 Hi, is there any progress on this?
 
 I am using 8.2-RC3, and FreeBSD has the exact same problem many months later.
 
 I was reading from UFS backups, and the UFS cache grew unchecked to over 7 GB
 of RAM on a 12 GB machine, with ZFS reducing its cache to make room.
 
 I agree with the original reporter that ZFS tunables shouldn't need a reboot
 and, more importantly, that UFS needs a sysctl knob to throttle its cache.
 
 I am considering setting a minimum cache size for ZFS, but I fear that UFS
 still won't hold back and I will find many gigabytes of memory swapped out.
 
>Unformatted:
