From nobody@FreeBSD.org  Fri May 16 16:07:44 2014
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
	(using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by hub.freebsd.org (Postfix) with ESMTPS id 83F93FCF
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 16 May 2014 16:07:44 +0000 (UTC)
Received: from cgiserv.freebsd.org (cgiserv.freebsd.org [IPv6:2001:1900:2254:206a::50:4])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client did not present a certificate)
	by mx1.freebsd.org (Postfix) with ESMTPS id 574F62EE5
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 16 May 2014 16:07:44 +0000 (UTC)
Received: from cgiserv.freebsd.org ([127.0.1.6])
	by cgiserv.freebsd.org (8.14.8/8.14.8) with ESMTP id s4GG7hE0098491
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 16 May 2014 16:07:43 GMT
	(envelope-from nobody@cgiserv.freebsd.org)
Received: (from nobody@localhost)
	by cgiserv.freebsd.org (8.14.8/8.14.8/Submit) id s4GG7hY1098490;
	Fri, 16 May 2014 16:07:43 GMT
	(envelope-from nobody)
Message-Id: <201405161607.s4GG7hY1098490@cgiserv.freebsd.org>
Date: Fri, 16 May 2014 16:07:43 GMT
From: Nathaniel Filardo <nwf@cs.jhu.edu>
To: freebsd-gnats-submit@FreeBSD.org
Subject: zfs_dirty_data_max{,_max,_percent} not exported as loader tunables
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         189865
>Category:       kern
>Synopsis:       [zfs] [patch] zfs_dirty_data_max{,_max,_percent} not exported as loader tunables
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    smh
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri May 16 16:10:01 UTC 2014
>Closed-Date:    
>Last-Modified:  Wed May 21 13:40:00 UTC 2014
>Originator:     Nathaniel Filardo
>Release:        9.2-STABLE
>Organization:
IETFNG.org
>Environment:
FreeBSD hydra.priv.oc.ietfng.org 9.2-STABLE FreeBSD 9.2-STABLE #132 9089443-dirty: Wed Apr 30 22:02:57 EDT 2014     root@hydra.priv.oc.ietfng.org:/usr/obj/systank/src-git/sys/NWFKERN  sparc64

>Description:
On machines with gobs of RAM, zfs_dirty_data_max defaults to zfs_dirty_data_max_percent (i.e. 10%) of physical memory, capped at zfs_dirty_data_max_max (i.e. 4G). A txg that large may take tens of minutes to sync to disk, especially if the data is spread out across the disk, and during that time any program that attempts to write to disk eventually stalls, because at most three txgs can be pending. It would be nice to be able to limit transactions to something smaller so that these latency spikes go away.
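As a rough illustration of the sizing rule just described, the C sketch below computes the default limit. The helper name default_dirty_data_max() is hypothetical (it is not the ZFS source); the 10% and 4G defaults are the ones quoted above.

```c
#include <stdint.h>

/*
 * Hypothetical helper: default dirty-data limit as described in this
 * report, i.e. max_percent of physical memory, capped at max_max.
 * Not the actual ZFS code; assumes physmem_bytes * max_percent fits
 * in 64 bits, which holds for any realistic amount of RAM.
 */
static uint64_t
default_dirty_data_max(uint64_t physmem_bytes, int max_percent,
    uint64_t max_max)
{
	uint64_t limit = physmem_bytes * max_percent / 100;

	/* The cap only matters once max_percent of RAM exceeds it. */
	return (limit < max_max ? limit : max_max);
}
```

For example, with 64G of RAM 10% would be 6.4G, so the 4G cap wins; on such a machine only lowering zfs_dirty_data_max_max (or setting the limit directly) shrinks each txg.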
>How-To-Repeat:

>Fix:
I think something like the following (compiled but untested) patch would do the trick?

diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
index 9fe1961..f5aa000 100644
--- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
@@ -132,6 +132,22 @@ uint64_t zfs_delay_scale = 1000 * 1000 * 1000 / 2000;
 
 
 SYSCTL_DECL(_vfs_zfs);
+
+TUNABLE_INT("vfs.zfs.dirty_data_max_percent", &zfs_dirty_data_max_percent);
+SYSCTL_INT(_vfs_zfs, OID_AUTO, dirty_data_max_percent, CTLFLAG_RDTUN,
+    &zfs_dirty_data_max_percent, 0,
+    "Maximum percent of physical memory allocated to dirty data");
+
+TUNABLE_QUAD("vfs.zfs.dirty_data_max", &zfs_dirty_data_max);
+SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, dirty_data_max, CTLFLAG_RDTUN,
+    &zfs_dirty_data_max, 0,
+    "Force a txg if dirty buffers exceed this value (bytes)");
+
+TUNABLE_QUAD("vfs.zfs.dirty_data_max_max", &zfs_dirty_data_max_max);
+SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, dirty_data_max_max, CTLFLAG_RDTUN,
+    &zfs_dirty_data_max_max, 0,
+    "Limit dirty_data_max when using dirty_data_max_percent");
+
 #if 0
 TUNABLE_INT("vfs.zfs.no_write_throttle", &zfs_no_write_throttle);
 SYSCTL_INT(_vfs_zfs, OID_AUTO, no_write_throttle, CTLFLAG_RDTUN,


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-fs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Tue May 20 03:51:56 UTC 2014 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=189865 

From: "Steven Hartland" <killing@multiplay.co.uk>
To: <bug-followup@freebsd.org>,
	<nwf@cs.jhu.edu>
Cc:  
Subject: Re: kern/189865: [zfs] [patch] zfs_dirty_data_max{,_max,_percent} not exported as loader tunables
Date: Tue, 20 May 2014 09:38:04 +0100

 Exposing zfs_dirty_data_max directly doesn't make sense as it's
 a calculated value based on zfs_dirty_data_max_percent% of
 all memory and capped at zfs_dirty_data_max_max.
 
 Given this it could be limited via setting zfs_dirty_data_max_max.
 
 The following could be exposed:-
 zfs_dirty_data_max_max
 zfs_dirty_data_max_percent
 zfs_dirty_data_sync
 zfs_delay_min_dirty_percent
 zfs_delay_scale
 
 Would that fulfil your requirement?
 
     Regards
     Steve

From: Nathaniel W Filardo <nwf@cs.jhu.edu>
To: Steven Hartland <killing@multiplay.co.uk>
Cc: bug-followup@freebsd.org
Subject: Re: kern/189865: [zfs] [patch] zfs_dirty_data_max{,_max,_percent}
 not exported as loader tunables
Date: Wed, 21 May 2014 00:09:20 -0400

 On Tue, May 20, 2014 at 09:38:04AM +0100, Steven Hartland wrote:
 > Exposing zfs_dirty_data_max directly doesn't make sense as it's
 > a calculated value based on zfs_dirty_data_max_percent% of
 > all memory and capped at zfs_dirty_data_max_max.
 
 I'm pretty sure the intention is that it is computed that way only if not
 set already -- there's a comparison for == 0 before the value is assigned.
 See arc_init():
 http://fxr.watson.org/fxr/source/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c?im=excerpts#L4150
 
 And in the Old World, the zfs.write_limit_override was similarly exported to
 override the similar computation of zfs.write_limit_max.  That said, no,
 I don't really care too much about this particular tunable; I was just
 mirroring Solaris.
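A minimal sketch of the compute-only-if-unset behaviour described here, assuming arc_init() follows the Solaris pattern of leaving a non-zero zfs_dirty_data_max alone. effective_dirty_data_max() is a hypothetical helper, not the actual arc_init() code.

```c
#include <stdint.h>

/*
 * Hypothetical helper modelling the "== 0 means auto-calculate" rule:
 * a non-zero tunable is kept as-is, otherwise the limit is derived
 * from physical memory and capped at max_max.
 */
static uint64_t
effective_dirty_data_max(uint64_t tunable, uint64_t physmem_bytes,
    int max_percent, uint64_t max_max)
{
	uint64_t autosize;

	if (tunable != 0)
		return (tunable);	/* explicit override wins, uncapped */

	autosize = physmem_bytes * max_percent / 100;
	return (autosize < max_max ? autosize : max_max);
}
```

Under this reading, an explicit vfs.zfs.dirty_data_max is used as-is, and zfs_dirty_data_max_max only caps the auto-calculated value.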
 
 > Given this it could be limited via setting zfs_dirty_data_max_max.
 
 Sure.
 
 > The following could be exposed:-
 > zfs_dirty_data_max_max
 > zfs_dirty_data_max_percent
 > zfs_dirty_data_sync
 > zfs_delay_min_dirty_percent
 > zfs_delay_scale
 >
 > Would that fulfil your requirement?
 
 It's overkill for my case, but yes, those should probably all be exposed.
 
 Cheers,
 --nwf;
 
Responsible-Changed-From-To: freebsd-fs->smh 
Responsible-Changed-By: smh 
Responsible-Changed-When: Wed May 21 13:12:54 UTC 2014 
Responsible-Changed-Why:  
I'll take it. 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/189865: commit references a PR
Date: Wed, 21 May 2014 13:36:08 +0000 (UTC)

 Author: smh
 Date: Wed May 21 13:36:04 2014
 New Revision: 266497
 URL: http://svnweb.freebsd.org/changeset/base/266497
 
 Log:
   Added sysctls / tunables for ZFS dirty data tuning
   
   Added the following new sysctls / tunables:
   * vfs.zfs.dirty_data_max
   * vfs.zfs.dirty_data_max_max
   * vfs.zfs.dirty_data_max_percent
   * vfs.zfs.dirty_data_sync
   * vfs.zfs.delay_min_dirty_percent
   * vfs.zfs.delay_scale
   
   PR:		kern/189865
   MFC after:	2 weeks
 
 Modified:
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
 
 Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
 ==============================================================================
 --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c	Wed May 21 11:53:15 2014	(r266496)
 +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c	Wed May 21 13:36:04 2014	(r266497)
 @@ -46,6 +46,11 @@
  #include <sys/zil_impl.h>
  #include <sys/dsl_userhold.h>
  
 +#ifdef __FreeBSD__
 +#include <sys/sysctl.h>
 +#include <sys/types.h>
 +#endif
 +
  /*
   * ZFS Write Throttle
   * ------------------
 @@ -130,33 +135,83 @@ uint64_t zfs_delay_scale = 1000 * 1000 *
   * per-pool basis using zfs.conf.
   */
  
 +#ifdef __FreeBSD__
 +
 +extern int zfs_vdev_async_write_active_max_dirty_percent;
  
  SYSCTL_DECL(_vfs_zfs);
 -#if 0
 -TUNABLE_INT("vfs.zfs.no_write_throttle", &zfs_no_write_throttle);
 -SYSCTL_INT(_vfs_zfs, OID_AUTO, no_write_throttle, CTLFLAG_RDTUN,
 -    &zfs_no_write_throttle, 0, "");
 -TUNABLE_INT("vfs.zfs.write_limit_shift", &zfs_write_limit_shift);
 -SYSCTL_INT(_vfs_zfs, OID_AUTO, write_limit_shift, CTLFLAG_RDTUN,
 -    &zfs_write_limit_shift, 0, "2^N of physical memory");
 -SYSCTL_DECL(_vfs_zfs_txg);
 -TUNABLE_INT("vfs.zfs.txg.synctime_ms", &zfs_txg_synctime_ms);
 -SYSCTL_INT(_vfs_zfs_txg, OID_AUTO, synctime_ms, CTLFLAG_RDTUN,
 -    &zfs_txg_synctime_ms, 0, "Target milliseconds to sync a txg");
 -
 -TUNABLE_QUAD("vfs.zfs.write_limit_min", &zfs_write_limit_min);
 -SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, write_limit_min, CTLFLAG_RDTUN,
 -    &zfs_write_limit_min, 0, "Minimum write limit");
 -TUNABLE_QUAD("vfs.zfs.write_limit_max", &zfs_write_limit_max);
 -SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, write_limit_max, CTLFLAG_RDTUN,
 -    &zfs_write_limit_max, 0, "Maximum data payload per txg");
 -TUNABLE_QUAD("vfs.zfs.write_limit_inflated", &zfs_write_limit_inflated);
 -SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, write_limit_inflated, CTLFLAG_RDTUN,
 -    &zfs_write_limit_inflated, 0, "Maximum size of the dynamic write limit");
 -TUNABLE_QUAD("vfs.zfs.write_limit_override", &zfs_write_limit_override);
 -SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, write_limit_override, CTLFLAG_RDTUN,
 -    &zfs_write_limit_override, 0,
 -    "Force a txg if dirty buffers exceed this value (bytes)");
 +
 +TUNABLE_QUAD("vfs.zfs.dirty_data_max", &zfs_dirty_data_max);
 +SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, dirty_data_max, CTLFLAG_RWTUN,
 +    &zfs_dirty_data_max, 0,
 +    "The dirty space limit in bytes after which new writes are halted until "
 +    "space becomes available");
 +
 +TUNABLE_QUAD("vfs.zfs.dirty_data_max_max", &zfs_dirty_data_max_max);
 +SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, dirty_data_max_max, CTLFLAG_RDTUN,
 +    &zfs_dirty_data_max_max, 0,
 +    "The absolute cap on dirty_data_max when auto calculating");
 +
 +TUNABLE_INT("vfs.zfs.dirty_data_max_percent", &zfs_dirty_data_max_percent);
 +SYSCTL_INT(_vfs_zfs, OID_AUTO, dirty_data_max_percent, CTLFLAG_RDTUN,
 +    &zfs_dirty_data_max_percent, 0,
 +    "The percent of physical memory used to auto calculate dirty_data_max");
 +
 +TUNABLE_QUAD("vfs.zfs.dirty_data_sync", &zfs_dirty_data_sync);
 +SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, dirty_data_sync, CTLFLAG_RWTUN,
 +    &zfs_dirty_data_sync, 0,
 +    "Force a txg if the number of dirty buffer bytes exceeds this value");
 +
 +static int sysctl_zfs_delay_min_dirty_percent(SYSCTL_HANDLER_ARGS);
 +/* No zfs_delay_min_dirty_percent tunable due to limit requirements */
 +SYSCTL_PROC(_vfs_zfs, OID_AUTO, delay_min_dirty_percent,
 +    CTLTYPE_INT | CTLFLAG_MPSAFE | CTLFLAG_RW, 0, sizeof(int),
 +    sysctl_zfs_delay_min_dirty_percent, "I",
 +    "The limit of outstanding dirty data before transactions are delayed");
 +
 +static int sysctl_zfs_delay_scale(SYSCTL_HANDLER_ARGS);
 +/* No zfs_delay_scale tunable due to limit requirements */
 +SYSCTL_PROC(_vfs_zfs, OID_AUTO, delay_scale,
 +    CTLTYPE_U64 | CTLFLAG_MPSAFE | CTLFLAG_RW, 0, sizeof(uint64_t),
 +    sysctl_zfs_delay_scale, "QU",
 +    "Controls how quickly the delay approaches infinity");
 +
 +static int
 +sysctl_zfs_delay_min_dirty_percent(SYSCTL_HANDLER_ARGS)
 +{
 +	int val, err;
 +
 +	val = zfs_delay_min_dirty_percent;
 +	err = sysctl_handle_int(oidp, &val, 0, req);
 +	if (err != 0 || req->newptr == NULL)
 +		return (err);
 +
 +	if (val < zfs_vdev_async_write_active_max_dirty_percent)
 +		return (EINVAL);
 +
 +	zfs_delay_min_dirty_percent = val;
 +
 +	return (0);
 +}
 +
 +static int
 +sysctl_zfs_delay_scale(SYSCTL_HANDLER_ARGS)
 +{
 +	uint64_t val;
 +	int err;
 +
 +	val = zfs_delay_scale;
 +	err = sysctl_handle_64(oidp, &val, 0, req);
 +	if (err != 0 || req->newptr == NULL)
 +		return (err);
 +
 +	if (val > UINT64_MAX / zfs_dirty_data_max)
 +		return (EINVAL);
 +
 +	zfs_delay_scale = val;
 +
 +	return (0);
 +}
  #endif
  
  hrtime_t zfs_throttle_delay = MSEC2NSEC(10);
 
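To show why the two SYSCTL_PROC handlers in the commit validate their input, here is a C sketch of the write-throttle delay they feed. The delay formula is an assumption taken from the ZFS write throttle (dmu_tx_delay() in dmu_tx.c), not from this PR, and it omits any maximum-delay cap the real code applies; all function names below are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Assumed write-throttle curve: no delay until dirty data exceeds
 * min_percent of dirty_max, then
 *     scale * (dirty - min_bytes) / (dirty_max - dirty)  nanoseconds,
 * which grows without bound as dirty approaches dirty_max.
 */
static uint64_t
throttle_delay_ns(uint64_t dirty, uint64_t dirty_max, int min_percent,
    uint64_t scale)
{
	uint64_t min_bytes = dirty_max * min_percent / 100;

	if (dirty <= min_bytes)
		return (0);		/* below the threshold: no delay */
	if (dirty >= dirty_max)
		return (UINT64_MAX);	/* sketch only: "infinite" delay */
	/*
	 * dirty - min_bytes < dirty_max, so this product cannot overflow
	 * provided scale <= UINT64_MAX / dirty_max (checked below).
	 */
	return (scale * (dirty - min_bytes) / (dirty_max - dirty));
}

/* Same test as sysctl_zfs_delay_scale() in the commit. */
static int
check_delay_scale(uint64_t val, uint64_t dirty_max)
{
	return (val > UINT64_MAX / dirty_max ? EINVAL : 0);
}

/*
 * Same test as sysctl_zfs_delay_min_dirty_percent(): presumably so that
 * transactions are not delayed before async writes run at full depth.
 */
static int
check_delay_min_dirty_percent(int val, int async_write_max_percent)
{
	return (val < async_write_max_percent ? EINVAL : 0);
}
```

Because (dirty - min_bytes) is always smaller than dirty_data_max, requiring delay_scale <= UINT64_MAX / dirty_data_max keeps the multiplication from overflowing. At the midpoint between the delay threshold and dirty_data_max the delay equals delay_scale, which is one way to read "Controls how quickly the delay approaches infinity".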
>Unformatted:
