From mm@mail2.vx.sk  Wed Apr 28 09:47:01 2010
Return-Path: <mm@mail2.vx.sk>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 972CD1065670
	for <FreeBSD-gnats-submit@freebsd.org>; Wed, 28 Apr 2010 09:47:01 +0000 (UTC)
	(envelope-from mm@mail2.vx.sk)
Received: from mail2.vx.sk (neo.vx.sk [188.40.111.84])
	by mx1.freebsd.org (Postfix) with ESMTP id 595708FC26
	for <FreeBSD-gnats-submit@freebsd.org>; Wed, 28 Apr 2010 09:47:01 +0000 (UTC)
Received: from neo.vx.sk (localhost [127.0.0.1])
	by mail2.vx.sk (Postfix) with ESMTP id 607913A0CE
	for <FreeBSD-gnats-submit@freebsd.org>; Wed, 28 Apr 2010 11:47:00 +0200 (CEST)
Received: from mail2.vx.sk ([127.0.0.1])
	by neo.vx.sk (neo.vx.sk [127.0.0.1]) (amavisd-new, port 10024)
	with LMTP id w2Gv9etI0NBS for <FreeBSD-gnats-submit@freebsd.org>;
	Wed, 28 Apr 2010 11:46:55 +0200 (CEST)
Received: by mail2.vx.sk (Postfix, from userid 1001)
	id 869FA3A0C5; Wed, 28 Apr 2010 11:46:55 +0200 (CEST)
Message-Id: <20100428094655.869FA3A0C5@mail2.vx.sk>
Date: Wed, 28 Apr 2010 11:46:55 +0200 (CEST)
From: Martin Matuska <mm@FreeBSD.org>
Reply-To: Martin Matuska <mm@FreeBSD.org>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: [zfs] [patch] write throttling bugfix (onnv 9366)
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         146108
>Category:       kern
>Synopsis:       [zfs] [patch] write throttling bugfix (onnv 9366)
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    mm
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Wed Apr 28 09:50:04 UTC 2010
>Closed-Date:    Fri May 14 09:27:37 UTC 2010
>Last-Modified:  Sat May 15 07:10:04 UTC 2010
>Originator:     Martin Matuska
>Release:        FreeBSD 8.0-STABLE amd64
>Organization:
>Environment:
System: FreeBSD neo.vx.sk 8.0-STABLE FreeBSD 8.0-STABLE #10 r207271M: Tue Apr 27 21:36:49 CEST 2010 root@neo.vx.sk:/usr/obj/stable/sys/NEO amd64
>Description:
- fix improper pool write throughput calculation [1]
- make vfs.zfs.write_limit_override a tunable (as in OpenSolaris)

The tunable is specified in bytes and allows fine-tuning of ZFS write bursts (a trade-off between write stalls and lower write throughput).
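
As a rough illustration of the override semantics (0 is the initial value in
the patch below, and the committed sysctl description says a value of 0 means
"don't override"), the selection can be sketched as follows. The helper name
effective_write_limit() and its computed_limit parameter are hypothetical and
do not appear in the ZFS sources; this is a sketch of the idea, not the
in-kernel logic.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the ZFS sources): return the txg size
 * limit, in bytes, that would apply for a given override value.
 * An override of 0 is assumed to mean "use the computed limit".
 */
uint64_t
effective_write_limit(uint64_t override_bytes, uint64_t computed_limit)
{
	return (override_bytes != 0 ? override_bytes : computed_limit);
}
```

Under these assumptions, effective_write_limit(0, limit) returns limit
unchanged, while any non-zero override replaces it.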

Discussed with and approved by: pjd@

MFC suggestion: 2 weeks

Sources:
OpenSolaris bug-id: 6817339 [1]
Onnv revision: 9366 [1]
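
Why the position of the start timestamp matters for fix [1] can be sketched
with a small stand-alone calculation. This is an illustration only: the helper
sync_throughput() is hypothetical, NANOSEC is defined locally for the example,
and the real calculation in dsl_pool.c may differ in detail. The assumption is
that throughput is derived as bytes written divided by write_time.

```c
#include <assert.h>
#include <stdint.h>

#define	NANOSEC	1000000000ULL	/* defined locally for this sketch */

/*
 * Hypothetical helper: bytes per second for a sync that wrote
 * bytes_written bytes in elapsed_ns nanoseconds.  The multiplication
 * can overflow for very large byte counts; acceptable for a sketch.
 */
uint64_t
sync_throughput(uint64_t bytes_written, uint64_t elapsed_ns)
{
	assert(elapsed_ns > 0);
	return (bytes_written * NANOSEC / elapsed_ns);
}
```

For example, if 512 MiB are written, issuing the dataset syncs takes 3 s and
zio_wait() takes another 1 s, then timing only zio_wait() gives
sync_throughput(536870912, 1 * NANOSEC), i.e. 536870912 bytes/s, whereas
timing the whole interval gives sync_throughput(536870912, 4 * NANOSEC), i.e.
134217728 bytes/s. Starting the clock late therefore overestimates the pool
throughput in this scenario.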

>How-To-Repeat:
>Fix:
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c	(revision 207314)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c	(working copy)
@@ -47,6 +47,11 @@
 uint64_t zfs_write_limit_override = 0;
 extern uint64_t zfs_write_limit_min;
 
+SYSCTL_DECL(_vfs_zfs);
+TUNABLE_ULONG("vfs.zfs.write_limit_override", &zfs_write_limit_override);
+SYSCTL_ULONG(_vfs_zfs, OID_AUTO, write_limit_override, CTLFLAG_RW, &zfs_write_limit_override, 0,
+	"Override maximum TXG size");
+
 kmutex_t zfs_write_limit_lock;
 
 static pgcnt_t old_physmem = 0;
@@ -300,6 +305,7 @@
 	tx = dmu_tx_create_assigned(dp, txg);
 
 	dp->dp_read_overhead = 0;
+	start = gethrtime();
 	zio = zio_root(dp->dp_spa, NULL, NULL, ZIO_FLAG_MUSTSUCCEED);
 	while (ds = txg_list_remove(&dp->dp_dirty_datasets, txg)) {
 		if (!list_link_active(&ds->ds_synced_link))
@@ -310,7 +316,6 @@
 	}
 	DTRACE_PROBE(pool_sync__1setup);
 
-	start = gethrtime();
 	err = zio_wait(zio);
 	write_time = gethrtime() - start;
 	ASSERT(err == 0);
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->mm 
Responsible-Changed-By: delphij 
Responsible-Changed-When: Thu Apr 29 21:11:59 UTC 2010 
Responsible-Changed-Why:  
Submitter is now a src/ committer, and this has been approved by pjd@.

http://www.freebsd.org/cgi/query-pr.cgi?pr=146108 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/146108: commit references a PR
Date: Fri, 30 Apr 2010 07:48:45 +0000 (UTC)

 Author: mm
 Date: Fri Apr 30 07:48:29 2010
 New Revision: 207427
 URL: http://svn.freebsd.org/changeset/base/207427
 
 Log:
   Fix improper pool write throughput calculation.
   
   OpenSolaris onnv revision:	9366:17553395a745
   
   PR:		kern/146108
   Approved by:	pjd, delphij (mentor)
   Obtained from:	OpenSolaris, Bug ID 6817339
   MFC after:	2 weeks
 
 Modified:
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
 
 Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
 ==============================================================================
 --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c	Fri Apr 30 07:09:13 2010	(r207426)
 +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c	Fri Apr 30 07:48:29 2010	(r207427)
 @@ -300,6 +300,7 @@ dsl_pool_sync(dsl_pool_t *dp, uint64_t t
  	tx = dmu_tx_create_assigned(dp, txg);
  
  	dp->dp_read_overhead = 0;
 +	start = gethrtime();
  	zio = zio_root(dp->dp_spa, NULL, NULL, ZIO_FLAG_MUSTSUCCEED);
  	while (ds = txg_list_remove(&dp->dp_dirty_datasets, txg)) {
  		if (!list_link_active(&ds->ds_synced_link))
 @@ -310,7 +311,6 @@ dsl_pool_sync(dsl_pool_t *dp, uint64_t t
  	}
  	DTRACE_PROBE(pool_sync__1setup);
  
 -	start = gethrtime();
  	err = zio_wait(zio);
  	write_time = gethrtime() - start;
  	ASSERT(err == 0);
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 

From: Martin Matuska <mm@FreeBSD.org>
To: bug-followup@FreeBSD.org, mm@FreeBSD.org
Cc:  
Subject: Re: kern/146108: [zfs] [patch] write throttling bugfix (onnv 9366)
Date: Fri, 30 Apr 2010 16:38:36 +0200

 This is a multi-part message in MIME format.
 --------------000907090904000208010804
 Content-Type: text/plain; charset=windows-1250
 Content-Transfer-Encoding: 7bit
 
 I am updating this PR with a new version of the sysctl patch. The sysctl
 now lives in txg.c (where it belongs) and its description is easier to
 understand.
 
 I would also recommend changing the description of the vfs.zfs.txg group
 from "ZFS TXG" to "ZFS transaction groups (TXG)".
 
 --------------000907090904000208010804
 Content-Type: text/plain;
  name="head-writelimit.patch"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: attachment;
  filename="head-writelimit.patch"
 
 Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c
 ===================================================================
 --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c	(revision 207433)
 +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c	(working copy)
 @@ -38,6 +38,7 @@
  
  int zfs_txg_timeout = 30;	/* max seconds worth of delta per txg */
  extern int zfs_txg_synctime;
 +extern uint64_t zfs_write_limit_override;
  
  SYSCTL_DECL(_vfs_zfs);
  SYSCTL_NODE(_vfs_zfs, OID_AUTO, txg, CTLFLAG_RW, 0, "ZFS TXG");
 @@ -47,6 +48,9 @@
  TUNABLE_INT("vfs.zfs.txg.synctime", &zfs_txg_synctime);
  SYSCTL_INT(_vfs_zfs_txg, OID_AUTO, synctime, CTLFLAG_RDTUN, &zfs_txg_synctime,
      0, "Target seconds to sync a txg");
 +TUNABLE_ULONG("vfs.zfs.txg.write_limit_override", &zfs_write_limit_override);
 +SYSCTL_ULONG(_vfs_zfs_txg, OID_AUTO, write_limit_override, CTLFLAG_RW,
 +    &zfs_write_limit_override, 0, "Override maximum size of a txg");
  
  /*
   * Prepare the txg subsystem.
 
 --------------000907090904000208010804--

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/146108: commit references a PR
Date: Sat,  1 May 2010 20:44:55 +0000 (UTC)

 Author: mm
 Date: Sat May  1 20:44:37 2010
 New Revision: 207481
 URL: http://svn.freebsd.org/changeset/base/207481
 
 Log:
   Add sysctl and loader tunable vfs.zfs.txg.write_limit_override.
   This tunable improves fine-tuning of ZFS write throttling.
   
   PR:		kern/146108
   Suggested by:	Nikolay Denev <ndenev at gmail.com>
   Approved by:	pjd, delphij (mentor)
   MFC after:	2 weeks
 
 Modified:
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c
 
 Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c
 ==============================================================================
 --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c	Sat May  1 19:53:15 2010	(r207480)
 +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c	Sat May  1 20:44:37 2010	(r207481)
 @@ -38,6 +38,7 @@ static void txg_quiesce_thread(void *arg
  
  int zfs_txg_timeout = 30;	/* max seconds worth of delta per txg */
  extern int zfs_txg_synctime;
 +extern uint64_t zfs_write_limit_override;
  
  SYSCTL_DECL(_vfs_zfs);
  SYSCTL_NODE(_vfs_zfs, OID_AUTO, txg, CTLFLAG_RW, 0,
 @@ -48,6 +49,11 @@ SYSCTL_INT(_vfs_zfs_txg, OID_AUTO, timeo
  TUNABLE_INT("vfs.zfs.txg.synctime", &zfs_txg_synctime);
  SYSCTL_INT(_vfs_zfs_txg, OID_AUTO, synctime, CTLFLAG_RDTUN, &zfs_txg_synctime,
      0, "Target seconds to sync a txg");
 +TUNABLE_QUAD("vfs.zfs.txg.write_limit_override", &zfs_write_limit_override);
 +SYSCTL_QUAD(_vfs_zfs_txg, OID_AUTO, write_limit_override, CTLFLAG_RW,
 +    &zfs_write_limit_override, 0,
 +    "Override maximum size of a txg to this size in bytes, "
 +    "value of 0 means don't override");
  
  /*
   * Prepare the txg subsystem.
 
State-Changed-From-To: open->patched 
State-Changed-By: mm 
State-Changed-When: Sat May 1 22:01:02 UTC 2010 
State-Changed-Why:  
Patch pending for MFC. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=146108 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/146108: commit references a PR
Date: Fri, 14 May 2010 09:00:41 +0000 (UTC)

 Author: mm
 Date: Fri May 14 09:00:29 2010
 New Revision: 208062
 URL: http://svn.freebsd.org/changeset/base/208062
 
 Log:
   MFC r207427:
   
   Fix improper pool write throughput calculation.
   
   OpenSolaris onnv revision:	9366:17553395a745
   
   PR:		kern/146108
   Obtained from:	OpenSolaris (Bug ID 6817339)
   Approved by:	pjd, delphij (mentor)
 
 Modified:
   stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
 Directory Properties:
   stable/8/sys/   (props changed)
   stable/8/sys/amd64/include/xen/   (props changed)
   stable/8/sys/cddl/contrib/opensolaris/   (props changed)
   stable/8/sys/contrib/dev/acpica/   (props changed)
   stable/8/sys/contrib/pf/   (props changed)
   stable/8/sys/dev/xen/xenpci/   (props changed)
   stable/8/sys/geom/sched/   (props changed)
 
 Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
 ==============================================================================
 --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c	Fri May 14 08:56:07 2010	(r208061)
 +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c	Fri May 14 09:00:29 2010	(r208062)
 @@ -300,6 +300,7 @@ dsl_pool_sync(dsl_pool_t *dp, uint64_t t
  	tx = dmu_tx_create_assigned(dp, txg);
  
  	dp->dp_read_overhead = 0;
 +	start = gethrtime();
  	zio = zio_root(dp->dp_spa, NULL, NULL, ZIO_FLAG_MUSTSUCCEED);
  	while (ds = txg_list_remove(&dp->dp_dirty_datasets, txg)) {
  		if (!list_link_active(&ds->ds_synced_link))
 @@ -310,7 +311,6 @@ dsl_pool_sync(dsl_pool_t *dp, uint64_t t
  	}
  	DTRACE_PROBE(pool_sync__1setup);
  
 -	start = gethrtime();
  	err = zio_wait(zio);
  	write_time = gethrtime() - start;
  	ASSERT(err == 0);
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/146108: commit references a PR
Date: Fri, 14 May 2010 09:02:45 +0000 (UTC)

 Author: mm
 Date: Fri May 14 09:02:31 2010
 New Revision: 208063
 URL: http://svn.freebsd.org/changeset/base/208063
 
 Log:
   MFC r207427:
   
   Fix improper pool write throughput calculation.
   
   OpenSolaris onnv revision:      9366:17553395a745
   
   PR:		kern/146108
   Obtained from:	OpenSolaris (Bug ID 6817339)
   Approved by:	pjd, delphij (mentor)
 
 Modified:
   stable/7/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
 Directory Properties:
   stable/7/sys/   (props changed)
   stable/7/sys/cddl/contrib/opensolaris/   (props changed)
   stable/7/sys/contrib/dev/acpica/   (props changed)
   stable/7/sys/contrib/pf/   (props changed)
 
 Modified: stable/7/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
 ==============================================================================
 --- stable/7/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c	Fri May 14 09:00:29 2010	(r208062)
 +++ stable/7/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c	Fri May 14 09:02:31 2010	(r208063)
 @@ -300,6 +300,7 @@ dsl_pool_sync(dsl_pool_t *dp, uint64_t t
  	tx = dmu_tx_create_assigned(dp, txg);
  
  	dp->dp_read_overhead = 0;
 +	start = gethrtime();
  	zio = zio_root(dp->dp_spa, NULL, NULL, ZIO_FLAG_MUSTSUCCEED);
  	while (ds = txg_list_remove(&dp->dp_dirty_datasets, txg)) {
  		if (!list_link_active(&ds->ds_synced_link))
 @@ -310,7 +311,6 @@ dsl_pool_sync(dsl_pool_t *dp, uint64_t t
  	}
  	DTRACE_PROBE(pool_sync__1setup);
  
 -	start = gethrtime();
  	err = zio_wait(zio);
  	write_time = gethrtime() - start;
  	ASSERT(err == 0);
 
State-Changed-From-To: patched->closed 
State-Changed-By: mm 
State-Changed-When: Fri May 14 09:27:36 UTC 2010 
State-Changed-Why:  
Committed. Thanks! 

http://www.freebsd.org/cgi/query-pr.cgi?pr=146108 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/146108: commit references a PR
Date: Sat, 15 May 2010 07:07:53 +0000 (UTC)

 Author: mm
 Date: Sat May 15 07:07:38 2010
 New Revision: 208109
 URL: http://svn.freebsd.org/changeset/base/208109
 
 Log:
   MFC r207481, r207956:
   
   MFC r207481 [1]:
   Add sysctl and loader tunable vfs.zfs.txg.write_limit_override.
   This tunable improves fine-tuning of ZFS write throttling.
   
   MFC r207956 [2]:
   Fix possible hang when replaying large truncations.
   OpenSolaris onnv revision:	7904:6a124a4ca9c5
   
   PR:		kern/146108 [1]
   Suggested by:	Nikolay Denev <ndenev at gmail.com> [1]
   Obtained from:	OpenSolaris (Bug ID 6761624) [2]
   Approved by:	pjd, delphij (mentor)
 
 Modified:
   stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c
   stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c
 Directory Properties:
   stable/8/sys/   (props changed)
   stable/8/sys/amd64/include/xen/   (props changed)
   stable/8/sys/cddl/contrib/opensolaris/   (props changed)
   stable/8/sys/contrib/dev/acpica/   (props changed)
   stable/8/sys/contrib/pf/   (props changed)
   stable/8/sys/dev/xen/xenpci/   (props changed)
   stable/8/sys/geom/sched/   (props changed)
 
 Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c
 ==============================================================================
 --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c	Sat May 15 07:01:41 2010	(r208108)
 +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c	Sat May 15 07:07:38 2010	(r208109)
 @@ -38,6 +38,7 @@ static void txg_quiesce_thread(void *arg
  
  int zfs_txg_timeout = 30;	/* max seconds worth of delta per txg */
  extern int zfs_txg_synctime;
 +extern uint64_t zfs_write_limit_override;
  
  SYSCTL_DECL(_vfs_zfs);
  SYSCTL_NODE(_vfs_zfs, OID_AUTO, txg, CTLFLAG_RW, 0,
 @@ -48,6 +49,11 @@ SYSCTL_INT(_vfs_zfs_txg, OID_AUTO, timeo
  TUNABLE_INT("vfs.zfs.txg.synctime", &zfs_txg_synctime);
  SYSCTL_INT(_vfs_zfs_txg, OID_AUTO, synctime, CTLFLAG_RDTUN, &zfs_txg_synctime,
      0, "Target seconds to sync a txg");
 +TUNABLE_QUAD("vfs.zfs.txg.write_limit_override", &zfs_write_limit_override);
 +SYSCTL_QUAD(_vfs_zfs_txg, OID_AUTO, write_limit_override, CTLFLAG_RW,
 +    &zfs_write_limit_override, 0,
 +    "Override maximum size of a txg to this size in bytes, "
 +    "value of 0 means don't override");
  
  /*
   * Prepare the txg subsystem.
 
 Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c
 ==============================================================================
 --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c	Sat May 15 07:01:41 2010	(r208108)
 +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c	Sat May 15 07:07:38 2010	(r208109)
 @@ -1567,6 +1567,29 @@ zil_replay_log_record(zilog_t *zilog, lr
  	}
  
  	/*
 +	 * Replay of large truncates can end up needing additional txs
 +	 * and a different txg. If they are nested within the replay tx
 +	 * as below then a hang is possible. So we do the truncate here
 +	 * and redo the truncate later (a no-op) and update the sequence
 +	 * number whilst in the replay tx. Fortunately, it's safe to repeat
 +	 * a truncate if we crash and the truncate commits. A create over
 +	 * an existing file will also come in as a TX_TRUNCATE record.
 +	 *
 +	 * Note, remove of large files and renames over large files is
 +	 * handled by putting the deleted object on a stable list
 +	 * and if necessary force deleting the object outside of the replay
 +	 * transaction using the zr_replay_cleaner.
 +	 */
 +	if (txtype == TX_TRUNCATE) {
 +		*zr->zr_txgp = TXG_NOWAIT;
 +		error = zr->zr_replay[TX_TRUNCATE](zr->zr_arg, zr->zr_lrbuf,
 +		    zr->zr_byteswap);
 +		if (error)
 +			goto bad;
 +		zr->zr_byteswap = 0; /* only byteswap once */
 +	}
 +
 +	/*
  	 * We must now do two things atomically: replay this log record,
  	 * and update the log header to reflect the fact that we did so.
  	 * We use the DMU's ability to assign into a specific txg to do this.
 @@ -1636,6 +1659,7 @@ zil_replay_log_record(zilog_t *zilog, lr
  		dprintf("pass %d, retrying\n", pass);
  	}
  
 +bad:
  	ASSERT(error && error != ERESTART);
  	name = kmem_alloc(MAXNAMELEN, KM_SLEEP);
  	dmu_objset_name(zr->zr_os, name);
 
>Unformatted:
