From mitya@agata.yandex.ru  Sun Aug 11 12:22:04 2013
Return-Path: <mitya@agata.yandex.ru>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
	(using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by hub.freebsd.org (Postfix) with ESMTP id 50A6D1B5
	for <FreeBSD-gnats-submit@freebsd.org>; Sun, 11 Aug 2013 12:22:04 +0000 (UTC)
	(envelope-from mitya@agata.yandex.ru)
Received: from agata.yandex.ru (unknown [IPv6:2a02:6b8:0:c38::7])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.freebsd.org (Postfix) with ESMTPS id AE48222F5
	for <FreeBSD-gnats-submit@freebsd.org>; Sun, 11 Aug 2013 12:22:03 +0000 (UTC)
Received: from agata.yandex.ru (localhost [127.0.0.1])
	by agata.yandex.ru (8.14.7/8.14.7) with ESMTP id r7BCLt9D003012
	for <FreeBSD-gnats-submit@freebsd.org>; Sun, 11 Aug 2013 16:21:55 +0400 (MSK)
	(envelope-from mitya@agata.yandex.ru)
Received: (from mitya@localhost)
	by agata.yandex.ru (8.14.7/8.14.7/Submit) id r7BCLteq003011;
	Sun, 11 Aug 2013 16:21:55 +0400 (MSK)
	(envelope-from mitya)
Message-Id: <201308111221.r7BCLteq003011@agata.yandex.ru>
Date: Sun, 11 Aug 2013 16:21:55 +0400 (MSK)
From: Dmitry Sivachenko <trtrmitya@gmail.com>
Reply-To: Dmitry Sivachenko <trtrmitya@gmail.com>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: Writes to almost full FS eat 100% CPU and speed drops below 1MB/sec
X-Send-Pr-Version: 3.114
X-GNATS-Notify:

>Number:         181226
>Category:       kern
>Synopsis:       [ufs] Writes to almost full FS eat 100% CPU and speed drops below 1MB/sec [regression]
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    mckusick
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Aug 11 12:30:01 UTC 2013
>Closed-Date:    Thu Sep 12 19:42:25 UTC 2013
>Last-Modified:  Thu Sep 12 19:42:25 UTC 2013
>Originator:     Dmitry Sivachenko
>Release:        FreeBSD 9.2-BETA2 amd64
>Organization:
>Environment:
System: FreeBSD agata.yandex.ru 9.2-BETA2 FreeBSD 9.2-BETA2 #2 r253884M: Sun Aug 11 12:12:37 MSK 2013 mitya@agata.yandex.ru:/usr/obj/opt/WRK/src/sys/CAVIA amd64


FreeBSD-9.2-BETA2
>Description:
I have a 25TB Dell PERC 6 RAID5 array.  When it becomes almost full
(10-20GB free), processes which write data to it start eating 100% CPU and
write speed drops below 1MB/sec (normally it gives 400MB/sec).

 1889 mitya           1 100    0  2058M  1027M CPU12  12   0:47 92.77% dd

systat -vm shows disk array is not busy:

Disks mfid0
KB/t  63.71
tps      65
MB/s   0.77
%busy     3

If I delete some files to free space during that slow write, the same process
starts writing with normal speed.

I was running that machine with ~1 year old 9-STABLE without any problems.
That array often overflows, and I always got "filesystem is full" error
without write speed reduction.
The problem appeared after I upgraded to 9.2-BETA2 a few days ago.

tunefs: POSIX.1e ACLs: (-a)                                disabled
tunefs: NFSv4 ACLs: (-N)                                   disabled
tunefs: MAC multilabel: (-l)                               disabled
tunefs: soft updates: (-n)                                 enabled
tunefs: soft update journaling: (-j)                       disabled
tunefs: gjournal: (-J)                                     disabled
tunefs: trim: (-t)                                         disabled
tunefs: maximum blocks per file in a cylinder group: (-e)  4096
tunefs: average file size: (-f)                            16384
tunefs: average number of files in a directory: (-s)       64
tunefs: minimum percentage of free space: (-m)             1%
tunefs: space to hold for metadata blocks: (-k)            0
tunefs: optimization preference: (-o)                      space
tunefs: volume label: (-L)                                 

>How-To-Repeat:
	
>Fix:

	


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-fs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Sun Aug 11 13:39:12 UTC 2013 
Responsible-Changed-Why:  
Submitter notes this is a recent regression. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=181226 

From: Dmitry Sivachenko <trtrmitya@gmail.com>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/181226: [ufs] Writes to almost full FS eat 100% CPU and speed drops below 1MB/sec [regression]
Date: Sun, 25 Aug 2013 19:45:08 +0400

 I found the exact revision number which broke that:
 
 Author: mckusick
 Date: Mon Apr 22 23:59:00 2013
 New Revision: 249782
 URL: http://svnweb.freebsd.org/changeset/base/249782
Responsible-Changed-From-To: freebsd-fs->mckusick 
Responsible-Changed-By: delphij 
Responsible-Changed-When: Sun Aug 25 19:33:07 UTC 2013 
Responsible-Changed-Why:  
Over to UFS maintainer. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=181226 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/181226: commit references a PR
Date: Wed, 28 Aug 2013 17:38:13 +0000 (UTC)

 Author: mckusick
 Date: Wed Aug 28 17:38:05 2013
 New Revision: 254995
 URL: http://svnweb.freebsd.org/changeset/base/254995
 
 Log:
   A performance problem was reported in PR kern/181226:
   
       I have a 25TB Dell PERC 6 RAID5 array. When it becomes almost
       full (10-20GB free), processes which write data to it start
       eating 100% CPU and write speed drops below 1MB/sec (normally
       it gives 400MB/sec). The revision at which it first became
       apparent was http://svnweb.freebsd.org/changeset/base/249782.
   
   The offending change reserved an area in each cylinder group to
   store metadata. The new algorithm attempts to save this area for
   metadata and allows its use for non-metadata only after all the
   data areas have been exhausted. The size of the reserved area
   defaults to half of minfree, so the filesystem reports full before
   the data area can completely fill. However, in this report, the
   filesystem has had minfree reduced to 1% thus forcing the metadata
   area to be used for data. As the filesystem approached full, it
   had only metadata areas left to allocate. The result was that
   every block allocation had to scan summary data for 30,000 cylinder
   groups before falling back to searching up to 30,000 metadata areas.
   
   The fix is to give up on saving the metadata areas once the free
   space reserve drops below 2%. The effect of this change is to use
   the old algorithm of just accepting the first available block that
   we find. Since most filesystems use the default 5% minfree, this
   will have no effect on their operation. For those that want to push
   to the limit, they will get their crappy block placements quickly.
   
   Submitted by:  Dmitry Sivachenko
   Fix Tested by: Dmitry Sivachenko
   PR:            kern/181226
   MFC after:     2 weeks
 
 Modified:
   head/sys/ufs/ffs/ffs_alloc.c
 
 Modified: head/sys/ufs/ffs/ffs_alloc.c
 ==============================================================================
 --- head/sys/ufs/ffs/ffs_alloc.c	Wed Aug 28 16:59:55 2013	(r254994)
 +++ head/sys/ufs/ffs/ffs_alloc.c	Wed Aug 28 17:38:05 2013	(r254995)
 @@ -516,7 +516,13 @@ ffs_reallocblks_ufs1(ap)
  	ip = VTOI(vp);
  	fs = ip->i_fs;
  	ump = ip->i_ump;
 -	if (fs->fs_contigsumsize <= 0)
 +	/*
 +	 * If we are not tracking block clusters or if we have less than 2%
 +	 * free blocks left, then do not attempt to cluster. Running with
 +	 * less than 5% free block reserve is not recommended and those that
 +	 * choose to do so do not expect to have good file layout.
 +	 */
 +	if (fs->fs_contigsumsize <= 0 || freespace(fs, 2) < 0)
  		return (ENOSPC);
  	buflist = ap->a_buflist;
  	len = buflist->bs_nchildren;
 @@ -737,7 +743,13 @@ ffs_reallocblks_ufs2(ap)
  	ip = VTOI(vp);
  	fs = ip->i_fs;
  	ump = ip->i_ump;
 -	if (fs->fs_contigsumsize <= 0)
 +	/*
 +	 * If we are not tracking block clusters or if we have less than 2%
 +	 * free blocks left, then do not attempt to cluster. Running with
 +	 * less than 5% free block reserve is not recommended and those that
 +	 * choose to do so do not expect to have good file layout.
 +	 */
 +	if (fs->fs_contigsumsize <= 0 || freespace(fs, 2) < 0)
  		return (ENOSPC);
  	buflist = ap->a_buflist;
  	len = buflist->bs_nchildren;
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 
State-Changed-From-To: open->patched 
State-Changed-By: mckusick 
State-Changed-When: Wed Aug 28 17:53:33 UTC 2013 
State-Changed-Why:  
A working patch has been applied to head. Assuming no problems are 
reported it will be MFC'ed to 9 in two weeks and this report closed. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=181226 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/181226: commit references a PR
Date: Thu, 12 Sep 2013 19:36:11 +0000 (UTC)

 Author: mckusick
 Date: Thu Sep 12 19:36:04 2013
 New Revision: 255494
 URL: http://svnweb.freebsd.org/changeset/base/255494
 
 Log:
   MFC of 254995:
   
   A performance problem was reported in PR kern/181226:
   
       I have a 25TB Dell PERC 6 RAID5 array. When it becomes almost
       full (10-20GB free), processes which write data to it start
       eating 100% CPU and write speed drops below 1MB/sec (normally
       it gives 400MB/sec). The revision at which it first became
       apparent was http://svnweb.freebsd.org/changeset/base/249782.
   
   The offending change reserved an area in each cylinder group to
   store metadata. The new algorithm attempts to save this area for
   metadata and allows its use for non-metadata only after all the
   data areas have been exhausted. The size of the reserved area
   defaults to half of minfree, so the filesystem reports full before
   the data area can completely fill. However, in this report, the
   filesystem has had minfree reduced to 1% thus forcing the metadata
   area to be used for data. As the filesystem approached full, it
   had only metadata areas left to allocate. The result was that
   every block allocation had to scan summary data for 30,000 cylinder
   groups before falling back to searching up to 30,000 metadata areas.
   
   The fix is to give up on saving the metadata areas once the free
   space reserve drops below 2%. The effect of this change is to use
   the old algorithm of just accepting the first available block that
   we find. Since most filesystems use the default 5% minfree, this
   will have no effect on their operation. For those that want to push
   to the limit, they will get their crappy block placements quickly.
   
   Submitted by:  Dmitry Sivachenko
   Fix Tested by: Dmitry Sivachenko
   PR:            kern/181226
   
   MFC of 254996:
   
   In looking at block layouts as part of fixing filesystem block
   allocations under low free-space conditions (-r254995), determined
   that the old block-preference search order used before -r249782 worked
   a bit better. This change reverts to that block-preference search order.
 
 Modified:
   stable/9/sys/ufs/ffs/ffs_alloc.c
 Directory Properties:
   stable/9/sys/   (props changed)
 
 Modified: stable/9/sys/ufs/ffs/ffs_alloc.c
 ==============================================================================
 --- stable/9/sys/ufs/ffs/ffs_alloc.c	Thu Sep 12 18:08:25 2013	(r255493)
 +++ stable/9/sys/ufs/ffs/ffs_alloc.c	Thu Sep 12 19:36:04 2013	(r255494)
 @@ -516,7 +516,13 @@ ffs_reallocblks_ufs1(ap)
  	ip = VTOI(vp);
  	fs = ip->i_fs;
  	ump = ip->i_ump;
 -	if (fs->fs_contigsumsize <= 0)
 +	/*
 +	 * If we are not tracking block clusters or if we have less than 2%
 +	 * free blocks left, then do not attempt to cluster. Running with
 +	 * less than 5% free block reserve is not recommended and those that
 +	 * choose to do so do not expect to have good file layout.
 +	 */
 +	if (fs->fs_contigsumsize <= 0 || freespace(fs, 2) < 0)
  		return (ENOSPC);
  	buflist = ap->a_buflist;
  	len = buflist->bs_nchildren;
 @@ -736,7 +742,13 @@ ffs_reallocblks_ufs2(ap)
  	ip = VTOI(vp);
  	fs = ip->i_fs;
  	ump = ip->i_ump;
 -	if (fs->fs_contigsumsize <= 0)
 +	/*
 +	 * If we are not tracking block clusters or if we have less than 2%
 +	 * free blocks left, then do not attempt to cluster. Running with
 +	 * less than 5% free block reserve is not recommended and those that
 +	 * choose to do so do not expect to have good file layout.
 +	 */
 +	if (fs->fs_contigsumsize <= 0 || freespace(fs, 2) < 0)
  		return (ENOSPC);
  	buflist = ap->a_buflist;
  	len = buflist->bs_nchildren;
 @@ -1173,7 +1185,7 @@ ffs_dirpref(pip)
  			if (fs->fs_contigdirs[cg] < maxcontigdirs)
  				return ((ino_t)(fs->fs_ipg * cg));
  		}
 -	for (cg = prefcg - 1; cg >= 0; cg--)
 +	for (cg = 0; cg < prefcg; cg++)
  		if (fs->fs_cs(fs, cg).cs_ndir < maxndir &&
  		    fs->fs_cs(fs, cg).cs_nifree >= minifree &&
  	    	    fs->fs_cs(fs, cg).cs_nbfree >= minbfree) {
 @@ -1186,7 +1198,7 @@ ffs_dirpref(pip)
  	for (cg = prefcg; cg < fs->fs_ncg; cg++)
  		if (fs->fs_cs(fs, cg).cs_nifree >= avgifree)
  			return ((ino_t)(fs->fs_ipg * cg));
 -	for (cg = prefcg - 1; cg >= 0; cg--)
 +	for (cg = 0; cg < prefcg; cg++)
  		if (fs->fs_cs(fs, cg).cs_nifree >= avgifree)
  			break;
  	return ((ino_t)(fs->fs_ipg * cg));
 
State-Changed-From-To: patched->closed 
State-Changed-By: mckusick 
State-Changed-When: Thu Sep 12 19:41:08 UTC 2013 
State-Changed-Why:  
The fixes have been MFC'ed to 9-stable. They are not relevant 
to earlier versions of the system. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=181226 
>Unformatted:
