From nobody@FreeBSD.org  Wed Feb 22 10:55:32 2012
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 033E2106566C
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 22 Feb 2012 10:55:32 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id E34E98FC15
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 22 Feb 2012 10:55:31 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id q1MAtVcR032425
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 22 Feb 2012 10:55:31 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id q1MAtVW4032424;
	Wed, 22 Feb 2012 10:55:31 GMT
	(envelope-from nobody)
Message-Id: <201202221055.q1MAtVW4032424@red.freebsd.org>
Date: Wed, 22 Feb 2012 10:55:31 GMT
From: Vsevolod Volkov <vvv@colocall.net>
To: freebsd-gnats-submit@FreeBSD.org
Subject: Multiple mkdir/rmdir fails with errno 31
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         165392
>Category:       kern
>Synopsis:       [ufs] [patch] Multiple mkdir/rmdir fails with errno 31
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-fs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Feb 22 11:00:22 UTC 2012
>Closed-Date:    
>Last-Modified:  Sun Apr 20 00:20:25 UTC 2014
>Originator:     Vsevolod Volkov
>Release:        9.0-RELEASE amd64/i386
>Organization:
>Environment:
FreeBSD 9.0-RELEASE #0: Mon Feb 13 12:12:58 EET 2012 amd64
FreeBSD 9.0-RELEASE #1: Thu Feb  9 16:29:18 EET 2012 i386
>Description:
A repeated sequence of mkdir and rmdir calls causes mkdir to fail with errno 31 (EMLINK). It usually happens on iteration 32765.
>How-To-Repeat:
Compile and execute the following program:

#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>

int main (void)
{
  int i;
  char dir[100];
  for (i = 0; i < 50000; i++)
  {
    snprintf (dir, sizeof(dir), "empty_dir/%d", i);
    printf ("%s\n", dir);
    if (mkdir (dir, 0700) == -1)
    {
      printf ("mkdir %s: (errno %d)\n", dir, errno);
      break;
    }
    if (rmdir (dir) == -1)
    {
      printf ("rmdir %s: (errno %d)\n", dir, errno);
      break;
    }
  }
  return 0;
}

gcc -o test1 test1.c
mkdir empty_dir
./test1
>Fix:


>Release-Note:
>Audit-Trail:

From: Andriy Gapon <avg@FreeBSD.org>
To: bug-followup@FreeBSD.org, vvv@colocall.net
Cc:  
Subject: Re: kern/165392: Multiple mkdir/rmdir fails with errno 31
Date: Wed, 22 Feb 2012 23:23:15 +0200

 Could you please provide full details of the tested filesystem?
 -- 
 Andriy Gapon
Responsible-Changed-From-To: freebsd-bugs->eadler 
Responsible-Changed-By: eadler 
Responsible-Changed-When: Thu Feb 23 05:05:19 UTC 2012 
Responsible-Changed-Why:  
I'll take it. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=165392 

From: Vsevolod Volkov <vvv@colocall.net>
To: Andriy Gapon <avg@FreeBSD.org>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/165392: Multiple mkdir/rmdir fails with errno 31
Date: Thu, 23 Feb 2012 10:06:03 +0200

 There is no problem with zfs. Test passed with 200000 iterations.

From: Vsevolod Volkov <vvv@colocall.net>
To: Andriy Gapon <avg@FreeBSD.org>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/165392: Multiple mkdir/rmdir fails with errno 31
Date: Thu, 23 Feb 2012 09:59:42 +0200

 I've tested 2 computers with 9.0-RELEASE (amd64 and i386). Filesystems
 are UFS2 with soft updates:
 
 tunefs: POSIX.1e ACLs: (-a)                                disabled
 tunefs: NFSv4 ACLs: (-N)                                   disabled
 tunefs: MAC multilabel: (-l)                               disabled
 tunefs: soft updates: (-n)                                 enabled
 tunefs: soft update journaling: (-j)                       disabled
 tunefs: gjournal: (-J)                                     disabled
 tunefs: trim: (-t)                                         disabled
 tunefs: maximum blocks per file in a cylinder group: (-e)  2048
 tunefs: average file size: (-f)                            16384
 tunefs: average number of files in a directory: (-s)       64
 tunefs: minimum percentage of free space: (-m)             8%
 tunefs: optimization preference: (-o)                      time
 tunefs: volume label: (-L)
Responsible-Changed-From-To: eadler->freebsd-fs 
Responsible-Changed-By: eadler 
Responsible-Changed-When: Sat Feb 25 15:24:40 UTC 2012 
Responsible-Changed-Why:  
I'm not going to have time to look into this soon enough 

http://www.freebsd.org/cgi/query-pr.cgi?pr=165392 

From: Jilles Tjoelker <jilles@stack.nl>
To: bug-followup@FreeBSD.org, vvv@colocall.net
Cc:  
Subject: Re: kern/165392: Multiple mkdir/rmdir fails with errno 31
Date: Sat, 25 Feb 2012 19:27:02 +0100

 > [mkdir fails with [EMLINK], but link count < LINK_MAX]
 
 I can reproduce this problem with UFS with soft updates (with or without
 journaling).
 
 A reproduction without C programs is:
 
 cd empty_dir
 mkdir `jot 32766 1`     # the last one will fail (correctly)
 rmdir 1
 mkdir a                 # will erroneously fail
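
To observe the symptom summarized above (an [EMLINK] failure while the visible link count is still below LINK_MAX), the two numbers can be compared from userland. The following is a minimal sketch; the helper name links_remaining is hypothetical, and it only reports what pathconf(2) and stat(2) return for the directory.

```c
#include <sys/stat.h>
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper: store in *remaining how many more links the
 * directory could take according to its visible link count.
 * Returns 0 on success, -1 if the limit or the link count is unavailable.
 */
static int links_remaining(const char *dir, long *remaining)
{
    struct stat sb;
    long link_max;

    assert(dir != NULL && remaining != NULL);

    /* -1 means either an error or "no limit"; both are treated as unavailable. */
    link_max = pathconf(dir, _PC_LINK_MAX);
    if (link_max == -1)
        return -1;
    if (stat(dir, &sb) == -1)
        return -1;

    *remaining = link_max - (long)sb.st_nlink;
    return 0;
}
```

If mkdir fails with EMLINK while this helper still reports a positive value for the parent directory, the situation matches the one described in this report.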
 
 The problem appears to be that the previous rmdir has not yet fully
 completed. It still holds onto the link count until the directory is
 written, which may take up to two minutes.
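
The effect can be illustrated with a small userland model. This is only a sketch under stated assumptions, not the kernel code: the struct, the sim_* names and SIM_LINK_MAX are hypothetical, the limit is assumed to be 32767, and the model assumes that no rmdir is written back during the run.

```c
#include <assert.h>

#define SIM_LINK_MAX 32767      /* assumed limit for the model */

/* Hypothetical model of the two link counters of a directory inode. */
struct sim_inode {
    int i_nlink;        /* drops only once the removal has been written */
    int i_effnlink;     /* drops as soon as rmdir returns */
};

/* Creating a subdirectory adds one link to the parent. */
static int sim_mkdir(struct sim_inode *dp)
{
    if (dp->i_nlink >= SIM_LINK_MAX)
        return -1;      /* models the EMLINK failure */
    dp->i_nlink++;
    dp->i_effnlink++;
    return 0;
}

/* Removing it lowers only the effective count until write-back. */
static void sim_rmdir(struct sim_inode *dp)
{
    dp->i_effnlink--;
}

/* Returns the 0-based iteration at which mkdir first fails, or -1. */
static int sim_first_failure(int iterations)
{
    struct sim_inode dir = { 2, 2 };    /* empty directory: two links */
    int i;

    assert(iterations >= 0);
    for (i = 0; i < iterations; i++) {
        if (sim_mkdir(&dir) == -1)
            return i;
        sim_rmdir(&dir);
    }
    return -1;
}
```

In this model sim_first_failure(50000) returns 32765, which matches the iteration reported in the description, even though the effective link count never rises above 3.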
 
 The same problem can occur with other calls that increase the link count
 such as link() and rename().
 
 A workaround is to call fsync() on the directory that contained the
 deleted entries. It will then release its hold on the link count and
 allow mkdir or other calls. If fsync() is only called when [EMLINK] is
 returned, the performance impact should not be very bad, although it
 still causes more I/O than necessary.
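
A minimal userland sketch of this workaround follows. The helper name mkdir_fsync_retry and its max_tries parameter are hypothetical. The retry loop with a one-second sleep is an assumption added for robustness, since a single fsync() may not release the link count immediately.

```c
#include <sys/stat.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: try mkdir(path); if it fails with EMLINK, fsync
 * the parent directory, wait a second and try again, at most max_tries
 * extra times. Returns 0 on success, -1 with errno set otherwise.
 */
static int mkdir_fsync_retry(const char *parent, const char *path,
    mode_t mode, int max_tries)
{
    int attempt, fd;

    assert(parent != NULL && path != NULL);

    for (attempt = 0; ; attempt++) {
        if (mkdir(path, mode) == 0)
            return 0;
        /* Only EMLINK is worth retrying; give up after max_tries. */
        if (errno != EMLINK || attempt >= max_tries)
            return -1;

        /* Flush the directory that contained the deleted entries. */
        fd = open(parent, O_RDONLY);
        if (fd == -1)
            return -1;
        (void)fsync(fd);        /* best effort; the retry decides */
        (void)close(fd);
        sleep(1);
    }
}
```

Because the fsync() is issued only after an EMLINK failure, the common path costs nothing extra, which is the trade-off described above.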
 
 The book "The Design and Implementation of the FreeBSD Operating System"
 contains a detailed description of soft updates in section 8.6 Soft
 Updates. The subsection "File Removal Requirements for Soft Updates"
 appears particularly relevant to this problem.
 
 A possible solution is to check for the problematic situation
 (i_effnlink < LINK_MAX && i_nlink >= LINK_MAX) and if so synchronously
 write one or more deleted directory entries that pointed to the inode
 with the link count problem. After that, i_nlink should be less than
 LINK_MAX and the link count can be checked again (depending on whether
 locks need to be dropped to do the write, it may or may not be possible
 for another thread to use up the last link first).
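
The proposed check can be written down as a three-way decision. This is a sketch of the logic only, not a patch: the enum, the function name link_check and SKETCH_LINK_MAX (assumed to be 32767) are hypothetical, and the model assumes i_effnlink never exceeds i_nlink.

```c
#include <assert.h>

#define SKETCH_LINK_MAX 32767   /* assumed limit for the sketch */

enum link_verdict {
    LINK_OK,                    /* room for another link right now */
    LINK_FLUSH_AND_RECHECK,     /* write deleted entries, then check again */
    LINK_EMLINK                 /* the limit is really reached */
};

/* Classify the situation from the two link counters of an inode. */
static enum link_verdict link_check(int i_nlink, int i_effnlink)
{
    /* Model assumption: pending removals only lower the effective count. */
    assert(i_effnlink <= i_nlink);

    if (i_nlink < SKETCH_LINK_MAX)
        return LINK_OK;
    if (i_effnlink < SKETCH_LINK_MAX)
        return LINK_FLUSH_AND_RECHECK;
    return LINK_EMLINK;
}
```

After a flush the caller would have to run the check again rather than assume success, because another thread may have used up the freed link in the meantime.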
 
 For mkdir() and rename(), the directory that contains the deleted
 entries is obvious (the directory that will contain the new directory)
 while for link() it can (in the general case) only be found in soft
 updates data structures. Soft updates must track this because (if the
 link count became 0) it will not clear the inode before all directory
 entries that pointed to it have been written.
 
 Simply replacing the i_nlink < LINK_MAX check with i_effnlink < LINK_MAX
 is unsafe because it will lead to overflow of the 16-bit signed i_nlink
 field. Even if the field were made larger, I don't see what would prevent
 the code from committing a set of changes that leaves an inode on disk
 with more than LINK_MAX links for some time (for example, if a file in
 the new directory is fsynced while the old directory entries are still
 on the disk).
 
 -- 
 Jilles Tjoelker

From: Jaakko Heinonen <jh@FreeBSD.org>
To: Jilles Tjoelker <jilles@stack.nl>
Cc: bug-followup@FreeBSD.org, vvv@colocall.net, mckusick@FreeBSD.org
Subject: Re: kern/165392: Multiple mkdir/rmdir fails with errno 31
Date: Mon, 20 May 2013 22:21:34 +0300

 Hi!
 
 >  A workaround is to call fsync() on the directory that contained the
 >  deleted entries. It will then release its hold on the link count and
 >  allow mkdir or other calls. If fsync() is only called when [EMLINK] is
 >  returned, the performance impact should not be very bad, although it
 >  still causes more I/O than necessary.
 
 I tried to implement this with the following patch:
 
 	http://people.freebsd.org/~jh/patches/ufs-check_linkcnt.diff
 
 However, VOP_FSYNC(9) with the MNT_WAIT flag seems not to update the
 i_nlink count for a reason unknown to me. I can verify that also by
 taking your reproduction recipe above and adding "fsync ." between
 "rmdir 1" and "mkdir a".
 
 Does this mean that fsync(2) is broken for directories on softdep
 enabled UFS?
 
 I have cc'd Kirk in hope he could shed some light on this.
 
 -- 
 Jaakko

From: Jilles Tjoelker <jilles@stack.nl>
To: Jaakko Heinonen <jh@FreeBSD.org>
Cc: bug-followup@FreeBSD.org, vvv@colocall.net, mckusick@FreeBSD.org
Subject: Re: kern/165392: Multiple mkdir/rmdir fails with errno 31
Date: Mon, 27 May 2013 18:53:28 +0200

 fsync certainly helps but not as effectively as you'd want. Some
 combination of sleeps, fsyncs and mkdir attempts appears to be needed. A
 shell loop like
   rmdir 8; fsync .; \
   until mkdir h 2>/dev/null; do printf .; fsync .; sleep 1; done
 takes two seconds.
 
 However, in
   rmdir 13; mkdir m; fsync .; \
   until mkdir m 2>/dev/null; do printf .; sleep 1; done
 the fsync is of no benefit. It is just as slow as omitting it (about
 half a minute).
 
 I must have taken long enough to type/recall the commands when I tried
 this earlier. In my earlier experiments I gave the commands separately.
 
 > Does this mean that fsync(2) is broken for directories on softdep
 > enabled UFS?
 
 I don't think fsync(2) has to sync the exact link count to disk, since
 fsck will take care of that. However, it has to sync the timestamps,
 permissions and directory entries.
 
 > I have cc'd Kirk in hope he could shed some light on this.
 
 I'm also interested in whether it is safe to call VOP_FSYNC at that
 point, especially in the case of a rename where a lock on the source
 directory vnode may be held at the same time.
 
 -- 
 Jilles Tjoelker

From: Jaakko Heinonen <jh@FreeBSD.org>
To: Jilles Tjoelker <jilles@stack.nl>
Cc: bug-followup@FreeBSD.org, vvv@colocall.net, mckusick@FreeBSD.org
Subject: Re: kern/165392: Multiple mkdir/rmdir fails with errno 31
Date: Wed, 29 May 2013 19:53:11 +0300

 On 2013-05-27, Jilles Tjoelker wrote:
 > > However, VOP_FSYNC(9) with the MNT_WAIT flag seems not to update the
 > > i_nlink count for a reason unknown to me. I can verify that also by
 > > taking your reproduction recipe above and adding "fsync ." between
 > > "rmdir 1" and "mkdir a".
 > 
 > fsync certainly helps but not as effectively as you'd want. Some
 > combination of sleeps, fsyncs and mkdir attempts appears to be needed.
 
 I have revised the patch and the following version _appears_ to work.
 
 	http://people.freebsd.org/~jh/patches/ufs-check_linkcnt.2.diff
 
 It's still experimental and doesn't handle link(2) or rename(2) at all.
 
 In my testing debug.softdep.linkcnt_retries is increased by one with
 your original reproduction recipe.
 
 > I'm also interested in whether it is safe to call VOP_FSYNC at that
 > point, especially in the case of a rename where a lock on the source
 > directory vnode may be held at the same time.
 
 I think your concern is valid because softdep_fsync() needs to lock
 parent directories. Possibly you can work around the problem by
 unlocking the vnodes, doing fsync and then restarting rename.
 Unfortunately this makes rename even more complex.
 
 -- 
 Jaakko
>Unformatted:
