From nobody@FreeBSD.org  Wed Jun 28 19:43:32 2006
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id DCC7916A608
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 28 Jun 2006 19:43:31 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
	by mx1.FreeBSD.org (Postfix) with ESMTP id F0A7F44537
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 28 Jun 2006 18:47:31 +0000 (GMT)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.13.1/8.13.1) with ESMTP id k5SIlVvt029833
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 28 Jun 2006 18:47:31 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.13.1/8.13.1/Submit) id k5SIlVql029832;
	Wed, 28 Jun 2006 18:47:31 GMT
	(envelope-from nobody)
Message-Id: <200606281847.k5SIlVql029832@www.freebsd.org>
Date: Wed, 28 Jun 2006 18:47:31 GMT
From: Helio Luchtenberg Junior <hlj@viamidia.net>
To: freebsd-gnats-submit@FreeBSD.org
Subject: UFS2 filesystems hang when doing "fsck -B" or "dump -L" or "mksnapffs" in a moderated I/O filesystem with many file locks/unlocks
X-Send-Pr-Version: www-2.3

>Number:         99588
>Category:       kern
>Synopsis:       [ufs] UFS2 filesystems hang when doing "fsck -B" or "dump -L" or "mksnap_ffs" on a moderate-I/O filesystem with many file locks/unlocks
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    vwe
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Jun 28 19:50:17 GMT 2006
>Closed-Date:    Thu Dec 25 21:33:19 UTC 2008
>Last-Modified:  Thu Dec 25 21:33:19 UTC 2008
>Originator:     Helio Luchtenberg Junior
>Release:        FreeBSD 5.4p12
>Organization:
Viamidia Tecnologia S.A.
>Environment:
FreeBSD freeteste.viamidia.com 5.4-RELEASE-p12 FreeBSD 5.4-RELEASE-p12 #1: Wed Mar  8 16:08:29 UTC 2006     :/usr/obj/usr/src/sys/VIAMIDIA  i386

>Description:
We have observed filesystem "freezes" when creating filesystem snapshots
(mksnap_ffs), running a background fsck (fsck -B), or dumping (dump -L)
on a UFS2 filesystem (the default for FreeBSD 5.x) under moderate I/O load,
with many processes intensively locking and unlocking files on that filesystem.

After the filesystem freezes, the processes trying to access files on it
make no further progress.  They stay in the "D" state (disk wait) forever.
This appears to be a deadlock: the processes holding locks on that
filesystem cannot be killed or aborted in any way (not even with kill -9),
and all of them are seen in the "D" state.  After some time, "iostat" shows
no disk activity on that filesystem.
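
As an illustration of how such a freeze could be detected without the
checking process itself getting stuck, here is a minimal sketch.  It is
not part of the original report.  The function name fs_probe() and the
choice of open(2) as the probe operation are assumptions; a hung
filesystem might only block other operations (for example writes), in
which case the probe would need to be adapted.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to open `path` in a child process and wait
 * at most `timeout_s` seconds for it.  Returns 0 if the open succeeded
 * in time, 1 if the child is still blocked (suspected hang), -1 on any
 * other failure.
 */
int
fs_probe(const char *path, int timeout_s)
{
        struct timespec tick = { 0, 100000000L };  /* 100 ms */
        pid_t pid, r;
        int fd, i, status;

        if ((pid = fork()) == -1)
                return (-1);
        if (pid == 0) {
                /* Child: this is the call that may block forever. */
                if ((fd = open(path, O_RDONLY)) == -1)
                        _exit(2);
                close(fd);
                _exit(0);
        }

        /* Parent: poll with WNOHANG so it never blocks on the child. */
        for (i = 0; i < timeout_s * 10; i++) {
                r = waitpid(pid, &status, WNOHANG);
                if (r == pid) {
                        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
                                return (0);
                        return (-1);
                }
                if (r == -1)
                        return (-1);
                nanosleep(&tick, NULL);
        }

        /* A child stuck in disk wait will not react to this signal. */
        (void)kill(pid, SIGKILL);
        (void)waitpid(pid, &status, WNOHANG);
        return (1);
}
```

For example, fs_probe("/jails", 5) could be called periodically while
the snapshot is being created; a return value of 1 would suggest that
the filesystem has stopped responding.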

We were able to reproduce the problem; the procedure is described below.
>How-To-Repeat:
We created three directories below the mountpoint "/jails" (a UFS2
filesystem): /jails/teste1, /jails/teste2 and /jails/teste3, and put 120
files in each of them.  We then ran six instances of the test program
below: two copies pointing to the files in "/jails/teste1", two copies
modified to point to the files in "/jails/teste2", and two copies modified
to point to the files in "/jails/teste3".  While these six copies are
running, we try to create a snapshot of that filesystem (/jails).  After
some time the filesystem hangs and no further activity can be seen on it.
All six copies of the program are found in the "D" state, waiting for a
disk operation to complete.  The only way we found to restore the
filesystem to a working state is to reboot the server.
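
The directory setup described above could be scripted along the
following lines.  This is only a setup helper, not the test program, and
it is not part of the original report.  The function name populate() and
the file names "fileNNN" are assumptions; the report does not say how
the 120 files were named or what they contained.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `dir` if it is missing and make sure it
 * holds `nfiles` files named file000, file001, ...  (assumed names).
 * Returns nfiles on success, -1 on error.
 */
int
populate(const char *dir, int nfiles)
{
        char name[4096];
        int fd, i;

        if (mkdir(dir, 0755) == -1 && errno != EEXIST)
                return (-1);
        for (i = 0; i < nfiles; i++) {
                snprintf(name, sizeof(name), "%s/file%03d", dir, i);
                /* O_CREAT without O_EXCL: existing files are reused. */
                if ((fd = open(name, O_RDWR | O_CREAT, 0644)) == -1)
                        return (-1);
                close(fd);
        }
        return (nfiles);
}
```

Calling populate("/jails/teste1", 120), and likewise for teste2 and
teste3, would give the layout used in the steps above.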

---------------------------------
#include <sys/types.h>
#include <sys/file.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Directory to exercise; change it in each copy (teste1, teste2, teste3). */
#define TESTDIR "/jails/teste1"

/* Open every file in TESTDIR, apply flock(op) to it, then close it. */
static void
flock_all(int op)
{
        struct dirent *dp;
        DIR *dirp;
        char name[4096];
        int arq;

        if ((dirp = opendir(TESTDIR)) == NULL) {
                perror("opendir");
                exit(1);
        }
        while ((dp = readdir(dirp)) != NULL) {
                /* skip directories "." and ".." */
                if (strcmp(dp->d_name, ".") == 0 ||
                    strcmp(dp->d_name, "..") == 0)
                        continue;
                snprintf(name, sizeof(name), TESTDIR "/%s", dp->d_name);
                if ((arq = open(name, O_RDWR)) == -1)
                        continue;
                flock(arq, op);
                close(arq);     /* closing also drops any lock held on arq */
        }
        (void)closedir(dirp);
}

int
main(void)
{
        while (1) {
                flock_all(LOCK_EX);     /* locking pass */
                flock_all(LOCK_UN);     /* unlocking pass */
        }
}
-------------------------------
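
Instead of keeping six hand-modified copies of the program, the same
load could be started from a single driver.  The sketch below is not
part of the original report; the names flock_pass() and spawn_workers()
and the `rounds` parameter are assumptions introduced so that the loop
can also be run for a bounded time.

```c
#include <sys/types.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * One pass over `dir`: open each file, flock(2) it with `op`, close it.
 * Returns the number of files successfully flock'ed, -1 if `dir` cannot
 * be opened.
 */
static int
flock_pass(const char *dir, int op)
{
        struct dirent *dp;
        DIR *dirp;
        char name[4096];
        int fd, n = 0;

        if ((dirp = opendir(dir)) == NULL)
                return (-1);
        while ((dp = readdir(dirp)) != NULL) {
                if (strcmp(dp->d_name, ".") == 0 ||
                    strcmp(dp->d_name, "..") == 0)
                        continue;
                snprintf(name, sizeof(name), "%s/%s", dir, dp->d_name);
                if ((fd = open(name, O_RDWR)) == -1)
                        continue;
                if (flock(fd, op) == 0)
                        n++;
                close(fd);
        }
        (void)closedir(dirp);
        return (n);
}

/*
 * Fork `copies` workers per directory.  Each worker runs `rounds`
 * lock/unlock rounds; rounds < 0 means loop forever, as in the report.
 * Returns the number of workers that exited cleanly, -1 if fork fails
 * (workers already started are not cleaned up in this sketch).
 */
int
spawn_workers(const char *const dirs[], int ndirs, int copies, long rounds)
{
        int c, d, status, started = 0, ok = 0;
        long r;
        pid_t pid;

        for (d = 0; d < ndirs; d++) {
                for (c = 0; c < copies; c++) {
                        if ((pid = fork()) == -1)
                                return (-1);
                        if (pid == 0) {
                                for (r = 0; rounds < 0 || r < rounds; r++) {
                                        if (flock_pass(dirs[d], LOCK_EX) == -1 ||
                                            flock_pass(dirs[d], LOCK_UN) == -1)
                                                _exit(1);
                                }
                                _exit(0);
                        }
                        started++;
                }
        }
        /* Reap every worker; with rounds < 0 this never returns. */
        while (started-- > 0) {
                if (wait(&status) > 0 && WIFEXITED(status) &&
                    WEXITSTATUS(status) == 0)
                        ok++;
        }
        return (ok);
}
```

With dirs set to { "/jails/teste1", "/jails/teste2", "/jails/teste3" },
a call such as spawn_workers(dirs, 3, 2, -1) would start the six workers
described above; the snapshot would then be created from another
terminal while they run.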
>Fix:

>Release-Note:
>Audit-Trail:

From: "Dylan Cochran" <a134qaed@gmail.com>
To: bug-followup@freebsd.org, "Helio Luchtenberg Junior" <hlj@viamidia.net>
Cc:  
Subject: Re: kern/99588: UFS2 filesystems hang when doing "fsck -B" or "dump -L" or "mksnapffs" in a moderated I/O filesystem with many file locks/unlocks
Date: Fri, 15 Feb 2008 23:52:09 -0500

 Numerous locking fixes affecting UFS snapshots have been committed to
 the VFS since 5.4-RELEASE.  Can you test whether the behaviour still
 exists with 7.0-RC2 or a -CURRENT snapshot?
State-Changed-From-To: open->feedback 
State-Changed-By: linimon 
State-Changed-When: Sat Feb 16 05:44:34 UTC 2008 
State-Changed-Why:  
Note that submitter has been asked for feedback. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=99588 
State-Changed-From-To: feedback->suspended 
State-Changed-By: vwe 
State-Changed-When: Sat May 10 17:52:53 UTC 2008 
State-Changed-Why:  

no feedback received for quite some time - suspend 
to be closed after 5.x EOL date (soon!) 


Responsible-Changed-From-To: freebsd-bugs->vwe 
Responsible-Changed-By: vwe 
Responsible-Changed-When: Sat May 10 17:52:53 UTC 2008 
Responsible-Changed-Why:  

track for EOL 

http://www.freebsd.org/cgi/query-pr.cgi?pr=99588 
State-Changed-From-To: suspended->closed  
State-Changed-By: brucec 
State-Changed-When: Thu Dec 25 21:29:16 UTC 2008 
State-Changed-Why:  
Feedback timeout, 5.x has gone EOL.  

http://www.freebsd.org/cgi/query-pr.cgi?pr=99588 
>Unformatted:
