From nobody@FreeBSD.ORG Tue Nov 30 17:34:54 1999
Return-Path: <nobody@FreeBSD.ORG>
Received: by hub.freebsd.org (Postfix, from userid 32767)
	id 77E4615AA7; Tue, 30 Nov 1999 17:34:46 -0800 (PST)
Message-Id: <19991201013446.77E4615AA7@hub.freebsd.org>
Date: Tue, 30 Nov 1999 17:34:46 -0800 (PST)
From: hostetlb@agcs.com
Sender: nobody@FreeBSD.ORG
To: freebsd-gnats-submit@freebsd.org
Subject: Kernel hangs during concurrent NFS writes
X-Send-Pr-Version: www-1.0

>Number:         15195
>Category:       kern
>Synopsis:       Kernel hangs during concurrent NFS writes
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    dillon
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Nov 30 17:40:00 PST 1999
>Closed-Date:    Tue Dec 14 11:13:34 PST 1999
>Last-Modified:  Tue Dec 14 11:14:01 PST 1999
>Originator:     Bly Hostetler
>Release:        2.2.8 through 3.3
>Organization:
AG Communication Systems
>Environment:
FreeBSD calvin-t1.labs.agcs.com 2.2.8-RELEASE FreeBSD 2.2.8-RELEASE #0: Tue Nov
30 09:51:56 GMT 1999     root@calvin-t1.labs.agcs.com:/usr/src/sys/compile/AGCS
 i386
>Description:
We have multiple processes that have opened the same file with mode "a+"
(append, read-access).  The file is located on an NFS mounted partition
(easiest to reproduce using UDP, NFSv3; but we have seen it using UDP
or TCP/IP, and NFSv3 or NFSv2).  When the processes start writing to that
file at the same time, the OS locks up in the kernel: no login prompts,
no shell prompts, no I/O, nothing.

Trapping to DDB (ctrl-alt-esc) shows that the kernel is inside _nfs_write;
we have trapped it at several locations, but always within _nfs_write.
The following is an example trace from DDB:

_biodone(f418d200,f418d200,f22bf000,f22dc600,3) at _biodone+0x2e6
_nfs_doio(f418d200,f22df000,f22dc600,1,f418d200) at _nfs_doio+0x4c5
_nfs_strategy(efbffddc) at _nfs_strategy+0x68
_nfs_writebp(f418d200,1,efbffec4,f0163000,efbffe50) at _nfs_writebp+0x125
_nfs_bwrite(efbff50) at _nfs_bwrite+0x10
_nfs_write(efbffee08,f02564e0,1,efbff94,2) at _nfs_write+0x648
_vn_write(f25d6f80,efbfff34,f22bf000,f02564e0,f22c600) at _vn_write+0x93
_write(f220c600,efbff94,efbff84,0,5167) at _write+0x97
_syscall

We mounted the directory using :

/sbin/mount_nfs -U -c plato-t1:/u/FreeBSD/decm /usr/home/decm

We are running on a Pentium II 450 with 512 MB RAM, 1024 MB swap, and
100 Mbit Ethernet.

Although we initially detected the problem in FreeBSD 2.2.8, we
subsequently loaded FreeBSD 3.3, and had the same results.

The sample code below was created to simulate third-party software that
was actually the original cause of the lock-ups.  We understand that this
form of concurrent writes to a file is asking for trouble, but we did
not have control over the problem code.  The sample code below worked
when tried on SCO Unix using the same NFS mounted directory (and file).

We have identified a fix to the C source code and passed it on to
our vendor, but we also believe this problem should be corrected in the
OS's NFS layer.
>How-To-Repeat:
The following program reproduces the problem.  By default it spawns
10 writers, but it can spawn any number.  We have only used 10 and 50,
and the problem occurs immediately (within 5-10 writes to the file).

*** BEGIN C CODE ***

#include <stdio.h>
#include <stdlib.h>     /* atoi() */
#include <unistd.h>     /* fork() */

int
main(int argc,
     char **argv)
{
    int forks = 10;
    int writer_number = 1;
    FILE *fp1;
    int i = 0;

    if (argc > 1) {
        forks = atoi(argv[1]);
    }
    fprintf(stderr, "Spawning %d writers\n", forks);

    while (--forks > 0) {
        if (!fork()) {
            /* Child */
            break;
        }
        writer_number++;
    }

    fprintf(stderr, "Writer number %d\n", writer_number);

    while (1) {
        while (!(fp1 = fopen("F1", "a+")));

        fprintf(fp1, "%d %d\n", writer_number, i);
        fflush(fp1);
        fclose(fp1);

        i++;
    }
}

*** END OF C CODE ***

The problem lies in the fact that each process is opening the file for
append, but is not locking the file for exclusive access.

The user-level work-around is to lock the file for exclusive access
by adding the following lines in the locations indicated:

/* ... at the top of the file */
#include <sys/file.h>

...

        /* After the "while (!(...fopen(...)));" add this */
        flock(fileno(fp1), LOCK_EX);
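
For reference, the work-around can be sketched as one self-contained
routine.  This is only an illustration: append_record() is a
hypothetical helper name that does not appear in the original test
program, and the sketch assumes all writers run on the same client
machine, as they did in our tests.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/file.h>   /* flock(), LOCK_EX, LOCK_UN */

/*
 * Append one "writer_number seq" record to path while holding an
 * exclusive flock(), mirroring the body of the writer loop above.
 * Hypothetical helper; returns 0 on success, -1 on failure.
 */
int
append_record(const char *path, int writer_number, int seq)
{
    FILE *fp;
    int rc = 0;

    assert(path != NULL);

    fp = fopen(path, "a+");
    if (fp == NULL)
        return -1;

    /* Block until no other process holds the lock on this file. */
    if (flock(fileno(fp), LOCK_EX) == -1) {
        fclose(fp);
        return -1;
    }

    if (fprintf(fp, "%d %d\n", writer_number, seq) < 0)
        rc = -1;
    /* Push the record out before the lock is dropped. */
    if (fflush(fp) == EOF)
        rc = -1;

    /*
     * Closing the stream would release the lock as well; the explicit
     * unlock just makes the intent visible.
     */
    flock(fileno(fp), LOCK_UN);
    if (fclose(fp) == EOF)
        rc = -1;
    return rc;
}
```

Each writer would call append_record("F1", writer_number, i) in place
of the fopen/fprintf/fflush/fclose sequence in the loop.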

>Fix:


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->dillon 
Responsible-Changed-By: sheldonh 
Responsible-Changed-When: Wed Dec 1 05:43:32 PST 1999 
Responsible-Changed-Why:  
Matt, are you interested in RELENG_3 stuff, or are your 
efforts limited to CURRENT? 

From: Ian Dowse <iedowse@maths.tcd.ie>
To: hostetlb@agcs.com
Cc: freebsd-gnats-submit@freebsd.org, iedowse@maths.tcd.ie,
	dillon@freebsd.org
Subject: Re: kern/15195: Kernel hangs during concurrent NFS writes 
Date: Wed, 01 Dec 1999 16:31:47 +0000

 In message <19991201013446.77E4615AA7@hub.freebsd.org>, hostetlb@agcs.com writes:
 
 >We have multiple processes that have opened the same file with mode "a+"
 >(append, read-access).  The file is located on an NFS mounted partition
 >(easiest to reproduce using UDP, NFSv3; but we have seen it using UDP
 >or TCP/IP, and NFSv3 or NFSv2).  When the processes start writing to that
 >file at the same time, the OS get locked in the kernel; no login prompts,
 >no shell prompts, no i/o, nothing.
 
 We have been using the following patch in 3.3-stable to work around this
 problem. I think the (bp->b_dirtyend <= bp->b_dirtyoff) stuff stops the
 hangs, and has already been fixed in -current.
 
 The #if 0'd section is a workaround for related panics in -stable. We've
 been using it on all our busy NFS clients for months. This fix cannot be
 used on -current, as the code run by nfs_getcacheblk() is quite
 different. (Matt explained this, but I've forgotten the details).
 
 Ian
 
 
 Index: nfs_bio.c
 ===================================================================
 RCS file: /FreeBSD/FreeBSD-CVS/src/sys/nfs/nfs_bio.c,v
 retrieving revision 1.65.2.2
 diff -u -r1.65.2.2 nfs_bio.c
 --- nfs_bio.c	1999/08/29 16:30:27	1.65.2.2
 +++ nfs_bio.c	1999/12/01 16:19:03
 @@ -758,10 +758,15 @@
  			vnode_pager_setsize(vp, np->n_size);
  		}
  		bufsize = biosize;
 +#if 0
 +		/* This optimisation causes problems if the file grows while
 +		 * waiting in nfs_getcacheblk(). b_dirtyoff/dirtyend/validoff/
 +		 * validend can end up greater than bufsize */
  		if ((off_t)(lbn + 1) * biosize > np->n_size) {
  			bufsize = np->n_size - (off_t)lbn * biosize;
  			bufsize = (bufsize + DEV_BSIZE - 1) & ~(DEV_BSIZE - 1);
  		}
 +#endif
  		bp = nfs_getcacheblk(vp, lbn, bufsize, p);
  		if (!bp)
  			return (EINTR);
 @@ -774,6 +779,9 @@
  		if ((off_t)bp->b_blkno * DEV_BSIZE + bp->b_dirtyend > np->n_size)
  			bp->b_dirtyend = np->n_size - (off_t)bp->b_blkno * DEV_BSIZE;
  
 +		if (bp->b_dirtyend <= bp->b_dirtyoff)
 +			bp->b_dirtyend = bp->b_dirtyoff = 0;
 +			
  		/*
  		 * If the new write will leave a contiguous dirty
  		 * area, just update the b_dirtyoff and b_dirtyend,
 @@ -1277,6 +1285,7 @@
  		}
  	    } else {
  		bp->b_resid = 0;
 +		bp->b_dirtyend = bp->b_dirtyoff = 0;
  		biodone(bp);
  		return (0);
  	    }
 
State-Changed-From-To: open->closed 
State-Changed-By: dillon 
State-Changed-When: Tue Dec 14 11:13:34 PST 1999 
State-Changed-Why:  
The patch has been committed to -stable and a more involved fix has been 
committed to -current. 
>Unformatted:
