From danderse@cs.utah.edu  Tue Nov 17 11:54:08 1998
Received: from wrath.cs.utah.edu ([155.99.198.98])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id LAA02860
          for <FreeBSD-gnats-submit@freebsd.org>; Tue, 17 Nov 1998 11:54:06 -0800 (PST)
          (envelope-from danderse@cs.utah.edu)
Received: from torrey.cs.utah.edu (torrey.cs.utah.edu [155.99.212.91])
	by wrath.cs.utah.edu (8.8.8/8.8.8) with ESMTP id MAA28531
	for <FreeBSD-gnats-submit@freebsd.org>; Tue, 17 Nov 1998 12:53:40 -0700 (MST)
Received: (from danderse@localhost)
	by torrey.cs.utah.edu (8.9.1/8.9.1) id MAA21319;
	Tue, 17 Nov 1998 12:53:40 -0700 (MST)
	(envelope-from danderse@cs.utah.edu)
Message-Id: <199811171953.MAA21319@torrey.cs.utah.edu>
Date: Tue, 17 Nov 1998 12:53:40 -0700 (MST)
From: David G Andersen <danderse@cs.utah.edu>
Reply-To: danderse@cs.utah.edu
To: FreeBSD-gnats-submit@freebsd.org
Subject: nfs mounts with 'intr' can cause system hang
X-Send-Pr-Version: 3.2

>Number:         8732
>Category:       kern
>Synopsis:       nfs mounts with 'intr' can cause system hang
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Nov 17 12:00:01 PST 1998
>Closed-Date:    Mon Dec 21 16:47:05 PST 1998
>Last-Modified:  Mon Dec 21 16:47:20 PST 1998
>Originator:     David G Andersen
>Release:        FreeBSD 3.0-CURRENT i386
>Organization:
University of Utah
>Environment:

FreeBSD 3.0, on a dual PII-350 with 128 MB RAM.
Moderate NFS usage.  As far as we could test, the problem is independent
of the number of processors, memory, or hardware configuration.

We tested primarily with NFSv2.  We were using amd, but
the problem is separate from amd.

>Description:

If a program gets a SIGINT while performing a close() on an NFS
file descriptor, the system will hang.  This only occurs if the
NFS filesystem is mounted with the 'intr' flag and the system
is running nfsiod processes.

In sys/kern/vfs_subr.c, vinvalbuf():

                while (vp->v_numoutput) {
                        vp->v_flag |= VBWAIT;
=>                      tsleep((caddr_t)&vp->v_numoutput,
                                slpflag | (PRIBIO + 1),
                                "vinvlbuf", slptimeo);
                }

The test program is stuck in this loop in vinvalbuf() because there is a
SIGINT pending.  This causes tsleep() to return immediately (without
sleeping) with the return value EINTR or ERESTART, but the loop does not
check the return value.  Hence it spins forever, because v_numoutput
never reaches zero.

Meanwhile, one of the pending nfsiods has been awakened because the reply
to its write request has arrived, but it never gets to run.  The other three
nfsiods are blocked because only one nfsiod can be in the socket receive at a
time.  Until the nfsiods return, v_numoutput won't be decremented.

It works with no nfsiods because the test program does all the buffer
writes itself, so by the time it gets to vinvalbuf(), v_numoutput is 0.
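
A minimal user-space sketch of the spin described above.  It is only a
model: model_vnode, model_tsleep() and unchecked_wait() are hypothetical
names invented for illustration, not kernel interfaces, and max_spins
exists only so the demonstration terminates (the real loop has no such
limit).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model; none of these names are kernel APIs. */
struct model_vnode {
	int numoutput;		/* stands in for vp->v_numoutput */
	int sig_pending;	/* nonzero while a signal is pending */
};

/*
 * Stand-in for an interruptible tsleep().  With a signal pending it
 * returns EINTR at once, so nothing else runs and numoutput is left
 * unchanged.  Otherwise it models an nfsiod completing one write
 * before the sleeper is woken.
 */
int model_tsleep(struct model_vnode *vp)
{
	if (vp->sig_pending)
		return EINTR;
	vp->numoutput--;
	return 0;
}

/*
 * The loop as quoted in the report: the return value is ignored.
 * Returns the number of times the sleep was attempted, capped at
 * max_spins.
 */
long unchecked_wait(struct model_vnode *vp, long max_spins)
{
	long spins = 0;

	assert(max_spins > 0);
	while (vp->numoutput && spins < max_spins) {
		(void)model_tsleep(vp);
		spins++;
	}
	return spins;
}
```

With no signal pending the loop ends once numoutput reaches 0.  With
sig_pending set it uses every spin it is allowed and numoutput never
changes, which is the hang in miniature.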


>How-To-Repeat:

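Run the following program with these arguments:

./program  <path to NFS file>  1000

(the 1000 tells it to do 1000 open/close cycles; the path is used as a
printf format, so it may contain a %d for the iteration number)

and press Ctrl-C while it is running.  It may take a few runs to
hang, because the interrupt has to arrive during the flush.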

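#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Repeatedly create, write, close, and unlink a file on an NFS mount.
 * argv[1] is deliberately used as a printf-style format for the file
 * name; argv[2] is the number of iterations.
 */
int main(int argc, char **argv) {
	char *filename;
	int fd;
	char buffer[10500];
	char filebuf[2048];
	int i, count;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <path to NFS file> <count>\n", argv[0]);
		exit(1);
	}
	filename = argv[1];
	count = atoi(argv[2]);
	memset(buffer, 0, sizeof(buffer));

	for (i = 0; i < count; i++) {
		snprintf(filebuf, sizeof(filebuf), filename, i);
		printf("creating %s\n", filebuf);
		/* O_CREAT requires a mode argument. */
		fd = open(filebuf, O_CREAT | O_WRONLY, 0644);
		if (fd < 0) {
			perror("open");
			exit(1);
		}
		write(fd, buffer, 8192);
		/* The hang needs the interrupt to arrive during this flush. */
		close(fd);
		unlink(filebuf);
	}
	printf("I did it, I did it\n");
	exit(0);
}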
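
The Ctrl-C step can also be automated.  The following is a rough sketch,
not part of the original report: run_and_interrupt() is a hypothetical
helper that starts a command, waits a given number of milliseconds, and
sends it SIGINT.  Calling it in a loop with varying delays is one way to
hit the flush window.  On an affected kernel the whole machine hangs, so
the helper simply never returns.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] as a child, wait delay_ms
 * milliseconds, send it SIGINT, and reap it.
 * Returns the signal that terminated the child, 0 if the child
 * exited on its own, or -1 on error.
 */
int run_and_interrupt(char *const argv[], long delay_ms)
{
	struct timespec ts;
	pid_t pid;
	int status;

	assert(argv != NULL && argv[0] != NULL && delay_ms >= 0);
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Do not inherit an ignored SIGINT from the parent. */
		signal(SIGINT, SIG_DFL);
		execvp(argv[0], argv);
		_exit(127);	/* exec failed */
	}
	ts.tv_sec = delay_ms / 1000;
	ts.tv_nsec = (delay_ms % 1000) * 1000000L;
	nanosleep(&ts, NULL);
	kill(pid, SIGINT);
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

For example, an argv of { "./program", "<path to NFS file>", "1000", NULL }
with delays between a few milliseconds and a few seconds would stand in
for pressing Ctrl-C by hand.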


>Fix:
	
[With commentary stolen shamelessly from Mike Hibler.  Thanks, Mike]

There appear to be a few options for the fix:

1 - Return EINTR from the close (close(2) lists it as a possible
    error code).  This could break a lot of clients that do not
    check the return value of close().

2 - Ignore SIGINT during NFS flushes.  This seems like a bad
    idea too.

3 - Something else?
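
For option 1, the clients at risk are those that ignore the result of
close().  A minimal sketch of a client that does check it follows;
checked_close() is a hypothetical helper written for illustration, not
an existing library routine.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: close fd and report any failure.
 * Returns 0 on success, or the errno value set by close().
 * The close is not retried after EINTR: POSIX leaves the state of
 * the descriptor unspecified in that case, so a retry could close a
 * descriptor that has since been reused.
 */
int checked_close(int fd, const char *name)
{
	int err;

	assert(name != NULL);
	if (close(fd) == 0)
		return 0;
	err = errno;	/* save before stdio can change it */
	fprintf(stderr, "close(%s): %s\n", name, strerror(err));
	return err;
}
```

A caller that gets EINTR (or a deferred write error) back can then decide
whether the data it wrote should be treated as lost.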

There are really two issues involved.  One is whether the FreeBSD change
to vinvalbuf is even necessary/correct... A cvs annotate shows:
==================

revision 1.156
date: 1998/06/10 22:02:14;  author: julian;  state: Exp;  lines: +4 -2
Replace 'sleep()' with 'tsleep()'
Accidentally imported from Kirk's codebase.

Pointed out by: various.
----------------------------
revision 1.155
date: 1998/06/10 18:13:19;  author: julian;  state: Exp;  lines: +18 -8
Submitted by: Kirk McKusick <mckusick@McKusick.COM>

Fix for potential hang when trying to reboot the system or
to forcibly unmount a soft update enabled filesystem.
FreeBSD already handled the reboot case differently, this is however a better
fix.

==================

So as 1.155 indicates, this change came directly from The Source, so I believe
it is necessary.  The change in 1.156 is the key: changing from the 4.4BSD
non-interruptible "sleep" to the possibly interruptible "tsleep" and OR'ing
in "slpflag" introduced the problem.  The sleep became interruptible when
called on an interruptible NFS mount.

That brings us to issue #2: what is the correct behavior in this case?
The easy way out is to just not OR in slpflag and go back to full-time
non-interruptibility (option 2).  However, that probably isn't necessary.
I'm a bettin' that you could just splx() and return the tsleep value
(option 1) and all will be fine. (well, as fine as it ever is in the NFS
world...)
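
The two candidate behaviors can be compared in a small user-space model.
This is a sketch under stated assumptions, not kernel code: model_vnode,
model_tsleep(), wait_return_error() and wait_uninterruptible() are
hypothetical names, and the model has no equivalent of the splx() call
that a real patch needs before returning.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model; none of these names are kernel APIs. */
struct model_vnode {
	int numoutput;		/* stands in for vp->v_numoutput */
	int sig_pending;	/* nonzero while a signal is pending */
};

/*
 * Stand-in for tsleep().  interruptible != 0 models OR'ing slpflag
 * into the priority on an 'intr' mount: a pending signal then makes
 * the call return EINTR at once.  Otherwise an nfsiod is modeled as
 * completing one write before the sleeper wakes.
 */
int model_tsleep(struct model_vnode *vp, int interruptible)
{
	if (interruptible && vp->sig_pending)
		return EINTR;
	vp->numoutput--;
	return 0;
}

/* Option 1: keep the sleep interruptible and return its error. */
int wait_return_error(struct model_vnode *vp)
{
	int error;

	while (vp->numoutput) {
		error = model_tsleep(vp, 1);
		if (error)
			return error;
	}
	return 0;
}

/* Option 2: do not OR in slpflag, so signals cannot end the sleep. */
int wait_uninterruptible(struct model_vnode *vp)
{
	while (vp->numoutput)
		(void)model_tsleep(vp, 0);
	return 0;
}
```

In the model, option 1 gives up immediately with EINTR and leaves the
writes outstanding, while option 2 waits until every write has completed
regardless of the pending signal.  Neither spins.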
>Release-Note:
>Audit-Trail:

From: David Malone <dwmalone@maths.tcd.ie>
To: freebsd-gnats-submit@freebsd.org
Cc: danderse@cs.utah.edu, nops@maths.tcd.ie
Subject: Re: kern/8732: nfs mounts with 'intr' can cause system hang
Date: Thu, 17 Dec 1998 15:49:22 +0000

 I'm having terrible problems with NFS - I get an uptime of about 3 days
 with NFS 3 and an uptime of about 3 hours with NFS 2 on our 3.0 machines.
 I think one of the problems I've been seeing matches this PR exactly - the
 machine hangs and when I break to the debugger it seems to be deadlocked in
 vinvalbuf.
 
 The PR suggests that if tsleep returns an error this should be returned.
 This would seem to be a sensible thing to do, as it is done just below
 with the return value of another tsleep. The PR also suggests that this
 may break some clients which don't check the return value of close, but
 I think they'll be broken anyway (NFS returns things like disk full on
 close I think).
 
 We have "tested" this suggestion once by jumping out of vinvalbuf from
 the debugger - the machine survived fine for several days afterwards.
 
 The other possibility suggested in this PR is not to allow the tsleep
 to be interrupted, which would be far better than having the machine
 hang every few days!
 
 Is there any possibility of someone committing either of the fixes? I
 have three 3.0 machines here which I can do testing on if someone would
 like to suggest which is the correct fix.
 
 	David.

From: "David G. Andersen" <danderse@cs.utah.edu>
To: David Malone <dwmalone@maths.tcd.ie>
Cc: freebsd-gnats-submit@freebsd.org, danderse@cs.utah.edu, nops@maths.tcd.ie
Subject: Re: kern/8732: nfs mounts with 'intr' can cause system hang
Date: Thu, 17 Dec 1998 12:08:23 -0700 (MST)

 I decided to take the "return EINTR on close, and don't worry about it"
 approach.  This patch does exactly this.  We're testing it here now,
 and it seems to work, and doesn't seem to have broken anything.
 
 Testers wanted. :-)
 
 Index: vfs_subr.c
 ===================================================================
 RCS file: /n/marker/usr/lsrc/FreeBSD/CVS/src/sys/kern/vfs_subr.c,v
 retrieving revision 1.174
 diff -r1.174 vfs_subr.c
 582,584c582,589
 <                       tsleep((caddr_t)&vp->v_numoutput,
 <                               slpflag | (PRIBIO + 1),
 <                               "vinvlbuf", slptimeo);
 ---
 >                       if (error = tsleep((caddr_t)&vp->v_numoutput,
 >                                          slpflag | (PRIBIO + 1),
 >                                          "vinvlbuf", slptimeo)) {
 >                               if (error == EINTR) {
 >                                     splx(s);
 >                                     return (EINTR);
 >                               }
 >                       }

From: David Malone <dwmalone@maths.tcd.ie>
To: "David G. Andersen" <danderse@cs.utah.edu>
Cc: freebsd-gnats-submit@freebsd.org, nops@maths.tcd.ie
Subject: Re: kern/8732: nfs mounts with 'intr' can cause system hang 
Date: Thu, 17 Dec 1998 21:11:21 +0000

 > I decided to take the "return EINTR on close, and don't worry about it"
 > approach.  This patch does exactly this.  We're testing it here now,
 > and it seems to work, and doesn't seem to have broken anything.
 
 Grand - I'll try that so.
 
 > Testers wanted. :-)
 
 I've just recompiled kernels for all our 3.0 machines. Mind you it will
 be January before the undergrads come back and give NFS a real good test ;-)
 
 	David.

From: "David G. Andersen" <danderse@cs.utah.edu>
To: Alfred Perlstein <bright@hotjobs.com>
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: kern/8732: nfs mounts with 'intr' can cause system hang
Date: Thu, 17 Dec 1998 14:31:52 -0700 (MST)

 Okay.. hrm.  Want to try a different one?
 
 Index: vfs_subr.c
 ===================================================================
 RCS file: /n/marker/usr/lsrc/FreeBSD/CVS/src/sys/kern/vfs_subr.c,v
 retrieving revision 1.174
 diff -r1.174 vfs_subr.c
 582,584c582,587
 <                       tsleep((caddr_t)&vp->v_numoutput,
 <                               slpflag | (PRIBIO + 1),
 <                               "vinvlbuf", slptimeo);
 ---
 >                       if (error = tsleep((caddr_t)&vp->v_numoutput,
 >                                          slpflag | (PRIBIO + 1),
 >                                          "vinvlbuf", slptimeo)) {
 >                               splx(s);
 >                               return (error);
 >                       }
 
 
 (Instead of just checking for EINTR, it returns on any error)
 
 I managed to replicate the crash you had with my first patch, but it
 took a lot more pounding for some reason.  I haven't managed to crash
 with this one yet, but who knows.
 
   -Dave
 
 Lo and Behold, Alfred Perlstein said:
 > > On Thu, 17 Dec 1998, David G. Andersen wrote:
 > 
 > whoa! nope, didn't work for me:
 > 
 > crashed on the first mail delete:
 > 
 > db>  ps
 >   pid   proc     addr    uid  ppid  pgrp  flag stat wmesg   wchan   cmd
 >   342 f6920d80 f6925000 1288   341   342 004006  2                  pine
 >   341 f6920ec0 f6922000 1288   340   341 004086  3   pause f69220f0 zsh
 >   340 f6839040 f6917000 1288   339   340 000184  3  select f0299d2c screen-3.7.4
 >   339 f6839540 f6908000 1288   335   339 004186  3   pause f69080f0 screen-3.7.4
 >   335 f6920240 f6982000 1288   333   335 004086  3   pause f69820f0 zsh
 >   334 f6920380 f697e000 1288   332   334 004086  3   ttyin f02725bc zsh
 >   333 f69204c0 f6979000 1288   297   285 004086  3  select f0299d2c kvt
 >   332 f6920600 f6974000 1288   297   285 004086  3  select f0299d2c kvt
 >   331 f6920740 f6970000 1288   297   285 004086  3  select f0299d2c kioslave
 >   330 f6920880 f696c000 1288   297   285 004086  3  select f0299d2c kioslave
 >   314 f6920b00 f692b000 1288     1   285 004186  3  nanslp f0272518 kblob.kss
 >   309 f6838f00 f691a000 1288   295   285 004086  3  nanslp f0272518 maudio
 >   301 f6839180 f6913000 1288   293   285 004086  3  select f0299d2c kpanel
 >   300 f68392c0 f690f000 1288   292   285 004086  3  select f0299d2c krootwm
 >   299 f6839400 f690c000 1288   291   285 004086  3  select f0299d2c kbgndwm
 >   297 f6839680 f6902000 1288   289   285 004086  3  select f0299d2c kfm
 >   296 f68397c0 f68ff000 1288   288   285 004086  3  select f0299d2c kwmsound
 >   295 f6839900 f68fc000 1288   287   285 004086  3  nanslp f0272518 kaudioserver
 >   293 f6839b80 f68f5000 1288   286   285 000086  3    wait f6839b80 sh
 >   292 f6839cc0 f68f2000 1288   286   285 000086  3    wait f6839cc0 sh
 >   291 f6839e00 f68ef000 1288   286   285 000086  3    wait f6839e00 sh
 >   289 f683a080 f68e8000 1288   286   285 000086  3    wait f683a080 sh
 >   288 f683a1c0 f68e5000 1288   286   285 000086  3    wait f683a1c0 sh
 >   287 f683a300 f68e2000 1288   286   285 000086  3    wait f683a300 sh
 >   286 f683a440 f68df000 1288   285   285 004086  3  select f0299d2c kwm
 >   285 f683a580 f68da000 1288   283   285 004086  3    wait f683a580 sh
 >   284 f683a6c0 f68cb000 1288   283   284 004184  3  select f0299d2c Xaccel
 >   283 f683a800 f68c9000 1288   278   278 004086  3    wait f683a800 xinit
 >   278 f683a940 f68c4000 1288   264   278 004086  3    wait f683a940 sh
 >   277 f683aa80 f68c2000    0   273   277 004086  3   ttyin f0294610 cat
 >   273 f683abc0 f68bf000    0   271   273 004086  3   pause f68bf0f0 zsh
 >   271 f683ad00 f68bb000    0   263   271 004086  3   pause f68bb0f0 csh
 >   270 f683ae40 f68b4000    0     1   270 004086  3   ttyin f029797c getty
 >   269 f683af80 f68ae000    0     1   269 004086  3   ttyin f0297888 getty
 >   268 f683b0c0 f68ab000    0     1   268 004086  3   ttyin f0297794 getty
 >   267 f683b200 f68a8000    0     1   267 004086  3   ttyin f02976a0 getty
 >   266 f683b5c0 f6897000    0     1   266 004086  3   ttyin f02975ac getty
 >   265 f683b700 f6894000    0     1   265 004086  3   ttyin f02974b8 getty
 >   264 f683c880 f6850000 1288     1   264 004086  3   pause f68500f0 zsh
 >   263 f683c9c0 f684c000 1288     1   263 004086  3   pause f684c0f0 zsh
 >   235 f683b340 f689f000    0     1   235 000084  3  select f0299d2c sshd1
 >   193 f683b840 f6890000    0     1   193 000084  2                  moused
 >   157 f683b980 f688d000    0     1   157 000184  3  select f0299d2c sendmail
 >   154 f683bac0 f6888000    0     1   154 000084  3  select f0299d2c lpd
 >   150 f683bc00 f6884000    0     1   150 000084  3  nanslp f0272518 cron
 >   147 f683c240 f6875000    0     1   147 000084  3  select f0299d2c inetd
 >   125 f683bd40 f6881000    0     1   120 000084  3  nfsrcvlk f686ccc0 nfsiod
 >   124 f683be80 f687e000    0     1   120 000084  3  nfsrcvlk f686ccc0 nfsiod
 >   123 f683bfc0 f687b000    0     1   120 000084  3  nfsrcvlk f686ccc0 nfsiod
 >   122 f683c100 f6878000    0     1   120 000004  3  sbwait f6660e04 nfsiod
 >   109 f683c380 f6872000    1     1   109 000184  3  select f0299d2c portmap
 >   100 f683c4c0 f6861000    0     1   100 000084  3  select f0299d2c syslogd
 >    33 f683c740 f685b000    0     1    33 000084  3   pause f685b0f0 adjkerntz
 >    26 f683c600 f685e000    0     1    26 000084  3  mfsidl f6835700 mount_mfs
 >     4 f683cb00 f6847000    0     0     0 000204  3  syncer f0299cdc syncer
 >     3 f683cc40 f6845000    0     0     0 000204  3  psleep f02925e4 vmdaemon
 >     2 f683cd80 f6843000    0     0     0 000204  3  psleep f0265370 pagedaemon
 >     1 f683cec0 f6841000    0     0     1 004084  3    wait f683cec0 init
 >     0 f0299038 f02e8000    0     0     0 000204  3   sched f0299038 swapper
 >   290 f6839f40 f68ec000 1288   286   285 002006  5                  sh
 > 
 > my window manager settings got hosed as well. *grrr*
 > 
 > thanks for the attempt though.
 > 
 > Alfred Perlstein - Programmer, HotJobs Inc. - www.hotjobs.com
 > -- There are operating systems, and then there's FreeBSD.
 > -- http://www.freebsd.org/                        3.0-current
 > 
 
 -- 
 work: danderse@cs.utah.edu                     me:  angio@pobox.com
       University of Utah                            http://www.angio.net/
       Department of Computer Science

From: Alfred Perlstein <bright@hotjobs.com>
To: "David G. Andersen" <danderse@cs.utah.edu>
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: kern/8732: nfs mounts with 'intr' can cause system hang
Date: Thu, 17 Dec 1998 16:51:10 -0500 (EST)

 On Thu, 17 Dec 1998, David G. Andersen wrote:
 
 > 
 > Okay.. hrm.  Want to try a different one?
 > 
 > Index: vfs_subr.c
 > ===================================================================
 > RCS file: /n/marker/usr/lsrc/FreeBSD/CVS/src/sys/kern/vfs_subr.c,v
 
 ...
 
 > (Instead of just checking for EINTR, it returns on any error)
 > 
 > I managed to replicate the crash you had with my first patch, but it
 > took a lot more pounding for some reason.  I haven't managed to crash
 > with this one yet, but who knows.
 > 
 >   -Dave
 > 
 
 someone should publish a mailbox over NFS stress test using pine.
 
 *cough* sorry, i'll check this out later tonight.
 
 i think it hits when pine is deleting mail and a signal arrives because
 the mailbox changed in size.
 
 does this propagate the signal back to the process?
 
 Alfred Perlstein - Programmer, HotJobs Inc. - www.hotjobs.com
 -- There are operating systems, and then there's FreeBSD.
 -- http://www.freebsd.org/                        3.0-current
 

From: Alfred Perlstein <bright@hotjobs.com>
To: "David G. Andersen" <danderse@cs.utah.edu>
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: kern/8732: nfs mounts with 'intr' can cause system hang
Date: Thu, 17 Dec 1998 17:07:10 -0500 (EST)

 On Thu, 17 Dec 1998, David G. Andersen wrote:
 
 > Okay.. hrm.  Want to try a different one?
 > 
 > Index: vfs_subr.c
 > ===================================================================
 > RCS file: /n/marker/usr/lsrc/FreeBSD/CVS/src/sys/kern/vfs_subr.c,v
 > retrieving revision 1.174
 > diff -r1.174 vfs_subr.c
 > 582,584c582,587
 > <                       tsleep((caddr_t)&vp->v_numoutput,
 > <                               slpflag | (PRIBIO + 1),
 > <                               "vinvlbuf", slptimeo);
 > ---
 > >                       if (error = tsleep((caddr_t)&vp->v_numoutput,
 > >                                          slpflag | (PRIBIO + 1),
 > >                                          "vinvlbuf", slptimeo)) {
 > >                               splx(s);
 > >                               return (error);
 > >                       }
 > 
 > 
 > (Instead of just checking for EINTR, it returns on any error)
 > 
 > I managed to replicate the crash you had with my first patch, but it
 > took a lot more pounding for some reason.  I haven't managed to crash
 > with this one yet, but who knows.
 > 
 >   -Dave
 
 ok, A & B
 
 A) ok, what if tsleep returns ERESTART?  will the kernel allow the process
 to exit the kernel call?  or will it loop retrying the syscall, and not
 allowing the process to return to userland?
 
 unless you say yes, i'll be trying the patch tonight (unplugging the
 ethernet to simulate an interrupted mount, then sending a ^C)
 
 B) well, um, i think i just got it.  Basically a ^C will make tsleep
 return EINTR which will allow the process back to userland, but perhaps a
 SIGIO or some other call will make tsleep give ERESTART which will restart
 the IO?
 
 is it A or B or ???
 
 thanks,
 -Alfred
 

From: Alfred Perlstein <bright@hotjobs.com>
To: "David G. Andersen" <danderse@cs.utah.edu>
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: kern/8732: nfs mounts with 'intr' can cause system hang
Date: Thu, 17 Dec 1998 19:04:32 -0500 (EST)

 On Thu, 17 Dec 1998, David G. Andersen wrote:
 
 > 
 > Okay.. hrm.  Want to try a different one?
 > 
 > Index: vfs_subr.c
 > ===================================================================
 > RCS file: /n/marker/usr/lsrc/FreeBSD/CVS/src/sys/kern/vfs_subr.c,v
 > 
 > (Instead of just checking for EINTR, it returns on any error)
 > 
 > I managed to replicate the crash you had with my first patch, but it
 > took a lot more pounding for some reason.  I haven't managed to crash
 > with this one yet, but who knows.
 > 
 >   -Dave
 
 Works great.  I ran a script to automatically send myself mail every couple
 of seconds and happily deleted mail with 'intr' mounts for a good 10 minutes;
 this used to lock up on the first or second try.
 
 It was an intermittent problem but it seems to be gone.
 
 You can ^C a hung df if the ethernet is unplugged and things seem fine
 afterward.
 
 Thanks a lot, great work.
 
 Alfred Perlstein - Programmer, HotJobs Inc. - www.hotjobs.com
 -- There are operating systems, and then there's FreeBSD.
 -- http://www.freebsd.org/                        3.0-current
 
 
 
State-Changed-From-To: open->closed 
State-Changed-By: eivind 
State-Changed-When: Mon Dec 21 16:47:05 PST 1998 
State-Changed-Why:  
I committed the fix (return error if tsleep() does) in vfs_subr.c v1.176 
>Unformatted:
