From nobody@FreeBSD.org  Sat May 17 10:54:16 2008
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0C6D01065687
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 17 May 2008 10:54:16 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id EE3BE8FC19
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 17 May 2008 10:54:15 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.2/8.14.2) with ESMTP id m4HAr0PL033844
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 17 May 2008 10:53:00 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.2/8.14.1/Submit) id m4HAr0Yi033843;
	Sat, 17 May 2008 10:53:00 GMT
	(envelope-from nobody)
Message-Id: <200805171053.m4HAr0Yi033843@www.freebsd.org>
Date: Sat, 17 May 2008 10:53:00 GMT
From: Timo Sirainen <tss@iki.fi>
To: freebsd-gnats-submit@FreeBSD.org
Subject: NFS: fstat() fails to return ESTALE with rename()d files
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         123755
>Category:       kern
>Synopsis:       [nfs] fstat() fails to return ESTALE with rename()d files
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    dfr
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat May 17 11:00:05 UTC 2008
>Closed-Date:    
>Last-Modified:  Sun Aug 01 17:10:18 UTC 2010
>Originator:     Timo Sirainen
>Release:        6.2-RELEASE and 7.0-STABLE
>Organization:
>Environment:
amd64
>Description:
I have a file that is updated by writing a temporary file and then rename()ing it over the original. Another process needs to know whether the file has been replaced, so that it can reopen it and read the updated contents. It performs the check as follows:

1. Flush NFS caches (not really relevant)
2. stat(file, &st1)
3. fstat(opened_file_fd, &st2)
 - if it failed with ESTALE, it means the file was replaced
4. if (st1.st_ino != st2.st_ino) then file was replaced
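Steps 2 to 4 above could be written in C roughly as below. This is a minimal sketch, not the reporter's actual code: the name `file_was_replaced()` is hypothetical, step 1 is left out because there is no portable userland call that flushes NFS caches, and comparing `st_dev` as well as `st_ino` is an added assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>

/*
 * Hypothetical helper for steps 2-4: returns 1 if `path` no longer names
 * the file open on `fd`, 0 if it still does, and -1 on any other error.
 */
static int file_was_replaced(const char *path, int fd)
{
	struct stat st1, st2;

	/* Step 2: stat() the path to see which file it names now. */
	if (stat(path, &st1) < 0)
		return -1;

	/* Step 3: fstat() the open descriptor; ESTALE means it was replaced. */
	if (fstat(fd, &st2) < 0)
		return errno == ESTALE ? 1 : -1;

	/* Step 4: a different inode (or device) also means it was replaced. */
	return st1.st_ino != st2.st_ino || st1.st_dev != st2.st_dev;
}
```

On the FreeBSD client described here, this check returns 0 even after a replacement, because fstat() neither fails with ESTALE nor reports a different inode number.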

The problem on FreeBSD is that both calls succeed and return the same inode number even after the file has been replaced. This is probably because the old file has been deleted on the server and its inode number reused for the new file. In that case, however, the fstat() in step 3 should fail with ESTALE, which it does not in my rename() tests (it does in my unlink() tests, so it is not completely broken).

>How-To-Repeat:
1. Compile the attached test program.
2. Run it without any arguments on a FreeBSD NFS client.
3. Run it with at least one argument, in the same NFS-mounted directory, either on another NFS client or on the NFS server (the OS does not matter).
4. Observe that on FreeBSD it immediately starts reporting "inodes match, sizes don't" errors and that the "errors" counter never increases from zero (no call fails with ESTALE).

If the same program is run on a Linux NFS client, it reports no size mismatches and the "errors" counter increases, as expected. (I'm pretty sure the Solaris NFS client also works, but I couldn't verify it right now.)
>Fix:


Patch attached with submission follows:

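#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <errno.h>

/* fstat() wrapper: ESTALE is an expected result, so only report other errors. */
static int myfstat(int fd, struct stat *st)
{
	if (fstat(fd, st) == 0)
		return 0;
	if (errno != ESTALE)
		perror("fstat()");
	return -1;
}

int main(int argc, char *argv[])
{
	struct stat st1, st2, st3;
	const char *path;
	int fd, errors = 0, ok = 0;

	/* Without arguments write 1-byte files, with arguments 2-byte files,
	   so the two running instances' files can be told apart by size. */
	path = argc == 1 ? "foo.1" : "foo.2";
	for (;;) {
		if ((errors + ok) % 10 == 0)
			printf("errors %d, ok %d\n", errors, ok);

		/* Write a new temporary file and rename() it over "foo". */
		fd = creat(path, 0600);
		if (fd == -1) {
			perror("creat()");
			return 1;
		}
		if (argc == 1) {
			if (write(fd, "a", 1) != 1) perror("write()");
		} else {
			if (write(fd, "bb", 2) != 2) perror("write()");
		}
		if (fsync(fd) < 0) perror("fsync()");
		if (rename(path, "foo") < 0) perror("rename()");

		/* Give the other instance a chance to replace "foo". */
		usleep(100 * (rand() % 10));

		/* fstat() our fd, stat() the path, then fstat() the fd again. */
		if (myfstat(fd, &st1) < 0 ||
		    stat("foo", &st2) < 0 ||
		    myfstat(fd, &st3) < 0)
			errors++;
		else {
			ok++;
			/* Same inode number but a different size: "foo" now names
			   another file, yet our fd did not fail with ESTALE. */
			if (st1.st_ino == st2.st_ino &&
			    st1.st_size == st3.st_size &&
			    st1.st_size != st2.st_size) {
				printf("inodes match, sizes don't\n");
			}
		}
		close(fd);
	}
	return 0;
}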


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->dfr 
Responsible-Changed-By: remko 
Responsible-Changed-When: Sat May 17 12:28:18 UTC 2008 
Responsible-Changed-Why:  
Hi Doug, since you did recent work on the NFS implementation 
you might know this (lucky shot), else please reassign it 
to the bugs team :-). thanks! 

http://www.freebsd.org/cgi/query-pr.cgi?pr=123755 

From: Timo Sirainen <tss@iki.fi>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/123755: [nfs] fstat(1) fails to return ESTALE with
	rename()d files
Date: Tue, 27 May 2008 22:49:55 +0300

 I noticed a similar easier-to-reproduce problem:
 
 NFS client: open() a file
 NFS server: unlink() the file
 NFS client: fchown() the file -> ESTALE (as expected)
 NFS client: fstat() the file -> success (not expected)
 
 Since fchown() returned ESTALE, I think it should be remembered that the
 file is gone and fstat() should return ESTALE as well.
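The client-side half of this sequence can be sketched as follows. `probe_stale()` and `struct stale_probe` are hypothetical names. Passing the file's current owner and group (recorded right after open()) is an assumed way to make an fchown() that the owner is allowed to perform without changing anything, and it is also an assumption that such a call reaches the server as a SETATTR request. Called after the server-side unlink(), the client described above would print ESTALE for fchown() and "ok" for fstat(), whereas the expected result is ESTALE for both.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Result of one probe: 0 on success, otherwise the errno value. */
struct stale_probe {
	int fchown_err;
	int fstat_err;
};

/*
 * Hypothetical helper: issue an fchown() that should go to the server,
 * then an fstat() that may be answered from the attribute cache.
 * uid/gid must be the file's current owner and group.
 */
static struct stale_probe probe_stale(int fd, uid_t uid, gid_t gid)
{
	struct stale_probe p = { 0, 0 };
	struct stat st;

	if (fchown(fd, uid, gid) < 0)
		p.fchown_err = errno;
	if (fstat(fd, &st) < 0)
		p.fstat_err = errno;

	printf("fchown: %s\n", p.fchown_err ? strerror(p.fchown_err) : "ok");
	printf("fstat:  %s\n", p.fstat_err ? strerror(p.fstat_err) : "ok");
	return p;
}
```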
 
 

From: Jaakko Heinonen <jh@saunalahti.fi>
To: Timo Sirainen <tss@iki.fi>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/123755: [nfs] fstat(1) fails to return ESTALE with
	rename()d files
Date: Fri, 20 Feb 2009 14:18:26 +0200

 Hi Timo,
 
 > The problem with FreeBSD is that both stats successfully return the same
 > inode even if the file has been replaced already.
 
 NFS client attribute cache causes this. You can work around the problem
 by disabling the attribute cache with -o acdirmax=0,acregmax=0
 mount options.
 
 On 2008-05-27, Timo Sirainen wrote:
 > I noticed a similar easier-to-reproduce problem:
 >  
 > NFS client: open() a file
 > NFS server: unlink() the file
 > NFS client: fchown() the file -> ESTALE (as expected)
 > NFS client: fstat() the file -> success (not expected)
 >  
 > Since fchown() returned ESTALE, I think it should be remembered that the
 > file is gone and fstat() should return ESTALE as well.
 
 This should be easy to fix. Just invalidate the attribute cache if
 nfs_setattrrpc() fails with ESTALE.
 
 --- patch begins here ---
 Index: sys/nfsclient/nfs_vnops.c
 ===================================================================
 --- sys/nfsclient/nfs_vnops.c	(revision 188842)
 +++ sys/nfsclient/nfs_vnops.c	(working copy)
 @@ -838,6 +838,10 @@ nfs_setattrrpc(struct vnode *vp, struct 
  		nfsm_loadattr(vp, NULL);
  	m_freem(mrep);
  nfsmout:
 +	/* Invalidate the attribute cache if the NFS file handle is stale. */
 +	if (error == ESTALE)
 +		np->n_attrstamp = 0;
 +
  	return (error);
  }
 --- patch ends here ---
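Why clearing n_attrstamp helps can be illustrated with a small userland model of a timestamped attribute cache. Everything in it (`struct server_file`, `struct attr_cache`, `cached_getattr()`, `setattr_request()` and the 3-second lifetime) is hypothetical and only loosely mirrors the nfsnode fields; it is not the FreeBSD kernel code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <time.h>

#define ATTR_TIMEOUT 3	/* assumed cache lifetime, in seconds */

/* Simulated server-side state of one file (hypothetical). */
struct server_file {
	bool stale;		/* file handle no longer valid on the server */
	long size;
};

/* Simulated client-side attribute cache for that file (hypothetical). */
struct attr_cache {
	time_t stamp;		/* load time; 0 means invalid, like n_attrstamp */
	long size;
};

/* getattr: answer from the cache while fresh, otherwise ask the server. */
static int cached_getattr(struct attr_cache *c, const struct server_file *s,
    time_t now, long *size)
{
	if (c->stamp != 0 && now - c->stamp < ATTR_TIMEOUT) {
		*size = c->size;	/* cache hit: server not consulted */
		return 0;
	}
	if (s->stale)
		return ESTALE;
	c->size = s->size;
	c->stamp = now;
	*size = c->size;
	return 0;
}

/* setattr: always goes to the server; optionally apply the proposed fix. */
static int setattr_request(struct attr_cache *c, const struct server_file *s,
    bool invalidate_on_estale)
{
	if (!s->stale)
		return 0;
	if (invalidate_on_estale)
		c->stamp = 0;	/* analogue of np->n_attrstamp = 0 */
	return ESTALE;
}
```

In this model, a getattr that falls inside the cache window after a failed setattr still succeeds unless the stamp is cleared, which corresponds to the fchown()/fstat() sequence quoted above.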
 
 However, the other case, which involves two different files with the same
 inode number, is more difficult. AFAIK, there is no easy way to obtain
 the vnode of the other (stale) file in nfs_getattr(), and a vnode
 reference would be needed to invalidate that file's attribute cache.
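On the application side, one possible mitigation for the inode-reuse case (not something proposed in this thread) is to compare more than the inode number. The sketch below assumes the file is only ever updated by rename()ing a new file over it and never modified in place, so any disagreement in size, mtime or ctime between stat(path) and fstat(fd) is treated as a replacement. The name `looks_replaced()` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>

/*
 * Hypothetical heuristic, assuming the file is only ever updated by
 * rename()ing a new file over it: treat any disagreement between the
 * path and the descriptor on identity, size, mtime or ctime as a
 * replacement. Returns 1 if replaced, 0 if not, -1 on other errors.
 */
static int looks_replaced(const char *path, int fd)
{
	struct stat sp, sf;

	if (stat(path, &sp) < 0)
		return -1;
	if (fstat(fd, &sf) < 0)
		return errno == ESTALE ? 1 : -1;	/* stale handle: replaced */

	return sp.st_dev != sf.st_dev || sp.st_ino != sf.st_ino ||
	    sp.st_size != sf.st_size || sp.st_mtime != sf.st_mtime ||
	    sp.st_ctime != sf.st_ctime;
}
```

This would flag the "inodes match, sizes don't" case produced by the test program, but it presumably cannot detect a replacement while stat() and fstat() are both answered from the same stale cached data, and it misfires if the file is ever modified in place.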
 
 -- 
 Jaakko

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/123755: commit references a PR
Date: Mon,  6 Apr 2009 21:11:23 +0000 (UTC)

 Author: jhb
 Date: Mon Apr  6 21:11:08 2009
 New Revision: 190785
 URL: http://svn.freebsd.org/changeset/base/190785
 
 Log:
   When a stale file handle is encountered, purge all cached information about
   an NFS node including the access and attribute caches.  Previously the NFS
   client only purged any name cache entries associated with the file.
   
   PR:		kern/123755
   Submitted by:	Jaakko Heinonen  jh of saunalahti fi
   Reported by:	Timo Sirainen  tss of iki fi
   Reviewed by:	rwatson, rmacklem
   MFC after:	1 month
 
 Modified:
   head/sys/nfs4client/nfs4_socket.c
   head/sys/nfsclient/nfs.h
   head/sys/nfsclient/nfs_krpc.c
   head/sys/nfsclient/nfs_socket.c
   head/sys/nfsclient/nfs_subs.c
 
 Modified: head/sys/nfs4client/nfs4_socket.c
 ==============================================================================
 --- head/sys/nfs4client/nfs4_socket.c	Mon Apr  6 20:17:28 2009	(r190784)
 +++ head/sys/nfs4client/nfs4_socket.c	Mon Apr  6 21:11:08 2009	(r190785)
 @@ -259,7 +259,7 @@ nfs4_request(struct vnode *vp, struct mb
  	 ** lookup cache, just in case.
  	 **/
  	if (error == ESTALE)
 -		cache_purge(vp);
 +		nfs_purgecache(vp);
  
  	return (error);
  }
 
 Modified: head/sys/nfsclient/nfs.h
 ==============================================================================
 --- head/sys/nfsclient/nfs.h	Mon Apr  6 20:17:28 2009	(r190784)
 +++ head/sys/nfsclient/nfs.h	Mon Apr  6 21:11:08 2009	(r190785)
 @@ -322,6 +322,7 @@ void	nfs_down(struct nfsreq *, struct nf
  #endif /* ! NFS4_USE_RPCCLNT */
  #endif
  
 +void	nfs_purgecache(struct vnode *);
  int	nfs_vinvalbuf(struct vnode *, int, struct thread *, int);
  int	nfs_readrpc(struct vnode *, struct uio *, struct ucred *);
  int	nfs_writerpc(struct vnode *, struct uio *, struct ucred *, int *,
 
 Modified: head/sys/nfsclient/nfs_krpc.c
 ==============================================================================
 --- head/sys/nfsclient/nfs_krpc.c	Mon Apr  6 20:17:28 2009	(r190784)
 +++ head/sys/nfsclient/nfs_krpc.c	Mon Apr  6 21:11:08 2009	(r190785)
 @@ -557,7 +557,7 @@ tryagain:
  		 * cache, just in case.
  		 */
  		if (error == ESTALE)
 -			cache_purge(vp);
 +			nfs_purgecache(vp);
  		/*
  		 * Skip wcc data on NFS errors for now. NetApp filers
  		 * return corrupt postop attrs in the wcc data for NFS
 
 Modified: head/sys/nfsclient/nfs_socket.c
 ==============================================================================
 --- head/sys/nfsclient/nfs_socket.c	Mon Apr  6 20:17:28 2009	(r190784)
 +++ head/sys/nfsclient/nfs_socket.c	Mon Apr  6 21:11:08 2009	(r190785)
 @@ -1364,7 +1364,7 @@ wait_for_pinned_req:
  			 * lookup cache, just in case.
  			 */
  			if (error == ESTALE)
 -				cache_purge(vp);
 +				nfs_purgecache(vp);
  			/*
  			 * Skip wcc data on NFS errors for now. NetApp filers return corrupt
  			 * postop attrs in the wcc data for NFS err EROFS. Not sure if they 
 
 Modified: head/sys/nfsclient/nfs_subs.c
 ==============================================================================
 --- head/sys/nfsclient/nfs_subs.c	Mon Apr  6 20:17:28 2009	(r190784)
 +++ head/sys/nfsclient/nfs_subs.c	Mon Apr  6 21:11:08 2009	(r190785)
 @@ -865,6 +865,29 @@ nfs_getattrcache(struct vnode *vp, struc
  	return (0);
  }
  
 +/*
 + * Purge all cached information about an NFS vnode including name
 + * cache entries, the attribute cache, and the access cache.  This is
 + * called when an NFS request for a node fails with a stale
 + * filehandle.
 + */
 +void
 +nfs_purgecache(struct vnode *vp)
 +{
 +	struct nfsnode *np;
 +	int i;
 +
 +	np = VTONFS(vp);
 +	cache_purge(vp);
 +	mtx_lock(&np->n_mtx);
 +	np->n_attrstamp = 0;
 +	KDTRACE_NFS_ATTRCACHE_FLUSH_DONE(vp);
 +	for (i = 0; i < NFS_ACCESSCACHESIZE; i++)
 +		np->n_accesscache[i].stamp = 0;
 +	KDTRACE_NFS_ACCESSCACHE_FLUSH_DONE(vp);
 +	mtx_unlock(&np->n_mtx);
 +}
 +
  static nfsuint64 nfs_nullcookie = { { 0, 0 } };
  /*
   * This function finds the directory cookie that corresponds to the
 

From: John Baldwin <jhb@FreeBSD.org>
To: bug-followup@freebsd.org,
 tss@iki.fi
Cc:  
Subject: Re: kern/123755: [nfs] fstat(1) fails to return ESTALE with rename()d files
Date: Mon, 6 Apr 2009 17:31:36 -0400

 I think the original case with the two files is just a property of the 
 attribute cache.  The fstat() returns cached attributes for up to 3 seconds 
 before it will fetch new attributes from the NFS server.  Only then will it 
 get ESTALE and at that point fstat() would start failing with ESTALE.  You 
 can either reduce 'acregmin' to lower the cache timeout from 3 seconds (this 
 will increase your NFS traffic though) or if your app is periodically 
 checking for the new file it will simply notice about 3 seconds "late" when 
 it gets ESTALE.
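For the periodic-check approach, a polling loop could look like the sketch below. `wait_for_estale()` and its timeout and interval parameters are hypothetical; going by the explanation above, the timeout would need to exceed the attribute cache timeout (about 3 seconds here) for the ESTALE to show up.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper: poll fstat() every interval_ms milliseconds until
 * it fails with ESTALE (returns 1) or timeout_ms elapses (returns 0).
 * Any other fstat() error returns -1.
 */
static int wait_for_estale(int fd, long timeout_ms, long interval_ms)
{
	struct timespec start, now, pause;
	struct stat st;

	pause.tv_sec = interval_ms / 1000;
	pause.tv_nsec = (interval_ms % 1000) * 1000000L;
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (;;) {
		if (fstat(fd, &st) < 0)
			return errno == ESTALE ? 1 : -1;
		clock_gettime(CLOCK_MONOTONIC, &now);
		if ((now.tv_sec - start.tv_sec) * 1000L +
		    (now.tv_nsec - start.tv_nsec) / 1000000L >= timeout_ms)
			return 0;	/* no ESTALE within the deadline */
		nanosleep(&pause, NULL);
	}
}
```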
 
 -- 
 John Baldwin

From: Timo Sirainen <tss@iki.fi>
To: John Baldwin <jhb@FreeBSD.org>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/123755: [nfs] fstat(1) fails to return ESTALE with
 rename()d files
Date: Mon, 06 Apr 2009 17:44:59 -0400

 
 On Mon, 2009-04-06 at 17:31 -0400, John Baldwin wrote:
 > I think the original case with the two files is just a property of the
 > attribute cache.  The fstat() returns cached attributes for up to 3 seconds
 > before it will fetch new attributes from the NFS server.  Only then will it
 > get ESTALE and at that point fstat() would start failing with ESTALE.  You
 > can either reduce 'acregmin' to lower the cache timeout from 3 seconds (this
 > will increase your NFS traffic though) or if your app is periodically
 > checking for the new file it will simply notice about 3 seconds "late" when
 > it gets ESTALE.
 
 Yeah, I was just wishing that my application would have been fully
 usable when 2+ FreeBSD NFS clients are accessing the same files. I have
 added a lot of code that tries to make sure attribute caches get flushed
 when necessary and my app notices when files are rename()d over, because
 failure to do that results in errors and/or corruption.
 

From: John Baldwin <jhb@freebsd.org>
To: Timo Sirainen <tss@iki.fi>
Cc: bug-followup@freebsd.org
Subject: Re: kern/123755: [nfs] fstat(1) fails to return ESTALE with rename()d files
Date: Tue, 7 Apr 2009 09:36:53 -0400

 On Monday 06 April 2009 5:44:59 pm Timo Sirainen wrote:
 > On Mon, 2009-04-06 at 17:31 -0400, John Baldwin wrote:
 > > I think the original case with the two files is just a property of the
 > > attribute cache.  The fstat() returns cached attributes for up to 3 seconds
 > > before it will fetch new attributes from the NFS server.  Only then will it
 > > get ESTALE and at that point fstat() would start failing with ESTALE.  You
 > > can either reduce 'acregmin' to lower the cache timeout from 3 seconds (this
 > > will increase your NFS traffic though) or if your app is periodically
 > > checking for the new file it will simply notice about 3 seconds "late" when
 > > it gets ESTALE.
 > 
 > Yeah, I was just wishing that my application would have been fully
 > usable when 2+ FreeBSD NFS clients are accessing the same files. I have
 > added a lot of code that tries to make sure attribute caches get flushed
 > when necessary and my app notices when files are rename()d over, because
 > failure to do that results in errors and/or corruption.
 
 You can't really flush the attribute caches reliably from userland, AFAIK.
 Also, other OSes (e.g. Solaris) use attribute caches in their NFS clients
 as well.
 
 -- 
 John Baldwin
>Unformatted:
