From dillon@backplane.com  Sat Apr  4 17:23:27 1998
Received: from apollo.backplane.com (apollo.backplane.com [207.33.240.2])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id RAA04139
          for <FreeBSD-gnats-submit@freebsd.org>; Sat, 4 Apr 1998 17:23:25 -0800 (PST)
          (envelope-from dillon@backplane.com)
Received: (root@localhost) by apollo.backplane.com (8.8.8/8.6.5) id RAA07844; Sat, 4 Apr 1998 17:23:20 -0800 (PST)
Message-Id: <199804050123.RAA07844@apollo.backplane.com>
Date: Sat, 4 Apr 1998 17:23:20 -0800 (PST)
From: Matthew Dillon <dillon@backplane.com>
Reply-To: dillon@backplane.com
To: FreeBSD-gnats-submit@freebsd.org
Subject: MFS msync bug, MFS-related pager bug (with fixes)
X-Send-Pr-Version: 3.2

>Number:         6212
>Category:       kern
>Synopsis:       Two bugs with MFS filesystems fixed, one feature added
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    dillon
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Apr  4 17:30:01 PST 1998
>Closed-Date:    Mon Mar 26 16:54:42 PST 2001
>Last-Modified:  Mon Mar 26 16:55:14 PST 2001
>Originator:     Matthew Dillon
>Release:        FreeBSD 3.0-CURRENT i386
>Organization:
Best Internet Communications
>Environment:

	Pentium based diskless FreeBSD box.

>Description:

	The kernel does not set the P_SYSTEM flag for the MFS filesystem
	processes.  Because these processes can have a very large resident
	set, the kernel repeatedly attempts to kill them when it runs out
	of swap, which really screws the machine up.
	(note: my one-liner fix to this is probably not in the right place)

	The MFS kernel process needs to msync() the memory map to backing
	store (which only affects MFS mounts that use a file for backing store).
	If it fails to do so, the kernel syncer will *never* *see* the dirty
	pages.  (note:  I changed the tsleep() to use a 10-second timeout,
	hz * 10, so the loop wakes up periodically, checks the time, and
	calls msync() every 30 seconds.)
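	A minimal userland sketch of the 30-second check, assuming the same
	logic as the patch: a sync is due when the elapsed time exceeds the
	interval in either direction, so a clock stepped backwards also
	triggers one.  msync_due() is a hypothetical helper, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper modelled on the check in the patch:
 *     dt = time_second - lsec;
 *     if (dt < -30 || dt > 30) { lsec = time_second; msync(...); }
 * Returns true (and records `now` in *last) when a sync is due.
 */
static bool msync_due(time_t now, time_t *last, long interval)
{
	long dt;

	assert(interval > 0);
	dt = (long)(now - *last);
	if (dt < -interval || dt > interval) {
		*last = now;	/* remember when we last synced */
		return true;
	}
	return false;
}
```

	In the MFS loop the caller wakes at least every 10 seconds (the
	hz * 10 tsleep timeout) and, when msync_due() returns true, issues
	msync() with MS_ASYNC over the mapped region.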

	The mount_mfs program (the mkfs.c code) insists on clearing the 
	backing store file if one has been specified.   There are lots of
	people who probably would like to be able to use an NFS mounted file
	for backing store, and this clearing results in a massive amount
	of network I/O (especially if you are mounting huge filesystems).
	Also included in my bug fixes is a modification to the program to
	check the size of the file and only truncate/pre-initialize it
	if it does not match the size of the requested filesystem.  If it
	does match, mount_mfs does not bother to clear it and simply
	newfs's over whatever data was previously there.
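	A userland sketch of the mkfs.c change, assuming POSIX open(),
	fstat(), ftruncate() and write().  prepare_backing_file() is a
	hypothetical name.  Unlike the quoted loop, it zeroes its buffer
	explicitly and clamps the last write to the remaining length.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>	/* BUFSIZ */
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Opens the backing file WITHOUT O_TRUNC.  If it already has exactly
 * `want` bytes it is left untouched; otherwise it is resized and
 * zero-filled.  *cleared reports which path was taken.
 * Returns an open fd, or -1 on error.
 */
static int prepare_backing_file(const char *path, off_t want, int *cleared)
{
	unsigned char buf[BUFSIZ];
	struct stat st;
	off_t done;
	int fd;

	*cleared = 0;
	fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0)
		goto fail;
	if (st.st_size == want)
		return fd;	/* right size: reuse the existing contents */

	if (ftruncate(fd, want) < 0)
		goto fail;
	memset(buf, 0, sizeof(buf));
	for (done = 0; done < want; ) {
		size_t n = sizeof(buf);
		ssize_t w;

		if ((off_t)n > want - done)
			n = (size_t)(want - done);	/* clamp the last chunk */
		w = write(fd, buf, n);
		if (w < 0 || (size_t)w != n)
			goto fail;
		done += w;
	}
	*cleared = 1;
	return fd;
fail:
	close(fd);
	return -1;
}
```

	A second call with the same size leaves the file alone, which is
	what avoids rewriting a large NFS-backed image on every mount.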

>How-To-Repeat:

	

>Fix:
	

Index: mfs_vfsops.c
===================================================================
RCS file: /src/FreeBSD-CVS/ncvs/src/sys/ufs/mfs/mfs_vfsops.c,v
retrieving revision 1.41
diff -r1.41 mfs_vfsops.c
48a49,50
> #include <sys/sysproto.h>	/* for msync_args */
> #include <sys/mman.h>	/* for msync_args */
432a435,441
> 	/*
> 	 * must mark the calling process as a system process
> 	 * so the pager doesn't try to kill it.  Doh!  And the
> 	 * pager may because the resident set size may be huge.
> 	 */
> 	p->p_flag |= P_SYSTEM;
> 
483c492
< 		else if (tsleep((caddr_t)vp, mfs_pri, "mfsidl", 0))
---
> 		else if (tsleep((caddr_t)vp, mfs_pri, "mfsidl", hz * 10))
484a494,518
> 
> 		/*
> 		 * we should call msync on the backing store every 30 seconds,
> 		 * otherwise the pages are not associated with the file and guess
> 		 * what!  the syncer never sees them.  msync has no effect 
> 		 * if the backing store is swap, but a big effect if it's a file
> 		 * (e.g. an NFS mounted file).
> 		 */
> 		{
> 			static long lsec;
> 			int dt = time_second - lsec;
> 
> 			if (dt < -30 || dt > 30) {
> 				struct msync_args uap;
> 
> 				lsec = time_second;
> 
> 				uap.addr = mfsp->mfs_baseoff;
> 				uap.len = mfsp->mfs_size;
> 				uap.flags = MS_ASYNC;
> 
> 				msync(curproc, &uap);
> 			}
> 		}
> 
Index: mkfs.c
===================================================================
RCS file: /src/FreeBSD-CVS/ncvs/src/sbin/newfs/mkfs.c,v
retrieving revision 1.20
diff -r1.20 mkfs.c
42a43
> #include <sys/stat.h>
181c182,184
< 			fd = open(filename,O_RDWR|O_TRUNC|O_CREAT,0644);
---
> 			struct stat st;
> 
> 			fd = open(filename,O_RDWR|O_CREAT,0644);
186,193c189,200
< 			for(l=0;l< fssize * sectorsize;l += l1) {
< 				l1 = fssize * sectorsize;
< 				if (BUFSIZ < l1)
< 					l1 = BUFSIZ;
< 				if (l1 != write(fd,buf,l1)) {
< 					perror(filename);
< 					exit(12);
< 				}
---
> 			fstat(fd, &st);
> 			if (st.st_size != fssize * sectorsize) {
> 			    ftruncate(fd, fssize * sectorsize);
> 			    for(l=0;l< fssize * sectorsize;l += l1) {
> 				    l1 = fssize * sectorsize;
> 				    if (BUFSIZ < l1)
> 					    l1 = BUFSIZ;
> 				    if (l1 != write(fd,buf,l1)) {
> 					    perror(filename);
> 					    exit(12);
> 				    }
> 			    }


>Release-Note:
>Audit-Trail:

From: Matthew Dillon <dillon@backplane.com>
To: freebsd-gnats-submit@freebsd.org
Cc: 
Subject: Re: kern/6212: Two bugs with MFS filesystem fixed, two features added
Date: Mon, 6 Apr 1998 21:22:21 -0700 (PDT)

     This is a revised submission to my original submission.  Note that the
     diff included below is relative to the CVS base, *NOT* to my original
     submission.
 
     This submission fixes two bugs with MFS and adds two features that allow
     MFS to be used properly in a diskless workstation environment.
 
     Bug #1 fixed:	kernel does not set P_SYSTEM flag for MFS special
 			kernel process, causing paging system to attempt to
 			kill the MFS process if memory runs low.
 
     Bug #2 fixed:	When using file-backed storage, the dirty pages are
 			never synchronized to the backing store by the
 			kernel update/syncer daemon.  The MFS special kernel
 			process must call msync().  The fix causes it to call
 			msync() every 30 seconds.
 
     Feature #1 added:	MFS does not attempt to zero-out the file before 
 			newfs'ing it if the file is already the correct 
 			size.  This allows MFS to be used with NFS-mounted
 			backing store without eating the network alive.
 			(e.g. my workstation has two 32MB MFS mounts and 
 			one 8MB mount.  That hurts on a 10BaseT network
 			without this fix).
 
     Feature #2 added:	The ability to specify file-backed storage and have
 			MFS *NOT* newfs the storage, allowing persistent
 			backing store to survive a reboot.
 
 			This is critical in a workstation environment because
 			it allows MFS to be used over an NFS-based file backing
 			store in a persistent fashion (i.e. for a small /home
 			so one can use SSH and .Xauthority in a diskless
 			workstation environment).
 
 			Note that fsck works just fine on the backing store
 			file.
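     A sketch of the -U option handling in newfs.c, assuming POSIX
     getopt().  The globals are stand-ins for the ones the patch adds and
     only -F and -U are handled: -U sets skipnewfs and then falls through
     to the -F case, so "-U file" names the backing file and suppresses
     the newfs step.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-ins for the newfs.c globals touched by the patch. */
static char *filename;
static int skipnewfs;

static int parse_mfs_opts(int argc, char *argv[])
{
	int ch;

	while ((ch = getopt(argc, argv, "F:U:")) != -1) {
		switch (ch) {
		case 'U':
			skipnewfs = 1;	/* keep the existing filesystem image */
			/* FALLTHROUGH */
		case 'F':
			filename = optarg;	/* backing store file */
			break;
		default:
			return -1;	/* unknown option or missing argument */
		}
	}
	return 0;
}
```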
 
     Finally, it should be noted that being able to use an NFS-based file
     for backing store for an MFS filesystem is critical in a diskless
     workstation environment.  It allows the filesystems in question to not
     eat up memory, especially if the workstation environment has no swap
     at all.
 
     I would also like to submit instructions to the FreeBSD manual to 
     describe how to set up a diskless (floppy boot) workstation environment.
     It really is quite simple: a read-only NFS mount for / and /usr, a
     secure r+w NFS mount for the MFS filesystem images, and three MFS 
     filesystems (two transitory: /var and /var/tmp, and one persistent: /home).
 
     It's a fantastic environment, and MFS is extremely stable, whereas trying
     to use vnconfig with an NFS-backed file leads to massive system corruption
     and crashes even under FreeBSD-3.0-current.  Since MFS performs its I/O
     in a separate kernel process, it can deal with NFS-based backing store
     without screwing up the machine.
 
 						-Matt
 
     Matthew Dillon   Engineering, BEST Internet Communications, Inc.
 		     <dillon@backplane.com>
     [always include a portion of the original email in any response!]
 
 
 Index: mkfs.c
 ===================================================================
 RCS file: /src/FreeBSD-CVS/ncvs/src/sbin/newfs/mkfs.c,v
 retrieving revision 1.21
 diff -r1.21 mkfs.c
 42a43
 > #include <sys/stat.h>
 108a110
 > extern int	skipnewfs;
 181c183,185
 < 			fd = open(filename,O_RDWR|O_TRUNC|O_CREAT,0644);
 ---
 > 			struct stat st;
 > 
 > 			fd = open(filename,O_RDWR|O_CREAT,0644);
 186,191c190,193
 < 			for(l=0;l< fssize * sectorsize;l += l1) {
 < 				l1 = fssize * sectorsize;
 < 				if (BUFSIZ < l1)
 < 					l1 = BUFSIZ;
 < 				if (l1 != write(fd,buf,l1)) {
 < 					perror(filename);
 ---
 > 			fstat(fd, &st);
 > 			if (st.st_size != fssize * sectorsize) {
 > 				if (skipnewfs) {
 > 					fprintf(stderr, "Filesize does not match filesystem sector count\n");
 193a196,205
 > 				ftruncate(fd, fssize * sectorsize);
 > 				for(l=0;l< fssize * sectorsize;l += l1) {
 > 					l1 = fssize * sectorsize;
 > 					if (BUFSIZ < l1)
 > 						l1 = BUFSIZ;
 > 					if (l1 != write(fd,buf,l1)) {
 > 						perror(filename);
 > 						exit(12);
 > 					}
 > 				}
 218a231,232
 > 	if (skipnewfs == 0) { 	/* didn't re-indent context so submitted cvs diff would be more readable */
 > 
 701a716,718
 > 
 > 	}	/* endif skipnewfs */
 > 
 Index: newfs.c
 ===================================================================
 RCS file: /src/FreeBSD-CVS/ncvs/src/sbin/newfs/newfs.c,v
 retrieving revision 1.18
 diff -r1.18 newfs.c
 195a196
 > int	skipnewfs;
 237c238
 < 	    "NF:T:a:b:c:d:e:f:i:m:o:s:" :
 ---
 > 	    "NF:U:T:a:b:c:d:e:f:i:m:o:s:" :
 255a257,259
 > 		case 'U':
 > 			skipnewfs = 1;
 > 			/* fall through */
 Index: mfs_vfsops.c
 ===================================================================
 RCS file: /src/FreeBSD-CVS/ncvs/src/sys/ufs/mfs/mfs_vfsops.c,v
 retrieving revision 1.41
 diff -r1.41 mfs_vfsops.c
 48a49,50
 > #include <sys/sysproto.h>	/* for msync_args */
 > #include <sys/mman.h>	/* for msync_args */
 432a435,441
 > 	/*
 > 	 * must mark the calling process as a system process
 > 	 * so the pager doesn't try to kill it.  Doh!  And the
 > 	 * pager may because the resident set size may be huge.
 > 	 */
 > 	p->p_flag |= P_SYSTEM;
 > 
 471c480,481
 < 		 * EINTR/ERESTART.
 ---
 > 		 * EINTR/ERESTART.  It will return EWOULDBLOCK if the timer
 > 		 * expired.
 482,483c492,495
 < 		}
 < 		else if (tsleep((caddr_t)vp, mfs_pri, "mfsidl", 0))
 ---
 > 		} else {
 > 		    int r = tsleep((caddr_t)vp, mfs_pri, "mfsidl", hz * 10);
 > 
 > 		    if (r && r != EWOULDBLOCK)
 484a497,522
 > 		}
 > 
 > 		/*
 > 		 * we should call msync on the backing store every 30 seconds,
 > 		 * otherwise the pages are not associated with the file and guess
 > 		 * what!  the syncer never sees them.  msync has no effect 
 > 		 * if the backing store is swap, but a big effect if it's a file
 > 		 * (e.g. an NFS mounted file).
 > 		 */
 > 		{
 > 			static long lsec;
 > 			int dt = time_second - lsec;
 > 
 > 			if (dt < -30 || dt > 30) {
 > 				struct msync_args uap;
 > 
 > 				lsec = time_second;
 > 
 > 				uap.addr = mfsp->mfs_baseoff;
 > 				uap.len = mfsp->mfs_size;
 > 				uap.flags = MS_ASYNC;
 > 
 > 				msync(curproc, &uap);
 > 			}
 > 		}
 > 
 

From: Peter Wemm <peter@netplex.com.au>
To: Matthew Dillon <dillon@backplane.com>
Cc: freebsd-gnats-submit@hub.FreeBSD.ORG
Subject: Re: kern/6212: Two bugs with MFS filesystem fixed, two features added 
Date: Tue, 07 Apr 1998 13:46:25 +0800

 Matthew Dillon wrote:
 [..]
 >  
 >      It's a fantastic environment, and MFS is extremely stable where as trying
 >      to use vnconfig with an NFS-backed file leads to massive system corruption
 >      and crashes even under FreeBSD-3.0-current.  Since MFS disassociates I/O
 >      with a separate kernel process, it can deal with NFS based backing store
 >      without screwing up the machine.
 
 Hmm!  Now this is interesting, I'd never thought of trying that before.
 
 Although, I'm puzzled why the msync() is needed at all, since it's just a
 mmap'ed file.  Perhaps there is some new lurking problem with synchronizing
 of mmap'ed files... :-/
 
 >  						-Matt
 
 Can you please do us a favour and supply a context or unified diff?
 ie: 'cvs diff -u'  Your patch could then be automatically applied rather
 than having to guess the context by hand.
 
 >  Index: mkfs.c
 >  ===================================================================
 >  RCS file: /src/FreeBSD-CVS/ncvs/src/sbin/newfs/mkfs.c,v
 >  retrieving revision 1.21
 >  diff -r1.21 mkfs.c
 >  42a43
 >  > #include <sys/stat.h>
 >  108a110
 >  > extern int	skipnewfs;
 >  181c183,185
 >  < 			fd = open(filename,O_RDWR|O_TRUNC|O_CREAT,0644);
 >  ---
 >  > 			struct stat st;
 >  > 
 >  > 			fd = open(filename,O_RDWR|O_CREAT,0644);
 
 Cheers,
 -Peter
 --
 Peter Wemm <peter@netplex.com.au>   Netplex Consulting
 
 

From: Matthew Dillon <dillon@backplane.com>
To: Peter Wemm <peter@netplex.com.au>
Cc: freebsd-gnats-submit@hub.FreeBSD.ORG
Subject: Re: kern/6212: Two bugs with MFS filesystem fixed, two features added 
Date: Tue, 7 Apr 1998 00:58:42 -0700 (PDT)

 :Hmm!  Now this is interesting, I'd never thought of trying that before.
 :
 :Although, I'm puzzled why the msync() is needed at all, since it's just a
 :mmap'ed file.  Perhaps there is some new lurking problem with synchronizing
 :of mmap'ed files... :-/
 
     Yah, this is what happens:  I have a workstation with 64MB of ram, 
     *no swap* configured, and several file-over-NFS-backed MFS 
     filesystems.
 
     Without that msync() hack, the RSS for the mfs processes continues to
     build until the machine runs out of memory.  If I 'sync' I'm ok ... it
     writes out the dirty pages.  If I don't, the machine barfs when it
     runs out of memory.   When I added the msync hack to the mfs kernel code,
     that also appeared to fix the problem.  If I don't sync and don't msync,
     the MFS filesystem's pages stay dirty and are never synchronized with
     their backing store.
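     A userland sketch of the call the MFS process needs, assuming POSIX
     mmap()/msync(): dirty a MAP_SHARED file mapping, then ask for
     writeback with MS_ASYNC, which schedules the I/O without waiting for
     it.  dirty_and_flush() is a hypothetical helper.  It does not
     reproduce the 1998 syncer behaviour; on current systems a read()
     would likely see the data even without the msync().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a `len`-byte file, dirty it through a MAP_SHARED mapping, and
 * call msync(MS_ASYNC) to schedule writeback of the dirty pages.
 * Returns 0 on success, -1 on error.
 */
static int dirty_and_flush(const char *path, size_t len, const char *msg)
{
	char *p;
	int fd, rc = -1;

	assert(strlen(msg) <= len);
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) < 0)
		goto out;
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		goto out;
	memcpy(p, msg, strlen(msg));	/* dirty the first page */
	rc = msync(p, len, MS_ASYNC);	/* schedule writeback, don't wait */
	munmap(p, len);
out:
	close(fd);
	return rc;
}
```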
 
 :Can you please do us a favour and supply a context or unified diff?
 :ie: 'cvs diff -u'  Your patch could then be automatically applied rather
 :than having to guess the context by hand.
 
     Ah! cvs diff -u.  ok.  should I bother resubmitting this bug report or
     can I just start doing that in the future ?
 
 					-Matt
 
 :>  Index: mkfs.c
 :>  ===================================================================
 :>  RCS file: /src/FreeBSD-CVS/ncvs/src/sbin/newfs/mkfs.c,v
 :>  retrieving revision 1.21
 :>  diff -r1.21 mkfs.c
 :>  42a43
 :>  > #include <sys/stat.h>
 :>  108a110
 :>  > extern int	skipnewfs;
 :>  181c183,185
 :>  < 			fd = open(filename,O_RDWR|O_TRUNC|O_CREAT,0644);
 :>  ---
 :>  > 			struct stat st;
 :>  > 
 :>  > 			fd = open(filename,O_RDWR|O_CREAT,0644);
 :
 :Cheers,
 :-Peter
 :--
 :Peter Wemm <peter@netplex.com.au>   Netplex Consulting
 :
 :
 :
 
     Matthew Dillon   Engineering, BEST Internet Communications, Inc.
 		     <dillon@backplane.com>
     [always include a portion of the original email in any response!]

From: Peter Wemm <peter@netplex.com.au>
To: dg@root.com
Cc: dyson@freebsd.org, freebsd-gnats-submit@freebsd.org
Subject: Re: kern/6212: Two bugs with MFS filesystem fixed, two features added 
Date: Tue, 07 Apr 1998 20:56:13 +0800

 David Greenman wrote:
 > > Although, I'm puzzled why the msync() is needed at all, since it's just a
 > > mmap'ed file.  Perhaps there is some new lurking problem with synchronizing
 > > of mmap'ed files... :-/
 > 
 >    We've gone both ways with this issue in the past, but the final resolution
 > was that it is better to require the msync() rather than automatically
 > flushing VM system managed pages to the backing store.
 
 Hmm..  Matt points out that a 'sync' command will cause the writeback, the
 dirty pages will get written to the file providing the mmap() backing
 store. The pre-softdep code ran a sync every 30 seconds in process 3
 (update) so this was being "handled".
 
 However, the syncer process doesn't seem to do this any more..  It looks
 like the syncer is not doing what the old 30-second sync did.
 
 I suspect that this is a result of the differences between the FreeBSD vs.
 the 4.4Lite2 derived systems that Kirk wrote the code on.
 
 Hmm.. the old update() called 'vfs_msync(mp)' for each mountpoint every 30
 seconds, and this does a walkthrough of the vnodes with OBJ_MIGHTBEDIRTY
 and does a vm_object_page_clean() on them.  This is no longer happening
 after softdep, and probably explains the failure to write out dirty mmap
 regions. vfs_msync() (called at sync(2), unmount(2) and from the old update 
 process) is documented as:
 /*
  * perform msync on all vnodes under a mount point
  * the mount point must be locked.
  */
 
 I think the loss of this call is the culprit.
 
 Hmm.  The syncer calls VOP_FSYNC(), which schedules a vm_object_page_clean() on 
 all vnodes that it knows about.  We could get an aged, graduated writeback by 
 tagging all "dirty" mmap vnodes as needing "work to be done", and the syncer will 
 take care of it for us.
 
 Or, perhaps something like this (UNTESTED!!):
 Index: vfs_subr.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/kern/vfs_subr.c,v
 retrieving revision 1.148
 diff -u -r1.148 vfs_subr.c
 --- vfs_subr.c	1998/03/30 09:51:08	1.148
 +++ vfs_subr.c	1998/04/07 12:46:55
 @@ -2742,6 +2740,7 @@
  		return (0);
  	asyncflag = mp->mnt_flag & MNT_ASYNC;
  	mp->mnt_flag &= ~MNT_ASYNC;
 +	vfs_msync(mp, MNT_NOWAIT);
  	VFS_SYNC(mp, MNT_LAZY, ap->a_cred, p);
  	if (asyncflag)
  		mp->mnt_flag |= MNT_ASYNC;
 
 This basically makes the periodic pseudo-sync do an implied msync like it
 did before the softdep changes.  This is a little more coarse though, 
 it'll cause the writeback to be scheduled in periodic chunks rather than 
 the smooth flow that the syncer tries to achieve.
 
 Cheers,
 -Peter
 --
 Peter Wemm <peter@netplex.com.au>   Netplex Consulting
 
 

From: Matthew Dillon <dillon@backplane.com>
To: David Greenman <dg@root.com>
Cc: freebsd-bugs@hub.freebsd.org, freebsd-gnats-submit@hub.freebsd.org
Subject: Re: kern/6212: Two bugs with MFS filesystem fixed, two features added 
Date: Tue, 7 Apr 1998 11:54:19 -0700 (PDT)

 :
 :   ...only if you want us to commit the changes. :-)
 :
 :-DG
 :
 :David Greenman
 :Core-team/Principal Architect, The FreeBSD Project
 
     heh heh.
 
 
     Matthew Dillon   Engineering, BEST Internet Communications, Inc.
 		     <dillon@backplane.com>
     [always include a portion of the original email in any response!]
 
 
 Index: mkfs.c
 ===================================================================
 RCS file: /src/FreeBSD-CVS/ncvs/src/sbin/newfs/mkfs.c,v
 retrieving revision 1.21
 diff -u -r1.21 mkfs.c
 --- mkfs.c	1998/01/19 16:55:26	1.21
 +++ mkfs.c	1998/04/07 04:06:28
 @@ -40,6 +40,7 @@
  #include <sys/time.h>
  #include <sys/wait.h>
  #include <sys/resource.h>
 +#include <sys/stat.h>
  #include <ufs/ufs/dinode.h>
  #include <ufs/ufs/dir.h>
  #include <ufs/ffs/fs.h>
 @@ -106,6 +107,7 @@
  extern caddr_t	malloc(), calloc();
  #endif
  extern char *	filename;
 +extern int	skipnewfs;
  
  union {
  	struct fs fs;
 @@ -178,19 +180,29 @@
  		if(filename) {
  			unsigned char buf[BUFSIZ];
  			unsigned long l,l1;
 -			fd = open(filename,O_RDWR|O_TRUNC|O_CREAT,0644);
 +			struct stat st;
 +
 +			fd = open(filename,O_RDWR|O_CREAT,0644);
  			if(fd < 0) {
  				perror(filename);
  				exit(12);
  			}
 -			for(l=0;l< fssize * sectorsize;l += l1) {
 -				l1 = fssize * sectorsize;
 -				if (BUFSIZ < l1)
 -					l1 = BUFSIZ;
 -				if (l1 != write(fd,buf,l1)) {
 -					perror(filename);
 +			fstat(fd, &st);
 +			if (st.st_size != fssize * sectorsize) {
 +				if (skipnewfs) {
 +					fprintf(stderr, "Filesize does not match filesystem sector count\n");
  					exit(12);
  				}
 +				ftruncate(fd, fssize * sectorsize);
 +				for(l=0;l< fssize * sectorsize;l += l1) {
 +					l1 = fssize * sectorsize;
 +					if (BUFSIZ < l1)
 +						l1 = BUFSIZ;
 +					if (l1 != write(fd,buf,l1)) {
 +						perror(filename);
 +						exit(12);
 +					}
 +				}
  			}
  			membase = mmap(
  				0,
 @@ -216,6 +228,8 @@
  			}
  		}
  	}
 +	if (skipnewfs == 0) { 	/* didn't re-indent context so submitted cvs diff would be more readable */
 +
  	fsi = fi;
  	fso = fo;
  	if (Oflag) {
 @@ -699,6 +713,9 @@
  	pp->p_fsize = sblock.fs_fsize;
  	pp->p_frag = sblock.fs_frag;
  	pp->p_cpg = sblock.fs_cpg;
 +
 +	}	/* endif skipnewfs */
 +
  	/*
  	 * Notify parent process of success.
  	 * Dissociate from session and tty.
 Index: newfs.c
 ===================================================================
 RCS file: /src/FreeBSD-CVS/ncvs/src/sbin/newfs/newfs.c,v
 retrieving revision 1.18
 diff -u -r1.18 newfs.c
 --- newfs.c	1998/01/16 06:31:23	1.18
 +++ newfs.c	1998/04/07 04:02:25
 @@ -193,6 +193,7 @@
  u_long	memleft;		/* virtual memory available */
  caddr_t	membase;		/* start address of memory based filesystem */
  char	*filename;
 +int	skipnewfs;
  #ifdef COMPAT
  char	*disktype;
  int	unlabeled;
 @@ -234,7 +235,7 @@
  	}
  
  	opstring = mfs ?
 -	    "NF:T:a:b:c:d:e:f:i:m:o:s:" :
 +	    "NF:U:T:a:b:c:d:e:f:i:m:o:s:" :
  	    "NOS:T:a:b:c:d:e:f:i:k:l:m:n:o:p:r:s:t:u:x:";
  	while ((ch = getopt(argc, argv, opstring)) != -1)
  		switch (ch) {
 @@ -253,6 +254,9 @@
  			disktype = optarg;
  			break;
  #endif
 +		case 'U':
 +			skipnewfs = 1;
 +			/* fall through */
  		case 'F':
  			filename = optarg;
  			break;
 Index: mfs_vfsops.c
 ===================================================================
 RCS file: /src/FreeBSD-CVS/ncvs/src/sys/ufs/mfs/mfs_vfsops.c,v
 retrieving revision 1.41
 diff -u -r1.41 mfs_vfsops.c
 --- mfs_vfsops.c	1998/03/01 22:46:53	1.41
 +++ mfs_vfsops.c	1998/04/05 02:21:16
 @@ -46,6 +46,8 @@
  #include <sys/signalvar.h>
  #include <sys/vnode.h>
  #include <sys/malloc.h>
 +#include <sys/sysproto.h>	/* for msync_args */
 +#include <sys/mman.h>	/* for msync_args */
  
  #include <ufs/ufs/quota.h>
  #include <ufs/ufs/inode.h>
 @@ -430,6 +432,13 @@
  error_1:	/* no state to back out*/
  
  success:
 +	/*
 +	 * must mark the calling process as a system process
 +	 * so the pager doesn't try to kill it.  Doh!  And the
 +	 * pager may because the resident set size may be huge.
 +	 */
 +	p->p_flag |= P_SYSTEM;
 +
  	return( err);
  }
  
 @@ -468,7 +477,8 @@
  		 * If a non-ignored signal is received, try to unmount.
  		 * If that fails, clear the signal (it has been "processed"),
  		 * otherwise we will loop here, as tsleep will always return
 -		 * EINTR/ERESTART.
 +		 * EINTR/ERESTART.  It will return EWOULDBLOCK if the timer
 +		 * expired.
  		 */
  		/*
  		 * Note that dounmount() may fail if work was queued after
 @@ -479,9 +489,37 @@
  			gotsig = 0;
  			if (dounmount(mp, 0, p) != 0)
  				CLRSIG(p, CURSIG(p));	/* try sleep again.. */
 -		}
 -		else if (tsleep((caddr_t)vp, mfs_pri, "mfsidl", 0))
 +		} else {
 +		    int r = tsleep((caddr_t)vp, mfs_pri, "mfsidl", hz * 10);
 +
 +		    if (r && r != EWOULDBLOCK)
  			gotsig++;	/* try to unmount in next pass */
 +		}
 +
 +		/*
 +		 * we should call msync on the backing store every 30 seconds,
 +		 * otherwise the pages are not associated with the file and guess
 +		 * what!  the syncer never sees them.  msync has no effect 
 +		 * if the backing store is swap, but a big effect if it's a file
 +		 * (e.g. an NFS mounted file).
 +		 */
 +		{
 +			static long lsec;
 +			int dt = time_second - lsec;
 +
 +			if (dt < -30 || dt > 30) {
 +				struct msync_args uap;
 +
 +				lsec = time_second;
 +
 +				uap.addr = mfsp->mfs_baseoff;
 +				uap.len = mfsp->mfs_size;
 +				uap.flags = MS_ASYNC;
 +
 +				msync(curproc, &uap);
 +			}
 +		}
 +
  	}
  	return (0);
  }

From: David Greenman <dg@root.com>
To: Peter Wemm <peter@netplex.com.au>
Cc: dyson@freebsd.org, freebsd-gnats-submit@freebsd.org
Subject: Re: kern/6212: Two bugs with MFS filesystem fixed, two features added 
Date: Tue, 07 Apr 1998 20:25:39 -0700

 >David Greenman wrote:
 >> > Although, I'm puzzled why the msync() is needed at all, since it's just a
 >> > mmap'ed file.  Perhaps there is some new lurking problem with synchronizing
 >> > of mmap'ed files... :-/
 >> 
 >>    We've gone both ways with this issue in the past, but the final resolution
 >> was that it is better to require the msync() rather than automatically
 >> flushing VM system managed pages to the backing store.
 >
 >Hmm..  Matt points out that a 'sync' command will cause the writeback, the
 >dirty pages will get written to the file providing the mmap() backing
 >store. The pre-softdep code ran a sync every 30 seconds in process 3
 >(update) so this was being "handled".
 >
 >However, the syncer process doesn't seem to do this any more..  It looks
 >like the syncer is not doing what the old 30-second sync did.
 >
 >I suspect that this is a result of the differences between the FreeBSD vs.
 >the 4.4Lite2 derived systems that Kirk wrote the code on.
 >
 >Hmm.. the old update() called 'vfs_msync(mp)' for each mountpoint every 30
 >seconds, and this does a walkthrough of the vnodes with OBJ_MIGHTBEDIRTY
 >and does a vm_object_page_clean() on them.  This is no longer happening
 >after softdep, and probably explains the failure to write out dirty mmap
 >regions. vfs_msync() (called at sync(2), unmount(2) and from the old update 
 >process) is documented as:
 >/*
 > * perform msync on all vnodes under a mount point
 > * the mount point must be locked.
 > */
 >
 >I think the loss of this call is the culprit.
 
    I was confused by the behavior at unmap time - we used to sync out dirty
 VM objects then as well, but people using INN (I think) found that this does
 bad things to performance. I don't know how this should be handled in the
 presence of softupdates, but the non-softupdates case should do the periodic
 sync, I think.
 
 -DG
 
 David Greenman
 Core-team/Principal Architect, The FreeBSD Project

From: Peter Wemm <peter@netplex.com.au>
To: dg@root.com
Cc: Peter Wemm <peter@netplex.com.au>, dyson@freebsd.org,
        freebsd-gnats-submit@freebsd.org
Subject: Re: kern/6212: Two bugs with MFS filesystem fixed, two features added 
Date: Wed, 08 Apr 1998 11:44:09 +0800

 David Greenman wrote:
 > >David Greenman wrote:
 > >> > Although, I'm puzzled why the msync() is needed at all, since it's just a
 > >> > mmap'ed file.  Perhaps there is some new lurking problem with synchronizing
 > >> > of mmap'ed files... :-/
 [..]
 > >Hmm.. the old update() called 'vfs_msync(mp)' for each mountpoint every 30
 > >seconds, and this does a walkthrough of the vnodes with OBJ_MIGHTBEDIRTY
 > >and does a vm_object_page_clean() on them.  This is no longer happening
 > >after softdep, and probably explains the failure to write out dirty mmap
 > >regions. vfs_msync() (called at sync(2), unmount(2) and from the old update 
 > >process) is documented as:
 > >/*
 > > * perform msync on all vnodes under a mount point
 > > * the mount point must be locked.
 > > */
 > >
 > >I think the loss of this call is the culprit.
 > 
 >    I was confused by the behavior at unmap time - we used to sync out dirty
 > VM objects then as well, but people using INN (I think) found that this does
 > bad things to performance. I don't know how this should be handled in the
 > presence of softupdates, but the non-softupdates case should do the periodic
 > sync, I think.
 
 I don't think the softupdates have much effect on that, do they?  
 softupdates is supposed to be about fs metadata, not data pages - we can 
 write the dirty data pages any time without too much concern for 
 softupdates.
 
 > -DG
 
 Cheers,
 -Peter
 --
 Peter Wemm <peter@netplex.com.au>   Netplex Consulting
 
 
Responsible-Changed-From-To: freebsd-bugs->dillon 
Responsible-Changed-By: johan 
Responsible-Changed-When: Thu Aug 10 23:42:26 PDT 2000 
Responsible-Changed-Why:  
Let Matt handle his own PRs. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=6212 
State-Changed-From-To: open->closed 
State-Changed-By: dillon 
State-Changed-When: Mon Mar 26 16:54:42 PST 2001 
State-Changed-Why:  
scrapped / obsolete 

http://www.freebsd.org/cgi/query-pr.cgi?pr=6212 
>Unformatted:
