From nobody@FreeBSD.org  Wed May  9 10:11:56 2001
Return-Path: <nobody@FreeBSD.org>
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by hub.freebsd.org (Postfix) with ESMTP id 5F22837B422
	for <freebsd-gnats-submit@FreeBSD.org>; Wed,  9 May 2001 10:11:55 -0700 (PDT)
	(envelope-from nobody@FreeBSD.org)
Received: (from nobody@localhost)
	by freefall.freebsd.org (8.11.1/8.11.1) id f49HBtp13241;
	Wed, 9 May 2001 10:11:55 -0700 (PDT)
	(envelope-from nobody)
Message-Id: <200105091711.f49HBtp13241@freefall.freebsd.org>
Date: Wed, 9 May 2001 10:11:55 -0700 (PDT)
From: conrad@th.physik.uni-bonn.de
To: freebsd-gnats-submit@FreeBSD.org
Subject: On NFSv3 mounted filesystems, stat returns st_blksize=512
X-Send-Pr-Version: www-1.0

>Number:         27232
>Category:       kern
>Synopsis:       [nfs] On NFSv3 mounted filesystems, stat returns st_blksize=512
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed May 09 10:20:01 PDT 2001
>Closed-Date:    
>Last-Modified:  Mon Mar 19 11:28:37 GMT 2007
>Originator:     Jan Conrad
>Release:        FreeBSD 4.3R
>Organization:
Univ. Bonn, Germany
>Environment:
FreeBSD merlin.th.physik.uni-bonn.de 4.3-RELEASE FreeBSD 4.3-RELEASE #0: Mon May  7 14:08:48 CEST 2001     conrad@merlin.th.physik.uni-bonn.de:/freebsd/misc/src/sys/compile/DEBUG  i386

>Description:
On NFSv3 mounts, stat returns st_blksize=512 for every regular file.
This in turn is used by libc routines as a default buffer size, as it
should be the 'optimal' io blocksize.

However, this leads to a drastic performance decrease. For example,
saving a 3 MB message to a mailbox (with pine) takes over half a minute
with 512-byte writes, whereas it takes only a second or so with a 16 kB
buffer.
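
To put rough numbers on the difference, the following C sketch counts how
many buffer flushes a fully buffered stdio stream would need for a 3 MB
file at each buffer size. The helper name writes_needed is made up for
illustration, and the count ignores everything except the buffer size.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of buffer flushes (one write(2) each)
 * needed to push `total` bytes through a buffer of `bufsize` bytes.
 * Rounds up so that a final partial buffer is counted.
 */
static size_t
writes_needed(size_t total, size_t bufsize)
{
	assert(bufsize > 0);
	return (total + bufsize - 1) / bufsize;
}
```

For 3145728 bytes, writes_needed(3145728, 512) gives 6144 and
writes_needed(3145728, 16384) gives 192, i.e. 32 times fewer calls. How
many of those calls turn into separate NFS requests depends on the
client's caching, so the ratio is only an indication.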

>How-To-Repeat:
See above.

>Fix:
The whole thing can be traced back to the NFS code in the kernel.
The function nfs_loadattrcache of sys/nfs/nfs_subs.c makes the
assignment

		vap->va_blocksize = NFS_FABLKSIZE;

for NFSv3, where NFS_FABLKSIZE is 512.

In my opinion the assignment should be something like

		vap->va_blocksize = vp->v_mount->mnt_stat.f_iosize;

i.e. va_blocksize should be assigned the 'optimal' iosize for the
mounted file system. I think that this is the maximum of the read
and write blocksize of the nfs-mount (search for nfs_iosize in
nfs_vfsops.c).

This should solve the problem, but I am no kernel hacker :-)
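
As a userland illustration of the idea only (not kernel code), the sketch
below models the proposed choice: report the larger of the mount's read
and write sizes instead of a fixed 512. The struct nfs_mount_model and
the function model_blocksize are hypothetical names, and the fallback to
512 is an assumption of this sketch; the real logic is in nfs_iosize in
nfs_vfsops.c and may apply additional rounding.

```c
#include <assert.h>
#include <stddef.h>

#define NFS_FABLKSIZE 512	/* value currently reported for NFSv3 */

/* Hypothetical stand-in for the per-mount read/write transfer sizes. */
struct nfs_mount_model {
	int rsize;	/* read transfer size in bytes */
	int wsize;	/* write transfer size in bytes */
};

/*
 * Model of the proposed va_blocksize: the larger of rsize and wsize.
 * Assumption: never report less than NFS_FABLKSIZE.
 */
static int
model_blocksize(const struct nfs_mount_model *m)
{
	int iosize;

	assert(m != NULL);
	iosize = m->rsize > m->wsize ? m->rsize : m->wsize;
	return iosize > NFS_FABLKSIZE ? iosize : NFS_FABLKSIZE;
}
```

With rsize=8192 and wsize=16384 the model reports 16384, which is the
buffer size stdio would then pick up through st_blksize.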
>Release-Note:
>Audit-Trail:

From: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
To: conrad@th.physik.uni-bonn.de
Cc: freebsd-gnats-submit@FreeBSD.ORG
Subject: kern/27232: On NFSv3 mounted filesystems, stat returns st_blksize=512
Date: Wed, 9 May 2001 15:38:20 -0400 (EDT)

 <<On Wed, 9 May 2001 10:11:55 -0700 (PDT), conrad@th.physik.uni-bonn.de said:
 
 > On NFSv3 mounts, stat returns st_blksize=512 for every regular file.
 > This in turn is used by libc routines as a default buffer size, as it
 > should be the 'optimal' io blocksize.
 
 No.  It should be the block size used by the underlying filesystem's
 block allocator, and in which the file's `st_blocks' size-on-disk is
 reported.  While SUS describes it as a ``preferred'' block size, and
 the FreeBSD manual pages describe it as ``optimal ... for I/O'', the
 most important meaning of this field is as a multiplier of st_blocks
 to determine the file's size.
 
 -GAWollman
 
 --
 Garrett A. Wollman   | O Siem / We are all family / O Siem / We're all the same
 wollman@lcs.mit.edu  | O Siem / The fires of freedom 
 Opinions not those of| Dance in the burning flame
 MIT, LCS, CRS, or NSA|                     - Susan Aglukark and Chad Irschick

From: Bruce Evans <bde@zeta.org.au>
To: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
Cc: freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: kern/27232: On NFSv3 mounted filesystems, stat returns st_blksize=512
Date: Thu, 10 May 2001 18:24:04 +1000 (EST)

 On Wed, 9 May 2001, Garrett Wollman wrote:
 
 >  <<On Wed, 9 May 2001 10:11:55 -0700 (PDT), conrad@th.physik.uni-bonn.de said:
 >  
 >  > On NFSv3 mounts, stat returns st_blksize=512 for every regular file.
 >  > This in turn is used by libc routines as a default buffer size, as it
 >  > should be the 'optimal' io blocksize.
 >  
 >  No.  It should be the block size used by the underlying filesystem's
 >  block allocator,
 
 Correct.  Even if there is no underlying filesystem's block allocator,
 stat() must fake it, and should fake it as well as possible.  nfs seems
 to have regressed to always setting vap->va_blocksize to NFS_FABLKSIZE
 (512) in the v3 case (see nfs_subs.c).
 
 >  and in which the file's `st_blocks' size-on-disk is
 >  reported.
 
 No.  At least under FreeBSD, st_blocks is in units of blocks with size
 S_BLKSIZE (512).  It may count blocks for metadata, so it may be larger
 than the file size.
 
 >  While SUS describes it as a ``preferred'' block size, and
 >  the FreeBSD manual pages describe it as ``optimal ... for I/O'', the
 
 It is just the best available approximation to the optimal i/o size.
 If it is good enough for filesystem blocks, then it can't be very bad
 for userland i/o.
 
 >  most important meaning of this field is as a multiplier of st_blocks
 >  to determine the file's size.
 
 No.  The multiplier is 512.
 
 Bruce
 

From: Jan Conrad <conrad@th.physik.uni-bonn.de>
To: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
Cc: <freebsd-gnats-submit@FreeBSD.ORG>
Subject: Re: kern/27232: On NFSv3 mounted filesystems, stat returns st_blksize=512
Date: Thu, 10 May 2001 12:18:18 +0200 (CEST)

 On Wed, 9 May 2001, Garrett Wollman wrote:
 
 > <<On Wed, 9 May 2001 10:11:55 -0700 (PDT), conrad@th.physik.uni-bonn.de said:
 >
 > > On NFSv3 mounts, stat returns st_blksize=512 for every regular file.
 > > This in turn is used by libc routines as a default buffer size, as it
 > > should be the 'optimal' io blocksize.
 >
 > No.  It should be the block size used by the underlying filesystem's
 > block allocator, and in which the file's `st_blocks' size-on-disk is
 > reported.  While SUS describes it as a ``preferred'' block size, and
 > the FreeBSD manual pages describe it as ``optimal ... for I/O'', the
 > most important meaning of this field is as a multiplier of st_blocks
 > to determine the file's size.
 
 Hmm - I am sorry, but I can't believe your answer.
 
 If I stat the following file on /var/tmp (newfs'd with -b 8192 -f 1024)
 (all with FreeBSD 4.3R)
 
 -rw-r--r--  1 root  wheel  33398 Mar 13 17:40 /var/tmp/dev.out
 
 # my little stat checker (appended to the message) gives me
 ./stat
 /var/tmp/dev.out:
 st_mode = 100644
 st_blksize = 8192
 st_size = 33398
 st_blocks = 66
 
 As you can see, st_blocks measures the size of the file in 512-byte
 blocks, independent of st_blksize!
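
The numbers above can be checked with a little arithmetic. The sketch
below assumes the st_blocks unit is 512 bytes and models the allocation
as full blocks plus a tail rounded up to whole fragments. The function
names are made up for this example, and metadata blocks are ignored.

```c
#include <assert.h>

#define S_BLKSIZE_MODEL 512	/* assumed unit of st_blocks */

/* Bytes accounted for by an st_blocks count. */
static long
blocks_to_bytes(long st_blocks)
{
	return st_blocks * S_BLKSIZE_MODEL;
}

/*
 * Simplified model: space for a file of `size` bytes stored as full
 * blocks of `bsize` plus a tail rounded up to `fsize` fragments.
 */
static long
model_alloc_bytes(long size, long bsize, long fsize)
{
	long full, tail;

	assert(bsize > 0 && fsize > 0);
	full = (size / bsize) * bsize;
	tail = size - full;
	return full + ((tail + fsize - 1) / fsize) * fsize;
}
```

For the file shown, blocks_to_bytes(66) is 33792, and
model_alloc_bytes(33398, 8192, 1024) is also 33792: four full 8192-byte
blocks plus one 1024-byte fragment. Multiplying 66 by st_blksize (8192)
would give 540672, far more than the file occupies.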
 
 And the source of this is ufs_getattr in sys/ufs/ufs/ufs_vnops.c
 (this is present in HEAD!)
 
 	vap->va_flags = ip->i_flags;
 	vap->va_gen = ip->i_gen;
 	vap->va_blocksize = vp->v_mount->mnt_stat.f_iosize;
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 	vap->va_bytes = dbtob((u_quad_t)ip->i_blocks);
 	vap->va_type = IFTOVT(ip->i_mode);
 
 So either ufs or nfs is wrong (or both!)
 
 
 -Jan
 
 
 
 The output is from the following little program
 
 
 #include <sys/types.h>
 #include <sys/stat.h>
 
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
 
 #define FILE1 "/milles/home/conrad/src/stat/stat.c"
 #define FILE2 "/.amd_mnt/avz109/users/Math_Dictionary/readme.htm"
 #define FILE3 "/var/tmp/dev.out"
 
 
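 int main(void)
 {
   struct stat sb;
   const char *file = FILE3;
 
   if (stat(file, &sb) < 0) {
     perror (file);
     return EXIT_FAILURE;
   }
   /* Cast explicitly: the widths of these types vary by platform. */
   printf ("%s:\n", file);
   printf ("st_mode = %o\n", (unsigned int)sb.st_mode);
   printf ("st_blksize = %lu\n", (unsigned long)sb.st_blksize);
   printf ("st_size = %lld\n", (long long)sb.st_size);
   printf ("st_blocks = %lld\n", (long long)sb.st_blocks);
   return EXIT_SUCCESS;
 }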
 
 
 
 
 >
 > -GAWollman
 >
 > --
 > Garrett A. Wollman   | O Siem / We are all family / O Siem / We're all the same
 > wollman@lcs.mit.edu  | O Siem / The fires of freedom
 > Opinions not those of| Dance in the burning flame
 > MIT, LCS, CRS, or NSA|                     - Susan Aglukark and Chad Irschick
 >
 
 -- 
 Physikalisches Institut der Universitaet Bonn
 Nussallee 12
 D-53115 Bonn
 GERMANY
 
 
 

From: Jan Conrad <conrad@th.physik.uni-bonn.de>
To: Bruce Evans <bde@zeta.org.au>
Cc: <freebsd-gnats-submit@FreeBSD.ORG>,
	<wollman@khavrinen.lcs.mit.edu>
Subject: Re: kern/27232: On NFSv3 mounted filesystems, stat returns st_blksize=512
Date: Thu, 10 May 2001 12:46:40 +0200 (CEST)

 On Thu, 10 May 2001, Bruce Evans wrote:
 
 > From: Bruce Evans <bde@zeta.org.au>
 > To: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
 > Cc: freebsd-gnats-submit@FreeBSD.ORG
 > Subject: Re: kern/27232: On NFSv3 mounted filesystems, stat returns st_blksize=512
 > Date: Thu, 10 May 2001 18:24:04 +1000 (EST)
 >
 >  On Wed, 9 May 2001, Garrett Wollman wrote:
 >
 >  >  <<On Wed, 9 May 2001 10:11:55 -0700 (PDT), conrad@th.physik.uni-bonn.de said:
 >  >
 >  >  > On NFSv3 mounts, stat returns st_blksize=512 for every regular file.
 >  >  > This in turn is used by libc routines as a default buffer size, as it
 >  >  > should be the 'optimal' io blocksize.
 >  >
 >  >  No.  It should be the block size used by the underlying filesystem's
 >  >  block allocator,
 >
 >  Correct.  Even if there is no underlying filesystem's block allocator,
 >  stat() must fake it, and should fake it as well as possible.  nfs seems
 >  to have regressed to always setting vap->va_blocksize to NFS_FABLKSIZE
 >  (512) in the v3 case (see nfs_subs.c).
 
 
 My question is: Why not set this to mnt_stat.f_iosize of the mount point?
 (As ufs does it?)
 
 >
 >  >  and in which the file's `st_blocks' size-on-disk is
 >  >  reported.
 >
 >  No.  At least under FreeBSD, st_blocks is in units of blocks with size
 >  S_BLKSIZE (512).  It may count blocks for metadata, so it may be larger
 >  than the file size.
 >
 >  >  While SUS describes it as a ``preferred'' block size, and
 >  >  the FreeBSD manual pages describe it as ``optimal ... for I/O'', the
 >
 >  It is just the best available approximation to the optimal i/o size.
 >  If it is good enough for filesystem blocks, then it can't be very bad
 >  for userland i/o.
 
 It is *very* bad for userland io. Unfortunately we have only very limited
 space here so some poor guys have to sit next to our file server!
 
 They can tell by the sound when somebody is saving a large file by stdio
 fwrites!!!
 
 It takes *MORE* than ten times as long as with a larger st_blksize!
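
Until the kernel reports a larger value, an application can sidestep the
default by giving stdio its own buffer with setvbuf(3). The sketch below
is one way to do that. The function name save_with_big_buffer and the
16 kB size are assumptions chosen to match the figure in the report;
this is not something pine is known to do.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define BIG_BUFSIZE 16384	/* assumed buffer size, as in the report */

/*
 * Write `len` bytes from `data` to `path` through a fully buffered
 * stream with a BIG_BUFSIZE buffer. Returns 0 on success, -1 on error.
 */
static int
save_with_big_buffer(const char *path, const void *data, size_t len)
{
	FILE *fp;
	char *buf;
	int rc = 0;

	assert(path != NULL && (data != NULL || len == 0));
	if ((fp = fopen(path, "w")) == NULL)
		return -1;
	buf = malloc(BIG_BUFSIZE);
	/* setvbuf must be called before any other operation on the stream. */
	if (buf == NULL || setvbuf(fp, buf, _IOFBF, BIG_BUFSIZE) != 0)
		rc = -1;
	if (rc == 0 && len > 0 && fwrite(data, 1, len, fp) != len)
		rc = -1;
	if (fclose(fp) != 0)
		rc = -1;
	free(buf);	/* only after fclose: the stream was using this buffer */
	return rc;
}
```

This only helps programs that can be changed or rebuilt; everything else
still gets the st_blksize default.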
 
 
 -Jan
 
 
 >
 >  >  most important meaning of this field is as a multiplier of st_blocks
 >  >  to determine the file's size.
 >
 >  No.  The multiplier is 512.
 >
 >  Bruce
 >
 >
 >
 
 -- 
 Physikalisches Institut der Universitaet Bonn
 Nussallee 12
 D-53115 Bonn
 GERMANY
 
 
 
Responsible-Changed-From-To: freebsd-bugs->cel 
Responsible-Changed-By: cel 
Responsible-Changed-When: Wed May 24 18:57:36 UTC 2006 
Responsible-Changed-Why:  
Same problem existed in Linux 2.4.  Will look into it. 


http://www.freebsd.org/cgi/query-pr.cgi?pr=27232 
Responsible-Changed-From-To: cel->free-bsd 
Responsible-Changed-By: cel 
Responsible-Changed-When: Mon Mar 12 15:20:04 UTC 2007 
Responsible-Changed-Why:  
Back to the public pool. 

Responsible-Changed-From-To: free-bsd->freebsd-bugs 
Responsible-Changed-By: ceri 
Responsible-Changed-When: Mon Mar 19 11:28:15 UTC 2007 
Responsible-Changed-Why:  
Correct responsible. 

>Unformatted:
