From nobody@FreeBSD.org  Tue Jul  6 08:03:37 2004
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A736916A4CE
	for <freebsd-gnats-submit@FreeBSD.org>; Tue,  6 Jul 2004 08:03:37 +0000 (GMT)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 9F2AE43D2D
	for <freebsd-gnats-submit@FreeBSD.org>; Tue,  6 Jul 2004 08:03:37 +0000 (GMT)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.12.11/8.12.11) with ESMTP id i6683bHS003125
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 6 Jul 2004 08:03:37 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.12.11/8.12.11/Submit) id i6683bM3003124;
	Tue, 6 Jul 2004 08:03:37 GMT
	(envelope-from nobody)
Message-Id: <200407060803.i6683bM3003124@www.freebsd.org>
Date: Tue, 6 Jul 2004 08:03:37 GMT
From: Csaba Banhalmi <banhalmi@field.hu>
To: freebsd-gnats-submit@FreeBSD.org
Subject: usb 2.0 mobil rack+ fat32 performance problem
X-Send-Pr-Version: www-2.3

>Number:         68719
>Category:       kern
>Synopsis:       [msdosfs] [patch] poor performance with msdosfs and USB 2.0 mobil rack
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    trhodes
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Jul 06 08:10:18 GMT 2004
>Closed-Date:    Sat Mar 01 23:46:51 UTC 2008
>Last-Modified:  Sat Mar 01 23:46:51 UTC 2008
>Originator:     Csaba Banhalmi
>Release:        4.10
>Organization:
>Environment:
FreeBSD Field.hu 4.10-RELEASE FreeBSD 4.10-RELEASE #5: Wed Jun 23 22:45:47 CEST 2004     banhalmi@Field.hu:/usr/obj/usr/src/sys/Field  i386
>Description:
I tried to copy files over USB 2.0 to my USB 2.0 mobile rack, but with a
FAT32 file system the performance was terrible: about 1MB/s.
FAT32 is important to me because of Windows compatibility.
I checked the performance with a UFS file system; it peaked at 10MB/s, which
is poor too. However, I still mainly need FAT32.

>How-To-Repeat:
Use a USB 2.0 mobile rack with a fast HDD and the ehci driver on 4.10-RELEASE or 5.2-RELEASE.
>Fix:
      
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-i386->freebsd-usb 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Thu Nov 4 07:30:07 GMT 2004 
Responsible-Changed-Why:  
Reassign to appropriate mailing list. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=68719 

From: Dominic Marks <dom@goodforbusiness.co.uk>
To: bug-followup@FreeBSD.org,
 banhalmi@field.hu,
 freebsd-fs@freebsd.org
Cc:  
Subject: Re: i386/68719: [usb] USB 2.0 mobil rack+ fat32 performance problem
Date: Fri, 27 May 2005 13:28:57 +0100

 (Posted to freebsd-fs as the PR is assigned to freebsd-usb@, but it seems to 
 be more related to the msdos filesystem than the USB system so perhaps it 
 should be reassigned?)
 
 I've been evaluating the performance of some usb2 hard discs with FreeBSD and 
 I found this PR (68719). The submitter is correct that performance with 
 msdosfs is severely limited.
 
 I tested a 'LaCie' USB2 disc:
 
 da0 at umass-sim0 bus 0 target 0 lun 0
 da0: <Maxtor 7Y250P0 YAR4> Fixed Direct Access SCSI-2 device 
 da0: 40.000MB/s transfers
 da0: 239372MB (490234752 512 byte sectors: 255H 63S/T 30515C)
 
 egg# diskinfo -t da0
  ...
 Seek times:
         Full stroke:      250 iter in   5.271879 sec =   21.088 msec
         Half stroke:      250 iter in   4.055049 sec =   16.220 msec
         Quarter stroke:   500 iter in   6.696545 sec =   13.393 msec
         Short forward:    400 iter in   2.316910 sec =    5.792 msec
         Short backward:   400 iter in   2.052681 sec =    5.132 msec
         Seq outer:       2048 iter in   1.574044 sec =    0.769 msec
         Seq inner:       2048 iter in   1.576574 sec =    0.770 msec
 Transfer rates:
         outside:       102400 kbytes in   3.445316 sec =    29722 kbytes/sec
         middle:        102400 kbytes in   3.441593 sec =    29754 kbytes/sec
         inside:        102400 kbytes in   3.435809 sec =    29804 kbytes/sec
 
 I used 10GB chunks of data to test the USB disc. Each test used a different 
 10GB of data to avoid caching distorting results. I made the following 
 measurements with both UFS2 and FAT32.
 
 1. Local disc copy from a new SATA-150 disc.
 2. Ftp copy over a local 100Mbit network from a server with a SATA-150 disc. 
 3. Create a zero-file using dd to test simple write performance.
 
 Client with attached USB disc: P4 2.6GHz 768MB DDR, if_fxp, 1x ATA-100 disc
 Server used for FTP: Celeron 2.4GHz 1.5GB DDR, if_em, 4x SATA-150 discs.
 
 Both the client and server are running FreeBSD 5.4-STABLE built at
 Thu May 26 22:52:15 BST 2005.
 
 In test 1 I could not achieve any better than 5.1MB/s on an msdosfs 
 filesystem. Using UFS2 and softupdates a transfer rate of 22~25MB/s was 
 possible. Both test data sets were copied from the system's ATA-100 disc. In 
 both tests at these peaks gstat reports the device is 100% busy.
 
 A snapshot from gstat(8) during test 1. da0s1 is the fat32 filesystem.
 
 dT: 0.501  flag_I 500000us  sizeof 240  i -1
  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
     2     90     90   3363    5.2      0      0    0.0   38.2| ad0
     2     90     90   3363    5.2      0      0    0.0   38.4| ad0s1
       ...
     2     90     90   3363    5.3      0      0    0.0   39.0| ad0s1g
    96   1295      2      8  163.9   1293   5170  141.6   99.8| da0
    96   1295      2      8  163.9   1293   5170  143.1   99.9| da0s1
 
 
 In test 2 again the msdosfs filesystem could not achieve higher than 5MB/s. 
 With UFS2 the limit of the network was reached before the limit of the USB2 
 bus so the transfer was limited to 10.5MB/s average. During this period gstat 
 reported about 35-45% activity on the device which matches up as I would have 
 expected.
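 
 A rough check of the figures above, as a minimal sketch: a 100Mbit link
 carries at most 12.5MB/s before protocol overhead, and 10.5MB/s set against
 the 22~25MB/s that the disc sustained with UFS2 in test 1 works out to
 somewhat over 40% utilisation, close to the 35-45% seen in gstat. The helper
 names are hypothetical and the code is plain arithmetic, nothing
 FreeBSD-specific.
 
```c
#include <assert.h>

/* Upper bound of a link in MB/s (10^6 bytes), ignoring protocol overhead. */
double
link_limit_mbytes(double mbits_per_sec)
{
	assert(mbits_per_sec >= 0.0);
	return (mbits_per_sec / 8.0);
}

/* Expected device utilisation in percent for a given transfer rate. */
double
expected_busy_pct(double rate_mbytes, double device_max_mbytes)
{
	assert(device_max_mbytes > 0.0);
	return (100.0 * rate_mbytes / device_max_mbytes);
}
```
 
 Calling expected_busy_pct(10.5, 25.0) and expected_busy_pct(10.5, 22.0)
 brackets the estimate for the two ends of the measured UFS2 range.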
 
 I managed to improve the performance in these results a little by upping 
 MAXPHYS to 256, and then to 512 on the client. Going from 128 to 256 improved 
 the diskinfo -t transfer rates by about 3MB/s; increasing it to 512 seemed to 
 have no further benefit. Enabling polling for the fxp interface helped as 
 well by reducing the interrupt rate from ~8k/s to 2k/s during the second 
 test.
 
 Finally, I used dd to test just the filesystem-write.
 
 ufs2:
 
 egg# dd if=/dev/zero of=/mnt/file.test bs=64k count=10000
 10000+0 records in
 10000+0 records out
 655360000 bytes transferred in 25.093943 secs (26116262 bytes/sec)
 
 And from gstat during the `dd':
 
 dT: 0.501  flag_I 500000us  sizeof 240  i -1
  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
       ...
     2     50      2     32   47.8     48  24510   45.7   97.8| da0
     2     50      2     32   47.9     48  24510   45.7   97.8| da0s1
 
 msdosfs:
 
 egg# dd if=/dev/zero of=/mnt/file2.test bs=64k count=10000
 10000+0 records in
 10000+0 records out
 655360000 bytes transferred in 123.332992 secs (5313744 bytes/sec)
 
 gstat:
 
 dT: 0.501  flag_I 500000us  sizeof 240  i -1
  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
       ...
   163   1314      0      0    0.0   1314   5258  145.4  100.0| da0
   163   1314      0      0    0.0   1314   5258  146.5  100.0| da0s1
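 
 The two gstat snapshots during dd can be compared by dividing the kBps
 column by the w/s column, which gives an average size per write request.
 The sketch below assumes that both columns are averages over the same
 sample interval; the helper names are hypothetical. For the msdosfs run
 this is 5258/1314, about 4KB per write, while the UFS2 run shows 24510/48,
 several hundred KB per write. The dd rates themselves differ by a factor
 of a little under five.
 
```c
#include <assert.h>

/* Average size of one write request in KB, from gstat's kBps and w/s. */
double
avg_write_kb(double kbps, double writes_per_sec)
{
	assert(writes_per_sec > 0.0);
	return (kbps / writes_per_sec);
}

/* Ratio between two throughputs given in the same unit (e.g. bytes/sec). */
double
throughput_ratio(double fast, double slow)
{
	assert(slow > 0.0);
	return (fast / slow);
}
```
 
 For example, avg_write_kb(5258, 1314) covers the msdosfs snapshot and
 throughput_ratio(26116262, 5313744) compares the two dd results.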
 
 The EHCI controller is:
 
 ehci0: <EHCI (generic) USB 2.0 controller> mem 0xffa80800-0xffa80bff irq 23 at 
 device 29.7 on pci0
 usb4: EHCI version 1.0
 usb4: companion controllers, 2 ports each: usb0 usb1 usb2 usb3
 usb4: <EHCI (generic) USB 2.0 controller> on ehci0
 usb4: USB revision 2.0
 uhub4: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
 uhub4: 8 ports with 8 removable, self powered
 
 I have not made any tests of read performance but from looking at the results 
 I do not expect that it will be significantly better than write performance. 
 I may do some when I get more time to investigate and follow up if the 
 results are unexpected.
 
 Hopefully this will generate some interest in the problem. It is beyond my 
 time and expertise, but it would be very nice to be able to access MS-DOS 
 formatted filesystems at a reasonable speed!
 
 Thank you,
 -- 
 Dominic
 GoodforBusiness.co.uk
 I.T. Services for SMEs in the UK.

From: Bruce Evans <bde@zeta.org.au>
To: Dominic Marks <dom@goodforbusiness.co.uk>
Cc: freebsd-gnats-submit@FreeBSD.org, banhalmi@field.hu,
   freebsd-fs@FreeBSD.org
Subject: Re: i386/68719: [usb] USB 2.0 mobil rack+ fat32 performance problem
Date: Sat, 28 May 2005 20:36:55 +1000 (EST)

 On Fri, 27 May 2005, Dominic Marks wrote:
 
 > (Posted to freebsd-fs as the PR is assigned to freebsd-usb@, but it seems to
 > be more related to the msdos filesystem than the USB system so perhaps it
 > should be reassigned?)
 
 It should be.  It is even less i386-specific than usb-specific.
 
 > I've been evaluating the performance of some usb2 hard discs with FreeBSD and
 > I found this PR (68719). The submitter is correct that performance with
 > msdosfs is severely limited.
 >
 > I tested a 'LaCie' USB2 disc:
 > ...
 > In test 1 I could not achieve any better than 5.1MB/s on an msdosfs
 > filesystem. Using UFS2 and softupdates a transfer rate of 22~25MB/s was
 > possible. Both test data sets were copied from the systems ATA-100 disc. In
 > both tests at these peaks gstat reports the device is 100% busy.
 
 I use the following to improve transfer rates for msdosfs.  The patch is
 for an old version so it might not apply directly.
 
 %%%
 Index: msdosfs_vnops.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v
 retrieving revision 1.147
 diff -u -2 -r1.147 msdosfs_vnops.c
 --- msdosfs_vnops.c	4 Feb 2004 21:52:53 -0000	1.147
 +++ msdosfs_vnops.c	22 Feb 2004 07:27:15 -0000
 @@ -608,4 +622,5 @@
   	int error = 0;
   	u_long count;
 +	int seqcount;
   	daddr_t bn, lastcn;
   	struct buf *bp;
 @@ -693,4 +714,5 @@
   		lastcn = de_clcount(pmp, osize) - 1;
 
 +	seqcount = ioflag >> IO_SEQSHIFT;
   	do {
   		if (de_cluster(pmp, uio->uio_offset) > lastcn) {
 @@ -718,5 +740,5 @@
   			 */
   			bp = getblk(thisvp, bn, pmp->pm_bpcluster, 0, 0, 0);
 -			clrbuf(bp);
 +			vfs_bio_clrbuf(bp);
   			/*
   			 * Do the bmap now, since pcbmap needs buffers
 @@ -767,11 +789,19 @@
   		 * without delay.  Otherwise do a delayed write because we
   		 * may want to write somemore into the block later.
 +		 * XXX comment not updated with code.
   		 */
 +		if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0)
 +			bp->b_flags |= B_CLUSTEROK;
   		if (ioflag & IO_SYNC)
 -			(void) bwrite(bp);
 -		else if (n + croffset == pmp->pm_bpcluster)
 +			(void)bwrite(bp);
 +		else if (vm_page_count_severe() || buf_dirty_count_severe())
   			bawrite(bp);
 -		else
 -			bdwrite(bp);
 +		else if (n + croffset == pmp->pm_bpcluster) {
 +			if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0)
 +				cluster_write(bp, dep->de_FileSize, seqcount);
 +			else
 +				bawrite(bp);
 +  		} else
 +  			bdwrite(bp);
   		dep->de_flag |= DE_UPDATE;
   	} while (error == 0 && uio->uio_resid > 0);
 %%%
 
 Notes:
 - The xxx_count_severe() stuff doesn't work quite right and was observed
    to work especially badly for msdosfs in some configurations.  IIRC,
    only configurations with a tiny block size (e.g., 512 bytes) showed
    the problem, and the problem is more likely to be with tiny block sizes
    actually exercising the "severe" case than with msdosfs or with the
    tiny block sizes themselves.  The behaviour was apparently that when
    a severe page or buf shortage develops, the above handling makes the
    problem worse by using bawrite() instead of cluster_write().  Falling
    back to bawrite() may have made the resource shortage non-fatal, but
    it made the resource shortage last much longer since bawrite() was much
    slower, even on the reasonably fast ATA drive that I was testing on.
 - Using cluster_write() in the above is not essential.  bdwrite() works
    almost as well, or perhaps even better than cluster_write() provided
    write clustering is enabled by setting B_CLUSTEROK, since when this
    flag is set the delayed writes are clustered when they are done
    physically.
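 
 The branch order in the patch above can be modelled in userland as follows.
 This is only an illustrative sketch: the enum and function names are
 hypothetical, the boolean arguments stand in for the kernel tests
 (ioflag & IO_SYNC, vm_page_count_severe() || buf_dirty_count_severe(),
 n + croffset == pmp->pm_bpcluster, and MNT_NOCLUSTERW on the mount), and
 no buffer-cache routine is actually called.
 
```c
#include <assert.h>	/* for callers that want to check the model */
#include <stdbool.h>

enum wr_kind { WR_SYNC, WR_ASYNC, WR_CLUSTER, WR_DELAYED };

/* Branch order before the patch: a full cluster goes out via bawrite(). */
enum wr_kind
choose_write_old(bool io_sync, bool cluster_full)
{
	if (io_sync)
		return (WR_SYNC);	/* bwrite() */
	if (cluster_full)
		return (WR_ASYNC);	/* bawrite() */
	return (WR_DELAYED);		/* bdwrite() */
}

/* Branch order after the patch. */
enum wr_kind
choose_write_new(bool io_sync, bool severe, bool cluster_full,
    bool noclusterw)
{
	if (io_sync)
		return (WR_SYNC);	/* bwrite() */
	if (severe)
		return (WR_ASYNC);	/* bawrite() under page/buf shortage */
	if (cluster_full)		/* cluster_write() unless disabled */
		return (noclusterw ? WR_ASYNC : WR_CLUSTER);
	return (WR_DELAYED);		/* bdwrite() */
}
```
 
 In this model a completely written cluster on a mount without
 MNT_NOCLUSTERW maps to cluster_write() where the old code used bawrite(),
 and the "severe" test takes priority over clustering, which is the
 behaviour the first note describes.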
 
 > I have not made any tests of read performance but from looking at the results
 > I do not expect that it will be significantly better than write performance.
 > I may do some when I get more time to investigate and follow up if the
 > results are unexpected.
 
 Try it.  I would expect read performance to be much better.  If not, don't
 bother trying the above patch.  msdosfs uses read-ahead for read(), and
 this seems to work well so I haven't even tried changing it to use read
 clustering (the above only changes it to use write clustering).  This may
 depend on the drive doing read caching and not handling small block sizes
 too badly.  I mostly use ATA drives that have these properties.  Writing
 tinygrams tends to have a relatively higher cost because write caching is
 not enabled so clustering can only be done by the OS.
 
 > Hopefully this will generate some interest in the problem, it is beyond my
 > time and expertise but it would be very nice to be able to access MS-DOS
 > formatted filesystems at a reasonable speed!
 
 Some other changes are needed for general use at a reasonable speed:
 - use VMIO for metadata.
 - don't use pessimal block allocation.  The current allocator gives
    large inter-file fragmentation by attempting to minimise intra-file
    fragmentation, and when the file system becomes just 1/N full the
    attempt backfires and gives intra-file fragmentation too (files with
    more than N clusters are very likely to be fragmented).
 
 Bruce

From: Dominic Marks <dom@goodforbusiness.co.uk>
To: Bruce Evans <bde@zeta.org.au>
Cc: freebsd-gnats-submit@FreeBSD.org,
 banhalmi@field.hu,
 freebsd-fs@FreeBSD.org
Subject: Re: i386/68719: [usb] USB 2.0 mobil rack+ fat32 performance problem
Date: Sat, 28 May 2005 12:13:41 +0100

 On Saturday 28 May 2005 11:36, Bruce Evans wrote:
 > On Fri, 27 May 2005, Dominic Marks wrote:
 > > (Posted to freebsd-fs as the PR is assigned to freebsd-usb@, but it seems
 > > to be more related to the msdos filesystem than the USB system so perhaps
 > > it should be reassigned?)
 >
 > It should be.  It is even less i386-specific than usb-specific.
 >
 > > I've been evaluating the performance of some usb2 hard discs with FreeBSD
 > > and I found this PR (68719). The submitter is correct that performance
 > > with msdosfs is severely limited.
 > >
 > > I tested a 'LaCie' USB2 disc:
 > > ...
 > > In test 1 I could not achieve any better than 5.1MB/s on an msdosfs
 > > filesystem. Using UFS2 and softupdates a transfer rate of 22~25MB/s was
 > > possible. Both test data sets were copied from the systems ATA-100 disc.
 > > In both tests at these peaks gstat reports the device is 100% busy.
 >
 > I use the following to improve transfer rates for msdosfs.  The patch is
 > for an old version so it might not apply directly.
 >
 > %%%
 > Index: msdosfs_vnops.c
 > ===================================================================
 > RCS file: /home/ncvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v
 > retrieving revision 1.147
 > diff -u -2 -r1.147 msdosfs_vnops.c
 > --- msdosfs_vnops.c	4 Feb 2004 21:52:53 -0000	1.147
 > +++ msdosfs_vnops.c	22 Feb 2004 07:27:15 -0000
 > @@ -608,4 +622,5 @@
 >   	int error = 0;
 >   	u_long count;
 > +	int seqcount;
 >   	daddr_t bn, lastcn;
 >   	struct buf *bp;
 > @@ -693,4 +714,5 @@
 >   		lastcn = de_clcount(pmp, osize) - 1;
 >
 > +	seqcount = ioflag >> IO_SEQSHIFT;
 >   	do {
 >   		if (de_cluster(pmp, uio->uio_offset) > lastcn) {
 > @@ -718,5 +740,5 @@
 >   			 */
 >   			bp = getblk(thisvp, bn, pmp->pm_bpcluster, 0, 0, 0);
 > -			clrbuf(bp);
 > +			vfs_bio_clrbuf(bp);
 >   			/*
 >   			 * Do the bmap now, since pcbmap needs buffers
 > @@ -767,11 +789,19 @@
 >   		 * without delay.  Otherwise do a delayed write because we
 >   		 * may want to write somemore into the block later.
 > +		 * XXX comment not updated with code.
 >   		 */
 > +		if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0)
 > +			bp->b_flags |= B_CLUSTEROK;
 >   		if (ioflag & IO_SYNC)
 > -			(void) bwrite(bp);
 > -		else if (n + croffset == pmp->pm_bpcluster)
 > +			(void)bwrite(bp);
 > +		else if (vm_page_count_severe() || buf_dirty_count_severe())
 >   			bawrite(bp);
 > -		else
 > -			bdwrite(bp);
 > +		else if (n + croffset == pmp->pm_bpcluster) {
 > +			if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0)
 > +				cluster_write(bp, dep->de_FileSize, seqcount);
 > +			else
 > +				bawrite(bp);
 > +  		} else
 > +  			bdwrite(bp);
 >   		dep->de_flag |= DE_UPDATE;
 >   	} while (error == 0 && uio->uio_resid > 0);
 > %%%
 
 Thanks! I'll try my three tests again with this patch.
 
 > Notes:
 > - The xxx_count_severe() stuff doesn't work quite right and was observed
 >    to work especially badly for msdosfs in some configurations.  IIRC,
 >    only configurations with a tiny block size (e.g., 512 bytes) showed
 >    the problem, and the problem is more likely to be with tiny block sizes
 >    actually exercising the "severe" case than with msdosfs or with the
 >    tiny block sizes themselves.  The behaviour was apparently that when
 >    a severe page or buf shortage develops, the above handling makes the
 >    problem worse by using bawrite() instead of cluster_write().  Falling
 >    back to bawrite() may have made the resource shortage non-fatal, but
 >    it made the resource shortage last much longer since bawrite() was much
 >    slower, even on the reasonable fast ATA drive that I was testing on.
 > - Using cluster_write() in the above is not essential.  bdwrite() works
 >    almost as well, or perhaps even better than cluster_write() provided
 >    write clustering is enabled by setting B_CLUSTEROK, since when this
 >    flag is set the delayed writes are clustered when they are done
 >    physically.
 >
 > > I have not made any tests of read performance but from looking at the
 > > results I do not expect that it will be significantly better than write
 > > performance. I may do some when I get more time to investigate and follow
 > > up if the results are unexpected.
 >
 > Try it.  I would expect read performance to be much better.  If not, don't
 > bother trying the above patch.  msdosfs uses read-ahead for read(), and
 > this seems to work well so I haven't even tried changing it to use read
 > clustering (the above only changes it to use write clustering).  This may
 > depend on the drive doing read caching and not handling small block sizes
 > too badly.  I mostly use ATA drives that have these properties.  Writing
 > tinygrams tends to have a relatively higher cost because write caching is
 > not enabled so clustering can only be done by the OS.
 
 Ok, I still have all the test equipment so I might as well do this today. I 
 have ATA write caching enabled on my systems.
 
 > > Hopefully this will generate some interest in the problem, it is beyond
 > > my time and expertise but it would be very nice to be able to access
 > > MS-DOS formatted filesystems at a reasonable speed!
 >
 > Some other changes are needed for general use at a reasonable speed:
 > - use VMIO for metadata.
 > - don't use pessimal block allocation.  The current allocator gives
 >    large inter-file fragmentation by attempting to minimise intra-file
 >    fragmentation, and when the file system becomes just 1/N full the
 >    attempt backfires and gives intra-file fragmentation too (files with
 >    more than N clusters are very likely to be fragmented).
 
 Is there anyone out there who is sufficiently talented, with a strong desire to 
 tackle this problem? I would be happy to make the first payment, or hardware 
 donation into a development fund to see it get fixed. My resources are 
 limited though, so if there are others who would like this feature perhaps we 
 could combine to get a volunteer some really nice kit?
 
 > Bruce
 
 Thanks very much,
 -- 
 Dominic
 GoodforBusiness.co.uk
 I.T. Services for SMEs in the UK.

From: Dominic Marks <dom@goodforbusiness.co.uk>
To: Bruce Evans <bde@zeta.org.au>
Cc: freebsd-gnats-submit@FreeBSD.org,
 banhalmi@field.hu,
 freebsd-fs@FreeBSD.org
Subject: Re: i386/68719: [usb] USB 2.0 mobil rack+ fat32 performance problem
Date: Sat, 28 May 2005 15:40:34 +0100

 On Saturday 28 May 2005 12:13, Dominic Marks wrote:
 > On Saturday 28 May 2005 11:36, Bruce Evans wrote:
 
 <snip>
 
 > >
 > > I use the following to improve transfer rates for msdosfs.  The patch is
 > > for an old version so it might not apply directly.
 > >
 > > %%%
 > > Index: msdosfs_vnops.c
 > > ===================================================================
 > > RCS file: /home/ncvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v
 > > retrieving revision 1.147
 > > diff -u -2 -r1.147 msdosfs_vnops.c
 > > --- msdosfs_vnops.c	4 Feb 2004 21:52:53 -0000	1.147
 > > +++ msdosfs_vnops.c	22 Feb 2004 07:27:15 -0000
 > > @@ -608,4 +622,5 @@
 > >   	int error = 0;
 > >   	u_long count;
 > > +	int seqcount;
 > >   	daddr_t bn, lastcn;
 > >   	struct buf *bp;
 > > @@ -693,4 +714,5 @@
 > >   		lastcn = de_clcount(pmp, osize) - 1;
 > >
 > > +	seqcount = ioflag >> IO_SEQSHIFT;
 > >   	do {
 > >   		if (de_cluster(pmp, uio->uio_offset) > lastcn) {
 > > @@ -718,5 +740,5 @@
 > >   			 */
 > >   			bp = getblk(thisvp, bn, pmp->pm_bpcluster, 0, 0, 0);
 > > -			clrbuf(bp);
 > > +			vfs_bio_clrbuf(bp);
 > >   			/*
 > >   			 * Do the bmap now, since pcbmap needs buffers
 > > @@ -767,11 +789,19 @@
 > >   		 * without delay.  Otherwise do a delayed write because we
 > >   		 * may want to write somemore into the block later.
 > > +		 * XXX comment not updated with code.
 > >   		 */
 > > +		if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0)
 > > +			bp->b_flags |= B_CLUSTEROK;
 > >   		if (ioflag & IO_SYNC)
 > > -			(void) bwrite(bp);
 > > -		else if (n + croffset == pmp->pm_bpcluster)
 > > +			(void)bwrite(bp);
 > > +		else if (vm_page_count_severe() || buf_dirty_count_severe())
 > >   			bawrite(bp);
 > > -		else
 > > -			bdwrite(bp);
 > > +		else if (n + croffset == pmp->pm_bpcluster) {
 > > +			if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0)
 > > +				cluster_write(bp, dep->de_FileSize, seqcount);
 > > +			else
 > > +				bawrite(bp);
 > > +  		} else
 > > +  			bdwrite(bp);
 > >   		dep->de_flag |= DE_UPDATE;
 > >   	} while (error == 0 && uio->uio_resid > 0);
 > > %%%
 >
 > Thanks! I'll try my three tests again with this patch.
 
 Index: msdosfs_vnops.c
 ===================================================================
 RCS file: /usr/cvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v
 retrieving revision 1.149.2.1
 diff -u -r1.149.2.1 msdosfs_vnops.c
 --- msdosfs_vnops.c	31 Jan 2005 23:25:56 -0000	1.149.2.1
 +++ msdosfs_vnops.c	28 May 2005 14:26:59 -0000
 @@ -607,6 +607,7 @@
  		struct uio *a_uio;
  		int a_ioflag;
  		struct ucred *a_cred;
 +		int seqcount;
  	} */ *ap;
  {
  	int n;
 @@ -615,6 +616,7 @@
  	u_long osize;
  	int error = 0;
  	u_long count;
 +	int seqcount;
  	daddr_t bn, lastcn;
  	struct buf *bp;
  	int ioflag = ap->a_ioflag;
 @@ -692,7 +694,7 @@
  	 */
  	if (uio->uio_offset + resid > osize) {
  		count = de_clcount(pmp, uio->uio_offset + resid) -
 -			de_clcount(pmp, osize);
 +		   	de_clcount(pmp, osize);
  		error = extendfile(dep, count, NULL, NULL, 0);
  		if (error &&  (error != ENOSPC || (ioflag & IO_UNIT)))
  			goto errexit;
 @@ -700,6 +702,7 @@
  	} else
  		lastcn = de_clcount(pmp, osize) - 1;
  
 +	seqcount = ioflag >> IO_SEQSHIFT;
  	do {
  		if (de_cluster(pmp, uio->uio_offset) > lastcn) {
  			error = ENOSPC;
 @@ -725,7 +728,7 @@
  			 * then no need to read data from disk.
  			 */
  			bp = getblk(thisvp, bn, pmp->pm_bpcluster, 0, 0, 0);
 -			clrbuf(bp);
 +			vfs_bio_clrbuf(bp);
  			/*
  			 * Do the bmap now, since pcbmap needs buffers
  			 * for the fat table. (see msdosfs_strategy)
 @@ -775,6 +778,7 @@
  		 * without delay.  Otherwise do a delayed write because we
  		 * may want to write somemore into the block later.
  		 */
 +		 /*
  		if (ioflag & IO_SYNC)
  			(void) bwrite(bp);
  		else if (n + croffset == pmp->pm_bpcluster)
 @@ -782,6 +786,24 @@
  		else
  			bdwrite(bp);
  		dep->de_flag |= DE_UPDATE;
 +		*/
 +		/*
 +		 * XXX Patch.
 +		 */
 +                if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0)
 +                       bp->b_flags |= B_CLUSTEROK;
 +                if (ioflag & IO_SYNC)
 +                       (void)bwrite(bp);
 +                else if (vm_page_count_severe() || buf_dirty_count_severe())
 +                       bawrite(bp);
 +                else if (n + croffset == pmp->pm_bpcluster) {
 +                       if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0)
 +                               cluster_write(bp, dep->de_FileSize, seqcount);
 +                       else
 +                               bawrite(bp);
 +               } else
 +                       bdwrite(bp);
 +                dep->de_flag |= DE_UPDATE;
  	} while (error == 0 && uio->uio_resid > 0);
  
  	/*
 
 Your patch works for me on 5.4-STABLE. It improves write performance 
 dramatically. I did another test, reading and writing 1GB chunks of data.
 
 # dd if=<in> of=<out> bs=512k count=2k
 
 ufs2/read:	28.25MB/s
 ufs2/write:	23.47MB/s
 
 msdosfs/read:	 5.08MB/s
 msdosfs/write:	23.13MB/s
 
 Raising vfs.read_max to 64 (from 8) seems to have improved the read 
 performance a little too but I have not measured how much yet.
 
 Since the patch is to the _write function, is it safe to assume the same method
 could be used to fix read performance if applied properly in the correct
 function?
 
 Cheers,
 -- 
 Dominic
 GoodforBusiness.co.uk
 I.T. Services for SMEs in the UK.

From: Dominic Marks <dom@goodforbusiness.co.uk>
To: Bruce Evans <bde@zeta.org.au>
Cc: freebsd-gnats-submit@FreeBSD.org,
 banhalmi@field.hu,
 freebsd-fs@FreeBSD.org
Subject: Re: i386/68719: [usb] USB 2.0 mobil rack+ fat32 performance problem
Date: Sun, 29 May 2005 16:12:46 +0100

 --Boundary-00=_uvdmCBeoNTBAj+g
 Content-Type: text/plain;
   charset="iso-8859-1"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: inline
 
 On Saturday 28 May 2005 15:40, Dominic Marks wrote:
 > On Saturday 28 May 2005 12:13, Dominic Marks wrote:
 > > On Saturday 28 May 2005 11:36, Bruce Evans wrote:
 >
 > <snip>
 >
 > > > I use the following to improve transfer rates for msdosfs.  The patch
 > > > is for an old version so it might not apply directly.
 > > >
 
 <snip>
 
 > > Thanks! I'll try my three tests again with this patch.
 >
 > Index: msdosfs_vnops.c
 > ===================================================================
 > RCS file: /usr/cvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v
 > retrieving revision 1.149.2.1
 > diff -u -r1.149.2.1 msdosfs_vnops.c
 > --- msdosfs_vnops.c	31 Jan 2005 23:25:56 -0000	1.149.2.1
 > +++ msdosfs_vnops.c	28 May 2005 14:26:59 -0000
 > @@ -607,6 +607,7 @@
 >  		struct uio *a_uio;
 >  		int a_ioflag;
 >  		struct ucred *a_cred;
 > +		int seqcount;
 >  	} */ *ap;
 >  {
 >  	int n;
 > @@ -615,6 +616,7 @@
 >  	u_long osize;
 >  	int error = 0;
 >  	u_long count;
 > +	int seqcount;
 >  	daddr_t bn, lastcn;
 >  	struct buf *bp;
 >  	int ioflag = ap->a_ioflag;
 > @@ -692,7 +694,7 @@
 >  	 */
 >  	if (uio->uio_offset + resid > osize) {
 >  		count = de_clcount(pmp, uio->uio_offset + resid) -
 > -			de_clcount(pmp, osize);
 > +		   	de_clcount(pmp, osize);
 >  		error = extendfile(dep, count, NULL, NULL, 0);
 >  		if (error &&  (error != ENOSPC || (ioflag & IO_UNIT)))
 >  			goto errexit;
 > @@ -700,6 +702,7 @@
 >  	} else
 >  		lastcn = de_clcount(pmp, osize) - 1;
 >
 > +	seqcount = ioflag >> IO_SEQSHIFT;
 >  	do {
 >  		if (de_cluster(pmp, uio->uio_offset) > lastcn) {
 >  			error = ENOSPC;
 > @@ -725,7 +728,7 @@
 >  			 * then no need to read data from disk.
 >  			 */
 >  			bp = getblk(thisvp, bn, pmp->pm_bpcluster, 0, 0, 0);
 > -			clrbuf(bp);
 > +			vfs_bio_clrbuf(bp);
 >  			/*
 >  			 * Do the bmap now, since pcbmap needs buffers
 >  			 * for the fat table. (see msdosfs_strategy)
 > @@ -775,6 +778,7 @@
 >  		 * without delay.  Otherwise do a delayed write because we
 >  		 * may want to write somemore into the block later.
 >  		 */
 > +		 /*
 >  		if (ioflag & IO_SYNC)
 >  			(void) bwrite(bp);
 >  		else if (n + croffset == pmp->pm_bpcluster)
 > @@ -782,6 +786,24 @@
 >  		else
 >  			bdwrite(bp);
 >  		dep->de_flag |= DE_UPDATE;
 > +		*/
 > +		/*
 > +		 * XXX Patch.
 > +		 */
 > +                if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0)
 > +                       bp->b_flags |= B_CLUSTEROK;
 > +                if (ioflag & IO_SYNC)
 > +                       (void)bwrite(bp);
 > +                else if (vm_page_count_severe() || buf_dirty_count_severe())
 > +                       bawrite(bp);
 > +                else if (n + croffset == pmp->pm_bpcluster) {
 > +                       if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0)
 > +                               cluster_write(bp, dep->de_FileSize, seqcount);
 > +                       else
 > +                               bawrite(bp);
 > +               } else
 > +                       bdwrite(bp);
 > +                dep->de_flag |= DE_UPDATE;
 >  	} while (error == 0 && uio->uio_resid > 0);
 >
 >  	/*
 >
 > Your patch works for me on 5.4-STABLE. It improves write performance
 > dramatically. I did another test, reading and writing 1GB chunks of data.
 
 <snip>
 
 > Since the patch is to the _write function is it safe to assume the same
 > method could be used to fix read performance if applied properly in the
 > correct function?
 >
 > Cheers,
 
 I have been experimenting in msdosfs_read and I have managed to come up with 
 something that works, but I'm sure it is flawed. On large file reads it will 
 improve read performance (see below) - but only after a long period of the 
 file copy achieving only 3MB/s (see A1). During this time gstat reports the 
 disc itself is reading at its maximum of around 28MB/s. After a long period 
 of low throughput, the disc drops to 25MB/s but the actual transfer rate 
 increases to 25MB/s (see A2).
 
 I've tried to narrow it down to something but I'm mostly in the dark, so I'll
 just hand over what I found to work for review. I looked at Bruce's changes to
 msdosfs_write and tried to do the same (implement cluster_read) using the 
 ext2 and ffs _read methods as a how-to. I think I'm reading ahead too far, or 
 too early. I have been unable to interpret the gstat output during the first 
 part of the transfer any further.
 
 gstat(8) output at the start (slow, A1), and middle (fast, A2) of a large file 
 copy between msdosfs/usb drive (da0s1) and ufs2/ata-100 (ad1).
 
 # A1
 dT: 0.501  flag_I 500000us  sizeof 240  i -1
  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    14    445    445  28376   24.7      0      0    0.0   99.9| da0
     0     28      0      0    0.0     28   3578    1.7    4.8| ad1
     0     28      0      0    0.0     28   3578    1.7    4.8| ad1s1
    14    445    445  28376   24.9      0      0    0.0  100.0| da0s1
 
 After 30-45 seconds (1GB test file):
 
 # A2
 dT: 0.501  flag_I 500000us  sizeof 240  i -1
  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
     1    403    403  25428    2.1      0      0    0.0   85.0| da0
     0    199      0      0    0.0    199  25532    1.7   34.0| ad1
     0    199      0      0    0.0    199  25532    1.7   34.1| ad1s1
     1    403    403  25428    2.1      0      0    0.0   85.9| da0s1
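
 Dividing the columns of the two samples above gives a rough picture of what
 changes between A1 and A2. The helpers below are a small user-space sketch;
 their names are invented for illustration and the only inputs are the gstat
 figures already shown.

```c
#include <assert.h>

/* Average request size in kB, from gstat's kBps and ops/s columns. */
double
avg_request_kb(double kbps, double ops_per_sec)
{
	assert(ops_per_sec > 0.0);
	return kbps / ops_per_sec;
}

/* kB read from the source device per kB written to the destination. */
double
read_amplification(double read_kbps, double write_kbps)
{
	assert(write_kbps > 0.0);
	return read_kbps / write_kbps;
}
```

 For da0s1, avg_request_kb(28376, 445) and avg_request_kb(25428, 403) both
 come out at roughly 64 kB, so the request size is about the same in the slow
 and fast phases. What differs is read_amplification(): about 7.9 in A1
 (28376 kB/s read against 3578 kB/s written) and about 1.0 in A2. That would
 be consistent with the same data being fetched several times during the slow
 phase, although the samples alone do not prove it.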
 
 The patch below combines Bruce's original patch for msdosfs_write, revised for
 current text positions, with my attempt to do the same for msdosfs_read.
 
 %%
 Index: msdosfs_vnops.c
 ===================================================================
 RCS file: /usr/cvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v
 retrieving revision 1.149.2.1
 diff -u -r1.149.2.1 msdosfs_vnops.c
 --- msdosfs_vnops.c	31 Jan 2005 23:25:56 -0000	1.149.2.1
 +++ msdosfs_vnops.c	29 May 2005 14:10:18 -0000
 @@ -517,6 +517,8 @@
  	int blsize;
  	int isadir;
  	int orig_resid;
 +	int nblks = 16; /* XXX should be defined, but not here */
 +	int crsize;
  	u_int n;
  	u_long diff;
  	u_long on;
 @@ -565,14 +567,21 @@
  			error = bread(pmp->pm_devvp, lbn, blsize, NOCRED, &bp);
  		} else {
  			blsize = pmp->pm_bpcluster;
 -			rablock = lbn + 1;
 -			if (seqcount > 1 &&
 -			    de_cn2off(pmp, rablock) < dep->de_FileSize) {
 -				rasize = pmp->pm_bpcluster;
 -				error = breadn(vp, lbn, blsize,
 -				    &rablock, &rasize, 1, NOCRED, &bp);
 +			/* XXX what is the best value for crsize? */
 + 			crsize = blsize * nblks > MAXBSIZE ? MAXBSIZE : blsize * nblks;
 +			if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERR) == 0) {
 +				error = cluster_read(vp, dep->de_FileSize, lbn,
 +					crsize, NOCRED, uio->uio_resid, seqcount, &bp);
  			} else {
 -				error = bread(vp, lbn, blsize, NOCRED, &bp);
 +				rablock = lbn + 1;
 +				if (seqcount > 1 &&
 +					de_cn2off(pmp, rablock) < dep->de_FileSize) {
 +						rasize = pmp->pm_bpcluster;
 +						error = breadn(vp, lbn, blsize,
 +						&rablock, &rasize, 1, NOCRED, &bp);
 +				} else {
 +					error = bread(vp, lbn, blsize, NOCRED, &bp);
 +				}
  			}
  		}
  		if (error) {
 @@ -580,14 +589,16 @@
  			break;
  		}
  		on = uio->uio_offset & pmp->pm_crbomask;
 -		diff = pmp->pm_bpcluster - on;
 -		n = diff > uio->uio_resid ? uio->uio_resid : diff;
 +		diff = blsize * nblks - on;
 +		n = blsize * nblks > uio->uio_resid ? uio->uio_resid : blsize * nblks;
  		diff = dep->de_FileSize - uio->uio_offset;
 -		if (diff < n)
 +		if (diff < n) {
  			n = diff;
 -		diff = blsize - bp->b_resid;
 -		if (diff < n)
 +		}
 +		diff = blsize * nblks - bp->b_resid;
 +		if (diff < n) {
  			n = diff;
 +		}
  		error = uiomove(bp->b_data + on, (int) n, uio);
  		brelse(bp);
  	} while (error == 0 && uio->uio_resid > 0 && n != 0);
 @@ -607,6 +618,7 @@
  		struct uio *a_uio;
  		int a_ioflag;
  		struct ucred *a_cred;
 +		int seqcount;
  	} */ *ap;
  {
  	int n;
 @@ -615,6 +627,7 @@
  	u_long osize;
  	int error = 0;
  	u_long count;
 +	int seqcount;
  	daddr_t bn, lastcn;
  	struct buf *bp;
  	int ioflag = ap->a_ioflag;
 @@ -692,7 +705,7 @@
  	 */
  	if (uio->uio_offset + resid > osize) {
  		count = de_clcount(pmp, uio->uio_offset + resid) -
 -			de_clcount(pmp, osize);
 +		   	de_clcount(pmp, osize);
  		error = extendfile(dep, count, NULL, NULL, 0);
  		if (error &&  (error != ENOSPC || (ioflag & IO_UNIT)))
  			goto errexit;
 @@ -700,6 +713,7 @@
  	} else
  		lastcn = de_clcount(pmp, osize) - 1;
  
 +	seqcount = ioflag >> IO_SEQSHIFT;
  	do {
  		if (de_cluster(pmp, uio->uio_offset) > lastcn) {
  			error = ENOSPC;
 @@ -725,7 +739,7 @@
  			 * then no need to read data from disk.
  			 */
  			bp = getblk(thisvp, bn, pmp->pm_bpcluster, 0, 0, 0);
 -			clrbuf(bp);
 +			vfs_bio_clrbuf(bp);
  			/*
  			 * Do the bmap now, since pcbmap needs buffers
  			 * for the fat table. (see msdosfs_strategy)
 @@ -775,6 +789,7 @@
  		 * without delay.  Otherwise do a delayed write because we
  		 * may want to write somemore into the block later.
  		 */
 +		 /*
  		if (ioflag & IO_SYNC)
  			(void) bwrite(bp);
  		else if (n + croffset == pmp->pm_bpcluster)
 @@ -782,6 +797,24 @@
  		else
  			bdwrite(bp);
  		dep->de_flag |= DE_UPDATE;
 +		*/
 +		/*
 +		 * XXX Patch.
 +		 */
 +                if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0)
 +                       bp->b_flags |= B_CLUSTEROK;
 +                if (ioflag & IO_SYNC)
 +                       (void)bwrite(bp);
 +                else if (vm_page_count_severe() || buf_dirty_count_severe())
 +                       bawrite(bp);
 +                else if (n + croffset == pmp->pm_bpcluster) {
 +                       if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0)
 +                               cluster_write(bp, dep->de_FileSize, seqcount);
 +                       else
 +                               bawrite(bp);
 +               } else
 +                       bdwrite(bp);
 +                dep->de_flag |= DE_UPDATE;
  	} while (error == 0 && uio->uio_resid > 0);
  
  	/*
 
 %%
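
 The per-iteration transfer size in msdosfs_read is a chain of minimums. A
 minimal user-space model of that clamping, following the unpatched lines in
 the hunk above, might look like this. The function and parameter names are
 hypothetical, and b_resid is folded into bufsize for brevity.

```c
#include <assert.h>
#include <stddef.h>

static size_t
min_sz(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Bytes that can be copied out of one buffer in a single uiomove()-style step.
 *   bufsize   - valid bytes in the buffer
 *   on        - starting offset inside the buffer
 *   resid     - bytes the caller still wants
 *   file_left - bytes between the current offset and end of file
 */
size_t
read_chunk(size_t bufsize, size_t on, size_t resid, size_t file_left)
{
	size_t n;

	assert(on <= bufsize);
	n = bufsize - on;          /* never run past the end of the buffer */
	n = min_sz(n, resid);      /* never copy more than was asked for */
	n = min_sz(n, file_left);  /* never copy past end of file */
	return n;
}
```

 With a 4096-byte buffer, on = 512 and plenty of data wanted,
 read_chunk(4096, 512, 65536, 1 << 20) yields 3584, so on + n stays within
 the buffer. The patched lines compute n from blsize * nblks without
 subtracting on, which may be worth checking against the size of the buffer
 that cluster_read() actually returns.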
 
 With this patch I can get the following transfer rates:
 
 msdosfs reading
 
 # ls -lh /mnt/random2.file 
 -rwxr-xr-x  1 root  wheel   1.0G May 29 11:24 /mnt/random2.file
 
 # /usr/bin/time -al cp /mnt/random2.file /vol
        59.61 real         0.05 user         6.79 sys
        632  maximum resident set size
         11  average shared memory size
         80  average unshared data size
        123  average unshared stack size
         88  page reclaims
          0  page faults
          0  swaps
      23757  block input operations **
       8192  block output operations
          0  messages sent
          0  messages received
          0  signals received
      16660  voluntary context switches
      10387  involuntary context switches
 
 Average Rate: 15.31MB/s. (Would be higher if not for the slow start)
 
 ** This figure is 3x that of the UFS2 operations. This must be an indicator of
 what I'm doing wrong, but I don't know what.
 
 msdosfs writing
 
 # /usr/bin/time -al cp /vol/random2.file /mnt
        47.33 real         0.03 user         7.13 sys
        632  maximum resident set size
         12  average shared memory size
         85  average unshared data size
        130  average unshared stack size
         88  page reclaims
          0  page faults
          0  swaps
       8735  block input operations
      16385  block output operations
          0  messages sent
          0  messages received
          0  signals received
       8856  voluntary context switches
      29631  involuntary context switches
 
 Average Rate: 18.79MB/s.
 
 To compare with UFS2 + softupdates on the same system / disc.
 
 ufs2 reading
 
 # /usr/bin/time -al cp /mnt/random2.file /vol
        42.39 real         0.02 user         6.61 sys
        632  maximum resident set size
         12  average shared memory size
         87  average unshared data size
        133  average unshared stack size
         88  page reclaims
          0  page faults
          0  swaps
       8249  block input operations
       8193  block output operations
          0  messages sent
          0  messages received
          0  signals received
       8246  voluntary context switches
      24617  involuntary context switches
 
 Average Rate: 20.89MB/s.
 
 ufs2 writing
 
 # /usr/bin/time -al cp /vol/random2.file /mnt/
        47.12 real         0.03 user         6.74 sys
        632  maximum resident set size
         12  average shared memory size
         85  average unshared data size
        130  average unshared stack size
         88  page reclaims
          0  page faults
          0  swaps
       8260  block input operations
       8192  block output operations
          0  messages sent
          0  messages received
          0  signals received
       8303  voluntary context switches
      24700  involuntary context switches
 
 Average Rate: 19MB/s.
 
 I'd hope that someone could point me in the right direction so I can clean the 
 patch up, or finish this off themselves and commit it (or something else 
 which will increase the read/write speed).
 
 Thanks very much,
 -- 
 Dominic
 GoodforBusiness.co.uk
 I.T. Services for SMEs in the UK.
 
 --Boundary-00=_uvdmCBeoNTBAj+g--

From: "Dominic Marks" <dom@goodforbusiness.co.uk>
To: <james>
Cc: <freebsd-fs@FreeBSD.org>,
	<freebsd-gnats-submit@FreeBSD.org>,
	<banhalmi@field.hu>
Subject: Re: i386/68719: [usb] USB 2.0 mobil rack+ fat32 performance problem
Date: Sun, 29 May 2005 16:15:48 +0100

 This is a multi-part message in MIME format.
 
 --Boundary-00=_uvdmCBeoNTBAj+g
 Content-Type: text/plain;
 	charset="iso-8859-1"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: inline
 
 On Saturday 28 May 2005 15:40, Dominic Marks wrote:
 > On Saturday 28 May 2005 12:13, Dominic Marks wrote:
 > > On Saturday 28 May 2005 11:36, Bruce Evans wrote:
 >
 > <snip>
 >
 > > > I use the following to improve transfer rates for msdosfs.  The patch
 > > > is for an old version so it might not apply directly.
 > > >
 
 <snip>
 
 > > Thanks! I'll try my three tests again with this patch.
 >
 > Index: msdosfs_vnops.c
 > ===================================================================
 > RCS file: /usr/cvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v
 > retrieving revision 1.149.2.1
 > diff -u -r1.149.2.1 msdosfs_vnops.c
 > --- msdosfs_vnops.c	31 Jan 2005 23:25:56 -0000	1.149.2.1
 > +++ msdosfs_vnops.c	28 May 2005 14:26:59 -0000
 > @@ -607,6 +607,7 @@
 >  		struct uio *a_uio;
 >  		int a_ioflag;
 >  		struct ucred *a_cred;
 > +		int seqcount;
 >  	} */ *ap;
 >  {
 >  	int n;
 > @@ -615,6 +616,7 @@
 >  	u_long osize;
 >  	int error = 0;
 >  	u_long count;
 > +	int seqcount;
 >  	daddr_t bn, lastcn;
 >  	struct buf *bp;
 >  	int ioflag = ap->a_ioflag;
 > @@ -692,7 +694,7 @@
 >  	 */
 >  	if (uio->uio_offset + resid > osize) {
 >  		count = de_clcount(pmp, uio->uio_offset + resid) -
 > -			de_clcount(pmp, osize);
 > +		   	de_clcount(pmp, osize);
 >  		error = extendfile(dep, count, NULL, NULL, 0);
 >  		if (error &&  (error != ENOSPC || (ioflag & IO_UNIT)))
 >  			goto errexit;
 > @@ -700,6 +702,7 @@
 >  	} else
 >  		lastcn = de_clcount(pmp, osize) - 1;
 >
 > +	seqcount = ioflag >> IO_SEQSHIFT;
 >  	do {
 >  		if (de_cluster(pmp, uio->uio_offset) > lastcn) {
 >  			error = ENOSPC;
 > @@ -725,7 +728,7 @@
 >  			 * then no need to read data from disk.
 >  			 */
 >  			bp = getblk(thisvp, bn, pmp->pm_bpcluster, 0, 0, 0);
 > -			clrbuf(bp);
 > +			vfs_bio_clrbuf(bp);
 >  			/*
 >  			 * Do the bmap now, since pcbmap needs buffers
 >  			 * for the fat table. (see msdosfs_strategy)
 > @@ -775,6 +778,7 @@
 >  		 * without delay.  Otherwise do a delayed write because we
 >  		 * may want to write somemore into the block later.
 >  		 */
 > +		 /*
 >  		if (ioflag & IO_SYNC)
 >  			(void) bwrite(bp);
 >  		else if (n + croffset == pmp->pm_bpcluster)
 > @@ -782,6 +786,24 @@
 >  		else
 >  			bdwrite(bp);
 >  		dep->de_flag |= DE_UPDATE;
 > +		*/
 > +		/*
 > +		 * XXX Patch.
 > +		 */
 > +                if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0)
 > +                       bp->b_flags |= B_CLUSTEROK;
 > +                if (ioflag & IO_SYNC)
 > +                       (void)bwrite(bp);
 > +                else if (vm_page_count_severe() || buf_dirty_count_severe())
 > +                       bawrite(bp);
 > +                else if (n + croffset == pmp->pm_bpcluster) {
 > +                       if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0)
 > +                               cluster_write(bp, dep->de_FileSize, seqcount);
 > +                       else
 > +                               bawrite(bp);
 > +               } else
 > +                       bdwrite(bp);
 > +                dep->de_flag |= DE_UPDATE;
 >  	} while (error == 0 && uio->uio_resid > 0);
 >
 >  	/*
 >
 > Your patch works for me on 5.4-STABLE. It improves write performance
 > dramatically. I did another test, reading and writing 1GB chunks of data.
 
 <snip>
 
 > Since the patch is to the _write function is it safe to assume the same
 > method could be used to fix read performance if applied properly in the
 > correct function?
 >
 > Cheers,
 
 I have been experimenting in msdosfs_read and I have managed to come up with 
 something that works, but I'm sure it is flawed. On large file reads it will 
 improve read performance (see below) - but only after a long period of the 
 file copy achieving only 3MB/s (see A1). During this time gstat reports the 
 disc itself is reading at its maximum of around 28MB/s. After a long period 
 of low throughput, the disc drops to 25MB/s but the actual transfer rate 
 increases to 25MB/s (see A2).
 
 I've tried to narrow it down to something but I'm mostly in the dark, so I'll 
 just hand over what I found to work to review. I looked at Bruce's changes to 
 msdosfs_write and tried to do the same (implement cluster_read) using the 
 ext2 and ffs _read methods as a how-to. I think I'm reading ahead too far, or 
 too early. I have been unable to interpret the gstat output during the first 
 part of the transfer any further.
 
 gstat(8) output at the start (slow, A1), and middle (fast, A2) of a large file 
 copy between msdosfs/usb drive (da0s1) and ufs2/ata-100 (ad1).
 
 # A1
 dT: 0.501  flag_I 500000us  sizeof 240  i -1
  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    14    445    445  28376   24.7      0      0    0.0   99.9| da0
     0     28      0      0    0.0     28   3578    1.7    4.8| ad1
     0     28      0      0    0.0     28   3578    1.7    4.8| ad1s1
    14    445    445  28376   24.9      0      0    0.0  100.0| da0s1
 
 After 30-45 seconds (1GB test file):
 
 # A2
 dT: 0.501  flag_I 500000us  sizeof 240  i -1
  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
     1    403    403  25428    2.1      0      0    0.0   85.0| da0
     0    199      0      0    0.0    199  25532    1.7   34.0| ad1
     0    199      0      0    0.0    199  25532    1.7   34.1| ad1s1
     1    403    403  25428    2.1      0      0    0.0   85.9| da0s1
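
 One way to read the two gstat samples above is to compare how fast the
 source disc is being read with how fast the destination is being written.
 The following is only a sketch of that arithmetic, using the kBps columns
 from A1 and A2; the function name is made up for illustration and is not
 part of any FreeBSD tool.

```c
#include <assert.h>

/*
 * Ratio of the source read rate to the destination write rate, both in
 * kBps as printed by gstat(8).  A value near 1.0 means nearly everything
 * read from the source is being written out; a much larger value means
 * most of what is read is not reaching the destination at that moment.
 */
static double
read_to_write_ratio(double read_kbps, double write_kbps)
{
	assert(write_kbps > 0.0);
	return (read_kbps / write_kbps);
}
```

 With the A1 figures (28376 kBps read on da0, 3578 kBps written on ad1) the
 ratio is a little under 8; with the A2 figures (25428 and 25532) it is
 close to 1.  That would be consistent with reading much more than is being
 consumed during the slow phase, but it does not by itself say why.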
 
 The patch below combines Bruce's original patch for msdosfs_write, revised for 
 current text positions, with my attempt to do the same for msdosfs_read.
 
 %%
 Index: msdosfs_vnops.c
 ===================================================================
 RCS file: /usr/cvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v
 retrieving revision 1.149.2.1
 diff -u -r1.149.2.1 msdosfs_vnops.c
 --- msdosfs_vnops.c	31 Jan 2005 23:25:56 -0000	1.149.2.1
 +++ msdosfs_vnops.c	29 May 2005 14:10:18 -0000
 @@ -517,6 +517,8 @@
  	int blsize;
  	int isadir;
  	int orig_resid;
 +	int nblks = 16; /* XXX should be defined, but not here */
 +	int crsize;
  	u_int n;
  	u_long diff;
  	u_long on;
 @@ -565,14 +567,21 @@
  			error = bread(pmp->pm_devvp, lbn, blsize, NOCRED, &bp);
  		} else {
  			blsize = pmp->pm_bpcluster;
 -			rablock = lbn + 1;
 -			if (seqcount > 1 &&
 -			    de_cn2off(pmp, rablock) < dep->de_FileSize) {
 -				rasize = pmp->pm_bpcluster;
 -				error = breadn(vp, lbn, blsize,
 -				    &rablock, &rasize, 1, NOCRED, &bp);
 +			/* XXX what is the best value for crsize? */
 + 			crsize = blsize * nblks > MAXBSIZE ? MAXBSIZE : blsize * nblks;
 +			if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERR) == 0) {
 +				error = cluster_read(vp, dep->de_FileSize, lbn,
 +					crsize, NOCRED, uio->uio_resid, seqcount, &bp);
  			} else {
 -				error = bread(vp, lbn, blsize, NOCRED, &bp);
 +				rablock = lbn + 1;
 +				if (seqcount > 1 &&
 +					de_cn2off(pmp, rablock) < dep->de_FileSize) {
 +						rasize = pmp->pm_bpcluster;
 +						error = breadn(vp, lbn, blsize,
 +						&rablock, &rasize, 1, NOCRED, &bp);
 +				} else {
 +					error = bread(vp, lbn, blsize, NOCRED, &bp);
 +				}
  			}
  		}
  		if (error) {
 @@ -580,14 +589,16 @@
  			break;
  		}
  		on = uio->uio_offset & pmp->pm_crbomask;
 -		diff = pmp->pm_bpcluster - on;
 -		n = diff > uio->uio_resid ? uio->uio_resid : diff;
 +		diff = blsize * nblks - on;
 +		n = blsize * nblks > uio->uio_resid ? uio->uio_resid : blsize * nblks;
  		diff = dep->de_FileSize - uio->uio_offset;
 -		if (diff < n)
 +		if (diff < n) {
  			n = diff;
 -		diff = blsize - bp->b_resid;
 -		if (diff < n)
 +		}
 +		diff = blsize * nblks - bp->b_resid;
 +		if (diff < n) {
  			n = diff;
 +		}
  		error = uiomove(bp->b_data + on, (int) n, uio);
  		brelse(bp);
  	} while (error == 0 && uio->uio_resid > 0 && n != 0);
 @@ -607,6 +618,7 @@
  		struct uio *a_uio;
  		int a_ioflag;
  		struct ucred *a_cred;
 +		int seqcount;
  	} */ *ap;
  {
  	int n;
 @@ -615,6 +627,7 @@
  	u_long osize;
  	int error = 0;
  	u_long count;
 +	int seqcount;
  	daddr_t bn, lastcn;
  	struct buf *bp;
  	int ioflag = ap->a_ioflag;
 @@ -692,7 +705,7 @@
  	 */
  	if (uio->uio_offset + resid > osize) {
  		count = de_clcount(pmp, uio->uio_offset + resid) -
 -			de_clcount(pmp, osize);
 +		   	de_clcount(pmp, osize);
  		error = extendfile(dep, count, NULL, NULL, 0);
  		if (error &&  (error != ENOSPC || (ioflag & IO_UNIT)))
  			goto errexit;
 @@ -700,6 +713,7 @@
  	} else
  		lastcn = de_clcount(pmp, osize) - 1;
  
 +	seqcount = ioflag >> IO_SEQSHIFT;
  	do {
  		if (de_cluster(pmp, uio->uio_offset) > lastcn) {
  			error = ENOSPC;
 @@ -725,7 +739,7 @@
  			 * then no need to read data from disk.
  			 */
  			bp = getblk(thisvp, bn, pmp->pm_bpcluster, 0, 0, 0);
 -			clrbuf(bp);
 +			vfs_bio_clrbuf(bp);
  			/*
  			 * Do the bmap now, since pcbmap needs buffers
  			 * for the fat table. (see msdosfs_strategy)
 @@ -775,6 +789,7 @@
  		 * without delay.  Otherwise do a delayed write because we
  		 * may want to write somemore into the block later.
  		 */
 +		 /*
  		if (ioflag & IO_SYNC)
  			(void) bwrite(bp);
  		else if (n + croffset == pmp->pm_bpcluster)
 @@ -782,6 +797,24 @@
  		else
  			bdwrite(bp);
  		dep->de_flag |= DE_UPDATE;
 +		*/
 +		/*
 +		 * XXX Patch.
 +		 */
 +                if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0)
 +                       bp->b_flags |= B_CLUSTEROK;
 +                if (ioflag & IO_SYNC)
 +                       (void)bwrite(bp);
 +                else if (vm_page_count_severe() || buf_dirty_count_severe())
 +                       bawrite(bp);
 +                else if (n + croffset == pmp->pm_bpcluster) {
 +                       if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERW) == 0)
 +                               cluster_write(bp, dep->de_FileSize, seqcount);
 +                       else
 +                               bawrite(bp);
 +               } else
 +                       bdwrite(bp);
 +                dep->de_flag |= DE_UPDATE;
  	} while (error == 0 && uio->uio_resid > 0);
  
  	/*
 
 %%
 
 With this patch I can get the following transfer rates:
 
 msdosfs reading
 
 # ls -lh /mnt/random2.file 
 -rwxr-xr-x  1 root  wheel   1.0G May 29 11:24 /mnt/random2.file
 
 # /usr/bin/time -al cp /mnt/random2.file /vol
        59.61 real         0.05 user         6.79 sys
        632  maximum resident set size
         11  average shared memory size
         80  average unshared data size
        123  average unshared stack size
         88  page reclaims
          0  page faults
          0  swaps
      23757  block input operations **
       8192  block output operations
          0  messages sent
          0  messages received
          0  signals received
      16660  voluntary context switches
      10387  involuntary context switches
 
 Average Rate: 15.31MB/s. (Would be higher if not for the slow start)
 
 ** This figure is 3x that of the UFS2 operations. This must be an indicator of 
 what I'm doing wrong, but I don't know what.
 
 msdosfs writing
 
 # /usr/bin/time -al cp /vol/random2.file /mnt
        47.33 real         0.03 user         7.13 sys
        632  maximum resident set size
         12  average shared memory size
         85  average unshared data size
        130  average unshared stack size
         88  page reclaims
          0  page faults
          0  swaps
       8735  block input operations
      16385  block output operations
          0  messages sent
          0  messages received
          0  signals received
       8856  voluntary context switches
      29631  involuntary context switches
 
 Average Rate: 18.79MB/s.
 
 For comparison, UFS2 + softupdates on the same system / disc:
 
 ufs2 reading
 
 # /usr/bin/time -al cp /mnt/random2.file /vol
        42.39 real         0.02 user         6.61 sys
        632  maximum resident set size
         12  average shared memory size
         87  average unshared data size
        133  average unshared stack size
         88  page reclaims
          0  page faults
          0  swaps
       8249  block input operations
       8193  block output operations
          0  messages sent
          0  messages received
          0  signals received
       8246  voluntary context switches
      24617  involuntary context switches
 
 Average Rate: 20.89MB/s.
 
 ufs2 writing
 
 # /usr/bin/time -al cp /vol/random2.file /mnt/
        47.12 real         0.03 user         6.74 sys
        632  maximum resident set size
         12  average shared memory size
         85  average unshared data size
        130  average unshared stack size
         88  page reclaims
          0  page faults
          0  swaps
       8260  block input operations
       8192  block output operations
          0  messages sent
          0  messages received
          0  signals received
       8303  voluntary context switches
      24700  involuntary context switches
 
 Average Rate: 19MB/s.
 
 I'd hope that someone could point me in the right direction so I can clean the 
 patch up, or finish this off themselves and commit it (or something else 
 which will increase the read/write speed).
 
 Thanks very much,
 -- 
 Dominic
 GoodforBusiness.co.uk
 I.T. Services for SMEs in the UK.
 
 --Boundary-00=_uvdmCBeoNTBAj+g--

From: Bruce Evans <bde@zeta.org.au>
To: Dominic Marks <dom@goodforbusiness.co.uk>
Cc: freebsd-gnats-submit@FreeBSD.org, banhalmi@field.hu,
   freebsd-fs@FreeBSD.org
Subject: Re: i386/68719: [usb] USB 2.0 mobil rack+ fat32 performance problem
Date: Mon, 30 May 2005 17:19:22 +1000 (EST)

 On Sun, 29 May 2005, Dominic Marks wrote:
 
 > I have been experimenting in msdosfs_read and I have managed to come up with
 > something that works, but I'm sure it is flawed. On large file reads it will
 > improve read performance (see below) - but only after a long period of the
 > file copy achieving only 3MB/s (see A1). During this time gstat reports the
 > disc itself is reading at its maximum of around 28MB/s. After a long period
 > of low throughput, the disc drops to 25MB/s but the actual transfer rate
 > increases to 25MB/s (see A2).
 
 A1 is strange.  It might be reading too much ahead, but I wouldn't expect 
 the read-ahead to be discarded soon so this should make little difference
 for reading whole files.
 
 > I've tried to narrow it down to something but I'm mostly in the dark, so I'll
 > just hand over what I found to work to review. I looked at Bruce's changes to
 > msdosfs_write and tried to do the same (implement cluster_read) using the
 > ext2 and ffs _read methods as a how-to. I think I'm reading ahead too far, or
 > too early. I have been unable to interpret the gstat output during the first
 > part of the transfer any further.
 
 The ext2 and ffs methods are a good place to start.  Also look at cd9660 --
 it is a little simpler.
 
 > The patch which combines Bruce's original patch for msdosfs_write, revised for
 > current text positions, and my attempts to do the same for msdosfs_read.
 >
 > %%
 > Index: msdosfs_vnops.c
 > ===================================================================
 > RCS file: /usr/cvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v
 > retrieving revision 1.149.2.1
 > diff -u -r1.149.2.1 msdosfs_vnops.c
 > --- msdosfs_vnops.c	31 Jan 2005 23:25:56 -0000	1.149.2.1
 > +++ msdosfs_vnops.c	29 May 2005 14:10:18 -0000
 > @@ -565,14 +567,21 @@
 > 			error = bread(pmp->pm_devvp, lbn, blsize, NOCRED, &bp);
 > 		} else {
 > 			blsize = pmp->pm_bpcluster;
 > -			rablock = lbn + 1;
 > -			if (seqcount > 1 &&
 > -			    de_cn2off(pmp, rablock) < dep->de_FileSize) {
 > -				rasize = pmp->pm_bpcluster;
 > -				error = breadn(vp, lbn, blsize,
 > -				    &rablock, &rasize, 1, NOCRED, &bp);
 > +			/* XXX what is the best value for crsize? */
 > + 			crsize = blsize * nblks > MAXBSIZE ? MAXBSIZE : blsize * nblks;
 > +			if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERR) == 0) {
 > +				error = cluster_read(vp, dep->de_FileSize, lbn,
 > +					crsize, NOCRED, uio->uio_resid, seqcount, &bp);
 
 crsize should be just the block size (the cluster size in msdosfs, i.e. the
 blsize variable here), as in the corresponding code in all other file systems.
 seqcount gives the amount of read-ahead and there are algorithms elsewhere
 to guess its best value.  I think cluster_read() reads only physically
 contiguous blocks, so the amount of read-ahead for it is not critical
 for the clustered case anyway.  There will either be a large range of
 contiguous blocks, in which case reading ahead a lot isn't bad, or
 read-ahead will be limited by discontiguities.  Giving a too-large
 value for crsize may be harmful by confusing cluster_read() about
 discontiguities, or just by asking it to read the large size when the
 blocks actually in the file aren't contiguous.
 
 I think the above handles most cases, so look for problems there first.
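
 To make the size difference concrete, here is a sketch in plain C of the
 two choices for the size argument.  MODEL_MAXBSIZE is an assumption (it
 mirrors what I believe MAXBSIZE to be, 64K), and both function names are
 invented for this illustration.

```c
#include <assert.h>

#define MODEL_MAXBSIZE	65536L	/* assumed value of MAXBSIZE */

/* Size the patch passes to cluster_read(): nblks clusters, capped. */
static long
patch_crsize(long blsize, int nblks)
{
	long want;

	assert(blsize > 0 && nblks > 0);
	want = blsize * nblks;
	return (want > MODEL_MAXBSIZE ? MODEL_MAXBSIZE : want);
}

/* Size suggested above: exactly one file system block (one cluster). */
static long
suggested_crsize(long blsize)
{
	assert(blsize > 0);
	return (blsize);
}
```

 For a 4K cluster and nblks = 16 the patch asks for 65536 bytes per call
 where the other file systems would ask for 4096, leaving the amount of
 read-ahead to seqcount.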
 
 > 			} else {
 
 The above seems to be missing a bread() for the EOF case (before the else).
 I don't know what cluster_read() does at EOF.  See cd9660_read() for clear
 code.  (Here there is unfortunately an extra level of indentation from a
 special case for directories.)
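
 The shape of the decision being described, with the EOF case handled by a
 plain bread(), can be modelled outside the kernel roughly as follows.  This
 is a sketch from memory of how such read paths are arranged, not a quote
 of cd9660_read() and not a tested fix; the enum and function names are
 hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Which buffer-read call to use for the block at lbn. */
enum read_action { RA_BREAD, RA_CLUSTER_READ, RA_BREADN };

/*
 * next_block_off: byte offset of block lbn + 1 (de_cn2off(pmp, rablock))
 * file_size:      dep->de_FileSize
 * noclusterr:     mount has MNT_NOCLUSTERR set
 * seqcount:       sequential-access hint taken from the ioflag
 */
static enum read_action
choose_read_action(long long next_block_off, long long file_size,
    bool noclusterr, int seqcount)
{
	bool more;

	assert(next_block_off >= 0 && file_size >= 0);
	more = next_block_off < file_size;	/* data beyond this block? */
	if (!noclusterr)
		return (more ? RA_CLUSTER_READ : RA_BREAD);
	if (seqcount > 1 && more)
		return (RA_BREADN);		/* one block of read-ahead */
	return (RA_BREAD);
}
```

 The point of the first branch is that the last block of a file never goes
 through cluster_read(), so its behaviour at EOF does not matter.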
 
 > -				error = bread(vp, lbn, blsize, NOCRED, &bp);
 > +				rablock = lbn + 1;
 > +				if (seqcount > 1 &&
 > +					de_cn2off(pmp, rablock) < dep->de_FileSize) {
 > +						rasize = pmp->pm_bpcluster;
 > +						error = breadn(vp, lbn, blsize,
 > +						&rablock, &rasize, 1, NOCRED, &bp);
 > +				} else {
 > +					error = bread(vp, lbn, blsize, NOCRED, &bp);
 > +				}
 
 This part seems to be OK.  (It is just the old code indented.)
 
 > 			}
 > 		}
 > 		if (error) {
 > ...
 > %%
 >
 > With this patch I can get the following transfer rates:
 >
 > msdosfs reading
 >
 > # ls -lh /mnt/random2.file
 > -rwxr-xr-x  1 root  wheel   1.0G May 29 11:24 /mnt/random2.file
 >
 > # /usr/bin/time -al cp /mnt/random2.file /vol
 >       59.61 real         0.05 user         6.79 sys
 >       632  maximum resident set size
 >        11  average shared memory size
 >        80  average unshared data size
 >       123  average unshared stack size
 >        88  page reclaims
 >         0  page faults
 >         0  swaps
 >     23757  block input operations **
 >      8192  block output operations
 >         0  messages sent
 >         0  messages received
 >         0  signals received
 >     16660  voluntary context switches
 >     10387  involuntary context switches
 >
 > Average Rate: 15.31MB/s. (Would be higher if not for the slow start)
 >
 > ** This figure is 3x that of the UFS2 operations. This must be an indicator of
 > what I'm doing wrong, but I don't know what.
 
 This might also be a sign of fragmentation due to bad allocation policies
 at write time or write() not being able to do good allocation due to
 previous fragmentation.
 
 The average rate isn't too bad, despite the extra blocks.
 
 > msdosfs writing
 >
 > # /usr/bin/time -al cp /vol/random2.file /mnt
 >       47.33 real         0.03 user         7.13 sys
 >       632  maximum resident set size
 >        12  average shared memory size
 >        85  average unshared data size
 >       130  average unshared stack size
 >        88  page reclaims
 >         0  page faults
 >         0  swaps
 >      8735  block input operations
 >     16385  block output operations
 >         0  messages sent
 >         0  messages received
 >         0  signals received
 >      8856  voluntary context switches
 >     29631  involuntary context switches
 >
 > Average Rate: 18.79MB/s.
 
 There are 2x as many blocks as for ffs2 for writing instead of 3x for
 reading.  What are the input blocks for here?  Better put the non-msdosfs
 part of the source or target in memory so that it doesn't get counted.
 Or try mount -v (it gives sync and async read/write counts for individual
 file systems).
 
 2x is actually believable while ffs2's counts aren't.  It corresponds to
 a block size of 64K, which is what I would expect for the unfragmented
 case.
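
 The block sizes implied by these counts come from dividing the file size
 by the number of operations.  A sketch of that division, assuming the
 "1.0G" shown by ls -lh is exactly 2^30 bytes (the helper name is made up):

```c
#include <assert.h>

/* Average bytes moved per reported block operation (integer division). */
static long long
bytes_per_op(long long file_bytes, long long ops)
{
	assert(file_bytes >= 0 && ops > 0);
	return (file_bytes / ops);
}
```

 For the msdosfs write run (16385 output operations) this gives 65532
 bytes, just under 64K; for a count of 8192 it gives exactly 131072 bytes,
 i.e. 128K.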
 
 > To compare with UFS2 + softupdates on the same system / disc.
 >
 > ufs2 reading
 >
 > # /usr/bin/time -al cp /mnt/random2.file /vol
 >       42.39 real         0.02 user         6.61 sys
 >       632  maximum resident set size
 >        12  average shared memory size
 >        87  average unshared data size
 >       133  average unshared stack size
 >        88  page reclaims
 >         0  page faults
 >         0  swaps
 >      8249  block input operations
 >      8193  block output operations
 >         0  messages sent
 >         0  messages received
 >         0  signals received
 >      8246  voluntary context switches
 >     24617  involuntary context switches
 >
 > Average Rate: 20.89MB/s.
 
 Isn't it 24.16MB/s?
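
 The 24.16 figure is the file size divided by the elapsed ("real") time.
 As a sketch, again assuming the file is exactly 2^30 bytes and taking
 1MB as 2^20 bytes (the function name is hypothetical):

```c
#include <assert.h>

/* Average transfer rate in MB/s, with 1MB taken as 2^20 bytes. */
static double
mb_per_sec(double file_bytes, double real_seconds)
{
	assert(real_seconds > 0.0);
	return (file_bytes / (1024.0 * 1024.0) / real_seconds);
}
```

 Called with 1073741824 bytes and 42.39 seconds this gives about 24.16.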
 
 8192 i/o operations seems to be too small.  It corresponds to a block
 size of 128K.  Most drivers don't actually support doing i/o of that
 size (most have a limit of 64K), so if they get asked to then it is a
 bug.  This bug is common or ubiquitous.  The block size to use for
 clusters is in mnt_iosize_max, and this is set in various wrong ways,
 often or always to MAXPHYS = 128K.  This usually makes little difference
 except to give misleading statistics.  Clustering tends to produce
 blocks of size 128K and the block i/o counts report blocks of that
 size, but smaller blocks are sent to the hardware.  I'm not sure if
 libdevstat() sees the smaller blocks.  I think it doesn't.
 
 > [... ufs2 writing similar to reading]
 
 Bruce

From: "Bruce Evans" <bde@zeta.org.au>
To: <james>
Cc: <freebsd-fs@FreeBSD.org>,
	<freebsd-gnats-submit@FreeBSD.org>,
	<banhalmi@field.hu>
Subject: Re: i386/68719: [usb] USB 2.0 mobil rack+ fat32 performance problem
Date: Mon, 30 May 2005 08:30:53 +0100

 On Sun, 29 May 2005, Dominic Marks wrote:
 
 > I have been experimenting in msdosfs_read and I have managed to come up with
 > something that works, but I'm sure it is flawed. On large file reads it will
 > improve read performance (see below) - but only after a long period of the
 > file copy achieving only 3MB/s (see A1). During this time gstat reports the
 > disc itself is reading at its maximum of around 28MB/s. After a long period
 > of low throughput, the disc drops to 25MB/s but the actual transfer rate
 > increases to 25MB/s (see A2).
 
 A1 is strange.  It might be reading too much ahead, but I wouldn't expect 
 the read-ahead to be discarded soon so this should make little difference
 for reading whole files.
 
 > I've tried to narrow it down to something but I'm mostly in the dark, so I'll
 > just hand over what I found to work to review. I looked at Bruce's changes to
 > msdosfs_write and tried to do the same (implement cluster_read) using the
 > ext2 and ffs _read methods as a how-to. I think I'm reading ahead too far, or
 > too early. I have been unable to interpret the gstat output during the first
 > part of the transfer any further.
 
 The ext2 and ffs methods are a good place to start.  Also look at cd9660 --
 it is a little simpler.
 
 > The patch which combines Bruce's original patch for msdosfs_write, revised for
 > current text positions, and my attempts to do the same for msdosfs_read.
 >
 > %%
 > Index: msdosfs_vnops.c
 > ===================================================================
 > RCS file: /usr/cvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v
 > retrieving revision 1.149.2.1
 > diff -u -r1.149.2.1 msdosfs_vnops.c
 > --- msdosfs_vnops.c	31 Jan 2005 23:25:56 -0000	1.149.2.1
 > +++ msdosfs_vnops.c	29 May 2005 14:10:18 -0000
 > @@ -565,14 +567,21 @@
 > 			error = bread(pmp->pm_devvp, lbn, blsize, NOCRED, &bp);
 > 		} else {
 > 			blsize = pmp->pm_bpcluster;
 > -			rablock = lbn + 1;
 > -			if (seqcount > 1 &&
 > -			    de_cn2off(pmp, rablock) < dep->de_FileSize) {
 > -				rasize = pmp->pm_bpcluster;
 > -				error = breadn(vp, lbn, blsize,
 > -				    &rablock, &rasize, 1, NOCRED, &bp);
 > +			/* XXX what is the best value for crsize? */
 > + 			crsize = blsize * nblks > MAXBSIZE ? MAXBSIZE : blsize * nblks;
 > +			if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERR) == 0) {
 > +				error = cluster_read(vp, dep->de_FileSize, lbn,
 > +					crsize, NOCRED, uio->uio_resid, seqcount, &bp);
 
 crsize should be just the block size (cluster size in msdosfs and
 blsize variable here) according to this code in all other file systems.
 seqcount gives the amount of readahead and there are algorithms elsewhere
 to guess its best value.  I think cluster_read() reads only physically
 contiguous blocks, so the amount of read-ahead for it is not critical
 for the clustered case anyway.  There will either be a large range of
 contigous blocks, in which case reading ahead a lot isn't bad, or
 read-ahead will be limited by discontiguities.  Giving a too-large
 value for crsize may be harmful by confusing cluster_read() about
 discontiguities, or just by asking it to read the large size when the
 blocks actually in the file aren't contiguous.
 
 I think the above handles most cases, so look for problems there first.
 
 > 			} else {
 
 The above seems to be missing a bread() for the EOF case (before the else).
 I don't know what cluster_read() does at EOF.  See cd9660_read() for clear
 code.  (Here there is unfortunately an extra level of indentation from a
 special case for directories.)
 
 > -				error = bread(vp, lbn, blsize, NOCRED, &bp);
 > +				rablock = lbn + 1;
 > +				if (seqcount > 1 &&
 > +					de_cn2off(pmp, rablock) < dep->de_FileSize) {
 > +						rasize = pmp->pm_bpcluster;
 > +						error = breadn(vp, lbn, blsize,
 > +						&rablock, &rasize, 1, NOCRED, &bp);
 > +				} else {
 > +					error = bread(vp, lbn, blsize, NOCRED, &bp);
 > +				}
 
 This part seems to be OK.  (It is just the old code indented.)
 
 > 			}
 > 		}
 > 		if (error) {
 > ...
 > %%
 >
 > With this patch I can get the following transfer rates:
 >
 > msdosfs reading
 >
 > # ls -lh /mnt/random2.file
 > -rwxr-xr-x  1 root  wheel   1.0G May 29 11:24 /mnt/random2.file
 >
 > # /usr/bin/time -al cp /mnt/random2.file /vol
 >       59.61 real         0.05 user         6.79 sys
 >       632  maximum resident set size
 >        11  average shared memory size
 >        80  average unshared data size
 >       123  average unshared stack size
 >        88  page reclaims
 >         0  page faults
 >         0  swaps
 >     23757  block input operations **
 >      8192  block output operations
 >         0  messages sent
 >         0  messages received
 >         0  signals received
 >     16660  voluntary context switches
 >     10387  involuntary context switches
 >
 > Average Rate: 15.31MB/s. (Would be higher if not for the slow start)
 >
 > ** This figure is 3x that of the UFS2 operations. This must be a indicator of
 > what I'm doing wrong, but I don't know what.
 
 This might also be a sign of fragmentation due to bad allocation policies
 at write time or write() not being able to do good allocation due to
 previous fragmentation.
 
 The average rate isn't too bad, despite the extra blocks.
 
 > msdosfs writing
 >
 > # /usr/bin/time -al cp /vol/random2.file /mnt
 >       47.33 real         0.03 user         7.13 sys
 >       632  maximum resident set size
 >        12  average shared memory size
 >        85  average unshared data size
 >       130  average unshared stack size
 >        88  page reclaims
 >         0  page faults
 >         0  swaps
 >      8735  block input operations
 >     16385  block output operations
 >         0  messages sent
 >         0  messages received
 >         0  signals received
 >      8856  voluntary context switches
 >     29631  involuntary context switches
 >
 > Average Rate: 18.79MB/s.
 
 There are 2x as many blocks as for ffs2 for writing instead of 3x for
 reading.  What are the input blocks for here?  Better put the non-msdosfs
 part of the source or target in memory so that it doesn't get counted.
 Or try mount -v (it gives sync and async read/write counts for individual
 file systems).
 
 2x is actually believable while ffs2's counts aren't.  It corresponds to
 a block size of 64K, which is what I would expect for the unfragmented
 case.
 
 > To compare with UFS2 + softupdates on the same system / disc.
 >
 > ufs2 reading
 >
 > # /usr/bin/time -al cp /mnt/random2.file /vol
 >       42.39 real         0.02 user         6.61 sys
 >       632  maximum resident set size
 >        12  average shared memory size
 >        87  average unshared data size
 >       133  average unshared stack size
 >        88  page reclaims
 >         0  page faults
 >         0  swaps
 >      8249  block input operations
 >      8193  block output operations
 >         0  messages sent
 >         0  messages received
 >         0  signals received
 >      8246  voluntary context switches
 >     24617  involuntary context switches
 >
 > Average Rate: 20.89MB/s.
 
 Isn't it 24.16MB/s?
 
 8192 i/o operations seems to be too small.  It corresponds to a block
 size of 128K.  Most drivers don't actually support doing i/o of that
 size (most have a limit of 64K), so if they get asked to then it is a
 bug.  This bug is common or ubiquitous.  The block size to use for
 clusters is in mnt_iosize_max, and this is set in various wrong ways,
 often or always to MAXPHYS = 128K.  This usually makes little difference
 except to give misleading statistics.  Clustering tends to produce
 blocks of size 128K and the block i/o counts report blocks of that
 size, but smaller blocks are sent to the hardware.  I'm not sure if
 libdevstat() sees the smaller blocks.  I think it doesn't.
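
 As a rough cross-check of the figures discussed above, the arithmetic can
 be sketched as follows.  This assumes the test file is exactly 1GB (2^30
 bytes), which the operation counts suggest but which is not stated at this
 point in the thread; FILE_BYTES and both helper names are made up for
 illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the copied file is exactly 1GB. */
#define FILE_BYTES ((uint64_t)1 << 30)

/* Average number of bytes moved per reported block i/o operation. */
uint64_t
avg_io_size(uint64_t file_bytes, uint64_t nops)
{
	assert(nops != 0);
	return (file_bytes / nops);
}

/* Average transfer rate in MB/s, taking 1MB as 2^20 bytes. */
double
avg_rate_mb(uint64_t file_bytes, double real_seconds)
{
	assert(real_seconds > 0.0);
	return ((double)file_bytes / (1024.0 * 1024.0) / real_seconds);
}
```

 With these, avg_io_size(FILE_BYTES, 8193) comes out just under 128K for
 the ufs2 counts, avg_io_size(FILE_BYTES, 16385) just under 64K for the
 msdosfs write, and avg_rate_mb(FILE_BYTES, 42.39) is about 24.16.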
 
 > [... ufs2 writing similar to reading]
 
 Bruce

From: Bruce Evans <bde@zeta.org.au>
To: Dominic Marks <dom@goodforbusiness.co.uk>
Cc: freebsd-fs@freebsd.org, freebsd-gnats-submit@freebsd.org,
   banhalmi@field.hu
Subject: Re: i386/68719: [usb] USB 2.0 mobil rack+ fat32 performance problem
Date: Mon, 30 May 2005 20:11:45 +1000 (EST)

 On Mon, 30 May 2005, Bruce Evans wrote:
 
 > On Sun, 29 May 2005, Dominic Marks wrote:
 
 >> I have been experimenting in msdosfs_read and I have managed to come up 
 >> with
 >> something that works, but I'm sure it is flawed. On large file reads it 
 >> will
 >> ...
 >> %%
 >> Index: msdosfs_vnops.c
 >> ===================================================================
 >> RCS file: /usr/cvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v
 >> retrieving revision 1.149.2.1
 >> diff -u -r1.149.2.1 msdosfs_vnops.c
 >> --- msdosfs_vnops.c	31 Jan 2005 23:25:56 -0000	1.149.2.1
 >> +++ msdosfs_vnops.c	29 May 2005 14:10:18 -0000
 >> @@ -565,14 +567,21 @@
 >> 			error = bread(pmp->pm_devvp, lbn, blsize, NOCRED, 
 >> &bp);
 >> 		} else {
 >> 			blsize = pmp->pm_bpcluster;
 >> -			rablock = lbn + 1;
 >> -			if (seqcount > 1 &&
 >> -			    de_cn2off(pmp, rablock) < dep->de_FileSize) {
 >> -				rasize = pmp->pm_bpcluster;
 >> -				error = breadn(vp, lbn, blsize,
 >> -				    &rablock, &rasize, 1, NOCRED, &bp);
 >> +			/* XXX what is the best value for crsize? */
 >> + 			crsize = blsize * nblks > MAXBSIZE ? MAXBSIZE : 
 >> blsize * nblks;
 >> +			if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERR) == 0) {
 >> +				error = cluster_read(vp, dep->de_FileSize, 
 >> lbn,
 >> +					crsize, NOCRED, uio->uio_resid, 
 >> seqcount, &bp);
 >
 > crsize should be just the block size (cluster size in msdosfs and
 > blsize variable here) according to this code in all other file systems.
 > ...
 
 The main problem is that VOP_BMAP() is not fully implemented for msdosfs.
 msdosfs_bmap() only has a stub which pretends that clustering is never
 possible:
 
 % /*
 %  * vp  - address of vnode file the file
 %  * bn  - which cluster we are interested in mapping to a filesystem block number.
 %  * vpp - returns the vnode for the block special file holding the filesystem
 %  *	 containing the file of interest
 %  * bnp - address of where to return the filesystem relative block number
 %  */
 
 This comment rotted in 1994 when 4.4BSD packed the args into a struct
 and added the a_runp and a_runb args to support clustering.
 
 % static int
 % msdosfs_bmap(ap)
 % 	struct vop_bmap_args /* {
 % 		struct vnode *a_vp;
 % 		daddr_t a_bn;
 % 		struct vnode **a_vpp;
 % 		daddr_t *a_bnp;
 % 		int *a_runp;
 % 		int *a_runb;
 % 	} */ *ap;
 % {
 % 	struct denode *dep = VTODE(ap->a_vp);
 % 	daddr_t blkno;
 % 	int error;
 % 
 % 	if (ap->a_vpp != NULL)
 % 		*ap->a_vpp = dep->de_devvp;
 % 	if (ap->a_bnp == NULL)
 % 		return (0);
 % 	if (ap->a_runp) {
 % 		/*
 % 		 * Sequential clusters should be counted here.
    		                       ^^^^^^^^^
 % 		 */
 % 		*ap->a_runp = 0;
 % 	}
 % 	if (ap->a_runb) {
 % 		*ap->a_runb = 0;
 % 	}
 % 	error = pcbmap(dep, ap->a_bn, &blkno, 0, 0);
 % 	*ap->a_bnp = blkno;
 % 	return (error);
 % }
 
 Here is a cleaned up version of the patch to add (not actually working)
 clustering to msdosfs_read().
 
 %%%
 Index: msdosfs_vnops.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v
 retrieving revision 1.147
 diff -u -2 -r1.147 msdosfs_vnops.c
 --- msdosfs_vnops.c	4 Feb 2004 21:52:53 -0000	1.147
 +++ msdosfs_vnops.c	30 May 2005 08:57:02 -0000
 @@ -541,5 +555,7 @@
   		if (uio->uio_offset >= dep->de_FileSize)
   			break;
 +		blsize = pmp->pm_bpcluster;
   		lbn = de_cluster(pmp, uio->uio_offset);
 +		rablock = lbn + 1;
   		/*
   		 * If we are operating on a directory file then be sure to
 @@ -556,15 +573,15 @@
   				break;
   			error = bread(pmp->pm_devvp, lbn, blsize, NOCRED, &bp);
 +		} else if (de_cn2off(pmp, rablock) >= dep->de_FileSize) {
 +			error = bread(vp, lbn, blsize, NOCRED, &bp);
 +		} else if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERR) == 0) {
 +			error = cluster_read(vp, dep->de_FileSize, lbn, blsize,
 +			    NOCRED, uio->uio_resid, seqcount, &bp);
 +		} else if (seqcount > 1) {
 +			rasize = blsize;
 +			error = breadn(vp, lbn,
 +			    blsize, &rablock, &rasize, 1, NOCRED, &bp);
   		} else {
 -			blsize = pmp->pm_bpcluster;
 -			rablock = lbn + 1;
 -			if (seqcount > 1 &&
 -			    de_cn2off(pmp, rablock) < dep->de_FileSize) {
 -				rasize = pmp->pm_bpcluster;
 -				error = breadn(vp, lbn, blsize,
 -				    &rablock, &rasize, 1, NOCRED, &bp);
 -			} else {
 -				error = bread(vp, lbn, blsize, NOCRED, &bp);
 -			}
 +			error = bread(vp, lbn, blsize, NOCRED, &bp);
   		}
   		if (error) {
 %%%
 
 I rearranged the code to be almost lexically identical with that in
 ffs_read().
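
 The order of the tests in the patched non-directory path can be
 summarized by a small pure function.  This is only a sketch of the
 decision chain in the diff above, not kernel code: the enum, the function
 name and the parameter names are invented for illustration, and the
 directory case (which always uses bread() on the device vnode) is left
 out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three buffer-cache calls used by the patch. */
enum read_strategy { USE_BREAD, USE_CLUSTER_READ, USE_BREADN };

/*
 * next_cluster_off stands for de_cn2off(pmp, rablock), file_size for
 * dep->de_FileSize, and noclusterr for the MNT_NOCLUSTERR mount flag.
 */
enum read_strategy
choose_read_strategy(long long next_cluster_off, long long file_size,
    bool noclusterr, int seqcount)
{
	assert(file_size >= 0);
	if (next_cluster_off >= file_size)
		return (USE_BREAD);		/* last block: no read-ahead */
	if (!noclusterr)
		return (USE_CLUSTER_READ);	/* clustering allowed */
	if (seqcount > 1)
		return (USE_BREADN);		/* sequential: 1 block ahead */
	return (USE_BREAD);
}
```

 Note that cluster_read() is chosen whenever clustering is not disabled,
 so whether anything is actually clustered then depends entirely on what
 VOP_BMAP() reports.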
 
 I only tested this on a relatively fast ATA drive.  It made little
 difference.  Most writes were clustered to give a block size of 64K
 and a write speed of 40+MB/s until the disk was nearly full, but
 reads weren't clustered with or without the patch so the block size
 remained at the fs block size (4K); the drive handles this block size
 mediocrely and gave a read speed of 20+MB/sec.  (The drive is a WDC
 1200JB-00CRA1.  This drive has the interesting behaviour of giving
 almost the same mediocre read speed for all block sizes between 2.5K
 and 19.5K.  A block size of 20K gives maximal speed, which is about
 twice as fast as the speed for a block size of 19.5K.)
 
 Both reading and writing a 1GB file to/from msdosfs caused noticeable
 buffer resource problems.  Accesses to other file systems on the same
 disk sometimes blocked for many seconds.  I have debugging code in
 getblk().  It reported that a process waited 17 seconds in or near
 getblk().  The process only stopped waiting because I suspended the
 process accessing msdosfs.  This may be a local bug.
 
 Bruce

From: Dominic Marks <dom@goodforbusiness.co.uk>
To: Bruce Evans <bde@zeta.org.au>
Cc: freebsd-fs@freebsd.org,
 freebsd-gnats-submit@freebsd.org,
 banhalmi@field.hu
Subject: Re: i386/68719: [usb] USB 2.0 mobil rack+ fat32 performance problem
Date: Mon, 30 May 2005 16:09:11 +0100

 On Monday 30 May 2005 11:11, Bruce Evans wrote:
 > On Mon, 30 May 2005, Bruce Evans wrote:
 > > On Sun, 29 May 2005, Dominic Marks wrote:
 > >> I have been experimenting in msdosfs_read and I have managed to come up
 > >> with
 > >> something that works, but I'm sure it is flawed. On large file reads it
 > >> will
 > >> ...
 > >> %%
 > >> Index: msdosfs_vnops.c
 > >> ===================================================================
 > >> RCS file: /usr/cvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v
 > >> retrieving revision 1.149.2.1
 > >> diff -u -r1.149.2.1 msdosfs_vnops.c
 > >> --- msdosfs_vnops.c	31 Jan 2005 23:25:56 -0000	1.149.2.1
 > >> +++ msdosfs_vnops.c	29 May 2005 14:10:18 -0000
 > >> @@ -565,14 +567,21 @@
 > >> 			error = bread(pmp->pm_devvp, lbn, blsize, NOCRED,
 > >> &bp);
 > >> 		} else {
 > >> 			blsize = pmp->pm_bpcluster;
 > >> -			rablock = lbn + 1;
 > >> -			if (seqcount > 1 &&
 > >> -			    de_cn2off(pmp, rablock) < dep->de_FileSize) {
 > >> -				rasize = pmp->pm_bpcluster;
 > >> -				error = breadn(vp, lbn, blsize,
 > >> -				    &rablock, &rasize, 1, NOCRED, &bp);
 > >> +			/* XXX what is the best value for crsize? */
 > >> + 			crsize = blsize * nblks > MAXBSIZE ? MAXBSIZE :
 > >> blsize * nblks;
 > >> +			if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERR) == 0) {
 > >> +				error = cluster_read(vp, dep->de_FileSize,
 > >> lbn,
 > >> +					crsize, NOCRED, uio->uio_resid,
 > >> seqcount, &bp);
 > >
 > > crsize should be just the block size (cluster size in msdosfs and
 > > blsize variable here) according to this code in all other file systems.
 > > ...
 >
 > The main problem is that VOP_BMAP() is not fully implemented for msdosfs.
 > msdosfs_bmap() only has a stub which pretends that clustering is never
 > possible:
 >
 > % /*
 > %  * vp  - address of vnode file the file
 > %  * bn  - which cluster we are interested in mapping to a filesystem block number.
 > %  * vpp - returns the vnode for the block special file holding the filesystem
 > %  *	 containing the file of interest
 > %  * bnp - address of where to return the filesystem relative block number
 > %  */
 >
 > This comment rotted in 1994 when 4.4BSD packed the args into a struct
 > and added the a_runp and a_runb args to support clustering.
 >
 > % static int
 > % msdosfs_bmap(ap)
 > % 	struct vop_bmap_args /* {
 > % 		struct vnode *a_vp;
 > % 		daddr_t a_bn;
 > % 		struct vnode **a_vpp;
 > % 		daddr_t *a_bnp;
 > % 		int *a_runp;
 > % 		int *a_runb;
 > % 	} */ *ap;
 > % {
 > % 	struct denode *dep = VTODE(ap->a_vp);
 > % 	daddr_t blkno;
 > % 	int error;
 > %
 > % 	if (ap->a_vpp != NULL)
 > % 		*ap->a_vpp = dep->de_devvp;
 > % 	if (ap->a_bnp == NULL)
 > % 		return (0);
 > % 	if (ap->a_runp) {
 > % 		/*
 > % 		 * Sequential clusters should be counted here.
 >    		                       ^^^^^^^^^
 > % 		 */
 > % 		*ap->a_runp = 0;
 > % 	}
 > % 	if (ap->a_runb) {
 > % 		*ap->a_runb = 0;
 > % 	}
 > % 	error = pcbmap(dep, ap->a_bn, &blkno, 0, 0);
 > % 	*ap->a_bnp = blkno;
 > % 	return (error);
 > % }
 
 If I understand what is supposed to be done here (I looked at cd9660 but
 I don't know if the rules are different from msdos), a_runp should be set
 to the extent of contiguous blocks from the current position within the
 same region? I put some debugging into msdosfs_bmap and here it is copied:
 
 (fsz is dep->de_FileSize)
 
 msdosfs_bmap: fsz  81047  blkno  6374316  lblkno 5
 msdosfs_bmap: fsz  81047  blkno  6374324  lblkno 6
 msdosfs_bmap: fsz  81047  blkno  6374332  lblkno 7
 msdosfs_bmap: fsz  81047  blkno  6374340  lblkno 8
 msdosfs_bmap: fsz  81047  blkno  6374348  lblkno 9
 msdosfs_bmap: fsz  81047  blkno  6374356  lblkno 10
 msdosfs_bmap: fsz  81047  blkno  6374364  lblkno 11
 msdosfs_bmap: fsz  81047  blkno  6374372  lblkno 12 # A1
 msdosfs_bmap: fsz  81047  blkno 13146156  lblkno 13 # A2
 msdosfs_bmap: fsz  81047  blkno 13146156  lblkno 14
 msdosfs_bmap: fsz  81047  blkno 13146156  lblkno 15
 msdosfs_bmap: fsz  81047  blkno 13146156  lblkno 16
 msdosfs_bmap: fsz  81047  blkno 13146156  lblkno 17
 msdosfs_bmap: fsz  81047  blkno 13146156  lblkno 18
 msdosfs_bmap: fsz  81047  blkno 13146156  lblkno 19
 
 I should compute the position of the boundary illustrated at A1 and use
 that as the read-ahead value until setting a new value at A2. Perhaps this
 should only be done for particularly large files? I will look at the other
 _bmap routines to see what they do.
 
 > Here is a cleaned up version of the patch to add (not actually working)
 > clustering to msdosfs_read().
 >
 > %%%
 > Index: msdosfs_vnops.c
 > ===================================================================
 > RCS file: /home/ncvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v
 > retrieving revision 1.147
 > diff -u -2 -r1.147 msdosfs_vnops.c
 > --- msdosfs_vnops.c	4 Feb 2004 21:52:53 -0000	1.147
 > +++ msdosfs_vnops.c	30 May 2005 08:57:02 -0000
 > @@ -541,5 +555,7 @@
 >   		if (uio->uio_offset >= dep->de_FileSize)
 >   			break;
 > +		blsize = pmp->pm_bpcluster;
 >   		lbn = de_cluster(pmp, uio->uio_offset);
 > +		rablock = lbn + 1;
 >   		/*
 >   		 * If we are operating on a directory file then be sure to
 > @@ -556,15 +573,15 @@
 >   				break;
 >   			error = bread(pmp->pm_devvp, lbn, blsize, NOCRED, &bp);
 > +		} else if (de_cn2off(pmp, rablock) >= dep->de_FileSize) {
 > +			error = bread(vp, lbn, blsize, NOCRED, &bp);
 > +		} else if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERR) == 0) {
 > +			error = cluster_read(vp, dep->de_FileSize, lbn, blsize,
 > +			    NOCRED, uio->uio_resid, seqcount, &bp);
 > +		} else if (seqcount > 1) {
 > +			rasize = blsize;
 > +			error = breadn(vp, lbn,
 > +			    blsize, &rablock, &rasize, 1, NOCRED, &bp);
 >   		} else {
 > -			blsize = pmp->pm_bpcluster;
 > -			rablock = lbn + 1;
 > -			if (seqcount > 1 &&
 > -			    de_cn2off(pmp, rablock) < dep->de_FileSize) {
 > -				rasize = pmp->pm_bpcluster;
 > -				error = breadn(vp, lbn, blsize,
 > -				    &rablock, &rasize, 1, NOCRED, &bp);
 > -			} else {
 > -				error = bread(vp, lbn, blsize, NOCRED, &bp);
 > -			}
 > +			error = bread(vp, lbn, blsize, NOCRED, &bp);
 >   		}
 >   		if (error) {
 > %%%
 >
 > I rearranged the code to be almost lexically identical with that in
 > ffs_read().
 
 Thanks, I will use this as a basis for any other things I try.
 
 > I only tested this on a relatively fast ATA drive.  It made little
 > difference.  Most writes were clustered to give a block size of 64K
 > and a write speed of 40+MB/s until the disk was nearly full, but
 > reads weren't clustered with or without the patch so the block size
 > remained at the fs block size (4K); the drive handles this block size
 > mediocrely and gave a read speed of 20+MB/sec.  (The drive is a WDC
 > 1200JB-00CRA1.  This drive has the interesting behaviour of giving
 > almost the same mediocre read speed for all block sizes between 2.5K
 > and 19.5K.  A block size of 20K gives maximal speed, which is about
 > twice as fast as the speed for a block size of 19.5K.)
 
 I am still confused as to how reading blsize * 16 actually improved
 the transfer rate after a long period of making it worse. Perhaps it
 is related to the buffer resource problem you describe below.
 
 > Both reading and writing a 1GB file to/from msdosfs caused noticeable
 > buffer resource problems.  Accesses to other file systems on the same
 > disk sometimes blocked for many seconds.  I have debugging code in
 > getblk().  It reported that a process waited 17 seconds in or near
 > getblk().  The process only stopped waiting because I suspended the
 > process accessing msdosfs.  This may be a local bug.
 
 I'll look for buffer resource statistics in the system tools and
 measure those. There are no obvious signs, to me, that the systems are
 in any specific difficulties while running the transfers.
 
 > Bruce
 
 Thanks a lot for the answers and code,
 -- 
 Dominic
 GoodforBusiness.co.uk
 I.T. Services for SMEs in the UK.

From: Bruce Evans <bde@zeta.org.au>
To: Dominic Marks <dom@goodforbusiness.co.uk>
Cc: freebsd-fs@FreeBSD.org, freebsd-gnats-submit@FreeBSD.org,
   banhalmi@field.hu
Subject: Re: i386/68719: [usb] USB 2.0 mobil rack+ fat32 performance problem
Date: Tue, 31 May 2005 13:05:26 +1000 (EST)

 On Mon, 30 May 2005, Dominic Marks wrote:
 
 > On Monday 30 May 2005 11:11, Bruce Evans wrote:
 >> The main problem is that VOP_BMAP() is not fully implemented for msdosfs.
 >> msdosfs_bmap() only has a stub which pretends that clustering is never
 >> possible:
 >
 > If I understand what is supposed to be done here (I looked at cd9660 but
 > I don't know if the rules are different from msdos), a_runp should be set
 > to the extent of contiguous blocks from the current position within the
 > same region? I put some debugging into msdosfs_bmap and here it is copied:
 
 cd9660 is deceptively simple here because (I think) it allocates files
 in perfectly contiguous extents.
 
 msdosfs, ffs^ufs and ext2fs have to do considerable work to map even a
 single block.  The details are in pcbmap() for msdosfs.  (The name of this
 function dates from when msdosfs was named pcfs.)  I think msdosfs_bmap()
 just needs to call this function for each block following the start block
 until a discontiguity is hit or a limit (*) is reached.
 
 ufs and ext2fs have an optimized and obfuscated version of this, with
 multiple blocks looked up at once and the single-block lookup implemented
 as a multiple-block lookup with a count of 1.  I doubt that this
 optimization is significant even for ufs, at least now that CPUs are
 10 to 100 times as fast relative to I/O as when it was implemented.
 However it is easier to optimize for msdosfs since there are no
 indirect blocks.
 
 All of cd9660, ufs and ext2fs have a whole file *_bmap.c for bmapping.
 ext2_bmaparray() is simplest, but bmapping in ext2fs and ufs is so
 similar that misspelling ext2_getlbns() as ufs_getlbns() in 1 caller
 is harmless.
 
 (*) The correct limit is mnt_iosize_max bytes.  cd9660 uses the wrong
 limit of MAXBSIZE.
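
 As a rough illustration of turning a byte limit into a block-count
 limit: the helper name and the sample sizes below are assumptions, and
 the "- 1" assumes the run counts only the blocks after the start block.

```c
#include <assert.h>

/*
 * Convert a maximum I/O size in bytes into the largest run of blocks that
 * may follow the start block.  Returns 0 when not even one extra block
 * fits within the limit.
 */
static int
max_run_blocks(unsigned long iosize_max, unsigned long bsize)
{
	unsigned long nblocks;

	assert(bsize > 0);
	nblocks = iosize_max / bsize;	/* whole blocks per transfer */
	return (nblocks > 0 ? (int)(nblocks - 1) : 0);
}
```

 For example, a 128K limit with 4K blocks would allow 31 blocks after the
 start block.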
 
 > (fsz is dep->de_FileSize)
 >
 > msdosfs_bmap: fsz  81047  blkno  6374316  lblkno 5
 > ...
 > msdosfs_bmap: fsz  81047  blkno  6374364  lblkno 11
 > msdosfs_bmap: fsz  81047  blkno  6374372  lblkno 12 # A1
 > msdosfs_bmap: fsz  81047  blkno 13146156  lblkno 13 # A2
 > msdosfs_bmap: fsz  81047  blkno 13146156  lblkno 14
 > ...
 >
 > I should compute the position of the boundary illustrated at A1 and set
 > that as the read-ahead value, until setting a new value at A2. Perhaps this
 > should only be done for particularly large files? I will look at the other
 > _bmap routines to see what they do.
 
 Better to do it for all files.  For small files there are just fewer
 blocks to check for contiguity.
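
 The contiguity check can be worked through on the three consecutive
 records shown in the quoted trace (lblkno 11 to 13).  This is only a
 sketch: the record type and function name are made up, and the stride
 of 8 device blocks per file block is inferred from the step between
 lblkno 11 and 12.

```c
#include <assert.h>
#include <stddef.h>

struct bmap_rec {
	long lblkno;	/* logical block within the file */
	long blkno;	/* device block reported by the debug output */
};

/* Three consecutive records copied from the quoted trace. */
static const struct bmap_rec trace[] = {
	{ 11, 6374364 },
	{ 12, 6374372 },	/* A1: last block of the first extent */
	{ 13, 13146156 },	/* A2: first block of the next extent */
};

/*
 * Number of records after rec[i] that are both logically consecutive and
 * physically contiguous with it, given stride device blocks per block.
 */
static size_t
run_after(const struct bmap_rec *rec, size_t n, size_t i, long stride)
{
	size_t run = 0;

	assert(rec != NULL && i < n && stride > 0);
	while (i + run + 1 < n &&
	    rec[i + run + 1].lblkno == rec[i].lblkno + (long)(run + 1) &&
	    rec[i + run + 1].blkno == rec[i].blkno + (long)(run + 1) * stride)
		run++;
	return (run);
}
```

 Starting at lblkno 11 the run is 1 (block 12 follows directly), while
 starting at lblkno 12 it is 0 because block 13 lives elsewhere on disk.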
 
 > I am still confused as to how reading blsize * 16 actually improved
 > the transfer rate after a long period of making it worse. Perhaps it
 > is related to the buffer resource problem you describe below.
 
 Could be.  The buffer cache layer doesn't handle either overlapping
 buffers or variant buffer sizes very well.  Buffer sizes of (blsize *
 16) mixed with buffer sizes of blsize for msdosfs and 16K for ffs may
 exercise both of these.
 
 Bruce

Responsible-Changed-From-To: freebsd-usb->freebsd-bugs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Sat Jul 23 05:40:28 GMT 2005 
Responsible-Changed-Why:  
This does not seem to be USB-specific, from the followups. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=68719 
Responsible-Changed-From-To: freebsd-bugs->trhodes 
Responsible-Changed-By: remko 
Responsible-Changed-When: Fri Dec 29 20:45:55 UTC 2006 
Responsible-Changed-Why:  
assign to tom 

http://www.freebsd.org/cgi/query-pr.cgi?pr=68719 

From: Tom Rhodes <trhodes@FreeBSD.org>
To: bug-followup@FreeBSD.org
Cc: banhalmi@field.hu, Dominic Marks <dom@goodforbusiness.co.uk>
Subject: Re: kern/68719: [msdosfs] [patch] poor performance with msdosfs and
 USB 2.0 mobil rack
Date: Wed, 7 Mar 2007 22:10:53 -0500

 Hi,
 
 I just MFC'ed to RELENG_6 a patch that adds an additional cache, which
 should improve performance in the case of large files.  Could you
 let me know if it helps your situation?  Thanks,
 
 -- 
 Tom Rhodes
State-Changed-From-To: open->feedback 
State-Changed-By: linimon 
State-Changed-When: Sat Jun 30 08:27:19 UTC 2007 
State-Changed-Why:  
Note that submitter was asked for feedback some time ago. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=68719 
State-Changed-From-To: feedback->closed 
State-Changed-By: linimon 
State-Changed-When: Sat Mar 1 23:46:21 UTC 2008 
State-Changed-Why:  
Feedback timeout (> 6 months).  The problem is believed to be fixed 
in RELENG_6. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=68719 
>Unformatted:
