From louie@whizzo.transsys.com  Wed Jun 18 19:09:46 2003
Return-Path: <louie@whizzo.transsys.com>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 8D1C937B401
	for <FreeBSD-gnats-submit@freebsd.org>; Wed, 18 Jun 2003 19:09:46 -0700 (PDT)
Received: from whizzo.transsys.com (whizzo.TransSys.COM [144.202.42.10])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 8222043FA3
	for <FreeBSD-gnats-submit@freebsd.org>; Wed, 18 Jun 2003 19:09:45 -0700 (PDT)
	(envelope-from louie@whizzo.transsys.com)
Received: from whizzo.transsys.com (#6@localhost [127.0.0.1])
	by whizzo.transsys.com (8.12.9/8.12.9) with ESMTP id h5J29iW9065660
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <FreeBSD-gnats-submit@freebsd.org>; Wed, 18 Jun 2003 22:09:44 -0400 (EDT)
	(envelope-from louie@whizzo.transsys.com)
Received: (from louie@localhost)
	by whizzo.transsys.com (8.12.9/8.12.9/Submit) id h5J29iF4065659;
	Wed, 18 Jun 2003 22:09:44 -0400 (EDT)
Message-Id: <200306190209.h5J29iF4065659@whizzo.transsys.com>
Date: Wed, 18 Jun 2003 22:09:44 -0400 (EDT)
From: Louis Mamakos <louie@TransSys.COM>
Reply-To: Louis Mamakos <louie@TransSys.COM>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: cp(1) copies files in reverse order to destination
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         53475
>Category:       bin
>Synopsis:       cp(1) copies files in reverse order to destination
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Wed Jun 18 19:10:10 PDT 2003
>Closed-Date:    
>Last-Modified:  Wed Apr 28 00:40:14 PDT 2004
>Originator:     Louis Mamakos
>Release:        FreeBSD 4.8-STABLE i386
>Organization:
>Environment:
System: FreeBSD whizzo.transsys.com 4.8-STABLE FreeBSD 4.8-STABLE #6: Sun Apr 6 11:00:39 EDT 2003 louie@whizzo.transsys.com:/a/obj/usr/src/sys/WHIZZO i386



>Description:

The cp(1) command produces surprising behavior when copying multiple
files to a destination directory.  The files are copied in reverse
order.  This is of consequence when the order of files in a directory
has meaning; e.g., in an mp3 player appliance that sequentially plays
files in an MS-DOS filesystem directory.

This is very counter-intuitive to the user.

>How-To-Repeat:
 mkdir /tmp/foo /tmp/bar
 touch /tmp/foo/1 /tmp/foo/2 /tmp/foo/3 /tmp/foo/4 /tmp/foo/5 /tmp/foo/6
 cp -v /tmp/foo/1 /tmp/foo/2 /tmp/foo/3 /tmp/foo/4 /tmp/foo/5 /tmp/foo/6 /tmp/bar

>Fix:

BTFOM.  Sneaking suspicion that mastercmp() and related callers are
implicated in this.



>Release-Note:
>Audit-Trail:

From: "Dorr H. Clark" <dclark@applmath.scu.edu>
To: freebsd-gnats-submit@FreeBSD.org, louie@TransSys.COM
Cc:  
Subject: Re: bin/53475: cp(1) copies files in reverse order to destination
Date: Tue, 27 Apr 2004 19:03:14 -0700

 --- cp.c_orig   Sun Apr 25 06:32:27 2004
 +++ cp.c        Sun Apr 25 06:33:50 2004
 @@ -94,7 +94,6 @@
  enum op { FILE_TO_FILE, FILE_TO_DIR, DIR_TO_DNE };
  
  static int copy(char *[], enum op, int);
 -static int mastercmp(const FTSENT * const *, const FTSENT * const *);
  static void siginfo(int __unused);
  
  int
 @@ -274,7 +273,7 @@
         mask = ~umask(0777);
         umask(~mask);
  
 -       if ((ftsp = fts_open(argv, fts_options, mastercmp)) == NULL)
 +       if ((ftsp = fts_open(argv, fts_options, NULL)) == NULL)
                 err(1, "fts_open");
         for (badcp = rval = 0; (curr = fts_read(ftsp)) != NULL; badcp = 0) {
                 switch (curr->fts_info) {
 @@ -478,32 +477,6 @@
         if (errno)
                 err(1, "fts_read");
         return (rval);
 -}
 -
 -/*
 - * mastercmp --
 - *     The comparison function for the copy order.  The order is to copy
 - *     non-directory files before directory files.  The reason for this
 - *     is because files tend to be in the same cylinder group as their
 - *     parent directory, whereas directories tend not to be.  Copying the
 - *     files first reduces seeking.
 - */
 -static int
 -mastercmp(const FTSENT * const *a, const FTSENT * const *b)
 -{
 -       int a_info, b_info;
 -
 -       a_info = (*a)->fts_info;
 -       if (a_info == FTS_ERR || a_info == FTS_NS || a_info == FTS_DNR)
 -               return (0);
 -       b_info = (*b)->fts_info;
 -       if (b_info == FTS_ERR || b_info == FTS_NS || b_info == FTS_DNR)
 -               return (0);
 -       if (a_info == FTS_D)
 -               return (-1);
 -       if (b_info == FTS_D)
 -               return (1);
 -       return (0);
  }
  
  static void
 
 
 As quoted above, the comments in cp.c tell us the function 
 mastercmp() is an attempt to improve performance based on 
 knowing something about physical disks.
  
 This is an old optimization strategy (it's in the original 
 version of cp.c).  AFAIK, in the updated BSD filesystem, 
 when we copy a file, we don't actually move the
 physical data block of the file but change the information in its
 inode such as the address of its data block and owner.  
 
 Deleting mastercmp() and passing NULL as the comparison parameter
 of fts_open() suppresses the behavior reported in this PR.
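
A minimal sketch of why passing NULL helps, assuming the documented
fts(3) behavior that root paths are visited in path_argv order when no
comparison function is supplied.  root_order() is a hypothetical helper
written for illustration; it is not part of cp.c or of the patch above.
It records only the root-level entries and skips directory contents.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <assert.h>
#include <fts.h>
#include <stdio.h>
#include <string.h>

/*
 * root_order -- hypothetical helper, not part of cp.c.
 *	Walk the NULL-terminated path list with no comparison function
 *	and store each root-level path in the order fts_read() returns
 *	it.  Returns the number of paths stored, or -1 if fts_open()
 *	fails.
 */
static int
root_order(char *paths[], char out[][256], int max)
{
	FTS *ftsp;
	FTSENT *curr;
	int n = 0;

	assert(paths != NULL);
	/* NULL comparison function: no sorting is requested. */
	if ((ftsp = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR, NULL)) == NULL)
		return (-1);
	while ((curr = fts_read(ftsp)) != NULL) {
		if (curr->fts_level != FTS_ROOTLEVEL)
			continue;
		/* Ignore the post-order visit of a directory. */
		if (curr->fts_info == FTS_DP)
			continue;
		/* Do not descend; only the argument order matters here. */
		if (curr->fts_info == FTS_D)
			fts_set(ftsp, curr, FTS_SKIP);
		if (n < max) {
			snprintf(out[n], sizeof(out[n]), "%s", curr->fts_path);
			n++;
		}
	}
	fts_close(ftsp);
	return (n);
}
```

Called with the paths from How-To-Repeat, this helper should report
them in the order given on the command line, which is the order the
patched cp would then copy them in.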
 
 The next question is whether deleting mastercmp eliminates
 an optimization.  Our testing shows the exact opposite,
 mastercmp is degrading performance.  We did several experiments
 with cp -R to measure elapsed time on transfers between devices
 of differing file system types (to avoid UFS2 optimizations).
 Our results show removing mastercmp yields a small performance
 gain (note: we had no SCSI devices available, and second note:
 variability in file system performance seems dominated 
 by other factors).
 
 M. K. McKusick has indicated in seminars that modern disk drives
 lie to the driver about their physical layouts.  The use of
 mastercmp in cp.c is a legacy optimization from a different
 era of disk technology.  We recommend removing this call
 from cp.c to address 53475.
 
 Ting Hui, engineer
 Dorr H. Clark, advisor
 Graduate School of Engineering
 Santa Clara University

From: Bruce Evans <bde@zeta.org.au>
To: "Dorr H. Clark" <dclark@applmath.scu.edu>
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: bin/53475: cp(1) copies files in reverse order to destination
Date: Wed, 28 Apr 2004 17:37:25 +1000 (EST)

 On Tue, 27 Apr 2004, Dorr H. Clark wrote:
 
 > ...
 >  -/*
 >  - * mastercmp --
 >  - *     The comparison function for the copy order.  The order is to copy
 >  - *     non-directory files before directory files.  The reason for this
 >  - *     is because files tend to be in the same cylinder group as their
 >  - *     parent directory, whereas directories tend not to be.  Copying the
 >  - *     files first reduces seeking.
 >  - */
 
 According to cp -pRv, mastercmp() gets this perfectly backwards: cp
 actually copies directories first.  It seems to just randomize the
 order of regular files; this is presumably because mastercmp() doesn't
 distinguish between all pairs of different files and qsort() doesn't
 preserve the original order.
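
The sign of the return values can be checked directly.  The sketch
below copies mastercmp() from the removed hunk and adds cmp_info(), a
hypothetical helper (not in cp.c) that builds two zeroed FTSENT
structures with only fts_info set and compares them.  It assumes the
usual qsort() convention that a negative result places the first
argument earlier.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>
#include <string.h>

/* mastercmp() as it appears in the hunk removed from cp.c. */
static int
mastercmp(const FTSENT * const *a, const FTSENT * const *b)
{
	int a_info, b_info;

	a_info = (*a)->fts_info;
	if (a_info == FTS_ERR || a_info == FTS_NS || a_info == FTS_DNR)
		return (0);
	b_info = (*b)->fts_info;
	if (b_info == FTS_ERR || b_info == FTS_NS || b_info == FTS_DNR)
		return (0);
	if (a_info == FTS_D)
		return (-1);	/* a is a directory: a sorts first */
	if (b_info == FTS_D)
		return (1);	/* b is a directory: b sorts first */
	return (0);		/* two non-directories compare equal */
}

/*
 * cmp_info -- hypothetical test helper, not part of cp.c.
 *	Compare two fake entries that differ only in fts_info.
 */
static int
cmp_info(int a_info, int b_info)
{
	FTSENT ea, eb;
	const FTSENT * const pa = &ea;
	const FTSENT * const pb = &eb;

	memset(&ea, 0, sizeof(ea));
	memset(&eb, 0, sizeof(eb));
	ea.fts_info = a_info;
	eb.fts_info = b_info;
	return (mastercmp(&pa, &pb));
}
```

Here cmp_info(FTS_D, FTS_F) is -1, so a directory sorts ahead of a
regular file, the opposite of what the comment promises.  Any two
regular files compare equal, which leaves their relative order up to
the sort.  Two directories also yield -1 whichever way round they are
passed, so the function does not define a consistent ordering either.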
 
 > ...
 >  As quoted above, the comments in cp.c tell us the function
 >  mastercmp() is an attempt to improve performance based on
 >  knowing something about physical disks.
 >
 >  This is an old optimization strategy (it's in the original
 >  version of cp.c).  AFAIK, in the updated BSD filesystem,
 >  when we copy a file, we don't actually move the
 >  physical data block of the file but change the information in its
 >  inode such as the address of its data block and owner.
 
 Copying still involves lots of physical i/o.  The difference in
 relatively recent versions of ffs is that it doesn't scatter the files
 so much by switching the cylinder group too often.  IIRC, it switched
 for every directory.
 
 >  The next question is whether deleting mastercmp eliminates
 >  an optimization.  Our testing shows the exact opposite,
 >  mastercmp is degrading performance.  We did several experiments
 >  with cp -R to measure elapsed time on transfers between devices
 >  of differing file system types (to avoid UFS2 optimizations).
 >  Our results show removing mastercmp yields a small performance
 >  gain (note: we had no SCSI devices available, and second note:
 >  variability in file system performance seems dominated
 >  by other factors).
 
 It would be interesting to know if mastercmp() works better if it does
 what its comment says it does.  I suspect that the backwardsness doesn't
 make much difference, but is worse than it used to be because there
 is now more competition for space in the same cylinder group.  I think
 benchmarks that don't descend into subdirs would show that using
 mastercmp really is an optimization for that access pattern, but I
 think that access pattern is relatively unusual.  Optimizing for the
 default fts order seems as good as anything.
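
For anyone who wants to run that benchmark, a comparison function that
does what the comment says might look like the sketch below.
files_first_cmp() and cmp_files_first() are hypothetical names used
only for illustration; nothing like them exists in cp.c, and this
thread does not propose committing them.  Error entries are simply
treated as non-directories here, which is an assumption.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>
#include <string.h>

/*
 * files_first_cmp -- hypothetical, for benchmarking only.
 *	Order non-directories before directories, as the mastercmp()
 *	comment describes.  Two files, or two directories, compare
 *	equal, so the result is consistent in both argument orders.
 */
static int
files_first_cmp(const FTSENT * const *a, const FTSENT * const *b)
{
	int a_dir = ((*a)->fts_info == FTS_D);
	int b_dir = ((*b)->fts_info == FTS_D);

	/* -1: only b is a directory; 1: only a is; 0: same kind. */
	return (a_dir - b_dir);
}

/* cmp_files_first -- hypothetical test helper using fake entries. */
static int
cmp_files_first(int a_info, int b_info)
{
	FTSENT ea, eb;
	const FTSENT * const pa = &ea;
	const FTSENT * const pb = &eb;

	memset(&ea, 0, sizeof(ea));
	memset(&eb, 0, sizeof(eb));
	ea.fts_info = a_info;
	eb.fts_info = b_info;
	return (files_first_cmp(&pa, &pb));
}
```

This still does not distinguish one regular file from another, so a
sort that is not stable could reorder files relative to the command
line; it addresses the backwards ordering only, not the original
complaint in this PR.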
 
 >  M. K. McKusick has indicated in seminars that modern disk drives
 >  lie to the driver about their physical layouts.  The use of
 >  mastercmp in cp.c is a legacy optimization from a different
 >  era of disk technology.  We recommend removing this call
 >  from cp.c to address 53475.
 
 Large seeks (especially ones larger than the drive's cache) still
 matter, and I think drivers rarely lie about these.  cp's attempted
 optimization is more about second-guessing what ffs does.  I agree
 that it shouldn't do this.  The file system might not even be ffs.
 
 Bruce
>Unformatted:
