From swear@blarg.net  Thu Mar  7 12:55:52 2002
Return-Path: <swear@blarg.net>
Received: from lists.blarg.net (lists.blarg.net [206.124.128.17])
	by hub.freebsd.org (Postfix) with ESMTP id 1FA4437B402
	for <FreeBSD-gnats-submit@freebsd.org>; Thu,  7 Mar 2002 12:55:52 -0800 (PST)
Received: from thig.blarg.net (thig.blarg.net [206.124.128.18])
	by lists.blarg.net (Postfix) with ESMTP id C4DE1BD87
	for <FreeBSD-gnats-submit@freebsd.org>; Thu,  7 Mar 2002 12:55:51 -0800 (PST)
Received: from localhost.localdomain ([206.124.139.115])
	by thig.blarg.net (8.9.3/8.9.3) with ESMTP id MAA20763
	for <FreeBSD-gnats-submit@freebsd.org>; Thu, 7 Mar 2002 12:55:51 -0800
Received: (from jojo@localhost)
	by localhost.localdomain (8.11.6/8.11.3) id g27Kx4646723;
	Thu, 7 Mar 2002 12:59:04 -0800 (PST)
	(envelope-from swear@blarg.net)
Message-Id: <e0bse0ccx4.se0@localhost.localdomain>
Date: 07 Mar 2002 12:59:03 -0800
From: "Gary W. Swearingen" <swear@blarg.net>
Reply-To: swear@blarg.net
To: FreeBSD-gnats-submit@freebsd.org
Subject: cp(1) page needs a "Bugs" section.
X-GNATS-Notify:

>Number:         35646
>Category:       docs
>Synopsis:       cp(1) page needs a "Bugs" section.
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-doc
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Thu Mar 07 13:00:05 PST 2002
>Closed-Date:    Tue May 16 20:58:11 GMT 2006
>Last-Modified:  Tue May 16 20:58:11 GMT 2006
>Originator:     Gary W. Swearingen
>Release:        FreeBSD 4.5-STABLE i386
>Organization:
none
>Environment:
n/a
================
>Description:

The cp(1) program has a feature that should be documented in a "Bugs"
section.  (Or can there be a "Warnings" section?)

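The "cp" program removes "holes" from "sparse" files while copying.
The copy has the same length and contents as the original, but the
holes are written out as zero-filled blocks, so the copy can occupy
far more disk space and is, in that sense, not an exact copy.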
================
>How-To-Repeat:

$ df .
Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
/dev/ad0s2f   4530961  1477900  2690585    35%    /u
$ dd if=/dev/zero of=zeros-sparse oseek=1000 count=2
2+0 records in
2+0 records out
1024 bytes transferred in 0.000205 secs (4994148 bytes/sec)
$ df .
Filesystem   1K-blocks     Used    Avail Capacity  Mounted on
/dev/ad0s2f    4530961  1477916  2690569    35%    /u
$ cp zeros-sparse zeros-cp
$ df .
Filesystem   1K-blocks     Used    Avail Capacity  Mounted on
/dev/ad0s2f    4530961  1478428  2690057    35%    /u
$ l zero*
-rw-r-----  1 root  wheel  513024 Mar  7 12:42 zeros-cp
-rw-r-----  1 root  wheel  513024 Mar  7 12:42 zeros-sparse
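
The df(1) deltas above are a coarse way to see the effect.  As a
minimal sketch, assuming a POSIX system where stat(2) reports
st_blocks in 512-byte units, the following Python compares a file's
apparent length with the space allocated to it.  The helpers
make_sparse and size_report are hypothetical names used only for
illustration; they are not part of cp(1) or any FreeBSD tool.

```python
import os


def make_sparse(path, hole_bytes=1000 * 512, data=b"\0" * 1024):
    """Create a file with a leading hole, like `dd oseek=1000 count=2`."""
    with open(path, "wb") as f:
        f.seek(hole_bytes)   # skip ahead without writing: this is the hole
        f.write(data)        # only these bytes are actually written


def size_report(path):
    """Return (apparent_bytes, allocated_bytes) for path."""
    st = os.stat(path)
    # Assumption: st_blocks counts 512-byte units (true on FreeBSD and Linux).
    return st.st_size, st.st_blocks * 512
```

On a filesystem that supports holes, the allocated figure for the
sparse file should come out well below its 513024-byte apparent size,
while a copy made as in the transcript above should show the two
roughly equal.  The exact allocated numbers depend on the filesystem.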

================
>Fix:

Add a "Bugs" section explaining about "sparse" files and "holes"
and how "cp" handles them.
>Release-Note:
>Audit-Trail:

From: Giorgos Keramidas <keramida@freebsd.org>
To: "Gary W. Swearingen" <swear@blarg.net>
Cc: bug-followup@freebsd.org
Subject: Re: docs/35646: cp(1) page needs a "Bugs" section.
Date: Fri, 8 Mar 2002 02:07:24 +0200 (EET)

 Gary W. Swearingen wrote:
 
 > The cp(1) program has a feature that should be documented in a "Bugs"
 > section.  (Or can there be a "Warnings" section?)
 
 Are you sure this should be documented in the manual page of cp(1) ?
 Any program that copies data and doesn't take special care of 'holes' will
 show similar behavior.  Should we modify their manual pages too?
 
 	[ See what dd(1) does instead of cp(1) below. ]
 
 	hades:~> cd /tmp
 	hades:/tmp> df .
 	Filesystem   1K-blocks     Used    Avail Capacity  Mounted on
 	/dev/ad0s1a     194548    65916   113069    37%    /
 	hades:/tmp> dd if=/dev/zero of=zeros-sparse oseek=1000 count=2
 	2+0 records in
 	2+0 records out
 	1024 bytes transferred in 0.000685 secs (1494682 bytes/sec)
 	hades:/tmp> df .
 	Filesystem   1K-blocks     Used    Avail Capacity  Mounted on
 	/dev/ad0s1a     194548    65932   113053    37%    /
 	hades:/tmp> dd if=zeros-sparse of=zeros-dd
 	1002+0 records in
 	1002+0 records out
 	513024 bytes transferred in 0.286582 secs (1790147 bytes/sec)
 
 	^^^ Many blocks copied.
 
 	hades:/tmp> df .
 	Filesystem   1K-blocks     Used    Avail Capacity  Mounted on
 	/dev/ad0s1a     194548    66444   112541    37%    /
 	hades:/tmp> ls -l zeros-*
 	-rw-r--r--  1 charon  wheel  - 513024 Mar  8 02:03 zeros-dd
 	-rw-r--r--  1 charon  wheel  - 513024 Mar  8 02:03 zeros-sparse
 
 Note that I'm not opposing the change.  I'm only asking for ideas about all the
 possible programs that will behave exactly like cp(1) and dd(1) do, when they
 find files with 'holes'.
 
 Giorgos Keramidas                       FreeBSD Documentation Project
 keramida@{freebsd.org,ceid.upatras.gr}  http://www.FreeBSD.org/docproj/
 

From: swear@blarg.net (Gary W. Swearingen)
To: Giorgos Keramidas <keramida@freebsd.org>
Cc: bug-followup@freebsd.org
Subject: Re: docs/35646: cp(1) page needs a "Bugs" section.
Date: 07 Mar 2002 19:43:33 -0800

 Giorgos Keramidas <keramida@freebsd.org> writes:
 
 > Are you sure this should be documented in the manual page of cp(1) ?
 > Any program that copies data and doesn't take special care of 'holes' will
 > show similar behavior.  Should we modify their manual pages too?
 > 
 > 	[ See what dd(1) does instead of cp(1) below. ]
 [snip...]
 > Note that I'm not opposing the change.  I'm only asking for ideas about all the
 > possible programs that will behave exactly like cp(1) and dd(1) do, when they
 > find files with 'holes'.
 
 You're clever to think of such things.  If the OS could always hide the
 fact that it was compressing or uncompressing files like this, then it
 would never need mentioning outside the filesystem documentation.  But it
 doesn't.  A user of "cp" or "dd" should be able to predict, based on his
 reading of the man page or maybe some handbook, whether his use of the
 command will over-fill his filesystem.  Currently, he must resort to
 trial and error, a method dear to many UNIX users, but not to many
 others. (Of course, many will not read about it until being bitten.)
 
 Such knowledge probably should also be available to users of ">", "|",
 "cat", and probably some others.  Probably less important for "vi",
 "sed", "awk", because few have expectations as to the size of their
 outputs.  It's going to go undocumented in many cases, but I think
 "cp" and "dd" are special cases as one often cares much about their
 outputs.  One expects a copy to be identical to the original for all
 purposes, not just most purposes.  I've seen the issue discussed before
 and it would have been nice to be able to point to documentation on it.
 
 But then, I didn't provide such documentation...

From: Giorgos Keramidas <keramida@freebsd.org>
To: "Gary W. Swearingen" <swear@blarg.net>
Cc: bug-followup@freebsd.org
Subject: Re: docs/35646: cp(1) page needs a "Bugs" section.
Date: Fri, 8 Mar 2002 05:59:18 +0200 (EET)

 Gary W. Swearingen wrote:
 
 > Giorgos Keramidas <keramida@freebsd.org> writes:
 >
 > > Are you sure this should be documented in the manual page of cp(1) ?
 > > Any program that copies data and doesn't take special care of 'holes' will
 > > show similar behavior.  Should we modify their manual pages too?
 > >
 > > 	[ See what dd(1) does instead of cp(1) below. ]
 > [snip...]
 > > Note that I'm not opposing the change.  I'm only asking for ideas about all the
 > > possible programs that will behave exactly like cp(1) and dd(1) do, when they
 > > find files with 'holes'.
 >
 > You're clever to think of such things.  If the OS could always hide the
 > fact that it was compressing or uncompressing files like this, then it
 > would never need mentioning outside the filesystem documentation.  But it
 > doesn't.  A user of "cp" or "dd" should be able to predict, based on his
 > reading of the man page or maybe some handbook, whether his use of the
 > command will over-fill his filesystem.  Currently, he must resort to
 > trial and error, a method dear to many UNIX users, but not to many
 > others. (Of course, many will not read about it until being bitten.)
 
 A more general solution is needed.  This is what I was trying to point out.
 Many commands will do strange things with files that have holes.  A few
 that I could think of off the top of my head were:
 
 	cat file1 > file2
 	cat < file1 > file2
 	cp file1 file2
 	awk scripts
 	sed scripts
 	perl filters
 
 Practically, any command that does not have knowledge of the underlying
 filesystem data-structures will copy the 'wrong' amount of data.  AFAIK,
 only dump(8) and restore(8) handle files with holes correctly; but these
 commands work directly on the filesystem device.
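
To illustrate what "special care" of holes could look like at user
level, here is a minimal Python sketch that seeks over all-zero blocks
instead of writing them.  It is an example under stated assumptions,
not how cp(1), dd(1), dump(8), or restore(8) are implemented;
sparse_copy and its blocksize parameter are hypothetical.  It cannot
tell a real hole from zeros that were explicitly written, and whether
the skipped ranges actually become holes is up to the destination
filesystem.

```python
import os


def sparse_copy(src, dst, blocksize=4096):
    """Copy src to dst, seeking over all-zero blocks instead of writing them."""
    zero = bytes(blocksize)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(blocksize)
            if not buf:
                break
            if buf == zero[:len(buf)]:
                # All zeros: advance the output offset without writing,
                # which lets the filesystem leave a hole here.
                fout.seek(len(buf), os.SEEK_CUR)
            else:
                fout.write(buf)
        # Set the file length to the current offset so that a trailing
        # run of zeros is not lost.
        fout.truncate()
```

The copy reads back byte-for-byte identical to the source, because
unwritten ranges read as zeros.  The trade-off is that every block has
to be read and compared, so the copy costs as much I/O on the input
side as an ordinary one.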
 
 I'll have to think about this a bit more.  I'll get back to you soon.
 
 Giorgos Keramidas                       FreeBSD Documentation Project
 keramida@{freebsd.org,ceid.upatras.gr}  http://www.FreeBSD.org/docproj/
 
State-Changed-From-To: open->closed 
State-Changed-By: keramida 
State-Changed-When: Tue May 16 20:56:23 UTC 2006 
State-Changed-Why:  
I don't think there is a general way to document sparse files in 
all the possible places where they may come up as something 
resembling a "surprise" for users who don't know their internals. 

Documentation about sparse files doesn't belong in all the manpages 
but in introductory UNIX documentation, IMHO. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=35646 
>Unformatted:
