From swear@blarg.net  Sun Jan 13 16:59:04 2002
Return-Path: <swear@blarg.net>
Received: from lists.blarg.net (lists.blarg.net [206.124.128.17])
	by hub.freebsd.org (Postfix) with ESMTP id A649437B41D
	for <FreeBSD-gnats-submit@freebsd.org>; Sun, 13 Jan 2002 16:59:03 -0800 (PST)
Received: from thig.blarg.net (thig.blarg.net [206.124.128.18])
	by lists.blarg.net (Postfix) with ESMTP id 5F220BD30
	for <FreeBSD-gnats-submit@freebsd.org>; Sun, 13 Jan 2002 16:59:03 -0800 (PST)
Received: from localhost.localdomain ([206.124.139.115])
	by thig.blarg.net (8.9.3/8.9.3) with ESMTP id QAA27267
	for <FreeBSD-gnats-submit@freebsd.org>; Sun, 13 Jan 2002 16:59:02 -0800
Received: (from jojo@localhost)
	by localhost.localdomain (8.11.6/8.11.3) id g0E11se03961;
	Sun, 13 Jan 2002 17:01:54 -0800 (PST)
	(envelope-from swear@blarg.net)
Message-Id: <3hzo3hwxrx.o3h@localhost.localdomain>
Date: 13 Jan 2002 17:01:54 -0800
From: "Gary W. Swearingen" <swear@blarg.net>
Reply-To: swear@blarg.net
To: FreeBSD-gnats-submit@freebsd.org
Subject: split(1) man page implies that input file is removed.
X-GNATS-Notify:

>Number:         33852
>Category:       docs
>Synopsis:       split(1) man page implies that input file is removed.
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    keramida
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          doc-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Jan 13 17:00:04 PST 2002
>Closed-Date:    Sat Jan 26 11:38:38 UTC 2008
>Last-Modified:  Sat Jan 26 11:40:00 UTC 2008
>Originator:     Gary W. Swearingen
>Release:        FreeBSD 4.5-PRERELEASE i386
>Organization:
none
>Environment:
n/a
================
>Description:

1) The split(1) man page can leave some readers wondering whether the
input file itself (or a copy of it) is split into the output files.
That is, does "split" remove the input file? (After splitting a log,
is the whole log gone?)

2) The synopsis line implies that the -b, -l, and -p options may be
used together and the -p option's description only partially clarifies
the fact that only one of those options may be used successfully.

3) The page's use of "file" for a filename and "name" for a prefix is
not as clear as it could be.
================
>How-To-Repeat:
n/a
================
>Fix:

1) Change wording.
2) Change first SYNOPSIS line from
    split [-b byte_count[k|m]] [-l line_count] [-p pattern]
to
    split [-b byte_count[k|m] | -l line_count | -p pattern]
3) Change words.

NOTE: I've put a one-line patch for split.c at the end to make "usage"
match "SYNOPSIS".  If nobody wants to use it now, I can write a PR for
it later.

--- /tmp/split..orig.1	Sun Jan 13 13:59:09 2002
+++ /tmp/split.1	Sun Jan 13 15:11:33 2002
@@ -40,17 +40,21 @@
 .Nd split a file into pieces
 .Sh SYNOPSIS
 .Nm
-.Op Fl b Ar byte_count[k|m]
-.Op Fl l Ar line_count
-.Op Fl p Ar pattern
-.Op Ar file Op Ar name
+.Op Fl b Ar byte_count[k|m] | Fl l Ar line_count | Fl p Ar pattern
+.Op Ar filename Op Ar prefix
 .Sh DESCRIPTION
 The
 .Nm
-utility reads the given
-.Ar file
+utility reads file
+.Ar filename
 (or standard input if no file is specified)
-and breaks it up into files of 1000 lines each.
+and breaks it up into files of 1000 lines
+(or an optionally specified size) each, leaving file
+.Ar filename
+unchanged.
+No padding is added, so the last new file is normally smaller than the
+others and proper catenation of the output files creates a copy of the
+unsplit original.
 .Pp
 The options are as follows:
 .Bl -tag -width Ds
@@ -77,11 +81,6 @@
 .Ar pattern ,
 which is interpreted as an extended regular expression.
 The matching line will be the first line of the next output file.
-This option is incompatible with the
-.Fl b
-and
-.Fl l
-options.
 .El
 .Pp
 If additional arguments are specified, the first is used as the name
@@ -93,13 +92,13 @@
 .Dq Li aa-zz .
 .Pp
 If the
-.Ar name
+.Ar prefix
 argument is not specified, the file is split into lexically ordered
 files named in the range of
 .Dq Li xaa-zzz .
 .Sh BUGS
 For historical reasons, if you specify
-.Ar name ,
+.Ar prefix ,
 .Nm
 can only create 676 separate
 files.
--- /tmp/split..orig.c	Sun Jan 13 15:13:54 2002
+++ /tmp/split.c	Sun Jan 13 15:18:26 2002
@@ -312,6 +312,6 @@
 usage()
 {
 	(void)fprintf(stderr,
-"usage: split [-b byte_count] [-l line_count] [-p pattern] [file [prefix]]\n");
+"usage: split [-b byte_count | -l line_count | -p pattern] [filename [prefix]]\n");
 	exit(EX_USAGE);
 }
>Release-Note:
>Audit-Trail:

From: Ruslan Ermilov <ru@FreeBSD.org>
To: "Gary W. Swearingen" <swear@blarg.net>
Cc: bug-followup@FreeBSD.org
Subject: Re: docs/33852: split(1) man page implies that input file is removed.
Date: Mon, 14 Jan 2002 11:44:01 +0200

 On Sun, Jan 13, 2002 at 05:01:54PM -0800, Gary W. Swearingen wrote:
 > 
 > 1) The split(1) man page can leave some readers wondering whether the
 > input file itself (or a copy of it) is split into the output files.
 > That is, does "split" remove the input file? (After splitting a log,
 > is the whole log gone?)
 > 
 > 2) The synopsis line implies that the -b, -l, and -p options may be
 > used together and the -p option's description only partially clarifies
 > the fact that only one of those options may be used successfully.
 > 
 > 3) The page's use of "file" for a filename and "name" for a prefix is
 > not as clear as it could be.
 > 
 [...]
 
 > >Fix:
 > 
 > 1) Change wording.
 > 2) Change first SYNOPSIS line from
 >     split [-b byte_count[k|m]] [-l line_count] [-p pattern]
 > to
 >     split [-b byte_count[k|m] | -l line_count | -p pattern]
 > 3) Change words.
 > 
 > NOTE: I've put a one-line patch for split.c at the end to make "usage"
 > match "SYNOPSIS".  If nobody wants to use it now, I can write a PR for
 > it later.
 > 
 > --- /tmp/split..orig.1	Sun Jan 13 13:59:09 2002
 > +++ /tmp/split.1	Sun Jan 13 15:11:33 2002
 > @@ -40,17 +40,21 @@
 >  .Nd split a file into pieces
 >  .Sh SYNOPSIS
 >  .Nm
 > -.Op Fl b Ar byte_count[k|m]
 > -.Op Fl l Ar line_count
 > -.Op Fl p Ar pattern
 > -.Op Ar file Op Ar name
 > +.Op Fl b Ar byte_count[k|m] | Fl l Ar line_count | Fl p Ar pattern
 > +.Op Ar filename Op Ar prefix
 > 
 I don't like changing "file" to "filename", because "file" is a
 standard value that's output if you don't give .Ar any arguments.
 
 >  .Sh DESCRIPTION
 >  The
 >  .Nm
 > -utility reads the given
 > -.Ar file
 > +utility reads file
 > +.Ar filename
 >  (or standard input if no file is specified)
 > -and breaks it up into files of 1000 lines each.
 > +and breaks it up into files of 1000 lines
 > +(or an optionally specified size) each, leaving file
 > +.Ar filename
 > +unchanged.
 > 
 or an optionally specified pattern (-p).
 
 This IMO unnecessarily duplicates the option descriptions.
 
 > +No padding is added, so the last new file is normally smaller than the
 > +others and proper catenation of the output files creates a copy of the
 > +unsplit original.
 > 
 This clause is not true for the -p case, which is not size-constrained.
 
 I'd be happy to commit this patch instead, if you like (based on
 your version):
 
 Index: split.1
 ===================================================================
 RCS file: /home/ncvs/src/usr.bin/split/split.1,v
 retrieving revision 1.6
 diff -u -p -r1.6 split.1
 --- split.1	2001/07/15 08:01:34	1.6
 +++ split.1	2002/01/14 09:41:17
 @@ -40,48 +40,44 @@
  .Nd split a file into pieces
  .Sh SYNOPSIS
  .Nm
 -.Op Fl b Ar byte_count[k|m]
 -.Op Fl l Ar line_count
 -.Op Fl p Ar pattern
 -.Op Ar file Op Ar name
 +.Op Fl b Ar byte_count Ns Oo Cm k Ns | Ns Cm m Oc | Fl l Ar line_count | Fl p Ar pattern
 +.Op Ar file Op Ar prefix
  .Sh DESCRIPTION
  The
  .Nm
  utility reads the given
  .Ar file
  (or standard input if no file is specified)
 -and breaks it up into files of 1000 lines each.
 +and breaks it up into files of 1000 lines each
 +(if no options are specified), leaving the
 +.Ar file
 +unchanged.
  .Pp
  The options are as follows:
 -.Bl -tag -width Ds
 -.It Fl b
 +.Bl -tag -width indent
 +.It Fl b Ar byte_count Ns Op Cm k Ns | Ns Cm m
  Create smaller files
  .Ar byte_count
  bytes in length.
  If
 -.Dq Li k
 +.Cm k
  is appended to the number, the file is split into
  .Ar byte_count
  kilobyte pieces.
  If
 -.Dq Li m
 +.Cm m
  is appended to the number, the file is split into
  .Ar byte_count
  megabyte pieces.
 -.It Fl l
 +.It Fl l Ar line_count
  Create smaller files
 -.Ar n
 +.Ar line_count
  lines in length.
  .It Fl p Ar pattern
  The file is split whenever an input line matches
  .Ar pattern ,
  which is interpreted as an extended regular expression.
  The matching line will be the first line of the next output file.
 -This option is incompatible with the
 -.Fl b
 -and
 -.Fl l
 -options.
  .El
  .Pp
  If additional arguments are specified, the first is used as the name
 @@ -90,16 +86,16 @@ If a second additional argument is speci
  for the names of the files into which the file is split.
  In this case, each file into which the file is split is named by the
  prefix followed by a lexically ordered suffix in the range of
 -.Dq Li aa-zz .
 +.Dq Li aa Ns - Ns Li zz .
  .Pp
  If the
 -.Ar name
 +.Ar prefix
  argument is not specified, the file is split into lexically ordered
  files named in the range of
  .Dq Li xaa-zzz .
  .Sh BUGS
  For historical reasons, if you specify
 -.Ar name ,
 +.Ar prefix ,
  .Nm
  can only create 676 separate
  files.
 Index: split.c
 ===================================================================
 RCS file: /home/ncvs/src/usr.bin/split/split.c,v
 retrieving revision 1.8
 diff -u -p -r1.8 split.c
 --- split.c	2001/12/12 23:09:07	1.8
 +++ split.c	2002/01/14 09:41:18
 @@ -116,11 +116,6 @@ main(argc, argv)
  			else if (*ep == 'm')
  				bytecnt *= 1048576;
  			break;
 -		case 'p' :      /* pattern matching. */
 -			if (regcomp(&rgx, optarg, REG_EXTENDED|REG_NOSUB) != 0)
 -				errx(EX_USAGE, "%s: illegal regexp", optarg);
 -			pflag = 1;
 -			break;
  		case 'l':		/* Line count. */
  			if (numlines != 0)
  				usage();
 @@ -128,6 +123,11 @@ main(argc, argv)
  				errx(EX_USAGE,
  				    "%s: illegal line count", optarg);
  			break;
 +		case 'p' :		/* Pattern matching. */
 +			if (regcomp(&rgx, optarg, REG_EXTENDED|REG_NOSUB) != 0)
 +				errx(EX_USAGE, "%s: illegal regexp", optarg);
 +			pflag = 1;
 +			break;
  		default:
  			usage();
  		}
 @@ -311,6 +311,6 @@ static void
  usage()
  {
  	(void)fprintf(stderr,
 -"usage: split [-b byte_count] [-l line_count] [-p pattern] [file [prefix]]\n");
 +"usage: split [-b byte_count | -l line_count | -p pattern] [file [prefix]]\n");
  	exit(EX_USAGE);
  }
 
 
 -- 
 Ruslan Ermilov		Oracle Developer/DBA,
 ru@sunbay.com		Sunbay Software AG,
 ru@FreeBSD.org		FreeBSD committer,
 +380.652.512.251	Simferopol, Ukraine
 
 http://www.FreeBSD.org	The Power To Serve
 http://www.oracle.com	Enabling The Information Age

From: swear@blarg.net (Gary W. Swearingen)
To: Ruslan Ermilov <ru@FreeBSD.org>
Cc: bug-followup@FreeBSD.org
Subject: Re: docs/33852: split(1) man page implies that input file is removed.
Date: 14 Jan 2002 11:29:54 -0800

 Ruslan Ermilov <ru@FreeBSD.org> writes:
 
 > I don't like changing "file" to "filename", because "file" is a
 > standard value that's output if you don't give .Ar any arguments.
 
 Bad conventions are better (in this case) than none. :-)
 
 > +and breaks it up into files of 1000 lines each
 > +(if no options are specified), leaving the
 > +.Ar file
 > +unchanged.
 
 That should be either "leaving the file _file_ unchanged" or "leaving
 _file_ unchanged", because "the _file_" reads as referring to a filename
 or option argument, which nobody would worry about changing.
 
 > -.Bl -tag -width Ds
 > -.It Fl b
 > +.Bl -tag -width indent
 > +.It Fl b Ar byte_count Ns Op Cm k Ns | Ns Cm m
 
 I think it's better to leave option arguments out of the option
 description labels and leave them in the synopsis (at least for small
 man pages where the synopsis is easily viewed).  It should result in
 fewer man page bugs.  When an option has several forms of arguments or
 is otherwise complex, it is probably best buried in the description and
 still not in the description label.  But I've seen it done both ways.
 
 >> +No padding is added, so the last new file is normally smaller than the
 >> +others and proper catenation of the output files creates a copy of the
 >> +unsplit original.
 
 I'm glad you caught my -p (non-fixed-size chunks) oversight, but I
 wonder if you would replace my sentence above with:
 
     +No padding is added, so the proper catenation of the output files
     +creates a copy of the unsplit original.
 
 It could be at the end of the DESCRIPTION first paragraph, or as a new
 last paragraph of the DESCRIPTION.  (I thought it best to avoid shell
 specifics by not mentioning "cat prefix* >copy-of-original".)
 
 Users shouldn't have to experiment to determine that padding is not
 performed, especially since the first paragraph of the DESCRIPTION
 will imply that it does pad out to 1000 lines (by default).
 
 > -"usage: split [-b byte_count] [-l line_count] [-p pattern] [file [prefix]]\n");
 > +"usage: split [-b byte_count | -l line_count | -p pattern] [file [prefix]]\n");
 
 You might want to break that into two shorter lines.  I wasn't 100% sure
 how to do it.  What's the FreeBSD standard limit?
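
The round trip discussed in this follow-up can be sketched in shell as
below. This is a minimal illustration, not part of the PR: it assumes a
Bourne-compatible shell with mktemp(1), awk(1), split(1), cat(1), wc(1)
and cmp(1) available, and the names original.txt, copy-of-original.txt
and the prefix "part." are hypothetical.

```shell
# Work in a scratch directory that is removed when the shell exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Build a 2500-line input file (hypothetical name).
awk 'BEGIN { for (i = 1; i <= 2500; i++) print i }' > original.txt

# Default split: pieces of 1000 lines each, named part.aa, part.ab, ...
split original.txt part.

# The last piece is simply shorter than the others; no padding is added.
wc -l part.*

# Catenating the pieces in lexical order reproduces the input,
# and the input file itself is still in place.
cat part.* > copy-of-original.txt
cmp original.txt copy-of-original.txt && echo "copy matches original"
```

The shell expands "part.*" in lexical order, which is the order in which
split names its output files, so a plain glob is enough for the pieces
to be catenated correctly here.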

From: Giorgos Keramidas <keramida@freebsd.org>
To: Ruslan Ermilov <ru@freebsd.org>
Cc: bug-followup@freebsd.org
Subject: Re: docs/33852: docs/33852: split(1) man page implies that input file is removed.
Date: Tue, 16 May 2006 18:05:33 +0300

 Hi Ruslan,
 Any chance we can commit the last patch of this PR?
 
 http://www.freebsd.org/cgi/query-pr.cgi?pr=docs/33852
 
 IMHO, it looks ok, but I don't think we can expect Gary to review
 it any time soon now...
 

From: Ruslan Ermilov <ru@FreeBSD.org>
To: Giorgos Keramidas <keramida@FreeBSD.org>
Cc: bug-followup@FreeBSD.org
Subject: Re: docs/33852: docs/33852: split(1) man page implies that input file is removed.
Date: Tue, 16 May 2006 18:21:44 +0300

 On Tue, May 16, 2006 at 06:05:33PM +0300, Giorgos Keramidas wrote:
 > Hi Ruslan,
 > Any chance we can commit the last patch of this PR?
 > 
 > http://www.freebsd.org/cgi/query-pr.cgi?pr=docs/33852
 > 
 > IMHO, it looks ok, but I don't think we can expect Gary to review
 > it any time soon now...
 > 
 If you have time for this, go ahead and borrow the text
 from POSIX.  I think it should fix all the issues that
 are mentioned in the PR (except adding the -p option).
 I mean, a good SYNOPSIS in my opinion would look like
 this:
 
 SYNOPSIS
      split [-l line_count] [-a suffix_length] [file [name]]
      split -b byte_count[k|m] [-a suffix_length] [file [name]]
      split -p pattern [-a suffix_length] [file [name]]
 
 Feel free to also borrow any changes in option and
 argument names, and any descriptional text if it makes
 it look better.  Just make sure the SYNOPSIS and
 usage() stay in sync.
 
 
 Cheers,
 -- 
 Ruslan Ermilov
 ru@FreeBSD.org
 FreeBSD committer
State-Changed-From-To: open->patched 
State-Changed-By: keramida 
State-Changed-When: Tue Aug 8 21:26:22 UTC 2006 
State-Changed-Why:  
I've adapted Ruslan's patch to the current state of CURRENT 
and committed it. 


Responsible-Changed-From-To: freebsd-doc->keramida 
Responsible-Changed-By: keramida 
Responsible-Changed-When: Tue Aug 8 21:26:22 UTC 2006 
Responsible-Changed-Why:  
MFC reminder. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=33852 
State-Changed-From-To: patched->closed 
State-Changed-By: keramida 
State-Changed-When: Sat Jan 26 11:38:07 UTC 2008 
State-Changed-Why:  
Merged to RELENG_6 too, as revision 1.15.2.1 

http://www.freebsd.org/cgi/query-pr.cgi?pr=33852 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: docs/33852: commit references a PR
Date: Sat, 26 Jan 2008 11:37:59 +0000 (UTC)

 keramida    2008-01-26 11:37:54 UTC
 
   FreeBSD src repository (doc committer)
 
   Modified files:        (Branch: RELENG_6)
     usr.bin/split        split.1 
   Log:
   MFC: 1.19
   
   Update usage & SYNOPSIS and clarify that input files are not removed.
   Sort getopt option handling of -p too, while here.
   
   The changes are adapted from a patch by Ruslan Ermilov, posted as
   followup to docs/33852.
   
   PR:             docs/33852
   Submitted by:   Gary W. Swearingen <swear@blarg.net>
   
   Revision  Changes    Path
   1.15.2.1  +22 -10    src/usr.bin/split/split.1
 
>Unformatted:
