From abc@anchorageinternet.org  Thu Oct 17 22:40:13 2002
Return-Path: <abc@anchorageinternet.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 747E037B406
	for <FreeBSD-gnats-submit@freebsd.org>; Thu, 17 Oct 2002 22:40:13 -0700 (PDT)
Received: from groggy.anc.acsalaska.net (groggy.anc.acsalaska.net [208.151.119.232])
	by mx1.FreeBSD.org (Postfix) with ESMTP id A55AC43EAC
	for <FreeBSD-gnats-submit@freebsd.org>; Thu, 17 Oct 2002 22:40:11 -0700 (PDT)
	(envelope-from abc@anchorageinternet.org)
Received: from en26.groggy.anc.acsalaska.net (root@printer [192.168.0.26])
	by groggy.anc.acsalaska.net (8.11.6/8.11.6) with ESMTP id g9I5e8Y18050
	for <FreeBSD-gnats-submit@freebsd.org>; Thu, 17 Oct 2002 21:40:08 -0800 (AKDT)
	(envelope-from abc@anchorageinternet.org)
Received: (from abc@localhost)
	by en26.groggy.anc.acsalaska.net (8.12.6/8.12.6) id g9I5e9U9038266
	for "Send-PR" <FreeBSD-gnats-submit@freebsd.org>; Fri, 18 Oct 2002 05:40:09 GMT
	(envelope-from abc@anchorageinternet.org)
Message-Id: <200210180540.g9I5e9U9038266@en26.groggy.anc.acsalaska.net>
Date: Fri, 18 Oct 2002 05:40:09 GMT
From: abc@anchorageinternet.org
To: "Send-PR" <FreeBSD-gnats-submit@freebsd.org>
Subject: globbing/argument limits

>Number:         44195
>Category:       misc
>Synopsis:       globbing/argument limits
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Thu Oct 17 22:50:01 PDT 2002
>Closed-Date:    Fri Oct 18 06:53:32 PDT 2002
>Last-Modified:  Fri Oct 18 06:53:32 PDT 2002
>Originator:     Joe Public
>Release:        i386 FreeBSD 4.7-RELEASE
>Organization:
no org
>Environment:
^^^^^^^^^^^^^^^^^^^^^^^^
>Description:
argument limits painful to users in days of 100GB drives.
>How-To-Repeat:
try a command and give it a few thousand arguments,
                like running a file-modifying command on a folder with 6000 files.
                find(1) is too slow, and combining it with xargs is a kludge.
                there has to be a better solution than imposing these
                arbitrary limits on arguments.  user limits in /etc/login.conf,
                or something like that, should be used to limit use of
                utilities, not compiled-in defines.
>Fix:
dunno.
>Release-Note:
>Audit-Trail:

From: Peter Pentchev <roam@ringlet.net>
To: abc@anchorageinternet.org
Cc: bug-followup@FreeBSD.org
Subject: Re: misc/44195: globbing/argument limits
Date: Fri, 18 Oct 2002 13:36:36 +0300

 On Fri, Oct 18, 2002 at 05:40:09AM +0000, abc@anchorageinternet.org wrote:
 >
 > >Number:         44195
 > >Category:       misc
 > >Synopsis:       globbing/argument limits
 > >Originator:     Joe Public
 > >Release:        i386 FreeBSD 4.7-RELEASE
 > >Organization:
 > no org
 > >Environment:
 > ^^^^^^^^^^^^^^^^^^^^^^^^
 > >Description:
 > argument limits painful to users in days of 100GB drives.
 > >How-To-Repeat:
 > try a command and give it a few thousand arguments,
 
 It is not a matter of how many arguments you give to a command, it is
 simply a matter of how *long* the command line becomes.  Lugging around
 a multimegabyte command line buffer through shells, execv() system calls
 and such would be a *major* strain on your system.
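
To make the length-versus-count point concrete, here is a rough sketch that asks the system for its limit and compares it with the space a given argument list would take. It assumes a POSIX getconf that reports ARG_MAX; the helper name arg_bytes and the file names are made up for illustration, and the count ignores the environment and the per-pointer overhead that is also charged against the limit.

```shell
#!/bin/sh
# Limit, in bytes, on the combined size of the arguments and environment
# that a single exec call may hand to a new process.
limit=$(getconf ARG_MAX)

# arg_bytes (hypothetical helper): space needed for the given arguments,
# counting one terminating NUL per argument.  ${#a} counts characters,
# which equals bytes for plain ASCII names.
arg_bytes() {
    total=0
    for a in "$@"; do
        total=$((total + ${#a} + 1))
    done
    echo "$total"
}

used=$(arg_bytes a.ttf b.ttf c.ttf)
echo "limit: $limit bytes, this argument list: $used bytes"
```

Many short names and a few very long ones can cost the same, which is why the limit is expressed in bytes rather than in number of arguments.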
 
 > like in file modifying command a folder with 6000 files.
 > find(1) is too slow, and combining it with xargs is a kludge.
 
 If you mean that 'find -exec' is too slow, then I would argue that using
 -exec is the kludge, when xargs(1) is available.  I am pretty sure that
 the find(1) and xargs(1) utilities were actually developed together,
 with a common goal in mind, that goal being *exactly* processing of
 multiple files in one go.
 
 The -exec primary to find(1) is extremely inefficient when dealing with
 many files - it spawns a new process for each file it finds, which, as
 you note, is too slow.  The xargs utility will do a much better job; I
 would be very interested in what exactly you consider to be a kludge
 about it.
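
The difference in process count can be seen with a small experiment. This is only a sketch: it builds 200 empty files in a scratch directory (the directory name and file count are arbitrary choices) and counts how often the command is started in each style. It assumes mktemp accepts a template argument and that the file names contain no whitespace.

```shell
#!/bin/sh
# Scratch directory with 200 empty files.
dir=$(mktemp -d "${TMPDIR:-/tmp}/argdemo.XXXXXX")
i=1
while [ "$i" -le 200 ]; do
    : > "$dir/file$i"
    i=$((i + 1))
done

# 'find -exec ... \;' starts the command once per file found.
per_file=$(find "$dir" -type f -exec sh -c 'echo run' \; | wc -l | tr -d ' ')

# xargs collects the names and starts the command once per batch;
# 200 short names fit comfortably into a single batch.
batched=$(find "$dir" -type f -print | xargs sh -c 'echo run' sh | wc -l | tr -d ' ')

echo "find -exec: $per_file invocations"
echo "xargs:      $batched invocation(s)"
rm -rf "$dir"
```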
 
 > there has to be a better solution than imposing these
 > arbitrary limits on arguments.  user limits in /etc/login.conf,
 > or something like that, should be used to limit use of
 > utilities, not compiled-in defines.
 
 As explained above, the limits are not arbitrary, but governed by strict
 common sense when it comes to passing buffers both between userland
 utilities and through multiple crossings of the userland/kernel boundary
 in system calls.
 
 G'luck,
 Peter
 
 PS. This will very probably be my last post on this subject, and nobody
 should be surprised if this PR is closed very soon; what with the recent
 mailing list "activity", it scores big on my troll indicator.  I could
 be wrong, of course, but I'm just stating my opinion here.
 
 -- 
 Peter Pentchev	roam@ringlet.net	roam@FreeBSD.org
 PGP key:	http://people.FreeBSD.org/~roam/roam.key.asc
 Key fingerprint	FDBA FD79 C26F 3C51 C95E  DF9E ED18 B68D 1619 4553
 This would easier understand fewer had omitted.
 

From: abc@anchorageinternet.org
To: Peter Pentchev <roam@ringlet.net>
Cc:  
Subject: Re: misc/44195: globbing/argument limits
Date: Fri, 18 Oct 2002 11:57:36 GMT

 > > >Number:         44195
 > > >Category:       misc
 > > >Synopsis:       globbing/argument limits
 > > >Originator:     Joe Public
 > > >Release:        i386 FreeBSD 4.7-RELEASE
 > > >Organization:
 > > no org
 > > >Environment:
 > > ^^^^^^^^^^^^^^^^^^^^^^^^
 > > >Description:
 > > argument limits painful to users in days of 100GB drives.
 > > >How-To-Repeat:
 > > try a command and give it a few thousand arguments,
 > 
 > It is not a matter of how many arguments you give to a command, it is
 > simply a matter of how *long* the command line becomes.  Lugging around
 > a multimegabyte command line buffer through shells, execv() system calls
 > and such would be a *major* strain on your system.
 
 i would've assumed the command line stays in place in memory,
 and only a pointer is passed around - and checking the exec(3)
 manpage seems to show this is the case in fact.
 
 > > like in file modifying command a folder with 6000 files.
 > > find(1) is too slow, and combining it with xargs is a kludge.
 > 
 > If you mean that 'find -exec' is too slow, then I would argue that using
 > -exec is the kludge, when xargs(1) is available.  I am pretty sure that
 > the find(1) and xargs(1) utilities were actually developed together,
 > with a common goal in mind, that goal being *exactly* processing of
 > multiple files in one go.
 
 ok - interesting - i appreciate you explaining this.
 it should be in the FAQ or something.
  
 > The -exec primary to find(1) is extremely inefficient when dealing with
 > many files - it spawns a new process for each file it finds, which, as
 > you note, is too slow.  The xargs utility will do a much better job; I
 > would be very interested in what exactly do you consider to be a kludge
 > about it.
 
 i consider it to be a kludge when you have to:
 
 find -s "$I" ! -type d | xargs tar rvf "$I.tar" && \
 gzip -f9 "$I.tar" && mv "$I.tar.gz" "$I.tgz"
 
 just to create a sorted tar/gzip archive
 of a directory tree.  hehe - as i look at
 it - i say to myself "this is bullshit" :).
 i mean - UNIX hackers have got to be smarter
 than to demand all that from a user just to
 accomplish such a minor ordinary task.
 
 also, when you do something like:
 
     find / \!   \(  -path \*/bin/\*     -or -path \*/lib/\*         \
                 -or -path \*/libexec/\* -or -path /usr/games/\*     \
                 -or -path \*/sbin/\*    -or -path /boot/\*          \
                 -or -path /dev/\*       -or -path /modules/\*       \
                 -or -path /proc/\*      -or -path /root/\*      \)  \
                 -type f -exec chown     root:wheel {}               \;\
                         -exec chmod     0644 {}                     \;
 
 ie, with something like a find(1) with 2 -exec's, xargs fails,
 and you are forced to double or triple (or more) the code
 it takes - according to the number of -exec's you need to
 perform - i consider this to be a kludge as well.
 
 > > there has to be a better solution than imposing these
 > > arbitrary limits on arguments.  user limits in /etc/login.conf,
 > > or something like that, should be used to limit use of
 > > utilities, not compiled-in defines.
 > 
 > As explained above, the limits are not arbitrary, but governed by strict
 > common sense when it comes to passing buffers both between userland
 > utilities and through multiple crossings of the userland/kernel boundary
 > in system calls.
 
 i am not much of a C hacker these days, so i will respect what
 you say, but i don't see why you say we are passing
 'megabytes of buffers' around.  the 7000 fonts i got are
 only 150k of command line for example - and as i said,
 upon reading exec(3), it appears pointers are passed, not buffers.
  
 > G'luck,
 > Peter
 > 
 > PS. This will very probably be my last post on this subject, and nobody
 > should be surprised if this PR is closed very soon; what with the recent
 > mailing list "activity", it scores big on my troll indicator.  I could
 > be wrong, of course, but I'm just stating my opinion here.
 
 ok - well - no trolling - just installed 4.7, and hit
 a point of frustration with things that have been
 bugging me over the years - that don't seem to
 get fixed or improve.
 
 i truly value the effort you made to try to explain,
 though as stated, i still don't see the problem
 in fixing things.
 
 thank you.

From: Peter Pentchev <roam@ringlet.net>
To: abc@anchorageinternet.org
Cc: bug-followup@freebsd.org, "Kerr, Greg" <greg@kerr1.com>,
	"Choudhury, Raj" <raj.choudhury@de.opel.com>
Subject: Re: misc/44195: globbing/argument limits
Date: Fri, 18 Oct 2002 15:48:13 +0300

 On Fri, Oct 18, 2002 at 11:57:36AM +0000, abc@anchorageinternet.org wrote:
 > > > >Number:         44195
 > > > >Category:       misc
 > > > >Synopsis:       globbing/argument limits
 > > > >Originator:     Joe Public
 > > > >Release:        i386 FreeBSD 4.7-RELEASE
 > > > >Organization:
 > > > no org
 > > > >Environment:
 > > > ^^^^^^^^^^^^^^^^^^^^^^^^
 > > > >Description:
 > > > argument limits painful to users in days of 100GB drives.
 > > > >How-To-Repeat:
 > > > try a command and give it a few thousand arguments,
 > >
 > > It is not a matter of how many arguments you give to a command, it is
 > > simply a matter of how *long* the command line becomes.  Lugging around
 > > a multimegabyte command line buffer through shells, execv() system calls
 > > and such would be a *major* strain on your system.
 >
 > i would've assumed the command line stays in place in memory,
 > and only a pointer is passed around - and checking the exec(3)
 > manpage seems to show this is the case in fact.
 
 Not when exec(3) invokes the execve(2) system call, as stated in the
 very first paragraph of the exec(3) manual page.  The execve(2) system
 call needs to copy the arguments to kernel space to examine them, and
 then to build a single command line for the new process.
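
Since every execve(2) call has to copy its arguments, xargs copes with long lists by splitting them into several invocations that each stay under the limit. The sketch below makes the batching visible by capping the argument count with -n; by default xargs splits on total size instead. The input letters are placeholders.

```shell
#!/bin/sh
# -n 2 allows at most two arguments per invocation, so five inputs
# need three separate runs of echo, each one its own exec call.
out=$(printf '%s\n' a b c d e | xargs -n 2 echo)
printf '%s\n' "$out"
```

Each output line corresponds to one invocation of echo.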
 
 > > > like in file modifying command a folder with 6000 files.
 > > > find(1) is too slow, and combining it with xargs is a kludge.
 > >
 > > If you mean that 'find -exec' is too slow, then I would argue that using
 > > -exec is the kludge, when xargs(1) is available.  I am pretty sure that
 > > the find(1) and xargs(1) utilities were actually developed together,
 > > with a common goal in mind, that goal being *exactly* processing of
 > > multiple files in one go.
 >
 > ok - interesting - i appreciate you explaining this.
 > it should be in the FAQ or something.
 >
 > > The -exec primary to find(1) is extremely inefficient when dealing with
 > > many files - it spawns a new process for each file it finds, which, as
 > > you note, is too slow.  The xargs utility will do a much better job; I
 > > would be very interested in what exactly do you consider to be a kludge
 > > about it.
 >
 > i consider it to be a kludge when you have to:
 >
 > find -s "$I" ! -type d | xargs tar rvf "$I.tar" && \
 > gzip -f9 "$I.tar" && mv "$I.tar.gz" "$I.tgz"
 >
 > just to create a sorted tar/gzip archive
 > of a directory tree.  hehe - as i look at
 > it - i say to myself "this is bullshit" :).
 
 As noted in my response to your other PR, this particular use of find(1)
 and tar(1) may be optimized :)  Besides, the only "kludge" in that
 example is the need to update the tarball incrementally using tar's 'r'
 command instead of 'c'; I, personally, would not consider that too large
 a price to pay for being able to process the whole list of files at all.
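
The 'r' command is needed because xargs may run tar more than once, and 'c' would overwrite the archive on each run. One way around that, sketched here, is to let tar read the file list itself. This may or may not be the optimization meant above, since the other PR is not part of this thread. It assumes a tar that accepts "-T -" to read names from standard input and "z" for gzip, uses sort(1) in place of find's -s, and assumes file names without newlines; the scratch tree is invented for the demonstration.

```shell
#!/bin/sh
# Scratch directory so the sketch does not touch real data.
work=$(mktemp -d "${TMPDIR:-/tmp}/tardemo.XXXXXX")
I=tree                      # same variable name as in the thread
mkdir -p "$work/$I/sub"
: > "$work/$I/b.txt"; : > "$work/$I/a.txt"; : > "$work/$I/sub/c.txt"

# find lists the non-directories, sort orders them, and tar reads the
# list from standard input, so the archive is created in a single run.
(cd "$work" && find "$I" ! -type d | sort | tar -czf "$I.tgz" -T -)

# Show the archive members in the order they were stored.
listing=$(tar -tzf "$work/$I.tgz")
printf '%s\n' "$listing"
rm -rf "$work"
```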
 
 > i mean - UNIX hackers have got to be smarter
 > than to demand all that from a user just to
 > accomplish such a minor ordinary task.
 
 Yep, see both above and below :)
 
 > also, when you do something like:
 >
 >     find / \!   \(  -path \*/bin/\*     -or -path \*/lib/\*         \
 >                 -or -path \*/libexec/\* -or -path /usr/games/\*     \
 >                 -or -path \*/sbin/\*    -or -path /boot/\*          \
 >                 -or -path /dev/\*       -or -path /modules/\*       \
 >                 -or -path /proc/\*      -or -path /root/\*      \)  \
 >                 -type f -exec chown     root:wheel {}               \;\
 >                         -exec chmod     0644 {}                     \;
 >
 > ie, with something like a find(1) with 2 -exec's, xargs fails,
 > and you are forced to double or triple (or more) the code
 > it takes - according to the number of -exec's you need to
 > perform - i consider this to be a kludge as well.
 
 If you need to execute multiple commands, there are several things you
 might do.
 
 The simplest is to create a small shell script, and use xargs(1) to
 execute it; the shell script runs chown, chmod, or whatever, on all its
 arguments.
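
A minimal sketch of that first approach follows. The script name fixperm.sh, the OWNER variable and the scratch files are all hypothetical; the thread's example used root:wheel, which needs root, so the owner here defaults to the current user and group to let the sketch run unprivileged. File names are assumed to contain no whitespace.

```shell
#!/bin/sh
# Scratch files with restrictive permissions to operate on.
work=$(mktemp -d "${TMPDIR:-/tmp}/permdemo.XXXXXX")
mkdir "$work/files"
: > "$work/files/one"; : > "$work/files/two"
chmod 0600 "$work/files/one" "$work/files/two"

# fixperm.sh (hypothetical name): run every command on all arguments.
cat > "$work/fixperm.sh" <<'EOF'
#!/bin/sh
chown "${OWNER:-$(id -u):$(id -g)}" "$@"
chmod 0644 "$@"
EOF

# xargs hands each batch of names to the script in one invocation.
find "$work/files" -type f -print | xargs sh "$work/fixperm.sh"

# Collect the distinct permission strings of the processed files.
modes=$(ls -l "$work/files" | awk 'NR > 1 { print substr($1, 1, 10) }' | sort -u)
echo "$modes"
rm -rf "$work"
```

Adding a third or fourth command means adding one line to the script, not another pass over the tree.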
 
 Another way would be capturing find(1)'s output into a file, then
 running xargs(1) as many times as needed, redirecting its input to
 read this file; something like:
 
   find / \! \( ... \) > filelist
   xargs chown root:wheel < filelist
   xargs chmod 0644 < filelist
 
 Still another way might avoid the temporary file altogether, with some
 creative file descriptor hackery.  I *think* I have done this before,
 but right now, I cannot remember the proper incantations to make the
 shell duplicate find(1)'s output to a new file descriptor, run xargs
 from fd 1's output, then run another copy of xargs, making it read from
 the file descriptor that find(1)'s output was duplicated to.  I know it
 is possible, it is just that I cannot remember how to do it :)
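
One possible way to do without a regular temporary file, not necessarily the incantation alluded to above, is a named pipe plus tee(1). The sketch assumes mkfifo(1) is available; it still creates a FIFO node in a scratch directory, runs the two commands concurrently, and uses the current user and group instead of root:wheel so that it can run unprivileged. All paths are invented for the demonstration.

```shell
#!/bin/sh
# Scratch files with restrictive permissions, plus a named pipe.
work=$(mktemp -d "${TMPDIR:-/tmp}/fifodemo.XXXXXX")
mkdir "$work/files"
: > "$work/files/one"; : > "$work/files/two"
chmod 0600 "$work/files/one" "$work/files/two"
mkfifo "$work/list"

# Second consumer: started first, in the background, reading the FIFO.
xargs chmod 0644 < "$work/list" &

# tee copies find's output into the FIFO and also passes it down the
# pipe to the first consumer.
find "$work/files" -type f -print | tee "$work/list" |
    xargs chown "$(id -u):$(id -g)"

# Wait for the background xargs to finish before inspecting results.
wait

modes=$(ls -l "$work/files" | awk 'NR > 1 { print substr($1, 1, 10) }' | sort -u)
echo "$modes"
rm -rf "$work"
```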
 
 > > PS. This will very probably be my last post on this subject, and nobody
 > > should be surprised if this PR is closed very soon; what with the recent
 > > mailing list "activity", it scores big on my troll indicator.  I could
 > > be wrong, of course, but I'm just stating my opinion here.
 >
 > ok - well - no trolling - just installed 4.7, and hit
 > a point of frustration with things that have been
 > bugging me over the years - that don't seem to
 > get fixed or improve.
 >
 > i truly value the effort you made to try to explain,
 > though as stated, i still don't see the problem
 > in fixing things.
 
 Apologies for the above paragraph of mine and my attitude in somewhat
 summarily dismissing your other PR's at first; there has been quite a
 bit of trolling on the various FreeBSD lists recently, and there have
 been a couple of bogus PR's filed in the process, so I was a bit
 trigger-happy there.
 
 G'luck,
 Peter
 
 -- 
 Peter Pentchev	roam@ringlet.net	roam@FreeBSD.org
 PGP key:	http://people.FreeBSD.org/~roam/roam.key.asc
 Key fingerprint	FDBA FD79 C26F 3C51 C95E  DF9E ED18 B68D 1619 4553
 This would easier understand fewer had omitted.

From: abc@anchorageinternet.org
To: Peter Pentchev <roam@ringlet.net>
Cc:  
Subject: Re: misc/44195: globbing/argument limits
Date: Fri, 18 Oct 2002 13:46:20 GMT

 ok - thanks - you were very helpful.
 i put this stuff in little scripts
 on my web site so hopefully people can
 see some simple techniques and avoid
 bugging you guys like i did :)
 
 i did check around *quite a bit* for the things you
 answered, and failed to find answers as good as
 the ones you provided.  i was getting grumpy
 with some things that were frustrating and
 you made me a happy FBSD user once again :)
 
State-Changed-From-To: open->closed 
State-Changed-By: roam 
State-Changed-When: Fri Oct 18 06:52:21 PDT 2002 
State-Changed-Why:  
The originator seems to agree that find(1) and xargs(1) may be 
persuaded to do the right thing after all :) 

http://www.freebsd.org/cgi/query-pr.cgi?pr=44195 
>Unformatted:
