[HN Gopher] An Opinionated Guide to Xargs
       ___________________________________________________________________
        
       An Opinionated Guide to Xargs
        
       Author : todsacerdoti
       Score  : 239 points
       Date   : 2021-08-21 16:21 UTC (6 hours ago)
        
 (HTM) web link (www.oilshell.org)
 (TXT) w3m dump (www.oilshell.org)
        
       | masklinn wrote:
       | > Shell functions and $1, instead of xargs -I {}
       | 
       | > -n instead of -L (to avoid an ad hoc data language)
       | 
       | Apparently GNU xargs is missing it, but BSD xargs has -J, which
       | is a `-I` which works with `-n`: with `-I` each replstr gets
       | replaced by one of the inputs, with `-J` the replstr gets
       | replaced by the entire batch (as determined by `-n`).
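A minimal sketch of the difference for readers on GNU xargs, which lacks `-J`. The stand-in shown here is an assumption on my part rather than something from the comment: each batch is handed to a small inline `sh -c` script, which can place `"$@"` anywhere in the command. The item names and the `item:`/`before`/`after` labels are made up for illustration.

```shell
# -I substitutes one input line per command invocation.
per_item=$(printf '%s\n' a b c | xargs -I {} echo "item: {}")

# -n 2 batches the inputs; the inline script places the whole batch ("$@")
# in the middle of the command, roughly what BSD's -J does.
# The "_" only fills $0 of the inline script.
per_batch=$(printf '%s\n' a b c | xargs -n 2 sh -c 'echo before "$@" after' _)

printf '%s\n' "$per_item" "$per_batch"
```

The first pipeline runs `echo` three times, the second runs the inline script twice (once for `a b`, once for `c`).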
        
       | MichaelGroves wrote:
       | > _A lobste.rs user asked why you would use find | xargs rather
       | than find -exec. The answer is that it can be much faster. If
       | you're trying to rm 10,000 files, you can start one process
       | instead of 10,000 processes!_
       | 
       | Fair enough, but I still favor _find -exec_. I find it generally
       | less error prone, and it's never been so slow that I wished I
       | had instead used xargs.
       | 
       | Also, if you're specifically using _-exec rm_ with find, you
       | could instead use find with _-delete_.
        
         | bobbylarrybobby wrote:
         | You can also use `find -exec` with `'+'` instead of `';'` as
         | the terminator. This will call `rm` on all of the found files
         | in one call.
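A small sketch of the two terminators, using a hypothetical scratch directory with three files. `echo` stands in for `rm` so the number of invocations can be counted by counting output lines.

```shell
# Hypothetical scratch directory with three files.
tmp=$(mktemp -d)
touch "$tmp/a.txt" "$tmp/b.txt" "$tmp/c.txt"

# ';' runs the command once per file: three lines of output.
per_file=$(find "$tmp" -type f -exec echo {} \; | wc -l | tr -d ' ')

# '+' appends as many paths as fit onto one command line: one line here.
batched=$(find "$tmp" -type f -exec echo {} + | wc -l | tr -d ' ')

echo "per-file invocations: $per_file, batched invocations: $batched"
rm -rf "$tmp"
```

With only three short paths everything fits into a single command line; for very large file sets `find` may still split the work into several invocations.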
        
           | masklinn wrote:
           | I tend to prefer xargs because it works in more contexts e.g.
           | I've got a tool which automatically generates databases but
           | sometimes the cleanup doesn't work. `find -exec` does
           | nothing, but `xargs -n1 dropdb` (following an intermediate
           | grep) does the job. From there, it makes sense to... just use
           | xargs everywhere.
           | 
           | And I always fail to remember that the -exec terminator must
           | be escaped in zsh, so using -exec always takes me multiple
           | tries. So I only use -exec when I must (for `find`
           | predicates).
        
           | shoo wrote:
           | i agree. `find somewhere -exec some_command {} +` can be
           | dramatically faster. but it does not guarantee a single
           | invocation of `some_command`, it may make multiple
           | invocations if you pass very large numbers of matching files
           | 
           | after spending a bit of time reading the man page for find, i
           | rarely use xargs any more. find is pretty good.
           | 
           | tangent:
           | 
           | another instance i've seen where spawning many processes can
           | lead to bad performance is in bash scripts for git
           | pre-receive hooks, to scan and validate the commit message of a
           | range of commits before accepting them. it is pretty easy to
           | cobble together some loop in a bash script that executes
           | multiple processes _per commit_. that's fine for typical
           | small pushes of 1-20 commits -- but if someone needs to do
           | serious graph surgery and push a branch of 1000 - 10,000
           | commits that can cause very long running times -- and
           | more seriously, timeouts, where the entire push gets rejected
           | as the pre-receive script takes too long. a small program
           | using the libgit2 API can do the same work at the cost of a
           | single process, although then you have the fun of figuring
           | out how to build, install and maintain binary git pre-receive
           | hooks.
        
         | chubot wrote:
         | A benefit I didn't mention in the post (but probably should) is
         | that the pipe lets you interpose other tools.
         | 
         | That is, find -exec is sort of "hard-coded", while find | xargs
         | allows obvious extensions like:
         | 
         |     find | grep | xargs   # filter tasks
         |     find | head | xargs   # I use this all the time for faster testing
         |     find | shuf | xargs
         | 
         | Believe it or not I actually use find | shuf | xargs mplayer to
         | randomize music and videos :)
         | 
         | So shell is basically a more compositional language than find
         | (which is its own language, as I explain here:
         | http://www.oilshell.org/blog/2021/04/find-test.html )
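A concrete version of the interposing idea, as a sketch. The file names and the scratch directory are hypothetical, and the pipeline assumes paths without whitespace (otherwise the NUL-delimited forms of these tools are the safer choice).

```shell
tmp=$(mktemp -d)
touch "$tmp/a_test.py" "$tmp/b_test.py" "$tmp/c_test.py" "$tmp/notes.txt"

# grep filters the task list, sort makes the order stable,
# head keeps a small sample for a quick trial run.
sample=$(find "$tmp" -type f | grep '_test\.py$' | sort | head -n 2 | xargs -n 1 basename)

echo "$sample"
rm -rf "$tmp"
```

Swapping `head -n 2` for `shuf`, or dropping it entirely, changes which tasks run without touching the `find` or `xargs` ends of the pipeline.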
        
       | reilly3000 wrote:
       | I'm unconvinced by the post OP was responding to. It's a utility,
       | it provides some means to get things done. *nix provides many
       | means of parsing text and running commands, each have their
       | idioms based on their own axioms. It seems as if a composer is
       | lambasting the clarinet because they don't care for its
       | fingerings. I've only used xargs sparingly, can somebody
       | enlighten me as to why it's bad, aside from the fact that there
       | are other ways to do some things it does?
        
       | westurner wrote:
       | Wanting verbose logging from xargs, years ago I wrote a script
       | called `el` (edit lines) that basically does `xargs -0` with
       | logging.
       | https://github.com/westurner/dotfiles/blob/develop/scripts/e...
       | 
       | It turns out that e.g. -print0 and -0 are the only safe way: line
       | endings aren't escaped:
       | 
       |     find . -type f -print0 | el -0 --each -x echo
       | 
       | GNU Parallel is a much better tool:
       | https://en.wikipedia.org/wiki/GNU_parallel
        
         | chubot wrote:
         | (author here) Hm I don't see either of these points because:
         | 
         | GNU xargs has --verbose which logs every command. Does that not
         | do what you want? (Maybe I should mention its existence in the
         | post)
         | 
         | xargs -P can do everything GNU parallel does, which I mention in
         | the post. Any counterexamples? GNU parallel is a very ugly DSL
         | IMO, and I don't see what it adds.
         | 
         | --
         | 
         | edit: Logging can also be done by recursively invoking
         | shell functions that log with the $0 Dispatch Pattern,
         | explained in the post. I don't see a need for another tool;
         | this is the Unix philosophy and compositionality of shell at
         | work :)
        
           | jeffbee wrote:
           | Parallel's killer feature is how it spools subprocess output,
           | ensuring that it doesn't get jumbled together. xargs can't do
           | that. I use parallel for things like shelling out to 10000
           | hosts and getting some statistics. If I use xargs the output
           | stomps all over itself.
        
             | chubot wrote:
             | Ah OK thanks, I responded to this here:
             | https://news.ycombinator.com/item?id=28259473
        
           | Godel_unicode wrote:
           | As far as I'm aware, xargs still has the problem of multiple
           | jobs being able to write to stdout at the same time,
           | potentially causing their output streams to be intermingled.
           | Compare this with parallel's --group.
           | 
           | Also, parallel can run some of those threads on remote
           | machines. I don't believe xargs has an equivalent job
           | management function.
        
         | LeoPanthera wrote:
         | Yeah but xargs doesn't refuse to run until I have agreed to a
         | EULA stating I will cite it in my next academic paper.
        
           | jeffbee wrote:
           | parallel doesn't either, it just nags. I agree about how
           | silly and annoying it is. Imagine if every time the parallel
           | author opened Firefox he got a message reminding him to
           | personally thank me if he uses his web browser for research,
           | or if every time his research program calls malloc he has to
           | acknowledge and cite Ulrich Drepper. Very very silly.
           | 
           | Parallel is the better tool but the nagware impairs its
           | reputation.
        
             | blibble wrote:
             | or every time a process called fork() you had to read some
             | stupid message
        
       | fiddlerwoaroof wrote:
       | I frequently find myself reaching for this pattern instead of
       | xargs:
       | 
       |     do_something | ( while read -r v; do
       |       . . .
       |     done )
       | 
       | I've found that it has fewer edge cases (except it creates a
       | subshell, which can be avoided in some shells by using braces
       | instead of parens)
        
         | aaaaaaaaaaab wrote:
         | Also for the `while` enthusiasts, here's how you zip the output
         | of two processes in bash:
         | 
         |     paste -d \\n <(do_something1) <(do_something2) | while read -r var1 && read -r var2; do
         |         ... # var1 comes from do_something1, var2 comes from do_something2
         |     done
        
         | aaaaaaaaaaab wrote:
         | Some additional tips:
         | 
         | 1. You don't need the parentheses.
         | 
         | 2. If you use process substitution [1] instead of a pipe, you
         | will stay in the same process and can modify variables of the
         | enclosing scope:
         | 
         |     i=0
         |     while read -r v; do
         |         ...
         |         i=$(( i + 1))
         |     done < <(do_something)
         | 
         | The drawback is that this way `do_something` has to come after
         | `done`, but that's bash for you ¯\_(ツ)_/¯
         | 
         | [1]
         | https://www.gnu.org/software/bash/manual/html_node/Process-S...
        
           | chriswarbo wrote:
           | I use this exact pattern a lot. One thing to consider is that
           | in the process substitution version, do_something can't
           | modify the enclosing variables. The vast majority of the time
           | I want to modify variables in the loop body and not the
           | generating process, but it's worth keeping in mind.
           | 
           | One common pattern I use this for is running a bunch of
           | checks/tests, e.g.
           | 
           |     EXIT_CODE=0
           |     while read -r F
           |     do
           |         do_check "$F" || EXIT_CODE=1
           |     done < <(find ./tests -type f)
           |     exit "$EXIT_CODE"
           | 
           | This is a more complicated alternative to the following:
           | 
           |     find ./tests -type f | while read -r F
           |     do
           |         do_check "$F" || exit 1
           |     done
           | 
           | The simpler version will abort on the first error, whilst the
           | first version will always run all of the checks (exiting with
           | an error afterwards, if any of them failed)
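One subtlety with the simpler version, sketched below: in bash, dash and other shells that run the loop of a pipeline in a subshell, `exit 1` ends only that subshell. The checks stop at the first failure, but the enclosing script carries on unless it inspects the pipeline's exit status. `do_check` and the input names are hypothetical stand-ins.

```shell
# Hypothetical check: everything passes except the item named "bad".
do_check() { [ "$1" != bad ]; }

status=0
printf '%s\n' good bad never | while read -r F
do
    do_check "$F" || exit 1
    echo "checked $F"
done || status=$?

# In bash and dash this line is still reached.
echo "still running, pipeline status: $status"
```

In shells that run the last pipeline stage in the current process (ksh, zsh), the same `exit 1` would end the whole script instead.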
        
             | fiddlerwoaroof wrote:
             | I usually write zsh scripts and I think there's a shell
             | option in zsh that allows the loop at the end of the pipe
             | to modify variables in the enclosing body: I remember at
             | least one occasion where I was surprised about this
             | discrepancy between shells.
        
               | aaaaaaaaaaab wrote:
               | Interesting! Indeed, Greg's BashFAQ notes it too:
               | https://mywiki.wooledge.org/BashFAQ/024
               | 
               | >Different shells exhibit different behaviors in this
               | situation:
               | 
               | >- BourneShell creates a subshell when the input or
               | output of anything (loops, case etc..) but a simple
               | command is redirected, either by using a pipeline or by a
               | redirection operator ('<', '>').
               | 
               | >- BASH, Yash and PDKsh-derived shells create a new
               | process only if the loop is part of a pipeline.
               | 
               | >- KornShell and Zsh creates it only if the loop is part
               | of a pipeline, but not if the loop is the last part of
               | it. The read example above actually works in ksh88,
               | ksh93, zsh! (but not MKsh or other PDKsh-derived shells)
               | 
               | >- POSIX specifies the bash behaviour, but as an
               | extension allows any or all of the parts of the pipeline
               | to run without a subshell (thus permitting the KornShell
               | behaviour, as well).
        
           | fiddlerwoaroof wrote:
           | Yeah, although I use the parentheses mostly because I like
           | how it reads. And that process substitution trick is
           | important too.
           | 
           | I think the redirection can come first, though (not at a
           | computer to test):
           | 
           |     < <( do_something ) while read . . .
        
             | lottin wrote:
             | This is not POSIX compliant though.
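Where POSIX sh is a hard requirement, one commonly used substitute for `< <(...)` is a here-document wrapping a command substitution, which also keeps the loop in the current shell. This is a sketch, with `printf` standing in for `do_something`. Note that the producer's output is collected in full before the loop starts, so it does not stream.

```shell
count=0
while IFS= read -r line; do
    count=$((count + 1))
done <<EOF
$(printf '%s\n' one two three)
EOF

# The loop ran in the current shell, so the counter survives.
echo "lines seen: $count"
```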
        
               | fiddlerwoaroof wrote:
               | These days bash and/or zsh are available nearly every
               | place I care about, so I find POSIX compliance to be much
               | less relevant.
        
               | pgtan wrote:
               | No, process substitution must be provided by the
               | kernel/syslibs, it is not a feature of bash. For example
               | there is bash on AIX, but process substitution is not
               | possible because the OS does not support it.
        
             | aaaaaaaaaaab wrote:
             | Yeah, for _commands_, the input/output redirections can
             | precede them, but for some reason it doesn't work for
             | builtin constructs like `while`:
             | 
             |     $ < <( echo foo ) while read -r f; do echo "$f"; done
             |     -bash: syntax error near unexpected token `do'
             |     $ < <( echo foo ) xargs echo
             |     foo
             | 
             |     $ bash --version
             |     GNU bash, version 5.1.4(1)-release (x86_64-apple-darwin20.2.0)
        
               | fiddlerwoaroof wrote:
               | Maybe wrap the loop either with parentheses or braces?
        
               | aaaaaaaaaaab wrote:
               | Tried that, but nope :D I'll let you figure this one out
               | once you get near a computer!
        
             | entire-name wrote:
             | Redirection like this doesn't seem to work if it comes
             | first on GNU bash 5.0.17(1)-release.
             | 
             | For documentation purposes, this is the exact thing I tried
             | to run:
             | 
             |     $ < <(echo hi) while read a; do echo "got $a"; done
             |     -bash: syntax error near unexpected token `do'
             | 
             |     $ while read a; do echo "got $a"; done < <(echo hi)
             |     got hi
             | 
             | Maybe there is another way...
        
               | JNRowe wrote:
               | One way which isn't great, but an option nonetheless...
               | The zsh parser is happy with that form:
               | 
               |     $ zsh -c '< <(echo hi) while read a; do echo "got $a"; done'
               |     got hi
               | 
               | My position isn't that it is a good reason to switch
               | shells, but if you're using it anyway then it is an
               | option.
        
               | fiddlerwoaroof wrote:
               | I've always preferred zsh and, as I've slowly adopted
               | nix, I've slowly stopped writing bash in favor of zsh
        
         | tomcam wrote:
         | Thank you. Your comment coalesced a number of things in my mind
         | that I hadn't grasped properly as a UNIX midwit, especially the
         | braces thing.
        
         | ptspts wrote:
         | For thousands of arguments this solution is much slower (high
         | CPU usage) than xargs, because either it implements the logic
         | as a shell script (slow) or it runs an external program for
         | each argument (slow).
        
           | fiddlerwoaroof wrote:
           | Sure, if performance matters use xargs. I find this is easier
           | to read and think about.
        
       | pgtan wrote:
       | FWIW AIX also has an apply command
       | 
       | https://www.ibm.com/docs/en/aix/7.2?topic=apply-command
        
         | 2OEH8eoCRo0 wrote:
         | I spent a year using AIX at my previous job and never heard of
         | this or saw anybody use it. Is it new in 7.2? We were far
         | behind on AIX 6.
        
           | pgtan wrote:
           | No idea how old this command is. Most of the AIX/Linux admins
           | I knew were very bad shell programmers, skills end with
           | awful for-loops, useless use of cat, and awk '{print $3}'.
        
       | agumonkey wrote:
       | I used to have bash fun like `curry { xargs -I {} $1 }` or
       | something like that. Pretty useful to simplify one liners.
        
       | rcpt wrote:
       | awk '{ print your_command }' | bash
       | 
       | Never can remember all the -I stuff around xargs
        
         | chubot wrote:
         | This is like the sed|bash anti-pattern mentioned in the
         | original post, and quoted in the appendix on shell injection.
         | 
         | I wouldn't say "never use it", but I would hesitate to ever put
         | it in a script, vs. doing a one-off at the command line.
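A harmless sketch of why generated shell source is risky. The input line is a made-up hostile "file name", and `echo` stands in for the real command. Piping generated text into a shell evaluates it as code, while `xargs` passes the same text along as a plain argument.

```shell
# A hostile "file name" that contains a command substitution.
input='$(echo INJECTED)'

# Generating shell source: the inner command is executed by sh.
generated=$(printf '%s\n' "$input" | awk '{ print "echo " $0 }' | sh)

# Passing data as an argument: the text is printed untouched.
as_argument=$(printf '%s\n' "$input" | xargs -I {} echo {})

printf '%s\n' "$generated" "$as_argument"
```

The first result is `INJECTED`, meaning the embedded command ran; the second is the literal input string.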
        
       | legobmw99 wrote:
       | This is only tangentially related, but after all the posts here
       | the last few days about thought terminating cliches, I can't help
       | but reflect on the "X considered harmful" title cliche
        
         | MichaelGroves wrote:
         | Would you say the title terminated your consideration of the
         | article?
        
           | legobmw99 wrote:
           | No I think if anything seeing it was a response to "xargs
           | considered harmful" made me take the author's side quicker
        
         | Zababa wrote:
         | I've been thinking about titles, and it's hard to make a good
         | one that doesn't look like a total cliche. "X considered
         | harmful", "an opinionated guide to X", some kind of joke or
         | reference, what could be a collection of tags (X, Y and Z),
         | "things I have learned doing X", etc.
        
           | zeroimpl wrote:
           | I specifically clicked on this topic because of the word
           | "opinionated". As I already know how to use xargs, I was
           | curious what kind of non-conventional or controversial
           | opinion the author might have.
        
             | Zababa wrote:
             | As I've said to a sibling comment, I don't think it's a bad
             | title, and "an opinionated guide to X" is one of the better
             | cliches for titles that I see (the worst being the
             | journalist that feels like they have to make a joke).
        
           | ineedasername wrote:
           | In this case a less cliche/click-baity title could simply be:
           | 
           | "A Response to Xargs Criticism"
        
             | Zababa wrote:
             | I think this title is fine, it's mostly that after spending
             | some time on Hacker News all the titles start to look the
             | same.
        
         | phone8675309 wrote:
         | What every X should know about Y, an opinionated take on Z
         | considered harmful
        
           | MonkeyClub wrote:
           | ...with an example Lisp implementation written in APL
           | translating into 6502 assembly :)
        
         | abetusk wrote:
         | Yes, I absolutely hate them. I was thinking of creating a
         | "considered harmful" considered harmful rant but it already
         | exists [0].
         | 
         | [0] https://meyerweb.com/eric/comment/chech.html
        
         | JadeNB wrote:
         | Is it thought _terminating_, though? "X considered harmful"
         | seems more intended to spark discussion in an intentionally
         | inflammatory way than to stifle it.
         | 
         | (In any case, this surely _is_ tangential, since the title is
         | not "X considered harmful" for any value of X--at best it
         | _comments_ on a post by that title, as, indeed, you are doing.)
        
       | yudlejoza wrote:
       | Of xargs, for, and while, I have limited myself to while. It's
       | more typing every time but saves me from having to remember so
       | many quirks of each command.
       | 
       |     cat input.file | ... | while read -r unit; do <cmd> ${unit}; done | ...
       | 
       | between 'while read -r unit' and 'while IFS= read -r unit' I can
       | probably handle 90% of the cases. (maybe I should always use IFS
       | since I tend to forget the proper way to use it).
        
         | patrickdavey wrote:
         | Would you mind expanding with a couple of examples? (E.g. using
         | "foo bar" as a single line or split by whitespace).
         | 
         | I suspect I'll really like your way of doing things, but an
         | example would be very handy.
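A short sketch of the difference between the two `read` forms, using a made-up input line with padding around `foo bar`. The square brackets are only there to make the whitespace visible.

```shell
line='  foo bar  '

# Default IFS: leading and trailing whitespace is trimmed.
trimmed=$(printf '%s\n' "$line" | { read -r unit; printf '[%s]' "$unit"; })

# Empty IFS for this read only: the line is kept exactly as it came in.
verbatim=$(printf '%s\n' "$line" | { IFS= read -r unit; printf '[%s]' "$unit"; })

# Two variable names: split on whitespace, the remainder goes to the last one.
split=$(printf '%s\n' "$line" | { read -r first rest; printf '[%s] [%s]' "$first" "$rest"; })

printf '%s\n' "$trimmed" "$verbatim" "$split"
```

So `IFS= read -r` is the form to reach for when each line must reach the command unchanged, and plain `read -r a b` when the line is really a row of whitespace-separated fields.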
        
       | andy81 wrote:
       | Today I appreciated Powershell
        
         | jmholla wrote:
         | Can you expand on that? I've never had trouble leveraging xargs
         | and find it aligns well with shell piping.
        
           | bialpio wrote:
           | Not OP but to me the best thing about PowerShell is that it
           | recognizes that text is not always the best way to output
           | results from commands if you care about creating pipelines.
           | In short, it passes objects around so there's no need for
           | parsing text.
        
             | bialpio wrote:
             | Two examples from the article translated into PS (sorry,
             | I'm a bit rusty so the second one may not be the shortest
             | possible):
             | 
             |     PS> "alice", "bob" | echo
             |     PS> Get-ChildItem . -Include "*test.cpp","*test.py" -Recurse | foreach { Remove-Item $_.Name }
             | 
             | No text parsing in sight, and the object attributes can be
             | tab-completed from the shell (e.g. I tab-completed the
             | `$_.Name`).
        
               | andy81 wrote:
               | Thanks, we were thinking of the same thing.
        
       | HMH wrote:
       | I always wonder why something like xargs is not a shell built-in.
       | It's such a common pattern, but I dread formulating the correct
       | incantation every time.
       | 
       | I was happy to read that the author comes to the same conclusion
       | and proposes an `each` builtin (albeit only for the Oil shell)!
       | Like that there is no need to learn another mini language as
       | pointed out.
        
         | JNRowe wrote:
         | If you're a zsh user it offers a version of something like
         | xargs in zargs[1]. As the documentation shows it can be really
         | quite powerful in part because of zsh's excellent globbing
         | facilities, and I think without that support it wouldn't be all
         | that useful as a built-in.
         | 
         | I'd also perhaps argue that the reason we don't want xargs to
         | be a built-in is precisely because of zargs and the point in
         | your second paragraph. If it was built-in it would no doubt be
         | obscenely different in each shell, and five decades later a
         | standard that no one follows would eventually specify its
         | behaviour ;)
         | 
         | [1] https://zsh.sourceforge.io/Doc/Release/User-Contributions.ht...
         | - search for "zargs", it has no anchor. Sorry.
        
       | l0b0 wrote:
       | I scanned until I saw `ls | egrep '.*_test\.(py|cc)' | xargs -d
       | $'\n' -- rm`, and then stopped. This is a terrible idea[1][2].
       | 
       | [1] https://mywiki.wooledge.org/ParsingLs
       | 
       | [2] https://unix.stackexchange.com/q/128985/3645
        
         | tyingq wrote:
         | I'm surprised the links don't mention find. The -print0 flag
         | makes it safe for crazy filenames, which pairs with the xargs
         | -0 flag, or the perl -0 flag, etc. And you have -maxdepth if
         | you don't want it to trawl.
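A sketch of what the NUL-delimited pairing buys, assuming GNU or BSD `find` and `xargs` (`-maxdepth` is an extension, not baseline POSIX). The scratch directory and both file names are hypothetical, and one of the names deliberately contains a newline.

```shell
tmp=$(mktemp -d)
# One ordinary file, and one file whose name contains a newline.
touch "$tmp/plain.txt" "$tmp/two
lines.txt"

# Newline-delimited: the odd name is split, so three "names" are counted.
naive=$(find "$tmp" -maxdepth 1 -type f | wc -l | tr -d ' ')

# NUL-delimited: each file reaches the command as exactly one argument,
# so the inline script runs once per real file.
safe=$(find "$tmp" -maxdepth 1 -type f -print0 | xargs -0 -n 1 sh -c 'echo x' _ | wc -l | tr -d ' ')

echo "newline-delimited: $naive, NUL-delimited: $safe"
rm -rf "$tmp"
```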
        
       | WhatIsDukkha wrote:
       | I tend to reach for gnu parallel instead of xargs -
       | 
       | https://www.gnu.org/software/parallel/parallel_alternatives....
       | 
       | parallel is probably on the complex side but it's also been
       | actively developed, bugfixed and had a lot of road miles from
       | large computing users.
        
         | orhmeh09 wrote:
         | The nagware prompts of parallel are so objectionable that I
         | will do a lot of things to avoid using it at all. So
         | pretentious!
        
           | queuebert wrote:
           | It's also written in Perl!
        
             | orhmeh09 wrote:
             | Veering off course here, after experiencing how incredibly
             | long it took to install Sqitch, I will go out of my way to
             | avoid anything that is more than a single script, certainly
             | anything requiring CPAN too. I don't think there's anything
             | technically wrong with these programs or with Perl, they're
             | just presented in ways that are unique hassles in this day
             | and age.
        
           | grawlinson wrote:
           | Seems like some distributions patch out the nagware. I know
           | Arch Linux does[0].
           | 
           | [0]: https://github.com/archlinux/svntogit-community/tree/package...
        
         | chubot wrote:
         | I mention it here:
         | https://www.oilshell.org/blog/2021/08/xargs.html#xargs-p-aut...
         | 
         | What does it do that xargs and shell can't? (honest question)
        
           | lacksconfidence wrote:
           | i don't know if xargs can't, but i use gnu parallel to split
           | an input pipe into N parallel pipes processing slices of the
           | input stream.
           | 
           | Edit: To clarify, xargs usually wants to spin up a process
           | per task. I have parallel spin up N processes and then
           | continuously feed them.
        
           | bloopernova wrote:
           | GNU Parallel can be sourced into a bash session from a plain
           | text file and used as a function. I've used it to get around
           | overly-restrictive build environments. (overly restrictive
           | because the team that manages the build image wasn't open to
           | modifying their image for my use case)
        
           | xmcqdpt2 wrote:
           | Restart capability and remote executions make gnu parallel
           | the tool of choice for HPC. For example, you might very well
           | use gnu parallel to run 1000s of cpu-hours of numerical
           | simulation using patterns such as these ones,
           | 
           | https://docs.computecanada.ca/mediawiki/index.php?title=GNU_...
           | 
           | Using xargs for this kind of work is euhm... not a good idea.
        
           | orf wrote:
           | Resumption, error reporting and much better progress
           | monitoring.
        
             | vhold wrote:
             | Oh I didn't know about resumption.. parallel has so many
             | features packed into its CLI it's kind of ridiculous.
             | 
             | For others that didn't know about it, see the examples here:
             | https://www.gnu.org/software/parallel/parallel_tutorial.html...
             | 
             | Here's another surprising feature:
             | https://www.gnu.org/software/parallel/parallel_tutorial.html...
        
           | bsmithers wrote:
           | Not to be pedantic, but that's a bit of a non-argument. _Of
           | course_ you can do it with xargs and shell, but imho parallel
           | is generally more convenient, especially for remote
           | execution. It provides a higher level of abstraction for such
           | tasks.
        
           | orhmeh09 wrote:
           | Issue complaint prompts to promote the author, for one.
        
           | leephillips wrote:
           | Remote execution.
        
             | chubot wrote:
             | I'd like to see a demo of it! I will try rewriting it with
             | the $0 Dispatch Pattern and ssh :)
        
               | xmcqdpt2 wrote:
               | Good luck balancing node usage!
               | 
               | Here is an example of how it works,
               | 
               | https://docs.computecanada.ca/mediawiki/index.php?title=GNU_...
               | 
               | This + restart capabilities make gnu parallel very well
               | suited to running 1000s of compute-heavy jobs on HPC
               | clusters.
        
               | figomore wrote:
               | I used Parallel to distribute the rendering of a little
               | Blender animation. It worked very well.
               | 
               | https://github.com/tfmoraes/blender_gnu_parallel_render/blob...
        
           | comex wrote:
           | One thing parallel can do better than xargs is collect
           | output.
           | 
           | If you use `xargs -P`, all processes share the same stdout
           | and output may be mixed arbitrarily between them. (If the
           | program being executed uses line buffering, lines _usually_
           | won't be mixed together from multiple invocations, but they
           | can be if they're long enough).
           | 
           | In contrast, `parallel` by default doesn't mix together
           | output from different commands at all, instead buffering the
           | entire output until the command exits and then printing it.
           | 
           | With `--line-buffer` the unit of atomicity can be weakened
           | from an entire command output to individual lines of output,
           | reducing latency.
           | 
           | Alternately, with `--keep-order`, `parallel` can ensure the
           | outputs are printed in the same order as the corresponding
           | inputs, which makes the output deterministic if the program
           | is deterministic. Without that you'll get results in an
           | arbitrary order.
           | 
           | These aren't technically things that xargs and shell can't
           | do; you could reimplement the same behavior by hand with the
           | shell. But by the same token, there isn't anything xargs can
           | do that the shell can't do alone; you could always use the
           | shell to manually split up the input and invoke subprocesses.
           | It's just a question of how much you want to reimplement by
           | hand.
        
             | chubot wrote:
             | OK thanks, looks like there are several features of GNU
             | parallel that users like.
             | 
             | For the output interleaving issue, what I do is use the $0
             | Dispatch Pattern and write a shell function that redirects
             | to a file:
             | 
             |     do_one() {
             |       task_with_stdout > $dir/$task_id.txt
             |     }
             | 
             | So if there are 10,000 tasks then I get 10,000 files, and I
             | can check the progress with "ls", and I can also see what
             | tasks failed and possibly restart them.
             | 
             | You even have some notion of progress by checking the file
             | size with ls -l.
             | 
             | I tend to use a pattern where each task also outputs a
             | metadata file: the exit status, along with the data from
             | "time" (rusage, etc.)
             | 
             | But I admit that this is annoying to rewrite in every
             | script that uses xargs! It does make sense to have this
             | functionality in a tool.
             | 
             | But I think that tool should be a LANGUAGE like Oil, not a
             | weirdo interface like GNU parallel :)
             | 
             | But thanks for the explanation (and thanks to everyone in
             | this subthread) -- I learned a bunch and this is why I
             | write blog posts :)
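A rough sketch of the per-task output and metadata files described above. It uses `sh -c` instead of the $0 Dispatch Pattern so that it can be pasted as-is, and the task ids (`ok1`, `fail`, `ok2`) and the `.status` suffix are assumptions for illustration, not anything from the blog post.

```shell
dir=$(mktemp -d)
export dir

printf '%s\n' ok1 fail ok2 |
  xargs -P 2 -n 1 sh -c '
    task_id=$1
    # Stand-in for task_with_stdout: prints one line, fails for the id "fail".
    { echo "output of $task_id"; [ "$task_id" != fail ]; } > "$dir/$task_id.txt"
    # Metadata file: record the exit status next to the output.
    echo "$?" > "$dir/$task_id.status"
  ' sh

# List the tasks whose recorded status is not 0, e.g. to restart them.
failed=$(for f in "$dir"/*.status; do
  [ "$(cat "$f")" = 0 ] || basename "$f" .status
done)
echo "failed: $failed"
```

Progress can be checked while it runs with `ls "$dir"`, as described.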
        
               | Godel_unicode wrote:
               | Thank you for writing this, it really crystalized for me
               | why I feel the way I do about oil. I hate it. When I want
               | a language, I want a real language like python not a
               | weirdo jumped up shell (see what I did there?). What I
               | want in a shell is a super small, fast, universally
               | understood thing for basic tasks and easy expandability
               | through tools like parallel and python.
               | 
               | For what it's worth, I consider oil to be closer to a
               | unixy PowerShell rather than a more powerful bash. Note
               | that this is not a slight, PowerShell is sweet for what
               | it is. It (oil) really takes a hard left from the POSIX
               | philosophy of focusing on one thing and doing it well.
               | I'm also bitter that, if it's going to veer so far away
               | from POSIX, it didn't go the whole hundred and become a
               | functional language with comprehensions and such.
               | 
               | For what it's worth, everything you mentioned above about
               | your approach can be done with parallel.
        
       | senkora wrote:
       | I always think of xargs as the inverse of echo. echo converts
       | arguments to text streams, and xargs converts text streams to
       | arguments.
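A small illustration of that symmetry, as a sketch: `echo` turns three arguments into a line of text, and `xargs` turns three lines of text back into three arguments for a single `printf` call.

```shell
# echo: arguments -> text stream
echo a b c

# xargs: text stream -> arguments (printf runs once, with three arguments,
# and reuses its format string for each of them)
joined=$(printf 'a\nb\nc\n' | xargs printf '[%s]')
echo "$joined"
```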
        
       | kazinator wrote:
       | In 2002, I implemented xargs in Lisp, in the Meta-CVS project.
       | 
       | It is quite necessary, because you cannot pass an arbitrarily
       | large command line or environment in exec system calls.
       | 
       | Of course, this doesn't have the problem requiring -0 because
       | we're not reading textual lines from standard input, but working
       | with lists of strings.
       | 
       |     ;;; This source file is part of the Meta-CVS program,
       |     ;;; which is distributed under the GNU license.
       |     ;;; Copyright 2002 Kaz Kylheku
       | 
       |     (in-package :meta-cvs)
       | 
       |     (defconstant *argument-limit* (* 64 1024))
       | 
       |     (defun execute-program-xargs (fixed-args &optional extra-args fixed-trail-args)
       |       (let* ((fixed-size (reduce #'(lambda (x y)
       |                                      (+ x (length y) 1))
       |                                  (append fixed-args fixed-trail-args)
       |                                  :initial-value 0))
       |              (size fixed-size))
       |         (if extra-args
       |           (let ((chopped-arg ())
       |                 (combined-status t))
       |             (dolist (arg extra-args)
       |               (push arg chopped-arg)
       |               (when (> (incf size (1+ (length arg))) *argument-limit*)
       |                 (setf combined-status
       |                       (and combined-status
       |                            (execute-program (append fixed-args
       |                                                     (nreverse chopped-arg)
       |                                                     fixed-trail-args))))
       |                 (setf chopped-arg nil)
       |                 (setf size fixed-size)))
       |             (when chopped-arg
       |               (execute-program (append fixed-args (nreverse chopped-arg)
       |                                        fixed-trail-args)))
       |             combined-status)
       |           (execute-program (append fixed-args fixed-trail-args)))))
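For comparison, a rough POSIX sh sketch of the same chopping loop. `run_batched` is a hypothetical helper, not part of Meta-CVS or xargs. It simplifies things: the command is a single word, its own length is not counted against the limit, `${#arg}` counts characters rather than bytes, and nothing runs when there are no arguments.

```shell
# run_batched LIMIT CMD ARG...
# Runs CMD on ARG... in batches, flushing a batch once the running total of
# (argument length + 1) exceeds LIMIT, as the Lisp loop above does.
run_batched() {
  limit=$1 cmd=$2 size=0 status=0 first=1
  shift 2
  for arg in "$@"; do            # the word list is expanded once, up front
    if [ "$first" -eq 1 ]; then  # reuse the positional parameters as the batch
      set --
      first=0
    fi
    set -- "$@" "$arg"
    size=$((size + ${#arg} + 1))
    if [ "$size" -gt "$limit" ]; then
      "$cmd" "$@" || status=1
      set --
      size=0
    fi
  done
  if [ "$#" -gt 0 ]; then        # leftover partial batch
    "$cmd" "$@" || status=1
  fi
  return "$status"
}

run_batched 10 echo aaa bbb ccc ddd
```

With a limit of 10, the first three arguments push the total to 12 and are flushed together, and `ddd` runs as a second batch.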
        
       | jordemort wrote:
       | I appreciate this. If I wrote my own opinionated guide to xargs,
       | it would be a single profane sentence.
        
       | lisper wrote:
       | Note that the suggested:
       | 
       | rm $(ls | grep foo)
       | 
       | will not work if you have file names that contain spaces.
       | 
       | Shell programming is planted thick with landmines like this.
        
         | ViViDboarder wrote:
         | The linked article doesn't suggest this. They explicitly
         | suggest against it.
         | 
         | > Besides the extra ls, the suggestion is bad because it relies
         | on shell's word splitting. This is due to the unquoted $().
         | It's better to rely on the splitting algorithms in xargs,
         | because they're simpler and more powerful.
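A small demonstration of the difference, as a sketch in a throwaway directory (the file names are made up). The unquoted `$()` version leaves the file with a space behind, while NUL-delimited names passed through `xargs -0` are handled intact.

```shell
tmp=$(mktemp -d)
touch "$tmp/foo bar.txt" "$tmp/foo.txt" "$tmp/keep.txt"

# Unquoted $(...): "foo bar.txt" is split into "foo" and "bar.txt", so rm
# removes foo.txt but fails on the name containing a space.
( cd "$tmp" && rm $(ls | grep foo) 2>/dev/null ) || true
after_unsafe=$(ls "$tmp")

# NUL-delimited names survive word splitting untouched.
find "$tmp" -type f -name '*foo*' -print0 | xargs -0 rm --
after_safe=$(ls "$tmp")
echo "$after_safe"
```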
        
       | pwg wrote:
       | Since the blog author is commenting here, you have this statement
       | part way down your blog:
       | 
       | > That is, grep doesn't support an analogous -0 flag.
       | 
       | However, the GNU grep variant does have an analogous flag:
       | 
       | -z, --null-data
       | 
       | Treat the input as a set of lines, each terminated by a zero byte
       | (the ASCII NUL character) instead of a newline. Like the -Z or
       | --null option, this option can be used with commands like sort -z
       | to process arbitrary file names.
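A short sketch of how `-z` slots into a NUL-delimited pipeline, assuming GNU grep (other greps may not have the flag). The file names are invented for the example.

```shell
tmp=$(mktemp -d)
touch "$tmp/a b.log" "$tmp/c.txt"

# find emits NUL-terminated paths, grep -z filters them record by record
# ($ anchors at the end of each NUL-terminated record), xargs -0 consumes them.
matches=$(find "$tmp" -type f -print0 | grep -z '\.log$' | xargs -0 -n 1 basename)
echo "$matches"
```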
        
         | chubot wrote:
         | Ah cool, I didn't know that! I'll update the blog post. (What a
         | cacophony of flags)
         | 
         | Edit: It seems that grep -0 isn't taken for something else and
         | they should have used it for consistency? The man page says
         | it's meant to be used with find -print0, xargs -0, perl -0, and
         | sort -z (another inconsistency)
        
           | tyingq wrote:
           | I think that's because they needed to support both input and
           | output. So there's both -Z and -z. No such thing as an
           | uppercase 0 :)
        
           | kragen wrote:
           | It _is_ taken in grep, just poorly documented; grep -5 means
           | grep -C 5, and grep -0 means grep -C 0. It's not taken in
           | sort, though, so I don't know why they didn't use -0 for
           | sort.
        
           | l0b0 wrote:
           | It's best to give up on any kind of consistency between
           | command options. Any project is free to do anything it wants,
           | and they all do. Someone is eventually going to come up with
           | standard N+1[1] which does things consistently, but they are
           | going to have to either recreate a bazillion tools or create
           | some sort of huge translation framework configuration on top
           | of existing tools to get there. And even then it'll take
           | literally decades before people migrate away from the current
           | tools. Basically, the sad truth is this isn't going to
           | happen.
           | 
           | [1] https://xkcd.com/927/
        
             | [deleted]
        
       | aaaaaaaaaaab wrote:
       | I would recommend using -0 instead of -d, as the latter is not
       | supported on BSD (and macOS) xargs:
       | 
       |     do_something | tr \\n \\0 | xargs -0 ...
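A quick sketch of what the `tr` step changes, using `printf` as a stand-in for `do_something`: by default xargs splits on spaces as well as newlines, while converting newlines to NUL and using `-0` gives one argument per line.

```shell
# Default splitting: "one two" becomes two separate arguments.
default=$(printf 'one two\nthree\n' | xargs -n 1 echo)

# Newline-only splitting via NUL conversion; works with GNU and BSD xargs.
by_line=$(printf 'one two\nthree\n' | tr '\n' '\0' | xargs -0 -n 1 echo)

echo "$default"
echo "$by_line"
```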
        
         | derriz wrote:
         | I wish this was the default behavior of xargs (the 'tr \\n
         | \\0 | xargs -0' bit). I don't know why xargs splits on spaces
         | and tabs as well as newlines by default and doesn't even have a
         | flag to just split on lines.
         | 
         | Ok filenames can theoretically have newlines in them but I'd be
         | happy to deal with that weird case. I can't recall ever having
         | encountered it in years of using bash on various systems.
         | 
         | Shell pipes would then orthogonally provide the stuff like
         | substitution that xargs does in its own unique way (that I
         | just can't be bothered learning) - instead you'd just pipe the
         | find output through sed or 'grep -v' or whatever you wanted
         | before piping into xargs.
         | 
         | I guess that's what aliases are for, but I'm too lazy these days
         | to bother configuring often short-lived systems all the time.
        
           | fl0wenol wrote:
           | xargs defaults to all whitespace because it was designed to
           | get around the problem of short argv lengths (like, I'm
           | talking 4k or less on older Unix-y systems, sometimes as low
           | as 255 bytes).
           | 
           | So the defaults went with principle of least surprise,
           | pretending it's like a very long args list that you could
           | theoretically enter at the shell, including quotes.
           | 
           | You could, for example, edit the args list in vi and line
           | split / indent as you please but not impact the end result.
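A small sketch of that property: in the default mode xargs honors quotes and treats any run of blanks or newlines as a separator, so reflowing or indenting the input does not change the resulting arguments. The sample inputs are made up.

```shell
# The same argument list, once on a single line and once reflowed and indented.
one_line=$(printf '"a b" c d\n' | xargs -n 1 echo)
reflowed=$(printf '"a b"\n    c\n\td\n' | xargs -n 1 echo)

echo "$one_line"
```

Both layouts yield the same three arguments, with `a b` kept together by the quotes.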
        
       | ahawkins wrote:
       | Xargs ftw!
        
       ___________________________________________________________________
       (page generated 2021-08-21 23:00 UTC)