[HN Gopher] Design of GNU Parallel (2015)
       ___________________________________________________________________
        
       Design of GNU Parallel (2015)
        
       Author : Havelock
       Score  : 142 points
       Date   : 2023-03-19 02:49 UTC (20 hours ago)
        
 (HTM) web link (www.gnu.org)
 (TXT) w3m dump (www.gnu.org)
        
       | ZoomZoomZoom wrote:
       | If anyone needs a pretty basic alternative with Windows support,
       | there's Rush:
       | 
       | https://github.com/shenwei356/rush
       | 
       | I use it pretty extensively with ffmpeg, imagemagick and the
       | like.
       | 
       | I'd been using mmstick/parallel for a while, but it moved to
       | RedoxOS repos and then stopped being updated, while still having
       | some issues not ironed out.
        
       | globalreset wrote:
       | What's the best rewrite of GNU Parallel in Rust? That citation
       | thing is so annoying.
        
       | seized wrote:
       | Parallel is a fun tool. I use it as a sort of simple slurm to
       | distribute work over many VMs to process tens to hundreds of TBs
       | of data. Sometimes across 2400+ cores.
        
       | cricalix wrote:
       | parallel is a tool I've reached for many times; the citation bit
       | it prints is odd - it seems to assume that the general use case
       | is research/academic - but easily squelched.
       | 
       | A sample use case would be having a file that has words in it,
       | one per line, and you want to run a program that operates on each
       | word (device name, dollar amount, whatever). Sure, you can use a
       | loop, but if the words and actions are independent, parallel is
       | one way to spin up N copies of your program and pass it a single
       | word from the file. Can get around Python's GIL without having to
       | use multiprocessing or threads (as a more concrete example).
       | 
       | Didn't realise that it busy waits, but I'm typically running it
       | on a not very busy server with tens of cores.
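       | 
       | A minimal sketch of that pattern, assuming a made-up words.txt
       | and an inline echo standing in for the real per-word program.
       | It uses xargs -P (mentioned elsewhere in this thread) so it
       | runs without GNU Parallel installed; the roughly equivalent
       | parallel invocation is shown in a comment, with ./process_word
       | as a placeholder name:
       | 

```shell
#!/bin/sh
# Sketch only: words.txt and the inline "processed" worker are hypothetical.
set -eu

workdir=$(mktemp -d)
printf '%s\n' alpha beta gamma delta > "$workdir/words.txt"

# -n 1 hands each worker exactly one word; -P 4 runs up to four workers at once.
# Rough GNU Parallel equivalent (./process_word is a placeholder):
#   parallel --jobs 4 ./process_word {} < words.txt
xargs -P 4 -n 1 sh -c 'echo "processed $1"' _ < "$workdir/words.txt" \
  | sort > "$workdir/out.txt"

cat "$workdir/out.txt"
```

       | Each worker is a separate process, which is why this sidesteps
       | the GIL when the program happens to be a Python script.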
        
         | chungy wrote:
         | Thankfully both Debian and Arch patch out the citation
         | nonsense.
        
           | RhysU wrote:
           | It is "nonsense" because...?
           | 
           | A) You don't understand. Please read the "Citation notice"
           | section in the article.
           | 
           | B) You understand but don't use GNU Parallel.
           | 
           | C) You understand and use GNU Parallel in a non-academic
           | setting and find the hassle of supplying --no-notice to be
           | onerous vs the effort to write/maintain your own tool.
           | 
           | D) You understand and use GNU Parallel in an academic setting
           | and have cited Ole or plan to cite Ole.
           | 
            | From the article, nearly 10 years ago Ole added the citation
            | behavior after discussing it with his users:
            | https://lists.gnu.org/archive/html/parallel/2013-11/msg00006...
           | 
            | Ole's citations took off roughly coincident with this
            | behavior being added:
            | https://scholar.google.com/citations?hl=en&user=D7I0K34AAAAJ...
            | (click "Cited By" and notice the bar chart).
        
             | chungy wrote:
             | How about
             | 
             | E) I understand and use GNU Parallel and also completely
             | disagree with the author's insistence that citing tools is
             | appropriate.
             | 
              | Even in your second link, almost everything listed is a
              | paper about Parallel itself. If I were writing about
              | Parallel, I'd be fine with citing it. If instead it's the
              | means to another end, I wouldn't.
        
             | hexane360 wrote:
             | It's nonsense because the standard in academic settings is
             | to cite works which contribute scientifically to the
             | current work, not merely utilities. If I publish a paper on
             | a command line tool for parallel processing, inspired by
             | features from GNU parallel, I would cite GNU parallel. But
              | if I'm doing (for instance) computational biology work, I'm
              | not going to cite:
              | 
              | - the Linux kernel
              | - Python
              | - Matlab
              | - GNU parallel
              | - RFC 793
              | - Every other program I use
             | 
             | Asking for citations is fine. But GNU parallel wants to
             | treat it like a requirement of using the software, without
             | making it a condition of the copyright: "== Is the citation
             | notice compatible with GPLv3? ==
             | 
             | Yes. The wording has been cleared by Richard M. Stallman to
             | be compatible with GPLv3. This is because the citation
             | notice is not part of the license, but part of academic
             | tradition."
             | 
             | This is disingenuous, because citing every tool you use in
             | preparing a scientific work is _not_ part of academic
              | tradition. And the statement that "If you pay 10000 EUR
             | you should feel free to use GNU Parallel without citing."
             | doesn't make any sense in the "academic tradition" framing.
             | If Ole thinks citations are required by academic tradition,
             | that shouldn't change if I pay him enough money.
             | 
             | "If you disagree with Richard M. Stallman's interpretation
             | and feel the citation notice does not adhere to GPLv3, you
             | should treat the software as if it is not available under
             | GPLv3. And since GPLv3 is the only thing that would give
             | you the right to change it, you would not be allowed to
             | change the software.
             | 
             | In other words: If you want to remove the citation notice
             | to make the software compliant with your interpretation of
             | GPLv3, you first have to accept that the software is
             | already compliant with GPLv3, because nothing else gives
             | you the right to change it. And if you accept this, you do
             | not need to change it to make it compliant."
             | 
             | And this is legal nonsense. If I release something under a
             | license, and then break that license, that doesn't nullify
             | the original license. Claiming otherwise would allow me to
             | un-copyleft someone else's code.
        
               | justeleblanc wrote:
               | Also, in what world is OSS financed by citations (which
               | is stated as fact in the manpage)? The whole thing is
               | just bizarre. Do I have to cite the manufacturer of my
               | desk because I wrote my paper there?
        
               | [deleted]
        
               | RhysU wrote:
               | > It's nonsense because the standard in academic settings
               | is to cite works which contribute scientifically to the
               | current work, not merely utilities.
               | 
               | Whether or not it's standard is irrelevant. Ole asked you
               | to cite him if you use it. So, if you publish
               | academically, either don't use it or cite him. If not
               | using GNU Parallel hinders your science then the tool
               | must be material to your work flows.
               | 
               | For comparison, how many dumb citations do people add to
               | their papers that point to marginally relevant work
               | coming out of the same research center or academic
               | lineage? Those aren't scientifically relevant but they
               | are standard. Let's not pretend the academy is full of
               | citation purists.
        
               | hexane360 wrote:
               | "Ole asked you to cite him if you use it. So, if you
               | publish academically, either don't use it or cite him."
               | 
               | Why? Whether something has contributed meaningfully to my
               | research is my decision, not Ole's. Not having light
               | "hinders my science", so I'll be sure to cite Edison on
               | all my papers.
               | 
                | I agree with the sibling commenter that Ole's behavior
                | is jerkish. Not because he asked for citations, but because
                | he misleads users by claiming his request is standard,
               | when it is decidedly not. He also obfuscates the
               | voluntary nature of his request as much as possible, to
               | make it seem like citing is a legal requirement. And he
               | is inflammatory in responding to people who make the
               | perfectly valid decision to not cite him, or to patch the
               | notice out.
        
               | RhysU wrote:
               | > Why?
               | 
               | The Golden Rule.
               | 
               | You would be pissed if you spent years on something, felt
               | it was a contribution, saw the community use it, asked
               | them to cite it, and weren't cited.
        
               | chungy wrote:
               | At least in the real world, free software doesn't demand
               | that you agree with authors or do anything really. For as
               | long as Ole keeps Parallel as free software, we can use
               | it regardless of complying with requests.
               | 
               | Quite honestly, I think the behavior is on the highest
               | order of jerkishness. A nice request could be done in the
               | documentation, instead the path chosen is to bully users
               | of the software.
               | 
               | Once more, because it is free software, we are free to
               | use it despite what Ole thinks. We are free to patch it
               | out too.
        
               | RhysU wrote:
               | Ignoring his wishes and patching around them is also
               | being a jerk. The dude didn't have to open source or
               | maintain anything.
        
               | [deleted]
        
             | xyzzy_plugh wrote:
             | It's nonsense because a utility like parallel shouldn't
             | require state, let alone state used only to disable a nag
             | message. It's far less annoying to simply patch out the
             | nag.
             | 
             | As others point out, it's further annoying because it
             | doesn't even make any sense to begin with. If it was asking
             | for donations or something I could maybe even get behind
             | it, but the current message is pretentious and useless. It
             | serves no real purpose.
        
               | RhysU wrote:
               | Then hop on the mailing list and suggest he set up a
               | donation drop and donate.
        
       | ketzu wrote:
       | This was quite interesting to look through!
       | 
       | Perl 5.8.0 is over 20 years old
       | (https://dev.perl.org/perl5/news/2002/07/18/580ann/) while CentOS
       | 3.9 was released in 2007! It seems not-that-old and ancient at
       | the same time.
       | 
       | My personal anecdote with GNU Parallel was running into it while
       | working in academia. It worked well and saved me some time, but I
       | felt that it was unreasonable of a tool to ask for a citation to
       | parallelise a script - it seemed that matplotlib, jupyter and co
       | would need one as well. On the other hand, I decided to not use
       | it, because I also feel that authors can ask for whatever they
       | want.
        
         | Ferret7446 wrote:
         | It's a request, not a requirement. I see nothing wrong with the
         | request nor if an individual decides to not cite it due to
         | their principles/judgement.
        
         | ajsnigrutin wrote:
         | Yep, that's the great thing about perl... take a 20 year old
         | script and it still works today. In comparison, if they used
         | python, they'd be using python 2.2.
        
           | fmajid wrote:
           | That's basically a side-effect of Perl being a dead language,
           | frozen because Perl 6 will never happen. It's surprisingly
           | hard to eradicate, however.
        
             | uhtred wrote:
             | Perl 5 is still actively developed though, and presumably
             | will become Perl 7 at some point.
             | 
             | Why does a language being stable mean it's dead? Is Awk
             | dead?
        
             | VyseofArcadia wrote:
             | There's value in stability, though.
             | 
             | Maybe it's not dead. Maybe it's just finished. Does
             | everything need to keep changing? Change isn't always
             | improvement, and even if it is, if you have to maintain
             | backwards compatibility, sometimes the conceptual load of
             | having to keep the old ways and the new ways in your head
             | all the time isn't worth it.
             | 
             | Maybe we should start letting things just be finished.
        
             | attractivechaos wrote:
             | > _That's basically a side-effect of Perl being a dead
             | language_
             | 
             | Keeping long-term backward compatibility does not
             | necessarily mean dying. C is 50 years old and still alive.
             | I have written a lot more Perl than Python. IMHO, Perl is
             | dying because its syntax is arcane and confusing. We can't
             | solve this problem unless we design a brand new language.
        
             | chungy wrote:
             | Perl isn't dead, not by a long shot. Perl 6 happened too,
             | and because compatibility was never even really a thought,
             | renamed to Raku instead. There have been talks for a few
             | years of finally bumping Perl's major version in order to
             | change the defaults.
        
       | michalc wrote:
       | I've never been sure if it's too much of a hack, but I've used
       | GNU parallel in Docker containers as a quick and easy way of
       | getting multiple processes running for web applications.
       | 
       | And with the `--halt now,done=1` option (that I think is
       | relatively recent?) it means that if any of the parallel
       | processes exit, parallel would exit itself, the whole container
       | will shut down, and external orchestration would start another
       | one if needed.
        
         | KronisLV wrote:
         | I've used Supervisor pretty successfully for this as well:
         | http://supervisord.org/
         | 
         | Here's an example Dockerfile snippet that installs it in a
         | Debian/Ubuntu container during the container build:
         | 
         |     RUN apt-get update \
         |         && apt-get -yq --no-upgrade install \
         |             supervisor \
         |         && apt-get clean \
         |         && rm -rf /var/lib/apt/lists /var/cache/apt/*
         | 
         | Then it's possible to create a configuration file, for example
         | /etc/supervisord.conf, to specify what should run and how:
         | 
         |     [supervisord]
         |     nodaemon=true
         | 
         |     [program:php-fpm]
         |     command=/usr/sbin/php-fpm8.0 -c /etc/php/8.0/fpm/php-fpm.conf --nodaemonize
         |     stdout_logfile=/dev/stdout
         |     stdout_logfile_maxbytes=0
         |     stderr_logfile=/dev/stderr
         |     stderr_logfile_maxbytes=0
         | 
         |     [program:nginx]
         |     command=/usr/sbin/nginx
         |     stdout_logfile=/dev/stdout
         |     stdout_logfile_maxbytes=0
         |     stderr_logfile=/dev/stderr
         |     stderr_logfile_maxbytes=0
         | 
         | And finally it can be run inside of the container entrypoint,
         | along the lines of this in docker-entrypoint.sh:
         | 
         |     #!/bin/bash
         |     echo "Software versions..."
         |     nginx -V && supervisord --version
         | 
         |     echo "Running Supervisor..."
         |     supervisord --configuration=/etc/supervisord.conf
         | 
         | Here's more information about the configuration file format, in
         | case anyone is curious:
         | http://supervisord.org/configuration.html
         | 
         | It should be noted that this package will bring in some
         | dependencies, though, which may or may not be okay, depending
         | on how stringent you are about space usage and what's in your
         | containers. For example, in an Ubuntu container:
         | 
         |     The following NEW packages will be installed:
         |       libexpat1 libmpdec3 libpython3-stdlib libpython3.10-minimal
         |       libpython3.10-stdlib libreadline8 libsqlite3-0 media-types
         |       python3 python3-minimal python3-pkg-resources python3.10
         |       python3.10-minimal readline-common supervisor
         |     0 upgraded, 15 newly installed, 0 to remove and 0 not upgraded.
         |     Need to get 6905 kB of archives.
         |     After this operation, 25.7 MB of additional disk space will be used.
         | 
         | (just found the piece of software itself useful for this use
         | case, figured I'd share my experiences)
         | 
         | My problem is that it's not always immediately clear how
         | software that would normally run as a systemd service could be
         | launched in the foreground instead. It usually takes a bit of
         | digging around.
        
           | fbdab103 wrote:
           | This is pretty crafty. I do not know supervisor well enough -
           | if one of the services fails, can you engineer supervisor to
           | also crash so that it would bubble up to the container
           | infrastructure? My understanding is that standard supervisor
           | would let the process die and/or restart the service.
        
             | KronisLV wrote:
             | Supervisor allows you to have event listeners (e.g. for
             | processes quitting/crashing), so you can use those to
             | achieve that and kill supervisor itself. Here's an example
             | of people doing just that:
             | https://gist.github.com/tomazzaman/63265dfab3a9a61781993212f...
        
           | michalc wrote:
           | I have previously thought a bit about using something like
           | Supervisor. And if I was running something a bit closer to
           | the metal, with no other infrastructure to restart stuff,
           | then I would be much more pro.
           | 
           | But inside Docker, where something else already has the job
           | of restarting things if they fall over, it feels a bit
           | overcomplicated, since there are then multiple ways of doing
           | the restarting. Plus, I think there is a touch more
           | visibility - it's all just command line arguments to parallel:
           | 
           |     parallel --will-cite --line-buffer --jobs 2 --halt now,done=1 ::: \
           |         "some_proc some args" \
           |         "another_proc some more args"
        
         | vrnvu wrote:
         | Cool tip, thanks for sharing! I love letting processes crash
         | *when possible* on failures so the OS restarts them for me,
         | versus trying to handle it manually at the process level.
        
       | rurban wrote:
       | I wrote down a small usage example here:
       | https://savannah.gnu.org/forum/forum.php?forum_id=9197
       | 
       | No need for massive distributed clusters when you have a simple
       | perl oneliner
        
       | rockwotj wrote:
       | I recently used parallel to write a 1TB data file for testing
       | using all cores:
       | 
       |     seq 0 10000 | parallel dd if=/dev/urandom of=/mnt/foo/input bs=10M count=10 seek={}0
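       | 
       | How the offsets line up: seek={}0 appends a literal 0 to each
       | sequence number, so job N starts at block N*10, and its
       | count=10 blocks fill the range up to where job N+1 starts.
       | Below is a scaled-down sketch of the same layout (ten jobs,
       | 1 KiB blocks), with xargs -P standing in for parallel so it
       | runs anywhere. The temp path, the small sizes and conv=notrunc
       | are assumptions of this sketch, not part of the command above:
       | 

```shell
#!/bin/sh
# Scaled-down sketch: 10 jobs x 10 blocks x 1 KiB instead of ~1 TB.
set -eu

workdir=$(mktemp -d)
out="$workdir/input"

# Job i gets seek="${i}0", i.e. block 10*i, and writes blocks 10*i .. 10*i+9.
# conv=notrunc (an addition in this sketch) stops a dd from truncating the
# file at its own seek offset while other jobs are writing further along.
seq 0 9 | xargs -P 4 -I{} \
  dd if=/dev/urandom of="$out" bs=1024 count=10 seek={}0 conv=notrunc 2>/dev/null

# 100 blocks of 1024 bytes should have been written in total.
size=$(wc -c < "$out")
echo "$size"
```

       | The same arithmetic gives the full-size run: 10001 jobs of
       | 10 x 10M each, so roughly 1 TB.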
        
         | codetrotter wrote:
         | Was it noticeably different from
         | 
         |     dd if=/dev/urandom of=/mnt/foo/input bs=10M count=100000
         | 
         | in the amount of time that it took?
        
       | BooneJS wrote:
       | Before GNU Parallel I used to use Ruby's workers and job queue to
       | keep ${N} cores busy with work. It sorta worked like GNU parallel
       | but was quite basic. I've since switched to using GNU Parallel.
       | Stable code I don't have to write doesn't have to be
       | maintained... not to mention it has more features than I normally
       | supported.
        
         | Alifatisk wrote:
         | What did you use exactly? I am curious, Resque? Sidekiq?
        
       | anthk wrote:
       | Parallel, vidir to edit directories with nvi/vim, moreutils,
       | detox to strip out any non-typeable chars...
       | 
       | These are a must have today.
        
         | InfamousRece wrote:
         | moreutils has its own parallel utility that I actually prefer
         | to GNU parallel.
        
           | [deleted]
        
           | anthk wrote:
           | No problems, they almost work the same I think. Oh, another
           | bunch of small tools to help yourself:
           | 
           | - entr. It runs a command on file/directory changes.
           | - spt. Simple pomodoro technique. A good timer to help yourself
           |   to work and take rests.
           | - herbe. It works great as a notifier for spt. Add "play" from
           |   sox to write a script to both notify and play a sound in
           |   parallel.
           | - sox/ffmpeg/imagemagick. Audio, video and image production and
           |   conversion on the CLI. A must have.
           | - catdoc/antiword/odt2txt/wordgrinder+sc-im+gnuplot.
           |   Word/Excel/Libreoffice files reading and editing on the
           |   terminal. Gnuplot will help with sc-im. This can be a beast
           |   over SSH. With Gnuplot compiled with sixel support (and
           |   XTerm) you can do magic.
           | - iomenu. Pick from a list of items (one per line) and choose.
           |   I think it has fuzzy-finding matches. For example:
           | 
           |       cat bookmarks.txt | iomenu | xargs firefox
           | 
           | I have several more. Simple battery meter (sbm), grabc to
           | grab a color from the screen, pointtools+catpoint to do
           | "presentations" over a terminal, nncp-go+yggdrasil for ad-hoc
           | networking and secure encrypted backups between devices...
        
           | andrewshadura wrote:
           | There's also paexec
        
       | docandrew wrote:
       | I couldn't make heads or tails of what this would be useful for
       | from the OP (maybe it's something I should already have known),
       | but this from the official site was pretty helpful:
       | https://www.gnu.org/software/parallel/parallel_cheat.pdf
        
         | psychphysic wrote:
         | That cheat sheet is super enlightening!
         | 
         | But quite useless as it'll print poorly and is overall a waste
         | of resources to have that lovely beach scene in the background.
        
           | kakadzhun wrote:
           | Try this resource instead. Although it is 100 pages, the
           | introductory part is already useful in and of itself!
           | 
           | https://zenodo.org/record/1146014/files/GNU_Parallel_2018.pd...
        
             | chungy wrote:
             | Does Ole remember to cite LibreOffice in the production of
             | that document?
        
           | RadiozRadioz wrote:
           | The beach will certainly make the cheat sheet stick in my
           | memory, I can tell you that much.
        
           | bloopernova wrote:
           | I was able to remove the background using LibreOffice to open
           | the PDF.
        
       | imglorp wrote:
       | Don't forget "make -j" is another option.
        
         | fbdab103 wrote:
         | I was just attempting to parallelize a makefile (~500 files,
         | ~20 minutes per file), and I was not happy with the experience.
         | Make syntax for globbing is not ideal. Doubly so as my files
         | had spaces inside of them. All solvable of course, but I feel
         | more comfortable leaning on a parallel/xargs/find workflow than
         | esoteric make syntax to handle the realities of filenames in
         | the wild.
         | 
         | Which is a shame - 95% of my make usage is PHONY targets where
         | I have a task and not a generated artifact. My current use case
         | would have greatly benefited from the native parallelism and
         | the ability to restart only failed files.
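         | 
         | For reference, a minimal sketch of the find/xargs workflow I
         | mean, with NUL-delimited names so spaces survive intact. The
         | file names and the inline printf worker are made up for
         | illustration; the roughly equivalent parallel call is in a
         | comment, with ./process_file as a placeholder:
         | 

```shell
#!/bin/sh
# Sketch only: the sample files and the printf "worker" are hypothetical.
set -eu

workdir=$(mktemp -d)
touch "$workdir/first file.txt" "$workdir/second file.txt" "$workdir/third.txt"

# -print0 and -0 separate names with NUL bytes, so embedded spaces are safe.
# -n 1 passes one file per worker; -P 4 runs up to four workers at once.
# Rough GNU Parallel equivalent (./process_file is a placeholder):
#   find . -name '*.txt' -print0 | parallel -0 --jobs 4 ./process_file {}
find "$workdir" -name '*.txt' -print0 \
  | xargs -0 -P 4 -n 1 sh -c 'printf "%s\n" "${1##*/}"' _ \
  | sort > "$workdir/processed.list"

cat "$workdir/processed.list"
```

         | As far as I know, parallel's --joblog together with
         | --resume-failed covers the "rerun only what failed" part; I
         | have not wired that into this sketch.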
        
         | fmajid wrote:
         | Or `xargs -P`
        
       | a2800276 wrote:
       | Wait what: `parallel` is a Perl script!? [1]
       | 
       | I would have thought it's black magic with assembler
       | optimisations for MIPS and special considerations for HP-UX...
       | 
       | This is such a lovely and interesting writeup, it's wonderful
       | that people take their time to share so generously!
       | 
       | [1] : an 11k LOC Perl script, you can read along here:
       | https://github.com/gitGNU/gnu_parallel/blob/master/src/paral...
        
         | mhh__ wrote:
         | assembly optimizations for starting processes?
        
           | remram wrote:
           | Maybe for reading the input, splitting it, and assembling the
           | possibly-very-long argument lists passed to the processes.
        
       | NortySpock wrote:
       | I found GNU parallel useful when I wanted to queue up transcoding
       | of flac files to mp3 on my Raspberry Pi. A few ffmpeg flags plus
       | a list of files meant I could easily just saturate one job per
       | core with a one-line bash command.
        
         | krylon wrote:
         | I like to use ts(1) for that.
         | http://vicerveza.homeunix.net/~viric/soft/ts/
        
         | hkt wrote:
         | I've used it to parallelise updating hundreds of helm releases
         | whose CI pipelines had ceased to exist. It is a neat tool.
        
           | noloblo wrote:
           | Can you please share the example code in GNU Parallel?
        
       ___________________________________________________________________
       (page generated 2023-03-19 23:02 UTC)