[HN Gopher] Design of GNU Parallel (2015)
___________________________________________________________________
Design of GNU Parallel (2015)
Author : Havelock
Score : 142 points
Date : 2023-03-19 02:49 UTC (20 hours ago)
(HTM) web link (www.gnu.org)
(TXT) w3m dump (www.gnu.org)
| ZoomZoomZoom wrote:
| If anyone needs a pretty basic alternative with Windows support,
| there's Rush:
|
| https://github.com/shenwei356/rush
|
| I use it pretty extensively with ffmpeg, imagemagick and the
| like.
|
| I'd been using the mmstick/parallel for a while, but it moved to
| RedoxOS repos and then stopped being updated, while still having
| some issues not ironed out.
| globalreset wrote:
| What's the best rewrite of GNU Parallel in Rust? That citation
| thing is so annoying.
| seized wrote:
| Parallel is a fun tool. I use it as a sort of simple slurm to
| distribute work over many VMs to process tens to hundreds of TBs
| of data. Sometimes across 2400+ cores.
| cricalix wrote:
| parallel is a tool I've reached for many times; the citation bit
| it prints is odd - it seems to assume that the general use case
| is research/academic - but easily squelched.
|
| A sample use case would be having a file that has words in it,
| one per line, and you want to run a program that operates on each
| word (device name, dollar amount, whatever). Sure, you can use a
| loop, but if the words and actions are independent, parallel is
| one way to spin up N copies of your program and pass it a single
| word from the file. Can get around Python's GIL without having to
| use multiprocessing or threads (as a more concrete example).
|
| Didn't realise that it busy waits, but I'm typically running it
| on a not very busy server with tens of cores.
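A minimal sketch of the word-per-line use case described above. It uses `xargs -P` (mentioned elsewhere in this thread) as a stand-in so it runs without GNU Parallel installed. The file name `words.txt` and the inline upper-casing worker are made up for illustration; they are not from the comment.

```shell
#!/bin/sh
# Sketch: run one worker per word, at most 4 at a time.
# words.txt and the worker are illustrative assumptions.
set -eu

workdir=$(mktemp -d)
printf '%s\n' alpha beta gamma delta > "$workdir/words.txt"

# The inline worker stands in for "your program": it receives one word
# as $1 and prints it upper-cased.
#   -n 1  passes a single word to each invocation
#   -P 4  keeps up to four invocations running concurrently
# Output order depends on scheduling, so it is sorted afterwards.
xargs -P 4 -n 1 sh -c 'printf "%s\n" "$1" | tr "[:lower:]" "[:upper:]"' worker \
  < "$workdir/words.txt" | sort > "$workdir/out.txt"

cat "$workdir/out.txt"

# With GNU Parallel installed, the fan-out could instead be written as:
#   parallel -j 4 ./your_program {} :::: words.txt
```

Because each word is handled by a separate process, this also sidesteps a per-process limit such as Python's GIL, at the cost of one process start per word.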
| chungy wrote:
| Thankfully both Debian and Arch patch out the citation
| nonsense.
| RhysU wrote:
| It is "nonsense" because...?
|
| A) You don't understand. Please read the "Citation notice"
| section in the article.
|
| B) You understand but don't use GNU Parallel.
|
| C) You understand and use GNU Parallel in a non-academic
| setting and find the hassle of supplying --no-notice to be
| onerous vs the effort to write/maintain your own tool.
|
| D) You understand and use GNU Parallel in an academic setting
| and have cited Ole or plan to cite Ole.
|
| From the article, nearly 10 years ago Ole added the citation
| behavior after discussing it with his users:
| https://lists.gnu.org/archive/html/parallel/2013-11/msg00006...
|
| Ole's citations took off roughly coincident with this
| behavior being added:
| https://scholar.google.com/citations?hl=en&user=D7I0K34AAAAJ...
| (click "Cited By" and notice the bar chart).
| chungy wrote:
| How about
|
| E) I understand and use GNU Parallel and also completely
| disagree with the author's insistence that citing tools is
| appropriate.
|
| Even in your second link, almost everything listed are
| papers about Parallel itself. If I was writing about
| Parallel, I'd be fine with citing it. If instead it's the
| means to another end, I wouldn't.
| hexane360 wrote:
| It's nonsense because the standard in academic settings is
| to cite works which contribute scientifically to the
| current work, not merely utilities. If I publish a paper on
| a command line tool for parallel processing, inspired by
| features from GNU parallel, I would cite GNU parallel. But
| if I'm doing (for instance) computational biology work, I'm
| not going to cite:
|
| - the Linux kernel
| - Python
| - Matlab
| - GNU parallel
| - RFC 793
| - Every other program I use
|
| Asking for citations is fine. But GNU parallel wants to
| treat it like a requirement of using the software, without
| making it a condition of the copyright: "== Is the citation
| notice compatible with GPLv3? ==
|
| Yes. The wording has been cleared by Richard M. Stallman to
| be compatible with GPLv3. This is because the citation
| notice is not part of the license, but part of academic
| tradition."
|
| This is disingenuous, because citing every tool you use in
| preparing a scientific work is _not_ part of academic
| tradition. And the statement that "If you pay 10000 EUR
| you should feel free to use GNU Parallel without citing."
| doesn't make any sense in the "academic tradition" framing.
| If Ole thinks citations are required by academic tradition,
| that shouldn't change if I pay him enough money.
|
| "If you disagree with Richard M. Stallman's interpretation
| and feel the citation notice does not adhere to GPLv3, you
| should treat the software as if it is not available under
| GPLv3. And since GPLv3 is the only thing that would give
| you the right to change it, you would not be allowed to
| change the software.
|
| In other words: If you want to remove the citation notice
| to make the software compliant with your interpretation of
| GPLv3, you first have to accept that the software is
| already compliant with GPLv3, because nothing else gives
| you the right to change it. And if you accept this, you do
| not need to change it to make it compliant."
|
| And this is legal nonsense. If I release something under a
| license, and then break that license, that doesn't nullify
| the original license. Claiming otherwise would allow me to
| un-copyleft someone else's code.
| justeleblanc wrote:
| Also, in what world is OSS financed by citations (which
| is stated as fact in the manpage)? The whole thing is
| just bizarre. Do I have to cite the manufacturer of my
| desk because I wrote my paper there?
| [deleted]
| RhysU wrote:
| > It's nonsense because the standard in academic settings
| is to cite works which contribute scientifically to the
| current work, not merely utilities.
|
| Whether or not it's standard is irrelevant. Ole asked you
| to cite him if you use it. So, if you publish
| academically, either don't use it or cite him. If not
| using GNU Parallel hinders your science then the tool
| must be material to your work flows.
|
| For comparison, how many dumb citations do people add to
| their papers that point to marginally relevant work
| coming out of the same research center or academic
| lineage? Those aren't scientifically relevant but they
| are standard. Let's not pretend the academy is full of
| citation purists.
| hexane360 wrote:
| "Ole asked you to cite him if you use it. So, if you
| publish academically, either don't use it or cite him."
|
| Why? Whether something has contributed meaningfully to my
| research is my decision, not Ole's. Not having light
| "hinders my science", so I'll be sure to cite Edison on
| all my papers.
|
| I agree with the sibling commentator that Ole's behavior
| is jerkish. Not because he asked for citations, but that
| he misleads users by claiming his request is standard,
| when it is decidedly not. He also obfuscates the
| voluntary nature of his request as much as possible, to
| make it seem like citing is a legal requirement. And he
| is inflammatory in responding to people who make the
| perfectly valid decision to not cite him, or to patch the
| notice out.
| RhysU wrote:
| > Why?
|
| The Golden Rule.
|
| You would be pissed if you spent years on something, felt
| it was a contribution, saw the community use it, asked
| them to cite it, and weren't cited.
| chungy wrote:
| At least in the real world, free software doesn't demand
| that you agree with authors or do anything really. For as
| long as Ole keeps Parallel as free software, we can use
| it regardless of complying with requests.
|
| Quite honestly, I think the behavior is on the highest
| order of jerkishness. A nice request could be done in the
| documentation, instead the path chosen is to bully users
| of the software.
|
| Once more, because it is free software, we are free to
| use it despite what Ole thinks. We are free to patch it
| out too.
| RhysU wrote:
| Ignoring his wishes and patching around them is also
| being a jerk. The dude didn't have to open source or
| maintain anything.
| [deleted]
| xyzzy_plugh wrote:
| It's nonsense because a utility like parallel shouldn't
| require state, let alone state used only to disable a nag
| message. It's far less annoying to simply patch out the
| nag.
|
| As others point out, it's further annoying because it
| doesn't even make any sense to begin with. If it was asking
| for donations or something I could maybe even get behind
| it, but the current message is pretentious and useless. It
| serves no real purpose.
| RhysU wrote:
| Then hop on the mailing list and suggest he set up a
| donation drop and donate.
| ketzu wrote:
| This was quite interesting to look through!
|
| Perl 5.8.0 is over 20 years old
| (https://dev.perl.org/perl5/news/2002/07/18/580ann/), while CentOS
| 3.9 was released in 2007! It seems not that old and ancient at the
| same time.
|
| My personal anecdote with gnu parallel was running into it while
| working in academia. It worked well and saved me some time, but I
| felt that it was unreasonable of a tool to ask for a citation to
| parallelise a script - it seemed that matplotlib, jupyter and co
| would need one as well. On the other hand, I decided to not use
| it, because I also feel that authors can ask for whatever they
| want.
| Ferret7446 wrote:
| It's a request, not a requirement. I see nothing wrong with the
| request nor if an individual decides to not cite it due to
| their principles/judgement.
| ajsnigrutin wrote:
| Yep, that's the great thing about perl... take a 20 year old
| script and it still works today. In comparison, if they used
| python, they'd be using python 2.2.
| fmajid wrote:
| That's basically a side-effect of Perl being a dead language,
| frozen because Perl 6 will never happen. It's surprisingly
| hard to eradicate, however.
| uhtred wrote:
| Perl 5 is still actively developed though, and presumably
| will become Perl 7 at some point.
|
| Why does a language being stable mean it's dead? Is Awk
| dead?
| VyseofArcadia wrote:
| There's value in stability, though.
|
| Maybe it's not dead. Maybe it's just finished. Does
| everything need to keep changing? Change isn't always
| improvement, and even if it is, if you have to maintain
| backwards compatibility, sometimes the conceptual load of
| having to keep the old ways and the new ways in your head
| all the time isn't worth it.
|
| Maybe we should start letting things just be finished.
| attractivechaos wrote:
| > _That's basically a side-effect of Perl being a dead
| language_
|
| Keeping long-term backward compatibility does not
| necessarily mean dying. C is 50 years old and still alive.
| I have written a lot more Perl than Python. IMHO, Perl is
| dying because its syntax is arcane and confusing. We can't
| solve this problem unless we design a brand new language.
| chungy wrote:
| Perl isn't dead, not by a long shot. Perl 6 happened too,
| and because compatibility was never even really a thought, it was
| renamed to Raku instead. There have been talks for a few years
| of finally bumping Perl's major version in order to change
| the defaults.
| michalc wrote:
| I've never been sure if it's too much of a hack, but I've used
| GNU parallel in Docker containers as a quick and easy way of
| getting multiple processes running for web applications.
|
| And with the `--halt now,done=1` option (that I think is
| relatively recent?) it means that if any of the parallel
| processes exit, parallel would exit itself, the whole container
| will shut down, and external orchestration would start another
| one if needed.
| KronisLV wrote:
| I've used Supervisor pretty successfully for this as well:
| http://supervisord.org/
|
| Here's an example Dockerfile that installs it in a Debian/Ubuntu
| container during the container build:
|
|     RUN apt-get update \
|         && apt-get -yq --no-upgrade install \
|             supervisor \
|         && apt-get clean \
|         && rm -rf /var/lib/apt/lists /var/cache/apt/*
|
| Then it's possible to create a configuration file, for example
| /etc/supervisord.conf, to specify what should run and how:
|
|     [supervisord]
|     nodaemon=true
|
|     [program:php-fpm]
|     command=/usr/sbin/php-fpm8.0 -c /etc/php/8.0/fpm/php-fpm.conf --nodaemonize
|     stdout_logfile=/dev/stdout
|     stdout_logfile_maxbytes=0
|     stderr_logfile=/dev/stderr
|     stderr_logfile_maxbytes=0
|
|     [program:nginx]
|     command=/usr/sbin/nginx
|     stdout_logfile=/dev/stdout
|     stdout_logfile_maxbytes=0
|     stderr_logfile=/dev/stderr
|     stderr_logfile_maxbytes=0
|
| And finally it can be run inside of the container entrypoint,
| along the lines of this in docker-entrypoint.sh:
|
|     #!/bin/bash
|     echo "Software versions..."
|     nginx -V && supervisord --version
|     echo "Running Supervisor..."
|     supervisord --configuration=/etc/supervisord.conf
|
| Here's more information about the configuration file format, in
| case anyone is curious:
| http://supervisord.org/configuration.html
|
| It should be noted that this package will bring in some
| dependencies, though, which may or may not be okay, depending
| on how stringent you are about space usage and what's in your
| containers. Example for an Ubuntu container:
|
|     The following NEW packages will be installed:
|       libexpat1 libmpdec3 libpython3-stdlib libpython3.10-minimal
|       libpython3.10-stdlib libreadline8 libsqlite3-0 media-types
|       python3 python3-minimal python3-pkg-resources python3.10
|       python3.10-minimal readline-common supervisor
|     0 upgraded, 15 newly installed, 0 to remove and 0 not upgraded.
|     Need to get 6905 kB of archives.
|     After this operation, 25.7 MB of additional disk space will be used.
|
| (just found the piece of software itself useful for this use
| case, figured I'd share my experiences)
|
| My problem is that it's not always immediately clear how
| software that would normally run as a systemd service could be
| launched in the foreground instead. It usually takes a bit of
| digging around.
| fbdab103 wrote:
| This is pretty crafty. I do not know supervisor well enough -
| if one of the services fails, can you engineer supervisor to
| also crash so that it would bubble up to the container
| infrastructure? My understanding is that standard supervisor
| would let the process die and/or restart the service.
| KronisLV wrote:
| Supervisor allows you to have event listeners (e.g. for
| processes quitting/crashing), so you can use those to
| achieve that and kill supervisor itself. Here's an example
| of people doing just that:
| https://gist.github.com/tomazzaman/63265dfab3a9a61781993212f...
| michalc wrote:
| I have previously thought a bit about using something like
| Supervisor. And if I was running something a bit closer to
| the metal, with no other infrastructure to restart stuff,
| then I would be much more pro.
|
| But inside Docker, where something else already has the job
| of restarting things if they fall over, it feels a bit
| over complicated in that there are multiple ways of doing the
| restarting. Plus, I think there is a touch more visibility -
| it's all just command line arguments to parallel:
|
|     parallel --will-cite --line-buffer --jobs 2 --halt now,done=1 ::: \
|         "some_proc some args" \
|         "another_proc some more args"
| vrnvu wrote:
| Cool tip, thanks for sharing! I love letting processes crash *when
| possible* on failures so the OS restarts them for me, versus
| trying to handle it manually at the process level.
| rurban wrote:
| I wrote down a small usage example here:
| https://savannah.gnu.org/forum/forum.php?forum_id=9197
|
| No need for massive distributed clusters when you have a simple
| perl oneliner
| rockwotj wrote:
| I recently used parallel to write a 1TB data file for testing
| using all cores:
|
|     seq 0 10000 | parallel dd if=/dev/urandom of=/mnt/foo/input bs=10M count=10 seek={}0
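The arithmetic behind `seek={}0` in the command above can be sketched as follows. Nothing is written to disk; the script only computes where each job would start and how much data the whole run covers. The helper name `job_offset` is made up for illustration, and the block size assumes GNU dd's convention that `M` means 1024*1024 bytes.

```shell
#!/bin/sh
# Sketch of the offset arithmetic for: bs=10M count=10 seek={}0
set -eu

bs=$((10 * 1024 * 1024))   # bs=10M, assuming M = 1024*1024 bytes
count=10                   # blocks written by each job

# Byte offset at which job number $1 starts writing.
# parallel substitutes {} textually, so "{}0" appends a zero to the job
# number: the seek value (in blocks) is the job number times ten.
job_offset() {
  seek="${1}0"
  echo $((seek * bs))
}

per_job=$((count * bs))        # bytes written by one job
total=$((10001 * per_job))     # seq 0 10000 yields 10001 jobs

echo "job 0 starts at byte $(job_offset 0)"
echo "job 1 starts at byte $(job_offset 1)"
echo "each job writes $per_job bytes, $total bytes in total"
```

Each job seeks ten blocks further than the previous one and writes exactly ten blocks, so the chunks sit back to back without gaps or overlap. One caveat, hedged because it was not raised in the thread: as far as I know, GNU dd truncates the output file at the seek offset unless `conv=notrunc` is given, so adding it may be a sensible precaution when several jobs write into the same file.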
| codetrotter wrote:
| Was it noticeably different from
|
|     dd if=/dev/urandom of=/mnt/foo/input bs=10M count=100000
|
| in the amount of time that it took?
| BooneJS wrote:
| Before GNU Parallel I used to use Ruby's workers and job queue to
| keep ${N} cores busy with work. It sorta worked like GNU parallel
| but was quite basic. I've since switched to using GNU Parallel.
| Stable code I don't have to write doesn't have to be
| maintained... not to mention it has more features than I normally
| supported.
| Alifatisk wrote:
| What did you use exactly? I am curious, Resque? Sidekiq?
| anthk wrote:
| Parallel, vidir to edit directories with nvi/vim, moreutils,
| detox to scrap out any non-typeable char...
|
| These are a must have today.
| InfamousRece wrote:
| moreutils has its own parallel utility that I actually prefer
| to GNU parallel.
| [deleted]
| anthk wrote:
| No problems, they almost work the same I think. Oh, another
| bunch of small tools to help yourself:
|
| - entr. It runs a command on file/directory changes.
| - spt. Simple pomodoro technique. A good timer to help yourself
|   to work and take rests.
| - herbe. It works great as a notifier for spt. Add "play" from
|   sox to write a script to both notify and play a sound in
|   parallel.
| - sox/ffmpeg/imagemagick. Audio, video and image production and
|   conversion on the CLI. A must have.
| - catdoc/antiword/odt2txt/wordgrinder+sc-im+gnuplot.
|   Word/Excel/Libreoffice files reading and editing on the
|   terminal. Gnuplot will help with sc-im. This can be a beast
|   over SSH. With Gnuplot compiled with sixel support (and
|   XTerm) you can do magic.
| - iomenu. cat bookmarks.txt | iomenu | xargs firefox. Pick from
|   a list of items (one per line) and choose. I think it has
|   fuzzy-finding matches.
|
| I have several more. Simple battery meter (sbm), grabc to
| grab a color from the screen, pointtools+catpoint to do
| "presentations" over a terminal, nncp-go+yggdrasil for ad-hoc
| networking and secure encrypted backups between devices...
| andrewshadura wrote:
| There's also paexec
| docandrew wrote:
| I couldn't make heads or tails of what this would be useful for
| from the OP (maybe it's something I should already have known),
| but this from the official site was pretty helpful:
| https://www.gnu.org/software/parallel/parallel_cheat.pdf
| psychphysic wrote:
| That cheat sheet is super enlightening!
|
| But quite useless as it'll print poorly and is overall a waste
| of resources to have that lovely beach scene in the background.
| kakadzhun wrote:
| Try this resource instead. Although it is 100 pages, the
| introductory part is already useful in and of itself!
|
| https://zenodo.org/record/1146014/files/GNU_Parallel_2018.pd...
| chungy wrote:
| Does Ole remember to cite LibreOffice in the production of
| that document?
| RadiozRadioz wrote:
| The beach will certainly make the cheat sheet stick in my
| memory, I can tell you that much.
| bloopernova wrote:
| I was able to remove the background using LibreOffice to open
| the PDF.
| imglorp wrote:
| Don't forget "make -j" is another option.
| fbdab103 wrote:
| I was just attempting to parallelize a makefile (~500 files,
| ~20 minutes per file), and I was not happy with the experience.
| Make syntax for globbing is not ideal. Doubly so as my files
| had spaces inside of them. All solvable of course, but I feel
| more comfortable leaning on a parallel/xargs/find workflow than
| esoteric make syntax to handle the realities of filenames in
| the wild.
|
| Which is a shame - 95% of my make usage is PHONY targets where
| I have a task and not a generated artifact. My current use case
| would have greatly benefited from the native parallelism and
| the ability to restart only failed files.
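The find/xargs workflow mentioned above might look roughly like the sketch below. It is an assumption-laden illustration, not the commenter's setup: the directory layout, the file names, and the `cp` worker (a placeholder for the real 20-minute task) are all made up. Names are passed NUL-separated so that spaces need no special handling, and inputs whose output already exists are skipped, which gives a crude form of "rerun only what is missing".

```shell
#!/bin/sh
# Sketch: process files with spaces in their names, three at a time.
set -eu

workdir=$(mktemp -d)
mkdir "$workdir/in" "$workdir/out"
printf 'one\n'   > "$workdir/in/first file.txt"
printf 'two\n'   > "$workdir/in/second file.txt"
printf 'three\n' > "$workdir/in/third.txt"

# find -print0 and xargs -0 delimit names with NUL bytes.
# The worker receives the output directory as $1 and one input file as $2.
# It skips inputs that already have an output, then "processes" the rest
# (here simply by copying them).
find "$workdir/in" -type f -name '*.txt' -print0 |
  xargs -0 -P 3 -n 1 sh -c '
    out="$1/$(basename "$2")"
    [ -e "$out" ] || cp "$2" "$out"
  ' worker "$workdir/out"

ls "$workdir/out"
```

The existence check treats a partially written output as done, so a real task would probably write to a temporary name and rename on success.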
| fmajid wrote:
| Or `xargs -P`
| a2800276 wrote:
| Wait what: `parallel` is a Perl script!? [1]
|
| I would have thought it's black magic with assembler
| optimisations for MIPS and special considerations for HP-UX...
|
| This is such a lovely and interesting writeup, it's wonderful
| that people take their time to share so generously!
|
| [1] : an 11k LOC Perl script, you can read along here:
| https://github.com/gitGNU/gnu_parallel/blob/master/src/paral...
| mhh__ wrote:
| assembly optimizations for starting processes?
| remram wrote:
| Maybe for reading the input, splitting it, and assembling the
| possibly-very-long argument lists passed to the processes.
| NortySpock wrote:
| I found GNU parallel useful when I wanted to queue up transcoding
| of flac files to mp3 on my Raspberry Pi. A few ffmpeg flags plus
| a list of files meant I could easily just saturate one job per
| core with a one-line bash command.
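A dry-run sketch of a "one job per core" transcode like the one described above. It only prints the ffmpeg commands instead of running them, and uses `xargs -P` as a stand-in for GNU Parallel. The file names and the encoder flags (`-codec:a libmp3lame -qscale:a 2`) are assumptions for illustration, not the commenter's actual settings.

```shell
#!/bin/sh
# Sketch: plan one FLAC-to-MP3 job per core. Commands are printed, not run.
set -eu

# Number of online cores, falling back to 1 if getconf cannot report it.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Hypothetical file list, one name per line and without spaces;
# in practice it might come from: find . -name '*.flac'
list=$(mktemp)
printf '%s\n' a.flac b.flac c.flac > "$list"

# For each input, derive the .mp3 name by stripping the .flac suffix and
# print the ffmpeg command. Removing the leading "echo" in the worker
# would run ffmpeg for real.
plan=$(xargs -P "$cores" -n 1 sh -c \
  'echo ffmpeg -i "$1" -codec:a libmp3lame -qscale:a 2 "${1%.flac}.mp3"' worker \
  < "$list" | sort)

printf '%s\n' "$plan"
```

GNU Parallel runs one job per core by default and offers `{.}` for "input without its extension", so the same idea fits on one line there: `parallel ffmpeg -i {} -codec:a libmp3lame -qscale:a 2 {.}.mp3 ::: *.flac`.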
| krylon wrote:
| I like to use ts(1) for that.
| http://vicerveza.homeunix.net/~viric/soft/ts/
| hkt wrote:
| I've used it to parallelise updating hundreds of helm releases
| whose CI pipelines had ceased to exist. It is a neat tool.
| noloblo wrote:
| Can you please share the example code in gnu parallel
___________________________________________________________________
(page generated 2023-03-19 23:02 UTC)