add xargs article - www.codemadness.org - www.codemadness.org saait content files
(HTM) git clone git://git.codemadness.org/www.codemadness.org
---
(DIR) commit 32c11fc0471f7a0e7354a089ff663668863701fe
(DIR) parent 53575c812355488a857c20d86f09ce787a956adc
(HTM) Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Wed, 22 Nov 2023 19:31:51 +0100
add xargs article
Diffstat:
M config.cfg | 2 +-
M output/atom.xml | 14 +++++++++++++-
M output/atom_content.xml | 181 ++++++++++++++++++++++++++++++-
M output/index | 1 +
M output/index.html | 1 +
M output/rss.xml | 8 ++++++++
M output/rss_content.xml | 174 +++++++++++++++++++++++++++++++
M output/sitemap.xml | 4 ++++
M output/twtxt.txt | 1 +
M output/urllist.txt | 1 +
A output/xargs.html | 218 +++++++++++++++++++++++++++++++
A output/xargs.md | 188 +++++++++++++++++++++++++++++++
A pages/xargs.cfg | 6 ++++++
A pages/xargs.md | 188 +++++++++++++++++++++++++++++++
14 files changed, 984 insertions(+), 3 deletions(-)
---
(DIR) diff --git a/config.cfg b/config.cfg
@@ -1,5 +1,5 @@
# last updated the site.
-siteupdated = 2023-11-20
+siteupdated = 2023-11-22
sitetitle = Codemadness
siteurl = https://www.codemadness.org
(DIR) diff --git a/output/atom.xml b/output/atom.xml
@@ -2,11 +2,23 @@
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
<title>Codemadness</title>
<subtitle>blog with various projects and articles about computer-related things</subtitle>
- <updated>2023-11-20T00:00:00Z</updated>
+ <updated>2023-11-22T00:00:00Z</updated>
<link rel="alternate" type="text/html" href="https://www.codemadness.org" />
<id>https://www.codemadness.org/atom.xml</id>
<link rel="self" type="application/atom+xml" href="https://www.codemadness.org/atom.xml" />
<entry>
+ <title>xargs: an example for batch jobs</title>
+ <link rel="alternate" type="text/html" href="https://www.codemadness.org/xargs.html" />
+ <id>https://www.codemadness.org/xargs.html</id>
+ <updated>2023-11-22T00:00:00Z</updated>
+ <published>2023-11-22T00:00:00Z</published>
+ <author>
+ <name>Hiltjo</name>
+ <uri>https://www.codemadness.org</uri>
+ </author>
+ <summary>xargs: an example for batch jobs</summary>
+</entry>
+<entry>
<title>Improved Youtube RSS/Atom feed</title>
<link rel="alternate" type="text/html" href="https://www.codemadness.org/youtube-feed.html" />
<id>https://www.codemadness.org/youtube-feed.html</id>
(DIR) diff --git a/output/atom_content.xml b/output/atom_content.xml
@@ -2,11 +2,190 @@
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
<title>Codemadness</title>
<subtitle>blog with various projects and articles about computer-related things</subtitle>
- <updated>2023-11-20T00:00:00Z</updated>
+ <updated>2023-11-22T00:00:00Z</updated>
<link rel="alternate" type="text/html" href="https://www.codemadness.org" />
<id>https://www.codemadness.org/atom_content.xml</id>
<link rel="self" type="application/atom+xml" href="https://www.codemadness.org/atom_content.xml" />
<entry>
+ <title>xargs: an example for batch jobs</title>
+ <link rel="alternate" type="text/html" href="https://www.codemadness.org/xargs.html" />
+ <id>https://www.codemadness.org/xargs.html</id>
+ <updated>2023-11-22T00:00:00Z</updated>
+ <published>2023-11-22T00:00:00Z</published>
+ <author>
+ <name>Hiltjo</name>
+ <uri>https://www.codemadness.org</uri>
+ </author>
+ <summary>xargs: an example for batch jobs</summary>
+ <content type="html"><![CDATA[<h1>xargs: an example for batch jobs</h1>
+ <p><strong>Last modification on </strong> <time>2023-11-22</time></p>
+ <p>This describes a simple shellscript programming pattern to process a list of
+jobs in parallel. This script example is contained in one file.</p>
+<h1>Simple but less optimal example</h1>
+<pre><code>#!/bin/sh
+maxjobs=4
+
+# fake program for example purposes.
+someprogram() {
+ echo "Yep yep, I'm totally a real program!"
+ sleep "$1"
+}
+
+# run(arg1, arg2)
+run() {
+ echo "[$1] $2 started" >&2
+ someprogram "$1" >/dev/null
+ status="$?"
+ echo "[$1] $2 done" >&2
+ return "$status"
+}
+
+# process the jobs.
+j=1
+for f in 1 2 3 4 5 6 7 8 9 10; do
+ run "$f" "something" &
+
+ jm=$((j % maxjobs)) # shell arithmetic: modulo
+ test "$jm" = "0" && wait
+ j=$((j+1))
+done
+wait
+</code></pre>
+<h1>Why is this less optimal</h1>
+<p>This is less optimal because it waits until all jobs in the same batch are finished
+(each batch contains $maxjobs items).</p>
+<p>For example with 2 items per batch and 4 total jobs it could be:</p>
+<ul>
+<li>Job 1 is started.</li>
+<li>Job 2 is started.</li>
+<li>Job 2 is done.</li>
+<li>Job 1 is done.</li>
+<li>Wait: wait on process status of all background processes.</li>
+<li>Job 3 in new batch is started.</li>
+</ul>
+<p>This could be optimized to:</p>
+<ul>
+<li>Job 1 is started.</li>
+<li>Job 2 is started.</li>
+<li>Job 2 is done.</li>
+<li>Job 3 in new batch is started (immediately).</li>
+<li>Job 1 is done.</li>
+<li>...</li>
+</ul>
+<p>It also does not handle signals such as SIGINT (^C). However the xargs example
+below does:</p>
+<h1>Example</h1>
+<pre><code>#!/bin/sh
+maxjobs=4
+
+# fake program for example purposes.
+someprogram() {
+ echo "Yep yep, I'm totally a real program!"
+ sleep "$1"
+}
+
+# run(arg1, arg2)
+run() {
+ echo "[$1] $2 started" >&2
+ someprogram "$1" >/dev/null
+ status="$?"
+ echo "[$1] $2 done" >&2
+ return "$status"
+}
+
+# child process job.
+if test "$CHILD_MODE" = "1"; then
+ run "$1" "$2"
+ exit "$?"
+fi
+
+# generate a list of jobs for processing.
+list() {
+ for f in 1 2 3 4 5 6 7 8 9 10; do
+ printf '%s\0%s\0' "$f" "something"
+ done
+}
+
+# process jobs in parallel.
+list | CHILD_MODE="1" xargs -r -0 -P "${maxjobs}" -L 2 "$(readlink -f "$0")"
+</code></pre>
+<h1>Run and timings</h1>
+<p>Although the above example is kind of stupid, it already shows that queueing
+the jobs is more efficient.</p>
+<p>Script 1:</p>
+<pre><code>time ./script1.sh
+[...snip snip...]
+real 0m22.095s
+</code></pre>
+<p>Script 2:</p>
+<pre><code>time ./script2.sh
+[...snip snip...]
+real 0m18.120s
+</code></pre>
+<h1>How it works</h1>
+<p>The parent process:</p>
+<ul>
+<li>The parent, using xargs, handles the queue of jobs and schedules the jobs to
+execute as a child process.</li>
+<li>The list function writes the parameters to stdout. These parameters are
+separated by the NUL byte separator. The NUL byte separator is used because
+this character cannot be used in filenames (which can contain spaces or even
+newlines) and cannot be used in text (the NUL byte terminates the buffer for
+a string).</li>
+<li>The -L option must match the number of arguments that are specified for the
+job. It will split the specified parameters per job.</li>
+<li>The expression "$(readlink -f "$0")" gets the absolute path to the
+shellscript itself. This is passed as the executable to run for xargs.</li>
+<li>xargs calls the script itself with the specified parameters it is being fed.
+The environment variable $CHILD_MODE is set to indicate to the script itself
+it is run as a child process of the script.</li>
+</ul>
+<p>The child process:</p>
+<ul>
+<li><p>The command-line arguments are passed by the parent using xargs.</p>
+</li>
+<li><p>The environment variable $CHILD_MODE is set to indicate to the script itself
+it is run as a child process of the script.</p>
+</li>
+<li><p>The script itself (run as a child-mode process) only executes the task and
+signals its status back to xargs and the parent.</p>
+</li>
+<li><p>The exit status of the child program is signaled to xargs. This could be
+handled, for example to stop on the first failure (in this example it is not).
+For example if the program is killed, stopped or the exit status is 255 then
+xargs stops running also.</p>
+</li>
+</ul>
+<h1>xargs -P and portability</h1>
+<p>Note that some of the options, like -P, are non-POSIX as of writing (2023):
+<a href="https://pubs.opengroup.org/onlinepubs/9699919799/">https://pubs.opengroup.org/onlinepubs/9699919799/</a>.
+However many systems support this useful extension.</p>
+<h1>Explanation of used xargs options:</h1>
+<p>From the OpenBSD man page: <a href="https://man.openbsd.org/xargs">https://man.openbsd.org/xargs</a></p>
+<pre><code>xargs - construct argument list(s) and execute utility
+</code></pre>
+<p>Options explained:</p>
+<ul>
+<li>-r: Do not run the command if there are no arguments. Normally the command
+is executed at least once even if there are no arguments.</li>
+<li>-0: Change xargs to expect NUL ('\0') characters as separators, instead of
+spaces and newlines.</li>
+<li>-P maxprocs: Parallel mode: run at most maxprocs invocations of utility
+at once.</li>
+<li>-L number: Call utility for every number of non-empty lines read. A line
+ending in unescaped white space and the next non-empty line are considered
+to form one single line. If EOF is reached and fewer than number lines have
+been read then utility will be called with the available lines.</li>
+</ul>
+<h1>References</h1>
+<ul>
+<li>xargs: <a href="https://man.openbsd.org/xargs">https://man.openbsd.org/xargs</a></li>
+<li>printf: <a href="https://man.openbsd.org/printf">https://man.openbsd.org/printf</a></li>
+<li>wait(2): <a href="https://man.openbsd.org/wait">https://man.openbsd.org/wait</a></li>
+</ul>
+]]></content>
+</entry>
+<entry>
<title>Improved Youtube RSS/Atom feed</title>
<link rel="alternate" type="text/html" href="https://www.codemadness.org/youtube-feed.html" />
<id>https://www.codemadness.org/youtube-feed.html</id>
(DIR) diff --git a/output/index b/output/index
@@ -11,6 +11,7 @@ i codemadness.org 70
i codemadness.org 70
iPhlog posts codemadness.org 70
i codemadness.org 70
+12023-11-22 xargs: an example for batch jobs /phlog/xargs codemadness.org 70
12023-11-20 Improved Youtube RSS/Atom feed /phlog/youtube-feed codemadness.org 70
12023-10-25 Setup your own mail paste service /phlog/mailservice codemadness.org 70
12022-07-01 A simple TODO application /phlog/todo codemadness.org 70
(DIR) diff --git a/output/index.html b/output/index.html
@@ -40,6 +40,7 @@
<div id="main">
<h1>Posts</h1>
<table>
+<tr><td><time>2023-11-22</time></td><td><a href="xargs.html">xargs: an example for batch jobs</a></td></tr>
<tr><td><time>2023-11-20</time></td><td><a href="youtube-feed.html">Improved Youtube RSS/Atom feed</a></td></tr>
<tr><td><time>2023-10-25</time></td><td><a href="mailservice.html">Setup your own mail paste service</a></td></tr>
<tr><td><time>2022-07-01</time></td><td><a href="todo-application.html">A simple TODO application</a></td></tr>
(DIR) diff --git a/output/rss.xml b/output/rss.xml
@@ -7,6 +7,14 @@
<description>blog with various projects and articles about computer-related things</description>
<link>https://www.codemadness.org</link>
<item>
+ <title>xargs: an example for batch jobs</title>
+ <link>https://www.codemadness.org/xargs.html</link>
+ <guid>https://www.codemadness.org/xargs.html</guid>
+ <dc:date>2023-11-22T00:00:00Z</dc:date>
+ <author>Hiltjo</author>
+ <description>xargs: an example for batch jobs</description>
+</item>
+<item>
<title>Improved Youtube RSS/Atom feed</title>
<link>https://www.codemadness.org/youtube-feed.html</link>
<guid>https://www.codemadness.org/youtube-feed.html</guid>
(DIR) diff --git a/output/rss_content.xml b/output/rss_content.xml
@@ -7,6 +7,180 @@
<description>blog with various projects and articles about computer-related things</description>
<link>https://www.codemadness.org</link>
<item>
+ <title>xargs: an example for batch jobs</title>
+ <link>https://www.codemadness.org/xargs.html</link>
+ <guid>https://www.codemadness.org/xargs.html</guid>
+ <dc:date>2023-11-22T00:00:00Z</dc:date>
+ <author>Hiltjo</author>
+ <description><![CDATA[<h1>xargs: an example for batch jobs</h1>
+ <p><strong>Last modification on </strong> <time>2023-11-22</time></p>
+ <p>This describes a simple shellscript programming pattern to process a list of
+jobs in parallel. This script example is contained in one file.</p>
+<h1>Simple but less optimal example</h1>
+<pre><code>#!/bin/sh
+maxjobs=4
+
+# fake program for example purposes.
+someprogram() {
+ echo "Yep yep, I'm totally a real program!"
+ sleep "$1"
+}
+
+# run(arg1, arg2)
+run() {
+ echo "[$1] $2 started" >&2
+ someprogram "$1" >/dev/null
+ status="$?"
+ echo "[$1] $2 done" >&2
+ return "$status"
+}
+
+# process the jobs.
+j=1
+for f in 1 2 3 4 5 6 7 8 9 10; do
+ run "$f" "something" &
+
+ jm=$((j % maxjobs)) # shell arithmetic: modulo
+ test "$jm" = "0" && wait
+ j=$((j+1))
+done
+wait
+</code></pre>
+<h1>Why is this less optimal</h1>
+<p>This is less optimal because it waits until all jobs in the same batch are finished
+(each batch contains $maxjobs items).</p>
+<p>For example with 2 items per batch and 4 total jobs it could be:</p>
+<ul>
+<li>Job 1 is started.</li>
+<li>Job 2 is started.</li>
+<li>Job 2 is done.</li>
+<li>Job 1 is done.</li>
+<li>Wait: wait on process status of all background processes.</li>
+<li>Job 3 in new batch is started.</li>
+</ul>
+<p>This could be optimized to:</p>
+<ul>
+<li>Job 1 is started.</li>
+<li>Job 2 is started.</li>
+<li>Job 2 is done.</li>
+<li>Job 3 in new batch is started (immediately).</li>
+<li>Job 1 is done.</li>
+<li>...</li>
+</ul>
+<p>It also does not handle signals such as SIGINT (^C). However the xargs example
+below does:</p>
+<h1>Example</h1>
+<pre><code>#!/bin/sh
+maxjobs=4
+
+# fake program for example purposes.
+someprogram() {
+ echo "Yep yep, I'm totally a real program!"
+ sleep "$1"
+}
+
+# run(arg1, arg2)
+run() {
+ echo "[$1] $2 started" >&2
+ someprogram "$1" >/dev/null
+ status="$?"
+ echo "[$1] $2 done" >&2
+ return "$status"
+}
+
+# child process job.
+if test "$CHILD_MODE" = "1"; then
+ run "$1" "$2"
+ exit "$?"
+fi
+
+# generate a list of jobs for processing.
+list() {
+ for f in 1 2 3 4 5 6 7 8 9 10; do
+ printf '%s\0%s\0' "$f" "something"
+ done
+}
+
+# process jobs in parallel.
+list | CHILD_MODE="1" xargs -r -0 -P "${maxjobs}" -L 2 "$(readlink -f "$0")"
+</code></pre>
+<h1>Run and timings</h1>
+<p>Although the above example is kind of stupid, it already shows that queueing
+the jobs is more efficient.</p>
+<p>Script 1:</p>
+<pre><code>time ./script1.sh
+[...snip snip...]
+real 0m22.095s
+</code></pre>
+<p>Script 2:</p>
+<pre><code>time ./script2.sh
+[...snip snip...]
+real 0m18.120s
+</code></pre>
+<h1>How it works</h1>
+<p>The parent process:</p>
+<ul>
+<li>The parent, using xargs, handles the queue of jobs and schedules the jobs to
+execute as a child process.</li>
+<li>The list function writes the parameters to stdout. These parameters are
+separated by the NUL byte separator. The NUL byte separator is used because
+this character cannot be used in filenames (which can contain spaces or even
+newlines) and cannot be used in text (the NUL byte terminates the buffer for
+a string).</li>
+<li>The -L option must match the number of arguments that are specified for the
+job. It will split the specified parameters per job.</li>
+<li>The expression "$(readlink -f "$0")" gets the absolute path to the
+shellscript itself. This is passed as the executable to run for xargs.</li>
+<li>xargs calls the script itself with the specified parameters it is being fed.
+The environment variable $CHILD_MODE is set to indicate to the script itself
+it is run as a child process of the script.</li>
+</ul>
+<p>The child process:</p>
+<ul>
+<li><p>The command-line arguments are passed by the parent using xargs.</p>
+</li>
+<li><p>The environment variable $CHILD_MODE is set to indicate to the script itself
+it is run as a child process of the script.</p>
+</li>
+<li><p>The script itself (run as a child-mode process) only executes the task and
+signals its status back to xargs and the parent.</p>
+</li>
+<li><p>The exit status of the child program is signaled to xargs. This could be
+handled, for example to stop on the first failure (in this example it is not).
+For example if the program is killed, stopped or the exit status is 255 then
+xargs stops running also.</p>
+</li>
+</ul>
+<h1>xargs -P and portability</h1>
+<p>Note that some of the options, like -P, are non-POSIX as of writing (2023):
+<a href="https://pubs.opengroup.org/onlinepubs/9699919799/">https://pubs.opengroup.org/onlinepubs/9699919799/</a>.
+However many systems support this useful extension.</p>
+<h1>Explanation of used xargs options:</h1>
+<p>From the OpenBSD man page: <a href="https://man.openbsd.org/xargs">https://man.openbsd.org/xargs</a></p>
+<pre><code>xargs - construct argument list(s) and execute utility
+</code></pre>
+<p>Options explained:</p>
+<ul>
+<li>-r: Do not run the command if there are no arguments. Normally the command
+is executed at least once even if there are no arguments.</li>
+<li>-0: Change xargs to expect NUL ('\0') characters as separators, instead of
+spaces and newlines.</li>
+<li>-P maxprocs: Parallel mode: run at most maxprocs invocations of utility
+at once.</li>
+<li>-L number: Call utility for every number of non-empty lines read. A line
+ending in unescaped white space and the next non-empty line are considered
+to form one single line. If EOF is reached and fewer than number lines have
+been read then utility will be called with the available lines.</li>
+</ul>
+<h1>References</h1>
+<ul>
+<li>xargs: <a href="https://man.openbsd.org/xargs">https://man.openbsd.org/xargs</a></li>
+<li>printf: <a href="https://man.openbsd.org/printf">https://man.openbsd.org/printf</a></li>
+<li>wait(2): <a href="https://man.openbsd.org/wait">https://man.openbsd.org/wait</a></li>
+</ul>
+]]></description>
+</item>
+<item>
<title>Improved Youtube RSS/Atom feed</title>
<link>https://www.codemadness.org/youtube-feed.html</link>
<guid>https://www.codemadness.org/youtube-feed.html</guid>
(DIR) diff --git a/output/sitemap.xml b/output/sitemap.xml
@@ -1,6 +1,10 @@
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
+ <loc>https://www.codemadness.org/xargs.html</loc>
+ <lastmod>2023-11-22</lastmod>
+</url>
+<url>
<loc>https://www.codemadness.org/youtube-feed.html</loc>
<lastmod>2023-11-20</lastmod>
</url>
(DIR) diff --git a/output/twtxt.txt b/output/twtxt.txt
@@ -1,3 +1,4 @@
+2023-11-22T00:00:00Z xargs: an example for batch jobs: https://www.codemadness.org/xargs.html
2023-11-20T00:00:00Z Improved Youtube RSS/Atom feed: https://www.codemadness.org/youtube-feed.html
2023-10-25T00:00:00Z Setup your own mail paste service: https://www.codemadness.org/mailservice.html
2022-07-01T00:00:00Z A simple TODO application: https://www.codemadness.org/todo-application.html
(DIR) diff --git a/output/urllist.txt b/output/urllist.txt
@@ -1,3 +1,4 @@
+https://www.codemadness.org/xargs.html
https://www.codemadness.org/youtube-feed.html
https://www.codemadness.org/mailservice.html
https://www.codemadness.org/todo-application.html
(DIR) diff --git a/output/xargs.html b/output/xargs.html
@@ -0,0 +1,218 @@
+<!DOCTYPE html>
+<html dir="ltr" lang="en">
+<head>
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
+ <meta http-equiv="Content-Language" content="en" />
+ <meta name="viewport" content="width=device-width" />
+ <meta name="keywords" content="xargs, wow hyper speed" />
+ <meta name="description" content="xargs: an example for batch jobs" />
+ <meta name="author" content="Hiltjo" />
+ <meta name="generator" content="Static content generated using saait: https://codemadness.org/saait.html" />
+ <title>xargs: an example for batch jobs - Codemadness</title>
+ <link rel="stylesheet" href="style.css" type="text/css" media="screen" />
+ <link rel="stylesheet" href="print.css" type="text/css" media="print" />
+ <link rel="alternate" href="atom.xml" type="application/atom+xml" title="Codemadness Atom Feed" />
+ <link rel="alternate" href="atom_content.xml" type="application/atom+xml" title="Codemadness Atom Feed with content" />
+ <link rel="icon" href="/favicon.png" type="image/png" />
+</head>
+<body>
+ <nav id="menuwrap">
+ <table id="menu" width="100%" border="0">
+ <tr>
+ <td id="links" align="left">
+ <a href="index.html">Blog</a> |
+ <a href="/git/" title="Git repository with some of my projects">Git</a> |
+ <a href="/releases/">Releases</a> |
+ <a href="gopher://codemadness.org">Gopherhole</a>
+ </td>
+ <td id="links-contact" align="right">
+ <span class="hidden"> | </span>
+ <a href="/donate/">Donate</a> |
+ <a href="feeds.html">Feeds</a> |
+ <a href="pgp.asc">PGP</a> |
+ <a href="mailto:hiltjo@AT@codemadness.DOT.org">Mail</a>
+ </td>
+ </tr>
+ </table>
+ </nav>
+ <hr class="hidden" />
+ <main id="mainwrap">
+ <div id="main">
+ <article>
+<header>
+ <h1>xargs: an example for batch jobs</h1>
+ <p>
+ <strong>Last modification on </strong> <time>2023-11-22</time>
+ </p>
+</header>
+
+<p>This describes a simple shellscript programming pattern to process a list of
+jobs in parallel. This script example is contained in one file.</p>
+<h1>Simple but less optimal example</h1>
+<pre><code>#!/bin/sh
+maxjobs=4
+
+# fake program for example purposes.
+someprogram() {
+ echo "Yep yep, I'm totally a real program!"
+ sleep "$1"
+}
+
+# run(arg1, arg2)
+run() {
+ echo "[$1] $2 started" >&2
+ someprogram "$1" >/dev/null
+ status="$?"
+ echo "[$1] $2 done" >&2
+ return "$status"
+}
+
+# process the jobs.
+j=1
+for f in 1 2 3 4 5 6 7 8 9 10; do
+ run "$f" "something" &
+
+ jm=$((j % maxjobs)) # shell arithmetic: modulo
+ test "$jm" = "0" && wait
+ j=$((j+1))
+done
+wait
+</code></pre>
+<h1>Why is this less optimal</h1>
+<p>This is less optimal because it waits until all jobs in the same batch are finished
+(each batch contains $maxjobs items).</p>
+<p>For example with 2 items per batch and 4 total jobs it could be:</p>
+<ul>
+<li>Job 1 is started.</li>
+<li>Job 2 is started.</li>
+<li>Job 2 is done.</li>
+<li>Job 1 is done.</li>
+<li>Wait: wait on process status of all background processes.</li>
+<li>Job 3 in new batch is started.</li>
+</ul>
+<p>This could be optimized to:</p>
+<ul>
+<li>Job 1 is started.</li>
+<li>Job 2 is started.</li>
+<li>Job 2 is done.</li>
+<li>Job 3 in new batch is started (immediately).</li>
+<li>Job 1 is done.</li>
+<li>...</li>
+</ul>
+<p>It also does not handle signals such as SIGINT (^C). However the xargs example
+below does:</p>
+<h1>Example</h1>
+<pre><code>#!/bin/sh
+maxjobs=4
+
+# fake program for example purposes.
+someprogram() {
+ echo "Yep yep, I'm totally a real program!"
+ sleep "$1"
+}
+
+# run(arg1, arg2)
+run() {
+ echo "[$1] $2 started" >&2
+ someprogram "$1" >/dev/null
+ status="$?"
+ echo "[$1] $2 done" >&2
+ return "$status"
+}
+
+# child process job.
+if test "$CHILD_MODE" = "1"; then
+ run "$1" "$2"
+ exit "$?"
+fi
+
+# generate a list of jobs for processing.
+list() {
+ for f in 1 2 3 4 5 6 7 8 9 10; do
+ printf '%s\0%s\0' "$f" "something"
+ done
+}
+
+# process jobs in parallel.
+list | CHILD_MODE="1" xargs -r -0 -P "${maxjobs}" -L 2 "$(readlink -f "$0")"
+</code></pre>
+<h1>Run and timings</h1>
+<p>Although the above example is kind of stupid, it already shows that queueing
+the jobs is more efficient.</p>
+<p>Script 1:</p>
+<pre><code>time ./script1.sh
+[...snip snip...]
+real 0m22.095s
+</code></pre>
+<p>Script 2:</p>
+<pre><code>time ./script2.sh
+[...snip snip...]
+real 0m18.120s
+</code></pre>
+<h1>How it works</h1>
+<p>The parent process:</p>
+<ul>
+<li>The parent, using xargs, handles the queue of jobs and schedules the jobs to
+execute as a child process.</li>
+<li>The list function writes the parameters to stdout. These parameters are
+separated by the NUL byte separator. The NUL byte separator is used because
+this character cannot be used in filenames (which can contain spaces or even
+newlines) and cannot be used in text (the NUL byte terminates the buffer for
+a string).</li>
+<li>The -L option must match the number of arguments that are specified for the
+job. It will split the specified parameters per job.</li>
+<li>The expression "$(readlink -f "$0")" gets the absolute path to the
+shellscript itself. This is passed as the executable to run for xargs.</li>
+<li>xargs calls the script itself with the specified parameters it is being fed.
+The environment variable $CHILD_MODE is set to indicate to the script itself
+it is run as a child process of the script.</li>
+</ul>
+<p>The child process:</p>
+<ul>
+<li><p>The command-line arguments are passed by the parent using xargs.</p>
+</li>
+<li><p>The environment variable $CHILD_MODE is set to indicate to the script itself
+it is run as a child process of the script.</p>
+</li>
+<li><p>The script itself (run as a child-mode process) only executes the task and
+signals its status back to xargs and the parent.</p>
+</li>
+<li><p>The exit status of the child program is signaled to xargs. This could be
+handled, for example to stop on the first failure (in this example it is not).
+For example if the program is killed, stopped or the exit status is 255 then
+xargs stops running also.</p>
+</li>
+</ul>
+<h1>xargs -P and portability</h1>
+<p>Note that some of the options, like -P, are non-POSIX as of writing (2023):
+<a href="https://pubs.opengroup.org/onlinepubs/9699919799/">https://pubs.opengroup.org/onlinepubs/9699919799/</a>.
+However many systems support this useful extension.</p>
+<h1>Explanation of used xargs options:</h1>
+<p>From the OpenBSD man page: <a href="https://man.openbsd.org/xargs">https://man.openbsd.org/xargs</a></p>
+<pre><code>xargs - construct argument list(s) and execute utility
+</code></pre>
+<p>Options explained:</p>
+<ul>
+<li>-r: Do not run the command if there are no arguments. Normally the command
+is executed at least once even if there are no arguments.</li>
+<li>-0: Change xargs to expect NUL ('\0') characters as separators, instead of
+spaces and newlines.</li>
+<li>-P maxprocs: Parallel mode: run at most maxprocs invocations of utility
+at once.</li>
+<li>-L number: Call utility for every number of non-empty lines read. A line
+ending in unescaped white space and the next non-empty line are considered
+to form one single line. If EOF is reached and fewer than number lines have
+been read then utility will be called with the available lines.</li>
+</ul>
+<h1>References</h1>
+<ul>
+<li>xargs: <a href="https://man.openbsd.org/xargs">https://man.openbsd.org/xargs</a></li>
+<li>printf: <a href="https://man.openbsd.org/printf">https://man.openbsd.org/printf</a></li>
+<li>wait(2): <a href="https://man.openbsd.org/wait">https://man.openbsd.org/wait</a></li>
+</ul>
+
+ </article>
+ </div>
+ </main>
+</body>
+</html>
(DIR) diff --git a/output/xargs.md b/output/xargs.md
@@ -0,0 +1,188 @@
+This describes a simple shellscript programming pattern to process a list of
+jobs in parallel. This script example is contained in one file.
+
+
+# Simple but less optimal example
+
+ #!/bin/sh
+ maxjobs=4
+
+ # fake program for example purposes.
+ someprogram() {
+ echo "Yep yep, I'm totally a real program!"
+ sleep "$1"
+ }
+
+ # run(arg1, arg2)
+ run() {
+ echo "[$1] $2 started" >&2
+ someprogram "$1" >/dev/null
+ status="$?"
+ echo "[$1] $2 done" >&2
+ return "$status"
+ }
+
+ # process the jobs.
+ j=1
+ for f in 1 2 3 4 5 6 7 8 9 10; do
+ run "$f" "something" &
+
+ jm=$((j % maxjobs)) # shell arithmetic: modulo
+ test "$jm" = "0" && wait
+ j=$((j+1))
+ done
+ wait
+
+
+# Why is this less optimal
+
+This is less optimal because it waits until all jobs in the same batch are finished
+(each batch contains $maxjobs items).
+
+For example with 2 items per batch and 4 total jobs it could be:
+
+* Job 1 is started.
+* Job 2 is started.
+* Job 2 is done.
+* Job 1 is done.
+* Wait: wait on process status of all background processes.
+* Job 3 in new batch is started.
+
+
+This could be optimized to:
+
+* Job 1 is started.
+* Job 2 is started.
+* Job 2 is done.
+* Job 3 in new batch is started (immediately).
+* Job 1 is done.
+* ...
+
+
+It also does not handle signals such as SIGINT (^C). However the xargs example
+below does:
+
+
+# Example
+
+ #!/bin/sh
+ maxjobs=4
+
+ # fake program for example purposes.
+ someprogram() {
+ echo "Yep yep, I'm totally a real program!"
+ sleep "$1"
+ }
+
+ # run(arg1, arg2)
+ run() {
+ echo "[$1] $2 started" >&2
+ someprogram "$1" >/dev/null
+ status="$?"
+ echo "[$1] $2 done" >&2
+ return "$status"
+ }
+
+ # child process job.
+ if test "$CHILD_MODE" = "1"; then
+ run "$1" "$2"
+ exit "$?"
+ fi
+
+ # generate a list of jobs for processing.
+ list() {
+ for f in 1 2 3 4 5 6 7 8 9 10; do
+ printf '%s\0%s\0' "$f" "something"
+ done
+ }
+
+ # process jobs in parallel.
+ list | CHILD_MODE="1" xargs -r -0 -P "${maxjobs}" -L 2 "$(readlink -f "$0")"
+
+
+# Run and timings
+
+Although the above example is kind of stupid, it already shows that queueing
+the jobs is more efficient.
+
+Script 1:
+
+ time ./script1.sh
+ [...snip snip...]
+ real 0m22.095s
+
+Script 2:
+
+ time ./script2.sh
+ [...snip snip...]
+ real 0m18.120s
+
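The two timings above can be reproduced on paper. The following is a minimal sketch, not part of the original scripts, assuming job n takes exactly n seconds (as with `sleep "$1"`) and ignoring process startup overhead. The names `durations`, `batch_time` and `queue_time` are made up for this illustration.

```shell
#!/bin/sh
# Estimate the wall-clock time of both approaches without running any job.
# Assumption: job n takes exactly n seconds, startup overhead is ignored.
maxjobs=4
durations="1 2 3 4 5 6 7 8 9 10"

# batch_time: start $maxjobs jobs, wait for all of them, repeat (script 1).
# Each batch costs as much as its slowest job.
batch_time() {
	total=0; longest=0; j=0
	for d in $durations; do
		test "$d" -gt "$longest" && longest="$d"
		j=$((j + 1))
		if test "$((j % maxjobs))" = "0"; then
			total=$((total + longest))
			longest=0
		fi
	done
	# the last batch may be incomplete.
	echo "$((total + longest))"
}

# queue_time: a job starts as soon as any slot is free (script 2).
queue_time() {
	# one finish time per slot, all slots idle at t=0.
	slots=""; i=0
	while test "$i" -lt "$maxjobs"; do
		slots="$slots 0"
		i=$((i + 1))
	done
	for d in $durations; do
		# find the slot that becomes free first.
		min=""
		for s in $slots; do
			if test -z "$min" || test "$s" -lt "$min"; then
				min="$s"
			fi
		done
		# hand the job to that slot (only the first match).
		new=""; used=0
		for s in $slots; do
			if test "$used" = "0" && test "$s" = "$min"; then
				s=$((s + d)); used=1
			fi
			new="$new $s"
		done
		slots="$new"
	done
	# the run ends when the last slot finishes.
	max=0
	for s in $slots; do
		test "$s" -gt "$max" && max="$s"
	done
	echo "$max"
}

echo "batches: $(batch_time)s"
echo "queue: $(queue_time)s"
```

Under these assumptions the batches cost 4 + 8 + 10 = 22 seconds and the queue finishes after 18 seconds, which is in line with the measured 0m22.095s and 0m18.120s.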
+
+# How it works
+
+The parent process:
+
+* The parent, using xargs, handles the queue of jobs and schedules the jobs to
+ execute as a child process.
+* The list function writes the parameters to stdout. These parameters are
+ separated by the NUL byte separator. The NUL byte separator is used because
+ this character cannot be used in filenames (which can contain spaces or even
+ newlines) and cannot be used in text (the NUL byte terminates the buffer for
+ a string).
+* The -L option must match the number of arguments that are specified for the
+ job. It will split the specified parameters per job.
+* The expression "$(readlink -f "$0")" gets the absolute path to the
+ shellscript itself. This is passed as the executable to run for xargs.
+* xargs calls the script itself with the specified parameters it is being fed.
+ The environment variable $CHILD_MODE is set to indicate to the script itself
+ it is run as a child process of the script.
+
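How the NUL-separated parameters end up as one invocation per job can be checked in isolation. This is a small sketch and not the script from the article. It assumes an xargs that supports the -0 extension, it uses -n 2 (two arguments per invocation) rather than -L 2, and it runs printf instead of the script itself. The function name `pairs` is made up for this illustration.

```shell
#!/bin/sh
# Write two NUL-terminated parameters per job, like the list function does.
# "pairs" is a hypothetical name used only for this sketch.
pairs() {
	for f in 1 2 3; do
		printf '%s\0%s\0' "$f" "some thing"
	done
}

# -0: parameters are separated by NUL bytes, so the space in
#     "some thing" does not split the parameter in two.
# -n 2: pass two parameters to each invocation.
# printf shows each parameter it received in brackets, one line per job.
pairs | xargs -0 -n 2 printf '[%s] [%s]\n'
```

Each job shows up as one line, for example `[1] [some thing]`, which shows that the second parameter arrives intact as a single argument.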
+
+The child process:
+
+* The command-line arguments are passed by the parent using xargs.
+
+* The environment variable $CHILD_MODE is set to indicate to the script itself
+ it is run as a child process of the script.
+
+* The script itself (run as a child-mode process) only executes the task and
+ signals its status back to xargs and the parent.
+
+* The exit status of the child program is signaled to xargs. This could be
+ handled, for example to stop on the first failure (in this example it is not).
+ For example if the program is killed, stopped or the exit status is 255 then
+ xargs stops running also.
+
+
+# xargs -P and portability
+
+Note that some of the options, like -P, are non-POSIX as of writing (2023):
+<https://pubs.opengroup.org/onlinepubs/9699919799/>.
+However many systems support this useful extension.
+
+
+# Explanation of used xargs options:
+
+From the OpenBSD man page: <https://man.openbsd.org/xargs>
+
+ xargs - construct argument list(s) and execute utility
+
+Options explained:
+
+* -r: Do not run the command if there are no arguments. Normally the command
+ is executed at least once even if there are no arguments.
+* -0: Change xargs to expect NUL ('\0') characters as separators, instead of
+ spaces and newlines.
+* -P maxprocs: Parallel mode: run at most maxprocs invocations of utility
+ at once.
+* -L number: Call utility for every number of non-empty lines read. A line
+ ending in unescaped white space and the next non-empty line are considered
+ to form one single line. If EOF is reached and fewer than number lines have
+ been read then utility will be called with the available lines.
+
+
+# References
+
+* xargs: <https://man.openbsd.org/xargs>
+* printf: <https://man.openbsd.org/printf>
+* wait(2): <https://man.openbsd.org/wait>
(DIR) diff --git a/pages/xargs.cfg b/pages/xargs.cfg
@@ -0,0 +1,6 @@
+title = xargs: an example for batch jobs
+id = xargs
+description = xargs: an example for batch jobs
+keywords = xargs, wow hyper speed
+created = 2023-11-22
+updated = 2023-11-22
(DIR) diff --git a/pages/xargs.md b/pages/xargs.md
@@ -0,0 +1,188 @@
+This describes a simple shellscript programming pattern to process a list of
+jobs in parallel. This script example is contained in one file.
+
+
+# Simple but less optimal example
+
+ #!/bin/sh
+ maxjobs=4
+
+ # fake program for example purposes.
+ someprogram() {
+ echo "Yep yep, I'm totally a real program!"
+ sleep "$1"
+ }
+
+ # run(arg1, arg2)
+ run() {
+ echo "[$1] $2 started" >&2
+ someprogram "$1" >/dev/null
+ status="$?"
+ echo "[$1] $2 done" >&2
+ return "$status"
+ }
+
+ # process the jobs.
+ j=1
+ for f in 1 2 3 4 5 6 7 8 9 10; do
+ run "$f" "something" &
+
+ jm=$((j % maxjobs)) # shell arithmetic: modulo
+ test "$jm" = "0" && wait
+ j=$((j+1))
+ done
+ wait
+
+
+# Why is this less optimal
+
+This is less optimal because it waits until all jobs in the same batch are
+finished before starting the next (each batch contains $maxjobs items).
+
+For example with 2 items per batch and 4 total jobs it could be:
+
+* Job 1 is started.
+* Job 2 is started.
+* Job 2 is done.
+* Job 1 is done.
+* Wait: wait on process status of all background processes.
+* Job 3 in new batch is started.
+
+
+This could be optimized to:
+
+* Job 1 is started.
+* Job 2 is started.
+* Job 2 is done.
+* Job 3 in new batch is started (immediately).
+* Job 1 is done.
+* ...
+
+
+It also does not handle signals such as SIGINT (^C). However, the xargs
+example below does:
+
+
+# Example
+
+ #!/bin/sh
+ maxjobs=4
+
+ # fake program for example purposes.
+ someprogram() {
+ echo "Yep yep, I'm totally a real program!"
+ sleep "$1"
+ }
+
+ # run(arg1, arg2)
+ run() {
+ echo "[$1] $2 started" >&2
+ someprogram "$1" >/dev/null
+ status="$?"
+ echo "[$1] $2 done" >&2
+ return "$status"
+ }
+
+ # child process job.
+ if test "$CHILD_MODE" = "1"; then
+ run "$1" "$2"
+ exit "$?"
+ fi
+
+ # generate a list of jobs for processing.
+ list() {
+ for f in 1 2 3 4 5 6 7 8 9 10; do
+ printf '%s\0%s\0' "$f" "something"
+ done
+ }
+
+ # process jobs in parallel.
+ list | CHILD_MODE="1" xargs -r -0 -P "${maxjobs}" -L 2 "$(readlink -f "$0")"
+
+
+# Run and timings
+
+Although the above example is kind of silly, it already shows that queueing
+the jobs is more efficient.
+
+Script 1:
+
+ time ./script1.sh
+ [...snip snip...]
+ real 0m22.095s
+
+Script 2:
+
+ time ./script2.sh
+ [...snip snip...]
+ real 0m18.120s
+
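The two timings above can be reproduced with some arithmetic. The following is a minimal sketch, not part of the original scripts: it assumes each job takes exactly as long as its sleep argument and ignores process startup overhead. The function names batch_time and queue_time are made up for this illustration.

```shell
#!/bin/sh
# Durations in seconds of the 10 example jobs (sleep 1 .. sleep 10).
durations="1 2 3 4 5 6 7 8 9 10"
maxjobs=4

# batch_time: every batch lasts as long as its slowest job.
batch_time() {
	total=0 max=0 n=0
	for d in $durations; do
		test "$d" -gt "$max" && max="$d"
		n=$((n + 1))
		if test $((n % maxjobs)) -eq 0; then
			total=$((total + max))
			max=0
		fi
	done
	echo $((total + max)) # add the last, partially filled batch
}

# queue_time: a job starts as soon as one of the 4 slots is free.
# The positional parameters hold the time at which each slot becomes free.
queue_time() {
	set -- 0 0 0 0 # one entry per slot, must match $maxjobs
	for d in $durations; do
		# find the time at which the first slot becomes free.
		min="$1"
		for t in "$@"; do
			test "$t" -lt "$min" && min="$t"
		done
		# give that slot the new job: rotate through all slots and
		# replace the first one matching $min.
		placed=0
		for t in "$@"; do
			if test "$placed" -eq 0 && test "$t" -eq "$min"; then
				t=$((min + d))
				placed=1
			fi
			set -- "$@" "$t"
			shift
		done
	done
	# the run ends when the last slot becomes free.
	end=0
	for t in "$@"; do
		test "$t" -gt "$end" && end="$t"
	done
	echo "$end"
}

echo "batch: $(batch_time)s"
echo "queue: $(queue_time)s"
```

Under these assumptions it prints 22s for the batch approach (4 + 8 + 10) and 18s for the queue approach, which is in line with the measured 22.095s and 18.120s.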
+
+# How it works
+
+The parent process:
+
+* The parent, using xargs, handles the queue of jobs and schedules each job to
+ execute as a child process.
+* The list function writes the parameters to stdout. These parameters are
+ separated by the NUL byte. The NUL byte is used as the separator because it
+ cannot occur in filenames (which can contain spaces or even newlines) and
+ cannot occur in a command-line argument (the NUL byte terminates a string
+ in C).
+* The -L option must match the number of arguments that are specified for
+ each job. It splits the specified parameters per job.
+* The expression "$(readlink -f "$0")" gets the absolute path to the
+ shellscript itself. This is passed as the executable to run for xargs.
+* xargs calls the script itself with the parameters it is being fed.
+ The environment variable $CHILD_MODE is set to tell the script that it is
+ run as a child process of the script.
+
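The splitting of NUL-separated parameters per job can be tried in isolation. This is a small sketch and not part of the original script: the function names list_pairs and show_pairs and the example values are made up, and an inline sh -c command stands in for the script calling itself. It assumes an xargs that supports -r, -0 and -L together, as the xargs implementations that provide these extensions generally do.

```shell
#!/bin/sh
# list_pairs: write 2 NUL-terminated fields per job, like the list function.
# The values are made-up examples that contain spaces on purpose.
list_pairs() {
	printf '%s\0%s\0' "first file.txt" "something"
	printf '%s\0%s\0' "second file.txt" "something else"
}

# show_pairs: xargs runs sh once per 2 fields, sh prints its 2 arguments.
show_pairs() {
	list_pairs | xargs -r -0 -L 2 sh -c 'printf "[%s] [%s]\n" "$1" "$2"' sh
}

show_pairs
```

Each output line shows one invocation with its 2 arguments in brackets. The spaces inside the values stay part of a single argument, because only the NUL byte separates fields.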
+
+The child process:
+
+* The command-line arguments are passed by the parent using xargs.
+
+* The environment variable $CHILD_MODE is set to tell the script that it is
+ run as a child process of the script.
+
+* The script itself (run in child mode) only executes the task and reports
+ its exit status back to xargs and the parent.
+
+* The exit status of the child program is reported to xargs. This could be
+ handled, for example to stop on the first failure (this example does not).
+ If the program is killed or stopped by a signal, or exits with status 255,
+ then xargs stops running as well.
+
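Stopping on the first failure could be done by mapping any failure of the task to exit status 255. The following is a sketch of that idea and not part of the original script: the function name stop_on_failure and the task that fails on the argument "2" are made up, and an inline sh -c command stands in for the child-mode script. Jobs are run one at a time (no -P) so the order is predictable.

```shell
#!/bin/sh
# stop_on_failure: feed 4 jobs to xargs, one argument per job.
# The made-up task fails for the argument "2". The child turns that
# failure into exit status 255, which makes xargs stop reading input.
stop_on_failure() {
	printf '%s\0' 1 2 3 4 |
	xargs -r -0 -L 1 sh -c '
		echo "processing $1"
		test "$1" != "2" || exit 255
	' sh
}

stop_on_failure
echo "xargs exit status: $?"
```

Jobs 3 and 4 are never started. xargs itself exits with a non-zero status and typically writes a message about the aborted run to stderr. With -P, jobs that are already running at that moment would still be in flight, so this limits the damage rather than guaranteeing that nothing else has run.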
+
+# xargs -P and portability
+
+Note that some of the options, like -P, are non-POSIX as of writing (2023):
+<https://pubs.opengroup.org/onlinepubs/9699919799/>.
+However, many systems support this useful extension.
+
+
+# Explanation of used xargs options:
+
+From the OpenBSD man page: <https://man.openbsd.org/xargs>
+
+ xargs - construct argument list(s) and execute utility
+
+Options explained:
+
+* -r: Do not run the command if there are no arguments. Normally the command
+ is executed at least once even if there are no arguments.
+* -0: Change xargs to expect NUL ('\0') characters as separators, instead of
+ spaces and newlines.
+* -P maxprocs: Parallel mode: run at most maxprocs invocations of utility
+ at once.
+* -L number: Call utility for every number of non-empty lines read. A line
+ ending in unescaped white space and the next non-empty line are considered
+ to form one single line. If EOF is reached and fewer than number lines have
+ been read then utility will be called with the available lines.
+
+
+# References
+
+* xargs: <https://man.openbsd.org/xargs>
+* printf: <https://man.openbsd.org/printf>
+* wait(2): <https://man.openbsd.org/wait>