xargs.html - www.codemadness.org - www.codemadness.org saait content files
---
xargs.html (9768B)
---
1 <!DOCTYPE html>
2 <html dir="ltr" lang="en">
3 <head>
4 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
5 <meta http-equiv="Content-Language" content="en" />
6 <meta name="viewport" content="width=device-width" />
7 <meta name="keywords" content="xargs, wow hyper speed" />
8 <meta name="description" content="xargs: an example for parallel batch jobs" />
9 <meta name="author" content="Hiltjo" />
10 <meta name="generator" content="Static content generated using saait: https://codemadness.org/saait.html" />
11 <title>xargs: an example for parallel batch jobs - Codemadness</title>
12 <link rel="stylesheet" href="style.css" type="text/css" media="screen" />
13 <link rel="stylesheet" href="print.css" type="text/css" media="print" />
14 <link rel="alternate" href="atom.xml" type="application/atom+xml" title="Codemadness Atom Feed" />
15 <link rel="alternate" href="atom_content.xml" type="application/atom+xml" title="Codemadness Atom Feed with content" />
16 <link rel="icon" href="/favicon.png" type="image/png" />
17 </head>
18 <body>
19 <nav id="menuwrap">
20 <table id="menu" width="100%" border="0">
21 <tr>
22 <td id="links" align="left">
23 <a href="index.html">Blog</a> |
24 <a href="/git/" title="Git repository with some of my projects">Git</a> |
25 <a href="/releases/">Releases</a> |
26 <a href="gopher://codemadness.org">Gopherhole</a>
27 </td>
28 <td id="links-contact" align="right">
29 <span class="hidden"> | </span>
30 <a href="feeds.html">Feeds</a> |
31 <a href="pgp.asc">PGP</a> |
32 <a href="mailto:hiltjo@AT@codemadness.DOT.org">Mail</a>
33 </td>
34 </tr>
35 </table>
36 </nav>
37 <hr class="hidden" />
38 <main id="mainwrap">
39 <div id="main">
40 <article>
41 <header>
42 <h1>xargs: an example for parallel batch jobs</h1>
43 <p>
44 <strong>Last modification on </strong> <time>2023-12-17</time>
45 </p>
46 </header>
47
48 <p>This describes a simple shellscript programming pattern to process a list of
49 jobs in parallel. This script example is contained in one file.</p>
50 <h1>Simple but less optimal example</h1>
51 <pre><code>#!/bin/sh
52 maxjobs=4
53
54 # fake program for example purposes.
55 someprogram() {
56 echo "Yep yep, I'm totally a real program!"
57 sleep "$1"
58 }
59
60 # run(arg1, arg2)
61 run() {
62 echo "[$1] $2 started" >&2
63 someprogram "$1" >/dev/null
64 status="$?"
65 echo "[$1] $2 done" >&2
66 return "$status"
67 }
68
69 # process the jobs.
70 j=1
71 for f in 1 2 3 4 5 6 7 8 9 10; do
72 run "$f" "something" &
73
74 jm=$((j % maxjobs)) # shell arithmetic: modulo
75 test "$jm" = "0" && wait
76 j=$((j+1))
77 done
78 wait
79 </code></pre>
80 <h1>Why is this less optimal</h1>
81 <p>This is less optimal because it waits until all jobs in the same batch are finished
82 (each batch contains $maxjobs items).</p>
83 <p>For example with 2 items per batch and 4 total jobs it could be:</p>
84 <ul>
85 <li>Job 1 is started.</li>
86 <li>Job 2 is started.</li>
87 <li>Job 2 is done.</li>
88 <li>Job 1 is done.</li>
89 <li>Wait: wait on process status of all background processes.</li>
90 <li>Job 3 in new batch is started.</li>
91 </ul>
92 <p>This could be optimized to:</p>
93 <ul>
94 <li>Job 1 is started.</li>
95 <li>Job 2 is started.</li>
96 <li>Job 2 is done.</li>
97 <li>Job 3 in new batch is started (immediately).</li>
98 <li>Job 1 is done.</li>
99 <li>...</li>
100 </ul>
101 <p>It also does not handle signals such as SIGINT (^C). However the xargs example
102 below does:</p>
103 <h1>Example</h1>
104 <pre><code>#!/bin/sh
105 maxjobs=4
106
107 # fake program for example purposes.
108 someprogram() {
109 echo "Yep yep, I'm totally a real program!"
110 sleep "$1"
111 }
112
113 # run(arg1, arg2)
114 run() {
115 echo "[$1] $2 started" >&2
116 someprogram "$1" >/dev/null
117 status="$?"
118 echo "[$1] $2 done" >&2
119 return "$status"
120 }
121
122 # child process job.
123 if test "$CHILD_MODE" = "1"; then
124 run "$1" "$2"
125 exit "$?"
126 fi
127
128 # generate a list of jobs for processing.
129 list() {
130 for f in 1 2 3 4 5 6 7 8 9 10; do
131 printf '%s\0%s\0' "$f" "something"
132 done
133 }
134
135 # process jobs in parallel.
136 list | CHILD_MODE="1" xargs -r -0 -P "${maxjobs}" -L 2 "$(readlink -f "$0")"
137 </code></pre>
138 <h1>Run and timings</h1>
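139 <p>Although the above example is kind of stupid, it already shows that the queueing of
140 jobs is more efficient. Script 1 waits for the slowest job of each batch (4 + 8 + 10 = 22 seconds), script 2 starts the next job as soon as a slot is free.</p>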
141 <p>Script 1:</p>
142 <pre><code>time ./script1.sh
143 [...snip snip...]
144 real 0m22.095s
145 </code></pre>
146 <p>Script 2:</p>
147 <pre><code>time ./script2.sh
148 [...snip snip...]
149 real 0m18.120s
150 </code></pre>
151 <h1>How it works</h1>
152 <p>The parent process:</p>
153 <ul>
154 <li>The parent, using xargs, handles the queue of jobs and schedules the jobs to
155 execute as a child process.</li>
156 <li>The list function writes the parameters to stdout. These parameters are
157 separated by the NUL byte separator. The NUL byte separator is used because
158 this character cannot be used in filenames (which can contain spaces or even
159 newlines) and cannot be used in text (the NUL byte terminates the buffer for
160 a string).</li>
161 <li>The -L option must match the number of arguments that are specified for the
162 job. It will split the specified parameters per job.</li>
163 <li>The expression "$(readlink -f "$0")" gets the absolute path to the
164 shellscript itself. This is passed as the executable to run for xargs.</li>
165 <li>xargs calls the script itself with the specified parameters it is being fed.
166 The environment variable $CHILD_MODE is set to indicate to the script itself
167 it is run as a child process of the script.</li>
168 </ul>
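<p>As a rough sketch of how the parent side groups the NUL-separated parameters
per job: the job values and the show function below are made up for
illustration, and -n 2 is used instead of -L 2 as an assumption for portability
(see the incompatibilities listed near the end of this page).</p>

```shell
#!/bin/sh
# list(): hypothetical job list, two NUL-terminated fields per job.
# printf reuses the format until all arguments are consumed.
list() {
	printf '%s\0%s\0' "1" "first job" "2" "second job"
}

# show(): run one invocation per job and print the two arguments it received.
# The trailing "sh" becomes $0 of the inline script, the fields become $1 and $2.
show() {
	list | xargs -r -0 -n 2 sh -c 'printf "[%s] [%s]\n" "$1" "$2"' sh
}

pairs="$(show)"
printf '%s\n' "$pairs"
```

<p>Each output line corresponds to one invocation. The space in "first job"
survives because only the NUL byte separates the fields.</p>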
169 <p>The child process:</p>
170 <ul>
171 <li><p>The command-line arguments are passed by the parent using xargs.</p>
172 </li>
173 <li><p>The environment variable $CHILD_MODE is set to indicate to the script itself
174 it is run as a child process of the script.</p>
175 </li>
176 <li><p>The script itself (run in child mode) only executes the task and
177 signals its status back to xargs and the parent.</p>
178 </li>
179 <li><p>The exit status of the child program is signaled to xargs. This could be
180 handled, for example to stop on the first failure (in this example it is not).
181 For example, if the program is killed, stopped or exits with status 255, then
182 xargs also stops running.</p>
183 </li>
184 </ul>
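<p>A minimal sketch of the "stop on exit status 255" behaviour described above.
The runjobs function and the job values are hypothetical. The jobs run one at a
time here (no -P) so that the output is predictable, and -n 1 is assumed to
pass one field per invocation.</p>

```shell
#!/bin/sh
# runjobs(): four hypothetical jobs, job "2" fails with exit status 255.
runjobs() {
	printf '%s\0' 1 2 3 4 | xargs -r -0 -n 1 sh -c '
		echo "job $1"
		if test "$1" = "2"; then exit 255; fi
	' sh
}

# xargs writes a diagnostic to stderr when it aborts: hide it here.
out="$(runjobs 2>/dev/null)"
status="$?"

printf '%s\n' "$out"
echo "xargs exit status is non-zero: $status" >&2
```

<p>Jobs 3 and 4 are never started. The exact non-zero exit status of xargs
itself can differ per implementation, so it is only checked for being non-zero.</p>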
185 <h1>Description of used xargs options</h1>
186 <p>From the OpenBSD man page: <a href="https://man.openbsd.org/xargs">https://man.openbsd.org/xargs</a></p>
187 <pre><code>xargs - construct argument list(s) and execute utility
188 </code></pre>
189 <p>Options explained:</p>
190 <ul>
191 <li>-r: Do not run the command if there are no arguments. Normally the command
192 is executed at least once even if there are no arguments.</li>
193 <li>-0: Change xargs to expect NUL ('\0') characters as separators, instead of
194 spaces and newlines.</li>
195 <li>-P maxprocs: Parallel mode: run at most maxprocs invocations of utility
196 at once.</li>
197 <li>-L number: Call utility for every number of non-empty lines read. A line
198 ending in unescaped white space and the next non-empty line are considered
199 to form one single line. If EOF is reached and fewer than number lines have
200 been read then utility will be called with the available lines.</li>
201 </ul>
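<p>A small sketch to try the -r, -0 and -P options in isolation. The input
values are made up, and -n 1 is assumed to pass one item per invocation. The
output of the parallel run is sorted because the order in which parallel jobs
finish is not guaranteed.</p>

```shell
#!/bin/sh
# -r: with empty input the utility is not run, so nothing is printed.
empty="$(printf '' | xargs -r -0 echo "utility was run")"
printf 'with -r and no input: "%s"\n' "$empty"

# -0 and -P: 4 NUL-separated items, one per invocation, at most 2 at once.
# sort(1) makes the output order predictable.
sorted="$(printf '%s\0' d c b a | xargs -r -0 -n 1 -P 2 echo | sort)"
printf '%s\n' "$sorted"
```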
202 <h1>xargs options -0 and -P, portability and historic context</h1>
203 <p>Some of the options, like -P, are non-POSIX as of writing (2023):
204 <a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/xargs.html">https://pubs.opengroup.org/onlinepubs/9699919799/utilities/xargs.html</a>.
205 However, many systems have supported this useful extension for many years now.</p>
206 <p>The specification even mentions implementations which support parallel
207 operations:</p>
208 <p>"The version of xargs required by this volume of POSIX.1-2017 is required to
209 wait for the completion of the invoked command before invoking another command.
210 This was done because historical scripts using xargs assumed sequential
211 execution. Implementations wanting to provide parallel operation of the invoked
212 utilities are encouraged to add an option enabling parallel invocation, but
213 should still wait for termination of all of the children before xargs
214 terminates normally."</p>
215 <p>Some historic context:</p>
216 <p>The xargs -0 option was added on 1996-06-11 by Theo de Raadt, about a year
217 after the NetBSD import (over 27 years ago at the time of writing):</p>
218 <p><a href="http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/xargs/xargs.c?rev=1.2&content-type=text/x-cvsweb-markup">CVS log</a></p>
219 <p>On OpenBSD the xargs -P option was added on 2003-12-06 by syncing the FreeBSD
220 code:</p>
221 <p><a href="http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/xargs/xargs.c?rev=1.14&content-type=text/x-cvsweb-markup">CVS log</a></p>
222 <p>Looking at the imported git history log of GNU findutils (which has xargs), the
223 very first commit already had the -0 and -P options:</p>
224 <p><a href="https://savannah.gnu.org/git/?group=findutils">git log</a></p>
225 <pre><code>commit c030b5ee33bbec3c93cddc3ca9ebec14c24dbe07
226 Author: Kevin Dalley <kevin@seti.org>
227 Date: Sun Feb 4 20:35:16 1996 +0000
228
229 Initial revision
230 </code></pre>
231 <h1>xargs: some incompatibilities found</h1>
232 <ul>
233 <li>With the -0 option, empty fields are handled differently in different
234 implementations.</li>
235 <li>The -n and -L options don't work correctly in many of the BSD implementations.
236 Some count empty fields, some don't. Early implementations in FreeBSD and
237 OpenBSD only processed the first line. In OpenBSD this was improved
238 around 2017.</li>
239 </ul>
240 <p>Depending on what you want to do, a workaround could be to use the -0 option
241 with a single field and use the -n flag. Then, in each child program invocation,
242 split the field by a separator.</p>
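<p>This workaround could look roughly like the sketch below. It assumes -n 1
(one field per invocation) and a ":" as the separator inside the field, which
means the first parameter must not contain a ":" itself. The list and process
functions and the job values are made up for illustration.</p>

```shell
#!/bin/sh
# list(): one NUL-terminated field per job, both parameters joined by ":"
# (an assumed separator, pick one that cannot occur in the first parameter).
list() {
	for f in 1 2 3; do
		printf '%s:%s\0' "$f" "some thing"
	done
}

# process(): one invocation per field, the child splits the field again.
process() {
	list | xargs -r -0 -n 1 sh -c '
		id="${1%%:*}"   # text before the first ":"
		name="${1#*:}"  # text after the first ":"
		printf "[%s] %s\n" "$id" "$name"
	' sh
}

result="$(process)"
printf '%s\n' "$result"
```

<p>Because each job is exactly one field, this does not depend on how an
implementation counts lines or empty fields for the -L option.</p>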
243 <h1>References</h1>
244 <ul>
245 <li>xargs: <a href="https://man.openbsd.org/xargs">https://man.openbsd.org/xargs</a></li>
246 <li>printf: <a href="https://man.openbsd.org/printf">https://man.openbsd.org/printf</a></li>
247 <li>ksh, wait: <a href="https://man.openbsd.org/ksh#wait">https://man.openbsd.org/ksh#wait</a></li>
248 <li>wait(2): <a href="https://man.openbsd.org/wait">https://man.openbsd.org/wait</a></li>
249 </ul>
250
251 </article>
252 </div>
253 </main>
254 </body>
255 </html>