GNU Parallel and the “fail immediately if any fails” problem

Andrey Voronkov
May 23, 2021

I thought it would be an in-and-out, 20-minute adventure to solve the “fail immediately if any fails” parallel execution problem in pure Bash. How wrong I was.

The problem: run N subprocesses in parallel using &, wait for all of them, but fail immediately as soon as any of them fails.

sleep 2 && false &
sleep 5 && true &
# Here we should wait, and fail as soon as the first subprocess fails (after 2 seconds)

The Bash built-in wait, when given no specific PID, either waits for all subprocesses and always returns exit code 0 (the default), or, with the -n option, waits for the next subprocess to exit and returns its exit code. Neither mode fits our goal.
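To make the two modes concrete, here is a small sketch of both. The subshells, sleep durations, and variable names are made up for illustration, and wait -n assumes Bash 4.3 or newer.

```shell
#!/usr/bin/env bash
# Illustration only: the tasks and variable names below are hypothetical.

# Mode 1: plain `wait` blocks until every child has exited and returns 0,
# even though one of the children failed with status 3.
(sleep 1; exit 3) &
(sleep 2; exit 0) &
wait
all_status=$?
echo "wait (all children): ${all_status}"

# Mode 2: `wait -n` returns as soon as one child exits and reports that
# child's status. Here the quick child succeeds, so a single call says
# nothing about the slower child that fails later.
(sleep 1; exit 0) &
(sleep 2; exit 3) &
wait -n
next_status=$?
echo "wait -n (first child to finish): ${next_status}"

# Reap the remaining child so nothing is left running in the background.
wait
```

Both statuses come out as 0 here, although a child failed in each case: the first mode hides the failure entirely, and a single call in the second mode only reports on whichever child happens to finish first.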

However, there is a solution to this problem that seems correct to me, but it is not optimal. Here is why: the script waits for the PIDs sequentially, one by one, in the order the subprocesses were spawned (yet another wait mode). If the first-spawned task is long-running, the script waits for it to finish before checking the next one, and so on, even if a later-spawned task has already failed. It works correctly but does not exit immediately on the first failure. It is also ugly and verbose: you have to maintain an array of PIDs manually and wait for them in a loop afterward.
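A minimal sketch of this sequential approach could look as follows. It is a reconstruction for illustration, not necessarily the exact script meant above; the tasks, the pids array, and the timing via SECONDS are assumptions.

```shell
#!/usr/bin/env bash
# Illustration only: tasks and variable names are hypothetical.

SECONDS=0   # Bash counter of elapsed whole seconds, used to show the delay
pids=()

(sleep 3; exit 0) &   # slow task, spawned first, succeeds
pids+=("$!")
(sleep 1; exit 7) &   # spawned second, fails after 1 second
pids+=("$!")

status=0
# Wait for each PID in spawn order. Bash remembers the exit status of a
# child that has already exited, so `wait PID` still reports the failure.
for pid in "${pids[@]}"; do
  if ! wait "$pid"; then
    status=1
    break
  fi
done

elapsed=$SECONDS
echo "status=${status} elapsed=${elapsed}s"
```

The failure is detected, so the result is correct, but only after the slow first task has finished: the script reports it roughly 3 seconds in, although the second task failed after 1 second.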

An attempt to improve on the previous solution is incorrect because of a race condition: one of the spawned tasks can exit (and fail) before all tasks are spawned and the jobs -p command is called, so the failed task never makes it into the waiting list.

This calls for a tool far more capable in this area than pure Bash constructs, and that tool is GNU Parallel (https://www.gnu.org/software/parallel/). GNU Parallel basics are beyond the scope of this short memo; all you need to know is that it has plenty of useful bells and whistles for parallel task execution that pure Bash lacks.

Correct solution:

hello="HELLOWORLD"
parallel --halt-on-error 2 --verbose ::: << EOF
sleep 10 && true
sleep 2 && false
sleep 1 \
&& echo "${hello}" \
&& true
EOF

Output:

sleep 10 && true
sleep 2 && false
sleep 1 && echo "HELLOWORLD" && true
HELLOWORLD
parallel: This job failed:
sleep 2 && false

--halt-on-error 2 is the key to the desired behavior:

0: Do not halt if a job fails. Exit status will be the number of jobs failed. This is the default.
1: Do not start new jobs if a job fails, but complete the running jobs including cleanup. The exit status will be the exit status from the last failing job.
2: Kill off all jobs immediately and exit without cleanup. The exit status will be the exit status from the failing job.

--verbose is there just for debugging: it prints each command that is run, which makes it easy to match the reported failing job against the list of commands.

Passing the commands to parallel via a Bash heredoc is clean and concise, and it supports variables and multiline commands, which is vital for most of the routine tasks that every admin, DevOps engineer, or programmer does on a daily basis.
