[HN Gopher] Bash Debugging
___________________________________________________________________
Bash Debugging
Author : ColinWright
Score : 342 points
Date : 2024-03-02 00:48 UTC (22 hours ago)
(HTM) web link (wizardzines.com)
(TXT) w3m dump (wizardzines.com)
| memco wrote:
| Good stuff! I use set -x frequently and have used a similar thing
| to die (but Julia's version is nicer). I'll consider using the
| debugger thing but stepping through a bash script line by line
| sounds a bit tedious. Perhaps less so than having to reread a log
| and rerun the script a bunch.
| chatmasta wrote:
| The line-by-line debugging would probably only be useful for a
| particular section of your script that you're trying to fix. In
| that case, you can remove the trap at the end of it with
| `trap - DEBUG`
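
A minimal sketch of limiting a DEBUG trap to one section, assuming bash. It is a non-interactive variation that records each command instead of pausing on it; the variable names `traced`, `setup`, and `count` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: trace only one section of a script with a DEBUG trap.
# Assumption: instead of pausing for input, the trap just records commands.

traced=()
setup=1                                  # runs before the trap: not recorded

# The DEBUG trap fires before every simple command while it is installed.
trap 'traced+=("$BASH_COMMAND")' DEBUG
count=1
count=$((count + 1))
trap - DEBUG                             # remove the trap again

count=$((count * 10))                    # runs after removal: not recorded

printf 'traced: %s\n' "${traced[@]}"
```

Only the commands between the two `trap` lines end up in `traced`, so the rest of the script runs without any tracing overhead or noise.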
| halostatue wrote:
| I often add a fail-unless function:
|
|     fail-unless() {
|       local result
|       "$@"
|       result=$?
|       if ((result != 0)); then
|         echo >&2 "Failed ${result} with command '$*'."
|         exit ${result}
|       fi
|     }
|
| That way, I know exactly what failed in the script.
| BeefySwain wrote:
| How does this work exactly? What calls that function and
| when?
| sureglymop wrote:
| You do e.g. `fail-unless somecommand`. The result
| (exit/return code) is captured in the function and based on
| that, the function logs and exits or not.
| ykonstant wrote:
| You probably meant to reply to BeefySwain, right?
| klysm wrote:
| I always put
|
| set -euxo pipefail
|
| at the top of my bash scripts. It makes some conditional testing
| more difficult but it has paid for itself many times over just
| because of pipefail
| ddlsmurf wrote:
| You can also set it for a bunch of lines then deactivate it
| with `set +x`. It gets rather tedious otherwise...
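
As a rough illustration of switching tracing on for only a few lines, here is a small sketch; the function name `build_greeting` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: trace only the interesting lines of a function with set -x / set +x.

build_greeting() {
  local name=$1
  set -x                                 # start echoing commands to stderr
  local greeting="hello, ${name}"
  set +x                                 # stop tracing (this line is echoed too)
  printf '%s\n' "$greeting"
}

build_greeting "world"
```

The trace goes to stderr, so the function's normal output on stdout is unaffected and can still be captured with `$(...)`.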
| mellutussa wrote:
| This is a lifesaver. Although I save the -x until I really need
| to see all the garbage debug output.
| klysm wrote:
| I find it's usually worth the noise to catch weirdness
| mrAssHat wrote:
| https://mywiki.wooledge.org/BashPitfalls#set_-euo_pipefail
| Brian_K_White wrote:
| I have almost that same die() in every script, except I call it
| abrt(). Maybe I'll switch to die() since it's shorter. Mine also
| prepends $0 and sometimes I use printf or echo -e so I can pass
| larger more complex messages with linefeeds and escape codes etc.
| jclulow wrote:
| It's actually possible to produce a stack trace of sorts, if you
| use a lot of bash functions. One possible implementation is:
| https://github.com/TritonDataCenter/sdc-headnode/blob/master...
| drizzleword wrote:
| Another stack trace implementation [1] that allows you to
| write:
|
|     some-command || fail "message"
|
| to produce a stack trace and exit the shell in case of non-zero
| exit status from some-command, or write
|
|     some-command || softfail "message" || return $?
|
| in case you want to produce a stack trace and return from the
| function.
|
| [1]: https://github.com/runag/runag/blob/main/lib/fail.sh
| asicsp wrote:
| See also:
|
| Why doesn't set -e (or set -o errexit, or trap ERR) do what I
| expected? https://mywiki.wooledge.org/BashFAQ/105
|
| What are the advantages and disadvantages of using set -u (or set
| -o nounset)? https://mywiki.wooledge.org/BashFAQ/112
|
| Safe ways to do things in bash
| https://github.com/anordal/shellharden/blob/master/how_to_do...
|
| Better Bash Scripting in 15 Minutes
| https://robertmuth.blogspot.com/2012/08/better-bash-scriptin...
|
| Writing Robust Bash Shell Scripts
| https://www.davidpashley.com/articles/writing-robust-shell-s...
| ndsipa_pomu wrote:
| You missed the most important tip - use ShellCheck on every
| script you write: https://www.shellcheck.net/
|
| Personally, I'm a big fan of BASH3 boilerplate:
| https://github.com/kvz/bash3boilerplate
|
| It's fine for BASH versions above v3 and provides decent
| logging though I typically extend the script so that I can pipe
| long running commands into its logging framework. It also
| ensures that you specify the "help" options correctly as it
| parses the usage information to process the command line
| arguments with support for short and long options.
| asicsp wrote:
| Oh yeah, shellcheck is a must have tool.
| ndsipa_pomu wrote:
| I think of it as a gateway to writing better scripts. When
| you first run it and it highlights what it considers to be
| a problem, you end up reading why it considers it to be a
| problem and that clues you in on some of the many footguns
| that Bash has.
| colordrops wrote:
| Is there a reason bash is still the de facto shell scripting
| language other than sheer momentum of legacy? I'm able to get
| what I need done in it, but it's clunky and the syntax is horrid.
| I guess it forces you to move to a proper language once scripts
| grow to a certain size/complexity, so perhaps it's by design?
| jimkoen wrote:
| Are you sure it's bash? Most scripts on FreeBSD are written
| for sh, which I feel is much more widely supported due to being
| part of the POSIX standard. Bash is just popular I think.
| HankB99 wrote:
| Bash is pretty much expected to be installed on any Linux
| distro. On FreeBSD (and likely other BSDs) it is an optional
| install. If you want a script to run on either, use sh. If
| strictly Linux, bash is probably safe.
|
| Bash/sh is good for when you need to combine some commands
| and what needs to be done can be accomplished mostly by CLI
| commands with a little glue to tie them together. Sometimes
| it is surprising what can be accomplished. I wrote a program
| to import pictures from an SD card on Windows using C#,
| copying pictures to C:\Pictures\YYYY\MM\DD according to the
| EXIF data or failing that, file time stamp. I tried to port
| it to Linux but ran into problems trying to connect to the
| EXIF library. After struggling with that, I rewrote it using
| sh, some EXIF tool and various file utilities. It took 31
| lines, about half of which were actual commands and the rest
| comments or white space.
|
| A much bigger project is a script to install Debian with root
| on ZFS. It's mostly a series of CLI commands with some
| variable substitution and conditionals depending on stuff
| like encrypted or not.
| xk_id wrote:
| > Bash/sh is good for when you need to combine some
| commands and what needs to be done can be accomplished
| mostly by CLI commands with a little glue to tie them
| together.
|
| Once I'd learned bash, I realised how many more problems I
| could solve, in addition to a majority of old ones. It's an
| entirely new level of "computer literacy"; and a more
| genuine one.
| xp84 wrote:
| I suppose it's an ambiguous designation.
|
| I feel like when I see a shell script in my work, which is
| not in operating systems development of course, people are
| targeting bash. I agree many things are careful to target sh
| for certain reasons (e.g. a script that runs in a container
| where the base image doesn't have bash installed) but I still
| think GP's question is interesting because it's not common to
| see, say, a zsh shell script, but seeing #!/bin/bash is super
| common.
| ykonstant wrote:
| I have done some delightful stuff in `zsh`, but I always
| lament how slow its numerical array traversal is.
| Frustratingly, experts told me it really doesn't have to be
| slow, the devs just don't seem to be bothering to revamp
| the underlying data structure because they are focusing
| more on associative arrays.
| chasil wrote:
| If you are on Ubuntu, and you must target #!/bin/sh then
| bash is not an option.
|
| Android and Apple are in similar situations.
| joveian wrote:
| FreeBSD's /bin/sh is based on ash, like NetBSD's, although
| I'm not sure how much they have in common these days. dash
| was forked from NetBSD's version of ash and then simplified
| considerably and fixed up to be fully (? or at least mostly)
| POSIX compliant. A while after that NetBSD's shell also had a
| bunch of POSIX fixes. I'm not sure how FreeBSD's shell is in
| terms of strict POSIX compliance.
|
| In my opinion, bash has two things (at least vs NetBSD's
| shell, possibly a few more vs POSIX) that make the average
| shell script (that I write) much easier. The first is &>
| which makes it easy to redirect both stdout and stderr to a
| file for logging. The standard 2>&1 can work but needs to be
| placed correctly or it doesn't work. That place isn't always
| the obvious place like it is with &> and running bash seems
| much preferable to me than figuring that one out.
|
| The second is ${var@Q} which prints var quoted for the shell,
| which is nice to use all over the place to make sure any
| printed file names can be copied and pasted.
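
The two bash features mentioned above can be sketched as follows. This assumes bash 4.4 or newer for `${var@Q}`; the helper names `describe_file` and `run_logged` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of two bash-only conveniences: ${var@Q} and &>.
# Assumption: bash 4.4+ (the @Q expansion is not available in older versions).

describe_file() {
  local path=$1
  # ${path@Q} expands to the value quoted so it can be pasted back into a shell.
  printf 'processing %s\n' "${path@Q}"
}

run_logged() {
  local logfile=$1
  shift
  # &> sends both stdout and stderr to the file,
  # equivalent to: "$@" > "$logfile" 2>&1
  "$@" &> "$logfile"
}

describe_file "my file.txt"
```

With the portable form, the order matters: `> file 2>&1` captures both streams, while `2>&1 > file` leaves stderr pointing at the original stdout.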
|
| My sense is that targeting POSIX is usually done for maximum
| portability or for use on systems that don't have bash
| installed by default. However, bash is quite widely available
| even if not by default and very widely used so I wouldn't say
| it is unreasonable to look at bash as the de facto standard
| and POSIX and other shells as being used in more limited
| circumstances.
| hyperadvanced wrote:
| It really is just legacy and momentum. Recent additions build
| on sh/bash really well but in the end shell scripting is a
| means to an end that needs to evolve much slower than standard
| programming languages.
|
| I think bash/sh's key feature is that they are anti-entropy,
| there's no development or evolution so there's no chance you
| need to mess with dependencies or new features, the stuff that
| worked 20 years ago will continue to be the "bread and butter".
| By design, this results in a system that's averse to change and
| incentivizes people to reach outside of its limits when they
| are met.
| theonemind wrote:
| bourne shell scripting is good enough, which makes it nearly
| impossible to replace. Plan9's rc is a bit cleaner, and no one
| is going to switch for 'more of the same, but cleaner'. You
| haven't switched to something similar but better even though
| you could literally do it right now https://pkgsrc.se/shells ,
| and it doesn't run any different for anyone else. It usually
| takes something several times better in some crucial aspect to
| replace an entrenched technology. For example, Plan 9 is better
| than UNIX-like systems, but not good enough to replace them. I
| don't think it's possible to make something good enough to
| replace bourne shell scripting in its niche because before you
| have something several times better, good enough to actually
| replace it, you're in a different ecological niche or problem
| domain, for real scripting languages like Perl, Python, and
| Ruby. It's a local maximum solution that sucks the air out of
| the room for potential competition closer to the theoretical
| global maximum solution for the narrow problem domain.
| AtlasBarfed wrote:
| It's ubiquitous.
|
| But bash is so bad I wrote a ton of namespace shortened utils
| for using groovy scripts.
|
| Sooooooooooooooooo much better. use IDEs for dev, save library
| system, groovy smoothed almost all Java annoyances
| ndsipa_pomu wrote:
| Its legacy usage is certainly a big part of its popularity. You
| can generally rely on having a newish version of it on any
| modern distro and you don't have to worry about the version
| unless you want to do stuff with arrays etc.
|
| What I find compelling about bash is its position in relation
| to other languages and tools. It's ideal for tying together
| other tools and is close enough to the operating system to make
| that easy whilst also not requiring libraries to also be
| installed (c.f. python).
|
| I often hear the opinion that more complex scripting should be
| moved to a language such as python, but that adds a layer of
| complexity that is probably not helpful in the long-run. I can
| take a bash script that I wrote twenty years ago and it'll
| still work fine, but a python programme from twenty years ago
| may well have issues with versions.
| bregma wrote:
| Perfection is the enemy of Good Enough.
| jph wrote:
| Good info. You can improve your debugging by using exit codes
| like this:
|
|     # die: print error message to stderr, then exit with error code.
|     # example: die 69 "Service unavailable."
|     die() {
|       n="$1" ; shift ; >&2 printf %s\\n "$*" ; exit "$n"
|     }
|
| Many more shell script exit codes and helper functions:
|
| https://github.com/SixArm/unix-shell-script-kit/blob/main/un...
| ykonstant wrote:
| That's a nice list; I guess every experienced user has their
| helper functions. However, I have a small criticism for the
| philosophy of that `die`: `die` functions should pass by
| default the exit code of the failed command, and not silence
| its error output. If I want to give my own meaning to the
| command failure in a large script for instance, I will use a
| different, more specialized `die`. My own die is roughly as
| follows:
|
|     __errex() {
|       printf 'Fatal error [%s] on line %s in '"'"'%s'"'"': %s\n' \
|              "${1:-"?"}" \
|              "${2:-"?"}" \
|              "${3:-"unknown script"}" \
|              "${4:-"unknown error"}" >&2 ;
|       exit "${1:-1}"
|     }
|     alias die='__errex "$?" "${LINENO}" "$0"'
| jph wrote:
| Thanks! Your way is better than what I have. Want to do a
| pull request? Or may I copy/paste your code?
| ykonstant wrote:
| Just copy paste!
| bewuethr wrote:
| There's also a fairly powerful gdb style actual debugger:
| https://bashdb.sourceforge.net/
| E39M5S62 wrote:
| We use a couple nice home-grown functions in ZFSBootMenu to help
| debug things. We have a zdebug logging function that's peppered
| liberally throughout the code base -
| https://github.com/zbm-dev/zfsbootmenu/blob/master/zfsbootme...
|
| Hitting ctrl-t on our main menu will, when booting with debug
| logging enabled, show a screen like this:
| https://i.imgur.com/Ge75zkP.png
|
| We also have a flamegraph profiling mechanism that can be enabled
| with https://github.com/zbm-dev/zfsbootmenu/blob/master/zfsbootme... .
| That will dump data to a serial port, which when re-assembled, can
| be used to produce a graph like
| https://raw.githubusercontent.com/zbm-dev/zfsbootmenu/master...
|
| Bash is surprisingly flexible.
| thaumaturgy wrote:
| The `die()` trick is good, but bash has an annoying quirk: if you
| try to `exit` while you're inside a subshell, then the subshell
| exits but the rest of the script continues. Example:
|
|     #!/bin/bash
|     die() { echo "$1" >&2; exit 1; }
|     cat myfile | while read line; do
|       if [[ "$line" =~ "information" ]]; then
|         die "Found match"
|       fi
|     done
|     echo "I don't want this line"
|
| ..."I don't want this line" will be printed.
|
| You can often avoid subshells (and in this specific example,
| shellcheck is absolutely right to complain about UUOC, and fixing
| that will also fix the die-from-a-subshell problem).
|
| But, sometimes you can't, or avoiding a subshell really
| complicates the script. For those occasions, you can grab the
| script's PID at the top of the script and then use that to kill
| it dead:
|
|     #!/bin/bash
|     MYPID=$$
|     die() { echo "$1" >&2; kill -9 $MYPID; exit 1; }
|     cat myfile | while read line; do
|       if [[ "$line" =~ "information" ]]; then
|         die "Found match"
|       fi
|     done
|     echo "I don't want this line"
|
| ...but, of course, there are tradeoffs here too; killing it this
| way is a little bit brutal, and I've found that (for reasons I
| don't understand) it's not entirely reliable either.
| whatindaheck wrote:
| Could killing the PID like that create zombies?
| ykonstant wrote:
| Perhaps, but in any case I would never write code like this.
|
| First of all, sending sigkill is literally overkill and
| perpetuates a bad practice. Send `TERM`. If it doesn't work,
| figure out why.
|
| Secondly, subshells should be made as clear as possible and
| not hidden in pipes. Related, looping over `read` is
| essentially never the right thing to do. If you really need
| to do that, don't use pipes; use heredocs or herestrings.
|
| Thirdly, if you cannot avoid subshells and you want to
| terminate the full script on some condition, exit with a
| specific exit code from the subshell, check for it outside
| and terminate appropriately.
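
The last suggestion (exit the subshell with a dedicated code, then check it outside) might look roughly like this. It is a sketch only: the exit code `3` and the names `MATCH_FOUND`, `scan`, and `scan_or_stop` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: signal "stop everything" from a subshell via a dedicated exit code.
# Assumption: code 3 is not used for anything else in this script.

MATCH_FOUND=3

scan() {
  # The loop is the last stage of a pipeline, so it runs in a subshell:
  # `exit` here leaves only the loop, and becomes the pipeline's status.
  printf '%s\n' "$@" | while IFS= read -r line; do
    if [[ $line == *information* ]]; then
      echo "Found match" >&2
      exit "$MATCH_FOUND"
    fi
  done
}

scan_or_stop() {
  scan "$@"
  local status=$?
  if (( status == MATCH_FOUND )); then
    return "$status"       # a real script would `exit "$status"` here
  fi
  echo "no match"
}
```

The caller decides what the code means, so no signals are needed and cleanup handlers such as `trap ... EXIT` still get a chance to run.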
| gpvos wrote:
| > looping over `read` is essentially never the right thing
| to do
|
| Why? I do it quite often, though admittedly usually in one-
| time scripts.
| ykonstant wrote:
| See my reply to qazxcvbnm and study the links carefully
| if you want to do robust shell scripting.
| qazxcvbnm wrote:
| Do enlighten me on why it is a bad idea to use loops over
| read; it's perhaps one of my favourite patterns in bash,
| and combined with pipes, appears to me one of the cleanest
| ways to correctly and concisely utilise parallelism in
| software.
| ykonstant wrote:
| Stephane Chazelas has already explained this extensively,
| so I will link to two of his most information-dense
| posts:
|
| https://unix.stackexchange.com/questions/169716/why-is-using...
|
| https://unix.stackexchange.com/questions/209123/understandin...
|
| Both of these posts must be read carefully if you really
| wish to write robust scripts.
| qazxcvbnm wrote:
| The provided points don't seem to be reasons for
| generally avoiding the subshell loop pattern.
|
| Reasons of 1) performance, 2) readability, and 3)
| security are provided as points against the pattern, and
| the post itself acknowledges that the pattern is a great
| way to call external programs.
|
| I'd think that the fact that one is using shell to begin
| with would almost certainly mean that one is using the
| subshell loop pattern for calling external programs,
| which is the use case that your post approves of. In this
| case, subshells taking the piped input as stdin allows
| the easy passing of data streamed over a file-descriptor,
| probably one of the most trivially performant ways of
| data movement, and the pattern is composable, certainly
| easier to remember, modify, and to extend than the
| provided xargs alternative, without potential problems
| such as exceeding max argument length. Having independent
| subshells also allows for non-interference between
| separate loops when run in parallel, offering something
| resembling a proper closure. In these respects, subshell
| loops provide benefits rather than pitfalls in
| performance and readability. Certainly read has some
| quirks that one needs to be aware of, but aren't much of
| an issue when operating on inputs of a known shape, which
| is likely the case if one is about to provide them as
| arguments to another command.
|
| Regarding "security", the need to quote applies to
| anything in shell, and has nothing specifically to do
| with the pattern.
| wolletd wrote:
| Just adding `set -e` also exits the script when a subshell
| exits with non-zero error code. I'm not sure why I would leave
| `set -e` out in _any_ shell script.
| porlw wrote:
| grep has a bad interaction with set -e, since it
| (infuriatingly) exits with 1 if no lines are matched.
| wolletd wrote:
| Well, otherwise `if grep "line" $file; then` wouldn't work,
| which in my opinion is the primary use case for grep in
| scripts.
|
| I'd still prefer a `grep <args> || true` over not having
| `set -e` for the whole file.
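
A slightly stricter variation of the `|| true` idea is sketched below: it tolerates grep's "no match" status (1) but still fails on higher exit codes, which grep uses for real errors. The function name `count_matches` is hypothetical, and `set -e` is scoped to a subshell so the sketch does not affect the calling shell.

```shell
#!/usr/bin/env bash
# Sketch: keep errexit on while tolerating grep's "no lines matched" status.

count_matches() (
  set -e                                 # errexit, scoped to this subshell
  pattern=$1
  file=$2
  # grep exits 1 when nothing matches. Accept exactly that status;
  # anything above 1 (e.g. an unreadable file) still counts as a failure.
  matches=$(grep -c -- "$pattern" "$file") || [ "$?" -eq 1 ]
  printf '%s\n' "$matches"
)
```

Plain `|| true` is shorter, but it also hides a missing or unreadable input file, which is usually the kind of failure `set -e` was enabled to catch.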
| dieulot wrote:
| You can `set +e` before and `set -e` after every such
| command. I indent those commands to make it look like a
| block and to make sure setting errexit again isn't
| forgotten.
| js2 wrote:
| I use `set -e` but it has its own quirks. A couple:
|
| An arithmetic expression that evaluates to zero will cause
| the script to exit. e.g. this will exit:
|
|     set -e
|     i=0
|     (( i++ )) # exits
|
| Calling a function from a conditional prevents `set -e` from
| exiting. The following prints "hello\nworld\n":
|
|     set -e
|     main() {
|       false # does not return nor exit
|       echo hello
|     }
|     if main; then echo world; fi
|
| Practically speaking this means you need to explicitly check
| the return value of every command you run that you care about
| and guard against `set -e` in places you don't want the
| script to exit. So the value of `set -e` is limited.
|
| More at https://mywiki.wooledge.org/BashFAQ/105
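
One way to sidestep the first quirk is to increment with an assignment rather than a bare `(( ))` command. A minimal sketch, with the hypothetical function name `increment_safely` and `set -e` confined to a subshell:

```shell
#!/usr/bin/env bash
# Sketch: incrementing a counter from zero without tripping errexit.

increment_safely() (
  set -e                                 # errexit, scoped to this subshell
  i=0
  # An assignment returns status 0 whatever the arithmetic result is,
  # unlike `(( i++ ))`, whose status is 1 when the expression evaluates to 0.
  i=$((i + 1))
  # Pre-increment evaluates to the new value (2 here), which is non-zero.
  (( ++i ))
  echo "$i"
)

increment_safely
```

The assignment form is the safer habit of the two, since `(( ++i ))` would still fail if the counter started at -1.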
| mmmpetrichor wrote:
| If I ever have to debug anything in bash, I stop using bash hah.
| sureglymop wrote:
| Doesn't make sense. What if you get a script someone else
| wrote? Printing every command and confirming as you run every
| command is a great idea.
|
| And, unfortunately shell has become the norm in CI/CD
| environments, pipelines etc. Can be convenient at times but can
| also be inconvenient and confusing as these scripts don't run
| in interactive shells.
| bigstrat2003 wrote:
| > And, unfortunately shell has become the norm in CI/CD
| environments, pipelines etc.
|
| A pipeline which relies on shell is not worth using, tbh.
| That's how much shell sucks.
| ndsipa_pomu wrote:
| What language would you base a pipeline on?
|
| I agree that bash sucks, but have yet to find anything to
| replace it that doesn't increase complexity and version
| problems.
| ykonstant wrote:
| I find such reasoning backwards. Indeed, shell scripting is not
| friendly to debugging. But ensuring correctness of shell
| scripts is essential: usually, they touch part of your "$HOME"
| or system folders and do tons of I/O, some of it destructive. I
| find it baffling to see people write careless scripts;
| sometimes using `rm` for cleanup with unquoted parameters, or
| much worse, dangerous uses of `mv`.
| bigstrat2003 wrote:
| I believe OP's point was that any shell script complex enough
| to require debugging should not be a shell script any more.
| t-3 wrote:
| While that's certainly true for people trying to do very
| complex things in "pure" shell, when the tools you're using
| are possibly buggy, it's not very useful. Sometimes you
| have to debug and figure out where the problem is
| occurring, and then you can do the much simpler work of
| replacing one part rather than writing a bespoke program to
| replace all the boring functionality obfuscated away by
| shell and the working programs.
| Brian_K_White wrote:
| I believe P's point was that it doesn't matter how simple
| or complex a script (or anything else) is, everything
| requires debugging. And/or that you have to be ready to
| debug bash regardless if you like it or not, regardless
| what you choose to write your own stuff in.
|
| OP's comment is not unfunny and not 100% untrue either
| though. But not 100% true either. A single word script
| still needs to be debugged.
| BeetleB wrote:
| > And/or that you have to be ready to debug bash
| regardless if you like it or not, regardless what you
| choose to write your own stuff in.
|
| Over a decade into my career, and I've successfully
| managed to avoid debugging Bash scripts.
|
| There is hope for people who don't want to.
| jon-wood wrote:
| I hear this argument occasionally and it's very contextual.
| While it's certainly possible to rewrite any given shell
| script in Python, Rust, or whatever language you prefer
| there are some things which are just clearer in Bash.
|
| I wouldn't want to write an entire application in Bash, but
| equally I wouldn't want to write a script which does
| relatively simple file operations in Python. Bash is a
| language which has been honed over many decades for
| precisely that sort of thing, and so can communicate what's
| happening far clearer than Python does in my view.
| ykonstant wrote:
| The `trap DEBUG` thing is pretty interesting; I almost always
| write POSIX code, so I don't get to play with such tricks. Does
| anybody know of some wizardry that could mimic this in arbitrary
| POSIX compliant shells?
| bubblebeard wrote:
| This is neat, never considered one could step debug a bash
| script. Kudos for sharing!
| c0l0 wrote:
| I shared
| https://johannes.truschnigg.info/writing/2021-12_colodebug/ a
| while ago here - you might appreciate it for another means to
| make your bash scripts easier to understand while they execute.
| ketanmaheshwari wrote:
| Plug and slightly related. I once created a bash pipeline
| debugger that preserves the intermediate outputs. Has a few
| limitations but maybe generally useful:
| https://github.com/ketancmaheshwari/pd
| chlorion wrote:
| Gentoo has a script that you can source at
| /lib/gentoo/functions.sh that provides various helper methods,
| mostly for printing messages, and it provides nice little green
| and red stars to indicate whether something has succeeded or
| failed.
|
| I use functions.sh in all of my scripts that are known to be
| running on Gentoo only and it makes them feel Gentoo-y and is
| useful in general.
| hn_acker wrote:
| Off-topic: The alt text for the image says
|
| > Image of a comic. To read the full HTML alt text, click "read
| the transcript".
|
| but I can't find any button relating to a transcript.
| schneems wrote:
| I recommend shellcheck as well. It might not catch your problem,
| but it will point out possible problems.
|
| Also I recommend: rewriting scripts in another language. At work
| we are converting bash scripts to rust and while it's a high
| ramp-up time, the resulting code is much easier to maintain and I
| have a much higher level of confidence in them. Bash is still
| good for quick scripts but once you hit 100 lines or so you
| really deserve a language with stronger guarantees.
| riperoni wrote:
| I agree, but we gotta tell that to CI/CD engineering and yaml-
| pipelines
| sestep wrote:
| When I was at Facebook, I wrote a Python script to extract
| shell scripts from GitHub Actions workflows, so we could run
| them all through ShellCheck:
| https://github.com/pytorch/pytorch/blob/69e0bda9996865e319db...
| gkfasdfasdf wrote:
| You can use the bash 'caller' builtin to print a stack trace,
| e.g.:
| https://github.com/theimpostor/templates/blob/main/bash%2Fte...
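
For readers who have not used `caller`, a minimal sketch of the idea follows. It is not taken from the linked template; the function names `print_stack`, `inner`, and `outer` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print a simple stack trace with the bash `caller` builtin.

print_stack() {
  local frame=0
  # `caller N` prints "line function file" for the Nth frame up the call
  # stack and returns non-zero once there are no more frames.
  while caller "$frame"; do
    frame=$((frame + 1))
  done
}

inner() { print_stack; }
outer() { inner; }

outer
```

Each output line names one calling function, innermost first. In a real script the trace would normally be sent to stderr, for example from inside a `die` helper.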
___________________________________________________________________
(page generated 2024-03-02 23:01 UTC)