[HN Gopher] Sysexits - preferable exit codes for programs
___________________________________________________________________
Sysexits - preferable exit codes for programs
Author : susam
Score : 157 points
Date : 2021-10-31 12:14 UTC (2 days ago)
(HTM) web link (www.freebsd.org)
(TXT) w3m dump (www.freebsd.org)
| legulere wrote:
| Reminds me a bit of HRESULT
| (https://en.wikipedia.org/wiki/HRESULT)
| coldacid wrote:
| HRESULT is a lot more structured and allows for good non-zero
| responses, not just bad ones. (If an HRESULT >= 0, it's a
| success result; the sign bit is used to flag success or
| failure.)
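The sign-bit rule described above can be sketched in a few lines of Python. This is only an illustration: the helper names are made up for this example, and the facility/code split assumes the usual 32-bit HRESULT layout (severity in bit 31, facility in bits 16-26, code in bits 0-15).

```python
def to_signed32(value):
    """Reinterpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def succeeded(hresult):
    """True when the sign bit is clear, i.e. the signed value is >= 0."""
    return to_signed32(hresult) >= 0


def split_hresult(hresult):
    """Return (severity bit, facility, code) under the assumed bit layout."""
    raw = hresult & 0xFFFFFFFF
    return raw >> 31, (raw >> 16) & 0x7FF, raw & 0xFFFF


S_OK = 0x00000000
S_FALSE = 0x00000001         # a non-zero value that still means success
E_ACCESSDENIED = 0x80070005  # failure, facility 7, code 5

if __name__ == "__main__":
    for name, hr in [("S_OK", S_OK), ("S_FALSE", S_FALSE),
                     ("E_ACCESSDENIED", E_ACCESSDENIED)]:
        print(name, succeeded(hr), split_hresult(hr))
```

Unlike a process exit status, where only 0 means success, any non-negative value here counts as success, which is what allows "good non-zero responses".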
| londons_explore wrote:
| These error codes are too vague to pinpoint a problem, and too
| vague to even really point in the direction of a problem.
|
| I'd prefer some mechanism for making a cross-program shortened
| stack trace indicating the specific error (ie. 'no space on
| device') and the sequence of things in progress that have now all
| failed (ie. 'cannot create output file', 'cannot decompress
| files', 'cannot install XYZ package', 'cannot update system').
|
| Then it's very clear to the user - The system can't be updated
| because XYZ package couldn't be installed, because files couldn't
| be decompressed, because the output file couldn't be created,
| because there is no space left.
| mikepurvis wrote:
| I think the main value here is understanding at the broadest
| level whether the problem is internal to the program
| (unforeseen logic error), with how I configured the program
| (bad flags), or with the environment in which I ran the program
| (out of memory, disk, whatever).
|
| Obviously a single integer is not going to be a rich error-
| reporting mechanism, but particularly in the context of
| something like a process manager or job runner, it could be
| helpful to know what kinds of errors might go away on a retry,
| for example.
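A job runner along these lines might look like the sketch below. It assumes the child follows sysexits and that only EX_TEMPFAIL (75) is worth retrying; the function name and the retry policy are invented for illustration, not taken from any real process manager.

```python
import subprocess
import sys
import time

EX_TEMPFAIL = 75  # value from sysexits.h: temporary failure

# Assumption: only EX_TEMPFAIL is retried; every other non-zero
# status is treated as permanent.
RETRYABLE = {EX_TEMPFAIL}


def run_with_retry(cmd, attempts=3, delay=0.0):
    """Run cmd until it succeeds, fails permanently, or attempts run out.

    Returns (last exit status, number of runs performed).
    """
    status = 0
    for run in range(1, attempts + 1):
        status = subprocess.run(cmd).returncode
        if status not in RETRYABLE:
            return status, run
        if run < attempts:
            time.sleep(delay)
    return status, attempts


if __name__ == "__main__":
    # A child that always reports a temporary failure.
    child = [sys.executable, "-c", "import sys; sys.exit(75)"]
    print(run_with_retry(child))
```

The caveat raised elsewhere in the thread still applies: this only helps if the child program actually uses 75 to mean what sysexits says it means.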
| ttyprintk wrote:
| I think history supports this, but I'm not sure I can cite a
| reference. Early Unix code I wrote followed the convention
| that codes 1-3 belong to system utilities, 4-9 are
| unreserved, and users pick 10-127. Some exotic use cases wrap
| some meaning into higher codes, but historically we only
| cared to detect whether the error comes from system binaries
| or within the application. I suppose programming on top of
| the application would mean using 20, but I still pick 10 by
| habit.
| Ono-Sendai wrote:
| You can do this in a simple enough way, in C++: each software
| level catches the exception from the level below, prepends its
| own message string to it and then throws a new exception with
| the combined message.
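The same idea, sketched in Python rather than C++: each level catches the failure from below, prepends its own message, and raises a new exception. `StepError`, `step` and the four functions are hypothetical names chosen to mirror the "cannot update system" example earlier in the thread.

```python
from contextlib import contextmanager


class StepError(Exception):
    """Hypothetical error type carrying the accumulated message."""


@contextmanager
def step(description):
    """Prefix any failure inside the block with this level's description."""
    try:
        yield
    except Exception as exc:
        raise StepError(f"{description}: {exc}") from exc


def create_output_file():
    with step("cannot create output file"):
        # Stand-in for a real write that fails at the lowest level.
        raise OSError("no space on device")


def decompress_files():
    with step("cannot decompress files"):
        create_output_file()


def install_package(name):
    with step(f"cannot install {name} package"):
        decompress_files()


def update_system():
    with step("cannot update system"):
        install_package("XYZ")


if __name__ == "__main__":
    try:
        update_system()
    except StepError as exc:
        print(exc)
```

This only works inside one process; as the reply below notes, crossing a subprocess boundary needs some other channel for the strings.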
| londons_explore wrote:
| There also needs to be a way to get strings from a
| subprocess. Most unix-y tools involve a lot of subprocesses,
| and the ultimate failure might be 3 processes away...
| kevin_thibedeau wrote:
| RAM is exhausted. Now what happens?
| londons_explore wrote:
| Even the standard libraries of most programming languages
| don't have defined behaviour in out-of-memory conditions.
|
| Instead just use paging, pushing out cache, or memory
| compression to have a soft limit to ensure you never need
| to handle out of memory.
|
| In today's world of delayed page allocation, you'll
| probably just get killed with SIGBUS anyway.
| alexfromapex wrote:
| Obviously, these are just for BSD, but for Python 3, the standard
| library provides enumerations of error codes and provides a
| dictionary to map to the underlying system message:
| https://docs.python.org/3/library/errno.html
| loeg wrote:
| Those are a different set of error codes. Also, they have been
| present in Python long before 3.
| michaelhoffman wrote:
| The sysexits constants (such as `os.EX_OK`) are in the `os`
| module. But beware, they are not available on some platforms.
|
| https://docs.python.org/3/library/os.html#os._exit
| zoomablemind wrote:
| OpenVMS has a system-wide convention for coding exit status [1].
|
| The lowest 3 bits of the status value are used to denote severity
| (0:warning, 1:success, 2:error, 3:info, 4:severe/fatal).
|
| In addition to that, each "facility" (either system or user
| service/application) defines its set of status values and
| corresponding messages, which can be incorporated for use with the
| system message facility. So on program exit a user can query the
| resulting status/message to get an explanation and possible
| remedies, if applicable (HELP /MESSAGE $STATUS).
|
| [1]:https://wiki.vmssoftware.com/$STATUS
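Decoding the severity field described above is a matter of masking the low three bits. The sketch below uses the severity table from the comment; the `is_success` helper additionally assumes the VMS convention that an odd status (low bit set) counts as success, and both function names are made up for this example.

```python
# Severity values as listed in the comment above.
SEVERITY = {
    0: "warning",
    1: "success",
    2: "error",
    3: "info",
    4: "severe/fatal",
}


def severity(status):
    """Name the severity stored in the lowest 3 bits of a status value."""
    return SEVERITY.get(status & 0b111, "reserved")


def is_success(status):
    """Assumption: a set low bit (success or info) means the call succeeded."""
    return bool(status & 1)


if __name__ == "__main__":
    for status in (1, 2, 3, 0x10004):
        print(hex(status), severity(status), is_success(status))
```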
| _kst_ wrote:
| One issue with that is that using 1 for success conflicts with
| the nearly universal Unix convention of using 0 for success.
|
| C requires exit(0) to indicate a successful status. The OpenVMS
| C library translates that to a status of 1.
|
| A lot of C code uses exit(1) to indicate failure, but it
| indicates success in OpenVMS. (Use EXIT_FAILURE, a macro
| defined in <stdlib.h>, instead.)
| nerdponx wrote:
| I really like this idea, I might try using it in one of my own
| programs (knowing fully that it's not a generally accepted
| standard, but at least I can document what the codes mean)
| BruceEel wrote:
| This is very neat indeed. It could perhaps become an
| unofficial standard of sorts if enough developers adopt it...
| susam wrote:
| Also worth seeing: https://reviews.freebsd.org/D27176 (Discourage
| the use of sysexits(3) in new code)
| hdjjhhvvhga wrote:
| Your comment is more important than the submission itself so
| should appear at the top, otherwise people may think it's a
| good idea to use these.
| klodolph wrote:
| Interesting. When I've used sysexits, I only ever used
| EX_USAGE, and I felt like I "should" be using the others, too.
| Seems like other people have reached the same conclusion, that only
| signaling the difference between incorrect invocation and other
| types of failures is worth standardizing.
| derefr wrote:
| Basically, the difference between "a logic error in the
| calling script -- so the caller should abort too, because the
| caller has been proven as of that execution-step to be
| misprogrammed, and so continuing could be dangerous"; vs
| "anything else that isn't the calling script's fault."
|
| For actually digging into the "anything else" and fixing it
| (if you even have the privileges on the offending system) --
| well, that's what error messages are for.
| richardwhiuk wrote:
| I find their rationale curious. Rather than attempting to have
| some level of standardization, they are denying any
| standardization because:
|
| - You should send something to stderr (why not both?)
|
| - Choice is ambiguous (and so something non-standard will be
| better?)
| masklinn wrote:
| > Choice is ambiguous (and so something non-standard will be
| better?)
|
| I would say it's certainly better to say nothing than to
| imply a scheme is some sort of standard when it's really a
| largely unusable non-standard.
|
| Every time I try to use them, sysexits feels like a strictly
| less usable and relevant version of HTTP categories 400 and
| 500 smashed together with no rhyme or reason.
| nerdponx wrote:
| Personally I'd rather have a somewhat arbitrary standard
| than no standard at all!
| derefr wrote:
| It's not that it's arbitrary, it's that 1. it isn't
| universally adopted, and 2. those that try to adhere to it
| don't have the _same_ understanding about which things
| qualify for which categories.
|
| sysexits(3) defines a contract that calling programs would
| find useful to _rely upon_ -- to do different things when
| programs exit with different particular values. But
| because of these human factors, calling programs _cannot_
| rely on it. There's no signal in the noise. The contract
| is not enforced.
|
| This goes from just _useless_ to _actively harmful_,
| because the process of 1. finding the sysexits code
| standard, 2. thinking you _can_ rely on it; 3. writing a
| program that _does_ rely on it; 4. finding out that in
| practice its contract is rarely obeyed; and 5. ripping
| that code back out of your program -- all takes time and
| human labor. The existence of this "standard" thus
| creates net-negative utility. It consumes people's time
| to no useful end.
| OJFord wrote:
| Like HTTP statuses then? I'd still rather have them than
| not. (And I'll still be annoyed by GraphQL saying 200 OK
| - here're your errors!)
| nerdponx wrote:
| I learned about this recently and have never been more
| disgusted with a software design decision. Why??
| OJFord wrote:
| It makes sense in a 'technically true but ugh' sort of
| way. The GQL server is saying it successfully executed
| that query, here is the response, it happens to contain
| an error.
|
| And in further fairness, I believe there can be multiple
| 'responses' within, some of which may be errors and
| others not.
|
| But still ugh, personally I'd rather have resource-
| oriented semantic HTTP (read: 'REST' without triggering
| impassioned debate) just for that. I don't think it adds
| enough over HTTP to warrant this sort of weirdness
| (sibling comment describes it as the next layer) - if it
| were just an HTTP alternative sitting on TCP fine, and
| I'm sure it would happily not have a top-level status
| code (or a more meaningful one) itself anyway.
| derefr wrote:
| > and I'm sure it would happily not have a top-level
| status code (or a more meaningful one) itself anyway
|
| Fun thing to learn about: WebDAV introduced the HTTP 207
| "Multi-Status" response status, for precisely the case
| where the response is actually a multi-response envelope.
| https://httpstatuses.com/207
|
| I often feel that the extra methods and features
| introduced by WebDAV would be usefully generalized to
| other analogous use-cases, like GraphQL. But then I
| recall that most of the WebDAV-specific stuff specifies
| XML-based request/response message formats, and quickly
| give up on the idea of any modern programmer doing it as
| a cool hack. Can't have "cool hack" and "XML" in the same
| sentence, after all ;)
| Quekid5 wrote:
| At least HTTP status codes _can_ reasonably point to
| where the error is:
|
|     2xx - Fine, you get to decide what to do
|     4xx - Problem on your end, mate
|     5xx - Problem on our end. Sorry, mate
|     3xx - It's complicated...
|
| GraphQL responding 200 to erroneous queries is...
| surprising (assuming only HTTP semantics), but what
| _kind_ of errors were these?
|
| Of course, we're assuming servers implemented by at least
| semi-reasonable people.
|
| EDIT: That said, I'd actually be happy with a protocol-level
| 2, 4, 5, 3 and leaving the _rest_ up to the application
| layer. I know that isn't very HATEOAS, but
| in practice all people truly want is RPC. (Sadly.)
| derefr wrote:
| > GraphQL responding 200 to erroneous queries is...
| surprising
|
| To me, both GraphQL (and also JSON-RPC, another weird one
| for HTTP errors) are really application-layer protocols
| of their own that are just "spoken over" HTTP, and as
| such implement their own error signalling standards. In
| cases like that, an HTTP error doesn't pertain to
| anything being discussed "inside" the overlay protocol;
| but instead refers to the _carrier protocol_.
|
| For example, seeing an HTTP 404 in response to hitting a
| GraphQL or JSON-RPC endpoint, wouldn't mean that the
| resource you asked for with your overlay-protocol request
| doesn't exist, but rather that the _endpoint you wanted
| to speak the overlay protocol to_ doesn't exist.
|
| Consider, by analogy: a websocket, which actually does
| speak a formal HTTP sub-protocol. If you make the right
| request, you get your HTTP connection upgraded into a
| WebSocket connection. If you don't? Carrier protocol
| error.
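The carrier/overlay split can be made concrete with a small client-side check. This is a hedged sketch: `classify_response` is a hypothetical helper, and it relies only on the fact that a GraphQL response body is a JSON object that may carry a top-level "errors" list alongside "data".

```python
import json


def classify_response(http_status, body_text):
    """Say which layer reported a problem: 'carrier', 'overlay', or 'ok'."""
    # Carrier-level failure: the HTTP exchange itself did not succeed.
    if http_status >= 400:
        return "carrier"
    try:
        payload = json.loads(body_text)
    except ValueError:
        # Assumption: a body that is not JSON means we never reached
        # the overlay protocol, so blame the carrier.
        return "carrier"
    # Overlay-level failure: HTTP said 200, the protocol inside says otherwise.
    if isinstance(payload, dict) and payload.get("errors"):
        return "overlay"
    return "ok"


if __name__ == "__main__":
    print(classify_response(404, ""))
    print(classify_response(200, '{"data": null, "errors": [{"message": "boom"}]}'))
    print(classify_response(200, '{"data": {"answer": 42}}'))
```

A response carrying both "data" and "errors" is reported as "overlay" here; a real client would probably want to surface the partial data as well.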
| nerdponx wrote:
| How is that contract any different from the set of
| supported CLI flags or an API? Obviously you can't
| _assume_ that any particular command line tool follows
| any particular standard. But that's no reason to not
| write programs that follow a standard!
|
| If you don't document how to use your program, that's
| another problem, and then nobody will be able to use it
| anyway.
| EvanAnderson wrote:
| In Microsoft-land I find myself referencing a similar list of
| codes for Windows: https://docs.microsoft.com/en-
| us/windows/win32/debug/system-...
|
| Knowing that these exist has, in my experience, conferred nearly
| "magical" powers of diagnosis w/ Windows system-related errors.
| [deleted]
| nanis wrote:
| I prefer
|
|     C:\> perl -MWin32 -E "say Win32::FormatMessage(1303)"
|     No encryption key is available. A well-known encryption key was returned.
|
| You can put that in a batch file:
|
|     C:\> type %USERPROFILE%\bin\wm.bat
|     @perl -MWin32 -E "$e = shift; say defined($e) ? Win32::FormatMessage($e) : 'Need exit code'" %*
|
| so it's always available:
|
|     C:\> wm 2450
|     The user accounts database is not configured correctly.
|
| I am not sure what the right way to do this in PowerShell[1]
| is, but this batch script has served me well for almost two
| decades now.
|
| [1]: https://geekeefy.wordpress.com/2016/06/14/powershell-
| import-...
| biryani_chicken wrote:
| Microsoft has an error lookup tool [0]:
|
|     $ err 2450
|     # for decimal 2450 / hex 0x992
|       NERR_BadUasConfig    lmerr.h
|     # /* The user accounts database is not configured correctly.
|     # */
|     # No results found for hex 0x2450 / decimal 9296
|     # 1 matches found for "2450"
|
| [0] https://www.microsoft.com/en-
| us/download/details.aspx?id=100...
| ainar-g wrote:
| OpenBSD version: https://man.openbsd.org/sysexits.3.
|
| Interestingly enough, it claims that the file first appeared in
| 4.0BSD as opposed to "somewhere after 4.3BSD", like the FreeBSD
| one does.
| erk__ wrote:
| The NetBSD manpage says:
|
| > The <sysexits.h> header appeared somewhere after 4.3BSD.
|
| > The manual page for it appeared in NetBSD 4.0.
|
| Also it does not seem to be in any of the 4.3 man page archives
| on the freebsd man page website.
| nanis wrote:
| This is a coherent, reasonable, set of codes that can be adapted
| for a variety of situations. However, there are multiple
| situations where one might want to have more meaningful
| specialization of error codes. In such cases, for the love of all
| that is holy, please document the codes and do not change the
| meaning of codes from one version of the utility to the next.
| adrianmonk wrote:
| > _Instead, the pre-defined exit codes from sysexits should be
| used, so the caller of the process can get a rough estimation
| about the failure class without looking up the source code._
|
| I got a laugh out of how this assumes that checking the command's
| documentation isn't an option. If you want to know about a
| program's behavior, you're going to be looking at the source
| code.
| jbverschoor wrote:
| The FreeBSD handbook is one of the most extensive, complete,
| and properly written manuals out there.
|
| Most (recent) open source projects don't even come close. So
| you can redirect your laughs to one of the other "open source"
| projects.
| adrianmonk wrote:
| I'm laughing at the state of the industry in general. I'm
| definitely not laughing at the person who wrote this
| documentation since they're obviously part of the solution.
| mkipper wrote:
| Yeah, this is bizarre.
|
| What fraction of apps actually use sysexits.h? 0.1%? 0.01%? If
| an application fails and exits with some >1 error code, it
| would be insane to assume it's following this convention.
|
| Hopefully the command's documentation describes the error codes
| (either referencing sysexits or explicitly listing the non-
| standard exit codes), and if it doesn't, you'll be stuck
| looking at the source code anyway.
| iechoz6H wrote:
| You mean there is more than zero or non-zero?
| mixmastamyk wrote:
| Often at: /usr/include/sysexits.h under Linux.
| bodyfour wrote:
| It exists on any UNIX, basically. It was added 40 years ago:
| https://github.com/dspinellis/unix-history-repo/commit/8e0a2...
| npongratz wrote:
| The Advanced Bash-Scripting Guide also has some (IMO) good
| advice, with a footnoted mention of sysexits.h:
| https://tldp.org/LDP/abs/html/exitcodes.html
|
| > The author of this document proposes restricting user-defined
| exit codes to the range 64 - 113 (in addition to 0, for success),
| to conform with the C/C++ standard. This would allot 50 valid
| codes, and make troubleshooting scripts more straightforward.[2]
|
| > ...
|
| > [2] An update of /usr/include/sysexits.h allocates previously
| unused exit codes from 64 - 78. It may be anticipated that the
| range of unallotted exit codes will be further restricted in the
| future. The author of this document will not do fixups on the
| scripting examples to conform to the changing standard. This
| should not cause any problems, since there is no overlap or
| conflict in usage of exit codes between compiled C/C++ binaries
| and shell scripts.
| _kst_ wrote:
| > The author of this document proposes restricting user-defined
| exit codes to the range 64 - 113 (in addition to 0, for
| success), to conform with the C/C++ standard.
|
| The C and C++ standards don't say anything that suggests
| reserving the range 64-113.
|
| Here's what the C standard says:
|
| > If the value of status is zero or EXIT_SUCCESS, an
| implementation-defined form of the status successful
| termination is returned. If the value of status is
| EXIT_FAILURE, an implementation-defined form of the status
| unsuccessful termination is returned. Otherwise the status
| returned is implementation-defined.
|
| C++ inherits C's specification.
|
| EXIT_SUCCESS is (almost?) always defined as 0. EXIT_FAILURE is
| usually defined as 1 (but not on OpenVMS, where the OS defines
| 1 as a form of success).
|
| The restriction to 8-bit values is POSIX, not C or C++.
| LukeShu wrote:
| _> BSD sysexits.h (which is also in GNU libc) defines several
| exit codes in the range 64-78. These are typically used in the
| context of mail delivery; originating with BSD delivermail (the
| NCP predecessor to the TCP/IP sendmail), and are still used by
| modern mail systems such as Postfix to interpret the local(8)
| delivery agent's exit status. Using these for service exit codes
| isn't recommended by LSB (which says they are in the range
| reserved for future LSB use) or by daemon(7). However, they are
| used in practice,_
|
| --
| https://pkg.go.dev/git.lukeshu.com/go/libsystemd@v0.5.3/sd_d...
| michaelhoffman wrote:
| I've used these in Python programs but unfortunately they are
| only defined on some platforms. Would be nice if there were
| fallback values for when the platform doesn't define them.
| mixmastamyk wrote:
| Everywhere but Windows, just copy them in.
| gardaani wrote:
| Something to keep in mind when writing cross-platform command
| line tools: on most Unix-like platforms, only the eight least-
| significant bits are considered, while on Windows all bits are
| considered. [1]
|
| So, exit(256) means exit code 0 on Linux and Mac, but 256 on
| Windows.
|
| [1] https://doc.rust-
| lang.org/std/process/fn.exit.html#platform-...
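The truncation is plain bit masking, which a short sketch makes easy to check. `status_seen_by_posix_parent` is a hypothetical helper that models what a parent on a Unix-like system observes; the demo at the bottom runs a real child and will print a different number on Windows.

```python
import subprocess
import sys


def status_seen_by_posix_parent(code):
    """Model of POSIX behaviour: only the low 8 bits of the status survive."""
    return code & 0xFF


if __name__ == "__main__":
    for code in (0, 1, 75, 255, 256, 257):
        print(code, "->", status_seen_by_posix_parent(code))

    # Ask a real child process to exit with 256 and report what we observe.
    child = subprocess.run([sys.executable, "-c", "import sys; sys.exit(256)"])
    print("observed:", child.returncode)
```

A practical consequence: a tool that computes its exit status (an error count, say) should clamp it, or a multiple of 256 failures will read as success.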
| daneel_w wrote:
| I've personally never made use of anything more than 0 for OK and
| 1 for nope. Perhaps a bad habit, but it's universal.
| ttyprintk wrote:
| I wouldn't say it's a bad habit, but it's for portability not
| universal agreement.
| sverhagen wrote:
| I'm waiting for this to happen:
|
| 1. Find myself writing another BASH script in the future
|
| 2. Remember this article, go find it
|
| 3. Observe that my particular error state does not naturally fall
| in any of the categories
|
| Why do I expect this? Because that's what always happens,
| perhaps?
| LukeShu wrote:
| sysexits.h was originally the codes used to communicate from
| the mail delivery agent to the parent mail daemon. These codes
| are still used on modern email systems like Postfix. IMO, it
| is/was a mistake to shoehorn sysexits.h into anything but that
| mail-delivery use case.
| Animats wrote:
| The whole concept of "exit codes" is a legacy of the PDP-11 era,
| when getting anything back from a program was hard. Arguably, you
| should get back something like argc/argv and environment variable
| updates. Then programs would be more like functions.
| adtac wrote:
| argv is stored in the program's memory. where would this
| hypothetical, arbitrarily sized return data be stored after the
| program exits? who is responsible for storing it until the
| exited process' parent asks for it? the kernel?
| [deleted]
| frou_dh wrote:
| I'm happy enough with 0 (success), 1 (failure), 2 (invoked
| incorrectly, i.e. invalid flags or too few arguments). But maybe
| other people and particularly sysadmins care more.
|
| Only using 0 and 1, period, and not differentiating between the
| situations 1 and 2 is a bit too sloppy though.
| cesarb wrote:
| I have used 75 (EX_TEMPFAIL) to ask systemd to restart a
| service (by adding "RestartForceExitStatus=75" to the service
| unit file).
| pmontra wrote:
| This matches with my memories of late 80s UNIX idiom:
|
| 0, OK
|
| 1, Error
|
| 2, Bad arguments
|
| However I'm not surprised that requirements evolved and
| somebody needs more structured and informative exit codes.
| biryani_chicken wrote:
| I remember the Jetbrains IDE's bash script used a code to mean
| restart the application by putting it in a loop and breaking
| unless it got that exit code.
| epage wrote:
| When I was designing the programmatic API for my programmer's
| spell checker [0], I found that it was easy to get an exit code
| from something else in the stack and if they all used 1, there
| wasn't a way to differentiate. This is the reason I went with
| sysexits, so I would have more nuanced codes to reduce the
| probability of two processes in the stack returning confusable
| errors.
|
| [0] https://github.com/crate-ci/typos
| Normal_gaussian wrote:
| Is 2 used consistently by any programs in this manner?
|
| If I was to advise someone I'd tell them to interpret and write
| programs using the following rule of thumb and not to worry too
| much:
|
| 0 = success / true
|
| 1 = failure (read stderr) / false
|
| >1 = failure (read stderr & docs)
| umvi wrote:
| I do something very similar, though I read I should avoid
| codes 1 and 2 so my error codes start at 3: https://github.co
| m/RPGillespie6/fastcov/blob/master/fastcov....
|
| That way automated CI pipelines (or whatever) can switch on
| the return code if needed rather than try to parse stderr
| LukeShu wrote:
| The Linux Standard Base (LSB) specification specifies this
| meaning of 2 for init scripts, and systemd interprets daemon
| exit codes in accordance with LSB. From there, this meaning
| of 2 has spread to other types of programs and is quite
| common, but still far from ubiquitous.
|
| https://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-
| generic/LS...
| ttyprintk wrote:
| Bash uses 1 and 2 this way.
| epage wrote:
| The main arg parser for Rust (clap) was looking at using
| sysexits for usage but then got feedback from burntsushi (of
| ripgrep fame) for how rarely it is used and instead
| encouraged 2. In looking at it, it seems like 2 is the most
| consistent code and for clap v3 will be using it.
|
| - 2 is "misuse of shell built-ins" for bash [1]
|
| - Python's argparse uses 2 [2]
|
| I feel like we looked at more but can't remember which all
| they were
|
| [0] https://github.com/clap-
| rs/clap/pull/1637#issuecomment-58103...
|
| [1] https://tldp.org/LDP/abs/html/exitcodes.html
|
| [2] https://github.com/python/cpython/blob/main/Lib/argparse.
| py#...
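The argparse behaviour mentioned above is easy to observe without leaving the interpreter, since argparse exits by raising SystemExit. In this sketch the parser, its `--count` option and the `usage_exit_code` helper are all invented for illustration.

```python
import argparse


def build_parser():
    """A tiny hypothetical CLI with one integer option."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--count", type=int, default=1)
    return parser


def usage_exit_code(argv):
    """Return the status argparse would exit with for argv (0 if it parses)."""
    try:
        build_parser().parse_args(argv)
    except SystemExit as exc:
        return exc.code
    return 0


if __name__ == "__main__":
    print(usage_exit_code(["--count", "3"]))
    print(usage_exit_code(["--count", "three"]))
    print(usage_exit_code(["--no-such-flag"]))
```

On a bad invocation argparse also prints a usage message to stderr, so the caller gets both the coarse signal (status 2) and the human-readable detail.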
| spicybright wrote:
| This is the first I'm hearing of using code 2 myself. Been
| using bash casually for years now.
| jitl wrote:
| Many programming languages will exit 1 for unexpected
| exceptions at runtime (or programmer error). Eg, this Ruby
| program will exit 1:
|
|     $ ruby -e 'raise StandardError.new "oops"'
|     Traceback (most recent call last):
|     -e:1:in `<main>': oops (StandardError)
|     $ echo $?
|     1
|
| And so will this one:
|
|     $ ruby -e 'invalid method call'
|     Traceback (most recent call last):
|     -e:1:in `<main>': undefined local variable or method `call' for main:Object (NameError)
|     Did you mean?  caller
|
| So, it's nice to distinguish between a possibly unexpected
| exceptional exit (1), and an "error case detected by the
| programmer", which is what I use exit (2) for.
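Python behaves the same way as the Ruby examples: an uncaught exception ends the interpreter with status 1, so reserving 2 for errors the program detected itself keeps the two cases apart. The sketch below checks this by running small snippets in a child interpreter; `exit_status_of` is a hypothetical helper.

```python
import subprocess
import sys


def exit_status_of(snippet):
    """Run a Python snippet in a child interpreter and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        stderr=subprocess.DEVNULL,  # hide the traceback; only the status matters
    )
    return result.returncode


# An unexpected failure: nobody catches the exception.
UNEXPECTED = "raise RuntimeError('oops')"

# A failure the programmer anticipated and reports deliberately.
DETECTED = (
    "import sys\n"
    "config_ok = False  # stand-in for a real validation step\n"
    "if not config_ok:\n"
    "    sys.exit(2)\n"
)

if __name__ == "__main__":
    print("unexpected:", exit_status_of(UNEXPECTED))
    print("detected:", exit_status_of(DETECTED))
```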
___________________________________________________________________
(page generated 2021-11-02 23:01 UTC)