[HN Gopher] Sysexits - preferable exit codes for programs
       ___________________________________________________________________
        
       Sysexits - preferable exit codes for programs
        
       Author : susam
       Score  : 157 points
       Date   : 2021-10-31 12:14 UTC (2 days ago)
        
 (HTM) web link (www.freebsd.org)
 (TXT) w3m dump (www.freebsd.org)
        
       | legulere wrote:
       | Reminds me a bit of HRESULT
       | (https://en.wikipedia.org/wiki/HRESULT)
        
         | coldacid wrote:
         | HRESULT is a lot more structured and allows for good non-zero
         | responses, not just bad ones. (If an HRESULT >= 0, it's a
         | success result; the sign bit is used to flag success or
         | failure.)
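A minimal Python sketch of the sign-bit check described above. The helper name `hresult_succeeded` is hypothetical; `S_OK`, `S_FALSE`, and `E_FAIL` carry their usual Windows values.

```python
def hresult_succeeded(hr: int) -> bool:
    """Return True when the 32-bit HRESULT has its sign bit clear."""
    hr &= 0xFFFFFFFF                  # normalize to an unsigned 32-bit pattern
    return (hr & 0x80000000) == 0     # top bit set means failure


S_OK = 0x00000000      # plain success
S_FALSE = 0x00000001   # a non-zero value that still counts as success
E_FAIL = 0x80004005    # unspecified failure

if __name__ == "__main__":
    for name, value in [("S_OK", S_OK), ("S_FALSE", S_FALSE), ("E_FAIL", E_FAIL)]:
        print(name, hresult_succeeded(value))
```

The mask step lets the helper accept the same code written either as an unsigned hex value or as a negative signed integer.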
        
       | londons_explore wrote:
       | These error codes are too vague to pinpoint a problem, and too
       | vague to even really point in the direction of a problem.
       | 
       | I'd prefer some mechanism for making a cross-program shortened
       | stack trace indicating the specific error (ie. 'no space on
       | device') and the sequence of things in progress that have now all
       | failed (ie. 'cannot create output file', 'cannot decompress
       | files', 'cannot install XYZ package', 'cannot update system').
       | 
       | Then it's very clear to the user - The system can't be updated
       | because XYZ package couldn't be installed, because files couldn't
       | be decompressed, because the output file couldn't be created,
       | because there is no space left.
        
         | mikepurvis wrote:
         | I think the main value here is understanding at the broadest
         | level whether the problem is internal to the program
         | (unforeseen logic error), with how I configured the program
         | (bad flags), or with the environment in which I ran the program
         | (out of memory, disk, whatever).
         | 
         | Obviously a single integer is not going to be a rich error-
         | reporting mechanism, but particularly in the context of
         | something like a process manager or job runner, it could be
         | helpful to know what kinds of errors might go away on a retry,
         | for example.
        
           | ttyprintk wrote:
           | I think history supports this, but I'm not sure I can cite a
           | reference. Early Unix code I wrote followed the convention
           | that codes 1-3 belong to system utilities, 4-9 are
           | unreserved, and users pick 10-127. Some exotic use cases wrap
           | some meaning into higher codes, but historically we only
           | cared to detect whether the error comes from system binaries
           | or within the application. I suppose programming on top of
           | the application would mean using 20, but I still pick 10 by
           | habit.
        
         | Ono-Sendai wrote:
         | You can do this in a simple enough way, in C++: each software
         | level catches the exception from the level below, prepends its
         | own message string to it and then throws a new exception with
         | the combined message.
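The same layering idea can be sketched in Python rather than C++. This is only an illustration: `LayeredError`, `context`, and the function names are hypothetical, and the messages are borrowed from the parent comment.

```python
from contextlib import contextmanager


class LayeredError(Exception):
    """Hypothetical error type whose message grows at each level."""


@contextmanager
def context(message):
    """Prepend this level's message to any LayeredError raised inside."""
    try:
        yield
    except LayeredError as exc:
        raise LayeredError(f"{message}: {exc}") from exc


def write_bytes():
    raise LayeredError("no space on device")   # the specific root cause


def create_output_file():
    with context("cannot create output file"):
        write_bytes()


def decompress_files():
    with context("cannot decompress files"):
        create_output_file()


def install_package():
    with context("cannot install XYZ package"):
        decompress_files()


def failure_message() -> str:
    """Run the top-level step and return the combined message, if any."""
    try:
        install_package()
    except LayeredError as exc:
        return str(exc)
    return ""


if __name__ == "__main__":
    print(failure_message())
```

This only works within one process; it does not address the subprocess case raised in the reply below.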
        
           | londons_explore wrote:
           | There also needs to be a way to get strings from a
           | subprocess. Most unix-y tools involve a lot of subprocesses,
           | and the ultimate failure might be 3 processes away...
        
           | kevin_thibedeau wrote:
           | RAM is exhausted. Now what happens?
        
             | londons_explore wrote:
             | Even the standard libraries of most programming languages
             | don't have defined behaviour in out-of-memory conditions.
             | 
             | Instead just use paging, pushing out cache, or memory
             | compression to have a soft limit to ensure you never need
             | to handle out of memory.
             | 
             | In today's world of delayed page allocation, you'll
             | probably just get killed with SIGBUS anyway.
        
       | alexfromapex wrote:
       | Obviously, these are just for BSD, but for Python 3, the standard
       | library provides enumerations of error codes and provides a
       | dictionary to map to the underlying system message:
       | https://docs.python.org/3/library/errno.html
        
         | loeg wrote:
         | Those are a different set of error codes. Also, they have been
         | present in Python long before 3.
        
         | michaelhoffman wrote:
         | The sysexits constants (such as `os.EX_OK`) are in the `os`
         | module. But beware, they are not available on some platforms.
         | 
         | https://docs.python.org/3/library/os.html#os._exit
        
       | zoomablemind wrote:
       | OpenVMS has a system-wide convention for coding exit status [1].
       | 
       | The lowest 3 bits of the status value are used to denote severity
       | (0:warning, 1:success, 2:error, 3:info, 4:severe/fatal).
       | 
       | In addition to that, each "facility" (either system or user
       | service/application) defines its set of status values and
       | corresponding messages, which can be incorporated for use with
       | the system message facility. That way, on program exit, a user
       | can query the resulting status/message to get an explanation and
       | possible remedies, if applicable (HELP /MESSAGE $STATUS).
       | 
       | [1]:https://wiki.vmssoftware.com/$STATUS
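A small Python sketch of extracting the severity field as the comment describes it, taking the low three bits of the status value. The function name and the "unlisted" label for values 5 to 7 are assumptions made for illustration; the comment only lists 0 through 4.

```python
# Severity names as listed in the comment above.
SEVERITY_NAMES = {
    0: "warning",
    1: "success",
    2: "error",
    3: "info",
    4: "severe",
}


def severity(status: int) -> str:
    """Decode the severity stored in the lowest 3 bits of a status value."""
    code = status & 0b111
    return SEVERITY_NAMES.get(code, "unlisted")


if __name__ == "__main__":
    for status in (0, 1, 2, 9, 0x10004):
        print(hex(status), severity(status))
```

Masking with `0b111` ignores the facility and message bits, so any two status values with the same low bits decode to the same severity.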
        
         | _kst_ wrote:
         | One issue with that is that using 1 for success conflicts with
         | the nearly universal Unix convention of using 0 for success.
         | 
         | C requires exit(0) to indicate a successful status. The OpenVMS
         | C library translates that to a status of 1.
         | 
         | A lot of C code uses exit(1) to indicate failure, but it
         | indicates success in OpenVMS. (Use EXIT_FAILURE, a macro
         | defined in <stdlib.h>, instead.)
        
         | nerdponx wrote:
         | I really like this idea, I might try using it in one of my own
         | programs (knowing fully that it's not a generally accepted
         | standard, but at least I can document what the codes mean)
        
           | BruceEel wrote:
           | This is very neat indeed. It could perhaps become an
           | unofficial standard of sorts if enough developers adopt it...
        
       | susam wrote:
       | Also worth seeing: https://reviews.freebsd.org/D27176 (Discourage
       | the use of sysexits(3) in new code)
        
         | hdjjhhvvhga wrote:
         | Your comment is more important than the submission itself so
         | should appear at the top, otherwise people may think it's a
         | good idea to use these.
        
         | klodolph wrote:
         | Interesting. When I've used sysexits, I only ever used
         | EX_USAGE, and I felt like I "should" be using the others, too.
         | Seems like other people have the same conclusion, that only
         | signaling the difference between incorrect invocation and other
         | types of failures is worth standardizing.
        
           | derefr wrote:
           | Basically, the difference between "a logic error in the
           | calling script -- so the caller should abort too, because the
           | caller has been proven as of that execution-step to be
           | misprogrammed, and so continuing could be dangerous"; vs
           | "anything else that isn't the calling script's fault."
           | 
           | For actually digging into the "anything else" and fixing it
           | (if you even have the privileges on the offending system) --
           | well, that's what error messages are for.
        
         | richardwhiuk wrote:
         | I find their rationale curious. Rather than attempting to have
         | some level of standardization, they are denying any
         | standardization because:
         |
         | - You should send something to stderr (why not both?)
         |
         | - Choice is ambiguous (and so something non-standard will be
         | better?)
        
           | masklinn wrote:
           | > Choice is ambiguous (and so something non-standard will be
           | better?)
           | 
           | I would say it's certainly better to say nothing than to
           | imply a scheme is some sort of standard when it's really a
           | largely unusable not-at-all a standard.
           | 
           | Every time I try to use them, sysexits feels like a strictly
           | less usable and relevant version of HTTP categories 400 and
           | 500 smashed together with no rhyme or reason.
        
             | nerdponx wrote:
             | Personally I'd rather have a somewhat arbitrary standard
             | than no standard at all!
        
               | derefr wrote:
               | It's not that it's arbitrary, it's that 1. it isn't
               | universally adopted, and 2. those that try to adhere to it
               | don't have the _same_ understanding about which things
               | qualify for which categories.
               | 
               | sysexits(3) defines a contract that calling programs would
               | find useful to _rely upon_ -- to do different things when
               | programs exit with different particular values. But
               | because of these human factors, calling programs _cannot_
               | rely on it. There's no signal in the noise. The contract
               | is not enforced.
               | 
               | This goes from just _useless_ to _actively harmful_,
               | because the process of 1. finding the sysexits code
               | standard; 2. thinking you _can_ rely on it; 3. writing a
               | program that _does_ rely on it; 4. finding out that in
               | practice its contract is rarely obeyed; and 5. ripping
               | that code back out of your program -- all takes time and
               | human labor. The existence of this "standard" thus
               | creates net-negative utility. It consumes people's time
               | to no useful end.
        
               | OJFord wrote:
               | Like HTTP statuses then? I'd still rather have them than
               | not. (And I'll still be annoyed by GraphQL saying 200 OK
               | - here're your errors!)
        
               | nerdponx wrote:
               | I learned about this recently and have never been more
               | disgusted with a software design decision. Why??
        
               | OJFord wrote:
               | It makes sense in a 'technically true but ugh' sort of
               | way. The GQL server is saying it successfully executed
               | that query, here is the response, it happens to contain
               | an error.
               | 
               | And in further fairness, I believe there can be multiple
               | 'responses' within, some of which may be errors and
               | others not.
               | 
               | But still ugh, personally I'd rather have
               | resource-oriented semantic HTTP (read: 'REST' without
               | triggering impassioned debate) just for that. I don't think
               | it adds enough over HTTP to warrant this sort of weirdness
               | (sibling comment describes it as the next layer) - if it
               | were just an HTTP alternative sitting on TCP, fine, and
               | I'm sure it would happily not have a top-level status
               | code (or a more meaningful one) itself anyway.
        
               | derefr wrote:
               | > and I'm sure it would happily not have a top-level
               | status code (or a more meaningful one) itself anyway
               | 
               | Fun thing to learn about: WebDAV introduced the HTTP 207
               | "Multi-Status" response status, for precisely the case
               | where the response is actually a multi-response envelope.
               | https://httpstatuses.com/207
               | 
               | I often feel that the extra methods and features
               | introduced by WebDAV would be usefully generalized to
               | other analogous use-cases, like GraphQL. But then I
               | recall that most of the WebDAV-specific stuff specifies
               | XML-based request/response message formats, and quickly
               | give up on the idea of any modern programmer doing it as
               | a cool hack. Can't have "cool hack" and "XML" in the same
               | sentence, after all ;)
        
               | Quekid5 wrote:
               | At least HTTP status codes _can_ reasonably point to
               | where the error is:
               |
               |     2xx - Fine, you get to decide what to do
               |     4xx - Problem on your end, mate
               |     5xx - Problem on our end. Sorry, mate
               |     3xx - It's complicated...
               | 
               | GraphQL responding 200 to erroneous queries is...
               | surprising (assuming only HTTP semantics), but what
               | _kind_ of errors were these?
               | 
               | Of course, we're assuming servers implemented by at least
               | semi-reasonable people.
               | 
               | EDIT: That said, I'd actually be happy with a
               | protocol-level 2, 4, 5, 3 and leaving the _rest_ up to the
               | application layer. I know that isn't very HATEOAS, but
               | in practice all people truly want is RPC. (Sadly.)
        
               | derefr wrote:
               | > GraphQL responding 200 to erroneous queries is...
               | surprising
               | 
               | To me, both GraphQL (and also JSON-RPC, another weird one
               | for HTTP errors) are really application-layer protocols
               | of their own that are just "spoken over" HTTP, and as
               | such implement their own error signalling standards. In
               | cases like that, an HTTP error doesn't pertain to
               | anything being discussed "inside" the overlay protocol;
               | but instead refers to the _carrier protocol_.
               | 
               | For example, seeing an HTTP 404 in response to hitting a
               | GraphQL or JSON-RPC endpoint, wouldn't mean that the
               | resource you asked for with your overlay-protocol request
               | doesn't exist, but rather that the _endpoint you wanted
               | to speak the overlay protocol to_ doesn't exist.
               | 
               | Consider, by analogy: a websocket, which actually does
               | speak a formal HTTP sub-protocol. If you make the right
               | request, you get your HTTP connection upgraded into a
               | WebSocket connection. If you don't? Carrier protocol
               | error.
        
               | nerdponx wrote:
               | How is that contract any different from the set of
               | supported CLI flags or an API? Obviously you can't
               | _assume_ that any particular command line tool follows
               | any particular standard. But that's no reason to not
               | write programs that follow a standard!
               | 
               | If you don't document how to use your program, that's
               | another problem and then nobody will be able to use it
               | anyway.
        
       | EvanAnderson wrote:
       | In Microsoft-land I find myself referencing a similar list of
       | codes for Windows:
       | https://docs.microsoft.com/en-us/windows/win32/debug/system-...
       | 
       | Knowing that these exist has, in my experience, conferred nearly
       | "magical" powers of diagnosis w/ Windows system-related errors.
        
         | nanis wrote:
         | I prefer
         |
         |     C:\> perl -MWin32 -E "say Win32::FormatMessage(1303)"
         |     No encryption key is available. A well-known encryption key was returned.
         |
         | You can put that in a batch file:
         |
         |     C:\> type %USERPROFILE%\bin\wm.bat
         |     @perl -MWin32 -E "$e = shift; say defined($e) ? Win32::FormatMessage($e) : 'Need exit code'" %*
         |
         | so it's always available:
         |
         |     C:\> wm 2450
         |     The user accounts database is not configured correctly.
         | 
         | I am not sure what the right way to do this in PowerShell[1]
         | is, but this batch script has served me well for almost two
         | decades now.
         | 
         | [1]: https://geekeefy.wordpress.com/2016/06/14/powershell-import-...
        
           | biryani_chicken wrote:
           | Microsoft has an error lookup tool [0]:
           |
           |     $ err 2450
           |     # for decimal 2450 / hex 0x992
           |       NERR_BadUasConfig                                     lmerr.h
           |     # /* The user accounts database is not configured correctly.
           |     # */
           |     # No results found for hex 0x2450 / decimal 9296
           |     # 1 matches found for "2450"
           |
           | [0] https://www.microsoft.com/en-us/download/details.aspx?id=100...
        
       | ainar-g wrote:
       | OpenBSD version: https://man.openbsd.org/sysexits.3.
       | 
       | Interestingly enough, it claims that the file first appeared in
       | 4.0BSD as opposed to "somewhere after 4.3BSD", like the FreeBSD
       | one does.
        
         | erk__ wrote:
         | The NetBSD manpage says:
         | 
         | > The <sysexits.h> header appeared somewhere after 4.3BSD.
         | 
         | > The manual page for it appeared in NetBSD 4.0.
         | 
         | Also it does not seem to be in any of the 4.3 man page archives
         | on the freebsd man page website.
        
       | nanis wrote:
       | This is a coherent, reasonable, set of codes that can be adapted
       | for a variety of situations. However, there are multiple
       | situations where one might want to have more meaningful
       | specialization of error codes. In such cases, for the love of all
       | that is holy, please document the codes and do not change the
       | meaning of codes from one version of the utility to the next.
        
       | adrianmonk wrote:
       | > _Instead, the pre-defined exit codes from sysexits should be
       | used, so the caller of the process can get a rough estimation
       | about the failure class without looking up the source code._
       | 
       | I got a laugh out of how this assumes that checking the command's
       | documentation isn't an option. If you want to know about a
       | program's behavior, you're going to be looking at the source
       | code.
        
         | jbverschoor wrote:
         | The FreeBSD handbook is one of the most extensive, complete,
         | and properly written manuals out there.
         | 
         | Most (recent) open source projects don't even come close. So
         | you can redirect your laughs to one of the other "open source"
         | projects.
        
           | adrianmonk wrote:
           | I'm laughing at the state of the industry in general. I'm
           | definitely not laughing at the person who wrote this
           | documentation since they're obviously part of the solution.
        
         | mkipper wrote:
         | Yeah, this is bizarre.
         | 
         | What fraction of apps actually use sysexits.h? 0.1%? 0.01%? If
         | an application fails and exits with some >1 error code, it
         | would be insane to assume it's following this convention.
         | 
         | Hopefully the command's documentation describes the error codes
         | (either referencing sysexits or explicitly listing the non-
         | standard exit codes), and if it doesn't, you'll be stuck
         | looking at the source code anyway.
        
       | iechoz6H wrote:
       | You mean there is more than zero or non-zero?
        
       | mixmastamyk wrote:
       | Often at: /usr/include/sysexits.h under Linux.
        
         | bodyfour wrote:
         | It exists on any UNIX, basically. It was added 40 years ago:
         | https://github.com/dspinellis/unix-history-repo/commit/8e0a2...
        
       | npongratz wrote:
       | The Advanced Bash-Scripting Guide also has some (IMO) good
       | advice, with a footnoted mention of sysexits.h:
       | https://tldp.org/LDP/abs/html/exitcodes.html
       | 
       | > The author of this document proposes restricting user-defined
       | exit codes to the range 64 - 113 (in addition to 0, for success),
       | to conform with the C/C++ standard. This would allot 50 valid
       | codes, and make troubleshooting scripts more straightforward.[2]
       | 
       | > ...
       | 
       | > [2] An update of /usr/include/sysexits.h allocates previously
       | unused exit codes from 64 - 78. It may be anticipated that the
       | range of unallotted exit codes will be further restricted in the
       | future. The author of this document will not do fixups on the
       | scripting examples to conform to the changing standard. This
       | should not cause any problems, since there is no overlap or
       | conflict in usage of exit codes between compiled C/C++ binaries
       | and shell scripts.
        
         | _kst_ wrote:
         | > The author of this document proposes restricting user-defined
         | exit codes to the range 64 - 113 (in addition to 0, for
         | success), to conform with the C/C++ standard.
         | 
         | The C and C++ standards don't say anything that suggests
         | reserving the range 64-113.
         | 
         | Here's what the C standard says:
         | 
         | > If the value of status is zero or EXIT_SUCCESS, an
         | implementation-defined form of the status successful
         | termination is returned. If the value of status is
         | EXIT_FAILURE, an implementation-defined form of the status
         | unsuccessful termination is returned. Otherwise the status
         | returned is implementation-defined.
         | 
         | C++ inherits C's specification.
         | 
         | EXIT_SUCCESS is (almost?) always defined as 0. EXIT_FAILURE is
         | usually defined as 1 (but not on OpenVMS, where the OS defines
         | 1 as a form of success).
         | 
         | The restriction to 8-bit values is POSIX, not C or C++.
        
       | LukeShu wrote:
       | _> BSD sysexits.h (which is also in GNU libc) defines several
       | exit codes in the range 64-78. These are typically used in the
       | context of mail delivery; originating with BSD delivermail (the
       | NCP predecessor to the TCP/IP sendmail), and are still used by
       | modern mail systems such as Postfix to interpret the local(8)
       | delivery agent's exit status. Using these for service exit codes
       | isn't recommended by LSB (which says they are in the range
       | reserved for future LSB use) or by daemon(7). However, they are
       | used in practice,_
       | 
       | --
       | https://pkg.go.dev/git.lukeshu.com/go/libsystemd@v0.5.3/sd_d...
        
       | michaelhoffman wrote:
       | I've used these in Python programs but unfortunately they are
       | only defined on some platforms. Would be nice if there were
       | fallback values for when the platform doesn't define them.
        
         | mixmastamyk wrote:
         | Everywhere but Windows, just copy them in.
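One way to "copy them in", as a hedged Python sketch: prefer the platform's `os.EX_*` constant when it exists and fall back to the value from `<sysexits.h>` otherwise. The helper name `sysexit` and the `_FALLBACKS` table are hypothetical; the numbers are the ones in the header.

```python
import os

# Values copied from <sysexits.h>; used only when the os module lacks them.
_FALLBACKS = {
    "EX_OK": 0,
    "EX_USAGE": 64,
    "EX_DATAERR": 65,
    "EX_NOINPUT": 66,
    "EX_NOUSER": 67,
    "EX_NOHOST": 68,
    "EX_UNAVAILABLE": 69,
    "EX_SOFTWARE": 70,
    "EX_OSERR": 71,
    "EX_OSFILE": 72,
    "EX_CANTCREAT": 73,
    "EX_IOERR": 74,
    "EX_TEMPFAIL": 75,
    "EX_PROTOCOL": 76,
    "EX_NOPERM": 77,
    "EX_CONFIG": 78,
}


def sysexit(name: str) -> int:
    """Return os.<name> if the platform defines it, else the copied value."""
    return getattr(os, name, _FALLBACKS[name])


if __name__ == "__main__":
    print(sysexit("EX_USAGE"), sysexit("EX_TEMPFAIL"))
```

A program would then call something like `sys.exit(sysexit("EX_USAGE"))` instead of referencing `os.EX_USAGE` directly.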
        
       | gardaani wrote:
       | Something to keep in mind when writing cross-platform command
       | line tools: on most Unix-like platforms, only the eight least-
       | significant bits are considered, while on Windows all bits are
       | considered. [1]
       | 
       | So, exit(256) means exit code 0 on Linux and Mac, but 256 on
       | Windows.
       | 
       | [1] https://doc.rust-lang.org/std/process/fn.exit.html#platform-...
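A short Python sketch of the truncation described above. `posix_visible_status` is a hypothetical helper that models what a parent sees on most Unix-like systems; `observed_status` actually spawns a child, so its result depends on the platform it runs on.

```python
import subprocess
import sys


def posix_visible_status(code: int) -> int:
    """Model the Unix-like behavior: only the low 8 bits survive."""
    return code & 0xFF


def observed_status(code: int) -> int:
    """Spawn a child that exits with `code` and report what the parent sees."""
    child = subprocess.run(
        [sys.executable, "-c", f"import sys; sys.exit({code})"]
    )
    return child.returncode


if __name__ == "__main__":
    print("model for 256:", posix_visible_status(256))
    print("observed for 256:", observed_status(256))
```

Keeping exit codes in the range 0 to 255 avoids the difference entirely.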
        
       | daneel_w wrote:
       | I've personally never made use of anything more than 0 for OK and
       | 1 for nope. Perhaps a bad habit, but it's universal.
        
         | ttyprintk wrote:
         | I wouldn't say it's a bad habit, but it's for portability not
         | universal agreement.
        
       | sverhagen wrote:
       | I'm waiting for this to happen:
       | 
       | 1. Find myself writing another BASH script in the future
       | 
       | 2. Remember this article, go find it
       | 
       | 3. Observe that my particular error state does not naturally fall
       | in any of the categories
       | 
       | Why do I expect this? Because that's what always happens,
       | perhaps?
        
         | LukeShu wrote:
         | sysexits.h was originally the codes used to communicate from
         | the mail delivery agent to the parent mail daemon. These codes
         | are still used on modern email systems like Postfix. IMO, it
         | is/was a mistake to shoehorn sysexits.h in to anything but that
         | mail-delivery usecase.
        
       | Animats wrote:
       | The whole concept of "exit codes" is a legacy of the PDP-11 era,
       | when getting anything back from a program was hard. Arguably, you
       | should get back something like argc/argv and environment variable
       | updates. Then programs would be more like functions.
        
         | adtac wrote:
         | argv is stored in the program's memory. where would this
         | hypothetical, arbitrarily sized return data be stored after the
         | program exits? who is responsible for storing it until the
         | exited process' parent asks for it? the kernel?
        
       | frou_dh wrote:
       | I'm happy enough with 0 (success), 1 (failure), 2 (invoked
       | incorrectly, i.e. invalid flags or too few arguments). But maybe
       | other people and particularly sysadmins care more.
       | 
       | Only using 0 and 1, period, and not differentiating between the
       | situations 1 and 2 is a bit too sloppy though.
        
         | cesarb wrote:
         | I have used 75 (EX_TEMPFAIL) to ask systemd to restart a
         | service (by adding "RestartForceExitStatus=75" to the service
         | unit file).
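The same "retry only on a temporary failure" idea can be sketched for a simple job runner in Python. This is not how systemd implements it; `run_with_retry` and its `max_attempts` parameter are hypothetical, and 75 is the `EX_TEMPFAIL` value from `<sysexits.h>`.

```python
import subprocess
import sys

EX_TEMPFAIL = 75  # value from <sysexits.h>


def run_with_retry(argv, max_attempts=3):
    """Run argv, retrying only while the child exits with EX_TEMPFAIL.

    Returns (last exit status, number of attempts made).
    """
    status = EX_TEMPFAIL
    attempts = 0
    while attempts < max_attempts and status == EX_TEMPFAIL:
        attempts += 1
        status = subprocess.run(argv).returncode
    return status, attempts


if __name__ == "__main__":
    always_tempfail = [sys.executable, "-c", "import sys; sys.exit(75)"]
    print(run_with_retry(always_tempfail, max_attempts=2))
```

Any other non-zero status is returned after a single attempt, on the assumption that retrying would not help. A real runner would likely add a delay between attempts.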
        
         | pmontra wrote:
         | This matches with my memories of late 80s UNIX idiom:
         | 
         | 0, OK
         | 
         | 1, Error
         | 
         | 2, Bad arguments
         | 
         | However I'm not surprised that requirements evolved and
         | somebody needs more structured and informative exit codes.
        
         | biryani_chicken wrote:
         | I remember the Jetbrains IDE's bash script used a code to mean
         | restart the application by putting it in a loop and breaking
         | unless it got that exit code.
        
         | epage wrote:
         | When I was designing the programmatic API for my programmer's
         | spell checker [0], I found that it was easy to get an exit code
         | from something else in the stack and if they all used 1, there
         | wasn't a way to differentiate. This is the reason I went with
         | sysexits, so I would have more nuanced codes to reduce the
         | probability of two processes in the stack returning confusable
         | errors.
         | 
         | [0] https://github.com/crate-ci/typos
        
         | Normal_gaussian wrote:
         | Is 2 used consistently by any programs in this manner?
         | 
         | If I was to advise someone I'd tell them to interpret and write
         | programs using the following rule of thumb and not to worry too
         | much:
         | 
         | 0 = success / true
         | 
         | 1 = failure (read stderr) / false
         | 
         | >1 = failure (read stderr & docs)
        
           | umvi wrote:
           | I do something very similar, though I read I should avoid
           | codes 1 and 2 so my error codes start at 3:
           | https://github.com/RPGillespie6/fastcov/blob/master/fastcov....
           | 
           | That way automated CI pipelines (or whatever) can switch on
           | the return code if needed rather than try to parse stderr
        
           | LukeShu wrote:
           | The Linux Standard Base (LSB) specification specifies this
           | meaning of 2 for init scripts, and systemd interprets daemon
           | exit codes in accordance with LSB. From there, this meaning
           | of 2 has spread to other types of programs and is quite
           | common, but still far from ubiquitous.
           | 
           | https://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-generic/LS...
        
           | ttyprintk wrote:
           | Bash uses 1 and 2 this way.
        
           | epage wrote:
           | The main arg parser for Rust (clap) was looking at using
           | sysexits for usage but then got feedback from burntsushi (of
           | ripgrep fame) for how rarely it is used and instead
           | encouraged 2. In looking at it, it seems like 2 is the most
           | consistent code and for clap v3 will be using it.
           | 
           | - 2 is "misuse of shell built-ins" for bash [1]
           |
           | - Python's argparse uses 2 [2]
           | 
           | I feel like we looked at more but can't remember which all
           | they were
           | 
           | [0] https://github.com/clap-rs/clap/pull/1637#issuecomment-58103...
           | 
           | [1] https://tldp.org/LDP/abs/html/exitcodes.html
           | 
           | [2] https://github.com/python/cpython/blob/main/Lib/argparse.py#...
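The argparse behavior mentioned above can be checked with a small Python sketch. The parser, its `--count` option, and the `exit_status_for` helper are made up for illustration; the sketch catches `SystemExit` so the status can be inspected instead of ending the process.

```python
import argparse


def exit_status_for(argv):
    """Return the status argparse would exit with for argv, or 0 if it parses."""
    parser = argparse.ArgumentParser(prog="demo")   # hypothetical program name
    parser.add_argument("--count", type=int, default=1)
    try:
        parser.parse_args(argv)
    except SystemExit as exc:
        # argparse prints a usage message to stderr before exiting.
        return exc.code
    return 0


if __name__ == "__main__":
    print(exit_status_for(["--count", "3"]))
    print(exit_status_for(["--bogus"]))
```

Both an unknown flag and a badly typed value end up on the same usage-error path.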
        
           | spicybright wrote:
           | This is the first I'm hearing of using code 2 myself. Been
           | using bash casually for years now.
        
           | jitl wrote:
           | Many programming languages will exit 1 for unexpected
           | exceptions at runtime (or programmer error). Eg, this Ruby
           | program will exit 1:
           |
           |     $ ruby -e 'raise StandardError.new "oops"'
           |     Traceback (most recent call last):
           |     -e:1:in `<main>': oops (StandardError)
           |     $ echo $?
           |     1
           |
           | And so will this one:
           |
           |     $ ruby -e 'invalid method call'
           |     Traceback (most recent call last):
           |     -e:1:in `<main>': undefined local variable or method `call' for main:Object (NameError)
           |     Did you mean?  caller
           | 
           | So, it's nice to distinguish between a possibly unexpected
           | exceptional exit (1), and an "error case detected by the
           | programmer", which is what I use exit (2) for.
        
       ___________________________________________________________________
       (page generated 2021-11-02 23:01 UTC)