hngopher.com

       [HN Gopher] The Birth of Standard Error (2013)
       ___________________________________________________________________
        
       The Birth of Standard Error (2013)
        
       Author : marbu
       Score  : 101 points
       Date   : 2024-07-13 10:55 UTC (12 hours ago)
        
 (HTM) web link (www2.dmst.aueb.gr)
        
       | nintendo1889 wrote:
       | Wow. And now we can run webservers on printers.
        
         | kragen wrote:
         | i think contiki needs less than 16k of ram to run a webserver,
         | most of which is the tcp stack. my own httpdito is about 2k of
         | code and uses a 1024-byte data buffer, but that's sweeping tcp
         | under the linux kernel rug
         | 
         | this is just to say that you could probably have run a
         | webserver on a pdp-8/s, which was about the size of an atx case
         | and would be a reasonable controller to build into a
         | phototypesetter at the time
        
       | amelius wrote:
       | What I hate about stderr is that it's character-based, not line-
       | based.
       | 
       | I often get output from multiple threads or multiple processes
       | garbled together on the same line. I know how to fix this, but I
       | feel my OS should do it for me.
        
         | Bombinator wrote:
         | You can set the buffering mode of any file stream with setvbuf.
         | For example, setvbuf(stderr, NULL, _IOLBF, BUFSIZ) sets stderr
         | to line buffered I/O.
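
A minimal sketch of the setvbuf suggestion above. The helper names `make_stderr_line_buffered` and `log_in_pieces` are hypothetical and not from the thread. With line buffering, several stdio calls accumulate until the newline, so a short line is typically handed to the kernel in one piece; a line longer than the stdio buffer can still be split across several writes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: switch stderr to line-buffered mode.
 * Call it before any other operation on stderr.
 * Returns 0 on success, nonzero on failure (same as setvbuf). */
int make_stderr_line_buffered(void)
{
    /* A NULL buffer lets the C library allocate one itself. */
    return setvbuf(stderr, NULL, _IOLBF, BUFSIZ);
}

/* Hypothetical helper: build one log line from several stdio calls.
 * In line-buffered mode the pieces stay in the stdio buffer until
 * the newline is written. Returns 0 on success, -1 on error. */
int log_in_pieces(const char *tag, const char *msg)
{
    assert(tag != NULL && msg != NULL);
    if (fputs(tag, stderr) == EOF || fputs(": ", stderr) == EOF ||
        fputs(msg, stderr) == EOF || fputc('\n', stderr) == EOF)
        return -1;
    return 0;
}
```

Calling `make_stderr_line_buffered()` once at startup and then `log_in_pieces("demo", "something failed")` emits the line `demo: something failed` on stderr.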
        
           | kragen wrote:
           | that may help, but if a write writes more than PIPE_BUF
           | bytes, it isn't guaranteed atomic by the kernel. similarly,
           | stdioing a line of more than BUFSIZ may result in multiple
           | write calls. i don't think posix makes any guarantees there
           | (this is just an empirically based speculation) and i'm
           | fairly sure the c standard doesn't
        
             | gpderetta wrote:
             | Don't confuse the C stderr (which is of type FILE *) with
             | the POSIX STDERR_FILENO file descriptor (i.e. 2). FILE (in
             | POSIX, and in C since C11) guarantees that each I/O
             | operation is thread safe (and flockfile in POSIX can be
             | used to make larger operations atomic). A low level POSIX
             | file descriptor is not thread safe (although of course the
             | kernel will protect its own integrity). BUFSIZ only matters
             | when writing to a pipe from distinct file descriptors.
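
A minimal sketch of the flockfile idea mentioned above, assuming a POSIX system. The helper name `report_error` is hypothetical. Each stdio call takes the stream lock on its own, so a message built from several calls needs an explicit lock around the whole group. This only keeps other threads of the same process from interleaving stdio output on that stream; it does nothing about other processes writing to the same terminal or pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: print "where: what\n" on stderr while holding
 * the stream lock, so no other thread's stdio output on stderr can
 * land between the pieces. Returns 0 on success, -1 on error. */
int report_error(const char *where, const char *what)
{
    int rc = 0;

    assert(where != NULL && what != NULL);
    flockfile(stderr);            /* take the FILE's recursive lock */
    if (fputs(where, stderr) == EOF || fputs(": ", stderr) == EOF ||
        fputs(what, stderr) == EOF || fputc('\n', stderr) == EOF)
        rc = -1;
    funlockfile(stderr);          /* always release, even on error */
    return rc;
}
```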
        
               | kragen wrote:
               | i think the previous discussion may not have been clear
               | enough, because you seem to be discussing a totally
               | different scenario
               | 
               | given this program
               | 
               |     #include <string.h>
               |     #include <stdio.h>
               | 
               |     char large[16385];
               | 
               |     int main()
               |     {
               |       printf("BUFSIZ is %d\n", BUFSIZ);
               |       memset(large, 'A', sizeof(large));
               |       large[sizeof(large) - 1] = '\0';
               |       fprintf(stderr, "%s\n", large);
               |       return 0;
               |     }
               | 
               | compiled with `gcc -static` against glibc 2.36-9+deb12u7,
               | we get this strace
               | 
               |     execve("./a.out", ["./a.out"], 0x7fffafcb4a30 /* 49 vars */) = 0
               |     brk(NULL)                               = 0x1e28000
               |     brk(0x1e28d00)                          = 0x1e28d00
               |     arch_prctl(ARCH_SET_FS, 0x1e28380)      = 0
               |     set_tid_address(0x1e28650)              = 1501924
               |     set_robust_list(0x1e28660, 24)          = 0
               |     rseq(0x1e28ca0, 0x20, 0, 0x53053053)    = 0
               |     prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=9788*1024, rlim_max=RLIM64_INFINITY}) = 0
               |     readlink("/proc/self/exe", "<censored>", 4096) = 21
               |     getrandom("<censored>", 8, GRND_NONBLOCK) = 8
               |     brk(NULL)                               = 0x1e28d00
               |     brk(0x1e49d00)                          = 0x1e49d00
               |     brk(0x1e4a000)                          = 0x1e4a000
               |     mprotect(0x4a0000, 16384, PROT_READ)    = 0
               |     newfstatat(1, "", {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x7), ...}, AT_EMPTY_PATH) = 0
               |     write(1, "BUFSIZ is 8192\n", 15)        = 15
               |     write(2, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 8192) = 8192
               |     write(2, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 8192) = 8192
               |     write(2, "\n", 1)                       = 1
               |     exit_group(0)                           = ?
               |     +++ exited with 0 +++
               | 
               | you can see that the single fprintf call resulted in
               | three separate calls to write(2), even though it is only
               | a single line of desperate screaming. those three calls
               | happen at three separate times, typically on the order of
               | tens of microseconds apart. if that file descriptor is
               | open to, for example, a terminal or pipe or logfile that
               | some other process is also writing to, that other process
               | can write other data during those tens of microseconds,
               | resulting in the intercalation of that other data in the
               | middle of the screaming
               | 
               | threads are completely irrelevant here, except that i
               | guess in an exotic scenario the 'other process' that is
               | writing to the file could conceivably be a different
               | thread in the same process? that would make your remarks
               | about 'distinct file descriptors' and thread safety make
               | sense. but we were talking about entirely separate
               | processes writing to the file, since that's the usual
               | case on unix, and in that case no form of thread-safety
               | is worth squat; what matters is the semantics of the
               | system calls
               | 
               | i don't think posix makes any guarantees about how many
               | calls to write(2)+ a call to fprintf(3) will result in,
               | though i haven't actually looked, and i don't think wg14
               | concerns itself with environment-dependent questions like
               | this at all
               | 
               | ______
               | 
               | + or writev(2)
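
One way to sidestep the splitting shown in the strace above is to bypass stdio and hand the whole line to the kernel in a single write(2). This is a sketch under stated assumptions, not something from the thread: the helper name `write_line_once` and the constant `ATOMIC_LINE_LIMIT` are hypothetical, and the limit is set to 512 because that is the smallest PIPE_BUF that POSIX allows. The atomicity guarantee for writes of at most PIPE_BUF bytes is specified for pipes and FIFOs; terminals and regular files may behave differently.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Assumption: 512 is the minimum value POSIX permits for PIPE_BUF,
 * so a line this short fits in one atomic pipe write everywhere. */
#define ATOMIC_LINE_LIMIT 512

/* Hypothetical helper: send `line` plus a newline to `fd` with exactly
 * one successful write(2) call. Returns 0 on success, -1 if the line
 * is too long or the write fails or comes up short. */
int write_line_once(int fd, const char *line)
{
    char buf[ATOMIC_LINE_LIMIT];
    size_t len;
    ssize_t n;

    assert(line != NULL);
    len = strlen(line);
    if (len + 1 > sizeof buf) {
        errno = EMSGSIZE;         /* refuse rather than split the line */
        return -1;
    }
    memcpy(buf, line, len);
    buf[len] = '\n';

    /* Retry only if a signal interrupted the call before any data moved. */
    do {
        n = write(fd, buf, len + 1);
    } while (n < 0 && errno == EINTR);

    return n == (ssize_t)(len + 1) ? 0 : -1;
}
```

Refusing over-long lines is a deliberate choice here: splitting them would bring back the interleaving problem the helper exists to avoid.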
        
               | gpderetta wrote:
               | Sorry, I meant specifically threads, so the atomicity is
               | purely process local of course. There is an internal
               | (recursive) mutex inside FILE.
        
               | BoingBoomTschak wrote:
               | What he's saying is that as long as you don't mix usage
               | of stdio and raw write(2), you won't have any
               | interleaving problem; because there's a lock, which is
               | why _unlocked variants exist.
        
         | chrisrhoden wrote:
         | I think it makes sense as a default to avoid issues discerning
         | timing due to a buffer.
        
         | Sharlin wrote:
         | stderr having line buffering turned off by default is
         | intentional. You want to see the output immediately and not
         | have it stuck in a buffer that might be lost if the program
         | crashes or freezes.
        
         | PhilipRoman wrote:
         | Lack of a standard way to control standard stream buffering is
         | a big pain point for me sometimes. I'm still salty the
         | libc+environment based approach was rejected by maintainers.
         | And it also cannot be fixed on the kernel side since buffering
         | is purely userspace feature.
        
         | o11c wrote:
         | In my experience, the biggest offender is programs trying to do
         | syscalls directly (possibly for async-signal-safety), but not
         | being aware of `writev`. Especially programs that do colored
         | output can be really stupid here. Sometimes there are stupid
         | programs that use multiple _processes_ to do colored output
         | even (IIRC CMake is a big offender here, but CMake is infamous
         | for refusing to fix bugs)!
         | 
         | The pipe buffer is big enough that _sane_ programs aren't
         | likely to run into problems. The math:
         | 
         | PIPE_BUF is 512 per POSIX but in practice 4096 on Linux
         | (probably others too?). If we assume a horrible-and-unlikely 12
         | formatting characters per real character (and assume a real
         | character is non-BMP and thus 4 bytes, but still single-
         | column), Linux has enough for 64 characters. With more
         | reasonable assumptions (mostly ascii, no more than 4 formatting
         | changes per line) we get more like 6 _lines_ of output being
         | atomic on Linux, and even POSIX being likely to get at least
         | one whole line.
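
A minimal sketch of the `writev` approach for colored output, assuming a POSIX system and ANSI escape sequences. The helper name `write_colored` and the choice of red are hypothetical. The color-on sequence, the message, and the reset are gathered into one system call instead of three separate writes, so another process has no gap in which to insert its own output, provided the total stays within the limits discussed above.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: emit <red> msg <reset> '\n' with one writev(2).
 * Returns the number of bytes written, or -1 on error. */
ssize_t write_colored(int fd, const char *msg)
{
    static const char red[] = "\033[31m";     /* ANSI: red foreground */
    static const char reset[] = "\033[0m\n";  /* ANSI: reset, then newline */
    struct iovec iov[3];

    assert(msg != NULL);
    /* iov_base is not const-qualified, but writev only reads from it. */
    iov[0].iov_base = (void *)red;
    iov[0].iov_len = sizeof red - 1;
    iov[1].iov_base = (void *)msg;
    iov[1].iov_len = strlen(msg);
    iov[2].iov_base = (void *)reset;
    iov[2].iov_len = sizeof reset - 1;

    return writev(fd, iov, 3);
}
```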
        
       | dn3500 wrote:
       | Separate error output was around for at least a decade before
       | this. I know MTS had it in the 1960s, and I don't think it was
       | their original idea. I used a CDC for a while and they had it
       | too. So while this is the story of how standard error was
       | introduced into Unix, it is not the origin story of the concept
       | of standard error.
        
         | lapsed_lisper wrote:
         | Unix's standard error is definitely not the first invention of
         | a sink for errors. According to Doug McIlroy, Unix got standard
         | error in its 6th Edition, released in May 1975
         | (http://www.cs.dartmouth.edu/~doug/reader.pdf). 5th Edition was
         | released in June, 1974, so it's reasonable to suppose Unix's
         | standard error was developed during that 11 month interval. By
         | that time, Multics already had a dedicated error stream, called
         | error_output (see https://multicians.org/mtbs/mtb763.html,
         | dated October 1973).
         | 
         | All the same, I'd be willing to believe that Unix's standard
         | error could have been an "independent rediscovery" of one
         | feature made highly desirable by other features (redirection
         | and pipes). It's not clear how much communication there was
         | among distinct OS researcher groups back then, so even if other
         | systems had an analogue, Bell Labs people might not have been
         | aware of it.
        
           | dbcurtis wrote:
           | The story that I recall about the origins of stderr is that
           | without it, pipes are a mess. Keeping stdout to just the text
           | that you want to pipe between tools and diverting all "noise"
           | elsewhere is what makes pipes useable.
        
             | kragen wrote:
             | the article is about specifically what kind of mess and
             | what kind of usability problems inspired the change
        
         | temporarely wrote:
         | I've always felt stderr should have been stdmeta.
         | 
         | p.s.
         | 
         | Well, actually more completely, something like this:
         |               +---------+
         | [meta-in] --> |         | --> meta-out
         |               | p r o c |
         |   input   ==> |         | ==> output
         |               +---------+
        
           | dfee wrote:
           | This is interesting: MIMO channels to a process. Single
           | stdin/stderr/stdout is effective for a single OS process, but
           | with so much pulled up to user land (e.g. workers via green
           | threads) maybe it makes sense to introduce multichannel
           | i/e/o.
        
           | jasonjayr wrote:
           | I bet you'd also be onboard with files having data forks and
           | resource forks too ... ?
           | 
           | TBH, it's a great idea, but history proved that we apparently
           | prefer a single stream of data and solving all the problems
           | it brings ...
        
             | dotancohen wrote:
             | We don't have a single stream - that's the point. stdout
             | and stderr are already different streams.
        
               | jasonjayr wrote:
               | Right, I was alluding to the original Mac's filesystem,
               | with separate data + resource forks, requiring all sorts
               | of hacks to transfer files to and from them. Due to all
               | the trouble of working with that across other platforms,
               | Mac eventually gave that up at a filesystem level, and
               | sprinkled ".DS_Store" files everywhere.
               | 
               | TCP/IP streams are bidirectional, but there is a limited
               | way of sending "out of band" data, though it is not used
               | as much. It would have been nice if the stdout/stderr
               | multiple streams extended to TCP/IP networking and even
               | HTTP messages too.
        
               | LegionMammal978 wrote:
               | > TCP/IP streams are bidirectional, but there is a
               | limited way of sending "out of band" data, though it is
               | not used as much.
               | 
               | It's not real "out of band" data: that's something wholly
               | invented by the Unix socket API. TCP itself just has an
               | "urgent pointer", which addresses some byte further in
               | the data stream that the receiver doesn't have yet, with
               | the intent that higher-level protocols could use it as a
               | signal to flush any data up to that pointer to observe
               | whatever the urgent message is. There's nothing in the
               | protocol itself to actually send a message separately
               | from the rest of the stream.
        
           | dotancohen wrote:
           | You might like this proposal:
           | 
           | https://unix.stackexchange.com/questions/197809/propose-addi...
           | 
           | The idea is that some output is metadata (such as ps headers)
           | and some is data. With stdmeta we could differentiate between
           | the two.
        
           | euroderf wrote:
           | Sure. Make metadata out-of-band rather than in-band, so that
           | the ungovernable mess of Unix-standard plain ol' text streams
           | is replaced by structured data.
           | 
           | So, well then: allowing programs to consume and emit JSON -
           | is this progress ?
        
             | leoc wrote:
             | JSON is hardly the greatest structured format, but nearly
             | anything is better than Unix text streams.
        
               | euroderf wrote:
               | I'm thinking that if a range of XML markups can delimit
               | and separate out metadata (e.g. HTML head v body), then
               | heck so can JSON. Maybe not prettily.
        
           | BoingBoomTschak wrote:
           | Why not more than 4? Another thing CL did "better":
           | 
           | https://www.lispworks.com/documentation/lw50/CLHS/Body/v_deb...
           | 
           | https://www.lispworks.com/documentation/lw50/CLHS/Body/v_ter...
        
             | temporarely wrote:
             | Sure, it is arguably cleaner to explicitly isolate error
             | condition from general process meta-data. So in my OP
             | diagram you can add an errout coming down from the proc-
             | box. I've used this actual pattern in-process to hook up
             | pipelines of active/passive components. However, there is
             | no sense (imo) to propagate the errout of P(n) to P(n+1),
             | so 2-in, 3-out.
             | 
             | p.s. That is pL(n).stderr -> pE.stdin, where pL is the
             | 'business logic' and pE is the system's error processing
             | aspects. I.e. the error processing component's stdin is the
             | stderr of the logical processes (Lp), so there is a uniform
             | process model applicable to both logical and error
             | processing elements of the pipeline.
             | 
             | The issue is how to do this within the limits of line
             | terminal interface (CLI). In code (as in in-process
             | chaining) that aspect is a non-issue.
        
           | lapsed_lisper wrote:
           | Among conceptually Unix-like OSes, at least one tried to do
           | something along these lines: see
           | http://www.bitsavers.org/pdf/apollo/SR10/011021-A00_Using_Yo...
           | PDF pp 151 and following.
        
         | cchi_co wrote:
         | The concept of separate error outputs has a rich history in
         | earlier computing systems
        
         | dredmorbius wrote:
         | Similarly, the SAS System was originally written on an IBM
         | mainframe in 1971 (probably OS/360 / MVS) and featured an
         | input file (the SAS program itself) and two outputs, a LIST
         | (the desired analytic output) and LOG, which contained status,
         | warning, and error messages. It's not quite stderr, but clearly
         | reflects similar thinking and was probably based on extant
         | practices at the time.
        
       | fnord77 wrote:
       | > "One afternoon several of us had the same experience --
       | typesetting something, feeding the paper through the developer,
       | only to find a single, beautifully typeset line: "cannot open
       | file foobar"
       | 
       | reminds me of those t-shirts or digital billboards displaying
       | some system error that we've all seen as memes
        
       | jchw wrote:
       | This is not exactly the most earth-shattering revelation, but
       | man: handling and communicating errors always seems to be a
       | source of a vast amount of inelegance in software.
       | 
       | I'd argue we haven't really "solved" the optimal way to do error
       | handling in programming: Using union types remains one of the
       | best options, but even that has its downsides. Consider the
       | ergonomics of forwarding an error type multiple layers in a Rust
       | program: you can remove some of the boilerplate by strapping
       | macros on top, but I'd argue that's more of a bandage than a fix.
       | Most other programming languages are either using exceptions,
       | which I don't like as they complicate control flow behavior
       | significantly, or simply ignore error _handling_ entirely (like C
       | and Go; Both of them provide some standard facilities for dealing
       | with error _values_, but handling it is completely manual. I do
       | like this, since it's very straightforward, but it nonetheless is
       | just sidestepping the problem.) And even trying to keep it simple
       | can create new problems, like of course the way pthreads has to
       | contort errno into a thread-local, for reasons obvious.
       | 
       | And while stderr has created a somewhat unified channel for
       | dumping errors _into_ , once they've bubbled up to the point
       | where the program needs to output it, there's an almost unlimited
       | amount of opinions on exactly how error logging should work. Some
       | software won't use stderr by default, others only use stderr for
       | specific types of errors. Some software dumps _everything_ that
       | isn't data output into stderr, including e.g. `--help` text,
       | whereas some software uses stdout for anything that isn't
       | explicitly an error (Which often leads to me needing to pipe
       | --help to less twice: once without, and once with 2>&1.)
       | Categorization of error logging is also somewhat contentious:
       | should there be a "warning" severity? should you split errors
       | into modules? Formatting, too: what should be in a log line?
       | Should logs be structured into a machine-readable format such as
       | JSON?
       | 
       | It was probably a bad omen that even very old versions of UNIX
       | ran into problems dealing with error logging and wound up needing
       | to bifurcate things. Few programs feel as 'lazy' as UNIX; if UNIX
       | couldn't ignore the problem, god knows the rest of the software
       | was doomed.
        
         | euroderf wrote:
         | Handling errors in a processing pipeline in Go is a big fat
         | PITA. In other languages too ?
        
         | w10-1 wrote:
         | I think Swift is close to optimal for now.
         | 
         | It does have the union type Result<normal, error>, but most
         | people throw/catch Error.
         | 
         | In Swift, error is a simple value (without a stack frame) and
         | thus is as cheap as a return value, but can be handled/caught
         | anywhere in the call chain like an exception.
         | 
         | Error is a protocol that tags any type, so it can carry any
         | details you like, and your catch can switch on the type.
         | 
         | But it's only now (10 years on) that they're declaring error
         | types in the function signature. In this world, it turns out
         | that not throwing is the same as throws(Never). It took this
         | long because it's unclear (but possible) that per-type error
         | handling helps, mainly with libraries.
         | 
         | Serializing/tracking the originating (thread) context and
         | avoiding merge conflicts in error streams seems like the
         | unsolved problem. Both Java and Swift have structured
         | concurrency with parent/child relations with derived
         | cancellation/termination. Perhaps later that can include
         | errors.
        
       | stuaxo wrote:
       | Some things I want from std streams:
       | 
       | Timestamping or sync points, so that if I pipe multiple streams
       | (say stdout and stderr) I can keep them in sync further along
       | when various buffers may have been involved.
       | 
       | Metadata, such as magic file types.
       | 
       | Structured data (this may link with meta data, and maybe there is
       | even a way programs could negotiate what to send to each other).
        
       ___________________________________________________________________
       (page generated 2024-07-13 23:01 UTC)