[HN Gopher] The Birth of Standard Error (2013)
___________________________________________________________________
The Birth of Standard Error (2013)
Author : marbu
Score : 101 points
Date : 2024-07-13 10:55 UTC (12 hours ago)
(HTM) web link (www2.dmst.aueb.gr)
(TXT) w3m dump (www2.dmst.aueb.gr)
| nintendo1889 wrote:
| Wow. And now we can run webservers on printers.
| kragen wrote:
| i think contiki needs less than 16k of ram to run a webserver,
| most of which is the tcp stack. my own httpdito is about 2k of
| code and uses a 1024-byte data buffer, but that's sweeping tcp
| under the linux kernel rug
|
| this is just to say that you could probably have run a
| webserver on a pdp-8/s, which was about the size of an atx case
| and would be a reasonable controller to build into a
| phototypesetter at the time
| amelius wrote:
| What I hate about stderr is that it's character-based, not line-
| based.
|
| I often get output from multiple threads or multiple processes
| garbled together on the same line. I know how to fix this, but I
| feel my OS should do it for me.
| Bombinator wrote:
| You can set the buffering mode of any file stream with setvbuf.
| For example, setvbuf(stderr, NULL, _IOLBF, BUFSIZ) sets stderr
| to line buffered I/O.
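A minimal sketch of Bombinator's suggestion in C. `make_line_buffered` and `log_line` are hypothetical helper names, not library functions; the only real API involved is `setvbuf` with `_IOLBF`. It assumes the stream has not been used yet, because setvbuf has to come before any other I/O on it.

```c
#include <stdio.h>

/* Put a stdio stream into line-buffered mode: characters accumulate in a
   BUFSIZ-byte buffer and are handed to the kernel when a newline is
   written (or the buffer fills). Per the C standard, setvbuf may only be
   called before any other operation on the stream. Returns 0 on success. */
int make_line_buffered(FILE *fp)
{
    /* NULL buffer: let the C library allocate it */
    return setvbuf(fp, NULL, _IOLBF, BUFSIZ);
}

/* Hypothetical helper: format one diagnostic as a single line, so that
   with line buffering it normally reaches the kernel in one write(2). */
int log_line(FILE *fp, const char *who, int code)
{
    return fprintf(fp, "%s: error %d\n", who, code);
}
```

In a real program you would call `make_line_buffered(stderr)` first thing in `main()`. As the reply points out, this still does not make a line longer than the buffer (or than PIPE_BUF, on a pipe) atomic.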
| kragen wrote:
| that may help, but if a write writes more than PIPE_BUF
| bytes, it isn't guaranteed atomic by the kernel. similarly,
| stdioing a line of more than BUFSIZ may result in multiple
| write calls. i don't think posix makes any guarantees there
| (this is just an empirically based speculation) and i'm
| fairly sure the c standard doesn't
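To get the kernel guarantee mentioned here, the whole line has to go out in a single write(2) of at most PIPE_BUF bytes to a pipe or FIFO. A sketch under those assumptions; `write_line_atomic` is a hypothetical name, and refusing over-long lines is only one possible policy (splitting them is another).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>   /* PIPE_BUF */
#include <string.h>
#include <unistd.h>   /* write */

/* Hand a whole line to the kernel in one write(2). If fd is a pipe or
   FIFO and the line is at most PIPE_BUF bytes, POSIX guarantees the bytes
   are not interleaved with data from other writers. Longer lines get no
   such guarantee, so this sketch refuses them with EMSGSIZE.
   Returns 0 on success, -1 on error. */
int write_line_atomic(int fd, const char *line)
{
    size_t len = strlen(line);
    ssize_t n;

    if (len > PIPE_BUF) {
        errno = EMSGSIZE;
        return -1;
    }
    do {
        n = write(fd, line, len);
    } while (n < 0 && errno == EINTR);   /* retry if interrupted by a signal */
    return n == (ssize_t)len ? 0 : -1;   /* a short write counts as failure */
}
```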
| gpderetta wrote:
| Don't confuse the C stderr (which is a FILE *) with the
| POSIX STDERR_FILENO file descriptor (i.e. 2). FILE (in
| POSIX, and in C since C11) guarantees that each I/O
| operation is thread safe (and flockfile in POSIX can be
| used to make larger operations atomic). A low-level POSIX
| file descriptor is not thread safe (although of course the
| kernel will protect its own integrity). BUFSIZ only matters
| when writing to a pipe from distinct file descriptors.
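A sketch of using `flockfile`/`funlockfile` (POSIX) so that several stdio calls look like one unit to other threads of the same process. `report` is a hypothetical helper; as the rest of this subthread notes, the lock does nothing about other processes writing to the same descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Emit a diagnostic built from several stdio calls as one unit with
   respect to other threads of this process. flockfile takes the FILE's
   recursive lock, so the fprintf calls inside (which lock it again
   internally) cannot be interleaved with another thread's output on the
   same FILE. Returns the number of bytes written, or -1 on error. */
int report(FILE *fp, const char *prog, const char *path)
{
    int n1, n2, n3;

    flockfile(fp);
    n1 = fprintf(fp, "%s: ", prog);
    n2 = fprintf(fp, "cannot open file %s", path);
    n3 = fprintf(fp, "\n");
    funlockfile(fp);

    if (n1 < 0 || n2 < 0 || n3 < 0)
        return -1;
    return n1 + n2 + n3;
}
```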
| kragen wrote:
| i think the previous discussion may not have been clear
| enough, because you seem to be discussing a totally
| different scenario
|
| given this program
|
|   #include <string.h>
|   #include <stdio.h>
|
|   char large[16385];
|
|   int main() {
|     printf("BUFSIZ is %d\n", BUFSIZ);
|     memset(large, 'A', sizeof(large));
|     large[sizeof(large) - 1] = '\0';
|     fprintf(stderr, "%s\n", large);
|     return 0;
|   }
|
| compiled with `gcc -static` against glibc 2.36-9+deb12u7,
| we get this strace
|
|   execve("./a.out", ["./a.out"], 0x7fffafcb4a30 /* 49 vars */) = 0
|   brk(NULL) = 0x1e28000
|   brk(0x1e28d00) = 0x1e28d00
|   arch_prctl(ARCH_SET_FS, 0x1e28380) = 0
|   set_tid_address(0x1e28650) = 1501924
|   set_robust_list(0x1e28660, 24) = 0
|   rseq(0x1e28ca0, 0x20, 0, 0x53053053) = 0
|   prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=9788*1024, rlim_max=RLIM64_INFINITY}) = 0
|   readlink("/proc/self/exe", "<censored>", 4096) = 21
|   getrandom("<censored>", 8, GRND_NONBLOCK) = 8
|   brk(NULL) = 0x1e28d00
|   brk(0x1e49d00) = 0x1e49d00
|   brk(0x1e4a000) = 0x1e4a000
|   mprotect(0x4a0000, 16384, PROT_READ) = 0
|   newfstatat(1, "", {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x7), ...}, AT_EMPTY_PATH) = 0
|   write(1, "BUFSIZ is 8192\n", 15) = 15
|   write(2, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 8192) = 8192
|   write(2, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 8192) = 8192
|   write(2, "\n", 1) = 1
|   exit_group(0) = ?
|   +++ exited with 0 +++
|
| you can see that the single fprintf call resulted in
| three separate calls to write(2), even though it is only
| a single line of desperate screaming. those three calls
| happen at three separate times, typically on the order of
| tens of microseconds apart. if that file descriptor is
| open to, for example, a terminal or pipe or logfile that
| some other process is also writing to, that other process
| can write other data during those tens of microseconds,
| resulting in the intercalation of that other data in the
| middle of the screaming
|
| threads are completely irrelevant here, except that i
| guess in an exotic scenario the 'other process' that is
| writing to the file could conceivably be a different
| thread in the same process? that would make your remarks
| about 'distinct file descriptors' and thread safety make
| sense. but we were talking about entirely separate
| processes writing to the file, since that's the usual
| case on unix, and in that case no form of thread-safety
| is worth squat; what matters is the semantics of the
| system calls
|
| i don't think posix makes any guarantees about how many
| calls to write(2)+ a call to fprintf(3) will result in,
| though i haven't actually looked, and i don't think wg14
| concerns itself with environment-dependent questions like
| this at all
|
| ______
|
| + or writev(2)
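One way to take the C library's flushing decisions out of the equation is to format the message into your own buffer and issue exactly one write(2). A sketch, assuming a fixed 1024-byte buffer (an arbitrary size; if atomicity on pipes matters it should stay at or under PIPE_BUF, which is 4096 on Linux but may be as small as 512 elsewhere). `fdprintf_once` is a hypothetical name; POSIX `dprintf` also writes to a descriptor, but the spec does not promise a particular number of write calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: format the whole message first, then issue exactly
   one write(2), so the number of system calls no longer depends on how
   stdio chooses to flush. Messages that do not fit in the buffer are
   truncated. Returns write(2)'s result, or -1 if formatting fails. */
ssize_t fdprintf_once(int fd, const char *fmt, ...)
{
    char buf[1024];
    va_list ap;
    int n;
    size_t len;

    va_start(ap, fmt);
    n = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);
    if (n < 0)
        return -1;
    /* vsnprintf returns the untruncated length; clamp to what fits */
    len = (size_t)n < sizeof buf ? (size_t)n : sizeof buf - 1;
    return write(fd, buf, len);
}
```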
| gpderetta wrote:
| Sorry, I meant specifically threads, so the atomicity is
| purely process-local, of course. There is an internal
| (recursive) mutex inside FILE.
| BoingBoomTschak wrote:
| What he's saying is that as long as you don't mix usage
| of stdio and raw write(2), you won't have any
| interleaving problem; because there's a lock, which is
| why _unlocked variants exist.
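A sketch of what the `_unlocked` variants are for: take the lock once with `flockfile`, then use `putc_unlocked` (POSIX) inside it so each character does not pay for locking again. `put_string_locked` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Write a string one character at a time while taking the FILE lock only
   once. putc_unlocked skips the per-call locking that plain putc does,
   which is only safe because flockfile already holds the lock.
   Returns the number of characters written, or -1 on error. */
int put_string_locked(FILE *fp, const char *s)
{
    int count = 0;

    flockfile(fp);
    for (; *s != '\0'; s++) {
        if (putc_unlocked((unsigned char)*s, fp) == EOF) {
            count = -1;
            break;
        }
        count++;
    }
    funlockfile(fp);
    return count;
}
```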
| chrisrhoden wrote:
| I think it makes sense as a default to avoid issues discerning
| timing due to a buffer.
| Sharlin wrote:
| stderr having line buffering turned off by default is
| intentional. You want to see the output immediately and not
| have it stuck in a buffer that might be lost if the program
| crashes or freezes.
| PhilipRoman wrote:
| Lack of a standard way to control standard stream buffering is
| a big pain point for me sometimes. I'm still salty the
| libc+environment based approach was rejected by maintainers.
| And it also cannot be fixed on the kernel side, since buffering
| is a purely userspace feature.
| o11c wrote:
| In my experience, the biggest offender is programs trying to do
| syscalls directly (possibly for async-signal-safety), but not
| being aware of `writev`. Programs that do colored output can be
| especially stupid here. Some even use multiple _processes_ to do
| colored output (IIRC CMake is a big offender here, but CMake is
| infamous for refusing to fix bugs)!
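A sketch of the `writev` approach, assuming ANSI SGR escape sequences for color: the color code, the text, and the reset go out in one system call instead of three. `write_colored` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/uio.h>   /* writev, struct iovec */
#include <unistd.h>

/* Emit "<color>text<reset>\n" with a single writev(2) instead of separate
   write(2) calls for the escape sequences and the text, so another writer
   cannot slip output in between them (on a pipe this holds as long as the
   total is at most PIPE_BUF bytes). Returns writev's result. */
ssize_t write_colored(int fd, const char *color, const char *text)
{
    static const char reset[] = "\033[0m\n";   /* SGR reset, then newline */
    struct iovec iov[3];

    iov[0].iov_base = (void *)color;   /* writev only reads these buffers */
    iov[0].iov_len = strlen(color);
    iov[1].iov_base = (void *)text;
    iov[1].iov_len = strlen(text);
    iov[2].iov_base = (void *)reset;
    iov[2].iov_len = sizeof reset - 1;
    return writev(fd, iov, 3);
}
```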
|
| The pipe buffer is big enough that _sane_ programs aren't
| likely to run into problems. The math:
|
| POSIX only requires PIPE_BUF to be at least 512, but in practice
| it is 4096 on Linux (probably others too?). If we assume a
| horrible-and-unlikely 12 formatting characters per real character
| (and assume a real character is non-BMP and thus 4 bytes, but
| still single-column), Linux has enough for 64 characters. With
| more reasonable assumptions (mostly ASCII, no more than 4
| formatting changes per line) we get more like 6 _lines_ of output
| being atomic on Linux, and even a minimal POSIX system is likely
| to get at least one whole line.
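Rather than hard-coding either number, a program can ask for the limit on a given pipe with `fpathconf(fd, _PC_PIPE_BUF)`. A minimal sketch; `pipe_atomic_limit` is a hypothetical name, and falling back to the POSIX minimum is a conservative choice of this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>   /* _POSIX_PIPE_BUF */
#include <unistd.h>   /* fpathconf, _PC_PIPE_BUF */

/* Ask the system for the atomic-write limit of a particular pipe instead
   of assuming a number: POSIX only promises at least _POSIX_PIPE_BUF (512)
   bytes. Falls back to that minimum if the query fails or reports no
   limit. */
long pipe_atomic_limit(int fd)
{
    long v = fpathconf(fd, _PC_PIPE_BUF);
    return v > 0 ? v : _POSIX_PIPE_BUF;
}
```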
| dn3500 wrote:
| Separate error output was around for at least a decade before
| this. I know MTS had it in the 1960s, and I don't think it was
| their original idea. I used a CDC for a while and they had it
| too. So while this is the story of how standard error was
| introduced into Unix, it is not the origin story of the concept
| of standard error.
| lapsed_lisper wrote:
| Unix's standard error is definitely not the first invention of
| a sink for errors. According to Doug McIlroy, Unix got standard
| error in its 6th Edition, released in May 1975
| (http://www.cs.dartmouth.edu/~doug/reader.pdf). 5th Edition was
| released in June, 1974, so it's reasonable to suppose Unix's
| standard error was developed during that 11 month interval. By
| that time, Multics already had a dedicated error stream, called
| error_output (see https://multicians.org/mtbs/mtb763.html,
| dated October 1973).
|
| All the same, I'd be willing to believe that Unix's standard
| error could have been an "independent rediscovery" of one
| feature made highly desirable by other features (redirection
| and pipes). It's not clear how much communication there was
| among distinct OS researcher groups back then, so even if other
| systems had an analogue, Bell Labs people might not have been
| aware of it.
| dbcurtis wrote:
| The story that I recall about the origins of stderr is that
| without it, pipes are a mess. Keeping stdout to just the text
| that you want to pipe between tools and diverting all "noise"
| elsewhere is what makes pipes useable.
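A minimal sketch of that split, assuming a cat-like tool: file contents go to one stream and "cannot open" diagnostics to the other, so only data reaches the next pipeline stage (or the phototypesetter from the article). `cat_files` is hypothetical; the streams are parameters so it can be exercised with temporary files instead of stdout/stderr.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* A cat-like helper: copy each named file to `out` (stdout in a real
   program) and report failures on `err` (stderr), so a diagnostic never
   lands in the data flowing to the next stage of a pipeline.
   Returns the number of files that could not be opened. */
int cat_files(FILE *out, FILE *err, const char *const paths[], int n)
{
    int failures = 0;

    for (int i = 0; i < n; i++) {
        FILE *in = fopen(paths[i], "r");
        int c;

        if (in == NULL) {
            fprintf(err, "cat: cannot open %s: %s\n", paths[i], strerror(errno));
            failures++;
            continue;
        }
        while ((c = getc(in)) != EOF)
            putc(c, out);
        fclose(in);
    }
    return failures;
}
```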
| kragen wrote:
| the article is about specifically what kind of mess and
| what kind of usability problems inspired the change
| temporarely wrote:
| I've always felt stderr should have been stdmeta.
|
| p.s.
|
| Well, actually more completely, something like this:
|
|                 +---------+
|   [meta-in] --> |         | --> meta-out
|                 | p r o c |
|       input ==> |         | ==> output
|                 +---------+
| dfee wrote:
| This is interesting: MIMO channels to a process. Single
| stdin/stderr/stdout is effective for a single OS process, but
| with so much pulled up to user land (e.g. workers via green
| threads) maybe it makes sense to introduce multichannel
| i/e/o.
| jasonjayr wrote:
| I bet you'd also be onboard with files having data forks and
| resource forks too ... ?
|
| TBH, it's a great idea, but history proved that we apparently
| prefer a single stream of data and solving all the problems
| it brings ...
| dotancohen wrote:
| We don't have a single stream - that's the point. stdout
| and stderr are already different streams.
| jasonjayr wrote:
| Right, I was alluding to the original Mac's filesystem,
| with separate data + resource forks, requiring all sorts
| of hacks to transfer files to and from them. Due to all
| the trouble of working with that across other platforms,
| Mac eventually gave that up at a filesystem level, and
| sprinkled ".DS_Store" files everywhere.
|
| TCP/IP streams are bidirectional, but there is a limited
| way of sending "out of band" data, though it is not used
| as much. It would have been nice if the stdout/stderr
| multiple streams extended to TCP/IP networking and even
| HTTP messages too.
| LegionMammal978 wrote:
| > TCP/IP streams are bidirectional, but there is a
| limited way of sending "out of band" data, though it is
| not used as much.
|
| It's not real "out of band" data: that's something wholly
| invented by the Unix socket API. TCP itself just has an
| "urgent pointer", which addresses some byte further in
| the data stream that the receiver doesn't have yet, with
| the intent that higher-level protocols could use it as a
| signal to flush any data up to that pointer to observe
| whatever the urgent message is. There's nothing in the
| protocol itself to actually send a message separately
| from the rest of the stream.
| dotancohen wrote:
| You might like this proposal:
|
| https://unix.stackexchange.com/questions/197809/propose-addi...
|
| The idea is that some output is metadata (such as ps headers)
| and some is data. With stdmeta we could differentiate between
| the two.
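A sketch of how a program could approximate stdmeta today, assuming a made-up convention that metadata goes to file descriptor 3 only when the caller opens it (e.g. `prog 3>meta.txt` in a POSIX shell). Nothing standard reserves fd 3; `list_processes` and its ps-like header are illustrative only, and the descriptor is a parameter so it can be tested.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>    /* fcntl, F_GETFD */
#include <stdio.h>
#include <unistd.h>

/* True if fd refers to an open descriptor. */
static int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}

/* Print a ps-like listing: the header is metadata and goes to meta_fd
   only if the caller opened it; the rows are data and go to `out`, which
   therefore stays free of headers. Returns the row count, or -1. */
int list_processes(FILE *out, int meta_fd, const int pids[],
                   const char *const names[], int n)
{
    static const char header[] = "PID CMD\n";
    int rows = 0;

    if (fd_is_open(meta_fd) &&
        write(meta_fd, header, sizeof header - 1) != (ssize_t)(sizeof header - 1))
        return -1;
    for (int i = 0; i < n; i++) {
        if (fprintf(out, "%d %s\n", pids[i], names[i]) < 0)
            return -1;
        rows++;
    }
    return rows;
}
```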
| euroderf wrote:
| Sure. Make metadata out-of-band rather than in-band, so that
| the ungovernable mess of Unix-standard plain ol' text streams
| is replaced by structured data.
|
| So, well then: allowing programs to consume and emit JSON -
| is this progress?
| leoc wrote:
| JSON is hardly the greatest structured format, but nearly
| anything is better than Unix text streams.
| euroderf wrote:
| I'm thinking that if a range of XML markups can delimit
| and separate out metadata (e.g. HTML head v body), then
| heck so can JSON. Maybe not prettily.
| BoingBoomTschak wrote:
| Why not more than 4? Another thing CL did "better":
|
| https://www.lispworks.com/documentation/lw50/CLHS/Body/v_deb...
|
| https://www.lispworks.com/documentation/lw50/CLHS/Body/v_ter...
| temporarely wrote:
| Sure, it is arguably cleaner to explicitly isolate error
| conditions from general process metadata. So in my OP
| diagram you can add an errout coming down from the
| proc-box. I've used this actual pattern in-process to hook up
| pipelines of active/passive components. However, it makes
| no sense (imo) to propagate the errout of P(n) to P(n+1),
| so 2-in, 3-out.
|
| p.s. That is pL(n).stderr -> pE.stdin, where pL is the
| 'business logic' and pE is the system's error processing
| aspects. I.e. the error processing component's stdin is the
| stderr of the logical processes (pL), so there is a uniform
| process model applicable to both logical and error
| processing elements of the pipeline.
|
| The issue is how to do this within the limits of a
| line-oriented terminal interface (CLI). In code (as in
| in-process chaining) that aspect is a non-issue.
| lapsed_lisper wrote:
| Among conceptually Unix-like OSes, at least one tried to do
| something along these lines: see
| http://www.bitsavers.org/pdf/apollo/SR10/011021-A00_Using_Yo...
| PDF pp 151 and following.
| cchi_co wrote:
| The concept of separate error outputs has a rich history in
| earlier computing systems.
| dredmorbius wrote:
| Similarly, the SAS System, originally written on an IBM
| mainframe in 1971 (probably OS/360 or MVS), featured an
| input file (the SAS program itself) and two outputs: a LIST
| (the desired analytic output) and a LOG, which contained status,
| warning, and error messages. It's not quite stderr, but it clearly
| reflects similar thinking and was probably based on extant
| practices at the time.
| fnord77 wrote:
| > One afternoon several of us had the same experience --
| typesetting something, feeding the paper through the developer,
| only to find a single, beautifully typeset line: "cannot open
| file foobar"
|
| reminds me of those t-shirts or digital billboards displaying
| some system error that we've all seen as memes
| jchw wrote:
| This is not exactly the most earth-shattering revelation, but
| man: handling and communicating errors always seems to be a
| source of a vast amount of inelegance in software.
|
| I'd argue we haven't really "solved" the optimal way to do error
| handling in programming: Using union types remains one of the
| best options, but even that has its downsides. Consider the
| ergonomics of forwarding an error type multiple layers in a Rust
| program: you can remove some of the boilerplate by strapping
| macros on top, but I'd argue that's more of a bandage than a fix.
| Most other programming languages are either using exceptions,
| which I don't like as they complicate control flow behavior
| significantly, or simply ignore error _handling_ entirely (like C
| and Go; both of them provide some standard facilities for dealing
| with error _values_, but handling them is completely manual. I do
| like this, since it's very straightforward, but it nonetheless is
| just sidestepping the problem.) And even trying to keep it simple
| can create new problems, like of course the way pthreads has to
| contort errno into a thread-local, for obvious reasons.
|
| And while stderr has created a somewhat unified channel for
| dumping errors _into_, once they've bubbled up to the point
| where the program needs to output them, there's an almost
| unlimited number of opinions on exactly how error logging should
| work. Some software won't use stderr by default; other software
| only uses stderr for specific types of errors. Some software dumps
| _everything_ that isn't data output into stderr, including e.g.
| `--help` text, whereas some software uses stdout for anything that
| isn't explicitly an error (which often means I need to pipe
| --help to less twice: once without, and once with 2>&1.)
| Categorization of error logging is also somewhat contentious:
| should there be a "warning" severity? should you split errors
| into modules? Formatting, too: what should be in a log line?
| Should logs be structured into a machine-readable format such as
| JSON?
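One concrete answer to the `--help` question, sketched under the GNU Coding Standards convention (requested help goes to standard output, then the program exits successfully) plus the common, but not universal, habit of printing usage to stderr with status 2 after a bad invocation. `handle_args` and its -1 "carry on" return value are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Print a one-line usage message to the given stream. */
static void usage(FILE *fp, const char *prog)
{
    fprintf(fp, "usage: %s [--help] file...\n", prog);
}

/* Requested help goes to `out` (stdout) with status 0, so
   `prog --help | less` works; usage caused by a bad invocation goes to
   `err` (stderr) with status 2. Returns the exit status to use, or -1
   meaning "arguments OK, carry on" (a convention of this sketch). */
int handle_args(int argc, char *argv[], FILE *out, FILE *err)
{
    if (argc >= 2 && strcmp(argv[1], "--help") == 0) {
        usage(out, argv[0]);
        return 0;
    }
    if (argc < 2) {
        usage(err, argv[0]);
        return 2;
    }
    return -1;
}
```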
|
| It was probably a bad omen that even very old versions of UNIX
| ran into problems dealing with error logging and wound up needing
| to bifurcate things. Few programs feel as 'lazy' as UNIX; if UNIX
| couldn't ignore the problem, god knows the rest of the software
| was doomed.
| euroderf wrote:
| Handling errors in a processing pipeline in Go is a big fat
| PITA. In other languages too?
| w10-1 wrote:
| I think Swift is close to optimal for now.
|
| It does have the union type Result<normal, error>, but most
| people throw/catch Error.
|
| In Swift, error is a simple value (without a stack frame) and
| thus is as cheap as a return value, but can be handled/caught
| anywhere in the call chain like an exception.
|
| Error is a protocol that tags any type, so it can carry any
| details you like, and your catch can switch on the type.
|
| But it's only now (10 years on) that they're declaring error
| types in the function signature. In this world, it turns out
| that not throwing is the same as throws(Never). It took this
| long because it's unclear (but possible) that per-type error
| handling helps, mainly with libraries.
|
| Serializing/tracking the originating (thread) context and
| avoiding merge conflicts in error streams seems like the
| unsolved problem. Both Java and Swift have structured
| concurrency with parent/child relations with derived
| cancellation/termination. Perhaps later that can include
| errors.
| stuaxo wrote:
| Some things I want from std streams:
|
| Timestamping or sync points, so that if I pipe multiple streams
| (say stdout and stderr) I can keep them in sync further along
| when various buffers may have been involved.
|
| Metadata, such as magic file types.
|
| Structured data (this may link with meta data, and maybe there is
| even a way programs could negotiate what to send to each other).
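A sketch of the timestamping idea: prefix each line with a CLOCK_MONOTONIC time, taken when the line is produced, so lines from stdout and stderr can be merged back into one order later regardless of buffering in between. The `seconds.nanoseconds` prefix format and `stamped_line` are assumptions of this example, not an existing convention.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Write "sec.nsec msg\n" to fp, using the monotonic clock so timestamps
   never go backwards even if the wall clock is adjusted.
   Returns fprintf's result, or -1 if the clock cannot be read. */
int stamped_line(FILE *fp, const char *msg)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    return fprintf(fp, "%lld.%09ld %s\n", (long long)ts.tv_sec, ts.tv_nsec, msg);
}
```

For example, run `prog >out.log 2>err.log` with both streams written through `stamped_line`, then sort the combined lines numerically to recover a single timeline.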
___________________________________________________________________
(page generated 2024-07-13 23:01 UTC)