[HN Gopher] 3-JSON
       ___________________________________________________________________
        
       3-JSON
        
       Author : RGBCube
       Score  : 90 points
       Date   : 2025-07-21 10:39 UTC (4 days ago)
        
 (HTM) web link (rgbcu.be)
 (TXT) w3m dump (rgbcu.be)
        
       | NoboruWataya wrote:
       | I've never heard of stddata. What distro/environment provides it?
        
         | jamessb wrote:
         | Nor have I; I think it is just what the developer of tree has
         | chosen to call file descriptor 3, rather than being a wider
         | convention or standard thing provided by the environment.
         | 
         | > As of version 2.0.0, in Linux, tree will attempt to
         | automatically output a compact JSON tree on file descriptor 3
         | (what I call stddata,) if present
         | 
         | https://github.com/Old-Man-Programmer/tree/blob/d501b58ff9cb...
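          | 
          | A minimal sketch of how a program might test whether
          | descriptor 3 is "present", assuming the usual fcntl(F_GETFD)
          | probe. This is not taken from the tree source, and
          | `fd_is_open` is a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Returns true if `fd` refers to an open descriptor in this process.
 * fcntl(F_GETFD) fails with EBADF when the descriptor is not open. */
bool fd_is_open(int fd)
{
    assert(fd >= 0);
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}
```

          | The probe only says that the descriptor is open, not who
          | opened it or why, which is the ambiguity discussed here.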
        
         | deathanatos wrote:
         | It's a local invention of TFA's, AFAIK. It's not "std".
         | 
         | stdout would be the canonical location for putting JSON output
         | (and the "data" of a command, generally). Then things like `|
         | jq` just work.
        
       | zbendefy wrote:
       | offtopic: why does the Copyright (c) icon shake like crazy at the
       | bottom of the page?
       | 
       | Edit: Oh I guess it seems to be intentional, I clicked around and
       | I like the rgbcube site map.
        
         | omnicognate wrote:
         | <copyright intensifies>
        
       | gerikson wrote:
       | > Okay, apparently the stddata addition is causing havoc (who
       | knew how many scripts just haphazardly hand programs random file
       | descriptors, that's surely not a problem.)
       | 
       | I knew, and I've known since reading the "C shell considered
       | harmful" paper, which offhandedly mentioned that sh-based shells
       | can use an arbitrary number of file descriptors (maybe they have
       | to be one-digit integers though). csh can't, of course.
       | 
       | It's discussed in the first section here
       | 
       | https://harmful.cat-v.org/software/csh
        
         | theamk wrote:
         | this brings memories - university, first Unix exposure, Sun Ray
         | terminals, "tcsh" as default shell, and me doing "find / -name
         | ..." a lot.
         | 
         | I always wanted to ignore all errors from this (there was a lot
         | of "permission denied"), but tcsh just didn't have a simple
         | ability to do so. This taught me a valuable lesson about some
         | software just being better than others. And to this day, I keep
         | wondering why people would choose to use csh/tcsh voluntarily.
        
           | layer8 wrote:
           | Tcsh originally was more user-friendly for interactive use.
           | The rest is inertia.
        
       | mmastrac wrote:
       | It's a shame that stdX streams were never spec'd as sockets, with
       | appropriate handling available in the various shells.
       | 
       | Also, file handle inheritance by default was such a big mistake.
        
         | nulld3v wrote:
         | Yeah, POSIX made choices that looked sane and even elegant at
         | the time, but nowadays I think it is fair to say that they have
         | not aged well. Like it's not just FDs getting inherited by
         | default, almost _everything_ gets inherited by default:
         | 
         | Working dir, env vars, uid/gid, socket handles, file
         | descriptors, (some) file locks, message queues. AFAIK the only
         | exception is the argv, everything else is inherited on fork or
         | exec.
         | 
         | Sometimes this makes sense, but programmers always forget about
         | this, resulting in security incidents. Eventually most
         | programming languages gave up and updated their stdlibs to set
         | CLOEXEC when opening files and sockets, knowing that it would
         | break POSIX compatibility and API compatibility on their
         | stdlibs. Python is one example:
         | https://peps.python.org/pep-0446/
         | 
         | The "inherit by default" behavior also makes it very difficult
         | to evolve the shell interface. The nushell devs are looking for
         | a reliable way to request JSON output/input on processes
         | spawned by the shell (if supported by the program). Naively
         | passing env vars or FDs to the process causes problems because
         | if the process spawns any children of its own, they too would
         | inherit those env vars or FDs.
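         | 
         | A minimal sketch of the close-on-exec default described
         | above, assuming a POSIX.1-2008 system where open() accepts
         | O_CLOEXEC. `open_cloexec` and `has_cloexec` are hypothetical
         | helper names:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Open `path` read-only so the descriptor is closed automatically on exec. */
int open_cloexec(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Report whether close-on-exec is set on `fd`; also false on error. */
bool has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    return flags != -1 && (flags & FD_CLOEXEC) != 0;
}
```

         | A descriptor opened with plain open() has the flag clear and
         | is therefore inherited across exec.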
        
         | bandie91 wrote:
         | process inheritance was the best invention, because it models
         | reality quite closely. you don't have new things just sitting in
         | an empty universe all alone, initializing everything themselves
         | from ... somewhere ... because everything is reset around them.
         | 
         | environment (in a broader sense, not just environment
         | variables, but also CWD, file handles, uid/gid, sec context,
         | namespaces) is there for a reason: to use. if you don't want
         | your child processes to read the stdin in place of you, don't
         | give it to them. it's the parent process's responsibility to set
         | up the env for the children.
         | 
         | although subprocesses were invented to do (some of) the parent's
         | job by delegating smaller steps and leaving the details to them.
         | for example an http server would read the request (first) line,
         | then delegate the rest of the input to a subprocess (worker)
         | depending on who is free, who handles which type of request,
         | etc. this is the original idea behind inheritance, IMO.
        
       | smarx007 wrote:
       | This is long overdue. PowerShell has long supported passing
       | structured output (objects) via pipes and this is the closest
       | attempt to approximate that without breaking the world.
        
         | account-5 wrote:
         | I don't know, Nushell does a pretty good job.
         | 
         | https://www.nushell.sh/
        
       | williamcotton wrote:
       | For this the key would be to eliminate serialization and
       | deserialization between steps in the pipeline.
        
       | superdisk wrote:
       | Tangential but I was surprised to see that tree(1), at least the
       | popular implementation, is made in Terre Haute (which is where
       | I'm from). Maybe I should invite the author for lunch or
       | something :)
        
       | Joker_vD wrote:
       | > who knew how many scripts just haphazardly hand programs random
       | file descriptors, that's surely not a problem.
       | 
       | Oh for fuck's sake! Why are _you_ using random file descriptors
       | nobody told you about? Those open fds are there for a reason,
       | thank you: I've put an end of an open pipe there specifically so I
       | could notice when it gets closed.
       | 
       | If the user set up the environment of your application in a
       | specific way, that means he wants your application to run in such
       | an environment. If you were invoked with 10 non-standard file
       | descriptors open and two injected threads -- you'll have to live
       | with it. Because, believe it or not, your application's purpose
       | is to _serve the user's goals_. So don't break composability
       | that the user relies on, _please_.
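       | 
       | A minimal sketch of the pipe trick mentioned above, assuming the
       | parent keeps the read end and descendants inherit the write end.
       | `wait_for_pipe_eof` is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Block until every holder of the pipe's write end has closed it or exited.
 * Returns 0 on EOF and -1 on error. Bytes written by descendants are
 * read and discarded. */
int wait_for_pipe_eof(int read_fd)
{
    assert(read_fd >= 0);
    char buf[64];
    for (;;) {
        ssize_t n = read(read_fd, buf, sizeof buf);
        if (n == 0)
            return 0;               /* all write ends are closed */
        if (n < 0 && errno != EINTR)
            return -1;
    }
}
```

       | The parent has to close its own copy of the write end first,
       | otherwise read() never reports EOF. A child that closes the
       | inherited descriptor early makes the parent see EOF too soon.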
        
         | listeria wrote:
         | This is the first I've heard of using an open pipe to poll for
         | subprocess termination. Don't get me wrong, I don't _hate_ it,
         | but you could just as easily have a SIGCHLD handler write to
         | your pipe (or do nothing, since poll(2) will fail with
         | EINTR), and you don't have to worry about the subprocess
         | closing the pipe or considering it some weird stddata fd like
         | tree does here.
        
           | o11c wrote:
           | `SIGCHLD` is extremely unreliable in a lot of ways, `pidfd`
           | is better (but Linux-specific), though it doesn't handle the
           | case of wanting to be notified of all _grandchildren_'s
           | terminations after the direct child dies early.
        
       | veltas wrote:
       | The environment variable isn't much better, both are akin to
       | using a global var in your reentrant code, but at least
       | STDDATA_FD is less likely to collide than 3.
       | 
       | Can't wait for scripts using this variable for something
       | unrelated to break when they call my scripts.
       | 
       | This should be a parameter or argv[0]-based.
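       | 
       | A minimal sketch of the environment-variable approach, assuming
       | the variable holds a decimal descriptor number (the exact format
       | is an assumption here). `fd_from_env` is a hypothetical helper:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>

/* Parse the environment variable `name` as a decimal descriptor number and
 * check that the descriptor is open. Returns the descriptor, or -1 if the
 * variable is unset, malformed, or names a descriptor that is not open. */
int fd_from_env(const char *name)
{
    assert(name != NULL);
    const char *s = getenv(name);
    if (s == NULL || *s == '\0')
        return -1;

    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v < 0 || v > INT_MAX)
        return -1;

    /* F_GETFD fails with EBADF when the descriptor is not open. */
    if (fcntl((int)v, F_GETFD) == -1)
        return -1;
    return (int)v;
}
```

       | A caller would use fd_from_env("STDDATA_FD"). Child processes
       | inherit both the variable and the descriptor unless someone
       | scrubs them, which is the collision risk described above.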
        
         | bcrl wrote:
         | That doesn't work reliably either. No existing code scrubs
         | STDDATA_FD from their environment variables, and there's no way
         | to know if anyone uses STDDATA_FD in the wild. Why not just use
         | a command line parameter like everyone else? Different isn't
         | better in a situation like this.
         | 
         | This is a larger concern I've started to see in a certain class
         | of younger developer where existing conventions are just
         | ignored without an attempt at understanding why they exist.
         | Things are only going to get worse as naive vibe coders start
         | flinging more AI generated garbage out into the world. I pity
         | the poor folks trying to maintain these systems a couple of
         | decades from now.
        
           | veltas wrote:
           | That's what I really meant by saying a parameter, it should
           | be an option/flag that's given explicitly at invocation, or
           | just a different program name.
        
             | kps wrote:
             | Just go for `--json-output=_filename_` rather than
             | playing games.
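             | 
             | A minimal sketch of the explicit-flag approach, assuming
             | a hypothetical `--json-output=PATH` option. The flag and
             | `json_output_path` are illustrative names, not an
             | existing tree interface:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the value of the last `--json-output=PATH` argument, or NULL if
 * the option is absent. Scanning stops at a bare "--". */
const char *json_output_path(int argc, char *const argv[])
{
    assert(argv != NULL);
    static const char prefix[] = "--json-output=";
    const size_t len = sizeof prefix - 1;
    const char *path = NULL;

    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--") == 0)
            break;                      /* conventional end of options */
        if (strncmp(argv[i], prefix, len) == 0)
            path = argv[i] + len;
    }
    return path;
}
```

             | Nothing is inferred from the environment: the caller
             | has to ask for the JSON explicitly at invocation.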
        
       | dodomodo wrote:
       | every time I see the output of nushell I get so disappointed,
       | they got the formatting so wrong, all the extra delimiters make
       | it hard to actually read the data. powershell got it right, using
       | alignment. if you look at virtually all shell programs until the
       | last few years you are going to see a similar, alignment based
       | output. only recently, with the rise of the abuse of ligatures, we
       | started seeing these kinds of incomprehensible blobs surrounding
       | our text.
        
         | secret-noun wrote:
         | The author states they're using nushell's `markdown` table
         | style because of issues with their font rendering certain
         | characters. `rounded` is the default and indeed, `markdown`
         | looks truly horrible in comparison.
         | 
         | Nushell's front page [1] shows an example of rounded, and
         | here's an example of an even further customized version [2].
         | 
         | I think these are very readable. There is alignment too, but
         | it's "local" alignment to cells in the same sub-table, not
         | "global" to the entire table -- this is good for fitting more
         | stuff into your terminal width without wrapping.
         | 
         | A supporting font is required though, yes.
         | 
         | [1]: https://www.nushell.sh/
         | 
         | [2]: https://i.imgur.com/U4MnYLe.png
        
           | dodomodo wrote:
           | nushell front page is exactly what I was referring to.
           | Compare the legibility of the ls command in the front page to
           | a regular ls command, it's insane how much more cluttered the
           | nushell version is.
        
       | EdSchouten wrote:
       | If only there was a variant of execve() / posix_spawn() that
       | simply took a literal array of which file descriptors would need
       | to be present in the new process. So that you can say:
       | 
       |     int subprocess_stdin = open("/dev/null", O_RDONLY);
       |     int subprocess_stdout = open("some_output", O_WRONLY);
       |     // Let the subprocess use the same stderr as me.
       |     int subprocess_stderr = STDERR_FILENO;
       |     int subprocess_fds[] = {subprocess_stdin, subprocess_stdout,
       |                             subprocess_stderr};
       |     posix_spawn_with_fds("my process", [...], subprocess_fds, 3);
       | 
       | Never understood why POSIX makes all of this so hard.
        
         | Y_Y wrote:
         | > Never understood why POSIX makes all of this so hard
         | 
         | I honestly can't say in this particular instance, but my
         | (unpopular?) instinct in such a situation is always to assume
         | there is a good reason and I just haven't understood it yet. It
         | may have become irrelevant in the meantime, but I can't know
         | until I understand, and it's served me well to give the
         | patriarchs the benefit of the doubt in such cases.
        
         | alerighi wrote:
         | It's something trivial to write (~20 lines of code), so there is
         | no point in the standard library providing that kind of
         | function, in my opinion.
         | 
         | After the fork() (or clone, on Linux) you run a for loop that
         | closes every FD except the ones you want to keep. On Linux there
         | is a close_range system call to close a range of FDs in one
         | call.
         | 
         | POSIX is an API designed to be a small layer on the operating
         | system, and designed to make as few assumptions as possible
         | about the underlying system. This is the reason why POSIX is
         | nowadays implemented even on low-resource embedded devices and
         | similar stuff.
         | 
         | At a higher level it's possible to use higher level
         | abstractions to manipulate processes (e.g. a C++ library that
         | does all of the above with a modern interface).
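         | 
         | A minimal sketch of the close loop described above, assuming
         | the caller supplies an upper bound `limit` (in practice
         | something like sysconf(_SC_OPEN_MAX)). `close_fds_except` is a
         | hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Close every descriptor in [0, limit) that is not listed in `keep`.
 * Meant to run in the child between fork() and exec(). */
void close_fds_except(const int *keep, size_t nkeep, int limit)
{
    assert(keep != NULL || nkeep == 0);
    for (int fd = 0; fd < limit; fd++) {
        bool wanted = false;
        for (size_t i = 0; i < nkeep; i++) {
            if (keep[i] == fd) {
                wanted = true;
                break;
            }
        }
        if (!wanted)
            close(fd);   /* EBADF for never-opened descriptors is ignored */
    }
}
```

         | The loop closes descriptors blindly, including any that a
         | parent left open on purpose, and costs one close() call per
         | number below `limit`.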
        
           | deathanatos wrote:
           | ... what POSIX API gets you the open FDs? (Or even just the
           | maximum open FD, and we'll just cause a bunch of errors
           | closing non-existent FDs.)
        
             | o11c wrote:
             | That's `sysconf(_SC_OPEN_MAX)`, but it is _always_ a bug
             | to close FDs you don't know the origin of. You should be
             | specifying `O_CLOEXEC` by default if you want FDs closed
             | automatically.
        
               | deathanatos wrote:
               | That won't return the maximum open file descriptor. You
               | _could_ perhaps use that value in lieu of the maximum
               | open file descriptor and loop through a crap ton more FDs
               | than even my previous post implied, I suppose, and this
               | is getting less efficient and more terribly engineered by
               | the comment, which I think proves the point...
               | 
               | > _but it is always a bug to close FDs you don't know
               | the origin of._
               | 
               | And I would agree. I'm replying to the poster above me,
               | who is staking the claim that POSIX permits closing all
               | open file descriptors other than a desired set.
               | 
               | So, I suppose it can, at a cost of a few thousand
               | syscalls that'll all be pointless...
        
           | o11c wrote:
           | It is _always_ a bug to call `close_range` since you never
           | know if a parent process has _deliberately_ left a file
           | descriptor open for some kind of tracing. If the parent does
           | not want this, it _must_ use `O_CLOEXEC`. _Maybe_ if you
           | clear the entire environment you'll be fine?
           | 
           | That said, it _is_ trivial to write a loop that takes a set
           | of _known_ old and new fd numbers (including e.g. swapping)
           | and produces a set of calls to `dup2` and `fcntl` to give them
           | the new numbers, while correctly leaving all open fds open.
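           | 
           | A minimal sketch of such a renumbering loop, assuming the
           | caller knows every source and target number and can name a
           | `scratch` number above all of them. `struct fd_move` and
           | `remap_fds` are hypothetical names, and the sketch handles
           | at most 16 moves:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

struct fd_move { int from; int to; };

/* Give each `from` descriptor the number `to`, handling overlaps such as
 * swaps. Descriptors not named in `moves` are left open and untouched.
 * Returns 0 on success and -1 on failure. */
int remap_fds(const struct fd_move *moves, size_t n, int scratch)
{
    assert(moves != NULL || n == 0);
    int tmp[16];
    if (n > sizeof tmp / sizeof tmp[0])
        return -1;

    /* Step 1: park a copy of every source at a number >= scratch. */
    for (size_t i = 0; i < n; i++) {
        tmp[i] = fcntl(moves[i].from, F_DUPFD, scratch);
        if (tmp[i] == -1) {
            while (i-- > 0)
                close(tmp[i]);
            return -1;
        }
    }

    /* Step 2: move the parked copies onto their target numbers. */
    int rc = 0;
    for (size_t i = 0; i < n; i++) {
        if (dup2(tmp[i], moves[i].to) == -1)
            rc = -1;
        close(tmp[i]);
    }

    /* Step 3: release old numbers that were not reused as a target. */
    for (size_t i = 0; i < n; i++) {
        bool reused = false;
        for (size_t j = 0; j < n; j++) {
            if (moves[j].to == moves[i].from)
                reused = true;
        }
        if (!reused)
            close(moves[i].from);
    }
    return rc;
}
```

           | Parking the sources first is what makes a swap such as
           | {3 -> 4, 4 -> 3} safe: no target is overwritten before its
           | old contents have been copied.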
        
         | oguz-ismail wrote:
         | It's not hard, just a bit too long:
         | 
         |     #include <fcntl.h>
         |     #include <spawn.h>
         | 
         |     int main(void) {
         |       posix_spawn_file_actions_t file_actions;
         |       posix_spawn_file_actions_init(&file_actions);
         |       /* In the child, open /dev/null as fd 0 and as fd 2. */
         |       posix_spawn_file_actions_addopen(&file_actions, 0, "/dev/null",
         |                                        O_RDONLY, 0);
         |       posix_spawn_file_actions_addopen(&file_actions, 2, "/dev/null",
         |                                        O_WRONLY, 0);
         |       /* argv is declared as char *const[], so no const on the strings. */
         |       posix_spawnp(NULL, "ls", &file_actions, NULL,
         |                    (char *[]){"ls", "-l", "/proc/self/fd", NULL},
         |                    NULL);
         |       posix_spawn_file_actions_destroy(&file_actions);
         |     }
        
       | js8 wrote:
       | Why not use a saner protocol than JSON, e.g. CBOR?
        
         | totallymike wrote:
         | Is CBOR as popularly supported as JSON?
         | 
         | Also, to answer your question with a guess, I would suppose
         | it's because they wanted to use JSON and they wrote the
         | feature.
        
           | thechao wrote:
           | I do a _lot_ of very low level programming with awful
           | performance-maintenance trade-offs. Here's a great trick for
           | a "binary" JSON: remove all of the extra whitespace,
           | normalize your numbers, and then LZ4 the resulting string.
           | 
           | UTF-8 is already a great wire format.
           | 
           | I've never found a "binary JSON" that's significantly better
           | than this; I mean you _can_ beat it, but you need awkward
           | encodings (prefix indices & other weird shit). You end up
           | burning nearly a byte for any particularly clever integer
           | encoding.
           | 
           | Most data structures are just nested arrays of integers. If
           | you need an integer keyed OBJECT you're SOL, but I just play
           | fiddly games with astral plane UTF-8 characters. (Yeah yeah
           | yeah _ad hoc_ encodings are nasty news.)
           | 
           | If you've got a BUTT LOAD of data just fire up a compressing
           | SQLite DB like a normal human.
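           | 
           | A minimal sketch of the whitespace-stripping step only,
           | assuming well-formed JSON input; number normalization and
           | the LZ4 pass are left out. `json_strip_whitespace` is a
           | hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy `in` to `out`, dropping JSON whitespace that appears outside string
 * literals. `out` must have room for strlen(in) + 1 bytes.
 * Returns the length of the result. */
size_t json_strip_whitespace(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    bool in_string = false;
    bool escaped = false;
    size_t n = 0;

    for (; *in != '\0'; in++) {
        char c = *in;
        if (in_string) {
            out[n++] = c;               /* keep string contents verbatim */
            if (escaped)
                escaped = false;        /* this character was escaped */
            else if (c == '\\')
                escaped = true;
            else if (c == '"')
                in_string = false;
        } else if (c == '"') {
            in_string = true;
            out[n++] = c;
        } else if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
            out[n++] = c;
        }
    }
    out[n] = '\0';
    return n;
}
```

           | The result is still plain UTF-8 JSON, so any ordinary
           | JSON library can read it once it has been decompressed.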
        
             | js8 wrote:
             | If you're interested in performance, what about all the
             | number conversion (to decimals, presumably) that is
             | incurred with JSON?
        
               | thechao wrote:
               | If I'm interested in performance I'll build my data out
               | of offset handles and lay everything down into a block
               | and mmap() it around. That's parsing free, up to an
               | htons() -- but that's only a worst case scenario.
               | Everything else is about not inventing something custom &
               | being able to use easily vendored high-trust 3rd party
               | tools. (In this case: a JSON library, LZ4, and/or
               | SQLite.)
        
         | tonyarkles wrote:
         | Do you have a CBOR implementation that you like? Ideally one
         | with decent schema support? I was looking into CBOR as a
         | replacement for Protobufs for an embedded system I work on and
         | it's got a lot going for it but every implementation I looked
         | at seemed to support a very different subset of the schema spec
         | and it was brutal to try to find a pair of libraries (C for the
         | embedded side, C++ for the host side) that could actually share
         | a set of schema files.
        
       | crabbone wrote:
       | The company I work for is guilty of abusing 3. We use it for
       | debug output of user-supplied scripts that are meant to implement
       | monitoring / metrics :'(
       | 
       | This is the first time I've heard about stddata though. Is this a
       | thing that's going into a standard? Is there one already? Or is it
       | just a name someone gave to it and it's not a real thing?
        
       | fvwmuser wrote:
       | I wouldn't have said this is anything new.
       | 
       | FreeBSD has libxo[0] integrated into some of its tools:
       | 
       | [0] https://github.com/Juniper/libxo
        
         | theamk wrote:
         | Except they went with --libxo command-line option, which is
         | extremely unlikely to cause any problems in the existing
         | scripts.
        
       | krick wrote:
       | Can somebody explain what's going on here? It seems I'm missing
       | some important piece of background info. Why don't they just add
       | -J flag for everyone who wants to output JSON? Oh, wait, tree
       | already has -J flag to output JSON. So WTF are they doing here?
       | 
       | I am especially confused by this:
       | 
       | > Surely, nothing will happen if I just assume that the existence
       | of a specific file descriptor implies something, as nobody is
       | crazy or stupid enough to hardcode such a thing?
       | 
       | Wait, what? But "you" (tree authors) just hardcoded such a thing.
       | Do "you" have some special permission to do this nonsense?
        
       | cryptonector wrote:
       | Sorry, but this is going to be very dangerous because much code
       | will close unwanted FDs then open others. It's 50 years too late
       | to add this convention.
       | 
       | Instead maybe we need new system calls that return dups of a
       | hidden stddata FD or create/replace it.
        
       ___________________________________________________________________
       (page generated 2025-07-25 23:02 UTC)