[HN Gopher] 3-JSON
___________________________________________________________________
3-JSON
Author : RGBCube
Score : 90 points
Date : 2025-07-21 10:39 UTC (4 days ago)
(HTM) web link (rgbcu.be)
(TXT) w3m dump (rgbcu.be)
| NoboruWataya wrote:
| I've never heard of stddata. What distro/environment provides it?
| jamessb wrote:
| Nor have I; I think it is just what the developer of tree has
| chosen to call file descriptor 3, rather than being a wider
| convention or standard thing provided by the environment.
|
| > As of version 2.0.0, in Linux, tree will attempt to
| automatically output a compact JSON tree on file descriptor 3
| (what I call stddata,) if present
|
| https://github.com/Old-Man-Programmer/tree/blob/d501b58ff9cb...
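|
| A minimal sketch of the check the quoted note implies, assuming the
| program simply probes whether descriptor 3 is open. `STDDATA_FILENO`,
| `fd_is_open` and `emit_json_if_open` are illustrative names, not
| taken from tree's source:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical name for descriptor 3, by analogy with STDERR_FILENO. */
#define STDDATA_FILENO 3

/* Returns 1 if fd refers to an open descriptor, 0 if it does not. */
int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/* Writes json to fd only when something is open there.
 * Returns the number of bytes written, 0 if fd is closed,
 * or -1 if the write itself fails. */
ssize_t emit_json_if_open(int fd, const char *json)
{
    if (!fd_is_open(fd))
        return 0;
    return write(fd, json, strlen(json));
}
```

| A caller would use something like `emit_json_if_open(STDDATA_FILENO,
| "{}\n")`. Nothing in this check can tell a deliberate consumer from a
| descriptor the parent left open for an unrelated purpose.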
| deathanatos wrote:
| It's a local invention of TFA's, AFAIK. It's not "std".
|
| stdout would be the canonical location for putting JSON output
| (and the "data" of a command, generally). Then things like `|
| jq` just work.
| zbendefy wrote:
| offtopic: why does the Copyright (c) icon shake like crazy at the
| bottom of the page?
|
| Edit: Oh I guess it seems to be intentional, I clicked around and
| I like the rgbcube site map.
| omnicognate wrote:
| <copyright intensifies>
| gerikson wrote:
| > Okay, apparently the stddata addition is causing havoc (who
| knew how many scripts just haphazardly hand programs random file
| descriptors, that's surely not a problem.)
|
| I knew, and I've known since reading the "C shell considered
| harmful" paper, which offhandedly mentioned that sh-based shells
| can use an arbitrary number of file descriptors (maybe they have
| to be one-digit integers though). csh can't, of course.
|
| It's discussed in the first section here
|
| https://harmful.cat-v.org/software/csh
| theamk wrote:
| this brings memories - university, first Unix exposure, Sun Ray
| terminals, "tcsh" as default shell, and me doing "find / -name
| ..." a lot.
|
| I always wanted to ignore all errors from this (there was a lot
| of "permission denied"), but tcsh just didn't have a simple
| ability to do so. This taught me a valuable lesson about some
| software just being better than others. And to this day, I keep
| wondering why people would choose to use csh/tcsh voluntarily.
| layer8 wrote:
| Tcsh originally was more user-friendly for interactive use.
| The rest is inertia.
| mmastrac wrote:
| It's a shame that stdX streams were never spec'd as sockets, with
| appropriate handling available in the various shells.
|
| Also, file handle inheritance by default was such a big mistake.
| nulld3v wrote:
| Yeah, POSIX made choices that looked sane and even elegant at
| the time, but nowadays I think it is fair to say that they have
| not aged well. Like it's not just FDs getting inherited by
| default, almost _everything_ gets inherited by default:
|
| Working dir, env vars, uid/gid, socket handles, file
| descriptors, (some) file locks, message queues. AFAIK the only
| exception is the argv, everything else is inherited on fork or
| exec.
|
| Sometimes this makes sense, but programmers always forget about
| this, resulting in security incidents. Eventually most
| programming languages gave up and updated their stdlibs to set
| CLOEXEC when opening files and sockets, knowing that it would
| break POSIX compatibility and API compatibility on their
| stdlibs. Python is one example:
| https://peps.python.org/pep-0446/
|
| The "inherit by default" behavior also makes it very difficult
| to evolve the shell interface. The nushell devs are looking for
| a reliable way to request JSON output/input on processes
| spawned by the shell (if supported by the program). Naively
| passing env vars or FDs to the process causes problems because
| if the process spawns any children of its own, they too would
| inherit those env vars or FDs.
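|
| A small sketch of the close-on-exec flag discussed above, assuming a
| POSIX.1-2008 system. The helper names are made up for illustration:

```c
#define _POSIX_C_SOURCE 200809L /* for O_CLOEXEC under strict -std= modes */
#include <fcntl.h>
#include <unistd.h>

/* Open a file for reading with close-on-exec set atomically,
 * so the descriptor is never inherited across exec. */
int open_private(const char *path)
{
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Returns 1 if fd is closed on exec, 0 if it is inherited,
 * -1 if fd is not a valid descriptor. */
int is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) != 0;
}

/* Mark an already-open descriptor as close-on-exec. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}
```

| Note that `dup()` hands back a descriptor with the flag cleared, so
| a copy of a "private" descriptor is inheritable again unless it is
| marked explicitly.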
| bandie91 wrote:
| process inheritance was the best invention, because it models
| reality quite closely. you don't have new things just sitting in
| an empty universe all alone and initializing everything themselves
| from ... somewhere ... because everything is reset around them.
|
| environment (in a broader sense, not just environment
| variables, but also CWD, file handles, uid/gid, sec context,
| namespaces) is there for a reason: to use. if you don't want
| your child processes to read the stdin in place of you, don't
| give it to them. it's the parent process's responsibility to set
| up the env for the children.
|
| although subprocesses are invented to do (some of) the parent's
| job by delegating smaller steps and leave the details to them.
| for example a http server would read the request (first) line,
| then delegate the rest of the input to a subprocess (worker)
| depending on who is free, who handles which type of request,
| etc. this is the original idea behind inheritance, IMO.
| smarx007 wrote:
| This is long overdue. PowerShell has long supported passing
| structured output (objects) via pipes and this is the closest
| attempt to approximate that without breaking the world.
| account-5 wrote:
| I don't know, Nushell does a pretty good job.
|
| https://www.nushell.sh/
| williamcotton wrote:
| For this the key would be to eliminate serialization and
| deserialization between steps in the pipeline.
| superdisk wrote:
| Tangential but I was surprised to see that tree(1), at least the
| popular implementation, is made in Terre Haute (which is where
| I'm from). Maybe I should invite the author for lunch or
| something :)
| Joker_vD wrote:
| > who knew how many scripts just haphazardly hand programs random
| file descriptors, that's surely not a problem.
|
| Oh for fuck's sake! Why are _you_ using random file descriptors
| nobody told you about? Those open fds are there for a reason,
| thank you: I've put an end of an open pipe there specifically so I
| could notice when it gets closed.
|
| If the user set up the environment of your application in a
| specific way, that means he wants your application to run in such
| an environment. If you were invoked with 10 non-standard file
| descriptors open and two injected threads -- you'll have to live
| with it. Because, believe it or not, your application's purpose
| is to _serve the user's goals_. So don't break composability
| that the user relies on, _please_.
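|
| A rough sketch of the open-pipe trick described above, assuming the
| parent keeps the read end and the child only inherits the write end.
| The function names are hypothetical, and a real child would exec
| another program instead of exiting immediately:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that inherits the write end of a fresh pipe and then
 * exits with exit_code. Returns the read end, or -1 on error. */
int spawn_with_liveness_pipe(pid_t *child, int exit_code)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        close(p[0]);
        /* A real child would exec here, leaving p[1] open and unused. */
        _exit(exit_code);
    }

    /* The parent must drop its own copy of the write end,
     * otherwise EOF never arrives. */
    close(p[1]);
    *child = pid;
    return p[0];
}

/* Block until every holder of the write end has closed it or exited.
 * Returns 0 on EOF, -1 on a read error. */
int wait_for_eof(int rfd)
{
    char buf[64];
    ssize_t n;

    for (;;) {
        n = read(rfd, buf, sizeof buf);
        if (n > 0)
            continue;            /* unexpected data: skip it */
        if (n == -1 && errno == EINTR)
            continue;            /* interrupted by a signal: retry */
        return n == 0 ? 0 : -1;
    }
}
```

| The pipe only works as a liveness signal if no descendant writes to
| it or closes it early, which is exactly what a program that treats
| an unknown descriptor as an output channel would violate.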
| listeria wrote:
| This is the first I've heard of using an open pipe to poll for
| subprocess termination. Don't get me wrong, I don't _hate_ it,
| but you could just as easily have a SIGCHLD handler write to
| your pipe (or do nothing, since poll(2) will fail with
| EINTR), and you don't have to worry about the subprocess
| closing the pipe or considering it some weird stddata fd like
| tree does here.
| o11c wrote:
| `SIGCHLD` is extremely unreliable in a lot of ways, `pidfd`
| is better (but Linux-specific), though it doesn't handle the
| case of wanting to be notified of all _grandchildren_'s
| terminations after the direct child dies early.
| veltas wrote:
| The environment variable isn't much better, both are akin to
| using a global var in your reentrant code, but at least
| STDDATA_FD is less likely to collide than 3.
|
| Can't wait for scripts using this variable for something
| unrelated to break when they call my scripts.
|
| This should be a parameter or argv[0]-based.
| bcrl wrote:
| That doesn't work reliably either. No existing code scrubs
| STDDATA_FD from their environment variables, and there's no way
| to know if anyone uses STDDATA_FD in the wild. Why not just use
| a command line parameter like everyone else? Different isn't
| better in a situation like this.
|
| This is a larger concern I've started to see in a certain class
| of younger developer where existing conventions are just
| ignored without an attempt at understanding why they exist.
| Things are only going to get worse as naive vibe coders start
| flinging more AI generated garbage out into the world. I pity
| the poor folks trying to maintain these systems a couple of
| decades from now.
| veltas wrote:
| That's what I really meant by saying a parameter, it should
| be an option/flag that's given explicitly at invocation, or
| just a different program name.
| kps wrote:
| Just go for `--json-output=_filename_` rather than
| playing games.
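|
| A sketch of that explicit-flag approach, assuming `getopt_long` is
| available (it is a GNU/BSD extension, not POSIX). The option name
| follows the suggestion above; `parse_json_output` is a hypothetical
| helper:

```c
#include <getopt.h>
#include <stddef.h>

/* Returns the argument of --json-output, or NULL if the flag is absent.
 * Unknown options are ignored here to keep the sketch short. */
const char *parse_json_output(int argc, char **argv)
{
    static const struct option longopts[] = {
        {"json-output", required_argument, NULL, 'j'},
        {NULL, 0, NULL, 0},
    };
    const char *path = NULL;
    int c;

    optind = 1; /* restart scanning so the function can be called again */
    opterr = 0; /* do not print diagnostics for unknown options */
    while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
        if (c == 'j')
            path = optarg;
    }
    return path;
}
```

| The request for JSON is then visible in the command line itself and
| is not passed on to child processes unless the program forwards it.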
| dodomodo wrote:
| every time I see the output of nushell I get so disappointed,
| they got the formatting so wrong, all the extra delimiters make
| it hard to actually read the data. powershell got it right, using
| alignment. if you look at virtually all shell programs until the
| last few years you are going to see a similar, alignment based
| output. only recently, with the rise of the abuse of ligature, we
| started seeing these kinds of incomprehensible blobs surrounding
| our text.
| secret-noun wrote:
| The author states they're using nushell's `markdown` table
| style because of issues with their font rendering certain
| characters. `rounded` is the default and indeed, `markdown`
| looks truly horrible in comparison.
|
| Nushell's front page [1] shows an example of rounded, and
| here's an example of an even further customized version [2].
|
| I think these are very readable. There is alignment too, but
| it's "local" alignment to cells in the same sub-table, not
| "global" to the entire table -- this is good for fitting more
| stuff into your terminal width without wrapping.
|
| A supporting font is required though, yes.
|
| [1]: https://www.nushell.sh/
|
| [2]: https://i.imgur.com/U4MnYLe.png
| dodomodo wrote:
| nushell front page is exactly what I was referring to.
| Compare the legibility of the ls command in the front page to
| a regular ls command, it's insane how much more cluttered the
| nushell version is.
| EdSchouten wrote:
| If only there was a variant of execve() / posix_spawn() that
| simply took a literal array of which file descriptors would need
| to be present in the new process. So that you can say:
|
|     int subprocess_stdin = open("/dev/null", O_RDONLY);
|     int subprocess_stdout = open("some_output", O_WRONLY);
|     // Let the subprocess use the same stderr as me.
|     int subprocess_stderr = STDERR_FILENO;
|     int subprocess_fds[] = {subprocess_stdin, subprocess_stdout,
|                             subprocess_stderr};
|     posix_spawn_with_fds("my process", [...], subprocess_fds, 3);
|
| Never understood why POSIX makes all of this so hard.
| Y_Y wrote:
| > Never understood why POSIX makes all of this so hard
|
| I honestly can't say in this particular instance but always my
| (unpopular?) instinct in such a situation is to assume there is
| a good reason and I just haven't understood it yet. It may have
| become irrelevant in the meantime, but I can't know until I
| understand, and it's served me well to give the patriarchs the
| benefit of the doubt in such cases.
| alerighi wrote:
| It's something trivial to write (~20 lines of code), there is
| no point in the standard library providing that kind of function
| in my opinion.
|
| After the fork() (or clone, on Linux) you run a for loop that
| closes every FD except the ones you want to keep. On Linux there
| is a close_range system call to close a range of FDs in one call.
|
| POSIX is an API designed to be a small layer on the operating
| system, and designed to make as few assumptions as possible
| about the underlying system. This is the reason why POSIX is
| nowadays implemented even on low resources embedded devices and
| similar stuff.
|
| At a higher level it's possible to use higher level
| abstractions to manipulate processes (e.g. a C++ library that
| does all of the above with a modern interface).
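|
| A sketch of the loop described above, assuming the caller already
| knows an upper bound for descriptor numbers. `close_fds_except` and
| its parameters are illustrative names:

```c
#include <stddef.h>
#include <unistd.h>

/* Close every descriptor in [lowfd, limit) that is not listed in keep[].
 * Returns how many descriptors were actually closed. */
int close_fds_except(int lowfd, int limit, const int *keep, size_t nkeep)
{
    int closed = 0;

    for (int fd = lowfd; fd < limit; fd++) {
        int wanted = 0;
        for (size_t i = 0; i < nkeep; i++) {
            if (keep[i] == fd) {
                wanted = 1;
                break;
            }
        }
        /* close() fails with EBADF on numbers that were never open;
         * those are simply not counted. */
        if (!wanted && close(fd) == 0)
            closed++;
    }
    return closed;
}
```

| In a child between fork() and exec this might be called with lowfd 3
| and some chosen limit. Picking that limit is the awkward part, since
| the loop issues one close() per candidate number whether or not
| anything is open there.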
| deathanatos wrote:
| ... what POSIX API gets you the open FDs? (Or even just the
| maximum open FD, and we'll just cause a bunch of errors
| closing non-existent FDs.)
| o11c wrote:
| That's `sysconf(_SC_OPEN_MAX)`, but it is _always_ a bug
| to close FDs you don't know the origin of. You should be
| specifying `O_CLOEXEC` by default if you want FDs closed
| automatically.
| deathanatos wrote:
| That won't return the maximum open file descriptor. You
| _could_ perhaps use that value in lieu of the maximum
| open file descriptor and loop through a crap ton more FDs
| than even my previous post implied, I suppose, and this
| is getting less efficient and more terribly engineered by
| the comment, which I think proves the point...
|
| > _but it is always a bug to close FDs you don't know
| the origin of._
|
| And I would agree. I'm replying to the poster above me,
| who is staking the claim that POSIX permits closing all
| open file descriptors other than a desired set.
|
| So, I suppose it can, at a cost of a few thousand
| syscalls that'll all be pointless...
| o11c wrote:
| It is _always_ a bug to call `closerange` since you never
| know if a parent process has _deliberately_ left a file
| descriptor open for some kind of tracing. If the parent does
| not want this, it _must_ use `O_CLOEXEC`. _Maybe_ if you
| clear the entire environment you'll be fine?
|
| That said, it _is_ trivial to write a loop that takes a set
| of _known_ old and new fd numbers (including e.g. swapping) and
| produces a set of calls to `dup2` and `fcntl` to give them
| the new numbers, while correctly leaving all open fds open.
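|
| One possible shape for such a loop, as a sketch rather than the
| commenter's own code. It assumes at most 16 moves and parks every
| source on a temporary number above all targets first, so swaps and
| cycles work. `struct fd_move` and `remap_fds` are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L /* for F_DUPFD_CLOEXEC */
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_MOVES 16

struct fd_move {
    int from; /* descriptor number as it is now */
    int to;   /* number it should have afterwards */
};

/* Make moves[i].to refer to what moves[i].from refers to now.
 * Descriptors not named as a target are left open and untouched;
 * a source that is not also a target stays open under its old number.
 * Returns 0 on success, -1 on error. */
int remap_fds(const struct fd_move *moves, size_t n)
{
    int tmp[MAX_MOVES];
    int base = 0;
    int rc = 0;
    size_t i;

    if (n > MAX_MOVES)
        return -1;

    /* Temporaries must sit above every target so phase 2 cannot hit them. */
    for (i = 0; i < n; i++) {
        if (moves[i].to >= base)
            base = moves[i].to + 1;
    }

    /* Phase 1: park a copy of every source on a free number >= base. */
    for (i = 0; i < n; i++) {
        tmp[i] = fcntl(moves[i].from, F_DUPFD_CLOEXEC, base);
        if (tmp[i] == -1) {
            while (i-- > 0)
                close(tmp[i]);
            return -1;
        }
    }

    /* Phase 2: install each copy on its target and drop the temporary.
     * dup2() clears close-on-exec on the target. */
    for (i = 0; i < n; i++) {
        if (dup2(tmp[i], moves[i].to) == -1)
            rc = -1;
        close(tmp[i]);
    }
    return rc;
}
```

| A swap is then just two entries, `{a, b}` and `{b, a}`. Whatever was
| previously open on a target number is replaced, so the caller still
| has to know what lives at every number it names.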
| oguz-ismail wrote:
| It's not hard, just a bit too long:
|
|     #include <fcntl.h>
|     #include <spawn.h>
|
|     int main(void) {
|         posix_spawn_file_actions_t file_actions;
|         posix_spawn_file_actions_init(&file_actions);
|         posix_spawn_file_actions_addopen(&file_actions, 0, "/dev/null",
|                                          O_RDONLY, 0);
|         posix_spawn_file_actions_addopen(&file_actions, 2, "/dev/null",
|                                          O_WRONLY, 0);
|         posix_spawnp(NULL, "ls", &file_actions, NULL,
|                      (char *[]){"ls", "-l", "/proc/self/fd", NULL}, NULL);
|         posix_spawn_file_actions_destroy(&file_actions);
|     }
| js8 wrote:
| Why not use a saner protocol than JSON, e.g. CBOR?
| totallymike wrote:
| Is CBOR as popularly supported as JSON?
|
| Also, to answer your question with a guess, I would suppose
| it's because they wanted to use JSON and they wrote the
| feature.
| thechao wrote:
| I do a _lot_ of very low level programming with awful
| performance-maintenance trade-offs. Here's a great trick for
| a "binary" JSON: remove all of the extra whitespace,
| normalize your numbers, and then LZ4 the resulting string.
|
| UTF-8 is already a great wire format.
|
| I've never found a "binary JSON" that's significantly better
| than this; I mean you _can_ beat it, but you need awkward
| encodings (prefix indices & other weird shit). You end up
| burning nearly a byte for any particularly clever integer
| encoding.
|
| Most data structures are just nested arrays of integers. If
| you need an integer keyed OBJECT you're SOL, but I just play
| fiddly games with astral plane UTF-8 characters. (Yeah yeah
| yeah _ad hoc_ encodings are nasty news.)
|
| If you've got a BUTT LOAD of data just fire up a compressing
| SQLite DB like a normal human.
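|
| A sketch of just the first step of that trick, whitespace removal,
| assuming the input is already valid JSON. Number normalization and
| the LZ4 pass (an external library) are left out, and `json_minify`
| is a made-up name:

```c
#include <stddef.h>

/* Copy src to dst, dropping whitespace that sits outside string
 * literals. dst needs room for strlen(src) + 1 bytes.
 * Returns the length of the minified text. */
size_t json_minify(const char *src, char *dst)
{
    size_t out = 0;
    int in_string = 0; /* currently inside a "..." literal */
    int escaped = 0;   /* previous character was a backslash */

    for (; *src != '\0'; src++) {
        char c = *src;

        if (in_string) {
            dst[out++] = c; /* keep everything inside strings */
            if (escaped)
                escaped = 0;
            else if (c == '\\')
                escaped = 1;
            else if (c == '"')
                in_string = 0;
        } else if (c == '"') {
            in_string = 1;
            dst[out++] = c;
        } else if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
            dst[out++] = c;
        }
    }
    dst[out] = '\0';
    return out;
}
```

| The escape tracking matters: without it, a `\"` inside a string
| would end the literal early and the spaces after it would be eaten.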
| js8 wrote:
| If you're interested in performance, what about all the
| number conversion (to decimals, presumably) that is
| incurred with JSON?
| thechao wrote:
| If I'm interested in performance I'll build my data out
| of offset handles and lay everything down into a block
| and mmap() it around. That's parsing free, up to an
| htons() -- but that's only a worst case scenario.
| Everything else is about not inventing something custom &
| being able to use easily vendored high-trust 3rd party
| tools. (In this case: a JSON library, LZ4, and/or
| SQLite.)
| tonyarkles wrote:
| Do you have a CBOR implementation that you like? Ideally one
| with decent schema support? I was looking into CBOR as a
| replacement for Protobufs for an embedded system I work on and
| it's got a lot going for it but every implementation I looked
| at seemed to support a very different subset of the schema spec
| and it was brutal to try to find a pair of libraries (C for the
| embedded side, C++ for the host side) that could actually share
| a set of schema files.
| crabbone wrote:
| The company I work for is guilty of abusing 3. We use it for
| debug output of user-supplied scripts that are meant to implement
| monitoring / metrics :'(
|
| This is the first time I've heard about stddata though. Is this a
| thing that's going into a standard? Is there one already? Or is it
| just a name someone gave to it and it's not a real thing?
| fvwmuser wrote:
| I wouldn't have said this is anything new.
|
| FreeBSD has libxo[0] integrated into some of its tools:
|
| [0] https://github.com/Juniper/libxo
| theamk wrote:
| Except they went with --libxo command-line option, which is
| extremely unlikely to cause any problems in the existing
| scripts.
| krick wrote:
| Can somebody explain what's going on here? It seems I'm missing
| some important piece of background info. Why don't they just add
| -J flag for everyone who wants to output JSON? Oh, wait, tree
| already has -J flag to output JSON. So WTF are they doing here?
|
| I am especially confused by this:
|
| > Surely, nothing will happen if I just assume that the existence
| of a specific file descriptor implies something, as nobody is
| crazy or stupid enough to hardcode such a thing?
|
| Wait, what? But "you" (tree authors) just hardcoded such a thing.
| Do "you" have some special permission to do this nonsense?
| cryptonector wrote:
| Sorry, but this is going to be very dangerous because much code
| will close unwanted FDs then open others. It's 50 years too late
| to add this convention.
|
| Instead maybe we need new system calls that return dups of a
| hidden stddata FD or create/replace it.
___________________________________________________________________
(page generated 2025-07-25 23:02 UTC)