[HN Gopher] JC converts the output of popular command-line tools...
       ___________________________________________________________________
        
       JC converts the output of popular command-line tools to JSON
        
       Author : tosh
       Score  : 288 points
       Date   : 2023-12-08 14:12 UTC (8 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | calvinmorrison wrote:
       | In a certain sense, files everywhere is great; that's the
       | promise of unix, or plan9 to an even greater extent.
       | 
       | However, unstructured files, or files that each have their
       | own format, are equally hampering. Trying to parse even an
       | nginx log file can be annoying with just awk or some such.
       | 
       | One of the big disadvantages is that large system rewrites and
       | design changes cannot be executed in the linux userland.
       | 
       | All to say, I'd love a smarter shell, I love files, I have my awk
       | book sitting next to me, but I think it's high time to get some
       | serious improvements on parsing data.
       | 
       | In the same way programs are smart enough to know whether to
       | render colored output, I'd love it if they could dump
       | structured output (or not).
        
         | mikepurvis wrote:
         | When it comes to parsing server logs, it's too bad the
         | functionality can't be extracted out of something like
         | logstash, since that's already basically doing the same thing.
         | 
         | Though I guess the real endgame here is for upstream tools to
         | eventually recognize the value and learn how to directly supply
         | structured output.
        
           | dale_glass wrote:
           | You can get that out of journald.
           | journalctl -o json
           | 
           | And applications using journald directly can provide their
           | own custom fields.
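
The `journalctl -o json` output mentioned above is JSON Lines: one object per line, using well-known journald field names such as `PRIORITY`, `_SYSTEMD_UNIT`, and `MESSAGE`. A minimal sketch of consuming it, using a hard-coded sample rather than a live journal:

```python
import json

# Two sample journal records, as journalctl -o json would emit them
# (one JSON object per line).
sample = (
    '{"PRIORITY": "6", "_SYSTEMD_UNIT": "sshd.service", '
    '"MESSAGE": "Accepted publickey for root"}\n'
    '{"PRIORITY": "3", "_SYSTEMD_UNIT": "nginx.service", '
    '"MESSAGE": "worker process exited"}\n'
)

def errors_by_unit(jsonl, max_priority=3):
    """Collect MESSAGEs at or above a syslog severity, keyed by unit."""
    out = {}
    for line in jsonl.splitlines():
        record = json.loads(line)
        # Syslog priorities: lower number means more severe (3 = err).
        if int(record["PRIORITY"]) <= max_priority:
            out.setdefault(record["_SYSTEMD_UNIT"], []).append(record["MESSAGE"])
    return out

print(errors_by_unit(sample))
```

In practice you would pipe `journalctl -o json` into a script like this (or use the `systemd.journal` Python bindings directly).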
        
         | hoherd wrote:
         | > In the same way programs are smart enough to know whether
         | to render colored output, I'd love it if they could dump
         | structured output (or not)
         | 
         | The even lower-hanging fruit is to implement JSON output as
         | a command-line argument in all CLI tools. I would love to
         | see this done for the GNU coreutils.
        
           | teddyh wrote:
           | At least "du", "env", and "printenv" in Coreutils all support
           | the "--null" option to separate output into being NUL-
           | separated instead of being text lines.
        
           | bryanlarsen wrote:
           | It was a really pleasant surprise to find the "-j" option to
           | do this for the "ip" command from the iproute2 project.
        
           | mr_mitm wrote:
           | AFAIK the idea is that if you need that kind of
           | interoperability in unix/gnu, you're supposed to write your
           | tool in C and include some libraries. Clearly that's not
           | realistic in many use cases.
        
         | dale_glass wrote:
         | Yup. It really grinds my gears that people came up with fairly
         | decent ideas half a century ago, and a large amount of people
         | decided to take that as gospel rather than as something to
         | improve on.
         | 
         | And it's like pulling teeth to get any improvement, because the
         | moment somebody like Lennart tries to get rid of decades of old
         | cruft, drama erupts.
         | 
         | And even JSON is still not quite there. JSON is an okay-ish
         | idea, but to do this properly what we need is a format that can
         | expose things like datatypes. More like PowerShell. So that we
         | can do amazing feats like treating a number like a number, and
         | calculating differences between dates by doing $a - $b.
        
           | CyberDildonics wrote:
           | If you want something more complex and restrictive you could
           | easily make it out of JSON. JSON works because it is simple
           | and isn't being constantly distorted into something more
           | complicated to cover niche use cases.
        
         | da_chicken wrote:
         | There is always Powershell. The trouble there is that it's so
         | rooted in .Net and objects that it's very difficult to
         | integrate with existing native commands on any platform.
        
           | uxp8u61q wrote:
           | If these "native commands" had a sensible output format,
           | integrating them with powershell would be as simple as
           | putting ConvertFrom-Json or ConvertFrom-Csv in the middle of
           | the pipeline. And let's be real: it's no worse "integrated"
           | with "native commands" than bash or zsh is.
        
             | da_chicken wrote:
             | You're not wrong, but the fix is then, "hey, let's extend
             | the functionality of literally every *nix command". Which
             | is hard to achieve. Building sensible serialized object-
             | oriented output is the same problem as building sensible
             | object-oriented output in the first place. That's why jc
             | exists, I suppose.
             | 
             | There is always ConvertFrom-String. The problem is that it
             | is one of the least intuitive and worst-performing commands
             | I've used in Powershell. It's awful and I hate it. It's
             | like writing sed and awk commands without the benefit of
             | sed and awk's maturity and ubiquity. IMX, only Compare-
             | Object has been worse.
        
               | uxp8u61q wrote:
               | > "hey, let's extend the functionality of literally every
               | *nix command". Which is hard to achieve.
               | 
               | It's pretty much what powershell did, though.
        
               | imtringued wrote:
               | It sounds like there needs to be a monolithic solution
               | built from the ground up for interoperation.
               | 
               | Updating coreutils is a losing game at this point. Of
               | course people will get angry when a well-designed
               | solution gradually takes over simply because it is
               | better.
        
               | da_chicken wrote:
               | Eh, not really. Powershell itself is fairly limited in
               | terms of functionality. It's basically one step removed
               | from a .Net REPL, and the .Net classes aren't necessarily
               | written to do *nix admin tasks. It gives you all of .Net
               | to build tools with, but sometimes you still run into
               | Microsoftisms that are total nonsense. There's a reason
               | every C# project was using Newtonsoft's JSON.Net instead
               | of the wonky custom data representations that MS was
               | trying to push left over from the
               | embrace/extend/extinguish era.
        
               | uxp8u61q wrote:
               | I see a lot of buzzwords and attacks in this comment but
               | nothing actually concrete.
        
         | mistercow wrote:
         | Part of the problem is that the output of commands is both a UI
         | and an API, and because any text UI _can_ be used as an API,
         | the human readable text gets priority. Shell scripting is
         | therefore kind of like building third party browser extensions.
         | You look and you guess, and then you hack some parser up based
         | on your guess, and hope for the best.
         | 
         | I actually wish there was just a third standard output for
         | machine readable content, which your terminal doesn't print by
         | default. When you pipe, this output is what gets piped (unless
         | you redirect), it's expected to be jsonl, and the man page is
         | expected to specify a contract. Then stdout can be for humans,
         | and while you _can_ parse it, you know what you're doing is
         | fragile.
         | 
         | Of course, that's totally backwards incompatible, and as long
         | as we're unrealistically reinventing CLIs from the foundations
         | to modernize them, I have a long list of changes I'd make.
        
           | theblazehen wrote:
           | Have you seen some of the existing projects currently working
           | on it? The best known is https://www.nushell.sh/, amongst
           | some others
        
           | chongli wrote:
           | I really want to agree because it seems to make so much sense
           | in theory. It gets rid of the need to parse by standardizing
           | on syntax. But it doesn't solve everything, so we still don't
           | get to live the dream. Namely, it does not solve the issue of
           | schemas, versioning, and migration.
           | 
           | And this is a really big issue that threatens to derail the
           | whole project. If my script runs a pipeline `a | b | c` and
           | utility b gets updated with a breaking change to the schema,
           | it breaks the entire script. Now I've got to go deep into the
           | weeds to figure out what happened, and the breakage might not
           | be visible in the human-readable output. So to debug I'll
           | have to pass the flag to each tool to get it to print all the
           | json to stdout, and then sit there eyeballing the json to
           | figure out how the schema changed and what I need to do to
           | fix my script.
           | 
           | Seems like a big mess to me. Unless there's something I'm
           | missing?
        
           | Izkata wrote:
           | > I actually wish there was just a third standard output for
           | machine readable content, which your terminal doesn't print
           | by default. When you pipe, this output is what gets piped
           | (unless you redirect), it's expected to be jsonl, and the man
           | page is expected to specify a contract.
           | 
           | Except for that last jsonl part, various commands already do
           | something like this with stdout by detecting what's on the
           | other end.
           | 
           | https://unix.stackexchange.com/questions/515778/how-does-
           | a-p...
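
That "detect what's on the other end" trick can be sketched in a few lines: check `isatty()` on stdout and switch between a human-readable table and JSON Lines. A minimal illustration (the rows are hard-coded, not from a real tool):

```python
import json
import sys

def render(rows, to_tty):
    """Format (name, size) rows for a terminal or for a pipe."""
    if to_tty:
        # Human-readable: aligned columns.
        return [f"{name:<12} {size:>8}" for name, size in rows]
    # Machine-readable: one JSON object per line (JSON Lines).
    return [json.dumps({"name": name, "size": size}) for name, size in rows]

rows = [("notes.txt", 1204), ("photo.jpg", 882311)]
for line in render(rows, sys.stdout.isatty()):
    print(line)
```

Run interactively it prints columns; piped through `| jq .` it emits JSON, which is exactly how tools like `ls` decide whether to colorize.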
        
         | numbsafari wrote:
         | > Trying to even parse an nginx log file can be annoying with
         | just awk or some such.
         | 
         | You probably already know this, but for those who do not, you
         | can configure nginx to generate JSON log output.
         | 
         | Quite handy if you are aggregating structured logs across your
         | stack.
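
For reference, a sketch of such an nginx config (the format name and field list are illustrative; `escape=json` requires nginx 1.11.8 or later):

```nginx
# Illustrative only: pick whichever fields you actually need.
log_format json_combined escape=json
  '{'
    '"time":"$time_iso8601",'
    '"remote_addr":"$remote_addr",'
    '"request":"$request",'
    '"status":"$status",'
    '"body_bytes_sent":"$body_bytes_sent"'
  '}';

access_log /var/log/nginx/access.json json_combined;
```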
        
         | imtringued wrote:
         | The problem is that nobody has built an actual FFI solution,
         | except maybe the GObject guys. C isn't an FFI, because you
         | need a C compiler to make it work. By that I mean it is not
         | an interface, but rather just C code whose calling part has
         | been embedded into your application.
        
       | pushedx wrote:
       | I salute whoever chooses to maintain this
        
         | alex_suzuki wrote:
         | I wonder how they will address versions...
         | 
         | `aws s3 ls | jc --aws=1.2.3`
         | 
         | What a nightmare.
        
           | dj_mc_merlin wrote:
           | What about
           | 
           | jc 'aws sts get-caller-identity' | jq [..]
           | 
           | That way the aws process can be a subprocess of jc, which can
           | read where the binary is and get its version automatically.
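
A hypothetical sketch of that wrapper idea: run the target as a subprocess, locate the binary with `shutil.which`, and ask it for its version before choosing a parser. The Python interpreter stands in for the wrapped command so the snippet is self-contained; `jc` itself does not necessarily work this way.

```python
import shutil
import subprocess
import sys

def describe_binary(cmd):
    """Locate a command and query its version (assumes a --version flag)."""
    path = shutil.which(cmd)
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some tools print their version to stderr instead of stdout.
    version = (result.stdout or result.stderr).strip()
    return {"command": cmd, "path": path, "version": version}

# Use the running Python interpreter as a stand-in for e.g. `aws`.
info = describe_binary(sys.executable)
print(info)
```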
        
             | jasonjayr wrote:
             | That can get thorny, because it adds another level of
             | shell-quoting/escapes, and that is a notorious vector for
             | security problems.
        
             | sesm wrote:
             | jc already has this, see 'jc dig example.com' in examples.
             | They call it 'alternative magic syntax', but IMO it should
             | be the primary syntax, while piping and second-guessing
             | the previous command's parameters and version should be
             | used only in exceptional cases.
        
           | dtech wrote:
           | Aws cli isn't the best example because it supports outputting
           | json natively.
           | 
           | I'd expect this to not be a huge problem in practice because
           | this is mostly for those well established unix cli tools of
           | which the output has mostly ossified anyway. Many modern and
           | frequently updated tools support native JSON output.
        
             | hnlmorg wrote:
             | > Aws cli isn't the best example because it supports
             | outputting json natively.
             | 
             | The s3 sub command annoyingly doesn't. Which I'm guessing
             | is the reason the GP used that specifically.
        
               | cwilkes wrote:
               | Use "aws s3api list-objects-v2"
               | 
               | https://docs.aws.amazon.com/cli/latest/reference/s3api/
        
               | hnlmorg wrote:
               | You can, but it's not nearly as nice to use. For starters
               | you have to manage pagination yourself.
        
         | CoastalCoder wrote:
         | Good point. This reminds me of the Linux (Unix?) "file"
         | program, and whichever hero(es) maintain it.
        
         | amelius wrote:
         | I salute whoever chooses to use this and runs into the
         | assumptions made by this tool that turn out to be wrong.
        
         | sesm wrote:
         | In theory, if it could load something like 'plugins' (for
         | example as separate shell commands) some of the maintenance
         | effort could be offloaded to 'plugin' authors.
        
       | PreInternet01 wrote:
       | Oh, this is cool. I'm a huge proponent of CLI tools supporting
       | sensible JSON output, and things like
       | https://github.com/WireGuard/wireguard-tools/blob/master/con...
       | and PowerShell's |ConvertTo-Json are a huge part of my
       | management/monitoring automation efforts.
       | 
       | But, unfortunately, _sensible_ is doing some heavy lifting here
       | and reality is... well, reality. While the output of things like
       | the LSI/Broadcom StorCLI 'suffix the command with J' approach
       | and some of PowerShell's COM-hiding wrappers (which are
       | depressingly common) is _technically_ JSON, the end result is so
       | mindbogglingly complex-slash-useless that you're quickly forced
       | to revert to 'OK, just run some regexes on the plain-text output'
       | kludges anyway.
       | 
       | Having said that, I'll definitely check this out. If the first
       | example given, parsing dig output, is indeed representative of
       | what this can _reliably_ do, it should be interesting...
        
       | zubairq wrote:
       | Simple idea, really great to see this!
        
       | kbknapp wrote:
       | Really cool idea but this gives me anxiety just thinking about
       | how it has to be maintained. Taking into account versions,
       | command flags changing output, etc. all seems like a nightmare to
       | maintain to the point where I'm assuming actual usage of this
       | will work great for a few cases but quickly lose its novelty
       | beyond basic cases. Not to mention using `--<CMD>` for the tool
       | seems like a poor choice as your help/manpage will end up being
       | thousands of lines long because each new parser will require a
       | new flag.
        
         | verdverm wrote:
         | This is one of the better use cases for LLMs, which have shown
         | good capability at turning unstructured text into structured
         | objects
        
           | ninkendo wrote:
           | If LLMs were local and cheap, sure. They're just too
           | heavyweight of a tool to use for simple CLI output
           | manipulation today. I don't want to send everything to the
           | cloud (and pay a fee), and even if it was a local LLM, I
           | don't want it to eat all my RAM and battery to do simple text
           | manipulation.
           | 
           | In 20 years, assuming some semblance of Moore's law still
           | holds for storage/RAM/gpu, I'm right there with you.
        
             | d3nj4l wrote:
             | On my M1 Pro/16GB RAM mac I get decently fast, fully local
             | LLMs which are good enough to do this sort of thing. I use
             | them in scripts all the time. Granted, I haven't checked
             | the impact on the battery life I get, but I definitely
             | haven't noticed any differences in my regular use.
        
               | mosselman wrote:
               | Which models do you run and how?
        
               | _joel wrote:
               | not op, but this is handy https://lmstudio.ai/
        
               | verdverm wrote:
               | https://github.com/ggerganov/llama.cpp is a popular local
               | first approach. LLaMa is a good place to start, though I
               | typically use a model from Vertex AI via API
        
             | chongli wrote:
             | Yeah, it would be much better if you could send a sample of
             | the input and desired output and have the LLM write a
             | highly optimized shell script for you, which you could then
             | run locally on your multi-gigabyte log files or whatever.
        
           | himinlomax wrote:
           | Problem: some crusty old tty command has dodgy output.
           | 
           | Solution: throw a high-end GPU with 24GB of RAM and a
           | million dollars of training at it.
           | 
           | Yeah, great solution.
        
             | verdverm wrote:
             | With fine-tuning, you can get really good results on
             | specific tasks that can run on regular cpu/mem. I'd suggest
             | looking into the distillation research, where large model
             | expertise can be transferred to much smaller models.
             | 
             | Also, an LLM trained to be good at this task has many more
             | applications than just turning command output into
             | structured data. It's actually one of the most compelling
             | business use cases for LLMs
        
               | anonymous_sorry wrote:
               | The complaint is less whether it would work, and more a
               | question of taste. Obviously taste can be a personal
               | thing. My opinions are my own and not those of the BBC,
               | etc.
               | 
               | You have a small C program that processes this data in
               | memory, and dumps it to stdout in tabular text format.
               | 
               | Rather than simplify by stripping out the problematic bit
               | (the text output), you suggest adding a large, cutting-
               | edge, hard to inspect and verify piece of technology that
               | transforms that text through uncountable floating point
               | operations back into differently-formatted UTF8.
               | 
               | It might even work consistently (without you ever having
               | 100% confidence it won't hallucinate at precisely the
               | wrong moment).
               | 
               | You can certainly see it being justified for one-off
               | tasks that aren't worth automating.
               | 
               | But to shove such byzantine inefficiency and complexity
               | into an engineered system (rather than just modify the
               | original program to give the format you want) offends my
               | engineering sensibilities.
               | 
               | Maybe I'm just getting old!
        
               | verdverm wrote:
               | If you can modify the original program, then that is by
               | far the best way to go. More often than not, you cannot
               | change the program, and in relation to the broader
               | applicability, most unstructured content is not produced
               | by programs.
        
               | anonymous_sorry wrote:
               | Yes, makes sense. Although this was originally a post
               | about output of common command-line tools. Some of these
               | are built on C libraries that you can just use directly.
               | They are usually open source.
        
           | hnlmorg wrote:
           | As someone who maintains a solution that solves similar
           | problems to jc, I can assure you that you don't need a LLM to
           | parse most human readable output.
        
             | verdverm wrote:
             | it's more about the maintenance cost, you don't have to
             | write N parsers for M versions
             | 
             | Maybe the best middle ground is to have an LLM write the
             | parser. Lower development cost and better runtime
             | performance, in theory.
        
               | hnlmorg wrote:
               | You don't have to write dozens of parsers. I didn't.
        
               | verdverm wrote:
               | Part of the appeal is that people who don't know how to
               | program or write parsers can use an LLM to solve their
               | unstructured -> structured problem
        
               | hnlmorg wrote:
               | Sure. But we weren't talking about non-programmers
               | maintaining software.
        
               | verdverm wrote:
               | > people who don't know how to program _OR_ write parsers
               | 
               | there are plenty of programmers who do not know how to
               | write lexers, parsers, and grammars
        
               | hnlmorg wrote:
               | We are chatting about maintaining a software project
               | written in a programming language, not some
               | theoretical strawman argument you've just dreamt up
               | because others have rightly pointed out that you don't
               | need a LLM to parse the output of a 20KB command line
               | program.
               | 
               | As I said before, I maintain a project like this. I also
               | happen to work for a company that specialises in the use
               | of generative AI. So I'm well aware of the power of LLMs
               | as well as the problems of this very specific domain. The
               | ideas you've expressed here are, at best, optimistic.
               | 
               | By the time you've solved all the little quirks of ML,
               | you'll have likely invested far more time in your LLM
               | than you would have if you'd just written a simple
               | parser and, ironically, needed someone far more
               | specialised to write the LLM than your average
               | developer.
               | 
               | This simply isn't a problem that needs a LLM chucked at
               | it.
               | 
               | You don't even need to write lexers and grammars to parse
               | 99% of application output. Again, I know this because
               | I've written such software.
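
As a concrete illustration of that claim, a lot of columnar CLI output yields to nothing more than a header split and `str.split()`. A sketch against a hard-coded `df`-style sample (not output from a real run):

```python
# Hard-coded sample mimicking `df -h`-style columnar output.
sample = """\
Filesystem  Size  Used  Avail  Use%  Mounted
/dev/sda1   50G   20G   30G    40%   /
tmpfs       7.8G  0     7.8G   0%    /dev/shm
"""

def parse_table(text):
    """Parse whitespace-separated columns using the header row as keys."""
    lines = text.strip().splitlines()
    headers = lines[0].split()
    return [dict(zip(headers, line.split())) for line in lines[1:]]

rows = parse_table(sample)
print(rows)
```

The naive split breaks on values containing spaces (real `df` uses the two-word header "Mounted on", for instance), so per-command fix-ups are still needed; the point is only that a full lexer or grammar rarely is.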
        
               | tovej wrote:
               | this is a terrible idea, I can't think of a less
               | efficient method with worse correctness guarantees. What
               | invariants does the LLM enforce? How do you make sure it
               | always does the right thing? How do you debug it when it
               | fails? What kind of error messages will you get? How
               | will it react to bad inputs? Will it detect them
               | (unlikely), or will it hallucinate an interpretation
               | (most likely)?
               | 
               | This is not a serious suggestion
        
               | verdverm wrote:
               | I used to focus on the potential pitfalls and be overly
               | negative. I've come to see that these tradeoffs are
               | situational. After using them myself, I can definitely
               | see upsides that outweigh the downsides
               | 
               | Developers make mistakes too, so there are no guarantees
               | either way. Each of your questions can be asked of
               | handwritten code too
        
               | smrq wrote:
               | You can ask those questions, but you won't get the same
               | answers.
               | 
               | It's not a question of "is the output always correct".
               | Nothing is so binary in the real world. A well hand-
               | maintained solution will trend further towards
               | correctness as bugs are caught, reported, fixed,
               | regression tested, etc.
               | 
               | Conversely, you could parse an IP address by rolling
               | 4d256 and praying. It, too, will sometimes be correct and
               | sometimes be incorrect. Does that make it an equally
               | valid solution?
        
               | SwedishExpat wrote:
               | You can put the LLM into the code loop so it prompts,
               | tries the response on test cases, and gives it back until
               | it works. This is surprisingly reliable for simple
               | problems like JSON'ifying structured text.
        
           | keithalewis wrote:
           | Give a kid a hammer and he'll find something to fix.
        
             | verdverm wrote:
             | What value does this comment add?
        
               | otteromkram wrote:
               | I got a kick out of it.
               | 
               | ¯\_(ツ)_/¯
        
               | keithalewis wrote:
               | Approximately the same amount as the comment I replied
               | to.
        
               | verdverm wrote:
               | One attempts to nudge a user towards the comment
               | guidelines of HN
               | (https://news.ycombinator.com/newsguidelines.html)
               | 
               | > Be kind. Don't be snarky. Converse curiously; don't
               | cross-examine. Edit out swipes.
               | 
               | > Comments should get more thoughtful and substantive,
               | not less, as a topic gets more divisive.
               | 
               | > Eschew flamebait. Avoid generic tangents. Omit internet
               | tropes.
        
               | leptons wrote:
               | The old saying "If a hammer is your only tool then
               | everything is a nail" is absolutely pertinent to this
               | comment thread.
        
               | verdverm wrote:
               | how so? what assumptions are you making to reach that
               | conclusion?
        
         | cproctor wrote:
         | Would it be fair to think about this as a shim whose scope of
         | responsibility will (hopefully) shrink over time, as command
         | line utilities increasingly support JSON output? Once a utility
         | commits to handling JSON export on its own, this tool can
         | delegate to that functionality going forward.
        
           | dan_quixote wrote:
           | I'd also assume that a CLI resisting JSON support is likely
           | to have a very stable interface. Maybe wishful thinking...
        
           | pydry wrote:
           | It would but I can still see somebody launching this with
           | great enthusiasm and then losing the passion to fix Yet
           | Another Parsing Bug introduced in a new version of dig.
        
             | kbrazil wrote:
             | `jc` author here. I've been maintaining `jc` for nearly
             | four years now. Most of the maintenance is choosing which
             | new parsers to include. Old parsers don't seem to have too
             | many problems (see the Github issues) and bugs are
             | typically just corner cases that can be quickly addressed
             | along with added tests. In fact there is a plugin
             | architecture that allows users to get a quick fix so they
             | don't need to wait for the next release for the fix. In
             | practice it has worked out pretty well.
             | 
             | Most of the commands are pretty old and do not change
             | anymore. Many parsers are not even commands but standard
             | filetypes (YAML, CSV, XML, INI, X509 certs, JWT, etc.) and
             | string types (IP addresses, URLs, email addresses,
             | datetimes, etc.) which don't change or use standard
             | libraries to parse.
             | 
             | Additionally, I get a lot of support from the community.
             | Many new parsers are written and maintained by others,
             | which spreads the load and accelerates development.
        
         | majkinetor wrote:
         | This requires collaboration: people submitting parsers for
         | the tools they need, and the people who use them keeping
         | those parsers up to date. That is the only way.
        
         | eichin wrote:
         | I'm sort of torn - yeah, one well-maintained "basket" beats
         | having a bunch of ad-hoc output parsers all over the place, but
         | I want direct json output because I'm doing something
         | complicated and don't want parsing to add to the problem. (I
         | suppose the right way to get comfortable with using this is to
         | just make sure to submit PRs with additional test cases for
         | everything I want to use it with, since I'd have to write those
         | tests anyway...)
        
         | zackmorris wrote:
         | Keep in mind that the maintenance responsibility you're anxious
         | about is currently a cost imposed on all developers.
         | 
         | <rant>
         | 
         | Since I started programming in the 80s, I've noticed a trend
         | where most software has adopted the Unix philosophy of "write
         | programs that do one thing and do it well". Which is cool and
         | everything, but has created an open source ecosystem of rugged
         | individualism where the proceeds to the winners so vastly
         | exceed the crumbs left over for workers that there is no
         | ecosystem to speak of, just exploitation. Reflected now in the
         | wider economy.
         | 
         | But curated open source solutions like jc approach problems at
         | the systems level so that the contributions of an individual
         | become available to society in a way that might be called
         | "getting real work done". Because they prevent that unnecessary
         | effort being repeated 1000 or 1 million times by others. Which
         | feels alien in our current task-focussed reality where most
         | developers never really escape maintenance minutia.
         | 
         | So I'm all in favor of this inversion from "me" to "we". I also
         | feel that open source is the tech analog of socialism. We just
         | have it exactly backwards right now, that everyone has the
         | freedom to contribute, but only a select few reap the rewards
         | of those contributions.
         | 
         | We can imagine what a better system might look like, as it
         | would start with UBI. And we can start to think about
         | delivering software resources by rail instead of the labor of
         | countless individual developer "truck drivers". Some low-
         | hanging fruit might be: maybe we need distributions that
         | provide everything and the kitchen sink, then we run our
         | software and a compiler strips out the unused code, rather than
         | relying on luck to decide what we need before we've started
         | like with Arch or Nix. We could explore demand-side economics,
         | where humans no longer install software, but dependencies are
         | met internally on the fly, so no more include paths or headers
         | to babysit (how early programming languages worked before C++
         | imposed headers onto us). We could use declarative programming
         | more instead of brittle imperative (hard coded) techniques. We
         | could filter data through stateless self-contained code modules
         | communicating via FIFO streams like Unix executables. We could
         | use more #nocode approaches borrowed from FileMaker, MS Access
         | and Airtable (or something like it). We could write software
         | from least-privileges, asking the user for permission to access
         | files or the networks outside the module's memory space, and
         | then curate known-good permissions policies instead of
         | reinventing the wheel for every single program. We could (will)
         | write test-driven software where we design the spec as a series
         | of tests and then AI writes the business logic until all of the
         | tests pass.
         | 
         | There's a lot to unpack here and a wealth of experience
         | available from older developers. But I sympathize with the
         | cognitive dissonance, as that's how I feel every single day
         | witnessing the frantic yak shaving of "modern" programming
         | while having to suppress my desire to use these other proven
         | techniques. Because there's simply no time to do so under the
         | current status quo where FAANG has quite literally all of the
         | trillions of dollars as the winners, so decides best practices
         | while the open source community subsists on scraps in their
         | parent's basement hoping to make rent someday.
        
       | nickster wrote:
       | All output being an object is one of my favorite things about
       | powershell. I miss it when I have to write a bash script.
        
       | timetraveller26 wrote:
       | Does anybody know of a list of modern unix command-line tools
       | accepting a --json option?
       | 
       | It may even be useful to add that information to this repo.
        
         | chungy wrote:
         | Basically everything on FreeBSD supports it via libxo.
        
         | user3939382 wrote:
         | Probably not what you had in mind but, AWS CLI.
        
         | pirates wrote:
         | kubectl with "-o json"
        
         | geraldcombs wrote:
         | TShark (the CLI companion to Wireshark) does with the `-T json`
         | flag.
        
       | Mister_Snuggles wrote:
       | In FreeBSD, this problem was solved with libxo[0]:
       | $ ps --libxo=json | jq
       | {
       |   "process-information": {
       |     "process": [
       |       {
       |         "pid": "41389",
       |         "terminal-name": "0 ",
       |         "state": "Is",
       |         "cpu-time": "0:00.01",
       |         "command": "-bash (bash)"
       |       },
       | [...]
       | 
       | It's not perfect though. ls had support, but it was removed for
       | reasons[1]. It's not supported by all of the utilities, etc.
       | 
       | This seems to be a great stop-gap with parsers for a LOT of
       | different commands, but it relies on parsing text output that's
       | not necessarily designed to be parsed. It would be nice if
       | utilities coalesced around a common flag to emit structured
       | output.
       | 
       | In PowerShell, structured output is the default and it seems to
       | work very well. This is probably too far for Unix/Linux, but a
       | standard "--json" flag would go a long way to getting the same
       | benefits.
       | 
       | [0] https://wiki.freebsd.org/LibXo
       | 
       | [1] https://reviews.freebsd.org/D13959
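A toy sketch (hypothetical, not jc's actual implementation) of the kind of header-driven parsing such a stop-gap tool centralizes, using a hardcoded `ps`-style sample in which only the trailing COMMAND column may contain whitespace:

```python
import json

# Header-driven, whitespace-delimited output where only the last
# column (COMMAND) may itself contain embedded whitespace.
SAMPLE = """\
  PID TT  STAT      TIME COMMAND
41389  0  Is     0:00.01 -bash (bash)
41412  0  R+     0:00.00 ps aux
"""

def parse_ps(text):
    lines = text.splitlines()
    headers = [h.lower() for h in lines[0].split()]
    rows = []
    for line in lines[1:]:
        # Split into at most len(headers) fields so the trailing
        # command keeps its internal spaces.
        fields = line.split(None, len(headers) - 1)
        rows.append(dict(zip(headers, fields)))
    return rows

print(json.dumps(parse_ps(SAMPLE), indent=2))
```

The greedy last column is exactly the kind of per-command quirk that makes ad-hoc awk pipelines fragile and a shared parser library attractive.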
        
         | supriyo-biswas wrote:
         | Similarly, in SerenityOS, files under /proc return JSON data
         | rather than unstructured text.
         | 
         | A better, more structural fix would be to allow data
         | structures to be exported in ELFs and have them serialized
         | into terminal output, which could then be rendered in the
         | user's preferred format, such as JSON or YAML, or processed
         | accordingly.
        
         | nijave wrote:
         | Libxo is neat, in theory, but it seems like applications are
         | left to implement their own logic for a given output format
         | rather than being able to pass a structure to libxo and let it
         | do the formatting.
         | 
         | I can't remember the exact utility--I think it was iostat--
         | but it would use string interpolation to format output lines
         | as JSON, and combined with certain flags it produced
         | completely mangled output. Not sure if things have improved,
         | but I would have expected something like JSON Lines when an
         | interval is provided.
         | 
         | Powershell and kubectl are miles ahead of libxo in usability,
         | imo.
        
           | simias wrote:
           | Well I suspect that eventually you just run into hard
           | limitations with C's introspection facilities, or lack
           | thereof.
           | 
           | I like C a lot but one of the reasons I like Rust more these
           | days is the ability to trivially implement complex
           | serialization schemes without a ton of ad-hoc code and
           | boilerplate.
        
           | gigatexal wrote:
           | Far better for applications to be unaware of such a
           | utility, and to allow something like jc to grow its support
           | with plugins or the like, so as to keep the utilities
           | simple and move the logic and burden to the wrapping
           | utility, in this case jc.
        
         | evnp wrote:
         | _> In PowerShell, structured output is the default and it seems
         | to work very well. This is probably too far for Unix /Linux,
         | but a standard "--json" flag would go a long way to getting the
         | same benefits._
         | 
         | OP has a blog post[0] which describes exactly this. `jc` is
         | described as a tool to fill this role "in the meantime" -- my
         | reading is that it's intended to serve as a stepping stone
         | towards widespread `-j`/`--json` support across unix tools.
         | 
         | [0] https://blog.kellybrazil.com/2019/11/26/bringing-the-unix-
         | ph...
        
         | throw0101b wrote:
         | > _In FreeBSD, this problem was solved with libxo[0]:_
         | 
         | Libxo happens to be in the base system, but it is generally
         | available:
         | 
         | * https://github.com/Juniper/libxo
         | 
         | * https://libxo.readthedocs.io/en/latest/
        
         | ekidd wrote:
         | > _In PowerShell, structured output is the default and it seems
         | to work very well._
         | 
         | PowerShell goes a step beyond JSON, by supporting actual
         | mutable _objects_. So instead of just passing through
         | structured data, you effectively pass around opaque objects
         | that allow you to go back to earlier pipeline stages, and
         | invoke methods, if I understand correctly:
         | https://learn.microsoft.com/en-
         | us/powershell/module/microsof....
         | 
         | I'm rather fond of wrappers like jc and libxo, and experimental
         | shells like https://www.nushell.sh/. These still focus on
         | passing _data_ , not objects with executable methods. On some
         | level, I find this comfortable: Structured data still feels
         | pretty Unix-like, if that makes sense? If I want actual
         | objects, then it's probably time to fire up Python or Ruby.
         | 
         | Knowing when to switch from a shell script to a full-fledged
         | programming language is important, even if your shell is
         | basically awesome and has good programming features.
        
           | lukeschlather wrote:
           | Are executable methods really that bad? I mean, they're bad
           | in some abstract sense but that seems more like an objection
           | if we were talking about a "safe" language like Rust than
           | talking about shell scripting. For a shell executable methods
           | seem fine. If you don't make the method executable people are
           | just going to use eval() anyway, might as well do the more
           | predictable thing.
        
             | ekidd wrote:
             | It _might_ be possible to design a good Unix shell based on
             | objects, with the ability to  "call back into" programs.
             | But I haven't seen one yet that I'd prefer over Ruby or
             | Python.
             | 
             | I do think objects make plenty of sense in languages like
             | AppleScript, which essentially allowed users to script
             | running GUI applications. And similarly, Powershell's
             | objects might be right for Windows.
             | 
             | But nushell shows how far you can push "dumb" structured
             | data. And it still feels "Unix-like", or at least
             | "alternate universe Unix-like."
             | 
             | The other reason I'm suspicious of objects in shells is
             | that shell pipelines are technically async coroutines
             | operating over streams! That's already much further into
             | the world of Haskell or async Rust than many people
             | realize. And so allowing "downstream" portions of a
             | pipeline to call back into "upstream" running programs and
             | to randomly change things introduces all kinds of potential
             | async bugs.
             | 
             | If you're going to have async coroutines operating on
             | streams, then having immutable data is often a good choice.
             | Traditional Unix shells do exactly this. Nushell does it,
             | too, but it replaces plain text with structured data.
        
           | munchbunny wrote:
           | PowerShell is basically an interactive C# shell with language
           | ergonomics targeting actual shell usage instead of "you can
           | use it as a shell" the way Python, Ruby, etc. approach their
           | interactive shells. However, the language and built-in
           | utilities work best when you are passing around data as
           | opposed to using PowerShell as if you were writing C#.
           | 
           | It's true, you are indeed passing around full-blown .NET
           | runtime objects. In fact your whole shell is running inside
           | an instance of the .NET runtime, including the ability to
           | dynamically load .NET DLL's and even directly invoke native
           | API's.
           | 
           | It feels a bit like JS in the sense that you're best off
           | sticking to "the good parts", where you get the power of
           | structured input/output but you don't end up trying to, for
           | example, implement high performance async code, even though
           | you technically could.
        
         | msla wrote:
         | > It's not perfect though. ls had support, but it was removed
         | for reasons
         | 
         | In specific:
         | 
         | https://svnweb.freebsd.org/base?view=revision&revision=32810...
         | 
         | > libxo imposes a large burden on system utilities. In the case
         | of ls, that burden is difficult to justify -- any language that
         | can interact with json output can use readdir(3) and stat(2).
         | 
         | Which rather misses the point of being able to use JSON in
         | shell scripts.
        
           | oh_sigh wrote:
           | I'd love to know what the burden was too. I hear comments
           | like that in code reviews, and commonly, when you push for
           | specifics about the burden, there is very little.
        
         | rezonant wrote:
         | > This seems to be a great stop-gap with parsers for a LOT of
         | different commands, but it relies on parsing text output that's
         | not necessarily designed to be parsed
         | 
         | True, and yet it's extremely common to parse output in bash
         | scripts and other automations, so in a sense it's just
         | centralizing that effort. That being said at least when you do
         | it yourself you can fix problems directly.
        
         | imtringued wrote:
         | Now they only need to do the same thing for input and let the
         | operating system or the shell handle the argument parsing so
         | that it is consistent across the entire operating system.
        
         | nerdponx wrote:
         | What I find weird about Powershell is that there's no "fixed-
         | width column" parser, even though fixed-width columns are a
         | widely used output format for Unix-style CLI tools.
         | 
         | I don't know if NuShell has it, I haven't tried.
         | 
         | In any case, it's much better for tools to output more-
         | parseable data in the first place. Whitespace-delimited columns
         | are fine of course, but not so much when the data can contain
         | whitespace, as in the output from `ps`.
         | 
         | I don't see much reason why JSONLines (https://jsonlines.org/)
         | / NDJSON (https://ndjson.org/) can't be a standard output
         | format from most tools, in addition to tables.
         | 
         | As for the reason of removal:                 any language that
         | can interact with json output can use readdir(3) and stat(2).
         | 
         | Ugh. Any language of course can do it. But that's basically
         | telling users that they need to reimplement ls(1) themselves if
         | they want to use any of its output and features in scripts.
         | 
         | I understand if the maintenance burden is too high to put it in
         | ls(1) itself, but it's a shame that _no tool_ currently does
         | this. The closest we have is a feature request in Eza:
         | https://github.com/eza-community/eza/issues/472
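The JSON Lines idea above is a small delta on what a tool emitting a JSON array already produces: the array can be re-serialized one record per line. A minimal illustration with made-up records:

```python
import json

# A JSON array (like jc's output) re-serialized as JSON Lines:
# one self-contained record per line, streamable and grep-able.
records = [
    {"name": "a.txt", "size": 120},
    {"name": "b c.txt", "size": 9000},  # whitespace in values is harmless
]

ndjson = "\n".join(json.dumps(r, separators=(",", ":")) for r in records)
print(ndjson)

# Reading it back is one json.loads per line:
parsed = [json.loads(line) for line in ndjson.splitlines()]
```

Because each line is independently parseable, a consumer can start processing before the producer finishes, which a single top-level array does not allow.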
        
       | timetraveller26 wrote:
       | The modern kids want it all easy, when I was learning Linux we
       | used null delimiters, xargs, cut, sed & awk and that was enough!
       | \s
        
       | rplnt wrote:
       | Wish it would have automatic parser selection by default. Even if
       | just for a (possible) selected subset. Typing `foo | jc | jq ...`
       | would be more convenient than `foo | jc --foo | jq ...`.
        
         | mikecarlton wrote:
         | It supports `jc foo | jq` which is quite handy. E.g. `jc dig
         | google.com txt | jq '.[]|.answer[]|.data'`
        
           | kbrazil wrote:
           | Also, `jc` automatically selects the correct /proc file
           | parser so you can just do `jc /proc/meminfo` or `cat
           | /proc/meminfo | jc --proc` without specifying the actual proc
           | parser (though you can do that if you want)
           | 
           | Disclaimer: I'm the author of `jc`.
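As an illustration (not `jc`'s actual parser), a /proc/meminfo-style file reduces to a few lines of parsing; the sample text here is hardcoded so the sketch doesn't assume a Linux host:

```python
import json

# Hardcoded /proc/meminfo-style sample so this runs anywhere.
SAMPLE = """\
MemTotal:       16316924 kB
MemFree:         1824016 kB
MemAvailable:    9287892 kB
"""

def parse_meminfo(text):
    result = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        result[key] = int(rest.split()[0])  # value normalized to an int (kB)
    return result

print(json.dumps(parse_meminfo(SAMPLE), indent=2))
```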
        
       | crazysim wrote:
       | Jesus Christ, it's JSON for Bourne!
       | 
       | I wonder how well this could work/interact with Powershell.
        
         | majkinetor wrote:
         | Perfectly: <json string input> | ConvertFrom-Json
        
       | js2 wrote:
       | Previous discussion (linked to from the project's readme):
       | 
       | https://news.ycombinator.com/item?id=28266193
        
       | Pxtl wrote:
       | Honestly this is half the reason I use Powershell for everything.
       | Bash-like experience but everything returns objects.
       | 
       | It's a messy, hairy, awful language. Consistently inconsistent,
       | dynamically-typed in the worst ways, "two googles per line"
       | complexity, etc.
       | 
       | But for the convenience of being able to combine shell-like
       | access to various tools and platforms combined with the
       | "everything is a stream of objects" model, it can't be beat in my
       | experience.
       | 
       | And you can still do all the bash-like things for tools that _don
       | 't_ have good Powershell wrappers that will convert their text-
       | streams into objects. Which, sadly, is just about everything.
        
       | abound wrote:
       | Nushell [1] ends up at mostly the same place (structured data
       | from shell commands) with a different approach, mostly just being
       | a shell itself.
       | 
       | [1] http://www.nushell.sh/
        
         | saghm wrote:
         | I had glanced at nushell every now and then since it was
         | initially announced, but it wasn't until a month or two ago
         | that I finally really "got" the point of it. I was trying to
         | write a script to look through all of the files in a directory
         | matching a certain pattern and pruning them to get rid of ones
         | with modified timestamps within 10 minutes of each other. I
         | remembered that nushell was supposed to be good for things like
         | this, and after playing around with it for a minute, it finally
         | "clicked" and now I'm hooked. Even when dealing with
         | unstructured data, there's a lot of power in being able to
         | convert it even into something as a list of records (sort of
         | like structs) and process it from there.
        
           | js2 wrote:
           | > write a script to look through all of the files in a
           | directory matching a certain pattern...
           | 
           | That's just `find`...
           | 
           | > and pruning them to get rid of ones with modified
           | timestamps...
           | 
           | Still just `find`...
           | 
           | > within 10 minutes of each other.
           | 
           | No longer just `find`. :-) I'd probably resort to Python +
           | pathlib at that point.
        
       | freedomben wrote:
       | Really glad to see this is already packaged for most linux
       | distributions. So many utilities nowadays seem to be written in
       | Python, and python apps are such a PITA to install without
       | package manager packages. There are so many different ways to do it
       | and everything seems to be a little different. Some will require
       | root and try to install on top of package manager owned
       | locations, which is a nightmare.
       | 
       | Fedora Toolbox has been wonderful for this exact use case
       | (installing Python tools), but for utilities like this that will
       | be part of a bash pipe chain for me, toolbox won't cut it.
        
         | Spivak wrote:
         | Installing self-contained programs written in Python not
         | packaged for your distro:
         | PIPX_HOME=/usr/local/pipx PIPX_BIN_DIR=/usr/local/bin \
         |     pipx install app==1.2.3
         | 
         | It sets up an isolated install for each app with only its deps
         | and makes it transparent.
         | 
         | The distro's Python installation tree is for the exclusive
         | use of your distro, because core apps like cloud-init, dnf,
         | and firewalld are built against those versions.
        
           | freedomben wrote:
           | thank you! That's amazingly helpful. I had no idea pipx was a
           | thing
           | 
           | For others: https://github.com/pypa/pipx
           | 
           | It's also in the Fedora repos: dnf install -y pipx
        
       | bottled_poe wrote:
       | I'd bet money on people (here) using tools like this to process
       | millions of records or more. It's a sad truth those people won't
       | have jobs in a few years when AI, which will know better, takes
       | hold :(
        
       | moss2 wrote:
       | Excellent
        
       | sesm wrote:
       | IMO 'jc dig example.com' should be the primary syntax, because
       | 'dig example.com | jc --dig' has to retroactively guess the
       | flags and parameters of the previous command to parse the
       | output.
        
       | mejutoco wrote:
       | I wonder if a tool could parse any terminal output into json in a
       | really dumb and deterministic way:
       | {
       |   "lines": [
       |     "line1 bla bla",
       |     "line1 bla bla"
       |   ],
       |   "words": [
       |     "word1",
       |     "word2"
       |   ]
       | }
       | 
       | With enough representations (maybe multiple of the same thing) to
       | make it easy to process, without knowing sed or similar. It seems
       | hacky but it would not require any maintenance for each command,
       | and would only change if the actual output changes.
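A minimal sketch of the dumb deterministic wrapper described above; the sample input is made up:

```python
import json

def dumb_parse(output):
    """Deterministically wrap arbitrary command output in JSON:
    no per-command knowledge, just lines and words."""
    lines = output.splitlines()
    return {
        "lines": lines,
        "words": [w for line in lines for w in line.split()],
    }

sample = "total 8\n-rw-r--r-- 1 user user 42 notes.txt\n"
print(json.dumps(dumb_parse(sample), indent=2))
```

Such a wrapper trades semantic structure (no field names, no types) for zero per-command maintenance, which is precisely the trade-off the comment raises.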
        
         | hk__2 wrote:
         | What's the point of JSON, then?
        
       | codedokode wrote:
       | They are doing it all wrong: instead of using separate human-
       | readable and machine-readable formats, all CLI tools should use
       | a single human-and-machine-readable format.
        
         | mathfailure wrote:
         | You mean YAML?
        
           | codedokode wrote:
           | YAML is over-complicated.
        
       | Cyph0n wrote:
       | Interesting project! But I expected them to be using textfsm (or
       | something similar) as a first step parser. textfsm is heavily
       | used to parse CLI outputs in networking devices.
       | 
       | https://github.com/google/textfsm
        
       | nailer wrote:
       | Nice.
       | 
       | Too many "let's fix the command line" projects (nushell, pwsh)
       | have noble goals, but also start with "first, let's boil the
       | ocean".
       | 
       | We need to easily ingest old shitty text output for a little
       | while to move to the new world of structured IO.
        
       | kazinator wrote:
       | $ dig example.com | txr dig.txr
       | [{"query_time":"1","rcvd":"56","answer_num":1,"status":"NOERROR",
       | "when_epoch":1702030676,"opcode":"QUERY","udp":"65494",
       | "opt_pseudosection":{"edns":{"udp":65494,"flags":[],"version":"0"}},
       | "query_num":1,
       | "question":{"name":"example.com.","type":"A","class":"IN"},
       | "server":"127.0.0.53#53(127.0.0.53)","id":"48295","authority_num":0,
       | "answer":[{"name":"example.com.","type":"A","data":"93.184.216.34",
       | "ttl":"4441","class":"IN"}],
       | "additional_num":1,"when":"Fri Dec 08 10:17:56 PST 2023"}]
       | 
       | $ cat dig.txr
       | @(bind sep @#/[\s\t]+/)
       | @(skip)
       | ;; ->>HEADER<<- opcode: @opcode, status: @status, id: @id
       | ;; flags: qr rd ra; QUERY: @query, ANSWER: @answer, AUTHORITY: @auth, ADDITIONAL: @additional
       | @(skip)
       | ;; OPT PSEUDOSECTION:
       | ; EDNS: version: @edns_ver, flags:@flags; udp: @udp
       | @(skip)
       | ;; QUESTION SECTION:
       | ;@qname@sep@qclass@sep@qtype
       | @(skip)
       | ;; ANSWER SECTION:
       | @aname@sep@ttl@sep@aclass@sep@atype@sep@data
       | 
       | ;; Query time: @qtime msec
       | ;; SERVER: @server
       | ;; WHEN: @when
       | ;; MSG SIZE  rcvd: @rcvd
       | @(do (put-jsonl #J^[{
       |        "id" : ~id,
       |        "opcode" : ~opcode,
       |        "status" : ~status,
       |        "udp" : ~udp,
       |        "query_num" : ~(tofloat query),
       |        "answer_num" : ~(tofloat answer),
       |        "authority_num" : ~(tofloat auth),
       |        "additional_num" : ~(tofloat additional),
       |        "opt_pseudosection" : {
       |          "edns" : {
       |            "version" : ~edns_ver,
       |            "flags" : [],
       |            "udp" : ~(tofloat udp)
       |          }
       |        },
       |        "question" : {
       |          "name" : ~qname,
       |          "class" : ~qclass,
       |          "type" : ~qtype
       |        },
       |        "answer" : [
       |          {
       |            "name" : ~aname,
       |            "class" : ~aclass,
       |            "type" : ~atype,
       |            "ttl" : ~ttl,
       |            "data" : ~data
       |          }
       |        ],
       |        "query_time" : ~qtime,
       |        "server" : ~server,
       |        "rcvd" : ~rcvd,
       |        "when" : ~when,
       |        "when_epoch" : ~(time-parse "%a %b %d %T %Z %Y" when).(time-utc)
       |      }]))
       | 
       | The latest TXR (292 as of time of writing) allows integers in
       | JSON data, so (toint query) could be used.
        
       | AtlasBarfed wrote:
       | My God, doesn't properly handle ls?
        
       | da39a3ee wrote:
       | Awesome, does it work for man pages? They're a huge elephant in
       | the room -- people get really upset if you point out that man
       | pages are an unsearchable abomination, locking away vast amounts
       | of important information about unix systems in an unparseable
       | mess. But, it's true.
        
       ___________________________________________________________________
       (page generated 2023-12-08 23:01 UTC)