[HN Gopher] JC converts the output of popular command-line tools...
___________________________________________________________________
JC converts the output of popular command-line tools to JSON
Author : tosh
Score : 288 points
Date : 2023-12-08 14:12 UTC (8 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| calvinmorrison wrote:
 | In a certain sense, files everywhere is great: that's the promise
 | of unix, or plan9 to an even further extent.
|
 | However, unstructured files, or files that all have their own
 | formats, are equally hampering. Even parsing an nginx log file
 | can be annoying with just awk or some such.
|
| One of the big disadvantages is that large system rewrites and
| design changes cannot be executed in the linux userland.
|
| All to say, I'd love a smarter shell, I love files, I have my awk
| book sitting next to me, but I think it's high time to get some
| serious improvements on parsing data.
|
 | In the same way programs are smart enough to know whether to
 | render colored output or not, I'd love it if they could dump
 | structured output (or not)
| mikepurvis wrote:
| When it comes to parsing server logs, it's too bad the
| functionality can't be extracted out of something like
| logstash, since that's already basically doing the same thing.
|
| Though I guess the real endgame here is for upstream tools to
| eventually recognize the value and learn how to directly supply
| structured output.
| dale_glass wrote:
| You can get that out of journald.
| journalctl -o json
|
| And applications using journald directly can provide their
| own custom fields.
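For readers unfamiliar with the export format: `journalctl -o json` emits one JSON object per log entry, one per line, so each line can be consumed independently. A minimal sketch (the sample line below is illustrative, not real journal output):

```python
import json

# One line of `journalctl -o json` output (illustrative sample, not
# captured from a real journal): each line is one JSON object per entry.
sample_line = ('{"MESSAGE": "Started nginx.service", '
               '"PRIORITY": "6", "_SYSTEMD_UNIT": "init.scope"}')

entry = json.loads(sample_line)
print(entry["MESSAGE"])        # structured field access, no regex needed
print(int(entry["PRIORITY"]))  # journald field values are strings; cast as needed
```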
| hoherd wrote:
   | > In the same way programs are smart enough to know whether to
   | render colored output or not, I'd love it if they could dump
   | structured output (or not)
|
   | The even lower-hanging fruit is to implement JSON output as a
   | command-line argument in all cli tools. I would love to see
| this done for the gnu core utils.
| teddyh wrote:
| At least "du", "env", and "printenv" in Coreutils all support
| the "--null" option to separate output into being NUL-
| separated instead of being text lines.
| bryanlarsen wrote:
| It was a really pleasant surprise to find the "-j" option to
| do this for the "ip" command from the iproute2 project.
| mr_mitm wrote:
| AFAIK the idea is that if you need that kind of
| interoperability in unix/gnu, you're supposed to write your
| tool in C and include some libraries. Clearly not realistic
| in many use cases.
| dale_glass wrote:
| Yup. It really grinds my gears that people came up with fairly
   | decent ideas half a century ago, and a large number of people
| decided to take that as gospel rather than as something to
| improve on.
|
| And it's like pulling teeth to get any improvement, because the
| moment somebody like Lennart tries to get rid of decades of old
| cruft, drama erupts.
|
| And even JSON is still not quite there. JSON is an okay-ish
| idea, but to do this properly what we need is a format that can
| expose things like datatypes. More like PowerShell. So that we
| can do amazing feats like treating a number like a number, and
| calculating differences between dates by doing $a - $b.
| CyberDildonics wrote:
| If you want something more complex and restrictive you could
| easily make it out of JSON. JSON works because it is simple
| and isn't being constantly distorted into something more
| complicated to cover niche use cases.
| da_chicken wrote:
| There is always Powershell. The trouble there is that it's so
| rooted in .Net and objects that it's very difficult to
| integrate with existing native commands on any platform.
| uxp8u61q wrote:
| If these "native commands" had a sensible output format,
| integrating them with powershell would be as simple as
| putting ConvertFrom-Json or ConvertFrom-Csv in the middle of
       | the pipeline. And let's be real, it's no more poorly
       | "integrated" with "native commands" than bash or zsh is.
| da_chicken wrote:
| You're not wrong, but the fix is then, "hey, let's extend
| the functionality of literally every *nix command". Which
| is hard to achieve. Building sensible serialized object-
| oriented output is the same problem as building sensible
| object-oriented output. That's why jc exists, I suppose.
|
| There is always ConvertFrom-String. The problem is that it
| is one of the least intuitive and worst-performing commands
| I've used in Powershell. It's awful and I hate it. It's
| like writing sed and awk commands without the benefit of
| sed and awk's maturity and ubiquity. IMX, only Compare-
| Object has been worse.
| uxp8u61q wrote:
| > "hey, let's extend the functionality of literally every
| *nix command". Which is hard to achieve.
|
| It's pretty much what powershell did, though.
| imtringued wrote:
| It sounds like there needs to be a monolithic solution
| built from the ground up for interoperation.
|
             | Updating coreutils is a losing game at this point. Of
             | course people will get angry when a well-designed
             | solution gradually takes over, simply because it is
             | better.
| da_chicken wrote:
| Eh, not really. Powershell itself is fairly limited in
| terms of functionality. It's basically one step removed
| from a .Net REPL, and the .Net classes aren't necessarily
| written to do *nix admin tasks. It gives you all of .Net
| to build tools with, but sometimes you still run into
| Microsoftisms that are total nonsense. There's a reason
| every C# project was using Newtonsoft's JSON.Net instead
| of the wonky custom data representations that MS was
| trying to push left over from the
| embrace/extend/extinguish era.
| uxp8u61q wrote:
| I see a lot of buzzwords and attacks in this comment but
| nothing actually concrete.
| mistercow wrote:
| Part of the problem is that the output of commands is both a UI
| and an API, and because any text UI _can_ be used as an API,
| the human readable text gets priority. Shell scripting is
| therefore kind of like building third party browser extensions.
| You look and you guess, and then you hack some parser up based
| on your guess, and hope for the best.
|
| I actually wish there was just a third standard output for
| machine readable content, which your terminal doesn't print by
| default. When you pipe, this output is what gets piped (unless
| you redirect), it's expected to be jsonl, and the man page is
| expected to specify a contract. Then stdout can be for humans,
| and while you _can_ parse it, you know what you're doing is
| fragile.
|
| Of course, that's totally backwards incompatible, and as long
| as we're unrealistically reinventing CLIs from the foundations
| to modernize them, I have a long list of changes I'd make.
| theblazehen wrote:
     | Have you seen some of the existing projects currently working
     | on it? The most well known is https://www.nushell.sh/, amongst
     | some others.
| chongli wrote:
| I really want to agree because it seems to make so much sense
| in theory. It gets rid of the need to parse by standardizing
| on syntax. But it doesn't solve everything, so we still don't
| get to live the dream. Namely, it does not solve the issue of
| schemas, versioning, and migration.
|
| And this is a really big issue that threatens to derail the
| whole project. If my script runs a pipeline `a | b | c` and
| utility b gets updated with a breaking change to the schema,
| it breaks the entire script. Now I've got to go deep into the
| weeds to figure out what happened, and the breakage might not
| be visible in the human-readable output. So to debug I'll
| have to pass the flag to each tool to get it to print all the
| json to stdout, and then sit there eyeballing the json to
| figure out how the schema changed and what I need to do to
| fix my script.
|
| Seems like a big mess to me. Unless there's something I'm
| missing?
| Izkata wrote:
| > I actually wish there was just a third standard output for
| machine readable content, which your terminal doesn't print
| by default. When you pipe, this output is what gets piped
| (unless you redirect), it's expected to be jsonl, and the man
| page is expected to specify a contract.
|
| Except for that last jsonl part, various commands already do
| something like this with stdout by detecting what's on the
| other end.
|
| https://unix.stackexchange.com/questions/515778/how-does-
| a-p...
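The detect-what's-on-the-other-end convention is easy to sketch. A minimal example (the record layout is hypothetical) that prints a human-readable table on a terminal and JSON Lines when piped:

```python
import json
import sys

def emit(records, stream=sys.stdout):
    """Print a human table on a terminal, JSON Lines otherwise.

    A sketch of the convention described above: the program checks
    what is on the other end of stdout and picks a format accordingly.
    """
    if stream.isatty():
        # Interactive terminal: aligned columns for humans.
        for r in records:
            print(f"{r['name']:<10} {r['size']:>8}", file=stream)
    else:
        # Pipe or file: one JSON object per line for machines.
        for r in records:
            print(json.dumps(r), file=stream)

emit([{"name": "a.log", "size": 1024}, {"name": "b.log", "size": 2048}])
```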
| numbsafari wrote:
| > Trying to even parse an nginx log file can be annoying with
| just awk or some such.
|
| You probably already know this, but for those who do not, you
| can configure nginx to generate JSON log output.
|
| Quite handy if you are aggregating structured logs across your
| stack.
| imtringued wrote:
   | The problem is that nobody has built an actual FFI solution
   | except maybe the GObject guys. C isn't an FFI, because you need
| a C compiler to make it work. By that I mean it is not an
| interface, but rather just C code whose calling part has been
| embedded into your application.
| pushedx wrote:
| I salute whoever chooses to maintain this
| alex_suzuki wrote:
| I wonder how they will address versions...
|
   | `aws s3 ls | jc --aws=1.2.3`
|
| What a nightmare.
| dj_mc_merlin wrote:
| What about
|
| jc 'aws sts get-caller-identity' | jq [..]
|
| That way the aws process can be a subprocess of jc, which can
| read where the binary is and get its version automatically.
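A rough sketch of that wrapper idea, using a hypothetical `wrap` helper: because the command runs as a child of the wrapper, the wrapper can resolve which binary actually ran before choosing a parser. Real version detection would be per-tool and is omitted here:

```python
import json
import shutil
import subprocess

def wrap(argv):
    """Run argv as a subprocess and record which binary actually ran.

    Sketch of the wrapper idea above (`wrap` is a hypothetical helper,
    not a jc API): resolving the binary's path is the first step toward
    detecting its version and picking a matching parser.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    return {
        "binary": shutil.which(argv[0]),  # path of the resolved binary
        "exit_code": proc.returncode,
        "stdout": proc.stdout,            # raw output, ready for parsing
    }

result = wrap(["echo", "hello"])
print(json.dumps(result, indent=2))
```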
| jasonjayr wrote:
| That can get thorny, because it adds another level of
| shell-quoting/escapes, and that is a notorious vector for
| security problems.
| sesm wrote:
| jc already has this, see 'jc dig example.com' in examples.
| They call it 'alternative magic syntax', but IMO it should
| be the primary syntax, while piping and second-guessing the
       | previous command's parameters and version should be used
| only in exceptional cases.
| dtech wrote:
| Aws cli isn't the best example because it supports outputting
| json natively.
|
| I'd expect this to not be a huge problem in practice because
     | this is mostly for those well-established unix cli tools
     | whose output has mostly ossified anyway. Many modern and
| frequently updated tools support native JSON output.
| hnlmorg wrote:
| > Aws cli isn't the best example because it supports
| outputting json natively.
|
| The s3 sub command annoyingly doesn't. Which I'm guessing
| is the reason the GP used that specifically.
| cwilkes wrote:
| Use "aws s3api list-objects-v2"
|
| https://docs.aws.amazon.com/cli/latest/reference/s3api/
| hnlmorg wrote:
| You can, but it's not nearly as nice to use. For starters
| you have to manage pagination yourself.
| CoastalCoder wrote:
| Good point. This reminds me of the Linux (Unix?) "file"
| program, and whichever hero(es) maintain it.
| amelius wrote:
| I salute whoever chooses to use this and runs into the
| assumptions made by this tool that turn out to be wrong.
| sesm wrote:
| In theory, if it could load something like 'plugins' (for
| example as separate shell commands) some of the maintenance
| effort could be offloaded to 'plugin' authors.
| PreInternet01 wrote:
| Oh, this is cool. I'm a huge proponent of CLI tools supporting
| sensible JSON output, and things like
| https://github.com/WireGuard/wireguard-tools/blob/master/con...
| and PowerShell's |ConvertTo-Json are a huge part of my
| management/monitoring automation efforts.
|
| But, unfortunately, _sensible_ is doing some heavy lifting here
| and reality is... well, reality. While the output of things like
| the LSI /Broadcom StorCLI 'suffix the command with J' approach
| and some of PowerShell's COM-hiding wrappers (which are
 | depressingly common) is _technically_ JSON, the end result is so
 | mindbogglingly complex-slash-useless that you're quickly forced
| to revert to 'OK, just run some regexes on the plain-text output'
| kludges anyway.
|
| Having said that, I'll definitely check this out. If the first
| example given, parsing dig output, is indeed representative of
| what this can _reliably_ do, it should be interesting...
| zubairq wrote:
| Simple idea, really great to see this!
| kbknapp wrote:
| Really cool idea but this gives me anxiety just thinking about
| how it has to be maintained. Taking into account versions,
| command flags changing output, etc. all seems like a nightmare to
| maintain to the point where I'm assuming actual usage of this
 | will work great for a few cases but quickly lose its novelty
| beyond basic cases. Not to mention using `--<CMD>` for the tool
| seems like a poor choice as your help/manpage will end up being
| thousands of lines long because each new parser will require a
| new flag.
| verdverm wrote:
| This is one of the better use cases for LLMs, which have shown
| good capability at turning unstructured text into structured
| objects
| ninkendo wrote:
| If LLM's were local and cheap, sure. They're just too
| heavyweight of a tool to use for simple CLI output
| manipulation today. I don't want to send everything to the
| cloud (and pay a fee), and even if it was a local LLM, I
| don't want it to eat all my RAM and battery to do simple text
| manipulation.
|
| In 20 years, assuming some semblance of moore's law still
| holds for storage/RAM/gpu, I'm right there with you.
| d3nj4l wrote:
| On my M1 Pro/16GB RAM mac I get decently fast, fully local
| LLMs which are good enough to do this sort of thing. I use
| them in scripts all the time. Granted, I haven't checked
| the impact on the battery life I get, but I definitely
| haven't noticed any differences in my regular use.
| mosselman wrote:
| Which models do you run and how?
| _joel wrote:
| not op, but this is handy https://lmstudio.ai/
| verdverm wrote:
| https://github.com/ggerganov/llama.cpp is a popular local
| first approach. LLaMa is a good place to start, though I
| typically use a model from Vertex AI via API
| chongli wrote:
| Yeah, it would be much better if you could send a sample of
| the input and desired output and have the LLM write a
| highly optimized shell script for you, which you could then
| run locally on your multi-gigabyte log files or whatever.
| himinlomax wrote:
| Problem: some crusty old tty command has dodgy output.
|
| Solution: throw a high end GPU with 24GB RAM and a million
| dollar of training at it.
|
| Yeah, great solution.
| verdverm wrote:
| With fine-tuning, you can get really good results on
| specific tasks that can run on regular cpu/mem. I'd suggest
| looking into the distillation research, where large model
| expertise can be transferred to much smaller models.
|
| Also, an LLM trained to be good at this task has many more
| applications than just turning command output into
| structured data. It's actually one of the most compelling
| business use cases for LLMs
| anonymous_sorry wrote:
| The complaint is less whether it would work, and more a
| question of taste. Obviously taste can be a personal
| thing. My opinions are my own and not those of the BBC,
| etc.
|
| You have a small C program that processes this data in
| memory, and dumps it to stdout in tabular text format.
|
| Rather than simplify by stripping out the problematic bit
| (the text output), you suggest adding a large, cutting-
| edge, hard to inspect and verify piece of technology that
| transforms that text through uncountable floating point
| operations back into differently-formatted UTF8.
|
         | It might even work consistently (though you'd never have
         | 100% confidence it won't hallucinate at precisely the
         | wrong moment).
|
| You can certainly see it being justified for one-off
| tasks that aren't worth automating.
|
| But to shove such byzantine inefficiency and complexity
| into an engineered system (rather than just modify the
| original program to give the format you want) offends my
| engineering sensibilities.
|
| Maybe I'm just getting old!
| verdverm wrote:
| If you can modify the original program, then that is by
| far the best way to go. More often than not, you cannot
| change the program, and in relation to the broader
| applicability, most unstructured content is not produced
| by programs.
| anonymous_sorry wrote:
| Yes, makes sense. Although this was originally a post
| about output of common command-line tools. Some of these
| are built on C libraries that you can just use directly.
| They are usually open source.
| hnlmorg wrote:
| As someone who maintains a solution that solves similar
| problems to jc, I can assure you that you don't need a LLM to
| parse most human readable output.
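For a sense of how far you can get without an LLM, here is a minimal sketch of the header-plus-rows parsing that handles much whitespace-aligned command output. The `df`-style sample is illustrative; real parsers like jc's handle per-command quirks such as columns with embedded spaces:

```python
def parse_table(text):
    """Parse whitespace-aligned tabular output into a list of dicts.

    Minimal sketch: the header row supplies the keys, each subsequent
    row supplies the values. Assumes no column value contains spaces,
    which real per-command parsers must handle individually.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    headers = [h.lower() for h in lines[0].split()]
    return [dict(zip(headers, row.split())) for row in lines[1:]]

# Illustrative df-like output, not captured from a real system.
sample = """\
Filesystem 1K-blocks Used Available Use% Mounted
/dev/sda1 41152736 9981204 29050172 26% /
tmpfs 8192344 0 8192344 0% /dev/shm
"""
rows = parse_table(sample)
print(rows[0]["filesystem"], rows[0]["use%"], rows[0]["mounted"])
```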
| verdverm wrote:
| it's more about the maintenance cost, you don't have to
| write N parsers for M versions
|
| Maybe the best middle ground is to have an LLM write the
| parser. Lowers the development cost and runtime
| performance, in theory
| hnlmorg wrote:
| You don't have to write dozens of parsers. I didn't.
| verdverm wrote:
| Part of the appeal is that people who don't know how to
| program or write parsers can use an LLM to solve their
| unstructured -> structured problem
| hnlmorg wrote:
| Sure. But we weren't talking about non-programmers
| maintaining software.
| verdverm wrote:
| > people who don't know how to program _OR_ write parsers
|
| there are plenty of programmers who do not know how to
| write lexers, parsers, and grammars
| hnlmorg wrote:
| We are chatting about maintaining a software project
| written in a software programming language. Not some
               | theoretical strawman argument you've just dreamt up
| because others have rightly pointed out that you don't
| need a LLM to parse the output of a 20KB command line
| program.
|
| As I said before, I maintain a project like this. I also
| happen to work for a company that specialises in the use
| of generative AI. So I'm well aware of the power of LLMs
| as well as the problems of this very specific domain. The
| ideas you've expressed here are, at best, optimistic.
|
               | By the time you've solved all the little quirks of ML
               | you'll have likely invested far more time in your LLM
               | than you would have if you'd just written a simple parser
| and, ironically, needed someone far more specialised to
| write the LLM than your average developer.
|
| This simply isn't a problem that needs a LLM chucked at
| it.
|
| You don't even need to write lexers and grammars to parse
| 99% of application output. Again, I know this because
| I've written such software.
| tovej wrote:
| this is a terrible idea, I can't think of a less
| efficient method with worse correctness guarantees. What
| invariants does the LLM enforce? How do you make sure it
| always does the right thing? How do you debug it when it
| fails? What kind of error messages will you get? How will
               | it react to bad inputs? Will it detect them (unlikely),
               | or will it hallucinate an interpretation (most likely)?
|
| This is not a serious suggestion
| verdverm wrote:
| I used to focus on the potential pitfalls and be overly
| negative. I've come to see that these tradeoffs are
| situational. After using them myself, I can definitely
| see upsides that outweigh the downsides
|
| Developers make mistakes too, so there are no guarantees
| either way. Each of your questions can be asked of
| handwritten code too
| smrq wrote:
| You can ask those questions, but you won't get the same
| answers.
|
| It's not a question of "is the output always correct".
| Nothing is so binary in the real world. A well hand-
| maintained solution will trend further towards
| correctness as bugs are caught, reported, fixed,
| regression tested, etc.
|
| Conversely, you could parse an IP address by rolling
| 4d256 and praying. It, too, will sometimes be correct and
| sometimes be incorrect. Does that make it an equally
| valid solution?
| SwedishExpat wrote:
| You can put the LLM into the code loop so it prompts,
| tries the response on test cases, and gives it back until
| it works. This is surprisingly reliable for simple
| problems like JSON'ifying structured text.
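That generate-test-retry loop can be sketched as follows. Here `llm` is a hypothetical callable (prompt in, source code out) and the stub stands in for a real model; in practice the generated code would also need sandboxing before being exec'd:

```python
def generate_parser_with_retries(llm, examples, max_tries=3):
    """Ask an LLM for parser code, test it, and retry on failure.

    Sketch of the loop described above: generated code is only accepted
    once it reproduces every known input/output pair. `llm` is a
    hypothetical callable, not a real API.
    """
    feedback = ""
    for _ in range(max_tries):
        source = llm("Write parse(text) for this data." + feedback)
        namespace = {}
        try:
            exec(source, namespace)  # NOTE: sandbox this in real use
            if all(namespace["parse"](i) == o for i, o in examples):
                return namespace["parse"]
        except Exception as exc:
            feedback = f" Previous attempt failed: {exc}"
    raise RuntimeError("no working parser found")

# Stub LLM for illustration: always returns a trivial key=value parser.
def stub_llm(prompt):
    return ("def parse(text):\n"
            "    return dict(p.split('=') for p in text.split())")

parse = generate_parser_with_retries(
    stub_llm, [("a=1 b=2", {"a": "1", "b": "2"})])
print(parse("x=9"))
```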
| keithalewis wrote:
| Give a kid a hammer and he'll find something to fix.
| verdverm wrote:
| What value does this comment add?
| otteromkram wrote:
| I got a kick out of it.
|
         | ¯\_(ツ)_/¯
| keithalewis wrote:
| Approximately the same amount as the comment I replied
| to.
| verdverm wrote:
| One attempts to nudge a user towards the comment
| guidelines of HN
| (https://news.ycombinator.com/newsguidelines.html)
|
| > Be kind. Don't be snarky. Converse curiously; don't
| cross-examine. Edit out swipes.
|
| > Comments should get more thoughtful and substantive,
| not less, as a topic gets more divisive.
|
| > Eschew flamebait. Avoid generic tangents. Omit internet
| tropes.
| leptons wrote:
| The old saying "If a hammer is your only tool then
| everything is a nail" is absolutely pertinent to this
| comment thread.
| verdverm wrote:
| how so? what assumptions are you making to reach that
| conclusion?
| cproctor wrote:
| Would it be fair to think about this as a shim whose scope of
| responsibility will (hopefully) shrink over time, as command
| line utilities increasingly support JSON output? Once a utility
| commits to handling JSON export on its own, this tool can
| delegate to that functionality going forward.
| dan_quixote wrote:
| I'd also assume that a CLI resisting JSON support is likely
| to have a very stable interface. Maybe wishful thinking...
| pydry wrote:
| It would but I can still see somebody launching this with
| great enthusiasm and then losing the passion to fix Yet
| Another Parsing Bug introduced on a new version of dig
| kbrazil wrote:
| `jc` author here. I've been maintaining `jc` for nearly
| four years now. Most of the maintenance is choosing which
| new parsers to include. Old parsers don't seem to have too
| many problems (see the Github issues) and bugs are
| typically just corner cases that can be quickly addressed
| along with added tests. In fact there is a plugin
| architecture that allows users to get a quick fix so they
| don't need to wait for the next release for the fix. In
| practice it has worked out pretty well.
|
| Most of the commands are pretty old and do not change
| anymore. Many parsers are not even commands but standard
| filetypes (YAML, CSV, XML, INI, X509 certs, JWT, etc.) and
| string types (IP addresses, URLs, email addresses,
| datetimes, etc.) which don't change or use standard
| libraries to parse.
|
| Additionally, I get a lot of support from the community.
| Many new parsers are written and maintained by others,
| which spreads the load and accelerates development.
| majkinetor wrote:
 | This requires collaboration: people submitting parsing info for
 | the tools they need, and the people who use it helping to keep it
 | up to date. That is the only way.
| eichin wrote:
| I'm sort of torn - yeah, one well-maintained "basket" beats
| having a bunch of ad-hoc output parsers all over the place, but
| I want direct json output because I'm doing something
| complicated and don't want parsing to add to the problem. (I
| suppose the right way to get comfortable with using this is to
| just make sure to submit PRs with additional test cases for
| everything I want to use it with, since I'd have to write those
| tests anyway...)
| zackmorris wrote:
| Keep in mind that the maintenance responsibility you're anxious
| about is currently a cost imposed on all developers.
|
| <rant>
|
| Since I started programming in the 80s, I've noticed a trend
| where most software has adopted the Unix philosophy of "write
| programs that do one thing and do it well". Which is cool and
| everything, but has created an open source ecosystem of rugged
 | individualism where the proceeds to the winners so vastly
 | exceed the crumbs left over for workers that there is no
| ecosystem to speak of, just exploitation. Reflected now in the
| wider economy.
|
| But curated open source solutions like jc approach problems at
| the systems level so that the contributions of an individual
| become available to society in a way that might be called
| "getting real work done". Because they prevent that unnecessary
| effort being repeated 1000 or 1 million times by others. Which
| feels alien in our current task-focussed reality where most
| developers never really escape maintenance minutia.
|
| So I'm all in favor of this inversion from "me" to "we". I also
| feel that open source is the tech analog of socialism. We just
| have it exactly backwards right now, that everyone has the
| freedom to contribute, but only a select few reap the rewards
| of those contributions.
|
| We can imagine what a better system might look like, as it
| would start with UBI. And we can start to think about
| delivering software resources by rail instead of the labor of
| countless individual developer "truck drivers". Some low-
| hanging fruit might be: maybe we need distributions that
| provide everything and the kitchen sink, then we run our
| software and a compiler strips out the unused code, rather than
| relying on luck to decide what we need before we've started
| like with Arch or Nix. We could explore demand-side economics,
| where humans no longer install software, but dependencies are
| met internally on the fly, so no more include paths or headers
| to babysit (how early programming languages worked before C++
| imposed headers onto us). We could use declarative programming
| more instead of brittle imperative (hard coded) techniques. We
| could filter data through stateless self-contained code modules
| communicating via FIFO streams like Unix executables. We could
| use more #nocode approaches borrowed from FileMaker, MS Access
| and Airtable (or something like it). We could write software
| from least-privileges, asking the user for permission to access
| files or the networks outside the module's memory space, and
| then curate known-good permissions policies instead of
| reinventing the wheel for every single program. We could (will)
| write test-driven software where we design the spec as a series
| of tests and then AI writes the business logic until all of the
| tests pass.
|
 | There's a lot to unpack here and a wealth of experience
| available from older developers. But I sympathize with the
| cognitive dissonance, as that's how I feel every single day
| witnessing the frantic yak shaving of "modern" programming
| while having to suppress my desire to use these other proven
| techniques. Because there's simply no time to do so under the
| current status quo where FAANG has quite literally all of the
| trillions of dollars as the winners, so decides best practices
| while the open source community subsists on scraps in their
| parent's basement hoping to make rent someday.
| nickster wrote:
| All output being an object is one of my favorite things about
| powershell. I miss it when I have to write a bash script.
| timetraveller26 wrote:
 | Does anybody know of a list of modern unix command-line tools
| accepting a --json option?
|
| It may even be useful to add that information to this repo.
| chungy wrote:
| Basically everything on FreeBSD supports it via libxo.
| user3939382 wrote:
| Probably not what you had in mind but, AWS CLI.
| pirates wrote:
| kubectl with "-o json"
| geraldcombs wrote:
| TShark (the CLI companion to Wireshark) does with the `-T json`
| flag.
| Mister_Snuggles wrote:
| In FreeBSD, this problem was solved with libxo[0]:
 |         $ ps --libxo=json | jq
 |         {
 |           "process-information": {
 |             "process": [
 |               {
 |                 "pid": "41389",
 |                 "terminal-name": "0 ",
 |                 "state": "Is",
 |                 "cpu-time": "0:00.01",
 |                 "command": "-bash (bash)"
 |               }, [...]
|
| It's not perfect though. ls had support, but it was removed for
| reasons[1]. It's not supported by all of the utilities, etc.
|
| This seems to be a great stop-gap with parsers for a LOT of
| different commands, but it relies on parsing text output that's
| not necessarily designed to be parsed. It would be nice if
| utilities coalesced around a common flag to emit structured
| output.
|
| In PowerShell, structured output is the default and it seems to
| work very well. This is probably too far for Unix/Linux, but a
| standard "--json" flag would go a long way to getting the same
| benefits.
|
| [0] https://wiki.freebsd.org/LibXo
|
| [1] https://reviews.freebsd.org/D13959
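A minimal sketch of what such a standard `--json` flag might look like from a tool author's side (the payload is a stand-in; real tools would emit their actual data):

```python
import argparse
import json

def main(argv=None):
    """Sketch of the common `--json` flag proposed above: the same
    data rendered as a human table or as JSON, chosen by the caller."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--json", action="store_true",
                        help="emit structured JSON instead of a table")
    args = parser.parse_args(argv)
    data = [{"pid": 41389, "command": "-bash (bash)"}]  # stand-in payload
    if args.json:
        print(json.dumps(data))
    else:
        for row in data:
            print(f"{row['pid']:>6}  {row['command']}")

main(["--json"])
```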
| supriyo-biswas wrote:
   | Similarly, in SerenityOS, stuff under /proc returns JSON data
   | rather than unstructured text files.
|
| A better, structural way in which this could be fixed is to
| allow data structures to be exported in ELFs and have those
| data structures serialized into terminal output, which can then
| be outputted in the preferred format of the user, such as JSON,
| YAML, or processed accordingly.
| nijave wrote:
| Libxo is neat, in theory, but it seems like applications are
| left to implement their own logic for a given output format
| rather than being able to pass a structure to libxo and let it
| do the formatting.
|
   | I can't remember the exact utility (I think it was iostat), but
   | it would use string interpolation to format output lines in JSON
   | and, combined with certain flags, produced completely mangled
   | output. Not sure if things have improved, but I would have
   | expected something like JSON lines when an interval is provided.
|
   | Powershell and kubectl are miles ahead of libxo in usability
| imo
| simias wrote:
| Well I suspect that eventually you just run into hard
| limitations with C's introspection facilities, or lack
| thereof.
|
| I like C a lot but one of the reasons I like Rust more these
| days is the ability to trivially implement complex
| serialization schemes without a ton of ad-hoc code and
| boilerplate.
| gigatexal wrote:
     | far better for applications to be unaware of such a utility
| and allow something like jc to grow in support with plugins
| or something so as to keep the utilities simple and move the
| logic and burden to the wrapping utility in this case jc.
| evnp wrote:
| _> In PowerShell, structured output is the default and it seems
| to work very well. This is probably too far for Unix /Linux,
| but a standard "--json" flag would go a long way to getting the
| same benefits._
|
| OP has a blog post[0] which describes exactly this. `jc` is
| described as a tool to fill this role "in the meantime" -- my
| reading is that it's intended to serve as a stepping stone
| towards widespread `-j`/`--json` support across unix tools.
|
| [0] https://blog.kellybrazil.com/2019/11/26/bringing-the-unix-
| ph...
| throw0101b wrote:
| > _In FreeBSD, this problem was solved with libxo[0]:_
|
| Libxo happens to be in the base system, but it is generally
| available:
|
| * https://github.com/Juniper/libxo
|
| * https://libxo.readthedocs.io/en/latest/
| ekidd wrote:
| > _In PowerShell, structured output is the default and it seems
| to work very well._
|
| PowerShell goes a step beyond JSON, by supporting actual
| mutable _objects_. So instead of just passing through
| structured data, you effectively pass around opaque objects
| that allow you to go back to earlier pipeline stages, and
| invoke methods, if I understand correctly:
| https://learn.microsoft.com/en-
| us/powershell/module/microsof....
|
| I'm rather fond of wrappers like jc and libxo, and experimental
| shells like https://www.nushell.sh/. These still focus on
| passing _data_ , not objects with executable methods. On some
| level, I find this comfortable: Structured data still feels
| pretty Unix-like, if that makes sense? If I want actual
| objects, then it's probably time to fire up Python or Ruby.
|
| Knowing when to switch from a shell script to a full-fledged
| programming language is important, even if your shell is
| basically awesome and has good programming features.
| lukeschlather wrote:
| Are executable methods really that bad? I mean, they're bad
| in some abstract sense but that seems more like an objection
| if we were talking about a "safe" language like Rust than
| talking about shell scripting. For a shell executable methods
| seem fine. If you don't make the method executable people are
| just going to use eval() anyway, might as well do the more
| predictable thing.
| ekidd wrote:
| It _might_ be possible to design a good Unix shell based on
| objects, with the ability to "call back into" programs.
| But I haven't seen one yet that I'd prefer over Ruby or
| Python.
|
| I do think objects make plenty of sense in languages like
| AppleScript, which essentially allowed users to script
| running GUI applications. And similarly, Powershell's
| objects might be right for Windows.
|
| But nushell shows how far you can push "dumb" structured
| data. And it still feels "Unix-like", or at least
| "alternate universe Unix-like."
|
| The other reason I'm suspicious of objects in shells is
| that shell pipelines are technically async coroutines
| operating over streams! That's already much further into
| the world of Haskell or async Rust than many people
| realize. And so allowing "downstream" portions of a
| pipeline to call back into "upstream" running programs and
| to randomly change things introduces all kinds of potential
| async bugs.
|
| If you're going to have async coroutines operating on
| streams, then having immutable data is often a good choice.
| Traditional Unix shells do exactly this. Nushell does it,
| too, but it replaces plain text with structured data.
| munchbunny wrote:
| PowerShell is basically an interactive C# shell with language
| ergonomics targeting actual shell usage instead of "you can
| use it as a shell" the way Python, Ruby, etc. approach their
| interactive shells. However, the language and built-in
| utilities work best when you are passing around data as
| opposed to using PowerShell as if you were writing C#.
|
| It's true, you are indeed passing around full-blown .NET
| runtime objects. In fact your whole shell is running inside
| an instance of the .NET runtime, including the ability to
| dynamically load .NET DLLs and even directly invoke native
| APIs.
|
| It feels a bit like JS in the sense that you're best off
| sticking to "the good parts", where you get the power of
| structured input/output but you don't end up trying to, for
| example, implement high performance async code, even though
| you technically could.
| msla wrote:
| > It's not perfect though. ls had support, but it was removed
| for reasons
|
| In specific:
|
| https://svnweb.freebsd.org/base?view=revision&revision=32810...
|
| > libxo imposes a large burden on system utilities. In the case
| of ls, that burden is difficult to justify -- any language that
| can interact with json output can use readdir(3) and stat(2).
|
| Which rather misses the point of being able to use JSON in
| shell scripts.
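| For reference, the commit's suggested alternative really is only a
| few lines in a scripting language. A rough Python sketch of
| `ls`-style JSON via scandir/stat (the field names here are
| illustrative, not jc's actual schema):

```python
import json
import os
import stat


def ls_json(path="."):
    """Emit ls -l style metadata for a directory as a JSON array."""
    entries = []
    for entry in os.scandir(path):
        st = entry.stat(follow_symlinks=False)
        entries.append({
            "filename": entry.name,
            "size": st.st_size,
            "mode": stat.filemode(st.st_mode),  # e.g. "-rw-r--r--"
            "mtime": int(st.st_mtime),
        })
    return json.dumps(entries)
```

| Which, of course, is exactly the "reimplement ls yourself" burden
| the parent comment objects to shifting onto every script author.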
| oh_sigh wrote:
| I'd love to know what the burden was too. I hear comments
| like that in code reviews, and commonly, when you push for
| specifics about the burden, there is very little to point to.
| rezonant wrote:
| > This seems to be a great stop-gap with parsers for a LOT of
| different commands, but it relies on parsing text output that's
| not necessarily designed to be parsed
|
| True, and yet it's extremely common to parse output in bash
| scripts and other automations, so in a sense it's just
| centralizing that effort. That being said, at least when you do
| it yourself, you can fix problems directly.
| imtringued wrote:
| Now they only need to do the same thing for input and let the
| operating system or the shell handle the argument parsing so
| that it is consistent across the entire operating system.
| nerdponx wrote:
| What I find weird about PowerShell is that there's no parser for
| "fixed-width column" output, which is a widely used format for
| Unix-style CLI tools.
|
| I don't know if NuShell has it, I haven't tried.
|
| In any case, it's much better for tools to output more-
| parseable data in the first place. Whitespace-delimited columns
| are fine of course, but not so much when the data can contain
| whitespace, as in the output from `ps`.
|
| I don't see much reason why JSONLines (https://jsonlines.org/)
| / NDJSON (https://ndjson.org/) can't be a standard output
| format from most tools, in addition to tables.
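| To illustrate how little machinery JSONLines needs on the
| consuming side, a minimal sketch, assuming one JSON object per
| line as the spec requires (record fields are made up):

```python
import json


def read_jsonlines(lines):
    """Parse an iterable of NDJSON lines into dicts, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


# Each record stands alone, so output can be streamed and consumed
# incrementally -- no need to buffer a whole JSON array in memory.
sample = ['{"pid": 1, "cmd": "init"}', '{"pid": 2, "cmd": "kthreadd"}']
```

| Whitespace inside a field (the `ps` problem above) is a non-issue,
| since every value is properly quoted JSON.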
|
| As for the reason for removal:
|
| > _any language that can interact with json output can use
| readdir(3) and stat(2)._
|
| Ugh. Any language of course can do it. But that's basically
| telling users that they need to reimplement ls(1) themselves if
| they want to use any of its output and features in scripts.
|
| I understand if the maintenance burden is too high to put it in
| ls(1) itself, but it's a shame that _no tool_ currently does
| this. The closest we have is a feature request in Eza:
| https://github.com/eza-community/eza/issues/472
| timetraveller26 wrote:
| The modern kids want it all easy, when I was learning Linux we
| used null delimiters, xargs, cut, sed & awk and that was enough!
| \s
| rplnt wrote:
| Wish it would have automatic parser selection by default. Even if
| just for a (possible) selected subset. Typing `foo | jc | jq ...`
| would be more convenient than `foo | jc --foo | jq ...`.
| mikecarlton wrote:
| It supports `jc foo | jq` which is quite handy. E.g. `jc dig
| google.com txt | jq '.[]|.answer[]|.data'`
| kbrazil wrote:
| Also, `jc` automatically selects the correct /proc/file
| parser so you can just do `jc /proc/meminfo` or `cat
| /proc/meminfo | jc --proc` without specifying the actual proc
| parser (though you can do that if you want)
|
| Disclaimer: I'm the author of `jc`.
| crazysim wrote:
| Jesus Christ, it's JSON for Bourne!
|
| I wonder how well this could work/interact with Powershell.
| majkinetor wrote:
| Perfectly: <json string input> | ConvertFrom-Json
| js2 wrote:
| Previous discussion (linked to from the project's readme):
|
| https://news.ycombinator.com/item?id=28266193
| Pxtl wrote:
| Honestly this is half the reason I use Powershell for everything.
| Bash-like experience but everything returns objects.
|
| It's a messy, hairy, awful language. Consistently inconsistent,
| dynamically-typed in the worst ways, "two googles per line"
| complexity, etc.
|
| But for the convenience of being able to combine shell-like
| access to various tools and platforms combined with the
| "everything is a stream of objects" model, it can't be beat in my
| experience.
|
| And you can still do all the bash-like things for tools that _don
| 't_ have good Powershell wrappers that will convert their text-
| streams into objects. Which, sadly, is just about everything.
| abound wrote:
| Nushell [1] ends up at mostly the same place (structured data
| from shell commands) with a different approach, mostly just being
| a shell itself.
|
| [1] http://www.nushell.sh/
| saghm wrote:
| I had glanced at nushell every now and then since it was
| initially announced, but it wasn't until a month or two ago
| that I finally really "got" the point of it. I was trying to
| write a script to look through all of the files in a directory
| matching a certain pattern and pruning them to get rid of ones
| with modified timestamps within 10 minutes of each other. I
| remembered that nushell was supposed to be good for things like
| this, and after playing around with it for a minute, it finally
| "clicked" and now I'm hooked. Even when dealing with
| unstructured data, there's a lot of power in being able to
| convert it into something as simple as a list of records (sort
| of like structs) and process it from there.
| js2 wrote:
| > write a script to look through all of the files in a
| directory matching a certain pattern...
|
| That's just `find`...
|
| > and pruning them to get rid of ones with modified
| timestamps...
|
| Still just `find`...
|
| > within 10 minutes of each other.
|
| No longer just `find`. :-) I'd probably resort to Python +
| pathlib at that point.
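| Roughly what that Python + pathlib version could look like: the
| pattern and 10-minute window come from the comment above, and the
| grouping logic (keep the first file of each cluster of close
| mtimes) is my guess at what "pruning" means here:

```python
from pathlib import Path


def prune_close_mtimes(paths, window=600):
    """Keep the first file of each cluster of mtimes closer than
    `window` seconds; drop the rest."""
    files = sorted(paths, key=lambda p: p.stat().st_mtime)
    kept = []
    for f in files:
        # Keep the file only if it is far enough from the last keeper.
        if not kept or f.stat().st_mtime - kept[-1].stat().st_mtime >= window:
            kept.append(f)
    return kept


# usage sketch: prune_close_mtimes(Path(".").glob("backup-*.tar"))
```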
| freedomben wrote:
| Really glad to see this is already packaged for most linux
| distributions. So many utilities nowadays seem to be written in
| Python, and Python apps are such a PITA to install without
| package manager packages. There's so many different ways to do it
| and everything seems to be a little different. Some will require
| root and try to install on top of package manager owned
| locations, which is a nightmare.
|
| Fedora Toolbox has been wonderful for this exact use case
| (installing Python tools), but for utilities like this that will
| be part of a bash pipe chain for me, toolbox won't cut it.
| Spivak wrote:
| Installing self-contained programs written in Python not
| packaged for your distro:
| PIPX_HOME=/usr/local/pipx PIPX_BIN_DIR=/usr/local/bin pipx
| install app==1.2.3
|
| It sets up an isolated install for each app with only its deps
| and makes it transparent.
|
| The distro installation tree of Python is for the exclusive use
| of your distro, because core apps like cloud-init, dnf, and
| firewalld are built against those versions.
| freedomben wrote:
| thank you! That's amazingly helpful. I had no idea pipx was a
| thing
|
| For others: https://github.com/pypa/pipx
|
| It's also in the Fedora repos: dnf install -y pipx
| bottled_poe wrote:
| I'd bet money on people (here) using tools like this to process
| millions of records or more. It's a sad truth those people won't
| have jobs in a few years when AI, which will know better, takes
| hold :(
| moss2 wrote:
| Excellent
| sesm wrote:
| IMO 'jc dig example.com' should be the primary syntax, because
| 'dig example.com | jc --dig' has to retroactively guess the flags
| and parameters of previous command to parse the output.
| mejutoco wrote:
| I wonder if a tool could parse any terminal output into json in a
| really dumb and deterministic way:
|             {
|               "lines": ["line1 bla bla", "line2 bla bla"],
|               "words": ["word1", "word2"]
|             }
|
| With enough representations (maybe multiple of the same thing) to
| make it easy to process, without knowing sed or similar. It seems
| hacky but it would not require any maintenance for each command,
| and would only change if the actual output changes.
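| That dumb-and-deterministic converter is only a few lines; a
| sketch, with key names following the comment above:

```python
import json


def dumb_parse(text):
    """Wrap raw terminal output as JSON: the raw lines, plus the
    whitespace-split words, with no per-command knowledge."""
    lines = text.splitlines()
    return json.dumps({
        "lines": lines,
        "words": [w for line in lines for w in line.split()],
    })
```

| The catch is that the consumer still has to know which line or
| column means what, so the sed/awk knowledge just moves into jq.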
| hk__2 wrote:
| What's the point of JSON, then?
| codedokode wrote:
| They are doing all wrong, instead of using human-readable and
| machine-readable formats, all CLI tools should use human-and-
| machine-readable format.
| mathfailure wrote:
| You mean YAML?
| codedokode wrote:
| YAML is over-complicated.
| Cyph0n wrote:
| Interesting project! But I expected them to be using textfsm (or
| something similar) as a first step parser. textfsm is heavily
| used to parse CLI outputs in networking devices.
|
| https://github.com/google/textfsm
| nailer wrote:
| Nice.
|
| Too many "let's fix the command line" tools (nushell, pwsh) have
| noble goals, but also start with "first let's boil the ocean".
|
| We need to easily ingest old shitty text output for a little
| while to move to the new world of structured IO.
| kazinator wrote:
|         $ dig example.com | txr dig.txr
|         [{"query_time":"1","rcvd":"56","answer_num":1,"status":"NOERROR",
|          "when_epoch":1702030676,"opcode":"QUERY","udp":"65494",
|          "opt_pseudosection":{"edns":{"udp":65494,"flags":[],"version":"0"}},
|          "query_num":1,
|          "question":{"name":"example.com.","type":"A","class":"IN"},
|          "server":"127.0.0.53#53(127.0.0.53)","id":"48295","authority_num":0,
|          "answer":[{"name":"example.com.","type":"A",
|                     "data":"93.184.216.34","ttl":"4441","class":"IN"}],
|          "additional_num":1,"when":"Fri Dec 08 10:17:56 PST 2023"}]
|         $ cat dig.txr
|         @(bind sep @#/[\s\t]+/)
|         @(skip)
|         ;; ->>HEADER<<- opcode: @opcode, status: @status, id: @id
|         ;; flags: qr rd ra; QUERY: @query, ANSWER: @answer, AUTHORITY: @auth, ADDITIONAL: @additional
|         @(skip)
|         ;; OPT PSEUDOSECTION:
|         ; EDNS: version: @edns_ver, flags:@flags; udp: @udp
|         @(skip)
|         ;; QUESTION SECTION:
|         ;@qname@sep@qclass@sep@qtype
|         @(skip)
|         ;; ANSWER SECTION:
|         @aname@sep@ttl@sep@aclass@sep@atype@sep@data
|         ;; Query time: @qtime msec
|         ;; SERVER: @server
|         ;; WHEN: @when
|         ;; MSG SIZE rcvd: @rcvd
|         @(do (put-jsonl #J^[{
|                "id" : ~id,
|                "opcode" : ~opcode,
|                "status" : ~status,
|                "udp" : ~udp,
|                "query_num" : ~(tofloat query),
|                "answer_num" : ~(tofloat answer),
|                "authority_num" : ~(tofloat auth),
|                "additional_num" : ~(tofloat additional),
|                "opt_pseudosection" : {
|                  "edns" : {
|                    "version" : ~edns_ver,
|                    "flags" : [],
|                    "udp" : ~(tofloat udp)
|                  }
|                },
|                "question" : {
|                  "name" : ~qname,
|                  "class" : ~qclass,
|                  "type" : ~qtype
|                },
|                "answer" : [
|                  {
|                    "name" : ~aname,
|                    "class" : ~aclass,
|                    "type" : ~atype,
|                    "ttl" : ~ttl,
|                    "data" : ~data
|                  }
|                ],
|                "query_time" : ~qtime,
|                "server" : ~server,
|                "rcvd" : ~rcvd,
|                "when" : ~when,
|                "when_epoch" : ~(time-parse "%a %b %d %T %Z %Y" when).(time-utc)
|              }]))
|
| The latest TXR (292 as of time of writing) allows integers in
| JSON data, so (toint query) could be used.
| AtlasBarfed wrote:
| My God, doesn't properly handle ls?
| da39a3ee wrote:
| Awesome, does it work for man pages? They're a huge elephant in
| the room -- people get really upset if you point out that man
| pages are an unsearchable abomination, locking away vast amounts
| of important information about unix systems in an unparseable
| mess. But, it's true.
___________________________________________________________________
(page generated 2023-12-08 23:01 UTC)