[HN Gopher] Extracting Objects Recursively with Jq
___________________________________________________________________
Extracting Objects Recursively with Jq
Author : edward
Score : 187 points
Date : 2021-08-01 14:59 UTC (8 hours ago)
(HTM) web link (til.simonwillison.net)
(TXT) w3m dump (til.simonwillison.net)
| tejtm wrote:
| To better appreciate the structure of the document the author is
| dealing with (and to cast a bit of light on which words are
| variables in the document and which are `jq` syntax), I offer a
| shameless plug for a one-liner [0].
|
| well, I would, but the result is "too long for a HN comment", so
| here a bunch is snipped out of the middle (unedited, the result
| would currently be 140 lines):
|
|     curl -s https://hn.algolia.com/api/v1/items/27941108 | ~/bin/json2jqpath.jq
|     .
|     .author
|     .children
|     .children|.[]
|     .children|.[]|.author
|     .children|.[]|.children
|     .children|.[]|.children|.[]
|     .children|.[]|.children|.[]|.author
|     .children|.[]|.children|.[]|.children
|     .children|.[]|.children|.[]|.children|.[]
|     .children|.[]|.children|.[]|.children|.[]|.author
|     .children|.[]|.children|.[]|.children|.[]|.children
|     .children|.[]|.children|.[]|.children|.[]|.children|.[]
|     <snip> ... </snip>
|     .children|.[]|.children|.[]|.created_at
|     .children|.[]|.children|.[]|.created_at_i
|     .children|.[]|.children|.[]|.id
|     .children|.[]|.children|.[]|.options
|     .children|.[]|.children|.[]|.parent_id
|     .children|.[]|.children|.[]|.points
|     .children|.[]|.children|.[]|.story_id
|     .children|.[]|.children|.[]|.text
|     .children|.[]|.children|.[]|.title
|     .children|.[]|.children|.[]|.type
|     .children|.[]|.children|.[]|.url
|     .children|.[]|.created_at
|     .children|.[]|.created_at_i
|     .children|.[]|.id
|     .children|.[]|.options
|     .children|.[]|.parent_id
|     .children|.[]|.points
|     .children|.[]|.story_id
|     .children|.[]|.text
|     .children|.[]|.title
|     .children|.[]|.type
|     .children|.[]|.url
|     .created_at
|     .created_at_i
|     .id
|     .options
|     .parent_id
|     .points
|     .story_id
|     .text
|     .title
|     .type
|     .url
|
| [0] https://github.com/TomConlin/json_to_paths
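The idea behind the path listing above can be sketched in a few lines of Python. This is not the linked script, only a rough approximation of the same technique: walk the parsed JSON and emit one jq-style path per key and per array level. The function name `jq_paths` and the sample document are made up for illustration, and keys that need quoting in jq are not handled.

```python
import json

def jq_paths(value, prefix=""):
    """Yield a jq-style path for every key and every non-empty array level."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}|.{key}" if prefix else f".{key}"
            yield path
            yield from jq_paths(child, path)
    elif isinstance(value, list):
        path = f"{prefix}|.[]" if prefix else ".[]"
        if value:
            yield path
        for child in value:
            yield from jq_paths(child, path)

# Tiny stand-in for the Algolia response (assumed shape, not real data).
doc = json.loads('{"author": "edward", "children": [{"author": "a", "children": []}]}')

# Many array elements share the same path, so de-duplicate before printing.
paths = sorted(set(jq_paths(doc)))
print("\n".join(paths))
```

Collapsing every array index into `.[]` is what keeps the listing short: a thread with hundreds of comments still produces only one path per distinct field and depth.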
| woodruffw wrote:
| Thanks for linking this! I've wanted exactly this script, many
| times.
| jcims wrote:
| jq seems incredibly powerful. I only find myself using it a few
| times a year though, and have never been able to conceptualize
| the syntax enough to use it without prodigious googling.
| xcambar wrote:
| Yes, jq is very powerful, yet incredibly inefficient to work
| with when you haven't mastered its arcana.
| minxomat wrote:
| Think of it just like bash. jq is the ultimate functional
| language and/or ETL tool. If I look back at larger jq
| transforms I've written a while ago (e.g. https://git.io/JBSfB)
| they still make perfect sense to me.
| [deleted]
| thangalin wrote:
| An introductory script showing the jq, REST, and JSON trinity:
|
| https://github.com/DaveJarvis/github-email/blob/master/githu...
| baldeagle wrote:
| The most powerful thing about jq, IMO, is that it can alter some
| of the json without having to parse and convert to an object all
| of the json. It is like using data transformation lasers.
| [deleted]
| uluyol wrote:
| jq is a fantastic tool for exploring data and doing simple
| transformations. I often wish it could consume/write data in
| other formats.
| dangerbird2 wrote:
| In python-land, I like to use glom[1], which is basically jq
| but operating on arbitrary python data structures. I believe
| there are bindings for jq in Python and other languages, which
| would allow operating on data structures, but I imagine they
| are just spawning jq as a subprocess, since it doesn't seem
| like jq has a public C api.
|
| [1] https://glom.readthedocs.io/en/latest/index.html
| matthewtovbin wrote:
| I moved to JSONata - http://docs.jsonata.org/overview.html
| mftb wrote:
| I'm happy to see that I'm not alone in my struggles with jq. I
| wanted to love it right out of the box. It appears to be very
| well engineered, but over and over again I have struggled with
| its syntax.
|
| What I think I want is a syntax closer to css selectors. What I
| think I'm going to have to do is really stop and learn jq. It
| looks like some of the links in here may help.
| quickthrower2 wrote:
| jq is a command-line JSON processor, if like me you didn't know
| and thought it might be jQuery abbreviated.
| psanford wrote:
| JQ is often frustrating when you want to do something non-trivial
| but you can't figure out how and the documentation is of little
| help.
|
| I think JQ could really benefit from having a classic programming
| language style "book", like "The AWK Programming Language". JQ is
| fundamentally a functional programming language with semantics
| that are not obvious reading its current docs.
| doytch wrote:
| I've also found the documentation tough to use when I need to
| jump back in and answer something like "how do I _____?" It
| feels like one of those sets of docs that's better approached
| by just reading the whole damn thing, working through some
| examples, etc. Studying the docs as opposed to referring to
| them.
| psanford wrote:
| Right, I agree. But I think the docs as currently written
| are not really meant to be read through like that.
| That's why I mentioned "The AWK Programming Language" book,
| which is excellently written, and easy to read from start to
| end.
|
| I don't mean to denigrate the current docs, writing
| documentation is hard!
| matthewtovbin wrote:
| Try JSONata instead - http://docs.jsonata.org/overview.html
| loa_in_ wrote:
| It doesn't seem to provide a command-line tool, which is the
| point of jq
| hoshsadiq wrote:
| I tend to use jq a lot. As others have said, sometimes jq can be
| hard to grasp. Often it requires multiple attempts to get the
| correct answer. To make it a little easier for me, I've written a
| helper function[0] that combines it with fzf[1] to run jq as a
| REPL on any json. It lets you incrementally alter your DSL expression
| without having to continually call jq. This is similar to jid/jiq
| but a little more powerful. It includes functions to change the
| preview to output raw, compact (or not), and some other things.
|
| It is essentially similar to jqplay but local.
|
| I didn't use jid/jiq because jid uses go-simplejson, which is
| nowhere near as powerful as jq, and jiq seemed very buggy when I
| used it and it felt like it was hacked together. Plus there was
| nowhere to change jq's arguments while running it.
|
| I'm sure this function can be improved on, but this has been good
| enough for me so far.
|
| Also, I run gojq[2] instead of jq. It is a drop-in replacement
| for jq but is written in Go, and has some improvements over jq
| such as bug fixes, support for yaml input, and it also provides
| more helpful error messages.
|
| [0]
| https://github.com/hoshsadiq/dot_files/blob/master/zshrc.d/m...
|
| [1] https://github.com/junegunn/fzf
|
| [2] https://github.com/itchyny/gojq/
| alerighi wrote:
| I'm the opposite, I find that jq doesn't have a reason to
| exist, other than pretty printing JSON files on a terminal and
| doing basic filtering on a JSON object and only as interactive
| shell usage, NOT in a script.
|
| It's the classic tool, like sed, like awk, like a ton of unix
| utilities that at first seem easy to use; then you have to do
| something complex and you start abusing them, by piping things
| multiple times into jq, and you end up writing things like this:
|
|     echo $json | jq "something $(echo $variable | jq 'something else' | tr '"' '\'') | sed 's/"/\\'/g" | jq "another js invocation" | awk '...' > file2.json
|
| I stopped using jq after realizing that I was wasting my time
| trying to fix a script that used jq and didn't manage
| quoting correctly, trying different kinds of quotes, even
| filtering the input before passing it to jq with tr replacing
| things. It's just another tool prone to abuse like sed, awk,
| tr, cut or similar things.
|
| I thought: why am I wasting my time on a tool that has a complex
| and limited DSL when I can write a clean python script in 10
| minutes that does the same thing and is easier to write, to read
| and most importantly to maintain?
|
| To me a script that has to manipulate JSON should be written in
| a high-level programming language like Python, and not
| abuse tools such as jq. Even if there is an already
| existing big bash script that you don't want to rewrite and you
| have to do some json processing in it... you can write an
| inline python script like this:
|
|     python3 <<PYEND
|     ## your python code
|     PYEND
|
| Also jq is another dependency to a script that must be
| installed.
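As a point of comparison for the Python route described above, here is a minimal sketch of the recursive extraction the linked article is about. It assumes each item nests its replies under a `children` key, as in the Algolia response quoted elsewhere in this thread; the name `flatten` and the trimmed sample data are illustrative, not taken from any commenter's script.

```python
import json

def flatten(item):
    """Yield item, then every nested reply depth first, each without its children."""
    yield {key: value for key, value in item.items() if key != "children"}
    for child in item.get("children", []):
        yield from flatten(child)

# Trimmed sample in the shape of the Algolia response (assumed structure).
tree = json.loads("""
{"id": 27941108, "type": "story", "author": "edward", "children": [
  {"id": 27942287, "type": "comment", "author": "DesiLurker", "children": []},
  {"id": 27944615, "type": "comment", "author": "galaxyLogic", "children": [
    {"id": 27944746, "type": "comment", "author": "hughrr", "children": []}
  ]}
]}
""")

flat = list(flatten(tree))
for entry in flat:
    print(json.dumps(entry))
```

Inside a shell heredoc, the same function could read its input with `json.load(sys.stdin)` instead of the inline sample.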
| Arnavion wrote:
| If you have quoting issues, and going by the example you
| posted, you haven't learned how to pass parameters to jq
| correctly. You don't do string interpolation. You use `--arg`
| and `--argjson`
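To make the `--arg` / `--argjson` point concrete, here is a hedged sketch of a caller that binds values as jq variables instead of splicing them into the filter string. The helper `jq_argv`, the variable names `$who` and `$min_id`, and the sample document are all hypothetical; the jq call only runs if a `jq` binary happens to be on the PATH.

```python
import json
import shutil
import subprocess

def jq_argv(jq_filter, strings=None, values=None):
    """Build a jq command line that binds variables rather than interpolating them."""
    argv = ["jq", "-c"]
    for name, text in (strings or {}).items():
        argv += ["--arg", name, text]                   # $name is bound as a JSON string
    for name, value in (values or {}).items():
        argv += ["--argjson", name, json.dumps(value)]  # $name is bound as parsed JSON
    argv.append(jq_filter)
    return argv

# The awkward quote in the author name never touches the filter text.
argv = jq_argv(
    ".children[] | select(.author == $who and .id > $min_id)",
    strings={"who": 'O"Brien'},
    values={"min_id": 27942000},
)
print(argv)

# Only invoke jq when it is actually installed (assumption: `jq` on PATH).
if shutil.which("jq"):
    doc = {"children": [{"id": 27942287, "author": 'O"Brien'}]}
    done = subprocess.run(argv, input=json.dumps(doc),
                          capture_output=True, text=True, check=True)
    print(done.stdout, end="")
```

Because the values travel as separate arguments, no shell or jq quoting has to be escaped by hand, which is the failure mode the parent comment describes.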
| ris wrote:
| Speaking as a long-time python developer...
|
| If you actually "get into" jq you find out that it's a
| significantly neater language than it appears on the surface.
| Firstly it _does_ allow you to write multi-line scripts, and
| things start to look a lot neater once you do. Secondly it's
| actually a real, working, functional programming language,
| which allows very succinct expression of ideas which, in
| python, would likely require the reader to track state across
| explicit loops and the like.
|
| Once you dig into the manual, you also tend to discover that
| a lot of the things that cause you to string multiple jq
| invocations together aren't actually necessary because there
| are quite sensible ways of handling them in-language.
|
| It's quite laughable though to tout python over jq because of
| it adding a dependency. Perhaps if you're already embedded in
| python-land and all your environments already have python -
| but many (most?) of us are increasingly targeting extremely
| minimal container image environments. In that case, adding
| python is a much larger and more complex dependency than jq's
| single 3.8MB binary.
|
| As someone who has to read an awful lot of other people's
| deployment scripts, it's also quite nice when I see jq
| because it loudly advertises "all I'm doing here is mangling
| one piece of json into another! no side effects!". I'd much
| rather follow the thread of execution into that than some
| mystery ruby script any day.
| fisxoj wrote:
| If anyone is an emacs user and this sounds compelling, I
| recommend counsel-jq[0] for the sort of feedback loop described
| here.
|
| [0]: https://github.com/200ok-ch/counsel-jq
| forty wrote:
| I use counsel-jq occasionally, and my main issue with it is
| the single line input. As soon as the filtering is not
| trivial, it's much more convenient to be able to use multiple
| lines.
| rodorgas wrote:
| That's cool! What's the use of FZF? Isn't it a fuzzy finder,
| what are you searching for in a jq REPL?
| hoshsadiq wrote:
| It only uses FZF's preview. The suggestion list is completely
| empty. I tried to find an alternative as FZF has no way of
| disabling the selector window, but I was unable to find
| anything that was good enough for this.
|
| I considered forking jid/jiq and using gojq as a library, but
| I ended up not going down that route because of reasons that
| I cannot remember. I also considered using a tui or something
| but FZF has so much already implemented and has a lot of it
| right, and I didn't particularly feel like re-inventing the
| wheel.
| rattray wrote:
| This looks cool - would you be willing to share it in a brew-
| installable format?
| hoshsadiq wrote:
| Not particularly. I don't use a Mac, but I'd be happy to
| separate out the function into its own script so it can be
| downloaded and put in your $PATH. I personally use zinit to
| manage individual files from random repos.
| psacawa wrote:
| I've written about this before, but for tmux users, I have
| interactive querying in the form of a ten-line shell script [1].
| See it live [2]. It can easily be modified to interactively query
| yaml, html with xpath or css, or text with awk or what have you,
| and I have variants for each of these. This has the advantage
| over the parent comment that you have your text editor's key
| bindings instead of those of fzf.
|
| Dependencies: tmux, nodemon, less, jq, vim
|
| [1]:
| https://gist.github.com/psacawa/e63c4e25a8b0405309d3a03b6b50...
|
| [2]: https://streamable.com/jwdrqu
| keymone wrote:
| jq is nice, but the moment i need anything more complex than
| "pull this attribute out of bunch of objects" i vastly prefer
| spinning up an actual language runtime. or use a tool built
| around a language (e.g. https://github.com/borkdude/jet) rather
| than a language built around a tool.
| vlmutolo wrote:
| I love posts that are this information-dense. I know for a fact
| that this will be useful to me some day.
| [deleted]
| doytch wrote:
| I've really loved having jq at my disposal ever since learning
| about it, but I feel like it took the combination of it and gron
| [1] to really transform my debugging and JSON workflows.
|
| 1: https://github.com/TomNomNom/gron
| danso wrote:
| I can't believe the number of times I've wanted to grep a JSON
| file, and yet never thought to look and see if there was any
| kind of tool for it. Thanks!
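The reason a flattened form makes JSON greppable can be shown with a short sketch. This is only an approximation of the idea, not gron's implementation: it does not sort keys or quote keys that are not plain identifiers, and the name `gron_lines` and the sample document are invented for illustration.

```python
import json

def gron_lines(value, path="json"):
    """Yield one assignment-style line per node so plain grep can match the full path."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key, child in value.items():
            yield from gron_lines(child, f"{path}.{key}")
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, child in enumerate(value):
            yield from gron_lines(child, f"{path}[{index}]")
    else:
        yield f"{path} = {json.dumps(value)};"

doc = {"id": 27941108, "children": [{"author": "DesiLurker", "children": []}]}
lines = list(gron_lines(doc))
print("\n".join(lines))

# Rough equivalent of piping the output through `grep author`.
matches = [line for line in lines if "author" in line]
print(matches)
```

Every line carries its complete path, so a match tells you where in the document the value lives without any surrounding context.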
| triska wrote:
| JSON data is also a valid _Prolog_ term, and the declarative
| programming language Prolog is ideally suited for handling tree-
| shaped data.
|
| Using for example Scryer Prolog, we can conveniently _relate_ the
| data to a flat list of items with Prolog's built-in grammar
| mechanism, definite clause grammars (DCGs):
|
|     flat_json(JSON) -->
|             { JSON = {A,B,C,D,E,F,_:Cs} },
|             [{A,B,C,D,E,F}],
|             flat_items(Cs).
|
|     flat_items([]) --> [].
|     flat_items([I|Is]) -->
|             { I = {(A,B,C,D,E,_:Cs)} },
|             [{A,B,C,D,E}],
|             flat_items(Cs),
|             flat_items(Is).
|
| Sample query, using the example JSON data from the article:
|
|     ?- JSON = {
|          "id": 27941108,
|          "created_at": "2021-07-24T14:15:05.000Z",
|          "type": "story",
|          "author": "edward",
|          "title": "Fun with Unix domain sockets",
|          "url": "https://simonwillison.net/2021/Jul/13/unix-domain-sockets/",
|          "children": [
|            {
|              "id": 27942287,
|              "created_at": "2021-07-24T16:31:18.000Z",
|              "type": "comment",
|              "author": "DesiLurker",
|              "text": "<p>one lesser known...",
|              "children": []
|            },
|            {
|              "id": 27944615,
|              "created_at": "2021-07-24T21:26:33.000Z",
|              "type": "comment",
|              "author": "galaxyLogic",
|              "text": "<p>I read this from Wikipedia...",
|              "children": [
|                {
|                  "id": 27944746,
|                  "created_at": "2021-07-24T21:49:07.000Z",
|                  "type": "comment",
|                  "author": "hughrr",
|                  "text": "<p>Yes although I ...",
|                  "children": []
|                }
|              ]
|            }
|          ]
|        },
|        phrase(flat_json(JSON), Cs),
|        maplist(portray_clause, Cs).
|
| yielding the flat list of entries, as desired:
|
|     [{("id":27941108,"created_at":"2021-07-24T14:15:05.000Z","type":"story","author":"edward",...)},
|      {("id":27942287,"created_at":"2021-07-24T16:31:18.000Z","type":"comment",...)},
|      {("id":27944615,"created_at":"2021-07-24T21:26:33.000Z","type":"comment",...)},
|      {("id":27944746,"created_at":"2021-07-24T21:49:07.000Z","type":"comment",...)}]
| ec109685 wrote:
| Any way to avoid the need to enumerate the json fields that come
| before children with those placeholders? Otherwise, it will be
| brittle to modifications.
|
| Prolog was the one language I couldn't get my head around in
| the programming languages class I took at school.
| jeffbee wrote:
| I often use jq for random hacks but for something like this I
| would turn to XPath. I know that sounds impossibly retro, but
| XPath 3.1 for JSON is awesome and its language makes so much more
| sense than jq's. There are several good implementations, and all
| of them are faster than jq, too.
| pcr910303 wrote:
| A bit offtopic, but I don't see many people knowing/using the
| Algolia API[0]. It's much better to use than the HN official
| API[1], since it returns the whole tree data in one request.
|
| Unfortunately (I guess this is a big reason why people don't use
| it), it doesn't sort the comments - if you need the order,
| you'll have to parse HN HTML (or just use the official API).
|
| Still, just two requests (the HN site, the Algolia API) is much
| better than recursively making a hundred requests, so I use
| this approach in my client[2].
|
| [0]: https://hn.algolia.com/api
|
| [1]: https://github.com/HackerNews/API
|
| [2]: https://github.com/goranmoomin/HackerNews
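A rough sketch of the single-request approach described above, assuming the item endpoint quoted earlier in this thread (`https://hn.algolia.com/api/v1/items/<id>`) and a response that nests replies under `children`. The helper names are made up, and `fetch_tree` is defined but not called so the example runs offline on a small stand-in tree.

```python
import json
from urllib.request import urlopen

ALGOLIA_ITEM_URL = "https://hn.algolia.com/api/v1/items/{item_id}"

def item_url(item_id):
    """Build the item URL for one story or comment id."""
    return ALGOLIA_ITEM_URL.format(item_id=item_id)

def fetch_tree(item_id):
    """Fetch a story and all nested comments in one request (needs network access)."""
    with urlopen(item_url(item_id)) as response:
        return json.load(response)

def count_comments(item):
    """Count every descendant of item by walking the nested children lists."""
    children = item.get("children", [])
    return len(children) + sum(count_comments(child) for child in children)

# Offline stand-in for fetch_tree(27941108), so the sketch runs without a network.
sample = {"id": 27941108, "children": [
    {"id": 27942287, "children": []},
    {"id": 27944615, "children": [{"id": 27944746, "children": []}]},
]}
total = count_comments(sample)
print(total)
```

With the whole tree in hand, any further walking is local recursion rather than one HTTP round trip per comment.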
| ur-whale wrote:
| There's a real need for a tool like jq, but jq unfortunately
| isn't it.
|
| What I mean is this: the functionality offered by jq (parsing
| json on the command line and extracting what you need from it) is
| really needed in many modern data processing tasks, but jq's DSL
| is one of the most horrible things I've had to learn in recent
| years.
|
| The only way a casual user of that thing can hope to succeed in
| actually crafting a working jq query is to pray that there is a
| Stack Overflow topic answering his exact need.
|
| Here's the way I use jq 99% of the time:
|
|     cat somefile.json | jq . | <long pipeline of traditional, well thought-out unix text processing tools such as sed, grep, awk, cut, etc...>
|
| And when this doesn't cut it, I write a python script.
| matthewtovbin wrote:
| Forget about jq. JSONata is much more powerful -
| http://docs.jsonata.org/overview.html
| turbocon wrote:
| Yea what's going on here, you have a couple of other comments
| pushing Jsonata?
|
| You an author or something?
| sidpatil wrote:
| jq is a single binary which I can easily just drop into a
| remote server or throw into my ~/.local/bin, whereas JSONata
| requires NPM.
| [deleted]
| xcambar wrote:
| I love jq for many reasons:
|
| * it is very potent
|
| * it has extensive documentation, written more like parables and
| new-age sorcery than actually helpful content
|
| * it improves your search skills greatly, for many internet
| results try to make sense of its format
|
| * it gives great satisfaction after you've died-and-retried 300x
| a whole afternoon for a command you'll only need once.
|
| * it looks cool to use jq instead of [any language you're already
| familiar with].
|
| Yes, I'm being sarcastic, yet honest. Note that I still use it
| and recommend it for simpler use cases though.
|
| _[shrug]_
| cinntaile wrote:
| Jq makes me feel stupid, it's comforting to know that it's
| probably not just me.
| damagednoob wrote:
| I too find the jq syntax arcane and I've found
| https://jqplay.org/ to be an invaluable help.
| yewenjie wrote:
| Can someone comment on how `jq` compares with `fx`?
|
| - https://github.com/antonmedv/fx
___________________________________________________________________
(page generated 2021-08-01 23:00 UTC)