[HN Gopher] Extracting Objects Recursively with Jq
       ___________________________________________________________________
        
       Extracting Objects Recursively with Jq
        
       Author : edward
       Score  : 187 points
       Date   : 2021-08-01 14:59 UTC (8 hours ago)
        
 (HTM) web link (til.simonwillison.net)
 (TXT) w3m dump (til.simonwillison.net)
        
       | tejtm wrote:
       | To better appreciate the structure of the document the author is
       | dealing with (and to cast a bit of light on which words are
       | variables in the document and which are `jq` syntax), I offer a
       | shameless plug for a one-liner [0].
       | 
       | Well, I would, but the result is "too long for a HN comment", so
       | here a bunch is snipped out of the middle (unedited, the result
       | would currently be 140 lines):
       | 
       |     curl -s https://hn.algolia.com/api/v1/items/27941108 | ~/bin/json2jqpath.jq
       |     .
       |     .author
       |     .children
       |     .children|.[]
       |     .children|.[]|.author
       |     .children|.[]|.children
       |     .children|.[]|.children|.[]
       |     .children|.[]|.children|.[]|.author
       |     .children|.[]|.children|.[]|.children
       |     .children|.[]|.children|.[]|.children|.[]
       |     .children|.[]|.children|.[]|.children|.[]|.author
       |     .children|.[]|.children|.[]|.children|.[]|.children
       |     .children|.[]|.children|.[]|.children|.[]|.children|.[]
       |     <snip> ... </snip>
       |     .children|.[]|.children|.[]|.created_at
       |     .children|.[]|.children|.[]|.created_at_i
       |     .children|.[]|.children|.[]|.id
       |     .children|.[]|.children|.[]|.options
       |     .children|.[]|.children|.[]|.parent_id
       |     .children|.[]|.children|.[]|.points
       |     .children|.[]|.children|.[]|.story_id
       |     .children|.[]|.children|.[]|.text
       |     .children|.[]|.children|.[]|.title
       |     .children|.[]|.children|.[]|.type
       |     .children|.[]|.children|.[]|.url
       |     .children|.[]|.created_at
       |     .children|.[]|.created_at_i
       |     .children|.[]|.id
       |     .children|.[]|.options
       |     .children|.[]|.parent_id
       |     .children|.[]|.points
       |     .children|.[]|.story_id
       |     .children|.[]|.text
       |     .children|.[]|.title
       |     .children|.[]|.type
       |     .children|.[]|.url
       |     .created_at
       |     .created_at_i
       |     .id
       |     .options
       |     .parent_id
       |     .points
       |     .story_id
       |     .text
       |     .title
       |     .type
       |     .url
       | 
       | [0] https://github.com/TomConlin/json_to_paths
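A path listing like the one above can be approximated in a few lines of Python. This is a minimal sketch, not the json_to_paths implementation: `jq_paths` and `unique_paths` are hypothetical names, array indices are collapsed to `.[]` as in the output above, and keys that are not plain identifiers are quoted the way jq requires (`."some-key"`).

```python
import json
import re

# Keys matching this can be written as .key in jq; others need ."key".
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def jq_paths(value, prefix=""):
    """Yield a generalized jq path for every node, collapsing array indices to .[]."""
    yield prefix or "."
    if isinstance(value, dict):
        for key, child in value.items():
            step = "." + key if _IDENT.fullmatch(key) else "." + json.dumps(key)
            yield from jq_paths(child, f"{prefix}|{step}" if prefix else step)
    elif isinstance(value, list):
        for child in value:
            yield from jq_paths(child, f"{prefix}|.[]" if prefix else ".[]")


def unique_paths(doc):
    """Deduplicate (many array elements share one path) and sort as plain strings."""
    return sorted(set(jq_paths(doc)))
```

Given a document loaded with `json.load`, printing `"\n".join(unique_paths(doc))` should give a listing in the same spirit as the one above, though not necessarily identical to the tool's output.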
        
         | woodruffw wrote:
         | Thanks for linking this! I've wanted exactly this script, many
         | times.
        
       | jcims wrote:
       | jq seems incredibly powerful. I only find myself using it a few
       | times a year though, and have never been able to conceptualize
       | the syntax enough to use it without prodigious googling.
        
         | xcambar wrote:
         | Yes, jq is very powerful, yet incredibly inefficient to work
         | with when you haven't mastered its arcana.
        
         | minxomat wrote:
         | Think of it just like bash. jq is the ultimate functional
         | language and/or ETL tool. If I look back at larger jq
         | transforms I've written a while ago (e.g. https://git.io/JBSfB)
         | they still make perfect sense to me.
        
       | thangalin wrote:
       | An introductory script showing the jq, REST, and JSON trinity:
       | 
       | https://github.com/DaveJarvis/github-email/blob/master/githu...
        
       | baldeagle wrote:
       | The most powerful thing about jq, IMO, is that it can alter some
       | of the JSON without having to parse all of it and convert it into
       | an object. It is like using data transformation lasers.
        
       | uluyol wrote:
       | jq is a fantastic tool for exploring data and doing simple
       | transformations. I often wish it could consume/write data in
       | other formats.
        
         | dangerbird2 wrote:
         | In python-land, I like to use glom[1], which is basically jq
         | but operating on arbitrary Python data structures. I believe
         | there are bindings for jq in Python and other languages, which
         | would allow operating on native data structures, but I imagine
         | they just spawn jq as a subprocess, since it doesn't seem like
         | jq has a public C API.
         | 
         | [1] https://glom.readthedocs.io/en/latest/index.html
        
         | matthewtovbin wrote:
         | I moved to JSONata - http://docs.jsonata.org/overview.html
        
       | mftb wrote:
       | I'm happy to see that I'm not alone in my struggles with jq. I
       | wanted to love it right out of the box. It appears to be very
       | well engineered, but over and over again I have struggled with
       | its syntax.
       | 
       | What I think I want is a syntax closer to CSS selectors. What I
       | think I'm going to have to do is really stop and learn jq. It
       | looks like some of the links in here may help.
        
       | quickthrower2 wrote:
       | jq is a command-line JSON processor, in case, like me, you didn't
       | know and thought it was an abbreviation of jQuery.
        
       | psanford wrote:
       | JQ is often frustrating when you want to do something non-trivial
       | but you can't figure out how and the documentation is of little
       | help.
       | 
       | I think JQ could really benefit from having a classic programming
       | language style "book", like "The AWK Programming Language". JQ is
       | fundamentally a functional programming language with semantics
       | that are not obvious reading its current docs.
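One way to see the semantics being alluded to: every jq filter takes one input and produces a stream of zero or more outputs, `|` feeds each output of the left side into the right side, and `,` concatenates the output streams of two filters run on the same input. The toy model below encodes that with Python generators. It is an illustration under those assumptions, not how jq is implemented, and all function names are made up for this sketch.

```python
from typing import Any, Callable, Iterator

# A jq-style filter: one input value in, a stream of zero or more values out.
Filter = Callable[[Any], Iterator[Any]]


def identity(v: Any) -> Iterator[Any]:
    """jq: .  (exactly one output, the input itself)"""
    yield v


def field(name: str) -> Filter:
    """jq: .name  (null for a missing key or a null input)"""
    def f(v: Any) -> Iterator[Any]:
        yield None if v is None else v.get(name)
    return f


def each(v: Any) -> Iterator[Any]:
    """jq: .[]  (one output per array element or object value; may be zero)"""
    yield from (v.values() if isinstance(v, dict) else v)


def pipe(a: Filter, b: Filter) -> Filter:
    """jq: a | b  (run b on every output of a)"""
    def f(v: Any) -> Iterator[Any]:
        for out in a(v):
            yield from b(out)
    return f


def comma(a: Filter, b: Filter) -> Filter:
    """jq: a, b  (all outputs of a, then all outputs of b, same input)"""
    def f(v: Any) -> Iterator[Any]:
        yield from a(v)
        yield from b(v)
    return f


def recurse(step: Filter) -> Filter:
    """jq: recurse(step)  (emit the input, then recurse into each output of step)"""
    def f(v: Any) -> Iterator[Any]:
        yield v
        for child in step(v):
            yield from f(child)
    return f
```

With these, jq's `recurse(.children[]) | .id` corresponds to `pipe(recurse(pipe(field("children"), each)), field("id"))`, which yields the ids of a nested `children` tree parent-first, depth-first.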
        
         | doytch wrote:
         | I've also found the documentation tough to use when I need to
         | jump back in and answer something like "how do I _____?" It
         | feels like one of those sets of docs that's better approached
         | by just reading the whole damn thing, working through some
         | examples, etc. Studying the docs as opposed to referring to
         | them.
        
           | psanford wrote:
           | Right, I agree. But I think the docs, as currently written,
           | are not really meant to be read through like that. That's
           | why I mentioned "The AWK Programming Language" book,
           | which is excellently written, and easy to read from start to
           | end.
           | 
           | I don't mean to denigrate the current docs, writing
           | documentation is hard!
        
         | matthewtovbin wrote:
         | Try JSONata instead - http://docs.jsonata.org/overview.html
        
           | loa_in_ wrote:
           | It doesn't seem to provide a command-line tool, which is the
           | point of jq
        
       | hoshsadiq wrote:
       | I tend to use jq a lot. As others have said, sometimes jq can be
       | hard to grasp. Often it requires multiple attempts to get the
       | correct answer. To make it a little easier for me, I've written a
       | helper function[0] that combines it with fzf[1] to run jq as a
       | REPL on any JSON. It lets you incrementally refine your jq filter
       | without having to keep re-running jq yourself. This is similar to jid/jiq
       | but a little more powerful. It includes functions to change the
       | preview to output raw, compact (or not), and some other things.
       | 
       | It is essentially similar to jqplay but local.
       | 
       | I didn't use jid/jiq because jid uses go-simplejson, which is
       | nowhere near as powerful as jq, and jiq seemed very buggy when I
       | used it and it felt like it was hacked together. Plus there was
       | nowhere to change jq's arguments while running it.
       | 
       | I'm sure this function can be improved on, but this has been good
       | enough for me so far.
       | 
       | Also, I run gojq[2] instead of jq. It is a drop-in replacement
       | for jq but is written in Go, and has some improvements over jq
       | such as bug fixes, support for yaml input, and it also provides
       | more helpful error messages.
       | 
       | [0]
       | https://github.com/hoshsadiq/dot_files/blob/master/zshrc.d/m...
       | 
       | [1] https://github.com/junegunn/fzf
       | 
       | [2] https://github.com/itchyny/gojq/
        
         | alerighi wrote:
         | I'm the opposite, I find that jq doesn't have a reason to
         | exist, other than pretty printing JSON files on a terminal and
         | doing basic filtering on a JSON object and only as interactive
         | shell usage, NOT in a script.
         | 
         | It's the classic kind of tool, like sed, like awk, like a ton of
         | Unix utilities that seem easy to use at first; then you have to
         | do something complex and you start abusing them, piping things
         | multiple times into jq, and you end up writing things like this:
         | 
         |     echo $json | jq "something $(echo $variable | jq 'something else' | tr '"' '\'') | sed 's/"/\\'/g" | jq "another jq invocation" | awk '...' > file2.json
         | 
         | I stopped using jq after realizing that I was wasting my time
         | trying to fix a script that used jq and didn't manage quoting
         | correctly, trying different kinds of quotes, even filtering the
         | input with tr before passing it to jq. It's just another tool
         | prone to abuse, like sed, awk, tr, cut or similar things.
         | 
         | I asked myself why I was wasting my time on a tool with a
         | complex and limited DSL when I can write a clean Python script
         | in 10 minutes that does the same thing and is easier to write,
         | to read and, most importantly, to maintain.
         | 
         | To me, a script that has to manipulate JSON should be written in
         | a high-level programming language like Python, not cobbled
         | together with tools like jq. Even if there is an existing big
         | bash script that you don't want to rewrite, and you have to do
         | some JSON processing in it, you can write an inline Python
         | script like this:
         | 
         |     python3 <<'PYEND'
         |     # quoted delimiter: the shell leaves $ and backticks below alone
         |     ## your python code
         |     PYEND
         | 
         | Also jq is another dependency to a script that must be
         | installed.
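As a point of comparison for the "clean Python script" approach, here is a minimal sketch that flattens a nested comment tree such as the HN Algolia item JSON discussed in this thread, assuming each object keeps its replies in a `children` list. `flatten_items` and `flatten_json_text` are hypothetical helper names, not part of any library.

```python
import json
from typing import Any, Dict, Iterator


def flatten_items(item: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every object in the tree, depth-first (parent before replies),
    each as a shallow copy without its 'children' key."""
    yield {key: value for key, value in item.items() if key != "children"}
    for child in item.get("children") or []:
        yield from flatten_items(child)


def flatten_json_text(text: str) -> str:
    """Parse a JSON document and return the flattened list as pretty JSON."""
    return json.dumps(list(flatten_items(json.loads(text))), indent=2)
```

In a standalone script, `sys.stdout.write(flatten_json_text(sys.stdin.read()))` would let it sit at the end of a `curl` pipeline; whether this is easier to maintain than a one-line jq filter is exactly the judgment call this sub-thread is arguing about.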
        
           | Arnavion wrote:
           | If you have quoting issues, and going by the example you
           | posted, you haven't learned how to pass parameters to jq
           | correctly. You don't do string interpolation. You use `--arg`
           | and `--argjson`
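To make the `--arg`/`--argjson` point concrete: in shell it looks like `jq --arg who "$variable" '.[] | select(.author == $who)'`, where the value is bound to `$who` instead of being spliced into the filter text. The sketch below applies the same idea from Python and skips the shell entirely. It assumes a `jq` binary on `PATH` for the run step, and `jq_argv` and `run_jq` are hypothetical helpers.

```python
import json
import subprocess
from typing import Any, List


def jq_argv(program: str, **params: Any) -> List[str]:
    """Build a jq command line that binds each keyword as $name.

    Strings go through --arg (bound as JSON strings); anything else is
    JSON-encoded and passed with --argjson. Values are never spliced into
    the program text, so no shell or jq quoting is involved.
    """
    argv = ["jq"]
    for name, value in params.items():
        if isinstance(value, str):
            argv += ["--arg", name, value]
        else:
            argv += ["--argjson", name, json.dumps(value)]
    argv.append(program)
    return argv


def run_jq(program: str, input_json: str, **params: Any) -> str:
    """Run jq without a shell (requires jq on PATH) and return its stdout."""
    result = subprocess.run(
        jq_argv(program, **params),
        input=input_json,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `run_jq('[.[] | select(.author == $who)] | length', data, who=user_input)` works whatever quotes `user_input` contains, because the value travels as its own argv entry.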
        
           | ris wrote:
           | Speaking as a long-time python developer...
           | 
           | If you actually "get into" jq you find out that it's a
           | significantly neater language than it appears on the surface.
           | Firstly, it _does_ allow you to write multi-line scripts, and
           | things start to look a lot neater once you do. Secondly, it's
           | actually a real, working, functional programming language,
           | which allows very succinct expression of ideas which, in
           | python, would likely require the reader to track state across
           | explicit loops and the like.
           | 
           | Once you dig into the manual, you also tend to discover that
           | a lot of the things that cause you to string multiple jq
           | invocations together aren't actually necessary because there
           | are quite sensible ways of handling them in-language.
           | 
           | It's quite laughable though to tout python over jq because of
           | it adding a dependency. Perhaps if you're already embedded in
           | python-land and all your environments already have python -
           | but many (most?) of us are increasingly targeting extremely
           | minimal container image environments. In that case, adding
           | python is a much larger and more complex dependency than jq's
           | single 3.8MB binary.
           | 
           | As someone who has to read an awful lot of other people's
           | deployment scripts, it's also quite nice when I see jq
           | because it loudly advertises "all I'm doing here is mangling
           | one piece of json into another! no side effects!". I'd much
           | rather follow the thread of execution into that than some
           | mystery ruby script any day.
        
         | fisxoj wrote:
         | If anyone is an emacs user and this sounds compelling, I
         | recommend counsel-jq[0] for the sort of feedback loop described
         | here.
         | 
         | [0]: https://github.com/200ok-ch/counsel-jq
        
           | forty wrote:
           | I use counsel-jq occasionally, and my main issue with it is
           | the single line input. As soon as the filtering is not
           | trivial, it's much more convenient to be able to use multiple
           | lines.
        
         | rodorgas wrote:
         | That's cool! What's the use of FZF? Isn't it a fuzzy finder,
         | what are you searching for in a jq REPL?
        
           | hoshsadiq wrote:
           | It only uses FZF's preview. The suggestion list is completely
           | empty. I tried to find an alternative, as FZF has no way of
           | disabling the selector window, but I was unable to find
           | anything that was good enough for this.
           | 
           | I considered forking jid/jiq and using gojq as a library, but
           | I ended up not going down that route because of reasons that
           | I cannot remember. I also considered using a tui or something
           | but FZF has so much already implemented and has a lot of it
           | right, and I didn't particularly feel like re-inventing the
           | wheel.
        
         | rattray wrote:
         | This looks cool - would you be willing to share it in a brew-
         | installable format?
        
           | hoshsadiq wrote:
           | Not particularly. I don't use a Mac, but I'd be happy to
           | separate out the function into its own script so it can be
           | downloaded and put in your $PATH. I personally use zinit to
           | manage individual files from random repos.
        
         | psacawa wrote:
         | I've written about this before, but for tmux users, I have
         | interactive querying in the form of a ten-line shell script [1].
         | See it live [2]. It can easily be modified to interactively query
         | YAML, HTML with XPath or CSS, or text with awk or what have you,
         | and I have variants for each of these. This has the advantage
         | over the parent comment that you have your text editor's key
         | bindings instead of those of fzf.
         | 
         | Dependencies: tmux, nodemon, less, jq, vim
         | 
         | [1]:
         | https://gist.github.com/psacawa/e63c4e25a8b0405309d3a03b6b50...
         | 
         | [2]: https://streamable.com/jwdrqu
        
       | keymone wrote:
       | jq is nice, but the moment i need anything more complex than
       | "pull this attribute out of bunch of objects" i vastly prefer
       | spinning up an actual language runtime. or use a tool built
       | around a language (e.g. https://github.com/borkdude/jet) rather
       | than a language built around a tool.
        
       | vlmutolo wrote:
       | I love posts that are this information-dense. I know for a fact
       | that this will be useful to me some day.
        
       | doytch wrote:
       | I've really loved having jq at my disposal ever since learning
       | about it, but I feel like it took the combination of it and gron
       | [1] to really transform my debugging and JSON workflows.
       | 
       | 1: https://github.com/TomNomNom/gron
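gron's trick is to turn a JSON document into one assignment statement per value, each carrying its full path, so plain grep works on it (gron also has an `--ungron` mode to turn filtered lines back into JSON). A rough sketch of the forward direction in Python, producing output in the spirit of gron's rather than byte-for-byte identical; `gron_lines` is a hypothetical name.

```python
import json
import re
from typing import Any, Iterator

# Keys that can be written with dot notation; others use ["..."] brackets.
_IDENT = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def gron_lines(value: Any, path: str = "json") -> Iterator[str]:
    """Yield one 'path = value;' line per node, containers before their contents."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key in sorted(value):
            child = f"{path}.{key}" if _IDENT.fullmatch(key) else f"{path}[{json.dumps(key)}]"
            yield from gron_lines(value[key], child)
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, item in enumerate(value):
            yield from gron_lines(item, f"{path}[{index}]")
    else:
        # Scalars: json.dumps gives the JSON spelling (true, null, quoted strings).
        yield f"{path} = {json.dumps(value)};"
```

Because every line is self-describing, filtering is just `[line for line in gron_lines(doc) if "author" in line]`, or a pipe into grep.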
        
         | danso wrote:
         | I can't believe the number of times I've wanted to grep a JSON
         | file, and yet never thought to look and see if there was any
         | kind of tool for it. Thanks!
        
       | triska wrote:
       | JSON data is also a valid _Prolog_ term, and the declarative
       | programming language Prolog is ideally suited for handling tree-
       | shaped data.
       | 
       | Using, for example, Scryer Prolog, we can conveniently _relate_
       | the data to a flat list of items with Prolog's built-in grammar
       | mechanism, definite clause grammars (DCGs):
       | 
       |     flat_json(JSON) -->
       |             { JSON = {A,B,C,D,E,F,_:Cs} },
       |             [{A,B,C,D,E,F}],
       |             flat_items(Cs).
       | 
       |     flat_items([]) --> [].
       |     flat_items([I|Is]) -->
       |             { I = {(A,B,C,D,E,_:Cs)} },
       |             [{A,B,C,D,E}],
       |             flat_items(Cs),
       |             flat_items(Is).
       | 
       | Sample query, using the example JSON data from the article:
       | 
       |     ?- JSON = {
       |             "id": 27941108,
       |             "created_at": "2021-07-24T14:15:05.000Z",
       |             "type": "story",
       |             "author": "edward",
       |             "title": "Fun with Unix domain sockets",
       |             "url": "https://simonwillison.net/2021/Jul/13/unix-domain-sockets/",
       |             "children": [
       |                 {
       |                     "id": 27942287,
       |                     "created_at": "2021-07-24T16:31:18.000Z",
       |                     "type": "comment",
       |                     "author": "DesiLurker",
       |                     "text": "<p>one lesser known...",
       |                     "children": []
       |                 },
       |                 {
       |                     "id": 27944615,
       |                     "created_at": "2021-07-24T21:26:33.000Z",
       |                     "type": "comment",
       |                     "author": "galaxyLogic",
       |                     "text": "<p>I read this from Wikipedia...",
       |                     "children": [
       |                         {
       |                             "id": 27944746,
       |                             "created_at": "2021-07-24T21:49:07.000Z",
       |                             "type": "comment",
       |                             "author": "hughrr",
       |                             "text": "<p>Yes although I ...",
       |                             "children": []
       |                         }
       |                     ]
       |                 }
       |             ]
       |         },
       |        phrase(flat_json(JSON), Cs),
       |        maplist(portray_clause, Cs).
       | 
       | yielding the flat list of entries, as desired:
       | 
       |     [{("id":27941108,"created_at":"2021-07-24T14:15:05.000Z","type":"story","author":"edward",...)},
       |      {("id":27942287,"created_at":"2021-07-24T16:31:18.000Z","type":"comment",...)},
       |      {("id":27944615,"created_at":"2021-07-24T21:26:33.000Z","type":"comment",...)},
       |      {("id":27944746,"created_at":"2021-07-24T21:49:07.000Z","type":"comment",...)}]
        
         | ec109685 wrote:
         | Any way to avoid the need to enumerate the JSON fields that come
         | before children with those placeholders? Otherwise, it will be
         | brittle to modifications.
         | 
         | Prolog was the one language I couldn't get my head around in
         | the programming languages class I took at school.
        
       | jeffbee wrote:
       | I often use jq for random hacks but for something like this I
       | would turn to XPath. I know that sounds impossibly retro, but
       | XPath 3.1 for JSON is awesome and its language makes so much more
       | sense than jq's. There are several good implementations, and all
       | of them are faster than jq, too.
        
       | pcr910303 wrote:
       | A bit off-topic, but I don't see many people knowing about or
       | using the Algolia API[0]. It's much better to use than the official HN
       | API[1], since it returns the whole tree data in one request.
       | 
       | Unfortunately (I guess this is a big reason why people don't use
       | it), it doesn't sort the comments - if you need the orders,
       | you'll have to parse HN HTML (or just use the official API).
       | 
       | Still, just two requests (the HN site and the Algolia API) are
       | much better than a hundred recursive requests, so I use this
       | approach in my client[2].
       | 
       | [0]: https://hn.algolia.com/api
       | 
       | [1]: https://github.com/HackerNews/API
       | 
       | [2]: https://github.com/goranmoomin/HackerNews
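A sketch of the single-request approach in Python, using only the standard library and assuming the `https://hn.algolia.com/api/v1/items/<id>` endpoint used earlier in this thread, which returns the story with its replies nested under `children`. `fetch_thread` and `count_comments` are hypothetical helper names, and, as the comment says, the `children` lists are not in HN display order.

```python
import json
import urllib.request
from typing import Any, Dict

ALGOLIA_ITEM_URL = "https://hn.algolia.com/api/v1/items/{}"


def fetch_thread(item_id: int) -> Dict[str, Any]:
    """Fetch a story and its entire comment tree with one HTTP request."""
    with urllib.request.urlopen(ALGOLIA_ITEM_URL.format(item_id), timeout=30) as resp:
        return json.load(resp)


def count_comments(node: Dict[str, Any]) -> int:
    """Count all descendants reachable through nested 'children' lists."""
    return sum(1 + count_comments(child) for child in node.get("children") or [])
```

With network access, `count_comments(fetch_thread(27941108))` walks the whole tree locally after one request, where the official API would need one request per item.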
        
       | ur-whale wrote:
       | There's a real need for a tool like jq, but jq unfortunately
       | isn't it.
       | 
       | What I mean is this: the functionality offered by jq (parsing
       | json on the command line and extracting what you need from it) is
       | really needed in many modern data processing tasks, but jq's DSL
       | is one of the most horrible things I've had to learn in recent
       | years.
       | 
       | The only way a casual user of that thing can hope to succeed in
       | actually crafting a working jq query is to pray that there is a
       | Stack Overflow topic answering his exact need.
       | 
       | Here's the way I use jq 99% of the time:
       | 
       |     cat somefile.json | jq . | <long pipeline of traditional, well-thought-out unix text processing tools such as sed, grep, awk, cut, etc...>
       | 
       | And when this doesn't cut it, I write a python script.
        
       | matthewtovbin wrote:
       | Forget about jq. JSONata is much more powerful -
       | http://docs.jsonata.org/overview.html
        
         | turbocon wrote:
         | Yea what's going on here, you have a couple of other comments
         | pushing Jsonata?
         | 
         | You an author or something?
        
         | sidpatil wrote:
         | jq is a single binary which I can easily just drop into a
         | remote server or throw into my ~/.local/bin, whereas JSONata
         | requires NPM.
        
       | xcambar wrote:
       | I love jq for many reasons:
       | 
       | * it is very potent
       | 
       | * it has extensive documentation, written more like parables and
       | new-age sorcery than actually helpful content
       | 
       | * it improves your search skills greatly, as many internet
       | results try to make sense of its format
       | 
       | * it gives great satisfaction after you've died-and-retried 300x
       | a whole afternoon for a command you'll only need once.
       | 
       | * it looks cool to use jq instead of [any language you're already
       | familiar with].
       | 
       | Yes, I'm being sarcastic, yet honest. Note that I still use it
       | and recommend it for simpler use cases though.
       | 
       | _[shrug]_
        
         | cinntaile wrote:
         | Jq makes me feel stupid, it's comforting to know that it's
         | probably not just me.
        
         | damagednoob wrote:
         | I too find the jq syntax arcane and I've found
         | https://jqplay.org/ to be an invaluable help.
        
       | yewenjie wrote:
       | Can someone comment on how `jq` compares with `fx`?
       | 
       | - https://github.com/antonmedv/fx
        
       ___________________________________________________________________
       (page generated 2021-08-01 23:00 UTC)