[HN Gopher] DuckDB as the New jq
       ___________________________________________________________________
        
       DuckDB as the New jq
        
       Author : pgr0ss
       Score  : 323 points
       Date   : 2024-03-21 18:19 UTC (1 day ago)
        
 (HTM) web link (www.pgrs.net)
 (TXT) w3m dump (www.pgrs.net)
        
       | sshine wrote:
       | Very cool!
       | 
       | I am also a big fan of jq.
       | 
       | And I think using DuckDB and SQL probably makes a lot of sense in
       | a lot of cases.
       | 
       | But I think the examples are very geared towards being better
       | solved in SQL.
       | 
       | The ideal jq examples are combinations of filter (select), map
       | (map) and concat (.[]).
       | 
        | For example, finding the right download link:
        | 
        |   $ curl -s https://api.github.com/repos/go-gitea/gitea/releases/latest \
        |     | jq -r '.assets[]
        |              | .browser_download_url
        |              | select(endswith("linux-amd64"))'
        |   https://github.com/go-gitea/gitea/releases/download/v1.15.7/gitea-1.15.7-linux-amd64
       | 
        | Or extracting the KUBE_CONFIG of a DigitalOcean Kubernetes
        | cluster from Terraform state:
        | 
        |   $ jq -r '.resources[]
        |            | select(.type == "digitalocean_kubernetes_cluster")
        |            | .instances[].attributes.kube_config[].raw_config' \
        |     terraform.tfstate
        |   apiVersion: v1
        |   kind: Config
        |   clusters:
        |   - cluster:
        |       certificate-authority-data: ...
        |       server: https://...k8s.ondigitalocean.com
        |   ...
        
         | pgr0ss wrote:
          | I think that's a fair point. Unnesting arrays in SQL can be
          | annoying. Here is your first example with duckdb:
          | 
          |   duckdb -c \
          |     "select * from ( \
          |        select unnest(assets)->>'browser_download_url' as url \
          |        from read_json('https://api.github.com/repos/go-gitea/gitea/releases/latest') \
          |      ) \
          |      where url like '%linux-amd64'"
        
           | _flux wrote:
            | In case someone else was wondering, one can get shell-
            | consumable output from that with
            | 
            |   duckdb -noheader -list
        
       | hprotagonist wrote:
       | i've been using simonw's sqlite-utils (https://sqlite-
       | utils.datasette.io/en/stable/) for this sort of thing; given
       | structured json or jsonl, you can throw data at an in-memory
       | sqlite database and query away: https://sqlite-
       | utils.datasette.io/en/stable/cli.html#queryin...
        
       | NortySpock wrote:
       | In a similar vein, I have found Benthos to be an incredible
       | swiss-army-knife for transforming data and shoving it either into
       | (or out of) a message bus, webhook, or a database.
       | 
       | https://www.benthos.dev/
        
         | krembo wrote:
          | How does this differ from filebeat?
        
           | NortySpock wrote:
           | I don't know which filebeat you are referring to...
           | 
           | https://github.com/elastic/beats/tree/master/filebeat
           | 
            | This one? I only looked for a moment, but filebeat appears to
            | be ingestion only. Benthos does input, output, side-effects,
            | stream-stream joins, metrics-on-the-side, tiny-json-wrangling
            | webhooks, and more. I find it to be like plumber's putty,
            | closing over tooling gaps and smoothing rough edges where
            | ordinarily you'd have to write 20 lines of stream processing
            | code and 300 lines of error handling, reporting, and
            | performance hacks.
        
         | esafak wrote:
          | I wish it were not based on YAML. Pipelines are code, not
          | configuration!
        
       | pletnes wrote:
       | Worth noting that both jq and duckdb can be used from python and
       | from the command line. Both are very useful data tools!
        
       | haradion wrote:
       | I've found Nushell (https://www.nushell.sh/) to be really handy
       | for ad-hoc data manipulation (and a decent enough general-purpose
       | shell).
        
         | wraptile wrote:
         | Nushell is really good but the learning curve is massive.
         | 
          | I've been on nushell for almost a year now and still struggle
          | to put more complex commands together. The docs are huge but
          | not very good, and the community resources are very limited
          | (it's on Discord, smh), unfortunately. So if anyone wants to
          | get into it, you really need to put in a few days to understand
          | the whole syntax suite, but it's worth it!
        
       | HellsMaddy wrote:
       | Jq tip: Instead of `sort_by(.count) | reverse`, you can do
       | `sort_by(-.count)`
        
         | philsnow wrote:
          | only if you're sure that .count is never null:
          | 
          |   $ echo '[{"a": {"count": null}}]' | jq -c 'sort_by(-.count)'
          |   jq: error (at <stdin>:1): null (null) cannot be negated
          |   $ echo '[{"a": {"count": null}}]' | jq -c 'sort_by(.count) | reverse'
          |   [{"a":{"count":null}}]
        
           | mdaniel wrote:
            | this whole thread is like nerd sniping me :-D but I felt
            | compelled to draw attention to jq's coalesce operator, because
            | I only recently learned about it and searching for the word
            | "coalesce" in the man page is pfffft (its official name is
            | "alternative operator", with "alternative" being for false and
            | null)
            | 
            |   $ echo '[{"a": {"count": null}}]' | jq -c 'sort_by(-(.count//0))'
            |   [{"a":{"count":null}}]
        
       | jeffbee wrote:
       | I tried this and it just seems to add bondage and discipline that
       | I don't need on top of what is, in practice, an extremely chaotic
       | format.
       | 
       | Example: trying to pick one field out of 20000 large JSON files
       | that represent local property records.
       | 
        |   % duckdb -json -c "select apn.apnNumber from read_json('*')"
        |   Invalid Input Error: JSON transform error in file "052136400500",
        |   in record/value 1: Could not convert string
        |   'fb1b1e68-89ee-11ea-bc55-0242ad1302303' to INT128
       | 
       | Well, I didn't want that converted. I just want to ignore it.
       | This has been my experience overall. DuckDB is great if there is
       | a logical schema, not as good as jq when the corpus is just data
       | soup.
        
       | hu3 wrote:
       | Related, clickhouse local cli command is a speed demon to parse
       | and query JSON and other formats such as CSV:
       | 
       | - "The world's fastest tool for querying JSON files"
       | https://clickhouse.com/blog/worlds-fastest-json-querying-too...
       | 
       | - "Show HN: ClickHouse-local - a small tool for serverless data
       | analytics" https://news.ycombinator.com/item?id=34265206
        
         | mightybyte wrote:
         | I'll second this. Clickhouse is amazing. I was actually using
          | it today to query some CSV files. I had to refresh my memory on
          | the syntax, so if anyone is interested:
          | 
          |   clickhouse local -q "SELECT foo, sum(bar) FROM file('foobar.csv', CSV)
          |     GROUP BY foo FORMAT Pretty"
         | 
         | Way easier than opening in Excel and creating a pivot table
         | which was my previous workflow.
         | 
         | Here's a list of the different input and output formats that it
         | supports.
         | 
         | https://clickhouse.com/docs/en/interfaces/formats
        
           | gkbrk wrote:
            | You don't even need to use file() for a lot of things
            | recently. These just work with clickhouse local. Even
            | wildcards work.
            | 
            |   select * from `foobar.csv`
            | 
            | or
            | 
            |   select * from `monthly-report-*.csv`
        
             | mightybyte wrote:
             | Ooh very nice, thanks for the tip!
        
             | wwader wrote:
              | Just had to try:
              | 
              |   $ function _select_aux () { clickhouse local -q "SELECT $* FORMAT Pretty" }
              |   $ alias SELECT='noglob _select_aux'
              |   $ SELECT COUNT(*) as count FROM file('repos.json', JSON)
              |   +-------+
              |   | count |
              |   +-------+
              |   |    30 |
              |   +-------+
        
       | hermitcrab wrote:
       | if you want a very visual way to transform JSON/XML/CSV/Excel etc
       | in a pipeline it might also be worth looking at Easy Data
       | Transform.
        
       | mritchie712 wrote:
        | You can also query (public) Google Sheets [0]
        | 
        |   SELECT *
        |   FROM read_csv_auto(
        |     'https://docs.google.com/spreadsheets/export?format=csv&id=1GuEPkwjdICgJ31Ji3iUoarirZNDbPxQj_kf7fd4h4Ro',
        |     normalize_names=True);
       | 
       | 0 - https://x.com/thisritchie/status/1767922982046015840?s=20
        
       | dudus wrote:
        | DuckDB parses JSON using yyjson internally.
       | 
       | https://github.com/ibireme/yyjson
        
       | ec109685 wrote:
        | While jq's syntax can be hard to remember, ChatGPT does an
        | excellent job generating jq from an example json file and a
        | description of how you want it parsed.
        
       | xg15 wrote:
       | The most effective combination I've found so far is jq + basic
       | shell tools.
       | 
       | I still think jq's syntax and data model is unbelievably elegant
       | and powerful once you get the hang of it - but its "standard
       | library" is unfortunately sorely lacking in many places and has
       | some awkward design choices in others, which means that a lot of
       | practical everyday tasks - such as aggregations or even just set
       | membership - are a lot more complicated than they ought to be.
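
A sketch of what that aggregation looks like when done entirely in jq, for comparison (the sample input is made up; assumes jq is installed and that group_by's implicit sort is acceptable):

```shell
echo '[{"license":{"key":"mit"}},{"license":{"key":"mit"}},{"license":{"key":"bsd-3-clause"}}]' \
  | jq -c 'group_by(.license.key) | map({license: .[0].license.key, count: length})'
# → [{"license":"bsd-3-clause","count":1},{"license":"mit","count":2}]
```

It works, but the `.[0]`-to-recover-the-group-key dance is exactly the kind of awkwardness the comment is describing.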
       | 
        | Luckily, what jq can do _really well_ is bringing data of
        | interest into a line-based text representation, which is ideal
        | for all kinds of standard unix shell tools - so you can just use
        | those to take over the parts of your pipeline that would be hard
        | to do in "pure" jq.
       | 
       | So I think my solution to the OP's task - get all distinct OSS
       | licenses from the project list and count usages for each one -
       | would be:
       | 
       | curl ... | jq '.[].license.key' | sort | uniq -c
       | 
       | That's it.
        
         | pcthrowaway wrote:
         | > I still think jq's syntax and data model is unbelievably
         | elegant and powerful once you get the hang of it - but its
         | "standard library" is unfortunately sorely lacking in many
         | places
         | 
         | After a few years of stalled development, jq has been taken
         | over recently by a new team of maintainers and is rapidly
         | working through a lot of longstanding issues
         | (https://github.com/jqlang/jq), so I'm not sure if this is
         | still the case
        
           | xg15 wrote:
            | Wasn't aware of that - that's great to hear! I think if
            | there is one utility that deserves a great maintainer team,
            | it's this one. And if we see some actual improvements in the
            | future, that would be awesome!
            | 
            | I have a list of pet peeves that I'd really like to see
            | fixed, so I'm gonna risk a bit of hope.
        
         | jimbokun wrote:
         | The Unix philosophy continues to pass the test of time.
        
           | pphysch wrote:
            | Yes and no. Many UNIX philosophy proponents abhor powerful
            | binaries like jq and awk.
        
         | eru wrote:
         | > I still think jq's syntax and data model is unbelievably
         | elegant and powerful once you get the hang of it [...]
         | 
         | It's basically just functional programming. (Or what you would
         | get from a functional programmer given the task of writing such
         | a tool as jq.)
         | 
         | That's not to diminish jq, it's a great tool. I love it!
        
         | salmo wrote:
         | As an old Unix guy this is exactly how I see jq: a gateway to a
         | fantastic library of text processing tools. I see a lot of
         | complicated things done inside the language, which is a valid
         | approach. But I don't need it to be a programming language
         | itself, just a transform to meet my next command after the
         | pipe.
         | 
         | If I want logic beyond that, then I skip the shell and write
         | "real" software.
         | 
         | I personally find those both to be more readable and easier to
         | fit in my head than long complex jq expressions. But that's
         | completely subjective and others may find the jq expression
         | language easier to read than shell or (choose your programming
         | language).
        
           | digdugdirk wrote:
           | Your comment made me go look up jq (even more than the
           | article did) and the first paragraph of the repo [0] feels
           | like a secret club's secret language.
           | 
            | I'm very interested, but not a Linux person. Do you know of
            | any good resources for learning the Linux shell as a
            | programming language?
           | 
           | [0] https://jqlang.github.io/jq/
        
             | salmo wrote:
              | I'll say, I did shell scripting for years from copy/paste,
              | cribbing from smarter people, and reading online guides.
              | But I didn't really understand until I read The Unix
              | Programming Environment by Brian Kernighan and Rob Pike.
             | 
             | It's a very old book and the audience was using dumb
              | terminals. But it made me understand why and how. I think
              | I've read every Kernighan book at this point, and most he
              | was involved in, because he is just so amazing at not just
              | conveying facts but teaching how to think idiomatically in
              | the topic.
             | 
             | I also used awk for 2 decades, kind of like how I use jq
             | now. But when I read his memoir I suddenly "got it." What I
             | make with it now is intentional and not just me banging on
             | the keyboard until it works. A great middle ground for
             | something a little sophisticated, but not worth writing a
             | full program for.
             | 
             | Something else that helped me was to install a minimal
             | distro... actually a base FreeBSD install would be great...
             | and read the man pages for all the commands. I don't
             | remember the details, but I learned that things existed. I
             | have many man pages that I look at the same options on
             | every few months because I'm not positive I remember right.
             | Heck, I 'man test' all the time still. ('test' and '[' are
             | the same thing)
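
The 'test' / '[' equivalence is easy to see at the prompt; '[' is just the same check spelled with a mandatory closing ']':

```shell
# both forms run the same string-nonempty test and print "yes"
test -n "hello" && echo yes
[ -n "hello" ] && echo yes
```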
             | 
             | I also had an advantage of 2 great coworkers. They'd been
             | working on Unix since the 80s and their feedback helped me
             | be more efficient, clean, and avoid "useless use of cat"
             | problems.
             | 
             | I also highly recommend using shellcheck. I sometimes
             | disagree with it when I'm intentionally abusing shell
             | behavior, but it's a great way to train good habits and
             | prevent bugs that only crop up with bad input, scale, etc.
             | I get new devs to use it and it's helped them "ramp up"
             | quickly, with me explaining the "why" from time to time.
             | 
             | But yeah. The biggest problem I see is that people think
             | there is more syntax than there really is (like my test and
             | [ comment). And remember it's all text, processes, and
             | files. Except when we pretend it's not ;).
        
               | pcthrowaway wrote:
               | > The Unix Programming Environment
               | 
               | How does this compare to The Art of Unix Programming, if
               | you've read both?
        
               | salmo wrote:
               | I don't find that book to be very useful at all.
               | 
               | I'm kind of annoyed by the bait and switch of the title.
               | It's a play on Knuth's classic but then turns into
               | showing why Unix/Linux is better than Windows, etc.
               | 
               | As a disclaimer: I really don't respect ESR and his work,
               | and admire Brian Kernighan immensely. Very odd to be in a
               | situation where those names are put side by side. Just
               | want to call out that I do have bias on the people here.
               | Don't want to get into why as that's not constructive.
        
               | pcthrowaway wrote:
               | I wasn't aware of the bait and switch at the time I read
               | it, but I did really enjoy the history of how the
               | Unix/Linux ethic came together and evolved over time. Had
               | I heard of The Unix Programming Environment when I read
               | it in 2014 I may have gone with that instead, as I was
               | looking for something more along the lines of a technical
               | handbook rather than a code of ethics.
        
               | salmo wrote:
               | Yeah and ESR can be revisionist in his history,
               | projecting intention on something organic. He alienated a
               | lot of people over time with this... and other behavior.
               | 
               | The book I recommended is both a handbook and a "how to
               | think." It applies forward to things introduced well
               | after the book. But it also helped me understand why the
               | Byzantine behavior of a tty is what it is.
               | 
               | If you are interested in the history from a first person
               | perspective, I do recommend Kernighan's "Unix: A History
               | and a Memoir". He went from originally trying to write
               | something objective to realizing it was necessarily his
               | personal experience. Even the culture aspect of his story
               | has influenced how I try to foster teamwork. It was an
               | engaging read for me.
        
               | arp242 wrote:
                | I felt it was a good dive into "good" programs at a bit
                | of a higher level, rather than "here's how to do X". Quite
               | a bit applies to Windows software and other software that
               | never touches Unix as well.
               | 
               | Some bits are better than others, some bits haven't aged
               | too well in the last 20 years, and it's a shame esr has
               | since turned crazy. But I still feel the book holds up
               | reasonably well.
               | 
               | "Bait and switch" certainly seems too strong of an
               | accusation, especially for a book that's available for
               | free.
               | 
               | I do agree that even pre-crazy esr was never on the level
               | of Kernighan in any way.
        
               | metadat wrote:
               | Really love your comment, so much that I wanted to check
               | out the books you mentioned.
               | 
               | After searching z-lib for "The UNIX Programming
               | Environment", all I found was a janky and grainy PDF.
               | Then I searched archive.org and discovered this high
               | fidelity PDF version:
               | 
               | https://archive.org/details/UnixProgrammingEnviornment
               | 
               | Note: Sadly, the EPUB version is 10x larger (370MB) and
               | is corrupted, not able to be opened / viewed.
        
             | NortySpock wrote:
             | So, grab yourself a Linux box (I suggest Debian), a large
             | CSV file or JSON lines file you need to slice up, and an
             | hour of time, and start trying out some bash one-liners on
             | your data. Set some goals like "find the Yahoo email
             | addresses in the data and sort by frequency" or "find error
             | messages that look like X" or "find how many times Ben
             | Franklin mentions his wife in his autobiography"
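
A goal of that last shape boils down to a grep one-liner; here is a self-contained sketch (the sample text is made up; grep -c counts matching lines, -i ignores case):

```shell
printf 'my wife said\nlater my Wife agreed\nno mention here\n' \
  | grep -ci 'my wife'
# → 2
```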
             | 
             | Here's the thing. These tools have been used since the '70s
             | to slice, dice and filter log files, CSVs, or other semi-
              | structured data. They can be chained together with the
              | pipe operator. Sysadmins were going through 100MB logs
              | with these tools before CPUs hit the gigahertz mark.
             | 
             | These tools are blisteringly fast, and they are basically
             | installed on every Linux machine.
             | 
             | https://github.com/onceupon/Bash-Oneliner
             | 
             | And for a different play-by-play example:
             | 
             | https://adamdrake.com/command-line-tools-can-
             | be-235x-faster-...
        
             | coldtea wrote:
             | > _Your comment made me go look up jq (even more than the
             | article did) and the first paragraph of the repo [0] feels
             | like a secret club 's secret language._
             | 
              | Or the language of one of the oldest and most widespread
              | open clubs in computing :)
             | 
             | "jq is like sed for JSON data - you can use it to slice and
             | filter and map and transform structured data with the same
             | ease that sed, awk, grep and friends let you play with
             | text."
             | 
             | Translation:
             | 
             | JQ is like a (UNIX/POSIX staple command line text-
             | manipulation tool) but specialized for text structured in
             | JSON format. You can use it to extract parts of a JSON
             | document (slice), keep nodes based on some criteria
             | (filter), transform each element in a list of structured
             | data to get a new list with the transformed versions (map),
             | and do that as easily as you can with the sed (basic
             | command line text manipulation program), awk (command line
             | text manipulation program with a full featured text-
             | processing oriented language), grep (command line program
             | to search for strings), and other assorted unix userland
             | programs.
        
         | mightybyte wrote:
         | Your command line solution doesn't give quite the same result
         | as OP. The final output in OP is sorted by the count field, but
         | your command line incantation doesn't do that. One might
         | respond that all you need to do is add a second "| sort" at the
         | end, but that doesn't quite do it either. That will use string
         | sorting instead of proper numeric sorting. In this example with
         | only three output rows it's not an issue. But with larger
         | amounts of data it will become a problem.
         | 
         | Your fundamental point about the power of basic shell tools is
         | still completely valid. But if I could attempt to summarize
         | OP's point, I think it would be that SQL is more powerful than
         | ad-hoc jq incantations. And in this case, I tend to agree with
         | OP. I've made substantial use of jq and yq over the course of
         | years, as well as other tools for CSVs and other data formats.
         | But every time I reach for them I have to spend a lot of time
         | hunting the docs for just the right syntax to attack my
         | specific problem. I know jq's paradigm draws from functional
         | programming concepts and I have plenty of personal experience
          | with functional programming, but the syntax still feels very
          | ad hoc and clunky.
         | 
         | Modern OLAP DB tools like duckdb, clickhouse, etc that provide
         | really nice ways to get all kinds of data formats into and out
         | of a SQL environment seem dramatically more powerful to me.
         | Then when you add the power of all the basic shell tools on top
         | of that, I think you get a much more powerful combination.
         | 
          | I like this example from the clickhouse-local documentation:
          | 
          |   $ ps aux | tail -n +2 | awk '{ printf("%s\t%s\n", $1, $4) }' \
          |     | clickhouse-local --structure "user String, mem Float64" \
          |       --query "SELECT user, round(sum(mem), 2) as memTotal
          |                FROM table GROUP BY user ORDER BY memTotal DESC FORMAT Pretty"
        
           | tleb_ wrote:
           | For reference, sort(1) has -n, --numeric-sort: compare
           | according to string numerical value.
        
           | xg15 wrote:
            | You can achieve that by appending sort -n, so the whole
            | thing becomes:
            | 
            | curl ... | jq '.[].license.key' | sort | uniq -c | sort -n
           | 
            | You can even turn it back into json by exploiting the fact
            | that when uniq -c gets lines of json as input, its output
            | will be "accidentally" parseable as a sequence of json
            | literals by jq, where every second literal is a count. You
            | can use jq's (very weird) _input_ function to transform each
            | pair of literals into a "proper" json object:
            | 
            | curl ... | jq '.[].license.key' | sort | uniq -c | sort -n |
            |   jq '{"count":., "value":input}'
        
         | DanielHB wrote:
         | found out recently that jq can url-encode values, is there
         | anything it _can't_ do?
        
           | xg15 wrote:
            | Finding out whether . is contained in a given array or not,
            | evidently.
            | 
            | (That's not strictly true - you _can_ do it, you just have to
            | bend over backwards for what is essentially the "in" keyword
            | in python, sql, etc. jq has no fewer than four functions that
            | look like they should do that - in(), has(), contains() and
            | inside() - yet they all do something slightly different)
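
A quick demo of the mismatch (assumes jq >= 1.5; note that contains() does substring matching on strings inside arrays, which is rarely what a membership test wants, while IN() is the closest thing to Python's "in"):

```shell
echo '["foo","bar"]' | jq 'contains(["fo"])'       # true -- substring match, not membership!
echo '"foo"' | jq 'IN("foo","bar")'                # true -- actual membership test
echo '"baz"' | jq '. as $x | ["foo","bar"] | index($x) != null'   # false -- the pre-1.5 workaround
```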
        
       | JeremyNT wrote:
       | I have a lot of trouble understanding the benefits of this versus
       | just working with json with a programming language. It seems like
       | you're adding another layer of abstraction versus just dealing
       | with a normal hashmap-like data structure in your language of
       | choice.
       | 
       | If you want to work with it interactively, you could use a
       | notebook or REPL.
        
         | edu_guitar wrote:
          | if you are used to the command line and know some basic
          | syntax, it is less verbose than opening a REPL and reading a
          | file. The fact that you can pipe the json data into it is also
          | a plus, making it easier to quickly check whether the response
          | of a curl call has the fields/values you were expecting. Of
          | course, if you are more comfortable doing that from the REPL,
          | you get less value from learning jq. If you are fond of one-
          | liners, jq offers a lot of potential.
        
         | bdcravens wrote:
         | Pipelining CLI commands or bash scripts. From a security
         | perspective, it may be preferable to not ship with a runtime.
        
           | vips7L wrote:
           | bash and jq are both runtimes.
        
             | bdcravens wrote:
             | Very difficult (or often impractical) to not have a shell
             | at all, and jq is at least limited in scope, and has no
             | dependencies that need to be installed. Far better than a
             | full language with its own standard library and set of
             | dependencies to lock down.
        
           | jonfw wrote:
           | Use a compiled language like golang if you don't want to ship
           | with a runtime.
           | 
           | If you're willing to ship w/ bash then I don't understand the
           | opposition to JS. Either tool puts you in a scenario where
           | somebody who can exec into your env can do whatever they want
        
         | WuxiFingerHold wrote:
          | My thoughts as well:
          | 
          |   const response = await fetch("https://api.github.com/orgs/golang/repos");
          |   const repos = await response.json();
          | 
          |   const groups = Map.groupBy(repos, e => e?.license?.key);
          |   ...
        
       | ndr wrote:
        | If you like lisp, and especially clojure, check out babashka[0].
        | This is my first attempt, but I bet you can do something nicer
        | even if you keep forcing yourself to stay within a single pipe
        | command.
        | 
        |   cat repos.json | bb -e '
        |     (->> (-> *in* slurp (json/parse-string true))
        |          (group-by #(-> % :license :key))
        |          (map #(-> {:license (key %)
        |                     :count   (-> % val count)}))
        |          json/generate-string
        |          println)'
        | 
        | [0] https://babashka.org/
        
       | schindlabua wrote:
       | Shoutout to jqp, an interactive jq explorer.
       | 
       | https://github.com/noahgorstein/jqp
        
       | nf3 wrote:
        | I run a pretty substantial platform where I implemented
        | structured logging to SQLite databases. Each log event is stored
        | as a JSON object in a row. A separate database is kept for each
        | day. Daily log files are about 35GB, so that's quite a lot of
        | data to go through if you want to look for something specific.
        | Being able to index on specific fields, as well as express
        | searches as SQL queries, is a real game changer IMO.
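
A minimal sketch of that setup (table and field names here are hypothetical, not the commenter's; assumes a sqlite3 build with the JSON1 functions, which is standard in modern releases):

```shell
sqlite3 logs.db <<'SQL'
CREATE TABLE IF NOT EXISTS events(data TEXT);
-- expression index: lookups on this JSON field no longer scan every row
CREATE INDEX IF NOT EXISTS idx_level ON events(json_extract(data, '$.level'));
INSERT INTO events VALUES ('{"level":"error","msg":"boom"}');
SELECT json_extract(data, '$.msg') FROM events
  WHERE json_extract(data, '$.level') = 'error';
SQL
```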
        
       | rpigab wrote:
       | I love jq and yq, but sometimes I don't want to invest time in
       | learning new syntax and just fallback to some python one liner,
       | that can if necessary become a small python script.
       | 
        | Something like this (I have a version of it in a shell alias):
        | 
        |   python3 -c "import json,sys;d=json.load(sys.stdin);print(doStuff(d['path']['etc']))"
        | 
        | Pretty print is done with json.dumps.
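
For pretty-printing alone, the Python stdlib also ships a ready-made CLI wrapper, so no inline code is needed at all:

```shell
# json.tool reads JSON from stdin and re-emits it indented (4 spaces by default)
echo '{"a": 1, "b": [2, 3]}' | python3 -m json.tool
```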
        
       | phmx wrote:
        | There is also a way to import a table from stdin (see also
        | https://duckdb.org/docs/data/json/overview):
        | 
        |   cat my.json | duckdb -c "CREATE TABLE mytbl AS SELECT * FROM
        |     read_json_auto('/dev/stdin'); SELECT ... FROM mytbl"
        
       | Sammi wrote:
       | I work primarily in projects that use js and I mostly don't see
       | the point in working with json in other tools than js.
       | 
       | I have tried jq a little bit, but learning jq is learning a new
       | thing, which is healthy, but it also requires time and energy,
       | which is not always available.
       | 
        | When I want to munge some json I use js... because that is what
        | js is innately good at, and it's what I already know. A little js
        | script that does stdin/file read, then JSON.parse, then map and
        | filter some stuff, and at the end JSON.stringify to stdout/file
        | does the job 100% of the time in my experience.
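
That workflow fits in a one-liner too; a sketch (assumes node is installed, and the input JSON here is made up):

```shell
echo '[{"license":{"key":"mit"}},{"license":null}]' | node -e '
  let buf = "";
  process.stdin.on("data", d => buf += d);
  process.stdin.on("end", () => {
    const repos = JSON.parse(buf);
    // map + filter, then stringify back to stdout
    const keys = repos.map(r => r?.license?.key).filter(Boolean);
    process.stdout.write(JSON.stringify(keys));
  });'
# → ["mit"]
```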
       | 
       | And I can use a debugger or put in console logs when I want to
       | debug. I don't know how to debug jq or sql, so when I'm stuck I
       | end up going for js which I can debug.
       | 
        | Are there js developers who reach for jq when you are already
        | familiar with js? Is it because you are already strong in bash
        | and terminal usage? I think I get why you would want to use sql
        | if you are already experienced in sql. Sql is common and made for
        | data munging. Jq, however, is a new dsl, and I don't see what
        | limitation of existing js or sql it addresses.
        
         | wwader wrote:
          | I do quite a lot of adhoc/exploratory programming to query and
          | transform data, and jq is very convenient for that, as it
          | works very well with "deep" data structures and the language
          | itself is very composable.
          | 
          | To debug in jq you can use the debug function, which prints to
          | stderr, e.g. "123 | debug | ...", or "{a:123, b:456} |
          | debug({a}) | ...", which only prints the value of a: "{a:123}"
        
       | jonfw wrote:
       | My current team produces a CLI binary that is available on every
       | build system and everybody's dev machines
       | 
       | Whenever we're writing automation, if the code is nontrivial, or
       | if it starts to include dependencies, we move the code into the
       | CLI tool.
       | 
       | The reason we like this is that we don't want to have to version
       | control tools like duckdb across every dev machine and every
       | build system that might run this script. We build and version
       | control a single binary and it makes life simple.
        
       | snthpy wrote:
       | Hi,
       | 
       | I very much share your sentiment and I saw a few comments
       | mentioning PRQL so I thought it might be worth bringing up the
       | following:
       | 
       | In order to make working with data at the terminal as easy and
       | fun as possible, some time ago I created pq (prql-query) which
       | leverages DuckDB, DataFusion and PRQL.
       | 
       | Unfortunately I am currently not in a position to maintain it so
       | the repo is archived but if someone wanted to help out and
       | collaborate we could change that.
       | 
       | It doesn't have much in the way of json functions out-of-the-box
       | but in PRQL it's easy to wrap the DuckDB functions for that and
       | with the new PRQL module system it will soon also become possible
       | to share those. If you look through my HN comment history I did
       | provide a JSON example before.
       | 
       | Anyway, you can take a look at the repo here:
       | https://github.com/PRQL/prql-query
       | 
       | If interested, you can get in touch with me via Github or the
       | PRQL Discord. I'm @snth on both.
        
       ___________________________________________________________________
       (page generated 2024-03-22 23:02 UTC)