hngopher.com

       [HN Gopher] JSONiq: JSON Query Language
       ___________________________________________________________________
        
       JSONiq: JSON Query Language
        
       Author : oever
       Score  : 91 points
       Date   : 2021-11-09 09:10 UTC (13 hours ago)
        
 (HTM) web link (www.jsoniq.org)
 (TXT) w3m dump (www.jsoniq.org)
        
       | markstos wrote:
       | The "Beginner's Tutorial" README has no code examples, and
       | suggests installing Anaconda, Jupyter and an old version of Zorba
       | to get started... beginning.
       | 
       | https://github.com/ghislainfourny/jsoniq-tutorial
       | 
       | Back button.
        
         | ghislainfourny wrote:
         | Thank you for your feedback. The tutorial was moved over to an
         | interactive notebook that you can find here:
         | 
         | https://colab.research.google.com/github/RumbleDB/rumble/blo...
         | 
         | The easiest way to get started locally is described here:
         | https://rumble.readthedocs.io/en/latest/Getting%20started/
         | 
         | We no longer recommend the installation of Anaconda -- instead,
         | the Spark tgz file can directly be downloaded, unzipped, and
         | the bin subdirectory added to the PATH, which is considerably
         | simpler. Likewise, the RumbleDB jar is just a download. Using
         | the RumbleDB shell is the easiest to set up; Jupyter and the
         | server require a bit of additional work.
         | 
         | For a cluster, this is even easier because most cloud platforms
         | can create one with the push of a button, and one only needs to
         | download the RumbleDB jar on the remote machine and get started
         | right away.
         | 
         | Use of RumbleDB on a cluster is explained here:
         | https://rumble.readthedocs.io/en/latest/Run%20on%20a%20clust...
        
           | ghislainfourny wrote:
           | There is also a docker version that should ease the process:
           | 
           | https://hub.docker.com/r/rumbledb/rumble
        
         | lichtenberger wrote:
         | https://nbviewer.org/github/ghislainfourny/jsoniq-tutorial/b...
        
       | vjust wrote:
       | I found it awkward to inject perl-like $variables in the middle
       | of a Python notebook. The association with anything-XML also is a
       | downer personally for me, its also a goodbye to lightweight. Yup,
       | this maybe a superficial observation, but the $ in a python-
       | script?
       | 
       | Jq works, both in python, and from the shell prompt.
       | https://pypi.org/project/jq/
        
       | danellis wrote:
       | Curious syntax. Why does it require sigils? Why does assignment
       | need both `let` and `:=`?
        
         | oever wrote:
         | It's derived from XQuery, a functional language for querying
         | and creating XML and JSON. (I realize that that's not an answer
         | to your question.)
        
         | emmanueloga_ wrote:
         | The look and feel of this language is due to its close
         | relationship with XQuery and the XML world, including the
         | syntax.
         | 
         | From [1]:
         | 
         |  _XQuery 3.1 was designed with the goal to support additional
         | data structures (maps, arrays) in memory. These structures are
         | mapped to JSON for input and output. XQuery 3.1 has been a W3C
         | recommendation since March 2017._
         | 
         |  _JSONiq was designed with the goal of querying and updating
         | JSON in settings such as document stores. It was also designed
         | by members of the XML Query working group (disclaimer: I am one
         | of them) while investigating various possibilities to support
         | JSON._
         | 
         | 1: https://stackoverflow.com/questions/44919443/what-are-the-
         | di...
        
         | ghislainfourny wrote:
         | Thank you for your question.
         | 
         | This is, in fact, not an assignment, but a variable binding
         | that is highly optimizable by an execution engine. The let
         | clause is part of the FLWOR expression, works in orchestration
         | with for, where, group by, order by, count, return; the ability
         | to bind variables while doing relational algebra is a feature
         | that is often missed in SQL.
         | 
         | JSONiq is functional and, in its core (non scripting) version,
         | does not allow modifying variable values.
        
         | lichtenberger wrote:
         | If you don't want both, then use
         | 
         | https://github.com/sirixdb/brackit
         | 
         | for instance :)
        
       | Felk wrote:
       | Can someone explain to me what the main differences to jq are,
       | besides the syntax?
        
         | masklinn wrote:
         | Jq is xpath, this looks to be xquery. In fact it specifically
         | works as an xquery embed.
        
           | emmanueloga_ wrote:
           | ... although, it seems neither JSONIq nor jq contain a
           | "parent" operator, as far as I can tell.
        
             | jolmg wrote:
             | Not that not having it makes either of them any less
             | powerful. If you descend to an inner context, you can refer
             | to the parent/ancestor via a variable you can set before-
             | hand.
        
             | lichtenberger wrote:
             | This might be too restricting regarding the storage.
             | 
             | But we have a function: https://github.com/sirixdb/brackit
        
           | bonzini wrote:
           | You can write the example below in jq as
           | def avg: add / length;         group_by(.url) | map({
           | "url": .[0].url,           "hits": length,            "avg":
           | map(.response_time) | avg         })
           | 
           | so jq should be (at least roughly) as powerful as JSONiq.
        
         | EdwardDiego wrote:
         | My first thought also - would be a good entry for a FAQ or blog
         | post.
        
         | baq wrote:
         | looks nothing like jq                 1. let $stats :=
         | collection("stats")       2. for $access in $stats       3.
         | group by $url := $access.url       4. return        5. {
         | 6.   "url": $url,       7.   "avg": avg($access.response_time),
         | 8.   "hits": count($access)       9. }
        
           | keymone wrote:
           | > besides the syntax
        
             | mro_name wrote:
             | isn't that question like "cinema besides the movies"?
        
               | keymone wrote:
               | no, language is not just syntax.
        
               | mro_name wrote:
               | Nor is the cinema just movies.
        
               | keymone wrote:
               | so if cinema isn't just movies then what's the problem
               | with asking about the difference in "cinema besides the
               | movies"?
        
               | mro_name wrote:
               | it's senseless besides nit-picking.
        
               | keymone wrote:
               | that's just like your opinion man.
        
               | chii wrote:
               | the user experience. jq is often a one liner, terse and
               | expressive. This jsoniq language looks almost like a
               | scripting language, requiring multiple lines to write an
               | expression.
        
               | _wolfie_ wrote:
               | I have multiple 30+ lines jq scripts in my current
               | project. So "often a one liner" is true, but it is not a
               | requirement, so I'm still not sure why use this instead.
        
             | jbverschoor wrote:
             | > JSONiq borrows a large numbers of ideas from XQuery
             | 
             | So basically grep or even sql ->xquery. No thank you!
        
         | bonzini wrote:
         | jq seems to have more focus on the generator and pipe
         | abstractions. In jq you say "foo | map(bar)"; foo and map(bar)
         | are both generators, and bar refers to each element of foo as
         | ".". Here you say "for $x in foo return bar"; foo and bar are
         | both JSON objects, and bar refers to each element of foo as
         | "$x", so the iteration is more explicit.
         | 
         | Likewise, compare "sum($element.response_time)" with
         | "map(.response_time) | add" in jq. Processing in JSONiq goes
         | inside to outside while jq goes left to right.
        
         | Deukhoofd wrote:
         | For one thing, modifying data I guess.
        
           | jolmg wrote:
           | What do you mean? jq can modify data:                 $ jq
           | '.foo += 1' <<< '{"foo": 2}'       {         "foo": 3       }
        
       | ainar-g wrote:
       | Related:
       | 
       | * RFC 6901 JavaScript Object Notation (JSON) Pointer:
       | https://datatracker.ietf.org/doc/html/rfc6901
       | 
       | * JSONPath RFC Draft: https://www.rfc-editor.org/rfc/internet-
       | drafts/draft-ietf-js...
       | 
       | * jq utility query language:
       | https://stedolan.github.io/jq/manual/
        
         | seumars wrote:
         | Also:
         | 
         | * GROQ: https://groq.dev
        
         | Aaargh20318 wrote:
         | JSONata: https://jsonata.org/
        
         | dariosalvi78 wrote:
         | somewhat similar to the Arango Query Language as well:
         | https://www.arangodb.com/docs/stable/aql/
        
       | lichtenberger wrote:
       | We're working on an implementation for SirixDB with currently a
       | slightly different syntax (for instance "=>" to dereference
       | object fields). We're also supporting all kinds of temporal query
       | enhancements (point in time queries, time travel queries...):
       | 
       | https://github.com/sirixdb/sirix
       | 
       | The query engine used is developed here (core implemented by
       | Sebastian Bachle and his tudents). Ideally the backend can be any
       | other data store as well.
       | 
       | https://github.com/sirixdb/brackit
        
       | sparsely wrote:
       | > Unlike SQL, which can only manipulate normalized data, JSONiq
       | natively works on the entire normalization spectrum: textual,
       | heterogeneous, deeply nested
       | 
       | Maybe I'm misunderstanding something, but isn't SQL perfectly
       | capable of acting on denormalized data? We often talk about the
       | degree of normalization that a RDMS has. Perhaps this is in
       | reference to nested data structures (which would normally be
       | represented in RDMS via FK relations), but even there it's
       | implementation dependent.
        
         | ghislainfourny wrote:
         | Thank you for your comment.
         | 
         | There exist indeed recent extensions of SQL that add support
         | for denormalized data (arrays, objects), for example in Spark
         | SQL and PostgreSQL, however SQL was originally designed for
         | tables and it remains cumbersome to write complex queries on
         | denormalized data (lateral views, etc).
         | 
         | A deeper and more detailed analysis of query languages for
         | nested data can be found in our recent paper with a concrete
         | use case in high energy physics, to be presented at VLDB 2022:
         | 
         | https://arxiv.org/abs/2104.12615
        
         | chippiewill wrote:
         | > Perhaps this is in reference to nested data structures
         | 
         | If you have nested data structures then it violates 1NF, i.e.
         | the data isn't normalised.
         | 
         | Any "pure" RDBMS won't even be able to represent that
         | denormalised data in a table, let alone query it with SQL.
         | Normally this is worked around by straight up serialising it.
         | Obviously some implementations have extensions, like Postgres
         | has JSON columns.
        
           | e-master wrote:
           | To be honest, I'm unsure how many 'pure' RDBMS are out there,
           | and doesn't seem like the parent comment had them in mind.
           | I'd rather suggest that mainstream RDBMS in general don't
           | have trouble working with data formats such as json or xml.
           | SQL Server for one has no trouble querying into deeply nested
           | XML/JSON columns and it does it surprisingly quickly based on
           | my experience.
        
           | maweki wrote:
           | As there are various degrees of "not normalized", this was a
           | very valid question (and answer).
           | 
           | SQL databases do indeed work with data not in Boyce-Codd-
           | Normal-Form. :D
        
           | user3939382 wrote:
           | Yes, MySQL also has functions dedicated to traversing and
           | selecting JSON
        
       | marstall wrote:
       | if anyone's here looking for javascript-language json query-er,
       | I've been using jsonata. found it deep and well-thought out, and
       | not too hard to get my head around the syntax
       | 
       | https://jsonata.org
        
       | ur-whale wrote:
       | I see dollar signs for variable names in there ... it is 2021 ...
       | have we learnt nothing?
        
         | gizdan wrote:
         | What's wrong with dollar signs for variable names?
        
           | jraph wrote:
           | We the Europeans feel discriminated. We'd like to see
           | variable names with EUR or PS.
           | 
           | let EURgdpr = 42;
           | 
           | let PScolour = #B0B;
           | 
           | There, this is more compelling, isn't it?
        
             | jhgb wrote:
             | That's why                   let $?whatever = 123;
             | 
             | was invented instead.
        
         | ivanche wrote:
         | I'd argue it's not dollar signs for variable names but dollar
         | signs in variable names.
        
         | HelloNurse wrote:
         | As you should expect, there's a reason: distinguishing
         | variables of the script from names within the JSON document and
         | from numerous keywords and builtin stuff. Variables get a sigil
         | because in normal code they are a minority of the names.
        
           | jolmg wrote:
           | Yup. Same with shells. Sigils on variables are so we don't
           | need to quote every argument/option on every command. And we
           | don't want to quote our strings in the shell because
           | different to other programming languages, shells are for
           | interactive use and so string literals are far more frequent
           | than variables.
        
       | bayesian_horse wrote:
       | Isn't every programming language a json query language?
       | Particularly functional ones?
        
         | emmanueloga_ wrote:
         | This answer will be a bit convoluted but hopefully has some
         | interesting and related concepts ...
         | 
         | First, from [1]:
         | 
         |  _... two representative [JSON] transformation tasks are
         | considered ... The exercise demonstrates that the absence of
         | parent or ancestor axes in the native representation of JSON
         | means that the transformation task needs to be approached in a
         | very different way._
         | 
         | The article shows that the ability to "navigate upward" to the
         | parent of a node can make certain queries and transformation
         | easier to implement. BTW, as far as I can tell, JSONIq does not
         | provide a way to "navigate upward" ...
         | 
         | But! also from [1]:
         | 
         |  _The ability to navigate upwards (and to a lesser extent,
         | sideways, to preceding and following siblings) clearly has
         | advantages and disadvantages. Without upwards navigation, a
         | transformation process that operates primarily as a recursive
         | tree walk cannot discover the context of leaf nodes (for
         | example, when processing a price, what product does it relate
         | to?), so this information needs to be passed down in the form
         | of parameters. However, the convenience of being able to
         | determine the context of a node comes at a significant price._
         | 
         | I remember dealing with trees in pure functional languages and
         | finding that typically implementing parent pointers can be
         | tricky [2]. I think if we combine this two facts, we can see
         | that not all languages make the same query or transformation
         | tasks equally easy.
         | 
         | 1: Page 167 of
         | https://archive.xmlprague.cz/2016/files/xmlprague-2016-proce...
         | 
         | 2: https://okmij.org/ftp/Scheme/xml.html#parent-ptr
        
         | gregwebs wrote:
         | Sure, just like you don't have to use SQL.
         | 
         | JSONiq is strongly typed, so only the strongly typed languages
         | are equivalent.
         | 
         | The point of this project, like SQL, is to have something
         | programming language agnostic that can be understood and
         | optimized by a db or a tool that doesn't use your programming
         | language.
         | 
         | It's mostly designed for data analysis and data interchange.
        
         | capableweb wrote:
         | I can only think of one language that supports JSON "natively":
         | JavaScript. For the rest of the languages, you usually need a
         | parser (which, to be fair, JavaScript also needs, it just
         | happens to come with the environment you use (browsers/NodeJS))
         | before you can actually do anything with the data.
        
           | johnbeech wrote:
           | Agreed, I generally impress colleagues with how fast it is to
           | solve problems with JSON / JS - the big advantage for me is
           | being able to "dump" a portion of program memory to JSON, and
           | then reload that into state of the application. It makes
           | systems much easier to understand when you can peer inside
           | the memory structures. I find by comparison that strongly
           | typed languages end up with a huge amount of boiler plate and
           | abstractions, you start to question which bits/bytes actually
           | matter. JSON reduces a problem space to something meaningful
           | and easily craftable by humans.
        
           | jbverschoor wrote:
           | You can eval() json.. That's where it came from
        
             | michaelmior wrote:
             | You can in JavaScript but you almost certainly don't want
             | to when you can just use JSON.parse and avoid the
             | possibility of the execution of arbitrary code.
        
         | dusted wrote:
         | I'd say yes, even the node repl is pretty comfortable for
         | working with json, but I know it's popular to hate on js for
         | not being typescript.
        
       | jansommer wrote:
       | See also https://rumbledb.org by same author
        
         | oever wrote:
         | There was a talk about RumbleDB and JSONiq at Declarative
         | Amsterdam a few days ago.
         | 
         | https://declarative.amsterdam/program
        
       | great_wubwub wrote:
       | "The main language construct is called a FLWOR expression...an
       | acronym for for, let, where, order by and return."
       | 
       | They missed a huge opportunity to call it FLOWR (flow-er as in
       | one who flows, or flower as in the thing that grows).
        
         | bestouff wrote:
         | Especially when you hover over "group by" and the pop-up title
         | says "FLWOR power" !
        
         | FrontAid wrote:
         | Not sure where you got that quote from, but FLWOR is not
         | something JSONiq invented.
         | 
         | - https://en.wikipedia.org/wiki/FLWOR
         | 
         | - https://www.w3.org/TR/xquery-30/#id-flwor-expressions
         | 
         | FLWOR is pronounced 'flower'.
        
       | Mariehane wrote:
       | I took Ghislain Fourny's Big Data course at ETH last year and
       | found working with JSONiq quite pleasant. I think it makes
       | writing complex queries quite simple, and way more readable than
       | trying to do the same in a query language like SQL, XPath, etc,
       | and since json seems to be fairly standard it is also useable
       | out-of-the-box for a lot of data "in the wild". It is my
       | impression that the tooling is a bit difficult to setup though.
       | 
       | Another thing is that it supposedly can scale to absolutely
       | massive datasets, since there is a Rumbe/Hadoop backend.
        
         | ocrb wrote:
         | Was the same for me. Like others here, I was quite sceptical
         | because I use Python / JS on a daily basis as a JSON query
         | language, but after I got used to it, I really liked the
         | language and think that it could increase my productivity if I
         | would use it more often.
        
         | heresie-dabord wrote:
         | It's interesting to watch solutions and anti-patterns repeat in
         | history ("the time-series of Reality").
         | 
         | Structuring data is an art and a skill. Perhaps not taught very
         | well.
         | 
         | There is a low-cost solution: CSV, TSV, lines of text.
         | 
         | The high-cost solution is relational structure and servers.
         | 
         | In the middle range, with risk of expensive tooling and glaring
         | anti-patterns, are XML and JSON.
         | 
         | The latter can both be used simply or with grotesque opacity. A
         | some point, feeding the monster becomes a main activity of the
         | village. At which point, the villagers elect a new monster with
         | their shovels and pitchforks.
        
           | rendall wrote:
           | I'm not exactly sure what you're talking about, but I like it
           | / am scared by it
        
       | traceroute66 wrote:
       | How does this differ from JMESPath[1] ?
       | 
       | [1] https://jmespath.org/
        
         | Mariehane wrote:
         | Looks to me like JMESPath is more similar to XPath, while
         | JSONiq uses statements like 'let', 'for', etc. which are more
         | similar to SQL (and seem more readable to me).
        
       | justsomeuser wrote:
       | I already have a JSON query language - it's JavaScript.
       | 
       | > Queries are 80% shorter than imperative code
       | 
       | From the examples they look the same length as the corresponding
       | JS I would write, maybe slightly longer.
       | 
       | What are the advantages to using this language over JS?
       | 
       | JS has a fast JIT which would make filtering data fast. How does
       | this query language compare in performance? Does it have indexes?
        
         | ghislainfourny wrote:
         | Thank you for your comment.
         | 
         | The motivation for JSONiq and RumbleDB is discussed in this
         | recent paper, the core argument is data independence:
         | 
         | https://arxiv.org/abs/1910.11582v3
         | 
         | JSONiq is a functional language and thus makes it easy to scale
         | to collections that have billions of objects.
         | 
         | A notable difference with JavaScript is that JSONiq has the
         | FLWOR expression, which is similar to, but more generic than, a
         | SQL statement.
        
         | markstos wrote:
         | Good points. JSON processing tools I find helpful in addition
         | to JavaScript include Gron aka "grep for JavaScript" (
         | https://github.com/tomnomnom/gron ), jq for querying, json_pp
         | for pretty-printing and `bat` for syntax highlighting.
        
         | Mariehane wrote:
         | This does some clever stuff in the background, e.g. query
         | optimization and performing the computations of a 'for'-loop
         | concurrently (potentially over multiple machines in a cluster).
         | So in my experience there was a bit of latency for the query to
         | execute, but then it executes faster and on much larger
         | datasets than regular JS could handle.
         | 
         | Just like how you could easily manipulate tabular data using
         | numpy or pandas (or excel), but SQL allows you to do it
         | declaratively, which has benefits in some cases.
        
         | marstall wrote:
         | it's just less code - instead of writing a series of buggy
         | loops with null checks and string concatenations all over the
         | place, you just writing a line of declaration and it deals with
         | all that for you.
        
       ___________________________________________________________________
       (page generated 2021-11-09 23:01 UTC)