[HN Gopher] JSONiq: JSON Query Language
___________________________________________________________________
JSONiq: JSON Query Language
Author : oever
Score : 91 points
Date : 2021-11-09 09:10 UTC (13 hours ago)
(HTM) web link (www.jsoniq.org)
(TXT) w3m dump (www.jsoniq.org)
| markstos wrote:
| The "Beginner's Tutorial" README has no code examples, and
| suggests installing Anaconda, Jupyter and an old version of Zorba
| to get started... beginning.
|
| https://github.com/ghislainfourny/jsoniq-tutorial
|
| Back button.
| ghislainfourny wrote:
| Thank you for your feedback. The tutorial was moved over to an
| interactive notebook that you can find here:
|
| https://colab.research.google.com/github/RumbleDB/rumble/blo...
|
| The easiest way to get started locally is described here:
| https://rumble.readthedocs.io/en/latest/Getting%20started/
|
| We no longer recommend the installation of Anaconda -- instead,
| the Spark tgz file can directly be downloaded, unzipped, and
| the bin subdirectory added to the PATH, which is considerably
| simpler. Likewise, the RumbleDB jar is just a download. Using
| the RumbleDB shell is the easiest to set up; Jupyter and the
| server require a bit of additional work.
|
| For a cluster, this is even easier because most cloud platforms
| can create one with the push of a button, and one only needs to
| download the RumbleDB jar on the remote machine and get started
| right away.
|
| Use of RumbleDB on a cluster is explained here:
| https://rumble.readthedocs.io/en/latest/Run%20on%20a%20clust...
| ghislainfourny wrote:
| There is also a docker version that should ease the process:
|
| https://hub.docker.com/r/rumbledb/rumble
| lichtenberger wrote:
| https://nbviewer.org/github/ghislainfourny/jsoniq-tutorial/b...
| vjust wrote:
| I found it awkward to inject perl-like $variables in the middle
| of a Python notebook. The association with anything-XML also is a
| downer personally for me, its also a goodbye to lightweight. Yup,
| this maybe a superficial observation, but the $ in a python-
| script?
|
| Jq works, both in python, and from the shell prompt.
| https://pypi.org/project/jq/
| danellis wrote:
| Curious syntax. Why does it require sigils? Why does assignment
| need both `let` and `:=`?
| oever wrote:
| It's derived from XQuery, a functional language for querying
| and creating XML and JSON. (I realize that that's not an answer
| to your question.)
| emmanueloga_ wrote:
| The look and feel of this language is due to its close
| relationship with XQuery and the XML world, including the
| syntax.
|
| From [1]:
|
| _XQuery 3.1 was designed with the goal to support additional
| data structures (maps, arrays) in memory. These structures are
| mapped to JSON for input and output. XQuery 3.1 has been a W3C
| recommendation since March 2017._
|
| _JSONiq was designed with the goal of querying and updating
| JSON in settings such as document stores. It was also designed
| by members of the XML Query working group (disclaimer: I am one
| of them) while investigating various possibilities to support
| JSON._
|
| 1: https://stackoverflow.com/questions/44919443/what-are-the-
| di...
| ghislainfourny wrote:
| Thank you for your question.
|
| This is, in fact, not an assignment, but a variable binding
| that is highly optimizable by an execution engine. The let
| clause is part of the FLWOR expression, works in orchestration
| with for, where, group by, order by, count, return; the ability
| to bind variables while doing relational algebra is a feature
| that is often missed in SQL.
|
| JSONiq is functional and, in its core (non scripting) version,
| does not allow modifying variable values.
| lichtenberger wrote:
| If you don't want both, then use
|
| https://github.com/sirixdb/brackit
|
| for instance :)
| Felk wrote:
| Can someone explain to me what the main differences to jq are,
| besides the syntax?
| masklinn wrote:
| Jq is xpath, this looks to be xquery. In fact it specifically
| works as an xquery embed.
| emmanueloga_ wrote:
| ... although, it seems neither JSONIq nor jq contain a
| "parent" operator, as far as I can tell.
| jolmg wrote:
| Not that not having it makes either of them any less
| powerful. If you descend to an inner context, you can refer
| to the parent/ancestor via a variable you can set before-
| hand.
| lichtenberger wrote:
| This might be too restricting regarding the storage.
|
| But we have a function: https://github.com/sirixdb/brackit
| bonzini wrote:
| You can write the example below in jq as
| def avg: add / length; group_by(.url) | map({
| "url": .[0].url, "hits": length, "avg":
| map(.response_time) | avg })
|
| so jq should be (at least roughly) as powerful as JSONiq.
| EdwardDiego wrote:
| My first thought also - would be a good entry for a FAQ or blog
| post.
| baq wrote:
| looks nothing like jq 1. let $stats :=
| collection("stats") 2. for $access in $stats 3.
| group by $url := $access.url 4. return 5. {
| 6. "url": $url, 7. "avg": avg($access.response_time),
| 8. "hits": count($access) 9. }
| keymone wrote:
| > besides the syntax
| mro_name wrote:
| isn't that question like "cinema besides the movies"?
| keymone wrote:
| no, language is not just syntax.
| mro_name wrote:
| Nor is the cinema just movies.
| keymone wrote:
| so if cinema isn't just movies then what's the problem
| with asking about the difference in "cinema besides the
| movies"?
| mro_name wrote:
| it's senseless besides nit-picking.
| keymone wrote:
| that's just like your opinion man.
| chii wrote:
| the user experience. jq is often a one liner, terse and
| expressive. This jsoniq language looks almost like a
| scripting language, requiring multiple lines to write an
| expression.
| _wolfie_ wrote:
| I have multiple 30+ lines jq scripts in my current
| project. So "often a one liner" is true, but it is not a
| requirement, so I'm still not sure why use this instead.
| jbverschoor wrote:
| > JSONiq borrows a large numbers of ideas from XQuery
|
| So basically grep or even sql ->xquery. No thank you!
| bonzini wrote:
| jq seems to have more focus on the generator and pipe
| abstractions. In jq you say "foo | map(bar)"; foo and map(bar)
| are both generators, and bar refers to each element of foo as
| ".". Here you say "for $x in foo return bar"; foo and bar are
| both JSON objects, and bar refers to each element of foo as
| "$x", so the iteration is more explicit.
|
| Likewise, compare "sum($element.response_time)" with
| "map(.response_time) | add" in jq. Processing in JSONiq goes
| inside to outside while jq goes left to right.
| Deukhoofd wrote:
| For one thing, modifying data I guess.
| jolmg wrote:
| What do you mean? jq can modify data: $ jq
| '.foo += 1' <<< '{"foo": 2}' { "foo": 3 }
| ainar-g wrote:
| Related:
|
| * RFC 6901 JavaScript Object Notation (JSON) Pointer:
| https://datatracker.ietf.org/doc/html/rfc6901
|
| * JSONPath RFC Draft: https://www.rfc-editor.org/rfc/internet-
| drafts/draft-ietf-js...
|
| * jq utility query language:
| https://stedolan.github.io/jq/manual/
| seumars wrote:
| Also:
|
| * GROQ: https://groq.dev
| Aaargh20318 wrote:
| JSONata: https://jsonata.org/
| dariosalvi78 wrote:
| somewhat similar to the Arango Query Language as well:
| https://www.arangodb.com/docs/stable/aql/
| lichtenberger wrote:
| We're working on an implementation for SirixDB with currently a
| slightly different syntax (for instance "=>" to dereference
| object fields). We're also supporting all kinds of temporal query
| enhancements (point in time queries, time travel queries...):
|
| https://github.com/sirixdb/sirix
|
| The query engine used is developed here (core implemented by
| Sebastian Bachle and his tudents). Ideally the backend can be any
| other data store as well.
|
| https://github.com/sirixdb/brackit
| sparsely wrote:
| > Unlike SQL, which can only manipulate normalized data, JSONiq
| natively works on the entire normalization spectrum: textual,
| heterogeneous, deeply nested
|
| Maybe I'm misunderstanding something, but isn't SQL perfectly
| capable of acting on denormalized data? We often talk about the
| degree of normalization that a RDMS has. Perhaps this is in
| reference to nested data structures (which would normally be
| represented in RDMS via FK relations), but even there it's
| implementation dependent.
| ghislainfourny wrote:
| Thank you for your comment.
|
| There exist indeed recent extensions of SQL that add support
| for denormalized data (arrays, objects), for example in Spark
| SQL and PostgreSQL, however SQL was originally designed for
| tables and it remains cumbersome to write complex queries on
| denormalized data (lateral views, etc).
|
| A deeper and more detailed analysis of query languages for
| nested data can be found in our recent paper with a concrete
| use case in high energy physics, to be presented at VLDB 2022:
|
| https://arxiv.org/abs/2104.12615
| chippiewill wrote:
| > Perhaps this is in reference to nested data structures
|
| If you have nested data structures then it violates 1NF, i.e.
| the data isn't normalised.
|
| Any "pure" RDBMS won't even be able to represent that
| denormalised data in a table, let alone query it with SQL.
| Normally this is worked around by straight up serialising it.
| Obviously some implementations have extensions, like Postgres
| has JSON columns.
| e-master wrote:
| To be honest, I'm unsure how many 'pure' RDBMS are out there,
| and doesn't seem like the parent comment had them in mind.
| I'd rather suggest that mainstream RDBMS in general don't
| have trouble working with data formats such as json or xml.
| SQL Server for one has no trouble querying into deeply nested
| XML/JSON columns and it does it surprisingly quickly based on
| my experience.
| maweki wrote:
| As there are various degrees of "not normalized", this was a
| very valid question (and answer).
|
| SQL databases do indeed work with data not in Boyce-Codd-
| Normal-Form. :D
| user3939382 wrote:
| Yes, MySQL also has functions dedicated to traversing and
| selecting JSON
| marstall wrote:
| if anyone's here looking for javascript-language json query-er,
| I've been using jsonata. found it deep and well-thought out, and
| not too hard to get my head around the syntax
|
| https://jsonata.org
| ur-whale wrote:
| I see dollar signs for variable names in there ... it is 2021 ...
| have we learnt nothing?
| gizdan wrote:
| What's wrong with dollar signs for variable names?
| jraph wrote:
| We the Europeans feel discriminated. We'd like to see
| variable names with EUR or PS.
|
| let EURgdpr = 42;
|
| let PScolour = #B0B;
|
| There, this is more compelling, isn't it?
| jhgb wrote:
| That's why let $?whatever = 123;
|
| was invented instead.
| ivanche wrote:
| I'd argue it's not dollar signs for variable names but dollar
| signs in variable names.
| HelloNurse wrote:
| As you should expect, there's a reason: distinguishing
| variables of the script from names within the JSON document and
| from numerous keywords and builtin stuff. Variables get a sigil
| because in normal code they are a minority of the names.
| jolmg wrote:
| Yup. Same with shells. Sigils on variables are so we don't
| need to quote every argument/option on every command. And we
| don't want to quote our strings in the shell because
| different to other programming languages, shells are for
| interactive use and so string literals are far more frequent
| than variables.
| bayesian_horse wrote:
| Isn't every programming language a json query language?
| Particularly functional ones?
| emmanueloga_ wrote:
| This answer will be a bit convoluted but hopefully has some
| interesting and related concepts ...
|
| First, from [1]:
|
| _... two representative [JSON] transformation tasks are
| considered ... The exercise demonstrates that the absence of
| parent or ancestor axes in the native representation of JSON
| means that the transformation task needs to be approached in a
| very different way._
|
| The article shows that the ability to "navigate upward" to the
| parent of a node can make certain queries and transformation
| easier to implement. BTW, as far as I can tell, JSONIq does not
| provide a way to "navigate upward" ...
|
| But! also from [1]:
|
| _The ability to navigate upwards (and to a lesser extent,
| sideways, to preceding and following siblings) clearly has
| advantages and disadvantages. Without upwards navigation, a
| transformation process that operates primarily as a recursive
| tree walk cannot discover the context of leaf nodes (for
| example, when processing a price, what product does it relate
| to?), so this information needs to be passed down in the form
| of parameters. However, the convenience of being able to
| determine the context of a node comes at a significant price._
|
| I remember dealing with trees in pure functional languages and
| finding that typically implementing parent pointers can be
| tricky [2]. I think if we combine this two facts, we can see
| that not all languages make the same query or transformation
| tasks equally easy.
|
| 1: Page 167 of
| https://archive.xmlprague.cz/2016/files/xmlprague-2016-proce...
|
| 2: https://okmij.org/ftp/Scheme/xml.html#parent-ptr
| gregwebs wrote:
| Sure, just like you don't have to use SQL.
|
| JSONiq is strongly typed, so only the strongly typed languages
| are equivalent.
|
| The point of this project, like SQL, is to have something
| programming language agnostic that can be understood and
| optimized by a db or a tool that doesn't use your programming
| language.
|
| It's mostly designed for data analysis and data interchange.
| capableweb wrote:
| I can only think of one language that supports JSON "natively":
| JavaScript. For the rest of the languages, you usually need a
| parser (which, to be fair, JavaScript also needs, it just
| happens to come with the environment you use (browsers/NodeJS))
| before you can actually do anything with the data.
| johnbeech wrote:
| Agreed, I generally impress colleagues with how fast it is to
| solve problems with JSON / JS - the big advantage for me is
| being able to "dump" a portion of program memory to JSON, and
| then reload that into state of the application. It makes
| systems much easier to understand when you can peer inside
| the memory structures. I find by comparison that strongly
| typed languages end up with a huge amount of boiler plate and
| abstractions, you start to question which bits/bytes actually
| matter. JSON reduces a problem space to something meaningful
| and easily craftable by humans.
| jbverschoor wrote:
| You can eval() json.. That's where it came from
| michaelmior wrote:
| You can in JavaScript but you almost certainly don't want
| to when you can just use JSON.parse and avoid the
| possibility of the execution of arbitrary code.
| dusted wrote:
| I'd say yes, even the node repl is pretty comfortable for
| working with json, but I know it's popular to hate on js for
| not being typescript.
| jansommer wrote:
| See also https://rumbledb.org by same author
| oever wrote:
| There was a talk about RumbleDB and JSONiq at Declarative
| Amsterdam a few days ago.
|
| https://declarative.amsterdam/program
| great_wubwub wrote:
| "The main language construct is called a FLWOR expression...an
| acronym for for, let, where, order by and return."
|
| They missed a huge opportunity to call it FLOWR (flow-er as in
| one who flows, or flower as in the thing that grows).
| bestouff wrote:
| Especially when you hover over "group by" and the pop-up title
| says "FLWOR power" !
| FrontAid wrote:
| Not sure where you got that quote from, but FLWOR is not
| something JSONiq invented.
|
| - https://en.wikipedia.org/wiki/FLWOR
|
| - https://www.w3.org/TR/xquery-30/#id-flwor-expressions
|
| FLWOR is pronounced 'flower'.
| Mariehane wrote:
| I took Ghislain Fourny's Big Data course at ETH last year and
| found working with JSONiq quite pleasant. I think it makes
| writing complex queries quite simple, and way more readable than
| trying to do the same in a query language like SQL, XPath, etc,
| and since json seems to be fairly standard it is also useable
| out-of-the-box for a lot of data "in the wild". It is my
| impression that the tooling is a bit difficult to setup though.
|
| Another thing is that it supposedly can scale to absolutely
| massive datasets, since there is a Rumbe/Hadoop backend.
| ocrb wrote:
| Was the same for me. Like others here, I was quite sceptical
| because I use Python / JS on a daily basis as a JSON query
| language, but after I got used to it, I really liked the
| language and think that it could increase my productivity if I
| would use it more often.
| heresie-dabord wrote:
| It's interesting to watch solutions and anti-patterns repeat in
| history ("the time-series of Reality").
|
| Structuring data is an art and a skill. Perhaps not taught very
| well.
|
| There is a low-cost solution: CSV, TSV, lines of text.
|
| The high-cost solution is relational structure and servers.
|
| In the middle range, with risk of expensive tooling and glaring
| anti-patterns, are XML and JSON.
|
| The latter can both be used simply or with grotesque opacity. A
| some point, feeding the monster becomes a main activity of the
| village. At which point, the villagers elect a new monster with
| their shovels and pitchforks.
| rendall wrote:
| I'm not exactly sure what you're talking about, but I like it
| / am scared by it
| traceroute66 wrote:
| How does this differ from JMESPath[1] ?
|
| [1] https://jmespath.org/
| Mariehane wrote:
| Looks to me like JMESPath is more similar to XPath, while
| JSONiq uses statements like 'let', 'for', etc. which are more
| similar to SQL (and seem more readable to me).
| justsomeuser wrote:
| I already have a JSON query language - it's JavaScript.
|
| > Queries are 80% shorter than imperative code
|
| From the examples they look the same length as the corresponding
| JS I would write, maybe slightly longer.
|
| What are the advantages to using this language over JS?
|
| JS has a fast JIT which would make filtering data fast. How does
| this query language compare in performance? Does it have indexes?
| ghislainfourny wrote:
| Thank you for your comment.
|
| The motivation for JSONiq and RumbleDB is discussed in this
| recent paper, the core argument is data independence:
|
| https://arxiv.org/abs/1910.11582v3
|
| JSONiq is a functional language and thus makes it easy to scale
| to collections that have billions of objects.
|
| A notable difference with JavaScript is that JSONiq has the
| FLWOR expression, which is similar to, but more generic than, a
| SQL statement.
| markstos wrote:
| Good points. JSON processing tools I find helpful in addition
| to JavaScript include Gron aka "grep for JavaScript" (
| https://github.com/tomnomnom/gron ), jq for querying, json_pp
| for pretty-printing and `bat` for syntax highlighting.
| Mariehane wrote:
| This does some clever stuff in the background, e.g. query
| optimization and performing the computations of a 'for'-loop
| concurrently (potentially over multiple machines in a cluster).
| So in my experience there was a bit of latency for the query to
| execute, but then it executes faster and on much larger
| datasets than regular JS could handle.
|
| Just like how you could easily manipulate tabular data using
| numpy or pandas (or excel), but SQL allows you to do it
| declaratively, which has benefits in some cases.
| marstall wrote:
| it's just less code - instead of writing a series of buggy
| loops with null checks and string concatenations all over the
| place, you just writing a line of declaration and it deals with
| all that for you.
___________________________________________________________________
(page generated 2021-11-09 23:01 UTC)