[HN Gopher] Dasel: Select, put and delete data from JSON, TOML, ...
___________________________________________________________________
Dasel: Select, put and delete data from JSON, TOML, YAML, XML and
CSV
Author : edward
Score : 242 points
Date : 2024-08-18 14:11 UTC (8 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| michaelcampbell wrote:
| Neat; seems about every quarter or so one of these types of tools
| is highlighted here.
|
| Awaiting all the responses from people to show off or list what
| tool they've landed on to support their specific use cases; I
| always learn a lot from these.
| digdugdirk wrote:
| I'm a bit confused as to the use case. Is it just a way to
| interact with json/yaml style documents as if they were a
| structured database, but from the command line? Kind of an in-
| between for those moments you don't want to write a quick
| script to batch modify files?
|
| It looks really well done, I think I'm just failing to see how
| this is more beneficial than just opening a single file in the
| editor and making changes, or writing a quick functional script
| so you have the history of the changes that were made to a
| batch of files.
|
| If someone could explain how I could (and why I should) add a
| new tool to my digital toolbelt, I'd greatly appreciate it.
| 0thgen wrote:
| one benefit (idk if it applies here) is if the
| select/put/delete statements didn't require loading the data
| in memory; so you could query massive data files with limited
| RAM and not have to solve that problem yourself for each data
| storage format you're working with
| supriyo-biswas wrote:
| For things that are mostly shell scripts and things in a
| similar family (Ansible playbooks, deployment pipelines etc.)
| and where you need to modify a structured file quickly, it's
| usually much faster to use the DSL provided by the tool than
| calling out to various scripts to extract or modify a single
| JSON key.
|
| People often say that they'd prefer to write their shell
| scripts in Python or even Go these days, but the problem
| there is that the elements of structured programming make
| the overall steps difficult to follow. Typically, the
| paradigm with use cases adjacent to shell scripts is to be
| able to view what it is doing without any sort of
| abstractions.
| Lord_Zero wrote:
| This could be useful for CICD where you need to bump a
| version number in a file based on the build number.
| kate_bits wrote:
| For this specific use case? sed would work just as well and
| probably already exists in your environment.
| macNchz wrote:
| I see the appeal of having a declarative syntax rather than
| writing a bunch of code to make the change reliably and
| safely.
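
A minimal sketch of the version-bump case discussed above, done structurally with Python's standard `json` module rather than with dasel or sed. The file layout (a top-level `"version"` key holding a dotted string) and the helper name `bump_build` are assumptions made for illustration.

```python
import json


def bump_build(doc_text, build_number):
    """Replace the last component of the top-level "version" with build_number.

    Hypothetical helper: assumes a top-level "version" key such as "1.4.2".
    """
    doc = json.loads(doc_text)
    parts = doc["version"].split(".")
    parts[-1] = str(build_number)
    doc["version"] = ".".join(parts)
    return json.dumps(doc, indent=2)


# Invented sample: note the second, nested "version" key.
sample = '{"name": "demo", "version": "1.4.2", "deps": {"lib": {"version": "0.9.0"}}}'
updated = json.loads(bump_build(sample, 57))
print(updated["version"])
```

Because the edit addresses the key by its position in the tree, the nested `"version"` under `deps` is left alone, which a naive text substitution would have to work harder to guarantee.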
| fsckboy wrote:
| the in-between mode that you mention but seem to dismiss
| is the way most traditional unixheads work with data most of
| the time: from the command line
|
| editor? when i pull up emacs, 50% of the time it's to write
| emacs macros, and I do that because shell scripts don't
| easily go backward in the stream. (something rarely mentioned
| about teco was that it was a stream editor that would chew
| its way forward through files; you didn't need the memory to
| keep it all in core, and it could go backward within
| understandable limits)
|
| writing an actual shellscript is only for when it's really
| hairy, you are going to be repeating it and/or you need the
| types of error handling that cloud up the clarity of the
| commandline
|
| the commandline does provide rudimentary "records" in the
| saved history
| simonw wrote:
| I use jq for this kind of thing several times a week. It's
| great for piped data - things like running curl to fetch
| JSON, then piping it through to reformat it in different ways.
|
| Here's a jq expression I used recently to turn a complete
| GitHub Issues thread into a single Markdown document:
|     curl -s "https://api.github.com/repos/simonw/shot-scraper/issues/1/comments" \
|       | jq -r '.[] | "## Comment by \(.user.login) on \(.created_at)\n\n\(.body)\n"'
|
| I use this pattern a lot. Data often comes in slightly the
| wrong shape - being able to fix that with a one-liner
| terminal command is really useful.
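
For readers who do not read jq fluently, here is a rough Python equivalent of the string template in the expression above. It is a sketch that runs on an invented sample payload instead of calling the GitHub API; only the three fields the jq expression reads (`user.login`, `created_at`, `body`) are assumed.

```python
import json

# Invented sample payload; only the fields the jq expression reads are included.
SAMPLE = json.loads("""
[
  {"user": {"login": "alice"}, "created_at": "2024-01-01T00:00:00Z", "body": "First"},
  {"user": {"login": "bob"}, "created_at": "2024-01-02T00:00:00Z", "body": "Second"}
]
""")


def comments_to_markdown(comments):
    """Build one Markdown section per comment, like the jq string template."""
    return "".join(
        f"## Comment by {c['user']['login']} on {c['created_at']}\n\n{c['body']}\n\n"
        for c in comments
    )


markdown = comments_to_markdown(SAMPLE)
print(markdown, end="")
```

The jq one-liner is considerably shorter, which is the point being made in the comment: the Python version is mostly scaffolding around the same template.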
| paulddraper wrote:
| > or writing a quick functional script
|
| It's exactly a quick functional script.
| tofflos wrote:
| I used yq last week to scan through all the Java projects
| (i.e. Maven pom.xml-files) within our org to check which ones
| inherit from the corporate pom.
|
|     yq eval --input-format xml --output-format csv \
|       '[file_index, file_name, .project.parent.groupId,
|         .project.parent.artifactId, .project.parent.version]' \
|       **/pom.xml
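
The same lookup can be sketched with Python's standard `xml.etree.ElementTree`, shown here on an invented inline POM rather than on files found by a glob. The helper name `parent_coordinates` and the sample coordinates are hypothetical; the namespace URI is the standard Maven POM 4.0.0 one.

```python
import xml.etree.ElementTree as ET

# Standard namespace used by Maven POM 4.0.0 documents.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Invented sample POM for illustration.
SAMPLE_POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <parent>
    <groupId>com.example</groupId>
    <artifactId>corporate-parent</artifactId>
    <version>3.1.0</version>
  </parent>
  <artifactId>some-service</artifactId>
</project>
""".strip()


def parent_coordinates(pom_text):
    """Return (groupId, artifactId, version) of <parent>, or None if absent."""
    root = ET.fromstring(pom_text)
    parent = root.find("m:parent", POM_NS)
    if parent is None:
        return None
    return tuple(
        parent.findtext(f"m:{tag}", default="", namespaces=POM_NS)
        for tag in ("groupId", "artifactId", "version")
    )


print(parent_coordinates(SAMPLE_POM))
```

The namespace handling is the main thing the yq one-liner hides; a POM without the `xmlns` declaration would need the `m:` prefixes dropped.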
| hnlmorg wrote:
| Which yq? Last time I checked, there seemed to be a few
| tools with the same name.
| hnlmorg wrote:
| Personally I think this is a problem better solved by fixing the
| shell. There's a few alt shells out there now, Nushell, Elvish
| plus the one I help maintain, Murex (https://murex.rocks).
|
| I'm obviously going to be biased here, but it's definitely worth
| your time checking out some alt shells.
| 0thgen wrote:
| I like the idea of using select/put/delete (sql-style syntax) to
| query non-rdb data storage. It sort of raises the question of,
| could there be 1 universal language to query relational
| databases, text file storage (json, csv, etc), and anything else.
|
| Or put another way, is there any data storage format that
| couldn't be queried by SQL?
| IgorPartola wrote:
| From what I understand SQL is or at least can be made Turing
| complete so in that sense you should be able to query any data
| store using it. However, that doesn't mean it will be efficient
| to do so.
|
| I suspect for most data structures you could construct an index
| to make querying faster. But think about querying something
| like a linked list: it is not going to be too efficient without
| an index but you should still be able to write an engine that
| will do so.
|
| If you have something like a collection of arbitrary JSON
| objects without a set structure you should still be able to
| express what you are trying to do with SQL because Turing
| completeness means it can examine the object structure as well
| as contents before deciding what to do with it. But your SQL
| would look more like procedural code than you might be used to.
| Derelicte wrote:
| There are a lot of differences between storage formats. It
| would be incredibly difficult to create a universal query
| language. It would need to either a) change the storage formats
| so much that they're not really following their original
| standard, or b) create so many different versions of the query
| language that it's not really one standard.
|
| Off the top of my head, SQL can't do lists as values, and
| doesn't have simple key-value storage. Json doesn't have
| tables, or primary keys / foreign keys, and can have nested
| data
| esprehn wrote:
| SQL has both standard JSON and Array functions. What's the
| "list as value" feature you think is missing?
| Perz1val wrote:
| XML attributes come to mind
| lagniappe wrote:
| Perz1val, it's me, your grandchild from the distant future.
| Don't do this. XML goes rogue and destroys humanity.
| slightwinder wrote:
| > Or put another way, is there any data storage format that
| couldn't be queried by SQL?
|
| Depends on how keen you are on pure SQL. For example, postgres
| and sqlite have json-extensions, but they also enhance the
| syntax for it. Something similar can be done for all other formats too,
| but this means you need to learn special syntax and be aware of
| the storage-format for every query. This is far off from a real
| universal language.
| TeMPOraL wrote:
| > _Or put another way, is there any data storage format that
| couldn't be queried by SQL?_
|
| Is your SQL Turing-complete? If yes, then it could query
| anything. Whether or not you'd like the experience is another
| thing.
|
| Queries are programs. Querying data from a fixed schema is
| easy. Hell, you could make a "universal query language" by
| just concatenating together this dasel, with SQL and Cypher, so
| you'd use the relevant facet when querying a specific data
| source. The real problem starts when your query structure isn't
| fixed - where what data you need depends on what the data says.
| When you're dealing with indirection. Once you start doing
| joins or conditionals or `foo[bar['baz']] if
| bar.hasProperty('baz') else 42` kind of indirection, you
| quickly land in the Turing tarpit[0] - whatever your query
| language is, some shapes of data will be super painful for it
| to deal with. Painful, but still possible.
|
| --
|
| [0] - https://en.wikipedia.org/wiki/Turing_tarpit
| gumby wrote:
| > It sort of raises the question of, could there be 1 universal
| language to query relational databases, text file storage
| (json, csv, etc), and anything else.
|
| Sure there _could_ be -- any turing-complete language (which
| SQL is) can query anything.
|
| But the reason we have different programming languages* is
| because they have different affordances and make it easy to
| express certain things at the cost of being less convenient for
| other things. Thus APL/Prolog/Lisp/C/Python can all coexist.
|
| SQL is great for relational databases, but it's like commuting
| to work in a tank when it comes to key-value stores.
|
| * and of course because programmers love building tools, and a
| language is the ultimate tool.
| sweeter wrote:
| sounds like a nightmare to do logistically. it would be cool
| though.
| ablob wrote:
| If entries can be relations themselves it is not possible
| afaik. For example:
|
|     User | Telephone Numbers
|     -----+------------------
|     A    | 123, 456   <- not atomic; more than 1 number (i.e. a set)
|     B    | 789
|
| Now there are academic operators to convert to and from a
| purely relational system, but I don't think they are
| implemented/in the standard. I forgot what they are called,
| however.
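
Conversions of this kind are commonly described as nesting and unnesting; whether those are the operators the commenter had in mind is an assumption. A minimal Python sketch on the table above, with hypothetical helper names `unnest` and `nest`:

```python
def unnest(rows, column):
    """Expand a set-valued column into one flat row per element."""
    return [{**row, column: value} for row in rows for value in sorted(row[column])]


def nest(rows, column):
    """Group flat rows back together, collecting `column` into a set."""
    grouped = {}
    for row in rows:
        # Everything except the nested column identifies the group.
        key = tuple((k, v) for k, v in row.items() if k != column)
        grouped.setdefault(key, set()).add(row[column])
    return [{**dict(key), column: values} for key, values in grouped.items()]


nested = [
    {"user": "A", "phones": {123, 456}},
    {"user": "B", "phones": {789}},
]
flat = unnest(nested, "phones")
for row in flat:
    print(row)
```

The flat form has only atomic values and fits a plain relational table; `nest(flat, "phones")` recovers the original set-valued rows.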
|
| In general you don't want a universal query language. Depending
| on the shape of the data you want different things to be easily
| expressible. You can, for example express queries on tree-
| shaped data with SQL (see xPath-Accelerator), but it is quite
| cumbersome and its meaning is lost to the reader. I.e.: It's
| fine when computer-generated, but there is too much noise for a
| human to read/write themselves. I'd be glad to be proven wrong
| here, but as time has shown, there is no one size fits all for
| programming languages. The requirements for different
| applications just vary too much.
| wslh wrote:
| > It sort of raises the question of, could there be 1 universal
| language to query relational databases...
|
| Even if SQL and/or another query language could be Turing-
| complete, that doesn't mean that you can have 1 universal
| language to perform all possible queries in an efficient way.
| In basic computer science terms that means that your data
| structure is linked with the queries and the efficiency you want
| to achieve, and ad-hoc changes should be created for specific
| problems.
| acjohnson55 wrote:
| That's basically SQL. Many SQL systems have lots of built in
| connectivity to various data sources.
|
| DuckDB is a good example of a (literally) serverless SQL-based
| tool for data processing. It is designed to be able to treat
| the common data serialization formats as though they are tables
| in a schema [1], and you can export to many of the same
| formats. With extensions, you can also connect to relational
| databases as foreign tables.
|
| This connectivity is a big reason it has built a pretty avid
| following in the data science world.
|
| [1] https://duckdb.org/docs/data/overview
|
| [2] https://duckdb.org/docs/extensions/json#json-importexport
|
| [3] https://duckdb.org/docs/extensions/postgres
| bloopernova wrote:
| Having recently messed with JMESPath in AWS, I wonder which of
| these structured data tools:
|
|     - Is easier to learn
|     - Has most/best documentation
|     - Is faster to write in
|
| Does anyone know of a good comparison article?
|
| (I still default to jq, I guess it has the momentum)
| wodenokoto wrote:
| Is there a JMESPath tool that works on json, yaml, toml, etc?
| bloopernova wrote:
| I don't think so, that's a good point against it.
|
| It's heavily used by AWS and Azure though.
| montroser wrote:
| Cool project -- but we need a standardized/spec'd query language
| in order to realize the goals in the "one tool to rule them all"
| section of this readme.
|
| I have a hard time internalizing the jq query syntax, and am not
| overly excited to invest in learning all the quirks when it's not
| based on a widely-adopted open standard. Maybe `JMESPath` could
| be the way forward.
|
| Sometimes `gron` can be a pretty great alternative approach,
| depending on your use case. At least it is very intuitive and
| plays nicely with other tools.
| AtlasBarfed wrote:
| Ultimately JSON, TOML, YAML, XML, properties files are tree
| structures, and XPath type syntax should roughly apply to them
| all, along with about a hundred "path expression" languages
| (java had SpEL, velocity, JSP-EL, OGNL, and probably dozens of
| others).
|
| XPath, although it had some clunky artifacts for XML (which was
| the reason we moved from XML like namespaces... ugh), had
| basically the apex of expression/path/navigation capabilities.
| It would be really nice to see XPath ported to a general nav
| language that is supported by all programming environments and
| handled all the relevant formats.
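
To illustrate the "it's all trees" point, here is a minimal sketch of a path selector over nested dicts and lists. The dotted syntax (`servers.1.host`) is made up for this example; it is neither XPath nor dasel's selector language, and the function name `select` is hypothetical.

```python
import json


def select(data, path):
    """Walk nested dicts/lists using a dotted path such as "servers.0.host"."""
    current = data
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]  # list step: part is an index
        else:
            current = current[part]  # dict step: part is a key
    return current


# Invented sample document.
doc = json.loads("""
{"servers": [
  {"host": "a.example", "port": 80},
  {"host": "b.example", "port": 8080}
]}
""")

print(select(doc, "servers.1.host"))
```

Any parser that produces dicts and lists (JSON, TOML, YAML) could feed the same function, which is roughly why a shared path language across formats is plausible; XML attributes and namespaces are where the simple model stops fitting.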
| dleeftink wrote:
| I still like Xidel[0] for this reason; it may be a little
| older, but for a CLI scraper a lot of data transformation
| needs can be satisfied with Xpath/XQuery.
|
| [0]: https://github.com/benibela/xidel
| sitkack wrote:
| Speaking of trees, gron/ungron is an amazing transformer that
| allows one to use _any_ query tool on the leaves of the tree
| and then turn the flattened structure back into a document
| (json).
|
| I'd love to see gron/ungron implemented for all tree
| structures.
|
| https://github.com/tomnomnom/gron
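
The flatten, filter, rebuild idea can be sketched in a few lines of Python. This is an illustration of the approach only: it uses (path, value) tuples instead of gron's textual assignment format, the function names are hypothetical, and it assumes the root is a non-empty object or array.

```python
def flatten(value, path=()):
    """Yield (path, leaf) pairs; path is a tuple of dict keys and list indices."""
    if isinstance(value, dict) and value:
        for key, child in value.items():
            yield from flatten(child, path + (key,))
    elif isinstance(value, list) and value:
        for index, child in enumerate(value):
            yield from flatten(child, path + (index,))
    else:
        yield path, value


def _listify(node):
    """Turn dicts whose keys are all ints back into lists (gaps are compacted)."""
    if not isinstance(node, dict):
        return node
    if node and all(isinstance(k, int) for k in node):
        return [_listify(node[k]) for k in sorted(node)]
    return {k: _listify(v) for k, v in node.items()}


def unflatten(pairs):
    """Rebuild a nested structure from (path, leaf) pairs."""
    tree = {}
    for path, leaf in pairs:
        node = tree
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = leaf
    return _listify(tree)


# Invented sample document.
doc = {
    "users": [
        {"name": "ann", "email": "ann@example.com"},
        {"name": "bob", "email": "bob@example.com"},
    ],
    "count": 2,
}
pairs = list(flatten(doc))
emails_only = unflatten([(p, v) for p, v in pairs if p[-1] == "email"])
print(emails_only)
```

The filter in the middle is an ordinary list comprehension here; with gron it would be grep or any other line-oriented tool, which is the appeal described above.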
| paulddraper wrote:
| Jq is far more useful/capable than JMESPath.
| levzettelin wrote:
| How often do you have to add singular entries to some data file
| you're working with? For all other cases, Miller and xsv look more
| powerful.
| notRobot wrote:
| Often you're adding multiple lines programmatically from a
| script or cron or whatever.
| FireInsight wrote:
| I like using Nushell for this. It has a `from` builtin for all
| sorts of formats
| https://www.nushell.sh/commands/categories/formats.html and after
| that the data is just tables, which you can query with other
| builtins and syntax
| https://www.nushell.sh/book/navigating_structured_data.html
| RulerOf wrote:
| I got really excited when I saw `from_ini` but disappointed
| when I saw no `to_ini`.
|
| I would really like to find a good workflow for idempotent
| modifications to INI files, but haven't stumbled across one
| yet.
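
One possible shape for an idempotent INI edit, sketched with Python's standard `configparser` rather than Nushell. The helper name `ensure_option` is hypothetical, and the sketch assumes a named section (not `DEFAULT`). Note that `configparser` drops comments, lowercases keys by default, and normalizes spacing when it rewrites the file.

```python
import configparser
import io


def ensure_option(ini_text, section, key, value):
    """Return ini_text with `key = value` set in [section]; unchanged if already set."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if parser.has_section(section) and parser.get(section, key, fallback=None) == value:
        return ini_text  # already in the desired state: running again is a no-op
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, key, value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


once = ensure_option("[server]\nport = 80\n", "server", "port", "8080")
twice = ensure_option(once, "server", "port", "8080")
print(once)
```

Checking the current state before writing is what makes the operation safe to repeat from a script or configuration-management run.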
| Gepsens wrote:
| I think you need some kind of autocomplete here to make it
| worthwhile
| frou_dh wrote:
| Another one for the big list:
|
| https://github.com/dbohdan/structured-text-tools
|
| In fact it's already on it 6 times.
| mbrumlow wrote:
| I can't tell you how many times I've cobbled together a tool
| like this to use in go. I will be converting to this.
|
| Sometimes we don't actually want to parse yaml, we just want to
| mutate it without needing to model the underlying objects.
|
| Being able to select and replace, add data to an existing yaml
| document is a huge win for automation.
| bbkane wrote:
| Yes! This is especially powerful when combined with git-xargs
| to auto-open PRs with the results of the mutation.
|
| I wrote about this a little in
| https://www.bbkane.com/blog/go-project-notes/#scripting-chan...
| and it's really helped me keep GitHub workflows and various
| config files in sync across project repos
| arandomhuman wrote:
| Shameless plug but if you're a fan of jq style querying rather
| than sql for some reason you can also use qq[0] for these and a
| few other formats.
|
| [0] https://github.com/JFryy/qq
| ranger_danger wrote:
| Why are terminal "movies" always a gif with no video controls
| that move at the speed of light?
| hnlmorg wrote:
| They aren't always. There's sites like asciinema (I hope I've
| spelt that right). But the problem is GitHub readme's are
| pretty limited in what you can embed. So you either have to
| link out to another site, or embed an animated gif.
| samstave wrote:
| This was really cool to read:
|
|     Please open a discussion if:
|
|     - You have a question.
|     - You're not sure how to achieve something with dasel.
|     - You have an idea but don't quite know how you would like
|       it to work.
|     - You have achieved something cool with dasel and want to
|       show it off.
|     - Anything else!
|
| ---
|
| I really like dasel.
|
| Can I pipe a .csv to dasel and have it spit it out in JSON? And
| is that the best way to do that? (aren't there like a ton of ways
| to achieve this, or would dasel make it super simple?)
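
For comparison, one of the "ton of ways" is a few lines of Python's standard library. This sketch does not show dasel's own command line; it assumes the CSV has a header row, and the function name `csv_to_json` is hypothetical.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)


# Invented sample input.
result = csv_to_json("name,age\nann,34\nbob,27\n")
print(result)
```

Every value comes out as a string, since CSV carries no type information; a dedicated tool may or may not infer numbers for you.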
|
| Also, what would be interesting would be to be able to pull and
| scrape text, to put into a structured JSON.
|
| For example - I was talking about using a Discernment Lattice to
| construct a profile for a PERSON PLACE THING that one was doing
| research on, such that you can pull multiple sources/data-types
| for information on [SUBJECT] and have the knowledge dossier
| updated. Where, for example one could pull a lot of results that
| can be summarized by a GPT - then using dasel to grab the
| relevant component-data-points and dasel-ize and feed them into
| the Discernment Lattice JSON File such as I described here:
|
| https://i.imgur.com/vuuAtAL.png
|
| So building out a structured lattice file for a senator would
| look like:
|
| https://i.imgur.com/68WFiGA.png
|
| So, using a crawlee txtai workflow --> dasel parse --> into
| lattice file.
|
| Then the lattice file can be used to compare similar slices
| across all the different [SUBJECTS] -- such that further ties can
| be made.
|
| So, in this example - we have the data being organized for all
| the various entanglements a congress person has - and we can use
| that as a constraint for searching for relations between
| [subjects] which share elements across ordinarily opaque threads.
|
| The cool thing, is that one could then easily use it to ensure
| you scrub and manipulate the data into a more trainable lens for
| effectively fine tuning the data that you want to fine tune the
| model with/on - thus creating a hyper contextually focused lens -
| https://i.imgur.com/yngUwpr.png
| ABCD0 wrote:
| Comment
| pixelbeat__ wrote:
| A similar tool for ini files
| https://www.pixelbeat.org/programs/crudini/
| afh1 wrote:
| Interesting, but I guess yq already does that, albeit slower
| according to the README.
| gergely wrote:
| I have 6 petabytes of parquet where I would need to replace 1
| value in each line. Could it handle that?
___________________________________________________________________
(page generated 2024-08-18 23:00 UTC)