[HN Gopher] Dasel: Select, put and delete data from JSON, TOML, ...
       ___________________________________________________________________
        
       Dasel: Select, put and delete data from JSON, TOML, YAML, XML and
       CSV
        
       Author : edward
       Score  : 242 points
       Date   : 2024-08-18 14:11 UTC (8 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | michaelcampbell wrote:
       | Neat; seems about every quarter or so one of these types of tools
       | is highlighted here.
       | 
       | Awaiting all the responses from people to show off or list what
       | tool they've landed on to support their specific use cases; I
       | always learn a lot from these.
        
         | digdugdirk wrote:
         | I'm a bit confused as to the use case. Is it just a way to
         | interact with json/yaml style documents as if they were a
         | structured database, but from the command line? Kind of an in-
         | between for those moments you don't want to write a quick
         | script to batch modify files?
         | 
         | It looks really well done, I think I'm just failing to see how
         | this is more beneficial than just opening a single file in the
         | editor and making changes, or writing a quick functional script
         | so you have the history of the changes that were made to a
         | batch of files.
         | 
         | If someone could explain how I could (and why I should) add a
         | new tool to my digital toolbelt, I'd greatly appreciate it.
        
           | 0thgen wrote:
           | one benefit (idk if it applies here) is if the
           | select/put/delete statements didn't require loading the data
           | in memory; so you could query massive data files with limited
           | RAM and not have to solve that problem yourself for each data
           | storage format you're working with
        
           | supriyo-biswas wrote:
           | For things that are mostly shell scripts and things in a
           | similar family (Ansible playbooks, deployment pipelines etc.)
           | and where you need to modify a structured file quickly, it's
           | usually much faster to use the DSL provided by the tool than
           | calling out to various scripts to extract or modify a single
           | JSON key.
           | 
           | People often say that they'd prefer to write their shell
           | scripts in Python or even Go these days, but the problem
           | there is that the elements of structured programming make
           | the overall steps difficult to follow. Typically, the
           | paradigm for use cases adjacent to shell scripts is to be
           | able to view what the script is doing without any sort of
           | abstractions.
        
           | Lord_Zero wrote:
           | This could be useful for CICD where you need to bump a
           | version number in a file based on the build number.
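A minimal Python sketch of the version-bump case, standard library only. The
file layout (a JSON file with a top-level "version" key in major.minor.patch
form) and the `bump_version` helper are assumptions for illustration; this is
not dasel's API.

```python
import json
from pathlib import Path


def bump_version(path: Path, build_number: int) -> str:
    """Replace the patch component of the "version" key with build_number."""
    data = json.loads(path.read_text())
    # Assumes a three-part version string such as "1.4.0".
    major, minor, _patch = data["version"].split(".")
    data["version"] = f"{major}.{minor}.{build_number}"
    # Parsing and re-serialising keeps the file valid JSON,
    # which a blind text substitution cannot guarantee.
    path.write_text(json.dumps(data, indent=2) + "\n")
    return data["version"]
```

Note that re-serialising normalises whitespace, so the diff may touch more
than the one line that changed.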
        
             | kate_bits wrote:
             | For this specific use case? sed would work just as well and
             | probably already exists in your environment.
        
           | macNchz wrote:
           | I see the appeal of having a declarative syntax rather than
           | writing a bunch of code to make the change reliably and
           | safely.
        
           | fsckboy wrote:
           | the in-between mode that you mention but seem to dismiss is
           | the way most traditional unixheads work with data most of
           | the time: from the command line
           | 
           | editor? when i pull up emacs, 50% of the time it's to write
           | emacs macros, and I do that because shell scripts don't
           | easily go backward in the stream. (something rarely mentioned
           | about teco was that it was a stream editor that would chew
           | its way forward through files; you didn't need the memory to
           | keep it all in core, and it could go backward within
           | understandable limits)
           | 
           | writing an actual shellscript is only for when it's really
           | hairy, you are going to be repeating it and/or you need the
           | types of error handling that cloud up the clarity of the
           | commandline
           | 
           | the commandline does provide rudimentary "records" in the
           | saved history
        
           | simonw wrote:
           | I use jq for this kind of thing several times a week. It's
           | great for piped data - things like running curl to fetch
           | JSON, then piping it through to reformat it in different ways.
           | 
           | Here's a jq expression I used recently to turn a complete
           | GitHub Issues thread into a single Markdown document:
           |     curl -s "https://api.github.com/repos/simonw/shot-scraper/issues/1/comments" \
           |       | jq -r '.[] | "## Comment by \(.user.login) on \(.created_at)\n\n\(.body)\n"'
           | 
           | I use this pattern a lot. Data often comes in slightly the
           | wrong shape - being able to fix that with a one-liner
           | terminal command is really useful.
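For comparison, a rough Python equivalent of the jq filter above, applied to
an already-parsed comments array. The `comments_to_markdown` name is made up
for this sketch; only the `user.login`, `created_at` and `body` fields come
from the jq expression.

```python
def comments_to_markdown(comments: list) -> str:
    """Render each comment as a Markdown section, like the jq filter does."""
    sections = []
    for comment in comments:
        # jq: "## Comment by \(.user.login) on \(.created_at)\n\n\(.body)\n"
        sections.append(
            f"## Comment by {comment['user']['login']} "
            f"on {comment['created_at']}\n\n{comment['body']}\n"
        )
    # jq -r prints each result on its own line; joining with "\n" and then
    # calling print() on the result reproduces that layout.
    return "\n".join(sections)
```

The jq version is one line; the Python version is longer but easier to extend
once the reshaping needs branches or error handling.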
        
           | paulddraper wrote:
           | > or writing a quick functional script
           | 
           | It's exactly a quick functional script.
        
           | tofflos wrote:
           | I used yq last week to scan through all the Java projects
           | (i.e. Maven pom.xml-files) within our org to check which ones
           | inherit from the corporate pom.
           | 
           |     yq eval --input-format xml --output-format csv \
           |       '[file_index, file_name, .project.parent.groupId, .project.parent.artifactId, .project.parent.version]' \
           |       **/pom.xml
        
             | hnlmorg wrote:
             | Which yq? Last time I checked, there seemed to be a few
             | tools with the same name.
        
         | hnlmorg wrote:
         | Personally I think this is a problem better solved by fixing the
         | shell. There are a few alt shells out there now: Nushell, Elvish,
         | plus the one I help maintain, Murex (https://murex.rocks).
         | 
         | I'm obviously going to be biased here, but it's definitely worth
         | your time checking out some alt shells.
        
       | 0thgen wrote:
       | I like the idea of using select/put/delete (sql-style syntax) to
       | query non-rdb data storage. It sort of raises the question of,
       | could there be 1 universal language to query relational
       | databases, text file storage (json, csv, etc), and anything else.
       | 
       | Or put another way, is there any data storage format that
       | couldn't be queried by SQL?
        
         | IgorPartola wrote:
         | From what I understand SQL is or at least can be made Turing
         | complete so in that sense you should be able to query any data
         | store using it. However, that doesn't mean it will be efficient
         | to do so.
         | 
         | I suspect for most data structures you could construct an index
         | to make querying faster. But think about querying something
         | like a linked list: it is not going to be too efficient without
         | an index but you should still be able to write an engine that
         | will do so.
         | 
         | If you have something like a collection of arbitrary JSON
         | objects without a set structure you should still be able to
         | express what you are trying to do with SQL because Turing
         | completeness means it can examine the object structure as well
         | as contents before deciding what to do with it. But your SQL
         | would look more like procedural code than you might be used to.
        
         | Derelicte wrote:
         | There are a lot of differences between storage formats. It
         | would be incredibly difficult to create a universal query
         | language. It would need to either a) change the storage formats
         | so much that they're not really following their original
         | standard, or b) create so many different versions of the query
         | language that it's not really one standard.
         | 
         | Off the top of my head, SQL can't do lists as values, and
         | doesn't have simple key-value storage. Json doesn't have
         | tables, or primary keys / foreign keys, and can have nested
         | data
        
           | esprehn wrote:
           | SQL has both standard JSON and Array functions. What's the
           | "list as value" feature you think is missing?
        
         | Perz1val wrote:
         | XML attributes come to mind
        
           | lagniappe wrote:
           | Perz1val, it's me, your grandchild from the distant future.
           | Don't do this. XML goes rogue and destroys humanity.
        
         | slightwinder wrote:
         | > Or put another way, is there any data storage format that
         | couldn't be queried by SQL?
         | 
         | Depends on how keen you are on pure SQL. For example, postgres
         | and sqlite have json-extensions, but they also enhance the
         | syntax for it. Something similar can be done for all other
         | formats too, but this means you need to learn special syntax and
         | be aware of the storage-format for every query. This is far off
         | from a real universal language.
        
         | TeMPOraL wrote:
         | > _Or put another way, is there any data storage format that
         | couldn't be queried by SQL?_
         | 
         | Is your SQL Turing-complete? If yes, then it could query
         | anything. Whether or not you'd like the experience is another
         | thing.
         | 
         | Queries are programs. Querying data from a fixed schema is
         | easy. Hell, you could make a "universal query language" by
         | just concatenating together this dasel, with SQL and Cypher, so
         | you'd use the relevant facet when querying a specific data
         | source. The real problem starts when your query structure isn't
         | fixed - where what data you need depends on what the data says.
         | When you're dealing with indirection. Once you start doing
         | joins or conditionals or `foo[bar['baz']] if
         | bar.hasProperty('baz') else 42` kind of indirection, you
         | quickly land in the Turing tarpit[0] - whatever your query
         | language is, some shapes of data will be super painful for it
         | to deal with. Painful, but still possible.
         | 
         | --
         | 
         | [0] - https://en.wikipedia.org/wiki/Turing_tarpit
        
         | gumby wrote:
         | > It sort of raises the question of, could there be 1 universal
         | language to query relational databases, text file storage
         | (json, csv, etc), and anything else.
         | 
         | Sure there _could_ be -- any turing-complete language (which
         | SQL is) can query anything.
         | 
         | But the reason we have different programming languages* is
         | because they have different affordances and make it easy to
         | express certain things at the cost of being less convenient for
         | other things. Thus APL/Prolog/Lisp/C/Python can all coexist.
         | 
         | SQL is great for relational databases, but it's like commuting
         | to work in a tank when it comes to key-value stores.
         | 
         | * and of course because programmers love building tools, and a
         | language is the ultimate tool.
        
           | sweeter wrote:
           | sounds like a nightmare to do logistically. it would be cool
           | though.
        
         | ablob wrote:
         | If entries can be relations themselves it is not possible
         | afaik. For example:
         | 
         |     User | Telephone Numbers
         |     -----+------------------
         |     A    | 123, 456           <- not atomic; more than 1 number (i.e. a set)
         |     B    | 789
         | 
         | Now there are academic operators to convert to and from a
         | purely relational system, but I don't think they are
         | implemented/in the standard. I forgot what they are called,
         | however.
         | 
         | In general you don't want a universal query language. Depending
         | on the shape of the data you want different things to be easily
         | expressible. You can, for example express queries on tree-
         | shaped data with SQL (see xPath-Accelerator), but it is quite
         | cumbersome and its meaning is lost to the reader. I.e.: It's
         | fine when computer-generated, but there is too much noise for a
         | human to read/write themselves. I'd be glad to be proven wrong
         | here, but as time has shown, there is no one size fits all for
         | programming languages. The requirements for different
         | applications just vary too much.
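The set-valued column in the example above can be flattened into atomic rows
and grouped back again; in the nested relational model these two directions
are usually called unnest and nest. A small Python sketch, where the helper
names and the dict-of-sets layout are assumptions for illustration:

```python
from collections import defaultdict

# Non-1NF relation from the example: one user maps to a set of numbers.
nested = {"A": {"123", "456"}, "B": {"789"}}


def unnest(relation: dict) -> list:
    """Flatten to atomic (user, number) rows, sorted for stable output."""
    return sorted(
        (user, number)
        for user, numbers in relation.items()
        for number in numbers
    )


def nest(rows: list) -> dict:
    """Group atomic rows back into one set-valued entry per user."""
    grouped = defaultdict(set)
    for user, number in rows:
        grouped[user].add(number)
    return dict(grouped)
```

Here `unnest(nested)` gives three rows that a plain SQL table can hold, and
`nest` on those rows recovers the original mapping.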
        
         | wslh wrote:
         | > It sort of raises the question of, could there be 1 universal
         | language to query relational databases...
         | 
         | Even if SQL and/or another query language could be
         | Turing-complete, that doesn't mean that you can have 1 universal
         | language to perform all possible queries in an efficient way.
         | In basic computer science terms, your data structure is tied to
         | the queries you run and the efficiency you want to achieve, so
         | ad-hoc solutions have to be built for specific problems.
        
         | acjohnson55 wrote:
         | That's basically SQL. Many SQL systems have lots of built in
         | connectivity to various data sources.
         | 
         | DuckDB is a good example of a (literally) serverless SQL-based
         | tool for data processing. It is designed to be able to treat
         | the common data serialization formats as though they are tables
         | in a schema [1], and you can export to many of the same
         | formats [2]. With extensions, you can also connect to relational
         | databases as foreign tables [3].
         | 
         | This connectivity is a big reason it has built a pretty avid
         | following in the data science world.
         | 
         | [1] https://duckdb.org/docs/data/overview
         | 
         | [2] https://duckdb.org/docs/extensions/json#json-importexport
         | 
         | [3] https://duckdb.org/docs/extensions/postgres
        
       | bloopernova wrote:
       | Having recently messed with JMESPath in AWS, I wonder which of
       | these structured data tools:
       | 
       |   - Is easier to learn
       |   - Has most/best documentation
       |   - Is faster to write in
       | 
       | Does anyone know of a good comparison article?
       | 
       | (I still default to jq, I guess it has the momentum)
        
         | wodenokoto wrote:
         | Is there a JMESPath tool that works on json, yaml, toml, etc?
        
           | bloopernova wrote:
           | I don't think so, that's a good point against it.
           | 
           | It's heavily used by AWS and Azure though.
        
       | montroser wrote:
       | Cool project -- but we need a standardized/spec'd query language
       | in order to realize the goals in the "one tool to rule them all"
       | section of this readme.
       | 
       | I have a hard time internalizing the jq query syntax, and am not
       | overly excited to invest in learning all the quirks when it's not
       | based on a widely-adopted open standard. Maybe `JMESPath` could
       | be the way forward.
       | 
       | Sometimes `gron` can be a pretty great alternative approach,
       | depending on your use case. At least it is very intuitive and
       | plays nicely with other tools.
        
         | AtlasBarfed wrote:
         | Ultimately JSON, TOML, YAML, XML, properties files are tree
         | structures, and XPath type syntax should roughly apply to them
         | all, along with about a hundred "path expression" languages
         | (java had SpEL, velocity, JSP-EL, OGNL, and probably dozens of
         | others).
         | 
         | XPath, although it had some clunky artifacts for XML (which was
         | the reason we moved from XML like namespaces... ugh), had
         | basically the apex of expression/path/navigation capabilities.
         | It would be really nice to see XPath ported to a general nav
         | language that is supported by all programming environments and
         | handles all the relevant formats.
        
           | dleeftink wrote:
           | I still like Xidel[0] for this reason; it may be a little
           | older, but for a CLI scraper a lot of data transformation
           | needs can be satisfied with XPath/XQuery.
           | 
           | [0]: https://github.com/benibela/xidel
        
           | sitkack wrote:
           | Speaking of trees, gron/ungron is an amazing transformer that
           | allows one to use _any_ query tool on the leaves of the tree
           | and then turn the flattened structure back into a document
           | (json).
           | 
           | I'd love to see gron/ungron implemented for all tree
           | structures.
           | 
           | https://github.com/tomnomnom/gron
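The flatten-then-rebuild idea can be sketched in a few lines of Python. This
is not gron's implementation or its output format; `flatten`, `unflatten` and
the (path, leaf) pair representation are assumptions made for the sketch.

```python
def flatten(node, path=()):
    """Yield (path, leaf) pairs; path is a tuple of dict keys and list indices."""
    if isinstance(node, dict) and node:
        for key, child in node.items():
            yield from flatten(child, path + (key,))
    elif isinstance(node, list) and node:
        for index, child in enumerate(node):
            yield from flatten(child, path + (index,))
    else:
        yield path, node  # scalar, or an empty dict/list kept as a leaf


def _listify(node):
    """Turn dicts keyed by integers back into lists (JSON keys are strings)."""
    if not isinstance(node, dict) or not node:
        return node
    if all(isinstance(key, int) for key in node):
        return [_listify(node[index]) for index in sorted(node)]
    return {key: _listify(value) for key, value in node.items()}


def unflatten(pairs):
    """Rebuild a document from (path, leaf) pairs produced by flatten."""
    pairs = list(pairs)
    if len(pairs) == 1 and pairs[0][0] == ():
        return pairs[0][1]  # the whole document was a single leaf
    tree = {}
    for path, leaf in pairs:
        cursor = tree
        for step in path[:-1]:
            cursor = cursor.setdefault(step, {})
        cursor[path[-1]] = leaf
    return _listify(tree)
```

Because the flattened form is just a list of pairs, any filter (a list
comprehension here, grep in gron's case) can run between the two steps.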
        
         | paulddraper wrote:
         | Jq is far more useful/capable than JMESPath.
        
       | levzettelin wrote:
       | How often do you have to add singular entries to some data file
       | you're working with? For all other cases, Miller and xsv look more
       | powerful.
        
         | notRobot wrote:
         | Often you're adding multiple lines programmatically from a
         | script or cron or whatever.
        
       | FireInsight wrote:
       | I like using Nushell for this. It has a `from` builtin for all
       | sorts of formats
       | https://www.nushell.sh/commands/categories/formats.html and after
       | that the data is just tables, which you can query with other
       | builtins and syntax
       | https://www.nushell.sh/book/navigating_structured_data.html
        
         | RulerOf wrote:
         | I got really excited when I saw `from_ini` but disappointed
         | when I saw no `to_ini`.
         | 
         | I would really like to find a good workflow for idempotent
         | modifications to INI files, but haven't stumbled across one
         | yet.
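One possible approach to idempotent INI edits, sketched with Python's
standard `configparser`. The `set_ini_value` helper is hypothetical, and it
has a known limitation: when a change is needed, `configparser` rewrites the
file without its comments and original formatting.

```python
import configparser
import io


def set_ini_value(text: str, section: str, key: str, value: str) -> str:
    """Return INI text with section/key set to value; no-op if already set."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if parser.get(section, key, fallback=None) == value:
        return text  # already in the desired state: leave the input untouched
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, key, value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Running it twice with the same arguments returns the same text as running it
once, which is the property an automation step needs.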
        
       | Gepsens wrote:
       | I think you need some kind of autocomplete here to make it
       | worthwhile
        
       | frou_dh wrote:
       | Another one for the big list:
       | 
       | https://github.com/dbohdan/structured-text-tools
       | 
       | In fact it's already on it 6 times.
        
       | mbrumlow wrote:
       | I can't tell you how many times I've cobbled together a tool
       | like this to use in go. I will be converting to this.
       | 
       | Sometimes we don't actually want to parse yaml, we just want to
       | mutate it without needing to model the underlying objects.
       | 
       | Being able to select and replace, add data to an existing yaml
       | document is a huge win for automation.
        
         | bbkane wrote:
         | Yes! This is especially powerful when combined with git-xargs
         | to auto-open PRs with the results of the mutation.
         | 
         | I wrote about this a little in
         | https://www.bbkane.com/blog/go-project-notes/#scripting-chan...
         | and it's really helped me keep GitHub workflows and various
         | config files in sync across project repos
        
       | arandomhuman wrote:
       | Shameless plug but if you're a fan of jq style querying rather
       | than sql for some reason you can also use qq[0] for these and a
       | few other formats.
       | 
       | [0] https://github.com/JFryy/qq
        
       | ranger_danger wrote:
       | Why are terminal "movies" always a gif with no video controls
       | that move at the speed of light?
        
         | hnlmorg wrote:
         | They aren't always. There are sites like asciinema (I hope I've
         | spelt that right). But the problem is GitHub READMEs are
         | pretty limited in what you can embed. So you either have to
         | link out to another site, or embed an animated gif.
        
       | samstave wrote:
       | This was really cool to read:
       | 
       | Please open a discussion if:
       | 
       |     You have a question.
       |     You're not sure how to achieve something with dasel.
       |     You have an idea but don't quite know how you would like it to work.
       |     You have achieved something cool with dasel and want to show it off.
       |     Anything else!
       | 
       | ---
       | 
       | I really like dasel.
       | 
       | Can I pipe a .csv to dasel and have it spit it out in JSON? And
       | is that the best way to do that? (aren't there like a ton of ways
       | to achieve this, or would dasel make it super simple?)
       | 
       | Also, what would be interesting would be to be able to pull and
       | scrape text, to put into a structured JSON.
       | 
       | For example - I was talking about using a Discernment Lattice to
       | construct a profile for a PERSON PLACE THING that one was doing
       | research on, such that you can pull multiple sources/data-types
       | for information on [SUBJECT] and have the knowledge dossier
       | updated. Where, for example one could pull a lot of results that
       | can be summarized by a GPT - then using dasel to grab the
       | relevant component-data-points and dasel-ize and feed them into
       | the Discernment Lattice JSON File such as I described here:
       | 
       | https://i.imgur.com/vuuAtAL.png
       | 
       | So building out a structured lattice file for a senator would
       | look like:
       | 
       | https://i.imgur.com/68WFiGA.png
       | 
       | So, using a crawlee txtai workflow --> dasel parse --> into
       | lattice file.
       | 
       | Then the lattice file can be used to compare similar slices
       | across all the different [SUBJECTS] -- such that further ties can
       | be made.
       | 
       | So, in this example - we have the data being organized for all
       | the various entanglements a congress person has - and we can use
       | that as a constraint for searching for relations between
       | [subjects] which share elements across ordinarily opaque threads.
       | 
       | The cool thing, is that one could then easily use it to ensure
       | you scrub and manipulate the data into a more trainable lens for
       | effectively fine tuning the data that you want to fine tune the
       | model with/on - thus creating a hyper contextually focused lens -
       | https://i.imgur.com/yngUwpr.png
        
       | ABCD0 wrote:
       | Comment
        
       | pixelbeat__ wrote:
       | A similar tool for ini files
       | https://www.pixelbeat.org/programs/crudini/
        
       | afh1 wrote:
       | Interesting, but I guess yq already does that, albeit slower
       | according to the README.
        
       | gergely wrote:
       | I have 6 petabytes of parquet where I would need to replace 1
       | value in each line. Could it handle that?
        
       ___________________________________________________________________
       (page generated 2024-08-18 23:00 UTC)