https://datastation.multiprocess.io/blog/2022-01-11-dsq.html January 11, 2022: New blog post! dsq: Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more. DataStation The Open-Source Data IDE Install the App Subscribe Documentation Blog The Pitch Community Try it out online! Blog dsq: Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more. Published on January 11, 2022 by Phil Eaton dsqgosqltutorial What is DataStation?DataStation is an open-source data IDE for developers. It allows you to easily build graphs and tables with data pulled from SQL databases, logging databases, metrics databases, HTTP servers, and all kinds of text and binary files. Need to join or munge data? Write embedded scripts as needed in Python, JavaScript, Ruby, R, or Julia. All in one application. dsq is a commandline tool for running SQL queries against all the data file types that DataStation supports (JSON, CSV, Parquet, Excel, etc.). Here is a simple example: $ cat users.csv name,age Jorge,30 Nina,32 Cal,28 Mona,27 $ dsq users.csv "SELECT * FROM {} ORDER BY CAST(age AS INT) DESC" | jq [ { "age": "32", "name": "Nina" }, { "age": "30", "name": "Jorge" }, { "age": "28", "name": "Cal" }, { "age": "27", "name": "Mona" } ] dsq uses DataStation libraries under-the-hood and DataStation in turn uses SQLite for these kinds of queries. The dsq source code is available on Github. Install dsq originated as a subproject in the DataStation repo but now exists on its own. You can install it with Go 1.17+: $ go install github.com/multiprocessio/dsq@latest Multiple files and joins If you pass multiple files to dsq, they become available under the alias {N} where N is the 0-based index of the list of files passed on the commandline. {} is an alias for {0}. For example: $ cat ids.json [ {"id": 1, "name": "Jorge"}, {"id": 2, "name": "Mona"}, {"id": 3, "name": "Cal"}, {"id": 4, "name": "Nina"} ] $ dsq users.csv ids.json "SELECT id, {0}.name FROM {0} JOIN {1} ON {0}.name = {1}.name" | jq [ { "id": 1, "name": "Jorge" }, { "id": 4, "name": "Nina" }, { "id": 3, "name": "Cal" }, { "id": 2, "name": "Mona" } ] File type support As of today dsq supports: JSON, CSV, TSV, Excel, Parquet, OpenOffice Sheets, and Apache2 and nginx logs. This list will continue to grow over time. The complete list of file extensions and mimetypes is in the README here. Comparison to other tools There are a number of existing tools like q, textql and octosql. These are mature tools and there wouldn't be a need for dsq. But DataStation already supports the core logic so exposing this CLI is very simple. Moreover DataStation supports more file types than these existing tools. A detailed comparison table is available in the repo README here. Try it out! Give it a go, share any bugs or feedback! The code is available on Github. Share dsq now supports joining against multiple files (of any supported type!). It also now supports .tsv and .ods (OpenOffice Sheets) files. And it has a brand new repo so that it's easier to find. Read more in the blog post here:https://t.co/MVjuWlAMwy pic.twitter.com/FOXTx5BfrL -- Multiprocess Labs (@multiprocessio) January 11, 2022 Questions? Feedback? Feel free to reach the author at phil@multiprocess.io. Subscribe for more posts like this! DataStation Documentation Blog Community Twitter Github Contact (c) 2021-2022 Multiprocess Labs LLC