[HN Gopher] Show HN: DataStation - App to easily query, script, ...
___________________________________________________________________
Show HN: DataStation - App to easily query, script, and visualize
data
Author : eatonphil
Score : 45 points
Date : 2022-05-31 20:10 UTC (2 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| canMarsHaveLife wrote:
| How does it compare to Redash (now Databricks SQL):
| https://github.com/getredash/redash?
| [deleted]
| eatonphil wrote:
| I haven't used it, but just from looking at the GitHub page it
| looks like redash has more advanced dashboarding features today
| (I'd like to catch up here). In contrast redash doesn't really
| allow you to manipulate data very much if it doesn't come in a
| form you want or if you can't get it into the right form with
| SQL alone.
|
| DataStation allows you to script results of database queries
| (or loaded Parquet, Excel, CSV, etc. files or HTTP API
| responses) in Python, Node, R, Julia, etc.
|
| Also, DataStation is first-off a desktop app today so it's very
| easy to install and use -- especially in a corporate
| environment. Data never leaves your laptop. In the future I
| think more people will use the server version of DataStation so
| you can get server features like recurring exports and hosted
| dashboards but desktop will always be supported too.
| programmarchy wrote:
| Looks very useful! In terms of feedback, I think if you brought
| in a designer you'd have a much bigger "wow" factor. There's a
| lot of low hanging fruit like consistent button styles, fonts,
| whitespace, larger text inputs, that'd go a long way. And I'm
| sure you've thought of this already, but seems like a node-based
| paradigm could be an improvement over the panel-based paradigm
| e.g. more akin to something like Blender nodes, or Tableau.
| eatonphil wrote:
| > In terms of feedback, I think if you brought in a designer
| you'd have a much bigger "wow" factor. There's a lot of low
| hanging fruit like consistent button styles, fonts, whitespace,
| larger text inputs, that'd go a long way.
|
| Yes this would be nice to have! If there's a version of this
| that gets funded or bootstrapped then I'd definitely like to
| bring someone on to help.
|
| > And I'm sure you've thought of this already, but seems like a
| node-based paradigm could be an improvement over the panel-
| based paradigm e.g. more akin to something like Blender nodes,
| or Tableau.
|
| Actually no I'm not familiar with this concept. But I have seen
| what natto.dev does and I'm concerned that that is too free
| form compared to how DataStation works. A little structure is
| useful IMO. I'm not sure how similar Blender nodes or Tableau
| are to natto.dev.
|
| That said, DataStation panels show up in an order but the order
| of evaluation is not set. You can import the results of a panel
| defined below the current panel; it just matters that the panel
| you refer to has been _run_. So it may be closer to a node-based
| design in that case. But again I'm not sure if that's what you
| mean.
| programmarchy wrote:
| Hadn't seen natto before, but I agree that's pretty far out
| there! If you search images of Tableau Prep, that's more
| along the lines of what I had in mind. Although Tableau
| supports Python and R, it's not nearly as well integrated as
| what you've done with DataStation. In general, it's more
| geared towards Excel power user types, rather than
| programmers.
| eatonphil wrote:
| > If you search images of Tableau Prep, that's more along
| the lines of what I had in mind.
|
| Ah! I think this is a visualization of what does happen
| with DataStation panels too. Eventually I'd like to have
| better support for understanding the dependency graph like
| this but for now that's just been a nice idea to have
| sometime in the future.
|
| > Although Tableau supports Python and R, it's not nearly
| as well integrated as what you've done with DataStation. In
| general, it's more geared towards Excel power user types,
| rather than programmers.
|
| Yeah it was definitely my impression it was not geared
| toward programmers as much (though I know many programmers
| or data scientists use it).
| bamazizi wrote:
| The UX reminded me of PipeDream (https://pipedream.com/)
|
| The industry around abstraction tools/UI on top of DBs is growing.
| We use Retool very heavily and it does get pricey.
|
| This is a very neat execution and has potential for SAAS or Cloud
| offering. Like "Bring your own DB" and build your own
| abstractions.
| [deleted]
| eatonphil wrote:
| > This is a very neat execution and has potential for SAAS or
| Cloud offering. Like "Bring your own DB" and build your own
| abstractions.
|
| Definitely my goal for the future is SaaS/Cloud where you can
| work on projects as a team and configure hosted dashboards,
| recurring exports and alerts out of panels you set up in a
| DataStation project.
| eatonphil wrote:
| Hey folks! I quit my job at Oracle almost a year ago now to build
| DataStation. It's an app I've wanted as an engineering manager
| for years. It's entirely open-source and while I've had a few
| awesome contributors I'm mostly the only person on it. It has
| been funded out of contract development and savings.
|
| DataStation helps you query a variety of data sources
| (conventional SQL like PostgreSQL and MySQL, non-SQL like
| Prometheus or Elasticsearch), files and HTTP APIs. It is not a
| SQL layer on top of these various APIs like FDW in Postgres or
| Apache Calcite.
|
| DataStation just tries to abstract away glue code. So in
| DataStation for Prometheus you query with PromQL. For
| Elasticsearch you query with Lucene. And for SQL databases you
| query with their SQL dialect. But you don't need to remember how
| to use the appropriate library for your language. You just need
| your own credentials.
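|
| For a sense of the glue code meant here, a rough sketch of the
| connect/query/convert boilerplate you would otherwise write by
| hand. It uses Python's built-in sqlite3 module with an in-memory
| database as a stand-in for a real server; the customers table and
| the run_query helper are hypothetical, not part of DataStation.
|
```python
import sqlite3


def run_query(conn, sql, params=()):
    """Run a query and return rows as a list of dicts (column -> value)."""
    cur = conn.execute(sql, params)
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


# Assumption: in-memory SQLite stands in for PostgreSQL, MySQL, etc.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)

# Hypothetical query; each database needs its own driver and setup like this.
rows = run_query(conn, "SELECT id, name FROM customers ORDER BY id")
print(rows)
```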
|
| DataStation is made of panels (other apps might call them cells)
| that each produce a result. Panels can refer to other panels.
| These allow you to build workflows that cross the boundary of a
| particular datasource. For example you might have some data in a
| CSV a product manager gave you and the bulk of your data is in
| PostgreSQL. In DataStation you could pull in the CSV with a File
| panel and pull in the Postgres data with a Database panel. Then
| you can join both panel results in a Code panel using your
| favorite language like Python, Ruby, R, Node, Julia, etc. You can
| even script Code panels in a SQLite dialect with a bunch of rich
| addons (url parsing, best-effort date parsing, statistics
| aggregation, etc.):
| https://github.com/multiprocessio/go-sqlite3-stdlib.
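|
| A minimal sketch of that CSV-plus-Postgres join in plain Python.
| The sample rows, column names, and the join_rows helper are
| hypothetical stand-ins for a File panel result and a Database
| panel result; it does not use DataStation's own panel functions.
|
```python
import csv
import io

# Hypothetical File panel result: a CSV handed over by a product manager.
csv_text = "customer_id,segment\n1,enterprise\n2,startup\n"
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# Hypothetical Database panel result: rows as a list of dicts.
db_rows = [
    {"id": 1, "name": "Acme", "requests": 1200},
    {"id": 2, "name": "Globex", "requests": 300},
    {"id": 3, "name": "Initech", "requests": 50},
]


def join_rows(left, right, left_key, right_key):
    """Inner-join two lists of dicts using a hash lookup on the right side."""
    # CSV values are strings, so compare keys as strings on both sides.
    index = {str(r[right_key]): r for r in right}
    joined = []
    for row in left:
        match = index.get(str(row[left_key]))
        if match is not None:
            joined.append({**match, **row})
    return joined


# Customers present in both sources; id 3 has no CSV row and is dropped.
result = join_rows(db_rows, csv_rows, "id", "customer_id")
for row in result:
    print(row["name"], row["segment"], row["requests"])
```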
|
| You can watch a simple introductory video:
| https://www.youtube.com/watch?v=q_jRBvbwIzU. Or if you want to
| see that cross-datasource interaction taken to an extreme, check
| out this video using Postgres metadata to filter log data in
| Elasticsearch to do historic request analysis on a subset of
| customers: https://www.youtube.com/watch?v=tIh99YVHoRE.
|
| DataStation is mainly a desktop app today where the end result is
| that you export graph SVGs or HTML tables or markdown tables or
| just a CSV file. All this data stays on your laptop so it's as
| easy to use in a corporate environment as any existing SQL IDE or
| Jupyter Notebook.
|
| In the last year it's reached 1.5k stars on GitHub, over 1000
| unique users, and currently about 40 fairly active users per
| month on average (defined as having opened the app more than a
| few times).
|
| Since it's only just now 12 months old it's been going through a
| lot of maturing during this time. If you've tried it before and
| it was buggy or too slow it's probably worth another try now if
| you're still interested.
|
| DataStation is primarily an Electron app but the code that
| evaluates panels is written in Go. The Go evaluation code forms
| the backbone of another app you may have seen around HN, dsq:
| https://github.com/multiprocessio/dsq, which is a limited version
| of DataStation as a CLI for querying files with SQL.
|
| In the future I'd like to see more people using it as a server
| app where my goal is to support read-only dashboards and
| recurring exports. That part is still work-in-progress.
|
| You can find a ton of tutorials on how to interact with supported
| databases on the DataStation website:
| https://datastation.multiprocess.io/docs/.
|
| Looking forward to your feedback!
| lopatin wrote:
| This is really cool. Maybe in the future you can make a paid
| version with a bunch of BI features.
|
| In your opinion, how does it compare to PyCharm (Enterprise
| version) when it's all blinged out with big data tools and
| integrations? I recently realized that PyCharm is my Data IDE
| and not just my Python editor. I only use limited features
| though, so hard for me to compare the extent of functionalities
| between the two.
|
| Edit: Well, PyCharm won't let you join two different data
| sources, so that's one big difference!
| eatonphil wrote:
| > Edit: Well, PyCharm won't let you join two different data
| sources, so that's one big difference!
|
| Right!
|
| On the other hand, any real code IDE will have high-quality
| autocomplete, jump-to-definition, all that code IDE stuff. In
| the future DataStation may be able to hook into tree-sitter
| or LSP but for now it's more like a textarea with syntax
| highlighting (although the SQL code panel autocomplete is
| relatively complete).
|
| Similarly, SQL IDEs have better exploration of your database.
| DataStation can't tell you about which tables or schemas
| exist yet (although I want it to in the future).
|
| DataStation competes more directly with _Python scripts_ than
| with SQL IDEs and code IDEs (although there is of course
| overlap).
| tyingq wrote:
| It does look a bit like parts of Tableau's desktop
| product.
| eatonphil wrote:
| I haven't used Tableau but I have had some people show up
| in Discord to ask about using DataStation as an
| alternative. So maybe it is similar, but I don't know.
| alashow wrote:
| Any reason for not having a web client?
| eatonphil wrote:
| You can run it as a web server! It's just not as commonly
| done right now since I haven't put much time into integration
| with cloud providers (stuff like CloudFormation templates I
| mean) and I don't yet have a public Docker image that is up
| to date.
|
| https://datastation.multiprocess.io/docs/0.11.0/DataStation_.
| ..
| moltar wrote:
| Looks amazing.
|
| Will try tomorrow. Athena alone is a superior offer in my mind.
| Even TablePlus, my favourite SQL client, doesn't do that :)
|
| If you can add dbt integration it will be a killer product!
|
| Thank you!
| eatonphil wrote:
| Thanks for the kind words!
|
| The only caveat I'll say is that it's definitely not as mature
| in general as SQL clients (stuff like table, column discovery
| and autocomplete does not exist yet). But it is pretty
| convenient to use DataStation if you like being able to easily
| switch into Python/JavaScript/whatever without needing to look
| up the docs for how to connect to and run a query against every
| database.
|
| > If you can add dbt integration it will be a killer product!
|
| I haven't used dbt and my impression was that it was a glue
| system for copying data from one place to another. But maybe
| that's not correct. Is it possible to query dbt data directly?
| Or how would you imagine it fitting into a DataStation flow?
| Thank you!
___________________________________________________________________
(page generated 2022-05-31 23:00 UTC)