[HN Gopher] Dataflow, a self-hosted Observable notebook editor
       ___________________________________________________________________
        
       Dataflow, a self-hosted Observable notebook editor
        
       Author : tosh
       Score  : 121 points
       Date   : 2021-05-13 18:28 UTC (4 hours ago)
        
 (HTM) web link (observablehq.com)
 (TXT) w3m dump (observablehq.com)
        
       | lejohnq wrote:
       | This is pretty awesome. Feels like a streamlit for the javascript
       | world.
        
       | FormFollowsFunc wrote:
       | I've been looking for something like this for data vis
       | exploration. Compared to Observable, accessing local data files
       | is more convenient. Currently I use a Jupyter notebook along with
       | Pandas and Matplotlib. I'm not a huge fan of Matplotlib, so I
       | would prefer to use Plot or the Vega-Lite API, and Pandas could
       | be replaced with Danfo.js or Arquero.
        
       | whoevercares wrote:
       | How does this relate to data flow, or is it just a brand name?
        
       | RocketSyntax wrote:
       | Help me understand what the page being rendered is doing. Is that
       | like an interactive app you are serving for user input?
        
         | qbasic_forever wrote:
         | It's an observable notebook: https://observablehq.com/
         | Basically a notebook where you write JS code and see the
         | results immediately rendered in the notebook. In this case it's
         | being served locally instead of requiring you to use their
         | service website. If you've ever used Jupyter or IPython this is
         | very similar (code notebooks) but with some interesting changes
         | in philosophy and more of a JavaScript implementation instead
         | of Python.
         | 
         | What might be tripping you up is that in this demo the
         | observable notebook isn't showing the code cells, only the
         | outputs. The code is in the editor on the left and the output
         | on the right is the result of running the code as an observable
         | notebook. In some ways it is like a simple interactive web app.
        
           | Isthatablackgsd wrote:
           | Is that a similar concept to Overleaf for LaTeX?
        
       | chrisweekly wrote:
       | OK! I can't put off creating an observablehq acct any longer.
       | 
       | ... Done. Stoked to dive in this weekend!
        
       | simonw wrote:
       | This project looks fantastic.
       | 
       | I adore Observable notebooks, but the one thing that makes me
       | hesitate in using them for everything is that the editor
       | component itself is closed-source and only available on
       | https://observablehq.com/
       | 
       | They're great open source ecosystem supporters - they released
       | their runtime, their parser, their standard library and all sorts
       | of other stuff through https://github.com/observablehq - but the
       | editor itself is their proprietary sauce.
       | 
       | I totally support their decision on this - it's what they're
       | building their business around, and I want them to be successful.
       | But as a user it does give me pause.
       | 
       | This project from Alex Garcia looks like a fix for exactly that.
       | Having more-than-one editor for their notebook format (and an
       | open source option at that) resolves my hesitancy in leaning hard
       | into their ecosystem.
       | 
       | I don't even see it as a competitor to ObservableHQ - the hosted
       | Observable editor has collaboration features that don't even make
       | sense for a local running version.
       | 
       | Plus, Dataflow has some great ideas of its own - in particular
       | the live file attachments thing.
        
         | edtechdev wrote:
         | Yeah the lack of open source prevented me from committing to
         | observable, too, so I look forward to trying dataflow out.
         | 
         | Just in case this is of interest to others, some other open
         | source browser-based computational notebook tools include:
         | 
         | * Starboard https://starboard.gg/
         | * And of course there's always Jupyter, but it requires a
         |   server component
         | 
         | And this isn't the same thing, more of a javascript playground
         | (open source alternative to codepen and the like), but see also
         | Slingcode: https://slingcode.net/
        
       | nautilus12 wrote:
       | I see all these notebook products and I honestly don't know how
       | any of them plan to compete with AWS... nobody wants self-hosted
       | anymore, everyone just wants to pay AWS or Databricks for it.
       | 
       | Can other people chime in? Maybe I'm just working at the wrong
       | place.
        
         | qbasic_forever wrote:
         | It's running on localhost here and I presume that's their
         | intended use case for this feature. Localhost is critical for
         | development--imagine if VS Code wouldn't work unless you were
         | connected to GitHub.com. This is fixing that issue with
         | observable notebooks so now you can run and develop your
         | notebook locally without depending directly on the internet or
         | their cloud service.
        
         | simonw wrote:
         | https://observablehq.com/ is a cloud hosted platform already.
         | 
         | This thing - Dataflow - is an open source
         | run-on-your-own-machine alternative to the official Observable
         | hosted solution, taking advantage of the fact that Observable
         | itself is JavaScript code with some special sauce that's
         | available as open source runtime/parser libraries.
        
       | mistidoi wrote:
       | As a total Observable/Bostock stan who works with HIPAA protected
       | data, I love this.
        
       | d--b wrote:
       | I am also working on an alternative: https://www.jigdev.com
       | 
       | It's the same idea except that cells are spread out on a 2d
       | canvas with tabs similar to excel.
        
       | keeganj wrote:
       | I'm not a data scientist, but I've been interested in the idea of
       | a "code notebook" ever since Jupyter hit it big. I write mostly
       | in JS/TS for application logic, so this looks like it could be
       | really useful.
       | 
       | Related, does anyone have any recommendations of a (Postgres) SQL
       | "notebook"? I don't really need any visualizations, more just a
       | markdown integrated doc that allows me to lay out the different
       | queries I use to answer a question.
        
         | amcaskill wrote:
         | I am working on a SQL-in-markdown reporting tool called
         | evidence.
         | 
         | It feels like a markdown doc that runs SQL.
         | 
         | https://evidence.dev/
        
           | keeganj wrote:
           | This is almost exactly what I was imagining. Just subscribed
           | to updates, very interested to see what this becomes!
        
         | qbasic_forever wrote:
         | I like the ipython-sql magic in Jupyter:
         | https://github.com/catherinedevlin/ipython-sql Depending on
         | what you're doing you might be able to get away entirely with
         | just using it and some basic queries, i.e. no python glue code
         | in the notebook at all. But worst case you might need a cell to
         | open up the DB connection and make the magic aware of it, then
         | you can execute clean and simple SQL queries in cells using the
         | magic.
        
         | robertlacok wrote:
         | Deepnote has native Postgres cells :) you can mix them with
         | Python too.
         | 
         | Disclaimer - I work there :)
        
         | gradys wrote:
         | Maybe just a Python notebook with a Postgres client library and
         | some helper functions to keep the amount of Python in the main
         | body to a minimum?
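
A minimal sketch of the helper-function idea in the comment above. The names `run_query` and `format_table` are hypothetical, and `sqlite3` is used only as a stand-in so the example runs without a database server. With Postgres you would pass in a connection from a client library such as psycopg2 instead, and use `%s` placeholders rather than `?`.

```python
import sqlite3  # stand-in driver (assumption); swap for a Postgres client library


def run_query(conn, sql, params=()):
    """Run one statement on a DB-API connection; return (column_names, rows)."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        # cursor.description is None for statements that return no rows
        if cur.description is None:
            return [], []
        columns = [d[0] for d in cur.description]
        return columns, cur.fetchall()
    finally:
        cur.close()


def format_table(columns, rows):
    """Render a result set as a plain-text table for notebook output."""
    cells = [[str(c) for c in columns]] + [[str(v) for v in row] for row in rows]
    widths = [max(len(r[i]) for r in cells) for i in range(len(columns))]
    lines = [" | ".join(v.ljust(w) for v, w in zip(r, widths)) for r in cells]
    lines.insert(1, "-+-".join("-" * w for w in widths))
    return "\n".join(lines)


# Hypothetical data, only to make the sketch runnable end to end.
conn = sqlite3.connect(":memory:")
run_query(conn, "CREATE TABLE orders (id INTEGER, total REAL)")
run_query(conn, "INSERT INTO orders VALUES (1, 9.5), (2, 20.0)")

# With the helpers defined once, each later cell is mostly SQL.
columns, rows = run_query(
    conn, "SELECT id, total FROM orders WHERE total > ?", (10,)
)
print(format_table(columns, rows))
```

Keeping the helpers in a single setup cell means the remaining cells read as a sequence of queries with markdown notes in between.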
        
         | Hasnep wrote:
         | Rmarkdown notebooks can contain SQL chunks, so you'd only need
         | to use R to configure the connection. [1]
         | 
         | [1] https://bookdown.org/yihui/rmarkdown/language-engines.html#s...
        
           | keeganj wrote:
           | I didn't know you could write SQL directly in Rmarkdown like
           | this, very interesting. Thanks!
        
             | pbowyer wrote:
             | Same, when I've read the docs I've always got the
             | impression that only R was supported.
        
         | sixdimensional wrote:
         | Apache Zeppelin is one open source option -
         | https://zeppelin.apache.org.
        
         | RocketSyntax wrote:
         | Lots of jupyter magic `%` commands for that already
         | https://www.datacamp.com/community/tutorials/sql-interface-w...
        
         | natrys wrote:
         | Emacs and Org-mode have great integration with multiple SQL
         | implementations including Postgres (via org-babel). Org-mode
         | tables are pretty neat, and you can have query results
         | populated directly into tables. Read this blog post if you are
         | interested:
         | 
         | https://fluca1978.github.io/2021/01/18/PostgreSQLLiteratePro...
        
         | tlarkworthy wrote:
         | https://observablehq.com/@observablehq/databases
        
         | simonw wrote:
         | Weirdly my Django SQL Dashboard project may fit the bill a bit
         | here: you can build up a "dashboard" (which is a tiny bit
         | notebook-like if you squint at it the right way) with multiple
         | SQL queries on it, and save that either as a bookmark or as a
         | "saved dashboard" with a URL.
         | 
         | https://django-sql-dashboard.datasette.io/
         | 
         | In my own work I've been using it for the kind of things that I
         | would normally use a Jupyter notebook for - gathering together
         | research on problems I'm trying to solve.
        
           | keeganj wrote:
           | Interesting take, I'm not deep in the python ecosystem, but
           | this looks like it's lightweight enough to function as a
           | refreshable notebook. Will give this a try, thanks!
        
         | javierluraschi wrote:
         | For viz/DS/ML/AI with JS/TS it's either observablehq or an IDE
         | with custom extensions; this project looks relevant if you are
         | already into observablehq.
         | 
         | Shameless plug, we are building a few tools for JS to narrow
         | down this gap as well:
         | 
         | - https://hal9.ai (Drag&Drop / IDE)
         | - https://marketplace.visualstudio.com/items?itemName=Hal9.hal...
         |   (VSCode extension)
         | - https://observablehq.com/@javierluraschi/running-nodejs-in-o...
         |   (ObservableHQ extension)
         | 
         | Would love to chat if you are interested in providing feedback.
         | Reach me at javier at hal9.ai. Cheers.
        
         | Siira wrote:
         | org-babel should fit the bill.
        
         | shapiromatron wrote:
         | re: sql notebook, this came up a few months ago and worked
         | great when I played around with it:
         | https://blog.jupyter.org/an-sql-solution-for-jupyter-ef4a00a....
         | It's just a different kernel you can install to an existing
         | jupyter instance.
        
         | okennedy wrote:
         | It's based on Spark rather than PostgreSQL directly, but I'm
         | part of an effort to build a workflow system disguised as a
         | notebook called Vizier [1]. SQL is a first-class primitive in
         | Vizier, and the notebook plays nice with postgres (you can load
         | from and unload to postgres using Spark's native data loader).
         | 
         | [1] https://vizierdb.info
        
       | thirtyseven wrote:
       | I know "dataflow" is kind of a generic name, but the authors
       | might want to consider that there is already a 7-year-old Google
       | Cloud product for running data pipelines called Dataflow.
        
         | taftster wrote:
         | Came here to post the same comment. Exactly right. There are
         | lots of projects that use the term "dataflow".
         | 
         | To add to this, the name of this product is confusing given the
         | context and use case shown. I assume "dataflow" to the author
         | means the ability to watch data being rendered on a page?
         | 
         | To "big data" folks (like myself), the term "dataflow" tends to
         | represent the routing and processing of data streams along an
         | information pipeline. Not anything to do with a visual
         | representation of a dynamic notebook.
        
         | marcinzm wrote:
         | And a Cloudera project:
         | https://www.cloudera.com/products/cdf.html
         | 
         | And an Azure feature:
         | https://docs.microsoft.com/en-us/azure/data-factory/control-...
         | 
         | And a Spring feature:
         | https://spring.io/projects/spring-cloud-dataflow
        
           | rectang wrote:
           | And an entire programming discipline.
           | 
           | https://en.wikipedia.org/wiki/Dataflow_programming
        
       ___________________________________________________________________
       (page generated 2021-05-13 23:00 UTC)