hngopher.com

       [HN Gopher] IPyflow: Reactive Python Notebooks in Jupyter(Lab)
       ___________________________________________________________________
        
       IPyflow: Reactive Python Notebooks in Jupyter(Lab)
        
       Author : smacke
       Score  : 171 points
       Date   : 2023-05-10 08:30 UTC (14 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | uniqueuid wrote:
       | Reactivity is great. Is there any framework for using it without
       | a REPL?
       | 
       | I.e. to define a DAG of tasks and have them executed as needed? I
       | know existing workflow engines, and they are typically not
       | reactive but rather work on batches.
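
To make the question concrete: one reading of "executed as needed" outside a REPL is a small pull-based graph where each node caches its value and is recomputed only when something upstream has changed. The sketch below is stdlib-only; the `Graph` class and its method names are hypothetical, invented for illustration, and are not the API of rxpy, hamilton, or any other library mentioned in this thread.

```python
class Graph:
    """Tiny pull-based reactive graph: nodes recompute only when stale."""

    def __init__(self):
        self._funcs = {}    # node name -> function that computes it
        self._deps = {}     # node name -> names of the nodes it reads
        self._values = {}   # cached values (inputs and computed nodes)
        self.runs = []      # log of node computations, for demonstration

    def set_input(self, name, value):
        self._values[name] = value
        self._invalidate(name)

    def define(self, name, deps, func):
        self._funcs[name] = func
        self._deps[name] = tuple(deps)
        self._values.pop(name, None)
        self._invalidate(name)

    def _invalidate(self, changed):
        # Drop the cached value of every node downstream of `changed`.
        for node, deps in self._deps.items():
            if changed in deps and node in self._values:
                del self._values[node]
                self._invalidate(node)

    def get(self, name):
        # Compute on demand; anything still cached is reused as-is.
        if name not in self._values:
            args = [self.get(dep) for dep in self._deps[name]]
            self._values[name] = self._funcs[name](*args)
            self.runs.append(name)
        return self._values[name]


g = Graph()
g.set_input("x", 2)
g.set_input("y", 3)
g.define("total", ["x", "y"], lambda x, y: x + y)
g.define("doubled", ["total"], lambda t: 2 * t)
g.define("label", ["y"], lambda y: f"y={y}")

g.get("doubled")
g.get("label")
g.set_input("x", 10)   # makes total and doubled stale, but not label
result = g.get("doubled")
g.get("label")         # served from cache, no recomputation
```

After the second round of `get` calls, `g.runs` shows that only `total` and `doubled` were recomputed. A push-based design (recompute eagerly on every change) would be the other obvious choice; pull-based was picked here only because it avoids running expensive nodes nobody asks for.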
        
         | qbasic_forever wrote:
         | You probably want rxpy, a port of the reactivex extensions from
         | many other languages to python:
         | https://rxpy.readthedocs.io/en/latest/
        
         | krawczstef wrote:
         | From a nuts and bolts perspective, I've been thinking of
         | building some reactivity on top of
         | https://github.com/dagworks-inc/hamilton (author here) that
         | could get at this. (If you have a use case that could be
         | documented, I'd appreciate it.)
        
       | quickthrower2 wrote:
       | That's a pretty nice idea. The problem of knowing what state has
       | been invalidated often drives me away from using a notebook. So
       | it is nice to see this solved.
        
         | BiteCode_dev wrote:
         | I often thought we would benefit from having some kind of shell,
         | only a mix between ipython qtconsole and jupyter.
         | 
         | Not an editor like jupyter, rather a shell with a REPL flow.
         | But each prompt is like a jupyter cell, and the whole history
         | is saved in a file.
         | 
         | But if you don't create a file, it should work as well. One of
         | the annoying things about jupyter is that you can't use it
         | without a file on disk, unlike the ipython shell.
        
           | westurner wrote:
           | jupyter_console is the IPython REPL for non-ipykernel jupyter
           | kernels.
           | 
           | This _magic command_ logs IPython REPL input and output to a
           | file:
           | 
           |     %logstart -o example.log.py
        
         | smacke wrote:
         | "solved" is a very generous adjective
        
       | singhrac wrote:
       | Can you explain a little more about how it works? Does it handle
       | cases like loops correctly (or self-referencing cells)?
       | 
       | Is this running a CPython fork, or how does the lineage tracking
       | work? Are the values "x" and "y" in the quickstart example still
       | simple Python int types, or are they a wrapped type?
       | 
       | The papers seem very interesting but even as an early adopter of
       | tools like this I'd like to know what the limits and expectations
       | are, and some docs would really help.
        
         | smacke wrote:
         | It's running on top of vanilla CPython, but with heavy
         | instrumentation via sys.settrace as well as ast
         | transformations. x and y are just normal Python ints. The
         | downside is the overhead, but it's paid mainly for top-level /
         | module-level statements, so I've found it to be acceptable in
         | practice. The benefit is portability -- it works across all
         | the major Python versions supported today (3.6 to 3.11), and
         | even on some other Python runtimes such as Cinder.
         | 
         | More details in this paper:
         | https://smacke.net/papers/nbslicer.pdf
         | 
         | But overall I agree I need to get on top of the docs and talk
         | in more depth about the implementation there.
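
For readers unfamiliar with `sys.settrace`, here is a rough sketch of the general idea of tracing only top-level statements on vanilla CPython. It is an illustration under simplifying assumptions, not ipyflow's actual instrumentation: the helper name `trace_top_level` and the `"<cell>"` filename are made up, and the real project also does AST rewriting that is not shown here.

```python
import sys


def trace_top_level(source, filename="<cell>"):
    """Run `source`; return line numbers of the top-level statements executed."""
    executed = []
    code = compile(source, filename, "exec")

    def tracer(frame, event, arg):
        # Returning None for any frame other than the cell's own module-level
        # frame turns off line tracing inside nested calls, so function bodies
        # and library code run without per-line overhead.
        if frame.f_code is not code:
            return None
        if event == "line":
            executed.append(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        exec(code, {})
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active
    return executed


cell = """\
def f(n):
    total = 0
    for i in range(n):
        total += i
    return total
x = f(10)
y = x + 1
"""
lines = trace_top_level(cell)
```

The loop inside `f` never shows up in `lines`, which is the same shape as the "overhead is paid mainly for top-level statements" trade-off described above.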
        
       | smacke wrote:
       | Hi -- author here. I just presented this work at JupyterCon 2023
       | so I figured it was time to advertise more broadly. There are a
       | few rough edges but my hope is that, by making all the reactive
       | behavior opt-in and only enabled for in-order execution (i.e.,
       | cells above the one I execute will never reactively execute by
       | default), it can be predictable enough to be useful in practice.
       | 
       | There's still a long way to go to get e.g. full dataflow
       | understanding of all the common libraries, understanding file
       | paths, autoreload integration, etc., but after nearly 3 years of
       | on-and-off development I think it's finally useable-ish.
        
         | Micoloth wrote:
         | Crazy seeing this here!
         | 
         | I searched for this last week, as I'm playing with building the
         | same thing but as a VSCode extension.. See here [1]
         | 
         | I found another similar project on Github, but it was from many
         | years ago. Yours did not turn up..
         | 
         | Very interested in finding out how you implemented it
         | 
         | [1] https://github.com/micoloth/vscode-reactive-jupyter#readme
        
           | smacke wrote:
           | We have a couple of papers that go into some of the details.
           | 
           | https://smacke.net/papers/nbsafety.pdf
           | https://smacke.net/papers/nbslicer.pdf
           | 
           | It looks like you are using a static approach for dependency
           | inference. There are a lot of benefits to static approaches,
           | but they can only get you so far. My JupyterCon presentation
           | includes a bunch of examples where dynamic approaches are a
           | must: https://t.ly/78rS
           | 
           | Besides that, there are a bunch of interesting design
           | decisions about when to add edges between cells, when to
           | break them, what metadata to annotate edges with, etc.
           | 
           | I'm hoping to abstract away a bunch of the complexity by
           | developing something like a runtime version of a language
           | server protocol (working name "language kernel protocol") so
           | that any editor that implements the protocol would get
           | reactivity for free when running a kernel that likewise
           | implements the protocol. I have an early version of this
           | which is how IPyflow works for both Jupyter and JupyterLab;
           | VSCode would be a great editor to add support for next.
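
A small illustration of the static-versus-dynamic point, as a sketch: the function below infers what a cell reads and writes purely from its AST (the name `static_symbols` and the three example cells are hypothetical). Aliasing is one simple case where name-based static inference misses a dependency that only shows up at runtime.

```python
import ast


def static_symbols(cell_source):
    """Return (names read, names bound) by a cell, judged only from its AST."""
    reads, writes = set(), set()
    for node in ast.walk(ast.parse(cell_source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                reads.add(node.id)
            else:  # Store or Del context
                writes.add(node.id)
    return reads, writes


cell_1 = "data = [1, 2, 3]\nalias = data"
cell_2 = "alias.append(4)"      # mutates the list that `data` also refers to
cell_3 = "total = sum(data)"

# Statically, cell_2 only *reads* `alias` and writes nothing, so a purely
# name-based analysis sees no reason to mark cell_3 stale after cell_2 runs.
reads_2, writes_2 = static_symbols(cell_2)

# At runtime, though, cell_3's result does depend on cell_2 having run.
namespace = {}
for source in (cell_1, cell_2, cell_3):
    exec(source, namespace)
total = namespace["total"]
```

A smarter static analysis could track this particular alias, but the same problem reappears with attribute access, containers, and anything resolved dynamically.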
        
         | eigenspace wrote:
         | Will this approach ever be usable with other Jupyter languages?
         | Like, do you have an API for another language to tell you what
         | the code dependency graph is? Or is Python a fundamental
         | assumption here?
        
           | smacke wrote:
           | For this particular project, Python is a requirement.
           | 
           | For the general approach, the answer is more complicated. It
           | depends on what hooks the language implementation exposes --
           | and even if it exposes enough to make this work in theory,
           | tracking dataflow at the same level of accuracy and
           | granularity as IPyflow does may not be possible without
           | taking an unacceptable performance hit, or without
           | sacrificing portability across language versions.
           | 
           | My hope is that the approach can scale to languages like
           | Julia or R, but I'm not as familiar with those languages as I
           | am with Python, and I kind of suspect each language may
           | require its own bespoke tricks.
           | 
           | Regardless, for Python it was a journey roughly 3 years in
           | the making (and still ongoing) -- other languages would be
           | easier now that I've learned a fair amount, but the work to
           | add this kind of support is by far the most complex I've ever
           | done.
        
         | analog31 wrote:
         | I'll give this a try. Managing "hidden state" in notebooks is a
         | known flaw of Jupyter. If nothing else, an indicator that says,
         | "this code is dirty" would be useful.
         | 
         | I have a long standing habit of doing "restart kernel and run
         | all cells" before walking away from a session, to help avoid
         | this. I'd rather see it break in front of me than have it break
         | 6 months later or in someone else's use.
        
         | spenrose wrote:
         | Kudos! When at Mozilla c.2016, I tried to work with the core
         | Jupyter team on solving the stale-cell problem. I couldn't find
         | a path forward that they would consider. Glad to see someone
         | making progress.
        
           | smacke wrote:
           | I would be very surprised if something like this gets support
           | in core Jupyter -- there's a lot of added complexity.
           | Fortunately it is doable as extensions for Jupyter /
           | JupyterLab.
        
       | youssefabdelm wrote:
       | I love Jupyter notebooks I just wish they looked as good as
       | observable notebooks[1], not just in the overall layout but the
       | charts/graphs you could make in general (plotly, matplotlib, etc
       | don't even come close to d3.js, Observable Plot, etc)... I don't
       | know why there seems to be a hole in the Python ecosystem for
       | good designers or something
       | 
       | This seems to be a step in the right direction with reactivity
       | though. But it's not instant like Observable notebooks. But still
       | good
       | 
       | [1] https://observablehq.com
        
         | n8henrie wrote:
         | Doesn't Altair use something like D3 behind the scenes? I guess
         | it's Vega.
         | 
         | https://pypi.org/project/altair/
        
         | smacke wrote:
         | IPyflow takes a round trip from client to kernel for each
         | execution (including reactive executions) -- this approach is
         | necessary to get the best possible accuracy when determining
         | dataflow in a highly dynamic language like Python, but it is an
         | architectural limitation that prevents the reactivity from
         | feeling as instant as in Observable or Pluto.
        
       | krawczstef wrote:
       | Reminds me of this project from Stitch Fix from years ago --
       | https://github.com/stitchfix/nodebook. Ahead of its time I
       | guess..
        
       | spprashant wrote:
       | Now I wonder why this isn't an option in plain Jupyter.
       | Inconsistent cell states and having to re-execute all cells after
       | a single line change slows me down a lot.
       | 
       | Like I get why this doesn't need to be default, but this seems
       | crucial enough to warrant being included in the base package.
        
         | smacke wrote:
         | It's very new, and the current frontend implementations for
         | Jupyter and JupyterLab include some workarounds for fundamental
         | protocol-level limitations that probably make adding this kind
         | of feature as part of the core package a no-go (without first
         | addressing the core protocol limitations).
        
       | flusteredBias wrote:
       | I kind of think quarto is a much better solution to the problems
       | that notebooks try to solve plus you get the added bonus of
       | having plain text as the file source.
        
         | esafak wrote:
         | How's the multi-user story? Do you use Quarto at work with
         | other people?
        
       | wodenokoto wrote:
       | I believe this is what Pluto sets out to do for Julia.
       | 
       | I used it as part of the "Computational Thinking" with Julia
       | course a year or two back. Even then the beta software was very
       | good and some of the demos the Pluto dev showed were nothing
       | short of amazing
       | 
       | https://plutojl.org/
        
         | O_H_E wrote:
         | Yeah, Pluto rules!
         | 
         | After getting used to it with Julia I found it really jarring
         | to go back to plain Jupyter (when I need python) where I have
         | to keep re-executing the cells.
         | 
         | This is going to make that much less painful.
        
         | rsfern wrote:
         | Pluto is awesome
         | 
         | Looks like by default you have to manually trigger reactivity
         | in ipyflow, but there is a `%flow mode reactive` ipython magic
         | mode that enables Pluto-style reactivity!
        
           | smacke wrote:
           | Yep. I think there is also a way to enable it by default in
           | your ipython profile which I'll document at some point, so
           | that you don't have to run `%flow mode reactive`. I'm curious
           | though -- personally I much prefer to use the opt-in reactive
           | execution mode with ctrl/cmd+shift+enter; curious to
           | understand your preferences better :)
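
For anyone who wants to try the profile route before it is documented, the usual IPython mechanism for running a line at startup is `exec_lines` in the profile's config file. The fragment below is an untested guess at how that might look: `exec_lines` and `get_config()` are standard IPython configuration, but whether the `%flow` magic is already registered by the time these lines run in an ipyflow kernel is an assumption.

```python
# ~/.ipython/profile_default/ipython_config.py
c = get_config()  # noqa: F821 (injected by IPython when it loads this file)

# Assumption: the ipyflow kernel has registered %flow before exec_lines run.
c.InteractiveShellApp.exec_lines = [
    "%flow mode reactive",
]
```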
        
             | TuringTest wrote:
             | An always-reactive notebook is essentially a "literate
             | spreadsheet", where you have data cells in between
             | multimedia descriptions. In this model, all computed data
             | is always up to date with whatever changes you make to the
             | input parameters, including things like graphics connected
             | to interactive sliders and text boxes. You can prototype
             | the logic of an application very fast with real data and
             | interactions.
        
             | rsfern wrote:
             | Your ipython profile suggestion is good, I use that for
             | `%autoreload` so I don't see why it wouldn't work for
             | ipyflow
             | 
             | interesting question, I'm going to have to try the opt in
             | reactivity in ipyflow because it's not an option in Pluto.
             | Actually that's kind of a strength, one point of
             | frustration in Pluto is accidentally triggering reactive
             | execution of an expensive cell before everything is ready
             | 
             | I think the thing I like most about always-on reactivity is
             | that the state of the REPL and outputs can never become
             | stale. I used to run into that in jupyter a lot as a
             | (physical sciences) student writing hacky prototype code
             | with implicit control flow... nice for debugging but in the
             | long run it's quite painful.
             | 
             | The nearest thing I had found in python is streamlit, but
             | it is not as smooth as Pluto IMO. Looking forward to trying
             | ipyflow, honestly I have been hoping for something like
             | this for a while because using Pluto+PyCall as a jupyter
             | replacement is a bit too cumbersome for python-forward
             | projects
        
       | sixhobbits wrote:
       | Looks like it solves a common problem but the page is a bit
       | confusing. It could make it clearer upfront what it does (I
       | didn't know what reactive meant in this context) and how it
       | relates to Jupyter (I thought it was official/core stuff at
       | first, but I take it it's a third party tool that integrates into
       | jupyter).
       | 
       | Stuff like "Trust me? Good." in the introduction doesn't really
       | help me answer "wtf does this do" more quickly and the first
       | intro sentence is pretty long and convoluted.
        
         | sonofaragorn wrote:
         | Agreed on the confusing page. I use notebooks every single day
         | but a quick glance at the README gave me zero indication of
         | this being something I need
        
           | TuringTest wrote:
           | On the other hand, the link description alone was enough to
           | convince me that this is something I want.
           | 
           | Having a very specific target makes it easier to reach that
           | target in writing, I guess, and harder for people outside the
           | target to understand what it's about.
        
             | sixhobbits wrote:
             | Well that was my point. I am part of the target audience
             | and I would find this useful but it took me a while to
             | realise that.
             | 
             | Of course, maybe I am just incompetent or missing knowledge
             | that everyone else in the target has.
        
               | TuringTest wrote:
               | Well, other than Observable, reactive notebooks are not
               | that common and well known (precisely because Jupyter,
               | which is the most famous, didn't support that model
               | before).
               | 
               | So maybe today is the first day that you are exposed to
               | that model and you learn about it? There's always a first
               | time.
        
         | tuanm wrote:
         | Indeed. Some concepts are overused and I still don't know what
         | they exactly are.
         | 
         | Anyway, it seems to solve a few UX problems when working with
         | Jupyter Notebooks.
        
           | TuringTest wrote:
           | Reactive notebooks are what change the workflow from command
           | line-like to spreadsheet-like.
           | 
           | It may not matter much if you use the notebook as a glorified
           | terminal, but it is a godsend if your workflow involves data
           | analysis with heavy dependencies between filtered subsets.
        
       | boxy_brown wrote:
       | Very cool. This project addresses some usability challenges that
       | the Jupyter community has been debating and struggling with for
       | years:
       | 
       | https://twitter.com/jakevdp/status/935178916490223616
       | 
       | My favorite take on this subject is Joel Grus' talk from
       | JupyterCon 2018 (title: "I don't like notebooks").
       | 
       | https://conferences.oreilly.com/jupyter/jup-ny/public/schedu...
       | 
       | Slides:
       | 
       | https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUh...
        
         | smacke wrote:
         | Yep that was one of the talks that motivated this work --
         | talked about it in my JupyterCon talk today :)
         | 
         | https://t.ly/78rS
        
       | vchuravy wrote:
       | How closely tied is this to Python? The need for reactivity is
       | what drove the development for Pluto.jl, but it would be nice to
       | have something like this for IJulia.jl as well.
        
         | smacke wrote:
         | For this project, Python is a hard requirement, though it's
         | possible the approach may be applied (after significant effort
         | -- Python took me ~3 years and counting) to other languages /
         | runtimes as well.
        
       | maegul wrote:
       | Nice! From what I've gathered this has been in the works for a
       | while(?)
       | 
       | Thoughts ...
       | 
       | 1. Yea, the Readme could do with a bit of polish. Your hero
       | feature, AFAIU, is the automatic reactivity. This is in your
       | second GIF, put this front and center and make it really clear
       | what is happening. You (and I) know what reactivity looks like so
       | we know what to look for, but someone new to the idea in
       | notebooks could easily blink and miss this. I'd work on a nicer
       | GIF and even a little youtube video just to make it really clear
       | what's going on here. Bostock and ObservableHQ advertised their
       | reactivity a while ago, you might be able to get inspiration for
       | how they demonstrated it?
       | 
       | 2. The syntax extensions are cool! Integration with ipywidgets is
       | Ace!!
       | 
       | 3. Do you have any comments on how ObservableHQ (Javascript
       | runtime by Bostock) and Pluto (inspired by previous) informed or
       | inspired your choices and implementation here? Is this basically
       | the same for python/jupyter as those are for JS/Julia?
       | 
       | 4. Annoying Questions or feature requests ... Are there any
       | overheads? Any timeout facilities for long running code? Can the
       | full variable and/or cell dependency graph be surfaced and
       | visualised (ObservableHQ put this into the UI a while back and it
       | was kinda cool).
       | 
       | Otherwise ... awesome to see this land! Congratulations!!!
        
         | smacke wrote:
         | Thank you for all the feedback, positive and constructive! Yep
         | docs definitely need a lot of polish :)
         | 
         | 3. I actually started from scratch -- ipyflow's reactivity
         | model is a bit different from these, since for Python, my
         | experience is that static dependency inference is too
         | unreliable to be useful. (Though after talking with the Pluto
         | maintainer earlier today it sounds like Pluto may be reaching
         | some of the same conclusions and also be moving toward a
         | dynamic dependency inference strategy)
         | 
         | In the future, my hope is that as a community we will develop a
         | live-coding analogue to lsps which one might call a "language
         | kernel protocol" so that we can standardize some of these
         | features across different languages / editors
         | 
         | 4. For top-level / module-level statements, yes there is lots
         | of overhead (> 100x), but it's largely limited to those
         | statements (i.e. external library calls, recursive function
         | calls, etc have close to 0 overhead thanks to intelligent
         | instrumentation disabling for these) and turns out to be OK in
         | practice (more details in nbslicer paper
         | https://smacke.net/papers/nbslicer.pdf). At some point I'll run
         | it through a profiler and try to grab the low-hanging
         | optimizations but it hasn't been noticeable so far.
         | 
         | Surfacing the DAG is definitely something I want to do at some
         | point; we have all the information in the backend so we should
         | try to surface it in the frontend.
        
       | Helmut10001 wrote:
       | I think it is a great idea, but doesn't apply to 99% of my
       | notebooks because they need to run up to 2 hours to completely
       | execute once, due to data intensive tasks. I usually run all
       | cells once per day, up to cell x (where I left the day before),
       | then continue working by adding cells and updating the state
       | manually until I make progress (=no errors or output is as
       | expected) and move on to the next cell - without re-executing
       | other cells because this would be a productivity nightmare.
        
         | RobinL wrote:
         | Many reactivity frameworks (e.g. observablehq, shiny) recompute
         | intelligently - they're aware of which parts of the calculation
         | have changed and need recomputing. Haven't checked with
         | ipyflow, but this idea would help mitigate some of your
         | concerns
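
As a sketch of what "recompute intelligently" can mean for a long-running notebook: given which cells read from which, only the edited cell and its downstream cells need to run again. The helper below is hypothetical (not how observablehq, shiny, or ipyflow implement it), and the cell names are invented; it uses the stdlib `graphlib` module, which needs Python 3.9+.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def cells_to_rerun(deps, changed):
    """Return `changed` plus everything downstream of it, in a valid run order.

    `deps` maps each cell name to the set of cells it reads from.
    """
    stale = {changed}
    grew = True
    while grew:  # keep expanding until no new downstream cell is found
        grew = False
        for cell, parents in deps.items():
            if cell not in stale and stale & set(parents):
                stale.add(cell)
                grew = True
    # static_order() yields every cell after the cells it depends on.
    order = TopologicalSorter(deps).static_order()
    return [cell for cell in order if cell in stale]


deps = {
    "load": set(),            # e.g. the two-hour data-loading cell
    "clean": {"load"},
    "features": {"clean"},
    "plot": {"clean"},
    "model": {"features"},
    "notes": set(),
}
after_feature_edit = cells_to_rerun(deps, "features")
after_clean_edit = cells_to_rerun(deps, "clean")
```

Editing `features` leaves the expensive `load` and `clean` cells alone, which is the property that matters for the two-hour notebooks described above.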
        
       | TheAlchemist wrote:
       | Looks quite neat.
       | 
       | I'm getting into the habit of regularly restarting the kernel and
       | re-running everything - just to make sure everything works as
       | expected.
        
         | jerpint wrote:
         | Simply doing "restart + run all" ensures both readability and
         | reproducibility
        
           | orwin wrote:
           | If you have four or five sklearn tiles that take 20 minutes to
           | an hour to execute, you might not want to do that too often.
           | 
           | I did not use Jupyter Notebook/JupyterLab much, but each
           | time, it was in the context of data science. The first was on
           | an OCR during my internship, the second for data exploration
           | (mix of quantitative/qualitative, but the project was
           | scrapped after a week or two). In both cases, having to re-run
           | everything each time the kernel was shut down was actually a
           | pain point.
        
             | singhrac wrote:
             | I encountered this a lot in my previous work and found
             | workarounds that I write about here:
             | https://rachitsingh.com/collaborating-jupyter/. At a high
             | level I think being scared of rerunning your kernel is
             | indicative of a code smell, and there's relatively easy
             | ways to combat that.
        
               | smacke wrote:
               | Personally I don't like the "write to disk" approach; I
               | think it kind of just punts the state problem somewhere
               | else (i.e. from memory to disk). Writing to a database
               | and adding versioning is better, but that's a lot of
               | machinery to expect a notebook user to adopt (though
               | maybe better tooling could help). Also a lot of Python
               | objects are not out-of-the-box pickleable (e.g.
               | generators). Also pickle is a mess.
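
The generator case is easy to check with the standard library alone. The helper name `is_picklable` is made up for this sketch.

```python
import pickle


def is_picklable(obj):
    """Return True if the standard pickle module can serialize `obj`."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


squares_list = [n * n for n in range(5)]
squares_gen = (n * n for n in range(5))

list_ok = is_picklable(squares_list)   # plain lists of ints pickle fine
gen_ok = is_picklable(squares_gen)     # generator objects are rejected
```

So a "write state to disk" workflow built on pickle has to materialize or skip objects like this one, which is part of the machinery cost mentioned above.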
        
       | qumpis wrote:
       | What's the correlation between people who don't use debuggers and
       | people who use notebooks? I can't imagine writing code without a
       | visual debugger, one at the level of pycharm. I think people who
       | use notebooks must either be very careful in how they write code
       | (don't misremember variables, complexity of arrays, dicts etc) or
       | have other means of debugging (print? Jupyterlab debugger? pdb?).
       | 
       | I love notebooks for their ability to preload chunks of code/data
       | and have the ability to explore without delay. But having to put
       | mental strain in keeping track of objects is too much for me.
       | Vscode and pycharm have made strides in unifying the experience
       | but it's still very much sub par, at least in my experience.
       | Matlab-like style of executing code with possibility of reusing
       | same debugger solution was perfect.
        
         | anentropic wrote:
         | I don't use notebooks much, but pdb is available in them
        
         | pilotneko wrote:
         | Personally, I use notebooks to do exploratory data analysis and
         | to get model training configured. Any large-scale model
         | training event is converted to a script, and nothing
         | production-facing is in a notebook.
        
         | qbasic_forever wrote:
         | Jupyter lab has a debugger for python:
         | https://jupyterlab.readthedocs.io/en/stable/user/debugger.ht...
         | 
         | Most people write notebooks that are ephemeral and meant for
         | ad-hoc analysis. If a value needs to be inspected it can just
         | be printed in a cell, or even better a fancy widget or graph
         | can display it. You don't need breakpoints as much since you
         | can just choose what cells to execute, or create a throw away
         | cell to grab some values.
         | 
         | Once you need to turn an analysis into a business process or
         | repeatable task it makes sense to move it into a proper python
         | module and use any IDE, debugger, etc.
        
         | jrvarela56 wrote:
         | Not sure if correlated. In Ruby I do a mix of REPL/console,
         | tests and step-through debugging. When using Python, I always
         | use a notebook as a scratchpad - to me it's a REPL but easier
         | to keep tidy. The notebook can be good docs of how things work
         | too, a complement to tests-as-docs as it's easy to show in
         | different (real) contexts.
         | 
         | I sorely miss being able to do this when working on frontend,
         | have tried setting up node console to import files but React
         | just makes it very easy to couple everything. This leaves me
         | with tests as the easiest way to code outside of a view (which
         | has too much friction for playing around). Hot reloading is
         | great but iterating logic in isolation is way harder without a
         | REPL.
        
           | appleiigs wrote:
           | I'd use Ruby more if it would work better in a notebook
           | environment. It appears that iruby is in maintenance mode and
           | falling behind Julia in usability.
        
       | baggiponte wrote:
       | Congratulations! I starred it a long time ago but never used it
       | (sorry). But I do think this IS the way to go for Jupyter. I
       | don't know how I could contribute to this - lack of time, but
       | mostly knowledge, but I would love to find other ways to help.
        
         | smacke wrote:
         | Thank you! If you had tried to use it before, it probably would
         | have broken pretty quickly. Now it will still break, but not so
         | quickly as to not be useful, hopefully.
         | 
         | External contributions are mostly blocked on me right now to
         | improve both user and developer docs (improve = write the first
         | draft in this case).
        
       | __mharrison__ wrote:
       | I spend a good deal of time teaching my students inside of
       | Jupyter.
       | 
       | For Pandas, many problems can be solved by chaining (debugging as
       | you go), converting the chain to a function, and placing the
       | function at the top of the notebook after you load the raw data.
       | 
       | I get the problem this is solving, but adding some congrats and
       | practical software engineering makes for much better notebook
       | experiences.
        
       ___________________________________________________________________
       (page generated 2023-05-10 23:01 UTC)