[HN Gopher] IPyflow: Reactive Python Notebooks in Jupyter(Lab)
___________________________________________________________________
IPyflow: Reactive Python Notebooks in Jupyter(Lab)
Author : smacke
Score : 171 points
Date : 2023-05-10 08:30 UTC (14 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| uniqueuid wrote:
| Reactivity is great. Is there any framework for using it without
| a REPL?
|
| I.e. to define a DAG of tasks and have them executed as needed? I
| know existing workflow engines, and they are typically not
| reactive but rather work on batches.
| qbasic_forever wrote:
| You probably want rxpy, a port of the reactivex extensions from
| many other languages to python:
| https://rxpy.readthedocs.io/en/latest/
| krawczstef wrote:
| From a nuts and bolts perspective, I've been thinking of
| building some reactivity on top of
| https://github.com/dagworks-inc/hamilton (author here) that
| could get at this. (If you have a use case that could be
| documented, I'd appreciate it.)
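|
| The "DAG of tasks, executed as needed" idea asked about in this
| sub-thread can be sketched with the standard library alone. This
| is a minimal illustration, not how rxpy or Hamilton work; the
| class name ReactiveGraph and its methods are hypothetical, and it
| assumes the graph has no cycles.

```python
class ReactiveGraph:
    """Tiny pull-based dataflow graph: a task reruns only when it is stale."""

    def __init__(self):
        self._funcs = {}    # task name -> function of its dependencies' values
        self._deps = {}     # node name -> tuple of dependency names
        self._values = {}   # cached results; a missing key means "stale"
        self.run_log = []   # names of tasks that actually ran, in order

    def set_input(self, name, value):
        """Set a source value and invalidate everything downstream of it."""
        self._deps.setdefault(name, ())
        self._invalidate_downstream(name)
        self._values[name] = value

    def add_task(self, name, func, deps):
        """Register a task computed from the values of `deps`."""
        self._funcs[name] = func
        self._deps[name] = tuple(deps)

    def _invalidate_downstream(self, name):
        # Drop the cached value of every node that depends on `name`,
        # directly or transitively (assumes an acyclic graph).
        for node, deps in self._deps.items():
            if name in deps:
                self._values.pop(node, None)
                self._invalidate_downstream(node)

    def get(self, name):
        """Return the value of `name`, running only the stale tasks it needs."""
        if name not in self._values:
            args = [self.get(dep) for dep in self._deps[name]]
            self._values[name] = self._funcs[name](*args)
            self.run_log.append(name)
        return self._values[name]


g = ReactiveGraph()
g.set_input("raw", [3, 1, 2])
g.set_input("scale", 10)
g.add_task("sorted", sorted, ["raw"])
g.add_task("scaled", lambda xs, k: [x * k for x in xs], ["sorted", "scale"])

first = g.get("scaled")    # runs "sorted", then "scaled"
g.set_input("scale", 2)    # invalidates "scaled" but not "sorted"
second = g.get("scaled")   # reuses the cached "sorted" result
```

| Changing "scale" reruns only the task that reads it; the cached
| "sorted" result is reused, which is the property batch-oriented
| workflow engines usually lack.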
| quickthrower2 wrote:
| That's a pretty nice idea. The problem of knowing what state has
| been invalidated often drives me away from using a notebook. So
| it is nice to see this solved.
| BiteCode_dev wrote:
| I often thought we would benefit from having some kind of shell,
| only a mix between ipython qtconsole and jupyter.
|
| Not an editor like jupyter, rather a shell with a REPL flow.
| But each prompt is like a jupyter cell, and the whole history
| is saved in a file.
|
| But if you don't create a file, it should work as well. One of
| the annoying things about jupyter is that you can't use it
| without a file on disk, unlike the ipython shell.
| westurner wrote:
| jupyter_console is the IPython REPL for non-ipykernel jupyter
| kernels.
|
| This _magic command_ logs IPython REPL input and output to a
| file: %logstart -o example.log.py
| smacke wrote:
| "solved" is a very generous adjective
| singhrac wrote:
| Can you explain a little more about how it works? Does it handle
| cases like loops correctly (or self-referencing cells)?
|
| Is this running a CPython fork, or how does the lineage tracking
| work? Are the values "x" and "y" in the quickstart example still
| simple Python int types, or are they a wrapped type?
|
| The papers seem very interesting but even as an early adopter of
| tools like this I'd like to know what the limits and expectations
| are, and some docs would really help.
| smacke wrote:
| It's running on top of vanilla CPython, but with heavy
| instrumentation via sys.settrace as well as ast
| transformations. x and y are just normal Python ints. The
| downside is the overhead, but it's paid mainly for top-level /
| module-level statements, so I've found it to be acceptable in
| practice. The benefit is portability -- it works on all the
| major Python versions supported today (3.6 to 3.11), and even
| on some different Python runtimes such as Cinder.
|
| More details in this paper:
| https://smacke.net/papers/nbslicer.pdf
|
| But overall I agree I need to get on top of the docs and talk
| in more depth about the implementation there.
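|
| To make "ast transformations" concrete, here is a rough sketch of
| the general technique, not IPyflow's actual implementation: the
| cell's syntax tree is rewritten so that each top-level assignment
| reports the names it binds. The hook name _record and both helper
| functions are hypothetical, and only plain `name = value`
| assignments are handled.

```python
import ast


def instrument_assignments(source):
    """Parse `source` and insert a `_record(...)` call after every
    top-level assignment, naming the variables that were bound."""
    tree = ast.parse(source)
    new_body = []
    for stmt in tree.body:
        new_body.append(stmt)
        if isinstance(stmt, ast.Assign):
            names = [t.id for t in stmt.targets if isinstance(t, ast.Name)]
            hook = ast.Expr(
                value=ast.Call(
                    func=ast.Name(id="_record", ctx=ast.Load()),
                    args=[ast.Constant(value=n) for n in names],
                    keywords=[],
                )
            )
            new_body.append(hook)
    tree.body = new_body
    # Newly created nodes lack line/column info; fill it in before compiling.
    return ast.fix_missing_locations(tree)


def run_cell(source, namespace):
    """Run an instrumented cell and return the names it assigned, in order."""
    written = []
    namespace["_record"] = lambda *names: written.extend(names)
    exec(compile(instrument_assignments(source), "<cell>", "exec"), namespace)
    return written


ns = {}
written = run_cell("x = 1\ny = x + 1\n", ns)
```

| The values themselves are untouched: ns["x"] and ns["y"] are
| still ordinary ints, because the bookkeeping lives in the
| rewritten code rather than in a wrapper type.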
| smacke wrote:
| Hi -- author here. I just presented this work at JupyterCon 2023
| so I figured it was time to advertise more broadly. There are a
| few rough edges but my hope is that, by making all the reactive
| behavior opt-in and only enabled for in-order execution (i.e.,
| cells above the one I execute will never reactively execute by
| default), it can be predictable enough to be useful in practice.
|
| There's still a long way to go to get e.g. full dataflow
| understanding of all the common libraries, understanding file
| paths, autoreload integration, etc., but after nearly 3 years of
| on-and-off development I think it's finally useable-ish.
| Micoloth wrote:
| Crazy seeing this here!
|
| I searched for this last week, as I'm playing with building the
| same thing but as a VSCode extension.. See here [1]
|
| I found another similar project on Github, but it was from many
| years ago. Yours did not turn up..
|
| Very interested in finding out how you implemented it
|
| [1] https://github.com/micoloth/vscode-reactive-jupyter#readme
| smacke wrote:
| We have a couple of papers that go into some of the details.
|
| https://smacke.net/papers/nbsafety.pdf
| https://smacke.net/papers/nbslicer.pdf
|
| It looks like you are using a static approach for dependency
| inference. There are a lot of benefits to static approaches,
| but they can only get you so far. My JupyterCon presentation
| includes a bunch of examples where dynamic approaches are a
| must: https://t.ly/78rS
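|
| One small case of that kind (my own illustration, not taken from
| the talk) is aliasing: a cell that mutates a list through a
| second name changes `data` without ever assigning to it, so a
| syntax-only analysis sees no write. The helper static_writes is
| hypothetical.

```python
import ast


def static_writes(source):
    """Names a cell assigns to, judged from its syntax alone."""
    tree = ast.parse(source)
    return {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
    }


cells = [
    "data = [1, 2, 3]\nalias = data",  # cell 1: two names, one list object
    "alias.append(4)",                 # cell 2: mutates the shared list
    "total = sum(data)",               # cell 3: reads `data`
]

ns = {}
exec(cells[0], ns)
before = list(ns["data"])
exec(cells[1], ns)

seen_statically = static_writes(cells[1])   # no assignment targets in cell 2
changed_at_runtime = ns["data"] != before   # yet `data` is now different
```

| Statically, cell 3 looks independent of cell 2; at runtime it is
| stale as soon as cell 2 runs.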
|
| Besides that, there are a bunch of interesting design
| decisions about when to add edges between cells, when to
| break them, what metadata to annotate edges with, etc.
|
| I'm hoping to abstract away a bunch of the complexity by
| developing something like a runtime version of a language
| server protocol (working name "language kernel protocol") so
| that any editor that implements the protocol would get
| reactivity for free when running a kernel that likewise
| implements the protocol. I have an early version of this
| which is how IPyflow works for both Jupyter and JupyterLab;
| VSCode would be a great editor to add support for next.
| eigenspace wrote:
| Will this approach ever be usable with other Jupyter languages?
| Like, do you have an API for another language to tell you what
| the code dependency graph is? Or is Python a fundamental
| assumption here?
| smacke wrote:
| For this particular project, Python is a requirement.
|
| For the general approach, the answer is more complicated. It
| depends on what hooks the language implementation exposes --
| and even if it exposes enough to make this work in theory,
| tracking dataflow at the same level of accuracy and
| granularity as IPyflow does may not be possible without
| taking an unacceptable performance hit, or without
| sacrificing portability across language versions.
|
| My hope is that the approach can scale to languages like
| Julia or R, but I'm not as familiar with those languages as I
| am with Python, and I kind of suspect each language may
| require its own bespoke tricks.
|
| Regardless, for Python it was a journey roughly 3 years in
| the making (and still ongoing) -- other languages would be
| easier now that I've learned a fair amount, but the work to
| add this kind of support is by far the most complex I've ever
| done.
| analog31 wrote:
| I'll give this a try. Managing "hidden state" in notebooks is a
| known flaw of Jupyter. If nothing else, an indicator that says,
| "this code is dirty" would be useful.
|
| I have a long standing habit of doing "restart kernel and run
| all cells" before walking away from a session, to help avoid
| this. I'd rather see it break in front of me than have it break
| 6 months later or in someone else's use.
| spenrose wrote:
| Kudos! When at Mozilla c.2016, I tried to work with the core
| Jupyter team on solving the stale-cell problem. I couldn't find
| a path forward that they would consider. Glad to see someone
| making progress.
| smacke wrote:
| I would be very surprised if something like this gets support
| in core Jupyter -- there's a lot of added complexity.
| Fortunately it is doable as extensions for Jupyter /
| JupyterLab.
| youssefabdelm wrote:
| I love Jupyter notebooks I just wish they looked as good as
| observable notebooks[1], not just in the overall layout but the
| charts/graphs you could make in general (plotly, matplotlib, etc
| don't even come close to d3.js, Observable Plot, etc)... I don't
| know why there seems to be a hole in the Python ecosystem for
| good designers or something
|
| This seems to be a step in the right direction with reactivity
| though. But it's not instant like Observable notebooks. But still
| good
|
| [1] https://observablehq.com
| n8henrie wrote:
| Doesn't Altair use something like D3 behind the scenes? I guess
| it's Vega.
|
| https://pypi.org/project/altair/
| smacke wrote:
| IPyflow takes a round trip from client to kernel for each
| execution (including reactive executions) -- this approach is
| necessary to get the best possible accuracy when determining
| dataflow in a highly dynamic language like Python, but it is an
| architectural limitation that prevents the reactivity from
| feeling as instant as in Observable or Pluto.
| krawczstef wrote:
| Reminds me of this project from Stitch Fix from years ago --
| https://github.com/stitchfix/nodebook. Ahead of its time I
| guess..
| spprashant wrote:
| Now I wonder why this isn't an option in plain Jupyter.
| Inconsistent cell states and having to re-execute all cells after
| a single line change slows me down a lot.
|
| Like I get why this doesn't need to be default, but this seems
| crucial enough to warrant being included in the base package.
| smacke wrote:
| It's very new, and the current frontend implementations for
| Jupyter and JupyterLab include some workarounds for fundamental
| protocol-level limitations that probably make adding this kind
| of feature as part of the core package a no-go (without first
| addressing the core protocol limitations).
| flusteredBias wrote:
| I kind of think quarto is a much better solution to the problems
| that notebooks try to solve plus you get the added bonus of
| having plain text as the file source.
| esafak wrote:
| How's the multi-user story? Do you use Quarto at work with
| other people?
| wodenokoto wrote:
| I believe this is what Pluto sets out to do for Julia.
|
| I used it as part of the "Computational Thinking" with Julia
| course a year or two back. Even then the beta software was very
| good and some of the demos the Pluto dev showed were nothing
| short of amazing
|
| https://plutojl.org/
| O_H_E wrote:
| Yeah, Pluto rules!
|
| After getting used to it with Julia I found it really jarring
| to go back to plain Jupyter (when I need python) where I have
| to keep re-executing the cells.
|
| This is going to make that much less painful.
| rsfern wrote:
| Pluto is awesome
|
| Looks like by default you have to manually trigger reactivity
| in ipyflow, but there is a `%flow mode reactive` ipython magic
| mode that enables Pluto-style reactivity!
| smacke wrote:
| Yep. I think there is also a way to enable it by default in
| your ipython profile which I'll document at some point, so
| that you don't have to run `%flow mode reactive`. I'm curious
| though -- personally I much prefer to use the opt-in reactive
| execution mode with ctrl/cmd+shift+enter; curious to
| understand your preferences better :)
| TuringTest wrote:
| An always-reactive notebook is essentially a "literate
| spreadsheet", where you have data cells in between
| multimedia descriptions. In this model, all computed data
| is always up to date with whatever changes you make to the
| input parameters, including things like graphics connected
| to interactive sliders and text boxes. You can prototype
| the logic of an application very fast with real data and
| interactions.
| rsfern wrote:
| Your ipython profile suggestion is good, I use that for
| `%autoreload` so I don't see why it wouldn't work for
| ipyflow
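|
| For reference, the profile approach is a startup line in
| ipython_config.py. exec_lines is a real IPython option; whether
| `%flow mode reactive` behaves correctly when run this way is an
| untested assumption, and it can only work in a kernel where
| ipyflow's %flow magic is available.

```python
# ~/.ipython/profile_default/ipython_config.py
c = get_config()  # injected by IPython when it loads this file

# Lines executed at startup, as if typed at the first prompt.
c.InteractiveShellApp.exec_lines = [
    "%flow mode reactive",  # assumption: requires ipyflow's %flow magic
]
```

| The %autoreload setup follows the same pattern, with
| "autoreload" added to c.InteractiveShellApp.extensions and
| "%autoreload 2" as the startup line.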
|
| interesting question, I'm going to have to try the opt in
| reactivity in ipyflow because it's not an option in Pluto.
| Actually that's kind of a strength, one point of
| frustration in Pluto is accidentally triggering reactive
| execution of an expensive cell before everything is ready
|
| I think the thing I like most about always-on reactivity is
| that the state of the REPL and outputs can never become
| stale. I used to run into that in jupyter a lot as a
| (physical sciences) student writing hacky prototype code
| with implicit control flow... nice for debugging but in the
| long run it's quite painful.
|
| The nearest thing I had found in python is streamlit, but
| it is not as smooth as Pluto IMO. Looking forward to trying
| ipyflow, honestly I have been hoping for something like
| this for a while because using Pluto+PyCall as a jupyter
| replacement is a bit too cumbersome for python-forward
| projects
| sixhobbits wrote:
| Looks like it solves a common problem but the page is a bit
| confusing. It could make it clearer upfront what it does (I
| didn't know what reactive meant in this context) and how it
| relates to Jupyter (I thought it was official/core stuff at
| first, but I take it it's a third party tool that integrates into
| jupyter).
|
| Stuff like "Trust me? Good." in the introduction doesn't really
| help me answer "wtf does this do" more quickly and the first
| intro sentence is pretty long and convoluted.
| sonofaragorn wrote:
| Agreed on the confusing page. I use notebooks every single day
| but a quick glance at the README gave me zero indication of
| this being something I need
| TuringTest wrote:
| On the other hand, the link description alone was enough to
| convince me that this is something I want.
|
| Having a very specific target makes it easier to reach that
| target in writing, I guess, and harder for people outside the
| target to understand what it's about.
| sixhobbits wrote:
| Well that was my point. I am part of the target audience
| and I would find this useful but it took me a while to
| realise that.
|
| Of course, maybe I am just incompetent or missing knowledge
| that everyone else in the target has.
| TuringTest wrote:
| Well, other than Observable, reactive notebooks are not
| that common and well known (precisely because Jupyter,
| which is the most famous, didn't support that model
| before).
|
| So maybe today is the first day that you are exposed to
| that model and you learn about it? There's always a first
| time.
| tuanm wrote:
| Indeed. Some concepts are overused and I still don't know what
| they exactly are.
|
| Anyway, it seems to solve a few UX problems when working with
| Jupyter Notebooks.
| TuringTest wrote:
| Reactive notebooks are what change the workflow from command
| line-like to spreadsheet-like.
|
| It may not matter much if you use the notebook as a glorified
| terminal, but it is a godsend if your workflow involves data
| analysis with heavy dependencies between filtered subsets.
| boxy_brown wrote:
| Very cool. This project addresses some usability challenges that
| the Jupyter community has been debating and struggling with for
| years:
|
| https://twitter.com/jakevdp/status/935178916490223616
|
| My favorite take on this subject is Joel Grus' talk from
| JupyterCon 2018 (title: "I don't like notebooks").
|
| https://conferences.oreilly.com/jupyter/jup-ny/public/schedu...
|
| Slides:
|
| https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUh...
| smacke wrote:
| Yep that was one of the talks that motivated this work --
| talked about it in my JupyterCon talk today :)
|
| https://t.ly/78rS
| vchuravy wrote:
| How closely tied is this to Python? The need for reactivity is
| what drove the development for Pluto.jl, but it would be nice to
| have something like this for IJulia.jl as well.
| smacke wrote:
| For this project, Python is a hard requirement, though it's
| possible the approach may be applied (after significant effort
| -- Python took me ~3 years and counting) to other languages /
| runtimes as well.
| maegul wrote:
| Nice! From what I've gathered this has been in the works for a
| while(?)
|
| Thoughts ...
|
| 1. Yea, the Readme could do with a bit of polish. Your hero
| feature, AFAIU, is the automatic reactivity. This is in your
| second GIF, put this front and center and make it really clear
| what is happening. You (and I) know what reactivity looks like so
| we know what to look for, but someone new to the idea in
| notebooks could easily blink and miss this. I'd work on a nicer
| GIF and even a little youtube video just to make it really clear
| what's going on here. Bostock and ObservableHQ advertised their
| reactivity a while ago, you might be able to get inspiration for
| how they demonstrated it?
|
| 2. The syntax extensions are cool! Integration with ipywidgets is
| Ace!!
|
| 3. Do you have any comments on how ObservableHQ (Javascript
| runtime by Bostock) and Pluto (inspired by previous) informed or
| inspired your choices and implementation here? Is this basically
| the same for python/jupyter as those are for JS/Julia?
|
| 4. Annoying Questions or feature requests ... Are there any
| overheads? Any timeout facilities for long running code? Can the
| full variable and/or cell dependency graph be surfaced and
| visualised (ObservableHQ put this into the UI a while back and it
| was kinda cool).
|
| Otherwise ... awesome to see this land! Congratulations!!!
| smacke wrote:
| Thank you for all the feedback, positive and constructive! Yep
| docs definitely need a lot of polish :)
|
| 3. I actually started from scratch -- ipyflow's reactivity
| model is a bit different from these, since for Python, my
| experience is that static dependency inference is too
| unreliable to be useful. (Though after talking with the Pluto
| maintainer earlier today it sounds like Pluto may be reaching
| some of the same conclusions and also be moving toward a
| dynamic dependency inference strategy)
|
| In the future, my hope is that as a community we will develop a
| live-coding analogue to lsps which one might call a "language
| kernel protocol" so that we can standardize some of these
| features across different languages / editors
|
| 4. For top-level / module-level statements, yes there is lots
| of overhead (> 100x), but it's largely limited to those
| statements (i.e. external library calls, recursive function
| calls, etc have close to 0 overhead thanks to intelligent
| instrumentation disabling for these) and turns out to be OK in
| practice (more details in nbslicer paper
| https://smacke.net/papers/nbslicer.pdf). At some point I'll run
| it through a profiler and try to grab the low-hanging
| optimizations but it hasn't been noticeable so far.
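|
| The "instrumentation disabling" idea can be illustrated with
| sys.settrace directly. This is a simplified sketch under my own
| assumptions, not IPyflow's code: the tracer follows only the
| frame running the cell's top-level statements and declines to
| trace any call made from it. The function run_cell_traced is
| hypothetical.

```python
import sys


def run_cell_traced(source, namespace):
    """Execute `source` and return the line numbers traced at cell top level."""
    code = compile(source, "<cell>", "exec")
    executed = []

    def tracer(frame, event, arg):
        # Follow only the frame that runs the cell's module-level code.
        if frame.f_code is not code:
            return None  # returning None disables line tracing inside this call
        if event == "line":
            executed.append(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        exec(code, namespace)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active
    return executed


ns = {}
lines = run_cell_traced("x = 1\ndef f():\n    return x + 1\ny = f()\n", ns)
```

| The body of f (line 3) still runs, but never reaches the tracer,
| so the per-line cost is paid only for the cell's own statements.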
|
| Surfacing the DAG is definitely something I want to do at some
| point; we have all the information in the backend so we should
| try to surface it in the frontend.
| Helmut10001 wrote:
| I think it is a great idea, but doesn't apply to 99% of my
| notebooks because they need to run up to 2 hours to completely
| execute once, due to data intensive tasks. I usually run all
| cells once per day, up to cell x (where I left the day before),
| then continue working by adding cells and updating the state
| manually until I make progress (=no errors or output is as
| expected) and move on to the next cell - without re-executing
| other cells because this would be a productivity nightmare.
| RobinL wrote:
| Many reactivity frameworks (e.g. observablehq, shiny) recompute
| intelligently - they're aware of what parts of the calculation
| have changed and need recomputing. Haven't checked with
| ipyflow, but this idea would help mitigate some of your
| concerns
| TheAlchemist wrote:
| Looks quite neat.
|
| I'm getting into the habit of regularly restarting the kernel and
| re-running everything - just to make sure everything works as expected.
| jerpint wrote:
| Simply doing "restart + run all" ensures both readability and
| reproducibility
| orwin wrote:
| If you have four or five sklearn tiles that take 20 minutes to
| an hour to execute, you might not want to do that too often.
|
| I did not use Jupyter Notebook/JupyterLab much, but each
| time, it was in the context of datascience. The first was on
| an OCR during my internship, the second for data exploration
| (mix of quantitative/qualitative, but the project was
| scrapped after a week or two). In both cases, having to re-run
| all each time the kernel was shut down was actually a pain
| point.
| singhrac wrote:
| I encountered this a lot in my previous work and found
| workarounds that I write about here:
| https://rachitsingh.com/collaborating-jupyter/. At a high
| level I think being scared of rerunning your kernel is
| indicative of a code smell, and there are relatively easy
| ways to combat that.
| smacke wrote:
| Personally I don't like the "write to disk" approach; I
| think it kind of just punts the state problem somewhere
| else (i.e. from memory to disk). Writing to a database
| and adding versioning is better, but that's a lot of
| machinery to expect a notebook user to adopt (though
| maybe better tooling could help). Also a lot of Python
| objects are not out-of-the-box pickleable (e.g.
| generators). Also pickle is a mess.
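|
| The generator case is easy to reproduce with the standard
| library. The helper try_pickle below is a hypothetical name used
| only for this demonstration.

```python
import pickle


def try_pickle(obj):
    """Return "ok" if `obj` pickles, else the name of the exception raised."""
    try:
        pickle.dumps(obj)
        return "ok"
    except Exception as exc:
        return type(exc).__name__


results = {
    "list": try_pickle([1, 2, 3]),
    "generator": try_pickle(x for x in range(3)),
}
```

| A notebook that checkpoints its state with pickle therefore has
| to skip or rebuild such objects rather than save them.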
| qumpis wrote:
| What's the correlation between people who don't use debuggers and
| people who use notebooks? I can't imagine writing code without a
| visual debugger, one at a level of pycharm. I think people who
| use notebooks must either be very careful in how they write code
| (don't misremember variables, complexity of arrays, dicts etc) or
| have other means of debugging (print? Jupyterlab debugger? pdb?).
|
| I love notebooks for their ability to preload chunks of code/data
| and have the ability to explore without delay. But having to put
| mental strain in keeping track of objects is too much for me.
| Vscode and pycharm have made strides in unifying the experience
| but it's still very much sub par, at least in my experience.
| Matlab-like style of executing code with possibility of reusing
| same debugger solution was perfect.
| anentropic wrote:
| I don't use notebooks much, but pdb is available in them
| pilotneko wrote:
| Personally, I use notebooks to do exploratory data analysis and
| to get model training configured. Any large-scale model
| training event is converted to a script, and nothing
| production-facing is in a notebook.
| qbasic_forever wrote:
| Jupyter lab has a debugger for python:
| https://jupyterlab.readthedocs.io/en/stable/user/debugger.ht...
|
| Most people write notebooks that are ephemeral and meant for
| ad-hoc analysis. If a value needs to be inspected it can just
| be printed in a cell, or even better a fancy widget or graph
| can display it. You don't need breakpoints as much since you
| can just choose what cells to execute, or create a throw away
| cell to grab some values.
|
| Once you need to turn an analysis into a business process or
| repeatable task it makes sense to move it into a proper python
| module and use any IDE, debugger, etc.
| jrvarela56 wrote:
| Not sure if correlated. In Ruby I do a mix of REPL/console,
| tests and step-through debugging. When using Python, I always
| use a notebook as a scratchpad - to me it's a REPL but easier
| to keep tidy. The notebook can be good docs of how things work
| too, a complement to tests-as-docs as it's easy to show in
| different (real) contexts.
|
| I sorely miss being able to do this when working on frontend,
| have tried setting up node console to import files but React
| just makes it very easy to couple everything. This leaves me
| with tests as the easiest way to code outside of a view (which
| has too much friction for playing around). Hot reloading is
| great but iterating logic in isolation is way harder without a
| REPL.
| appleiigs wrote:
| I'd use Ruby more if it would work better in a notebook
| environment. It appears that iruby is in maintenance mode and
| falling behind Julia in usability.
| baggiponte wrote:
| Congratulations! I starred it a long time ago but never used it
| (sorry). But I do think this IS the way to go for Jupyter. I
| don't know how I could contribute to this - lack of time, but
| mostly knowledge, but I would love to find other ways to help.
| smacke wrote:
| Thank you! If you had tried to use it before, it probably would
| have broken pretty quickly. Now it will still break, but not so
| quickly as to not be useful, hopefully.
|
| External contributions are mostly blocked on me right now to
| improve both user and developer docs (improve = write the first
| draft in this case).
| __mharrison__ wrote:
| I spend a good deal of time teaching my students inside of
| Jupyter.
|
| For Pandas, many problems can be solved by chaining (debugging as
| you go), converting the chain to a function, and placing the
| function at the top of the notebook after you load the raw data.
|
| I get the problem this is solving, but adding some congrats and
| practical software engineering makes for much better notebook
| experiences.
___________________________________________________________________
(page generated 2023-05-10 23:01 UTC)