[HN Gopher] Introduction to Pluto.jl
___________________________________________________________________
Introduction to Pluto.jl
Author : joshday
Score : 82 points
Date : 2021-04-29 18:30 UTC (4 hours ago)
(HTM) web link (www.juliafordatascience.com)
(TXT) w3m dump (www.juliafordatascience.com)
| LetThereBeLight wrote:
| I know that this site mentions MIT's Introduction to
| Computational Thinking course, but the hyperlink doesn't send me
| there.
|
| For those interested in seeing Pluto in action I highly recommend
| checking out the course notebooks here:
| https://computationalthinking.mit.edu/Spring21/
| teruakohatu wrote:
| I have played around with Pluto.jl, and colleagues of mine use it
| for research, but I keep going back to Jupyter. I tend to have
| long-running cells that pull information from external sources
| or train models, and triggering one of those cells accidentally
| wastes a lot of time running something that may not be reliably
| interruptible.
|
| There is talk about putting in execution barriers that would help
| with this, at the risk of making Pluto more complicated for
| users:
|
| https://github.com/fonsp/Pluto.jl/discussions/298
| dandanua wrote:
| This can be easily solved. You can bind a variable to a
| checkbox like this:
|
|     @bind allow_run html"Run cell below <input type=checkbox>"
|
| and wrap your long running cell in the if block:
|
|     if allow_run
|         your_code
|     end
| nerdponx wrote:
| FWIW I've significantly improved my experience by breaking up
| my notebooks into smaller pieces such that each notebook only
| does "one thing", while using DVC to run them and keep track of
| intermediate results. Or in a case where the intermediate
| result was itself somewhat "exploratory", having the notebook
| itself check for the existence of an intermediate result and
| load it from disk instead of recomputing it.
|
| Execution barriers are a nice idea though. There is/was a
| Jupyter notebook extension for "initialization cells", but the
| whole notebook extension ecosystem seems kind of dead and it's
| unclear if Jupyter Lab will ever have equivalents.
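|
| A minimal sketch of the "check for an intermediate result on
| disk and load it instead of recomputing" pattern described
| above, using Julia's Serialization standard library. The helper
| name `cached` and the file name are hypothetical, chosen only
| for illustration; this is not part of Pluto or DVC.
|
```julia
using Serialization

# Hypothetical helper: return the value stored at `path` if the file
# exists; otherwise call `compute`, save its result, and return it.
function cached(compute, path::AbstractString)
    if isfile(path)
        return deserialize(path)
    end
    result = compute()
    serialize(path, result)
    return result
end

# Usage: the do-block body only runs when the cache file is missing.
cache_file = joinpath(mktempdir(), "intermediate.jls")  # illustrative path
summary_stats = cached(cache_file) do
    sum(abs2, 1:1000)  # stand-in for an expensive step
end
```
|
| Deleting the file forces a recompute, so the cache is only as
| fresh as whoever manages those files.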
| oivey wrote:
| The fact that Pluto only runs dependent cells on changes mostly
| solves this for me. For example, a cell can load things into
| the variable data, and then another cell can apply a function
| f(data). If I alter f, data is not reloaded and f(data)
| automatically runs.
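|
| The "only dependent cells re-run" behaviour can be illustrated
| with a toy dependency walk. This is a sketch under simplifying
| assumptions, not Pluto's actual implementation: the `Cell`
| struct and `cells_to_rerun` function are hypothetical, and each
| cell simply declares which names it defines and reads.
|
```julia
# Toy model of a notebook cell: the names it defines and the names it reads.
struct Cell
    name::String
    defines::Set{Symbol}
    reads::Set{Symbol}
end

# Return the names of the cells that need to run when `changed` is edited:
# the edited cell itself plus every cell that reads a name made stale by it.
function cells_to_rerun(cells::Vector{Cell}, changed::Cell)
    stale = Set{Symbol}(changed.defines)
    rerun = [changed.name]
    progress = true
    while progress
        progress = false
        for c in cells
            if !(c.name in rerun) && !isempty(intersect(c.reads, stale))
                push!(rerun, c.name)
                union!(stale, c.defines)  # names this cell defines are stale too
                progress = true
            end
        end
    end
    return rerun
end

load_cell   = Cell("load data", Set([:data]),   Set{Symbol}())
f_cell      = Cell("define f",  Set([:f]),      Set{Symbol}())
result_cell = Cell("apply f",   Set([:result]), Set([:f, :data]))
notebook = [load_cell, f_cell, result_cell]

after_editing_f    = cells_to_rerun(notebook, f_cell)
after_editing_data = cells_to_rerun(notebook, load_cell)
```
|
| Editing `f` leaves the data-loading cell alone, while editing
| the data-loading cell pulls in everything downstream of it.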
| teruakohatu wrote:
| That is fine if you are working sequentially, but often tasks
| involve going back to the original data and doing some
| wrangling.
|
| data -> model(data) -> output(model)
|
| So if you go back to mess around with the data, your model
| and output could be or would be recomputed, which you would
| need to do eventually but not while making iterative tweaks.
|
| Another commenter suggested adding checkboxes which is a good
| idea, although then you are managing a bunch of checkbox
| states.
| f6v wrote:
| > So if you go back to mess around with the data, your
| model and output could be or would be recomputed, which you
| would need to do eventually but not while making iterative
| tweaks.
|
| On the other hand, not everyone remembers to re-run
| dependent cells. I've had many R notebooks handed in to me
| where the author didn't check that it runs top to bottom with
| a fresh workspace.
| Someone wrote:
| I think the ideal user-friendly system would switch between
| automatic and manual recomputation depending on expected
| time of recomputation and expected time until the user
| triggers another recomputation (and clearly indicate which
| cells need recomputation to make them reflect the latest
| state of the system). If you're editing a file path, for
| example, you don't want the system to read or, worse, write
| that file after every key you press. Similarly, if you
| change one cell and within a second start editing a second
| one, you don't want to start recomputation.
|
| So, if the system thinks it takes T seconds to compute a
| cell, it could only start recomputation after f(T) seconds
| without user input.
|
| Finding a good function f is left as an exercise for the
| reader. That's where good systems will add value. A good
| system likely would need a more complex f, which also has
| ideas about how much file and network I/O the steps take
| and whether steps can easily be cancelled.
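|
| As a starting point, the simplest possible f might just scale
| the idle wait with the expected runtime and clamp it. This is
| only a sketch: the function names and the `min_delay`,
| `max_delay`, and `scale` knobs are made up for illustration,
| and it ignores the I/O and cancellation concerns mentioned
| above.
|
```julia
# Hypothetical policy f(T): seconds of user inactivity to wait before
# recomputing a cell that is expected to take `expected_runtime` seconds.
function idle_delay(expected_runtime; min_delay=0.2, max_delay=30.0, scale=0.5)
    return clamp(scale * expected_runtime, min_delay, max_delay)
end

# Recompute only once the user has been idle for at least f(T) seconds.
function should_run(expected_runtime, seconds_idle)
    return seconds_idle >= idle_delay(expected_runtime)
end

fast_cell_delay = idle_delay(0.1)     # clamped up to min_delay
slow_cell_delay = idle_delay(10.0)    # 0.5 * 10.0
huge_cell_delay = idle_delay(1000.0)  # clamped down to max_delay
```
|
| Cheap cells then feel instant, while an expensive cell waits
| until the user has clearly stopped typing.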
| nerdponx wrote:
| I really wish the Julia ecosystem would stop assuming that you
| always interact with your computer through the Julia REPL and
| started supporting proper command line interfaces. This is one of
| the big annoyances and mistakes of the R ecosystem, and I think
| it's unwise to carry that mistake over to Julia.
|
| Also, big "ugh" to browser-based tooling. I want to browse
| webpages in my browser, I don't want to do my data science work
| there. We don't even have a good native client for Jupyter
| notebooks yet, let alone for this new Jupyter alternative that
| doesn't support the existing Jupyter kernel protocol.
|
| Not only that, but Pluto also apparently has some obnoxious UX
| limitations that remind me of other less-than-usable wannabe-
| Jupyter-notebooks (e.g. Apache Zeppelin, Databricks):
| https://towardsdatascience.com/could-pluto-be-a-real-jupyter...
|
| In short: nice idea, but I'd rather see continued unification
| around Jupyter and a proper IDE that can at least emit and
| interact with Jupyter-compatible data.
|
| On the other hand, the Jupyter notebook JSON format is bad for a
| variety of reasons (e.g. you need special tools for readable Git
| diffs) and I really wish we had all settled on R Markdown
| instead. But R has its own NIH tooling problem and nobody was
| ever going to adopt it because the R community itself (driven by
| RStudio) has little interest in sharing or interoperability with
| other languages.
|
| </cynical-angry-rant>
| lacker wrote:
| To me this seems like an improvement in the direction that you
| want, in particular that notebooks are reactive. All too often
| I get a Jupyter notebook from someone else and try to run it on
| my machine only to find that some intermediate step does not
| work any more, because the original developer ran something out
| of order or removed a critical step. A reactive notebook seems
| more likely to still work after a lot of changes are made while
| experimenting.
| clarkevans wrote:
| Pluto notebooks are Julia scripts, usable at the command line.
|
| Edit: Pluto uses Julia's package manager; moreover,
| Manifest.toml can be used to pin all of your project's
| dependencies so the notebook is repeatable, from a code
| perspective.
| nerdponx wrote:
| That's good to know. But I was talking about the package
| manager and starting the Pluto server.
| dash2 wrote:
| Big ugh to browser based tooling, and yet also continued
| unification around Jupyter? Are there any plans to have a non-
| browser Jupyter?
| nerdponx wrote:
| It's "ugh" in the Jupyter world too.
|
| A good quality standalone "notebook editor" would be an
| incredible tool. Nteract exists, but is not "good".
| pdeffebach wrote:
| QtConsole is actively maintained. I don't use it but I do
| like it a lot.
| Iwan-Zotow wrote:
| > Are there any plans to have a non-browser Jupyter?
|
| Sure. VS Code with the Python and Jupyter extensions.
| gugagore wrote:
| What difference does this make, though? Isn't VSCode an
| Electron app? All of its UI is based on web stuff, anyway!
| TechBro8615 wrote:
| I tried this for the first time the other day and it was a
| great experience. Ironically the most cumbersome part
| continues to be Python environment management. I'll spare
| you my usual rant about that, but hopefully by Python 4
| they'll find a solution.
| kylebarron wrote:
| nbterm [0] was recently released. You can also use Jupyter as
| a command line interface through Jupyter Console.
|
| [0]: https://blog.jupyter.org/nbterm-jupyter-notebooks-in-
| the-ter...
| pdeffebach wrote:
| Plenty of people use the REPL in terminal and sublime text or
| vim or whatever. I also dislike browser-based tooling and think
| Julia has done a _good_ job avoiding RStudio-style
| dependencies.
|
| But if your point is the inability to do `julia script.jl` ,
| yeah, that's a pain point. Fortunately there has been some
| tooling to make running many jobs in a row easier:
| https://github.com/dmolina/DaemonMode.jl
| thenoblesunfish wrote:
| Is Julia different from Python in this regard? I use Python
| mostly by executing scripts, but it's nice to have the REPL and
| IPython and Jupyter. With Julia I'm free to just run "julia
| script.jl", aren't I? There's probably more to your complaint
| than I naively realize, though. Maybe Python has better IDE
| support?
| JustFinishedBSG wrote:
| > I really wish the Julia ecosystem would stop assuming that
| you always interact with your computer through the Julia REPL
| and started supporting proper command line interfaces.
|
| What does it even mean? What is a CLI for a programming
| language if not a REPL?
| krastanov wrote:
| I also do not really get the complaint, but it is along the
| lines of people wanting to write `julia-pkg install Pluto`
| instead of `julia -e 'using Pkg; Pkg.add("Pluto")'`. It seems
| it is a big pet peeve for some people.
| spekcular wrote:
| Yes. I agree with this complaint. The REPL is useful in
| some cases but in general I avoid interacting with it
| whenever possible. My impression is that workflow is highly
| task-dependent (perhaps obvious) but there are many of us
| who just want to write a script, run the script, and
| repeat.
| maximilianroos wrote:
| I used Pluto for last year's Advent of Code. It's extremely good
| for these sorts of problems -- rapid iteration with modest
| computational requirements.
|
| Think of something you might use a spreadsheet for -- Pluto has a
| similar feeling of instant feedback.
|
| ---
|
| Some features that are missing:
|
| - Some things are difficult to do with the keyboard; I used my
| mouse more than with other tools. The author doesn't like modal
| editing, but ideally they could be implemented with modifier keys
| (https://github.com/fonsp/Pluto.jl/issues/65)
|
| - It's hard to understand what happens _within_ a cell -- logging
| goes to the terminal rather than the notebook -- and there aren't
| many introspection tools. This is an environment where
| transparency / introspection would be particularly helpful.
|
| ---
|
| Pluto doesn't solve every problem, or completely replace
| notebooks; to respond to a couple of comments:
|
| > I have many extremely long notebooks that would almost
| certainly crash if you tried to recompute the whole thing
|
| Right, don't use Pluto for that! It's not one environment to rule
| them all
|
| > Many of the cells won't work at all because the inputs are long
| gone
|
| That seems bad! Pluto will help you ensure that doesn't happen.
| xal wrote:
| It's funny because this is probably a really non-standard
| sentiment but I really wish that they would make an electron app
| out of this. Installing it is reasonably easy but definitely
| beyond a lot of people who could get value from it.
| nerdponx wrote:
| Normally I dislike Electron apps (with some very well-built
| exceptions) but in this case it makes perfect sense. It already
| renders HTML, CSS, and JS!
| borodi wrote:
| For those that are put off by the "weird" cell execution
| behavior, there is also Neptune.jl
| (https://github.com/compleathorseplayer/Neptune.jl), a
| non-reactive fork of Pluto that has basically all the benefits
| of Pluto, plus multi-line cell execution without begin, but
| without the reactive behaviour. Running code blocks with inline
| results in VS Code also has some notebook feel to me.
| legerdemain wrote:
| LOL, how often do you want your entire notebook to recompute just
| because you change something somewhere? Have you never tried
| pursuing a little side experiment in an existing notebook, or
| have ten abandoned false starts leading to one good result? I
| have many extremely long notebooks that would almost certainly
| crash if you tried to recompute the whole thing, and many of the
| cells won't work at all because the inputs are long gone. Some of
| these notebooks are years old. The datasets they have in memory
| aren't saved anywhere else. What possible motivation do I have to
| lose all of this precious state?
|
| If I wanted a software-grade, rock-solid data pipeline, I would
| just copy-paste some code from an existing notebook and run it on
| Papermill.
| MisterBiggs wrote:
| The whole notebook doesn't recompute; only the cells that
| depend on the cell that changed do. This is extremely powerful
| because you never end up with stale cells that are showing
| incorrect values.
| lacker wrote:
| _Some of these notebooks are years old. The datasets they have
| in memory aren't saved anywhere else._
|
| That sounds dangerous to me. If your computer crashes or you
| introduce a bug to your notebook, you could lose all that data.
| Personally, I prefer my notebooks to be reproducible at any
| point.
| yunohn wrote:
| Exactly, or at the very least,
| pickle/serialise/export/whatever the models so that the
| computer can survive a reboot.
| legerdemain wrote:
| These are usually small aggregates and summaries, so I just
| display them in notebook output. It does make it take a bit
| longer to scroll through the notebook to find something,
| but that's what being disciplined with organization is for.
| [deleted]
| spinningslate wrote:
| I'm always impressed by the quality of the Julia ecosystem. It
| seems to be in that sweet spot with sufficient use & contribution
| to be viable, but not so popular that quality suffers.
| kruxigt wrote:
| Yes! Hope it stays that way for a long time. The continued
| popularity of Python is probably good for this. Mostly only the
| right people move on from Python to Julia.
| machineko wrote:
| https://madnight.github.io/githut/#/pull_requests/2021/1
|
| Just saying, but Julia open source is kinda dead outside a few
| very high-quality packages.
| teruakohatu wrote:
| I love Julia and part of its charm is that everything is
| relatively new and so quite consistent, also helped by the
| community ethos and technical features that aid composition.
|
| Python and R (especially R) have plenty of libraries that are
| high-quality, or even industry standard, but which are decades
| old and feel it. Python's NLTK is 20 years old for example and
| it can feel grating switching between NLTK and spaCy. R has
| three different object systems (four according to some), so you
| might be using some ancient battle tested library with Hadley
| Wickham's latest cutting edge libraries.
| f6v wrote:
| R has a terrible naming problem where you don't know which
| convention is used in a particular library.
___________________________________________________________________
(page generated 2021-04-29 23:00 UTC)