[HN Gopher] Introduction to Pluto.jl
       ___________________________________________________________________
        
       Introduction to Pluto.jl
        
       Author : joshday
       Score  : 82 points
       Date   : 2021-04-29 18:30 UTC (4 hours ago)
        
 (HTM) web link (www.juliafordatascience.com)
 (TXT) w3m dump (www.juliafordatascience.com)
        
       | LetThereBeLight wrote:
       | I know that this site mentions MIT's Introduction to
       | Computational Thinking course, but the hyperlink doesn't send me
       | there.
       | 
       | For those interested in seeing Pluto in action I highly recommend
       | checking out the course notebooks here:
       | https://computationalthinking.mit.edu/Spring21/
        
       | teruakohatu wrote:
       | I have played around with Pluto.jl, and colleagues of mine use it
       | for research, but I keep going back to Jupyter. I tend to have
       | long running cells that are pulling information from external
       | sources or training models, and triggering one of those cells
       | accidentally will waste a lot of time running something that may
       | not be reliably interrupted.
       | 
       | There is talk about putting in execution barriers that would help
       | with this, at the risk of making Pluto more complicated for
       | users:
       | 
       | https://github.com/fonsp/Pluto.jl/discussions/298
        
         | dandanua wrote:
         | This can be easily solved. You can bind a variable to a
         | checkbox like this:
         | 
         |     @bind allow_run html"Run cell below <input type=checkbox>"
         | 
         | and wrap your long running cell in the if block:
         | 
         |     if allow_run
         |         your_code
         |     end
        
         | nerdponx wrote:
         | FWIW I've significantly improved my experience by breaking up
         | my notebooks into smaller pieces such that each notebook only
         | does "one thing", while using DVC to run them and keep track of
         | intermediate results. Or in a case where the intermediate
         | result was itself somewhat "exploratory", having the notebook
         | itself check for the existence of an intermediate result and
         | load it from disk instead of recomputing it.
         | 
         | Execution barriers are a nice idea though. There is/was a
         | Jupyter notebook extension for "initialization cells", but the
         | whole notebook extension ecosystem seems kind of dead and it's
         | unclear if Jupyter Lab will ever have equivalents.
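
The "check for an intermediate result and load it from disk" pattern described above can be sketched roughly as follows. This is a minimal illustration, not the commenter's actual setup: the helper name `load_or_compute`, the cache path, and the choice of pickle as the on-disk format are all assumptions made for the example.

```python
import pickle
from pathlib import Path


def load_or_compute(cache_path, compute):
    """Return the cached result at cache_path, computing and saving it if absent.

    `compute` is any zero-argument callable (hypothetical stand-in for a slow
    download or model fit). Pickle is an assumed format, not a recommendation.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Intermediate result already on disk: skip the expensive step.
        with cache_path.open("rb") as handle:
            return pickle.load(handle)
    result = compute()
    with cache_path.open("wb") as handle:
        pickle.dump(result, handle)
    return result
```

Deleting the cache file is then the explicit way to force a recomputation, which keeps an accidental cell run cheap.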
        
         | oivey wrote:
         | The fact that Pluto only runs dependent cells on changes mostly
         | solves this for me. For example, a cell can load things into
         | the variable data, and then another cell can apply a function
         | f(data). If I alter f, data is not reloaded and f(data)
         | automatically runs.
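
To make the "only dependent cells rerun" behaviour concrete, here is a small sketch of how a reactive notebook could decide what to rerun after an edit. It is not Pluto's implementation. The cell names (`data`, `f`, `result`) mirror the example in the comment above, and the `DEPENDS_ON` table and `cells_to_rerun` function are hypothetical.

```python
# Hypothetical notebook: each cell maps to the cells it reads from.
DEPENDS_ON = {
    "data": [],               # loads the dataset
    "f": [],                  # defines the function f
    "result": ["f", "data"],  # computes f(data)
}


def cells_to_rerun(changed, depends_on):
    """Return the changed cell and everything downstream of it, in run order."""
    # Invert the table: cell -> cells that read from it.
    dependents = {name: [] for name in depends_on}
    for cell, inputs in depends_on.items():
        for source in inputs:
            dependents[source].append(cell)

    # Collect every cell reachable from the changed one.
    affected = set()
    stack = [changed]
    while stack:
        current = stack.pop()
        if current in affected:
            continue
        affected.add(current)
        stack.extend(dependents[current])

    # Kahn's algorithm restricted to the affected cells, so that each cell
    # runs only after all of its affected inputs have run.
    pending = {
        cell: sum(1 for source in depends_on[cell] if source in affected)
        for cell in affected
    }
    ready = sorted(cell for cell, count in pending.items() if count == 0)
    order = []
    while ready:
        current = ready.pop(0)
        order.append(current)
        for child in sorted(dependents[current]):
            if child in affected:
                pending[child] -= 1
                if pending[child] == 0:
                    ready.append(child)
    return order
```

With this table, `cells_to_rerun("f", DEPENDS_ON)` returns `["f", "result"]`: editing `f` reruns `f(data)` but leaves the `data` cell alone.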
        
           | teruakohatu wrote:
           | That is fine if you are working sequentially, but often tasks
           | involve going back to the original data and doing some
           | wrangling.
           | 
           | data -> model(data) -> output(model)
           | 
           | So if you go back to mess around with the data, your model
           | and output could be or would be recomputed, which you would
           | need to do eventually but not while making iterative tweaks.
           | 
           | Another commenter suggested adding checkboxes which is a good
           | idea, although then you are managing a bunch of checkbox
           | states.
        
             | f6v wrote:
             | > So if you go back to mess around with the data, your
             | model and output could be or would be recomputed, which you
             | would need to do eventually but not while making iterative
             | tweaks.
             | 
             | On the other hand, not everyone remembers to re-run
             | dependent cells. I've had many R notebooks handed in to me
             | where the author didn't check that they run top to bottom
             | with a fresh workspace.
        
             | Someone wrote:
             | I think the ideal user-friendly system would switch between
             | automatic and manual recomputation depending on expected
             | time of recomputation and expected time until the user
             | triggers another recomputation (and clearly indicate which
             | cells need recomputation to make them reflect the latest
             | state of the system). If you're editing a file path, for
             | example, you don't want the system to read or, worse, write
             | that file after every key you press. Similarly, if you
             | change one cell and within a second start editing a second
             | one, you don't want to start recomputation.
             | 
             | So, if the system thinks it takes T seconds to compute a
             | cell, it could only start recomputation after f(T) seconds
             | without user input.
             | 
             | Finding a good function f is left as an exercise for the
             | reader. That's where good systems will add value. A good
             | system likely would need a more complex f, which also has
             | ideas about how much file and network I/O the steps take
             | and whether steps can easily be cancelled.
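
As a starting point for the exercise above, one very simple candidate for f is a delay proportional to the estimated run time, clamped to a sensible range. This is only a sketch: the function names and the constants (`floor`, `factor`, `ceiling`) are assumptions chosen for illustration, and it ignores the I/O and cancellation concerns the comment raises.

```python
def idle_delay(estimated_seconds, floor=0.3, factor=0.5, ceiling=10.0):
    """One possible f(T): wait longer before starting cells that run longer.

    All three constants are illustrative assumptions, not tuned values.
    """
    if estimated_seconds < 0:
        raise ValueError("estimated_seconds must be non-negative")
    return min(ceiling, max(floor, factor * estimated_seconds))


def should_recompute(estimated_seconds, idle_seconds):
    """True once the user has been idle for at least f(T) seconds."""
    return idle_seconds >= idle_delay(estimated_seconds)
```

Under these assumed constants a cell estimated at 4 seconds starts after 2 seconds without input, while a near-instant cell still waits 0.3 seconds so that it does not run on every keystroke.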
        
       | nerdponx wrote:
       | I really wish the Julia ecosystem would stop assuming that you
       | always interact with your computer through the Julia REPL and
       | started supporting proper command line interfaces. This is one of
       | the big annoyances and mistakes of the R ecosystem, and I think
       | it's unwise to carry that mistake over to Julia.
       | 
       | Also, big "ugh" to browser-based tooling. I want to browse
       | webpages in my browser, I don't want to do my data science work
       | there. We don't even have a good native client for Jupyter
       | notebooks yet, let alone for this new Jupyter alternative that
       | doesn't support the existing Jupyter kernel protocol.
       | 
       | Not only that, but Pluto also apparently has some obnoxious UX
       | limitations that remind me of other less-than-usable wannabe-
       | Jupyter-notebooks (e.g. Apache Zeppelin, Databricks):
       | https://towardsdatascience.com/could-pluto-be-a-real-jupyter...
       | 
       | In short: nice idea, but I'd rather see continued unification
       | around Jupyter and a proper IDE that can at least emit and
       | interact with Jupyter-compatible data.
       | 
       | On the other hand, the Jupyter notebook JSON format is bad for a
       | variety of reasons (e.g. you need special tools for readable Git
       | diffs) and I really wish we had all settled on R Markdown
       | instead. But R has its own NIH tooling problem and nobody was
       | ever going to adopt it because the R community itself (driven by
       | RStudio) has little interest in sharing or interoperability with
       | other languages.
       | 
       | </cynical-angry-rant>
        
         | lacker wrote:
         | To me this seems like an improvement in the direction that you
         | want, in particular that notebooks are reactive. All too often
         | I get a Jupyter notebook from someone else and try to run it on
         | my machine only to find that some intermediate step does not
         | work any more, because the original developer ran something out
         | of order or removed a critical step. A reactive notebook seems
         | more likely to still work after a lot of changes are made while
         | experimenting.
        
         | clarkevans wrote:
         | Pluto notebooks are Julia scripts, usable at the command line.
         | 
         | Edit: Pluto uses Julia's package manager; moreover,
         | Manifest.toml can be used to pin all of your project's
         | dependencies so the notebook is repeatable, from a code
         | perspective.
        
           | nerdponx wrote:
           | That's good to know. But I was talking about the package
           | manager and starting the Pluto server.
        
         | dash2 wrote:
         | Big ugh to browser based tooling, and yet also continued
         | unification around Jupyter? Are there any plans to have a
         | non-browser Jupyter?
        
           | nerdponx wrote:
           | It's "ugh" in the Jupyter world too.
           | 
           | A good quality standalone "notebook editor" would be an
           | incredible tool. Nteract exists, but is not "good".
        
           | pdeffebach wrote:
           | QtConsole is actively maintained. I don't use it but I do
           | like it a lot.
        
           | Iwan-Zotow wrote:
           | > Are there any plans to have a non-browser Jupyter?
           | 
           | Sure. VSCode with Python and Jupyter extensions
        
             | gugagore wrote:
             | What difference does this make, though? Isn't VSCode an
             | Electron app? All of its UI is based on web stuff, anyway!
        
             | TechBro8615 wrote:
             | I tried this for the first time the other day and it was a
             | great experience. Ironically the most cumbersome part
             | continues to be Python environment management. I'll spare
             | you my usual rant about that, but hopefully by Python 4
             | they'll find a solution.
        
           | kylebarron wrote:
           | nbterm [0] was recently released. You can also use Jupyter as
           | a command line interface through Jupyter Console.
           | 
           | [0]: https://blog.jupyter.org/nbterm-jupyter-notebooks-in-the-ter...
        
         | pdeffebach wrote:
         | Plenty of people use the REPL in terminal and sublime text or
         | vim or whatever. I also dislike browser-based tooling and think
         | Julia has done a _good_ job avoiding RStudio-style
         | dependencies.
         | 
         | But if your point is the inability to do `julia script.jl`,
         | yeah, that's a pain point. Fortunately there has been some
         | tooling to make running many jobs in a row easier:
         | https://github.com/dmolina/DaemonMode.jl
        
         | thenoblesunfish wrote:
         | Is Julia different from Python in this regard? I use Python
         | mostly by executing scripts, but it's nice to have the REPL and
         | IPython and Jupyter. With Julia I'm free to just run "julia
         | script.jl", aren't I? There's probably more to your complaint
         | than I naively realize, though. Maybe Python has better IDE
         | support?
        
         | JustFinishedBSG wrote:
         | > I really wish the Julia ecosystem would stop assuming that
         | you always interact with your computer through the Julia REPL
         | and started supporting proper command line interfaces.
         | 
         | What does it even mean? What is a CLI interface for a
         | programming language if not a REPL ?
        
           | krastanov wrote:
           | I also do not really get the complaint, but it is along the
           | lines of people wanting to write `julia-pkg install Pluto`
           | instead of `julia -e 'using Pkg; Pkg.add("Pluto")'`. It seems
           | it is a big pet peeve for some people.
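
The wrapper people are asking for can be quite thin. The sketch below translates the hypothetical `julia-pkg install Pluto` form into the `julia -e` command quoted above. `julia-pkg` is not an existing tool, and the function name and the package-name check are assumptions made for this example.

```python
import re

# Conservative pattern so the name can be embedded in Julia source safely.
_PACKAGE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def julia_pkg_command(action, package):
    """Build the argv for a hypothetical `julia-pkg <action> <package>` call."""
    if action != "install":
        raise ValueError(f"unsupported action: {action}")
    if not _PACKAGE_NAME.fullmatch(package):
        raise ValueError(f"invalid package name: {package}")
    # Equivalent to: julia -e 'using Pkg; Pkg.add("<package>")'
    return ["julia", "-e", f'using Pkg; Pkg.add("{package}")']
```

Passing the returned list to `subprocess.run` would execute it, assuming `julia` is on the PATH.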
        
             | spekcular wrote:
             | Yes. I agree with this complaint. The REPL is useful in
             | some cases but in general I avoid interacting with it
             | whenever possible. My impression is that workflow is highly
             | task-dependent (perhaps obvious) but there are many of us
             | who just want to write a script, run the script, and
             | repeat.
        
       | maximilianroos wrote:
       | I used Pluto for last year's Advent of Code. It's extremely good
       | for these sorts of problems -- rapid iteration with modest
       | computational requirements.
       | 
       | Think of something you might use a spreadsheet for -- Pluto has a
       | similar feeling of instant feedback.
       | 
       | ---
       | 
       | Some features that are missing:
       | 
       | - Some things are difficult to do with the keyboard; I used my
       | mouse more than with other tools. The author doesn't like modal
       | editing, but ideally keyboard shortcuts could be implemented
       | with modifier keys (https://github.com/fonsp/Pluto.jl/issues/65)
       | 
       | - It's hard to understand what happens _within_ a cell -- logging
       | goes to the terminal rather than the notebook -- and there aren't
       | many introspection tools. This is an environment where
       | transparency / introspection would be particularly helpful.
       | 
       | ---
       | 
       | Pluto doesn't solve every problem, or completely replace
       | notebooks; to respond to a couple of comments:
       | 
       | > I have many extremely long notebooks that would almost
       | certainly crash if you tried to recompute the whole thing
       | 
       | Right, don't use Pluto for that! It's not one environment to rule
       | them all
       | 
       | > Many of the cells won't work at all because the inputs are long
       | gone
       | 
       | That seems bad! Pluto will help you ensure that doesn't happen.
        
       | xal wrote:
       | It's funny because this is probably a really non-standard
       | sentiment but I really wish that they would make an electron app
       | out of this. Installing it is reasonably easy but definitely
       | beyond a lot of people who could get value from it.
        
         | nerdponx wrote:
         | Normally I dislike Electron apps (with some very well-built
         | exceptions) but in this case it makes perfect sense. It already
         | renders HTML, CSS, and JS!
        
       | borodi wrote:
       | For those who are put off by the "weird" cell execution
       | behavior, there is also Neptune.jl, a non-reactive fork of Pluto:
       | https://github.com/compleathorseplayer/Neptune.jl
       | It has basically all the benefits of Pluto, plus multi-line cell
       | execution without needing begin, but without the reactive
       | behaviour. Running code blocks with inline results in VSCode
       | also has some notebook feel to me.
        
       | legerdemain wrote:
       | LOL, how often do you want your entire notebook to recompute just
       | because you change something somewhere? Have you never tried
       | pursuing a little side experiment in an existing notebook, or
       | have ten abandoned false starts leading to one good result? I
       | have many extremely long notebooks that would almost certainly
       | crash if you tried to recompute the whole thing, and many of the
       | cells won't work at all because the inputs are long gone. Some of
       | these notebooks are years old. The datasets they have in memory
       | aren't saved anywhere else. What possible motivation do I have to
       | lose all of this precious state?
       | 
       | If I wanted a software-grade, rock-solid data pipeline, I would
       | just copy-paste some code from an existing notebook and run it on
       | Papermill.
        
         | MisterBiggs wrote:
         | The whole notebook doesn't recompute, only the cells that
         | depend on the cell that changed. This is extremely powerful
         | because you never end up with stale cells that are showing
         | incorrect values.
        
         | lacker wrote:
         | _Some of these notebooks are years old. The datasets they have
         | in memory aren't saved anywhere else._
         | 
         | That sounds dangerous to me. If your computer crashes or you
         | introduce a bug to your notebook, you could lose all that data.
         | Personally, I prefer my notebooks to be reproducible at any
         | point.
        
           | yunohn wrote:
           | Exactly, or at the very least,
           | pickle/serialise/export/whatever the models so that the
           | computer can survive a reboot.
        
             | legerdemain wrote:
             | These are usually small aggregates and summaries, so I just
             | display them in notebook output. It does make it take a bit
             | longer to scroll through the notebook to find something,
             | but that's what being disciplined with organization is for.
        
       | spinningslate wrote:
       | I'm always impressed by the quality of the Julia ecosystem. It
       | seems to be in that sweet spot with sufficient use & contribution
       | to be viable, but not so popular that quality suffers.
        
         | kruxigt wrote:
         | Yes! Hope it stays that way for a long time. The continued
         | popularity of Python is probably good for this. Mostly only the
         | right people move on from Python to Julia.
        
         | machineko wrote:
         | https://madnight.github.io/githut/#/pull_requests/2021/1
         | 
         | Just saying, but Julia open source is kinda dead outside of a
         | few very high-quality packages
        
         | teruakohatu wrote:
         | I love Julia and part of its charm is that everything is
         | relatively new and so quite consistent, also helped by the
         | community ethos and technical features that aid composition.
         | 
         | Python and R (especially R) have plenty of libraries that are
         | high-quality, or even industry standard, but which are decades
         | old and feel it. Python's NLTK is 20 years old for example and
         | it can feel grating switching between NLTK and spaCy. R has
         | three different object systems (four according to some), so you
         | might be using some ancient battle tested library with Hadley
         | Wickham's latest cutting edge libraries.
        
           | f6v wrote:
           | R has a terrible naming problem where you don't know which
           | convention is used in a particular library.
        
       ___________________________________________________________________
       (page generated 2021-04-29 23:00 UTC)