[HN Gopher] The Jupyter+Git problem is now solved
___________________________________________________________________
The Jupyter+Git problem is now solved
Author : jph00
Score : 262 points
Date : 2022-08-26 00:09 UTC (22 hours ago)
(HTM) web link (www.fast.ai)
(TXT) w3m dump (www.fast.ai)
| flutetornado wrote:
| JupyterLab
|
| JupyterHub
|
| Jupytext - converts ipynb to py
|
| Nbstripout - strips all output from ipynb
|
| Nbmerge - resolves merge conflicts
|
| Vim-jupytext - vim plugin to auto convert ipynb to py
|
| Papermill - parameterize notebooks
|
| Git
|
| Pandas, Altair - data analysis / Visualization
|
| Phabricator - code reviews of notebooks
|
| Vimdiff + vim-jupytext - diffs in terminal
|
| This solved all my jupyter problems.
| ticklemyelmo wrote:
| Why is there a "Jupyter+Git" problem specifically? Why aren't we
| worrying about the "C+Git" problem and the "XML+Git" problem and
| the "Python+Git" problem? Because merge markers break, well,
| _every_ file format.
|
| Is it because Jupyter users in particular don't typically
| understand that there is a formatted text file behind the
| notebook, or how merge conflicts work?
| thefrozenone wrote:
| This is a good question and made me think. I have come up with:
| "Jupyter notebooks can be thought of as Code _and_ an IDE
| layout"
| Bjartr wrote:
| It's because the primary editor of the notebook files barfs
| when presented with the file if it includes merge markers since
| it's no longer valid json. Imagine if one of your normal code-
| friendly text editors, or your ide, refused to open a .c or .py
| file and you had to open it in notepad to fix it.
|
| That's what it feels like to be forced to drop into a normal
| text editor rather than using the normal notebook ui to fix the
| conflicts.
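A minimal sketch of the failure described above: an .ipynb file is JSON, and git's line-level conflict markers are not valid JSON, so anything that parses the file first will reject it. The notebook text below is a made-up, stripped-down example rather than a complete nbformat document, and `is_valid_json` is a helper invented for illustration.

```python
import json

# A tiny, hypothetical notebook-like JSON document (not a full nbformat file).
clean = '{"cells": [{"cell_type": "code", "source": ["x = 1"]}]}'

# The same file after a line-level merge conflict: git inserts marker lines.
conflicted = """{"cells": [{"cell_type": "code", "source": [
<<<<<<< HEAD
"x = 1"
=======
"x = 2"
>>>>>>> feature
]}]}"""


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(clean))       # the untouched file parses
print(is_valid_json(conflicted))  # the marker lines make it unparseable
```

A .py or .c file with the same markers is also broken, but a text editor will still open it, which is the asymmetry the comment points at.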
| ahurmazda wrote:
| Thanks but a hard pass from me. The original sin was using goofy
| JSON as the file format (and no! I don't care for your pretty 5MB
| pngs polluting my git tree). This is the nth attempt at applying
| lipstick on the pig (n-1 being jupytext)
| killjoywashere wrote:
| Why is this not on brew yet! (╯°□°)╯︵ ┻━┻
| kmod wrote:
| Streamlit has completely replaced my usage of Jupyter -- I find
| it to have the quick iteration speed and visual output of
| notebooks, but it's just normal python so all the normal tooling
| works (there is no "git problem") and you don't have the weird
| state problems of notebooks.
|
| Definitely recommend checking it out if you haven't already!
| whacked_new wrote:
| I used to use Jupytext a lot for this problem, and think it does
| a decent job. The main problem with Jupytext is its reliance on
| BYOD (D for discipline), which is a poor (but possibly best
| available) solution for human systems.
|
| IMHO the Jupyter+Git problem stems from the ipynb format.
| Jupytext does it "right" in the sense that you can work in
| .ipynb, and diff in .md. But as long as the base format is diff-
| unfriendly, all tools are methods of indirection of format
| complexity to tool complexity.
|
| That's not to take away from the tool -- it looks great. It also
| takes the D out of BYOD, which is a win. But I think "solving" it
| means that anybody who receives an ipynb is able to just look at
| it out of the box, like plain text, so we're still a ways off.
| wasimlorgat wrote:
| Oh, on the topic of file formats: Quarto also lets you do
| plaintext notebooks in quite an interesting way, definitely
| worth checking out:
| https://quarto.org/docs/computations/python.html
| euler_angles wrote:
| The latest release of nbdev has fully embraced Quarto! It's
| very awesome, check it out.
| cycomanic wrote:
| I have been using jupytext as well and it really makes handling
| notebooks much easier. I think the decision for Jupyter to save
| to json was not a good one and they should instead have looked
| at systems like org mode for inspiration.
|
| I also don't understand what you mean by discipline. Yes you
| need to make sure that everyone has the jupytext extension
| installed, but that just becomes part of the needed dev
| environment. After that the whole experience becomes completely
| seamless.
| wasimlorgat wrote:
| I agree re format vs tool complexity. I don't think Jupyter is
| a particularly difficult format though, it's mostly light JSON
| -- all human-readable.
|
| We realised after working with Jupyter+Git for a while that the
| pain-points were actually with Jupyter editors (and/or their
| conventions) rather than the format, because they do things
| like store user-metadata _in the file_ which pollutes diffs and
| leads to merge conflicts.
|
| In fact, if Jupyter editors could handle merge conflicted
| files, we wouldn't need a custom merge driver either.
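To make the point about editor-written metadata concrete, here is a rough sketch of the kind of cleanup a save hook could perform. The function name `strip_volatile` and the choice of which keys to drop are assumptions for illustration, not nbdev's actual behaviour, and the notebook dicts are hand-written minimal examples.

```python
import copy
import json


def strip_volatile(nb: dict) -> dict:
    """Return a copy of a notebook dict with per-session noise removed.

    Hypothetical helper: clears cell-level metadata and execution counts,
    which change between sessions without changing what the notebook says.
    """
    cleaned = copy.deepcopy(nb)
    for cell in cleaned.get("cells", []):
        cell["metadata"] = {}
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None
            for output in cell.get("outputs", []):
                if "execution_count" in output:
                    output["execution_count"] = None
    return cleaned


# Two saves of the same notebook that differ only in editor-written noise.
save_a = {"cells": [{"cell_type": "code", "source": ["x = 1"], "outputs": [],
                     "execution_count": 3, "metadata": {"collapsed": False}}]}
save_b = {"cells": [{"cell_type": "code", "source": ["x = 1"], "outputs": [],
                     "execution_count": 7, "metadata": {"scrolled": True}}]}

same_raw = json.dumps(save_a, sort_keys=True) == json.dumps(save_b, sort_keys=True)
same_clean = strip_volatile(save_a) == strip_volatile(save_b)
print(same_raw, same_clean)
```

With the noise removed the two saves compare equal, so git would see no change at all between them.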
| maegul wrote:
| I feel like the BYODiscipline problem with jupytext could be
| solved with relatively rudimentary text-editor plugins.
|
| I've started rolling my own little plugin utilities and, so
| far, I have a (very) rudimentary notebook-like interface in
| plain text.
|
| Combine a proper attempt at such a thing with a good interface
| to a background ipython kernel system (for which ipython could
| do with some minor enhancements AFAICT), and you'd basically
| have the best of all worlds (all plain text editor features
| including version control and personalisation and
| customisation, and the iterative advantages of notebook code-
| cell-based runtimes).
|
| Hopefully, with such a combination functioning well, there'd be
| an emergent feature that allows one to more easily get
| interactive with a code-base for the purposes of understanding,
| debugging or developing it.
|
| Personally, my biggest gripe with Jupyter at the moment is that
| a few years ago they decided to try to create a quasi-IDE
| (where they'll probably be beaten by VSCode) rather than
| improve the general utility of the kernel (or kernel
| protocol/interface?) and/or the essential notebook UI.
|
| It's a personal gripe, and there's clearly value in the web-
| first interface they've made with JupyterLab (despite the not
| insubstantial growing pains that project has faced), but,
| watching ObservableHQ and Pluto (for Julia) focus just on the
| core notebook interface, while VSCode have focused on the IDE
| side and easily incorporated or recreated the now rather
| old/simple Jupyter Notebook interface, both with success, seem
| like some vindication on my gripe.
| zelphirkalt wrote:
| By "essential notebook UI", do you mean the old notebook or
| the lab interface?
|
| The old notebook was painful to extend in JS compared to
| writing lab extensions in TS. In Jupyter Lab 3 they have
| taken questionable steps, but so far I have been able to work
| around issues.
| maegul wrote:
| I was referring to the notebook-like "part" of the
| interface, where JupyterLab is a notebook interface with
| IDE-like components wrapped around it (file explorer,
| terminal etc).
|
| The ObservableHQ interface, for instance, I'd classify as
| just a notebook interface. IE, individually manipulatable
| code-cells with a shared runtime.
|
| And yea, JupyterLab is better than the classic IMO. But,
| until recently I'd say, the notebook part of the interface
| hasn't gotten much love at all, while there've been steps,
| due to popular demand it seems, to provide alternative UIs
| that strip away much of what they've added on top of the
| notebook (ie, simple mode, and now Jupyter Lite).
|
| I haven't really got experience writing extensions in the
| old Jupyter notebook, and hardly any with JupyterLab, but
| my experience with JupyterLab was frustrating because it
| felt like they really killed the ability to implement small
| and hacky plugins like you could with the old. This
| always struck me as a shame. A necessary one perhaps given
| what I presume is the increased power of their new
| framework. But it always felt like there was a mismatch
| between the complexity of the plugin framework (which is a
| full web-dev experience) and the base features of the
| "product", where customising my text-editor is now much
| easier AFAICT.
|
| What issues and questionable steps were you thinking of?
| zelphirkalt wrote:
| > What issues and questionable steps were you thinking
| of?
|
| 2 things come to mind right now:
|
| Starting with JupyterLab 3 (maybe 3.1), JupyterLab
| removes query arguments from the URL. Query arguments
| were the only way I know, to give arguments from the
| outside of JupyterLab to JupyterLab. Any extension, that
| relies on arguments given from the outside would break,
| just because JupyterLab removes query arguments, which
| were there since the beginning and did not do any harm,
| at least any I could tell. But suddenly this was taken
| away, without proper alternative. Now you have to hook
| into their "router" to quickly grab those arguments,
| before they are gone. This seems silly to me. Why
| randomly delete query arguments? They are there for a
| reason and since JupyterLab does not add any of its own,
| I cannot understand this decision. Simply seems to make
| it less powerful a tool.
|
| The constant nagging about posting in their community JS-
| only forum. ("You should post this in the forum.", "Have
| you seen this post in the forum? _links to forum_") Why
| can this community not handle issues in issues, which can
| be easily found using a search engine? Why hide
| everything behind a JS-only forum, which one has to
| create another account for or associate one's GitHub
| account with? Whenever anyone gives me a link to the
| forum, where supposedly the answer to my question is, I
| keep thinking: "Ahh great, why did you have to hide it in
| there? If you had documented this in an issue, I would
| have found it via search engine and the thing would not
| have wasted my time and neither would I have had to waste
| yours." -- something along those lines. When I find an
| issue and its solution, I still post it as a GitHub issue,
| so that other people can easily find it, without signing
| up to their forum.
|
| > I haven't really got experience writing extensions in
| the old Jupyter notebook [...]
|
| I have done that a few years ago, when JupyterLab was
| still alpha versions. It worked, but the typical JS
| mistakes plagued me. JupyterLab is of course using
| TypeScript, which helps a lot with avoiding silly
| mistakes. However, I do think there is something to what
| you say about no longer encouraging the quick hack. Some
| functionality took years to appear in JupyterLab, but was
| already available for Jupyter Notebook, before JupyterLab
| took off.
| scombridae wrote:
| _Subversion used to say, 'CVS done right.' With that slogan there
| is nowhere you can go. There is no way to do CVS right._ -- Linus
| T.
|
| Jupyter's ipynb format is only slightly more amenable to git than
| say an MSWord doc. Nbdime and friends will never get you to a
| point where git+jupyter will be worth the ugly.
| fragmede wrote:
| _slightly_. Since, like, Office 2007, MSWord docs are zip files
| with xml inside.
| jph00 wrote:
| What are the outstanding problems you feel are there even with
| the new nbdev2 functionality? Since I've been using it (the
| prerelease version) over the last few months I haven't come
| across a single problem, personally, despite doing a very large
| amount of collaborative notebook work.
| Helmut10001 wrote:
| Not criticizing the author's approach, but the Jupyter+Git problem
| has been solved for a long time with Jupytext [1].
|
| Jupytext will convert Notebooks (.ipynb) files to Markdown (md)
| and Python (py) 'on the fly' (while working in Notebooks).
|
| - Markdown files can be added to git
|
| - Python and .ipynb files are added to .gitignore
|
| - Python files allow 'chained' import of notebooks (*.py
| versions), which allows splitting larger notebooks into multiple
| smaller ones
|
| This is my folder structure:
|
|   .
|   +-- notebooks
|   |   +-- notebook1.ipynb   # automatically generated from md
|   |   +-- notebook2.ipynb   # automatically generated from md
|   +-- md
|   |   +-- notebook1.md      # versioned in git
|   |   +-- notebook2.md      # versioned in git
|   +-- py
|   |   +-- modules
|   |   |   +-- __init__.py   # empty
|   |   |   +-- tools.py      # use for cross-project base tools
|   |   +-- __init__.py       # empty
|   |   +-- notebook1.py      # automatically generated from md
|   |   +-- notebook2.py      # automatically generated from md
|   +-- jupytext.toml
|   +-- .git
|   +-- README.md
|
| See an example here [2]
|
| Jupytext is mentioned as a 'potential' alternative. Re the "save"
| cell output: I usually produce html-files at the end of my
| notebooks (see the example), and add those either to git or auto-
| upload to an external webserver. The html is standalone and
| includes outputs, table of contents, and images (example [3]). I
| would advise against versioning all outputs (images) in git.
|
| Very happy with this approach for a long time now. Jupytext
| increased my productivity by a hundred percent.
|
| [1]: https://github.com/mwouts/jupytext
|
| [2]: https://gitlab.vgiscience.de/ad/yfcc_gridagg
|
| [3]: https://ad.vgiscience.org/tagmaps-mapnik-
| jupyter/01_mapnik-t...
| jph00 wrote:
| The pros and cons of Jupytext are discussed in the linked post.
| It's a great approach, but wasn't sufficient for our needs --
| so for us, at least, it didn't fully solve the Jupyter+git
| problem.
|
| Specifically, it doesn't handle the situation where you need
| cell outputs in version control -- since in that case, you
| still need the notebook, which results in all the usual
| problems occurring. With nbdev2, you don't need to think about
| anything or do anything special, and stuff like GitHub notebook
| rendering, nbviewer, ReviewNB, etc all just work. You just run
| a single command (`nb_install_hooks`) and that's it.
|
| Also, no-one has to install anything extra to view your
| notebooks, since they're stored in the regular notebook format.
| cycomanic wrote:
| I'm not sure what your cell outputs are, but if you are doing
| plots or images inside your notebook, then I agree with the
| OP that it is generally not a good idea. You now store binary
| data inside your git repository (which sometimes just carries
| its own problems), but worse that binary data is mixed into
| your text diff.
|
| If you do a diff between two revisions where some figure
| changed you essentially will be swamped by the diff in the
| figure making it difficult to find what actually changed. Now
| tools like nbreview get around that, but now you're forcing
| everyone to use the same dev tools, and can't look at diffs
| any other way really.
| glenngillen wrote:
| It's been a while, but last time I needed to, GitHub at
| least had really great tooling for diffs between versions
| of image files.
|
| > but now you're forcing everyone to use the same dev tools
|
| No they're not. You can continue using whatever approach
| you're using. Attempting to shut down alternatives like
| this though could be seen as forcing everyone to accept
| whatever the status quo and lowest common denominator
| solution, even if their dev tools could support something
| better.
| wodenokoto wrote:
| This is mentioned in the article, with both pros and cons
| critiqued.
| wasimlorgat wrote:
| Jupytext does a lot more than just fix Jupyter/git integration,
| which is great if you want to adopt its approach, but a bit too
| heavy IMO if you don't. The approach mentioned here is
| extremely lightweight and doesn't use too much more than built-
| in Jupyter/git functionality (and it all happens automatically
| behind the scenes)
| spiim wrote:
| I find this conversion a little bit clunky. The approach that
| seems to work for me is to use quarto with its .qmd format.
| https://quarto.org/
| g8oz wrote:
| Quarto looks amazing!
| ellisv wrote:
| It's pretty nice. At first I thought it was just a
| rebranding of R Markdown but it's been decently
| modernized/improved to the point where it makes sense that
| it is its own, separate thing.
| da39a3ee wrote:
| > Here at fast.ai we use Jupyter for everything. All our tests,
| documentation, and module source code for all of our many
| libraries is entirely developed in notebooks
|
| That sounds like a nightmare. Why would you want to develop a
| library in a jupyter notebook?
|
| > The solution presented here is the result of years of work by
| many people.
|
| It's a bit depressing that it came to this. It's hard not to
| think that it was a mistake from the beginning and that the
| format should have been based on using special comment markers in
| valid code, together with an accompanying JSON metadata file. Or
| something like that. One way or another, we have a very strong
| tradition of storing code in plain text files, not embedded in
| strings in JSON or otherwise embedded in any opaque format. Maybe
| there'll come a day when it's appropriate to abandon that to get
| some advantages, but I don't think that day was the original
| creation of Jupyter. I know it was created by thoughtful and
| expert software engineers, but I feel that it was a mistake and
| it's actually made a lot of data science / academia-oriented
| people less qualified to participate in industry software
| engineering, because of the poor practices forced upon them by
| the inability to use git with Jupyter, and notions like
| developing library code in notebook cells.
| jayd16 wrote:
| You know, is there anything like the Language Server Protocol for
| diff/merge resolution? Seems like there's an opportunity to build
| a system for semantic aware merge that's language/format aware
| and tool agnostic (and auto configurable to boot).
|
| Suddenly binary formats could become mergeable.
| zie wrote:
| See [Pijul](https://pijul.org/manual/theory.html). They did
| that hard work.
| samatman wrote:
| Hmm I'm a huge fan of pijul, looks like the future of change
| management from where I'm sitting, but no: they have not.
|
| Semantic diffing needs something like pijul, but a system
| taking advantage of this doesn't yet exist. Pijul avoids some
| merge conflicts by design, won't do the wrong thing, and
| handles conflicts correctly: we still need tools with a
| fuller awareness of what strings _mean_ to have rich semantic
| diffs.
| zie wrote:
| True, Pijul only offers the safe diff/patch part.
| medo-bear wrote:
| or ... why not just use org mode?
| faustlast wrote:
| I wish org-mode was standard and more appreciated. It is so
| good, but I feel that I'm the only one using it and it is hard
| to sell emacs to others.
| hendry wrote:
| Is there a good place to see Jupyter notebooks solving a real
| problem?
|
| Idk, like importing some data and doing some analysis /
| forecasting?
|
| Most notebooks appear really bad quality. Worse internally.
|
| Better off looking at some excel
| https://github.com/martinshkreli/models
| throwaway72937 wrote:
| Does this work for editing notebooks in VS code? (Unclear to me
| where the saving hooks reside, and whether you have to edit them
| through Jupyter labs/notebook) Any issue if the notebooks reside
| on a remote server?
| planede wrote:
| Is it possible to get a "diff3"-like conflict style? That is
| showing
|   <<< side1
|   ||| ancestor
|   === side2
|   >>>
| cigrainger wrote:
| This is one reason I feel lucky to be working with Elixir.
| Livebook's livemd is basically just markdown.
| https://livebook.dev
| wodenokoto wrote:
| So is jupytext, rmd and qmd. But what do you do about the
| output?
|
| The nice thing about markdown-like notebooks is that they play
| well with git. The nice thing about jupyter style notebooks is
| that they contain all the content needed to actually _read_ the
| notebook.
| cs702 wrote:
| Wow, thank you to the authors!
|
| It looks like this tool, _nbdev2_, solves a real-world problem
| for Jupyter users, including me, with _zero effort_ required to
| use it every day. It relies on clever hooks to get git to treat
| cells as first-class citizens (as opposed to lines of text, the
| default). Nice! Based on that alone, I would expect _nbdev2_ to
| be widely adopted over time. In fact, if it works as well as
| advertised, it should be incorporated into Jupyter. I, for one,
| will be giving it a try!
|
| If you use Jupyter to solve problems in your domain of expertise,
| feel free to ignore all the smart-sounding software engineers who
| will pooh-pooh this tool _only_ because they don't like notebooks
| and don't want anyone to use them. No matter what you do, there
| will _always_ be people who look down on easy-to-use tools that
| enable scientists and practitioners from other disciplines to
| write, run, and explore ad-hoc code on-the-fly.
|
| EDIT: _nbdev2_'s authors are on this page, answering questions.
| Thank you again!
| xcambar wrote:
| Stating that git breaks Jupyter notebooks is quite a flex.
|
| It stains the article from the very first paragraph.
| [deleted]
| wasimlorgat wrote:
| Have you worked with Jupyter notebooks and git? It's a
| literally true statement :D and quite a struggle for many of us
| xcambar wrote:
| If you leave git diffs in your files, whether Jupyter
| notebooks or otherwise, and run/compile them... They will
| break.
|
| If you give me a counter example, good for you, but my
| statement holds true 99%.
| jsweojtj wrote:
| You state in the top level comment that this claim stains
| the article: "Stating that git breaks Jupyter notebooks is
| quite a flex."
|
| But you are saying here: "If you leave git diffs in your
| files, whether Jupyter notebooks or otherwise, and
| run/compile them... They will break."
|
| Have you changed your mind in this thread? Or what's your
| objection?
| xcambar wrote:
| I'm suggesting that git only breaks Jupyter notebooks (or
| anything else) if you do not know what to expect from
| git.
|
| But if you don't know that git modifies files when
| conflicts occur, then you're an interesting and rather
| unexpected audience, I assume.
|
| Meaning that for the typical git user, meaning, knowing
| about git diffs, the behavior is expected hence not
| broken. The files end up in an expected broken state, but
| git does not break them per se.
|
| If you still disagree, let's just settle that we disagree
| and be done with it.
| cycomanic wrote:
| It's the wrong way around though, Jupyter notebooks break a
| git work flow. I think the fault here is completely with the
| design of the Jupyter notebook file format (and the way
| editors save to it).
|
| I think it's quite unfortunate that they did not consider
| that the format would integrate well with version control
| systems when first designing ipython notebooks.
| fumeux_fume wrote:
| Nah man, you got it backwards. Git still works just fine
| while my notebooks are definitely broken. Not here to play
| the blame game, just trying to relate the practical
| results.
| persedes wrote:
| No mention of https://github.com/srstevenson/nb-clean ?
|
| Has been my go-to for this. It seems like nbdev2 is fast.ai's own
| home-cooked solution with a bunch of other tools.
| wasimlorgat wrote:
| Hi, I'm the author of the git merge driver and Jupyter save hook
| in nbdev2 :) I'd be happy to answer any questions you have about
| how we're handling using notebooks with git
| howon92 wrote:
| I enjoyed reading the writeup and think the solution is clean!
| Thanks for sharing
| jks wrote:
| Can this do three-way merge? If I have to resolve two
| conflicting code blocks, it is often useful to know how each of
| them change the code from the shared parent.
| wasimlorgat wrote:
| It does an ordinary three-way git merge (treating notebooks
| as plaintext) then a two-way merge on conflicted bits. We
| opted for that approach because it's incredibly simple and has
| worked perfectly for us (I think since we tend to work with
| small code cells). I think nbdime has a full-on three-way
| notebook merge if that's what you need, which can be used
| together with nbdev's Jupyter save hook to clean up unneeded
| metadata.
| p1necone wrote:
| I haven't used Jupyter but from what I can gather from this
| article they've built a simultaneous editing system on top of
| automatically committing to git in the background as multiple
| people edit things, and using that to share the changes between
| users.
|
| Do I have that right? Because that sounds /insane/.
| anigbrowl wrote:
| If this doesn't work for you JetBrains' DataSpell might, it's
| oriented towards notebooks for teams. It has hiccups, things
| like ipywidgets don't always work as expected so I sometimes
| find myself falling back to Jupyterlab. But overall it's a very
| comfy chair.
| jph00 wrote:
| No it just uses normal git in the normal way. The simple trick
| is to use a jupyter-native git merge driver, so that merges are
| done at a cell level instead of a line level.
|
| Also, unneeded metadata is removed from the notebook when
| saving, so there are fewer changes to merge.
|
| Both these two things are done using standard hooks built into
| each of git and Jupyter. That is: git is written in such a way
| that it can fully support non line-oriented formats. We just
| took advantage of that capability.
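The idea of merging at cell level rather than line level can be sketched roughly as follows. This is an illustration under simplifying assumptions, not nbdev's merge driver: cells are reduced to plain source strings, `merge_cells` is a made-up name, and with no common ancestor any difference between the two sides is treated as a conflict.

```python
from difflib import SequenceMatcher


def merge_cells(ours: list[str], theirs: list[str]) -> list[str]:
    """Two-way merge of cell sources; conflicts are wrapped in marker cells."""
    merged: list[str] = []
    matcher = SequenceMatcher(a=ours, b=theirs, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            merged.extend(ours[i1:i2])
        else:
            # Keep both sides as whole cells, fenced by marker cells,
            # so the result is still a well-formed list of cells.
            merged.append("<<<<<<< ours")
            merged.extend(ours[i1:i2])
            merged.append("=======")
            merged.extend(theirs[j1:j2])
            merged.append(">>>>>>> theirs")
    return merged


ours = ["import pandas as pd", "df = pd.read_csv('a.csv')", "df.head()"]
theirs = ["import pandas as pd", "df = pd.read_csv('b.csv')", "df.head()"]

for cell in merge_cells(ours, theirs):
    print(repr(cell))
```

Because the markers are themselves cells, a notebook assembled from the merged list would still be valid JSON and could be opened and fixed in the notebook UI.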
| p1necone wrote:
| Ahh right, so you still make manual git commits normally,
| it's just that the Jupyter UI used to fall over when it
| encountered merge conflict markers in source files. And now
| it doesn't fall over any more and can nicely represent them
| because the conflict markers are no longer done for
| individual lines of text?
| jph00 wrote:
| Yes exactly :D
| bsdz wrote:
| I use this plugin for my jupyter notebook git integration. It has
| a git diff option that's useful but gets very slow for complex
| documents. Perhaps under the hood it's using one of the other
| tools mentioned in the postscript.
|
| https://github.com/jupyterlab/jupyterlab-git
|
| Edit: Looking at the source, it does appear to use nbdime under
| the hood.
| wanderingmind wrote:
| I thought jupytext solved it long ago with percent formatted
| python file. Since it's a Python text file you can run automated
| formatting, linting, static type checking and git version diff.
| What new problem is being solved here?
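For readers who have not seen the percent format: cells are delimited by `# %%` comment lines in an otherwise ordinary Python file. The sketch below is a deliberately simplified reader, not Jupytext's parser; `split_percent_cells` is a hypothetical name, and text before the first marker is simply dropped.

```python
def split_percent_cells(text: str) -> list[dict]:
    """Split a '# %%'-delimited script into a list of simple cell dicts."""
    cells: list[dict] = []
    current = None
    for line in text.splitlines():
        if line.startswith("# %%"):
            # A marker line opens a new cell; '[markdown]' flags a text cell.
            kind = "markdown" if "[markdown]" in line else "code"
            current = {"cell_type": kind, "source": []}
            cells.append(current)
        elif current is not None:
            current["source"].append(line)
    return cells


script = """# %% [markdown]
# # Load the data
# %%
import math
radius = 2
# %%
area = math.pi * radius ** 2
"""

for cell in split_percent_cells(script):
    print(cell["cell_type"], cell["source"])
```

Since the file is plain Python, line-based diffs and linters work on it directly; what it does not carry is cell output, which is the gap discussed elsewhere in the thread.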
| liquids wrote:
| I've used this library for a number of projects and it's a joy to
| use. I don't think it's an overstatement to say it's paradigm
| shifting - to the extent that once you have your environment set
| up, you are free to code, think, iterate, deploy and document
| your projects all at 99% of the speed of thought.
|
| There seems to be a lot of discussion in here around the pitfalls
| of jupyter, and notebooks, and the poor coding practices of data
| scientists. If you haven't read the article or used the software
| I'd like to highlight that all of these (legitimate) complaints
| are exactly what nbdev2 was created to address, and in my opinion
| very successfully solves.
|
| The way it works is that everything runs off a master notebook,
| and then with one command: libraries are built, git diffs are
| magically fixed, tests are run, documentation is automatically
| created. It doesn't fundamentally change your workflow in any
| way, it just abstracts and automates away all of these pain
| points.
|
| There's a reason that everyone uses jupyter notebooks. They are
| fun to use, they are great for exploring and developing ideas.
| And (minus the aforementioned git collaboration issues) they are
| great for sharing with others, which is a huge part of the wider
| data science ecosystem. We don't need to recommend avoiding
| notebooks, and allege they are just for beginners. We need to use
| tooling which addresses some of these final issues with writing
| mature software. And I'd like to thank the authors of nbdev for
| this.
|
| The people who look down their noses at notebooks can continue to
| do so - but what they will find is that nbdev quite effortlessly
| leap-frogs over these sneered complaints, and allows you to write
| better software more productively.
| no_identd wrote:
| Okay, now gimme a PowerShell version of it.
| boringg wrote:
| This actually works? Awesome - never really thought about how
| dysfunctional git is with jupyter - I always assumed that it just
| didn't work. Nice to have someone fix the problem that I just
| lived with :)
| HuwFulcher wrote:
| Whenever I can I strongly recommend not using Jupyter for
| anything more than the most transient tasks.
|
| I don't know whether it's the Data Science culture or Jupyter but
| there is a big lack of discipline in writing maintainable code in
| DS and non-existent git support is part of that.
|
| I always strongly discouraged developing models using notebooks,
| instead advocating for using .py files and then using notebooks
| for sanity checking data.
|
| I don't have any clever ideas for how we can move past Jupyter
| but the sooner we do the better.
| maegul wrote:
| Yes, the Data Science culture around maintainable code does
| seem to be reaching a critical level of toxicity (in some
| environments at least).
|
| In line with a nephew comment of mine, I feel that bringing the
| immediate interactivity or iteration cycle of notebooks to the
| development experience would help a lot, and not be too bad a
| thing for common development either.
|
| I've heard of the related nbdev project, which seems like an
| interesting and compelling idea. But it'd be nice to see the
| reverse: something that makes ordinary python development more
| immediate than using a debugger/vanilla REPL.
| Leo_Germond wrote:
| I think that improving the shell experience and allowing e.g.
| multimedia content to be displayed and manipulated directly
| into the shell, would help a lot with interactivity. Maybe
| some specific terminal emulator (like kitty) with ipython
| would constitute a good starting point...
| tetris11 wrote:
| It depends how you use it. When you're still new to the data,
| using a freeflow text-and-codeblock workflow like jupyter or
| org-mode really speeds up the exploration phase.
|
| Once you have a consistent set of questions, and methods to
| answer them, then yes, copy off the relevant chunks into their
| own scripts and source these when using similar data to bring
| you up to speed, and modify them to your tastes.
|
| The issue with starting off with an external script initially
| is the distracting temptation to refine your code so it can be
| better used with future data, despite not yet having seen or
| not knowing what that future data is like. The initial "play
| and explore" phase of an analysis is very important imo, and
| notebooks really facilitate that.
| HuwFulcher wrote:
| I agree, Jupyter has its place in helping with exploration and
| learning. A problem that Data Science faces is that the
| majority of courses don't show Data Scientists how to
| progress on from notebooks to write robust training pipelines
| that are reproducible and safe.
| cantagi wrote:
| Yes, people writing unmaintainable code in Jupyter notebooks is
| a problem.
|
| Personally, I start every notebook with
|   %load_ext autoreload
|   %autoreload 2
|
| then develop production quality code in .py files.
| etrautmann wrote:
| I didn't realize anyone didn't do this. Totally essential,
| great point!
| shapefrog wrote:
| Well that has improved my life - thanks!
| pplonski86 wrote:
| Low quality Data Science code is not a fault of Jupyter.
|
| Jupyter allows you to load a big chunk of data or some large
| model only once, and then use it for experiments in other
| cells. It is hard to replace this feature with plain `*.py`
| file. For me, this is the killer feature.
| mFixman wrote:
| Jupyter is not great for collaboration with multiple people
| editing, but with a little bit of order it's perfect for in-
| person working and presenting that work.
|
| Notebooks can be clean if you follow some rules:
|
| 1. Code flow always goes down: holding Option+Enter should
| execute all fields without any errors. Don't do `x += 1` if `x`
| is defined underneath.
|
| 2. All blocks are idempotent: running any block 5 times should
| produce the same result as running it 1 time. Don't do `x += 1`
| unless `x` is defined in that block.
|
| 3. Keep block-local variables short and block-global variables
| long. Don't do `x += 1` unless you are not using `x` anywhere
| else.
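|
| A minimal sketch of rule 2, with plain functions standing in for
| notebook cells (the names `bad_cell` and `good_cell` are
| hypothetical). Rerunning the first one changes the result each
| time; rerunning the second does not.

```python
# Shared state defined in some other "cell".
x = 0

def bad_cell():
    """Not idempotent: mutates a variable it did not define."""
    global x
    x += 1
    return x

def good_cell():
    """Idempotent: defines everything it mutates, so reruns agree."""
    x_local = 0
    x_local += 1
    return x_local

bad_runs = [bad_cell() for _ in range(5)]
good_runs = [good_cell() for _ in range(5)]
print(bad_runs)
print(good_runs)
```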
|
| Also, the Table of Contents extension [1] is a life-saver for
| making long analyses workable.
|
| [1] https://jupyter-contrib-nbextensions.readthedocs.io/en/lates...
| analog31 wrote:
| I have a rule that helps with hidden state and out-of-order
| execution. Once in a while I do a "restart kernel and run all
| cells." If doing that breaks anything, then I have to fix it.
| But it also ensures that a notebook is reproducible later on.
| Of course I don't have things that take hours to run.
|
| It would be nice if there were something that would make out-
| of-order problems light up, the way that code editors can
| highlight errors while you're editing. A limitation of
| "browser as editor" is that it misses out on some of the
| powerful things that code editors do today.
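|
| One crude way to make such problems visible, assuming the saved
| .ipynb JSON is available: flag code cells whose stored
| `execution_count` is not larger than that of every executed cell
| above them. This is only a heuristic sketch, not an existing
| Jupyter feature, and the helper name `out_of_order_cells` is
| hypothetical.

```python
import json

def out_of_order_cells(notebook_json):
    """Return indexes of code cells that ran before a cell placed above them."""
    nb = json.loads(notebook_json)
    suspicious = []
    last_count = 0
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        count = cell.get("execution_count")
        if count is None:            # never executed, nothing to compare
            continue
        if count <= last_count:      # executed earlier than a cell above it
            suspicious.append(index)
        last_count = max(last_count, count)
    return suspicious

# A tiny in-memory notebook: the last cell was run before the one above it.
demo = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "execution_count": 1, "source": "x = 1",
         "metadata": {}, "outputs": []},
        {"cell_type": "markdown", "source": "notes", "metadata": {}},
        {"cell_type": "code", "execution_count": 3, "source": "y = x + 1",
         "metadata": {}, "outputs": []},
        {"cell_type": "code", "execution_count": 2, "source": "x += 1",
         "metadata": {}, "outputs": []},
    ],
})
flagged = out_of_order_cells(demo)
print(flagged)
```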
|
| Another thing is to put things in functions, so temporary
| variables are disposed of. That's a halfway step to putting
| things in .py files. A benefit of .py files is not always
| that jupyter is bad, but that variable scoping is good
| hygiene.
| n8henrie wrote:
| I try to follow these ideas, and in many of my notebooks I
| frequently test them by occasionally running "restart kernel
| and run all cells" from the menu, which tends to point out
| anything I've accidentally moved or run out of order.
|
| As a table of contents, I usually write some markdown up top
| with links to markdown HTML anchors elsewhere in the page,
| which themselves also have a link back to the TOC. Works
| pretty well. Will have to check out that extension.
| jononor wrote:
| Good rules. I will add some of mine:
|
| 4. Use a function in each block: define it, then call it with
| appropriate arguments and have it return the values of the
| block. This prevents proliferation of global state - and
| the functions are really easy to move out to .py modules when
| things have solidified a bit.
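|
| A short sketch of rule 4, assuming a hypothetical `summarize`
| function as the body of one block. Only the returned value
| reaches the notebook's global namespace; the temporaries do not.

```python
def summarize(values):
    """One block's worth of work: temporaries stay local to the call."""
    total = sum(values)    # temporary, discarded when the call returns
    count = len(values)    # temporary, discarded when the call returns
    return total / count

mean_reading = summarize([2.0, 4.0, 9.0])
print(mean_reading)

# The temporaries never became global names.
leaked = [name for name in ("total", "count") if name in globals()]
print(leaked)
```

| Moving `summarize` into a .py module later is a cut and paste.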
| z3c0 wrote:
| Agreed on all points. Notebooks really aren't that hard to
| maintain. They just require some slightly different rules
| from standard scripts.
|
| Personally, I like to label block-global variables in capital
| case (like PEP8 constants), so as to make them easy to spot.
| Being formatted like constants also causes me to think twice
| about altering it after instantiation.
| cinntaile wrote:
| It would be great to have tools available that force these
| rules on you.
| da39a3ee wrote:
| > with a little bit of order it's perfect for in-person
| working
|
| It's not perfect for in-person working because a single
| person should always keep their work under version control,
| and they should be able to view meaningful diffs to
| understand the history.
| tuukkah wrote:
| > _non-existent git support_
|
| From the beginning of the article: " _With nbdev2, the
| Jupyter+git problem has been totally solved. It provides a set
| of hooks which provide clean git diffs, solve most git
| conflicts automatically, and ensure that any remaining
| conflicts can be resolved entirely within the standard Jupyter
| notebook environment._ "
| qsort wrote:
| One of my most upvoted comments says something to the effect of
| "notebooks bad", so you're preaching to the choir here --
| however:
|
| - I work with several people who are purely data scientists,
| and I lean on "culture" rather than "Jupyter". In some circles,
| probably influenced by academia, programming is considered to
| be low status work. You are not going to solve the problem by
| switching to .py files, even though for most tasks _literally
| anything_ is better than Jupyter.
|
| That they don't use git, or that git wasn't originally even a
| concern, is a consequence of that low-status perception. If you
| pitched something to the developer community and told them "oh,
| by the way, you can't use git", they'd synthesize tomatoes out
| of thin air to throw at you.
|
| - I'd carve an exception for stuff that satisfies ALL of the
| following: (a) is self-contained in a single notebook, (b) has
| no dependencies on anything non-standard, (c) is demonstrative
| in nature, or a personal exercise rather than production
| software. For example, I wrote my solutions to the Advent of
| Code problems in a notebook and I liked the experience,
| especially how you could mix math and code.
| HuwFulcher wrote:
| I would also lean more towards culture and the environment
| that Jupyter provides only perpetuates it. I think what made
| me leave Data Science in the end was that I wasn't driven by
| work outside of the notebooks (i.e. coming up with a
| mathematically superior solution) but driven by building ML
| driven systems as a whole.
|
| I think notebooks are a great way of presenting findings and
| showing your workings at the same time. If that was their
| main use then I wouldn't have any issues
| gradschoolfail wrote:
| I would say that in academia, explorability and immediacy are
| heavily prioritized over reproducibility and maintainability.
| Two kinds of human beings?
| qsort wrote:
| The incentives in academia are different. The objective is
| to publish, code is important only insofar as it allows you
| to achieve that goal. This is not to say that academics
| can't code, but even if you are a professor who cares
| passionately about making high-quality software, you're
| fighting uphill, because that's not what you're being
| evaluated on.
|
| If you want to make the argument about "two kinds of
| people", I think it's more about A-type/B-type data
| scientists in the industry. I'm really mostly a developer
| and not a data scientist, but when I assist in DS tasks I
| wear a distinct B-type hat, and that informs my
| perspective. A-type people have different priorities and
| that's fine; my gripe is when you try to import A-type
| practices in a B-type scenario.
| wasimlorgat wrote:
| I'm always surprised when people advocate for .py files over
| notebooks because of poor software practice. (Genuine question)
| have you found that it improves the situation at all?
| HuwFulcher wrote:
| I've found varied success. In general, I've encouraged the
| move across as a way of teaching source control. That has been
| in contexts where notebooks are being used for critical
| outputs rather than exploration.
|
| When you get into MLOps as well, having .py templates
| actually makes the Data Scientist's job easier as they can
| plug and play their models into a system that tracks inputs,
| outputs and changes for them
| jhrmnn wrote:
| I think of Jupyter Notebooks as scratch paper on my desk. It's
| not to archive things, it's for developing ideas. Once ideas
| are developed, I transfer them to a long-term medium (LaTeX or
| Markdown document, Python source file, etc).
| ajford wrote:
| Yep. I worked in scientific applications, and when developing
| some new data cleaning and processing pipelines for our
| hydrology data, Jupyter was phenomenal.
|
| It was easy to use as a presentation, with figures and plots
| embedded. With controls enabled, you could demonstrate what
| varying certain parameters would do and pitch proposed
| cleaning profiles.
|
| I was rather easily able to send a directory and its
| notebooks/data sources to colleagues in the water sciences
| team so they could validate my results on their own (they
| were luckily also familiar with Python and Jupyter), and
| caught some minor bugs in the pipeline.
|
| This was all much more collaborative and concise, and I feel
| Jupyter played a huge part in it.
|
| Once it was done, it and a "final draft" pdf were added to
| the Docs in the repo and the pipeline was written out into a
| full application of its own.
| moonshotideas wrote:
| Same, it's perfect for "work in progress code", and working
| out a problem step by step. I've always wanted this
| environment in other languages
| jstx1 wrote:
| My workflow is:
|
| 1. Experiments in notebooks. Notebooks are saved under git but
| mostly as a backup, I don't care how nicely they play together.
| I don't get why you would discourage notebooks for running
| experiments, doing it with .py files sounds kind of miserable.
|
| 2. Services and library code in .py files, under version
| control, just like any other software we write.
| HuwFulcher wrote:
| Experiments using notebooks are fine as long as they are well
| documented.
|
| Having your services and library code as .py files you can
| import in is great.
|
| The issue comes with how to move from experimentation to
| deployment. If you already have services/library code as .py
| files you make your life a lot easier. The issue comes when
| everything is spread across multiple, poorly documented
| notebooks. If you're working with an MLOps team it makes
| their life a nightmare to take those notebooks and conform
| them into something usable.
|
| Jupyter is great when it is used in the right way.
| targafarian wrote:
| 100% agreed.
|
| People use a great tool in a poor way and then broadly
| condemn the tool.
|
| And any tool that is sufficiently flexible to be broadly
| useful can be used in very poor ways.
|
| Jupyter is great, it gets me over the barrier potential for
| starting a task every time. I build and prove out an
| algorithm/task piece by piece. Once I'm happy, I move the
| meat of it to a function in a .py file, and move the code I
| used to test the algorithm to a unit test function. Delete
| the duplicated bits and replace with imports, and then what
| remains is a tutorial/demonstrator notebook using the
| function I wrote and maybe some nice plots to go along with
| that, that I wouldn't put in a unit test (nor that show up
| in docstrings). This can be converted to sphinx docs if the
| code gets big enough.
|
| What a great tool for incrementally building software! In
| my world, I build brick by brick, not all at once. Jupyter
| is a key to that process.
| montebicyclelo wrote:
| The big benefit of Jupyter in the context of machine learning,
| is that you are often dealing with models that take quite a few
| seconds to load. You can put big, slow loading, things into
| memory in the top cells, then try a bunch of logic with them
| below. Whereas when working with just '.py' scripts, you'd have
| to reload the model every time, which can make for slow and
| uncomfortable iteration.
| infinityio wrote:
| not a complete solution, but PyCharm and VSCode both support
| using `# %%` to split a python script into 'cells' (stolen
| from matlab?), which can then be executed individually/repeatedly
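|
| A small sketch of what such a script looks like. The data and
| variable names are made up for illustration. Run as a plain
| script it executes top to bottom; in an editor that understands
| the markers, each `# %%` section can be run on its own.

```python
# %% Load data (run once; the value stays in the interactive session)
readings = [3, 1, 4, 1, 5, 9, 2, 6]

# %% Explore (rerun this cell as often as needed)
largest = max(readings)
mean = sum(readings) / len(readings)

# %% Report
summary = f"max={largest}, mean={mean:.3f}"
print(summary)
```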
| nidnogg wrote:
| One alternative to loading models in .py scripts is making
| use of joblib's dump() and load() methods for pipelines.
| https://joblib.readthedocs.io/en/latest/generated/joblib.dum....
|
| That way, if you put your classifiers in joblib pipelines,
| once you're done with fitting steps you can just export your
| trained classifier with: joblib.dump(pipe,
| "trained_classifier.dump")
|
| And resume your work with:
| joblib.load("trained_classifier.dump")
|
| Considering this works for any Python object, a lot of heavy
| lifting can be exported for later (swift) use this way.
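|
| The same compute-once, reload-later idea can be sketched with the
| standard library's pickle, swapped in here for joblib so the
| example has no third-party dependency. The helper
| `load_or_compute` and the fake `expensive_fit` step are
| hypothetical. As with any pickle-based format, only load files
| you trust.

```python
import pickle
import tempfile
from pathlib import Path

def load_or_compute(cache_path, compute):
    """Return the cached object if present, otherwise compute and cache it."""
    if cache_path.exists():
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    result = compute()
    with cache_path.open("wb") as fh:
        pickle.dump(result, fh)
    return result

calls = []

def expensive_fit():
    # Hypothetical stand-in for a slow training step.
    calls.append(1)
    return {"weights": [0.1, 0.2, 0.3]}

cache_file = Path(tempfile.mkdtemp()) / "trained_classifier.pkl"
first = load_or_compute(cache_file, expensive_fit)    # computes and saves
second = load_or_compute(cache_file, expensive_fit)   # loads from disk
print(first == second, len(calls))
```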
| tcpekin wrote:
| The way I get around this is to start an IPython interpreter,
| and run .py files with `run -i file1.py`. This loads things
| into memory in the interpreter, and then I can run file2.py
| with the actual analysis, and iterate with file2.py until I'm
| happy. In the end, you can keep the files separate, or
| combine them into 1 file that will run top to bottom your
| whole analysis. As long as you keep the IPython session open
| everything remains in memory, just like in a notebook. The
| autoreload magic also works if you set it to the correct
| option, so if you are working on a library/package it will
| automatically reload them if necessary.
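|
| The effect of `run -i` (running a file inside the session's
| existing namespace) can be approximated outside IPython with the
| standard library's runpy. This is only an analogy under stated
| assumptions: the file contents below are invented, and runpy is
| not what IPython uses internally.

```python
import runpy
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
load_script = workdir / "file1.py"
analysis_script = workdir / "file2.py"

# file1.py does the slow loading; file2.py holds the analysis being iterated on.
load_script.write_text("data = list(range(1, 101))\n")
analysis_script.write_text("result = sum(data) / len(data)\n")

# Run the loading script once and keep the names it defined.
loaded = runpy.run_path(str(load_script))
shared = {k: v for k, v in loaded.items() if not k.startswith("__")}

# Run the analysis against those names; repeat this step after editing file2.py.
outcome = runpy.run_path(str(analysis_script), init_globals=shared)
print(outcome["result"])
```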
| HuwFulcher wrote:
| Yes that's a big plus of notebooks. Hopefully a solution can
| be found for .py files in future where you can earmark the
| top part of the script to be cached so the interpreter skips
| over it
| akx wrote:
| Where would an interpreter cache things if it's not running
| anymore? The disk? You're back to loading data from disk.
| HuwFulcher wrote:
| Yep, I don't know enough about the interpreter under the
| hood but an interactive mode like a debugger where you
| can go back to a previous line, etc might be the
| solution. I doubt that's high on the priorities of the
| Python team though.
| bobbruno wrote:
| You're thinking about it the wrong way. Notebooks don't do well in
| software development, but they are extremely useful on
| exploratory data analysis and quick iteration when searching
| for a suitable modeling approach. These two tasks use code, but
| for completely different purposes. A DS is working on the data,
| understanding it and trying to identify what information it may
| have. Then they try to find a model that will leverage that
| information to deliver whatever inference solves the business
| need. This is extremely interactive and iterative, and
| everything from the actual business problem to the ML approach
| may change at each iteration. Imposing software development
| practices at this point is disruptive to the train of thought,
| which is very burdened already by the level of uncertainty and
| all the mathematics required to understand the data results.
| The goal is to find a viable approach, not write production
| code.
|
| Once this approach is found, a good clean-up/refactor is
| strongly recommended, to then start a proper software
| development that will create a live product from the found
| approach. I call this the switch between research mode and
| development mode, and it has strong parallels to the way R&D is
| done in many industries. I believe a lack of understanding of
| this dual nature of ML is what causes many of the problems in
| MLOps: plans that don't take into account the research time and
| risk, mixed teams where engineers don't understand the initial
| nature of DS work, attempts to put notebooks containing
| research code in production, etc. Even planning for the
| refactor doesn't solve it all - what will happen when the next
| generation of a model has to be created? Will the refactored
| code be forced on the DS and ruin their research productivity?
| Will they start from scratch again and not only lose all the
| refactor/dev cost but also make this a recurring cost? I have
| been looking for answers for this for years now, and found none
| so far.
|
| Source: I've been working with data for 27 years, as a data
| engineer, data architect and data scientist. When I do DE, my
| code is considered high quality by my peers, but when I'm doing
| DS research, I know I write bad code - and I won't change that.
| It's more productive to work this way and do the big refactor
| (possibly leaving the notebook env behind along the way) than
| the alternative.
| scombridae wrote:
| _not using Jupyter for anything more than the most transient
| tasks_
|
| While most programmers have reached this conclusion, they're
| generally not day-in day-out jupyter users. They need to
| understand *everything* is transient for scientists who
| optimize for proof-of-concept and publish-and-forget-it paper
| writing.
| frumiousirc wrote:
| > *everything* is transient for scientists who optimize for
| proof-of-concept and publish-and-forget-it paper writing.
|
| Which itself is a huge problem.
|
| Happily this mindset is changing, at least in some scientific
| fields. For example, in particle physics proposals a
| document ("data management plan") must be written describing
| how that unconscionable attitude will not be taken with the
| experiment's data and software. That said, this transient
| mindset and derision of real software skills is still fairly
| prevalent in this field.
| scombridae wrote:
| _Which itself is a huge problem_
|
| More "nature of the beast" in my opinion. Science measures
| itself by how many alluring women it can date; engineering,
| by how long it can keep the wife happy.
| LeanderK wrote:
| If you work with something visual, interactive then this
| workflow is so super awkward that I never end up doing it. For
| data-driven workflow you have to analyse the data, note down
| your thoughts, analyse a bit more and then come to a
| conclusion. Your conclusion might be code living in .py files,
| or another type of data then consumed by something else. But
| this will result in a significant part of the "thought-process"
| and relevant code living in those notebooks, with all their
| problems. I can't just switch to some .py files because I want
| to change the axis for some plot, or look at it in log-scale.
| But then where do you draw the line? A .py file for only 10
| lines of code generating the resulting .csv? That's also a pain
| to maintain because you have all those disconnected files. We
| need those notebooks, they have to get better.
| medo-bear wrote:
| i strongly agree with what you are saying about Jupyter,
| however i strongly disagree about using netobooks in general
| (literal programming)
|
| one of the key things that a good notebook system must allow
| you to do is to mix something like markup format + LaTeX +
| source code. writing math-heavy documentation and explanations
| is simply impractical and limited (readability suffers) if done
| in comments. jupyter however is severely limited as it is
| unreadable in its raw format and therefore does not play well
| with a version control system such as git
|
| instead there is a solution that allows one to do everything
| jupyter does good with the additional benefit that it plays
| with version control really well - ie _org-mode_ [1]. the only
| difference is that instead of using a browser to interact with
| it, you use emacs. the added benefit to this is that you can
| also use full-featured key bindings (emacs / vim) and even
| integrate a language server for auto-completion [2]
|
| EDIT: moreover the list of supported languages in orgmode far
| exceeds that of jupyter [3] (or did the last time i made this
| comparison)
|
| [1] https://orgmode.org/
|
| [2] https://emacs-lsp.github.io/lsp-mode/manual-language-docs/ls...
|
| [3] https://orgmode.org/worg/org-contrib/babel/languages/index.h...
| kgwgk wrote:
| > using netobooks in general (literal programming)
|
| I guess you meant literate programming.
|
| Literate programming is different from interleaving console
| inputs and outputs and random paragraphs in the same
| document.
|
| Even if we expand the original idea to comprise that,
| literate programming is much more than that.
| medo-bear wrote:
| yeah it was supposed to say literate programming. anyway
| there is no doubt that org-mode (and jupyter) is an
| application of literate programming concepts. See
| https://en.wikipedia.org/wiki/Literate_programming#Literate_...
| kgwgk wrote:
| But programming is more than writing notebooks.
|
| How many python packages are written in literate
| programming style?
|
| How many programs written as notebooks would be actually
| better if they were structured differently?
| medo-bear wrote:
| ? i never said that its a one-size-fit-all solution.
| certainly you would not write a software package in a
| notebook. but you might write a tutorial, textbook,
| academic paper, homework, personal notes, etc.
| kgwgk wrote:
| My point was that a comment against notebooks being
| overused - where a different structure would make more
| sense - is not necessarily a comment against literate
| programming.
|
| The issues with notebooks - in general - are unrelated to
| literate programming. The notebook format is convenient
| to have some kind of "interactive" programming though,
| rather than "literate".
| medo-bear wrote:
| > The notebook format is convenient to have some kind of
| "interactive" programming though, rather than "literate"
|
| interactive programming is usually handled by the repl,
| for which you do not need a notebook
| kgwgk wrote:
| Of course you don't! The notebooks are glorified repls
| and you can also have literate programming without
| interactive notebooks. What notebooks get you compared to
| alternatives is both things at the same time.
| medo-bear wrote:
| my point is similar but restricted to jupyter. i think
| that org-mode can offer a much more advanced and
| complete literate programming environment than jupyter
| that's far beyond just markdown + repl
| kgwgk wrote:
| Agreed.
|
| Note how babel is presented, by the way (last point in
| particular):
|
|     Babel augments Org code blocks by providing:
|
|     - interactive and programmatic execution of code blocks;
|     - code blocks as functions that accept parameters, refer to
|       other code blocks, and can be called remotely; and
|     - export to files for literate programming.
|
| https://orgmode.org/worg/org-contrib/babel/intro.html
| bobbruno wrote:
| Have you considered that the notebook is an evolution of
| a repl, with improved visualization and feedback, for
| analysis-heavy work? The problem starts when notebooks
| are used for development and production.
| bobbylarrybobby wrote:
| I think the solution is quarto
| https://github.com/quarto-dev/quarto-cli
| Grumbledour wrote:
| There is always a lot of org-mode promotion on here when the
| topic is interactive notebooks. And I get it, people love it
| and it solved many of the problems other systems have. But
| org-mode users need to understand that the one thing holding
| org-mode back is simply emacs. I know you probably all love
| it, but everyone else is not interested in breaking of their
| fingers by learning obscure key command chains just to use
| org-mode. Sorry, but that is just the reality. If someone can
| implement the majority of org-mode in a better editor, there
| might be more users interested. But as it stands, it's just
| too much of a hassle.
| natrys wrote:
| I expect one of the main reasons someone could evangelise
| Emacs for is the fact that defaults don't mean much when
| it's all configurable. So if you don't like the keys, just
| bind them to whatever you like. That's like the fundamental
| ethos of everything in Emacs. Also, CUA-mode exists.
|
| If org-mode wasn't backed by Emacs, it would merely be a
| markdown substitute hence much less useful. There are many
| org-mode clones for modern editors like neovim or VSCode,
| except all they offer is front-end features (highlighting,
| folding, node manipulation etc). There is simply no reason
| to use those over a decent markdown editor. So I think you
| have this backwards; Emacs isn't holding back org-mode,
| rather much of advanced org-mode features are made possible
| and distinguished by the fact that it builds on Emacs.
| Majromax wrote:
| > I expect one of the main reasons someone could
| evangelise Emacs for is the fact that defaults don't mean
| much when it's all configurable. So if you don't like the
| keys, just bind them to whatever you like.
|
| Configurability is a strength of a system, but it is not
| an answer to a difficult learning curve. A user must
| first understand the system in order to configure it
| appropriately.
|
| Even at the level of key bindings, the user needs to
| understand the relative frequency and importance of an
| operation to choose an appropriate key combination.
| Universal reconfiguration may even make the system less
| learnable, if documentation and tutorials can't assume a
| reasonable default configuration.
|
| In my opinion, configuration is great as one of the final
| steps of a user's journey, taking the system from
| something that works to something that _sings_. It's
| just the wrong level to sell benefits to beginners.
| medo-bear wrote:
| > Even at the level of key bindings, the user needs to
| understand the relative frequency and importance of an
| operation to choose an appropriate key combination.
| Universal reconfiguration may even make the system less
| learnable, if documentation and tutorials can't assume a
| reasonable default configuration.
|
| i have a feeling that people who write these things have
| never really tried emacs beyond opening it and getting
| annoyed that ctrl-c/v/x don't work (at first) the way
| they are used to
|
| emacs is not key-binding-based, it is command based. if
| you change a key binding its not like you can wreck
| anything as you can always call the command prompt by M-x
| and search for the command that you wanted some key
| binding to perform. key-bindings are just shortcuts to
| commands so i think its best to listen to your fingers
| and form muscle memory and then assign them
|
| what are your most basic commands? copy, paste, select,
| start/end of line/function/class/paragraph/etc, move by
| word/sentence/etc, save, exit? these are not that many to
| set to whatever key combinations you want. i wish my
| browser had at least this level of extensibility
| medo-bear wrote:
| > it's all configurable
|
| also the ecosystem is huge and chances are that the
| configuration you are after is just a package-install
| away
| medo-bear wrote:
| > I know you probably all love it, but everyone else is not
| interested in breaking of their fingers by learning obscure
| key command chains just to use org-mode. Sorry, but that is
| just the reality
|
| i'm sorry to burst your strongly held convictions but you can
| choose any of the following
|
| a) use any key-bindings you like including emacs, vim, cua,
| or combination of
|
| b) use org-mode without any knowledge of more advanced
| emacs commands (except basic knowledge of using an editor)
|
| c) drink some milk (gotta have strong bones) and learn how
| to use the emacs system including emacs lisp and have one
| of the most advanced computing environments in existence at
| your service
|
| sorry, but that is just the reality
| scombridae wrote:
| About 1000 times per day, someone says emacs is too big
| an ask for org-mode, and someone replies, it's
| configurable to feel like whatever you're used to.
|
| The latter needs to accept that most users, particularly
| scientists, reject out-of-hand anything requiring
| configuration or compilation no matter how trivial.
|
| But it's all moot since org-mode is largely promoted by
| non-scientists (computer science is not a science), and
| should a wysiwig-inclined scientist ever get past the
| emacs obstacle, he'll balk at the awkward BEGIN_SRC
| incantations.
| medo-bear wrote:
| > But it's all moot since org-mode is largely promoted by
| non-scientists (computer science is not a science)
|
| My academic training is in physics and mathematics. I was
| introduced to programming in my computational physics
| class. We used emacs as our editor
| dtech wrote:
| Congratulations for being special, 99% of academia uses
| Matlab, a simple GUI text-editor or something like
| Anaconda.
|
| Emacs is such a fundamentally different paradigm from all
| other IT tools/editors that it just doesn't make sense to
| recommend a specialized tool with a steep learning curve
| and non-transferable skills when it's not ubiquitous and
| more standard alternatives exist without it which do OK.
| It doesn't matter that emacs was historically the first
| and everyone else decided to go in different directions,
| that's just the reality of today.
| medo-bear wrote:
| > non-transferable skills
|
| im curious which transferable skills you think Jupyter
| has
|
| > Emacs is such a fundamentally different paradigm from
| all other IT tools/editors
|
| its not. you use a mouse and click where you want your
| pointer to go. you use a keyboard to type
|
| > steep learning curve
|
| this is very much like saying that linux has a steep
| learning curve and you refuse to touch ubuntu because you
| are scared to blow up your computer
| scombridae wrote:
| _much like saying that linux has a steep learning curve_
|
| Most people do say that. What programmers cannot grasp is
| progress is about giving people what they want, not what
| is rationally best. Most scientists want to knock out a
| paper or presentation, and make it home on time for
| dinner.
| medo-bear wrote:
| > Most scientists want to ...
|
| are you their union rep ?
| BeetleB wrote:
| > But it's all moot since org-mode is largely promoted by
| non-scientists (computer science is not a science)
|
| I think you're just suffering from selection bias, given
| that this is on HN.
|
| Add me to the list of people who began using Emacs and
| Org mode during academia in a non-CS program.
|
| Furthermore, go look at the Emacs conference - you'll
| find a significant number of speakers are not CS folks.
| avgcorrection wrote:
| Is astronomy a science?
|
| https://www.youtube.com/watch?v=WgyRdnjRI4o
| avgcorrection wrote:
| > I know you probably all love it, but everyone else is not
| interested in breaking of their fingers by learning obscure
| key command chains just to use org-mode. Sorry, but that is
| just the reality. If someone can implement the majority of
| org-mode in a better editor, there might be more users
| interested.
|
| https://www.spacemacs.org/
| liotier wrote:
| I would even settle for a wiki with state-of-the-art
| outlining shortcuts and dates as a first-class dimension.
| BeetleB wrote:
| > But org-mode users need to understand that the one thing
| holding org-mode back is simply emacs. I know you probably
| all love it, but everyone else is not interested in
| breaking of their fingers by learning obscure key command
| chains just to use org-mode.
|
| We get it - what makes you think we don't. We are merely
| pointing out a superior solution.
|
| Like back in 2004 I would tell people how many of their
| problems would be resolved if they switched to Linux. Fast
| forward two decades later, the statement is still true, and
| most people still don't use Linux. But it wasn't a
| problematic thing to point it out to them - be it in 2004
| or now.
|
| (It's a lot easier to use Emacs than switch to Linux.)
| operator-name wrote:
| Others have mentioned the usefulness of literate programming so
| I won't reiterate that.
|
| Partially the lack of discipline comes from the implicit data
| dependencies between cells. Variables are all globally scoped
| and unless you ensure the notebook can be run top to bottom it's
| easy to introduce subtle bugs. I believe Julia's
| https://github.com/fonsp/Pluto.jl solves this issue quite well.
|
| Another part comes from cells that should really be functions.
| In my opinion this is because functions are 2nd class citizens
| compared to cells, and could be improved with UI (function
| cells? node based programming?).
|
| Programming is more than just manipulating text, so why
| shouldn't tools move beyond just being fancy text
| editors?
| lake_vincent wrote:
| Oh man, thank you. I grew up with C++ and Java as my main
| languages, so I always feel more at home with a py file.
|
| Notebooks never caught on with me.
| kriro wrote:
| It all depends on the context. In academia it is a great tool.
| I can set up a couple of notebooks on our GPU server and give
| many students access to powerful GPUs without having to worry
| about shell access etc. Additionally they are ready to go and
| do interesting things immediately and don't have to install the
| environment on their laptops (which might be win/linux/mac but
| at least these days that's easier but still extra work for
| them).
|
| I also use it a lot for experimenting, parameter tuning etc.
| It's not too bad to have it explicitly distinct from production
| level code. Run/tune/experiment in notebook, once you're happy
| with the model -> code it up in .py file(s). Also great for
| quick presentations :)
|
| However, the fast.ai team is actually doing a pretty solid job
| running everything off notebooks. So if I wanted to go that
| direction (and skip the .py files) it's that project I'd look
| at for how to do it.
| carderne wrote:
| > without having to worry about shell access etc
|
| Do you mean you _don't_ want to give the students shell
| access?
|
| By default you can run shell commands from within a Jupyter
| notebook by prefixing them with `!`.
| rovr138 wrote:
| They might mean without having to set up individual user
| accounts for the server.
|
| https://jupyter.org/hub
|
| and if it's just the professor's lab, just a jupyter lab
| instance with one password works too.
| mistrial9 wrote:
| dude - more than 1 million undergraduate computer science
| students worldwide will learn Jupyter this Fall, and you are
| getting contrarian votes among a bunch of average-of-
| masters+industry CS people here
|
| "we" have to learn and teach the next sets of people new to
| computer science
| HuwFulcher wrote:
| Totally agree with you. I've been teaching people over my
| career so far, with varying degrees of success
| slewis wrote:
| nbdev2, which this article is about, is a solution to this
| problem.
|
| It makes notebooks testable, composable, versionable, and more.
| fifilura wrote:
| It can be very useful to run recurring jobs (e.g jobs that run
| once per day) in a notebook to add the output as kind of
| advanced logging.
|
| And then serve the results as a static page under
| my.logs.intranet/my-job/2022-02-14/my_recurring_task_notebook.ipynb.html
|
| You can get so much more context regarding what went well or
| wrong compared to browsing through log lines in some more or
| less user friendly tool.
___________________________________________________________________
(page generated 2022-08-26 23:02 UTC)