hngopher.com

       [HN Gopher] The Jupyter+Git problem is now solved
       ___________________________________________________________________
        
       The Jupyter+Git problem is now solved
        
       Author : jph00
       Score  : 262 points
       Date   : 2022-08-26 00:09 UTC
        
 (HTM) web link (www.fast.ai)
        
       | flutetornado wrote:
       | JupyterLab
       | 
       | JupyterHub
       | 
       | Jupytext - converts ipynb to py
       | 
       | Nbstripout - strips all output from ipynb
       | 
       | Nbmerge - resolves merge conflicts
       | 
       | Vim-jupytext - vim plugin to auto convert ipynb to py
       | 
       | Papermill - parameterize notebooks
       | 
       | Git
       | 
       | Pandas, Altair - data analysis / Visualization
       | 
       | Phabricator - code reviews of notebooks
       | 
       | Vimdiff + vim-jupytext - diffs in terminal
       | 
       | This solved all my Jupyter problems.
        
       | ticklemyelmo wrote:
       | Why is there a "Jupyter+Git" problem specifically? Why aren't we
       | worrying about the "C+Git" problem and the "XML+Git" problem and
       | the "Python+Git" problem? Because merge markers break, well,
       | _every_ file format.
       | 
       | Is it because Jupyter users in particular don't typically
       | understand that there is a formatted text file behind the
       | notebook, or how merge conflicts work?
        
         | thefrozenone wrote:
         | This is a good question and made me think. I have come up with:
         | "Jupyter notebooks can be thought of as code _and_ an IDE
         | layout."
        
         | Bjartr wrote:
         | It's because the primary editor of the notebook files barfs
         | when presented with the file if it includes merge markers, since
         | it's no longer valid JSON. Imagine if one of your normal
         | code-friendly text editors, or your IDE, refused to open a .c or
         | .py file and you had to open it in Notepad to fix it.
         | 
         | That's what it feels like to be forced to drop into a normal
         | text editor rather than using the normal notebook ui to fix the
         | conflicts.
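Bjartr's point can be checked directly: once git writes its line-based conflict markers into an .ipynb file, the text is no longer JSON, so anything that parses notebooks as JSON rejects it. A minimal sketch using only Python's standard library; the notebook text is a made-up example and `take_ours` is a hypothetical helper (it keeps the HEAD side of each conflict), not part of any Jupyter tool.

```python
import json

# Made-up notebook text left behind by a plaintext git merge.
conflicted = """{
 "cells": [
  {
   "cell_type": "code",
<<<<<<< HEAD
   "source": ["x = 1"],
=======
   "source": ["x = 2"],
>>>>>>> feature
   "metadata": {},
   "outputs": [],
   "execution_count": null
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 5
}"""

def is_valid_notebook_json(text: str) -> bool:
    """Return True if the text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

def take_ours(text: str) -> str:
    """Hypothetical resolver: keep the HEAD side of every conflict, drop markers."""
    kept, in_theirs = [], False
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            continue
        if line.startswith("======="):
            in_theirs = True   # lines from here to '>>>>>>>' belong to the other side
            continue
        if line.startswith(">>>>>>>"):
            in_theirs = False
            continue
        if not in_theirs:
            kept.append(line)
    return "\n".join(kept)

print(is_valid_notebook_json(conflicted))  # False: the markers are not JSON
print(json.loads(take_ours(conflicted))["cells"][0]["source"])  # ['x = 1']
```

Fixing this by hand in a plain text editor is exactly the workflow the comment complains about; a notebook-aware merge aims to never produce such a file in the first place.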
        
       | ahurmazda wrote:
       | Thanks, but a hard pass from me. The original sin was using goofy
       | JSON as the file format (and no! I don't care for your pretty 5MB
       | PNGs polluting my git tree). This is the nth attempt at putting
       | lipstick on the pig (n-1 being Jupytext).
        
       | killjoywashere wrote:
       | Why is this not on brew yet! (╯°□°)╯︵ ┻━┻
        
       | kmod wrote:
       | Streamlit has completely replaced my usage of Jupyter -- I find
       | it to have the quick iteration speed and visual output of
       | notebooks, but it's just normal Python so all the normal tooling
       | works (there is no "git problem") and you don't have the weird
       | state problems of notebooks.
       | 
       | Definitely recommend checking it out if you haven't already!
        
       | whacked_new wrote:
       | I used to use Jupytext a lot for this problem, and think it does
       | a decent job. The main problem with Jupytext is its reliance on
       | BYOD (D for discipline), which is a poor (but possibly best
       | available) solution for human systems.
       | 
       | IMHO the Jupyter+Git problem stems from the ipynb format.
       | Jupytext does it "right" in the sense that you can work in
       | .ipynb and diff in .md. But as long as the base format is
       | diff-unfriendly, every tool just shifts format complexity into
       | tool complexity.
       | 
       | That's not to take away from the tool -- it looks great. It also
       | takes the D out of BYOD, which is a win. But I think "solving" it
       | means that anybody who receives an ipynb is able to just look at
       | it out of the box, like plain text, so we're still a ways off.
        
         | wasimlorgat wrote:
         | Oh, on the topic of file formats: Quarto also lets you do
         | plaintext notebooks in quite an interesting way, definitely
         | worth checking out:
         | https://quarto.org/docs/computations/python.html
        
           | euler_angles wrote:
           | The latest release of nbdev has fully embraced Quarto! It's
           | very awesome, check it out.
        
         | cycomanic wrote:
         | I have been using jupytext as well and it really makes handling
         | notebooks much easier. I think the decision for Jupyter to save
         | to JSON was not a good one and they should instead have looked
         | at systems like Org mode for inspiration.
         | 
         | I also don't understand what you mean by discipline. Yes you
         | need to make sure that everyone has the jupytext extension
         | installed, but that just becomes part of the needed dev
         | environment. After that the whole experience becomes completely
         | seamless.
        
         | wasimlorgat wrote:
         | I agree re format vs tool complexity. I don't think Jupyter is
         | a particularly difficult format though; it's mostly light JSON
         | -- all human-readable.
         | 
         | We realised after working with Jupyter+Git for a while that the
         | pain-points were actually with Jupyter editors (and/or their
         | conventions) rather than the format, because they do things
         | like store user-metadata _in the file_ which pollutes diffs and
         | leads to merge conflicts.
         | 
         | In fact, if Jupyter editors could handle merge conflicted
         | files, we wouldn't need a custom merge driver either.
        
         | maegul wrote:
         | I feel like the BYODiscipline problem with jupytext could be
         | solved with relatively rudimentary text-editor plugins.
         | 
         | I've started rolling my own little plugin utilities and, so
         | far, I have a (very) rudimentary notebook-like interface in
         | plain text.
         | 
         | Combine a proper attempt at such a thing with a good interface
         | to a background ipython kernel system (for which ipython could
         | do with some minor enhancements AFAICT), and you'd basically
         | have the best of all worlds (all plain text editor features
         | including version control and personalisation and
         | customisation, and the iterative advantages of notebook code-
         | cell-based runtimes).
         | 
         | Hopefully, with such a combination functioning well, there'd be
         | an emergent feature that allows one to more easily get
         | interactive with a code-base for the purposes of understanding,
         | debugging or developing it.
         | 
         | Personally, my biggest gripe with Jupyter at the moment is that
         | a few years ago they decided to try to create a quasi-IDE
         | (where they'll probably be beaten by VSCode) rather than
         | improve the general utility of the kernel (or kernel
         | protocol/interface?) and/or the essential notebook UI.
         | 
         | It's a personal gripe, and there's clearly value in the web-
         | first interface they've made with JupyterLab (despite the not
         | insubstantial growing pains that project has faced), but,
         | watching ObservableHQ and Pluto (for Julia) focus just on the
         | core notebook interface, while VSCode have focused on the IDE
         | side and easily incorporated or recreated the now rather
         | old/simple Jupyter Notebook interface, both with success, seems
         | like some vindication of my gripe.
        
           | zelphirkalt wrote:
           | By "essential notebook UI", do you mean the old notebook or
           | the lab interface?
           | 
           | The old notebook was painful to extend in JS compared to
           | writing lab extensions in TS. In JupyterLab 3 they have
           | taken questionable steps, but so far I have been able to work
           | around issues.
        
             | maegul wrote:
             | I was referring to the notebook-like "part" of the
             | interface, where JupyterLab is a notebook interface with
             | IDE-like components wrapped around it (file explorer,
             | terminal etc).
             | 
             | The ObservableHQ interface, for instance, I'd classify as
             | just a notebook interface. IE, individually manipulatable
             | code-cells with a shared runtime.
             | 
             | And yea, JupyterLab is better than the classic IMO. But,
             | until recently I'd say, the notebook part of the interface
             | hasn't gotten much love at all, while there've been steps,
             | due to popular demand it seems, to provide alternative UIs
             | that strip away much of what they've added on top of the
             | notebook (ie, simple mode, and now Jupyter Lite).
             | 
             | I haven't really got experience writing extensions in the
             | old Jupyter notebook, and hardly any with JupyterLab, but
             | my experience with JupyterLab was frustrating because it
             | felt like they really killed the ability to implement small
             | and hacky plugins like you could with the old. This
             | always struck me as a shame. A necessary one perhaps given
             | what I presume is the increased power of their new
             | framework. But it always felt like there was a mismatch
             | between the complexity of the plugin framework (which is a
             | full web-dev experience) and the base features of the
             | "product", where customising my text editor is now much
             | easier AFAICT.
             | 
             | What issues and questionable steps were you thinking of?
        
               | zelphirkalt wrote:
               | > What issues and questionable steps were you thinking
               | of?
               | 
               | 2 things come to mind right now:
               | 
               | Starting with JupyterLab 3 (maybe 3.1), JupyterLab
               | removes query arguments from the URL. Query arguments
               | were the only way I know of to pass arguments into
               | JupyterLab from the outside. Any extension that
               | relies on arguments given from the outside would break,
               | just because JupyterLab removes query arguments, which
               | had been there since the beginning and did no harm,
               | at least none that I could tell. But suddenly this was
               | taken away, without a proper alternative. Now you have to hook
               | into their "router" to quickly grab those arguments,
               | before they are gone. This seems silly to me. Why
               | randomly delete query arguments? They are there for a
               | reason and since JupyterLab does not add any of its own,
               | I cannot understand this decision. Simply seems to make
               | it less powerful a tool.
               | 
               | The constant nagging about posting in their community JS-
               | only forum. ("You should post this in the forum.", "Have
               | you seen this post in the forum? _links to forum_ ") Why
               | can this community not handle issues in issues, which can
               | be easily found using a search engine. Why hide
               | everything behind a JS-only forum, which one has to
               | create another account for, or associate one's GitHub
               | account with? Whenever anyone gives me a link to the
               | forum, where supposedly the answer to my question is, I
               | keep thinking: "Ahh great, why did you have to hide it in
               | there? If you had documented this in an issue, I would
               | have found it via search engine and the thing would not
               | have wasted my time and neither would I have had to waste
               | yours." -- something along those lines. When I find an
               | issue and its solution, I still post it as a GitHub issue,
               | so that other people can easily find it, without signing
               | up to their forum.
               | 
               | > I haven't really got experience writing extensions in
               | the old Jupyter notebook [...]
               | 
               | I did that a few years ago, when JupyterLab was
               | still in alpha. It worked, but the typical JS
               | mistakes plagued me. JupyterLab is of course using
               | TypeScript, which helps a lot with avoiding silly
               | mistakes. However, I do think there is something to what
               | you say about no longer encouraging the quick hack. Some
               | functionality took years to appear in JupyterLab, but was
               | already available for Jupyter Notebook, before JupyterLab
               | took off.
        
       | scombridae wrote:
       | _Subversion used to say, 'CVS done right.' With that slogan there
       | is nowhere you can go. There is no way to do CVS right._ -- Linus
       | T.
       | 
       | Jupyter's ipynb format is only slightly more amenable to git than
       | say an MSWord doc. Nbdime and friends will never get you to a
       | point where git+jupyter will be worth the ugly.
        
         | fragmede wrote:
         | _slightly_. Since, like, Office 2007, MSWord docs are zip files
         | with xml inside.
        
         | jph00 wrote:
         | What are the outstanding problems you feel are there even with
         | the new nbdev2 functionality? Since I've been using it (the
         | prerelease version) over the last few months I haven't come
         | across a single problem, personally, despite doing a very large
         | amount of collaborative notebook work.
        
       | Helmut10001 wrote:
       | Not criticizing the authors' approach, but the Jupyter+Git problem
       | has been solved for a long time with Jupytext [1].
       | 
       | Jupytext will convert Notebooks (.ipynb) files to Markdown (md)
       | and Python (py) 'on the fly' (while working in Notebooks).
       | 
       | - Markdown files can be added to git
       | 
       | - Python and .ipynb files are added to .gitignore
       | 
       | - Python files allow 'chained' import of notebooks (*.py
       | versions), which allows splitting larger notebooks into multiple
       | smaller ones
       | 
       | This is my folder structure:
       |
       |     .
       |     +-- notebooks
       |     |   +-- notebook1.ipynb  # automatically generated from md
       |     |   +-- notebook2.ipynb  # automatically generated from md
       |     +-- md
       |     |   +-- notebook1.md     # versioned in git
       |     |   +-- notebook2.md     # versioned in git
       |     +-- py
       |     |   +-- modules
       |     |   |   +-- __init__.py  # empty
       |     |   |   +-- tools.py     # use for cross-project base tools
       |     |   +-- __init__.py      # empty
       |     |   +-- notebook1.py     # automatically generated from md
       |     |   +-- notebook2.py     # automatically generated from md
       |     +-- jupytext.toml
       |     +-- .git
       |     +-- README.md
       | 
       | See an example here [2]
       | 
       | Jupytext is mentioned as a 'potential' alternative. Re the "save"
       | cell output: I usually produce html-files at the end of my
       | notebooks (see the example), and add those either to git or auto-
       | upload to an external webserver. The html is standalone and
       | includes outputs, table of contents, and images (example [3]). I
       | would advise against versioning all outputs (images) in git.
       | 
       | Very happy with this approach for a long time now. Jupytext
       | increased my productivity by a hundred percent.
       | 
       | [1]: https://github.com/mwouts/jupytext
       | 
       | [2]: https://gitlab.vgiscience.de/ad/yfcc_gridagg
       | 
       | [3]: https://ad.vgiscience.org/tagmaps-mapnik-jupyter/01_mapnik-t...
        
         | jph00 wrote:
         | The pros and cons of Jupytext are discussed in the linked post.
         | It's a great approach, but wasn't sufficient for our needs --
         | so for us, at least, it didn't fully solve the Jupyter+git
         | problem.
         | 
         | Specifically, it doesn't handle the situation where you need
         | cell outputs in version control -- since in that case, you
         | still need the notebook, which results in all the usual
         | problems occurring. With nbdev2, you don't need to think about
         | anything or do anything special, and stuff like GitHub notebook
         | rendering, nbviewer, ReviewNB, etc all just work. You just run
         | a single command (`nb_install_hooks`) and that's it.
         | 
         | Also, no-one has to install anything extra to view your
         | notebooks, since they're stored in the regular notebook format.
        
           | cycomanic wrote:
           | I'm not sure what your cell outputs are, but if you are doing
           | plots or images inside your notebook, then I agree with the
           | OP that it is generally not a good idea. You now store binary
           | data inside your git repository (which sometimes just carries
           | its own problems), but worse that binary data is mixed into
           | your text diff.
           | 
           | If you do a diff between two revisions where some figure
           | changed you essentially will be swamped by the diff in the
           | figure making it difficult to find what actually changed. Now
           | tools like nbreview get around that, but now you're forcing
           | everyone to use the same dev tools, and can't look at diffs
           | any other way really.
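To make cycomanic's point concrete: in the .ipynb format a rendered figure lives inside the cell's `outputs`, as base64 text under a MIME key such as `image/png`, so a plot can dwarf the code around it and a changed figure typically shows up in a line diff as a large block of unreadable text. A minimal sketch, assuming the standard nbformat 4 layout; `image_output_share` is a hypothetical helper and the "PNG" is made-up zero bytes, not a real image.

```python
import base64
import json

IMAGE_MIMES = ("image/png", "image/jpeg", "image/gif")

def image_output_share(nb: dict) -> float:
    """Fraction of the serialized notebook's characters that are base64 image data."""
    total = len(json.dumps(nb))
    image_chars = 0
    for cell in nb.get("cells", []):
        for output in cell.get("outputs", []):
            for mime, payload in output.get("data", {}).items():
                if mime in IMAGE_MIMES:
                    # nbformat allows a single string or a list of line strings
                    image_chars += len(payload if isinstance(payload, str) else "".join(payload))
    return image_chars / total if total else 0.0

# Made-up example: 30,000 zero bytes standing in for a PNG plot.
fake_png = base64.b64encode(bytes(30_000)).decode("ascii")
nb = {
    "cells": [{
        "cell_type": "code",
        "source": "plt.plot(x)",
        "metadata": {},
        "execution_count": 1,
        "outputs": [{"output_type": "display_data", "metadata": {},
                     "data": {"image/png": fake_png, "text/plain": "<Figure>"}}],
    }],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
}
print(f"{image_output_share(nb):.1%} of the file is image data")
```

Here one line of code carries roughly 40,000 characters of output, which is why the thread splits between stripping outputs and using notebook-aware diff tools.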
        
             | glenngillen wrote:
             | It's been a while, but last time I needed to, GitHub at
             | least had really great tooling for diffs between versions
             | of image files.
             | 
             | > but now you're forcing everyone to use the same dev tools
             | 
             | No they're not. You can continue using whatever approach
             | you're using. Attempting to shut down alternatives like
             | this, though, could be seen as forcing everyone to accept
             | whatever the status quo and lowest-common-denominator
             | solution is, even if their dev tools could support something
             | better.
        
         | wodenokoto wrote:
         | This is mentioned in the article, which discusses both its pros
         | and cons.
        
         | wasimlorgat wrote:
         | Jupytext does a lot more than just fix Jupyter/git integration,
         | which is great if you want to adopt its approach, but a bit too
         | heavy IMO if you don't. The approach mentioned here is
         | extremely lightweight and doesn't use too much more than built-
         | in Jupyter/git functionality (and it all happens automatically
         | behind the scenes)
        
         | spiim wrote:
         | I find this conversion a little bit clunky. The approach that
         | seems to work for me is to use Quarto with its .qmd format.
         | https://quarto.org/
        
           | g8oz wrote:
           | Quarto looks amazing!
        
             | ellisv wrote:
             | It's pretty nice. At first I thought it was just a
             | rebranding of R Markdown but it's been decently
             | modernized/improved to the point where it makes sense that
             | it is its own, separate thing.
        
       | da39a3ee wrote:
       | > Here at fast.ai we use Jupyter for everything. All our tests,
       | documentation, and module source code for all of our many
       | libraries is entirely developed in notebooks
       | 
       | That sounds like a nightmare. Why would you want to develop a
       | library in a jupyter notebook?
       | 
       | > The solution presented here is the result of years of work by
       | many people.
       | 
       | It's a bit depressing that it came to this. It's hard not to
       | think that it was a mistake from the beginning and that the
       | format should have been based on using special comment markers in
       | valid code, together with an accompanying JSON metadata file. Or
       | something like that. One way or another, we have a very strong
       | tradition of storing code in plain text files, not embedded in
       | strings in JSON or otherwise embedded in any opaque format. Maybe
       | there'll come a day when it's appropriate to abandon that to get
       | some advantages, but I don't think that day was the original
       | creation of Jupyter. I know it was created by thoughtful and
       | expert software engineers, but I feel that it was a mistake and
       | it's actually made a lot of data science / academia-oriented
       | people less qualified to participate in industry software
       | engineering, because of the poor practices forced upon them by
       | the inability to use git with Jupyter, and notions like
       | developing library code in notebook cells.
        
       | jayd16 wrote:
       | You know, is there anything like the Language Server Protocol for
       | diff/merge resolution? Seems like there's an opportunity to build
       | a system for semantic-aware merging that's language/format aware
       | and tool agnostic (and auto-configurable to boot).
       | 
       | Suddenly binary formats could become mergeable.
        
         | zie wrote:
         | See [Pijul](https://pijul.org/manual/theory.html). They did
         | that hard work.
        
           | samatman wrote:
           | Hmm, I'm a huge fan of Pijul, it looks like the future of change
           | management from where I'm sitting, but no: they have not.
           | 
           | Semantic diffing needs something like pijul, but a system
           | taking advantage of this doesn't yet exist. Pijul avoids some
           | merge conflicts by design, won't do the wrong thing, and
           | handles conflicts correctly: we still need tools with a
           | fuller awareness of what strings _mean_ to have rich semantic
           | diffs.
        
             | zie wrote:
             | True, Pijul only offers the safe diff/patch part.
        
       | medo-bear wrote:
       | or ... why not just use org mode?
        
         | faustlast wrote:
         | I wish org-mode was standard and more appreciated. It is so
         | good, but I feel that I'm the only one using it and it is hard
         | to sell emacs to others.
        
       | hendry wrote:
       | Is there a good place to see Jupyter notebooks solving a real
       | problem?
       | 
       | Idk, like importing some data and doing some analysis /
       | forecasting?
       | 
       | Most notebooks appear really bad quality. Worse internally.
       | 
       | Better off looking at some Excel
       | https://github.com/martinshkreli/models
        
       | throwaway72937 wrote:
       | Does this work for editing notebooks in VS Code? (It's unclear to
       | me where the saving hooks reside, and whether you have to edit
       | them through JupyterLab/Notebook.) Any issue if the notebooks
       | reside on a remote server?
        
       | planede wrote:
       | Is it possible to get a "diff3"-like conflict style? That is,
       | showing:
       |
       |     <<<
       |     side1
       |     |||
       |     ancestor
       |     ===
       |     side2
       |     >>>
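For ordinary line-based merges, git produces this layout when `merge.conflictStyle` is set to `diff3` (e.g. `git config merge.conflictStyle diff3`): the common ancestor appears between the `|||||||` and `=======` markers, so you can see how each side changed it. The thread doesn't say whether nbdev2's cell-level driver can emit it. The helper below is a hypothetical illustration of the layout only; the labels are placeholders.

```python
def diff3_conflict(ours, base, theirs,
                   ours_label="ours", base_label="base", theirs_label="theirs"):
    """Render one conflicted hunk in git's diff3 layout.

    ours, base and theirs are lists of lines; git's markers are 7 characters long.
    """
    return "\n".join([
        f"<<<<<<< {ours_label}",
        *ours,
        f"||||||| {base_label}",   # the shared ancestor sits in the middle
        *base,
        "=======",
        *theirs,
        f">>>>>>> {theirs_label}",
    ])

print(diff3_conflict(["x = 2"], ["x = 1"], ["x = 3"]))
```

Seeing `x = 1` in the middle makes it clear that both sides edited the same line, which is the "how did each side change the parent" information jks also asks about further down.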
        
       | cigrainger wrote:
       | This is one reason I feel lucky to be working with Elixir.
       | Livebook's livemd is basically just markdown.
       | https://livebook.dev
        
         | wodenokoto wrote:
         | So are Jupytext, Rmd and qmd. But what do you do about the
         | output?
         | 
         | The nice thing about markdown-like notebooks is that they play
         | well with git. The nice thing about jupyter style notebooks is
         | that they contain all the content needed to actually _read_ the
         | notebook.
        
       | cs702 wrote:
       | Wow, thank you to the authors!
       | 
       | It looks like this tool, _nbdev2_, solves a real-world problem
       | for Jupyter users, including me, with _zero effort_ required to
       | use it every day. It relies on clever hooks to get git to treat
       | cells as first-class citizens (as opposed to lines of text, the
       | default). Nice! Based on that alone, I would expect _nbdev2_ to
       | be widely adopted over time. In fact, if it works as well as
       | advertised, it should be incorporated into Jupyter. I, for one,
       | will be giving it a try!
       | 
       | If you use Jupyter to solve problems in your domain of expertise,
       | feel free to ignore all the smart-sounding software engineers who
       | will pooh-pooh this tool _only_ because they don't like notebooks
       | and don't want anyone to use them. No matter what you do, there
       | will _always_ be people who look down on easy-to-use tools that
       | enable scientists and practitioners from other disciplines to
       | write, run, and explore ad-hoc code on-the-fly.
       | 
       | EDIT: _nbdev2_'s authors are on this page, answering questions.
       | Thank you again!
        
       | xcambar wrote:
       | Stating that git breaks Jupyter notebooks is quite a flex.
       | 
       | It stains the article from the very first paragraph.
        
         | [deleted]
        
         | wasimlorgat wrote:
         | Have you worked with Jupyter notebooks and git? It's a
         | literally true statement :D and quite a struggle for many of us
        
           | xcambar wrote:
           | If you leave git diffs in your files, whether Jupyter
           | notebooks or otherwise, and run/compile them... They will
           | break.
           | 
           | If you give me a counterexample, good for you, but my
           | statement holds true 99% of the time.
        
             | jsweojtj wrote:
             | You state in the top level comment that this claim stains
             | the article: "Stating that git breaks Jupyter notebooks is
             | quite a flex."
             | 
             | But you are saying here: "If you leave git diffs in your
             | files, whether Jupyter notebooks or otherwise, and
             | run/compile them... They will break."
             | 
             | Have you changed your mind in this thread? Or what's your
             | objection?
        
               | xcambar wrote:
               | I'm suggesting that git only breaks Jupyter notebooks (or
               | anything else) if you do not know what to expect from
               | git.
               | 
               | But if you don't know that git modifies files when there
               | are conflicts, then you're an interesting and rather
               | unexpected audience, I assume.
               |
               | That is, for the typical git user, who knows about git
               | diffs, the behavior is expected and hence not broken. The
               | files end up in an expected broken state, but git does
               | not break them per se.
               | 
               | If you still disagree, let's just settle that we disagree
               | and be done with it.
        
           | cycomanic wrote:
           | It's the wrong way around though, Jupyter notebooks break a
           | git work flow. I think the fault here is completely with the
           | design of the Jupyter notebook file format (and the way
           | editors save to it).
           | 
           | I think it's quite unfortunate that they did not consider
           | how well the format would integrate with version control
           | systems when first designing IPython notebooks.
        
             | fumeux_fume wrote:
             | Nah man, you got it backwards. Git still works just fine
             | while my notebooks are definitely broken. Not here to play
             | the blame game, just trying to relate the practical
             | results.
        
       | persedes wrote:
       | No mention of https://github.com/srstevenson/nb-clean ?
       | 
       | Has been my go-to for this. It seems like nbdev2 is fast.ai's own
       | cooked-up solution, bundled with a bunch of other tools.
        
       | wasimlorgat wrote:
       | Hi, I'm the author of the git merge driver and Jupyter save hook
       | in nbdev2 :) I'd be happy to answer any questions you have about
       | how we're handling using notebooks with git
        
         | howon92 wrote:
         | I enjoyed reading the writeup and think the solution is clean!
         | Thanks for sharing
        
         | jks wrote:
         | Can this do three-way merge? If I have to resolve two
         | conflicting code blocks, it is often useful to know how each of
         | them change the code from the shared parent.
        
           | wasimlorgat wrote:
           | It does an ordinary three-way git merge (treating notebooks
           | as plaintext) then a two-way merge on conflicted bits. We
           | opted for that approach because it's incredibly simple and has
           | worked perfectly for us (I think since we tend to work with
           | small code cells). I think nbdime has a full-on three-way
           | notebook merge if that's what you need, which can be used
           | together with nbdev's Jupyter save hook to clean up unneeded
           | metadata.
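The second step wasimlorgat describes, a two-way merge over only the conflicted part, can be sketched at cell granularity with the standard library's `difflib.SequenceMatcher`: cells identical on both sides are kept once, and every differing run is kept from both sides between markers. This is a hypothetical illustration, not nbdev's code; it works on plain lists of cell sources, and the string "marker cells" are an assumption (a real driver would have to emit actual notebook cells).

```python
from difflib import SequenceMatcher

def two_way_cell_merge(ours, theirs):
    """Align two lists of cell sources and wrap differing runs in conflict markers.

    Returns (merged_cells, had_conflict).
    """
    merged, had_conflict = [], False
    matcher = SequenceMatcher(a=ours, b=theirs, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Cells identical on both sides are kept once.
            merged.extend(ours[i1:i2])
        else:
            # replace/insert/delete: a two-way merge can't tell which side is
            # right, so both versions are kept for a human to resolve.
            had_conflict = True
            merged.append("<<<<<<< ours")
            merged.extend(ours[i1:i2])
            merged.append("=======")
            merged.extend(theirs[j1:j2])
            merged.append(">>>>>>> theirs")
    return merged, had_conflict

ours = ["import numpy as np", "x = 1", "print(x)"]
theirs = ["import numpy as np", "x = 2", "print(x)"]
cells, conflict = two_way_cell_merge(ours, theirs)
for cell in cells:
    print(repr(cell))
```

Because the unit of comparison is a whole cell, an edit to one small cell produces one small conflict, which matches the remark that this works well when cells are short.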
        
       | p1necone wrote:
       | I haven't used Jupyter but from what I can gather from this
       | article they've built a simultaneous editing system on top of
       | automatically committing to git in the background as multiple
       | people edit things, and using that to share the changes between
       | users.
       | 
       | Do I have that right? Because that sounds /insane/.
        
         | anigbrowl wrote:
         | If this doesn't work for you, JetBrains' DataSpell might; it's
         | oriented towards notebooks for teams. It has hiccups, things
         | like ipywidgets don't always work as expected so I sometimes
         | find myself falling back to Jupyterlab. But overall it's a very
         | comfy chair.
        
         | jph00 wrote:
         | No it just uses normal git in the normal way. The simple trick
         | is to use a jupyter-native git merge driver, so that merges are
         | done at a cell level instead of a line level.
         | 
         | Also, unneeded metadata is removed from the notebook when
         | saving, so there are fewer changes to merge.
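The save-time cleanup can be pictured as a Jupyter pre-save hook: a function that receives the notebook model just before it is written and edits it in place. A minimal sketch, not nbdev's actual hook; which metadata keys to keep (`KEEP_NOTEBOOK_METADATA`, `KEEP_CELL_METADATA`) is a hypothetical choice made for illustration.

```python
# Hypothetical allow-lists: which metadata keys survive a save. These choices
# are assumptions for illustration, not nbdev's actual settings.
KEEP_NOTEBOOK_METADATA = {"kernelspec", "language_info"}
KEEP_CELL_METADATA = {"tags"}

def strip_metadata_hook(model, **kwargs):
    """Pre-save hook: drop volatile metadata and execution counts in place."""
    if model.get("type") != "notebook":
        return  # only notebooks are cleaned; other files are saved untouched
    nb = model["content"]
    nb["metadata"] = {k: v for k, v in nb.get("metadata", {}).items()
                      if k in KEEP_NOTEBOOK_METADATA}
    for cell in nb.get("cells", []):
        cell["metadata"] = {k: v for k, v in cell.get("metadata", {}).items()
                            if k in KEEP_CELL_METADATA}
        if cell.get("cell_type") == "code":
            # Execution counts change on every run and are a classic diff polluter.
            cell["execution_count"] = None
            for output in cell.get("outputs", []):
                if "execution_count" in output:
                    output["execution_count"] = None
```

With the classic file contents manager, a function like this is registered in a Jupyter config file via `c.FileContentsManager.pre_save_hook = strip_metadata_hook`; Jupyter passes `model`, `path` and `contents_manager` as keyword arguments, hence the `**kwargs`. Newer jupyter_server releases may prefer a different registration method, so check the documentation for your version. Outputs themselves are left alone, which is the difference from an output stripper like nbstripout.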
         | 
         | Both of these are done using standard hooks built into
         | each of git and Jupyter. That is: git is written in such a way
         | that it can fully support non line-oriented formats. We just
         | took advantage of that capability.
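The git side of this is a custom merge driver. Git runs a configured command with the ancestor, current and other versions of the file (`%O %A %B`), expects the merged result to be written back to the `%A` file, and treats a non-zero exit status as "conflicts remain". A minimal sketch of such a driver in Python, not nbdev's: the script name `nb_merge_sketch.py` and driver name `nb-sketch` are hypothetical, and to stay short it matches cells by position and, on a real conflict, keeps our version instead of writing conflict markers. It would be wired up with a `.gitattributes` line `*.ipynb merge=nb-sketch` plus `git config merge.nb-sketch.driver "python nb_merge_sketch.py %O %A %B"`.

```python
import json
import sys

def merge_notebooks(base: dict, ours: dict, theirs: dict):
    """Cell-level three-way merge. Returns (merged_notebook, clean).

    Simplification: cells are matched by position, so a per-cell merge is only
    attempted when all three versions have the same number of cells.
    """
    b, o, t = base["cells"], ours["cells"], theirs["cells"]
    if len(b) == len(o) == len(t):
        merged_cells, clean = [], True
        for bc, oc, tc in zip(b, o, t):
            if oc == tc or tc == bc:
                merged_cells.append(oc)   # both agree, or only ours changed
            elif oc == bc:
                merged_cells.append(tc)   # only theirs changed
            else:
                merged_cells.append(oc)   # both changed differently: keep ours, flag it
                clean = False
        return {**ours, "cells": merged_cells}, clean
    # Different cell counts: fall back to comparing whole notebooks.
    if ours == base:
        return theirs, True
    if theirs == base or ours == theirs:
        return ours, True
    return ours, False

def main(argv):
    # git substitutes the ancestor, current and other file paths for %O %A %B.
    base_path, ours_path, theirs_path = argv[1:4]
    notebooks = []
    for path in (base_path, ours_path, theirs_path):
        with open(path, encoding="utf-8") as f:
            notebooks.append(json.load(f))
    merged, clean = merge_notebooks(*notebooks)
    # git expects the merged result to be left in the %A file.
    with open(ours_path, "w", encoding="utf-8") as f:
        json.dump(merged, f, indent=1)
        f.write("\n")
    return 0 if clean else 1  # non-zero tells git the file is still conflicted

if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

The point of the sketch is the contract, not the merge algorithm: because git delegates the whole merge to the driver, the unit of comparison can be a cell instead of a line.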
        
           | p1necone wrote:
           | Ahh right, so you still make manual git commits normally,
           | it's just that the Jupyter UI used to fall over when it
           | encountered merge conflict markers in source files. And now
           | it doesn't fall over any more and can nicely represent them
           | because the conflict markers are no longer done for
           | individual lines of text?
        
             | jph00 wrote:
             | Yes exactly :D
        
       | bsdz wrote:
       | I use this plugin for my jupyter notebook git integration. It has
       | a git diff option that's useful but gets very slow for complex
       | documents. Perhaps under the hood it's using one of the other
       | tools mentioned in the postscript.
       | 
       | https://github.com/jupyterlab/jupyterlab-git
       | 
       | Edit: Looking at the source, it does appear to use nbdime under
       | the hood.
        
       | wanderingmind wrote:
       | I thought jupytext solved this long ago with percent-format
       | python files. Since it's a python text file, you can run
       | automated formatting, linting, static type checking and git
       | version diffs. What new problem is being solved here?
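       | 
       | For reference, in jupytext's percent format a notebook is a
       | plain .py file where "# %%" comment lines start a new cell and
       | "# %% [markdown]" starts a markdown cell whose text is written
       | as comments. A rough sketch of splitting such a file back into
       | cells, assuming only those two markers (jupytext itself also
       | handles cell titles, metadata and other formats):

```python
import re

# A cell starts at a line beginning with "# %%"; "[markdown]" marks prose.
CELL_MARKER = re.compile(r"^# %%(.*)$")


def _finish(kind, lines):
    """Turn collected lines into a (kind, source) cell, or None if empty."""
    if kind == "markdown":
        # Markdown text is stored as comments; drop the leading "# ".
        lines = [l[2:] if l.startswith("# ") else l.lstrip("#") for l in lines]
    source = "\n".join(lines).strip("\n")
    return (kind, source) if source.strip() else None


def split_percent_script(text):
    """Split percent-format Python source into a list of (cell_type, source)."""
    cells, kind, lines = [], "code", []
    for line in text.splitlines():
        match = CELL_MARKER.match(line)
        if match:
            cell = _finish(kind, lines)
            if cell:
                cells.append(cell)
            kind = "markdown" if "[markdown]" in match.group(1) else "code"
            lines = []
        else:
            lines.append(line)
    cell = _finish(kind, lines)
    if cell:
        cells.append(cell)
    return cells


example = """# %% [markdown]
# # Load data

# %%
rows = [1, 2, 3]

# %%
total = sum(rows)
"""
```

       | Here split_percent_script(example) yields one markdown cell
       | and two code cells.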
        
       | liquids wrote:
       | I've used this library for a number of projects and it's a joy to
       | use. I don't think it's an overstatement to say it's
       | paradigm-shifting, to the extent that once you have your environment set
       | up, you are free to code, think, iterate, deploy and document
       | your projects all at 99% of the speed of thought.
       | 
       | There seems to be a lot of discussion in here around the pitfalls
       | of jupyter, and notebooks, and the poor coding practices of data
       | scientists. If you haven't read the article or used the software
       | I'd like to highlight that all of these (legitimate) complaints
       | are exactly what nbdev2 was created to address, and in my opinion
       | very successfully solves.
       | 
       | The way it works is that everything runs off a master notebook,
       | and then with one command: libraries are built, git diffs are
       | magically fixed, tests are run, documentation is automatically
       | created. It doesn't fundamentally change your workflow in any
       | way, it just abstracts and automates away all of these pain
       | points.
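       | 
       | The core of the "libraries are built" step can be illustrated
       | with a toy exporter: code cells whose first line is nbdev2's
       | "#| export" directive are concatenated into a module. The
       | function names are hypothetical, and everything else nbdev
       | does (default_exp targets, tests, docs and more) is left out;
       | this is a simplification, not nbdev's implementation.

```python
import json

EXPORT_DIRECTIVE = "#| export"


def export_cells(notebook):
    """Return module source built from code cells tagged with '#| export'."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        # nbformat allows "source" to be a string or a list of strings.
        src = cell["source"]
        src = "".join(src) if isinstance(src, list) else src
        first, _, rest = src.partition("\n")
        if first.strip() == EXPORT_DIRECTIVE:
            chunks.append(rest.strip("\n"))
    return "\n\n\n".join(chunks) + "\n" if chunks else ""


def export_notebook(nb_path, module_path):
    """Read an .ipynb file and write its exported cells to a .py module."""
    with open(nb_path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(module_path, "w", encoding="utf-8") as f:
        f.write(export_cells(nb))
```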
       | 
       | There's a reason that everyone uses jupyter notebooks. They are
       | fun to use, they are great for exploring and developing ideas.
       | And (minus the aforementioned git collaboration issues) they are
       | great for sharing with others, which is a huge part of the wider
       | data science ecosystem. We don't need to recommend avoiding
       | notebooks, and allege they are just for beginners. We need to use
       | tooling which addresses some of these final issues with writing
       | mature software. And I'd like to thank the authors of nbdev for
       | this.
       | 
       | The people who look down their noses at notebooks can continue to
       | do so - but what they will find is that nbdev quite effortlessly
       | leap-frogs over these sneered complaints, and allows you to write
       | better software more productively.
        
         | no_identd wrote:
         | Okay, now gimme a PowerShell version of it.
        
       | boringg wrote:
       | This actually works? Awesome - never really thought about how
       | dysfunctional git is with jupyter - I always assumed that it just
       | didn't work. Nice to have someone fix the problem that I just
       | lived with :)
        
       | HuwFulcher wrote:
       | Whenever I can I strongly recommend not using Jupyter for
       | anything more than the most transient tasks.
       | 
       | I don't know whether it's the Data Science culture or Jupyter but
       | there is a big lack of discipline in writing maintainable code in
       | DS and non-existent git support is part of that.
       | 
       | I always strongly discouraged developing models using notebooks,
       | instead advocating for using .py files and then using notebooks
       | for sanity checking data.
       | 
       | I don't have any clever ideas for how we can move past Jupyter
       | but the sooner we do the better.
        
         | maegul wrote:
         | Yes, the Data Science culture around maintainable code does
         | seem to be reaching a critical level of toxicity (in some
         | environments at least).
         | 
         | In line with a nephew comment of mine, I feel that bringing the
         | immediate interactivity or iteration cycle of notebooks to the
         | development experience would help a lot, and not be too bad a
         | thing for common development either.
         | 
         | I've heard of the related nbdev project, which seems like an
         | interesting and compelling idea. But it'd be nice to see the
         | reverse: something that makes ordinary python development more
         | immediate than using a debugger/vanilla REPL.
        
           | Leo_Germond wrote:
           | I think that improving the shell experience and allowing e.g.
           | multimedia content to be displayed and manipulated directly
           | in the shell would help a lot with interactivity. Maybe
           | some specific terminal emulator (like kitty) with ipython
           | would constitute a good starting point...
        
         | tetris11 wrote:
         | It depends how you use it. When you're still new to the data,
         | using a freeflow text-and-codeblock workflow like jupyter or
         | org-mode really speeds up the exploration phase.
         | 
         | Once you have a consistent set of questions, and methods to
         | answer them, then yes, copy off the relevant chunks into their
         | own scripts and source these when using similar data to bring
         | you up to speed, and modify them to your tastes.
         | 
         | The issue with starting off with an external script initially
         | is the distracting temptation to refine your code so it can be
         | better used with future data, despite not yet having seen or
         | not knowing what that future data is like. The initial "play
         | and explore" phase of an analysis is very important imo, and
         | notebooks really facilitate that.
        
           | HuwFulcher wrote:
           | I agree; Jupyter has its place in helping with exploration and
           | learning. A problem that Data Science faces is that the
           | majority of courses don't show Data Scientists how to
           | progress on from notebooks to write robust training pipelines
           | that are reproducible and safe.
        
         | cantagi wrote:
         | Yes, people writing unmaintainable code in Jupyter notebooks is
         | a problem.
         | 
         | Personally, I start every notebook with
         | 
         |     %load_ext autoreload
         |     %autoreload 2
         | 
         | then develop production-quality code in .py files.
        
           | etrautmann wrote:
           | I didn't realize anyone didn't do this. Totally essential,
           | great point!
        
           | shapefrog wrote:
           | Well that has improved my life - thanks!
        
         | pplonski86 wrote:
         | Low-quality Data Science code is not Jupyter's fault.
         | 
         | Jupyter allows you to load a big chunk of data or a large
         | model only once, and then use it for experiments in other
         | cells. It is hard to replace this feature with a plain `*.py`
         | file. For me, this is the killer feature.
        
         | mFixman wrote:
         | Jupyter is not great for collaboration with multiple people
         | editing, but with a little bit of order it's perfect for in-
         | person working and presenting that work.
         | 
         | Notebooks can be clean if you follow some rules:
         | 
         | 1. Code flow always goes down: holding Shift+Enter should
         | execute all cells without any errors. Don't do `x += 1` if `x`
         | is defined underneath.
         | 
         | 2. All blocks are idempotent: running any block 5 times should
         | produce the same result as running it 1 time. Don't do `x += 1`
         | unless `x` is defined in that block.
         | 
         | 3. Keep block-local variables short and block-global variables
         | long. Don't do `x += 1` unless you are not using `x` anywhere
         | else.
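         | 
         | Rule 2 can be checked mechanically. A small sketch, assuming
         | a cell is a string of Python source and using exec on a plain
         | dict as a stand-in for the kernel's namespace: it compares the
         | state after one run of a cell with the state after five runs,
         | both starting from the same setup code.

```python
def is_idempotent(cell_source, setup="", runs=5):
    """True if running a cell `runs` times leaves the same state as running it once.

    Values are compared with ==, so cells that define functions or classes
    (compared by identity) would need a smarter comparison than this.
    """
    def state_after(n):
        ns = {}
        exec(setup, ns)            # earlier cells, run once
        for _ in range(n):
            exec(cell_source, ns)  # the cell under test, run n times
        # Ignore dunder names such as the injected __builtins__.
        return {k: v for k, v in ns.items() if not k.startswith("__")}

    return state_after(1) == state_after(runs)


# Breaks rule 2: x comes from another cell and is mutated here.
print(is_idempotent("x += 1", setup="x = 0"))   # False
# Follows rule 2: x is (re)defined in the same cell before the update.
print(is_idempotent("x = 0\nx += 1"))           # True
```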
         | 
         | Also, the Table of Contents extension [1] is a life-saver for
         | making long analyses workable.
         | 
         | [1] https://jupyter-contrib-nbextensions.readthedocs.io/en/lates...
        
           | analog31 wrote:
           | I have a rule that helps with hidden state and out-of-order
           | execution. Once in a while I do a "restart kernel and run all
           | cells." If doing that breaks anything, then I have to fix it.
           | But it also ensures that a notebook is reproducible later on.
           | Of course I don't have things that take hours to run.
           | 
           | It would be nice if there were something that would make out-
           | of-order problems light up, the way that code editors can
           | highlight errors while you're editing. A limitation of
           | "browser as editor" is that it misses out on some of the
           | powerful things that code editors do today.
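           | 
           | A crude version of that signal can be read straight from the
           | saved file: every executed code cell stores an
           | execution_count, so counts that don't increase from top to
           | bottom mean cells were run out of order. A sketch, assuming
           | the notebook has already been loaded as a dict with
           | json.load; it cannot see hidden state left behind by deleted
           | cells.

```python
def out_of_order_cells(notebook):
    """Return indices of code cells whose execution_count is not greater than
    that of the previous executed code cell (never-run cells are skipped)."""
    flagged, last = [], None
    for i, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        count = cell.get("execution_count")
        if count is None:
            continue  # never executed, or outputs were cleared
        if last is not None and count <= last:
            flagged.append(i)
        last = count
    return flagged
```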
           | 
           | Another thing is to put things in functions, so temporary
           | variables are disposed of. That's a halfway step to putting
           | things in .py files. The benefit of .py files is not so much
           | that jupyter is bad, but that variable scoping is good
           | hygiene.
        
           | n8henrie wrote:
           | I try to follow these ideas, and in many of my notebooks I
           | test them by periodically running "restart kernel
           | and run all cells" from the menu, which tends to point out
           | anything I've accidentally moved or run out of order.
           | 
           | As a table of contents, I usually write some markdown up top
           | with links to markdown HTML anchors elsewhere in the page,
           | which themselves also have a link back to the TOC. Works
           | pretty well. Will have to check out that extension.
        
           | jononor wrote:
           | Good rules. I will add some of mine:
           | 
           | 4. Use a function in each block that is defined and then
           | called with appropriate arguments, and returns the values of
           | the block. This prevents proliferation of global state, and
           | the functions are really easy to move out to .py modules when
           | things have solidified a bit.
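           | 
           | A small illustration of rule 4, with hypothetical names and
           | stand-in data: the cell defines a function, calls it once,
           | and only the returned value lands in the notebook's global
           | namespace, so temporaries like kept never leak into later
           | cells. Re-running the cell gives the same result, and the
           | function can later move into a .py module unchanged.

```python
# --- cell: clean the raw measurements ---
def clean_readings(raw, low=0.0, high=100.0):
    """Drop missing and out-of-range values; temporaries stay local."""
    kept = [r for r in raw if r is not None]
    cleaned = [r for r in kept if low <= r <= high]
    return cleaned


RAW_READINGS = [12.5, None, 47.0, 130.2, 88.1]  # stand-in for loaded data
CLEAN_READINGS = clean_readings(RAW_READINGS)
```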
        
           | z3c0 wrote:
           | Agreed on all points. Notebooks really aren't that hard to
           | maintain. They just require some slightly different rules
           | from standard scripts.
           | 
           | Personally, I like to name block-global variables in upper
           | case (like PEP 8 constants), so as to make them easy to spot.
           | Being formatted like constants also causes me to think twice
           | about altering them after instantiation.
        
           | cinntaile wrote:
           | It would be great to have tools available that force these
           | rules on you.
        
           | da39a3ee wrote:
           | > with a little bit of order it's perfect for in-person
           | working
           | 
           | It's not perfect for in-person working because a single
           | person should always keep their work under version control,
           | and they should be able to view meaningful diffs to
           | understand the history.
        
         | tuukkah wrote:
         | > _non-existent git support_
         | 
         | From the beginning of the article: " _With nbdev2, the
         | Jupyter+git problem has been totally solved. It provides a set
         | of hooks which provide clean git diffs, solve most git
         | conflicts automatically, and ensure that any remaining
         | conflicts can be resolved entirely within the standard Jupyter
         | notebook environment._ "
        
         | qsort wrote:
         | One of my most upvoted comments says something to the effect of
         | "notebooks bad", so you're preaching to the choir here --
         | however:
         | 
         | - I work with several people who are purely data scientists,
         | and I lean towards blaming "culture" rather than "Jupyter". In some circles,
         | probably influenced by academia, programming is considered to
         | be low status work. You are not going to solve the problem by
         | switching to .py files, even though for most tasks _literally
         | anything_ is better than Jupyter.
         | 
         | That they don't use git, or that git wasn't originally even a
         | concern, is a consequence of that low-status perception. If you
         | pitched something to the developer community and told them "oh,
         | by the way, you can't use git", they'd synthesize tomatoes out
         | of thin air to throw at you.
         | 
         | - I'd carve an exception for stuff that satisfies ALL of the
         | following: (a) is self-contained in a single notebook, (b) has
         | no dependencies on anything non-standard, (c) is demonstrative
         | in nature, or a personal exercise rather than production
         | software. For example, I wrote my solutions to the Advent of
         | Code problems in a notebook and I liked the experience,
         | especially how you could mix math and code.
        
           | HuwFulcher wrote:
           | I would also lean more towards culture, and the environment
           | that Jupyter provides only perpetuates it. I think what made
           | me leave Data Science in the end was that I wasn't driven by
           | work outside of the notebooks (i.e. coming up with a
           | mathematically superior solution) but driven by building ML
           | driven systems as a whole.
           | 
           | I think notebooks are a great way of presenting findings and
           | showing your workings at the same time. If that was their
           | main use, then I wouldn't have any issues.
        
           | gradschoolfail wrote:
           | I would say that in academia, explorability and immediacy are
           | prioritized way over reproducibility and maintainability.
           | Two kinds of human beings?
        
             | qsort wrote:
             | The incentives in academia are different. The objective is
             | to publish, code is important only insofar as it allows you
             | to achieve that goal. This is not to say that academics
             | can't code, but even if you are a professor who cares
             | passionately about making high-quality software, you're
             | fighting uphill, because that's not what you're being
             | evaluated on.
             | 
             | If you want to make the argument about "two kinds of
             | people", I think it's more about A-type/B-type data
             | scientists in the industry. I'm really mostly a developer
             | and not a data scientist, but when I assist in DS tasks I
             | wear a distinct B-type hat, and that informs my
             | perspective. A-type people have different priorities and
             | that's fine; my gripe is when you try to import A-type
             | practices in a B-type scenario.
        
         | wasimlorgat wrote:
         | I'm always surprised when people advocate for .py files over
         | notebooks because of poor software practice. (Genuine question)
         | have you found that it improves the situation at all?
        
           | HuwFulcher wrote:
           | I've found varied success. In general, I've encouraged the
           | move across by teaching source control. That has been
           | in contexts where notebooks are being used for critical
           | outputs rather than exploration.
           | 
           | When you get into MLOps as well, having .py templates
           | actually makes the Data Scientist's job easier as they can
           | plug and play their models into a system that tracks inputs,
           | outputs and changes for them.
        
         | jhrmnn wrote:
         | I think of Jupyter Notebooks as scratch paper on my desk. It's
         | not to archive things, it's for developing ideas. Once ideas
         | are developed, I transfer them to a long-term medium (LaTeX or
         | Markdown document, Python source file, etc).
        
           | ajford wrote:
           | Yep. I worked in scientific applications, and when developing
           | some new data cleaning and processing pipelines for our
           | hydrology data, Jupyter was phenomenal.
           | 
           | It was easy to use as a presentation, with figures and plots
           | embedded. With controls enabled, you could demonstrate what
           | varying certain parameters would do and pitch proposed
           | cleaning profiles.
           | 
           | I was rather easily able to send a directory and its
           | notebooks/data sources to colleagues in the water sciences
           | team so they could validate my results on their own (they
           | were luckily also familiar with Python and Jupyter), and
           | caught some minor bugs in the pipeline.
           | 
           | This was all much more collaborative and concise, and I feel
           | Jupyter played a huge part in it.
           | 
           | Once it was done, it and a "final draft" pdf were added to
           | the Docs in the repo and the pipeline was written out into a
           | full application of its own.
        
           | moonshotideas wrote:
           | Same, it's perfect for "work in progress code", and working
           | out a problem step by step. I've always wanted this
           | environment in other languages.
        
         | jstx1 wrote:
         | My workflow is:
         | 
         | 1. Experiments in notebooks. Notebooks are saved under git but
         | mostly as a backup, I don't care how nicely they play together.
         | I don't get why you would discourage notebooks for running
         | experiments, doing it with .py files sounds kind of miserable.
         | 
         | 2. Services and library code in .py files, under version
         | control, just like any other software we write.
        
           | HuwFulcher wrote:
           | Experiments using notebooks are fine as long as they are well
           | documented.
           | 
           | Having your services and library code as .py files you can
           | import in is great.
           | 
           | The issue comes with how to move from experimentation to
           | deployment. If you already have services/library code as .py
           | files you make your life a lot easier. The issue comes when
           | everything is spread across multiple, poorly documented
           | notebooks. If you're working with an MLOps team it makes
           | their life a nightmare to take those notebooks and conform
           | them into something usable.
           | 
           | Jupyter is great when it is used in the right way.
        
             | targafarian wrote:
             | 100% agreed.
             | 
             | People use a great tool in a poor way and then broadly
             | condemn the tool.
             | 
             | And any tool that is sufficiently flexible to be broadly
             | useful can be used in very poor ways.
             | 
             | Jupyter is great; it gets me over the activation barrier for
             | starting a task every time. I build and prove out an
             | algorithm/task piece by piece. Once I'm happy, I move the
             | meat of it to a function in a .py file, and move the code I
             | used to test the algorithm to a unit test function. Delete
             | the duplicated bits and replace with imports, and then what
             | remains is a tutorial/demonstrator notebook using the
             | function I wrote and maybe some nice plots to go along with
             | that, that I wouldn't put in a unit test (nor that show up
             | in docstrings). This can be converted to sphinx docs if the
             | code gets big enough.
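             | 
             | A sketch of that promotion step, with hypothetical names:
             | the function prototyped in a cell moves into a module, and
             | the quick checks typed into the notebook become a unittest
             | case. Both are shown in one block so it runs standalone;
             | in practice they would live in separate files (say,
             | smoothing.py and test_smoothing.py, run with
             | `python -m unittest`).

```python
import unittest


def moving_average(values, window):
    """Trailing moving average; the 'meat' moved out of the notebook."""
    if window <= 0:
        raise ValueError("window must be positive")
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        if i >= window - 1:
            out.append(running / window)
    return out


class TestMovingAverage(unittest.TestCase):
    # These were the quick checks originally typed into notebook cells.
    def test_simple_window(self):
        self.assertEqual(moving_average([1, 2, 3, 4], 2), [1.5, 2.5, 3.5])

    def test_window_equal_to_length(self):
        self.assertEqual(moving_average([2, 4, 6], 3), [4.0])

    def test_rejects_bad_window(self):
        with self.assertRaises(ValueError):
            moving_average([1, 2], 0)
```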
             | 
             | What a great tool for incrementally building software! In
             | my world, I build brick by brick, not all at once. Jupyter
             | is a key to that process.
        
         | montebicyclelo wrote:
         | The big benefit of Jupyter in the context of machine learning,
         | is that you are often dealing with models that take quite a few
         | seconds to load. You can put big, slow loading, things into
         | memory in the top cells, then try a bunch of logic with them
         | below. Whereas when working with just '.py' scripts, you'd have
         | to reload the model every time, which can make for slow and
         | uncomfortable iteration.
        
           | infinityio wrote:
           | not a complete solution, but PyCharm and VSCode both support
           | using `# %%` to split a python script into 'cells' (stolen
           | from matlab?), which can then be executed individually/repeatedly.
        
           | nidnogg wrote:
           | One alternative to loading models in .py scripts is making
           | use of joblib's dump() and load() functions for pipelines.
           | https://joblib.readthedocs.io/en/latest/generated/joblib.dum...
           | 
           | That way, if you put your classifiers in (scikit-learn)
           | pipelines, once you're done with the fitting steps you can
           | export your trained classifier with:
           | 
           |     joblib.dump(pipe, "trained_classifier.dump")
           | 
           | And resume your work with:
           | 
           |     pipe = joblib.load("trained_classifier.dump")
           | 
           | Since this works for any picklable Python object, a lot of
           | heavy lifting can be exported for later (swift) use this way.
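           | 
           | joblib.dump()/load() accept any picklable object, and the
           | same "compute once, reload from disk afterwards" idea can be
           | sketched with only the standard library's pickle module. The
           | helper name cached and the file names are hypothetical, and
           | the cache is never invalidated automatically: delete the file
           | when the code that produces the value changes.

```python
import os
import pickle


def cached(path, compute):
    """Return the object stored at `path`, or call compute() and store it there.

    Only load pickle files you created yourself: unpickling can run arbitrary code.
    """
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    value = compute()
    with open(path, "wb") as f:
        pickle.dump(value, f, protocol=pickle.HIGHEST_PROTOCOL)
    return value
```

           | Usage would look like
           | model = cached("model.pkl", lambda: fit_model(train_data)),
           | where fit_model and train_data stand in for your own code.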
        
           | tcpekin wrote:
           | The way I get around this is to start an IPython interpreter,
           | and run .py files with `run -i file1.py`. This loads things
           | into memory in the interpreter, and then I can run file2.py
           | with the actual analysis, and iterate with file2.py until I'm
           | happy. In the end, you can keep the files separate, or
           | combine them into 1 file that will run top to bottom your
           | whole analysis. As long as you keep the IPython session open
           | everything remains in memory, just like in a notebook. The
           | autoreload magic also works if you set it to the correct
           | option, so if you are working on a library/package it will
           | automatically reload them if necessary.
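           | 
           | Outside IPython, a rough approximation of `run -i` can be
           | sketched with the standard library's runpy.run_path, which
           | executes a file with a given set of starting globals and
           | returns the resulting namespace. Carrying that namespace
           | between calls mimics the long-lived session; the Session
           | class is a hypothetical helper, and unlike IPython the state
           | only lasts as long as the Python process running it.

```python
import runpy


class Session:
    """Keeps one namespace alive across script runs, loosely like `%run -i`."""

    def __init__(self):
        self.ns = {}

    def run(self, path):
        # init_globals seeds the script with everything defined so far;
        # run_path returns a copy of the script's final globals, which we keep.
        result = runpy.run_path(path, init_globals=self.ns)
        self.ns.update(result)
        return self.ns
```

           | The idea is to call session.run("file1.py") once for the slow
           | loading, then session.run("file2.py") repeatedly while
           | editing file2.py.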
        
           | HuwFulcher wrote:
           | Yes that's a big plus of notebooks. Hopefully a solution can
           | be found for .py files in future where you can earmark the
           | top part of the script to be cached so the interpreter skips
           | over it.
        
             | akx wrote:
             | Where would an interpreter cache things if it's not running
             | anymore? The disk? You're back to loading data from disk.
        
               | HuwFulcher wrote:
               | Yep, I don't know enough about the interpreter under the
               | hood but an interactive mode like a debugger where you
               | can go back to a previous line, etc might be the
               | solution. I doubt that's high on the priorities of the
               | Python team though.
        
         | bobbruno wrote:
         | You're thinking about it the wrong way. Notebooks don't do well in
         | software development, but they are extremely useful on
         | exploratory data analysis and quick iteration when searching
         | for a suitable modeling approach. These two tasks use code, but
         | for completely different purposes. A DS is working on the data,
         | understanding it and trying to identify what information it may
         | have. Then they try to find a model that will leverage that
         | information to deliver whatever inference solves the business
         | need. This is extremely interactive and iterative, and
         | everything from the actual business problem to the ML approach
         | may change at each iteration. Imposing software development
         | practices at this point is disruptive to the train of thought,
         | which is very burdened already by the level of uncertainty and
         | all the mathematics required to understand the data results.
         | The goal is to find a viable approach, not write production
         | code.
         | 
         | Once this approach is found, a good clean-up/refactor is
         | strongly recommended, to then start a proper software
         | development that will create a live product from the found
         | approach. I call this the switch between research mode and
         | development mode, and it has strong parallels to the way R&D is
         | done in many industries. I believe a lack of understanding of
         | this dual nature of ML is what causes many of the problems in
         | MLOps: plans that don't take into account the research time and
         | risk, mixed teams where engineers don't understand the initial
         | nature of DS work, attempts to put notebooks containing
         | research code in production, etc. Even planning for the
         | refactor doesn't solve it all - what will happen when the next
         | generation of a model has to be created? Will the refactored
         | code be forced on the DS and ruin their research productivity?
         | Will they start from scratch again and not only lose all the
         | refactor/dev cost but also make this a recurring cost? I have
         | been looking for answers for this for years now, and found none
         | so far.
         | 
         | Source: I've been working with data for 27 years, as a data
         | engineer, data architect and data scientist. When I do DE, my
         | code is considered high quality by my peers, but when I'm doing
         | DS research, I know I write bad code - and I won't change that.
         | It's more productive to work this way and do the big refactor
         | (possibly leaving the notebook env behind along the way) than
         | the alternative.
        
         | scombridae wrote:
         | _not using Jupyter for anything more than the most transient
         | tasks_
         | 
         | While most programmers have reached this conclusion, they're
         | generally not day-in day-out jupyter users. They need to
         | understand *everything* is transient for scientists who
         | optimize for proof-of-concept and publish-and-forget-it paper
         | writing.
        
           | frumiousirc wrote:
           | > *everything* is transient for scientists who optimize for
           | proof-of-concept and publish-and-forget-it paper writing.*
           | 
           | Which itself is a huge problem.
           | 
           | Happily, this mindset is changing, at least in some scientific
           | fields. For example, in particle physics proposals a
           | document (a "data management plan") must be written describing
           | how that unconscionable attitude will not be taken with the
           | experiment's data and software. That said, this transient
           | mindset and derision of real software skills is still fairly
           | prevalent in this field.
        
             | scombridae wrote:
             | _Which itself is a huge problem_
             | 
             | More "nature of the beast" in my opinion. Science measures
             | itself by how many alluring women it can date; engineering,
             | by how long it can keep the wife happy.
        
         | LeanderK wrote:
         | If you work with something visual, interactive then this
         | workflow is so super awkward that I never end up doing it. For
         | data-driven workflow you have to analyse the data, note down
         | your thoughts, analyse a bit more and then come to a
         | conclusion. Your conclusion might be code living in .py files,
         | or another type of data then consumed by something else. But
         | this will result in a significant part of the "thought-process"
         | and relevant code living in those notebooks, with all their
         | problems. I can't just switch to some .py files because I want
         | to change the axis for some plot, or look at it in log-scale.
         | But then where do you draw the line? A .py file for only 10
         | lines of code generating the resulting .csv? That's also a pain
         | to maintain because you have all those disconnected files. We
         | need those notebooks, they have to get better.
        
         | medo-bear wrote:
         | i strongly agree with what you are saying about Jupyter,
         | however i strongly disagree about using netobooks in general
         | (literal programming)
         | 
         | one of the key things that a good notebook system must allow
         | you to do is to mix something like markup format + LaTeX +
         | source code. writing math-heavy documentation and explanations
         | is simply impractical and limited (readability suffers) if done
         | in comments. jupyter however is severely limited as it is
         | unreadable in its raw format and therefore does not play well
         | with a version control system such as git
         | 
         | instead there is a solution that allows one to do everything
         | jupyter does well, with the additional benefit that it plays
         | with version control really well - ie _org-mode_ [1]. the only
         | difference is that instead of using a browser to interact with
         | it, you use emacs. the added benefit to this is that you can
         | also use full-featured key bindings (emacs / vim) and even
         | integrate a language server for auto-completion [2]
         | 
         | EDIT: moreover the list of supported languages in orgmode far
         | exceeds that of jupyter [3] (or did the last time i made this
         | comparison)
         | 
         | [1] https://orgmode.org/
         | 
         | [2] https://emacs-lsp.github.io/lsp-mode/manual-language-docs/ls...
         | 
         | [3] https://orgmode.org/worg/org-contrib/babel/languages/index.h...
        
           | kgwgk wrote:
           | > using netobooks in general (literal programming)
           | 
           | I guess you meant literate programming.
           | 
           | Literate programming is different from interleaving console
           | inputs and outputs and random paragraphs in the same
           | document.
           | 
           | Even if we expand the original idea to comprise that,
           | literate programming is much more than that.
        
             | medo-bear wrote:
             | yeah it was supposed to say literate programming. anyway
             | there is no doubt that org-mode (and jupyter) is an
             | application of literate programming concepts. See
             | https://en.wikipedia.org/wiki/Literate_programming#Literate_...
        
               | kgwgk wrote:
               | But programming is more than writing notebooks.
               | 
               | How many python packages are written in literate
               | programming style?
               | 
               | How many programs written as notebooks would actually be
               | better if they were structured differently?
        
               | medo-bear wrote:
               | ? i never said that it's a one-size-fits-all solution.
               | certainly you would not write a software package in a
               | notebook. but you might write a tutorial, textbook,
               | academic paper, homework, personal notes, etc.
        
               | kgwgk wrote:
               | My point was that a comment against notebooks being
               | overused - where a different structure would make more
               | sense - is not necessarily a comment against literate
               | programming.
               |
               | The issues with notebooks - in general - are unrelated to
               | literate programming. The notebook format is convenient
               | for some kind of "interactive" programming, though,
               | rather than "literate" programming.
        
               | medo-bear wrote:
               | > The notebook format is convenient to have some kind of
               | "interactive" programming though, rather than "literate"
               | 
               | interactive programming is usually handled by the repl,
               | for which you do not need a notebook
        
               | kgwgk wrote:
               | Of course you don't! The notebooks are glorified repls
               | and you can also have literate programming without
               | interactive notebooks. What notebooks get you compared to
               | alternatives is both things at the same time.
        
               | medo-bear wrote:
               | my point is similar but restricted to jupyter. i think
               | that org-mode can offer a much more advanced and
               | complete literate programming environment than jupyter,
               | one that goes far beyond just markdown + repl
        
               | kgwgk wrote:
               | Agreed.
               | 
               | Note how babel is presented, by the way (last point in
               | particular):
               |
               |     Babel augments Org code blocks by providing:
               |
               |     - interactive and programmatic execution of code blocks;
               |     - code blocks as functions that accept parameters, refer
               |       to other code blocks, and can be called remotely; and
               |     - export to files for literate programming.
               |
               | https://orgmode.org/worg/org-contrib/babel/intro.html
        
               | bobbruno wrote:
               | Have you considered that the notebook is an evolution of
               | a repl, with improved visualization and feedback, for
               | analysis-heavy work? The problem starts when notebooks
               | are used for development and production.
        
           | bobbylarrybobby wrote:
           | I think the solution is quarto:
           | https://github.com/quarto-dev/quarto-cli
        
           | Grumbledour wrote:
           | There is always a lot of org-mode promotion on here when the
           | topic is interactive notebooks. And I get it, people love it
           | and it solved many of the problems other systems have. But
           | org-mode users need to understand that the one thing holding
           | org-mode back is simply emacs. I know you probably all love
           | it, but everyone else is not interested in breaking of their
           | fingers by learning obscure key command chains just to use
           | org-mode. Sorry, but that is just the reality. If someone can
           | implement the majority of org-mode in a better editor, there
           | might be more users interested. But as it stands, it's just
           | too much of a hassle.
        
             | natrys wrote:
             | I expect one of the main reasons someone could evangelise
             | Emacs for is the fact that defaults don't mean much when
             | it's all configurable. So if you don't like the keys, just
             | bind them to whatever you like. That's like the fundamental
             | ethos of everything in Emacs. Also, CUA-mode exists.
             | 
             | If org-mode wasn't backed by Emacs, it would merely be a
             | markdown substitute hence much less useful. There are many
             | org-mode clones for modern editors like neovim or VSCode,
             | except all they offer is front-end features (highlighting,
             | folding, node manipulation etc). There is simply no reason
             | to use those over a decent markdown editor. So I think you
             | have this backwards; Emacs isn't holding back org-mode,
             | rather, it is precisely because it builds on Emacs that many
             | of org-mode's advanced features are possible and distinctive.
        
               | Majromax wrote:
               | > I expect one of the main reasons someone could
               | evangelise Emacs for is the fact that defaults don't mean
               | much when it's all configurable. So if you don't like the
               | keys, just bind them to whatever you like.
               | 
               | Configurability is a strength of a system, but it is not
               | an answer to a difficult learning curve. A user must
               | first understand the system in order to configure it
               | appropriately.
               | 
               | Even at the level of key bindings, the user needs to
               | understand the relative frequency and importance of an
               | operation to choose an appropriate key combination.
               | Universal reconfiguration may even make the system less
               | learnable, if documentation and tutorials can't assume a
               | reasonable default configuration.
               | 
               | In my opinion, configuration is great as one of the final
               | steps of a user's journey, taking the system from
               | something that works to something that _sings_. It's
               | just the wrong level to sell benefits to beginners.
        
               | medo-bear wrote:
               | > Even at the level of key bindings, the user needs to
               | understand the relative frequency and importance of an
               | operation to choose an appropriate key combination.
               | Universal reconfiguration may even make the system less
               | learnable, if documentation and tutorials can't assume a
               | reasonable default configuration.
               | 
               | i have a feeling that people who write these things have
               | never really tried emacs beyond opening it and getting
               | annoyed that ctrl-c/v/x don't work (at first) the way
               | they are used to
               | 
               | emacs is not key-binding-based, it is command-based. if
               | you change a key binding, it's not like you can wreck
               | anything, as you can always call the command prompt with
               | M-x and search for the command that you wanted some key
               | binding to perform. key-bindings are just shortcuts to
               | commands, so i think it's best to listen to your fingers,
               | form muscle memory, and then assign them
               | 
               | what are your most basic commands? copy, paste, select,
               | start/end of line/function/class/paragraph/etc, move by
               | word/sentence/etc, save, exit? these are not that many to
               | set to whatever key combinations you want. i wish my
               | browser had at least this level of extensibility
        
               | medo-bear wrote:
               | > it's all configurable
               | 
               | also the ecosystem is huge and chances are that the
               | configuration you are after is just a package-install
               | away
        
             | medo-bear wrote:
             | > I know you probably all love it, but everyone else is not
             | interested in breaking of their fingers by learning obscure
             | key command chains just to use org-mode. Sorry, but that is
             | just the reality
             | 
             | i'm sorry to burst your strongly held convictions, but you
             | can choose any of the following:
             |
             | a) use any key-bindings you like, including emacs, vim, cua,
             | or a combination of them
             | 
             | b) use org-mode without any knowledge of more advanced
             | emacs commands (except basic knowledge of using an editor)
             | 
             | c) drink some milk (gotta have strong bones) and learn how
             | to use the emacs system including emacs lisp and have one
             | of the most advanced computing environments in existence at
             | your service
             | 
             | sorry, but that is just the reality
        
               | scombridae wrote:
               | About 1000 times per day, someone says emacs is too big
               | an ask for org-mode, and someone replies, it's
               | configurable to feel like whatever you're used to.
               | 
               | The latter needs to accept that most users, particularly
               | scientists, reject out-of-hand anything requiring
               | configuration or compilation no matter how trivial.
               | 
               | But it's all moot since org-mode is largely promoted by
               | non-scientists (computer science is not a science), and
               | should a WYSIWYG-inclined scientist ever get past the
               | emacs obstacle, he'll balk at the awkward BEGIN_SRC
               | incantations.
        
               | medo-bear wrote:
               | > But it's all moot since org-mode is largely promoted by
               | non-scientists (computer science is not a science)
               | 
               | My academic training is in physics and mathematics. I was
               | introduced to programming in my computational physics
               | class. We used emacs as our editor
        
               | dtech wrote:
               | Congratulations on being special; 99% of academia uses
               | Matlab, a simple GUI text-editor or something like
               | Anaconda.
               | 
               | Emacs is such a fundamentally different paradigm from all
               | other IT tools/editors that it just doesn't make sense to
               | recommend a specialized tool with a steep learning curve
               | and non-transferable skills when it's not ubiquitous and
               | more standard alternatives exist without it which do OK.
               | It doesn't matter that emacs was historically the first
               | and everyone else decided to go in different directions,
               | that's just the reality of today.
        
               | medo-bear wrote:
               | > non-transferable skills
               | 
               | i'm curious which transferable skills you think Jupyter
               | has
               | 
               | > Emacs is such a fundamentally different paradigm from
               | all other IT tools/editors
               | 
               | it's not. you use a mouse and click where you want your
               | pointer to go. you use a keyboard to type
               | 
               | > steep learning curve
               | 
               | this is very much like saying that linux has a steep
               | learning curve and you refuse to touch ubuntu because you
               | are scared to blow up your computer
        
               | scombridae wrote:
               | _much like saying that linux has a steep learning curve_
               | 
               | Most people do say that. What programmers cannot grasp is
               | progress is about giving people what they want, not what
               | is rationally best. Most scientists want to knock out a
               | paper or presentation, and make it home on time for
               | dinner.
        
               | medo-bear wrote:
               | > Most scientists want to ...
               | 
               | are you their union rep?
        
               | BeetleB wrote:
               | > But it's all moot since org-mode is largely promoted by
               | non-scientists (computer science is not a science)
               | 
               | I think you're just suffering from selection bias, given
               | that this is on HN.
               | 
               | Add me to the list of people who began using Emacs and
               | Org mode during academia in a non-CS program.
               | 
               | Furthermore, go look at the Emacs conference - you'll
               | find a significant number of speakers are not CS folks.
        
               | avgcorrection wrote:
               | Is astronomy a science?
               | 
               | https://www.youtube.com/watch?v=WgyRdnjRI4o
        
             | avgcorrection wrote:
             | > I know you probably all love it, but everyone else is not
             | interested in breaking of their fingers by learning obscure
             | key command chains just to use org-mode. Sorry, but that is
             | just the reality. If someone can implement the majority of
             | org-mode in a better editor, there might be more users
             | interested.
             | 
             | https://www.spacemacs.org/
        
               | liotier wrote:
               | I would even settle for a wiki with state-of-the-art
               | outlining shortcuts and dates as a first-class dimension.
        
             | BeetleB wrote:
             | > But org-mode users need to understand that the one thing
             | holding org-mode back is simply emacs. I know you probably
             | all love it, but everyone else is not interested in
             | breaking of their fingers by learning obscure key command
             | chains just to use org-mode.
             | 
             | We get it; what makes you think we don't? We are merely
             | pointing out a superior solution.
             |
             | Like back in 2004, I would tell people how many of their
             | problems would be resolved if they switched to Linux. Fast
             | forward nearly two decades: the statement is still true, and
             | most people still don't use Linux. But it wasn't a
             | problematic thing to point it out to them - be it in 2004
             | or now.
             | 
             | (It's a lot easier to use Emacs than switch to Linux.)
        
         | operator-name wrote:
         | Others have mentioned the usefulness of literate programming so
         | I won't reiterate that.
         | 
         | Partially the lack of discipline comes from the implicit data
         | dependencies between cells. Variables are all globally scoped,
         | and unless you ensure the notebook can be run top to bottom,
         | it's easy to introduce subtle bugs. I believe Julia's
         | https://github.com/fonsp/Pluto.jl solves this issue quite well.
         | 
         | Another part comes from cells that should really be functions.
         | In my opinion this is because functions are second-class citizens
         | compared to cells, and could be improved with UI (function
         | cells? node based programming?).
         | 
         | Programming is more than just manipulating text, so why
         | shouldn't tools move beyond just being fancy text
         | editors?
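A minimal Python sketch of the hidden-state problem described above. It is not how Jupyter or Pluto.jl work internally; it only mimics a kernel by running each cell's source (the hypothetical `CELLS` list) with `exec` against one shared namespace. A notebook that works in the order you happened to run its cells can then still fail when executed fresh from top to bottom. The helper names `run_cells` and `runs_top_to_bottom` are illustrative, not part of any library.

```python
# Hypothetical three-cell notebook: each string stands in for one cell's source.
CELLS = [
    "data = [3, 1, 2]",
    "total = sum(data) + offset",  # reads `offset`, which only a later cell defines
    "offset = 10",
]


def run_cells(cells, order, namespace=None):
    """Run cells in `order` against one shared global namespace, the way a
    kernel keeps every variable alive between cell executions."""
    ns = {} if namespace is None else namespace
    for index in order:
        exec(cells[index], ns)
    return ns


def runs_top_to_bottom(cells):
    """Return True if the cells succeed in a fresh namespace, in file order
    (roughly what "restart the kernel and run all cells" checks)."""
    try:
        run_cells(cells, range(len(cells)))
    except Exception:
        return False
    return True


# Interactive session: the user happened to run the third cell before the second.
session = run_cells(CELLS, [0, 2, 1])
print(session["total"])           # 16
print(runs_top_to_bottom(CELLS))  # False: `offset` is undefined when cell 2 runs
```

A check like `runs_top_to_bottom` enforces the "can be run top to bottom" discipline after the fact; reactive notebooks such as Pluto.jl instead track which cell defines which variable, so execution order follows those dependencies rather than the order of clicks.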
        
         | lake_vincent wrote:
         | Oh man, thank you. I grew up with C++ and Java as my main
         | languages, so I always feel more at home with a py file.
         | 
         | Notebooks never caught on with me.
        
         | kriro wrote:
         | It all depends on the context. In academia it is a great tool.
         | I can set up a couple of notebooks on our GPU server and give
         | many students access to powerful GPUs without having to worry
         | about shell access etc. Additionally, they are ready to go and
         | can do interesting things immediately without having to install
         | the environment on their laptops (which might be win/linux/mac;
         | that's easier these days, but still extra work for them).
         | 
         | I also use it a lot for experimenting, parameter tuning etc.
         | It's not too bad to have it explicitly distinct from production
         | level code. Run/tune/experiment in notebook, once you're happy
         | with the model -> code it up in .py file(s). Also great for
         | quick presentations :)
         | 
         | However, the fast.ai team is actually doing a pretty solid job
         | running everything off notebooks. So if I wanted to go that
         | direction (and skip the .py files) it's that project I'd look
         | at for how to do it.
        
           | carderne wrote:
           | > without having to worry about shell access etc
           | 
           | Do you mean you _don't_ want to give the students shell
           | access?
           | 
           | By default you can run shell commands from within a Jupyter
           | notebook by prefixing them with `!`.
        
             | rovr138 wrote:
             | They might mean without having to set up individual user
             | accounts for the server.
             | 
             | https://jupyter.org/hub
             | 
             | and if it's just the professor's lab, just a jupyter lab
             | instance with one password works too.
        
         | mistrial9 wrote:
         | dude - more than 1 million undergraduate computer science
         | students worldwide will learn Jupyter this Fall, and you are
         | getting contrarian votes among a bunch of average-of-
         | masters+industry CS people here
         | 
         | "we" have to learn and teach the next sets of people new to
         | computer science
        
           | HuwFulcher wrote:
           | Totally agree with you. I've been teaching people over my
           | career so far, with varying degrees of success
        
         | slewis wrote:
         | nbdev2, which this article is about, is a solution to this
         | problem.
         | 
         | It makes notebooks testable, composable, versionable, and more.
        
         | fifilura wrote:
         | It can be very useful to run recurring jobs (e.g. jobs that run
         | once per day) in a notebook, keeping the output as a kind of
         | advanced logging.
         |
         | And then serve the results as a static page under
         | my.logs.intranet/my-job/2022-02-14/my_recurring_task_notebook.ipynb.html
         | 
         | You can get so much more context regarding what went well or
         | wrong compared to browsing through log lines in some more or
         | less user friendly tool.
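A minimal sketch of that workflow in Python, assuming the `nbformat` and `nbconvert` packages and a `python3` Jupyter kernel are installed. It executes a notebook with nbconvert's `ExecutePreprocessor`, renders the executed copy with `HTMLExporter`, and writes the HTML to a dated path mirroring the URL above. The directory layout, the 600-second timeout, and the helper names `report_path` and `run_and_render` are hypothetical; a scheduler such as cron would call it once per day.

```python
import datetime as dt
from pathlib import Path


def report_path(root, job, day, notebook_name):
    """Build <root>/<job>/<YYYY-MM-DD>/<notebook_name>.html (hypothetical layout)."""
    return Path(root) / job / day.isoformat() / (notebook_name + ".html")


def run_and_render(notebook, root, job, day=None, timeout=600):
    """Execute `notebook` top to bottom and save the executed copy as HTML.

    Imports are local so that report_path() works without nbconvert installed.
    """
    import nbformat
    from nbconvert import HTMLExporter
    from nbconvert.preprocessors import ExecutePreprocessor

    nb = nbformat.read(notebook, as_version=4)
    # allow_errors=True records failing cells in the output instead of raising,
    # so the HTML "log" shows what went wrong as well as what went well.
    ep = ExecutePreprocessor(timeout=timeout, kernel_name="python3", allow_errors=True)
    # Run every cell with the notebook's own directory as the working directory.
    ep.preprocess(nb, {"metadata": {"path": str(Path(notebook).parent)}})
    # The executed notebook, outputs included, becomes a static HTML page.
    body, _resources = HTMLExporter().from_notebook_node(nb)

    out = report_path(root, job, day or dt.date.today(), Path(notebook).name)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(body, encoding="utf-8")
    return out


if __name__ == "__main__":
    # Hypothetical daily invocation, e.g. from cron:
    # run_and_render("my_recurring_task_notebook.ipynb", "/srv/logs", "my-job")
    print(report_path("/srv/logs", "my-job", dt.date(2022, 2, 14),
                      "my_recurring_task_notebook.ipynb"))
```

Serving the resulting directory tree with any static web server gives one browsable page per job per day.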
        
       ___________________________________________________________________
       (page generated 2022-08-26 23:02 UTC)