[HN Gopher] How we made Jupyter notebooks load faster
       ___________________________________________________________________
        
       How we made Jupyter notebooks load faster
        
       Author : lneves12
       Score  : 117 points
       Date   : 2024-09-10 13:19 UTC (9 hours ago)
        
 (HTM) web link (www.singlestore.com)
 (TXT) w3m dump (www.singlestore.com)
        
       | davidgomes wrote:
       | I was on this team at SingleStore and I can vouch for how hard
       | this team worked on this project. I just opened a couple
       | notebooks in production and they loaded *instantly*, so kudos to
       | the team for seeing this project through.
       | 
       | (If you're not familiar with SingleStore's Jupyter Notebooks,
       | they're similar to Databricks Notebooks[1] or Azure Synapse
       | Notebooks[2]).
       | 
       | [1]: https://docs.databricks.com/en/notebooks/index.html
       | 
       | [2]: https://learn.microsoft.com/en-us/azure/synapse-
       | analytics/sp...
        
       | paddy_m wrote:
       | Impressive work. I love jupyter, but it's a bear to work on.
       | 
       | What mix of JS packages do you see? Could that be built into one
       | uber package?
        
         | lneves12 wrote:
         | We are actually bundling everything inside one big main.js
         | file, compared to jupyter-lab app that loads each extension
         | from a different file using webpack federated modules. We did
         | some benchmarking and it was actually faster than having one
         | file per extension. There is definitely still some room for
         | improvement here, but we have some other places we would like
         | to optimize first, like, optimising the fetching of the
         | notebooks content.
         | 
         | You can take a look at the notebooks entrypoint network
         | request: https://portal-notebooks.singlestore.com/
        
           | paddy_m wrote:
           | I take it that's supposed to be the pre-loading page to just
           | look at the requests, not a full working UI?
           | 
           | After cursory googling I couldn't find one, do you have a
           | public notebook gallery?
        
             | lneves12 wrote:
             | Yeah, sorry should have made that more clear. This doesn't
             | really load anything, it's just the entry point for our
             | iframe. To try our notebooks you can create an account at
             | portal.singlestore.com (we have a gallery of notebooks
             | there)
        
           | canucker2016 wrote:
           | I don't see a Content-Encoding header on the response for the
           | JS and HTML files, which suggests the 11.5MB JS and the HTML
           | files aren't compressed.
           | 
           | Not much of a worry on the tiny HTML file, but the 11.5MB JS
           | file should compress to a much smaller file on the wire.
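
A rough sketch of the point above, using only the Python standard
library. The `content_encoding` helper and the synthetic `sample_js`
payload are illustrative assumptions, not anything from the article;
the payload is far more repetitive than a real minified bundle, so the
ratio it prints is optimistic.

```python
import gzip
import urllib.request
import zlib
from typing import Optional


def content_encoding(url: str) -> Optional[str]:
    """Return the Content-Encoding response header, or None if absent.

    Sends a HEAD request advertising gzip/br support. Some servers
    reject HEAD; this helper is defined here but not called.
    """
    req = urllib.request.Request(
        url, headers={"Accept-Encoding": "gzip, br"}, method="HEAD"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("Content-Encoding")


def compressed_sizes(payload: bytes) -> dict:
    """Return the raw, gzip and zlib (deflate) sizes of payload in bytes."""
    return {
        "raw": len(payload),
        "gzip": len(gzip.compress(payload, compresslevel=6)),
        "deflate": len(zlib.compress(payload, 6)),
    }


# Hypothetical stand-in for a large bundled main.js file.
sample_js = b"".join(
    b"function handler%d(event) { return event.target.value + %d; }\n" % (i, i)
    for i in range(20000)
)

sizes = compressed_sizes(sample_js)
ratio = sizes["gzip"] / sizes["raw"]
print(sizes, f"gzip/raw: {ratio:.2%}")
```

Calling `content_encoding(...)` on the bundle URL would show whether
the server negotiates compression at all; a `None` result matches the
observation in the comment.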
        
       | spiralk wrote:
       | I dislike how Jupyter notebooks have become normalized. Yes, the
       | interactive execution and visuals are nice for more academic
       | workflows where the priority is quick results over code
       | organization. However, when it comes to sharing code with others
       | for the sake of doing reproducible science, jupyter notebooks
       | cause more trouble than they are worth. Using cell based
       | execution with python is so elegant with '# %%' lines in regular
       | .py files (though it requires using VSCode or fiddling with vim
       | plugins which not all scientists want to do I suppose). No .ipynb
       | is necessary, .py files can be version controlled and shared like
       | normal code while still retaining the ability to use
       | interactively, cell by cell.
       | 
       | It's much easier to organize .py files into a proper python
       | module, and then share and collaborate with others. Instead,
       | groups will collect jumbles of slightly different versions of the
       | same jupyter notebooks that progressively become more complex and
       | less manageable over time. It's not a hypothetical unfortunately,
       | I've seen this happen at major university labs. I'm not blaming
       | anyone because I understand -- the funding is there to do science
       | and not rewrite code to build convenient software libraries. Yet,
       | I can't help but wish jupyter notebooks could be removed from
       | academic workflows.
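
For readers who have not seen the '# %%' convention described in the
comment above, here is a minimal sketch of such a file. It is an
ordinary Python script; editors that understand the markers (VS Code
with the Python extension, for example) treat each marker as the start
of a cell. The function and variable names are made up for
illustration.

```python
# %% [markdown]
# # Damped oscillation
# A plain .py file. Each "# %%" line starts a new cell.

# %%
import math


def damped_wave(t: float, decay: float = 0.5, freq: float = 2.0) -> float:
    """Amplitude of a damped cosine at time t."""
    return math.exp(-decay * t) * math.cos(2 * math.pi * freq * t)


# %%
# Sample the wave at 11 evenly spaced points on [0, 1].
times = [i / 10 for i in range(11)]
samples = [damped_wave(t) for t in times]

# %%
peak = max(samples)
print(f"peak amplitude: {peak:.3f}")
```

Because the markers are comments, the same file also runs top to
bottom with a plain `python` invocation and diffs cleanly in version
control.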
        
         | luplex wrote:
         | In the end, usability wins. In a Jupyter notebook, you have a
         | much better idea of state between cells, you can iterate much
         | faster, you can write documentation in readable markdown.
         | Often, Jupiter notebooks are more like interactive markdown
         | than they are like python scripts.
        
           | dxbydt wrote:
           | > In a Jupyter notebook... > Often, Jupiter notebooks...
           | 
           | Every time I search my Slack, I have to run two searches
           | because DS can't agree on how to spell the damn thing.
        
         | Twirrim wrote:
         | I use jupyter notebooks at work, not so much for academic
         | stuff, but often to help build and show a narrative to folks,
         | including executives (where I have any even remotely technical
         | leadership). It's great for narrative stuff, especially being
         | able to emit PDFs and what not. I've been in a number of
         | meetings where I've got the code up in Jupyter, sharing the
         | screen, and leadership want us to tweak numbers and see the
         | consequences.
         | 
         | It's great for exploring code and data too, especially
         | situations where I'm really trying to feel my way towards a
         | solution. I get to merrily intermingle rich text narrative and
         | code so I explain how I got to where I got to and can walk
         | people through it (I did that with some experimenting with an
         | SMT solver several months ago, meant that people that had no
         | experience with an SMT solver could understand the model I
         | built).
         | 
         | I'd never use it to share code though. If we get to that stage,
         | it's time to export from jupyter (which it natively supports),
         | and then tidy up the code and productionise it. There's no way
         | jupyter should be the deployed thing.
        
           | spiralk wrote:
           | That seems like a reasonable way to use jupyter notebooks
           | since you have an actual plan to move beyond it when
           | necessary. My issue is mostly with the way it's misused, often
           | by people who are arguably at the top of the field.
        
         | KolenCh wrote:
         | I don't disagree with anything you said. Jupytext can be a good
         | tool to bridge some gap, where you pair ipynb to a py script and
         | can then commit the py only (git-ignore all ipynb for your
         | collaborators.)
         | 
         | Also, while many practices out there are questionable, in
         | alternative scenarios where ipynb doesn't exist, they might
         | have been using something like matlab for example. Eg, in my
         | field (physics), oftentimes there are experimentalists doing
         | some coding. Ipynb can be very enabling for them.
         | 
         | I think a piece of research should be broken down and worked by
         | multiple people to improve the state of the project. Some
         | scientists might be passing you the initial prototype in the
         | form of a notebook, and some others should be refactoring to
         | something more suitable for deployment and archival purpose.
         | Properly funding these roles is important, and is lacking but
         | improving (eg hiring RSE.)
         | 
         | In my field, the most prominent case where ipynb gets shared a
         | lot is training. It's a great application, as that becomes
         | literate programming. In this sense the notebook is highly
         | underused, as literate programming still hasn't gone mainstream.
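
To make the ipynb/py pairing idea from the comment above concrete, the
sketch below converts notebook JSON into a '# %%' script with the
standard library alone. It is not Jupytext's implementation and
handles only code and markdown cells; `notebook_to_percent` and
`example_nb` are hypothetical names. The script form drops cell
outputs, which is what makes committing only the .py file practical.

```python
import json


def notebook_to_percent(nb_json: str) -> str:
    """Convert nbformat-4 notebook JSON to '# %%' script text, dropping outputs."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # nbformat allows source to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append("# %%\n" + source.rstrip("\n"))
        elif cell.get("cell_type") == "markdown":
            commented = "\n".join(
                ("# " + line).rstrip() for line in source.splitlines()
            )
            chunks.append("# %% [markdown]\n" + commented)
    return "\n\n".join(chunks) + "\n"


# A tiny in-memory notebook with one markdown cell and one code cell.
example_nb = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Title\n", "Some notes"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "outputs": [{"output_type": "stream", "name": "stdout",
                      "text": ["2\n"]}],
         "source": ["x = 1 + 1\n", "print(x)"]},
    ],
})

script = notebook_to_percent(example_nb)
print(script)
```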
        
         | ambicapter wrote:
         | The form factor of Jupyter notebooks seems to fit well with
         | people's workflows though. Looks like you just wish the
         | internals of Jupyter were better architected.
        
           | spiralk wrote:
           | Imo, the better architected .ipynb is simply .py with '# %%'
           | blocks. It does almost everything a .ipynb can do with the
           | right VSCode extensions. Even interactive visualizations can
           | be sent to a browser window or saved to disk with plotly.
           | Though I do wish '# %%' cell based execution was accessible
           | to more people.
           | 
           | There isn't a single install tool that "just works" for this
           | at the moment. If editors came with more robust support for
           | it by default, I think the notebook format wouldn't be needed
           | at that point and people could use regular python and
           | interactive cell based python more interchangeably. I've seen
           | important code get buried under collections of jupyter
           | notebooks across different users so I have a good reason for
           | this. Notebooks simply don't scale beyond a certain
           | complexity.
        
         | abdullahkhalids wrote:
         | The same problem exists with spreadsheets. Should we get rid of
         | excel (the single tool that literally runs half the world), and
         | start manually writing markdown tables in text files?
         | 
         | The tool and the tool maker are supposed to serve the user. The
         | user is not supposed to conform to the whims of the tool maker.
        
         | epistasis wrote:
         | I think there's a fundamental misunderstanding and mismatch
         | between what you want to do, and what Jupyter notebooks are
         | for. The distinction is between code versus the results.
         | 
         | If the code is the end product, sure, use a python package.
         | 
         | But does your .py with `# %%` in it also store the outputs? If
         | not, why even bring this up? A .py output without the plots
         | tied to the code doesn't meet the basic use case.
         | 
         | If the end product is the plot, I want to see how that plot was
         | generated. And a Jupyter notebook is a much much better
         | artifact than a Python package, unless that Python package hard
         | codes the inputs and execution path like a notebook would.
         | 
         | Over the past 20 years of my career I have run into this
         | divergence of use cases a lot. Software engineers seem to not
         | understand the end goals, how it should be performed, and the
         | learnings of the practitioners that have been generating
         | results for a long time. It's hard to protect data scientists
         | from these inflexible software engineers that see "aha that's
         | code, I know this!" without bothering to understand the actual
         | use case at hand.
        
           | spiralk wrote:
           | Not having the outputs tied into the code is actually
           | preferable if the ultimate goal is reproducible science. Code
           | should be code, documentation should be documentation, and
           | outputs should be outputs. Having multiple copies of
           | important code in non-version controlled files is not a good
           | practice. Having documentation dispersed with questionable
           | organization in unsearchable files is not a good practice.
           | Having outputs without run information and timestamps is not
           | a good practice. It's easy to fall into those traps with
           | Jupyter notebooks. It might speed up initial set up and
           | experimentation, but I've been working in academic labs long
           | enough to see the downstream effects.
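
One way to keep outputs separate from code while still attaching "run
information and timestamps" is a small sidecar record. This is a
sketch under assumed names (`save_with_run_info`, `analysis.py`,
`result.json`), not a practice described by the commenter: it stores a
UTC timestamp and a SHA-256 hash of the script next to the result, so
an output can later be matched to the exact code that produced it.

```python
import hashlib
import json
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_with_run_info(result: dict, code_path: Path, out_path: Path) -> dict:
    """Write result plus provenance metadata to out_path as JSON."""
    record = {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "code_file": code_path.name,
        # The hash identifies the exact version of the script that ran.
        "code_sha256": hashlib.sha256(code_path.read_bytes()).hexdigest(),
        "python": sys.version.split()[0],
        "result": result,
    }
    out_path.write_text(json.dumps(record, indent=2))
    return record


# Demonstration in a temporary directory with a stand-in script file.
with tempfile.TemporaryDirectory() as tmp:
    script_file = Path(tmp) / "analysis.py"
    script_file.write_text("print('hello')\n")
    out_file = Path(tmp) / "result.json"
    record = save_with_run_info({"mean": 0.5}, script_file, out_file)
    loaded = json.loads(out_file.read_text())

print(loaded["code_file"], loaded["created_utc"])
```

In a version-controlled project, a commit hash could take the place of
the file hash; the file hash is used here only to keep the sketch free
of external tools.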
        
             | majormajor wrote:
             | Having the outputs recorded _alongside specific versions of
             | the code_ can actually be very valuable.
             | 
             | But since most uses of Jupyter notebooks I've seen don't
             | version control them much at all, it's often not as useful
             | in practice.
        
       ___________________________________________________________________
       (page generated 2024-09-10 23:00 UTC)