[HN Gopher] How we made Jupyter notebooks load faster
___________________________________________________________________
How we made Jupyter notebooks load faster
Author : lneves12
Score : 117 points
Date : 2024-09-10 13:19 UTC (9 hours ago)
(HTM) web link (www.singlestore.com)
(TXT) w3m dump (www.singlestore.com)
| davidgomes wrote:
| I was on this team at SingleStore and I can vouch for how hard
| this team worked on this project. I just opened a couple of
| notebooks in production and they loaded *instantly*, so kudos to
| the team for seeing this project through.
|
| (If you're not familiar with SingleStore's Jupyter Notebooks,
| they're similar to Databricks Notebooks[1] or Azure Synapse
| Notebooks[2]).
|
| [1]: https://docs.databricks.com/en/notebooks/index.html
|
| [2]: https://learn.microsoft.com/en-us/azure/synapse-
| analytics/sp...
| paddy_m wrote:
| Impressive work. I love jupyter, but it's a bear to work on.
|
| What mix of JS packages do you see? Could that be built into one
| uber package?
| lneves12 wrote:
| We are actually bundling everything into one big main.js
| file, whereas the JupyterLab app loads each extension from a
| separate file using webpack federated modules. We did some
| benchmarking, and the single bundle was actually faster than
| one file per extension. There is definitely still some room
| for improvement here, but there are other places we would
| like to optimize first, such as how the notebook content is
| fetched.
|
| You can take a look at the network requests for the notebooks
| entry point: https://portal-notebooks.singlestore.com/
| paddy_m wrote:
| I take it that's supposed to be the pre-loading page to just
| look at the requests, not a full working UI?
|
| After some cursory googling I couldn't find one. Do you have
| a public notebook gallery?
| lneves12 wrote:
| Yeah, sorry, I should have made that clearer. This doesn't
| really load anything; it's just the entry point for our
| iframe. To try our notebooks you can create an account at
| portal.singlestore.com (we have a gallery of notebooks
| there).
| canucker2016 wrote:
| I don't see a Content-Encoding header on the response for the
| JS and HTML files, which suggests the 11.5MB JS and the HTML
| files aren't compressed.
|
| Not much of a worry on the tiny HTML file, but the 11.5MB JS
| file should compress to a much smaller file on the wire.
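To put canucker2016's point in rough numbers, the minimal Python sketch below gzips a synthetic, repetitive JS-like payload and compares the sizes. It uses only the standard library. The payload is made up and is not SingleStore's actual main.js, so the real ratio will differ. In HTTP, the browser advertises support through the Accept-Encoding request header, and a compressed response carries a Content-Encoding header such as gzip or br.

```python
import gzip

def gzip_ratio(payload: bytes, level: int = 9) -> float:
    """Return compressed size divided by original size for gzip at the given level."""
    compressed = gzip.compress(payload, compresslevel=level)
    return len(compressed) / len(payload)

# Hypothetical stand-in for a minified JS bundle (NOT SingleStore's main.js):
# bundles repeat identifiers and boilerplate heavily, which gzip exploits.
fake_bundle = "".join(
    f"function m{i}(e,t,n){{return n.r(t),n.d(t,{{default:()=>o{i}}})}};"
    for i in range(20_000)
).encode("utf-8")

ratio = gzip_ratio(fake_bundle)
print(f"{len(fake_bundle) / 1e6:.2f} MB uncompressed, gzip ratio {ratio:.1%}")

# The round trip is lossless, so the browser ends up with the exact same bytes.
assert gzip.decompress(gzip.compress(fake_bundle)) == fake_bundle
```

Real bundles are less regular than this synthetic one, so treat the printed ratio as an optimistic illustration rather than a prediction.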
| spiralk wrote:
| I dislike how Jupyter notebooks have become normalized. Yes, the
| interactive execution and visuals are nice for more academic
| workflows where the priority is quick results over code
| organization. However, when it comes to sharing code with others
| for the sake of doing reproducible science, jupyter notebooks
| cause more trouble than they are worth. Cell-based execution
| in Python is so elegant with '# %%' lines in regular
| .py files (though it requires using VSCode or fiddling with vim
| plugins, which not all scientists want to do, I suppose). No
| .ipynb is necessary: .py files can be version controlled and
| shared like normal code while still retaining the ability to be
| run interactively, cell by cell.
|
| It's much easier to organize .py files into a proper Python
| module, and then share and collaborate with others. Instead,
| groups collect jumbles of slightly different versions of the
| same Jupyter notebooks that grow more complex and less
| manageable over time. Unfortunately, it's not hypothetical:
| I've seen this happen at major university labs. I'm not blaming
| anyone because I understand -- the funding is there to do science
| and not rewrite code to build convenient software libraries. Yet,
| I can't help but wish jupyter notebooks could be removed from
| academic workflows.
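For readers unfamiliar with the convention spiralk describes, here is a minimal sketch of a plain .py file that uses '# %%' cell markers. This is the "percent" format that VSCode's Python/Jupyter tooling and Jupytext recognize. In a supporting editor each cell can be run on its own, and outside one the file is ordinary Python that runs top to bottom. The data and the summarize helper are hypothetical.

```python
# %% [markdown]
# # Example analysis
# Cells are delimited by "# %%" comments; outside an editor this is just Python.

# %%
import statistics

# Hypothetical measurements; a real project would load these from disk.
measurements = [2.1, 2.4, 1.9, 2.2, 2.6]

# %%
def summarize(values):
    """Return (mean, sample standard deviation) of a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)

mean, sd = summarize(measurements)

# %%
print(f"mean={mean:.3f} sd={sd:.3f}")
```

Because the file contains no outputs or JSON metadata, diffs in version control show only code changes.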
| luplex wrote:
| In the end, usability wins. In a Jupyter notebook, you have a
| much better idea of state between cells, you can iterate much
| faster, you can write documentation in readable markdown.
| Often, Jupiter notebooks are more like interactive markdown
| than they are like python scripts.
| dxbydt wrote:
| > In a Jupyter notebook... > Often, Jupiter notebooks...
|
| Every time I search my Slack, I have to run two searches
| because DS can't agree on how to spell the damn thing.
| Twirrim wrote:
| I use jupyter notebooks at work, not so much for academic
| stuff, but often to help build and show a narrative to folks,
| including executives (when the leadership is even remotely
| technical). It's great for narrative stuff, especially being
| able to emit PDFs and what not. I've been in a number of
| meetings where I've got the code up in Jupyter, sharing the
| screen, and leadership want us to tweak numbers and see the
| consequences.
|
| It's great for exploring code and data too, especially
| situations where I'm really trying to feel my way towards a
| solution. I get to merrily intermingle rich text narrative and
| code so I explain how I got to where I got to and can walk
| people through it (I did that when experimenting with an
| SMT solver several months ago; it meant that people who had no
| experience with an SMT solver could understand the model I
| built).
|
| I'd never use it to share code though. If we get to that stage,
| it's time to export from jupyter (which it natively supports),
| and then tidy up the code and productionise it. There's no way
| jupyter should be the deployed thing.
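Jupyter's built-in route for the export Twirrim mentions is `jupyter nbconvert --to script`. To show roughly what that export amounts to, the stdlib-only sketch below reads a notebook's JSON and concatenates the code cells into a script, keeping markdown cells as comments. It assumes nbformat v4, where each cell has a `cell_type` and a `source` that is either a string or a list of strings. This is a simplification, not nbconvert's implementation, and `notebook_to_script` is a hypothetical name.

```python
import json

def notebook_to_script(nb_json: str) -> str:
    """Concatenate code cells of an nbformat-v4 notebook into a Python script.

    Markdown cells become comment lines; outputs are dropped entirely.
    """
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # nbformat allows source as either a single string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "code":
            chunks.append(text.rstrip("\n"))
        elif cell.get("cell_type") == "markdown":
            chunks.append("\n".join("# " + line for line in text.splitlines()))
    return "\n\n".join(chunks) + "\n"

# Usage with a tiny hand-written notebook.
example = json.dumps({
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title\n", "Some notes"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "outputs": [], "source": ["x = 1\n", "print(x)"]},
    ],
})
script = notebook_to_script(example)
print(script)
```

The exported script is the starting point for the "tidy up and productionise" step, not the finished product.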
| spiralk wrote:
| That seems like a reasonable way to use jupyter notebooks
| since you have an actual plan to move beyond it when
| necessary. My issue is mostly with the way it's misused, often
| by people who are arguably at the top of the field.
| KolenCh wrote:
| I don't disagree with anything you said. Jupytext can be a good
| tool to bridge some of the gap: you pair an ipynb with a py
| script and then commit only the py (git-ignore all ipynb files
| for your collaborators).
|
| Also, while many practices out there are questionable, in
| alternative scenarios where ipynb doesn't exist, they might
| have been using something like MATLAB instead. E.g., in my
| field (physics), there are often experimentalists doing
| some coding. Ipynb can be very enabling for them.
|
| I think a piece of research should be broken down and worked on
| by multiple people to improve the state of the project. Some
| scientists might pass you the initial prototype in the form
| of a notebook, and others should refactor it into something
| more suitable for deployment and archival purposes. Properly
| funding these roles is important; it is lacking but improving
| (e.g. hiring RSEs, research software engineers).
|
| In my field, the most prominent case where ipynb files are
| shared a lot is training. It's a great application, as that
| becomes literate programming. In this sense notebooks are highly
| underused, as literate programming still hasn't gone mainstream.
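KolenCh's pairing workflow relies on Jupytext, which can be set up with `jupytext --set-formats ipynb,py:percent notebook.ipynb`. The stdlib-only sketch below illustrates why committing only the .py side loses no code: a percent-format script splits cleanly on its '# %%' markers into notebook cells. This is a simplified illustration of the idea, not Jupytext's code. The `percent_script_to_cells` name is hypothetical, and the sketch handles only plain code cells and `[markdown]` cells.

```python
import json
import re

# A cell marker is a line starting with "# %%", optionally followed by "[markdown]".
CELL_MARKER = re.compile(r"^# %%(?P<rest>.*)$")

def percent_script_to_cells(script: str) -> list:
    """Split a '# %%'-delimited script into minimal nbformat-v4-style cells."""
    cells, current, kind = [], [], "code"

    def flush():
        body = "\n".join(current).strip("\n")
        if not body:
            return
        if kind == "markdown":
            # Markdown cells are stored as comments; strip the leading "# ".
            body = "\n".join(re.sub(r"^# ?", "", ln) for ln in body.splitlines())
            cells.append({"cell_type": "markdown", "metadata": {}, "source": body})
        else:
            cells.append({"cell_type": "code", "metadata": {}, "source": body,
                          "execution_count": None, "outputs": []})

    for line in script.splitlines():
        match = CELL_MARKER.match(line)
        if match:
            flush()  # close the previous cell before starting a new one
            current = []
            kind = "markdown" if "[markdown]" in match.group("rest") else "code"
        else:
            current.append(line)
    flush()
    return cells

# Usage: rebuild a notebook (without outputs) from the committed .py file.
script = """# %% [markdown]
# # Analysis

# %%
x = 2

# %%
print(x * 21)
"""
cells = percent_script_to_cells(script)
notebook = {"nbformat": 4, "nbformat_minor": 5, "metadata": {}, "cells": cells}
ipynb_text = json.dumps(notebook, indent=1)
print([c["cell_type"] for c in cells])
```

Outputs are not part of the .py file, which is the trade-off the later replies in this thread argue about.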
| ambicapter wrote:
| The form factor of Jupyter notebooks seems to fit well with
| people's workflows, though. It looks like you just wish the
| internals of Jupyter were better architected.
| spiralk wrote:
| IMO, the better-architected .ipynb is simply .py with '# %%'
| blocks. It does almost everything a .ipynb can do with the
| right VSCode extensions. Even interactive visualizations can
| be sent to a browser window or saved to disk with plotly.
| Though I do wish '# %%' cell-based execution were accessible
| to more people.
|
| There isn't a single install tool that "just works" for this
| at the moment. If editors came with more robust support for
| it by default, I think the notebook format wouldn't be needed
| at that point and people could use regular python and
| interactive cell based python more interchangeably. I've seen
| important code get buried under collections of Jupyter
| notebooks across different users, so I have a good reason for
| this. Notebooks simply don't scale beyond a certain
| complexity.
| abdullahkhalids wrote:
| The same problem exists with spreadsheets. Should we get rid of
| excel (the single tool that literally runs half the world), and
| start manually writing markdown tables in text files?
|
| The tool and the tool maker are supposed to serve the user. The
| user is not supposed to conform to the whims of the tool maker.
| epistasis wrote:
| I think there's a fundamental misunderstanding and mismatch
| between what you want to do and what Jupyter notebooks are
| for. The distinction is code versus results.
|
| If the code is the end product, sure, use a python package.
|
| But does your .py with `# %%` in it also store the outputs? If
| not, why even bring this up? A .py output without the plots
| tied to the code doesn't meet the basic use case.
|
| If the end product is the plot, I want to see how that plot was
| generated. And a Jupyter notebook is a much much better
| artifact than a Python package, unless that Python package hard
| codes the inputs and execution path like a notebook would.
|
| Over the past 20 years of my career I have run into this
| divergence of use cases a lot. Software engineers seem to not
| understand the end goals, how it should be performed, and the
| learnings of the practitioners that have been generating
| results for a long time. It's hard to protect data scientists
| from these inflexible software engineers that see "aha that's
| code, I know this!" without bothering to understand the actual
| use case at hand.
| spiralk wrote:
| Not having the outputs tied to the code is actually
| preferable if the ultimate goal is reproducible science. Code
| should be code, documentation should be documentation, and
| outputs should be outputs. Having multiple copies of
| important code in non-version-controlled files is not a good
| practice. Having documentation dispersed with questionable
| organization in unsearchable files is not a good practice.
| Having outputs without run information and timestamps is not
| a good practice. It's easy to fall into those traps with
| Jupyter notebooks. It might speed up initial setup and
| experimentation, but I've been working in academic labs long
| enough to see the downstream effects.
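Here is one way to act on spiralk's "outputs with run information and timestamps" point when outputs live outside a notebook. It is a minimal stdlib sketch, not a standard tool. The sketch writes results to a timestamped JSON file together with the inputs, the Python version, the platform, and the git commit when one is available. The function names and the record layout are hypothetical.

```python
import datetime
import hashlib
import json
import platform
import subprocess
import tempfile
from pathlib import Path
from typing import Optional

def current_git_commit() -> Optional[str]:
    """Return the current git commit hash, or None if git or a checkout is missing."""
    try:
        proc = subprocess.run(["git", "rev-parse", "HEAD"],
                              capture_output=True, text=True, check=True)
        return proc.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None

def save_with_provenance(results: dict, params: dict, out_dir: Path) -> Path:
    """Write results, inputs, and run metadata to a timestamped JSON file."""
    now = datetime.datetime.now(datetime.timezone.utc)
    record = {
        "results": results,
        "params": params,
        "run_info": {
            "timestamp_utc": now.isoformat(),
            "python": platform.python_version(),
            "platform": platform.platform(),
            "git_commit": current_git_commit(),
            # Stable hash of the inputs, so runs with identical parameters group together.
            "params_sha256": hashlib.sha256(
                json.dumps(params, sort_keys=True).encode("utf-8")
            ).hexdigest(),
        },
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"results_{now:%Y%m%dT%H%M%S%f}Z.json"
    path.write_text(json.dumps(record, indent=2))
    return path

# Usage with hypothetical values; a real project would write to a results/ folder.
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_with_provenance({"mean": 2.24}, {"n_samples": 5}, Path(tmp))
    saved = json.loads(saved_path.read_text())

print(saved["run_info"]["timestamp_utc"], saved["run_info"]["git_commit"])
```

Recording the commit hash only helps if the analysis code is committed before the run, which is the version-control habit the thread is debating.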
| majormajor wrote:
| Having the outputs recorded _alongside specific versions of
| the code_ can actually be very valuable.
|
| But since most Jupyter notebook users I've seen don't
| version control their notebooks much at all, it's often not as
| useful in practice.
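Here is a minimal sketch of majormajor's point, assuming you want a notebook's saved outputs to be traceable to the exact code that produced them. The approach hashes the code cells and stores the hash in the notebook metadata when the outputs are saved, then checks it later. The `code_sha256` metadata key and the function names are made up for this example and are not a Jupyter convention. If the stamped .ipynb is committed to git, each output snapshot is tied to a specific code version.

```python
import hashlib

def code_fingerprint(nb: dict) -> str:
    """SHA-256 over the source of all code cells, ignoring outputs and metadata."""
    h = hashlib.sha256()
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", "")
            text = "".join(source) if isinstance(source, list) else source
            h.update(text.encode("utf-8"))
            h.update(b"\0")  # separator so cell boundaries affect the hash
    return h.hexdigest()

def stamp_notebook(nb: dict) -> dict:
    """Record which code produced the saved outputs (hypothetical metadata key)."""
    nb.setdefault("metadata", {})["code_sha256"] = code_fingerprint(nb)
    return nb

def outputs_match_code(nb: dict) -> bool:
    """True if the code cells are unchanged since the notebook was last stamped."""
    return nb.get("metadata", {}).get("code_sha256") == code_fingerprint(nb)

# Usage with a tiny in-memory nbformat-v4 notebook.
nb = {"nbformat": 4, "nbformat_minor": 5, "metadata": {}, "cells": [
    {"cell_type": "code", "metadata": {}, "execution_count": 1,
     "source": "print(6 * 7)",
     "outputs": [{"output_type": "stream", "name": "stdout", "text": "42\n"}]},
]}
stamp_notebook(nb)
before_edit = outputs_match_code(nb)
nb["cells"][0]["source"] = "print(6 * 8)"  # code edited, outputs not re-run
after_edit = outputs_match_code(nb)
print(before_edit, after_edit)
```

The check catches the common case of code edited after the last run. It does not detect changes in input data or environment.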
___________________________________________________________________
(page generated 2024-09-10 23:00 UTC)