[HN Gopher] JupyterLite - WASM-powered Jupyter running in the br...
___________________________________________________________________
JupyterLite - WASM-powered Jupyter running in the browser
Author : ahurmazda
Score : 197 points
Date : 2021-05-29 06:59 UTC (16 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| qbasic_forever wrote:
| This is _really_ cool. I don't know why a lot of commenters here
| are going into the weeds to grouse about Java and Flash, or to
| vent general anger at computational notebooks.
|
| What we have here is a complete client-side browser environment
| for development. Not some half-assed language or hyper restricted
| toy--this is real Python, and your browser's full JS engine all
| available in JupyterLab's IDE (basically a simpler VS Code at
| this point, it uses the same editing component).
|
| We all freaked out a bit as Apple drove out IDEs from their app
| store, Google locked down Termux and similar developer tools from
| Android. Well, here's the answer to those situations. Something
| no app store owner can kill on a whim. I love stuff like this and
| hope it helps to enable and inspire the next generation of
| developers.
| ktpsns wrote:
| Although Pyolite has miserable performance (20MB of downloads),
| the overall direction of the project is correct.
|
| I said this already 10 years ago: We don't need more cloud
| computing, but we need to empower users' end devices again.
| Jupyter is typically operated on powerful notebooks and not on
| mobile devices.
| atoav wrote:
| The solution I use for local jupyter notebooks is nteract [0]
| which is like a standalone application that can edit/open
| .ipynb files.
|
| It has a few quirks, but works quite well for daily use.
|
| [0] https://nteract.io/
| kzrdude wrote:
| VSCodium can also open .ipynb files, to give another alternative.
|
| I try to work mostly with "py:percent" script files now, in
| the same jupyter notebook style but without saving outputs.
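|
| For anyone unfamiliar with it: py:percent is one of Jupytext's
| text formats, where each cell starts at a "# %%" comment line.
| A minimal sketch of splitting such a script into cells. It
| assumes bare "# %%" markers only, and the function name is
| hypothetical; Jupytext itself also parses cell titles,
| [markdown] tags and metadata, which this sketch ignores.

```python
def split_percent_cells(text: str) -> list[str]:
    """Split a py:percent script into code cells at '# %%' marker lines.

    Simplified sketch: any line starting with '# %%' opens a new cell;
    cell titles, [markdown] tags and metadata are not interpreted.
    """
    cells: list[list[str]] = [[]]
    for line in text.splitlines():
        if line.startswith("# %%"):
            cells.append([])  # start a new cell; the marker line is dropped
        else:
            cells[-1].append(line)
    # Skip empty/whitespace-only cells (e.g. nothing before the first marker)
    joined = ("\n".join(c).strip() for c in cells)
    return [cell for cell in joined if cell]


script = """# %%
import math

# %%
print(math.pi)
"""
print(split_percent_cells(script))  # ['import math', 'print(math.pi)']
```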
| ipsum2 wrote:
| What do you like about nteract over Jupyter's frontend? The
| website is devoid of details.
| atoav wrote:
| I like that it is a standalone notepad.exe-like program
| that opens my notebook and just works. I usually use it
| more for looking at notebooks than writing them tho.
| rerx wrote:
| It's convenient to directly open a notebook from a browser
| download or a slack message without having to fire up
| jupyter in a terminal and navigate it to that temp file.
| codefreakxff wrote:
| I use VS Code to open and run notebooks using the Jupyter
| plugin. Works great. Don't need to fire up the browser.
| rdedev wrote:
| VS Code is now my go-to for opening and running notebooks.
| The issue I had with regular Jupyter was that its
| support for plain Python files was bad. With VS Code I
| can work on both plain Python files and notebook files
| in the same window without much overhead.
| Godel_unicode wrote:
| PyCharm is also excellent at this. My biggest problem
| with the regular Jupyter interface is years of reflex that
| Ctrl+W is for deleting words. In a web browser it closes
| the tab instead, which is quite annoying.
| jcims wrote:
| Unfortunately we will need more cloud computing. If you're
| watching what's going on in the ransomware and cyber insurance
| space, small and many medium-sized companies that require E&O
| coverage for their contracts are not going to be able to afford
| to run on their own equipment.
| Godel_unicode wrote:
| Good. People who are bad at administering computers will stop
| doing it, and will focus on what they're good at.
|
| Then, we can use the on-demand nature of cloud services to
| reduce their power consumption. Simultaneously we can move
| that consumption into renewable-powered datacenters. This is
| literally better for everyone.
| jcims wrote:
| Meanwhile we create tremendous concentration risk and the
| world pays rent to Amazon, Google and Microsoft? I wouldn't
| call that 'good'.
| aiNohY6g wrote:
| Similar to jupyterlite: https://starboard.gg/jupystar (and
| https://starboard.gg)
| croes wrote:
| Isn't that just Java all over again, but this time with
| JavaScript?
| fmajid wrote:
| No, WASM gets compiled to native code
| youngtaff wrote:
| Java bytecode got translated to native too
|
| Think we've just got better at runtimes over time (JIT,
| intermediate formats etc.) and WASM was designed to be good
| for this rather than needing to work with an already existing
| bytecode.
| fmajid wrote:
| Reflection and other language features preclude direct
| translation of Java bytecode to machine code, whereas
| WASM is designed to be a portable assembly language,
| closer to the IL of GCC or LLVM.
| _old_dude_ wrote:
| java -XX:+PrintCompilation prints all methods/loops
| compiled to native code.
|
| The reflection API has two issues: a security check each
| time you call a method, and the arguments being
| boxed into objects. The code is still compiled to
| machine code, but that machine code is slower because of
| that overhead.
| adimitrov wrote:
| There are crucial differences between Java applets and Wasm.
|
| - Applets tried to render their own GUI, Wasm doesn't and
| defers to the browser.
|
| - Applets needed a big, slow-to-start and resource-hungry VM.
| Wasm runs in the same thread your JS is also running
| in, it's light, and it loads faster than JS.
|
| - Java and Flash were plugins, which needed to be installed
| and kept up to date separately. Wasm is baked into your
| browser's JS engine.
|
| - Wasm code is very fast and can achieve near-native
| execution speeds. It can make use of advanced optimisations.
| SIMD has shipped in Chrome and will soon ship in Firefox.
|
| - The wasm spec is very, very good, and really quite small.
| This means that implementing it is comparatively cheap, and
| this should make it easy to see it implemented by different
| vendors.
|
| - Java was just Java. Wasm can serve as a platform for any
| language. See my earlier point about the spec.
|
| So it's apples and oranges. The _need_ to have something
| besides JS hasn't gone away, so their use cases might be
| similar. The two technologies couldn't be more distinct,
| though.
| croes wrote:
| You must view the browser with JS and WASM as a unit.
|
| The browser renders its own GUI too; it's not OS-native.
|
| The browser uses lots of resources too.
|
| The browser is kind of a plugin to the OS and must be
| updated separately.
|
| Java nowadays is pretty fast too.
|
| The JVM serves as a platform for multiple languages like
| Scala, Kotlin, Clojure.
|
| Let's face it, the browser is the new JVM, and as soon as it
| gets the same permissions as the JVM to access the file
| system and such, we'll get the same problems.
| kierangill wrote:
| > You must view the browser with JS and WASM as a unit
|
| "Web" assembly is a bit of a misnomer. It's an IR at the
| end of the day and can be run without a browser[1]. But
| your other points could be true one day if the de facto
| WASM runtime becomes bloated or decides to ship with some
| GUI renderer.
|
| [1] https://github.com/bytecodealliance/wasmtime
| westurner wrote:
| From https://news.ycombinator.com/item?id=24052393 re:
| Starboard:
|
| > _https://developer.mozilla.org/en-
| US/docs/Web/Security/Subres... : "Subresource Integrity
| (SRI) is a security feature that enables browsers to
| verify that resources they fetch (for example, from a
| CDN) are delivered without unexpected manipulation. It
| works by allowing you to provide a cryptographic hash
| that a fetched resource must match."_
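|
| The hash SRI checks is simple to produce: the integrity value
| is the algorithm name, a hyphen, and the base64-encoded digest
| of the exact bytes served. A small standard-library sketch
| (sri_hash is a hypothetical helper; sha384 is just a common
| choice, and SRI also accepts sha256 and sha512):

```python
import base64
import hashlib


def sri_hash(data: bytes, algorithm: str = "sha384") -> str:
    """Return a Subresource Integrity value such as 'sha384-<base64 digest>'."""
    if algorithm not in ("sha256", "sha384", "sha512"):
        raise ValueError("SRI supports sha256, sha384 and sha512")
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


# Hypothetical script contents; in practice hash the exact file the CDN serves
# and put the result in the integrity="..." attribute of the <script> tag.
print(sri_hash(b"alert('hi');"))
```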
|
| > _There's a new Native Filesystem API: "The new Native
| File System API allows web apps to read or save changes
| directly to files and folders on the user's device."_
| https://web.dev/native-file-system/
|
| > _We'll need a way to grant specific URLs specific,
| limited amounts of storage._
|
| [...]
|
| > _https://github.com/deathbeds/jyve/issues/46_:
|
| > _Would [Micromamba] and conda-forge build a WASM
| architecture target?_
| Dzugaru wrote:
| > a soon it gets the same permissions like the JVM to
| access the file system and such
|
| Like... never?
|
| We get better systems as we get more experience. That's
| why C# was better than Java, and Java today is better than
| Java was when C# launched. That's why we now have amazing
| languages like Rust, and that's also why the old problems
| won't simply repeat themselves, given we have a ton of
| experience with VMs, Docker, sandboxing in browsers etc.
| pjmlp wrote:
| Java and Flash, but now it is Good (TM), because the powers
| that be decided so.
|
| "Everything Old is New Again: Binary Security of WebAssembly"
|
| https://www.usenix.org/conference/usenixsecurity20/presentat...
|
| So, enjoy the 2nd coming of applets/Flash,
|
| https://platform.uno/
|
| https://dotnet.microsoft.com/apps/aspnet/web-apps/blazor
|
| https://tinygo.org/
|
| .... favourite stack compiled into WASM.
| throwaway894345 wrote:
| > I said this already 10 years ago: We don't need more cloud
| computing but need to empower users end devices again. Jupyter
| is typically operated on powerful notebooks and not on mobile
| devices.
|
| If you're working with data of any significant size at all,
| then it doesn't matter how fast your user device is--it's so
| much cheaper (in time and network egress costs) to send the
| computations from a user device to the cloud than to pull tens
| to thousands of GB of data to your local machine. Moreover, I
| don't know of many local machines with tens of CPU cores,
| hundreds or thousands of GB of RAM, or tens to hundreds of TB
| of SSD for handling that computation quickly.
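|
| The transfer side of that argument is easy to put rough numbers
| on. A back-of-the-envelope sketch, assuming a sustained 1 Gbit/s
| link, decimal gigabytes and no protocol overhead (all
| assumptions, not figures from this thread):

```python
def transfer_hours(gigabytes: float, link_gbit_per_s: float = 1.0) -> float:
    """Hours to move `gigabytes` (decimal GB) over a link of the given speed."""
    bits = gigabytes * 8e9  # 1 GB = 8 * 10**9 bits
    seconds = bits / (link_gbit_per_s * 1e9)
    return seconds / 3600


for size_gb in (10, 1_000, 10_000):
    print(f"{size_gb:>6} GB -> {transfer_hours(size_gb):.2f} h at 1 Gbit/s")
```

| At that assumed rate, 10,000 GB takes roughly 22 hours just to
| download, before any computation starts.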
|
| User devices are great for very small data, but I don't see the
| point for larger datasets.
| otabdeveloper4 wrote:
| You definitely shouldn't be running this stuff from a Jupyter
| notebook.
| throwaway894345 wrote:
| Why not? We make a lot of money offering it to people.
| otabdeveloper4 wrote:
| Yes, the market in enabling people to do the "you
| shouldn't be doing this" stupid things is huge.
| throwaway894345 wrote:
| Any substance on why we shouldn't be doing it or why it's
| stupid? What's the alternative? Should researchers all
| learn Kubernetes and AWS and deploy their own
| environments?
| otabdeveloper4 wrote:
| The problem with Jupyter is that it impedes common-sense
| practices like version control, reproducibility, and
| automation.
|
| If you're spending the time and effort to rent these big
| servers, why not spend the 5 percent of the effort and do
| it right?
|
| Jupyter exists mostly because analytics/math guys are too
| lazy to spend a day learning software development
| practices. Must be some sort of us-vs-them point of
| misplaced pride.
| parasubvert wrote:
| I am shocked at your unjustified and rather arrogant
| gatekeeping here.
|
| Version-controlled Jupyter notebooks running in an
| automated environment (e.g. Kubernetes), with repeatable
| test data loading into a processing environment (e.g.
| MinIO or Spark), is quite commonplace. Making it even
| easier with WASM makes sense.
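|
| One common step for keeping notebooks diff-friendly under
| version control is clearing outputs and execution counts
| before committing (tools such as nbstripout automate this). A
| minimal standard-library sketch, assuming an nbformat 4 file
| where code cells carry "outputs" and "execution_count" keys;
| strip_outputs is a hypothetical helper name:

```python
import json


def strip_outputs(nb_json: str) -> str:
    """Return notebook JSON with code-cell outputs and execution counts cleared."""
    nb = json.loads(nb_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    # Pretty-print so that diffs stay line-oriented
    return json.dumps(nb, indent=1, ensure_ascii=False) + "\n"


nb = {"nbformat": 4, "nbformat_minor": 5, "metadata": {},
      "cells": [{"cell_type": "code", "execution_count": 3, "metadata": {},
                 "source": ["print('hi')"],
                 "outputs": [{"output_type": "stream", "name": "stdout",
                              "text": ["hi\n"]}]}]}
cleaned = json.loads(strip_outputs(json.dumps(nb)))
print(cleaned["cells"][0]["outputs"], cleaned["cells"][0]["execution_count"])  # [] None
```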
|
| What about Jupyter impedes good practices? That it
| empowers ad hoc exploration at all? It is merely an IDE
| tailored to sharing interactive text and code. To me it
| is one of the most exciting ecosystems for modern
| software development (and I've been developing software
| for 30 years).
| musingsole wrote:
| > Jupyter exists mostly because analytics/math guys are
| too lazy to spend a day learning software development
| practices
|
| Bahahahaha, yup those dang lazy mathematicians just
| shooting themselves in the foot and forcing us to deal
| with it! /s
|
| Your preferences for software development are irrelevant.
| The value is in delivering the math to the end user.
| Using GCP, kubernetes or JavaScript to do that is an
| implementation detail. Sorry to tell you this, but you're
| a servant to those dang lazy analysts and without their
| insights, you're worthless.
| otabdeveloper4 wrote:
| You misunderstood.
|
| The guy above correctly said that Jupyter is nice for ad-
| hoc analysis kind of work. The problem is that when
| you've reached terabytes of data and tens of cores you're
| not "ad-hoc" anymore.
|
| Too often the math guys try to avoid responsibility by
| claiming they're doing "ad-hoc" work when they're clearly
| not anymore. It's convenient, yes, but leads to a bad
| place eventually.
| musingsole wrote:
| I understood fine. The problem is trying to formalize "ad
| hoc" versus "production development" practices as if
| they're meaningful.
|
| There isn't some golden truth of software development
| that analysts are too lazy to learn and implement.
| There's the problem and then there's solutions.
| Complaining that Jupyter-based development doesn't
| adequately accommodate version control or some other
| whistle commonly used in software development is some
| peak developer entitlement.
| otabdeveloper4 wrote:
| Your boss _will_ eventually ask to make your "ad-hoc"
| stuff "production".
|
| At which point you'll dump the hot mess in somebody
| else's lap, and the whole thing will be rewritten from
| scratch.
|
| If that's your thing, then go for it.
| throwaway894345 wrote:
| That's fine. Why waste time productionizing something
| that may or may not ever go to production? We do this all
| the time. Do you think Tesla built a whole fully-
| automated factory to build its first proof-of-concept
| car?
| throwaway894345 wrote:
| > The problem is that when you've reached terabytes of
| data and tens of cores you're not "ad-hoc" anymore.
|
| Not in any meaningful way; the code to explore a 50MB
| data set on a little machine looks the same as the code
| to explore a 50GB data set on a big machine, so of course
| one doesn't need version control more than the other.
|
| > Too often the math guys try to avoid responsibility by
| claiming they're doing "ad-hoc" work when they're clearly
| not anymore. It's convenient, yes, but leads to a bad
| place eventually.
|
| This is a moralistic argument. The economic argument is
| that the primary artifact of research is _insight_, not
| code, so getting to that novel insight as fast as
| possible is paramount. Putting version control, tests, or
| other ceremony into the exploration loop is a pointless
| cost. Productionizing that insight can happen later in a
| more traditional software development workflow. It's
| similar to how we write proofs of concept without the
| intensive testing effort that we would put in if we were
| writing production software.
| singhrac wrote:
| I think that's a misguided take. I'm a software developer
| and data scientist (of sorts). Jupyter is an extremely
| convenient tool for ad hoc data analysis. By default it
| gives you easy visualizations and inspection ability,
| allowing you to verify intermediate computation before
| rerunning.
|
| It's easy to convert a notebook into a script with
| version control, reproducibility, and automation.
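|
| In its simplest form, converting a notebook into a script
| means concatenating its code cells; "jupyter nbconvert --to
| script" does this properly. The sketch below only illustrates
| the idea with the standard library, assuming an nbformat 4
| file whose "source" fields are either a string or a list of
| strings (notebook_to_script is a hypothetical name). It emits
| py:percent markers so the result can round-trip through
| Jupytext-style tools.

```python
import json


def notebook_to_script(nb_json: str) -> str:
    """Concatenate the code cells of an .ipynb document into a py:percent script."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown/raw cells are skipped in this sketch
        src = cell.get("source", "")
        if isinstance(src, list):  # nbformat allows a list of lines or one string
            src = "".join(src)
        chunks.append("# %%\n" + src.rstrip("\n") + "\n")
    return "\n".join(chunks)


nb = {"cells": [
    {"cell_type": "markdown", "source": "# Title"},
    {"cell_type": "code", "source": ["x = 2\n", "y = x * 21"]},
    {"cell_type": "code", "source": "print(y)"},
]}
print(notebook_to_script(json.dumps(nb)))
```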
| OldTimeCoffee wrote:
| Most users don't have 'tens-thousands' of GB of data as part
| of their use case. You're describing a business case, not an
| end user consumer case.
| spicyramen wrote:
| Most users will not run Jupyter notebooks locally;
| they expect a remote kernel in the cloud with GPU/multicore
| machines, and only the web app will be running in the browser
| (think of BigQuery). Of course, small users with small
| datasets will run locally.
| candiodari wrote:
| This depends greatly on what you want to do. After all, this
| has incredibly low latency (compared to the cloud) and limited
| throughput, needs zero configuration, and is free.
|
| You want to train the next ImageNet model? Analyse a 100 GB
| database? Probably not the correct tradeoff.
|
| You want 20 students to have a perfectly consistent
| instant-start-up Python environment? Definitely the
| correct tradeoff.
|
| You want to try some Python methods, write tests, ...?
| Very short latency is going to help you more than
| throughput is.
| throwaway894345 wrote:
| I don't believe that the majority of Jupyterhub users are
| "consumers" rather than professionals, but more
| importantly, that doesn't change the fact that the
| professional use case exists and isn't amenable to the fat
| client approach.
|
| As an anecdote, I work on a Jupyterhub managed service
| offering with customers in both the private and public
| sectors and our data sizes are pretty much all in this
| range.
| mrtesthah wrote:
| > I work on a Jupyterhub managed service offering ...
|
| That sounds like selection bias.
| throwaway894345 wrote:
| It might be. Like I said: anecdote.
| mrtesthah wrote:
| Isn't it, though? The people most likely to need a cloud-
| managed service like that probably have too much data to
| crunch on a laptop, as you described.
| throwaway894345 wrote:
| Ah, to be clear I'm not saying our users are a
| representative sample of Jupyterhub users. I _am_ saying
| that there are a lot of people who use Jupyterhub for
| large datasets--it's certainly not uncommon.
| [deleted]
| egeozcan wrote:
| To be fair, "we don't need more cloud computing" doesn't
| mean, "we don't need any cloud computing".
|
| I don't agree that we don't need more cloud; IMO we do,
| but we need to focus on personal computing much more than
| now, which is the general theme of the GP comment.
|
| Not everything is big data.
| throwaway894345 wrote:
| To be clear, I'm not talking about big data. More
| importantly, though, I didn't say there was no place for
| client compute, only that it isn't economical for datasets
| in excess of a few GB.
| huac wrote:
| Not only do cloud services offer better compute capabilities
| (and GPUs/TPUs etc.), but they offer easier reproducibility and
| sharing. Even when I hack on stuff myself, Colab is quick and
| easy to set up, no worrying about Docker or virtualenvs.
| [deleted]
| bhl wrote:
| How far are we away from having _collaborative_ Jupyter in the
| browser? Would love a Google Docs experience of sharing a link
| (no sign-up required) to help remotely teach basic Python.
| mcintyre1994 wrote:
| Livebook does this! It only runs Elixir for now, but there is
| an issue to add other languages in future. It's a really cool
| project IMO. https://github.com/elixir-nx/livebook
| qbasic_forever wrote:
| Not that far, someone just needs to make a JupyterLab plugin
| that uses Automerge or a similar OT/CRDT structure for
| collaboratively editing documents in a workspace (perhaps using
| WebRTC data channels for P2P sync between clients, or stick
| with the tried and true server model like Google docs). The
| trouble is turning that into something as polished and secure
| as Google Docs collaborative editing experience--there's a
| _lot_ of work to get there with tons of little corner cases,
| security issues (you're potentially giving strangers over the
| internet access to remotely run code in your browser--that
| should raise big alarm bells), etc. to think through. But the
| basic stuff is all out there for someone motivated to pick up
| and go wild with.
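|
| The property that makes CRDTs attractive here is that replicas
| can exchange and merge state in any order and still converge.
| Automerge's real API is far richer (JSON documents,
| character-level text edits); as a much smaller illustration of
| the same idea, here is a state-based last-writer-wins map. The
| LWWMap class is hypothetical, and ties between concurrent
| writes are broken by (logical clock, replica id) stamps.

```python
from dataclasses import dataclass, field


@dataclass
class LWWMap:
    """Last-writer-wins map CRDT: each key keeps the value with the highest stamp.

    A stamp is (logical_clock, replica_id); tuple comparison breaks clock ties
    deterministically, so merge() is commutative, associative and idempotent.
    """
    replica_id: str
    clock: int = 0
    entries: dict = field(default_factory=dict)  # key -> (stamp, value)

    def set(self, key, value):
        self.clock += 1
        self.entries[key] = ((self.clock, self.replica_id), value)

    def merge(self, other: "LWWMap") -> None:
        for key, (stamp, value) in other.entries.items():
            if key not in self.entries or stamp > self.entries[key][0]:
                self.entries[key] = (stamp, value)
        # Lamport-style: never fall behind a clock we have observed
        self.clock = max(self.clock, other.clock)

    def value(self, key):
        return self.entries[key][1]


a, b = LWWMap("alice"), LWWMap("bob")
a.set("cell-1", "print('a')")
b.set("cell-1", "print('b')")  # concurrent edit to the same cell
a.merge(b)
b.merge(a)
print(a.value("cell-1"), b.value("cell-1"))  # both replicas agree
```

| The trade-off is visible in the example: one of two concurrent
| edits to the same key is silently dropped, which is why text
| editing needs finer-grained CRDTs like the ones Automerge
| provides.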
| jtpx wrote:
| There is a PR to add initial support for real time
| collaboration in JupyterLite:
| https://github.com/jtpio/jupyterlite/pull/109
|
| This reuses almost all the RTC work done upstream in
| JupyterLab itself.
|
| And since this is implemented as a regular JupyterLab plugin,
| folks will then be able to swap it for something else and
| implement their own if they want to, as a federated
| extension.
| dynamicwebpaige wrote:
| JupyterLab does support real-time collaborative editing in the
| browser, as of a few weeks ago.
|
| https://github.com/jupyterlab/rtc
| airocker wrote:
| The kernels will still run in the cloud, I believe. It would be
| great if Jupyter worked better for larger programs and UI
| development.
| qbasic_forever wrote:
| No, read more from the README. This uses pyodide, a WASM port
| of desktop Python that runs entirely in the browser (obviously
| stuff like file access is sandboxed). In addition it adds a web
| worker that runs a JavaScript kernel powered by your browser's
| JS engine. All of this runs 100% in your browser, there is no
| server component at all.
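|
| You can check this from inside a notebook. Pyodide is CPython
| compiled to WebAssembly with Emscripten, and it reports
| sys.platform as "emscripten". A small sketch (running_in_browser
| is a hypothetical helper) that checks this and, only when
| running in the browser, uses Pyodide's js module to reach the
| page's JavaScript console:

```python
import sys


def running_in_browser() -> bool:
    """True when running under Pyodide/JupyterLite (Emscripten build of CPython)."""
    return sys.platform == "emscripten"


if running_in_browser():
    import js  # Pyodide's bridge to the browser's JavaScript globals
    js.console.log("hello from Python running in the browser")
else:
    print("regular CPython on", sys.platform)
```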
| brumar wrote:
| I frequently use the Basthon notebook
| (https://notebook.basthon.fr/), which is also a WASM-powered
| Jupyter (based on Pyodide), and quite like it. It's flying a bit
| under the radar as it's not translated into any language other
| than French. How does it compare to this project?
___________________________________________________________________
(page generated 2021-05-29 23:02 UTC)