[HN Gopher] JupyterLite - WASM-powered Jupyter running in the br...
       ___________________________________________________________________
        
       JupyterLite - WASM-powered Jupyter running in the browser
        
       Author : ahurmazda
       Score  : 197 points
       Date   : 2021-05-29 06:59 UTC (16 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | qbasic_forever wrote:
       | This is _really_ cool. I don't know why a lot of commenters here
       | are going into the weeds to grouse about Java, flash, and general
       | anger at computational notebooks.
       | 
       | What we have here is a complete client-side browser environment
       | for development. Not some half-assed language or hyper restricted
       | toy--this is real Python, and your browser's full JS engine all
       | available in JupyterLab's IDE (basically a simpler VS Code at
       | this point, it uses the same editing component).
       | 
       | We all freaked out a bit as Apple drove out IDEs from their app
       | store, Google locked down Termux and similar developer tools from
       | Android. Well, here's the answer to those situations. Something
       | no app store owner can kill on a whim. I love stuff like this and
       | hope it helps to enable and inspire the next generation of
       | developers.
        
       | ktpsns wrote:
       | Although Pyolite has miserable performance (20MB of downloads),
       | the overall project direction is correct.
       | 
       | I said this already 10 years ago: We don't need more cloud
       | computing but need to empower users' end devices again. Jupyter is
       | typically operated on powerful notebooks and not on mobile
       | devices.
        
         | atoav wrote:
         | The solution I use for local jupyter notebooks is nteract [0]
         | which is like a standalone application that can edit/open
         | .ipynb files.
         | 
         | It has a few quirks, but works quite well for daily use.
         | 
         | [0] https://nteract.io/
        
           | kzrdude wrote:
           | vscodium can also open ipynb files, to give an alternative.
           | 
           | I try to work mostly with "py:percent" script files now, in
           | the same jupyter notebook style but without saving outputs.
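
A minimal sketch of the "py:percent" layout mentioned above, assuming the Jupytext convention that a line starting with `# %%` opens a new cell and `# %% [markdown]` marks a markdown cell. The helper name `split_percent_cells` is hypothetical and only illustrates how such a script maps back to notebook-style cells:

```python
# Hypothetical helper: split a "py:percent" script into cells.
# Assumption: "# %%" starts a code cell, "# %% [markdown]" a markdown cell.

def split_percent_cells(source: str) -> list:
    cells = []
    current = None
    for line in source.splitlines():
        if line.startswith("# %%"):
            kind = "markdown" if "[markdown]" in line else "code"
            current = {"type": kind, "lines": []}
            cells.append(current)
        elif current is not None:
            current["lines"].append(line)
        else:
            # Text before the first marker is treated as a leading code cell.
            current = {"type": "code", "lines": [line]}
            cells.append(current)
    return cells


example = """# %% [markdown]
# # Load the data

# %%
values = [1, 2, 3]

# %%
print(sum(values))
"""

for cell in split_percent_cells(example):
    print(cell["type"], len(cell["lines"]))
```

Because the file is ordinary Python with comment markers, it diffs cleanly in version control and carries no saved outputs.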
        
           | ipsum2 wrote:
           | What do you like about nteract over Jupyter's frontend? The
           | website is devoid of details.
        
             | atoav wrote:
             | I like that it is a standalone notepad.exe-like program
             | that opens my notebook and just works. I usually use it
             | more for looking at notebooks than writing them tho.
        
             | rerx wrote:
             | It's convenient to directly open a notebook from a browser
             | download or a slack message without having to fire up
             | jupyter in a terminal and navigate it to that temp file.
        
               | codefreakxff wrote:
               | I use vscode to open and run notebooks using the jupyter
               | plugin. Works great. Don't need to fire up the browser
        
               | rdedev wrote:
               | Vscode is now my go-to for opening and running notebooks.
               | The issue I had with the normal jupyter was that its
               | support for normal python files was bad. With vscode I
               | can work on both normal python files and notebook files
               | in the same window without much overhead
        
               | Godel_unicode wrote:
               | Pycharm is also excellent at this. My biggest problem
               | with the normal jupyter interface is years of reflex that
               | Ctrl+w is for deleting words. Quite annoying in a web
               | browser.
        
         | jcims wrote:
         | Unfortunately we will need more cloud computing. If you're
         | watching what's going on in the ransomware and cyber insurance
         | space, small and many medium-sized companies that require E&O
         | coverage for their contracts are not going to be able to afford
         | to run on their own equipment.
        
           | Godel_unicode wrote:
           | Good. People who are bad at administering computers will stop
           | doing it, and will focus on what they're good at.
           | 
           | Then, we can use the on-demand nature of cloud services to
           | reduce their power consumption. Simultaneously we can move
           | that consumption into renewable-powered datacenters. This is
           | literally better for everyone.
        
             | jcims wrote:
             | Meanwhile we create tremendous concentration risk and the
             | world pays rent to Amazon, Google and Microsoft? I wouldn't
             | call that 'good'.
        
         | aiNohY6g wrote:
         | Similar to jupyterlite: https://starboard.gg/jupystar (and
         | https://starboard.gg)
        
         | croes wrote:
         | Isn't that just Java all over again, but this time with
         | JavaScript?
        
           | fmajid wrote:
           | No, WASM gets compiled to native code
        
             | youngtaff wrote:
             | Java bytecode got translated to native too
             | 
             | Think we've just got better at runtimes over time (JIT,
             | intermediate formats etc.) and WASM was designed to be good
             | for this rather than needing to work with an already existing
             | bytecode
        
               | fmajid wrote:
               | Reflection and other language features preclude direct
               | translation of Java bytecode to machine code, whereas
               | WASM is designed to be a portable assembly language,
               | closer to the IL of GCC or LLVM.
        
               | _old_dude_ wrote:
               | java -XX:+PrintCompilation prints all methods/loops
               | generated to native code
               | 
               | The reflection API has two issues, a security check each
               | time you call a method and the arguments being
               | transformed to objects. The code is still generated to
               | assembly code but the assembly code is slower because of
               | that overhead.
        
           | adimitrov wrote:
           | There are crucial differences between Java applets and JS.
           | 
           | - Applets tried to render their own GUI, Wasm doesn't and
           | defers to the browser.
           | 
           | - applets needed a big, slow to start and resource hungry VM.
           | Wasm is running in the same thread your JS is also running
           | in, it's light, and loads faster than JS
           | 
           | - Java and flash were plugins, which needed to be installed
           | and kept up to date separately. Wasm is baked into your
           | browser's JS engine
           | 
           | - Wasm code is very fast and can achieve near native
           | execution speeds. It can make use of advanced optimisations.
           | SIMD has shipped in Chrome, and will soon in Firefox
           | 
           | - The wasm spec is very, very good, and really quite small.
           | This means that implementing it is comparatively cheap, and
           | this should make it easy to see it implemented by different
           | vendors.
           | 
           | - Java was just Java. Wasm can serve as a platform for any
           | language. See my earlier point about the spec
           | 
           | So it's apples and oranges. The _need_ to have something
           | besides JS hasn't gone away, so their use cases might be
           | similar. The two technologies couldn't be more distinct,
           | though.
        
             | croes wrote:
             | You must view the browser with JS and WASM as a unit.
             | 
             | The browser renders its own GUI too, it's not OS native
             | 
             | The browser uses lots of resources too.
             | 
             | The browser is kind of a plugin to the OS and must be
             | updated separately.
             | 
             | Java nowadays is pretty fast too.
             | 
             | Java VM serves as a platform for multiple languages like
             | Scala, Kotlin, Clojure.
             | 
             | Let's face it, the browser is the new JVM and as soon as it
             | gets the same permissions as the JVM to access the file
             | system and such, we get the same problems.
        
               | kierangill wrote:
               | > You must view the browser with JS and WASM as a unit
               | 
               | "Web" assembly is a bit of a misnomer. It's an IR at the
               | end of the day and can be run without a browser[1]. But
               | your other points could be true one day if the de facto
               | WASM runtime becomes bloated or decides to ship with some
               | GUI renderer.
               | 
               | [1] https://github.com/bytecodealliance/wasmtime
        
               | westurner wrote:
               | From https://news.ycombinator.com/item?id=24052393 re:
               | Starboard:
               | 
               | > _https://developer.mozilla.org/en-US/docs/Web/Security/Subres... :
               | "Subresource Integrity
               | (SRI) is a security feature that enables browsers to
               | verify that resources they fetch (for example, from a
               | CDN) are delivered without unexpected manipulation. It
               | works by allowing you to provide a cryptographic hash
               | that a fetched resource must match."_
               | 
               | > _There's a new Native Filesystem API: "The new Native
               | File System API allows web apps to read or save changes
               | directly to files and folders on the user's device."_
               | https://web.dev/native-file-system/
               | 
               | > _We'll need a way to grant specific URLs specific,
               | limited amounts of storage._
               | 
               | [...]
               | 
               | > _https://github.com/deathbeds/jyve/issues/46_ :
               | 
               | > _Would [Micromamba] and conda-forge build a WASM
               | architecture target?_
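
A rough sketch of the Subresource Integrity check quoted above, assuming the usual `<algorithm>-<base64 digest>` form of an `integrity` value. The function names `sri_hash` and `matches_integrity` are hypothetical; a browser does this check itself, so the code only illustrates the idea:

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Build an SRI-style string: algorithm name, a dash, base64 digest."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def matches_integrity(content: bytes, integrity: str) -> bool:
    """Recompute the hash of fetched bytes and compare it to the expected value."""
    algorithm, _, _expected = integrity.partition("-")
    return sri_hash(content, algorithm) == integrity


script = b"console.log('hello');"
expected = sri_hash(script)

print(expected)
print(matches_integrity(script, expected))
print(matches_integrity(script + b" // tampered", expected))
```

Any change to the fetched bytes changes the digest, which is why a CDN cannot silently swap the resource.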
        
               | Dzugaru wrote:
               | > as soon as it gets the same permissions as the JVM to
               | access the file system and such
               | 
               | Like... never?
               | 
               | We get better systems as we get more experience. That's
               | why C# was better than Java, Java today is better than
               | Java was when C# launched. That's why we now have amazing
               | languages like Rust and also that's why the same problems
               | will never be the same given we have a ton of experience
               | with VMs, docker, sandboxing in browsers etc.
        
           | pjmlp wrote:
           | Java and Flash, but now it is Good (TM), because the powers
           | that be decided so.
           | 
           | "Everything Old is New Again: Binary Security of WebAssembly"
           | 
           | https://www.usenix.org/conference/usenixsecurity20/presentat...
           | 
           | So, enjoy the 2nd coming of applets/flash,
           | 
           | https://platform.uno/
           | 
           | https://dotnet.microsoft.com/apps/aspnet/web-apps/blazor
           | 
           | https://tinygo.org/
           | 
           | .... favourite stack compiled into WASM.
        
         | throwaway894345 wrote:
         | > I said this already 10 years ago: We don't need more cloud
         | computing but need to empower users' end devices again. Jupyter
         | is typically operated on powerful notebooks and not on mobile
         | devices.
         | 
         | If you're working with data of any significant size at all,
         | then it doesn't matter how fast your user device is--it's so
         | much cheaper (time and network egress costs) to send the
         | computations from a user device to the cloud than to pull tens-
         | thousands of GB of data to your local machine. Moreover, I
         | don't know of many local machines with tens of CPU cores,
         | hundreds or thousands of GB of RAM, or tens-hundreds of TB of
         | SSD for handling that computation quickly.
         | 
         | User devices are great for very small data, but I don't see the
         | point for larger datasets.
        
           | otabdeveloper4 wrote:
           | You definitely shouldn't be running this stuff from a Jupyter
           | notebook.
        
             | throwaway894345 wrote:
             | Why not? We make a lot of money offering it to people.
        
               | otabdeveloper4 wrote:
               | Yes, the market in enabling people to do the "you
               | shouldn't be doing this" stupid things is huge.
        
               | throwaway894345 wrote:
               | Any substance on why we shouldn't be doing it or why it's
               | stupid? What's the alternative? Should researchers all
               | learn Kubernetes and AWS and deploy their own
               | environments?
        
               | otabdeveloper4 wrote:
               | The problem with Jupyter is that it impedes common-sense
               | practices like version control, reproducibility, and
               | automation.
               | 
               | If you're spending the time and effort to rent these big
               | servers, why not spend the 5 percent of the effort and do
               | it right?
               | 
               | Jupyter exists mostly because analytics/math guys are too
               | lazy to spend a day learning software development
               | practices. Must be some sort of us-vs-them point of
               | misplaced pride.
        
               | parasubvert wrote:
               | I am shocked at your unjustified and rather arrogant
               | gatekeeping here.
               | 
               | Version controlled Jupyter notebooks running in an
               | automated environment (eg Kubernetes), with repeatable
               | test data loading into a processing environment (eg.
               | MinIO or Spark), is quite commonplace. Making it even
               | easier with WASM makes sense.
               | 
               | What about Jupyter impedes good practices? That it
               | empowers ad hoc exploration at all? It is merely an IDE
               | tailored to sharing interactive text and code. To me it
               | is one of the most exciting ecosystems for modern
               | software development (and I've been developing software
               | for 30 years).
        
               | musingsole wrote:
               | > Jupyter exists mostly because analytics/math guys are
               | too lazy to spend a day learning software development
               | practices
               | 
               | Bahahahaha, yup those dang lazy mathematicians just
               | shooting themselves in the foot and forcing us to deal
               | with it! /s
               | 
               | Your preferences for software development are irrelevant.
               | The value is in delivering the math to the end user.
               | Using GCP, kubernetes or JavaScript to do that is an
               | implementation detail. Sorry to tell you this, but you're
               | a servant to those dang lazy analysts and without their
               | insights, you're worthless.
        
               | otabdeveloper4 wrote:
               | You misunderstood.
               | 
               | The guy above correctly said that Jupyter is nice for ad-
               | hoc analysis kind of work. The problem is that when
               | you've reached terabytes of data and tens of cores you're
               | not "ad-hoc" anymore.
               | 
               | Too often the math guys try to avoid responsibility by
               | claiming they're doing "ad-hoc" work when they're clearly
               | not anymore. It's convenient, yes, but leads to a bad
               | place eventually.
        
               | musingsole wrote:
               | I understood fine. The problem is trying to formalize "ad
               | hoc" versus "production development" practices as if
               | they're meaningful.
               | 
               | There isn't some golden truth of software development
               | that analysts are too lazy to learn and implement.
               | There's the problem and then there's solutions.
               | Complaining that Jupyter-based development doesn't
               | adequately accommodate version control or some other
               | whistle commonly used in software development is some
               | peak developer entitlement.
        
               | otabdeveloper4 wrote:
               | Your boss _will_ eventually ask to make your "ad-hoc"
               | stuff "production".
               | 
               | At which point you'll dump the hot mess in somebody
               | else's lap, and the whole thing will be rewritten from
               | scratch.
               | 
               | If that's your thing, then go for it.
        
               | throwaway894345 wrote:
               | That's fine. Why waste time productionizing something
               | that may or may not ever go to production? We do this all
               | the time. Do you think Tesla built a whole fully-
               | automated factory to build its first proof-of-concept
               | car?
        
               | throwaway894345 wrote:
               | > The problem is that when you've reached terabytes of
               | data and tens of cores you're not "ad-hoc" anymore.
               | 
               | Not in any meaningful way, the code to explore a 50MB
               | data set on a little machine looks the same as the code
               | to explore a 50GB data set on a big machine, so of course
               | one doesn't need version control more than the other.
               | 
               | > Too often the math guys try to avoid responsibility by
               | claiming they're doing "ad-hoc" work when they're clearly
               | not anymore. It's convenient, yes, but leads to a bad
               | place eventually.
               | 
               | This is a moralistic argument. The economic argument is
               | that the primary artifact of research is _insight_, not
               | code, so getting to that novel insight as fast as
               | possible is paramount. Putting version control, tests, or
               | other ceremony into the exploration loop is a pointless
               | cost. Productionizing that insight can happen later in a
               | more traditional software development workflow. It's
               | similar to how we write proofs of concept without the
               | intensive testing effort that we would go into if we were
               | writing production software.
        
               | singhrac wrote:
               | I think that's a misguided take. I'm a software developer
               | and data scientist (of sorts). Jupyter is an extremely
               | convenient tool for adhoc data analysis. By default it
               | gives you easy visualizations and inspection ability,
               | allowing you to verify intermediate computation before
               | rerunning.
               | 
               | It's easy to convert a notebook into a script with
               | version control, reproducibility, and automation.
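
To illustrate the conversion described above, here is a minimal sketch that reads notebook JSON and emits a plain script. It assumes the nbformat 4 layout (a top-level `cells` list whose `source` is a string or a list of lines) and uses `# %%` cell markers; the function name `notebook_to_percent_script` is hypothetical. Existing tools such as `jupyter nbconvert --to script` or Jupytext are the more likely choice in practice:

```python
import json


def notebook_to_percent_script(notebook_json: str) -> str:
    """Turn notebook JSON into a script, dropping outputs and metadata."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # nbformat allows "source" to be a single string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "markdown":
            body = "\n".join(("# " + line).rstrip() for line in text.splitlines())
            chunks.append("# %% [markdown]\n" + body)
        elif cell.get("cell_type") == "code":
            chunks.append("# %%\n" + text)
    return "\n\n".join(chunks) + "\n"


# A tiny in-memory notebook, made up for this example.
demo = json.dumps({
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "outputs": [], "source": ["x = 1\n", "print(x)"]},
    ],
})

print(notebook_to_percent_script(demo))
```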
        
           | OldTimeCoffee wrote:
           | Most users don't have 'tens-thousands' of GB of data as part
           | of their use case. You're describing a business case, not an
           | end user consumer case.
        
             | spicyramen wrote:
             | Most of the users will not run Jupyter Notebooks locally,
             | they expect a remote kernel in the cloud with GPU/multicore
             | machines, only the web app will be running in browser
             | (think of big query). Of course small users with small
             | datasets will run locally.
        
               | candiodari wrote:
               | This depends greatly what you want to do. After all, this
               | has incredible latency (compared to the cloud), limited
               | throughput. Zero configuration. Free.
               | 
               | You want to train the next imagenet model? Analyse 100Gig
               | database? Probably not the correct tradeoff.
               | 
               | You want 20 students to have a perfectly consistent
               | instant-start-up python environment? Definitely the
               | correct tradeoff.
               | 
               | You want to try some python methods, write tests, ...
               | Very short latency is going to help you more than
               | throughput is.
        
             | throwaway894345 wrote:
             | I don't believe that the majority of Jupyterhub users are
             | "consumers" rather than professionals, but more
             | importantly, that doesn't change the fact that the
             | professional use case exists and isn't amenable to the fat
             | client approach.
             | 
             | As an anecdote, I work on a Jupyterhub managed service
             | offering with customers in both the private and public
             | sectors and our data sizes are pretty much all in this
             | range.
        
               | mrtesthah wrote:
               | > I work on a Jupyterhub managed service offering ...
               | 
               | That sounds like selection bias.
        
               | throwaway894345 wrote:
               | It might be. Like I said: anecdote.
        
               | mrtesthah wrote:
               | Isn't it, though? The people most likely to need a cloud-
               | managed service like that probably have too much data to
               | crunch on a laptop, as you described.
        
               | throwaway894345 wrote:
               | Ah, to be clear I'm not saying our users are a
               | representative sample of Jupyterhub users. I _am_ saying
               | that there are a lot of people who use Jupyterhub for
               | large datasets--it's certainly not uncommon.
        
               | [deleted]
        
           | egeozcan wrote:
           | To be fair, "we don't need more cloud computing" doesn't
           | mean, "we don't need any cloud computing".
           | 
           | I don't agree that we don't need more of cloud, IMO we do,
           | but we need to focus on personal computing much more than
           | now, which is the general theme of the GP comment.
           | 
           | Not everything is big data.
        
             | throwaway894345 wrote:
             | To be clear, I'm not talking about big data. More
             | importantly though, I didn't say there was no place for
             | client compute, only that it isn't economical for datasets
             | in excess of a few GB.
        
           | huac wrote:
           | Not only do cloud services offer better compute capabilities
           | (and GPUs/TPUs etc), but they offer easier reproducibility and
           | sharing. Even when I hack on stuff myself, Colab is quick and
           | easy to set up, no worrying about Docker or virtualenvs.
        
           | [deleted]
        
       | bhl wrote:
       | How far are we away from having _collaborative_ Jupyter in the
       | browser? Would love a Google Docs experience of sharing a
       | no-sign-up-required link to help remotely teach basic python.
        
         | mcintyre1994 wrote:
         | Livebook does this! It only runs Elixir for now, but there is
         | an issue to add other languages in future. It's a really cool
         | project IMO. https://github.com/elixir-nx/livebook
        
         | qbasic_forever wrote:
         | Not that far, someone just needs to make a JupyterLab plugin
         | that uses Automerge or a similar OT/CRDT structure for
         | collaboratively editing documents in a workspace (perhaps using
         | WebRTC data channels for P2P sync between clients, or stick
         | with the tried and true server model like Google docs). The
         | trouble is turning that into something as polished and secure
         | as Google Docs collaborative editing experience--there's a
         | _lot_ of work to get there with tons of little corner cases,
         | security issues (you're potentially giving strangers over the
         | internet access to remotely run code in your browser--that
         | should raise big alarm bells), etc. to think through. But the
         | basic stuff is all out there for someone motivated to pick up
         | and go wild with.
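
As a toy illustration of the CRDT idea mentioned above, the sketch below merges two replicas of a notebook with a last-writer-wins map, one of the simplest CRDTs. This is not Automerge's API: the data layout (cell id mapped to a `(counter, replica, text)` tuple) and the `merge` function are assumptions made for the example, and real collaborative editors merge at a much finer granularity than whole cells:

```python
def merge(a: dict, b: dict) -> dict:
    """Merge two replicas; for each cell keep the entry with the highest
    (counter, replica) pair, so every replica converges to the same state."""
    merged = dict(a)
    for cell_id, entry in b.items():
        if cell_id not in merged or entry > merged[cell_id]:
            merged[cell_id] = entry
    return merged


# Two users edited the same notebook while disconnected from each other.
alice = {"cell-1": (1, "alice", "x = 1"), "cell-2": (3, "alice", "print(x + 1)")}
bob = {"cell-1": (2, "bob", "x = 2"), "cell-2": (1, "bob", "print(x)")}

# Merge order does not matter: both sides end up with the same document.
assert merge(alice, bob) == merge(bob, alice)

for cell_id, (_counter, replica, text) in sorted(merge(alice, bob).items()):
    print(cell_id, replica, text)
```

The hard parts named in the comment (corner cases, security of remotely triggered execution) are untouched by a sketch like this.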
        
           | jtpx wrote:
           | There is a PR to add initial support for real time
           | collaboration in JupyterLite:
           | https://github.com/jtpio/jupyterlite/pull/109
           | 
           | This reuses almost all the RTC work done upstream in
           | JupyterLab itself.
           | 
           | And since this is implemented as a regular JupyterLab plugin,
           | folks will then be able to swap it for something else and
           | implement their own if they want to, as a federated
           | extension.
        
         | dynamicwebpaige wrote:
         | JupyterLab does support real-time collaborative editing in the
         | browser, as of a few weeks ago.
         | 
         | https://github.com/jupyterlab/rtc
        
       | airocker wrote:
       | The kernels will still run on the cloud I believe. It would be
       | great if Jupyter worked better for larger programs and UI
       | development.
        
         | qbasic_forever wrote:
         | No, read more from the README. This uses pyodide, a WASM port
         | of desktop Python that runs entirely in the browser (obviously
         | stuff like file access is sandboxed). In addition it adds a web
         | worker that runs a Javascript kernel powered by your browser's
         | JS engine. All of this runs 100% in your browser, there is no
         | server component at all.
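
A small sketch of how notebook code might check whether it is running on the in-browser Pyodide kernel rather than desktop Python. It assumes the checks described in the Pyodide documentation (`sys.platform` reporting `"emscripten"`, or the `pyodide` module being loaded); the function name `running_in_pyodide` is hypothetical:

```python
import sys


def running_in_pyodide() -> bool:
    """Best-effort check for a Pyodide (WASM, in-browser) Python runtime."""
    return sys.platform == "emscripten" or "pyodide" in sys.modules


if running_in_pyodide():
    print("Running in the browser; file access is sandboxed.")
else:
    print("Running on a regular Python interpreter.")
```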
        
       | brumar wrote:
       | I frequently use the basthon notebook
       | https://notebook.basthon.fr/ which is also a wasm powered jupyter
       | (based on pyodide) and quite like it. It's flying a bit under the
       | radar as it's not translated into any language other than French.
       | How does it compare to this project?
        
       ___________________________________________________________________
       (page generated 2021-05-29 23:02 UTC)