[HN Gopher] Growing open source from Torch to PyTorch
___________________________________________________________________
Growing open source from Torch to PyTorch
Author : plinkplonk
Score : 35 points
Date : 2021-08-02 20:23 UTC (2 days ago)
(HTM) web link (soumith.ch)
(TXT) w3m dump (soumith.ch)
| posharma wrote:
| PyTorch is amazing. The article was a good read. Although I'm
| confused: how can an ML framework not be obsessed with
| speed/performance?
| smhx wrote:
| Author here. Being conscious of speed and performance is
| different from making it your competitive advantage or USP.
|
| Our main focus is usability, and one of our secondary focuses
| is to not look like clowns in the performance department.
|
| So we try to make more decisions that trade off performance
| for usability than vice versa.
| mirker wrote:
| Thanks for the post.
|
| One question: one of the advantages of a clean design is that
| performance is easier to optimize, since the 80/20 rule of
| performance becomes much more obvious. How
| true was this in your experience? Were there any major
| performance-related design changes or was performance
| optimization a matter of tuning a few selected functions?
| ampdepolymerase wrote:
| You are doing a good job balancing the two. Julia's Flux did
| the opposite, and it has severe performance problems compared
| to PyTorch despite being more usable and easier to install.
|
| Installing PyTorch with Poetry is next to impossible. Flux
| got this right by bundling the GPU drivers. Its installation
| is also standardized and does not require the weird pip -f
| flag for CPU-only installations.
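|
| (For context, the PyTorch CPU-only install being referred to
| looked roughly like `pip install torch==1.9.0+cpu -f
| https://download.pytorch.org/whl/torch_stable.html` at the
| time; the exact version pin is only illustrative.)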
| amkkma wrote:
| >it has severe performance problems
|
| It had. It's now at rough parity with PyTorch.
|
| And no, it wasn't about a usability tradeoff.
|
| It was about being more general: a more general compiler,
| more general code, more composable code.
|
| Since then, the team has been optimizing that and adding
| compiler optimizations to the language that benefit all
| code. ML-style code stresses the compiler in a particular
| way; PyTorch handles ML's array-heavy workloads as a
| special case.
|
| Julia will be doing the same, but it's laying the groundwork
| for domain-specific optimizations to be done in package and
| user space. A different sort of philosophy.
|
| It was about being more ambitious and laying the groundwork
| for a more powerful tool in general, at some short-term
| cost.
|
| They could have just written a framework that baked in
| fp32/fp64/fp16, CUDA kernels, and tracing with
| operator-overloading computational graphs, and gotten more
| speedup over PyTorch (in fact, Avalon.jl takes that
| approach), with better usability.
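|
| (A minimal Python sketch of what "operator-overloading
| computational graphs" means in the crudest form; purely
| illustrative, not any framework's actual code:)
|
|     class Var:
|         """A value that records the ops applied to it,
|         building a tiny computation graph as a side effect
|         of ordinary arithmetic."""
|         def __init__(self, value, parents=(), op=None):
|             self.value = value
|             self.parents = parents
|             self.op = op
|
|         def __mul__(self, other):
|             return Var(self.value * other.value,
|                        (self, other), "mul")
|
|         def __add__(self, other):
|             return Var(self.value + other.value,
|                        (self, other), "add")
|
|     a, b = Var(2.0), Var(3.0)
|     c = a * b + a   # evaluates eagerly and records the graph
|     print(c.value, c.op, [p.op for p in c.parents])
|     # prints: 8.0 add ['mul', None]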
|
| But they didn't, and now there's a burgeoning ecosystem that
| does things no other framework can. The marginal benefit for
| current vanilla ML is smaller, because that space is stuck
| in a local optimum, but I think that is going to change:
| https://www.stochasticlifestyle.com/useful-
| algorithms-that-a...
|
| In the meantime, places like MIT, Moderna, NASA, etc. are
| reaping the benefits.
| jsinai wrote:
| > In the meantime, places like MIT, Moderna, NASA, etc. are
| reaping the benefits.
|
| Can you elaborate? MIT is well known, but it would be
| interesting to know how Moderna and NASA are using Flux.
| amkkma wrote:
| Sure!
|
| NASA: https://www.youtube.com/watch?v=tQpqsmwlfY0
|
| Moderna: https://pumas.ai/
| https://discourse.julialang.org/t/has-moderna-used-pumas-
| ai-...
|
| There are many, many more. These unique and sought-after
| capabilities are what got Julia Computing its $24M Series A
| (https://twitter.com/Viral_B_Shah/status/1417128416206376960)
| amkkma wrote:
| Some specific steps that will push it past JAX/PyTorch for
| chunky, array-heavy GPU code (it can already beat or match
| OpenBLAS/MKL for kernels written in scalar form):
|
| 1. Better compile time memory management
| (https://github.com/aviatesk/EscapeAnalysis.jl)
|
| 2. Linalg passes built on generic composable compiler
| ecosystem: https://youtu.be/IlFVwabDh6Q?t=818
|
| 3. Metatheory.jl egraph based symbolic optimization
| interleaved with the abstract interpreter:
| https://github.com/0x0f0f0f/Metatheory.jl
|
| 4. Partial evaluation via mixed concrete and abstract
| interpretation
|
| 5. Compiler-based auto-parallelization with Dagger.jl
|
| 6. A new compiler-integrated AD (as a package) that isn't
| based on an accidental lispy compiler hack like Zygote:
| https://github.com/JuliaDiff/Diffractor.jl
|
| 7. Changes to array semantics, which will include generic
| immutability/ownership concepts.
|
| And many more. The key is that all the initial groundwork
| that traded specific speed for fundamental flexibility will
| then feed back into making the ML use case faster than if
| the project had focused on that initially. People can do all
| kinds of crazy yet composable things, in pure Julia, without
| modifying the base compiler.
|
| Bonus: being able to modify the type lattice to track
| custom program properties. This means you aren't stuck with
| the global tradeoffs of a static type system and can do
| things like opt in to tracking array shapes at compile time,
| per module:
| https://twitter.com/KenoFischer/status/1407810981338796035
| Other packages, e.g. for quantum computing, are planning to
| do their own analyses. It's generic, and the use cases and
| compositions aren't frozen at the outset (unlike, for
| example, the Swift "tensors fitting perfectly" proposal).
| smhx wrote:
| We ship everything needed for userland -- including parts
| of CUDA/CuBLAS and CuDNN that we need (which is why our
| binaries are so fat).
|
| GPU drivers would be kernel-land and I don't think we
| actually can install GPU drivers as part of a `pip
| install`. Will look into what Flux is doing, but I doubt
| they ship GPU drivers.
|
| Separately, thanks for flagging the Poetry issue; we might
| prioritize it, especially if the fix is easy.
| amkkma wrote:
| Yes, Flux doesn't ship GPU drivers. It ships everything
| else (like the CUDA toolkit, etc.) as needed, using the
| artifact/package system, for all mainstream OSes. It
| doesn't interfere with system libraries.
|
| https://julialang.org/blog/2019/11/artifacts/
| albertzeyer wrote:
| It was necessary to move away from Lua to stay relevant within
| the machine learning community. Python was a natural choice
| because Theano and TensorFlow were already there.
|
| PyTorch could make use of the best API ideas from the other
| frameworks (including higher-level ones like Keras), and it was
| executed well. The core principle of easy debuggability is
| indeed very important for winning developers. Clean code,
| understandable code, flexibility: these are all closely related
| to that, or mostly the same thing.
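|
| (As a minimal illustration of that debuggability, not taken
| from the article: in PyTorch's eager mode you can print and
| step through tensors at any point, with no separate graph
| compilation or session in between.)
|
|     import torch
|
|     x = torch.randn(2, 3, requires_grad=True)
|     y = (x * 2).sum()
|     print(y)        # the value is available immediately
|     y.backward()
|     print(x.grad)   # gradients are inspectable right away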
|
| It's easy for a successful framework to get bloated, complex,
| and complicated, though. I wonder how PyTorch will look in a
| few years. I also remember the first TensorFlow releases, where
| the whole source code was quite easy to understand. Then
| TensorFlow added more and more things, and many different kinds
| of APIs, and started to deprecate some earlier things, etc. The
| PyTorch internals are also already much more complex than they
| were initially.
|
| One reason JAX is now popular is that it again started with a
| fresh API. And it is based on a new kind of idea, code
| transformations, which seems nice and powerful.
|
| When looking at these developments, I really wonder what the
| future will look like. It's good to have new ideas and new or
| improved APIs. It's also good to adapt things for new kinds of
| hardware (GPUs, TPUs, maybe other neuromorphic hardware later).
| jimsimmons wrote:
| Keras was a copy of the Torch API. If you read the original
| Keras README, it literally says so.
| amkkma wrote:
| As a Julia user, thanks for this! Inspiring and packed with
| pearls. There's a lot we can learn from the Python community.
| blt wrote:
| This article does a good job explaining how PyTorch gained an
| advantage over TensorFlow. The 1.0 release of TensorFlow with
| graphs and feed_dicts was a little clunky but made sense. After
| 1.0 the second-system effect took hold quickly. Eager mode,
| Keras, TFX ... it all started to look like a mess.
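|
| (For readers who never used it, the 1.x graph/feed_dict style
| looked roughly like the sketch below; the details are
| illustrative, not quoted from the TensorFlow docs.)
|
|     import tensorflow as tf  # TensorFlow 1.x API
|
|     # Build a static graph first...
|     x = tf.placeholder(tf.float32, shape=[None, 3])
|     y = tf.reduce_sum(x * 2.0)
|
|     # ...then feed data into it through a session.
|     with tf.Session() as sess:
|         print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))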
___________________________________________________________________
(page generated 2021-08-04 23:00 UTC)