[HN Gopher] PyTorch is dead. Long live Jax
       ___________________________________________________________________
        
       PyTorch is dead. Long live Jax
        
       Author : lairv
       Score  : 73 points
       Date   : 2024-08-16 20:24 UTC (2 hours ago)
        
 (HTM) web link (neel04.github.io)
 (TXT) w3m dump (neel04.github.io)
        
       | casualscience wrote:
       | > Multi-backend is doomed
       | 
       | Finally! Someone says it! This is why the C programming language
       | will never have wide adoption. /s
        
       | munchler wrote:
        | This is such a hyperbolic headline that it's hard to be
        | interested in reading the actual article.
        
         | crancher wrote:
         | All blanket statements are false.
        
       | semiinfinitely wrote:
        | PyTorch is the JavaScript of ML. Sadly, "worse is better"
        | software has better survival characteristics, even when there is
        | consensus that technology X is theoretically better.
        
         | bob020202 wrote:
          | Nothing is even theoretically better than JavaScript for its
          | intended use cases, web frontends and backends. Mainly because
          | it went all-in on event-loop concurrency early on, which isn't
          | just about usability but also performance. It also didn't go
          | all-in on OOP, unlike Java, and has easy imports/packages,
          | unlike Python. It has some quirks like the "0 trinity," but
          | that doesn't really matter. No matter how good you are with
          | something else, it still takes more dev time ($$$) than JS.
         | 
         | Now it's been forever since I used PyTorch or TF, but I only
         | remember TF 1.x being more like "why TF isn't this working." At
         | some point I didn't blame myself, I blamed the tooling, which
         | TF2 later admitted. It seemed like no matter how skilled I got
         | with TF1, it'd always take much longer than developing with
         | PyTorch, so I switched early.
        
           | josephg wrote:
            | You don't think it's possible to (even theoretically)
            | improve on JavaScript for its intended use case? What a
            | terrific lack of imagination.
            | 
            | TypeScript and Elm would like a word.
        
             | bob020202 wrote:
              | No, I said there's nothing that exists right now that's
              | theoretically better. TypeScript isn't. It'd be great if
              | TS's type inference were smart enough that it took
              | basically no additional dev input vs. JS, but until then,
              | it's expensive to use. It's also bolted on awkwardly, but
              | that's changing soon. I could also imagine JS getting some
              | nice Python features like list comprehensions.
              | 
              | Also, generally when people complain that JS won the web,
              | it's not because they prefer TS; it's because they wanted
              | to use something else and can't.
              | 
              | Never used Elm, but... no mutable variables, kind of like
              | Erlang, which I've used. That has its appeal, but you're
              | not going to find a consensus that this is better for the
              | web.
        
               | dwattttt wrote:
               | I see this a lot, and I want to find the right words for
               | it: I don't want the types automatically determined for
               | what I write, because I write mistakes.
               | 
               | I want to write the type, and for that to reveal the
               | mistake.
        
               | bob020202 wrote:
               | If you explicitly mark types at lower levels, or just use
               | typed libs, it's not very easy to pass the wrong type
               | somewhere and have it still accidentally work. The most
               | automatic type inference I can think of today is in Rust,
               | which is all about safety, or maybe C++ templates count
               | too.
        
         | sva_ wrote:
          | I don't think the comparison is fair. IMO PyTorch has the
          | cleanest abstractions, which is the reason it is so popular:
          | people can do quick prototyping without having to spend too
          | much time figuring out the engineering details of getting it
          | to run on their hardware.
        
       | ein0p wrote:
        | Jax is dead, long live PyTorch. PyTorch has _twenty times_ as
        | many users as Jax. Any rumors of its death are highly
        | exaggerated.
        
         | melling wrote:
          | They used to say the same thing about Perl and Python.
          | 
          | Downvoted. Hmmm. I'm a little tired, so I don't want to go
          | into detail. However, I was a Perl programmer when Python was
          | rising. So, needless to say, having a big lead doesn't matter.
          | 
          | Please learn from history. A big lead means nothing.
        
           | ein0p wrote:
           | It's been years and Jax is just where it was, no growth
           | whatsoever. And that's with all of Google forced internally
           | to use only Jax. Look, I like the technical side of Jax for
           | the most part, but it's years too late to the party and it's
           | harder to use than PyTorch. It just isn't going to ever take
           | off at this point.
        
         | deisteve wrote:
          | A lot of contrarian takes are popular but rarely implemented
          | in reality.
        
         | tripplyons wrote:
         | It's definitely exaggerated, but I personally prefer JAX and
         | have found it easier to use than PyTorch for almost everything.
         | If you haven't already, I would give JAX a good try.
        
       | srush wrote:
        | PyTorch is a generationally important project. I've never seen a
        | tool that is so in line with how researchers learn and
        | internalize a subject. Teaching machine learning before and
        | after its adoption has been a completely different experience.
        | It can never be said enough how cool it is that Meta fosters and
        | supports it.
       | 
       | Viva PyTorch! (Jax rocks too)
        
       | hprotagonist wrote:
       | > I believe that all infrastructure built on Torch is just a huge
       | pile of technical debt, that will haunt the field for a long,
       | long time.
       | 
        | ... from the company that pioneered the approach with
        | TensorFlow. I've worked with worse ML frameworks, but they're by
        | now pretty obscure; I cannot remember (and I am very happy about
        | it) the last time I saw MXNet in the wild, for example. You'll
        | still find Caffe on some embedded systems, but you can mostly
        | sidestep it.
        
       | deisteve wrote:
        | I like PyTorch because all of academia releases their code with
        | it.
        | 
        | I've never even heard of Jax, nor will I have the skills to use
        | it.
        | 
        | I literally just want to know two things: 1) how much VRAM, and
        | 2) how to run it on PyTorch.
        
         | sva_ wrote:
         | Jax is a competing computational framework that does something
         | similar to PyTorch, so both of your questions don't really make
         | sense.
        
           | etiam wrote:
            | Maybe deisteve will answer for himself, but I don't think
            | that's meant as "how to run Jax on PyTorch", but rather as
            | the two questions he asks about any published model.
        
       | 0cf8612b2e1e wrote:
       | As we all know, the technically best solution always wins.
        
       | logicchains wrote:
        | PyTorch beat TensorFlow because it was much easier to use for
        | research. Jax is much harder to use for exploratory research
        | than PyTorch, because it requires a fixed-shape computation
        | graph, which makes implementing many custom model architectures
        | very difficult.
        | 
        | Jax's advantages shine when it comes to parallelizing a new
        | architecture across multiple GPUs/TPUs, which it makes much
        | easier than PyTorch (no need for custom CUDA/networking code).
        | Needing to scale a new architecture across many GPUs is,
        | however, not a common use case, and most teams that have the
        | resources for large-scale multi-GPU training also have the
        | resources for specialised engineers to do it in PyTorch.
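        | 
        | For a sense of what that buys you, here's a rough data-parallel
        | sketch in JAX (illustrative only: toy shapes, a made-up layer,
        | and it assumes the batch divides evenly across local devices):
        | 
        |     import numpy as np
        |     import jax
        |     import jax.numpy as jnp
        |     from jax.sharding import Mesh, NamedSharding, PartitionSpec as P
        | 
        |     # 1-D mesh over all local devices (GPUs or TPU cores).
        |     mesh = Mesh(np.array(jax.devices()), axis_names=("data",))
        | 
        |     # Shard the batch across devices, replicate the weights.
        |     x = jax.device_put(jnp.ones((128, 512)),
        |                        NamedSharding(mesh, P("data", None)))
        |     w = jax.device_put(jnp.ones((512, 256)),
        |                        NamedSharding(mesh, P()))
        | 
        |     @jax.jit
        |     def layer(w, x):
        |         # The compiler inserts any cross-device communication;
        |         # it also recompiles if the input shapes change.
        |         return jnp.tanh(x @ w)
        | 
        |     y = layer(w, x)  # data-parallel, no hand-written CUDA/NCCL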
        
       | funks_ wrote:
       | I wish dex-lang [1] had gotten more traction. It's JAX without
       | the limitations that come from being a Python DSL. But ML
       | researchers apparently don't want to touch anything that doesn't
       | look exactly like Python.
       | 
       | [1]: https://github.com/google-research/dex-lang
        
         | hatmatrix wrote:
          | It seems like an experimental research language.
          | 
          | Julia also competes in this domain from a more practical
          | standpoint and has fewer limitations than JAX as I understand
          | it, but it is less mature and still working on gaining wider
          | traction.
        
           | funks_ wrote:
           | The Julia AD ecosystem is very interesting in that the
           | community is trying to make the entire language
           | differentiable, which is much broader in scope than what
           | Torch and JAX are doing. But unlike Dex, Julia is not a
           | language built from the ground up for automatic
           | differentiation.
           | 
           | Shameless plug for one of my talks at JuliaCon 2024:
           | https://www.youtube.com/live/ZKt0tiG5ajw?t=19747s. The
           | comparison between Python and Julia starts at 5:31:44.
        
         | mccoyb wrote:
          | Dex is also missing user-authored composable program
          | transformations, which is one of JAX's hidden superpowers.
          | 
          | So not quite "JAX without limitations" -- but certainly without
          | some of the limitations.
        
           | funks_ wrote:
           | Are you talking about custom VJPs/JVPs?
        
             | mccoyb wrote:
              | No, I'm talking about custom `Jaxpr` interpreters, which
              | can walk a traced program and modify or instrument what it
              | does.
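              | 
              | A bare-bones sketch of the idea (close in spirit to the
              | "writing custom Jaxpr interpreters" material in the JAX
              | docs; the logging here is just a stand-in for a real
              | transformation, and details vary across JAX versions):
              | 
              |     import jax
              |     import jax.numpy as jnp
              | 
              |     def eval_verbose(closed, *args):
              |         env = {}
              | 
              |         def read(v):
              |             # Literals carry a value; vars live in env.
              |             return v.val if hasattr(v, "val") else env[v]
              | 
              |         jaxpr = closed.jaxpr
              |         for var, val in zip(jaxpr.constvars, closed.consts):
              |             env[var] = val
              |         for var, val in zip(jaxpr.invars, args):
              |             env[var] = val
              | 
              |         for eqn in jaxpr.eqns:
              |             ins = [read(v) for v in eqn.invars]
              |             print("applying", eqn.primitive.name)
              |             outs = eqn.primitive.bind(*ins, **eqn.params)
              |             if not eqn.primitive.multiple_results:
              |                 outs = [outs]
              |             for var, val in zip(eqn.outvars, outs):
              |                 env[var] = val
              | 
              |         return [read(v) for v in jaxpr.outvars]
              | 
              |     def f(x):
              |         return jnp.sin(x) * 2.0
              | 
              |     closed = jax.make_jaxpr(f)(1.0)
              |     print(eval_verbose(closed, 1.0))  # logs each primitive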
        
       | sva_ wrote:
       | > I've personally known researchers who set the seeds in the
       | wrong file at the wrong place and they weren't even used by torch
       | at all - instead, were just silently ignored, thus invalidating
       | all their experiments. (That researcher was me)
       | 
        | Some _assert_-ing won't hurt you. Seriously. It might even help
        | you keep your sanity.
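        | 
        | Something as cheap as this catches the "the seed never actually
        | reached torch" failure mode (the helper name and seed value here
        | are just illustrative):
        | 
        |     import random
        |     import numpy as np
        |     import torch
        | 
        |     SEED = 1234
        | 
        |     def seed_everything(seed: int) -> None:
        |         random.seed(seed)
        |         np.random.seed(seed)
        |         torch.manual_seed(seed)  # also seeds CUDA generators
        | 
        |     seed_everything(SEED)
        | 
        |     # Fail loudly if seeding happened in the wrong place/order.
        |     assert torch.initial_seed() == SEED, "torch is not seeded!"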
        
       | marcinzm wrote:
       | My main reason to avoid Jax is Google. Google doesn't provide
       | good support even for things you pay them for. They do things
       | because they want to, to get their internal promotions,
       | irrespective of their customers or the impact on them.
        
       | smhx wrote:
        | The author got a couple of things wrong that are worth pointing
        | out:
       | 
        | 1. PyTorch is going all-in on torch.compile -- Dynamo is the
        | frontend, Inductor is the backend -- with strong default
        | Inductor codegen powered by OpenAI Triton (which now has CPU,
        | NVIDIA GPU and AMD GPU backends). The author's view that PyTorch
        | is building towards a multi-backend future isn't really where
        | things are going. PyTorch supports extensible backends
        | (including XLA), but a disproportionate share of the effort goes
        | into the default path. torch.compile is 2 years old; XLA is 7
        | years old. Compilers take a few years to mature, and
        | torch.compile will get there (we have reasonable measures
        | showing the compiler is on track to maturity). A minimal sketch
        | of that default path is at the end of this comment.
       | 
        | 2. PyTorch/XLA exists mainly to drive a TPU backend for PyTorch,
        | as Google gives no other real way to access the TPU. It's not
        | great to try to shoehorn XLA in as a backend for PyTorch, as XLA
        | fundamentally doesn't have the flexibility that PyTorch supports
        | by default (especially dynamic shapes). PyTorch on TPUs is
        | unlikely to ever match the experience of JAX on TPUs, almost by
        | definition.
       | 
       | 3. JAX was developed at Google, not at Deepmind.
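        | 
        | To make point 1 concrete, a minimal torch.compile sketch (toy
        | model and shapes, nothing more):
        | 
        |     import torch
        |     import torch.nn as nn
        | 
        |     model = nn.Sequential(
        |         nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 10))
        | 
        |     # Dynamo captures the graph; Inductor (the default backend)
        |     # generates Triton kernels on GPU and C++ on CPU.
        |     compiled = torch.compile(model)
        | 
        |     x = torch.randn(64, 512)
        |     out = compiled(x)  # first call compiles; later calls reuse it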
        
       | cs702 wrote:
        | A more accurate title for the OP would be "I _hope and wish_
        | PyTorch is dead. Long live Jax." Leaving aside the fact that
        | PyTorch's ecosystem is 10x to 100x larger, depending on how you
        | measure it, PyTorch's biggest advantage, in my experience, is
        | that it is picked up quickly by developers who are new to it.
        | Jax, despite its superiority, or maybe because of it, is not.
        | Equinox does a great job of making Jax accessible, but its
        | functional approach remains more difficult to learn and master
        | than PyTorch's object-oriented one.
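        | 
        | The difference in a nutshell (a rough, illustrative sketch, not
        | either library's recommended training pattern):
        | 
        |     import torch
        |     import jax
        |     import jax.numpy as jnp
        | 
        |     # PyTorch: state lives inside the module; grads land on it.
        |     layer = torch.nn.Linear(4, 1)
        |     loss = layer(torch.randn(8, 4)).pow(2).mean()
        |     loss.backward()  # see layer.weight.grad, layer.bias.grad
        | 
        |     # Jax: params are explicit data; loss is a pure function.
        |     params = {"w": jnp.zeros((4, 1)), "b": jnp.zeros((1,))}
        | 
        |     def loss_fn(params, x):
        |         return jnp.mean((x @ params["w"] + params["b"]) ** 2)
        | 
        |     grads = jax.grad(loss_fn)(params, jnp.ones((8, 4)))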
        
       ___________________________________________________________________
       (page generated 2024-08-16 23:00 UTC)