[HN Gopher] Leaving Meta and PyTorch
       ___________________________________________________________________
        
       Leaving Meta and PyTorch
        
       Author : saikatsg
       Score  : 663 points
       Date   : 2025-11-07 06:14 UTC (16 hours ago)
        
 (HTM) web link (soumith.ch)
 (TXT) w3m dump (soumith.ch)
        
       | msmd74 wrote:
       | Sounds like you had a momentous run.
       | 
       | If you take advice from reformed Internet trolls, consider
       | turning off all your devices and trying to give yourself at least
       | a week, but ideally a month offline staring at your new baby.
       | You'll never get that time back and there's nothing your brain
       | will appreciate more than loading up those memories as they grow.
       | 
       | Good luck.
        
       | qmatch wrote:
        | As a loyal JAX user, I hope they can play catch-up. PyTorch has
        | dominated the AI scene since TF1 fumbled the ball at the 10-yard
        | line. What Matt Johnson has done turning Autograd into JAX is
       | hopefully going to be worthy of as much praise as what Soumith
       | has received.
        
         | n_u wrote:
          | > PyTorch has dominated the AI scene since TF1 fumbled the ball
          | at the 10-yard line
          | 
          | Can you explain why you think TensorFlow fumbled?
        
           | zapnuk wrote:
           | For me it was about 8 years ago. Back then TF was already
           | bloated but had two weaknesses. Their bet on static compute
           | graphs made writing code verbose and debugging difficult.
           | 
            | The few people I knew back then used Keras instead. I
            | switched to PyTorch for my next project, which was more
            | "batteries included".
        
           | michaelt wrote:
           | Imagine a total newbie trying to fine-tune an image
           | classifier, reusing some open source example code, about a
           | decade ago.
           | 
           | If their folder of 10,000 labelled images contains one image
           | that's a different size to the others, the training job will
           | fail with an error about unexpected dimensions while
           | concatenating.
           | 
            | But it _won't_ be able to say the file's name, or that the
           | problem is an input image of the wrong size. It'll just say
           | it can't concatenate tensors of different sizes.
           | 
           | An experienced user will recognise the error immediately, and
           | will have run a data cleansing script beforehand anyway. But
           | it's not experienced users who bounce from frameworks, it's
           | newbies.
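            | 
            | A hedged sketch of that failure mode (modern PyTorch shown
            | for brevity, with made-up sizes; the old TF error was no
            | more helpful):
            | 
            |     import torch
            | 
            |     images = [torch.zeros(3, 224, 224) for _ in range(4)]
            |     images[2] = torch.zeros(3, 224, 200)  # the one odd file
            |     batch = torch.stack(images)
            |     # RuntimeError complains about mismatched tensor sizes,
            |     # but says nothing about which file on disk produced them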
        
             | mschuster91 wrote:
             | > An experienced user will recognise the error immediately,
             | and will have run a data cleansing script beforehand
             | anyway. But it's not experienced users who bounce from
             | frameworks, it's newbies.
             | 
             | Even seasoned developers will bounce away from frameworks
             | or libraries - no matter if old dogs or the next hot thing
             | - if the documentation isn't up to speed or simple, common
             | tasks require wading through dozens of pages of
             | documentation.
             | 
             | Writing good documentation is hard enough, writing relevant
             | "common usage examples" is even harder... but keeping them
             | _up to date_ and _working_ is a rarely seen art.
             | 
             | And the greatest art of all of it is logging. Soooo many
             | libraries refuse to implement detailed structured logging
             | in internal classes (despite particularly Java and PHP
             | offering very powerful mechanisms), making it much more
             | difficult to troubleshoot problems in the field.
        
           | qmatch wrote:
            | I personally believe TF1 was serving the needs of its core
            | users. It provided a compilable compute graph with autodiff,
            | and you got very efficient training and inference from it.
           | There was a steep learning curve, but if you got past it,
           | things worked very very well. The distributed TF never really
            | took off--it was buggy, and I think they made some wrong
            | early bets in the design for performance reasons, bets that
            | should have been sacrificed in favor of simplicity.
           | 
            | I believe that some years after the TF1 release, they
            | realized the learning curve was too steep and they were
            | losing users to
           | PyTorch. I think also the Cloud team was attempting to sell
           | customers on their amazing DL tech, which was falling flat.
           | So they tried to keep the TF brand while totally changing the
           | product under the hood by introducing imperative programming
           | and gradient tapes. They killed TF1, upsetting those users,
           | while not having a fully functioning TF2, all the while
           | having plenty of documentation pointing to TF1 references
           | that didn't work. Any new grad student made the simple choice
           | of using a tool that was user-friendly and worked, which was
            | PyTorch. And most old TF1 users hopped on the bandwagon.
        
           | tdullien wrote:
           | I only remember 2015 TF and I was wondering: why would I use
           | Python to assemble a computational graph when what I really
           | want is to write code and then differentiate through it?
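            | 
            | Roughly what that indirection looked like, from memory of
            | the 1.x API (shapes and names made up):
            | 
            |     import numpy as np
            |     import tensorflow as tf  # 1.x
            | 
            |     x = tf.placeholder(tf.float32, shape=[None, 10])
            |     w = tf.Variable(tf.zeros([10, 1]))
            |     loss = tf.reduce_mean(tf.square(tf.matmul(x, w)))
            |     grads = tf.gradients(loss, [w])  # graph surgery, not code
            | 
            |     with tf.Session() as sess:
            |         sess.run(tf.global_variables_initializer())
            |         g, = sess.run(grads,
            |                       feed_dict={x: np.zeros((32, 10), "f")})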
        
           | Gazoche wrote:
            | I'm no machine learning engineer, but I dabbled
            | professionally with both frameworks a few years ago, and the
            | developer experience didn't even compare. The main issue with
            | TF was that you could only choose between a powerful but
           | incomprehensible, poorly documented [1], ultra-verbose and
           | ever changing low-level API, and an abstraction layer (Keras)
           | that was too high level to be really useful.
           | 
           | Maybe TF has gotten better since but at the time it really
           | felt like an internal tool that Google decided to just throw
           | into the wild. By contrast PyTorch offered a more reasonable
           | level of abstraction along with excellent API documentation
           | and tutorials, so it's no wonder that machine learning
           | engineers (who are generally more interested in the science
           | of the model than the technical implementation) ended up
           | favoring it.
           | 
           | [1] The worst part was that Google only hosted the docs for
           | the _latest_ version of TF, so if you were stuck on an older
            | version (because, oh I don't know, you wanted a stable
           | environment to serve models in production), well tough luck.
           | That certainly didn't gain TF any favors.
        
           | HarHarVeryFunny wrote:
           | The original TensorFlow had an API similar to the original
           | Lua-based Torch (the predecessor to PyTorch) that required
           | you to first build the network, node by node, then run it.
           | PyTorch used a completely different, and much more convenient
           | approach, where the network is built automatically for you
           | just by running the forward pass code (and will then be used
           | for the backward pass), using both provided node types and
           | arbitrary NumPy compatible code. You're basically just
           | writing differentiable code.
           | 
           | This new PyTorch approach was eventually supported by
           | TensorFlow as well ("immediate mode"), but the PyTorch
           | approach was such a huge improvement that there had been an
           | immediate shift by many developers from TF to PyTorch, and TF
           | never seemed able to regain the momentum.
           | 
           | TF also suffered from having a confusing array of alternate
           | user libraries built on top of the core framework, none of
           | which had great documentation, while PyTorch had a more
           | focused approach and fantastic online support from the
           | developer team.
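            | 
            | A minimal sketch of what "define by run" means in practice
            | (the graph is recorded as the forward line executes):
            | 
            |     import torch
            | 
            |     x = torch.randn(8, 3, requires_grad=True)
            |     y = (x.tanh() ** 2).sum()  # forward pass is just code
            |     y.backward()               # backward uses the recorded graph
            |     print(x.grad.shape)        # torch.Size([8, 3])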
        
             | liuliu wrote:
              | LuaTorch is eager-execution. The problem with LuaTorch is
              | the GC. You cannot rely on a traditional GC here: each
              | tensor is megabytes (at the time; now gigabytes) large, so
              | you need to collect them aggressively rather than at
              | intervals. Python's reference-counting system solves this
              | issue. And of course, by "collecting" I don't mean freeing
              | the memory (PyTorch has a simple slab allocator to manage
              | CUDA memory).
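              | 
              | A minimal sketch of the reference-counting point (needs a
              | CUDA device; the numbers printed are allocator-dependent):
              | 
              |     import torch
              | 
              |     a = torch.zeros(1024, 1024, device="cuda")  # ~4 MB
              |     print(torch.cuda.memory_allocated())  # includes a
              |     del a  # refcount hits zero -> freed immediately...
              |     print(torch.cuda.memory_allocated())  # ...not at some
              |     print(torch.cuda.memory_reserved())   # future GC pause;
              |                                           # the block stays
              |                                           # cached by PyTorch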
        
               | HarHarVeryFunny wrote:
               | With Lua Torch the model execution was eager, but you
               | still had to construct the model graph beforehand - it
               | wasn't "define by run" like PyTorch.
               | 
                | Back in the day, having completed Andrew Ng's ML course,
               | I then built my own C++ NN framework copying this graph-
               | mode Lua Torch API. One of the nice things about
               | explicitly building a graph was that my framework
               | supported having the model generate a GraphViz DOT
               | representation of itself so I could visualize it.
        
               | liuliu wrote:
               | Ah, I get what you mean now. I am mixing up the nn module
                | and the tensor execution bits. (To be fair, the PyTorch
                | nn module carries over many of these quirks!)
        
           | stared wrote:
           | In 2018, I co-wrote a blog post with the inflammatory title
           | "Don't use TensorFlow, try PyTorch instead"
           | (https://news.ycombinator.com/item?id=17415321). As it gained
           | traction here, it was changed to "Keras vs PyTorch" (some
           | edgy things that work for a private blog are not good for a
           | corporate one). Yet the initial title stuck, and you can see
           | it resonated well with the crowd.
           | 
           | TensorFlow (while a huge step on top of Theano) had issues
           | with a strange API, mixing needlessly complex parts (even for
           | the simplest layers) with magic-box-like optimization.
           | 
           | There was Keras, which I liked and used before it was cool
           | (when it still supported the Theano backend), and it was the
           | right decision for TF to incorporate it as the default API.
           | But it was 1-2 years too late.
           | 
           | At the same time, I initially looked at PyTorch as some
           | intern's summer project porting from Lua to Python. I
           | expected an imitation of the original Torch. Yet the more it
           | developed, the better it was, with (at least to my mind) the
           | perfect level of abstraction. On the one hand, you can easily
            | add two tensors, as if it were NumPy (and print their values
            | in Python, which was impossible with TF at that time). On the
           | other hand, you can wrap anything (from just a simple
           | operation to a huge network) in an nn.Module. So it offered
           | this natural hierarchical approach to deep learning. It
           | offered building blocks that can be easily created, composed,
           | debugged, and reused. It offered a natural way of picking the
           | abstraction level you want to work with, so it worked well
           | for industry and experimentation with novel architectures.
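            | 
            | A minimal sketch of that hierarchical idea (the Residual
            | class is my own):
            | 
            |     import torch
            |     import torch.nn as nn
            | 
            |     class Residual(nn.Module):  # wrap anything, big or small
            |         def __init__(self, inner):
            |             super().__init__()
            |             self.inner = inner
            | 
            |         def forward(self, x):
            |             return x + self.inner(x)
            | 
            |     # ...then compose the wrapped pieces like building blocks
            |     model = nn.Sequential(Residual(nn.Linear(16, 16)),
            |                           Residual(nn.Linear(16, 16)))
            |     print(model(torch.randn(2, 16)).shape)  # torch.Size([2, 16])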
           | 
           | So, while in 2016-2017 I was using Keras as the go-to for
           | deep learning (https://p.migdal.pl/blog/2017/04/teaching-
           | deep-learning/), in 2018 I saw the light of PyTorch and
           | didn't feel a need to look back. In 2019, even for the intro,
           | I used PyTorch (https://github.com/stared/thinking-in-
           | tensors-writing-in-pyt...).
        
             | stared wrote:
             | Actually, I opened "Teaching deep learning" and smiled as I
             | saw how it evolved:
             | 
             | > There is a handful of popular deep learning libraries,
             | including TensorFlow, Theano, Torch and Caffe. Each of them
             | has Python interface (now also for Torch: PyTorch)
             | 
             | > [...]
             | 
             | > EDIT (July 2017): If you want a low-level framework,
             | PyTorch may be the best way to start. It combines
             | relatively brief and readable code (almost like Keras) but
             | at the same time gives low-level access to all features
             | (actually, more than TensorFlow).
             | 
             | > EDIT (June 2018): In Keras or PyTorch as your first deep
             | learning framework I discuss pros and cons of starting
             | learning deep learning with each of them.
        
           | probably_wrong wrote:
           | I see good answers already, but here's a concrete example:
           | 
           | In my University we had to decide between both libraries so,
           | as a test, we decided to write a language model from scratch.
           | The first minor problem with TF was that (if memory serves me
           | right) you were supposed to declare your network "backwards"
           | - instead of saying "A -> B -> C" you had to declare
           | "C(B(A))". The major problem, however, was that there was no
           | way to add debug messages - either your network worked or it
           | didn't. To make matters worse, the "official" TF tutorial on
           | how to write a Seq2Seq model didn't compile because the
           | library had changed but the bug reports for that were met for
           | years with "we are changing the API so we'll fix the example
           | once we're done".
           | 
           | PyTorch, by comparison, had the advantage of a Python-based
           | interface - you simply defined classes like you always did
           | (including debug statements!), connected them as variables,
            | and that was that. So when my beginner colleagues and I had
           | to decide which library to pick, "the one that's not a
           | nightmare to debug" sounded much better than "the one that's
           | more efficient if you have several billions training
           | datapoints and a cluster". Me and my colleagues then went on
           | to become professionals, and we all brought PyTorch with us.
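            | 
            | For illustration, a hedged sketch of that "classes like you
            | always did" style (module and sizes made up):
            | 
            |     import torch
            |     import torch.nn as nn
            | 
            |     class Net(nn.Module):
            |         def __init__(self):
            |             super().__init__()
            |             self.fc = nn.Linear(32, 4)
            | 
            |         def forward(self, x):
            |             print("input:", x.shape)  # plain print() works
            |             if x.shape[0] > 8:        # ordinary Python flow
            |                 x = x[:8]
            |             return self.fc(x)
            | 
            |     Net()(torch.randn(16, 32))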
        
             | jszymborski wrote:
             | The inability to use print debug to tell me the dimensions
             | of my hidden states was 100% why TF was hard for me to use
             | as a greenhorn MSc student.
             | 
             | Another consequence of this was that PyTorch let you use
             | regular old Python for logic flow.
        
             | n_u wrote:
             | This was also my experience. TensorFlow's model of
             | constructing then evaluating a computation graph felt at
             | odds with Python's principles. It made it extremely
              | difficult to debug because _you couldn't print tensors
              | easily!_ It didn't feel like Python at all.
             | 
             | Also the API changed constantly so examples from docs or
             | open source repos wouldn't work.
             | 
             | They also had that weird thing about all tensors having a
             | unique global name. I remember I tried to evaluate a DQN
             | network twice in the same script and it errored because of
             | that.
             | 
             | It's somewhat vindicating to see many people in this thread
             | shared my frustrations. Considering the impact of these
             | technologies I think a documentary about why TensorFlow
             | failed and PyTorch took off would be a great watch.
        
           | morshu9001 wrote:
           | I just remember TF1 being super hard to use as a beginner and
            | Google repeatedly insisting it had to be that way. People
            | talk about the layering API, but it's more than that:
            | everything about it was covered in sharp edges.
        
           | htrp wrote:
           | Greenfielding TF2.X and not maintaining 1.X compatibility
        
           | rockinghigh wrote:
            | First, the migration to 2.0 in 2019 to add eager mode
           | support was horribly painful. Then, starting around 2.7,
           | backward compatibility kept being broken. Not being able to
           | load previously trained models with a new version of the
           | library is wildly painful.
        
         | intermerda wrote:
         | Do you have experience in both JAX and PyTorch? Why do you
         | prefer JAX?
        
           | cl3misch wrote:
           | Not OP. I prefer JAX for non-AI tasks in scientific computing
           | because of the different mental model than PyTorch. In JAX,
           | you think about functions and gradients of functions. In
           | PyTorch you think about tensors which accumulate a gradient
           | while being manipulated through functions. JAX just suits my
           | way of thinking much better.
           | 
           | I also like that jax.jit forces you to write "functional"
           | functions free of side effects or inplace array updates. It
           | might feel weird at first (and not every algorithm is suited
           | for this style) but ultimately it leads to clearer and faster
           | code.
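            | 
            | A minimal sketch of that style (the toy loss is made up):
            | 
            |     import jax
            |     import jax.numpy as jnp
            | 
            |     @jax.jit
            |     def loss(w, x):  # pure function of its inputs
            |         return jnp.mean((x @ w) ** 2)
            | 
            |     # gradients compose freely because nothing hides state
            |     grad_loss = jax.jit(jax.grad(loss))
            |     w, x = jnp.ones((10, 1)), jnp.ones((32, 10))
            |     print(loss(w, x), grad_loss(w, x).shape)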
           | 
           | I am surprised that JIT in PyTorch gets so little attention.
           | Maybe it's less impactful for PyTorch's usual usecase of
           | large networks, as opposed to general scientific computing?
        
             | imtringued wrote:
             | >I also like that jax.jit forces you to write "functional"
             | functions free of side effects or inplace array updates. It
             | might feel weird at first (and not every algorithm is
             | suited for this style) but ultimately it leads to clearer
             | and faster code.
             | 
             | It's not weird. It's actually the most natural way of doing
             | things for me. You just write down your math equations as
             | JAX and you're done.
        
               | Majromax wrote:
               | > You just write down your math equations as JAX and
               | you're done.
               | 
               | It's natural when your basic unit is a whole vector
               | (tensor), manipulated by some linear algebra expression.
               | It's less natural if your basic unit is an element of a
               | vector.
               | 
               | If you're solving sudoku, for example, the obvious
               | 'update' is in-place.
               | 
               | In-place updates are also often the right answer for
               | performance reasons, such as writing the output of a
               | .map() operation directly to the destination tensor. Jax
               | leans _heavily_ on compile-time optimizations to turn the
               | mathematically-nice code into computer-nice code, so the
               | delta between eager-Jax and compiled-Jax is much larger
               | than the delta between eager-Pytorch and compiled-
               | Pytorch.
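                | 
                | A small sketch of the contrast (assuming the current
                | APIs of both libraries):
                | 
                |     import torch
                |     import jax.numpy as jnp
                | 
                |     t = torch.zeros(9)
                |     t[4] = 5          # PyTorch: a genuine in-place write
                | 
                |     a = jnp.zeros(9)
                |     a = a.at[4].set(5)  # JAX: returns a new array; under
                |                         # jit the compiler often turns it
                |                         # back into an in-place update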
        
           | havercosine wrote:
           | Not Op. I have production / scale experience in PyTorch and
            | toy/hobby experience in JAX. I wish I had the time or liberty
            | to use JAX more. It consists of a small, orthogonal set of
            | ideas that combine like Lego blocks. I can attempt to reason
            | from first principles about performance. The
           | documentation is super readable and strives to make you
           | understand things.
           | 
            | JAX seems well engineered. One would argue so was TensorFlow.
            | But the ideas behind JAX were built outside Google (Autograd),
            | so it has struck the right balance of being close to idiomatic
            | Python / NumPy.
           | 
            | PyTorch is where the tailwinds are, though. It is a wildly
            | successful project which has acquired a ton of code over the
            | years. So it is a little harder to figure out how something
            | works (say torch.compile) from first principles.
        
         | bjourne wrote:
         | JAX seems great but the Google ghost is a lot to stomach. The
         | risk of JAX getting axed or replaced with a JAX 2.0 -
         | completely incompatible with existing JAX code - is not
         | insignificant.
        
       | chopete3 wrote:
       | >>Every major AI company and hardware vendor are on a speed dial.
       | This kind of power is really hard to give up. But curiosity
       | ultimately won out in my head.
       | 
        | A simple feeling has such power. May he get an opportunity to
        | create one more powerful tool before retiring.
        
         | Lord-Jobo wrote:
         | If the curiosity dies, the entire thing crumbles.
         | 
         | The second I stop being curious I stop finding new and exciting
         | things to do, and I stop feeling fulfillment. It's one of the
         | biggest signs saying "it's time to move on".
         | 
          | I feel so strongly for the people who can't afford that luxury.
          | I've been there: unfulfilling jobs for years because of bills
          | or resume building.
        
           | rubicon33 wrote:
           | Gosh, given you've been there I have to ask what allowed you
           | to get out of that and pursue only things that interest and
           | excite you?
        
       | perfmode wrote:
       | Respect.
        
       | mxkopy wrote:
       | PyTorch is one of those tools that's so simple and easy to take
       | apart that you feel like you might've been able to make it
       | yourself. I can't imagine how much engineering effort was behind
       | all those moments where I thought to myself, "of course it should
       | work like that, how can it be any other way?"
        
         | TechnicolorByte wrote:
         | Can anyone recommend a technical overview describing the design
         | decisions PyTorch made that led it to win out?
        
           | huevosabio wrote:
           | I don't know the full list, but back when it came out, TF
           | felt like a crude set of bindings to the underlying c++/CUDA
           | workhorse. PyTorch felt, in contrast, pythonic. It was much
           | closer in feeling to numpy.
        
           | puttycat wrote:
           | I think it was mostly the eager evaluation that made it
           | possible to debug every step in the network forward/backward
           | passes. Tensorflow didn't have that at the time which made
           | debugging practically impossible.
        
           | GistNoesis wrote:
           | The choice of the dynamic computation graph [1] of PyTorch
           | made it easier to debug and implement, leading to higher
           | adoption, even though running speed was initially slower (and
           | therefore training cost higher).
           | 
           | Other decisions follow from this one.
           | 
            | Tensorflow started with static graphs and had to move to
            | dynamic ones at version 2.0, which broke everything and
            | fragmented the ecosystem between TensorFlow 1, TensorFlow 2,
            | Keras, and JAX.
           | 
            | PyTorch's later compilation of this computation graph erased
            | TensorFlow's remaining edge.
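            | 
            | For concreteness, a minimal torch.compile sketch (the toy
            | function is mine):
            | 
            |     import torch
            | 
            |     def f(x):
            |         return torch.sin(x) ** 2 + torch.cos(x) ** 2
            | 
            |     cf = torch.compile(f)  # traces the dynamic graph and
            |     x = torch.randn(1024)  # fuses kernels where it can
            |     print(torch.allclose(f(x), cf(x)))  # True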
           | 
            | Is the battle over? From a purely computational standpoint,
            | the PyTorch solution is very far from optimal, and billions
            | of dollars of electricity and GPUs are burned every year,
            | but major players are happy with circular deals to entrench
            | their positions. So at the pace of current AI code
            | development, PyTorch is probably one or two years from being
            | old history.
           | 
           | [1] https://www.geeksforgeeks.org/deep-learning/dynamic-vs-
           | stati...
        
             | Uehreka wrote:
              | > at the pace of current AI code development, PyTorch is
              | probably one or two years from being old history
             | 
             | Ehhh, I don't know about that.
             | 
             | Sure, new AI techniques and new models are coming out
             | pretty fast, but when I go to work with a new AI project,
             | they're often using a version of PyTorch or CUDA from when
             | the project began a year or two ago. It's been super
             | annoying having to update projects to PyTorch 2.7.0 and
             | CUDA 12.8 so I can run them on RTX 5000 series GPUs.
             | 
             | All this to say: If PyTorch was going to be replaced in a
             | year or two, we'd know the name of its killer by now, and
             | they'd be the talk of HN. Not to mention that at this point
             | all of the PhDs flooding into AI startups wrote their grad
             | work in PyTorch, it has a lot of network lock-in that an
             | upstart would have to overcome by being way better at
             | something PyTorch can never be good at. I don't even know
             | what that would be.
             | 
             | Bear in mind that it took a few years for Tensorflow to die
             | out due to lock in, and we all knew about PyTorch that
             | whole time.
        
               | GistNoesis wrote:
               | > a lot of network lock-in that an upstart would have to
               | overcome by being way better at something PyTorch can
               | never be good at
               | 
                | The cost of migrating higher-level code to a newer
                | framework is going to 0. You ask your favorite agent (or
                | intern) to port it and check that the migration is
                | exact. We already see this in the multitude of deep-
                | learning frameworks.
               | 
                | The day another framework can do one optimization trick
                | that PyTorch can't, one which reduces your training cost
                | 10x, PyTorch is going the way of the dodo.
                | 
                | The day an architecture which can't be implemented in
                | PyTorch gets superior performance, it's bye-bye Python.
               | 
               | We see this with architectures which require real-time
               | rendering like Gaussian Splatting (Instant Nerf), or the
               | caching strategies for LLM sequence generation.
               | 
                | PyTorch has 3 main selling points:
               | 
                | - Abstracting away GPU (or device) specific code. This
                | matters because of Nvidia's mess of custom optimized
                | kernels, which you are forced to adapt to if you don't
                | want to write kernels yourself. The advantage disappears
                | if you don't mind writing optimized kernels because the
                | machine writes them, or if you don't need CUDA because
                | you can't use Nvidia hardware (for example, you are in
                | China), or if you use custom silicon, like Groq, and
                | need your own kernels anyway.
               | 
                | - Automatic differentiation. This is one of its weak
                | points, because they went for easy instead of optimal
                | and shut themselves off from some architectures. A
                | language like Julia, thanks to dynamic low-level
                | compilation, can do things PyTorch won't even dream
                | about (but Julia has its own problems, mainly related to
                | memory allocations). With PyTorch's introduction of the
                | "scan function"[2] we have come full circle back to
                | Theano, TensorFlow's/Keras's ancestor; scan is usually
                | the pain point of the automatic differentiation strategy
                | PyTorch chose.
               | 
                | The optimal solution, as all physics PhDs who wrote
                | simulations know, is writing custom adjoint code via
                | 'Source Code Transformation' or symbolically: it's not
                | hard but very tedious, so it's now a great fit for your
                | LLM (or intern, or PhD candidate running 'student
                | gradient descent'), provided you prove or check that
                | your gradient calculation is correct (see the sketch at
                | the end of this comment).
               | 
                | - Cluster orchestration and serialization: a model can
                | be shared with fewer security risks than arbitrary
                | source code, because you only share weights. A model can
                | be split between machines dynamically. But this is also
                | a big weakness, because your code rusts as you become
                | dependent on versioning: you are locked to the specific
                | version number your model was trained on.
               | 
                | [2]
                | https://docs.pytorch.org/xla/master/features/scan.html
        
               | morshu9001 wrote:
               | What would stop PyTorch from implementing whatever
               | optimization trick becomes important? Even if it requires
               | a different API.
        
               | GistNoesis wrote:
                | There are two types of stops: soft stops and hard
                | stops.
               | 
                | - A soft stop is when the dynamic graph computation
                | overhead is too much, which means you can still
                | calculate, but if you were to write the function
                | manually or with a better framework, you could be 10x
                | faster.
               | 
                | Typical examples involve manually unrolling a loop, or
                | doing kernel fusion. Another typical example is when you
                | have lots of small objects or need to do loops in Python
                | because the problem doesn't vectorize well, or when you
                | want to exploit sparsity by ignoring the zeros.
               | 
                | - A hard stop is when computing the function becomes
                | impossible, because the memory needed to do the
                | computation in a non-optimal way explodes. Sometimes you
                | can get away with just writing customized kernels.
               | 
                | The typical example where you can get away with it is
                | custom attention layers.
               | 
                | The typical example where you can't get away is physics
                | simulations. For example, the force is the gradient of
                | the energy, but you have n^2 interactions between the
                | particles, so if you preserve anything more than zero
                | memory per interaction during the forward pass, your
                | memory consumption explodes. And typically with things
                | like Lagrangian or Hamiltonian neural networks, where
                | you try to discover the dynamics of an energy-conserving
                | system, you need to be able to differentiate at least
                | three times in a row.
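                | 
                | A rough sketch of that n^2 case (the toy pairwise energy
                | is mine):
                | 
                |     import torch
                | 
                |     def energy(x):  # sum of 1/r over all pairs
                |         d = torch.cdist(x, x)
                |         i, j = torch.triu_indices(len(x), len(x), offset=1)
                |         return (1.0 / d[i, j]).sum()
                | 
                |     x = torch.randn(100, 3, requires_grad=True)
                |     # force = -grad(energy); create_graph=True keeps the
                |     # graph of the graph so you can differentiate again,
                |     # and memory grows with every intermediate of the
                |     # (n, n) computation
                |     f = -torch.autograd.grad(energy(x), x,
                |                              create_graph=True)[0]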
               | 
                | There are also energy-expending stops, where you need to
                | find workarounds to make things work, for example if you
                | want your parameters to change shape during the
                | optimization process (like learning point clouds of
                | growing size); these spread you thin, so they won't be
                | standardized.
        
             | saagarjha wrote:
             | Someone's got to prototype the next generation of
             | architectures.
        
           | mxkopy wrote:
            | I'm not sure if such an overview exists, but when caffe2 was
            | still a thing and JAX was a big contender, dynamic vs static
            | computational graphs seemed to be a major focus point for
            | people ranking the frameworks.
        
           | albanD wrote:
           | I would highly recommend the podcast by ezyang
           | https://pytorch-dev-podcast.simplecast.com/ for a collection
           | of design discussions on the different parts of the library.
        
       | BoredPositron wrote:
        | The last few years must have been incredibly exhausting. Thanks
        | for your work, good luck, and 73.
        
       | vintermann wrote:
        | That man has an infectious enthusiasm. I remember the DCGAN paper
       | inspired me to try getting the (Lua) Torch code to work, and I
       | tried it on the Oxford flowers dataset early on. It worked
        | surprisingly well, and Soumith Chintala even shared it around on
       | social media, surprised at how well it worked on such a small
       | dataset. Of course back then we didn't really appreciate the
       | problem of mode collapse.
       | 
       | Pytorch and old Lua Torch were a pleasure to work with compared
        | to the contemporary TensorFlow. Lots of S.C.'s code was copied
       | around liberally, it had its quirks (I remember the DCGAN code
       | had a pretty odd way of doing parameter passing) but it was also
       | really easy to understand and made random people like me feel
       | like we had suddenly stumbled onto something crazy powerful
       | (which we had!). It was wonderfully hackable.
        
         | whitten wrote:
          | What is the Oxford flowers dataset? Where is it available?
        
           | rockinghigh wrote:
           | https://www.robots.ox.ac.uk/~vgg/data/flowers/
        
       | utopiah wrote:
        | What I find most interesting about this is that it shows they
        | believe there is nothing unique at Meta related to AI. There is
        | no resource, whether people or computing power, that they can't
        | get elsewhere for whatever they believe would be more
        | interesting for them.
       | 
       | I mention this because it feels analogous to military research,
       | where people "dream" of how advanced the military is, how forward
       | they are compared to public research... and yet, it seems to be a
       | recurring myth they love to sustain.
       | 
       | So the signal I get here is AI "labs" in BigTech have nothing
       | worth waiting for around the corner, it's just more of the same
       | and boring for people who stick there.
        
         | rtpg wrote:
         | I don't think that's the read? Guy says he wants to work on
         | something small. If you want to work on something big you
         | probably want to be in a big corp to have the resources to do
         | the big thing.
         | 
         | Also absolutely unknown if the "new thing" is AI-related at
         | all!
        
           | utopiah wrote:
            | Well, he left, so whatever is coming next, AI related or not,
            | "small" or not (small for him might be reaching just a
            | million people; he wrote that he "lead the software layer
            | that powers the entire AI industry," so his notion of scale
            | is probably unlike mine, maybe yours too), is more exciting
            | to him than whatever he could do next with all of Meta's
            | resources.
           | 
           | Edit: to be clear, I didn't mean to imply their next thing is
           | AI related, solely that they obviously know more about AI at
           | Meta than e.g. XR at Meta, just because that's their
           | expertise.
        
             | hombre_fatal wrote:
             | Your assumption is a bad read because it only works if his
             | set of life priorities contains _nothing else_ but
             | maximizing his impact in the world of AI.
             | 
             | If he has just one other priority in that set (which could
             | still include a robotic min/max of AI impact), then your
             | assumption fails.
        
           | radicalbyte wrote:
           | It reads to me as if he was the victim of office politics and
           | decided to say "fuck it" instead of being transferred to
           | something else within Meta.
        
             | disgruntledphd2 wrote:
             | > It reads to me as if he was the victim of office politics
             | and decided to say "fuck it" instead of being transferred
             | to something else within Meta.
             | 
             | It looks like he'd already been transferred once (to Infra)
             | and maybe didn't want to do it again.
        
             | sheepscreek wrote:
              | Pretty crazy/bizarre that a VP/Fellow engineer would have
              | so little say in what they do at Meta. In my mind,
             | companies would do everything possible to retain them. They
             | are a special and rare breed.
        
           | embedding-shape wrote:
           | > If you want to work on something big you probably want to
           | be in a big corp to have the resources to do the big thing.
           | 
           | If anything, the reverse seems to be true, if you want to
           | work on something big, you want to be in a small company,
           | sufficiently funded, filled with great people, yet not "big",
           | that's when "something big" seems to be more likely to
           | happen.
           | 
           | In contrast, as far as I can think, the bigger a company
           | gets, the less likely they are to actually come up with
           | "something big", it seems like most of the times you need
           | (creative) constraints in order for the results to end up
           | being actually innovative, otherwise you end up like IBM and
           | Meta, throwing money on stuff and getting some results, but
           | nothing really out of the ordinary considering what's
           | happening elsewhere in their ecosystems.
        
         | jansan wrote:
         | > I mention this because it feels analogous to military
         | research, where people "dream" of how advanced the military is,
         | how forward they are compared to public research... and yet, it
         | seems to be a recurring myth they love to sustain.
         | 
         | I don't think that you can read this from the blog post at all,
         | but it gives me a chuckle to think how the quest for AGI at
         | Meta may be "The Men Who Stare at Goats" all over again.
        
           | utopiah wrote:
           | I'm totally speculating. I have no extra information there.
           | 
           | It just makes me think of all the staff, technical staff,
           | that left OpenAI recently. Altman was making grand claims
           | about what was coming next.
           | 
            | Well, we know what followed; namely, I don't think any
            | researcher who left knowing what was in the pipeline feels
            | like they missed much in terms of access.
        
           | utopiah wrote:
            | Just checked BTW and... the premise looks fun but the score
            | is too low
            | (https://www.rottentomatoes.com/m/men_who_stare_at_goats).
            | Was it actually good as a movie, not just the idea behind it?
        
             | jansan wrote:
             | It's more the idea behind it. Considering the great cast,
             | the movie could have been much better.
        
               | vintermann wrote:
                | The non-fiction book behind it is probably a better
                | comparison than the film adaptation, if you think Meta
               | are doing goat-staring (I don't think they're especially
               | bad on this issue compared to their rivals).
        
         | reactordev wrote:
          | Negative, what you should have taken away is that it's _the
          | people_. He mentions standing up clusters. Small shops can't
          | afford clusters. Ignore the technical aspect of this article
          | and read it for what it is: a thank-you note to the _people_
          | he has worked with on amazing projects. Research in a bubble
          | of 1 isn't very useful. Research in a small team with a Meta
          | budget is extremely useful. With _the right people_.
        
         | nrjames wrote:
         | If you can afford to support yourself, which I'm sure he can,
          | there's a serenity to working on small projects that are not
          | in the public eye. It may simply be that he craves some quiet
          | time that enables him to focus on his family and himself.
        
         | ErroneousBosh wrote:
         | > where people "dream" of how advanced the military is
         | 
         | If you've ever worked on "advanced military grade" equipment,
         | you'd know better.
         | 
         | It tends to be what you'd euphemistically call "well-proven
         | technology", built down to a price by the lowest bidder, by
         | comparatively unskilled labour.
         | 
         | The most shocking thing about the "captured" Russian drones is
         | they use name-brand Raspberry Pis inside. I'm prepared to bet
         | the American versions use whatever AliExpress crap is on
         | special this week. The UK stuff definitely does.
        
           | embedding-shape wrote:
           | Isn't that exactly the point parent was trying to make? Maybe
           | I misunderstood their comment, but it seems like you're
           | repeating what they said.
        
             | ErroneousBosh wrote:
             | Post cup-of-tea (not NATO-spec, just black thanks, tea with
             | just tea in it) I realise you're correct.
        
             | utopiah wrote:
             | You read it right, I think they agree. Maybe when I wrote
             | "dream" in quotes the irony was lost.
        
           | esseph wrote:
           | I mean, these things do exist. There are always tons of big
           | and small tech projects floating around in the special
           | operations community. Cutting-edge sets of hybrid
           | night/thermal vision. Classified helicopters. Hand-built
            | rifles with custom cartridges. Classified medical tech.
           | Advanced fixed wing aircraft with unique capabilities.
           | Advanced dive gear. So on.
           | 
           | "Big Army" doesn't see that stuff for decades, if ever, and
           | mostly never due to cost. And I'm not even getting into
           | classified submarine and nuclear tech, fixed wing drones and
           | aircraft flying at night out of known test facilities, etc.
           | 
            | There's tons of actually advanced tech out there in military
            | circles.
        
             | compiler-guy wrote:
              | Yeah. The DoD is enormous and definitely has your boring
              | everyday stuff, but also tons of skunk-works R&D, just not
              | very public. An organization that big has all kinds of
              | nooks and crannies, so it isn't really that monolithic.
        
         | oxfordmale wrote:
         | I think you might be reading a bit too much into this.
         | 
         | He's been with Meta for 11 years and is likely in a very
         | comfortable financial position, given the substantial stock
         | options he's received over that time.
         | 
         | He also mentioned the arrival of a new child, and it's well
         | known that Meta's work-life balance isn't always ideal.
         | 
         | On top of that, Meta, like many major tech companies, has been
         | shifting its focus toward LLM-based AI, moving away from more
         | traditional PyTorch use cases.
         | 
         | Considering all of this, it seems like a natural time for him
         | to move on and pursue new, more exciting opportunities.
        
           | ralusek wrote:
           | > toward LLM-based AI, moving away from more traditional
           | PyTorch use cases
           | 
           | Wait, are LLMs not built with PyTorch?
        
             | gordonhart wrote:
             | GP is likely saying that "building with AI" these days is
             | mostly prompting pretrained models rather than training
             | your own (using PyTorch).
        
               | SV_BubbleTime wrote:
                | Everyone is fine-tuning constantly though. Training an
                | entire model in excess of a few billion parameters is
                | pretty much on nobody's personal radar; you have a
                | handful of well-funded groups using PyTorch to do that.
                | The masses are still using PyTorch, just on small
                | training jobs.
               | 
               | Building AI, and building with AI.
        
               | gordonhart wrote:
               | Fine-tuning is great for known, concrete use cases where
               | you have the data in hand already, but how much of the
               | industry does that actually cover? Managers have hated
               | those use cases since the beginning of the deep learning
               | era -- huge upfront cost for data collection, high
               | latency cycles for training and validation, slow reaction
               | speed to new requirements and conditions.
        
             | pseudocomposer wrote:
              | llama.cpp and Candle are a lot more modern for these
              | things than PyTorch/libtorch, though libtorch is still the
              | de facto standard.
        
               | vlovich123 wrote:
               | Pytorch is still pretty dominant in cloud hosting. I'm
               | not aware of anyone not using it (usually by way of vLLM
               | or similar). It's also completely dominant for training.
               | I'm not aware of anyone using anything else.
               | 
               | It's not dominant in terms of self-hosted where llama.cpp
               | wins but there's also not really that much self-hosting
               | going on (at least compared with the amount of requests
               | that hosted models are serving)
        
               | liuliu wrote:
                | That's wrong. Llama.cpp / Candle don't offer anything
                | that PyTorch cannot do (design-wise). What they offer is
                | a smaller deployment footprint.
               | 
                | What's modern about LLMs is the training infrastructure
                | and the single-coordinator pattern, which PyTorch has
                | just started on and which is inferior to many internal
                | implementations:
               | https://pytorch.org/blog/integration-idea-monarch/
        
           | Anon1096 wrote:
           | > On top of that, Meta, like many major tech companies, has
           | been shifting its focus toward LLM-based AI, moving away from
           | more traditional PyTorch use cases.
           | 
           | This is very wrong. Meta is on the forefront of
           | recommendation algorithms and that's all done with
           | traditional ML models made using PyTorch.
        
             | oxfordmale wrote:
              | Meta is definitely at the forefront of recommendation
              | algorithms. However, the leadership team has likely
              | shifted its focus to LLMs.
        
             | skybrian wrote:
             | Some recommendations are uncanny, except that I don't want
             | any of them in my Facebook news feed and no matter how
             | often I select "never show me this feed again," it keeps
             | trying.
        
         | HarHarVeryFunny wrote:
         | > What I find most interesting with this is that it shows they
         | believe there is nothing unique at Meta related to AI
         | 
          | Whether or not this is the case, I don't get this as being the
          | reason for Soumith leaving - it sounds as if he is just ready
          | for a change.
         | 
         | Still, it is noticeable that with many of the AI companies
         | claiming that their version of "AGI" is just around the corner,
         | developers and staff don't appear to be particularly excited
         | about this (I assume they realize it is just hype, not some
         | momentous advance around the corner), and leave to pursue
         | different things, such as Mira Murati starting a fine-tuning
          | company, Karpathy going back to education, others jumping ship
          | (typically from OpenAI to Anthropic), etc.
        
           | moron4hire wrote:
           | "Ready for change" is just the polite way to say, "I can't
           | stand it here anymore. I'd rather roll the dice on a new
           | place because reversion-to-mean means it's probably going to
           | be better than whatever this has become."
           | 
           | There are a lot of things I don't like about my current job,
           | but not enough for it to make sense to gamble on a new place.
           | It's easier to push for change from my current position than
           | to count on any new place being any better.
           | 
           | But if it gets worse and I do leave, I'll definitely be
           | telling the interviewer, "I was just ready for a change."
        
             | embedding-shape wrote:
             | > is just the polite way to say
             | 
             | Can be*, that's not necessarily always true. I've quit jobs
             | plenty of times without having any plan for the future or
             | particular drama-reason for leaving, just "It's not as fun
                | here anymore, despite this being a great place to work".
                | I'm sure I'm not the only one who does so.
             | 
                | What I've never done, though, is leave a place without
                | being 100% honest about exactly why I'm leaving. I won't
                | say "I was just ready for a change" if that wasn't the
                | reason; I have no reason not to be honest about why I'm
                | leaving.
        
               | ghaff wrote:
                | I've generally had 10+ year tenures, other than a return
                | to school that was basically always in my plan, and the
                | dot-bomb (leaving a company I wasn't really a fit with
                | anyway). But, yeah, I've always been ready to move on at
                | about that ten-year point, which is actually fairly long
                | by a lot of people's standards in the tech industry.
               | 
                | I do disagree, though: unless there's some actionable
                | change that would specifically benefit you, like more
                | money, my answer outside of private conversations with
               | people I know well is going to be some variant of "time
               | for a change." Anything else just invites arguments and
               | conversations I don't want to have.
        
             | aprilthird2021 wrote:
              | I do want to push back on this a little. People leave all
              | the time for this "I wanna see what else is out there,"
              | especially at such senior levels and with as much financial
             | security as he inevitably has from working at Meta for 11
             | years. It is not always a gamble for many of them, and many
             | of them are not so skeptical and cynical of other places
             | they could go and bring their expertise
        
             | pelagicAustral wrote:
             | I think age plays an important part in the decision to move
             | away from a place. I think in your 20s or very early 30s
             | you have far more leeway to kind of go away and start
             | again, but a lot of the hope to actually be able to find
             | that unicorn workplace fades away as you approach your late
             | 30s. Once into your 40s, depending on your trade, you're
             | dead on arrival unless you successfully manage to rebrand
             | yourself as a consultant, whatever the fuck that means.
        
               | ghaff wrote:
               | Age does factor in various ways. It can be "it's now or
               | never" or it may be "I might as well hold on for a few"
               | or something in between.
        
             | assemblyman wrote:
             | On the other hand, while I know nothing about Soumith, he
             | clearly has enough financial runway (see my calc below) to
             | not have to work again.
             | 
             | As far as I know, we all get one life. If one can help it
             | (modulo other constraints), one should not get trapped by
             | prestige, achievement, short-term admiration by others,
             | impact and external facing factors. To see an alternate
             | reality, it helps to escape the bubble, for example, by
             | spending time in a completely different culture or
             | environment where no one knows or cares about what one did.
             | 
                | I admire people taking such decisions. It's easy to be
                | on autopilot in life. People who wear their success
                | lightly are rare, but more philosophically aware, in my
                | opinion at least. I wish him good luck!
        
               | Mars008 wrote:
               | > see an alternate reality, it helps to escape the
               | bubble, for example, by spending time in a completely
               | different culture
               | 
                | I'm in a similar position now and need to make a
                | decision. The problem is that after leaving the IT world
                | for a while, it will be hard to get back. I'll have to
                | change my life completely and discard all the knowledge
                | and expertise I have. That will be fun, interesting,
                | eye-opening, etc., but there's no way back.
        
               | mandevil wrote:
               | I don't know you, don't know your situation, but this
               | does not seem to match the experiences of many of my
               | friends who left for a while and then came back. "Spent
               | two years starting a restaurant" and "had to take care of
               | my parents" were not blockers for getting another
               | computer related job in due time. There are few truly
               | irrevocable decisions in our life.
               | 
               | Now, the current job market makes this significantly
                | harder than it was in the 2010s, but that's floating over
                | all of us: if your company does an Amazon tomorrow,
               | would you get a job as nice as you currently have? Maybe,
               | maybe not.
        
               | ghaff wrote:
               | In executive roles, your expertise really is in
               | management acumen a lot of the time. But as an individual
               | contributor--or adjacent--once you're out of a technical
               | space for a few years, it's increasingly hard to get back
               | in even if you've casually kept a finger in.
        
               | Mars008 wrote:
                | Exactly, the only way to stay current is to keep doing
                | something at least half-time. The good thing is it
                | doesn't have to be the same as your previous job. Just
                | keep the brain working and learning.
        
         | KaiserPro wrote:
         | I don't think thats what is being said.
         | 
          | Having friends who are at or near both FAIR and other AI parts
          | of Meta, resources are not the issue, anymore at least (there
          | had been a massive squeeze for the last two years, though).
          | But PyTorch and FAIR use(d) an AWS-based cluster. (PyTorch is
          | used nearly everywhere else inside Facebook, though. Well, not
          | everywhere...)
         | 
         | There are plenty of interesting things happening at big
         | tech, and at Meta specifically. If you like computer vision,
         | Meta is pretty much still the world leader, much as it pains
         | me to say it.
        
         | GuB-42 wrote:
         | About the military, from my limited experience, they are
         | significantly behind the civilian state of the art, except for
         | technology that has few applications outside of the
         | military, like stealth.
         | 
         | In fact everything secret tends to be behind. Secrecy is a huge
         | burden, and seriously limits all forms of collaboration.
         | 
         | In addition, because military projects are often big and
         | highly politicized, you get all the inefficiencies that go
         | with that.
         | Classification is also convenient for hiding screwups and
         | corruption.
        
           | FuriouslyAdrift wrote:
            | Post-Cold War, most militaries shifted to COTS and less
           | boutique development. Turns out, you only need to put
           | resources in a few places to stay ahead (stealth, sensing and
           | measuring, space, hypersonics, drones, etc).
           | 
           | It's MUCH cheaper and quicker.
        
           | dmix wrote:
           | I just assume all government software is poorly written by
           | huge consulting companies, like the famous FBI one
           | https://en.wikipedia.org/wiki/Virtual_Case_File
           | 
           | > a 318-page report [...] said the SAIC software was
           | incomplete, inadequate and so poorly designed that it would
           | be essentially unusable under real-world conditions. Even in
           | rudimentary tests, the system did not comply with basic
           | requirements
           | 
           | I figured the reason Palantir was so successful was because
            | it was an SV software company instead of a defense contractor
           | dabbling in IT or specialized government consultancy.
        
           | shadowgovt wrote:
           | The military doesn't have the luxury of things being
           | unreliable. It puts a pressure on them that corporations
           | don't necessarily have: they'd rather have a less-effective
           | but proven system than a potentially-more-effective but
           | riskier system (especially since each system they have comes
           | with massive logistics support).
           | 
           | Ironically, corporations can afford to take _more_ risks of
           | failure (financially and project-wise) than militaries
            | because failure for them doesn't mean actual human death
           | (and when it can, you see processes come in that look a lot
           | more like military processes).
        
             | Jtsummers wrote:
             | It's actually the commercial/consumer side that gets more
             | reliability than the military side.
             | 
             | The military _should_ have very reliable systems, and they
             | often know the point at which their systems will fail (MTBF
             | calculations are easier to develop with their record
             | keeping). However, the military _also_ has an almost
             | unlimited budget and body count to keep just reliable
              | enough things working much better than they should. It's
             | also really bad about actually competing companies against
             | each other.
             | 
             | The commercial sector, targeting consumers, is where you
             | actually get reliable systems. Why? Because consumers will
             | go towards either the cheapest option (reliability is
             | replaced with ubiquity in the market, it's replaceable) or
             | the more reliable but more expensive options. They
             | (individuals) don't have an unlimited budget or unlimited
             | time to maintain everything in their life. There's
             | competition in the commercial world that's completely
              | absent in the military world.
             | 
             | The two major exceptions are where COTS products have taken
             | over (definitionally, DOD is using commercial, often
              | consumer-targeted products instead of military-specific
              | products) and special forces. Special forces often bypass
              | normal acquisition processes and so end up with a better
              | chance to compete vendors against each other than other
              | parts of the military.
             | 
             | This doesn't mean everything the DOD procures through
             | normal acquisitions is inherently unreliable, but
             | reliability is only one of many factors and often only
             | really discovered after selection and full-rate production
             | has started. By that point, the DOD is committed to it for
             | years to come. Each DOD procurement is separate enough from
             | others that you don't even get huge opportunities for
             | reuse. The F-35, to pick something from this century,
             | didn't get components that were shared with other aircraft
             | in the DOD fleet. It's almost all new, which means a lot of
             | things were learned about its reliability after it started
             | flying. It has new comms, new radar, new almost everything.
              | Even the engine (though it probably shared many
              | subcomponents with other engines) was a new engine used
              | only by the F-35.
        
           | mandevil wrote:
           | I spent a dozen years as a US defense contractor across a
           | broad spectrum of places (from R&D for the future to working
           | with E3's today), and worked at internet scale and start-up
           | B2B stuff in the other dozen years of my working career.
           | 
           | I think that the major difference about deployed military
           | technologies- in contrast to both military R&D and the entire
           | commercial side- is that they are, by and large, incredibly
           | rock solid and reliable. If they aren't, they don't actually
           | get used. It takes a lot of effort to get them that way. I
           | remember once at a testing ground for our robot tanks of the
           | far future, right next door was an outdoor test-track. And
           | they were testing a kitchen trailer (a kitchen for ~200 men
           | that can be towed by a Humvee). And they drove it around the
           | track continuously for three weeks, stopping only long enough
           | to change drivers/vehicles, and the four times a day they
           | would halt and make 200 people meals, and then pack up and
           | get back to driving. This was one of several reliability
           | tests that the kitchen trailer had to get through before it
           | was accepted for service.
           | 
           | Our R&D stuff couldn't handle that (it needed 3-4 engineers
           | to carefully monitor it at all times), but the stuff that
            | needed to be in the hands of some random 18-year-old with
            | a two-week training course had to be rock solid to use, do
           | regular maintenance on, and fix, even when they were only
           | getting four hours of sleep a night. If it wasn't up to that
           | level, then the troops ended up ignoring it, leaving it
           | behind when they went out to do their job. And by and large,
           | from what I could tell, most of the stuff they had was that
           | reliable. There were some cool things that we were doing in
           | the R&D space, but we were a long way from that level.
        
             | mandevil wrote:
              | One thing I meant to add: this extensive testing - and
              | the enormous amount of documentation/training materials
              | necessary to take an 18-year-old with average ASVAB
              | scores and produce someone who can cook meals for 200
              | other soldiers on four hours of sleep a night - is both
              | why military things cost so much relative to
              | commercial-grade stuff, and why they don't get updated
              | particularly often.
             | Frequent software updates that change menus around play
             | havoc with the detailed training powerpoints that the
             | military relies on to produce those 18 year old tool
             | operators.
             | 
             | Secret Squirrel projects (which I was near but never read
             | into) can get away with lower reliability because they can
             | count on the users to be much better trained and prepared,
             | though again, from my brief encounters with these sorts,
             | they will ignore anything they don't trust to be completely
             | reliable. Reliability matters far more than cutting edge
             | for like 99.9% of military gear.
        
         | groundzeros2015 wrote:
         | Or he just takes for granted the resources he has.
        
         | oofbey wrote:
         | Nothing unique? Totally disagree.
         | 
         | Unlimited compute resources aren't literally unique, but
         | there are only a small handful of places in the world that
         | have them.
         | 
         | Vast quantities of private data, especially text communications
         | and images. Very few places have that. Coupled with a culture
         | that puts zero privacy protections on that data. Even Google
         | likes to think they're doing the right thing, so I think that
         | makes Meta unique.
        
       | aabhay wrote:
       | For anyone who's curious, the underlying Torch library is also
       | a joy to work with, as are the many other Torch bindings. For
       | example, Rust has tch and Burn, which both work with libtorch.
       | 
       | PyTorch of course has the benefit of being dynamically
       | debuggable. I can't forget the first time I breakpointed my
       | PyTorch model and wrote PyTorch calls in the terminal to
       | inspect its behavior. That's still something I miss a lot now
       | that I'm working only with "fast" compiled code.
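       | 
       | A rough sketch of that workflow, with a toy model and shapes
       | purely for illustration:
       | 
       |     import pdb
       |     import torch
       | 
       |     model = torch.nn.Linear(4, 2)  # stand-in for a real model
       |     x = torch.randn(3, 4)
       |     pdb.set_trace()  # at the (Pdb) prompt, try: model(x), x.shape, x.mean()
       |     out = model(x)
       | 
       | Being able to poke at live tensors like this is exactly what
       | static-graph and compiled workflows take away.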
        
         | lysecret wrote:
         | I wrote some truly awful code back in the day because of that
         | but god it was glorious.
        
       | irthomasthomas wrote:
       | Counterfactual Regret Minimization irl
        
       | numice wrote:
       | I read one post on his blog and found that Adam Paszke reached
       | out to the author and got an internship. I wonder if it was that
       | easy to get an internship at FAIR. I thought that they hire only
       | PhDs.
        
         | vintermann wrote:
         | I didn't know that. Soumith Chintala certainly paid it forward.
         | He was very helpful and supportive of random strangers (like
         | me!) in the early pytorch days. I count him with Andrej
         | Karpathy and Chris Olah as one of the people who really made
         | machine learning accessible to regular software engineers.
        
           | SpaceManNabs wrote:
           | Chris Olah is the goat.
           | 
           | I reached out to him myself years ago and was surprised at
           | getting a response.
           | 
           | And the response was incredibly generous. I probably wouldn't
           | have had the confidence to do my switch if it wasn't for
           | Olah.
           | 
           | And as I got further into this path, I learned that Olah had
           | done the same for some of my mentors and colleagues.
           | 
           | Every time Olah speaks, I listen.
        
         | chrneu wrote:
         | You can't do anything if you never try.
        
         | nswizzle31 wrote:
         | I was pretty involved in the PyTorch ecosystem in the early
         | days around 2016 and Adam was nothing short of a genius and
         | prolific developer whose contributions to the codebase and
         | community were immense. I think he was like an undergrad in
         | Poland at the time. My understanding is that his contributions
         | came before the internship, but I don't know.
         | 
         | My memory is that Soumith was really open to other people's
         | contributions and questions, no matter their credentials. He
         | was a great leader who felt approachable to the open-source
         | community.
        
       | gdiamos wrote:
       | This is the end of an era. Amazing work, Soumith.
        
       | hshdhdhehd wrote:
       | Nice, that is the dream career!
        
       | isusmelj wrote:
       | As a Swiss, I'm very proud that Soumith has a .ch domain!
        
         | roflmaostc wrote:
         | Probably because his first name is Chintala
        
           | spprashant wrote:
            | That'd be his last name.
        
             | roflmaostc wrote:
             | true haha
        
       | sumedh wrote:
       | His homepage says he wants to build a robot. So he is probably
       | going to work with robots for his next role.
       | 
       | He is an investor in Anthropic; I didn't know you could do
       | that while working for Meta.
        
         | geodel wrote:
         | Could be Meta is quite liberal in this area. Or it could be one
         | of those "For my friend, anything, for everyone else its
         | corporate policy."
        
       | ergocoder wrote:
       | I wonder how much this guy has earned from Meta in total. Would
       | it reach $100M?
        
         | stephenlf wrote:
         | Considering Meta was trying to poach AI talent for $250M, I
         | wouldn't be surprised if this guy has his own 8-figure
         | income.
        
           | assemblyman wrote:
            | If someone made $2 million/year over 10 years, after
            | taxes it would be about $1 million/year (NYC has local
            | taxes too). Let's say all of it was saved and invested in
            | the SP500 or in Meta.
            | 
            | SP500: tripled over 10 years, i.e. ~12% a year;
            | reinvesting dividends gives ~14% a year.
            | 
            | Meta: 8x over 10 years, i.e. ~23% a year.
            | 
            | If growth and compensation/savings were uniform over the
            | 10 years, the total portfolio would be:
            | 
            | ((1+r)^11 - 1)/r (a geometric series, since each year's
            | contribution compounds for a different number of years: 1
            | for this year's, (1+r) for the previous year's, (1+r)^2
            | for the year before that, and so on)
            | 
            | SP500: 14% -> $23M
            | 
            | Meta: 23% -> $38M
           | 
           | Now, it's entirely possible, the compensation for a position
           | like this runs into $10s of millions and one can easily
           | account for non-uniform compensation.
           | 
           | Even in NYC, actually even in Manhattan, $10M is more than
            | comfortable for retirement. It lets you draw $300-$400K a
            | year (a 3-4% withdrawal rate, adjusted for inflation). If
            | one is taking a short sabbatical, then it's a no-brainer.
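            | 
            | (If anyone wants to check the arithmetic, here is a rough
            | script with the same assumptions, i.e. $1M saved per year
            | and uniform returns:)
            | 
            |     def portfolio(r, years=10, saved_per_year=1.0):
            |         # 1 + (1+r) + ... + (1+r)^years = ((1+r)^(years+1) - 1) / r,
            |         # in millions of dollars
            |         return saved_per_year * sum((1 + r) ** t for t in range(years + 1))
            | 
            |     print(portfolio(0.14))  # SP500 with dividends: ~23, i.e. ~$23M
            |     print(portfolio(0.23))  # Meta: ~38, i.e. ~$38M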
        
             | lovecg wrote:
              | This seems to assume unusual optimism or foresight;
              | most people don't invest their life savings 100% into
              | stocks and
             | don't hold on to 100% of their company vests through ups
             | and downs. You might as well say "assuming he put all his
             | money in NVDA..."
        
               | assemblyman wrote:
                | It's a back-of-the-envelope calculation, not a
                | precise one. It doesn't take foresight to invest in
                | the SP500. DCAing
               | (dollar-cost averaging) into an index fund is actually
               | the recommended savings strategy with a short-term cash
               | balance of 6 months-2 years of cash savings depending on
               | your plans (sabbatical etc.), especially when one is
               | decades away from retirement age.
               | 
                | I only included Meta because he works/worked at Meta,
                | and it's not unusual for people to just leave their
                | RSUs in their accounts after they vest. I agree,
                | though, that one shouldn't pick stocks that happened
                | to explode (e.g. NVDA).
               | 
               | There are several unrealistic assumptions I did make:
               | 
               | * Presumably when someone starts, they earn less than in
               | recent years. He probably wasn't making huge amounts his
               | first few years. Amounts invested in earlier years are
               | smaller but have more time to compound and amounts
               | invested in recent years are larger but have had less
               | time to compound.
               | 
               | * Returns aren't constant.
               | 
               | * I pulled the $2 million/yr out of thin air. It could be
               | $1 million/yr or even $10 million/yr. I have no idea what
               | the overall head of a project like PyTorch would make.
               | 
               | * Everyone's expenses are different. In and around NYC,
                | one can live on $80k/year or $120-150k/year, as well
                | as on $1 million/yr. I assumed zero expenses since I
                | wanted a nice even $1 million/yr of savings. Maybe it
                | was $500k/yr of
               | savings in which case all the numbers should be halved.
               | 
               | In any case, I can't see how one wouldn't end up with at
                | least $10 million in a position like this after 10
                | years at Meta, unless one buys a $5 million unit in
                | Manhattan and is burdened by a high mortgage.
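                | 
                | (A rough variant that drops the uniform-compensation
                | assumption, with made-up numbers: savings ramping
                | from $0.5M to $2M over the 10 years, at 14% returns:)
                | 
                |     savings = [0.5 + 1.5 * i / 9 for i in range(10)]  # $M/yr, ramping up
                |     total = sum(s * 1.14 ** (9 - i) for i, s in enumerate(savings))
                |     print(round(total, 1))  # ~20.8, i.e. ~$21M: still 8 figures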
        
       | kleiba wrote:
       | You forgot to thank Jurgen. /scnr
        
       | jsrozner wrote:
       | Is this also partially AI generated? What's with the repeated
       | short phrases? Is this just everyone's style now?
        
         | Cthulhu_ wrote:
         | You're asking a lot of questions, but are you willing to
         | think about them? For one, no, it's not "everyone's style";
         | if it were, you wouldn't have asked whether it was, you'd
         | know.
        
         | kelvinjps10 wrote:
         | The writing in this post felt very human to me.
        
       | ishouldbework wrote:
       | Look, I get that _some_ pages require JavaScript, but
       | 
       |     <style class="fallback">body{visibility:hidden;
       |     white-space:pre;font-family:monospace}</style>
       | 
       | which is then unset by JS, with no <noscript> anywhere, is
       | just... I just get a white page.
       | 
       | Changing it to
       | 
       |     <style class="fallback">body{white-space:pre-wrap;
       |     font-family:monospace}</style>
       | 
       | gives a perfectly readable page, so it seems a bit...
       | pointless.
        
       | philipwhiuk wrote:
       | There's no context around 'FAIR' - is it https://www.go-fair.org/ ?
        
         | abracos wrote:
         | it's https://ai.meta.com/research/
        
         | aprotyas wrote:
         | Facebook Artificial Intelligence Research, c.f.
         | https://engineering.fb.com/category/ai-research/#:~:text=Art...
        
           | overfeed wrote:
           | Now retconned as "Fundamental AI Research" since the Meta
           | rebrand.
        
             | stuxnet79 wrote:
              | Interesting, is Yann LeCun still heading FAIR? From the
              | outside looking in, it seems like he is getting
              | sidelined.
        
       | shevy-java wrote:
       | To me it sounds as if he is trying to open a new chapter in
       | his life. Good for him, but I wonder if everything was really
       | as smooth as described. People often write as if everything
       | were perfect on their blogs. Well, it could be. But it could
       | also be that not everything was perfect and it simply never
       | made it onto the blog.
        
       | ninjagoo wrote:
       | Soumith's 2nd release?
       | https://github.com/pytorch/pytorch/releases/tag/v0.1.1
       | 
       | Also, looking at the contribution history for a long career is
       | very interesting; reflects the changing roles over time
       | https://github.com/soumith
        
       | xpe wrote:
       | It is notable (but perhaps not surprising) that this is mostly
       | about the people and the work itself. The writing is silent on
       | the downstream impacts on the world. In contrast, there are
       | fields (global development, medicine, etc.) where people tend to
       | focus on the impact on humanity (especially when reaching a
       | milestone in their career).
        
       | CommenterPerson wrote:
       | Firstly, good work.
       | 
       | Ironically, one HN front-page item today is this: "Meta
       | projected 10% of 2024 revenue came from scams and banned goods,
       | Reuters reports"
       | 
       | Glad you're leaving, hopefully you're in a good place
       | financially. Take a page from Bill Gates and work on something
       | that attempts to improve society. Stay away from surveillance
       | capitalism and enshittification.
        
       | odyssey7 wrote:
       | > It's taught in classrooms from MIT to rural India. The tools I
       | dreamed about making accessible? They are. The barrier to entry I
       | wanted to lower? It's almost gone.
       | 
       | I have an ironic sense that there are classrooms in rural India
       | with better pedagogy and lower barriers to entry than some of our
       | elite engineering programs.
        
         | john01dav wrote:
         | Many elite engineering programs in the United States (I don't
         | know if this is what you mean by "our") are elite solely due to
         | social status (ranking outlets need to publish rankings
         | that feel right, or they're ignored, and some accept bribes
         | to rank specific programs) and research output, which have
         | little to do with
         | quality of pedagogy. Instead, pedagogy is generally poor
         | because the elite researchers usually view teaching as a chore
         | and many don't have any real skill in it either.
        
         | crazygringo wrote:
         | I thought traditional Indian pedagogy was heavily criticized
         | for relying on rote memorization over conceptual
         | understanding and problem solving, and for being
         | hierarchical and exam-oriented.
         | 
         | This isn't to say engineering programs in the US can't be
         | improved, but there seems to be widespread consensus that they
         | don't suffer from the kinds of serious problems that ones in
         | India commonly do.
        
       | mmaunder wrote:
       | " What's next for me? Something small. Something new. Something I
       | don't fully understand yet. Something uncomfortable. I could have
       | moved to something else inside Meta. But I needed to know what's
       | out there. I needed to do something small again. I couldn't live
       | with the counterfactual regret of never trying something outside
       | Meta."
       | 
       | Shades of Siddhartha. Back to the forest.
        
         | galoisscobi wrote:
         | Ah yes, shades of Siddhartha. I almost forgot about the part
         | where he worked for a megacorp that was ripping society's
         | social fabric apart and wanted to do something else for a
         | while.
        
           | abustamam wrote:
           | I don't think he was involved in that though.
        
             | dbgrman wrote:
              | He was. He didn't just parachute into Meta to start
              | working on PyTorch. He worked in many areas of the
              | product, was a member of the senior technical staff, and
              | was knowledgeable about many aspects of the company.
        
       | cs702 wrote:
       | Many of the comments here are judging PyTorch _in hindsight_,
       | which is unfair.
       | 
       | When Soumith Chintala co-launched PyTorch, and for many years
       | after, the alternatives for fast, interactive, convenient
       | development were _much worse_. There was no Jax.
       | 
       | Every single AI researcher I know, including me, who _tried_
       | PyTorch back then _immediately wanted to switch to it, because it
       | was so much better_. Andrej Karpathy described what PyTorch felt
       | like back then when he tweeted, in May 2017, "I've been using
       | PyTorch a few months now and I've never felt better. I have more
       | energy. My skin is clearer. My eyesight has improved."[a]
       | 
       | THANK YOU SOUMITH for your hard work over all these years!
       | Your work has made a difference for a huge number of people,
       | including many of us here on HN.
       | 
       | We wish you success in your future endeavors, whatever they turn
       | out to be!
       | 
       | Please ignore all the petty criticism.
       | 
       | ---
       | 
       | [a] https://x.com/karpathy/status/868178954032513024
        
         | golly_ned wrote:
         | There was Chainer, which originated the define-by-run model
         | that characterized PyTorch's effectiveness. It was developed by
         | a much smaller, much less influential company in Japan. Early
         | PyTorch was transparent about the debt it owed to Chainer.
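         | 
         | (For anyone who hasn't seen the term, "define-by-run" means
         | the graph is built as the code executes, so ordinary control
         | flow shapes each forward pass. A toy PyTorch illustration:)
         | 
         |     import torch
         | 
         |     def forward(x):
         |         h = x
         |         # a data-dependent loop: no static graph declared up front
         |         while h.norm() < 10:
         |             h = h * 2
         |         return h
         | 
         |     x = torch.randn(4, requires_grad=True)
         |     forward(x).sum().backward()  # grads flow through the ops that actually ran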
        
           | maxc01 wrote:
           | Yes, exactly--not many people know about Chainer nowadays.
           | Back in 2016, PyTorch's interface was actually inferior to
           | Chainer's, and I think Chainer's design was really ahead of
           | its time.
        
           | NalNezumi wrote:
            | The company is called Preferred Networks; they're still
            | around and have some spun-off subsidiaries too.
        
           | cs702 wrote:
           | Thanks. Yes, I remember Chainer, but only vaguely. I kinda
           | remember looking at it, but not actually using it.
           | 
           | My recollection is that when I looked at Chainer back then,
           | it didn't offer a comprehensive library of preexisting
           | components for deep learning. When I tried PyTorch, on the
           | other hand, I vividly remember it as _already_ having lots of
           | prebuilt components (common layers, activation functions,
           | etc.) in `torch.nn`, so it was easier and faster to get
           | going.
           | 
           | These memories are vague, so I could be wrong.
        
         | Teodolfo wrote:
         | PyTorch was partly inspired by the Python Autograd library
         | (circa 2015 [1]) to the point where they called their autodiff
         | [2] system "autograd" [3]. Jax is the direct successor of the
         | Autograd library and several of the Autograd developers work on
         | Jax to this day. Of course, for that matter, PyTorch author
         | Adam Paszke is currently on the JAX team and seems to work on
         | JAX and Dex these days.
         | 
         | [1] https://pypi.org/project/autograd/#history
         | 
         | [2]
         | https://www.cs.toronto.edu/~rgrosse/courses/csc421_2019/read...
         | 
         | [2]
         | https://web.archive.org/web/20170422051747/http://pytorch.or...
        
           | cs702 wrote:
           | Yes, PyTorch borrowed from Autograd, Chainer, etc.
           | 
           | ...but PyTorch felt friendlier and more Pythonic, and it came
           | with a comprehensive library of prebuilt components for deep
           | learning in `torch.nn`.
           | 
           | See https://news.ycombinator.com/item?id=45848768
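            | 
            | (To make "prebuilt components" concrete, a toy example of
            | the kind of thing torch.nn shipped with:)
            | 
            |     import torch
            |     from torch import nn
            | 
            |     # common layers and activations come prebuilt,
            |     # so a small model is just a few lines
            |     model = nn.Sequential(
            |         nn.Linear(784, 128),
            |         nn.ReLU(),
            |         nn.Linear(128, 10),
            |     )
            |     logits = model(torch.randn(32, 784))  # shape: (32, 10)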
        
       | jmward01 wrote:
       | All I can say is 'thanks!'. It does take a team, and a community,
       | but individuals matter. I use PyTorch daily; it has made it
       | possible for me to play with ideas I would otherwise only have
       | dreamed of. It is a big part of my life, so thanks for your
       | contributions
       | and best of luck on the next thing!
        
       | RobRivera wrote:
       | Good luck, good job, and may more fantastical journeys lie
       | ahead.
        
       | jurschreuder wrote:
       | It's just a Python wrapper around Torch, an open-source C
       | library from a Swiss non-profit, so Meta could compete with
       | Google's TensorFlow.
       | 
       | I don't know why this is celebrated so much, a big company
       | rebranding a non-profit for profit.
       | 
       | But I guess that's the norm for AI now.
        
         | whymauri wrote:
         | Wait, PyTorch and the ecosystem are much more than just that.
         | You can't be serious?
        
           | jurschreuder wrote:
           | It's just Torch for idiots
        
       | w10-1 wrote:
       | Given the technical achievements and industry impact, I'm struck
       | that his emphasis is on the people, and in particular growing
       | people to take on challenges. His influence might lie as much in
       | what they do. The connection linking them seems to be building
       | curiosity and wonder into delight for users.
       | 
       | If there's a soul to Silicon Valley, that's it. However many
       | jerks and political/power players I encounter, I remain inspired
       | by the few like this.
        
       | mupuff1234 wrote:
       | https://news.ycombinator.com/item?id=45847465
       | 
       | Seems relevant.
        
       | jeffreysmith wrote:
       | I'm one of the many people who Soumith hired to Meta and PyTorch.
       | I had the privilege of working on PyTorch with him and lots of
       | the folks on this post.
       | 
       | As his longtime colleague, the one thing I would want people to
       | know about him and this decision is that Soumith has always
       | viewed PyTorch as a community project. He consistently celebrated
       | the contributions of his co-creators Adam and Sam, and he
       | extended the same view towards Yangqing and the Caffe2 crew
       | that we merged into PyTorch. At the very beginning, by Soumith's
       | highly intentional design, PyTorch was aimed at being truly
       | developed by and for the AI research community and for many years
       | that was the key way in which we grew the framework, FB PT team,
       | and the wider community. At every single stage of PT's lifecycle,
       | he always ensured that our conception of PT and its community
       | grew to include and celebrate the new people and organizations
       | growing what was possible with PT. He's an incredible talent
       | magnet, and thus more and more smart people kept dedicating their
       | blood, sweat, and tears to making PT bigger and better for more
       | people.
       | 
       | I've worked with some very well known and highly compensated
       | leaders in tech, but *no one* has done the job he has done at
       | ameliorating the bus-factor problem with his baby. PT has a
       | unique level of broad support that few other open-source
       | technologies can reach. In a world of unbounded AI salaries,
       | people who want to
       | move AI research methods forward still freely give their time and
       | attention to PyTorch and its ecosystem. It's the great lever of
       | this era of AI that is moving the world, *due in large part* to
       | the strength of the community he fostered and can now let
       | continue without his direct involvement.
       | 
       | His departure is the end of an era, but it's also operationally a
       | true non-event. PyTorch is going strong and can afford to let one
       | of its creators retire from stewardship. This is precisely what
       | success looks like in open source software.
       | 
       | He deserves our congratulations and our thanks. Enjoy your PT
       | retirement, man.
        
         | casualscience wrote:
         | Also worked with Soumith. The man is a legend, moves mountains
         | and completely changed the course of my career because he liked
         | something I wrote. No arrogance, no politics, just an extremely
         | down to earth and chill guy who elevates everyone around him.
         | 
         | I wish him the best!
        
           | sumedh wrote:
           | What did you write?
        
       | warbaker wrote:
       | I just want to say: thank you. It is hard to overstate how much
       | Pytorch has accelerated ML/AI development, across the board.
        
       | livelaughlove69 wrote:
       | What an amazing impact this man has had. He seems like a great
       | guy too. I don't know him at all personally but his replies
       | online have always been so nice. I really wish him all the best
       | with whatever he does with his time next.
        
       ___________________________________________________________________
       (page generated 2025-11-07 23:00 UTC)