[HN Gopher] Leaving Meta and PyTorch
___________________________________________________________________
Leaving Meta and PyTorch
Author : saikatsg
Score : 663 points
Date : 2025-11-07 06:14 UTC (16 hours ago)
(HTM) web link (soumith.ch)
(TXT) w3m dump (soumith.ch)
| msmd74 wrote:
| Sounds like you had a momentous run.
|
| If you take advice from reformed Internet trolls, consider
| turning off all your devices and trying to give yourself at least
| a week, but ideally a month offline staring at your new baby.
| You'll never get that time back and there's nothing your brain
| will appreciate more than loading up those memories as they grow.
|
| Good luck.
| qmatch wrote:
| As a loyal JAX user, I hope they can play catchup. PyTorch has
| dominated the AI scene since TF1 fumbled the ball at the 10th
| yard line. What Matt Johnson has done turning Autograd into JAX
| is hopefully going to be worthy of as much praise as what
| Soumith has received.
| n_u wrote:
| > PyTorch has dominated the AI scene since TF1 fumbled the ball
| at the 10th yard line
|
| can you explain why you think TensorFlow fumbled?
| zapnuk wrote:
| For me it was about 8 years ago. Back then TF was already
| bloated but had two weaknesses. Their bet on static compute
| graphs made writing code verbose and debugging difficult.
|
| The few people I know back then used keras instead. I
| switched to PyTorch for my next project which was more
| "batteries included".
| michaelt wrote:
| Imagine a total newbie trying to fine-tune an image
| classifier, reusing some open source example code, about a
| decade ago.
|
| If their folder of 10,000 labelled images contains one image
| that's a different size to the others, the training job will
| fail with an error about unexpected dimensions while
| concatenating.
|
| But it _won't_ be able to say the file's name, or that the
| problem is an input image of the wrong size. It'll just say it
| can't concatenate tensors of different sizes.
|
| An experienced user will recognise the error immediately, and
| will have run a data cleansing script beforehand anyway. But
| it's not experienced users who bounce from frameworks, it's
| newbies.
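The failure mode described above can be sketched with plain NumPy standing in for the framework's batching step (the file names and sizes here are made up): the stacking error reports only dimensions, while a simple pre-check loop recovers the information the error message lacks.

```python
import numpy as np

# A hypothetical folder of "images": four arrays, one the wrong size.
images = [np.zeros((32, 32)) for _ in range(3)] + [np.zeros((48, 32))]
names = ["img0.png", "img1.png", "img2.png", "bad.png"]

# The batching step: stacking fails with a shape error that
# mentions dimensions, but not which file caused them.
try:
    batch = np.stack(images)
except ValueError as e:
    print("stack failed:", e)

# A pre-check loop pinpoints the culprit by name.
expected = images[0].shape
culprits = [n for n, im in zip(names, images) if im.shape != expected]
print("wrong-sized files:", culprits)
```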
| mschuster91 wrote:
| > An experienced user will recognise the error immediately,
| and will have run a data cleansing script beforehand
| anyway. But it's not experienced users who bounce from
| frameworks, it's newbies.
|
| Even seasoned developers will bounce away from frameworks
| or libraries - no matter if old dogs or the next hot thing
| - if the documentation isn't up to speed or simple, common
| tasks require wading through dozens of pages of
| documentation.
|
| Writing good documentation is hard enough, writing relevant
| "common usage examples" is even harder... but keeping them
| _up to date_ and _working_ is a rarely seen art.
|
| And the greatest art of all of it is logging. Soooo many
| libraries refuse to implement detailed structured logging
| in internal classes (despite particularly Java and PHP
| offering very powerful mechanisms), making it much more
| difficult to troubleshoot problems in the field.
| qmatch wrote:
| I personally believe TF1 was serving the need of its core
| users. It provided a compileable compute graph with autodiff,
| and you got very efficient training and inference from it.
| There was a steep learning curve, but if you got past it,
| things worked very very well. The distributed TF never really
| took off: it was buggy, and I think they made some wrong early
| bets in the design for performance reasons that should have
| been sacrificed in favor of simplicity.
|
| I believe some years after the TF1 release, they realized the
| learning curve was too steep, they were losing users to
| PyTorch. I think also the Cloud team was attempting to sell
| customers on their amazing DL tech, which was falling flat.
| So they tried to keep the TF brand while totally changing the
| product under the hood by introducing imperative programming
| and gradient tapes. They killed TF1, upsetting those users,
| while not having a fully functioning TF2, all the while
| having plenty of documentation pointing to TF1 references
| that didn't work. Any new grad student made the simple choice
| of using a tool that was user-friendly and worked, which was
| PyTorch. And most old TF1 users hopped on the band wagon.
| tdullien wrote:
| I only remember 2015 TF and I was wondering: why would I use
| Python to assemble a computational graph when what I really
| want is to write code and then differentiate through it?
| Gazoche wrote:
| I'm no machine learning engineer but I've dabbled
| professionally with both frameworks a few years ago and the
| developer experience didn't even compare. The main issue with
| TF was that you could only choose between a powerful but
| incomprehensible, poorly documented [1], ultra-verbose and
| ever-changing low-level API, and an abstraction layer (Keras)
| that was too high-level to be really useful.
|
| Maybe TF has gotten better since but at the time it really
| felt like an internal tool that Google decided to just throw
| into the wild. By contrast PyTorch offered a more reasonable
| level of abstraction along with excellent API documentation
| and tutorials, so it's no wonder that machine learning
| engineers (who are generally more interested in the science
| of the model than the technical implementation) ended up
| favoring it.
|
| [1] The worst part was that Google only hosted the docs for
| the _latest_ version of TF, so if you were stuck on an older
| version (because, oh I don't know, you wanted a stable
| environment to serve models in production), well tough luck.
| That certainly didn't gain TF any favors.
| HarHarVeryFunny wrote:
| The original TensorFlow had an API similar to the original
| Lua-based Torch (the predecessor to PyTorch) that required
| you to first build the network, node by node, then run it.
| PyTorch used a completely different, and much more convenient
| approach, where the network is built automatically for you
| just by running the forward pass code (and will then be used
| for the backward pass), using both provided node types and
| arbitrary NumPy compatible code. You're basically just
| writing differentiable code.
|
| This new PyTorch approach was eventually supported by
| TensorFlow as well ("eager execution"), but the PyTorch
| approach was such a huge improvement that there had been an
| immediate shift by many developers from TF to PyTorch, and TF
| never seemed able to regain the momentum.
|
| TF also suffered from having a confusing array of alternate
| user libraries built on top of the core framework, none of
| which had great documentation, while PyTorch had a more
| focused approach and fantastic online support from the
| developer team.
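The define-by-run idea itself fits in a few lines of plain Python. This is an illustrative toy, not PyTorch's implementation: running the forward code records the graph onto a tape as a side effect, and the backward pass replays the tape in reverse.

```python
# Minimal define-by-run reverse-mode autodiff sketch (toy, not PyTorch).
tape = []  # the "graph", recorded as the forward code runs

class Var:
    def __init__(self, value):
        self.value, self.grad = value, 0.0

    def __mul__(self, other):
        out = Var(self.value * other.value)
        # Record how to push gradients back through this op.
        tape.append(lambda: (
            setattr(self, "grad", self.grad + other.value * out.grad),
            setattr(other, "grad", other.grad + self.value * out.grad),
        ))
        return out

    def __add__(self, other):
        out = Var(self.value + other.value)
        tape.append(lambda: (
            setattr(self, "grad", self.grad + out.grad),
            setattr(other, "grad", other.grad + out.grad),
        ))
        return out

def backward(out):
    out.grad = 1.0
    for op in reversed(tape):  # replay the recorded graph in reverse
        op()

# Ordinary Python code *is* the model definition:
x, y = Var(3.0), Var(2.0)
z = x * y + x          # recorded step by step as it executes
backward(z)
print(x.grad, y.grad)  # dz/dx = y + 1 = 3.0, dz/dy = x = 3.0
```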
| liuliu wrote:
| LuaTorch is eager-execution. The problem with LuaTorch is the
| GC. You cannot rely on a traditional GC for this workload:
| since each tensor is megabytes (at the time; now gigabytes)
| large, you need to collect tensors aggressively rather than at
| intervals. Python's reference-counting system solves this
| issue. And by "collecting" I don't mean freeing the memory;
| PyTorch has a simple slab allocator to manage CUDA memory.
| HarHarVeryFunny wrote:
| With Lua Torch the model execution was eager, but you
| still had to construct the model graph beforehand - it
| wasn't "define by run" like PyTorch.
|
| Back in the day, having completed Andrew Ng's ML course,
| I then built my own C++ NN framework copying this graph-
| mode Lua Torch API. One of the nice things about
| explicitly building a graph was that my framework
| supported having the model generate a GraphViz DOT
| representation of itself so I could visualize it.
| liuliu wrote:
| Ah, I get what you mean now. I was mixing up the nn module
| and the tensor execution bits. (To be fair, the PyTorch nn
| module carries over many of these quirks!)
| stared wrote:
| In 2018, I co-wrote a blog post with the inflammatory title
| "Don't use TensorFlow, try PyTorch instead"
| (https://news.ycombinator.com/item?id=17415321). As it gained
| traction here, it was changed to "Keras vs PyTorch" (some
| edgy things that work for a private blog are not good for a
| corporate one). Yet the initial title stuck, and you can see
| it resonated well with the crowd.
|
| TensorFlow (while a huge step on top of Theano) had issues
| with a strange API, mixing needlessly complex parts (even for
| the simplest layers) with magic-box-like optimization.
|
| There was Keras, which I liked and used before it was cool
| (when it still supported the Theano backend), and it was the
| right decision for TF to incorporate it as the default API.
| But it was 1-2 years too late.
|
| At the same time, I initially looked at PyTorch as some
| intern's summer project porting from Lua to Python. I
| expected an imitation of the original Torch. Yet the more it
| developed, the better it was, with (at least to my mind) the
| perfect level of abstraction. On the one hand, you can easily
| add two tensors, as if it were NumPy (and print its values in
| Python, which was impossible with TF at that time). On the
| other hand, you can wrap anything (from just a simple
| operation to a huge network) in an nn.Module. So it offered
| this natural hierarchical approach to deep learning. It
| offered building blocks that can be easily created, composed,
| debugged, and reused. It offered a natural way of picking the
| abstraction level you want to work with, so it worked well
| for industry and experimentation with novel architectures.
|
| So, while in 2016-2017 I was using Keras as the go-to for
| deep learning (https://p.migdal.pl/blog/2017/04/teaching-
| deep-learning/), in 2018 I saw the light of PyTorch and
| didn't feel a need to look back. In 2019, even for the intro,
| I used PyTorch (https://github.com/stared/thinking-in-
| tensors-writing-in-pyt...).
| stared wrote:
| Actually, I opened "Teaching deep learning" and smiled as I
| saw how it evolved:
|
| > There is a handful of popular deep learning libraries,
| including TensorFlow, Theano, Torch and Caffe. Each of them
| has Python interface (now also for Torch: PyTorch)
|
| > [...]
|
| > EDIT (July 2017): If you want a low-level framework,
| PyTorch may be the best way to start. It combines
| relatively brief and readable code (almost like Keras) but
| at the same time gives low-level access to all features
| (actually, more than TensorFlow).
|
| > EDIT (June 2018): In Keras or PyTorch as your first deep
| learning framework I discuss pros and cons of starting
| learning deep learning with each of them.
| probably_wrong wrote:
| I see good answers already, but here's a concrete example:
|
| In my University we had to decide between both libraries so,
| as a test, we decided to write a language model from scratch.
| The first minor problem with TF was that (if memory serves me
| right) you were supposed to declare your network "backwards"
| - instead of saying "A -> B -> C" you had to declare
| "C(B(A))". The major problem, however, was that there was no
| way to add debug messages - either your network worked or it
| didn't. To make matters worse, the "official" TF tutorial on
| how to write a Seq2Seq model didn't compile because the
| library had changed but the bug reports for that were met for
| years with "we are changing the API so we'll fix the example
| once we're done".
|
| PyTorch, by comparison, had the advantage of a Python-based
| interface - you simply defined classes like you always did
| (including debug statements!), connected them as variables,
| and that was that. So when I and my beginner colleagues had
| to decide which library to pick, "the one that's not a
| nightmare to debug" sounded much better than "the one that's
| more efficient if you have several billion training
| datapoints and a cluster". My colleagues and I then went on
| to become professionals, and we all brought PyTorch with us.
| jszymborski wrote:
| The inability to use print debug to tell me the dimensions
| of my hidden states was 100% why TF was hard for me to use
| as a greenhorn MSc student.
|
| Another consequence of this was that PyTorch let you use
| regular old Python for logic flow.
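A NumPy stand-in (all names hypothetical) shows why eager execution makes this trivial: the forward pass is ordinary Python running line by line, so shape printing and control flow need no framework support.

```python
import numpy as np

def forward(x, num_steps, verbose=False):
    # Plain Python control flow: an ordinary loop over timesteps.
    hidden = np.zeros((x.shape[0], 8))
    w = np.ones((x.shape[1] + 8, 8)) * 0.1
    for t in range(num_steps):
        combined = np.concatenate([x, hidden], axis=1)
        hidden = np.tanh(combined @ w)
        if verbose:  # debugging is just a print statement
            print(f"step {t}: hidden shape = {hidden.shape}")
    return hidden

out = forward(np.zeros((4, 16)), num_steps=3, verbose=True)
print(out.shape)  # (4, 8)
```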
| n_u wrote:
| This was also my experience. TensorFlow's model of
| constructing then evaluating a computation graph felt at
| odds with Python's principles. It made it extremely
| difficult to debug because _you couldn't print tensors
| easily!_ It didn't feel like Python at all.
|
| Also the API changed constantly so examples from docs or
| open source repos wouldn't work.
|
| They also had that weird thing about all tensors having a
| unique global name. I remember I tried to evaluate a DQN
| network twice in the same script and it errored because of
| that.
|
| It's somewhat vindicating to see many people in this thread
| shared my frustrations. Considering the impact of these
| technologies I think a documentary about why TensorFlow
| failed and PyTorch took off would be a great watch.
| morshu9001 wrote:
| I just remember TF1 being super hard to use as a beginner and
| Google repeatedly insisting it had to be that way. People
| talk about the layering API, but it's more than that,
| everything about it was covered with sharp edges.
| htrp wrote:
| Greenfielding TF2.X and not maintaining 1.X compatibility
| rockinghigh wrote:
| First, the migration to 2.0 in 2019 to add eager mode
| support was horribly painful. Then, starting around 2.7,
| backward compatibility kept being broken. Not being able to
| load previously trained models with a new version of the
| library is wildly painful.
| intermerda wrote:
| Do you have experience in both JAX and PyTorch? Why do you
| prefer JAX?
| cl3misch wrote:
| Not OP. I prefer JAX for non-AI tasks in scientific computing
| because of the different mental model than PyTorch. In JAX,
| you think about functions and gradients of functions. In
| PyTorch you think about tensors which accumulate a gradient
| while being manipulated through functions. JAX just suits my
| way of thinking much better.
|
| I also like that jax.jit forces you to write "functional"
| functions free of side effects or inplace array updates. It
| might feel weird at first (and not every algorithm is suited
| for this style) but ultimately it leads to clearer and faster
| code.
|
| I am surprised that JIT in PyTorch gets so little attention.
| Maybe it's less impactful for PyTorch's usual usecase of
| large networks, as opposed to general scientific computing?
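That mental model can be sketched in plain Python, with a toy finite-difference `grad` standing in for JAX's transform (none of this is the JAX API): the gradient of a function is itself a function, and an update step is pure, returning a new value rather than mutating state.

```python
# "Functions and gradients of functions": a toy stand-in for jax.grad,
# implemented with central finite differences (illustrative only).
def grad(f, eps=1e-6):
    def df(x):
        return (f(x + eps) - f(x - eps)) / (2 * eps)
    return df

def loss(w):
    return (w - 3.0) ** 2   # minimized at w = 3

dloss = grad(loss)          # the gradient is itself a plain function

# A purely functional update step: no state mutated, new value returned.
def step(w, lr=0.1):
    return w - lr * dloss(w)

w = 0.0
for _ in range(100):
    w = step(w)
print(round(w, 3))  # converges toward 3.0
```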
| imtringued wrote:
| >I also like that jax.jit forces you to write "functional"
| functions free of side effects or inplace array updates. It
| might feel weird at first (and not every algorithm is
| suited for this style) but ultimately it leads to clearer
| and faster code.
|
| It's not weird. It's actually the most natural way of doing
| things for me. You just write down your math equations as
| JAX and you're done.
| Majromax wrote:
| > You just write down your math equations as JAX and
| you're done.
|
| It's natural when your basic unit is a whole vector
| (tensor), manipulated by some linear algebra expression.
| It's less natural if your basic unit is an element of a
| vector.
|
| If you're solving sudoku, for example, the obvious
| 'update' is in-place.
|
| In-place updates are also often the right answer for
| performance reasons, such as writing the output of a
| .map() operation directly to the destination tensor. Jax
| leans _heavily_ on compile-time optimizations to turn the
| mathematically-nice code into computer-nice code, so the
| delta between eager-Jax and compiled-Jax is much larger
| than the delta between eager-Pytorch and compiled-
| Pytorch.
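In NumPy terms (a rough stand-in for any tensor framework), the contrast looks like this: the functional style allocates a fresh array for each result, while the in-place style writes the mapped values straight into a preallocated destination via `out=`.

```python
import numpy as np

src = np.arange(6, dtype=np.float64)
dst = np.empty_like(src)

# Functional style: the operation allocates a fresh result array.
functional = src * 2.0

# In-place style: the "map" writes straight into the destination
# buffer, avoiding the intermediate allocation.
np.multiply(src, 2.0, out=dst)

print(functional, dst)
```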
| havercosine wrote:
| Not OP. I have production/scale experience in PyTorch and
| toy/hobby experience in JAX. I wish I had the time or liberty
| to use JAX more. It consists of a small, orthogonal set of
| ideas that combine like Lego blocks. I can attempt to reason
| from first principles about performance. The documentation is
| super readable and strives to make you understand things.
|
| JAX seems well engineered. One could argue so was TensorFlow.
| But the ideas behind JAX were built outside Google (autograd),
| so it has struck the right balance of staying close to
| idiomatic Python/NumPy.
|
| PyTorch is where the tailwinds are, though. It is a wildly
| successful project which has acquired a ton of code over the
| years, so it is a little harder to figure out how something
| works (say, torch.compile) from first principles.
| bjourne wrote:
| JAX seems great but the Google ghost is a lot to stomach. The
| risk of JAX getting axed or replaced with a JAX 2.0 -
| completely incompatible with existing JAX code - is not
| insignificant.
| chopete3 wrote:
| >>Every major AI company and hardware vendor are on a speed dial.
| This kind of power is really hard to give up. But curiosity
| ultimately won out in my head.
|
| A simple feeling has such power. May he get an opportunity to
| create one more powerful tool before retiring.
| Lord-Jobo wrote:
| If the curiosity dies, the entire thing crumbles.
|
| The second I stop being curious I stop finding new and exciting
| things to do, and I stop feeling fulfillment. It's one of the
| biggest signs saying "it's time to move on".
|
| I feel so strongly for the people who can't afford that
| luxury. I've been there: unfulfilling jobs for years because
| of bills or resume building.
| rubicon33 wrote:
| Gosh, given you've been there I have to ask what allowed you
| to get out of that and pursue only things that interest and
| excite you?
| perfmode wrote:
| Respect.
| mxkopy wrote:
| PyTorch is one of those tools that's so simple and easy to take
| apart that you feel like you might've been able to make it
| yourself. I can't imagine how much engineering effort was behind
| all those moments where I thought to myself, "of course it should
| work like that, how can it be any other way?"
| TechnicolorByte wrote:
| Can anyone recommend a technical overview describing the design
| decisions PyTorch made that led it to win out?
| huevosabio wrote:
| I don't know the full list, but back when it came out, TF
| felt like a crude set of bindings to the underlying c++/CUDA
| workhorse. PyTorch felt, in contrast, pythonic. It was much
| closer in feeling to numpy.
| puttycat wrote:
| I think it was mostly the eager evaluation that made it
| possible to debug every step in the network forward/backward
| passes. Tensorflow didn't have that at the time which made
| debugging practically impossible.
| GistNoesis wrote:
| The choice of the dynamic computation graph [1] of PyTorch
| made it easier to debug and implement, leading to higher
| adoption, even though running speed was initially slower (and
| therefore training cost higher).
|
| Other decisions follow from this one.
|
| TensorFlow started with static graphs and had to move to
| dynamic ones in version 2.0, which broke everything, leaving
| fragmentation between TensorFlow 1, TensorFlow 2, Keras, and
| JAX.
|
| PyTorch's compilation of the computation graph later erased
| TensorFlow's remaining edge.
|
| Is the battle over? From a purely computational standpoint,
| Pytorch's solution is very far from optimal and billions of
| dollars of electricity and GPUs are burned every year, but
| major players are happy with circular deals to entrench their
| positions. So at the pace of current AI code development,
| probably one or two years before Pytorch is old history.
|
| [1] https://www.geeksforgeeks.org/deep-learning/dynamic-vs-
| stati...
| Uehreka wrote:
| > at the pace of current AI code development, probably one
| or two years before Pytorch is old history.
|
| Ehhh, I don't know about that.
|
| Sure, new AI techniques and new models are coming out
| pretty fast, but when I go to work with a new AI project,
| they're often using a version of PyTorch or CUDA from when
| the project began a year or two ago. It's been super
| annoying having to update projects to PyTorch 2.7.0 and
| CUDA 12.8 so I can run them on RTX 5000 series GPUs.
|
| All this to say: If PyTorch was going to be replaced in a
| year or two, we'd know the name of its killer by now, and
| they'd be the talk of HN. Not to mention that at this point
| all of the PhDs flooding into AI startups wrote their grad
| work in PyTorch, it has a lot of network lock-in that an
| upstart would have to overcome by being way better at
| something PyTorch can never be good at. I don't even know
| what that would be.
|
| Bear in mind that it took a few years for Tensorflow to die
| out due to lock in, and we all knew about PyTorch that
| whole time.
| GistNoesis wrote:
| > a lot of network lock-in that an upstart would have to
| overcome by being way better at something PyTorch can
| never be good at
|
| The cost of migrating higher-level code to a newer framework
| is going to zero. You ask your favorite agent (or intern) to
| port it and check that the migration is exact. We already see
| this in the multitude of deep-learning frameworks.
|
| The day one optimization trick appears that PyTorch can't do
| but another framework can, one that reduces your training
| cost 10x, PyTorch is going the way of the dodo.
|
| The day one architecture that can't be implemented in PyTorch
| gets superior performance, it's bye-bye Python.
|
| We see this with architectures which require real-time
| rendering like Gaussian Splatting (Instant Nerf), or the
| caching strategies for LLM sequence generation.
|
| PyTorch has 3 main selling points:
|
| - Abstracting away GPU (or device) specific code. This matters
| because of Nvidia's mess of custom optimized kernels, which
| you are forced to adapt to if you don't want to write custom
| kernels yourself.
|
| This advantage disappears if you don't mind writing optimized
| kernels because a machine writes them; or if you don't need
| CUDA because you can't use Nvidia hardware (because, for
| example, you are in China); or if you use custom silicon, like
| Groq, and need your own kernels anyway.
|
| - Automatic differentiation. This is one of its weak points,
| because they went for easy instead of optimal, shutting
| themselves off from some architectures. A language like
| Julia, thanks to dynamic low-level compilation, can do things
| PyTorch won't even dream about (though Julia has its own
| problems, mainly related to memory allocations). With
| PyTorch's introduction of the "scan" function [2] we have
| come full circle back to Theano, TensorFlow's/Keras's
| ancestor, where scan was usually the pain point of the same
| flawed automatic differentiation strategy PyTorch chose.
|
| The optimal solution, as all physics PhDs who wrote
| simulations know, is writing custom adjoint code via source
| code transformation or symbolically: it's not hard but very
| tedious, so it's now a great fit for your LLM (or intern, or
| PhD candidate running 'student gradient descent'), provided
| you prove or check that your gradient calculation is correct.
|
| - Cluster orchestration and serialization: a model can be
| shared with fewer security risks than arbitrary source code,
| because you only share weights. A model can be split between
| machines dynamically. But this is also a big weakness,
| because your code rusts as you become dependent on
| versioning: you are locked to the specific version number
| your model was trained on.
|
| [2]
| https://docs.pytorch.org/xla/master/features/scan.html
| morshu9001 wrote:
| What would stop PyTorch from implementing whatever
| optimization trick becomes important? Even if it requires
| a different API.
| GistNoesis wrote:
| There are two types of stops : soft stops, and hard
| stops.
|
| - A soft stop is when the dynamic graph computation overhead
| is too much. You can still calculate, but if you were to
| write the function manually or with a better framework, you
| could be 10x faster.
|
| Typical examples involve manually unrolling a loop, or doing
| kernel fusion. Another typical example is when you have lots
| of small objects or need to do loops in Python because the
| code doesn't vectorize well, or when exploiting sparsity by
| ignoring the zeros.
|
| - A hard stop is when computing the function becomes
| impossible, because the memory needed to do the computation
| in a non-optimal way explodes. Sometimes you can get away
| with just writing customized kernels.
|
| The typical example where you can get away with it are
| custom attention layers.
|
| Typical examples where you can't get away with it are physics
| simulations. For example, the force is the gradient of the
| energy, but you have n^2 interactions between the particles,
| so if you preserve anything more than zero memory per
| interaction during the forward pass, your memory consumption
| explodes. And with things like Lagrangian or Hamiltonian
| neural networks, where you look to discover the dynamics of
| an energy-conserving system, you need to be able to
| differentiate at least three times in a row.
|
| There are also energy-expending stops, where you need to find
| workarounds to make things work, for example if you want your
| parameters to change shape during the optimization process
| (learning point clouds of growing size). These spread you so
| thin that they won't be standardized.
| saagarjha wrote:
| Someone's got to prototype the next generation of
| architectures.
| mxkopy wrote:
| I'm not sure if such an overview exists, but when caffe2 was
| still a thing and JAX was a big contender dynamic vs static
| computational graphs seemed to be a major focus point for
| people ranking the frameworks.
| albanD wrote:
| I would highly recommend the podcast by ezyang
| https://pytorch-dev-podcast.simplecast.com/ for a collection
| of design discussions on the different parts of the library.
| BoredPositron wrote:
| The last few years must have been incredibly exhausting.
| Thanks for your work, good luck, and 73.
| vintermann wrote:
| That man has an infectious enthusiasm. I remember the DCGAN
| inspired me to try getting the (Lua) Torch code to work, and I
| tried it on the Oxford flowers dataset early on. It worked
| surprisingly well, and Soumith Chintala even shared it around in
| social media, surprised at how well it worked on such a small
| dataset. Of course back then we didn't really appreciate the
| problem of mode collapse.
|
| Pytorch and old Lua Torch were a pleasure to work with compared
| to the contemporary TensorFlow. Lots of S.C.'s code was copied
| around liberally; it had its quirks (I remember the DCGAN code
| had a pretty odd way of doing parameter passing) but it was also
| really easy to understand and made random people like me feel
| like we had suddenly stumbled onto something crazy powerful
| (which we had!). It was wonderfully hackable.
| whitten wrote:
| What is the Oxford flowers dataset ? Where is it available ?
| rockinghigh wrote:
| https://www.robots.ox.ac.uk/~vgg/data/flowers/
| utopiah wrote:
| What I find most interesting about this is that it shows they
| believe there is nothing unique at Meta related to AI: no
| resource, neither people nor computing power, that they can't
| get elsewhere for whatever they believe would be more
| interesting for them.
|
| I mention this because it feels analogous to military research,
| where people "dream" of how advanced the military is, how forward
| they are compared to public research... and yet, it seems to be a
| recurring myth they love to sustain.
|
| So the signal I get here is AI "labs" in BigTech have nothing
| worth waiting for around the corner, it's just more of the same
| and boring for people who stick there.
| rtpg wrote:
| I don't think that's the read? Guy says he wants to work on
| something small. If you want to work on something big you
| probably want to be in a big corp to have the resources to do
| the big thing.
|
| Also absolutely unknown if the "new thing" is AI-related at
| all!
| utopiah wrote:
| Well, he left, so whatever is coming next, AI related or not,
| "small" or not, is more exciting to him than whatever he
| could do next with all of Meta's resources. (Small for him
| might be reaching just a million people; he wrote that he
| "lead the software layer that powers the entire AI industry,"
| so his notion of scale is probably unlike mine, and maybe
| yours too.)
|
| Edit: to be clear, I didn't mean to imply their next thing is
| AI related, solely that they obviously know more about AI at
| Meta than e.g. XR at Meta, just because that's their
| expertise.
| hombre_fatal wrote:
| Your assumption is a bad read because it only works if his
| set of life priorities contains _nothing else_ but
| maximizing his impact in the world of AI.
|
| If he has just one other priority in that set (which could
| still include a robotic min/max of AI impact), then your
| assumption fails.
| radicalbyte wrote:
| It reads to me as if he was the victim of office politics and
| decided to say "fuck it" instead of being transferred to
| something else within Meta.
| disgruntledphd2 wrote:
| > It reads to me as if he was the victim of office politics
| and decided to say "fuck it" instead of being transferred
| to something else within Meta.
|
| It looks like he'd already been transferred once (to Infra)
| and maybe didn't want to do it again.
| sheepscreek wrote:
| Pretty crazy/bizarre that a VP/Fellow engineer would have so
| little say in what they do at Meta. In my mind, companies
| would do everything possible to retain them. They are a
| special and rare breed.
| embedding-shape wrote:
| > If you want to work on something big you probably want to
| be in a big corp to have the resources to do the big thing.
|
| If anything, the reverse seems to be true, if you want to
| work on something big, you want to be in a small company,
| sufficiently funded, filled with great people, yet not "big",
| that's when "something big" seems to be more likely to
| happen.
|
| In contrast, as far as I can think, the bigger a company
| gets, the less likely they are to actually come up with
| "something big", it seems like most of the times you need
| (creative) constraints in order for the results to end up
| being actually innovative, otherwise you end up like IBM and
| Meta, throwing money on stuff and getting some results, but
| nothing really out of the ordinary considering what's
| happening elsewhere in their ecosystems.
| jansan wrote:
| > I mention this because it feels analogous to military
| research, where people "dream" of how advanced the military is,
| how forward they are compared to public research... and yet, it
| seems to be a recurring myth they love to sustain.
|
| I don't think that you can read this from the blog post at all,
| but it gives me a chuckle to think how the quest for AGI at
| Meta may be "The Men Who Stare at Goats" all over again.
| utopiah wrote:
| I'm totally speculating. I have no extra information there.
|
| It just makes me think of all the staff, technical staff,
| that left OpenAI recently. Altman was making grand claims
| about what was coming next.
|
| Well, we know what followed; namely, I don't think any
| researcher who left knowing what was in the pipeline feels
| like they missed much in terms of access.
| utopiah wrote:
| Just checked, BTW, and ... the premise looks fun but the
| score is too low
| (https://www.rottentomatoes.com/m/men_who_stare_at_goats).
| Was it actually good as a movie, not just the idea behind it?
| jansan wrote:
| It's more the idea behind it. Considering the great cast,
| the movie could have been much better.
| vintermann wrote:
| The non-fiction book behind it is probably better
| comparison than the film adaptation, if you think Meta
| are doing goat-staring (I don't think they're especially
| bad on this issue compared to their rivals).
| reactordev wrote:
| Negative. What you should have taken away is that it's _the
| people_. He mentions standing up clusters; small shops can't
| afford clusters. Ignore the technical aspect of this article
| and read it for what it is: a thank-you note to the _people_
| he has worked with on amazing projects. Research in a bubble
| of 1 isn't very useful. Research in a small team with a Meta
| budget is extremely useful. With _the right people_.
| nrjames wrote:
| If you can afford to support yourself, which I'm sure he can,
| there's a serenity to working on small projects that are not
| in the public eye. It may simply be that he craves some quiet
| time that enables him to focus on his family and himself.
| ErroneousBosh wrote:
| > where people "dream" of how advanced the military is
|
| If you've ever worked on "advanced military grade" equipment,
| you'd know better.
|
| It tends to be what you'd euphemistically call "well-proven
| technology", built down to a price by the lowest bidder, by
| comparatively unskilled labour.
|
| The most shocking thing about the "captured" Russian drones is
| they use name-brand Raspberry Pis inside. I'm prepared to bet
| the American versions use whatever AliExpress crap is on
| special this week. The UK stuff definitely does.
| embedding-shape wrote:
| Isn't that exactly the point parent was trying to make? Maybe
| I misunderstood their comment, but it seems like you're
| repeating what they said.
| ErroneousBosh wrote:
| Post cup-of-tea (not NATO-spec, just black thanks, tea with
| just tea in it) I realise you're correct.
| utopiah wrote:
| You read it right, I think they agree. Maybe when I wrote
| "dream" in quotes the irony was lost.
| esseph wrote:
| I mean, these things do exist. There are always tons of big
| and small tech projects floating around in the special
| operations community. Cutting-edge sets of hybrid
| night/thermal vision. Classified helicopters. Hand-built
| rifles with custom cartridges. Classified medical tech.
| Advanced fixed wing aircraft with unique capabilities.
| Advanced dive gear. So on.
|
| "Big Army" doesn't see that stuff for decades, if ever, and
| mostly never due to cost. And I'm not even getting into
| classified submarine and nuclear tech, fixed wing drones and
| aircraft flying at night out of known test facilities, etc.
|
| There's tons of actually advanced tech out there in military
| circles.
| compiler-guy wrote:
| Yeah. The DOD is enormous and definitely has your boring
| everyday stuff, but also tons of skunk-works R&D, just not
| very public. An organization that big has all kinds of
| nooks and crannies, so it isn't really that monolithic.
| oxfordmale wrote:
| I think you might be reading a bit too much into this.
|
| He's been with Meta for 11 years and is likely in a very
| comfortable financial position, given the substantial stock
| options he's received over that time.
|
| He also mentioned the arrival of a new child, and it's well
| known that Meta's work-life balance isn't always ideal.
|
| On top of that, Meta, like many major tech companies, has been
| shifting its focus toward LLM-based AI, moving away from more
| traditional PyTorch use cases.
|
| Considering all of this, it seems like a natural time for him
| to move on and pursue new, more exciting opportunities.
| ralusek wrote:
| > toward LLM-based AI, moving away from more traditional
| PyTorch use cases
|
| Wait, are LLMs not built with PyTorch?
| gordonhart wrote:
| GP is likely saying that "building with AI" these days is
| mostly prompting pretrained models rather than training
| your own (using PyTorch).
| SV_BubbleTime wrote:
| Everyone is fine-tuning constantly, though. Training an
| entire model in excess of a few billion parameters is
| pretty much on nobody's personal radar; you have a
| handful of well-funded groups using PyTorch to do that.
| The masses are still using PyTorch, just on small
| training jobs.
|
| Building AI, and building with AI.
| gordonhart wrote:
| Fine-tuning is great for known, concrete use cases where
| you have the data in hand already, but how much of the
| industry does that actually cover? Managers have hated
| those use cases since the beginning of the deep learning
| era -- huge upfront cost for data collection, high
| latency cycles for training and validation, slow reaction
| speed to new requirements and conditions.
| pseudocomposer wrote:
| llama.cpp and Candle are a lot more modern for these things
| than PyTorch/libtorch, though libtorch is still the de facto
| standard.
| vlovich123 wrote:
| Pytorch is still pretty dominant in cloud hosting. I'm
| not aware of anyone not using it (usually by way of vLLM
| or similar). It's also completely dominant for training.
| I'm not aware of anyone using anything else.
|
| It's not dominant in terms of self-hosted where llama.cpp
| wins but there's also not really that much self-hosting
| going on (at least compared with the amount of requests
| that hosted models are serving)
| liuliu wrote:
| That's wrong. llama.cpp / Candle don't offer anything
| that PyTorch cannot do (design-wise). What they offer is a
| smaller deployment footprint.
|
| What's modern about LLMs is the training infrastructure
| and the single-coordinator pattern, which PyTorch has only
| just started on, and which is inferior to many internal
| implementations:
| https://pytorch.org/blog/integration-idea-monarch/
| Anon1096 wrote:
| > On top of that, Meta, like many major tech companies, has
| been shifting its focus toward LLM-based AI, moving away from
| more traditional PyTorch use cases.
|
| This is very wrong. Meta is on the forefront of
| recommendation algorithms and that's all done with
| traditional ML models made using PyTorch.
| oxfordmale wrote:
| Meta is definitely at the forefront of recommendation
| algorithms. However, the leadership team has likely
| shifted its focus to LLMs.
| skybrian wrote:
| Some recommendations are uncanny, except that I don't want
| any of them in my Facebook news feed and no matter how
| often I select "never show me this feed again," it keeps
| trying.
| HarHarVeryFunny wrote:
| > What I find most interesting with this is that it shows they
| believe there is nothing unique at Meta related to AI
|
| Whether or not this is the case, I don't get this as being the
| reason for Soumith leaving - it sounds as if he is just ready
| for a change.
|
| Still, it is noticeable that with many of the AI companies
| claiming that their version of "AGI" is just around the corner,
| developers and staff don't appear to be particularly excited
| about this (I assume they realize it is just hype, not some
| momentous advance around the corner), and leave to pursue
| different things, such as Mira Murati starting a fine-tuning
| company, Karpathy going back to education, others switching
| ship (typically from OpenAI to Anthropic), etc.
| moron4hire wrote:
| "Ready for change" is just the polite way to say, "I can't
| stand it here anymore. I'd rather roll the dice on a new
| place because reversion-to-mean means it's probably going to
| be better than whatever this has become."
|
| There are a lot of things I don't like about my current job,
| but not enough for it to make sense to gamble on a new place.
| It's easier to push for change from my current position than
| to count on any new place being any better.
|
| But if it gets worse and I do leave, I'll definitely be
| telling the interviewer, "I was just ready for a change."
| embedding-shape wrote:
| > is just the polite way to say
|
| Can be*, that's not necessarily always true. I've quit jobs
| plenty of times without having any plan for the future or
| particular drama-reason for leaving, just "It's not as fun
| here anymore, despite this being a great place to work",
| I'm sure I'm not the only one who does so.
|
| What I've never done, though, is leave a place without
| being 100% honest about exactly why I'm leaving. I won't say
| "I was just ready for a change" if that wasn't the reason; I
| have no reason not to be honest about why I'm leaving.
| ghaff wrote:
| I've generally had 10+ year tenures, other than a return to
| school that was always in my plan and the dot-bomb (leaving
| a company I wasn't really a fit with anyway). But, yeah,
| I've always been ready to move on at about that ten-year
| point, which is actually fairly long by a lot of people's
| standards in the tech industry.
|
| I do disagree though that, unless there's some actionable
| change that would specifically benefit you like more
| money, my answer outside of private conversations with
| people I know well is going to be some variant of "time
| for a change." Anything else just invites arguments and
| conversations I don't want to have.
| aprilthird2021 wrote:
| I do want to push back on this a little. People leave all
| the time for this "I wanna see what else is out there"
| especially at such senior levels and with so much financial
| security as he inevitably has from working at Meta for 11
| years. It is not always a gamble for many of them, and many
| of them are not so skeptical and cynical of other places
| they could go and bring their expertise
| pelagicAustral wrote:
| I think age plays an important part in the decision to move
| away from a place. I think in your 20s or very early 30s
| you have far more leeway to kind of go away and start
| again, but a lot of the hope to actually be able to find
| that unicorn workplace fades away as you approach your late
| 30s. Once into your 40s, depending on your trade, you're
| dead on arrival unless you successfully manage to rebrand
| yourself as a consultant, whatever the fuck that means.
| ghaff wrote:
| Age does factor in various ways. It can be "it's now or
| never" or it may be "I might as well hold on for a few"
| or something in between.
| assemblyman wrote:
| On the other hand, while I know nothing about Soumith, he
| clearly has enough financial runway (see my calc below) to
| not have to work again.
|
| As far as I know, we all get one life. If one can help it
| (modulo other constraints), one should not get trapped by
| prestige, achievement, short-term admiration by others,
| impact and external facing factors. To see an alternate
| reality, it helps to escape the bubble, for example, by
| spending time in a completely different culture or
| environment where no one knows or cares about what one did.
|
| I admire people taking such decisions. It's easy to be on
| autopilot in life. People who wear their success lightly
| are rare, but more philosophically aware, in my
| opinion at least. I wish him good luck!
| Mars008 wrote:
| > see an alternate reality, it helps to escape the
| bubble, for example, by spending time in a completely
| different culture
|
| I'm in a similar position now and need to make a decision.
| The problem is that after leaving the IT world for a while,
| it will be hard to get back. I'll have to change my life
| completely and discard all the knowledge and expertise I
| have. That will be fun, interesting, eye-opening, etc., but
| there's no way back.
| mandevil wrote:
| I don't know you, don't know your situation, but this
| does not seem to match the experiences of many of my
| friends who left for a while and then came back. "Spent
| two years starting a restaurant" and "had to take care of
| my parents" were not blockers for getting another
| computer related job in due time. There are few truly
| irrevocable decisions in our life.
|
| Now, the current job market makes this significantly
| harder than it was in the 2010's, but that's floating
| over all of us- if your company does an Amazon tomorrow,
| would you get a job as nice as you currently have? Maybe,
| maybe not.
| ghaff wrote:
| In executive roles, your expertise really is in
| management acumen a lot of the time. But as an individual
| contributor--or adjacent--once you're out of a technical
| space for a few years, it's increasingly hard to get back
| in even if you've casually kept a finger in.
| Mars008 wrote:
| Exactly, the only way to stay current is to keep doing
| something at least half-time. The good thing is it doesn't
| have to be the same as your previous job. Just keep the
| brain working and learning.
| KaiserPro wrote:
| I don't think that's what is being said.
|
| Having friends who are at or near both FAIR and other AI parts
| of Meta, resources are not the issue, anymore at least (there
| had been a massive squeeze for the last two years, though).
| But PyTorch and FAIR use(d) an AWS-based cluster. (PyTorch is
| used almost everywhere else inside Facebook, though. Well,
| not everywhere...)
|
| There is/are plenty of interesting things happening at big
| tech, and Meta specifically. If you like computer vision, then
| Meta is pretty much still the world leader. Much as it pains me
| to say it.
| GuB-42 wrote:
| About the military, from my limited experience, they are
| significantly behind the civilian state of the art, except for
| technology that has few applications outside of the
| military, like stealth.
|
| In fact everything secret tends to be behind. Secrecy is a huge
| burden, and seriously limits all forms of collaboration.
|
| In addition, because military projects are often big and highly
| politicized, you get all the inefficiencies that go with that.
| Classification is also convenient for hiding screwups and
| corruption.
| FuriouslyAdrift wrote:
| Post Cold War, most militaries shifted to COTS and less
| boutique development. Turns out, you only need to put
| resources in a few places to stay ahead (stealth, sensing and
| measuring, space, hypersonics, drones, etc).
|
| It's MUCH cheaper and quicker.
| dmix wrote:
| I just assume all government software is poorly written by
| huge consulting companies, like the famous FBI one
| https://en.wikipedia.org/wiki/Virtual_Case_File
|
| > a 318-page report [...] said the SAIC software was
| incomplete, inadequate and so poorly designed that it would
| be essentially unusable under real-world conditions. Even in
| rudimentary tests, the system did not comply with basic
| requirements
|
| I figured the reason Palantir was so successful was because
| it was a SV software company instead of a defense contractor
| dabbling in IT or specialized government consultancy.
| shadowgovt wrote:
| The military doesn't have the luxury of things being
| unreliable. It puts a pressure on them that corporations
| don't necessarily have: they'd rather have a less-effective
| but proven system than a potentially-more-effective but
| riskier system (especially since each system they have comes
| with massive logistics support).
|
| Ironically, corporations can afford to take _more_ risks of
| failure (financially and project-wise) than militaries
| because failure for them doesn't mean actual human death
| (and when it can, you see processes come in that look a lot
| more like military processes).
| Jtsummers wrote:
| It's actually the commercial/consumer side that gets more
| reliability than the military side.
|
| The military _should_ have very reliable systems, and they
| often know the point at which their systems will fail (MTBF
| calculations are easier to develop with their record
| keeping). However, the military _also_ has an almost
| unlimited budget and body count to keep just reliable
| enough things working much better than they should. It's
| also really bad about actually competing companies against
| each other.
|
| The commercial sector, targeting consumers, is where you
| actually get reliable systems. Why? Because consumers will
| go towards either the cheapest option (reliability is
| replaced with ubiquity in the market, it's replaceable) or
| the more reliable but more expensive options. They
| (individuals) don't have an unlimited budget or unlimited
| time to maintain everything in their life. There's
| competition in the commercial world that's completely
| absent in the military world.
|
| The two major exceptions are where COTS products have taken
| over (definitionally, DOD is using commercial, often
| consumer-targeted, products instead of military specific
| products) and special forces. Special forces often bypasses
| normal acquisitions processes and so ends up having a
| better chance to compete vendors against each other than
| other parts of the military.
|
| This doesn't mean everything the DOD procures through
| normal acquisitions is inherently unreliable, but
| reliability is only one of many factors and often only
| really discovered after selection and full-rate production
| has started. By that point, the DOD is committed to it for
| years to come. Each DOD procurement is separate enough from
| others that you don't even get huge opportunities for
| reuse. The F-35, to pick something from this century,
| didn't get components that were shared with other aircraft
| in the DOD fleet. It's almost all new, which means a lot of
| things were learned about its reliability after it started
| flying. It has new comms, new radar, new almost everything.
| Even the engine (though that probably used many
| subcomponents shared with other engines) was a new engine
| just used by the F-35.
| mandevil wrote:
| I spent a dozen years as a US defense contractor across a
| broad spectrum of places (from R&D for the future to working
| with E3's today), and worked at internet scale and start-up
| B2B stuff in the other dozen years of my working career.
|
| I think that the major difference about deployed military
| technologies- in contrast to both military R&D and the entire
| commercial side- is that they are, by and large, incredibly
| rock solid and reliable. If they aren't, they don't actually
| get used. It takes a lot of effort to get them that way. I
| remember once at a testing ground for our robot tanks of the
| far future, right next door was an outdoor test-track. And
| they were testing a kitchen trailer (a kitchen for ~200 men
| that can be towed by a Humvee). And they drove it around the
| track continuously for three weeks, stopping only long enough
| to change drivers/vehicles, and four times a day they
| would halt and make meals for 200 people, and then pack up and
| get back to driving. This was one of several reliability
| tests that the kitchen trailer had to get through before it
| was accepted for service.
|
| Our R&D stuff couldn't handle that (it needed 3-4 engineers
| to carefully monitor it at all times), but the stuff that
| needed to be in the hands of some random 18 year old with a
| two week training course had to be rock solid to use, do
| regular maintenance on, and fix, even when they were only
| getting four hours of sleep a night. If it wasn't up to that
| level, then the troops ended up ignoring it, leaving it
| behind when they went out to do their job. And by and large,
| from what I could tell, most of the stuff they had was that
| reliable. There were some cool things that we were doing in
| the R&D space, but we were a long way from that level.
| mandevil wrote:
| One thing I meant to add: this extensive testing- and the
| enormous amount of documentation/training materials
| necessary to take an 18 year old with average ASVAB scores
| and produce someone who can cook meals for 200 other
| soldiers on four hours of sleep a night- is both why
| military things cost so much, relative to commercial grade
| stuff, and why they don't get updated particularly often.
| Frequent software updates that change menus around play
| havoc with the detailed training powerpoints that the
| military relies on to produce those 18 year old tool
| operators.
|
| Secret Squirrel projects (which I was near but never read
| into) can get away with lower reliability because they can
| count on the users to be much better trained and prepared,
| though again, from my brief encounters with these sorts,
| they will ignore anything they don't trust to be completely
| reliable. Reliability matters far more than cutting edge
| for like 99.9% of military gear.
| groundzeros2015 wrote:
| Or he just takes for granted the resources he has.
| oofbey wrote:
| Nothing unique? Totally disagree.
|
| Unlimited compute resources aren't literally unique but there
| are only a small handful of places in the world that have that.
|
| Vast quantities of private data, especially text communications
| and images. Very few places have that. Coupled with a culture
| that puts zero privacy protections on that data. Even Google
| likes to think they're doing the right thing, so I think that
| makes Meta unique.
| aabhay wrote:
| For anyone that's curious, the underlying Torch library is also a
| joy to work with, as are the many other torch bindings. For
| example, Rust has tch and Burn which both work with libtorch.
|
| PyTorch of course has the benefit of being dynamically
| debuggable. Can't forget the first time I breakpointed my
| PyTorch model and wrote PyTorch calls in the terminal to
| inspect the behavior. That's still something I miss a lot now
| that I'm working with only "fast" compiled code.
| lysecret wrote:
| I wrote some truly awful code back in the day because of
| that, but god it was glorious.
| irthomasthomas wrote:
| Counterfactual Regret Minimization irl
| numice wrote:
| I read one post on his blog and found that Adam Paszke reached
| out to the author and got an internship. I wonder if it was that
| easy to get an internship at FAIR. I thought they hired only
| PhDs.
| vintermann wrote:
| I didn't know that. Soumith Chintala certainly paid it forward.
| He was very helpful and supportive of random strangers (like
| me!) in the early pytorch days. I count him with Andrej
| Karpathy and Chris Olah as one of the people who really made
| machine learning accessible to regular software engineers.
| SpaceManNabs wrote:
| Chris Olah is the goat.
|
| I reached out to him myself years ago and was surprised at
| getting a response.
|
| And the response was incredibly generous. I probably wouldn't
| have had the confidence to do my switch if it wasn't for
| Olah.
|
| And as I got further into this path, I learned that Olah had
| done the same for some of my mentors and colleagues.
|
| Every time Olah speaks, I listen.
| chrneu wrote:
| You can't do anything if you never try.
| nswizzle31 wrote:
| I was pretty involved in the PyTorch ecosystem in the early
| days around 2016 and Adam was nothing short of a genius and
| prolific developer whose contributions to the codebase and
| community were immense. I think he was like an undergrad in
| Poland at the time. My understanding is that his contributions
| came before the internship, but I don't know.
|
| My memory is that Soumith was really open to other people's
| contributions and questions, no matter their credentials. He
| was a great leader who felt approachable to the open-source
| community.
| gdiamos wrote:
| This is the end of an era. Amazing work, Soumith.
| hshdhdhehd wrote:
| Nice, that is the dream career!
| isusmelj wrote:
| Very proud as a Swiss that Soumith has a .ch domain!
| roflmaostc wrote:
| Probably because his first name is Chintala
| spprashant wrote:
| That'd be his last name
| roflmaostc wrote:
| true haha
| sumedh wrote:
| His homepage says he wants to build a robot. So he is probably
| going to work with robots for his next role.
|
| He is an investor in Anthropic; I didn't know you could do
| that while working for Meta.
| geodel wrote:
| Could be Meta is quite liberal in this area. Or it could be one
| of those "For my friend, anything; for everyone else, it's
| corporate policy."
| ergocoder wrote:
| I wonder how much this guy has earned from Meta in total. Would
| it reach $100M?
| stephenlf wrote:
| Considering Meta was trying to poach AI talent for $250M, I
| wouldn't be surprised if this guy has his own 8-figure income.
| assemblyman wrote:
| If someone made $2 million/year over 10 years, after taxes,
| it would be $1 million (NYC has local taxes too). Let's say
| all of it was saved and invested in SP500 or Meta.
|
| SP500: tripled over 10 years i.e. ~12% a year. Reinvesting
| dividends gives ~14% a year
|
| Meta: 8x over 10 years i.e. ~23% a year.
|
| If growth and compensation/savings were uniform over the 10
| years, the total portfolio would be:
|
| ((1+r)^11 - 1)/r (a geometric series, since each year's
| contributions grow for a different amount of time:
| 1 (this year) + (1+r) (previous year) + (1+r)^2
| (previous-to-previous year), and so on)
|
| SP500: 14% -> $23M
|
| Meta: 23% -> $38M
|
| Now, it's entirely possible, the compensation for a position
| like this runs into $10s of millions and one can easily
| account for non-uniform compensation.
|
| Even in NYC, actually even in Manhattan, $10M is more than
| comfortable for retirement. It lets you draw $300-$400K (3-4%
| per year adjusted for inflation annually). If one is taking a
| short sabbatical, then it's a no-brainer.
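The geometric-series arithmetic above is easy to check in a few lines of Python (illustrative only: the $1M/yr savings figure, the 14% return, and the 10-year horizon are all the commenter's assumptions, not real compensation data):

```python
def portfolio_value(annual_savings_m, r, years):
    # Sum of `years + 1` uniform annual contributions, each
    # compounding at rate r for a different number of years:
    # sum_{k=0}^{years} (1+r)^k = ((1+r)^(years+1) - 1) / r
    return annual_savings_m * ((1 + r) ** (years + 1) - 1) / r

# $1M/yr saved at ~14%/yr (S&P 500 with dividends reinvested), 10 years:
print(round(portfolio_value(1.0, 0.14, 10)))  # 23, i.e. ~$23M
```

This reproduces the commenter's ~$23M S&P 500 figure; the Meta case follows from the same formula with a different r.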
| lovecg wrote:
| This seems to assume unusual optimism or foresight, most
| people don't invest their life savings 100% into stocks and
| don't hold on to 100% of their company vests through ups
| and downs. You might as well say "assuming he put all his
| money in NVDA..."
| assemblyman wrote:
| It's a back-of-the-envelope calculation not a precise
| one. It doesn't take foresight to invest in SP500. DCAing
| (dollar-cost averaging) into an index fund is actually
| the recommended savings strategy, with a short-term cash
| balance covering 6 months-2 years of expenses depending on
| your plans (sabbatical etc.), especially when one is
| decades away from retirement age.
|
| I only included meta because he works/worked at meta and
| it's not unusual for people to just leave their RSUs in
| their accounts after they vest. I agree though that one
| shouldn't pick stocks that happened to explode (e.g.
| nvda).
|
| There are several unrealistic assumptions I did make:
|
| * Presumably when someone starts, they earn less than in
| recent years. He probably wasn't making huge amounts his
| first few years. Amounts invested in earlier years are
| smaller but have more time to compound and amounts
| invested in recent years are larger but have had less
| time to compound.
|
| * Returns aren't constant.
|
| * I pulled the $2 million/yr out of thin air. It could be
| $1 million/yr or even $10 million/yr. I have no idea what
| the overall head of a project like PyTorch would make.
|
| * Everyone's expenses are different. In and around NYC,
| one can live on $80k/year, $120-150k/year, as well as
| on $1 million/yr. I assumed zero since I wanted a nice
| even $1 million/yr savings. Maybe it was $500k/yr of
| savings in which case all the numbers should be halved.
|
| In any case, I can't see how one wouldn't end up with at
| least $10 million in a position like this with 10 years
| at meta. Unless one buys a $5 million unit in Manhattan
| and is burdened by a high mortgage.
| kleiba wrote:
| You forgot to thank Jurgen. /scnr
| jsrozner wrote:
| Is this also partially AI generated? What's with the repeated
| short phrases? Is this just everyone's style now?
| Cthulhu_ wrote:
| You're asking a lot of questions but are you willing to think
| about it? For one, no it's not "everyone's style" because you
| wouldn't have asked whether it was, you'd know.
| kelvinjps10 wrote:
| The writing on this post felt very human to me.
| ishouldbework wrote:
| Look, I get that _some_ pages require javascript, but
|
| <style class="fallback">body{visibility:hidden;
| white-space:pre;font-family:monospace}</style>
|
| which is then unset by JS, with no <noscript> anywhere, is
| just... I just get a white page.
|
| Changing it to
|
| <style class="fallback">body{white-space:pre-wrap;
| font-family:monospace}</style>
|
| gives a perfectly readable page, so it seems a bit... pointless.
| philipwhiuk wrote:
| There's no context around 'FAIR' - is it
| https://www.go-fair.org/ ?
| abracos wrote:
| it's https://ai.meta.com/research/
| aprotyas wrote:
| Facebook Artificial Intelligence Research, c.f.
| https://engineering.fb.com/category/ai-research/#:~:text=Art...
| overfeed wrote:
| Now retconned as "Fundamental AI Research" since the Meta
| rebrand.
| stuxnet79 wrote:
| Interesting, is Yann LeCun still heading FAIR? From the
| outside looking in, it seems like he is getting sidelined.
| shevy-java wrote:
| To me it sounds as if he is trying to open a new chapter in his
| life. Good for him, but I wonder if everything was really as
| smooth as described. People often write as though everything is
| always perfect on their blogs. Well - it could be. But it could
| also be that not everything was perfect and it simply wasn't
| described on the blog.
| ninjagoo wrote:
| Soumith's 2nd release?
| https://github.com/pytorch/pytorch/releases/tag/v0.1.1
|
| Also, looking at the contribution history for a long career is
| very interesting; reflects the changing roles over time
| https://github.com/soumith
| xpe wrote:
| It is notable (but perhaps not surprising) that this is mostly
| about the people and the work itself. The writing is silent on
| the downstream impacts on the world. In contrast, there are
| fields (global development, medicine, etc.) where people tend to
| focus on the impact on humanity (especially when reaching a
| milestone in their career).
| CommenterPerson wrote:
| Firstly, Good work.
|
| Ironically, one HN front-page item today is this: "Meta
| projected 10% of 2024 revenue came from scams and banned goods,
| Reuters reports"
|
| Glad you're leaving, hopefully you're in a good place
| financially. Take a page from Bill Gates and work on something
| that attempts to improve society. Stay away from surveillance
| capitalism and enshittification.
| odyssey7 wrote:
| > It's taught in classrooms from MIT to rural India. The tools I
| dreamed about making accessible? They are. The barrier to entry I
| wanted to lower? It's almost gone.
|
| I have an ironic sense that there are classrooms in rural India
| with better pedagogy and lower barriers to entry than some of our
| elite engineering programs.
| john01dav wrote:
| Many elite engineering programs in the United States (I don't
| know if this is what you mean by "our") are elite solely due to
| social status (rankings need to publish rankings that feel
| right, or they're ignored, and they accept bribes to rank
| specific programs) and research output, with little to do with
| quality of pedagogy. Instead, pedagogy is generally poor
| because the elite researchers usually view teaching as a chore
| and many don't have any real skill in it either.
| crazygringo wrote:
| I thought traditional Indian pedagogy was heavily criticized
| for relying on rote memorization over conceptual
| understanding and problem solving, and for being heavily
| hierarchical and exam-oriented.
|
| This isn't to say engineering programs in the US can't be
| improved, but there seems to be widespread consensus that they
| don't suffer from the kinds of serious problems that ones in
| India commonly do.
| mmaunder wrote:
| " What's next for me? Something small. Something new. Something I
| don't fully understand yet. Something uncomfortable. I could have
| moved to something else inside Meta. But I needed to know what's
| out there. I needed to do something small again. I couldn't live
| with the counterfactual regret of never trying something outside
| Meta."
|
| Shades of Siddhartha. Back to the forest.
| galoisscobi wrote:
| Ah yes, shades of Siddhartha. I almost forgot about the part
| where he worked for a megacorp that was ripping society's
| social fabric apart and wanted to do something else for a
| while.
| abustamam wrote:
| I don't think he was involved in that though.
| dbgrman wrote:
| He was. He didn't just parachute into Meta to start working
| on PyTorch; he worked in many areas of the product, was a
| member of the senior technical staff, and was knowledgeable
| about many aspects of the company.
| cs702 wrote:
| Many of the comments here are judging PyTorch _in hindsight_,
| which is unfair.
|
| When Soumith Chintala co-launched PyTorch, and for many years
| after, the alternatives for fast, interactive, convenient
| development were _much worse_. There was no Jax.
|
| Every single AI researcher I know, including me, who _tried_
| PyTorch back then _immediately wanted to switch to it, because it
| was so much better_. Andrej Karpathy described what PyTorch felt
| like back then when he tweeted in May 2017, "I've been using
| PyTorch a few months now and I've never felt better. I have more
| energy. My skin is clearer. My eyesight has improved."[a]
|
| THANK YOU SOUMITH for your hard work over all these years! Your
| hard work has made a difference for a huge number of people,
| including many of us here on HN.
|
| We wish you success in your future endeavors, whatever they turn
| out to be!
|
| Please ignore all the petty criticism.
|
| ---
|
| [a] https://x.com/karpathy/status/868178954032513024
| golly_ned wrote:
| There was Chainer, which originated the define-by-run model
| that made PyTorch so effective. It was developed by a much
| smaller, much less influential company in Japan. Early
| PyTorch was transparent about the debt it owed to Chainer.
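[Editor's note: for readers unfamiliar with "define-by-run": the computation graph is recorded as ordinary code executes, rather than declared ahead of time as in TF1's static graphs, and gradients flow backward over the recorded tape. A toy scalar sketch of the idea in pure Python; this is an illustration of the concept, not the real Chainer or torch API.]

```python
# Toy define-by-run autograd: each operation records its parents and a
# local backward rule while the forward pass runs; backward() replays
# the recorded tape in reverse, applying the chain rule.

class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._backward_fn = None         # propagates grad to parents

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def backward_fn():
            self.grad += out.grad
            other.grad += out.grad
        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def backward_fn():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically order the recorded tape, then run the chain
        # rule from the output back toward the leaves.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            if v._backward_fn:
                v._backward_fn()

# y = x*x + 3x at x = 2  ->  y = 10, dy/dx = 2x + 3 = 7
x = Value(2.0)
y = x * x + x * 3.0
y.backward()
print(y.data, x.grad)  # 10.0 7.0
```

The point the commenters are making is that the graph above exists only because the Python code ran: loops, branches, and debuggers all just work, which is what made this style so much friendlier than static graphs.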
| maxc01 wrote:
| Yes, exactly--not many people know about Chainer nowadays.
| Back in 2016, PyTorch's interface was actually inferior to
| Chainer's, and I think Chainer's design was really ahead of
| its time.
| NalNezumi wrote:
| The company is called Preferred Networks, and they're still
| around, and have some branched-off subsidiaries too.
| cs702 wrote:
| Thanks. Yes, I remember Chainer, but only vaguely. I kinda
| remember looking at it, but not actually using it.
|
| My recollection is that when I looked at Chainer back then,
| it didn't offer a comprehensive library of preexisting
| components for deep learning. When I tried PyTorch, on the
| other hand, I vividly remember it as _already_ having lots of
| prebuilt components (common layers, activation functions,
| etc.) in `torch.nn`, so it was easier and faster to get
| going.
|
| These memories are vague, so I could be wrong.
| Teodolfo wrote:
| PyTorch was partly inspired by the Python Autograd library
| (circa 2015 [1]) to the point where they called their autodiff
| [2] system "autograd" [3]. Jax is the direct successor of the
| Autograd library and several of the Autograd developers work on
| Jax to this day. For that matter, PyTorch co-author Adam
| Paszke is currently on the JAX team and seems to work on
| JAX and Dex these days.
|
| [1] https://pypi.org/project/autograd/#history
|
| [2]
| https://www.cs.toronto.edu/~rgrosse/courses/csc421_2019/read...
|
| [2]
| https://web.archive.org/web/20170422051747/http://pytorch.or...
| cs702 wrote:
| Yes, PyTorch borrowed from Autograd, Chainer, etc.
|
| ...but PyTorch felt friendlier and more Pythonic, and it came
| with a comprehensive library of prebuilt components for deep
| learning in `torch.nn`.
|
| See https://news.ycombinator.com/item?id=45848768
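[Editor's note: the "prebuilt components" point above is about composable layer objects that own their parameters, in the spirit of torch.nn. A hypothetical pure-Python stand-in, not the real torch.nn API; the names `Linear`, `relu`, and `Sequential` mirror their torch.nn counterparts for illustration.]

```python
# Sketch of the module abstraction: layers are objects that hold their
# own weights and compose by simple chaining, so a model is built from
# prebuilt parts instead of hand-written math.
import random

class Linear:
    """A dense layer holding its own weights, like torch.nn.Linear."""
    def __init__(self, n_in, n_out):
        self.w = [[random.gauss(0.0, 0.1) for _ in range(n_in)]
                  for _ in range(n_out)]
        self.b = [0.0] * n_out

    def __call__(self, x):
        # One output per row of weights: w . x + b
        return [sum(wi * xi for wi, xi in zip(row, x)) + b
                for row, b in zip(self.w, self.b)]

def relu(x):
    """Elementwise max(0, v), like torch.nn.functional.relu."""
    return [max(0.0, v) for v in x]

class Sequential:
    """Chains callables in order, like torch.nn.Sequential."""
    def __init__(self, *layers):
        self.layers = layers

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

# A two-layer network assembled entirely from prebuilt pieces.
model = Sequential(Linear(4, 8), relu, Linear(8, 2))
out = model([1.0, 2.0, 3.0, 4.0])
print(len(out))  # 2
```

This is the "batteries included" quality the commenters recall: common layers and activations shipped ready to snap together, so getting a model running took minutes rather than starting from raw tensor math.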
| jmward01 wrote:
| All I can say is 'thanks!'. It does take a team, and a community,
| but individuals matter. I use pytorch daily, it has made it
| possible for me to play with ideas that I would have only dreamed
| of. It is a big part of my life so, thanks for your contributions
| and best of luck on the next thing!
| RobRivera wrote:
| Good luck, good job, and may more fantastical journeys lie ahead.
| jurschreuder wrote:
| It's just a Python wrapper around Torch, an open-source C/Lua
| library from a Swiss non-profit, so Meta could compete with
| Google's TensorFlow.
|
| I don't know why this is celebrated so much, a big company
| rebranding a non-profit for profit.
|
| But I guess that's the norm for AI now.
| whymauri wrote:
| Wait, PyTorch and the ecosystem are much more than just that.
| You can't be serious?
| jurschreuder wrote:
| It's just Torch for idiots
| w10-1 wrote:
| Given the technical achievements and industry impact, I'm struck
| that his emphasis is on the people, and in particular growing
| people to take on challenges. His influence might lie as much in
| what they go on to do. The connection linking them seems to be
| building curiosity and wonder into delight for users.
|
| If there's a soul to Silicon Valley, that's it. However many
| jerks and political/power players I encounter, I remain inspired
| by the few like this.
| mupuff1234 wrote:
| https://news.ycombinator.com/item?id=45847465
|
| Seems relevant.
| jeffreysmith wrote:
| I'm one of the many people who Soumith hired to Meta and PyTorch.
| I had the privilege of working on PyTorch with him and lots of
| the folks on this post.
|
| As his longtime colleague, the one thing I would want people to
| know about him and this decision is that Soumith has always
| viewed PyTorch as a community project. He consistently celebrated
| the contributions of his co-creators Adam and Sam, and he
| extended the same view towards Yangqing and the Caffe2 crew
| that we merged into PyTorch. At the very beginning, by Soumith's
| highly intentional design, PyTorch was aimed at being truly
| developed by and for the AI research community, and for many years
| that was the key way in which we grew the framework, FB PT team,
| and the wider community. At every single stage of PT's lifecycle,
| he always ensured that our conception of PT and its community
| grew to include and celebrate the new people and organizations
| growing what was possible with PT. He's an incredible talent
| magnet, and thus more and more smart people kept dedicating their
| blood, sweat, and tears to making PT bigger and better for more
| people.
|
| I've worked with some very well known and highly compensated
| leaders in tech, but *no one* has done the job he has done at
| ameliorating the bus-factor problem with his baby. PT has a unique
| level of broad support that few other open source technologies
| can match. In a world of unbounded AI salaries, people who want to
| move AI research methods forward still freely give their time and
| attention to PyTorch and its ecosystem. It's the great lever of
| this era of AI that is moving the world, *due in large part* to
| the strength of the community he fostered and can now let
| continue without his direct involvement.
|
| His departure is the end of an era, but it's also operationally a
| true non-event. PyTorch is going strong and can afford to let one
| of its creators retire from stewardship. This is precisely what
| success looks like in open source software.
|
| He deserves our congratulations and our thanks. Enjoy your PT
| retirement, man.
| casualscience wrote:
| Also worked with Soumith. The man is a legend, moves mountains
| and completely changed the course of my career because he liked
| something I wrote. No arrogance, no politics, just an extremely
| down to earth and chill guy who elevates everyone around him.
|
| I wish him the best!
| sumedh wrote:
| What did you write?
| warbaker wrote:
| I just want to say: thank you. It is hard to overstate how much
| Pytorch has accelerated ML/AI development, across the board.
| livelaughlove69 wrote:
| What an amazing impact this man has had. He seems like a great
| guy too. I don't know him at all personally but his replies
| online have always been so nice. I really wish him all the best
| with whatever he does with his time next.
___________________________________________________________________
(page generated 2025-11-07 23:00 UTC)