[HN Gopher] PyTorch 2.0
___________________________________________________________________
PyTorch 2.0
Author : lairv
Score : 240 points
Date : 2022-12-02 16:17 UTC (6 hours ago)
(HTM) web link (pytorch.org)
(TXT) w3m dump (pytorch.org)
| singularity2001 wrote:
| Not available for Mac M1 yet (?)
| amelius wrote:
  | One thing I'm noticing lately is that these DL libraries and
  | their supporting libraries are getting unwieldily large and
  | difficult to version-manage.
|
| In my mind, DL is doing little more than performing some inner-
| products between tensors, so I'm curious why we should have
| libraries such as libcudnn, libcudart, libcublas, torch, etc.
| containing gigabytes of executable code. I just checked and I
| have 2.4GB (!!) of cuda-related libraries on my system, and this
| doesn't even include torch.
|
| Also, going to a newer minor version of e.g. libcudnn might cause
| your torch installation to break. Why isn't this backward
| compatible?
| modeless wrote:
| The complexity of deep learning algorithms is low but the
| complexity of the _hardware_ is high. The problem solved by
| these gigabytes of libraries is getting peak utilization for
| simple algorithms on complex and varied hardware.
|
| CuDNN is enormous because it embeds precompiled binaries of
| many different compute kernels, times many variations of each
| kernel specialized for different data sizes and/or fused with
| other kernels, and again times several different GPU
| architectures.
|
| If you don't care about getting peak utilization of your
| hardware you can run state of the art neural nets with a truly
| tiny amount of code. The algorithms are so simple you don't
| even need any libraries, it's easy enough to write everything
| from scratch even in low level languages. It's a fun exercise.
| But it will be many orders of magnitude less efficient so
| you'll have to wait a really long time for it to run.
| amelius wrote:
| Ok, is there any way to trim down the amount of code used
| without reducing the performance of my particular
| application, and my particular machine?
|
| I have the feeling that it's an all-or-nothing proposition.
| Either you have a simple CPU-only algorithm, or you have
| several gigabytes of libraries you don't really need.
|
| Also, in some applications I would be willing to give up 10%
| of performance if I could reclaim 90% of space.
| modeless wrote:
        | CuDNN is only for Nvidia GPUs, and those machines generally
        | have decent-sized disks and decent network connections, so
        | nobody cares about a few GBs of libraries. There are
| alternatives to using CuDNN with much smaller binary size.
| Maybe they can match or beat it or maybe not, depending on
| your model and hardware. But you'll have to do your own
| work to switch to them, since most people are happy enough
| with CuDNN for now.
|
| The real problem with deep learning on Nvidia is the Linux
| driver situation. Ugh. Hopefully one day they will come to
| their senses.
| amelius wrote:
| It's not just disk size. Also memory size, and loading
| speed.
|
| Yes, I agree about the driver situation.
| modeless wrote:
| The disk size of the shared library is not indicative of
| RAM usage. Shared libraries are memory mapped with demand
| paging. Only the actually used parts of the library will
| be loaded into RAM.
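The demand-paging behaviour described above can be sketched with Python's stdlib `mmap` (the sparse temp file here stands in for a large shared library; only the pages actually touched are faulted into RAM by the OS):

```python
import mmap
import os
import tempfile

# A 16 MiB file standing in for a multi-gigabyte shared library.
path = os.path.join(tempfile.mkdtemp(), "fakelib.so")
with open(path, "wb") as f:
    f.seek(16 * 1024 * 1024 - 1)  # written sparsely
    f.write(b"\x00")

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first_page = mm[:4096]  # only the touched page(s) get loaded
    mm.close()
```

This is the same mechanism the dynamic loader uses for `.so` files, which is why on-disk size overstates resident memory use.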
| claytonjy wrote:
| While I think you raise important points about the dominance
| of hardware optimizations, I think you're massively
| overstating the simplicity of the algorithms.
|
| Sure, it's easy to code the forward pass of a fully connected
| neural network, but writing code to train a useful modern
| architecture is a very different endeavor.
| brrrrrm wrote:
| I disagree, the burden is almost exclusively maintaining
| fast implementations of primitive operators for all
| hardware. These ML libraries are collections of pure
| functions with minimal interfaces. There's very little code
| interdependence and it's not particularly difficult to
| implement modern algorithms to train networks.
|
| full stable diffusion in <800 lines: https://github.com/geo
| hot/tinygrad/blob/4fb97b8de0e210cc3778...
|
| autograd in <30 lines: https://github.com/geohot/tinygrad/b
| lob/4fb97b8de0e210cc3778...
|
| Adam in <20 lines: https://github.com/geohot/tinygrad/blob/
| 4fb97b8de0e210cc3778...
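For a sense of scale, a scalar reverse-mode autograd really does fit in about 30 lines. This is a micrograd-style sketch, not tinygrad's actual code:

```python
class Value:
    # Minimal scalar reverse-mode autograd node.
    def __init__(self, data, parents=()):
        self.data, self.grad = data, 0.0
        self._parents, self._backward = parents, lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():
            self.grad += out.grad          # d(a+b)/da = 1
            other.grad += out.grad         # d(a+b)/db = 1
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = backward
        return out

    def backward(self):
        # Topological sort, then apply the chain rule output-first.
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

x, y = Value(3.0), Value(4.0)
z = x * y + x  # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
```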
| modeless wrote:
| I disagree. I mean it's not trivial but it is completely
| within reach of a single person. The only part you'd really
| need to lean on libraries for would be data loading (e.g.
| jpeg). The core neural net stuff really is not that
| complex, even in the latest architectures like transformers
| or diffusion models. Look at stuff like George Hotz's
| tinygrad or Andrej Karpathy's makemore.
| poorman wrote:
  | Still no support for/from Apple Silicon?
| cube2222 wrote:
| It's supported since 1.12[0], no?
|
| [0]: https://pytorch.org/blog/introducing-accelerated-pytorch-
| tra...
| chadykamar wrote:
| It's also officially in beta as of 1.13
| https://pytorch.org/blog/PyTorch-1.13-release/#beta-
| support-...
| [deleted]
| hintymad wrote:
| A big lesson I learned from PyTorch vs other frameworks is that
| productivity trumps incremental performance improvement. Both
  | Caffe and MXNet marketed themselves as being fast, yet
  | apparently being faster here and there by some percentage simply
| didn't matter that much. On the other hand, once we make a system
| work and make it popular, the community will close the
| performance gap sooner than competitors expect.
|
| Another lesson is probably old but worth repeating: investment
  | and professional polishing matter for open source projects. Meta
| reportedly had more than 300 (?) people working on PyTorch,
| helping the community resolve issues, producing tons of high-
| quality documentation and libraries, and marketing themselves
| nicely in all kinds of conferences and media.
| wging wrote:
| > Meta reportedly had more than 300 (?) people working on
| PyTorch,
|
| How much did this change after the big Meta layoffs? I think I
| know people who are no longer there, but I haven't talked to
| them about it yet.
| PartiallyTyped wrote:
| NB: PyTorch is now under the Linux Foundation.
|
| https://news.ycombinator.com/item?id=32810976
| not2b wrote:
| Because Meta open sourced it, and because PyTorch caught on,
| hopefully some of those laid off people can continue to work
| on it and also market themselves as PyTorch experts.
| SleekEagle wrote:
| Exactly, especially in the age of ridiculously rapid
| development that we have found ourselves in over the past few
    | years. This is exactly why TensorFlow is dying.
| la_fayette wrote:
      | :) It is still impossible to bring a CNN/RNN network in
      | PyTorch to mobile, which works fine with tflite...
| levesque wrote:
| I don't think that's a problem for the vast majority of
| pytorch users
| vikinghckr wrote:
| What I really find interesting here is that PyTorch, a library
| maintained by Facebook, is winning the marketshare and
| mindshare due to clean API, whereas Tensorflow, maintained by
| Google, is losing due to inferior API. In general, Google as a
| company emphasizes code quality and best practices far more
| than Facebook. But the story was reversed here.
| miohtama wrote:
| An interesting question. Maybe Google lacks the culture to
| work with external developers (think Android) while Facebook
| has some of it.
| version_five wrote:
    | Pure speculation, but isn't Google the king of "beta" releases
    | that demonstrate a concept but mostly end up with a 90%
    | solution that doesn't meet the mark for a finished project?
|
| I'm sure there are other factors as well, but it doesn't
| surprise me that Google made something that started off
| promising and then underdelivered
| marban wrote:
| From what I see, still no 3.11 support -- Same for Tensorflow
| which won't ship before Q1 23.
| joelfried wrote:
| Who is still supporting Windows 3.11?
| sairahul82 wrote:
| It's python 3.11 :)
| robertlagrant wrote:
| Guido works for Microsoft, so Python 95 will be out soon.
| sgt wrote:
| Speaking of Guido, be sure to check out the podcast with
| Lex Fridman. Guido is such a down to Earth guy.
| itgoon wrote:
| I'm going to skip Python ME.
| minimaxir wrote:
    | It's not a huge deal, as the speed improvements in 3.11 likely
    | wouldn't trickle down to the core PyTorch level.
| black3r wrote:
| but if you do some data pre-processing or post-processing in
| python, that would be affected by 3.11 speed improvements...
| or if you have a pytorch based model integrated into a bigger
| application as just one of many features, there are still
| some devs who prefer monoliths over microservices....
| ansk wrote:
| So this looks like a further convergence of the tensorflow and
| pytorch APIs (the lower-level APIs at least). Tensorflow was
| designed with compilable graphs as the primary execution model
| and as part of their 2.0 release, they redesigned the APIs to
| encompass eager execution as well. Pytorch is coming from the
| other end, with eager execution being the default and now
| emphasizing improved tools for graph compilation in their 2.0
| release. The key differentiator going forward seems to be that
| tensorflow is using XLA as their compiler and pytorch is
| developing their own toolset for compilation. As someone who
| cares far more about performance than API ergonomics, the quality
| of the compiler is the main selling point for me and I'll gladly
| switch over to whatever framework is winning in the compiler
| race. Does anyone know of any benchmarks comparing the
| performance of pytorch's compilers with XLA?
| amelius wrote:
| How much performance can be squeezed from going from the plain
| python API to the graph-based solution, typically?
| ansk wrote:
| This varies quite a bit based on the type of model. The
| graph-based approach has two benefits: (1) removing overhead
| from executing python between operations and (2) enabling
| compilers to make optimizations based on the graph structure.
| The benefit from (1) is relatively modest for models which
| run a few large ops in series (e.g. image classifiers and
| most feedforward models) but can be significant for models
| with many ops that are smaller and not necessarily wired up
| sequentially (e.g. RNNs). In my experience, I've had RNN
| models run several times faster in tensorflow's graph mode
| than in its eager mode. The benefit from (2) is significant
| in almost any model since the typical "layer" building block
| (matmul/conv/einsum->bias->activation) can be fused together
| which improves throughput on GPUs. In my experience
| compilation can offer performance increases from 1.5x to 3x,
| but I don't know if this holds generally. Also note that the
| distinction between graph and eager execution can be somewhat
| blurry, as even an "eager" API could be calling a fused layer
| under the hood.
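The fusion benefit in (2) can be sketched in pure Python: the unfused version makes three passes over the data and materializes two intermediate buffers (analogous to launching three separate GPU kernels), while the fused version makes one pass. The numbers are toy data:

```python
# Unfused: three passes, two intermediate buffers.
def layer_unfused(xs, w, b):
    t1 = [x * w for x in xs]           # matmul-like step
    t2 = [t + b for t in t1]           # bias add
    return [max(0.0, t) for t in t2]   # activation (ReLU)

# Fused: one pass, no intermediates -- the form a graph compiler
# can emit once it sees the whole matmul->bias->activation pattern.
def layer_fused(xs, w, b):
    return [max(0.0, x * w + b) for x in xs]

out = layer_fused([1.0, -2.0, 3.0], 2.0, -1.0)
```

On a GPU the win comes from avoiding round-trips to memory for the intermediates, not from saving Python overhead, but the structure of the transformation is the same.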
| PartiallyTyped wrote:
| It depends.. Using Jax to compile down to XLA, I often saw >2
| orders of magnitude improvements. This however was roughly 6
| months ago.
| chazeon wrote:
  | How does PyTorch compare to JAX and its stack?
| PartiallyTyped wrote:
| I find that Jax tends to result in messy code unless you build
| good abstractions. I personally don't like Flax and Haiku, I
| prefer stax and Equinox as they are more transparent on what is
| happening, feel a lot less like magic, and more pythonic
| (explicit is better than implicit etc).
|
| PyTorch is far more friendly for deep learning stuff, but
| sometimes all you want is pure numerical computations that can
| be vmapped across tensors, and this is where jax shines imho.
|
| Personal Example: I needed to sample a bunch of datapoints,
| make distributions out of them, sample, and then compute the
| density of each sample across distributions. Doing this with
| pytorch was rather slow, I was probably doing something wrong
| with vectorization and broadcasting, but I didn't have the time
| to figure it out.
|
| With jax, I wrote a function that produces the samples, then I
| vmapped the evaluation of a sample across all distributions,
| then vmapped over all samples. Took a couple of minutes to
| implement and seconds to execute.
|
| PyTorch also has the advantage of a far more mature ecosystem,
| libraries like Lightning, Accelerate, Transformers, Evaluate,
| and so on make building models a breeze.
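The double-vmap pattern described in the example above, written as plain Python loops (in JAX, each comprehension would become a `jax.vmap` and both loops vanish into one vectorized computation). The distributions and samples here are made up:

```python
import math

def normal_logpdf(x, mu, sigma):
    return (-0.5 * ((x - mu) / sigma) ** 2
            - math.log(sigma * math.sqrt(2.0 * math.pi)))

mus, sigmas = [0.0, 1.0, 2.0], [1.0, 0.5, 2.0]  # three distributions
samples = [0.3, 1.7, -0.2, 2.5]

# Inner comprehension = "vmap over distributions";
# outer comprehension = "vmap over samples".
log_densities = [[normal_logpdf(x, m, s) for m, s in zip(mus, sigmas)]
                 for x in samples]
```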
| zone411 wrote:
| > Personal Example: I needed to sample a bunch of datapoints,
| make distributions out of them, sample, and then compute the
| density of each sample across distributions. Doing this with
| pytorch was rather slow, I was probably doing something wrong
| with vectorization and broadcasting, but I didn't have the
| time to figure it out.
|
| You probably were not doing anything wrong. I spent a lot of
| time trying to be clever in order to parallelize things like
| this and it just wasn't possible without doing CUDA
| extensions. But it is now! PyTorch now has vmap through
| functorch and it works.
| staunch wrote:
| PyTorch and JAX are both open-source libraries for developing
| machine learning models, but they have some important
| differences. PyTorch is a more general-purpose library that
| provides a wide range of functionalities for developing and
| training machine learning models. It also has strong support
| for deep learning and is used by many researchers and companies
| in production environments.
|
| JAX, on the other hand, is designed specifically for high-
| performance machine learning research. It is built on top of
| the popular NumPy library and provides a set of tools for
| creating, optimizing, and executing machine learning algorithms
| with high performance. JAX also integrates with the popular
| Autograd library, which allows users to automatically
| differentiate functions for training machine learning models.
|
| Overall, the choice between PyTorch and JAX will depend on the
| specific requirements and goals of the project. PyTorch is a
| good choice for general-purpose machine learning development
| and is widely used in industry, while JAX is a better choice
| for high-performance research and experimentation.
|
| https://chat.openai.com/chat
| satvikpendem wrote:
| It seems to use the same type of template for comparisons:
|
| React and Vue are both JavaScript libraries for building user
| interfaces. The main difference between the two is that React
| is developed and maintained by Facebook, while Vue is an
| independent open-source project.
|
| React uses a virtual DOM (Document Object Model) to update
| the rendered components efficiently, while Vue uses a more
| intuitive and straightforward approach to rendering
| components. This makes Vue easier to learn and use,
| especially for developers who are new to front-end
| development.
|
| React also has a larger community and ecosystem, with a wider
| range of available libraries and tools. This can make it a
| better choice for larger, more complex projects, while Vue
| may be a better fit for smaller projects or teams that prefer
| a more lightweight and flexible approach.
|
| Overall, the choice between React and Vue will depend on your
| specific project requirements and personal preferences. It's
| worth trying out both to see which one works better for you.
| whimsicalism wrote:
| I was reading this and thinking it was a pretty terrible
| answer - glad it is just generated by an AI and not you
| personally so I'm not insulting you.
|
| JAX is basically numpy on steroids and lets you do a lot of
| non-standard things (like a differentiable physics simulation
| or something) that would be harder with Pytorch.
|
| They are both "high-performance."
|
| Pytorch is more geared towards traditional deep learning and
| has the utilities and idioms to support it.
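A minimal example of the "numpy on steroids" workflow being described: an ordinary numpy-style function that JAX can differentiate and JIT-compile unchanged (the function itself is arbitrary; requires `jax` installed):

```python
import jax
import jax.numpy as jnp

# A plain numpy-style function...
def f(x):
    return jnp.sum(x ** 2)

# ...differentiated and compiled to XLA without modification.
grad_f = jax.jit(jax.grad(f))
g = grad_f(jnp.array([1.0, 2.0, 3.0]))  # d/dx sum(x^2) = 2x
```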
| brap wrote:
| I'm not sure why, but I realized it was AI from the very
| first sentence, not exaggerating. It's just not something
| someone on HN would write.
| eastWestMath wrote:
| It reminded me of the sort of lazy Wikipedia
| regurgitation that a lot of undergrads used to give when
| I was teaching. So it is a bit jarring to see a response
| like that in a non-compulsory setting.
| windsignaling wrote:
| Yup. Reminds me of an article you'd find in the top 10
| Google search results...
| dekhn wrote:
| jax is not numpy on steroids. jax is "use python
| idiomatically to generate optimized XLA code for evaluating
| functions both forward and backward."
| whimsicalism wrote:
| Probably the primary use of jax is `jax.numpy` which is
| XLA accelerated and differentiable numpy.
|
| I'll admit that saying "basically numpy on steroids"
| might have been an overreduction. It is a system for
| function transformations that is built on XLA and
| oriented towards science & ML applications.
|
| It's not just me saying stuff like this.
|
| Francois Chollet (creator of Keras): "[jax is] basically
| Numpy with gradients. And it can compile to XLA, for
| strong GPU/TPU acceleration. It's an ideal fit for
| researchers who want maximum flexibility when
| implementing new ideas from scratch."
| dekhn wrote:
| Yes- and that gradient part is a key detail that makes it
| more than "numpy on steroids". numpy on steroids would be
| a hardware accelerator that took numpy calls and made
| them return more quickly, but without the command-and-
| control and compile-python-to-xla aspects.
| whimsicalism wrote:
| Well clearly I meant steroids of the gradient-developing
| variety.
|
| I think you are being far too pedantic about what a
| biological compound would analogously do to a software
| library, especially given that I mention the
| differentiability property in the same sentence you are
| taking issue with.
| dekhn wrote:
| OK, actually as long as it's gradient-developing
| steroids, I'll allow it.
| uoaei wrote:
| Can someone comment more on what makes JAX that much better
| for differentiable simulations than PyTorch?
|
| I'm working on a new module for work and none of my
| colleagues have much experience developing ML per se. I'm
| trying to decide whether to force their hand by
| implementing v1 in PyTorch or JAX and differentiable
| physics simulations is a likely future use case. Why is
| PyTorch harder?
| patrickkidger wrote:
| At least prior to this announcement: JAX was much faster
| than PyTorch for differentiable physics. (Better JIT
| compiler; reduced Python-level overhead.)
|
| E.g for numerical ODE simulation, I've found that Diffrax
| (https://github.com/patrick-kidger/diffrax) is ~100 times
| faster than torchdiffeq on the forward pass. The backward
| pass is much closer, and for this Diffrax is about 1.5
| times faster.
|
| It remains to be seen how PyTorch 2.0 will compare, of
| course!
|
| Right now my job is actually building out the scientific
| computing ecosystem in JAX, so feel free to ping me with
| any other questions.
| adgjlsfhk1 wrote:
| If you care about performance of differential physics you
| shouldn't use python. Diffrax is almost OKish, but is
| missing a ton of features (e.g. good stiff solvers,
| arbitrary precision support, events for anything other
| than stopping the simulation, ability to control the
| linear solve which are needed for large problems). For
| simple cases it can come close to the C++/Julia solvers,
| but for anything complicated, you either won't be able to
| formulate the model, or you won't be able to solve it
| efficiently.
| patrickkidger wrote:
| > If you care about performance
|
| This definitely isn't true. On any benchmark I've tried,
| JAX and Julia basically match each other. Usually I find
| JAX to be a bit faster, but that might just be that I'm a
| bit more skilled at optimising that framework.
|
| Anyway I'm not going to try and debunk things point-by-
| point, I'd rather avoid yet another unpleasant Julia
| flame-war.
| chazeon wrote:
| I have seen JAX-MD[1] but not sure about "much better".
| On the other hand, there is just no MD implemented with
| PyTorch.
|
| [1]: https://github.com/jax-md/jax-md
| whimsicalism wrote:
| Because the `jax.numpy` operations & primitives are
| almost 1:1 with numpy, many working scientists who
| already have experience working with numpy will be able
| to figure out jax faster.
|
| It is also easier to rewrite existing code/snippets (say
| you were working on a non-differentiable simulator
| before) into jax if you already have them in numpy then
| to do the whole rewrite in pytorch.
|
| I will say that I think pytorch has improved its numpy
| compatability a lot in recent years, functions that I was
| convinced didn't exist with pytorch (like eigh)
| apparently actually do.
| cube2222 wrote:
      | It's funny, because already after the first sentence it
      | felt like ChatGPT, probably because I've played with it a
      | lot these past few days, and as expected I found a
      | disclaimer at the end.
|
| That said, the answer isn't really useful, as it's very
| generic, without anything concrete (other than the mention of
| Autograd) imo.
|
| Though a follow up question might improve on that.
| singularity2001 wrote:
| Getting Started
|
| ...
|
| and zero words on how to get started.
|
| pip3 install torch2?
|
| pip3 install torch==2.0? nope
| SekstiNi wrote:
| https://pytorch.org/get-started/pytorch-2.0/#requirements
| quietbritishjim wrote:
| > Today, we announce torch.compile, a feature that pushes PyTorch
| performance to new heights and starts the move for parts of
| PyTorch from C++ back into Python.
|
| I'll admit I don't know enough about PyTorch to know what
| torch.compile is exactly. But does this means some features of
| PyTorch will no longer be available in the core C++ library? One
| of the nice things about PyTorch had been that you could do your
| training in Python then deploy with a pure C++ application.
| fddr wrote:
| The `torch.compile` API itself will not be available from C++.
| That means that you won't get the pytorch 2.0 performance gains
| if you use it via C++ API.
|
| There's no plan to deprecate the existing C++ API, it should
| keep working as it is. However, a common theme of all the
| changes is implementing more of pytorch in python (explicitly
| the goal of primtorch), so if this plan works it could happen
| in the long run.
| danieldk wrote:
| _One of the nice things about PyTorch had been that you could
| do your training in Python then deploy with a pure C++
| application._
|
| Or even train in C++ or Rust without much loss in
| functionality.
| synergy20 wrote:
      | Rust really has no presence in AI training engines yet;
      | it's probably 100% C++.
| danieldk wrote:
| I was referring to the libtorch library, which you can use
| through the tch crate. It is possible to make such rich
| bindings because so much of Torch is exposed through the
| C++ API. When more new functionality is moved to Python, it
| makes it harder to use functionality from the C++ interface
| and downstream bindings.
| synergy20 wrote:
    | Facebook did a similar thing with its original PHP codebase:
    | it uses HHVM to "compile" PHP (now called Hacklang) to gain
    | performance. It seems to be doing a similar thing with Python
    | here.
| algon33 wrote:
  | The FAQ re-states the content of point 14 in point 13. Point 14
  | is about why your code might be slower when using 2.0; point 13
  | should be about how to keep up with PT 2.0 developments.
  | Someone should change that.
| belval wrote:
| > We believe that this is a substantial new direction for PyTorch
| - hence we call it 2.0. torch.compile is a fully additive (and
| optional) feature and hence 2.0 is 100% backward compatible by
| definition.
|
| How about just calling it PyTorch 1.14 if it's backward
| compatible? Version numbering shouldn't be used as a marketing
| gimmick.
| posharma wrote:
| Is this really the biggest problem that needs to be solved in
| AI?
| whimsicalism wrote:
| No? What would have given you that impression?
|
| Oh, I see. You were trying to be dismissive.
| robertlagrant wrote:
| Quite the non sequitur you have there.
| belval wrote:
| Not sure I understand that question, is versioning the
| biggest problem no, but it costs nothing to keep semver and
| prevent production headaches later.
|
| If you meant inference speed then yeah it's a very big
| problem so it's good that they are addressing it.
| mi_lk wrote:
        | What exact production headaches are you expecting from
        | bumping the number from 1.13 -> 2.0, while all existing
        | code keeps working as before?
|
| And how is it different from bumping 1.13 to 1.14, even if
| they named it 1.14?
| belval wrote:
| The soft kind. Major versions are deeply ingrained as
| "possible backward-compatibility issues" in most
| engineers' brain. If you handle model development,
          | evaluation and deployment yourself, then sure, you won't
| have any issues, but in a bigger organization you have to
| get people to switch and that version number will mean
| that everyone will ask the same "hang on this is a major
| version change?!" question every step of the way.
| pdntspa wrote:
| They're saying it represents a change in direction and is a
| pretty big feature, traditionally that's been a good reason to
| increment a major version number.
| js2 wrote:
| Dismissive comments like this make me not want to read HN
| anymore and in addition it's against the HN guidelines:
|
| It's snarky. It's incurious. It's neither thoughtful nor
| substantive. It's flame bait. It's a shallow dismissal. It
| doesn't teach anything. It's the most provocative thing to
| complain about.
|
| https://news.ycombinator.com/newsguidelines.html
|
| I'm sorry I had to leave this comment, so let me also try to
| respond thoughtfully:
|
    | Assuming PyTorch uses semantic versioning: semver requires
    | that the major version MUST change when making a
    | backwards-incompatible API change:
|
| > Major version X (X.y.z | X > 0) MUST be incremented if any
| backwards incompatible changes are introduced to the public
| API. It MAY also include minor and patch level changes. Patch
| and minor versions MUST be reset to 0 when major version is
| incremented.
|
| This requirement does NOT preclude changing the major version
| when making backwards-compatible changes.
|
| PyTorch has not violated semver here. It is absolutely
| compatible with semver to bump the major version for marketing
| reasons.
|
| https://semver.org/
| belval wrote:
| Personal attack aside, from your own link:
|
| > Given a version number MAJOR.MINOR.PATCH, increment the:
|
| > MAJOR version when you make incompatible API changes
|
| > MINOR version when you add functionality in a backwards
| compatible manner
|
| > PATCH version when you make backwards compatible bug fixes
|
| > Additional labels for pre-release and build metadata are
| available as extensions to the MAJOR.MINOR.PATCH format.
|
| You can point towards some other details, but it doesn't
| change the fact that for the overwhelming majority of people,
| the quote above is what semver is. Besides, my original
| comment does not say "They broke semver", it says they
| shouldn't bump the major version if they don't make backward
| incompatible change because afterwards the mental model of
| "Can I use version X.Y.Z?" is broken.
|
| When TensorFlow moved to 2.0 it's because they were changing
| from graphs and session definition to eager mode. That makes
| sense, that means the underlying API and how the downstream
| users interact with it changed. These are just newer features
| that, while very useful, have limited bearing on downstream
| users.
___________________________________________________________________
(page generated 2022-12-02 23:01 UTC)