[HN Gopher] Enzyme - High-performance automatic differentiation ...
___________________________________________________________________
Enzyme - High-performance automatic differentiation of LLVM
Author : albertzeyer
Score : 64 points
Date   : 2021-02-03 10:15 UTC (1 day ago)
(HTM) web link (enzyme.mit.edu)
(TXT) w3m dump (enzyme.mit.edu)
| KenoFischer wrote:
| Also see the Julia package, which makes it accessible with a
| high-level interface and is probably one of the easier ways to
| play with it: https://github.com/wsmoses/Enzyme.jl.
| ksr wrote:
| > The Enzyme project is a tool for performing reverse-mode
| automatic differentiation (AD) of statically-analyzable LLVM IR.
| This allows developers to use Enzyme to automatically create
| gradients of their source code without much additional work.
|
| Can someone please explain applications of creating gradients of
| my source code?
| 6d65 wrote:
| AFAIK, it's mainly used for implementing gradient descent,
| which is used for training neural networks.
|
| Frameworks like PyTorch and TensorFlow use backpropagation to
| calculate the gradient of a multidimensional function, but it
| involves tracing, and storing the network state during the
| forward pass.
|
| Static automatic differentiation should be faster, and it looks
| a lot more like how differentiation is done mathematically than
| like numerical approximation.
|
| Of course, there are more applications of AD in scientific
| computing.
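|
| As a concrete sketch (roughly following the C examples on
| enzyme.mit.edu; the loss function, step count and learning rate
| here are just illustrative), gradient descent with an
| Enzyme-generated gradient looks something like this:
|
|   // grad_descent.c -- build with the Enzyme clang plugin enabled;
|   // the exact flags depend on your Enzyme/LLVM installation.
|   #include <stdio.h>
|
|   // Toy scalar "loss": f(x) = (x - 3)^2, minimized at x = 3.
|   double loss(double x) {
|       double d = x - 3.0;
|       return d * d;
|   }
|
|   // Enzyme synthesizes the body of this call at compile time;
|   // for a scalar-in/scalar-out f it returns df/dx.
|   double __enzyme_autodiff(void *, double);
|
|   int main(void) {
|       double x = 0.0;    // initial guess
|       double lr = 0.1;   // learning rate
|       for (int i = 0; i < 50; i++) {
|           // reverse-mode dloss/dx at the current x
|           double g = __enzyme_autodiff((void *)loss, x);
|           x -= lr * g;   // gradient-descent step
|       }
|       printf("x = %f, loss = %f\n", x, loss(x));  // x ends up near 3
|       return 0;
|   }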
| oscargrouch wrote:
| I think Swift is also going in this direction of baking AD
| directly into the compiler and providing it as a higher-level
| language construct.
|
| https://github.com/apple/swift/blob/main/docs/Differentiable.
| ..
|
| This leads to "Swift for TensorFlow", which, unlike the Java, Go
| or Python offerings, is not just a set of bindings to the C++
| TensorFlow library.
| ant6n wrote:
| Optimization. Then again, one could probably calculate
| gradients numerically.
| cameronperot wrote:
| One could, but automatic differentiation is much more efficient
| than numerical differentiation, so for high-performance
| applications it is preferable to use automatic differentiation.
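|
| To see why, here's a minimal sketch of the numerical approach
| (the helper name is made up): a central-difference gradient costs
| 2n evaluations of f for an n-dimensional input and its accuracy
| depends on the step size h, while reverse-mode AD produces the
| whole gradient for roughly a constant multiple of one evaluation
| of f, with no truncation error.
|
|   #include <stddef.h>
|
|   // Numerical gradient of f at x (length n) by central differences,
|   // written into grad. Cost: 2n calls to f.
|   void numerical_gradient(double (*f)(const double *, size_t),
|                           double *x, double *grad,
|                           size_t n, double h) {
|       for (size_t i = 0; i < n; i++) {
|           double saved = x[i];
|           x[i] = saved + h;
|           double fp = f(x, n);             // f(x + h*e_i)
|           x[i] = saved - h;
|           double fm = f(x, n);             // f(x - h*e_i)
|           grad[i] = (fp - fm) / (2.0 * h);
|           x[i] = saved;                    // restore the coordinate
|       }
|   }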
| PartiallyTyped wrote:
| It constructs an analytical gradient from the code, so you can
| compute the gradient directly. This enables optimizations such
| as not caching big matrices, because you don't need to keep
| track of state or trace the graph, and you can compute the 2nd,
| 3rd, 4th, ... derivatives because you have an analytical
| gradient.
|
| For example, in an affine layer the gradient of the
| bias/intercept is just the gradient of the loss wrt the layer's
| (pre-activation) output, and for the weights it's the product of
| that same gradient and the input to the layer.
|
| With automatic graph construction, e.g. eager
| TensorFlow/PyTorch, the layer needs to cache its input so that
| it can compute the gradient of the weights. If the layer
| receives inputs multiple times within the computation graph, you
| end up caching them multiple times.
|
| With analytical gradients you may be able to save memory by
| exploiting the structure of the expressions, e.g. above you can
| sum the inputs first, since (dL/dz)*input1 + (dL/dz)*input2 =
| (dL/dz)*(input1 + input2).
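|
| Concretely, for a single affine layer z = W*x + b the whole
| backward pass is just these closed-form expressions; a small
| hand-written sketch (not any particular framework's API):
|
|   #include <stddef.h>
|
|   // Backward pass of z = W*x + b, with W an m x n row-major matrix.
|   // Given dz = dL/dz (length m) and the layer input x (length n),
|   // accumulate:
|   //   dL/db[i]    = dz[i]
|   //   dL/dW[i][j] = dz[i] * x[j]
|   //   dL/dx[j]    = sum_i W[i][j] * dz[i]
|   // db, dW and dx are assumed zero-initialized; using += means a
|   // layer used at several points in the graph simply sums its
|   // contributions.
|   void affine_backward(const double *dz, const double *x,
|                        const double *W, double *db, double *dW,
|                        double *dx, size_t m, size_t n) {
|       for (size_t i = 0; i < m; i++) {
|           db[i] += dz[i];
|           for (size_t j = 0; j < n; j++) {
|               dW[i * n + j] += dz[i] * x[j];
|               dx[j] += W[i * n + j] * dz[i];
|           }
|       }
|   }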
| [deleted]
| bdauvergne wrote:
| Funny, I worked on Tapenade (one of the automatic
| differentiation tools Enzyme is compared against). I'm happy
| that it still reaches 60% of the performance of something
| written directly inside an optimizing compiler.
___________________________________________________________________
(page generated 2021-02-04 23:00 UTC)