[HN Gopher] Differentiable Programming from Scratch
___________________________________________________________________
Differentiable Programming from Scratch
Author : sksxihve
Score : 88 points
Date : 2025-04-17 04:30 UTC (18 hours ago)
(HTM) web link (thenumb.at)
(TXT) w3m dump (thenumb.at)
| weinzierl wrote:
| _" Now that we understand differentiation, let's move on to
| programming. So far, we've only considered mathematical
| functions, but we can easily translate our perspective to
| programs. For simplicity, we'll only consider pure functions,
| i.e. functions whose output depends solely on its parameters (no
| state)."_
|
| I think I've seen this notion that the constraint is pureness
| in the documentation of autodiff libraries as well, but this
| can't be a strong enough condition, right?
|
| It's easy enough to come up with functions that are nowhere
| differentiable. So my question is: what are the actual
| requirements a state-of-the-art autodiff library has for the
| input function, and why do people focus on the pureness aspect
| if that is probably the least of the problems?
| yorwba wrote:
| If the function is pure, autodiff can produce a result. If the
| function is not differentiable for a given input, the result
| produced by autodiff of course cannot be the derivative (since
| none exists) but it could be another number (e.g. a
| subderivative) that is useful in some of the ways a derivative
| would have been useful.
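|
| A minimal sketch of that in Python (the abs example and the
| choice of 0 at the kink are my own illustration):
|
|     def abs_grad(x: float) -> float:
|         # abs(x) is not differentiable at 0, but any value in
|         # [-1, 1] is a valid subderivative there; an autodiff
|         # rule just has to pick one, e.g. 0.
|         if x > 0:
|             return 1.0
|         if x < 0:
|             return -1.0
|         return 0.0  # subderivative choice at the kink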
| grandempire wrote:
| State is just a set of values which are not considered variable
| function inputs - in other words, each time you change the
| state you get a new function.
|
| For example, take f(x, y) = xy and define a differentiable
| function g(x) = f(x, a). You can imagine "a" being a state
| variable.
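|
| A sketch of that in Python (names are illustrative):
|
|     def f(x, y):
|         return x * y
|
|     a = 3.0  # the "state", held fixed during differentiation
|
|     def g(x):
|         return f(x, a)  # dg/dx = a; changing a yields a new g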
| hansvm wrote:
| Purity, interestingly, is not a requirement. You can represent
| any closure as a function with implicitly captured inputs, and
| you can represent mutable state as an immutable input paired
| with a different, related, immutable output. For any autodiff
| library that doesn't also throw in dense+sparse decompositions
| (none of the major ones do any such optimization; the
| programmer has to choose manually how to represent things),
| this doesn't even waste more space for reverse-mode autodiff
| than your other options: you need to maintain the before and
| after states anyway, give or take your particular checkpointing
| scheme, and if you can't represent the delta cheaply then the
| differentiation is probably quite expensive contrasted with the
| underlying mutable forward pass, which was probably faster and
| cheaper than the alternatives to begin with. Purity is often
| easier to code around, especially in "complicated" languages
| with many features you'd like to support, but it's not
| mandatory.
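|
| A hedged sketch of the state-threading idea in Python (my own
| illustration of the rewrite, not any particular library's API):
|
|     def step_mutating(state, x):
|         state["acc"] += x * x  # in-place update, impure
|         return state["acc"]
|
|     def step_pure(acc, x):
|         # same computation with the state made an explicit
|         # immutable input and a related immutable output
|         new_acc = acc + x * x
|         return new_acc, new_acc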
|
| In terms of actual requirements, something that's sufficient
| [0] is for every sub-component to be differentiable and for no
| dynamic control flow to depend on the things being
| differentiated. In practice, most libraries wind up requiring
| something like this, mostly because it's very hard to do
| anything else. As an example, define f(x) := 0 for floats with
| an even LSB and 1 for floats with an odd LSB. Define g(x) := 1
| - f(x). Neither of these is meaningfully differentiable, but
| g(x) + f(x) is identically equal to one. Autodiff relies
| crucially on the fact that it can perform local
| transformations, and that sort of whole-program analysis is (a)
| impossible in general, and (b) hard even when it's possible.
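|
| That f/g pair is easy to write down concretely; a Python sketch
| (my own illustration):
|
|     import struct
|
|     def lsb(x: float) -> int:
|         # lowest bit of the IEEE-754 representation of x
|         (bits,) = struct.unpack("<Q", struct.pack("<d", x))
|         return bits & 1
|
|     def f(x):
|         return float(lsb(x))  # 0 or 1, wildly discontinuous
|
|     def g(x):
|         return 1.0 - f(x)
|
|     # Neither f nor g is differentiable anywhere, yet
|     # f(x) + g(x) == 1.0 for every float x, so the sum is
|     # trivially differentiable. No local rule can detect this.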
|
| For local-only autodiff (the only thing people ever do), what's
| necessary is for every sub-component to have a derivative-like
| operator defined such that, if the sub-components are composed
| into a differentiable function, then the normal chain rule and
| the other autodiff compositions of those operators are also
| differentiable and represent the derivative in question (along
| with some requirements on dynamic control flow -- they don't
| have to be quite as strict as I described, but it's impossible
| in general to relax them with local-only autodiff, so the
| dynamic requirement from the paragraph above is also
| necessary).
|
| There are few (zero?) components where that's possible -- an
| adversary can always come up with a composition for which the
| computed derivative is incorrect. However, for some interesting
| functions (like eigenvalues and eigenvectors), used the way
| people normally use them, these sorts of operators can be
| defined. E.g., the eigenvector derivative is not unique (it's
| only defined up to a choice of phase), but if your composition
| also doesn't depend on the phase then you're still fine.
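|
| A small illustration of the phase point in Python (the matrix
| is arbitrary; this is my own example):
|
|     import numpy as np
|
|     A = np.array([[2.0, 1.0],
|                   [1.0, 3.0]])
|     w, V = np.linalg.eigh(A)
|     v = V[:, 0]
|
|     # If v is a unit eigenvector, so is -v; a "derivative of
|     # the eigenvector" inherits that ambiguity. But the outer
|     # product v v^T is phase-invariant, so compositions built
|     # from it are well posed.
|     assert np.allclose(np.outer(v, v), np.outer(-v, -v))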
|
| [0] Even for things like differentiating through a loop
| converging to a value, this property holds, with one meaningful
| exception: the error in the derivative, compared with the true
| function you're approximating, will still converge to zero with
| enough iterations, but the number of iterations required can be
| much higher than what you need for the function itself to
| converge. You _will_, however, get the derivative of the
| approximation exactly right.
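|
| A concrete version of [0] in Python: differentiating through a
| Babylonian sqrt(a) iteration by propagating d/da alongside the
| value (hand-rolled forward mode; my own illustration):
|
|     import math
|
|     def dsqrt_iter(a, n):
|         x, dx = 1.0, 0.0  # seed and its derivative w.r.t. a
|         for _ in range(n):
|             # x' = (x + a/x)/2, differentiated w.r.t. a
|             x, dx = (x + a / x) / 2, \
|                     (dx + (x - a * dx) / (x * x)) / 2
|         return x, dx
|
|     a = 2.0
|     for n in (2, 5, 10):
|         x, dx = dsqrt_iter(a, n)
|         print(n, abs(x - math.sqrt(a)),
|               abs(dx - 1 / (2 * math.sqrt(a))))
|
| You get the exact derivative of the n-step approximation; its
| error against 1/(2*sqrt(a)) also goes to zero, possibly needing
| more iterations than the value itself.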
| FilosofumRex wrote:
| Historical fact: Differentiable programming was a little-known
| secret back in the '90s, used mainly by engineers simulating
| numerically stiff systems like nukes and chemicals in FORTRAN
| 95. It then disappeared for nearly 30 years before being
| rediscovered by ML/AI researchers!
| constantcrying wrote:
| It wasn't forgotten, I learned it in university outside of any
| AI context. It just had most of its applications exhausted and
| ceased being a particularly active research topic.
| kxyvr wrote:
| Automatic differentiation has been actively and continuously
| used in some communities for the last 40 years. Louis Rall
| published an entire book about it in 1981. One of the more
| popular books on AD, written by Griewank, was published in
| 2000. I learned about it in university in the early 2000s. I do
| agree that the technology was not as widely used as it should
| have been until more recently, but it was well known within the
| numerical math world and used continuously over the years.
| taeric wrote:
| Computer Algebra Systems (CAS) were not really a secret. And
| they often have many, many tricks that we are constantly
| relearning. Some of this relearning, of course, is by design: a
| lot of what they do are things we teach, like how to calculate
| different functions efficiently. Repeated squaring, for
| example, is a fun topic (sketch below).
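|
| For reference, the repeated-squaring trick in Python (a generic
| textbook version, not from any particular CAS):
|
|     def power(x, n):
|         # compute x**n with O(log n) multiplications by
|         # squaring x and consuming n bit by bit
|         result = 1
|         while n > 0:
|             if n & 1:
|                 result *= x
|             x *= x
|             n >>= 1
|         return result
|
|     assert power(3, 13) == 3 ** 13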
|
| A lot of the current wave of relearning comes from having the
| compute power to do these things in more places. These things
| have also long been done in expensive environments that many of
| us just don't have access to.
| thechao wrote:
| My PhD dissertation included a chapter (originally from a 2006
| paper) on generic programming in CASes on the algorithmic
| differentiable ring (and operator). By the 1990s, algorithmic
| differentiation was easily 30-40 years old. Griewank & Monagan
| both knew guys who had built early electromagnetic naval
| targeting computers that used the methodology "by hand" (very
| literally) by at least the early '60s.
|
| I watched the ML/AI bros actively ignore previous research --
| even when they were requested to properly cite sources they
| were plagiarizing -- in real time. The race to publish (even
| for big journals) was so important that it was easier to ignore
| the rank dishonesty than it was to correct their misbehavior.
| I'm 1000x happier to not have stayed around for all that crap.
| srean wrote:
| > on the algorithmic differentiable ring (and operator)
|
| That sounds like an interesting read. Do you have the chapter,
| or a reference to the paper, that you could share?
|
| Regarding the crop of deep neural network researchers: their
| self-serving and willful blindness has a reputation that's well
| deserved.
|
| A grad student from Hinton's lab mentioned one researcher who
| would misspell a citation on purpose so that the citation count
| of the cited paper would not go up.
| hwpythonner wrote:
| I'm not deep into autodiff (just recall some calculus from
| university), but the syntax in this post reminds me a lot of ML
| (the programming language, not machine learning)
|
| I know autodiff isn't lambda calculus, but the expression-based
| structure and evaluation rules feel similar. Couldn't this be
| implemented in something like ML or Clojure? Just wondering what
| the custom DSL adds that existing functional languages wouldn't
| already support.
| fire_lake wrote:
| There is an F# implementation (Microsoft flavoured ML) called
| DiffSharp
| hansvm wrote:
| I didn't see a DSL anywhere, just normal JS code.
|
| As to what it adds?
|
| - It's more accessible to a wider audience (and looks like how
| you'd implement autodiff in most languages)
|
| - It runs in the browser trivially (powering those demos)
|
| - The author (potentially) didn't have to learn a new language
| just to get started
|
| - Programs are not fully differentiable, or at the very least
| there are some crazy edge cases and dragons lurking if you
| attempt to make them so. A dedicated whitelist of supported
| operations isn't necessarily a bad design, contrasted with the
| implicit whitelist you'd get in Clojure (depending on the
| implementation, of course, but there wasn't much source-to-
| source boilerplate even in this example, so I assume the
| benefit of a functional language would be stripping away some
| of the characteristics I think are important).
| constantcrying wrote:
| Automatic differentiation can be implemented in essentially any
| language. Some just make it look "nicer".
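|
| For instance, a minimal forward-mode version in Python via dual
| numbers (a generic textbook construction, not the article's
| code):
|
|     class Dual:
|         def __init__(self, val, dot=0.0):
|             self.val, self.dot = val, dot  # value, derivative
|
|         def __add__(self, o):
|             o = o if isinstance(o, Dual) else Dual(o)
|             return Dual(self.val + o.val, self.dot + o.dot)
|         __radd__ = __add__
|
|         def __mul__(self, o):
|             o = o if isinstance(o, Dual) else Dual(o)
|             # product rule
|             return Dual(self.val * o.val,
|                         self.dot * o.val + self.val * o.dot)
|         __rmul__ = __mul__
|
|     def derivative(f, x):
|         return f(Dual(x, 1.0)).dot
|
|     # d/dx (x*x + 3*x) at x = 2 is 2*2 + 3 = 7
|     print(derivative(lambda x: x * x + 3 * x, 2.0))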
| kgwgk wrote:
| For example, it looks nice in Common Lisp:
| https://people.eecs.berkeley.edu/~fateman/papers/overload-AD...
| constantcrying wrote:
| >Unfortunately, finite differences have a big problem: they only
| compute the derivative of f in one direction. If our input is
| very high dimensional, computing the full gradient of f becomes
| computationally infeasible, as we would have to evaluate f for
| each dimension separately.
|
| And it is terribly unstable numerically: since h is very small,
| f(x) and f(x+h) are very similar, so you have to expect
| catastrophic cancellation when subtracting them. For black
| boxes it is the only real option, though; you can do a bit
| better by taking a central difference, i.e. evaluating in both
| directions.
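|
| A quick Python comparison (the h values are typical choices,
| not tuned):
|
|     import math
|
|     def forward_diff(f, x, h=1e-8):
|         return (f(x + h) - f(x)) / h  # O(h) truncation error
|
|     def central_diff(f, x, h=1e-5):
|         # O(h^2) truncation error, and larger h keeps the
|         # cancellation in the subtraction milder
|         return (f(x + h) - f(x - h)) / (2 * h)
|
|     x, exact = 1.0, math.cos(1.0)
|     print(abs(forward_diff(math.sin, x) - exact))
|     print(abs(central_diff(math.sin, x) - exact))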
___________________________________________________________________
(page generated 2025-04-17 23:02 UTC)