[HN Gopher] The Elements of Differentiable Programming
___________________________________________________________________
The Elements of Differentiable Programming
Author : leephillips
Score : 70 points
Date : 2024-03-22 18:08 UTC (4 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| macawfish wrote:
| This is amazing! Seems like a perfect excuse to get back into
| Julia. I just wish Julia had more compile targets. Ideally I'd
| like to have the option to target the browser (wasm/webgpu).
| UncleOxidant wrote:
| https://julialang.org/jsoc/gsoc/wasm/
| macawfish wrote:
| Awesome, I hope it pans out
| UncleOxidant wrote:
| The Julia community seems like they've been out front when it
| comes to differentiable programming. For example:
| https://www.youtube.com/watch?v=rF2QAJLM730
| fpgamlirfanboy wrote:
| i don't know why people write these things. it's an absolute
| hodge-podge of theorem/proofs/results/techniques with no unifying
| theme other than "CALCULUS". so it's a pretty bad math book to
| actually learn math from (you can always spot a pedagogically
| unsound math book by its lack of exercises). the book doesn't
| even have any code in it which is surprising considering it has
| "programming" in the title.
|
| actually i know why people write these but i still don't know why
| they publish them: this is a phase everyone goes through in their
| "math" life where they look back on everything they've learned
| hastily between undergrad/phd/postdoc (or whatever) and they have
| the urge to formalize/crystallize. everyone has the urge - i had
| a late-career QFT prof tell me that he was excited to take his
| sabbatical so that he could finally do all of the exercises in
| peskin&schroeder for real real and type it all up neatly.
|
| i've done it too, in-the-small (some very nice notes that i'm
| proud of, on various things). you sit down, make your list of
| things, pull up all of the books/papers you're going to use as
| references and you start essentially transcribing - but you tell
| yourself you're putting your own spin on it (adding ample
| "motivation"). and it's all fine and healthy and gratifying _for
| you and yourself alone_. but i don't think i'd ever imagine to
| myself "well hmm my organization of these topics is going to be
| useful for other people i should put it out there for the world
| to see". but that's just me.
|
| ninja edit:
|
| before someone jumps down my throat about "what's the harm?". the
| harm is n00b/undergrads/young people/etc will look at this and
| think this is the right way to learn this material and some of
| them will even make an attempt to learn the material from this
| thing and they'll struggle and fail and be discouraged - i speak
| from experience! it's not a good thing for the community. sure
| maybe 1 in a 100 can learn this stuff from just reading a
| monograph (what these things used to be called...) but that's the
| exception that proves the rule.
| hyperbovine wrote:
| 1.3 Intended audience
|
| This book is intended to be a graduate-level introduction to
| differentiable programming. Our pedagogical choices are made
| with the machine learning community in mind. Some familiarity
| with calculus, linear algebra, probability theory and machine
| learning is beneficial.
|
| ---
|
| It's a book for ML researchers. I'm excited to read it. Calm
| down.
| fpgamlirfanboy wrote:
| > graduate-level introduction to differentiable programming.
|
| go check out any _real_ graduate textbook. what you will find
| is they all have exercises.
|
| > It's a book for ML researchers. I'm excited to read it.
| Calm down.
|
| just because the authors claim something doesn't make it
| true. i'm not wrong - this is not a good pedagogical resource
| and i would bet a year of my salary (as an ML researcher)
| that you will in fact not read more than 5% of this book.
| smallnamespace wrote:
| It's a good thing you're an ML researcher and not, say, a
| quant trader then ;)
| thfuran wrote:
| Really? I'd gladly read 6% of a book for a year of pay.
| bbor wrote:
| You're getting hit on a bunch of different things here, but I'd
| like to focus on two:
|
| > it's an absolute hodge-podge of theorem/proofs/results/
| > techniques with no unifying theme other than "CALCULUS".
|
| I mean, yes...? Maybe I'm a terrible programmer but I've never
| applied calculus to my work in any real way. A book that's just
| "calculus applications for software design" seems quite useful,
| and quite unrelated to teaching "math" in a direct way.
| > and it's all fine and healthy and gratifying for you and
| > yourself alone. but i don't think i'd ever imagine to myself
| > "well hmm my organization of these topics is going to be
| > useful for other people i should put it out there for the
| > world to see". but that's just me.
|
| Doesn't this apply to all books of any kind? How do you know
| if you can write a book before you try? I feel like
| "transcribe papers with a spin on them" is a perfect
| description of Russell and Norvig's AI book, and many find it
| valuable. Is this just a math-specific criticism?
|
| Either way, thanks for the very interesting contrarian comment!
| You're well spoken and I do love a discussion more than "wow
| this looks useful", which is all I was prepared to give.
| fpgamlirfanboy wrote:
| > I mean, yes...? Maybe I'm a terrible programmer but I've
| never applied calculus to my work in any real way. A book
| that's just "calculus applications for software design" seems
| quite useful, and quite unrelated to teaching "math" in a
| direct way.
|
| that's not what this book is. like i said in the first
| sentence right there in the part right before the all caps
| calculus - this book is a compendium of
| theorem/proofs/etc. very little actual software.
|
| > Doesn't this apply to all books of any kind?
|
| again, i already covered this: a pedagogically sound textbook
| will have exercises and structure/themes/etc rather than just
| 120 "propositions".
| nonagono wrote:
| It's an introduction to a relatively niche new subfield. If I
| (an expert in the field but not the subfield) want to learn
| about differentiable programming, my only option before this
| monograph was to read through tens of random papers which use
| different presentation styles, terminology etc. Now I can read
| through the second half of this, around 100 pages, and jump
| back to the first half if there's a prerequisite I don't know.
|
| That's how most subfields are born. Assorted papers ->
| monograph -> textbook. The first arrow is defining the subfield
| as a discrete topic, which is immensely valuable. Only after
| you have that can you start optimizing for presentation to
| nonexperts.
| amelius wrote:
| Would this be useful for general applications, or just numerical
| ones?
| dkjaudyeqooe wrote:
| Your program takes various inputs, does some processing, and
| gives you results. To quote the book, this applies to programs
| "including those with control flows and data structures" that
| are not entirely numerical in the most common sense. Broadly
| speaking, with AD you can modify the results and the program
| will spit out the required inputs to get those results.
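Loosely sketched, the "modify the results, recover the inputs" idea in practice means gradient search rather than exact inversion. A hypothetical toy (the function, numbers, and learning rate here are made up for illustration; the derivative is written by hand to stand in for what AD would compute):

```python
# Hypothetical sketch: use the derivative that AD would supply to
# search for an input x such that f(x) is close to a target output.

def f(x):
    return x * x + 1.0          # the "program"

def df(x):
    return 2.0 * x              # what forward- or reverse-mode AD would give us

target = 10.0
x = 0.5                         # initial guess
for _ in range(200):
    # gradient step on the squared error (f(x) - target)^2 / 2
    x -= 0.01 * (f(x) - target) * df(x)

print(x)   # close to 3.0, since f(3) = 10
```

This only finds *a* nearby input when one exists; it is optimization, not inversion.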
| amelius wrote:
| So if a program's result is "no access", then AD can figure
| out how you can get access. Sounds like an important hacker
| tool.
| cscheid wrote:
| I mean, "differential cryptanalysis" doesn't go through AD,
| but I would be surprised if it's not possible to get pretty
| close to a discrete analogue of AD using bit flips instead
| of differentials, plus abstract interpretation.
| kelseyfrog wrote:
| Cryptography functions, I'd assume, are purposely non-
| differentiable.
| lsb wrote:
| Yes, these are "adversarial patches" in image
| classification, like https://arxiv.org/abs/1712.09665 .
| Similarly you can take these adversaries and add them to
| your own larger model, in an arms race.
| JadeNB wrote:
| > Broadly speaking, with AD you can modify the results and
| the program will spit out the required inputs to get those
| results.
|
| I assume you're trying to phrase it in a non-technical way
| for accessibility, but I wonder if that might have lost some
| precision. What you describe sounds more like (logically)
| reversible programming
| (https://en.wikipedia.org/wiki/Reversible_computing).
| Differentiability doesn't imply reversibility; for example,
| the program that takes in an input and returns 1 is as
| differentiable as they come, but there's nothing that
| differentiation, automatic or otherwise, can do to tell you
| an input that will make it return 2.
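The constant-function point can be made concrete in a couple of lines (a hypothetical toy, not from the thread): gradient-based search stalls because the derivative is zero everywhere.

```python
# Gradient descent on the squared error cannot "invert" a constant
# program: the gradient is identically zero, so x never moves.

def f(x):
    return 1.0                  # the constant program from the comment

def df(x):
    return 0.0                  # its derivative, zero everywhere

target = 2.0
x = 0.5                         # initial guess
for _ in range(100):
    x -= 0.1 * (f(x) - target) * df(x)   # every step is exactly zero

print(x)   # still 0.5: differentiability offers no route to "return 2"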
| thfuran wrote:
| But it can immediately return "no".
| dkjaudyeqooe wrote:
| This is a timely book. Maybe more interesting (at least to me)
| than the recent results of AI research is the application of
| the field's techniques elsewhere.
|
| For my work going forward, catering to automatic differentiation
| in the code is a no-brainer.
| whoevercares wrote:
| TBH I would hope this is a Jax deep dive book
| geor9e wrote:
| Why did the deep learning model cross the road? Because it was
| smooth and differentiable
| MikeBattaglia wrote:
| One very interesting thing about automatic differentiation is
| that you can think of it as involving a new algebra, similar to
| the complex numbers, where we adjoin an extra element to the
| reals to form a plane. This new algebra is called the ring of
| "dual numbers." The difference is that instead of adding a new
| element "i" with i^2 = -1, we add one called "h" with h^2 = 0!
|
| Every element in the dual numbers is of the form a + bh, and in
| fact the entire ring can be turned into a totally ordered ring in
| a very natural way: simply declare h < r for any real r > 0. In
| essence, we are saying h is an infinitesimal - so small that its
| square is 0. So we have a non-Archimedean ring with
| infinitesimals - the _smallest_ such ring extending the real
| numbers.
|
| Why is this so important? Well, if you have some function f which
| can be extended to the dual number plane - which many can,
| similar to the complex plane - we have
|
| f(x+h) = f(x) + f'(x)h
|
| Which is little more than restating the usual definition of the
| derivative: f'(x) = (f(x+h) - f(x))/h
|
| For instance, suppose we have f(x) = 2x^2 - 3x + 1, then
|
| f(x+h) = 2(x+h)^2 - 3(x+h) + 1 = 2(x^2 + 2xh + h^2) - 3(x+h) + 1
| = (2x^2 - 3x + 1) + (4x - 3)h
|
| Where the last step just involves rearranging terms and canceling
| out the h^2 = 0 term. Note that the expression for the derivative
| we get, (4x-3), is correct, and magically computed itself
| straight from the properties of the algebra.
|
| In short, just like creating i^2 = -1 revolutionized algebra,
| setting h^2 = 0 revolutionizes calculus. Most autodiff packages
| (such as PyTorch) use something not much more advanced than this,
| although there are optimizations to speed it up (e.g. reverse
| mode diff).
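The arithmetic above can be sketched in a few lines of Python (a toy forward-mode AD via dual numbers; the class and names are illustrative, and this is not how PyTorch is implemented internally):

```python
# Minimal dual-number sketch: each value is a + b*h with h^2 = 0.
# Arithmetic on the pair (a, b) propagates derivatives automatically.

class Dual:
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b   # value part and derivative part

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)
    __radd__ = __add__

    def __sub__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a - other.a, self.b - other.b)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + bh)(c + dh) = ac + (ad + bc)h, since h^2 = 0
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)
    __rmul__ = __mul__

def f(x):
    return 2 * x * x - 3 * x + 1   # the example polynomial from above

# Evaluating at x + h, i.e. Dual(x, 1), yields f'(x) as the h-coefficient.
y = f(Dual(5.0, 1.0))
print(y.a, y.b)   # 36.0 17.0 -> f(5) = 36 and f'(5) = 4*5 - 3 = 17
```

The derivative falls out of plain algebra on pairs, with no symbolic differentiation and no limits.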
| compacct27 wrote:
| Where do I go to learn what you just said?
| fpgamlirfanboy wrote:
| there's a reason no one uses dual numbers (smooth analysis) for
| anything (neither autodiff nor calculus itself): manipulating
| infinitesimals like this amounts to fraught formal manipulation
| (it's algebra...) whereas limits are much more rigorous (bounds,
| inequalities, convergence, etc.).
|
| > Most autodiff packages (such as Pytorch) use something not
| much more advanced than this
|
| pytorch absolutely does not use the dual number formulation -
| there are absolutely no magic epsilons anywhere in pytorch's
| (or tensorflow's) code base. what you're calling duals are the
| adjoints, which are indeed stored/cached on every node in
| pytorch graphs.
___________________________________________________________________
(page generated 2024-03-22 23:00 UTC)