[HN Gopher] The Elements of Differentiable Programming
       ___________________________________________________________________
        
       The Elements of Differentiable Programming
        
       Author : leephillips
       Score  : 70 points
       Date   : 2024-03-22 18:08 UTC (4 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | macawfish wrote:
       | This is amazing! Seems like a perfect excuse to get back into
       | Julia. I just wish Julia had more compile targets. Ideally I'd
       | like to have the option to target the browser (wasm/webgpu).
        
         | UncleOxidant wrote:
         | https://julialang.org/jsoc/gsoc/wasm/
        
           | macawfish wrote:
           | Awesome, I hope it pans out
        
         | UncleOxidant wrote:
         | The Julia community seems like they've been out front when it
         | comes to differentiable programming. For example:
         | https://www.youtube.com/watch?v=rF2QAJLM730
        
       | fpgamlirfanboy wrote:
       | i don't know why people write these things. it's an absolute
       | hodge-podge of theorem/proofs/results/techniques with no unifying
       | theme other than "CALCULUS". so it's a pretty bad math book to
       | actually learn math from (you can always spot a pedagogically
       | unsound math book by its lack of exercises). the book doesn't
       | even have any code in it which is surprising considering it has
       | "programming" in the title.
       | 
       | actually i know why people write these but i still don't know why
       | they publish them: this is a phase everyone goes through in their
       | "math" life where they look back on everything they've learned
       | hastily between undergrad/phd/postdoc (or whatever) and they have
       | the urge to formalize/crystallize. everyone has the urge - i had
       | a late-career QFT prof tell me that he was excited to take his
       | sabbatical so that he could finally do all of the exercises in
       | peskin&schroeder for real real and type it all up neatly.
       | 
       | i've done it too, in-the-small (some very nice notes that i'm
       | proud of, on various things). you sit down, make your list of
       | things, pull up all of the books/papers you're going to use as
       | references and you start essentially transcribing - but you tell
       | yourself you're putting your own spin on it (adding ample
       | "motivation"). and it's all fine and healthy and gratifying _for
       | you and yourself alone_. but i don 't think i'd ever imagine to
       | myself "well hmm my organization of these topics is going to be
       | useful for other people i should put it out there for the world
       | to see". but that's just me.
       | 
       | ninja edit:
       | 
       | before someone jumps down my throat about "what's the harm?". the
       | harm is n00b/undergrads/young people/etc will look at this and
       | think this is the right way to learn this material and some of
       | them will even make an attempt to learn the material from this
       | thing and they'll struggle and fail and be discouraged - i speak
        | from experience! it's not a good thing for the community. sure
        | maybe 1 in 100 can learn this stuff from just reading a
       | monograph (what these things used to be called...) but that's the
       | exception that proves the rule.
        
         | hyperbovine wrote:
         | 1.3 Intended audience
         | 
         | This book is intended to be a graduate-level introduction to
         | differentiable programming. Our pedagogical choices are made
         | with the machine learning community in mind. Some familiarity
         | with calculus, linear algebra, probability theory and machine
         | learning is beneficial.
         | 
         | ---
         | 
         | It's a book for ML researchers. I'm excited to read it. Calm
         | down.
        
           | fpgamlirfanboy wrote:
           | > graduate-level introduction to differentiable programming.
           | 
           | go check out any _real_ graduate textbook. what you will find
           | is they all have exercises.
           | 
           | > It's a book for ML researchers. I'm excited to read it.
           | Calm down.
           | 
           | just because the authors claim something doesn't make it
           | true. i'm not wrong - this is not a good pedagogical resource
           | and i would bet a year of my salary (as an ML researcher)
           | that you will in fact not read more than 5% of this book.
        
             | smallnamespace wrote:
             | It's a good thing you're an ML researcher and not, say, a
             | quant trader then ;)
        
             | thfuran wrote:
             | Really? I'd gladly read 6% of a book for a year of pay.
        
         | bbor wrote:
          | You're getting hit on a bunch of different things here, but I'd
          | like to focus on two:
          | 
          | > it's an absolute hodge-podge of
          | > theorem/proofs/results/techniques with no unifying theme
          | > other than "CALCULUS".
          | 
          | I mean, yes...? Maybe I'm a terrible programmer but I've never
          | applied calculus to my work in any real way. A book that's just
          | "calculus applications for software design" seems quite useful,
          | and quite unrelated to teaching "math" in a direct way.
          | 
          | > and it's all fine and healthy and gratifying for you and
          | > yourself alone. but i don't think i'd ever imagine to myself
          | > "well hmm my organization of these topics is going to be
          | > useful for other people i should put it out there for the
          | > world to see". but that's just me.
         | 
          | Doesn't this apply to all books of any kind? How do you know
          | if you can write a book before you try? I feel like
          | "transcribe papers with a spin on them" is a perfect
          | description of Russell and Norvig's AI book, and many find it
          | valuable. Is this just a math-specific criticism?
          | 
          | Either way, thanks for the very interesting contrarian comment!
         | You're well spoken and I do love a discussion more than "wow
         | this looks useful", which is all I was prepared to give.
        
           | fpgamlirfanboy wrote:
           | > I mean, yes...? Maybe I'm a terrible programmer but I've
           | never applied calculus to my work in any real way. A book
           | that's just "calculus applications for software design" seems
           | quite useful, and quite unrelated to teaching "math" in a
           | direct way.
           | 
            | that's not what this book is. like i said in the first
            | sentence right there in the part right before the all caps
            | calculus - this book is a compendium of theorem/proofs/etc.
            | very little actual software.
           | 
           | > Doesn't this apply to all books of any kind?
           | 
            | again, i already covered this: a pedagogically sound textbook
           | will have exercises and structure/themes/etc rather than just
           | 120 "propositions".
        
         | nonagono wrote:
         | It's an introduction to a relatively niche new subfield. If I
         | (an expert in the field but not the subfield) want to learn
         | about differentiable programming, my only option before this
         | monograph was to read through tens of random papers which use
         | different presentation styles, terminology etc. Now I can read
         | through the second half of this, around 100 pages, and jump
         | back to the first half if there's a prerequisite I don't know.
         | 
         | That's how most subfields are born. Assorted papers ->
         | monograph -> textbook. The first arrow is defining the subfield
          | as a discrete topic, which is immensely valuable. Only after
          | you have that can you start optimizing for presentation to
          | nonexperts.
        
       | amelius wrote:
       | Would this be useful for general applications, or just numerical
       | ones?
        
         | dkjaudyeqooe wrote:
         | Your program takes various inputs, does processing and gives
          | you results. To quote the book, this applies to programs
          | "including those with control flows and data structures" that
          | are not entirely numerical in the most common sense. Broadly
         | speaking, with AD you can modify the results and the program
         | will spit out the required inputs to get those results.
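
Concretely, that "invert the program" idea usually means gradient descent on the inputs: differentiate a loss between the program's output and the result you want, then nudge the input downhill. A minimal sketch in plain Python (the function f, the target, and the step size are all made up for illustration, and a numerical difference stands in for real AD):

```python
# Sketch: recover an input that makes a program produce a desired output
# by gradient descent on a squared-error loss. f, target, and the step
# size are illustrative choices, not anything from the book.

def f(x):
    return x * x  # stand-in "program": input -> result

def grad_loss(x, target, eps=1e-6):
    # derivative of (f(x) - target)^2 w.r.t. x, approximated here by a
    # central difference (a real AD tool would compute this exactly)
    loss = lambda z: (f(z) - target) ** 2
    return (loss(x + eps) - loss(x - eps)) / (2 * eps)

x, target = 1.0, 4.0
for _ in range(2000):
    x -= 0.01 * grad_loss(x, target)  # nudge the input toward the target
print(x)  # approximately 2.0, an input with f(x) = 4
```

An AD framework would replace grad_loss with an exact derivative, but the loop is the same.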
        
           | amelius wrote:
           | So if a program's result is "no access", then AD can figure
           | out how you can get access. Sounds like an important hacker
           | tool.
        
             | cscheid wrote:
             | I mean, "differential cryptanalysis" doesn't go through AD,
             | but I would be surprised if it's not possible to get pretty
              | close to a discrete analogue to AD using bit flips instead
              | of differentials, together with abstract interpretation.
        
             | kelseyfrog wrote:
             | Cryptography functions, I'd assume, are purposely non-
             | differentiable.
        
             | lsb wrote:
             | Yes, these are "adversarial patches" in image
             | classification, like https://arxiv.org/abs/1712.09665 .
             | Similarly you can take these adversaries and add them to
             | your own larger model, in an arms race.
        
           | JadeNB wrote:
           | > Broadly speaking, with AD you can modify the results and
           | the program will spit out the required inputs to get those
           | results.
           | 
           | I assume you're trying to phrase it in a non-technical way
           | for accessibility, but I wonder if that might have lost some
           | precision. What you describe sounds more like (logically)
           | reversible programming
           | (https://en.wikipedia.org/wiki/Reversible_computing).
           | Differentiability doesn't imply reversibility; for example,
           | the program that takes in an input and returns 1 is as
           | differentiable as they come, but there's nothing that
           | differentiation, automatic or otherwise, can do to tell you
           | an input that will make it return 2.
        
             | thfuran wrote:
             | But it can immediately return "no".
        
       | dkjaudyeqooe wrote:
        | This is a timely book. Maybe more interesting (at least to me)
        | than the recent results of AI research is the application of the
        | field's techniques elsewhere.
        | 
        | For my work going forward, catering to automatic differentiation
        | in the code is a no-brainer.
        
       | whoevercares wrote:
       | TBH I would hope this is a Jax deep dive book
        
       | geor9e wrote:
       | Why did the deep learning model cross the road? Because it was
       | smooth and differentiable
        
       | MikeBattaglia wrote:
       | One very interesting thing about automatic differentiation is
       | that you can think of it as involving a new algebra, similar to
       | the complex numbers, where we adjoin an extra element to the
       | reals to form a plane. This new algebra is called the ring of
       | "dual numbers." The difference is that instead of adding a new
       | element "i" with i2 = -1, we add one called "h" with h2 = 0!
       | 
       | Every element in the dual numbers is of the form a + bh, and in
       | fact the entire ring can be turned into a totally ordered ring in
       | a very natural way: simply declare h < r for any real r > 0. In
       | essence, we are saying h is an infinitesimal - so small that its
       | square is 0. So we have a non-Archimedean ring with
       | infinitesimals - the _smallest_ such ring extending the real
       | numbers.
       | 
       | Why is this so important? Well, if you have some function f which
       | can be extended to the dual number plane - which many can,
       | similar to the complex plane - we have
       | 
       | f(x+h) = f(x) + f'(x)h
       | 
       | Which is little more than restating the usual definition of the
       | derivative: f'(x) = (f(x+h) - f(x))/h
       | 
        | For instance, suppose we have f(x) = 2x^2 - 3x + 1, then
        | 
        | f(x+h) = 2(x+h)^2 - 3(x+h) + 1 = 2(x^2 + 2xh + h^2) - 3(x+h) + 1
        | = (2x^2 - 3x + 1) + (4x - 3)h
        | 
        | Where the last step just involves rearranging terms and canceling
        | out the h^2 = 0 term. Note that the expression for the derivative
       | we get, (4x-3), is correct, and magically computed itself
       | straight from the properties of the algebra.
       | 
        | In short, just like setting i^2 = -1 revolutionized algebra,
        | setting h^2 = 0 revolutionizes calculus. Most autodiff packages
       | (such as Pytorch) use something not much more advanced than this,
       | although there are optimizations to speed it up (e.g. reverse
       | mode diff).
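
For readers wondering what this looks like in code: the arithmetic above fits in a tiny Python class. This is an illustrative sketch of forward-mode AD with dual numbers, not how PyTorch is actually implemented; Dual and f are made-up names.

```python
# Forward-mode AD via dual numbers: each value is a + b*h with h^2 = 0.
# The real part carries the value, the dual part carries the derivative.
class Dual:
    def __init__(self, a, b=0.0):
        self.a = a  # real part: f(x)
        self.b = b  # dual part: f'(x)

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)
    __radd__ = __add__

    def __sub__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a - other.a, self.b - other.b)

    def __rsub__(self, other):
        return Dual(other) - self

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a1 + b1 h)(a2 + b2 h) = a1 a2 + (a1 b2 + b1 a2) h, since h^2 = 0
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)
    __rmul__ = __mul__

def f(x):
    return 2 * x * x - 3 * x + 1  # the example polynomial from above

# Evaluate f at x + h by seeding the dual part with 1.
y = f(Dual(3.0, 1.0))
print(y.a, y.b)  # 10.0 9.0
```

Evaluating f(Dual(3.0, 1.0)) computes f at 3 + h: the real part is f(3) = 10 and the dual part is f'(3) = 9, matching 4x - 3 with no symbolic differentiation anywhere.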
        
         | compacct27 wrote:
         | Where do I go to learn what you just said?
        
         | fpgamlirfanboy wrote:
          | there's a reason no one uses dual numbers (smooth analysis) for
          | anything (neither autodiff nor calculus itself): because
          | manipulating infinitesimals like this is fraught, purely formal
          | manipulation (it's algebra...) whereas limits are much more
          | rigorous (bounds, inequalities, convergence, etc.).
         | 
         | > Most autodiff packages (such as Pytorch) use something not
         | much more advanced than this
         | 
          | pytorch absolutely does not use the dual number formulation -
          | there are absolutely no magic epsilons anywhere in pytorch's
          | (or tensorflow's) code base. what you're calling dual numbers
          | are the adjoints, which are indeed stored/cached on every node
          | in pytorch graphs.
        
       ___________________________________________________________________
       (page generated 2024-03-22 23:00 UTC)