[HN Gopher] Mathematical Introduction to Deep Learning: Methods,...
___________________________________________________________________
Mathematical Introduction to Deep Learning: Methods,
Implementations, and Theory
Author : Anon84
Score : 180 points
Date : 2024-01-01 18:46 UTC (4 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| dachworker wrote:
| Is anyone using any of this math? My guess is no. At best it
| provides "moral support" for deep learning researchers who want
| to feel reassured that what they are attempting to do is not
| impossible.
|
| Glad to be proven wrong, though.
| nerdponx wrote:
| Describing it as "moral support" really sells it short.
|
| Imagine computer science without sorting algorithms, search
| algorithms, etc. that have been proven correct and have known
| proven properties. This math serves the same purpose as CS
| theory.
|
| So yes, if you're just fitting a model from a library like
| Keras, you're not really "using" the math. If you're working
| with data sets below a certain size, problems below a certain
| level of complexity, and models that have been deployed for
| many years and have well studied properties, you can do a lot
| with only a cursory understanding of the math, much like you
| can write perfectly functional web apps in Python or Java
| without really understanding how the language runtime works at
| a deep level.
|
| But if you don't actually know how it works, you're going to
| get stuck pretty badly if you encounter a situation that isn't
| already baked into a library.
|
| If you want to see what happens when you don't know the
| underlying math, look at the current generation of "data
| science" graduates, who don't know their math or statistics
| fundamentals. There are plenty of issues on the hiring side of
| course, but ultimately the reason those kids aren't getting
| jobs is that they don't actually know what they're doing,
| because they were never forced to learn this stuff.
| nephanth wrote:
| According to the abstract it covers different ANN
| architectures, optimization algorithms, probably
| backpropagation... so, um, yes? That is stuff anyone in machine
| learning uses every day?
| danielmarkbruce wrote:
| Some people like to think and communicate in dense math
| notation. So, yes.
| godelski wrote:
| There's something I tell my students. You don't need math to
| make good models, but you do need to know math to know why your
| models are wrong.
|
| So yes, math is needed. If you don't have math you're going to
| hoodwink yourself into thinking you can get to AGI by scale
| alone. You'll just use transformers everywhere because that's
| what everyone else does and you'll get confused between
| activation functions. You'll make models, and models that work,
| but there's a big difference between having working models and
| knowing where to expect your models to fail and understanding
| their limitations.
|
| I feel a lot of people just look at test set results and expect
| that to mean the model isn't overfitting (not to mention
| tuning hyperparameters based on test set results).
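|
| To make that last point concrete, here is a rough, untested
| sketch of the split I mean (a toy with ridge regression instead
| of a neural net, just to illustrate): hyperparameters are tuned
| against a validation set, and the test set is touched exactly
| once at the end.
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     X, y = rng.normal(size=(1000, 5)), rng.normal(size=1000)
|
|     # shuffle once, then carve out train / validation / test
|     idx = rng.permutation(len(X))
|     train, val, test = idx[:600], idx[600:800], idx[800:]
|
|     def fit_ridge(X, y, lam):
|         # closed-form ridge regression: (X'X + lam*I)^-1 X'y
|         return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]),
|                                X.T @ y)
|
|     def mse(w, X, y):
|         return np.mean((X @ w - y) ** 2)
|
|     # hyperparameter search uses ONLY the validation set
|     best_lam = min([0.01, 0.1, 1.0, 10.0],
|                    key=lambda lam: mse(fit_ridge(X[train], y[train],
|                                                  lam), X[val], y[val]))
|
|     # the test set is evaluated once, after all choices are made
|     fit = np.concatenate([train, val])
|     w = fit_ridge(X[fit], y[fit], best_lam)
|     print(best_lam, mse(w, X[test], y[test]))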
| light_hue_1 wrote:
| Oh sure. I say the same to my students.
|
| But the particular spin on this book makes it look to non-
| experts as if this is the math you need to do something useful
| with deep learning. And that's just not true.
|
| Certainly you need to understand what you're optimizing, how
| your optimizer works, what your objective function is doing,
| etc. But the vast majority of people don't need to know about
| theoretical approximation results for problems that they will
| never actually encounter in real life, etc. For example, I
| have never used anything like "6.1.3 Lyapunov-type
| stability for GD optimization" in a decade of ML research.
| I'm sure people do! But not on the kinds of problems I work
| on.
|
| Just look at the comments here. People are complaining about
| the lack of context, but this is fine for the audience the
| book is aimed at; that audience just isn't the average HN
| reader.
|
| I think it would be better if the authors chose a different
| title. As it stands, non-experts will be attracted and then
| be put off, and experts will think the book is likely to be
| too generic.
| godelski wrote:
| Yeah I would have a very hard time recommending this book
| too. It is absurdly math heavy. I'm not sure I've ever seen
| another book this math dense, and I've read some pretty dense
| review-style books. So I'm not even sure what audience this
| book is aimed at. Citations? And I fully agree that the title
| doesn't fit whoever that audience is.
| joe_the_user wrote:
| _If you don't have math you're going to hoodwink yourself
| into thinking you can get to AGI by scale alone._
|
| There are very smart people who think we can get to AGI by
| scale alone - they call that the "scaling hypothesis", in
| fact. I think they're wrong but I thought they knew a fair
| amount of math.
|
| What math would you use to describe the limitations of deep
| learning? My impression is there aren't any exact theorems
| that describe either its limits or its
| behavior/possibilities; there are just suggestive theorems
| and constructions combined with heuristics.
| fastneutron wrote:
| In the latter part of the book that covers PINNs and other PDE
| methods, it helps to frame these using the same kind of
| functional analysis that is used to develop more traditional
| numerical methods. In this case, it provides a way for
| practitioners to verify the physical consistency between the
| various methods.
| reqo wrote:
| Is it common to publish books directly to ArXiv, especially books
| that have just been released?
| godelski wrote:
| It's not too uncommon to see books available online from an
| official location. At least for math and CS textbooks.
| nerdponx wrote:
| Normally I just see it on the author's website.
| godelski wrote:
| First time I've seen one of these books where I wished there were
| more words and less math. Usually it is quite the opposite. But
| this book seems written as if they wanted to avoid natural
| language at all costs.
| axpy906 wrote:
| This is in TensorFlow. I'd rather see a numpy version or
| something along those lines so that students can better
| understand what each step looks like in code.
|
| I concur with the comments noting the lack of explanation for
| the notation/lemmas/proofs.
| godelski wrote:
| I second this. Numpy would be the way to go, so students can
| switch to JAX or PyTorch trivially. Or they could use a mix:
| start with numpy, build the layers from scratch, then hand
| over to the abstraction. Pyro would be really good for this
| too.
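|
| Something like this is what I have in mind -- a rough,
| untested sketch (mine, not from the book) of a single dense
| layer in numpy, with the chain rule written out, before
| handing students the framework abstraction:
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|
|     class Dense:
|         def __init__(self, n_in, n_out):
|             self.W = rng.normal(0, np.sqrt(2.0 / n_in),
|                                 size=(n_in, n_out))
|             self.b = np.zeros(n_out)
|
|         def forward(self, x):
|             self.x = x                     # cache input for backward
|             return x @ self.W + self.b
|
|         def backward(self, grad_out, lr=1e-2):
|             grad_W = self.x.T @ grad_out   # dL/dW
|             grad_b = grad_out.sum(axis=0)  # dL/db
|             grad_x = grad_out @ self.W.T   # dL/dx, for the layer below
|             self.W -= lr * grad_W
|             self.b -= lr * grad_b
|             return grad_x
|
|     # one gradient step on a toy regression batch
|     layer = Dense(3, 1)
|     x, y = rng.normal(size=(8, 3)), rng.normal(size=(8, 1))
|     grad_loss = 2 * (layer.forward(x) - y) / len(x)  # grad of MSE
|     layer.backward(grad_loss)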
| _giorgio_ wrote:
| Tensorflow? LOL what is this, the year 2010?
| CamperBob2 wrote:
| Most of the examples I saw used Pytorch. (Which is still a step
| or two removed from the actual machinery, of course.)
| HybridCurve wrote:
| As someone who has a deeper knowledge of programming than of
| math, I find the mathematical notation here to be harder to
| understand than the code (even in a programming language I do not
| know).
|
| Does anyone with a stronger mathematical background here find
| the math as written easier to understand than the
| source code?
| joshuanapoli wrote:
| Mathematical notation usually has a problem with preferring
| single-letter names. We usually prefer to avoid highly
| abbreviated identifier names in software, because they make the
| program harder to read. But they're common in math, and I think
| that it makes for a lot of work jumping back and forth to
| remind oneself what each symbol means when trying to make sense
| of a statement.
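|
| A toy contrast (my own example, not from the book) of the same
| affine map written both ways:
|
|     import numpy as np
|
|     # "paper" style: single letters, compact but needs a legend
|     W, x, b = np.ones((2, 3)), np.ones(3), np.zeros(2)
|     y = W @ x + b
|
|     # "software" style: the names carry the meaning with them
|     weights, inputs, bias = W, x, b
|     activations = weights @ inputs + bias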
| aabajian wrote:
| All three authors are PhDs or PhD-candidates in mathematics.
| The notation is extremely dense. I'm curious who their target
| audience of "students and scientists" are for this book.
| angra_mainyu wrote:
| I had a bunch of classes in undergrad (physics) that had
| basically the same notation and style.
| layer8 wrote:
| Mathematical notation is more concise, which may take some
| getting used to. One reason is that it is optimized for
| handwriting. Handwriting program code would be very tedious, so
| you can see why mathematical notation is the way it is.
|
| Apart from that, there is no "the code" equivalent.
| Mathematical notation is for stating mathematical facts or
| propositions. That's different from the purpose of the code you
| would write to implement deep-learning algorithms.
| conformist wrote:
| Yes, it's easier for mathematicians, because a lot of
| background knowledge and intuition is encoded in mathematical
| conventions (e.g. "C(R)" for continuous functions on the reals,
| etc...). Note that this is probably a book for mathematicians.
| strangedejavu2 wrote:
| It's not too difficult to understand, but this introduction
| isn't written with pedagogy in mind IMO
| andrepd wrote:
| Obligatory hn comment on any math-related topic: "notation bad"
|
| Please be more original.
| outrun86 wrote:
| I'm just wrapping up a PhD in ML. The notation here is
| unnecessarily complex IMO. Notation can make things easier, or
| it can make things more difficult, depending on a number of
| factors.
| angra_mainyu wrote:
| Really? Coming from physics (B.Sc only) the notation is
| refreshingly familiar and straightforward. My topology and
| analysis classes were basically like this.
|
| In fact, this pdf is literally the resource I've been
| searching for, as many others are far too ambiguous and
| handwavy, focusing more on libraries and APIs than on what's
| going on behind the scenes.
|
| If only there were a similar one for microeconomics and
| macroeconomics, I'd have my curiosity satiated.
| youainti wrote:
| As a PhD econ student, to me the mathematics just comes down
| to solving constrained optimization problems. Figuring out
| what to consider as an optimand and the associated
| constraints is the real kicker.
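|
| The canonical toy version, just to illustrate what I mean, is
| consumer choice: maximize utility subject to a budget,
|
|     \max_x u(x) \quad \text{s.t.} \quad p \cdot x \le w ,
|
| and most of the real work is in deciding what u, p and the
| constraints should be for the question at hand.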
| tnecniv wrote:
| It depends on what you're doing. That is accurate for,
| say, describing the training of a neural network, but if
| you want to prove something about generalization, for
| example (which the book at least touches on from my
| skimming), you'll need other techniques as well
| ceh123 wrote:
| As someone that's in the later stages of a PhD in math, given
| the title starts with "Mathematical Introduction...", the
| notation feels pretty reasonable for someone with a background
| in math.
|
| Sure I might want some slight changes to the notation I found
| skimming through on my phone, but everything they define and
| the notation they choose feels pretty familiar and I understand
| why they did what they did.
|
| Mirroring what someone else said, this is exactly the kind of
| intro to deep learning I've been looking for.
| WhitneyLand wrote:
| Use ChatGPT.
|
| Screenshot the math, crop it down to the equation, paste into
| the chat window.
|
| It can explain everything about it, what each symbol means, and
| how it applies to the subject.
|
| It's an amazing accelerator for learning math. There's no more
| getting stuck.
|
| I think it's underrated because people hear "LLMs aren't good
| at math". They are not good at certain kinds of problem solving
| (yet), but GPT4 is a fantastic conversational tutor.
| tnecniv wrote:
| So this is a book written by applied mathematicians for
| applied mathematicians (they state in the preface it's for
| scientists, but some theoretical scientists and engineers are
| essentially applied mathematicians). As a result, both the
| topics and the presentation are biased towards those types of
| people. For example, I've never seen anyone in practice worry
| about the existence
| and uniqueness conditions for their gradient-based optimization
| algorithm in deep learning. However, that's the kind of result
| those people do care about and academic papers are written on
| the topic. The title does say that this is a book on the
| theoretical underpinnings of the subject, so I am not surprised
| that it is written this way. People also don't necessarily read
| these books cover-to-cover, but drill into the few chapters
| that use techniques relevant to what they themselves are
| researching. There was a similarly verbose monograph I used to
| use in my research, but only about 20-30 pages had the meat I
| was interested in.
|
| This kind of book is more verbose than I'd like, both in terms
| of rigor and content. For example, they include Gronwall's
| inequality as a lemma and prove it. The version that they use
| is a bit more general than the one I normally see, but
| Gronwall's inequality is a very standard tool in analyzing ODEs
| and I have rigorous control theory books that state it without
| proof to avoid clutter (they do provide a reference to a
| proof). A lot of this verbosity comes about when your standard
| of proof is high and the assumptions you make are small.
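|
| (For reference, the version I normally see is the basic
| integral form: if $u(t) \le \alpha + \int_a^t \beta(s)\,u(s)\,ds$
| with constant $\alpha \ge 0$ and continuous $\beta \ge 0$, then
| $u(t) \le \alpha \exp(\int_a^t \beta(s)\,ds)$; the book's lemma
| is a somewhat more general variant of this.)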
| spi wrote:
| Sharing my experience here. My background is in math (Ph.D.
| and a couple of postdoc years) before I switched to being a
| practitioner in deep learning. This year I taught a university
| class (as an invited professor) in deep learning for students
| doing a master's in math and statistics (but with some
| programming knowledge, too).
|
| I tried to present concepts in as mathematically accurate a
| way as reasonably possible, and in the end I cut out a lot of
| math, in part to avoid the heavy notation that seems to be
| present in this book (and in part to make sure students could
| apply what they learnt in industry). My actual
| classes had way more code than formulas.
|
| If you want to write everything very accurately, things get
| messy quickly. Finding a good notation for new concepts in
| math is very hard, something that only bright minds sometimes
| manage, even though afterwards everybody recognizes it was
| "clear" (think about Einstein notation, Feynman diagrams, etc.,
| or even just matrix notation, which Gauss was unaware of). If
| you just take domain A and write in notations from domain B,
| it's hard to get something useful (translating quantum
| mechanics to math with C* algebras and co. was a big endeavour,
| still an open research field to some extent).
|
| So I'll disagree with some of the comments below and claim that
| the effort of writing down this book was huge but probably
| scarcely useful. Whoever can comfortably read these equations
| probably won't need them (if you know what an affine
| transformation is, you hardly need to see all its ijkl indices
| written down explicitly for a 4-dimensional tensor), and
| everyone else will just be scared off. There might be a middle
| ground where it helps some people, but I at least haven't
| encountered them...
| HighFreqAsuka wrote:
| I've seen quite a few of these books attempting to explain deep
| learning from a mathematical perspective and it always surprises
| me. Deep learning is clearly an empirical science for the time
| being, and there is very little theoretical work that has been
| impactful enough that I would think to include it in a book.
| Of the books of this kind I've seen, this one actively seems
| like the worst. A
| significant amount of space is dedicated to proving lemmas that
| provide no additional understanding and are only loosely related
| to deep learning. And a significant chunk of the code I see is
| just the plotting code, which I don't even understand why you'd
| include. I'm confident that very few people will ever read
| significant chunks of this.
|
| I think the best textbooks are still Deep Learning by Goodfellow
| et al. and the more modern Understanding Deep Learning
| (https://udlbook.github.io/udlbook/).
| blauditore wrote:
| I think the mathematical background starts making sense once
| you get a good understanding of the topic, and then people make
| the wrong assumption that understanding the math will help
| with learning the overall topic, but that's usually pretty
| hard.
|
| Rather than trying to form an intuition based on the theory,
| it's often easier to understand the technicalities after
| getting an intuition. This is generally true in exact sciences,
| especially mathematics. That's why examples are helpful.
| danielmarkbruce wrote:
| UDL has some dense math notation in it.
|
| Math isn't just about proofs. It's a way to communicate. There
| are several different ways to communicate how a neural net
| functions. One is with pictures. One is with some code. One is
| with words. One is with some quite dense math notation.
| HighFreqAsuka wrote:
| I agree with that, I think UDL uses the necessary amount of
| math to communicate the ideas correctly. That is obviously a
| good thing. What it does not do is pretend to be presenting a
| mathematical theory of deep learning. Basically UDL is
| exactly how I think current textbooks should be presented.
| n3ur0n wrote:
| I would say UDL should be very accessible to any undergrad
| from a strong program.
|
| I would not call the notation 'dense'; rather, it's 'abused'
| notation. Once you have seen the abused notation enough
| times, it just makes sense. Aka "mathematical maturity"
| in the ML space.
|
| My views on this have changed: as a first-year PhD in ML I got
| annoyed by the shorthand. Now, as someone with a PhD, I get it
| -- it's just too cumbersome to write out exactly what you
| mean, and you write as if you're writing for peers +/- a
| level.
| thehappyfellow wrote:
| This book is not aimed at practitioners but I don't think that
| means it deserves to be called "actively the worst one".
|
| Even though the frontier of deep learning is very much
| empirical, there's interesting work trying to understand why
| the techniques work, not only which ones do.
|
| I'm sorry but saying proofs are not a good method for gaining
| understanding is ridiculous. Of course it's not great for
| everyone, but a book titled "Mathematical Introduction to x" is
| obviously for people with some mathematical training. For that
| kind of audience, lemmas and their proofs are a natural way of
| building understanding.
| HighFreqAsuka wrote:
| Just read the section on ResNets (Section 1.5) and tell me if
| you think that's the best way to explain ResNets to literally
| anyone. Tell me if, from that description, you take away that
| the reason skip connections improve performance is that they
| improve gradient flow in very deep networks.
| p1esk wrote:
| _the reason skip connections improve performance is that
| they improve gradient flow in very deep networks._
|
| Can you prove this statement?
| HighFreqAsuka wrote:
| Empirically yes, I can consider a very deep fully-
| connected network, measure the gradients in each layer
| with and without skip connections, and compare. I can do
| this across multiple seeds and run a statistical test on
| the deltas.
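|
| Roughly this kind of thing, as an untested sketch of my own in
| pytorch (names and sizes are arbitrary):
|
|     import torch
|     import torch.nn as nn
|
|     class DeepMLP(nn.Module):
|         def __init__(self, depth=32, width=64, skip=False):
|             super().__init__()
|             self.skip = skip
|             self.layers = nn.ModuleList(
|                 [nn.Linear(width, width) for _ in range(depth)])
|             self.head = nn.Linear(width, 1)
|
|         def forward(self, x):
|             for layer in self.layers:
|                 h = torch.relu(layer(x))
|                 x = x + h if self.skip else h  # skip connection on/off
|             return self.head(x)
|
|     def first_layer_grad_norm(skip, seed):
|         torch.manual_seed(seed)
|         model = DeepMLP(skip=skip)
|         x, y = torch.randn(128, 64), torch.randn(128, 1)
|         nn.functional.mse_loss(model(x), y).backward()
|         return model.layers[0].weight.grad.norm().item()
|
|     for skip in (False, True):
|         norms = [first_layer_grad_norm(skip, s) for s in range(5)]
|         print(skip, sum(norms) / len(norms))
|
| Then run a paired test on the per-seed differences (and look at
| deeper layers too, not just the first one).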
| ottaborra wrote:
| This makes me wonder. Is deep learning as a field an empirical
| science purely because everyone is afraid of the math? It has the
| richness of modern-day physics, but for some reason most of the
| practitioners seem to want to keep thinking of it as the wild
| west.
| HighFreqAsuka wrote:
| No, there are many very mathematically inclined deep learning
| researchers. It's an empirical science because the mathematical
| tools we possess are not sufficient to describe the phenomena
| we observe and make predictions under one unified theory. Being
| an empirical science does not mean that the field is a "wild
| west". Deep learning models are subjectable to repeatable
| controlled experiments, from which you can improve your
| understanding of what will happen in most cases. Good
| practitioners know this.
| ottaborra wrote:
| The main point you're making is fair
|
| The only gripe I have is > Being an empirical science does
| not mean that the field is a "wild west"
|
| I think what you meant to say is: "Being an empirical science
| does not _necessarily_ mean that the field is a "wild
| west""
|
| you clearly haven't seen the social sciences
|
| > Good practitioners know this
|
| sure?
|
| Edit: Removed unnecessary portions that wouldn't have
| continued the conversation in any meaningful way
| trhway wrote:
| >It's an empirical science because the mathematical tools we
| possess are not sufficient to describe the phenomena we
| observe and make predictions under one unified theory.
|
| To me, deep learning is actually itself a tool (which has
| well-established, and simple at that, math underneath:
| gradient-based optimization, vector-space representation and
| compression) for making good progress toward mathematical
| foundations of the empirical science of cognition.
|
| In the '90s there was work showing, for example, that the
| Gabor filters in the first layer of the biological visual
| cortex are optimal for the kind of feature-based image
| recognition we do. And as it happens, in visual NNs the
| convolution kernels in the first layers also converge to
| Gabor-like filters. I see [signs of] similar convergence in
| the other layers (and all those semantically meaningful vector
| operations in the embedding space in LLMs are also very
| telling). Proving optimality or something similar is much
| harder there, yet to me those "repeatable controlled
| experiments" (i.e. stable convergence) provide a strong
| indication that it will be the case (something does drive that
| convergence, and when there is such a drive in a dynamic
| system, you naturally end up asymptotically ("attracted") near
| something either fixed or periodic), and that would be a (or
| even "the") mathematical foundation for understanding
| cognition. (Divergence from real biological cognition, i.e.
| emergence of a completely different, yet comparable, type of
| cognition, would also be great, if not an even greater
| result.)
| tnecniv wrote:
| A little bit of A and B. You can do a lot with very little math
| beyond linear algebra, calculus, and undergraduate probability,
| and that knowledge is mainly there to provide intuition and
| formalize the problem that you're solving a bit. You can also churn
| out results (including very impressive ones) without doing any
| math.
|
| A result of the above is that people are empirically
| demonstrating new problems and solving them very quickly --
| much more quickly than people can come up with theoretical
| results explaining why they work. The theory is harder to come
| by for a few reasons, but many of the successful examples of
| deep learning don't fit nicely into the older frameworks (from,
| e.g., statistics and optimal control) that would explain them
| well.
| runsWphotons wrote:
| I like this book and everyone complaining about the math and math
| notation is a silly goose.
___________________________________________________________________
(page generated 2024-01-01 23:00 UTC)