[HN Gopher] The Modern Mathematics of Deep Learning
___________________________________________________________________
The Modern Mathematics of Deep Learning
Author : tims457
Score : 150 points
Date : 2021-06-12 16:37 UTC (6 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| joe_the_user wrote:
| It seems like this can leave the reader with the wrong
| impression. Calculus really is "the mathematics of Newtonian
| physics". This is just "some mathematics that might help a bit in
| your intuitions of deep learning".
|
| I.e., deep learning is fundamentally just about getting the
| mathematically simple but complex and multi-layered "neural
| networks" to do stuff: training them, testing them and deploying
| them. There are many intuitions about these things but there's no
| complete theory - some intuitions involve mathematical analogies
| and simplifications while others involve "folk knowledge" or
| large-scale experiments. And that's not saying folks giving math
| about deep learning aren't proving real things. It's just that
| they aren't characterizing the whole, or even a substantial part,
| of such systems.
|
| It's not surprising that a complex system like a many-layered
| ReLU network can't be fully characterized or solved
| mathematically. You'd
| expect that of any arbitrarily complex algorithmic construct.
| Differential equations of many variables and arbitrary functions
| also can't have their solutions fully characterized.
| fogof wrote:
| As a PhD student who sort of burned out on this type of
| research, I agree that the complexity of Neural Networks as a
| mathematical construct makes them very difficult to analyze.
| This might also have to do with Deep learning theory being a
| subset of learning theory which is subject to "No Free Lunch"
| [1], which means that you always have to be very careful not to
| try to prove something that turns out to be impossible.
|
| That being said, research on the Kernel regime is one of the
| very cool ideas, in my opinion, to gain traction in this field
| in the past few years. To summarize: "If you make a neural
| network wide enough, it gains the power to control its output
| on each individual input separately, and will begin to fit its
| training data perfectly". Of course, the real pleasure is in
| understanding all the mathematical details of this statement!
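|
| To make that concrete, here's a toy sketch of the flavour (not
| the NTK derivation itself; assumes numpy): freeze a huge random
| ReLU feature layer and least-squares-fit only the output layer,
| and even arbitrary labels get fit essentially exactly.
|
|     import numpy as np
|
|     # Toy version of "wide enough to fit the training data
|     # perfectly": a one-hidden-layer net whose first layer is
|     # random and frozen. With far more hidden units than training
|     # points, fitting the output layer alone interpolates the data.
|     rng = np.random.default_rng(0)
|     n, d, width = 50, 5, 5000      # 50 points, hugely over-parameterized
|
|     X = rng.normal(size=(n, d))
|     y = rng.normal(size=n)         # arbitrary labels, even pure noise
|
|     W = rng.normal(size=(d, width)) / np.sqrt(d)
|     features = np.maximum(X @ W, 0.0)   # frozen random ReLU features
|
|     a, *_ = np.linalg.lstsq(features, y, rcond=None)
|     print(np.abs(features @ a - y).max())   # near machine precision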
|
| [1] : https://en.wikipedia.org/wiki/No_free_lunch_theorem
| joe_the_user wrote:
| I got my master's years ago so now I'm a strict amateur. That
| said, I don't think the "no free lunch theorem" is very
| "interesting". It's nearly tautological that no approximation
| method works for "any" function: the set of
| predictable/interesting/useful/"real-world" functions is going
| to have measure 0 compared to white noise, so "any function"
| will basically look like white noise and can't be predicted.
| Approximating functions/sequences with vanishingly low
| Kolmogorov complexity is more interesting - impossible in
| general by Gödel's theorem, but what's the case "on average"?
| (That depends on the choice process and so is ill-defined, but
| defining it might be interesting.) The kernel regime stuff
| looks interesting but I don't know its relation to wide
| networks.
|
| Neural networks "tend to generalize well in the real world".
| That's a pretty fuzzy statement imo, since "real world" is
| hardly defined, but it's still what people experience, and it's
| more useful to provide a more precise model where this works
| than a model where this doesn't work.
|
| Also, there's good theory on deep networks as universal
| approximators as well as theories of wide/shallow networks [1].
|
| [1]: https://arxiv.org/abs/1901.02220
| roenxi wrote:
| > Neural networks "tend to generalize well in the real
| world".
|
| I've always interpreted that as "we've found an algorithm
| that could, given a foreseeable amount of computing power
| and maybe some tweaks, simulate human decision making".
|
| It isn't so much that neural networks can approximate the
| real world as they can approximate human perception of the
| real world.
| jhrmnn wrote:
| There are a few works that try to put deep learning on some
| theoretical basis; I like this one, for example:
|
| https://arxiv.org/abs/1703.00810
|
| This goes beyond mere intuition, but it is also still very far
| from a "complete theory".
|
| I find it disappointing that so few people in deep learning
| work on the theoretical foundations.
| quibono wrote:
| What are some subfields of mathematics that you would say are
| crucial for gaining a proper understanding of all the things
| related to deep learning (e.g. let's say the paper you
| linked)? Even though the theory isn't complete, I'm sure a
| grounding in certain fields of mathematics will be helpful.
| iNic wrote:
| This is always difficult to answer, and it will probably be
| a mixture of many; however, I am currently following
| categorical approaches to machine learning. Category theory
| is the area of mathematics that studies composable
| structures, such as layers in a deep network. It is very
| abstract and was invented to solve problems in algebraic
| topology, but has been fruitful in other areas as well.
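|
| As a very concrete (and very shallow) illustration of "layers
| as composable structures", you can treat layers as plain
| functions and a network as their composition - my own sketch,
| not taken from the categorical-ML papers; assumes numpy:
|
|     import numpy as np
|
|     # Layers as maps between vector spaces; a network is just their
|     # composition. Composition is associative and has identities,
|     # which is the minimal structure the categorical view starts from.
|     def dense(W, b):
|         return lambda x: np.maximum(W @ x + b, 0.0)   # ReLU layer
|
|     def compose(f, g):
|         return lambda x: g(f(x))                      # g after f
|
|     rng = np.random.default_rng(0)
|     layer1 = dense(rng.normal(size=(16, 8)), np.zeros(16))   # R^8 -> R^16
|     layer2 = dense(rng.normal(size=(4, 16)), np.zeros(4))    # R^16 -> R^4
|     net = compose(layer1, layer2)                             # R^8 -> R^4
|     print(net(rng.normal(size=8)).shape)                      # (4,)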
| convexity123 wrote:
| Could you give some favourite references, some use of
| category theory in ML which gives good results compared
| to standard approaches?
|
| Is there a group doing this in Zurich?
| 317070 wrote:
| Dynamical systems and chaos theory (especially for neural
| networks), information theory (especially for the paper
| linked), probability theory (especially the more
| foundational and axiomatic work)
| 0-_-0 wrote:
| Of the many "understanding neural networks" papers this is
| one of the few valuable ones.
| keithalewis wrote:
| Agreed. Until we get to the point where there are theorems of
| the form, for example, "Given a problem satisfying conditions
| X, the optimal number of layers to minimize expected training
| time for data satisfying Y is Z", it is just stamp
| collecting.
| conformist wrote:
| It seems like it aims at giving somebody who would like to get
| started doing theoretical research in the field some pointers
| and basic insights. I don't think it does a particularly bad
| job at this, in particular given that it will be a book
| chapter? The target audience is probably people who have had
| some exposure to Functional Analysis and the like before.
| rohittidke wrote:
| I believe that the curse of dimensionality doesn't apply here, as
| we are optimizing the "universal approximator" of the "surface"
| of the possible real-world function.
| antipaul wrote:
| Does "possible" in your statement refer to the inherent
| constraints of the architecture as specified by the researcher,
| or something else?
| amelius wrote:
| What are the prerequisites?
| fspeech wrote:
| Mostly analysis. If you understand the section 1 notation, you
| are obviously set. But even if you don't, you should still be
| able to get the ideas with a bit of mental translation. In short,
| the notation seemed unnecessarily heavy for the level of
| discussion.
| 0-_-0 wrote:
| Deep learning papers often use math in a way that obscures
| rather than enlightens. And when you finally understand what
| they are saying, you realize it's not interesting at all, or
| they made a mistake in the math.
| thanksok wrote:
| Looks like a little bit of everything except the likes of
| abstract algebra, logic, category theory.
|
| These include linear algebra, graph theory, probability,
| algorithms, mathematical analysis, topology, differential
| geometry. But the most important prereqs are math maturity and
| mental toughness/endurance.
| SilurianWenlock wrote:
| mental toughness/endurance haha!
| keithalewis wrote:
| Mind reading. They use terminology without defining it or
| giving a reference.
| beforeolives wrote:
| Seriously, I'm struggling to understand things that I already
| know.
| sundarurfriend wrote:
| Any examples? I haven't come across anything like that yet,
| but I'm only a short way into the article.
| keithalewis wrote:
| The terms "measurable" and "tempered" for starters.
| ganzuul wrote:
| For the latter maybe this?
| https://en.wikipedia.org/wiki/Parallel_tempering
| cpp_frog wrote:
| While I can't give the _exact_ prerequisites, I know that all
| of the things that appear in the paper relate to:
|
| (1) Linear Algebra
|
| (2) Optimization Theory (Convex Analysis, non-convex
| optimization) [0], [2]
|
| (3) Probability Theory and Statistics (Measure Theory,
| Multivariate Statistics) [1], [3], [4], [5]
|
| (4) Analysis, to a lesser extent. (2) and (3) are the most
| important.
|
| I would give more references, but my background is too
| theoretical (my field is numerical analysis of PDEs). Having
| taken three or four college classes on each of (1)-(4), a
| person with a similar background can recognize the tools
| without much digging. Maybe some folks here can provide
| insights into books that center on applications, so I'm trying
| not to diverge into too much theory (e.g. for measures, [4]
| instead of Folland). There also seems to be good use of
| analysis techniques in the paper; see theorem 2.1.
|
| I love that the paper references the Moore-Penrose pseudo-
| inverse, an object of study in both statistics and optimization
| on which I once had to give a lecture for a course.
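|
| As a tiny illustration of why it straddles both fields (my own
| sketch, assumes numpy): the pseudo-inverse gives the minimum-
| norm least-squares solution of a linear system, i.e. ordinary
| linear regression.
|
|     import numpy as np
|
|     # The Moore-Penrose pseudo-inverse A+ yields the minimum-norm
|     # least-squares solution of A x = b, which is why it appears in
|     # both statistics (linear regression) and optimization.
|     rng = np.random.default_rng(0)
|     A = rng.normal(size=(10, 3))   # over-determined system
|     b = rng.normal(size=10)
|
|     x_pinv = np.linalg.pinv(A) @ b
|     x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
|     print(np.allclose(x_pinv, x_lstsq))   # True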
|
| [0] https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf
| _Convex Optimization_ , Boyd and Vandenberghe
|
| [1] _An Introduction to Multivariate Statistical Analysis_ ,
| Anderson
|
| [2] _Convex Analysis and Monotone Operator Theory in Hilbert
| Spaces_ , Bauschke-Combettes
|
| [3] _Theory of Multivariate Statistics_ , Bilodeau-Brenner
|
| [4] _The Elements of Integration and Lebesgue Measure_ , Bartle
|
| [5] _Probability: Theory and Examples_ , Durrett
| godelski wrote:
| I skimmed it. Looks like just some basic calc and linear
| algebra. Nothing that crazy.
| pcbro141 wrote:
| Tangent, but has anyone taken Fast.ai or similar courses and
| transitioned into the Deep Learning/ML field without an MS/PhD?
| To be honest, I don't even know what 'doing ML/DL' looks like in
| practice, but I'm just curious if a lot of folks get into the
| field without graduate degrees.
| mustafa_pasi wrote:
| You can learn all you need to know in 2 to 3 university level
| courses. So we are talking less than a year of university
| courses.
|
| Fast.ai is too high level. I don't like it. You would be better
| served taking actual university courses. A few days ago people
| linked to LeCun's university class[1]. This is a solid
| introduction. Does not cover everything but that is OK. Seems
| like it is missing Bayesian approaches. Then if you want to
| specialize in vision or speech or robotics or whatever, you
| take special classes on that topic and learn all the SOTA
| techniques. Then you are ready to do research already, or apply
| your knowledge to build stuff. Of course you still have to
| learn how to do real machine learning, which involves all the
| data manipulation stuff, but that is learned by doing.
|
| [1] https://cds.nyu.edu/deep-learning/
| akgoel wrote:
| I am in a Fintech boot camp, and it's clear that doing ML/DL
| requires very little math, as the math is all abstracted away.
| catillac wrote:
| The problem with this view is that once one gets stuck, which
| happens very quickly when one is doing the work for real, one
| doesn't have any tools to debug anything except at the most
| basic level, and most probably doesn't understand anything
| intuitively enough to even reason about what the underlying
| problem could be.
|
| I don't do this work myself, but we've hired many interns
| from bootcamps to do ML, and ones from college with ML
| projects. The bootcamp grads with no additional background
| have almost universally hit hard walls once anything gets
| more complex than using Keras to glue together layers. It's
| given me the impression, anecdotally, that bootcamps are
| largely predatory, taking one's money and providing only a
| veneer of knowledge in the area. This doesn't seem to apply
| to people with a CS or math background who took an ML
| bootcamp to add that dimension to their already-mathematical
| skillset. But people who have, again only anecdotally in my
| experience with an n of perhaps only 20, taken a bootcamp to
| reskill from a totally unrelated and perhaps qualitative
| field have not had success with the bootcamp alone; they have
| had success in doing what the above poster recommended:
| taking university courses in the area.
|
| Very respectfully, if you're in a boot camp right now, you're
| unlikely to be deep enough into the day-to-day work of ML to
| make the assertion you're making.
| maxwells-daemon wrote:
| I think it depends! If you want to zoom out and take the
| "systems view" using standard components, then you probably
| don't need much math. If you want to develop new
| architectures or algorithms, then you definitely will. The
| well-trodden paths of ML might have most of their math
| abstracted away, but in my experience every time you get
| close to the frontiers, people are using math to understand
| what's going on or develop new approaches.
| hogFeast wrote:
| It also doesn't really work if you have to tackle a new
| problem.
|
| I stopped studying maths well before university. I am not
| some kind of math super genius. But working on my own
| stuff, which did involve new problems, I was up the creek
| fairly quickly without a solid mathematical understanding
| of the techniques I was trying to use.
|
| I don't think the bar is particularly high here. Solid
| understanding of stats, ESL...but I have seen people
| shotgunning models (I did this years ago too), and that
| isn't going to work very long.
|
| Also, I don't really understand why you wouldn't study some
| of this stuff. Maths as taught in schools treats you like a
| meat calculator...that isn't fun. But if you are interested
| in ML, going through Stats, Linear Algebra...it is pretty
| interesting because there are so many clear connections
| with your work.
| maxwells-daemon wrote:
| Not Fast.ai, but I self-studied ML during undergrad (mostly
| from books) and am currently working as an ML research
| scientist.
|
| That being said, I'm also thinking about starting an ML PhD
| because it does honestly open more doors to top research
| groups.
| tmabraham wrote:
| I took the fast.ai course and now I am doing a Ph.D. in
| Biomedical Engineering focused on applying deep learning to
| microscopy.
|
| I don't think fast.ai is enough if you want to do theoretical
| research in deep learning, but it certainly provides enough to
| work on practical problems with deep learning. That said, many
| of us in the fastai community are able to delve deep into,
| understand, and implement recent deep learning papers and even
| develop novel techniques. So I think that with a little extra
| studying, one could easily transition to core deep learning
| research.
| TrackerFF wrote:
| One example I can come up with now - image classification /
| segmentation / regression problems.
|
| Unfortunately, not all data is available or provided in a data-
| friendly format - sometimes all you get are image files or
| similar. Maybe you want to read some value off these images,
| count objects, or whatever - which traditionally has been done
| by trained/skilled workers.
|
| With CNNs, it _can_ be a trivial task to implement models for
| solving the above problems. That's time and money saved for a
| business.
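|
| As a rough sketch of what that can look like in practice
| (hypothetical image folder and class count, assumes
| TensorFlow/Keras; just the usual "glue layers together" recipe,
| not anything from the linked paper):
|
|     import tensorflow as tf
|
|     # Minimal CNN classifier: read labelled images from a folder and
|     # predict which of a few classes each belongs to - a job that
|     # previously required a trained worker to eyeball each image.
|     train = tf.keras.utils.image_dataset_from_directory(
|         "images/train", image_size=(128, 128), batch_size=32)
|
|     model = tf.keras.Sequential([
|         tf.keras.layers.Rescaling(1.0 / 255),
|         tf.keras.layers.Conv2D(16, 3, activation="relu"),
|         tf.keras.layers.MaxPooling2D(),
|         tf.keras.layers.Conv2D(32, 3, activation="relu"),
|         tf.keras.layers.MaxPooling2D(),
|         tf.keras.layers.Flatten(),
|         tf.keras.layers.Dense(64, activation="relu"),
|         tf.keras.layers.Dense(3),   # e.g. 3 object classes
|     ])
|     model.compile(
|         optimizer="adam",
|         loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
|         metrics=["accuracy"])
|     model.fit(train, epochs=5)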
___________________________________________________________________
(page generated 2021-06-12 23:00 UTC)