[HN Gopher] Understanding Machine Learning: From Theory to Algor...
___________________________________________________________________
Understanding Machine Learning: From Theory to Algorithms
Author : Anon84
Score : 163 points
Date : 2025-04-04 18:25 UTC (4 hours ago)
(HTM) web link (www.cs.huji.ac.il)
(TXT) w3m dump (www.cs.huji.ac.il)
| TechDebtDevin wrote:
| Anyone who wants to demystify ML should read: The StatQuest
| Illustrated Guide to Machine Learning [0] By Josh Starmer. To
| this day I haven't found a teacher who could express complex
| ideas as clearly and concisely as Starmer does. It's written in
| an almost children's-book-like format that is very easy to read
| and understand. He also just published a book on NNs that is just
| as good. Highly recommend even if you are already an expert, as it
| will give you great ways to teach and communicate complex ideas
| in ML.
|
| [0]: https://www.goodreads.com/book/show/75622146-the-
| statquest-i...
| joshdavham wrote:
| I haven't read that book, but I can personally attest to Josh
| Starmer's StatQuest YouTube channel[1] being awesome! I used
| his lessons as a supplement to my studies when I was studying
| statistics in uni.
|
| [1]: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw
| gavinray wrote:
| This is the 2nd or 3rd time in the last few weeks I've seen
| this person recommended. Must be something to that.
| j_bum wrote:
| He's great. I learned a ton from him when I was starting my
| computational biology studies as a grad student.
| RealityVoid wrote:
| I have it on my bookshelf! I bought it on a whim, used, along
| with other CS books, but didn't realize it was that good! I will
| try reading it. Thanks.
| kenjackson wrote:
| I would've thought that NN and ML would be taught together.
| Does he assume with the NN book that you already have a certain
| level of ML understanding?
| m11a wrote:
| Most ML is disjoint from the current NN trends, IMO. Compare
| Bishop's PRML to his Deep Learning textbook. First couple
| chapters are copy+paste preliminaries (probability,
| statistics, Gaussians, other maths background), and then they
| completely diverge. I'm not sure how useful classical ML is
| for understanding NNs.
| xmprt wrote:
| That's fair. My understanding is that NNs and ML are similar
| insofar as they are both about minimizing a loss value
| (like negative log likelihood). But the methods of doing that
| are very different, and once you get even more advanced, NN
| concepts feel like a completely different universe.
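|
| For example (a toy NumPy sketch of my own, not from either
| book), "minimizing a negative log likelihood" for logistic
| regression looks like this; a neural net optimizes the same
| kind of objective, just with a far more complex model:
|
|   import numpy as np
|
|   rng = np.random.default_rng(0)
|   X = rng.normal(size=(200, 2))                # toy features
|   y = (X[:, 0] + X[:, 1] > 0).astype(float)    # toy labels
|
|   w = np.zeros(2)
|   lr = 0.1
|   for _ in range(500):
|       p = 1 / (1 + np.exp(-X @ w))           # sigmoid predictions
|       p = np.clip(p, 1e-12, 1 - 1e-12)       # numerical safety
|       nll = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
|       grad = X.T @ (p - y) / len(y)          # gradient of the NLL
|       w -= lr * grad                         # gradient descent step
|
|   print(f"final NLL: {nll:.4f}, weights: {w}")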
| grep_it wrote:
| Thanks for the recommendation. Purchased both of them!
| cubefox wrote:
| I read parts of it years ago. As far as I remember, this is
| very theoretical (lots of statistical learning theory, including
| some IMHO mistaken treatment of Vapnik's theory of structural
| risk minimization), with a strong focus on theory and basically
| zero focus on applications, which would be completely outdated by
| now anyway, as the book is from 2014, an eternity in AI.
|
| I don't think many people will want to read it today. As far as I
| know, mathematical theories like SLT have been of little use for
| the invention of transformers or for explaining why neural
| networks don't overfit despite large VC dimension.
|
| Edit: I think the subtitle "From Theory to Algorithms" sums up
| what was wrong with this theory-first approach. Basically, people
| with an interest in math but no interest in software
| engineering got interested in ML and invented various abstract
| "learning theories", e.g. statistical learning theory (SLT),
| which had very little to do with what you can do in practice.
| Meanwhile, engineers ignored those theories, got their hands
| dirty on actual neural network implementations, and tried to
| figure out how their performance could be improved, which led to
| things like CNNs and later transformers.
|
| I remember Vapnik (the V in VC dimension) complaining in the
| preface to one of his books about the prevalent (alleged)
| extremism of focusing on practice only while ignoring all those
| beautiful math theories. As far as I know, it has now turned out
| that these theories were just far too weak to explain the actual
| complexity of approaches that do work in practice. It has become
| clear that machine learning is a branch of engineering, not
| a branch of mathematics or theoretical computer science.
|
| The title of this book encapsulates the mistaken hope that people
| would first learn those abstract learning theories, get
| inspired, and promptly invent new algorithms. But that's not what
| happened. SLT is barely able to model supervised learning, let
| alone reinforcement learning or self-supervised learning. As I
| mentioned, it can't even explain why neural networks are robust
| to overfitting. Other learning theories (like
| computational/algorithmic learning theory, or fantasy stuff like
| Solomonoff induction / Kolmogorov complexity) are even more
| detached from reality.
| lamename wrote:
| I watched a discussion the other day on this "NNs don't overfit"
| point. I realize that, yes, certain aspects are surprising, and
| in many cases, with the right size and diversity in a dataset,
| scaling laws prevail. But my experience and impression with real
| datasets, training from scratch (not fine-tuning pretrained
| models), has always been that NNs definitely can overfit if you
| don't have large quantities of data. My gut assumption is that
| the original theories were never demonstrated to hold in certain
| circumstances (i.e. for certain dataset characteristics), but
| that's never mentioned in shorthand these days, when dataset
| size is often assumed to be huge.
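|
| A quick way to see this for yourself (a toy sklearn sketch of my
| own, not tied to any of the books in this thread): fit an
| oversized MLP on a few dozen noisy points and compare train vs.
| test accuracy.
|
|   from sklearn.datasets import make_classification
|   from sklearn.model_selection import train_test_split
|   from sklearn.neural_network import MLPClassifier
|
|   # tiny, noisy dataset: the regime where overfitting bites
|   X, y = make_classification(n_samples=80, n_features=20,
|                              n_informative=5, flip_y=0.2,
|                              random_state=0)
|   X_tr, X_te, y_tr, y_te = train_test_split(
|       X, y, test_size=0.5, random_state=0)
|
|   # deliberately oversized network, no regularization
|   clf = MLPClassifier(hidden_layer_sizes=(256, 256), alpha=0.0,
|                       max_iter=2000, random_state=0)
|   clf.fit(X_tr, y_tr)
|
|   print("train acc:", clf.score(X_tr, y_tr))  # typically ~1.0
|   print("test acc: ", clf.score(X_te, y_te))  # typically much lower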
|
| (Before anyone laughs this off, this is still an actual problem
| in the real world for non-FAANG companies who have niche
| problems or cannot use open-but-non-commercial datasets. Not
| everything can be solved with foundational/frontier models.)
|
| Please point me to these papers because I'm still learning.
| cubefox wrote:
| Yes, they can overfit. SLT assumed that this is caused by
| large VC dimension, which apparently isn't true, because there
| exist various techniques/hacks that effectively combat
| overfitting while not actually reducing the very large VC
| dimension of those neural networks. Basically, the theory
| predicts they always overfit, while in reality they mostly
| work surprisingly well. That's often the case in ML
| engineering: people discover that some things work well and
| others don't, without being exactly sure why. The famous
| Chinchilla scaling law was an empirical discovery, not a
| theoretical prediction, because theories like SLT are far too
| weak to make interesting predictions like that. Engineering
| is basically decades ahead of those pure-theory learning
| theories.
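|
| (For reference, a back-of-the-envelope sketch of that empirical
| result, using the commonly quoted approximations rather than
| anything from this book or thread: training compute is roughly
| C = 6*N*D FLOPs for N parameters and D tokens, and the
| compute-optimal ratio came out to roughly 20 tokens per
| parameter.)
|
|   def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
|       """Rough compute-optimal split, assuming C ~ 6*N*D and
|       the ~20 tokens/parameter rule of thumb."""
|       # C = 6*N*D and D = r*N  =>  N = sqrt(C / (6*r))
|       n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
|       n_tokens = tokens_per_param * n_params
|       return n_params, n_tokens
|
|   n, d = chinchilla_optimal(1e23)  # e.g. a 1e23-FLOP budget
|   print(f"~{n/1e9:.0f}B params, ~{d/1e9:.0f}B tokens")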
|
| > Please point me to these papers because I'm still learning.
|
| Not sure which papers you have in mind. To be clear, I'm not
| an expert, just an interested layman. I just wanted to
| highlight the stark difference between the apparently failed
| pure math approach I learned years ago in a college class,
| and the actual ML papers that are released today, with major
| practical breakthroughs on a regular basis. Similarly
| practical papers were always available, just from very
| different people, e.g. LeCun or people at DeepMind, not from
| theoretical computer science department people who wrote
| textbooks like the one here. Back in the day it wasn't very clear
| (to me) that those practice guys were really onto something
| while the theory guys were a dead end.
| kadushka wrote:
| Theory is still needed if you want to understand things like
| variational inference (which is in turn needed to understand
| things like diffusion models). It's just like physics - you
| need math theories to understand things like quantum mechanics,
| because otherwise it might not make sense.
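|
| (For reference, the central object in variational inference is
| the evidence lower bound, or ELBO, written here in standard
| notation; diffusion-model derivations are built on the same
| kind of bound:)
|
|   \log p_\theta(x) \;\ge\;
|     \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
|     - \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)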
| janis1234 wrote:
| The book is 10 years old; isn't it outdated?
| 0cf8612b2e1e wrote:
| Have not read the book, but only deep learning has advanced so
| wildly that a decade would change anything. The fundamentals of
| ML training/testing, variance/bias, etc. are the same. The
| classical algorithms still have their place. The only modern
| advancement that might not be present would be XGBoost-style
| forests.
| TechDebtDevin wrote:
| Machine Learning concepts have been around forever, they just
| used to call them statistics ;0
| antegamisou wrote:
| Nope, and AIMA/PRML/ESL are still king!
|
| Apart from these 3 you literally need nothing else for the very
| fundamentals and even advanced topics.
| nyrikki wrote:
| Even Russell and Norvig is still applicable for the
| fundamentals, and with the rise of agentic efforts it would be
| extremely helpful.
|
| The updates to even the Bias/Variance Dilemma (Geman 1992) are
| minor if you look at the original paper:
|
| https://www.dam.brown.edu/people/documents/bias-variance.pdf
|
| They were dealing with small datasets or infinite datasets, and
| double descent only really works when the patterns in your test
| set are similar enough to those in your training set.
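|
| (For reference, the decomposition at the heart of that paper, in
| the standard form for squared error, where y = f(x) + noise with
| noise variance sigma^2:)
|
|   \mathbb{E}\big[(y - \hat{f}(x))^2\big]
|     = \big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2
|     + \mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]
|     + \sigma^2
|   \quad\text{(bias}^2 + \text{variance} + \text{noise)}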
|
| While you do need to be mindful about some of the older
| opinions, the fundamentals are the same.
|
| For fine-tuning or RL, where you face the same problems with
| small datasets or infinite datasets, and where concept classes
| in the training data may be novel, that 1992 paper still applies
| and will bite you if you assume it is universally invalid.
|
| Most of the foundational concepts are from the mid 20th
| century.
|
| The availability of mass amounts of data and new discoveries
| have modified the assumptions and tooling way more than
| invalidating previous research. Skim that paper and you will
| see they simply dismissed the mass data and compute we have
| today as impractical at the time.
|
| Find the book that works best for you, learn the concepts and
| build tacit experience.
|
| Lots of efforts are trying to incorporate symbolic and other
| methods too.
|
| IMHO building breadth and depth is what will save time and help
| you find opportunities, and knowledge of the fundamentals is
| critical for that.
| janalsncm wrote:
| Depends on what your goal is. If you're just curious about ML,
| probably none of the info will be wrong. But it's also really
| not engaging with the most interesting problems engineers are
| tackling today, unlike an 11-year-old chemistry book, for
| example (I think). So as interview material, or to break into
| the field, it's not going to be the most useful.
| cubefox wrote:
| I have read parts of it. It arguably was already "outdated"
| back then, as it mostly focused on abstract mathematical theory
| of questionable value instead of cutting edge "deep learning".
| mikedelfino wrote:
| Any recommendations?
| pajamasam wrote:
| I would recommend https://udlbook.github.io/udlbook/ instead if
| you're looking to learn about modern generative AI.
| smath wrote:
| +1 for Simon Prince's UDL book. Very clearly written.
| miltava wrote:
| Thanks for the recommendation. Have you looked at Bishop's Deep
| Learning book (https://www.bishopbook.com/)? How would you
| compare both? Thanks again
| m11a wrote:
| You'll be happy with either. Bishop's approach is
| historically more mathematical (cf his 2006 PRML text), and
| you see that in the preliminaries chapters of Deep Learning,
| but there's less of this as the book goes on.
|
| I've read chapters from both. Much of the material overlaps, but sometimes
| one book or the other explains a concept better or provides
| different perspectives or details.
| johnsutor wrote:
| https://bloomberg.github.io/foml/#home This course is my personal
| favorite.
| joshdavham wrote:
| What other books do people recommend?
___________________________________________________________________
(page generated 2025-04-04 23:00 UTC)