[HN Gopher] Understanding Machine Learning: From Theory to Algor...
       ___________________________________________________________________
        
       Understanding Machine Learning: From Theory to Algorithms
        
       Author : Anon84
       Score  : 163 points
       Date   : 2025-04-04 18:25 UTC (4 hours ago)
        
 (HTM) web link (www.cs.huji.ac.il)
 (TXT) w3m dump (www.cs.huji.ac.il)
        
       | TechDebtDevin wrote:
       | Anyone who wants to demystify ML should read: The StatQuest
       | Illustrated Guide to Machine Learning [0] By Josh Starmer. To
       | this day I haven't found a teacher who could express complex
        | ideas as clearly and concisely as Starmer does. It's written in
        | an almost children's-book-like format that is very easy to read
        | and understand. He also just published a book on neural networks
        | that is just as good. I highly recommend it even if you are
        | already an expert, as it
       | will give you great ways to teach and communicate complex ideas
       | in ML.
       | 
       | [0]: https://www.goodreads.com/book/show/75622146-the-
       | statquest-i...
        
         | joshdavham wrote:
         | I haven't read that book, but I can personally attest to Josh
          | Starmer's StatQuest YouTube channel[1] being awesome! I used
          | his lessons as a supplement when I was studying statistics in
          | uni.
         | 
         | [1]: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw
        
           | gavinray wrote:
           | This is the 2nd or 3rd time in the last few weeks I've seen
           | this person recommended. Must be something to that.
        
             | j_bum wrote:
             | He's great. I learned a ton from him when I was starting my
             | computational biology studies as a grad student.
        
         | RealityVoid wrote:
          | I have it on my bookshelf! I bought it used, on a whim, along
          | with other CS books, but didn't realize it was that good! I
          | will try reading it. Thanks.
        
         | kenjackson wrote:
         | I would've thought that NN and ML would be taught together.
         | Does he assume with the NN book that you already have a certain
         | level of ML understanding?
        
           | m11a wrote:
           | Most ML is disjoint from the current NN trends, IMO. Compare
            | Bishop's PRML to his Deep Learning textbook. The first couple
            | of chapters are copy-paste preliminaries (probability,
            | statistics, Gaussians, other maths background), and then they
           | completely diverge. I'm not sure how useful classical ML is
           | for understanding NNs.
        
             | xmprt wrote:
              | That's fair. My understanding is that NNs and classical ML
              | are similar insofar as they are both about minimizing a
              | loss value (like the negative log likelihood). But the
              | methods for doing that are very different, and once you get
              | more advanced, NN concepts feel like a completely different
              | universe.
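              | 
              | To make that concrete, here's a toy sketch of my own (not
              | from any of the books mentioned; the data and numbers are
              | made up for illustration): logistic regression and a tiny
              | neural net both minimize the same negative log likelihood
              | by gradient descent, and only the model, and hence the
              | gradient computation, differs.
              | 
              |     import numpy as np
              | 
              |     rng = np.random.default_rng(0)
              |     X = rng.normal(size=(200, 2))
              |     y = (X[:, 0] + X[:, 1] > 0).astype(float)  # toy labels
              | 
              |     def sigmoid(z):
              |         return 1.0 / (1.0 + np.exp(-z))
              | 
              |     def nll(p, y):
              |         # negative log likelihood of Bernoulli labels
              |         eps = 1e-9
              |         return -np.mean(y * np.log(p + eps)
              |                         + (1 - y) * np.log(1 - p + eps))
              | 
              |     # classical ML: logistic regression, gradient descent on the NLL
              |     w = np.zeros(2)
              |     for _ in range(500):
              |         p = sigmoid(X @ w)
              |         w -= 0.1 * X.T @ (p - y) / len(y)  # grad of NLL w.r.t. w
              | 
              |     # "NN": one hidden layer, same loss, gradients via backprop by hand
              |     W1 = 0.1 * rng.normal(size=(2, 8))
              |     W2 = 0.1 * rng.normal(size=8)
              |     for _ in range(500):
              |         h = np.tanh(X @ W1)
              |         p = sigmoid(h @ W2)
              |         d = (p - y) / len(y)  # dNLL/dlogits
              |         W2 -= 0.1 * h.T @ d
              |         W1 -= 0.1 * X.T @ (np.outer(d, W2) * (1 - h ** 2))
              | 
              |     print(nll(sigmoid(X @ w), y),
              |           nll(sigmoid(np.tanh(X @ W1) @ W2), y))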
        
         | grep_it wrote:
          | Thanks for the recommendation. Purchased both of them!
        
       | cubefox wrote:
        | I read parts of it years ago. As far as I remember, it is very
        | theoretical (lots of statistical learning theory, including some
        | IMHO mistaken treatment of Vapnik's theory of structural risk
        | minimization), with a strong focus on theory and basically zero
        | focus on applications, which would be completely outdated by now
        | anyway, as the book is from 2014, an eternity in AI.
       | 
       | I don't think many people will want to read it today. As far as I
       | know, mathematical theories like SLT have been of little use for
       | the invention of transformers or for explaining why neural
       | networks don't overfit despite large VC dimension.
       | 
        | Edit: I think the subtitle "From Theory to Algorithms" sums up
        | what was wrong with this theory-first approach. Basically, people
       | with interest in math but with no interest in software
       | engineering got interested in ML and invented various abstract
       | "learning theories", e.g. statistical learning theory (SLT).
       | Which had very little to do with what you can do in practice.
       | Meanwhile, engineers ignored those theories and got their hands
       | dirty on actual neural network implementations while trying to
       | figure out how their performance can be improved, which led to
       | things like CNNs and later transformers.
       | 
       | I remember Vapnik (the V in VC dimension) complaining in the
       | preface to one of his books about the prevalent (alleged)
       | extremism of focussing on practice only while ignoring all those
        | beautiful math theories. As far as I know, it has now turned out
        | that these theories were just far too weak to explain the actual
        | complexity of approaches that do work in practice. It has become
        | clear that machine learning is a branch of engineering, not
       | a branch of mathematics or theoretical computer science.
       | 
        | The title of this book encapsulates the mistaken hope that
        | people will first learn those abstract learning theories, get
        | inspired, and promptly invent new algorithms. But that's not what
       | happened. SLT is barely able to model supervised learning, let
       | alone reinforcement learning or self-supervised learning. As I
       | mentioned, they can't even explain why neural networks are robust
       | to overfitting. Other learning theories (like
       | computational/algorithmic learning theory, or fantasy stuff like
       | Solomonoff induction / Kolmogorov complexity) are even more
       | detached from reality.
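        | 
        | For reference, the kind of statement SLT gives you is a uniform
        | generalization bound in terms of the VC dimension d and the
        | sample size m. One classical form (up to constants, quoted from
        | memory, so treat it as a sketch) is: with probability at least
        | 1 - \delta, for every hypothesis h in the class,
        | 
        |     \mathrm{err}_{\mathrm{test}}(h) \le
        |       \mathrm{err}_{\mathrm{train}}(h)
        |       + \sqrt{\frac{d\left(\ln(2m/d) + 1\right) + \ln(4/\delta)}{m}}
        | 
        | which is vacuous once d is on the order of m or larger, as it is
        | for big neural networks, even though those networks generalize
        | fine in practice.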
        
         | lamename wrote:
          | I watched a discussion the other day on this "NNs don't
          | overfit" point. I realize that certain aspects are surprising,
          | and that in many cases, with the right size and diversity in a
          | dataset, scaling laws prevail. But my experience and impression
          | from training on real datasets from scratch (not fine-tuning
          | pretrained models) has always been that NNs definitely can
          | overfit if you don't have large quantities of data. My gut
          | assumption is that the original claims were never demonstrated
          | to hold under certain circumstances (i.e. certain dataset
          | characteristics), but that caveat is never mentioned in
          | shorthand these days, when dataset size is often assumed to be
          | huge.
         | 
         | (Before anyone laughs this off, this is still an actual problem
          | in the real world for non-FAANG companies that have niche
          | problems or cannot use open-but-non-commercial datasets. Not
          | everything can be solved with foundation/frontier models.)
         | 
         | Please point me to these papers because I'm still learning.
        
           | cubefox wrote:
            | Yes, they can overfit. SLT assumed that overfitting is caused
            | by large VC dimension, which apparently isn't true, because
            | there exist various techniques/hacks that effectively combat
            | overfitting while not actually reducing the very large VC
            | dimension of those neural networks. Basically, the theory
           | predicts they always overfit, while in reality they mostly
           | work surprisingly well. That's often the case in ML
           | engineering: people discover things work well and others
           | don't, while not being exactly sure why. The famous
           | Chinchilla scaling law was an empirical discovery, not a
           | theoretical prediction, because theories like SLT are far too
           | weak to make interesting predictions like that. Engineering
           | is basically decades ahead of those pure-theory learning
           | theories.
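            | 
            | For what it's worth, the Chinchilla result is a purely
            | empirical fit of the form (roughly, from memory)
            | 
            |     L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
            | 
            | where N is the parameter count and D is the number of
            | training tokens, with the constants fitted to observed
            | losses. The practical takeaway was that compute-optimal N and
            | D should scale roughly in proportion, around 20 tokens per
            | parameter. Nothing in SLT predicts, or even expresses, a law
            | like that.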
           | 
           | > Please point me to these papers because I'm still learning.
           | 
           | Not sure which papers you have in mind. To be clear, I'm not
           | an expert, just an interested layman. I just wanted to
           | highlight the stark difference between the apparently failed
           | pure math approach I learned years ago in a college class,
           | and the actual ML papers that are released today, with major
           | practical breakthroughs on a regular basis. Similarly
           | practical papers were always available, just from very
           | different people, e.g. LeCun or people at DeepMind, not from
           | theoretical computer science department people who wrote text
           | books like the one here. Back in the day it wasn't very clear
           | (to me) that those practice guys were really onto something
           | while the theory guys were a dead end.
        
         | kadushka wrote:
         | Theory is still needed if you want to understand things like
         | variational inference (which is in turn needed to understand
         | things like diffusion models). It's just like physics - you
         | need math theories to understand things like quantum mechanics,
         | because otherwise it might not make sense.
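          | 
          | The core identity there is the evidence lower bound (ELBO): for
          | a latent-variable model p(x, z) and any approximate posterior
          | q(z|x),
          | 
          |     \log p(x) \ge
          |       \mathbb{E}_{q(z \mid x)}\left[\log p(x \mid z)\right]
          |       - \mathrm{KL}\left(q(z \mid x) \,\|\, p(z)\right)
          | 
          | and diffusion models are trained by maximizing (a reweighted
          | version of) exactly this kind of bound, so it's hard to follow
          | the derivations without that bit of theory.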
        
       | janis1234 wrote:
        | The book is 10 years old; isn't it outdated?
        
         | 0cf8612b2e1e wrote:
          | I haven't read the book, but deep learning is the only area
          | that has advanced so wildly that a decade would change
          | anything. The fundamentals of ML training/testing,
          | variance/bias, etc. are the same. The classical algorithms
          | still have their place. The only modern advancement that might
          | not be present would be XGBoost-style gradient-boosted trees.
        
           | TechDebtDevin wrote:
            | Machine learning concepts have been around forever; they just
            | used to call them statistics ;0
        
         | antegamisou wrote:
         | Nope, and AIMA/PRML/ESL are still king!
         | 
         | Apart from these 3 you literally need nothing else for the very
         | fundamentals and even advanced topics.
        
         | nyrikki wrote:
          | Even Russell and Norvig is still applicable for the
          | fundamentals, and with the rise of agentic efforts it would be
          | extremely helpful.
         | 
          | Even the updates to the Bias/Variance Dilemma (Geman et al.
          | 1992) are minor if you look at the original paper:
         | 
         | https://www.dam.brown.edu/people/documents/bias-variance.pdf
         | 
          | They were dealing with small datasets or infinite datasets, and
          | double descent only really works when the patterns in your test
          | set are similar enough to those in your training set.
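          | 
          | For anyone unfamiliar, the dilemma in that paper is about the
          | classical decomposition of expected squared error: for data
          | y = f(x) + noise with noise variance \sigma^2,
          | 
          |     \mathbb{E}\big[(\hat{f}(x) - y)^2\big]
          |       = \big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2
          |       + \mathrm{Var}\big[\hat{f}(x)\big]
          |       + \sigma^2
          | 
          | where the expectations are over training sets. Double descent
          | is about the regime where test error drops again past the
          | interpolation point, instead of following the classical
          | U-shaped bias/variance trade-off.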
         | 
          | While you do need to be mindful of some of the older opinions,
          | the fundamentals are the same.
         | 
          | For fine-tuning or RL, where the same problems with small or
          | infinite datasets arise and the concept classes in your
          | training data may be novel, that 1992 paper still applies and
          | will bite you if you assume it is universally invalid.
         | 
         | Most of the foundational concepts are from the mid 20th
         | century.
         | 
          | The availability of massive amounts of data, along with new
          | discoveries, has modified the assumptions and tooling far more
          | than it has invalidated previous research. Skim that paper and
          | you will see they simply dismissed the mass of data and compute
          | we have today as impractical at the time.
         | 
         | Find the book that works best for you, learn the concepts and
         | build tacit experience.
         | 
         | Lots of efforts are trying to incorporate symbolic and other
         | methods too.
         | 
          | IMHO, building breadth and depth is what will save time and
          | help you find opportunities, and knowledge of the fundamentals
          | is critical for that.
        
         | janalsncm wrote:
         | Depends on what your goal is. If you're just curious about ML,
          | probably none of the info will be wrong. But it also really
          | doesn't engage with the most interesting problems engineers are
          | tackling today, unlike, say, an 11-year-old chemistry book (I
          | think). So as interview material, or as a way to break into the
          | field, it's not going to be the most useful.
        
         | cubefox wrote:
          | I have read parts of it. It was arguably already "outdated"
          | back then, as it mostly focused on abstract mathematical theory
          | of questionable value instead of cutting-edge "deep learning".
        
           | mikedelfino wrote:
           | Any recommendations?
        
       | pajamasam wrote:
       | I would recommend https://udlbook.github.io/udlbook/ instead if
       | you're looking to learn about modern generative AI.
        
         | smath wrote:
          | +1 for Simon Prince's UDL book. Very clearly written.
        
         | miltava wrote:
          | Thanks for the recommendation. Have you looked at Bishop's Deep
          | Learning book (https://www.bishopbook.com/)? How would you
          | compare the two? Thanks again.
        
           | m11a wrote:
           | You'll be happy with either. Bishop's approach is
           | historically more mathematical (cf his 2006 PRML text), and
           | you see that in the preliminaries chapters of Deep Learning,
           | but there's less of this as the book goes on.
           | 
            | I've read chapters from both. Much of the content overlaps,
            | but sometimes one book or the other explains a concept better
            | or provides different perspectives or details.
        
       | johnsutor wrote:
       | https://bloomberg.github.io/foml/#home This course is my personal
       | favorite.
        
       | joshdavham wrote:
       | What other books do people recommend?
        
       ___________________________________________________________________
       (page generated 2025-04-04 23:00 UTC)