[HN Gopher] Probabilistic Machine Learning: Advanced Topics
       ___________________________________________________________________
        
       Probabilistic Machine Learning: Advanced Topics
        
       Author : mariuz
       Score  : 177 points
       Date   : 2022-03-04 10:28 UTC (1 days ago)
        
 (HTM) web link (probml.github.io)
 (TXT) w3m dump (probml.github.io)
        
       | it_does_follow wrote:
       | Kevin Murphy has done an incredible service to the ML (and Stats)
       | community by producing such an encyclopedic work of contemporary
        | views on ML. These books are really a much needed update of the
        | now outdated-feeling "The Elements of Statistical Learning" and the
       | logical continuation of Bishop's nearly perfect "Pattern
       | Recognition and Machine Learning".
       | 
       | One thing I do find a bit surprising is that in the nearly 2000
       | pages covered between these two books there is almost no mention
       | of understanding parameter variance. I get that in machine
       | learning we typically don't care, but this is such an essential
       | part of basic statistics I'm surprised it's not covered at all.
       | 
        | The closest we get is in the Inference section, which is mostly
        | interested in prediction variance. It's also surprising that in
        | neither the section on Laplace Approximation nor the one on
        | Fisher information does anyone call out the Cramér-Rao lower
        | bound, which seems like a vital piece of information regarding
        | uncertainty estimates.
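        | 
        | For a concrete toy illustration of the connection (my own
        | sketch, not anything from the book): for n i.i.d. Bernoulli(p)
        | observations the Fisher information is n / (p(1-p)), so the
        | Cramér-Rao lower bound on the variance of an unbiased estimator
        | of p is p(1-p)/n, and a Laplace approximation to the posterior
        | (with a flat prior) recovers exactly that quantity evaluated at
        | the MLE:
        | 
        |     import numpy as np
        | 
        |     rng = np.random.default_rng(0)
        |     n, p_true = 500, 0.3
        |     x = rng.binomial(1, p_true, size=n)
        |     p_hat = x.mean()
        | 
        |     # Cramer-Rao lower bound: 1 / I(p) = p (1 - p) / n
        |     crlb = p_true * (1 - p_true) / n
        | 
        |     # Laplace approximation (flat prior assumed): posterior
        |     # variance = inverse Hessian of the negative log-likelihood
        |     # at the mode.
        |     hessian = n / (p_hat * (1 - p_hat))
        |     laplace_var = 1.0 / hessian
        | 
        |     # Empirical variance of the MLE over repeated samples.
        |     p_hats = rng.binomial(1, p_true, size=(5000, n)).mean(axis=1)
        | 
        |     print("CRLB at true p:        ", crlb)
        |     print("Laplace posterior var: ", laplace_var)
        |     print("empirical var of MLE:  ", p_hats.var())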
       | 
        | This is of course a minor critique, since virtually no ML books
        | touch on these topics; it's just unfortunate that in a volume
        | this massive we still see ML ignoring what is arguably the most
        | useful part of what statistics has to offer machine learning.
        
         | yldedly wrote:
         | To get the prediction variance in a Bayesian treatment, you
         | integrate over the posterior of the parameters - surely
         | computing or approximating the posterior counts as considering
         | parameter variance?
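         | 
         | For concreteness, a minimal numpy sketch of exactly that (the
         | toy model and numbers are made up for illustration): draw
         | parameter samples from the posterior, push them through the
         | model, and take the variance of the resulting predictions.
         | 
         |     import numpy as np
         | 
         |     rng = np.random.default_rng(0)
         | 
         |     # Toy model: y = w * x + noise, with a conjugate Gaussian
         |     # prior on w, so the posterior over w is also Gaussian.
         |     x = rng.normal(size=20)
         |     y = 2.0 * x + rng.normal(scale=0.5, size=20)
         |     noise_var, prior_var = 0.25, 10.0
         | 
         |     post_var = 1.0 / (1.0 / prior_var + x @ x / noise_var)
         |     post_mean = post_var * (x @ y) / noise_var
         | 
         |     # Predictive variance at x_new: integrate over the
         |     # posterior of w by Monte Carlo.
         |     x_new = 1.5
         |     w = rng.normal(post_mean, np.sqrt(post_var), 10_000)
         |     pred = w * x_new + rng.normal(0, np.sqrt(noise_var), 10_000)
         | 
         |     print("MC predictive variance:", pred.var())
         |     print("closed form:           ",
         |           x_new**2 * post_var + noise_var)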
        
         | dxbydt wrote:
         | Do you really expect this situation to ever change? The
         | communities are vastly different in their goals despite some
         | minor overlap in their theoretical foundations. Suppose you
         | take an rnorm(100) sample and find its variance. Then you ask
         | the crowd for the mean and variance of that sample variance.
         | If your crowd is 100 professional statisticians with a degree
         | in Statistics, you should get the right answer at least 90% of
         | the time. If instead you have 100 ML professionals with some
         | sort of a degree in cs/vision/nlp, less than 10% would know
         | how to go about computing the variance of the sample variance,
         | let alone what distribution it follows (a quick numerical
         | check is sketched below). The worst case is 100 self-taught
         | Valley bros - not only will you get the wrong answer 100% of
         | the time, they'll pile on you for gatekeeping and for
         | computing useless statistical quantities by hand when you
         | should be focused on the latest and greatest libraries in
         | numpy that will magically do all these sorts of things if you
         | invoke the right api. As a statistician, I feel quite sad. But
         | classical stats has no place in what passes for ML these days.
         | Folks can't Rao-Blackwellize for shit, how can you expect a
         | Fisher Information matrix from them?
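         | 
         | Here's that check, for the record (plain numpy, my own
         | illustration): for an i.i.d. N(mu, sigma^2) sample,
         | (n-1) S^2 / sigma^2 is chi-squared with n-1 degrees of
         | freedom, so E[S^2] = sigma^2 and Var(S^2) = 2 sigma^4 / (n-1).
         | 
         |     import numpy as np
         | 
         |     rng = np.random.default_rng(0)
         |     n, sigma = 100, 1.0
         | 
         |     # Many replications of "take an rnorm(100) sample and
         |     # compute its unbiased sample variance".
         |     draws = rng.normal(0.0, sigma, size=(100_000, n))
         |     s2 = draws.var(axis=1, ddof=1)
         | 
         |     print("mean of S^2:    ", s2.mean())   # ~ sigma^2 = 1
         |     print("variance of S^2:", s2.var())    # ~ 2/(n-1) = 0.0202
         |     print("theory:         ", 2 * sigma**4 / (n - 1))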
        
           | it_does_follow wrote:
           | I think Bishop et al.'s WIP book _Model-Based Machine Learning_
           | [0] is a nice step in the right direction. Honestly the most
           | important thing missing from ML that stats has is the idea
           | that your model is a model of something. That how you
           | construct a problem mathematically says something about how
           | you believe the world works. Then we can ask all sorts of
           | detailed questions about "how good is this model and what
           | does it tell me?"
           | 
           | I'm not sure this will ever dominate. As much as I love
           | Bayesian approaches I sort of feel there is a push to make
           | them ever more byzantine, recreating all of the original
           | critiques of where frequentist stats had gone wrong. So
           | essentially we're just seeing a different orthodoxy dominate
           | thinking, with all of the same trappings of the previous
           | orthodoxy.
           | 
           | 0. https://www.mbmlbook.com/
        
           | kuloku wrote:
           | What would you advise ML professionals to do to improve
           | their knowledge of statistics? Some recommended books?
        
           | jstx1 wrote:
           | Wait, what's the problem with people not knowing things that
           | they don't need to know? This just comes across as being
           | bitter that self-taught people exist, or that other people
           | are somehow encroaching on your field.
        
             | YeGoblynQueenne wrote:
             | I think your comment does what the OP complains about,
             | regarding gatekeeping etc.
             | 
             | I don't know about OP, whose comment I find a little harsh,
             | but personally I'm always a bit frustrated and a bit
             | despairing when I realise how poor the background of the
             | average machine learning researcher is today, i.e. in my
             | generation. Sometimes it's like nothing matters other than
             | the chance that Google or Facebook will want to hire
             | someone with a certain skillset, and any knowledge that
             | isn't absolutely essential to acquiring that skillset is
             | irrelevant.
             | 
             | Who said "Those who do not know their history are doomed to
             | repeat it"? In research that means being oblivious of the
             | trials and tribulations of previous generations of
             | researchers and then falling down the same pits that they
             | did. See for example how deep learning models today are
             | criticised for being "brittle", a criticism that was last
             | levelled against expert systems, and for similar, although
             | superficially different, reasons. Why can't we ever learn?
        
               | jstx1 wrote:
               | > I think your comment does what the OP complains about,
               | regarding gatekeeping etc.
               | 
               | Oh absolutely, that's how I intended it. I don't think
               | that preemptively calling out people's reaction gives the
               | parent comment a pass on gatekeeping.
               | 
               | Your concern about poor background... it's only a problem
               | for people who jump into things without the prerequisite
               | background and don't learn fast enough. But modern deep
               | learning is much more empirical -
               | there are a few building blocks and people are trying out
               | different things to see how they perform. I don't get why
               | we need to look down on people for not knowing things
               | that they don't need to know. If there were some magic
               | that came from knowing much more statistics, then the
               | researchers who do would be outperforming the rest of the
               | field by a lot, but I don't think that's the case.
        
       | [deleted]
        
       | qumpis wrote:
       | No exercises it seems, which imo are just as valuable as the
       | contents of the book
        
         | yellowcake0 wrote:
         | When the book is published in 2023 it will have exercises;
         | this is just a preliminary draft.
        
       | melling wrote:
       | Here are some videos covering his book Probabilistic Machine
       | Learning: An Introduction:
       | 
       | https://youtube.com/playlist?list=PLOk2cpmAEiU3YgtHRUm58zGkw...
        
       | deepsun wrote:
        | Why PDF? Why not ePUB or .tex? Especially since the "Preface"
        | is already badly formatted (the text extends too far to the
        | left, overlapping the line numbers).
        
         | benrbray wrote:
         | > Why not ePUB
         | 
         | ePUB is notoriously bad at displaying mathematics. It also
         | takes away the author's control of the page layout. To me there
         | is nothing more satisfying than a well-crafted PDF.
        
           | amelius wrote:
           | Yes!
           | 
           | And maybe someone can write some Machine Learning tool to
           | "unformat" the PDF into ePUB.
        
         | quantumduck wrote:
         | I'm fairly certain the PDF was generated using LaTeX; everyone
         | in academia uses it. Besides, it's not fair to complain about
         | formatting in a very early draft.
        
         | modernpink wrote:
         | Why complain?
        
         | jstx1 wrote:
         | If it was epub or tex, the first thing I would do is look for a
         | way to make a pdf out of it.
        
           | rsfern wrote:
           | One nice upside to having tex source is that you can set the
           | page size to match e.g., a phone screen. Reading standard pdf
           | textbooks and papers on a phone isn't very fun.
           | 
           | I used to do this for reading arxiv preprints, but the script
           | I wrote was kind of brittle and it doesn't really work out
           | with figures anyway
           | 
           | Honestly if the scientific community moved to something that
           | could be interactive and reflowable I would be so happy
        
             | tomrod wrote:
             | I want markdown for academics. Mermaid markdown comes
             | close. Needs charts.
        
           | deepsun wrote:
           | Sure, it's easy. But it's hard the other way around.
           | 
           | It's similar to publishing a binary program with or without
           | source code.
        
         | colesantiago wrote:
         | I agree.
         | 
         | Was very surprised that most academic documents aren't being
         | published in DjVu [0] format anymore, very sad.
         | 
         | [0] https://en.wikipedia.org/wiki/DjVu
        
       | axpy906 wrote:
       | Why should I read this as opposed to Murphy or Bishop?
        
         | ai_ia wrote:
         | This is Murphy btw. Just the advanced version.
        
           | axpy906 wrote:
           | After opening the link.
           | 
           | > by Kevin Patrick Murphy
           | 
           | This is the advanced version of Machine Learning: A
           | Probabilistic Perspective
        
             | it_does_follow wrote:
             | No, this is the second volume of "Probabilistic Machine
             | Learning", the first volume of which was just published
             | this week. The two-volume set can be seen as a complete
             | rewrite/replacement for "Machine Learning: A Probabilistic
             | Perspective".
        
         | it_does_follow wrote:
         | For clarification, Murphy's first book is just _Machine
         | Learning: A Probabilistic Perspective_. This is his newest,
         | 2-volume book, _Probabilistic Machine Learning_, which is
         | broken into two parts: _An Introduction_ (published March 1,
         | 2022) and _Advanced Topics_ (expected to be published in 2023,
         | but a draft preview is available now).
         | 
         | To answer your question: this book is even more complete and a
         | bit improved over the first book. I don't believe there's
         | anything in _Machine Learning_ that isn't either well covered
         | in, or correctly omitted from, _Probabilistic Machine
         | Learning_. This
         | also has the benefit of a few more years of rethinking these
         | topics. So between the existing Murphy books, _Probabilistic
         | Machine Learning: an Introduction_ is probably the one you
         | should have.
         | 
         | Why this over Bishop (and I'm not sure it should be)? While on
         | the surface they are very similar (very mathematical overviews
         | of ML from a very probability focused perspective) they
         | function as very different books. Murphy is much more of a
         | reference to contemporary ML. If you want to understand how
         | most leading researchers think about and understand ML, and
         | want a reference covering the mathematical underpinnings, this
         | is the book you really need.
         | 
         | Bishop is a much more opinionated book in that Bishop isn't
         | just listing out all possible ways of thinking about a problem,
         | but really building out a specific view of how probability
         | relates to machine learning. If I'm going to sit down and read
         | a book, it's going to be Bishop because he has a much stronger
         | voice as an author and thinker. However, Bishop's book is now
         | more than 10 years old and misses out on nearly all of the
         | major progress we've seen in deep learning. That's a lot to be
         | missing, and it won't be rectified in Bishop's perpetual WIP
         | book [0].
         | 
         | A better comparison is not Murphy to Murphy or Murphy to
         | Bishop, but Murphy to Hastie et al. _The Elements of
         | Statistical Learning_ for many years was _the_ standard
         | reference for advanced ML stuff, especially during the brief
         | time when GBDT and Random Forests were the hot thing (which
         | they still are to an extent in some communities). I really
         | enjoy EoSL but it does have a very  "Stanford Statistics"
         | (which I feel is even more aggressively Frequentist than your
         | average Frequentist) feel to the intuitions. Murphy is really
         | the contemporary computer science/Bayesian understanding of ML
         | that has dominated the top research teams for the last few
         | years. It feels much more modern and should be the replacement
         | reference text for most people.
         | 
         | 0. https://www.mbmlbook.com/
        
           | jmeister wrote:
           | Comments like these are why I come to HN. Thank you.
        
           | jcurbo wrote:
           | Echoing others, thank you for writing this (as someone doing
           | an applied math masters and digging into ML - I have used ESL
           | for a class but not the others you mention)
        
           | axpy906 wrote:
           | I'm in agreement with much of your post. The Elements of
           | Statistical Learning played its role quite well years ago but
           | a fresher take is needed. Thanks for the response.
        
           | YeGoblynQueenne wrote:
           | I read TESL during my Master's and I remember being very
           | confused with the way it described decision tree learning. I
           | remember being pleased with myself that I had a strong grip
           | on decision tree learning before reading TESL and then being
           | thoroughly confused after reading about them in TESL.
           | 
           | Eyeballing the relevant chapter again (9.2), I think that may
           | have been because it introduces decision tree learning with
           | CART (the algorithm), whereas I was more familiar with ID3
           | and C4.5. Perhaps it's simpler to describe CART as TESL does,
           | but decision trees are a propositional logic "model" (in
           | truth, a theory) and for me the natural way to describe them
           | is as a propositional logic "model" (theory) - see the small
           | sketch below. I also get the
           | feeling that Quinlan's work is sidelined a little, perhaps
           | because he was coming from a more classical AI background and
           | that's poo-poo'd in statistical learning circles. If so,
           | that's a bit of a shame and a bit of an omission. Machine
           | learning is not just statistics and it's not just AI, it's a
           | little bit of both and one needs to have at least some
           | background in both subjects to understand what's really going
           | on. But perhaps it's the data mining/data science angle that
           | I find a bit one-sided.
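           | 
           | To make concrete what I mean, here's a tiny toy sketch (the
           | tree and feature names are invented): each root-to-leaf path
           | of a decision tree reads as a propositional if-then rule, and
           | the tree as a whole is the set (theory) of those rules.
           | 
           |     # Toy tree: (feature, subtree_if_true, subtree_if_false);
           |     # a plain string is a leaf label.
           |     tree = ("outlook_sunny",
           |             ("humidity_high", "dont_play", "play"),
           |             "play")
           | 
           |     def rules(node, conds=()):
           |         if isinstance(node, str):      # leaf: emit one rule
           |             yield conds, node
           |             return
           |         feat, if_true, if_false = node
           |         yield from rules(if_true, conds + (feat,))
           |         yield from rules(if_false, conds + ("not " + feat,))
           | 
           |     for conds, label in rules(tree):
           |         body = " and ".join(conds)
           |         print("if " + body + " then " + label)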
           | 
           | Sorry to digress. I'm so excited when people discuss actual
           | textbooks on HN.
        
       | graycat wrote:
       | Bourbaki student M. Talagrand has some work on _approximate_
       | independence. If I were trying to do something along the lines of
       | _Probabilistic Machine Learning: Advanced Topics_ I would look
       | 
       | (1) carefully at the now classic
       | 
        L. Breiman, _et al.,_ _Classification and Regression Trees
        (CART)_,
       | 
       | and
       | 
       | (2) at the classic Markov limiting results, e.g., as in
       | 
        E. Cinlar, _Introduction to Stochastic Processes_,
       | 
        at least to be sure you are not missing something relevant and
        powerful,
       | 
       | (3) at some of the work on _sufficient_ statistics, of course,
       | first via the classic Halmos and Savage paper and then at the
       | interesting more recent work in
       | 
        Robert J. Serfling, _Approximation Theorems of Mathematical
        Statistics_,
       | 
       | and then for the most promising
       | 
       | (4) very carefully at Talagrand.
       | 
       | (1) and (2) are old but a careful look along with more recent
       | work may yield some directions for progress.
       | 
       | What Serfling develops is a bit amazing.
       | 
       | Then don't expect the Talagrand material to be trivial.
        
       ___________________________________________________________________
       (page generated 2022-03-05 23:01 UTC)