[HN Gopher] Probabilistic Machine Learning: Advanced Topics
___________________________________________________________________
Probabilistic Machine Learning: Advanced Topics
Author : mariuz
Score : 177 points
Date : 2022-03-04 10:28 UTC (1 day ago)
(HTM) web link (probml.github.io)
(TXT) w3m dump (probml.github.io)
| it_does_follow wrote:
| Kevin Murphy has done an incredible service to the ML (and Stats)
| community by producing such an encyclopedic work of contemporary
| views on ML. These books are really a much-needed update of the
| now outdated-feeling "The Elements of Statistical Learning" and
| the logical continuation of Bishop's nearly perfect "Pattern
| Recognition and Machine Learning".
|
| One thing I do find a bit surprising is that in the nearly 2000
| pages covered between these two books there is almost no mention
| of understanding parameter variance. I get that in machine
| learning we typically don't care, but this is such an essential
| part of basic statistics I'm surprised it's not covered at all.
|
| The closest we get is in the Inference section which is mostly
| interested in prediction variance. It's also surprising that in
| neither the section on Laplace Approximation nor the one on
| Fisher information does anyone call out the Cramér-Rao lower
| bound, which seems like a vital piece of information regarding
| uncertainty estimates.
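|
| (Purely as my own illustration, not anything from the books: the
| bound says an unbiased estimator's variance is at least the
| inverse Fisher information, and a quick numpy check for a
| Bernoulli mean, where the bound is attained, looks like this.)
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     p, n, trials = 0.3, 200, 20_000
|
|     # Cramer-Rao: any unbiased estimator has Var >= 1/(n * I(p)).
|     # For one Bernoulli(p) draw, I(p) = 1/(p*(1-p)), so the bound
|     # for the sample mean is p*(1-p)/n, which it attains exactly.
|     p_hat = rng.binomial(n, p, size=trials) / n
|     print("simulated Var(p_hat):", p_hat.var())
|     print("Cramer-Rao bound    :", p * (1 - p) / n)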
|
| This is of course a minor critique since virtually no ML books
| touch on these topics; it's just unfortunate that in a volume
| this massive we still see ML ignoring what is arguably the most
| useful part of what statistics has to offer to machine learning.
| yldedly wrote:
| To get the prediction variance in a Bayesian treatment, you
| integrate over the posterior of the parameters - surely
| computing or approximating the posterior counts as considering
| parameter variance?
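|
| (A toy numpy sketch of what I mean - everything here is
| illustrative, assuming we already have posterior samples of the
| slope of a 1-D linear model from whatever inference was run:)
|
|     import numpy as np
|
|     rng = np.random.default_rng(1)
|
|     # Pretend these are posterior samples of w in y = w*x + eps,
|     # with eps ~ N(0, sigma^2) and sigma known.
|     posterior_w = rng.normal(loc=2.0, scale=0.1, size=100_000)
|     sigma, x_new = 0.5, 3.0
|
|     # Predictive draws: integrate over the parameter posterior by
|     # sampling it, then adding observation noise.
|     y_new = (posterior_w * x_new
|              + rng.normal(0.0, sigma, size=posterior_w.size))
|
|     # Predictive variance = sigma^2 + x^2 * Var(w), so the
|     # parameter variance shows up directly in the prediction.
|     print("predictive variance :", y_new.var())
|     print("sigma^2 + x^2 Var(w):",
|           sigma**2 + x_new**2 * posterior_w.var())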
| dxbydt wrote:
| Do you really expect this situation to ever change? The
| communities are vastly different in their goals despite some
| minor overlap in their theoretical foundations. Suppose you
| take an rnorm(100) sample and find its variance. Then you ask
| the crowd for the mean and variance of that sample variance. If
| your crowd is 100 professional statisticians with a degree in
| Statistics, you should get the right answer at least 90% of the
| time. If instead you have 100 ML professionals with some sort
| of a degree in cs/vision/nlp, less than 10% would know how to
| go about computing the variance of the sample variance, let
| alone what distribution it follows. The worst case is 100
| self-taught Valley bros - not only will you get the wrong
| answer 100% of the time, they'll pile on you for gatekeeping
| and computing useless statistical quantities by hand when you
| should be focused on the latest and greatest libraries in
| numpy that will magically do all these sorts of things if you
| invoke the right API. As a statistician, I feel quite sad. But
| classical stats has no place in what passes for ML these days.
| Folks can't Rao-Blackwellize for shit, how can you expect a
| Fisher information matrix from them?
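|
| (For the curious, a quick numpy check: for iid N(0, sigma^2)
| data, (n-1)*S^2/sigma^2 is chi-squared with n-1 degrees of
| freedom, so E[S^2] = sigma^2 and Var(S^2) = 2*sigma^4/(n-1).)
|
|     import numpy as np
|
|     rng = np.random.default_rng(2)
|     n, sigma, trials = 100, 1.0, 50_000
|
|     # Repeat "draw n normals, compute the sample variance S^2".
|     x = rng.normal(0.0, sigma, size=(trials, n))
|     s2 = x.var(axis=1, ddof=1)
|
|     print("mean of S^2    :", s2.mean())  # ~ sigma^2
|     print("Var(S^2)       :", s2.var())   # ~ 2*sigma^4/(n-1)
|     print("2*sigma^4/(n-1):", 2 * sigma**4 / (n - 1))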
| it_does_follow wrote:
| I think Bishop et al.'s WIP book _Model-Based Machine Learning_
| [0] is a nice step in the right direction. Honestly the most
| important thing missing from ML that stats has is the idea
| that your model is a model of something. That how you
| construct a problem mathematically says something about how
| you believe the world works. Then we can ask all sorts of
| detailed questions about "how good is this model and what does
| it tell me?"
|
| I'm not sure this will ever dominate. As much as I love
| Bayesian approaches I sort of feel there is a push to make
| them ever more byzantine, recreating all of the original
| critiques of where frequentist stats had gone wrong. So
| essentially we're just seeing a different orthodoxy dominate
| thinking with all of the same trappings of the previous
| orthodoxy.
|
| 0. https://www.mbmlbook.com/
| kuloku wrote:
| What would you advise ML professionals to do to improve
| their knowledge of statistics? Some recommended books?
| jstx1 wrote:
| Wait, what's the problem with people not knowing things that
| they don't need to know? This just comes across as being
| bitter that self-taught people exist, or that other people
| are somehow encroaching on your field.
| YeGoblynQueenne wrote:
| I think your comment does what the OP complains about,
| regarding gatekeeping etc.
|
| I don't know about OP, whose comment I find a little harsh,
| but personally I'm always frustrated a bit and despairing a
| bit when I realise how poor the background is of the
| average machine learning researcher today, i.e. of my
| generation. Sometimes it's like nothing matters other than
| the chance that Google or Facebook will want to hire
| someone with a certain skillset and any knowledge that
| isn't absolutely essential to getting that skillset is
| irrelevant.
|
| Who said "Those who do not know their history are doomed to
| repeat it"? In research that means being oblivious of the
| trials and tribulations of previous generations of
| researchers and then falling down the same pits that they
| did. See for example how deep learning models today are
| criticised for being "brittle", a criticism that was last
| levelled against expert systems, and for similar, although
| superficially different, reasons. Why can't we ever learn?
| jstx1 wrote:
| > I think your comment does what the OP complains about,
| regarding gatekeeping etc.
|
| Oh absolutely, that's how I intended it. I don't think
| that preemptively calling out people's reaction gives the
| parent comment a pass on gatekeeping.
|
| Your concern about poor background... it's only a problem
| for people who are jumping into things without the
| prerequisite background and aren't learning fast
| enough. But modern deep learning is much more empirical -
| there are a few building blocks and people are trying out
| different things to see how they perform. I don't get why
| we need to look down on people for not knowing things
| that they don't need to know. If there was some magic
| that comes from knowing much more statistics, then the
| researchers who do would be outperforming the rest of the
| field by a lot but I don't think that's the case.
| [deleted]
| qumpis wrote:
| No exercises it seems, which imo are just as valuable as the
| contents of the book.
| yellowcake0 wrote:
| When the book is published in 2023 it will have exercises; this
| is just a preliminary draft.
| melling wrote:
| Here are some videos covering his book Probabilistic Machine
| Learning: An Introduction:
|
| https://youtube.com/playlist?list=PLOk2cpmAEiU3YgtHRUm58zGkw...
| deepsun wrote:
| Why PDF? Why not ePUB or .tex? Especially since "Preface" is
| already badly formatted (text too far to the left overlapping
| line numbers).
| benrbray wrote:
| > Why not ePUB
|
| ePUB is notoriously bad at displaying mathematics. It also
| takes away the author's control of the page layout. To me there
| is nothing more satisfying than a well-crafted PDF.
| amelius wrote:
| Yes!
|
| And maybe someone can write some Machine Learning tool to
| "unformat" the PDF into ePUB.
| quantumduck wrote:
| I'm fairly certain that PDF was generated using LaTeX; everyone
| in academia uses it. Besides, it's not fair to complain about
| formatting in a very early draft.
| modernpink wrote:
| Why complain?
| jstx1 wrote:
| If it was epub or tex, the first thing I would do is look for a
| way to make a pdf out of it.
| rsfern wrote:
| One nice upside to having tex source is that you can set the
| page size to match, e.g., a phone screen. Reading standard pdf
| textbooks and papers on a phone isn't very fun.
|
| I used to do this for reading arxiv preprints, but the script
| I wrote was kind of brittle and it doesn't really work out
| with figures anyway.
|
| Honestly if the scientific community moved to something that
| could be interactive and reflowable, I would be so happy.
| tomrod wrote:
| I want markdown for academics. Mermaid markdown comes
| close. Needs charts.
| deepsun wrote:
| Sure, it's easy. But it's hard the other way around.
|
| It's similar to publishing a binary program with or without
| source code.
| colesantiago wrote:
| I agree.
|
| Was very surprised that most academic documents aren't being
| published in DjVu [0] format anymore, very sad.
|
| [0] https://en.wikipedia.org/wiki/DjVu
| axpy906 wrote:
| Why should I read this as opposed to Murphy or Bishop?
| ai_ia wrote:
| This is Murphy btw. Just the advanced version.
| axpy906 wrote:
| After opening the link.
|
| > by Kevin Patrick Murphy
|
| This is the advanced version of Machine Learning: A
| Probabilistic Perspective
| it_does_follow wrote:
| No, this is the second volume of "Probabilistic Machine
| Learning", the first volume of which was just published
| this week. The 2 volume set can be seen as a complete
| rewrite/replacement for "Machine Learning: A Probabilistic
| Perspective"
| it_does_follow wrote:
| For clarification, Murphy's first book is just _Machine
| Learning: A Probabilistic Perspective_; this is his newest,
| 2-volume book, _Probabilistic Machine Learning_, which is
| broken down into two parts: _An Introduction_ (published March
| 1, 2022) and _Advanced Topics_ (expected to be published in
| 2023, but draft preview available now).
|
| To answer your question: this book is even more complete and a
| bit improved over the first book. I don't believe there's
| anything in _Machine Learning_ that isn't well covered in, or
| correctly omitted from, _Probabilistic Machine Learning_. This
| also has the benefit of a few more years of rethinking these
| topics. So between the existing Murphy books, _Probabilistic
| Machine Learning: An Introduction_ is probably the one you
| should have.
|
| Why this over Bishop (which I'm not sure is the case)? While on
| the surface they are very similar (very mathematical overviews
| of ML from a very probability-focused perspective), they
| function as very different books. Murphy is much more of a
| reference to contemporary ML. If you want to understand how
| most leading researchers think about and understand ML, and
| want a reference covering the mathematical underpinnings, this
| is the book you need.
|
| Bishop is a much more opinionated book in that Bishop isn't
| just listing out all possible ways of thinking about a problem,
| but really building out a specific view of how probability
| relates to machine learning. If I'm going to sit down and read
| a book, it's going to be Bishop because he has a much stronger
| voice as an author and thinker. However Bishop's book is now
| more than 10 years old and misses out on nearly all of the major
| progress we've seen in deep learning. That's a lot to be
| missing and it won't be rectified in Bishop's perpetual WIP
| book [0].
|
| A better comparison is not Murphy to Murphy or Murphy to
| Bishop, but Murphy to Hastie et al. _The Elements of
| Statistical Learning_ for many years was _the_ standard
| reference for advanced ML stuff, especially during the brief
| time when GBDT and Random Forests were the hot thing (which
| they still are to an extent in some communities). I really
| enjoy EoSL but it does have a very "Stanford Statistics"
| (which I feel is even more aggressively Frequentist than your
| average Frequentist) feel to the intuitions. Murphy is really
| the contemporary computer science/Bayesian understanding of ML
| that has dominated the top research teams for the last few
| years. It feels much more modern and should be the replacement
| reference text for most people.
|
| 0. https://www.mbmlbook.com/
| jmeister wrote:
| Comments like these are why I come to HN. Thank you.
| jcurbo wrote:
| Echoing others, thank you for writing this (as someone doing
| an applied math masters and digging into ML - I have used ESL
| for a class but not the others you mention)
| axpy906 wrote:
| I'm in agreement with much of your post. The Elements of
| Statistical Learning played its role quite well years ago but
| a fresher take is needed. Thanks for the response.
| YeGoblynQueenne wrote:
| I read TESL during my Master's and I remember being very
| confused with the way it described decision tree learning. I
| remember being pleased with myself that I had a strong grip
| on decision tree learning before reading TESL and then being
| thoroughly confused after reading about them in TESL.
|
| Eyeballing the relevant chapter again (9.2) I think that may
| have been because it introduces decision tree learning with
| CART (the algorithm), whereas I was more familiar with ID3
| and C4.5. Perhaps it's simpler to describe CART as TESL does,
| but decision trees are a propositional logic "model" (in
| truth, a theory) and for me the natural way to describe them
| is as a propositional logic "model" (theory). I also get the
| feeling that Quinlan's work is sidelined a little, perhaps
| because he was coming from a more classical AI background and
| that's poo-poo'd in statistical learning circles. If so,
| that's a bit of a shame and a bit of an omission. Machine
| learning is not just statistics and it's not just AI, it's a
| little bit of both and one needs to have at least some
| background in both subjects to understand what's really going
| on. But perhaps it's the data mining/data science angle that
| I find a bit one-sided.
|
| Sorry to digress. I'm so excited when people discuss actual
| textbooks on HN.
| graycat wrote:
| Bourbaki student M. Talagrand has some work on _approximate_
| independence. If I were trying to do something along the lines of
| _Probabilistic Machine Learning: Advanced Topics_ I would look
|
| (1) carefully at the now classic
|
| L. Breiman, _et al.,_ _Classification and Regression Trees
| (CART)_,
|
| and
|
| (2) at the classic Markov limiting results, e.g., as in
|
| E. Cinlar, _Introduction to Stochastic Processes_,
|
| at least to be sure you are not missing something relevant and
| powerful,
|
| (3) at some of the work on _sufficient_ statistics, of course,
| first via the classic Halmos and Savage paper and then at the
| interesting, more recent work in
|
| Robert J. Serfling, _Approximation Theorems of Mathematical
| Statistics_,
|
| and then for the most promising
|
| (4) very carefully at Talagrand.
|
| (1) and (2) are old but a careful look along with more recent
| work may yield some directions for progress.
|
| What Serfling develops is a bit amazing.
|
| Then don't expect the Talagrand material to be trivial.
___________________________________________________________________
(page generated 2022-03-05 23:01 UTC)