[HN Gopher] Introduction to Probability for Data Science
___________________________________________________________________
Introduction to Probability for Data Science
Author : mariuz
Score : 105 points
Date : 2022-01-24 17:23 UTC (5 hours ago)
(HTM) web link (probability4datascience.com)
(TXT) w3m dump (probability4datascience.com)
| tacoluv wrote:
| Does anyone know of a way to send some money to the author? I
| know he says "free" a lot but this is so awesome I want to treat
| them to something.
| LittlePeter wrote:
| In second paragraph of Chapter 2 - Probability:
|
| > No matter whether you prefer the frequentist's view or the
| Bayesian's view...
|
| I don't think the intended audience reading this chapter has this
| preference at all...
|
| Then the set notation uses square brackets instead of curly
| braces? I cannot get over this for some reason.
| hervature wrote:
| You are misrepresenting that quote. That comes after giving a
| fairly generic overview of both in which someone could form an
| opinion. One does not need to know the peculiarities of
| Bayesian reasoning to have the opinion "you should incorporate
| prior knowledge". Also, the set notation does use curly braces.
| LittlePeter wrote:
| In my mind you cannot be frequentist or Bayesian after
| reading just the first paragraph of Chapter 2. But fair
| enough I am a bit too critical here.
|
| Also you are right, set notation does use curly braces, I am
| relieved :-). I was confused by A = [-1, 1-1/n] (interval
| notation) on page 8, which I misread as [-1, 1, 1/n]...
| ska wrote:
| > In my mind you cannot be frequentist or Bayesian after
| reading just the first paragraph of Chapter 2.
|
| I don't think the author is asking you to, at all. They are
| pointing out that there are two "camps" and you will see
| these terms bandied about (e.g. if you google stuff). But
| then they claim (rightly, I think for an intro like this)
| that it doesn't really matter for the material to
| (immediately) follow and you are better off focusing on
| more fundamental ideas of probability.
| heresie-dabord wrote:
| > Some people ask how much money I can make from this book. The
| answer is ZERO. There is not a single penny that goes to my
| pocket. Why do I do that? Textbooks today are just ridiculously
| expensive. [...] Education should be accessible to as many people
| as possible, especially to those underprivileged families.
|
| B r a v o ! A free, quality education is the foundation for
| social progress and economic prosperity.
| dwrodri wrote:
| This looks like a fantastic resource. Thanks for sharing!
|
| I really enjoy the Bayesian side of ML, but it's definitely not
| the most accessible. Erik Bernhardsson cites latent Dirichlet
| allocation as a big inspiration behind the music recommendation
| system he originally designed for Spotify, which is apparently
| still in use today[1]. I still struggle with grokking latent
| factor models, but it can be so rewarding to build your own and
| watch it work (even with only moderate success!).
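|
| Since latent factor models came up, below is a toy matrix-
| factorization sketch in Python (the ratings matrix, rank, and
| step size are all made up, and this is nothing like a production
| recommender) just to show the "build one and watch it work" loop:
|
|     import numpy as np
|
|     # Factor a small user x item ratings matrix R into user
|     # factors U and item factors V so that R ~= U @ V.T, fitting
|     # only the observed (nonzero) entries by gradient descent.
|     rng = np.random.default_rng(0)
|     R = np.array([[5., 3., 0., 1.],
|                   [4., 0., 0., 1.],
|                   [1., 1., 0., 5.],
|                   [0., 1., 5., 4.]])
|     mask = R > 0                  # which ratings are observed
|     k, lr = 2, 0.01               # latent rank, learning rate
|     U = rng.normal(scale=0.1, size=(R.shape[0], k))
|     V = rng.normal(scale=0.1, size=(R.shape[1], k))
|     for _ in range(5000):
|         E = mask * (R - U @ V.T)  # residuals on observed entries
|         U += lr * (E @ V)         # descend the squared-error loss
|         V += lr * (E.T @ U)
|     print(np.round(U @ V.T, 1))   # predictions fill in the zeros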
|
| Kevin Murphy has been working on a new edition of MLaPP that is
| now two volumes, with the last volume on advanced topics slated
| for release next year. However, both the old edition and the
| drafts for the new edition are available on his website here[2].
|
| The University of Tübingen has a course on probabilistic ML which
| probably has one of the most thorough walkthroughs of a latent
| factor model I've found on the Internet. You can find the full
| playlist of lectures for free here on YouTube[3].
|
| In terms of other resources for deep study on fascinating topics
| which require some command over stats and probability:
|
| - David Silver's lectures on reinforcement learning are
| fantastic [4]
|
| - The Machine Learning Summer School lectures are often quite
| good: exceptionally talented researchers and practitioners are
| invited to give multi-hour lectures on their domains of
| expertise, with the intended audience being graduate students
| who have intermediate backgrounds in general ML topics. [5]
|
| 1: https://www.slideshare.net/erikbern/music-recommendations-ml...
| 2: https://probml.github.io/pml-book/
| 3: https://www.youtube.com/playlist?list=PL05umP7R6ij1tHaOFY96m...
| 4: https://www.youtube.com/playlist?list=PLqYmG7hTraZDM-OYHWgPe...
| 5: http://mlss.cc
| graycat wrote:
| "A random process is a function indexed by a random key."
|
| Not just wrong, wildly bad nonsense.
|
| Go get some data. Now you have the value of a _random variable_.
|
| The theory never makes precise just what _random_ means, and in
| _random variable_ we do not assume some element of not knowing.
| In particular, _truly random_ is nonsense.
|
| Suppose we have a non-empty set I and for each i in I we have a
| random variable X_i (using TeX notation for a subscript). Then
| the index set I together with the collection of all the X_i is
| a _random process_ or a
| _stochastic process_. We might write (X_i, I) or some such
| notation.
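|
| In display form, that definition reads (the triple below is the
| standard probability space, which the comment introduces only
| further down):
|
|     \{ X_i \}_{i \in I}, \qquad
|     X_i : (\Omega, \mathcal{F}, P) \to \mathbb{R}
|     \quad \text{for each } i \in I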
|
| Commonly the set I is an interval subset of the real line and
| denotes time. Set I might be half of the real line or all of it
| or just some interval, e.g., [0,1].
|
| The set I might be just the numbers
|
| {1, 2, 3, 4, 5, 6}
|
| for, say, playing with dice with the usual six sides.
|
| I might be the integers in [1, 52] for considering card games.
|
| But the set I might be all the points on the surface of a sphere
| for considering, say, the weather, maybe the oceans, etc.
|
| The set I might be all tuples (t, x, y, z) where t is a real
| number denoting time and the other three are coordinates in
| ordinary 3-space.
|
| A random variable can also be considered a function with domain a
| _probability space_ O. So for random variable Y, for each w in O,
| Y(w) is the value of the random variable Y at _sample_ w. Right,
| the usual notation has capital Greek omega for O and lower case
| Greek omega for w.
|
| Then for a particular w and stochastic process X with index set
| I, all the X_t(w) as t varies is a _sample path_ of the process
| X. E.g., a plot of the stock market index DJI for yesterday is
| part of
| such a sample path. So, with stochastic processes, what we
| observe are sample paths.
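|
| A small simulation sketch of that last point in Python (the +/-1
| random walk and its parameters are arbitrary stand-ins for some
| process X): fixing a seed plays the role of fixing one sample w,
| and the array returned is the sample path we actually observe.
|
|     import numpy as np
|
|     # X_t = sum of t coin-flip steps, t = 0..T. One seed, one w,
|     # one sample path X_0(w), ..., X_T(w).
|     def sample_path(seed, T=100):
|         rng = np.random.default_rng(seed)
|         steps = rng.choice([-1.0, 1.0], size=T)
|         return np.concatenate([[0.0], np.cumsum(steps)])
|
|     # Three different w's -> three sample paths of one process.
|     for seed in (0, 1, 2):
|         print(sample_path(seed, T=5))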
|
| That's a start on stochastic processes. Going deep into the field
| gets to be difficult quickly. For a start, look for the names
| Kolmogorov, Dynkin, Doob, Itô, Shiryaev, Skorokhod, Rockafellar,
| Çinlar, Stroock, Varadhan, McKean, Blumenthal, Getoor, Fleming,
| Bertsekas, Karatzas, Shreve, Neveu, Tulcea(s).
|
| For some of the _flavor_ of probability theory and stochastic
| processes, see the article on _liftings_ at
|
| https://en.wikipedia.org/wiki/Lifting_theory
|
| I had the main book on liftings, which I'd gotten for $1 at a
| used book store (not a big seller), but lost it in a recent move.
___________________________________________________________________
(page generated 2022-01-24 23:03 UTC)