[HN Gopher] Probability and Statistics Cookbook (2011) [pdf]
___________________________________________________________________
Probability and Statistics Cookbook (2011) [pdf]
Author : cpp_frog
Score : 174 points
Date : 2022-06-06 13:34 UTC (9 hours ago)
(HTM) web link (pages.cs.wisc.edu)
(TXT) w3m dump (pages.cs.wisc.edu)
| kiernanmcgowan wrote:
| Another set of notes that I refer to often are from ECE 830, also
| at UW Madison[0]. It was a great class that really represented
| the culmination of all the probability theory and signals classes
| I had taken over the years.
|
| [0] https://nowak.ece.wisc.edu/ece830/index.html
| cjohnson318 wrote:
| This is a nice collection of definitions and key results, but
| it's not a cookbook. I think of a cookbook as a collection of
| useful, focused examples, demonstrating best practices, and
| listing caveats.
| jenny91 wrote:
| This actually seems pretty good and has great coverage!
| snicker7 wrote:
| There is a book titled "All of Statistics" if you'd like a
| whirlwind tour.
| conformist wrote:
| https://www.stat.cmu.edu/~larry/all-of-statistics/index.html
| buzzdenver wrote:
| I would call this a cheat-sheet rather than a cookbook.
| jmt_ wrote:
 | Agree, I typed up something similar (but less detailed) as a
 | reference during my undergrad stats major. I'd expect a
 | cookbook to have worked-out examples of applications of these
 | topics. But it still looks very useful as a reference.
| mturmon wrote:
| Actually quite good.
|
| I've TA'd this class but it's surprising how many of these little
| facts can be helpful if you recall them at the right time. I was
 | just reminded of: Var[Y] = E[Var[Y|X]] + Var[E[Y|X]]
|
| and it unstuck me from a little puzzle.
|
| More recent version at: http://statistics.zone
| jarenmf wrote:
 | Similar to the law of total expectation; it's easier for me to
 | think of it as partitioning a weighted mean :)
| willdearden wrote:
| There is a professor who was at Wisconsin, Charles Manski, who
| developed partial identification, which uses tons of these
| decompositions.
|
 | The idea: say you have a binary survey question where 80%
 | respond and 90% of the respondents answer "yes". What can we say
 | about the population "yes" rate (assume the sample size is huge
 | for simplicity)?
|
| P(Yes) = P(Yes | response) * P(response) + P(Yes | no response)
| * P(no response) = 0.9 * 0.8 + P(Yes | no response) * 0.2 =
| 0.72 + P(Yes | no response) * 0.2
|
| Then 0 <= P(Yes | no response) <= 1, so 0.72 <= P(Yes) <= 0.92.
| This example is somewhat trivial but it's a useful technique
| for showing exactly how your assumptions map to inferences.
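 |
 | In code the bound is just the two extreme assumptions about the
 | non-respondents. A tiny Python sketch (the function name is mine,
 | purely for illustration):
 |
 |     def p_yes_bounds(response_rate, yes_given_response):
 |         # Worst-case bounds on P(Yes) when non-respondents are
 |         # unobserved: P(Yes | no response) can be anything in [0, 1].
 |         observed = yes_given_response * response_rate
 |         missing = 1 - response_rate
 |         return observed, observed + missing
 |
 |     print(p_yes_bounds(0.8, 0.9))  # ~(0.72, 0.92), as derived above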
| evandwight wrote:
| For those who are wondering how this equation is true:
|
| https://en.wikipedia.org/wiki/Law_of_total_variance
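 |
 | A quick way to convince yourself is to check the decomposition by
 | simulation. A minimal sketch in Python/NumPy (the three-group
 | setup and its parameters are made up for illustration):
 |
 |     import numpy as np
 |
 |     rng = np.random.default_rng(0)
 |
 |     # A simple hierarchy: X is a group label and the distribution
 |     # of Y depends on which group X picks.
 |     x = rng.integers(0, 3, size=1_000_000)
 |     group_means = np.array([0.0, 2.0, 5.0])
 |     group_sds = np.array([1.0, 0.5, 2.0])
 |     y = rng.normal(group_means[x], group_sds[x])
 |
 |     # Left-hand side: the total variance of Y.
 |     total_var = y.var()
 |
 |     # Right-hand side: E[Var[Y|X]] + Var[E[Y|X]], built from the
 |     # empirical conditional moments within each group.
 |     weights = np.bincount(x) / len(x)
 |     cond_var = np.array([y[x == g].var() for g in range(3)])
 |     cond_mean = np.array([y[x == g].mean() for g in range(3)])
 |     overall_mean = weights @ cond_mean
 |     between = weights @ (cond_mean - overall_mean) ** 2
 |     within = weights @ cond_var
 |     print(total_var, within + between)  # the two numbers match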
| mdp2021 wrote:
 | Note that the linked PDF is the 2011 version (as is explicit),
 |
 | but the most recent version (0.2.7) is dated 2021 and is
 | available at
 | https://github.com/mavam/stat-cookbook/releases/download/0.2...
 |
 | There are "release notes" pages listing the differences (at
 | https://github.com/mavam/stat-cookbook/releases )
| Bo0kerDeWitt wrote:
 | Nice, the original LaTeX code is there too.
| russellbeattie wrote:
| I don't know math at all. I'd love a programmer version of this,
| with all the algorithms in code. Probably already exists in NumPy
| or something.
| time_to_smile wrote:
| If you're interested in probability and statistics it's well
| worth your time to get more comfortable with the math.
|
 | There's a common misconception among programmers that there's a
 | one-to-one mapping between math and code and that mathematical
 | notation is just annoyingly terse shorthand.
|
 | As someone who spends a lot of time implementing mathematical
 | ideas in code, I can tell you this is not remotely true.
 | Mathematics deals with a level of abstraction and a style of
 | thinking that are fundamentally distinct from the computational
 | implementation of those ideas.
|
 | A clear example of this is the Gamma function, which appears all
 | over those notes. It's an essential function for working deeply
 | with statistics; you'll find it shows up just about everywhere
 | if you look carefully enough, and you can manipulate it
 | mathematically to solve a range of problems.
|
 | However, if you want to implement this from scratch in code,
 | that is, to understand how to _compute_ the Gamma function,
 | you're going to have to spend a lot of time studying numerical
 | methods if you want to do more than robotically copy it from
 | _Numerical Recipes_.
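 |
 | To make that concrete, one classical route is a truncated Stirling
 | series. A rough sketch in plain Python (the cutoff at z = 10 and
 | the number of correction terms are arbitrary choices, not the
 | method real libraries use):
 |
 |     import math
 |
 |     def gamma_stirling(z):
 |         # Approximate Gamma(z) for z > 0. The series is only
 |         # accurate for large z, so small arguments are shifted up
 |         # first and the shifts are undone at the end.
 |         shift = 0
 |         while z < 10:
 |             z += 1
 |             shift += 1
 |         series = 1 + 1 / (12 * z) + 1 / (288 * z ** 2)
 |         value = math.sqrt(2 * math.pi / z) * (z / math.e) ** z * series
 |         # Undo the shifts using Gamma(z) = Gamma(z + 1) / z.
 |         for _ in range(shift):
 |             z -= 1
 |             value /= z
 |         return value
 |
 |     print(gamma_stirling(5.0), math.gamma(5.0))  # both close to 24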
|
 | Similarly, many of the integrals used in statistics can end up
 | quite difficult to compute, but that difficulty doesn't affect
 | their ease of use in a mathematical context. This is a common
 | theme in applied math: you can do quite a lot of mathematical
 | work on problems that you don't necessarily know how to compute
 | yet. Once you solve your problem mathematically, you can go on
 | to working out how to actually compute the answer.
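 |
 | The standard normal CDF is a small example of that: trivial to
 | write down and manipulate on paper, but it has no elementary
 | antiderivative, so even a naive numerical version forces choices
 | about where to truncate the infinite tail and how fine a grid to
 | use. A rough sketch (the trapezoid rule and the cutoff at -10 are
 | arbitrary, not what real libraries do):
 |
 |     import math
 |
 |     def normal_cdf_quadrature(x, lower=-10.0, n=100_000):
 |         # Trapezoid-rule approximation of the standard normal CDF.
 |         def pdf(t):
 |             return math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)
 |         h = (x - lower) / n
 |         total = 0.5 * (pdf(lower) + pdf(x))
 |         for i in range(1, n):
 |             total += pdf(lower + i * h)
 |         return total * h
 |
 |     # Reference value via the error function in the standard library.
 |     reference = 0.5 * (1 + math.erf(1.96 / math.sqrt(2)))
 |     print(normal_cdf_quadrature(1.96), reference)  # both ~0.975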
| jbay808 wrote:
| To add to this comment, it's hard to make any useful program
| if you don't have at least a clear conceptual understanding
| of what you're trying to do.
|
| For example, perhaps you are trying to calculate a variance.
| But do you have a set of raw data from which you will
| estimate the variance? Or some summary statistics? Or do you
| already have a probability distribution from which you will
| compute the variance? How is it represented? Is that the
| probability distribution over the particular variable you
| want the variance for, or is it a related variable that needs
| to be transformed first somehow?
|
| You don't necessarily need to know how to handle all the math
| by hand, but there's no avoiding the need for at least a
| clear idea of what you're doing and what the sticking points
| might be.
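 |
 | As a toy illustration of how different those starting points are,
 | a short Python/NumPy sketch (the exponential distribution and its
 | parameter are arbitrary):
 |
 |     import numpy as np
 |
 |     rng = np.random.default_rng(0)
 |
 |     # Raw data: estimate the variance from samples
 |     # (ddof=1 gives the unbiased sample variance).
 |     samples = rng.exponential(scale=2.0, size=10_000)
 |     estimated = samples.var(ddof=1)
 |
 |     # Known distribution: compute the variance from its parameters;
 |     # an exponential with scale b has variance b**2.
 |     exact = 2.0 ** 2
 |
 |     print(estimated, exact)  # the estimate lands near the exact 4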
___________________________________________________________________
(page generated 2022-06-06 23:01 UTC)