[HN Gopher] What are the most important statistical ideas of the...
___________________________________________________________________
What are the most important statistical ideas of the past 50 years?
[pdf]
Author : luu
Score : 183 points
Date : 2021-03-07 10:40 UTC (1 days ago)
(HTM) web link (www.stat.columbia.edu)
(TXT) w3m dump (www.stat.columbia.edu)
| bookmarkable wrote:
| Bidding on NFT for this image starts at 1 ETH.
| TameAntelope wrote:
| I desperately want to get to a point where I'm able to quickly
| sketch out a statistical model for making a decision, but I'm
| struggling mightily on where to get started that doesn't include
| some kind of "hide the uncertainty" shell game, such as how I
| feel when Bayesianism gets thrown around.
|
| Submissions like this give me hope, that if I just read more,
| I'll get there, but nothing listed here feels like it's usable in
| a "napkin math" scenario. I just bought two of Gerd Gigerenzer's
| latest books, and I've already obtained some of David Sklansky's
| more introductory work on gambling, but I'm concerned my focus is
| either too theoretical or too narrowly on gambling (though I
| think there's a lot to learn in how a gambler assesses a bet, and
| how it applies to other situations).
|
| I guess if it were easy, we'd all be doing it...
| jbay808 wrote:
| I start with Wikipedia's list of maximum entropy distributions,
| pick the one that best represents the category for the kind of
| thing I'm reasoning about, and then update it with what data I
| happen to have.
|
| It's usually simpler than it sounds, and helped me put together
| a lot of quick decision models, sales forecasts, etc.
|
| https://en.m.wikipedia.org/wiki/Maximum_entropy_probability_...
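|
| A minimal sketch of that workflow, assuming the quantity of
| interest is a rate between 0 and 1 (say, a hypothetical sales
| conversion rate). The uniform distribution is the maximum entropy
| choice on a bounded interval and equals Beta(1, 1), so updating it
| with success/failure counts gives another Beta distribution. The
| function names and the counts are illustrative assumptions, not
| taken from the comment above.

```python
def update_uniform_prior(successes: int, failures: int) -> tuple[float, float]:
    """Return (alpha, beta) of the Beta posterior, starting from a uniform prior."""
    if successes < 0 or failures < 0:
        raise ValueError("counts must be non-negative")
    # Uniform prior is Beta(1, 1); each observation adds one to a parameter.
    return 1.0 + successes, 1.0 + failures


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical data: 7 sales out of 20 calls.
alpha, beta = update_uniform_prior(successes=7, failures=13)
posterior_mean = beta_mean(alpha, beta)  # (1 + 7) / (2 + 20)
print(f"posterior: Beta({alpha:g}, {beta:g}), mean = {posterior_mean:.3f}")
```

| Other maximum entropy starting points (exponential for a positive
| quantity with a known mean, normal for a known mean and variance)
| would need their own update rule.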
| chewbaxxa wrote:
| What do you mean by hide the uncertainty? To me Bayesian
| modelling is about making explicit the uncertainty.
| Arnehuang wrote:
| Is that statistics? I think of statistics as "given historical
| data, infer future data". But it seems like what you want is to
| know which decision is best, which involves many more things
| (like estimating the impact/utility of each outcome), that
| seems more like economics?
| mindcrime wrote:
| Arguably it's just quibbling over a trivial terminological
| difference, but I get the feeling that you're thinking more
| about "Decision Theory"[1] (or "Decision Science") as opposed
| to "just Statistics". Decision Theory, of course, _uses_
| Statistics, and I guess one could argue the question of
| whether one is just a subfield of the other, or argue exactly
| where the dividing line is.
|
| [1]: https://en.wikipedia.org/wiki/Decision_theory
| taeric wrote:
| I've grown to the idea that statistics is a form of data
| compression. It isn't so much, "infer future data" as it is
| "if the data we have is representative of all data, what is a
| number/equation that represents this data?". Usually with a
| certain framing.
| analog31 wrote:
| When I took statistics in college, we started with a rather
| basic definition, something to the effect of: "A statistic is
| a function performed on a set." Statistics studies what you
| can infer when you know _something_ about a set, but not
| _everything_ about it, namely its precise contents. Often,
| what you are told about a set is something about its
| probability distribution, thus linking probability and
| statistics together.
|
| A useful parallel can be drawn with situations involving
| measurements and data, since data often have the same feature
| of telling us _something_ but not _everything._ This is what
| I believe makes statistics so useful for science.
| asdff wrote:
| Economics is also statistics.
| Arnehuang wrote:
| I think of it like this:
|
| Suppose I want to make a decision about whether to hedge
| for a market crash right now. Statistics can tell me the
| likelihood of a crash, and how bad. But if the market
| crashes, and very badly, how might that affect my life? To
| make a good decision I would need to think of all the
| things that come with a market crash (job loss, savings
| loss). This is not statistics.
|
| I could again use statistics to say what is the chance I
| lose my job given a market crash (say 70%). But then I
| would need to estimate the impact on my life should I lose
| my job (Stress, etc). This is not statistics. But it should
| very well factor into my ability to do back of the napkin
| math on whether I should hedge or not.
| chewbaxxa wrote:
| This is exactly statistics. This is an expectation of a
| utility function with respect to some distribution.
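|
| A rough sketch of that expectation for the hedging example. Only
| the 70% job-loss figure comes from the thread; the crash
| probability, the utilities, and the hedge's cost and benefit are
| made-up numbers chosen purely for illustration.

```python
def expected_utility(outcomes):
    """Sum of probability * utility over (probability, utility) pairs."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * u for p, u in outcomes)


# Assumed numbers, for illustration only.
P_CRASH = 0.10             # hypothetical chance of a crash
P_JOB_LOSS = 0.70          # chance of job loss given a crash (from the thread)
HEDGE_COST = 2.0           # hypothetical utility cost of buying the hedge
HEDGE_RELIEF = 30.0        # hypothetical utility the hedge recovers in a crash
U_CRASH_JOB_LOSS = -100.0  # hypothetical utility: crash and job loss
U_CRASH_KEEP_JOB = -40.0   # hypothetical utility: crash, job kept
U_NO_CRASH = 0.0           # hypothetical utility: no crash


def scenario(hedged: bool):
    """List the (probability, utility) outcomes with or without the hedge."""
    cost = HEDGE_COST if hedged else 0.0
    relief = HEDGE_RELIEF if hedged else 0.0
    return [
        (P_CRASH * P_JOB_LOSS, U_CRASH_JOB_LOSS + relief - cost),
        (P_CRASH * (1 - P_JOB_LOSS), U_CRASH_KEEP_JOB + relief - cost),
        (1 - P_CRASH, U_NO_CRASH - cost),
    ]


eu_hedge = expected_utility(scenario(hedged=True))
eu_no_hedge = expected_utility(scenario(hedged=False))
print("hedge" if eu_hedge > eu_no_hedge else "do not hedge")
```

| The hard part is the one Arnehuang points at: choosing the utility
| numbers. The arithmetic on top of them is short.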
| fractionalhare wrote:
| If your decision substantially involves or derives from
| making an estimate about a population based on a sample,
| it is statistics. "Making decisions under uncertainty" is
| well-studied in statistical literature, just like
| "quantifying uncertainty" is well-studied. It sounds like
| you think the latter is "actual statistics", but these
| things are both statistics.
|
| In particular:
|
| _> But if the market crashes, and very badly, how might
| that affect my life? To make a good decision I would need
| to think of all the things that come with a market crash
| (job loss, savings loss). This is not statistics._
|
| This is all statistics, not just the part where you're
| forecasting the likelihood of the market crashing. The reason
| is that making decisions about the future under the
| constraints of uncertainty implicitly involves a
| forecast. When you decide how to diversify your personal
| investment portfolio, how much to allocate to your Roth
| versus traditional IRA or 401k, etc, you are making
| forecasts about which allocation will provide you with a
| more favorable outcome.
|
| Stated more concisely: there is no rational reason to use
| statistics for forecasting market events but not for
| deciding what to do in the event specific market events
| occur.
| nanis wrote:
| > Statistics can tell me the likelihood of a crash
|
| Statistics cannot tell you any such thing.
| drdeca wrote:
| Do you mean to say that nothing can tell you such a
| thing?
|
| What is a likelihood, but a statistic?
|
| If there is any method to determine a statistic, it seems
| reasonable to me to say that that method involved
| statistics.
|
| (Now, of course, except for possibly where quantum
| randomness is relevant, which might be quite often, I'm
| fairly confident that the only probabilities are
| subjective or relative to some set of assumptions, or
| something along those lines, because the future "already
| exists". But, given some fixed priors and some fixed
| evidence, there should in principle be a well defined
| probability of such a crash. So, insofar as people's
| priors match up, there should, in principle, be a common
| well defined probability given "the information which is
| publicly available", or also, given whatever other set of
| evidence.)
|
| Of course, that doesn't mean it is computationally
| tractable to compute such a probability.
| nanis wrote:
| > Economics is also statistics.
|
| Economics is not Statistics (and definitely not
| statistics).
|
| Most of the discipline focuses on testing models and making
| inferences on observational data. The techniques for
| dealing with that sort of data, of course, build on
| Statistics, but their nature is different enough that there
| is Econometrics.
|
| A large part of economics is not empirical at all --
| despite the fact that people get Nobel prizes pretending
| this not to be the case.
|
| Even in the context of experimental economics, since the
| behavior of the observed varies depending on the mode of
| observation, the most straightforward Statistical methods,
| designed for engineering/chemistry/biology-type experiments,
| are not directly applicable (although it is great when they
| agree with the fancier methods).
| huitzitziltzin wrote:
| >A large part of economics is not empirical at all --
| despite the fact that people get Nobel prizes pretending
| this not to be the case.
|
| I'm not sure which parts of the field or which prize
| winners you are talking about. To be clear: you think
| economics is _not actually empirical_, but people are
| awarded Nobel Prizes for _pretending that it is_? That's
| a little odd. Let me know if that's not what you meant.
|
| When you look at this list:
|
| https://en.wikipedia.org/wiki/List_of_Nobel_Memorial_Prize_l...
|
| Who satisfies that condition, in your mind? Who is
| getting the prize on the basis of pretending that
| economics is empirical?
| nanis wrote:
| > To be clear: you think economics is _not actually
| empirical_
|
| That is a misrepresentation of what I said.
|
| To be clear, I think what I said:
|
| >>A large part of economics is not empirical at all
|
| As an example, Kahneman's Nobel is solely a product
| of taking an axiomatic theory and designing experiments
| where regular people who are actually not being paid
| according to their performance are gently prodded into
| violating the axioms in weird settings. It is attractive
| to people who want to claim that clearly the plebes
| cannot be allowed to choose for themselves as they are
| not "rational".
|
| The only meaning of "rational" in Economics is that
| individuals choose the best alternative according to
| their preferences among a constrained set of
| alternatives. Here an "alternative" or "bundle" is a
| point in the entire commodity space.
|
| The only test of this is consistency with GARP: A choice
| is not rational if a feasible and more preferred
| alternative exists.
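|
| A hedged sketch of what a GARP check looks like in practice,
| using the usual revealed-preference formulation rather than
| anything spelled out in the comment above. A bundle is directly
| revealed preferred to another if the other was affordable when
| it was chosen; GARP fails if a bundle is (transitively) revealed
| preferred to one that was chosen when the first was strictly
| cheaper. The function names and the two-good data are
| hypothetical.

```python
def dot(p, x):
    """Cost of bundle x at prices p."""
    return sum(pi * xi for pi, xi in zip(p, x))


def violates_garp(prices, bundles):
    """Return True if the observed choices violate GARP.

    prices[t] and bundles[t] are the price vector and the chosen bundle
    at observation t.
    """
    n = len(bundles)
    # Direct revealed preference: bundle s was affordable when t was chosen.
    revealed = [
        [dot(prices[t], bundles[t]) >= dot(prices[t], bundles[s]) for s in range(n)]
        for t in range(n)
    ]
    # Transitive closure (Warshall's algorithm).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if revealed[i][k] and revealed[k][j]:
                    revealed[i][j] = True
    # Violation: t is revealed preferred to s, yet s was chosen at prices
    # where t was strictly cheaper.
    return any(
        revealed[t][s] and dot(prices[s], bundles[s]) > dot(prices[s], bundles[t])
        for t in range(n)
        for s in range(n)
    )


# Hypothetical two-good observations.
prices = [(2, 1), (1, 2)]
inconsistent = violates_garp(prices, [(2, 1), (1, 2)])
consistent = violates_garp(prices, [(1, 2), (2, 1)])
print(inconsistent, consistent)
```

| In the first data set each bundle was chosen while the other was
| strictly cheaper, so no single preference ordering can explain
| both choices. In the second, neither bundle was affordable when
| the other was chosen, so nothing is contradicted.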
| riesz-repr wrote:
| There are actually several economists on this list, like
| Victor Chernozhukov, Guido Imbens, and Susan Athey...
| hntrader wrote:
| A common mistake technical people make is to be too theoretical
| or overcomplicated with their work and decision making, where
| everything has to be some math model written out in a nicely
| formatted LaTeX document. Don't fall into that trap, it is very
| ineffective.
|
| Stats is a tool like any other tool. Boil your question down to
| the fundamentals (first principles) and maybe stats is one of
| the tools you decide to use to solve part of it, where
| appropriate.
|
| Most questions involving data can be answered with something as
| basic as a plot.
| cambalache wrote:
| Easier said than done. The complete history of science is the
| human struggle to boil down the mathematical modelling of
| nature to its fundamentals.
| mindcrime wrote:
| Depending on the context, I'm a fan of the work of Douglas
| Hubbard, in his book _How to Measure Anything_ [1]. His
| approach involves working out answers to things that might
| sometimes be done as a "back of the napkin" kind of thing, but
| in a slightly more rigorous way. Note that there are criticisms
| of his approach, and I'll freely admit that it doesn't
| guarantee arriving at an optimal answer. But arguably the
| criticisms of his approach ("what if you leave out a variable
| in your model?", etc.) apply to many (most?) other modeling
| approaches.
|
| On a related note, one of the last times I mentioned Hubbard
| here, another book came up in the surrounding discussion, which
| looks really good as well. _Guesstimation: Solving the World's
| Problems on the Back of a Cocktail Napkin_[2] - I bought a copy
| but haven't had time to read it yet. Maybe somebody who is
| familiar will chime in with their thoughts?
|
| [1]: https://www.amazon.com/How-Measure-Anything-Intangibles-Busi...
|
| [2]: https://www.amazon.com/gp/product/0691129495/ref=ppx_yo_dt_b...
| dr_dshiv wrote:
| Let me second "How to measure anything." I think it should be
| required reading for human beings.
| alex_anglin wrote:
| I would add that "How to Measure Anything in Cybersecurity
| Risk" should be a core part of Infosec literature.
| asdff wrote:
| Frame your hypothesis in the simplest way possible and go
| from there.
| cf wrote:
| While I'm not sure how amenable the modern methods are to napkin
| math, as stated in the article, a lot more methods use
| simulation, which is pretty straightforward to get working if
| you can code.
|
| Jake Vanderplas's presentation
| https://speakerdeck.com/jakevdp/statistics-for-hackers can give
| you some concrete ideas of how far you can get with just a
| random number generator.
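|
| As one concrete instance of the simulation approach, here is a
| minimal percentile bootstrap for a mean, using only Python's
| standard library. The sample values, the function name, and the
| defaults are assumptions for illustration; this is not code from
| the linked presentation.

```python
import random
import statistics


def bootstrap_mean_interval(data, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `data`."""
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    lo = means[int(tail * n_resamples)]
    hi = means[int((1.0 - tail) * n_resamples) - 1]
    return lo, hi


# Hypothetical sample, for illustration only.
sample = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10]
lo, hi = bootstrap_mean_interval(sample)
print(f"mean = {statistics.fmean(sample):.2f}, 95% interval = ({lo:.2f}, {hi:.2f})")
```

| The same pattern (resample, recompute, look at the spread) works
| for medians or other statistics where a closed-form interval is
| awkward, though very small samples limit how much it can tell
| you.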
| platz wrote:
| Decisions are moral/political/biological, not statistical
| roenxi wrote:
| Typically. But if somebody manages to make statistically
| sound evidence-based decisions they will steamroller people
| who are making decisions using other factors.
| karlmcguire wrote:
| Statistical Consequences of Fat Tails by Nassim Taleb.
| alexilliamson wrote:
| This feels only slightly more legitimate than recommending the
| 538 blog as a statistical authority.
| spekcular wrote:
| I have read this book and want to leave an anti-recommendation
| here. It's a poorly edited mess and makes at least one blatant
| mathematical error.
|
| More broadly, let me leave a Taleb anti-recommendation. His
| entire shtick is yelling that traditional statisticians have
| ignored heavy-tailed random variables in their modeling and
| that he has special insight into the nature of tail risk
| (perhaps along with a few select other people, like
| Mandelbrot).
|
| But this is manifestly not the case. In fact, if you go through
| his Amazon reviews page, you can find him leaving positive
| reviews several years ago on all the books written by
| traditional statisticians that he learned about heavy-tailed
| randomness from!
| sfashset wrote:
| link to his Amazon reviews page?
| spekcular wrote:
| Scroll back to the early 2010s:
| https://www.amazon.com/gp/profile/amzn1.account.AHMHNR4MRTDL...
|
| For a more detailed critique, see Robert Lund, _Revenge of
| the White Swan_, The American Statistician Vol. 61, No. 3
| (Aug., 2007). Accessible through your favorite Russian
| website.
|
| If you want a better book on heavy-tailed randomness, I
| like Didier Sornette's _Critical Phenomena in Natural
| Sciences_ (subtitled _Chaos, Fractals, Selforganization and
| Disorder: Concepts and Tools_).
| SkyMarshal wrote:
| _Revenge of the White Swan_ also appears available on
| ResearchGate:
|
| https://www.researchgate.net/publication/4741329_Revenge_of_...
| jasonwatkinspdx wrote:
| Taleb is... not a good source for learning statistics. Start
| with Wasserman. Taleb says obvious and well known things using
| his own invented terminology in order to cast himself as some
| sort of contrarian genius. It's not that he's wrong, it's that
| the insights he hawks are banal. That's why his readership base
| is insight porn book junkies, not people actually trying to
| learn statistical methods.
| actusual wrote:
| "insight porn books" is going in my "objects you've been
| searching for titles for" Notion list.
| jasonwatkinspdx wrote:
| Yeah, I think I first heard it in relation to Malcolm
| Gladwell and it's just so apt at capturing everything wrong
| with that category of book. I mean he's a skillful writer,
| and it's definitely entertaining stuff. But if you flip
| into critical mode and do comparative research vs
| authoritative sources, you start seeing how vapid it is
| really fast.
| disgruntledphd2 wrote:
| When I read Fooled by Randomness I found it useful. Not
| groundbreaking work, but it drew some nice analogies
| between statistical distributions and humans'
| overcertainty.
| tajd wrote:
| If you do have his books then the reference lists in the back
| provide a good starting point for further reading.
| stblack wrote:
| Not mentioned, not cited in the paper. That's shocking.
|
| Edit: the word "tail" appears nowhere in the paper, in any
| context. I'm beyond shocked now.
| spekcular wrote:
| This is subsumed in the robust estimation section.
| disgruntledphd2 wrote:
| Because this was well known to statisticians long before
| Taleb talked about it?
|
| That would be my suspicion as to why it isn't there.
| FabHK wrote:
| Quite plausible. Extreme Value theory [1] appears to have
| been codified by the 1960s, and one of the main theorems is
| credited "to Frechet (1927), Ronald Fisher and Leonard
| Henry Caleb Tippett (1928), Mises (1936) and Gnedenko
| (1943)" [2]. ETA: And the second theorem of Extreme Value
| Analysis is from the mid 1970s. [3]
|
| 1. https://en.wikipedia.org/wiki/Extreme_value_theory
|
| 2. https://en.wikipedia.org/wiki/Fisher-Tippett-Gnedenko_theore...
|
| 3. https://en.wikipedia.org/wiki/Pickands-Balkema-De_Haan_theor...
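|
| To make the first theorem concrete: the maximum of n independent
| Exp(1) draws, shifted by log(n), is approximately Gumbel
| distributed for large n. The sketch below checks this by
| simulation. The choice of the exponential distribution, the
| sample sizes, and the function names are assumptions picked for
| illustration.

```python
import math
import random


def gumbel_cdf(x):
    """CDF of the standard Gumbel distribution."""
    return math.exp(-math.exp(-x))


def simulate_centered_maxima(n=200, trials=2000, seed=0):
    """Maxima of n Exp(1) draws, shifted by log(n), repeated `trials` times."""
    rng = random.Random(seed)
    return [
        max(rng.expovariate(1.0) for _ in range(n)) - math.log(n)
        for _ in range(trials)
    ]


maxima = simulate_centered_maxima()
for x in (-1.0, 0.0, 1.0, 2.0):
    # Fraction of simulated maxima at or below x, next to the Gumbel limit.
    empirical = sum(m <= x for m in maxima) / len(maxima)
    print(f"x = {x:+.1f}  empirical = {empirical:.3f}  Gumbel = {gumbel_cdf(x):.3f}")
```

| Heavier-tailed parent distributions lead to a different limit
| family (Frechet rather than Gumbel), which is where the
| fat-tail discussion above connects to this older theory.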
| bigbillheck wrote:
| My stats training was in the 90s and we absolutely
| covered leptokurtic things.
| selimthegrim wrote:
| The book by Leadbetter, Lindgren and Rootzen is good too
| if a bit dated.
| ojnabieoot wrote:
| Unsolicited advertising for Taleb's newest book is really not
| constructive or helpful, and betrays that you don't actually
| know very much about statistics.
| sigstoat wrote:
| calling someone's book suggestion an "advertisement" is rude,
| and inaccurate. taleb wouldn't pay anyone to suggest his book
| when he could instead just show up here and insult everyone
| for free.
| ojnabieoot wrote:
| I didn't mean it strictly literally. The original comment
| was a thoughtless namedrop of a brand new book (which means
| by tautology it's irrelevant to the topic at hand) and
| doesn't have a shred of reasoning behind it. So,
| functionally, it's a billboard advertising Taleb's book.
|
| I am aware that the Cult of Taleb means that people are
| willing to advertise his work for free.
| ncmncm wrote:
| The re-discovery of causation analysis, by Pearl, after it was
| suppressed for many decades by the statistics mandarinate,
| clearly qualifies.
|
| Max Planck is quoted, "Science progresses one funeral at a time."
| In this case, the grand old man of statistics finally died still
| insisting that it could not be proved that smoking caused cancer,
| but not before blighting careers of those who were showing it
| could.
| huitzitziltzin wrote:
| > after it was suppressed for many decades by the statistics
| mandarinate, clearly qualifies.
|
| I have heard Pearl make this claim but have never seen his
| evidence for it.
|
| As a counterexample: Don Rubin is a statistician who has a
| well-known framework for causal inference who has been at the
| top of the field for a very long time. Rubin has published
| widely and very well on causal inference.
|
| Is there good evidence for the topic actually being
| _suppressed_ by anyone within the statistics profession? There
| is work on causal inference going back to RA Fisher. If anyone
| has tried to suppress it, I'm not sure they have been very
| effective.
| mlac wrote:
| >Is there good evidence for the topic actually being
| suppressed by anyone within the statistics profession?
|
| Unfortunately there's not enough evidence to show
| causation...
| alilleybrinker wrote:
| For anyone wanting to learn causal inference (the first item in
| the list), I highly recommend "Causal Inference, the Mixtape" by
| Scott Cunningham, a professor of economics at Baylor University.
| Scott has been writing this book incrementally in the open for
| the last couple of years, and recently completed and published
| it, and it is a thorough introduction to numerous techniques for
| inferring causation in different contexts.
| https://www.scunning.com/mixtape.html
| cambalache wrote:
| https://mixtape.scunning.com/
|
| Link to the free HTML version
| thepangolino wrote:
| That's a great practically oriented crash course on modern
| statistics!
| andyxor wrote:
| Causality, i.e. Causal inference & Graphical Models; see the work
| by Judea Pearl, who pretty much single-handedly pioneered the
| field.
|
| https://www.amazon.com/Causality-Reasoning-Inference-Judea-P...
|
| https://en.wikipedia.org/wiki/Causality#Theories
|
| https://en.wikipedia.org/wiki/Graphical_model
| ncmncm wrote:
| Single-handedly pioneered _reviving_ the field, but yes.
|
| Pearl is very careful to give his deceased predecessors their
| due credit. That their work was suppressed will always be a
| blot on the leading names in statistics in the past century.
___________________________________________________________________
(page generated 2021-03-08 23:00 UTC)