[HN Gopher] What are the most important statistical ideas of the...
       ___________________________________________________________________
        
       What are the most important statistical ideas of the past 50 years?
       [pdf]
        
       Author : luu
       Score  : 183 points
       Date   : 2021-03-07 10:40 UTC (1 days ago)
        
 (HTM) web link (www.stat.columbia.edu)
 (TXT) w3m dump (www.stat.columbia.edu)
        
       | bookmarkable wrote:
       | Bidding on NFT for this image starts at 1 ETH.
        
       | TameAntelope wrote:
       | I desperately want to get to a point where I'm able to quickly
       | sketch out a statistical model for making a decision, but I'm
       | struggling mightily to find a starting point that doesn't
       | involve some kind of "hide the uncertainty" shell game, which is
       | how I feel when Bayesianism gets thrown around.
       | 
       | Submissions like this give me hope that if I just read more,
       | I'll get there, but nothing listed here feels like it's usable in
       | a "napkin math" scenario. I just bought two of Gerd Gigerenzer's
       | latest books, and I've already obtained some of David Sklansky's
       | more introductory work on gambling, but I'm concerned my focus is
       | either too theoretical or too narrowly on gambling (though I
       | think there's a lot to learn in how a gambler assesses a bet, and
       | how it applies to other situations).
       | 
       | I guess if it were easy, we'd all be doing it...
        
         | jbay808 wrote:
         | I start with Wikipedia's list of maximum entropy distributions,
         | pick the one that best represents the category for the kind of
         | thing I'm reasoning about, and then update it with what data I
         | happen to have.
         | 
         | It's usually simpler than it sounds, and helped me put together
         | a lot of quick decision models, sales forecasts, etc.
         | 
         | https://en.m.wikipedia.org/wiki/Maximum_entropy_probability_...
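A minimal sketch of that workflow, assuming two common entries from that list: a nonnegative quantity where only the mean is known (maximum entropy gives the exponential distribution), and a proportion bounded in [0, 1] with nothing else known (maximum entropy gives the uniform distribution, i.e. Beta(1, 1)), which is then updated with observed counts via the conjugate Beta update. The function names and numbers are illustrative, not jbay808's actual models.

```python
import math


def exponential_tail(mean, threshold):
    """P(X > threshold) when X >= 0 and only its mean is known.

    The maximum entropy distribution under that constraint is the
    exponential distribution with the given mean.
    """
    return math.exp(-threshold / mean)


def update_uniform_proportion(successes, failures, a=1.0, b=1.0):
    """Update a proportion that starts as Beta(1, 1), the uniform
    (maximum entropy) distribution on [0, 1].

    The Beta prior is conjugate to binomial data, so the posterior is
    Beta(a + successes, b + failures). Returns the posterior parameters
    and the posterior mean.
    """
    a_post = a + successes
    b_post = b + failures
    return a_post, b_post, a_post / (a_post + b_post)


if __name__ == "__main__":
    # Hypothetical: average deal size is 100; how likely is a deal over 200?
    print(round(exponential_tail(100.0, 200.0), 4))  # exp(-2) ≈ 0.1353
    # Hypothetical: 7 of 10 leads converted so far.
    print(update_uniform_proportion(7, 3))
```

The appeal of starting from a maximum entropy distribution is that it encodes only the constraints you actually know, so any extra confidence has to come from data.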
        
         | chewbaxxa wrote:
         | What do you mean by hide the uncertainty? To me Bayesian
         | modelling is about making explicit the uncertainty.
        
         | Arnehuang wrote:
         | Is that statistics? I think of statistics as "given historical
         | data, infer future data". But it seems like what you want is to
         | know which decision is best, which involves many more things
         | (like estimating the impact/utility of each outcome). That
         | seems more like economics?
        
           | mindcrime wrote:
           | Arguably it's just quibbling over a trivial terminological
           | difference, but I get the feeling that you're thinking more
           | about "Decision Theory"[1] (or "Decision Science") as opposed
           | to "just Statistics". Decision Theory, of course, _uses_
           | Statistics, and I guess one could argue the question of
           | whether one is just a subfield of the other, or argue exactly
           | where the dividing line is.
           | 
           | [1]: https://en.wikipedia.org/wiki/Decision_theory
        
           | taeric wrote:
           | I've come around to the idea that statistics is a form of data
           | compression. It isn't so much, "infer future data" as it is
           | "if the data we have is representative of all data, what is a
           | number/equation that represents this data?". Usually with a
           | certain framing.
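One concrete version of "statistics as compression": if you are willing to assume the data are independent draws from a normal distribution, the count, mean, and sample variance are sufficient statistics, so a stream of any length can be reduced to three numbers without losing information relevant to inference about the mean and variance. A minimal sketch using Welford's online algorithm (the function name is illustrative):

```python
import math


def summarize(stream):
    """Compress a stream of numbers to (n, mean, sample variance).

    Uses Welford's online update, which is numerically stabler than
    accumulating sum(x) and sum(x * x) directly.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / (n - 1) if n > 1 else math.nan
    return n, mean, variance


if __name__ == "__main__":
    # n = 8, mean ≈ 5, sample variance ≈ 32/7 ≈ 4.571
    print(summarize([2, 4, 4, 4, 5, 5, 7, 9]))
```

The "certain framing" matters: under a heavy-tailed or multimodal model, these three numbers would no longer be sufficient.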
        
           | analog31 wrote:
           | When I took statistics in college, we started with a rather
           | basic definition, something to the effect of: "A statistic is
           | a function performed on a set." Statistics studies what you
           | can infer when you know _something_ about a set, but not
           | _everything_ about it, namely its precise contents. Often,
           | what you are told about a set is something about its
           | probability distribution, thus linking probability and
           | statistics together.
           | 
           | A useful parallel can be drawn with situations involving
           | measurements and data, since data often have the same feature
           | of telling us _something_ but not _everything._ This is what
           | I believe makes statistics so useful for science.
        
           | asdff wrote:
           | Economics is also statistics.
        
             | Arnehuang wrote:
             | I think of it like this:
             | 
             | Suppose I want to make a decision about whether to hedge
             | for a market crash right now. Statistics can tell me the
             | likelihood of a crash, and how bad. But if the market
             | crashes, and very badly, how might that affect my life? To
             | make a good decision I would need to think of all the
             | things that come with a market crash (job loss, savings
             | loss). This is not statistics.
             | 
             | I could again use statistics to say what is the chance I
             | lose my job given a market crash (say 70%). But then I
             | would need to estimate the impact on my life should I lose
             | my job (stress, etc.). This is not statistics. But it should
             | very much factor into my ability to do back-of-the-napkin
             | math on whether I should hedge or not.
        
               | chewbaxxa wrote:
               | This is exactly statistics. This is an expectation of a
               | utility function with respect to some distribution.
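To make that concrete with Arnehuang's example: a minimal sketch that scores "hedge" vs. "don't hedge" by expected utility. Only the 70% chance of job loss given a crash comes from the comment above; the crash probability and every utility number are made-up placeholders, with the utilities meant to fold in non-monetary costs like stress.

```python
def expected_utility(outcomes):
    """Sum of probability * utility over mutually exclusive outcomes."""
    return sum(p * u for p, u in outcomes)


def scenario_probabilities(p_crash, p_job_loss_given_crash):
    """Probabilities of: no crash, crash but keep job, crash and lose job."""
    return (
        1 - p_crash,
        p_crash * (1 - p_job_loss_given_crash),
        p_crash * p_job_loss_given_crash,
    )


# Placeholder inputs (hypothetical, except the 0.7 from the example).
P_CRASH = 0.10
P_JOB_LOSS_GIVEN_CRASH = 0.70

# Utilities in arbitrary units, in the same scenario order as above.
# Hedging costs 1 unit in every scenario and pays back 20 units in a crash.
UTILITIES = {
    "no hedge": (0.0, -30.0, -100.0),
    "hedge": (-1.0, -11.0, -81.0),
}

probs = scenario_probabilities(P_CRASH, P_JOB_LOSS_GIVEN_CRASH)
scores = {
    action: expected_utility(zip(probs, utils))
    for action, utils in UTILITIES.items()
}
best = max(scores, key=scores.get)

if __name__ == "__main__":
    print(scores, best)  # no hedge ≈ -7.9, hedge ≈ -6.9, so "hedge" wins
```

The job-loss and stress considerations enter as nothing more than numbers in the utility table, which is the sense in which the whole decision is a statistical calculation.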
        
               | fractionalhare wrote:
               | If your decision substantially involves or derives from
               | making an estimate about a population based on a sample,
               | it is statistics. "Making decisions under uncertainty" is
               | well-studied in statistical literature, just like
               | "quantifying uncertainty" is well-studied. It sounds like
               | you think the latter is "actual statistics", but these
               | things are both statistics.
               | 
               | In particular:
               | 
               |  _> But if the market crashes, and very badly, how might
               | that affect my life? To make a good decision I would need
               | to think of all the things that come with a market crash
               | (job loss, savings loss). This is not statistics._
               | 
               | This is all statistics, not just the part where you're
               | forecasting likelihood of the market crashing. The reason
               | is that making decisions about the future under the
               | constraints of uncertainty implicitly involves a
               | forecast. When you decide how to diversify your personal
               | investment portfolio, how much to allocate to your Roth
               | versus traditional IRA or 401k, etc., you are making
               | forecasts about which allocation will provide you with a
               | more favorable outcome.
               | 
               | Stated more concisely: there is no rational reason to use
               | statistics for forecasting market events but not for
               | deciding what to do in the event specific market events
               | occur.
        
               | nanis wrote:
               | > Statistics can tell me the likelihood of a crash
               | 
               | Statistics cannot tell you any such thing.
        
               | drdeca wrote:
               | Do you mean to say that nothing can tell you such a
               | thing?
               | 
               | What is a likelihood, but a statistic?
               | 
               | If there is any method to determine a statistic, it seems
               | reasonable to me to say that that method involved
               | statistics.
               | 
               | (Now, of course, except for possibly where quantum
               | randomness is relevant, which might be quite often, I'm
               | fairly confident that the only probabilities are
               | subjective or relative to some set of assumptions, or
               | something along those lines, because the future "already
               | exists". But, given some fixed priors and some fixed
               | evidence, there should in principle be a well-defined
               | probability of such a crash. So, insofar as people's
               | priors match up, there should, in principle, be a common
               | well-defined probability given "the information which is
               | publicly available", or also, given whatever other set of
               | evidence.)
               | 
               | Of course, that doesn't mean it is computationally
               | tractable to compute such a probability.
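The "fixed priors plus fixed evidence give a well-defined probability" point is Bayes' rule. A minimal sketch with made-up numbers: a hypothetical warning signal that fires 80% of the time before a crash and 20% of the time otherwise. Two people who see the same signal but start from different priors end up with different, but each perfectly well-defined, posteriors.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' rule for a binary hypothesis H after observing evidence E."""
    joint_h = prior * p_evidence_given_h
    joint_not_h = (1 - prior) * p_evidence_given_not_h
    return joint_h / (joint_h + joint_not_h)


# Hypothetical signal characteristics (not real market data).
P_SIGNAL_IF_CRASH = 0.8
P_SIGNAL_IF_NO_CRASH = 0.2

if __name__ == "__main__":
    for prior in (0.05, 0.20):
        p = posterior(prior, P_SIGNAL_IF_CRASH, P_SIGNAL_IF_NO_CRASH)
        print(prior, round(p, 4))
    # prior 0.05 -> about 0.1739; prior 0.20 -> 0.5
```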
        
             | [deleted]
        
             | nanis wrote:
             | > Economics is also statistics.
             | 
             | Economics is not Statistics (and definitely not
             | statistics).
             | 
             | Most of the discipline focuses on testing models and making
             | inferences on observational data. The techniques for
             | dealing with that sort of data, of course, build on
             | Statistics, but their nature is different enough that there
             | is Econometrics.
             | 
             | A large part of economics is not empirical at all --
             | despite the fact that people get Nobel prizes pretending
             | this not to be the case.
             | 
             | Even in the context of experimental economics, since the
             | behavior of the observed varies depending on the mode of
             | observation, the most straightforward Statistical methods,
             | designed for engineering/chemistry/biology-type experiments,
             | are not directly applicable (although it is great when they
             | agree with the fancier methods).
        
               | huitzitziltzin wrote:
               | >A large part of economics is not empirical at all --
               | despite the fact that people get Nobel prizes pretending
               | this not to be the case.
               | 
               | I'm not sure which parts of the field or which prize
               | winners you are talking about. To be clear: you think
               | economics is _not actually empirical_, but people are
               | awarded Nobel Prizes for _pretending that it is_? That's
               | a little odd. Let me know if that's not what you meant.
               | 
               | When you look at this list:
               | 
               | https://en.wikipedia.org/wiki/List_of_Nobel_Memorial_Prize_l...
               | 
               | Who satisfies that condition, in your mind? Who is
               | getting the prize on the basis of pretending that
               | economics is empirical?
        
               | nanis wrote:
               | > To be clear: you think economics is _not actually
               | empirical_
               | 
               | That is a misrepresentation of what I said.
               | 
               | To be clear, I think what I said:
               | 
               | >>A large part of economics is not empirical at all
               | 
               | For example, Kahneman's Nobel is solely a product
               | of taking an axiomatic theory and designing experiments
               | where regular people who are actually not being paid
               | according to their performance are gently prodded into
               | violating the axioms in weird settings. It is attractive
               | to people who want to claim that clearly the plebes
               | cannot be allowed to choose for themselves as they are
               | not "rational".
               | 
               | The only meaning of "rational" in Economics is that
               | individuals choose the best alternative according to
               | their preferences among a constrained set of
               | alternatives. Here an "alternative" or "bundle" is a
               | point in the entire commodity space.
               | 
               | The only test of this is consistency with GARP: A choice
               | is not rational if a feasible and more preferred
               | alternative exists.
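For readers unfamiliar with GARP (the Generalized Axiom of Revealed Preference): given observed price vectors and the bundles chosen at those prices, bundle i is directly revealed preferred to bundle j if j was affordable when i was bought (p_i·x_i >= p_i·x_j). GARP fails if i is revealed preferred to j, directly or through a chain of such relations, while j was bought at prices under which i was strictly cheaper (p_j·x_j > p_j·x_i). A minimal sketch of that standard check using a Warshall transitive closure; the data are made up for illustration.

```python
def dot(p, x):
    return sum(pi * xi for pi, xi in zip(p, x))


def satisfies_garp(prices, bundles):
    """Return True if the observed (price, bundle) pairs satisfy GARP."""
    n = len(bundles)
    # r[i][j]: bundle i is directly revealed (weakly) preferred to bundle j,
    # i.e. bundle j was affordable when bundle i was chosen.
    r = [
        [dot(prices[i], bundles[i]) >= dot(prices[i], bundles[j]) for j in range(n)]
        for i in range(n)
    ]
    # Warshall's algorithm: extend r to its transitive closure.
    for k in range(n):
        for i in range(n):
            if r[i][k]:
                for j in range(n):
                    if r[k][j]:
                        r[i][j] = True
    # Violation: i revealed preferred to j, yet i was strictly cheaper
    # than j at the prices where j was chosen.
    for i in range(n):
        for j in range(n):
            if r[i][j] and dot(prices[j], bundles[j]) > dot(prices[j], bundles[i]):
                return False
    return True


if __name__ == "__main__":
    # Consistent: neither bundle was affordable when the other was bought.
    print(satisfies_garp([(1, 2), (2, 1)], [(1, 0), (0, 1)]))  # True
    # Inconsistent: (1, 0) is revealed preferred to (0, 1), yet (0, 1)
    # was later bought when (1, 0) was strictly cheaper.
    print(satisfies_garp([(1, 1), (1, 2)], [(1, 0), (0, 1)]))  # False
```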
        
               | riesz-repr wrote:
               | There are actually several economists on this list, like
               | Victor Chernozhukov, Guido Imbens, and Susan Athey...
        
         | hntrader wrote:
         | A common mistake technical people make is to be too theoretical
         | or overcomplicated with their work and decision making, where
         | everything has to be some math model written out in a nicely
         | formatted LaTeX document. Don't fall into that trap; it is very
         | ineffective.
         | 
         | Stats is a tool like any other tool. Boil your question down to
         | the fundamentals (first principles) and maybe stats is one of
         | the tools you decide to use to solve part of it, where
         | appropriate.
         | 
         | Most questions involving data can be answered with something as
         | basic as a plot.
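In that spirit, even without a plotting library, a crude text histogram is often enough to see the shape of the data (skew, outliers, gaps, multiple modes). A minimal sketch; the function name and binning scheme are illustrative:

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Return a text histogram with one row per bin of width bin_width.

    Each row shows the bin's lower edge followed by one '#' per value.
    Empty bins between the smallest and largest value are kept so that
    gaps in the data stay visible.
    """
    counts = Counter(int(v // bin_width) for v in values)
    rows = []
    for b in range(min(counts), max(counts) + 1):
        rows.append(f"{b * bin_width:>6g} | {'#' * counts[b]}")
    return "\n".join(rows)


if __name__ == "__main__":
    print(text_histogram([1, 2, 2, 3, 3, 3, 7], bin_width=2))
```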
        
           | cambalache wrote:
           | Easier said than done. The complete history of science is the
           | human struggle to boil down the mathematical modelling of
           | nature to its fundamentals.
        
         | mindcrime wrote:
         | Depending on the context, I'm a fan of the work of Douglas
         | Hubbard, in his book _How to Measure Anything_ [1]. His
         | approach involves working out answers to things that might
         | sometimes be done as a "back of the napkin" kind of thing, but
         | in a slightly more rigorous way. Note that there are criticisms
         | of his approach, and I'll freely admit that it doesn't
         | guarantee arriving at an optimal answer. But arguably the
         | criticisms of his approach ("what if you leave out a variable
         | in your model?", etc.) apply to many (most?) other modeling
         | approaches.
         | 
         | On a related note, one of the last times I mentioned Hubbard
         | here, another book came up in the surrounding discussion, which
         | looks really good as well. _Guesstimation: Solving the World's
         | Problems on the Back of a Cocktail Napkin_[2] - I bought a copy
         | but haven't had time to read it yet. Maybe somebody who is
         | familiar will chime in with their thoughts?
         | 
         | [1]: https://www.amazon.com/How-Measure-Anything-Intangibles-Busi...
         | 
         | [2]: https://www.amazon.com/gp/product/0691129495/ref=ppx_yo_dt_b...
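One technique associated with Hubbard's book, as I understand it, is to express each uncertain input as a calibrated 90% confidence interval, treat it as a normal distribution (mean at the midpoint, standard deviation equal to the interval width divided by 3.29, since a 90% interval spans about ±1.645 standard deviations), and run a Monte Carlo simulation. A minimal sketch under those assumptions; the inputs (units per year, savings per unit) and the threshold are hypothetical. A normal can produce negative draws, which may or may not make sense for a given input.

```python
import random

Z_90 = 3.29  # width of a 90% normal interval in standard deviations (2 * 1.645)


def from_90ci(lower, upper):
    """Normal (mean, sd) matching a 90% confidence interval [lower, upper]."""
    return (lower + upper) / 2, (upper - lower) / Z_90


def simulate(n_draws=100_000, seed=0):
    """Monte Carlo draws of (units per year) * (savings per unit)."""
    rng = random.Random(seed)
    units = from_90ci(100, 300)         # hypothetical: units per year
    saving_per_unit = from_90ci(5, 15)  # hypothetical: dollars saved per unit
    return [rng.gauss(*units) * rng.gauss(*saving_per_unit) for _ in range(n_draws)]


if __name__ == "__main__":
    totals = simulate()
    threshold = 1500  # hypothetical break-even point
    print("mean annual saving:", round(sum(totals) / len(totals)))
    print("P(saving > threshold):", sum(t > threshold for t in totals) / len(totals))
```

The output is a distribution rather than a point estimate, so questions like "how likely are we to break even?" fall out directly.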
        
           | dr_dshiv wrote:
           | Let me second "How to measure anything." I think it should be
           | required reading for human beings.
        
             | alex_anglin wrote:
             | I would add that "How to Measure Anything in Cybersecurity
             | Risk" should be a core part of Infosec literature.
        
         | asdff wrote:
         | Frame your hypothesis in the simplest way possible and go from
         | there.
        
         | [deleted]
        
         | cf wrote:
         | While I'm not sure how amenable the modern methods are to
         | napkin math, as stated in the article, a lot more methods use
         | simulation, and if you can code, those are pretty
         | straightforward to get working.
         | 
         | Jake Vanderplas's presentation
         | https://speakerdeck.com/jakevdp/statistics-for-hackers can give
         | you some concrete ideas of how far you can get with just a
         | random number generator.
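As an illustration of the "just a random number generator" style: a minimal shuffling (permutation) test for a difference in means between two groups. If the group labels don't matter, randomly reshuffling them should produce a difference at least as large as the observed one fairly often; the fraction of shuffles that do estimates the one-sided p-value. The data are made up.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_test(a, b, n_shuffles=10_000, seed=0):
    """One-sided p-value for mean(a) - mean(b) via label shuffling."""
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = mean(pooled[: len(a)]) - mean(pooled[len(a):])
        if diff >= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimate away from exactly zero.
    return (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    treated = [12, 15, 11, 18, 14, 16]  # hypothetical measurements
    control = [10, 9, 13, 11, 8, 12]
    print(permutation_test(treated, control))
```

No distributional formula is needed; the simulation itself supplies the null distribution.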
        
         | platz wrote:
         | Decisions are moral/political/biological, not statistical
        
           | roenxi wrote:
           | Typically. But if somebody manages to make statistically
           | sound evidence-based decisions they will steamroller people
           | who are making decisions using other factors.
        
       | karlmcguire wrote:
       | Statistical Consequences of Fat Tails by Nassim Taleb.
        
         | alexilliamson wrote:
         | This feels only slightly more legitimate than recommending the
         | 538 blog as a statistical authority.
        
         | spekcular wrote:
         | I have read this book and want to leave an anti-recommendation
         | here. It's a poorly edited mess and makes at least one blatant
         | mathematical error.
         | 
         | More broadly, let me leave a Taleb anti-recommendation. His
         | entire shtick is yelling that traditional statisticians have
         | ignored heavy-tailed random variables in their modeling and
         | that he has special insight into the nature of tail risk
         | (perhaps along with a few select other people, like
         | Mandelbrot).
         | 
         | But this is manifestly not the case. In fact, if you go through
         | his Amazon reviews page, you can find him leaving positive
         | reviews several years ago on all the books written by
         | traditional statisticians that he learned about heavy-tailed
         | randomness from!
        
           | sfashset wrote:
           | link to his Amazon reviews page?
        
             | spekcular wrote:
             | Scroll back to the early 2010s: https://www.amazon.com/gp/profile/amzn1.account.AHMHNR4MRTDL...
             | 
             | For a more detailed critique, see Robert Lund, _Revenge of
             | the White Swan_, The American Statistician Vol. 61, No. 3
             | (Aug., 2007). Accessible through your favorite Russian
             | website.
             | 
             | If you want a better book on heavy-tailed randomness, I
             | like Didier Sornette's _Critical Phenomena in Natural
             | Sciences_ (subtitled _Chaos, Fractals, Selforganization and
             | Disorder: Concepts and Tools_).
        
               | SkyMarshal wrote:
               | _Revenge of the White Swan_ also appears available on
               | ResearchGate:
               | 
               | https://www.researchgate.net/publication/4741329_Revenge_of_...
        
         | jasonwatkinspdx wrote:
         | Taleb is... not a good source for learning statistics. Start
         | with Wasserman. Taleb says obvious and well known things using
         | his own invented terminology in order to cast himself as some
         | sort of contrarian genius. It's not that he's wrong, it's that
         | the insights he hawks are banal. That's why his readership base
         | is insight porn book junkies, not people actually trying to
         | learn statistical methods.
        
           | actusual wrote:
           | "insight porn books" is going in my "objects you've been
           | searching for titles for" Notion list.
        
             | jasonwatkinspdx wrote:
             | Yeah, I think I first heard it in relation to Malcolm
             | Gladwell and it's just so apt at capturing everything wrong
             | with that category of book. I mean he's a skillful writer,
             | and it's definitely entertaining stuff. But if you flip
             | into critical mode and do comparative research vs
             | authoritative sources, you start seeing how vapid it is
             | really fast.
        
               | disgruntledphd2 wrote:
               | When I read Fooled by Randomness I found it useful. Not
               | groundbreaking work, but it drew some nice analogies
               | between statistical distributions and humans'
               | over-certainty.
        
           | tajd wrote:
           | If you do have his books, then the reference lists in the back
           | provide a good starting point for further reading.
        
         | stblack wrote:
         | Not mentioned, not cited in the paper. That's shocking.
         | 
         | Edit: the word "tail" appears nowhere in the paper, in any
         | context. I'm beyond shocked now.
        
           | [deleted]
        
           | spekcular wrote:
           | This is subsumed in the robust estimation section.
        
           | disgruntledphd2 wrote:
           | Because this was well known to statisticians long before
           | Taleb talked about it?
           | 
           | That would be my suspicion as to why it isn't there.
        
             | FabHK wrote:
             | Quite plausible. Extreme Value theory [1] appears to have
             | been codified by the 1960s, and one of the main theorems is
             | credited "to Frechet (1927), Ronald Fisher and Leonard
             | Henry Caleb Tippett (1928), Mises (1936) and Gnedenko
             | (1943)" [2]. ETA: And the second theorem of Extreme Value
             | Analysis is from the mid 1970s. [3]
             | 
             | 1. https://en.wikipedia.org/wiki/Extreme_value_theory
             | 
             | 2. https://en.wikipedia.org/wiki/Fisher-Tippett-Gnedenko_theore...
             | 
             | 3. https://en.wikipedia.org/wiki/Pickands-Balkema-De_Haan_theor...
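For a quick feel of what the Fisher-Tippett-Gnedenko theorem says: maxima of large blocks of i.i.d. draws, suitably shifted and scaled, converge to one of the generalized extreme value distributions. For Exponential(1) draws the limit is the Gumbel distribution, with CDF exp(-exp(-x)), after subtracting ln(n) from each block maximum of n draws. A minimal simulation sketch; the block size and number of blocks are arbitrary choices.

```python
import math
import random


def shifted_block_maxima(n_blocks=1000, block_size=500, seed=0):
    """Max of each block of Exponential(1) draws, minus ln(block_size)."""
    rng = random.Random(seed)
    shift = math.log(block_size)
    return [
        max(rng.expovariate(1.0) for _ in range(block_size)) - shift
        for _ in range(n_blocks)
    ]


def gumbel_cdf(x):
    return math.exp(-math.exp(-x))


if __name__ == "__main__":
    maxima = shifted_block_maxima()
    for x in (-1.0, 0.0, 1.0, 2.0):
        empirical = sum(m <= x for m in maxima) / len(maxima)
        print(f"x={x:+.1f}  empirical={empirical:.3f}  gumbel={gumbel_cdf(x):.3f}")
```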
        
               | bigbillheck wrote:
               | My stats training was in the 90s and we absolutely
               | covered leptokurtic things.
        
               | selimthegrim wrote:
               | The book by Leadbetter, Lindgren and Rootzen is good too
               | if a bit dated.
        
         | ojnabieoot wrote:
         | Unsolicited advertising for Taleb's newest book is really not
         | constructive or helpful, and betrays that you don't actually
         | know very much about statistics.
        
           | sigstoat wrote:
           | calling someone's book suggestion an "advertisement" is rude,
           | and inaccurate. taleb wouldn't pay anyone to suggest his book
           | when he could instead just show up here and insult everyone
           | for free.
        
             | ojnabieoot wrote:
             | I didn't mean it strictly literally. The original comment
             | was a thoughtless namedrop of a brand new book (which means
             | by tautology it's irrelevant to the topic at hand) and
             | doesn't have a shred of reasoning behind it. So,
             | functionally, it's a billboard advertising Taleb's book.
             | 
             | I am aware that the Cult of Taleb means that people are
             | willing to advertise his work for free.
        
       | ncmncm wrote:
       | The re-discovery of causation analysis, by Pearl, after it was
       | suppressed for many decades by the statistics mandarinate,
       | clearly qualifies.
       | 
       | Max Planck is quoted as saying, "Science progresses one funeral at a time."
       | In this case, the grand old man of statistics finally died still
       | insisting that it could not be proved that smoking caused cancer,
       | but not before blighting careers of those who were showing it
       | could.
        
         | huitzitziltzin wrote:
         | > after it was suppressed for many decades by the statistics
         | mandarinate, clearly qualifies.
         | 
         | I have heard Pearl make this claim but have never seen his
         | evidence for it.
         | 
         | As a counterexample: Don Rubin, a statistician with a
         | well-known framework for causal inference, has been at the
         | top of the field for a very long time. Rubin has published
         | widely and very well on causal inference.
         | 
         | Is there good evidence for the topic actually being
         | _suppressed_ by anyone within the statistics profession? There
         | is work on causal inference going back to RA Fisher. If anyone
         | has tried to suppress it, I'm not sure they have been very
         | effective.
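For readers who haven't seen Rubin's framework (potential outcomes): each unit has an outcome under treatment, Y(1), and under control, Y(0), but only one of them is ever observed. Under random assignment, the difference in observed group means is an unbiased estimate of the average treatment effect E[Y(1) - Y(0)]. A minimal simulation sketch; the true effect of 2.0 and the noise model are made-up assumptions chosen so the estimate can be checked against a known answer.

```python
import random


def simulate_experiment(n_units=10_000, true_effect=2.0, seed=0):
    """Randomized experiment under the potential outcomes model.

    Returns (true average treatment effect, difference-in-means estimate).
    """
    rng = random.Random(seed)
    treated, control, effects = [], [], []
    for _ in range(n_units):
        y0 = rng.gauss(10.0, 3.0)                    # outcome without treatment
        y1 = y0 + true_effect + rng.gauss(0.0, 1.0)  # outcome with treatment
        effects.append(y1 - y0)                      # never observable in practice
        # Coin-flip assignment: only one potential outcome is recorded.
        if rng.random() < 0.5:
            treated.append(y1)
        else:
            control.append(y0)
    true_ate = sum(effects) / len(effects)
    estimate = sum(treated) / len(treated) - sum(control) / len(control)
    return true_ate, estimate


if __name__ == "__main__":
    print(simulate_experiment())
```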
        
           | mlac wrote:
           | >Is there good evidence for the topic actually being
           | suppressed by anyone within the statistics profession?
           | 
           | Unfortunately there's not enough evidence to show
           | causation...
        
       | alilleybrinker wrote:
       | For anyone wanting to learn causal inference (the first item in
       | the list), I highly recommend "Causal Inference, the Mixtape" by
       | Scott Cunningham, a professor of economics at Baylor University.
       | Scott has been writing this book incrementally in the open for
       | the last couple of years and recently completed and published
       | it. It is a thorough introduction to numerous techniques for
       | inferring causation in different contexts.
       | https://www.scunning.com/mixtape.html
        
         | cambalache wrote:
         | https://mixtape.scunning.com/
         | 
         | Link to the free HTML version
        
       | thepangolino wrote:
       | That's a great practically oriented crash course on modern
       | statistics!
        
       | andyxor wrote:
       | Causality, i.e. causal inference and graphical models. See the
       | work by Judea Pearl; he pretty much single-handedly pioneered
       | the field.
       | 
       | https://www.amazon.com/Causality-Reasoning-Inference-Judea-P...
       | 
       | https://en.wikipedia.org/wiki/Causality#Theories
       | 
       | https://en.wikipedia.org/wiki/Graphical_model
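A small taste of what the graphical-model approach buys you: if a variable Z (say, a hypothetical risk factor) affects both a treatment X and an outcome Y, and Z blocks every back-door path from X to Y, Pearl's back-door adjustment formula gives P(Y=1 | do(X=x)) = Σ_z P(Y=1 | X=x, Z=z) P(Z=z), which can differ sharply from the naive P(Y=1 | X=x). A minimal sketch with made-up probabilities for binary variables, assuming Z is the only confounder:

```python
# Hypothetical model: Z -> X, Z -> Y, X -> Y, all binary.
P_Z = {0: 0.7, 1: 0.3}           # P(Z = z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}  # P(X = 1 | Z = z)
P_Y1_GIVEN_XZ = {                # P(Y = 1 | X = x, Z = z)
    (0, 0): 0.1, (1, 0): 0.2,
    (0, 1): 0.4, (1, 1): 0.5,
}


def p_x_given_z(x, z):
    return P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]


def naive(x):
    """P(Y = 1 | X = x): conditioning mixes in the confounding through Z."""
    num = sum(P_Y1_GIVEN_XZ[(x, z)] * p_x_given_z(x, z) * P_Z[z] for z in P_Z)
    den = sum(p_x_given_z(x, z) * P_Z[z] for z in P_Z)
    return num / den


def adjusted(x):
    """P(Y = 1 | do(X = x)) via the back-door adjustment over Z."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * P_Z[z] for z in P_Z)


if __name__ == "__main__":
    print("naive effect:   ", round(naive(1) - naive(0), 3))        # about 0.26
    print("adjusted effect:", round(adjusted(1) - adjusted(0), 3))  # 0.1
```

Here the naive comparison overstates the effect by more than a factor of two, because units with Z = 1 are both more likely to be treated and more likely to have Y = 1.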
        
         | ncmncm wrote:
         | Single-handedly pioneered _reviving_ the field, but yes.
         | 
         | Pearl is very careful to give his deceased predecessors their
         | due credit. That their work was suppressed will always be a
         | blot on the leading names in statistics in the past century.
        
       ___________________________________________________________________
       (page generated 2021-03-08 23:00 UTC)