[HN Gopher] Review of "Statistics" by Freedman, Pisani, and Purv...
       ___________________________________________________________________
        
       Review of "Statistics" by Freedman, Pisani, and Purves (2017)
        
       Author : luu
       Score  : 120 points
       Date   : 2024-12-01 09:37 UTC (13 hours ago)
        
 (HTM) web link (cadlag.org)
 (TXT) w3m dump (cadlag.org)
        
       | treetalker wrote:
       | - David Freedman, Robert Pisani, and Roger Purves, _Statistics_ ,
       | 4th ed.
       | 
       | - The article's author also recommends these online materials:
       | https://www.stat.berkeley.edu/~stark/SticiGui/Text/toc.htm.
        
       | by256 wrote:
       | I read/worked through Freedman's Statistics a couple of years ago
       | and I walked away from it a different person. I always recommend
       | it when someone asks for a good book to learn statistics from.
       | However, it did leave me craving some of the maths that the
       | authors intentionally left out to make the material more
       | accessible. Freedman's more advanced book, Statistical Models,
       | has you derive many of the results from the first book right at
       | the start, then focuses mainly on linear models. It was a great
       | follow-up which provided the mathematical substance that I felt
       | was missing from the first book.
        
       | evertedsphere wrote:
       | excellent domain name
        
       | djoldman wrote:
       | Most statistics classes are not taught to people who will be
       | professional statisticians. I agree whole-heartedly with this:
       | 
       | > The book by Freedman, Pisani, and Purves is the one I would
       | have liked to teach from, and it was the book I drew upon the
       | most in prepping my own lectures, as an antidote to the
       | overwrought and confused style of my assigned text. The authors
       | maintain the underlying attitude that statistics is a useful tool
       | for understanding certain questions about the world, but in this
       | way it augments human judgement, rather than supplanting it. To
       | quote from the preface:
       | 
       | > > Why does the book include so many exercises that cannot be
       | solved by plugging into a formula? The reason is that few real-
       | life statistical problems can be solved that way. Blindly
       | plugging into statistical formulas has caused a lot of confusion.
       | So this book takes a different approach: thinking.
        
         | lupire wrote:
         | Like calculus for the engineer, statistics is primarily for the
         | social scientist. It is an applied mathematics, or a form of
         | physics (math in the real world).
         | 
         | Math fans tend to discount and dismiss applied statistics as
         | being not math, in a way that they don't do for physics, for
         | some reason I don't fully grasp.
         | 
         | I think it's because statistics gets a bad reputation from the
         | legions of terrible social scientists in the wild, who can
         | easily publish false but socially interesting results that get
         | applied to our real lives. Mathematically fraudulent physics,
         | on the other hand, usually immediately dies in the engineering
         | phase, leaving just a few rambling cranks that almost everyone
         | ignores.
         | 
         | Also (and related) perhaps, just as dry mathematical statistics
         | ignores real world empirical experimentation, "wet" applied
         | statistics goes too far into ignoring the math completely,
         | because too few empirical scientists are able to understand the
         | math when they would encounter it.
        
       | maroonblazer wrote:
       | From the end of TFA:
       | 
       | >The book is not without its weak moments, although they are few.
       | One in particular which I recall is the treatment of A/B testing.
       | Essential to any hypothesis testing is the matter of how to
       | reduce the sampling mechanism to a simple probabilistic model, so
       | that a quantitative test may be derived. The book emphasizes one
       | such model: simple random sampling from a population, which then
       | involves the standard probabilistic ideas of binomial and
       | multinomial distributions, along with the normal approximation to
       | these. Thus, one obtains the z-test.
       | 
       | >In the context of randomized controlled experiments, where a set
       | of subjects is randomly assigned to either a control or treatment
       | group, the simple random sampling model is inapplicable.
       | Nonetheless, when asking whether the treatment has an effect
       | there is a suitable (two-sample) z-test. The mathematical ideas
       | behind it are necessarily different from those of the previously
       | mentioned z-test, because the sampling mechanism here is
       | different, but the end result looks the same. Why this works out
       | as it does is explained rather opaquely in the book, since the
       | authors never developed the probabilistic tools necessary to make
       | sense of it (here one would find at least a mention of
       | hypergeometric distributions). Given the emphasis placed in the
       | beginning of the book on the importance of randomized, controlled
       | experiments in statistics, it feels like this topic is getting
       | short-shrift.
       | 
       | Can anyone recommend good resources to fill this alleged gap?
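For reference, the two-sample z-test mentioned in the quoted passage can be sketched as follows. This is a minimal illustration, not the book's own presentation: the function name `two_sample_z` and the counts are hypothetical, and the standard error of the difference is taken as the square root of the sum of the squared per-group standard errors.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Assumes both groups are large enough for the normal approximation.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Standard error of each sample proportion.
    se_a = sqrt(p_a * (1 - p_a) / n_a)
    se_b = sqrt(p_b * (1 - p_b) / n_b)
    # Standard error of the difference, treating the groups as independent.
    se_diff = sqrt(se_a**2 + se_b**2)
    z = (p_a - p_b) / se_diff
    # Two-sided P-value from the standard normal curve.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: 60 of 100 respond under treatment, 40 of 100 under control.
z, p_value = two_sample_z(60, 100, 40, 100)
print(f"z = {z:.2f}, P = {p_value:.4f}")
```

The formula treats the two groups as independent samples, which is exactly the assumption the review says does not literally hold under random assignment.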
        
         | DAGdug wrote:
         | I'd ignore the critique completely - it lacks internal
         | consistency. The similar final result is due to central limit
         | theorem, which is a large n result, and actually lets you
         | ignore the hypergeometric construct and use a binomial instead
         | since those are similar for large n. [edit: grammar]
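The claim that the hypergeometric and binomial distributions are close can be checked numerically. The sketch below is only an illustration with hypothetical numbers: it fixes a sample of 50 and lets the population grow, which is the regime where the approximation is known to hold. When the sample is a large fraction of the population, as when half of the subjects are assigned to treatment, the hypergeometric variance is smaller by the factor (N - n)/(N - 1) and the two distributions differ visibly.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(k successes) when drawing n without replacement from N items, K of them successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binom_pmf(k, n, p):
    """P(k successes) in n independent draws with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def max_pmf_gap(N, K, n):
    """Largest absolute difference between the two pmfs over k = 0..n."""
    p = K / N
    return max(
        abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p)) for k in range(n + 1)
    )


# Hypothetical setup: 30% successes in the population, sample size fixed at 50.
for N in (100, 1_000, 10_000, 100_000):
    K = 3 * N // 10
    print(N, max_pmf_gap(N, K, n=50))
```

The printed gap shrinks as N grows, while the first row (a sample of 50 out of 100) shows how far apart the two models are when the sample is half the population.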
        
         | by256 wrote:
         | In the book, Freedman states that two assumptions of the
         | standard error of the difference are violated by the way
         | subjects are assigned to control and treatment groups in
         | randomized controlled trials (RCTs).
         | 
         | The standard error of the difference assumes that a) samples
         | are drawn independently, i.e., with replacement; and b) that
         | the two groups are independent of each other. By samples being
         | drawn, I mean a subject being assigned to a group in an RCT
         | here.
         | 
         | If you derive the standard error of the difference, there are
         | two covariance terms that are zero when these assumptions are
         | true. When they're violated, like in RCTs, the covariances are
         | non-zero and should in theory be accounted for. However,
         | Freedman implies that it doesn't actually matter because they
         | effectively cancel each other out, as one inflates the standard
         | error and the other deflates it.
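The cancellation described above can be checked with a small simulation. This is a sketch under stated assumptions, not Freedman's derivation: the responses are hypothetical, the treatment is assumed to have no effect (so each subject's response is fixed regardless of group), and only the random assignment varies between repetitions.

```python
import random

random.seed(0)

# Hypothetical fixed responses for N subjects; under the no-effect assumption
# a subject's response is the same in either group.
N, n = 200, 100          # n subjects go to treatment, the rest to control
m = N - n
responses = [random.gauss(50, 10) for _ in range(N)]


def diff_of_means(values, n_treat):
    """Randomly assign n_treat subjects to treatment and return mean(T) - mean(C)."""
    shuffled = random.sample(values, len(values))
    treat, control = shuffled[:n_treat], shuffled[n_treat:]
    return sum(treat) / len(treat) - sum(control) / len(control)


def pop_sd(values):
    """Standard deviation with divisor len(values)."""
    mu = sum(values) / len(values)
    return (sum((v - mu) ** 2 for v in values) / len(values)) ** 0.5


# Spread of the difference over many random assignments.
diffs = [diff_of_means(responses, n) for _ in range(5000)]
simulated_se = pop_sd(diffs)

# Naive formula: pretends draws are with replacement and the groups are independent.
sigma = pop_sd(responses)
naive_se = (sigma**2 / n + sigma**2 / m) ** 0.5

print(f"simulated SE: {simulated_se:.3f}")
print(f"naive SE:     {naive_se:.3f}")
```

In this no-effect setting the exact variance of the difference works out to the naive one multiplied by N/(N - 1), so the two printed values should agree up to simulation noise.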
        
         | graycat wrote:
         | > gap?
         | 
         | E. L. Lehmann, 'Nonparametrics: Statistical Methods Based on
         | Ranks', ISBN 0-8162-4994-6, Holden-Day, San Francisco, 1975.
         | 
         | Sidney Siegel, 'Nonparametric Statistics for the Behavioral
         | Sciences', McGraw-Hill, New York, 1956.
         | 
         | Bradley Efron, 'The Jackknife, the Bootstrap, and Other
         | Resampling Plans', ISBN 0-89871-179-7, SIAM, Philadelphia,
         | 1982.
         | 
         | Hypothesis testing?? Somewhere maybe I still have my little
         | paper I wrote on using the Hahn decomposition and the Radon-
         | Nikodym theorem to give a relatively general proof of the
         | Neyman-Pearson theorem about the most powerful hypothesis test.
        
         | agnosticmantis wrote:
         | I found that I learned a lot about RCTs by going beyond RCTs
         | and reading about causal inference. You learn why each
         | assumption is important when it's broken.
         | 
         | 'Causal Inference: What If' is a nice intro and freely
         | available: https://www.hsph.harvard.edu/miguel-hernan/causal-
         | inference-...
        
       | j7ake wrote:
       | Great book to go through the exercises!
        
       | ivan_ah wrote:
       | I've been working on an introductory STATS book for the past
       | couple of years and I totally understand where the OP is coming
       | from. There are so many books out there that focus on technique
       | (the HOW), but don't explain the reasoning (the WHY).
       | 
       | I guess it wouldn't be a problem if the techniques being taught
       | in STATS101 were actually usable in the real world. A bit like
       | driving a car: you don't need to know how internal combustion
       | engines work, you just need to press the pedals (and not endanger
       | others on the road). The problem is z-tests, t-tests, ANOVA, have
       | very limited use cases. Most real-world data analysis will
       | require more advanced models, so the STATS education is doubly-
       | problematic: does not teach you useful skills OR teach you
       | general principles.
       | 
       | I spent a lot of time researching and thinking about STATS
       | curriculum and choosing which topics are actually worth covering.
       | I wrote a blog post about this[1]. In the end I settled on a
       | computation-heavy approach, which allows me to do lots of hands-on
       | simulations and demonstrations of concepts, something that will
       | be helpful for tech-literate readers, but I think also for the
       | non-tech people, since it will be easier to learn Python+STATS
       | than to try to learn STATS alone. Here is a detailed argument
       | about how Python is useful for learning statistics[2].
       | 
       | If you're interested in seeing the book outline, you can check
       | this google doc[3]. Comments welcome. I'm currently writing the
       | last chapter, so hopefully will be done with it by January. I
       | have a mailing list[4] for ppl who want to be notified when the
       | book is ready.
       | 
       | [1] https://minireference.com/blog/fixing-the-statistics-
       | curricu...
       | 
       | [2] https://minireference.com/blog/python-for-stats/
       | 
       | [3]
       | https://docs.google.com/document/d/1fwep23-95U-w1QMPU31nOvUn...
       | 
       | [4] https://confirmsubscription.com/h/t/A17516BF2FCB41B2
        
       | lupire wrote:
       | The blog post is from 2017, but the book's 4th (and latest)
       | edition is from 2007, the year before first author Freedman
       | died. The 1st edition was published in 1978, which fits the
       | cartoon illustrations.
       | 
       | Table of contents and section 1:
       | 
       | https://homepages.dcc.ufmg.br/~assuncao/EstatCC/Slides/Extra...
        
       | rafeyahmad wrote:
       | Excellent book. Read this on the side while taking AP Statistics
       | in high school and it gave me the intuition that the class
       | textbook didn't. Particularly love the emphasis on study design.
        
       | aerhardt wrote:
       | I've formally studied stats up to calculus-based probability and
       | I'm now brushing up on math ahead of starting Georgia Tech's
       | OMSCS. I feel more fluent than I've ever been but the following
       | quoted passage from the book really hits home:
       | 
       | "Why does the book include so many exercises that cannot be
       | solved by plugging into a formula? The reason is that few real-
       | life statistical problems can be solved that way. Blindly
       | plugging into statistical formulas has caused a lot of confusion.
       | So this book takes a different approach: thinking."
       | 
       | This applies to both math and stats. I appreciate the value in
       | grinding pure, fundamental technique but as I'm reviewing I'm
       | missing more real-life applications. Theory feels like a plan
       | until real-life throws you the first punch.
       | 
       | I'll be buying this book, thanks for the recommendation!
        
       | joshdavham wrote:
       | > Much of the power of statistics is in common sense, amplified
       | by appropriate mathematical tools, and refined through careful
       | analysis.
       | 
       | I hate to be contrarian, but even though I have a degree in
       | statistics, I feel like much of statistics/probability actually
       | violates common sense. In fact, it's probably the most
       | unintuitive field that I'm familiar with.
       | 
       | Many of the readers will probably be familiar with the Monty Hall
       | problem or the Birthday problem, but imo, the entire field of
       | statistics/probability is about equally unintuitive/violating of
       | common sense.
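Both puzzles mentioned above are easy to check by computation. The sketch below is only an illustration: the function names are made up for this example, the birthday calculation assumes 365 equally likely birthdays, and the Monty Hall simulation assumes the standard rules (the host always opens a goat door other than the contestant's pick).

```python
import random


def shared_birthday_prob(k, days=365):
    """Exact probability that at least two of k people share a birthday."""
    p_all_distinct = 1.0
    for i in range(k):
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct


def monty_hall_switch_win_rate(trials, rng):
    """Fraction of simulated games won by a contestant who always switches."""
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # Host opens a goat door that is neither the pick nor the car.
        opened = rng.choice([d for d in range(3) if d != pick and d != car])
        # Switching means taking the one remaining closed door.
        switched = next(d for d in range(3) if d != pick and d != opened)
        wins += switched == car
    return wins / trials


rng = random.Random(0)
print(f"P(shared birthday among 23): {shared_birthday_prob(23):.3f}")
print(f"Win rate when switching:     {monty_hall_switch_win_rate(20_000, rng):.3f}")
```

With 23 people the exact probability is about 0.507, and the simulated win rate for switching settles near 2/3 rather than the 1/2 that intuition often suggests.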
        
       | throwaway81523 wrote:
       | Oh this is a really good book. I've had it in my want-to-read
       | pile for years. I will read the review now.
        
       | wellshapedwords wrote:
       | Coincidentally, I just finished the final chapter of this book. I
       | wanted to learn the fundamentals after taking an (execrable)
       | Coursera/IBM course on Python and data science. This book was
       | perfect.
       | 
       | I like this style of introducing a technical topic to a broad
       | audience. It builds incrementally and practically. The prose is
       | clear enough for a layman to gain a conceptual appreciation of
       | the methods even if they skip the exercises. And while the
       | exercises weren't too demanding, there were many of them, always
       | framed in real world context. For the portion of the audience who
       | will study further, I like to think that the book's approach
       | towards problem solving and challenging the intuition could be
       | helpful throughout an entire career of statistical thinking.
        
       ___________________________________________________________________
       (page generated 2024-12-01 23:00 UTC)