[HN Gopher] Review of "Statistics" by Freedman, Pisani, and Purv...
___________________________________________________________________
Review of "Statistics" by Freedman, Pisani, and Purves (2017)
Author : luu
Score : 120 points
Date : 2024-12-01 09:37 UTC (13 hours ago)
(HTM) web link (cadlag.org)
(TXT) w3m dump (cadlag.org)
| treetalker wrote:
| - David Freedman, Robert Pisani, and Roger Purves, _Statistics_ ,
| 4th ed.
|
| - The article's author also recommends these online materials:
| https://www.stat.berkeley.edu/~stark/SticiGui/Text/toc.htm.
| by256 wrote:
| I read/worked through Freedman's Statistics a couple of years ago
| and I walked away from it a different person. I always recommend
| it when someone asks for a good book to learn statistics from.
| However, it did leave me craving some of the maths that the
| authors intentionally left out to make the material more
| accessible. Freedman's more advanced book, Statistical Models,
| has you derive many of the results from the first book right at
| the start, then focuses mainly on linear models. It was a great
| follow-up which provided the mathematical substance that I felt
| was missing from the first book.
| evertedsphere wrote:
| excellent domain name
| djoldman wrote:
| Most statistics classes are not taught to people who will be
| professional statisticians. I agree whole-heartedly with this:
|
| > The book by Freedman, Pisani, and Purves is the one I would
| have liked to teach from, and it was the book I drew upon the
| most in prepping my own lectures, as an antidote to the
| overwrought and confused style of my assigned text. The authors
| maintain the underlying attitude that statistics is a useful tool
| for understanding certain questions about the world, but in this
| way it augments human judgement, rather than supplanting it. To
| quote from the preface:
|
| > > Why does the book include so many exercises that cannot be
| solved by plugging into a formula? The reason is that few real-
| life statistical problems can be solved that way. Blindly
| plugging into statistical formulas has caused a lot of confusion.
| So this book takes a different approach: thinking.
| lupire wrote:
| Like calculus for the engineer, statistics is primarily for the
| social scientist. It is an applied mathematics, or a form of
| physics (math in the real world).
|
| Math fans tend to discount and dismiss applied statistics as
| being not math, in a way that they don't do for physics, for
| some reason I don't fully grasp.
|
| I think it's because statistics gets a bad reputation from the
| legions of terrible social scientists in the wild, who can
| easily publish false but socially interesting results that get
| applied to our real lives. Mathematically fraudulent physics,
| on the other hand, usually immediately dies in the engineering
| phase, leaving just a few rambling cranks that most of everyone
| ignores.
|
| Also (and related) perhaps, just as dry mathematical statistics
| ignores real world empirical experimentation, "wet" applied
| statistics goes too far into ignoring the math completely,
| because too few empirical scientists are able to understand the
| math when they would encounter it.
| maroonblazer wrote:
| From the end of TFA:
|
| >The book is not without its weak moments, although they are few.
| One in particular which I recall is the treatment of A/B testing.
| Essential to any hypothesis testing is the matter of how to
| reduce the sampling mechanism to a simple probabilistic model, so
| that a quantitative test may be derived. The book emphasizes one
| such model: simple random sampling from a population, which then
| involves the standard probabilistic ideas of binomial and
| multinomial distributions, along with the normal approximation to
| these. Thus, one obtains the z-test.
|
| >In the context of randomized controlled experiments, where a set
| of subjects is randomly assigned to either a control or treatment
| group, the simple random sampling model is inapplicable.
| Nonetheless, when asking whether the treatment has an effect
| there is a suitable (two-sample) z-test. The mathematical ideas
| behind it are necessarily different from those of the previously
| mentioned z-test, because the sampling mechanism here is
| different, but the end result looks the same. Why this works out
| as it does is explained rather opaquely in the book, since the
| authors never developed the probabilistic tools necessary to make
| sense of it (here one would find at least a mention of
| hypergeometric distributions). Given the emphasis placed in the
| beginning of the book on the importance of randomized, controlled
| experiments in statistics, it feels like this topic is getting
| short-shrift.
|
| Can anyone recommend good resources to fill this alleged gap?
| DAGdug wrote:
| I'd ignore the critique completely - it lacks internal
| consistency. The similar final result is due to central limit
| theorem, which is a large n result, and actually lets you
| ignore the hypergeometric construct and use a binomial instead
| since those are similar for large n. [edit: grammar]
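
A minimal sketch of the approximation mentioned above, using only the Python standard library. It compares counts from drawing without replacement (hypergeometric) against drawing with replacement (binomial) for a fixed sample, as the population grows. The function names, the sample size of 20, and the population sizes are illustrative assumptions, not taken from the thread or the book. It only shows that the two distributions get close when the sample is a small fraction of the population; it does not by itself settle the RCT case, where a large share of the subjects is assigned to treatment.

```python
from math import comb


def hypergeom_pmf(k, pop_size, pop_successes, sample_size):
    """P(k successes) when sampling WITHOUT replacement."""
    return (
        comb(pop_successes, k)
        * comb(pop_size - pop_successes, sample_size - k)
        / comb(pop_size, sample_size)
    )


def binom_pmf(k, sample_size, p):
    """P(k successes) when sampling WITH replacement."""
    return comb(sample_size, k) * p**k * (1 - p) ** (sample_size - k)


def total_variation(pop_size, pop_successes, sample_size):
    """Total variation distance between the two count distributions."""
    p = pop_successes / pop_size
    return 0.5 * sum(
        abs(
            hypergeom_pmf(k, pop_size, pop_successes, sample_size)
            - binom_pmf(k, sample_size, p)
        )
        for k in range(sample_size + 1)
    )


sample_size = 20  # assumed, for illustration only
distances = {
    pop_size: total_variation(pop_size, pop_size // 2, sample_size)
    for pop_size in (40, 400, 4000, 40000)
}

for pop_size, distance in distances.items():
    print(f"population {pop_size:>6}: distance = {distance:.5f}")
```

The distance shrinks as the population grows relative to the sample, which is the sense in which the binomial can stand in for the hypergeometric.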
| by256 wrote:
| In the book, Freedman states that two assumptions of the
| standard error of the difference are violated by the way
| subjects are assigned to control and treatment groups in
| randomized controlled trials (RCTs).
|
| The standard error of the difference assumes that a) samples
| are drawn independently, i.e., with replacement; and b) that
| the two groups are independent of each other. By samples being
| drawn, I mean a subject being assigned to a group in an RCT
| here.
|
| If you derive the standard error of the difference, there are
| two covariance terms that are zero when these assumptions are
| true. When they're violated, like in RCTs, the covariances are
| non-zero and should in theory be accounted for. However,
| Freedman implies that it doesn't actually matter because they
| effectively cancel each other out, as one inflates the standard
| error and the other deflates it.
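
The net effect described above can be checked by simulation. The sketch below is a hedged illustration, not the book's derivation: it assumes a fixed set of 100 subjects, a constant additive treatment effect, and made-up outcome values, and it does not separate the two covariance terms. It repeatedly re-randomizes the assignment, then compares the actual spread of the difference in group means with the usual formula, sqrt(s1^2/n1 + s2^2/n2), which pretends the groups are independent samples drawn with replacement.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed setup: every subject has a fixed outcome under control, and
# treatment adds the same constant effect for everyone.
num_subjects, num_treated, effect = 100, 50, 5.0
control_outcome = [rng.gauss(50, 10) for _ in range(num_subjects)]
treated_outcome = [y + effect for y in control_outcome]


def one_randomization():
    """Randomly assign subjects; return (difference in means, nominal variance)."""
    order = list(range(num_subjects))
    rng.shuffle(order)
    treated = [treated_outcome[i] for i in order[:num_treated]]
    control = [control_outcome[i] for i in order[num_treated:]]
    diff = statistics.fmean(treated) - statistics.fmean(control)
    # Textbook formula that treats the two groups as independent samples.
    nominal_var = (
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return diff, nominal_var


results = [one_randomization() for _ in range(2000)]
diffs = [diff for diff, _ in results]
nominal_vars = [var for _, var in results]

actual_se = statistics.stdev(diffs)  # spread over re-randomizations
nominal_se = math.sqrt(statistics.fmean(nominal_vars))  # typical formula value

print(f"actual SE of the difference:  {actual_se:.3f}")
print(f"nominal SE from the formula: {nominal_se:.3f}")
```

Under these assumptions the two numbers should come out close, which is consistent with the claim that the violations roughly offset each other. With treatment effects that vary across subjects, the agreement need not be as tight.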
| graycat wrote:
| > gap?
|
| E. L. Lehmann, 'Nonparametrics: Statistical Methods Based on
| Ranks', ISBN 0-8162-4994-6, Holden-Day, San Francisco, 1975.
|
| Sidney Siegel, 'Nonparametric Statistics for the Behavioral
| Sciences', McGraw-Hill, New York, 1956.
|
| Bradley Efron, 'The Jackknife, the Bootstrap, and Other
| Resampling Plans', ISBN 0-89871-179-7, SIAM, Philadelphia,
| 1982.
|
| Hypothesis testing?? Somewhere maybe I still have my little
| paper I wrote on using the Hahn decomposition and the Radon-
| Nikodym theorem to give a relatively general proof of the
| Neyman-Pearson theorem about the most powerful hypothesis test.
| agnosticmantis wrote:
| I found that I learned a lot about RCTs by going beyond RCTs
| and reading about causal inference. You learn why each
| assumption is important when it's broken.
|
| 'Causal Inference: What If' is a nice intro and freely
| available: https://www.hsph.harvard.edu/miguel-hernan/causal-
| inference-...
| j7ake wrote:
| Great book to go through the exercises!
| ivan_ah wrote:
| I've been working on an introductory STATS book for the past
| couple of years and I totally understand where the OP is coming
| from. There are so many books out there that focus on technique
| (the HOW), but don't explain the reasoning (the WHY).
|
| I guess it wouldn't be a problem if the techniques being taught
| in STATS101 were actually usable in the real world. A bit like
| driving a car: you don't need to know how internal combustion
| engines work, you just need to press the pedals (and not endanger
| others on the road). The problem is z-tests, t-tests, ANOVA, have
| very limited use cases. Most real-world data analysis will
| require more advanced models, so the STATS education is doubly-
| problematic: does not teach you useful skills OR teach you
| general principles.
|
| I spent a lot of time researching and thinking about STATS
| curriculum and choosing which topics are actually worth covering.
| I wrote a blog post about this[1]. In the end I settled on a
| computation-heavy approach, which allows me to do lots of hands-on
| simulations and demonstrations of concepts, something that will
| be helpful for tech-literate readers, but I think also for the
| non-tech people, since it will be easier to learn Python+STATS
| than to try to learn STATS alone. Here is a detailed argument
| about how Python is useful for learning statistics[2].
|
| If you're interested in seeing the book outline, you can check
| this google doc[3]. Comments welcome. I'm currently writing the
| last chapter, so hopefully will be done with it by January. I
| have a mailing list[4] for ppl who want to be notified when the
| book is ready.
|
| [1] https://minireference.com/blog/fixing-the-statistics-
| curricu...
|
| [2] https://minireference.com/blog/python-for-stats/
|
| [3]
| https://docs.google.com/document/d/1fwep23-95U-w1QMPU31nOvUn...
|
| [4] https://confirmsubscription.com/h/t/A17516BF2FCB41B2
| lupire wrote:
| Blog post is 2017, but the book is 4th (and latest) edition 2007,
| year before first author Freedman died, 1st edition published
| 1978, which fits the cartoon illustrations.
|
| Table of contents and section 1:
|
| https://homepages.dcc.ufmg.br/~assuncao/EstatCC/Slides/Extra...
| rafeyahmad wrote:
| Excellent book. Read this on the side while taking AP Statistics
| in high school and it gave me the intuition that the class
| textbook didn't. Particularly love the emphasis on study design.
| aerhardt wrote:
| I've formally studied stats up to calculus-based probability and
| I'm now brushing up on math ahead of starting Georgia Tech's
| OMSCS. I feel more fluent than I've ever been but the following
| quoted passage from the book really hits home:
|
| "Why does the book include so many exercises that cannot be
| solved by plugging into a formula? The reason is that few real-
| life statistical problems can be solved that way. Blindly
| plugging into statistical formulas has caused a lot of confusion.
| So this book takes a different approach: thinking."
|
| This applies to both math and stats. I appreciate the value in
| grinding pure, fundamental technique but as I'm reviewing I'm
| missing more real-life applications. Theory feels like a plan
| until real-life throws you the first punch.
|
| I'll be buying this book, thanks for the recommendation!
| joshdavham wrote:
| > Much of the power of statistics is in common sense, amplified
| by appropriate mathematical tools, and refined through careful
| analysis.
|
| I hate to be contrarian, but even though I have a degree in
| statistics, I feel like much of statistics/probability actually
| violates common sense. In fact, it's probably the most
| unintuitive field that I'm familiar with.
|
| Many of the readers will probably be familiar with the Monty Hall
| problem or the Birthday problem, but imo, the entire field of
| statistics/probability is about equally unintuitive/violating of
| common sense.
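
For readers who have not met the two puzzles named above, here is a small sketch using only the Python standard library. It simulates the standard Monty Hall game (the host always opens a goat door that is not the player's pick) and computes the birthday probability exactly, assuming 365 equally likely birthdays. The function names and the trial count are illustrative assumptions.

```python
import random


def monty_hall_win(switch, rng):
    """Play one game; return True if the player ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def shared_birthday_prob(group_size, days=365):
    """Exact chance that at least two people in the group share a birthday."""
    prob_all_distinct = 1.0
    for i in range(group_size):
        prob_all_distinct *= (days - i) / days
    return 1 - prob_all_distinct


rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 100_000
stay_rate = sum(monty_hall_win(False, rng) for _ in range(trials)) / trials
switch_rate = sum(monty_hall_win(True, rng) for _ in range(trials)) / trials

print(f"win rate when staying:   {stay_rate:.3f}")
print(f"win rate when switching: {switch_rate:.3f}")
print(f"shared birthday among 23 people: {shared_birthday_prob(23):.3f}")
```

Staying wins about one time in three and switching about two in three, while a group of only 23 people already has a better than even chance of a shared birthday.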
| throwaway81523 wrote:
| Oh this is a really good book. I've had it in my want-to-read
| pile for years. I will read the review now.
| wellshapedwords wrote:
| Coincidentally, I just finished the final chapter of this book. I
| wanted to learn the fundamentals after taking an (execrable)
| Coursera/IBM course on Python and data science. This book was
| perfect.
|
| I like this style of introducing a technical topic to a broad
| audience. It builds incrementally and practically. The prose is
| clear enough for a layman to gain a conceptual appreciation of
| the methods even if they skip the exercises. And while the
| exercises weren't too demanding, there were many of them, always
| framed in real world context. For the portion of the audience who
| will study further, I like to think that the book's approach
| towards problem solving and challenging the intuition could be
| helpful throughout an entire career of statistical thinking.
___________________________________________________________________
(page generated 2024-12-01 23:00 UTC)