[HN Gopher] Introduction to Modern Statistics
___________________________________________________________________
Introduction to Modern Statistics
Author : noelwelsh
Score : 526 points
Date : 2023-10-12 08:45 UTC (12 hours ago)
(HTM) web link (openintro-ims2.netlify.app)
(TXT) w3m dump (openintro-ims2.netlify.app)
| noelwelsh wrote:
| Statistics education is undergoing a bit of a revolution, driven
| by the accessibility of computers. For example, hypothesis
| testing is introduced by randomization[1], using a randomized
| permutation test[2]. I find this really easy to understand,
| compared to how I learned statistics using a more traditional
| approach. The traditional approach taught me a cookbook of
| hypothesis tests to use: use the t-test in this situation, use
| the chi-squared in this situation, and so on. I never gained any
| understanding of why I should use these different tests, or where
| they came from, from the cookbook approach.
|
| For the same approach in a slightly different context see [3].
|
| [1]: https://openintro-ims2.netlify.app/11-foundations-
| randomizat...
|
| [2]: https://en.wikipedia.org/wiki/Permutation_test
|
| [3]:
| https://inferentialthinking.com/chapters/11/1/Assessing_a_Mo...
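|
| As a minimal sketch of the permutation test in [2] (R, with two
| made-up samples x and y):
|
|     set.seed(1)
|     x <- rnorm(30); y <- rnorm(30) + 0.5
|     obs <- mean(x) - mean(y)
|     pooled <- c(x, y)
|     perm <- replicate(10000, {
|       idx <- sample(length(pooled), length(x))
|       mean(pooled[idx]) - mean(pooled[-idx])
|     })
|     mean(abs(perm) >= abs(obs))   # two-sided permutation p-value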
| iTokio wrote:
| There is also Brilliant, which has a very polished interactive
| course:
|
| https://brilliant.org/courses/statistics/
| usgroup wrote:
| These things are great if they add value for you, but I would
| be very skeptical of any non-mathematical approach to
| statistics. I think statistics is only made clear by
| mathematics, much the same as Physics. And one cannot grasp
| statistics without being able to understand the maths.
|
| I think that still the best way to understand statistics is
| to start with the mathematical theory and to grind 1000+
| textbook problems.
| mkl wrote:
| > I think that still the best way to understand statistics
| is to start with the mathematical theory and to grind 1000+
| textbook problems.
|
| Are there any books you'd recommend for this approach?
| usgroup wrote:
| My grind was "Mathematical Statistics with Applications"
| by Wackerly et al. There are PDF versions if you Google
| for it. I can't say it was quick, easy or intuitive; but
| it works.
|
| I also liked "In All Likelihood" by Pawitan for a
| "likelihoodist" foundational approach.
| usgroup wrote:
| I've had similar thoughts, but I think it's more to do with what
| is in your head at the time you hear about it. I found
| permutation tests satisfying to learn about because they
| somehow helped consolidate what I knew from distribution
| theory. If I didn't know any distribution theory prior, I'm not
| sure they could have that effect.
|
| If you study mathematical statistics, it is not taught as a
| cookbook. At the elementary level you learn probability theory
| and distribution theory, all the different distributions,
| hypothesis tests, regression, ANOVA and so on proceed from
| there. Meanwhile, I think research scientists are often taught
| statistics as a set of recipes because it's usually a short
| course for a specific discipline, e.g. statistics for
| biologists.
| ImaCake wrote:
| I think those short courses would be more effective if they
| didn't bother with ANOVA and instead taught intro probability
| and distributions and then jumped straight to regression.
| ANOVA is just a really specific way of doing a regression.
|
| In R, and in python::statsmodels, you get the answer to
| (essentially) an ANOVA any time you run an LM or GLM; it's the
| F-statistic for your whole model.
|
| I know there is more nuance to this, but teaching students
| that they can use regression for most of the problems they
| would otherwise have used seemingly arcane tests for is going
| to be much more useful.
|
| Here is a lovely page demonstrating how to do this in R:
| https://lindeloev.github.io/tests-as-linear/
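|
| A sketch of that equivalence with simulated data (three made-up
| groups, one numeric outcome):
|
|     set.seed(1)
|     d <- data.frame(g = rep(c("a", "b", "c"), each = 20),
|                     y = rnorm(60) + rep(c(0, 0.3, 0.6), each = 20))
|     summary(aov(y ~ g, data = d))  # classical one-way ANOVA table
|     summary(lm(y ~ g, data = d))   # same F-statistic and p-value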
| usgroup wrote:
| I agree with the sentiment although I'm not sure there is
| the time for all of it. At least when I took them,
| probability theory and distribution theory were separate
| semester long courses, and the former was a prerequisite
| for the latter.
| gpderetta wrote:
| Statsmodels and that GitHub page are the only reason I
| have some understanding of statistical tests.
| bunderbunder wrote:
| _Principles of Statistics_ by M.G. Bulmer is a nice
| introduction to the mathematical side of things. It's part
| of Dover's classic textbook series, so it's inexpensive
| compared to newer textbooks, and also concise and well-
| written.
|
| It does assume you already have a solid understanding of
| calculus and combinatorics, though. Which I think is fair.
| Discrete statistics is arguably just applied combinatorics,
| and continuous statistics applied calculus, so if you have a
| strong foundation in those two subjects then you're already
| 90% of the way there. (And, if you don't, stop the cart and
| let the horse catch up.)
| dr_dshiv wrote:
| Do you know of any validation studies with Advanced Data
| Analysis (formerly code interpreter) in chatGPT? I think it can
| be excellent as a teaching tool.
| wespiser_2018 wrote:
| The difficulty of teaching statistics is that the maths you
| need to prove the methods are correct, and to gain an
| intuitive understanding of them, is far more advanced than
| what is presented in a basic stats course. Gosset came up with
| the
| t-test and proved to the world it made sense, yet we teach
| students to apply it in a black box way without a fundamental
| understanding of why it's right. That's not great pedagogy.
|
| IMO, this is where Bayesian Statistics is far superior. There's
| a Curry-Howard isomorphism to logic which runs extremely deep,
| and it's possible to introduce using conjugate distributions
| with nice closed form analytical solutions. Anything more
| complex, well, that's what computers are for, and there are
| great ways (Stan) to run complex distributions that are far
| more intricate than frequentist methods.
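|
| For example, the Beta-Binomial conjugate update fits in a few
| lines of R (a sketch with made-up numbers: a Beta(2, 2) prior
| and 7 successes in 10 trials):
|
|     a <- 2; b <- 2                          # prior Beta(a, b)
|     k <- 7; n <- 10                         # observed successes / trials
|     post_a <- a + k; post_b <- b + (n - k)  # posterior is Beta(9, 5)
|     post_a / (post_a + post_b)              # posterior mean, ~0.64
|     qbeta(c(0.025, 0.975), post_a, post_b)  # 95% credible interval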
| zozbot234 wrote:
| Maximum likelihood (which underpins many frequentist methods)
| basically amounts to Bayesian statistics with a uniform prior
| on your parameters. And the "shape" of your prior actually
| depends on the chosen parametrization, so in principle you
| can account for non-flat priors as well.
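|
| A tiny numerical check of that claim (a sketch: a binomial model
| with made-up data, 7 successes out of 10):
|
|     k <- 7; n <- 10
|     loglik  <- function(p) dbinom(k, n, p, log = TRUE)
|     logpost <- function(p) loglik(p) + dunif(p, 0, 1, log = TRUE)
|     optimize(loglik,  c(0, 1), maximum = TRUE)$maximum  # MLE, ~0.7
|     optimize(logpost, c(0, 1), maximum = TRUE)$maximum  # same mode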
| nextos wrote:
| IMHO, the discussion should not be so much whether to teach
| Bayesian or maximum likelihood. But instead, whether to
| teach generative models or to keep going with hypothesis
| tests, which are generally presented to students as a bag
| of tricks.
|
| Generative models (implemented in e.g. Stan, PyMC, Pyro,
| Turing, etc.) split the model from the inference. So one can
| switch from maximum likelihood to variational inference or
| MCMC quite easily.
|
| Generative models, beginning from regression, make a lot
| more sense to students and yield much more robust
| inference. Most people I know who publish research articles
| on a frequent basis do not know that p-values are not a measure
| of effect size. This demonstrates that current education has
| failed.
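|
| A quick illustration of that last point (a sketch with simulated
| data: a practically negligible effect that is still "highly
| significant" at large n):
|
|     set.seed(1)
|     n <- 1e5
|     x <- rnorm(n)
|     y <- 0.02 * x + rnorm(n)                  # tiny true effect
|     summary(lm(y ~ x))$coefficients["x", ]    # estimate ~0.02, p ~ 0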
| eutectic wrote:
| Maximum Likelihood corresponds to Bayesian statistics with
| MAP estimation, which is not the typical way to use the
| posterior.
| thefringthing wrote:
| > There's a Curry-Howard isomorphism [between] logic [and
| Bayesian statistical inference].
|
| This is an odd way of putting it. I think it's better to say
| that, given some mostly uncontroversial assumptions, if one
| is willing to assign real number degrees of belief to
| uncertain claims, then Bayesian statistical inference is the
| only way of reasoning about those claims that's compatible
| with classical propositional logic.
| jna_sh wrote:
| Very excited to see Mine Cetinkaya-Rundel is an author here! Many
| might be familiar with "R for Data Science"
| (https://r4ds.had.co.nz/), to which she is a contributor, but
| she's also published a lot of great papers around teaching data
| science.
| ayhanfuat wrote:
| She also has some online courses on Coursera
| (https://www.coursera.org/instructor/minecetinkayarundel).
| Hands down one of the best instructors I have seen.
| zvmaz wrote:
| What is a good book on statistics that one can use for self-
| learning?
| noelwelsh wrote:
| Depends where you are starting from and what you want to learn.
| The linked book is a first year introduction, and does a good
| job of that. If you want to go further there are many other
| options:
|
| * Statistical Inference by Casella and Berger. This book has a
| very good reputation for building statistics from first
| principles. I won't link to them, but you can find full PDF
| scans online with a simple search. Amazon reviews:
| https://www.amazon.com/Statistical-Inference-Roger-Berger/dp...
|
| * Statistics by Freedman, Pisani, and Purves has similarly very
| good reviews and can be easily found online. Amazon reviews:
| https://www.amazon.com/Statistics-Fourth-David-Freedman-eboo...
|
| * The majority of the Berkeley data science core curriculum
| books are online. This is not purely statistics but 1) is
| taught in a modern style that makes use of computation and
| randomization and 2) uses tools that may be useful to learn
| about.
|
| 1. https://inferentialthinking.com/chapters/intro.html (Data 8)
|
| 2. https://learningds.org/intro.html (Data 100)
|
| 3. http://prob140.org/textbook/content/README.html (Data 140)
|
| 4. https://data102.org/fa23/resources/#textbooks-from-
| previous-... (Data 102; this gets into machine learning and
| pure statistics)
|
| The Berkeley curriculum is not the only one; there are tens,
| possibly hundreds, of online courses. The Berkeley curriculum
| is just 1) quite extensive and 2) the one I happened to read
| the most about when I was recently researching how data science
| is currently taught.
| sudoankit wrote:
| I particularly like Statistical Inference by George Casella and
| Roger Lee Berger.
|
| You could also look at Introduction to Probability by Joseph K.
| Blitzstein and Jessica Hwang (available for free here:
| http://probabilitybook.net (redirects to drive)).
| laichzeit0 wrote:
| Should be noted that Casella's book is... well... really
| great if you found Spivak's calculus and Rudin's analysis
| to be fun books, especially the exercises.
|
| Casella's exercises are absolutely brutal.
| dan-robertson wrote:
| I like _Statistical Rethinking_. It's targeted at science PhD
| students so the focus is "how can you use statistics for
| testing your scientific hypotheses and trying to tease out
| causation". It doesn't go deep into the mathematics of things
| (though expects readers to be decently numerate and comfortable
| analysing data without statistics). It only really talks about
| Bayesian models and how to fit them by computer, so won't cover
| much of the frequentist side of things at all.
| verbify wrote:
| ISLR/ISLP is free, was used in my masters and is excellent (and
| has an accompanying video series)
|
| https://www.statlearning.com/
| dtjohnnyb wrote:
| A couple of more introductory books that come at it from the
| point of view of "someone who can code" are:
|
| - https://greenteapress.com/wp/think-stats-2e/ (and the
| similar Think Bayes if you enjoy this one)
|
| - https://nostarch.com/learnbayes
|
| Can second Statistical Rethinking though if you have the basics
| of stats and want to learn it again from a very different, more
| causal/bayesian point of view.
| begemotz wrote:
| What is your background and what field will you be applying
| your knowledge to?
|
| There can be a rather wide gap between a theoretical approach
| that you might encounter as taught by a statistician and an
| applied approach you might encounter in a business statistics
| or social science statistics course.
|
| Depending on your math background and the area of intended
| application, in my opinion, it would sway recommendations for a
| first 'book' on statistics for self-learning.
| photochemsyn wrote:
| Good video lecture series:
|
| https://www.thegreatcourses.com/courses/learning-statistics-...
|
| Might be available for free via your local library, too.
| ricksunny wrote:
| I'm looking for help with distilling 'truth' from folk belief
| systems by formalizng them under a Bayesian network framework, in
| case anyone is looking for a project through which to sharpen
| their statistical saw.
| d00mer wrote:
| They should remove "modern" from the title, because who the hell
| uses the "R programming language" these days anymore?
| Onawa wrote:
| Everyone in my branch of Toxicology? Tons of people in
| biological sciences. Just because you have a bias against the
| tool and don't run in the same circles doesn't mean that R
| isn't used and loved by a subset of devs.
| noelwelsh wrote:
| Statisticians do. The Berkeley curriculum, which I've linked to
| in another comment, uses Python.
| adr1an wrote:
| Everyone but you. Check any statistics journal. Only a few
| people developing methods switched to Python or Julia.
| i_love_limes wrote:
| A lot of people... in fact a huge portion of statisticians,
| epidemiologists, and econometricians use it as their primary
| language.
|
| I do genetic epidemiology (which is considerably more compute
| intensive than regular epidemiology), and R is still the most
| common language, with the most libraries and packages being
| used for it, compared to python for example.
|
| I think maybe you should consider being less forthcoming with
| your opinions on topics which you are not well informed on.
| wespiser_2018 wrote:
| I worked in data science for a few start ups, and even though
| I know Python (it's my LeetCode language of choice), R just
| dominates when it comes to accessing academic methods and
| computational analysis. If you are going to push the
| boundaries of what you can and can't analyse for statistical
| effects and leverage academic learnings, it's R.
| dereify wrote:
| fyi many state-of-the-art statistical libraries exist (or are
| properly maintained) in R only
| ImaCake wrote:
| I find it depends on what you want. There is no canonical GAM
| (generalised additive model) library in Python, but there are a
| few options, which are not easy to use. The statsmodels GAM
| implementation appears to be broken. R, of course, has a
| stupidly easy-to-use GAM library that is pretty fast.
|
| On the other hand, R has _too many_ obscure options compared to
| what I can find in scipy or sklearn. So I find it easier to just
| jump into sklearn and use the very nice unified interface
| "pipelines" to churn through a whole bunch of different
| estimators without having to do any munging on my data.
|
| So I think it just depends on your field. But R seems to
| stick more with academia.
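|
| For what it's worth, the easy R option is presumably mgcv (it
| ships with R); a sketch with simulated data:
|
|     library(mgcv)
|     d   <- data.frame(x = runif(200))
|     d$y <- sin(2 * pi * d$x) + rnorm(200, sd = 0.3)
|     fit <- gam(y ~ s(x), data = d)   # smoothness chosen automatically
|     summary(fit)
|     plot(fit)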
| nomilk wrote:
| Before I knew the command line, I tried to install Python and
| spent the next 3 days resolving an installation issue with
| 'wheel'.
|
| By contrast, from first downloading R to running my first R
| script took about 1 hour (the most difficult part was opening
| the 'script' pane in RStudio IDE, which doesn't open by default
| on new installations, for some reason).
|
| There's huge demand out there for statistical software that's
| accessible to people whose primary pursuit is not
| programming/cs, but genetics, bioinformatics, economics,
| ecology and other disciplines that necessitate tooling much
| more powerful than excel, but with barriers to entry not much
| greater than excel. R is a fairly amazing fit for those folks.
| perrygeo wrote:
| R and CRAN really get package management right. Even as a
| very infrequent R user, there are no surprises, it "just
| works". Compare that to my daily Python usage where I am
| continually flummoxed by dependency issues.
| _Wintermute wrote:
| Strong disagree, there's a reason RStudio/Posit are
| spending so much time trying to develop 3rd party
| alternatives to install.packages() and CRAN.
|
| Try installing an older version of a package without it
| pulling in the most recent incompatible dependencies, it's
| a whole adventure.
| MilStdJunkie wrote:
| Respectfully, I'm going to ask, "what what?". I can't swing a
| cat without hitting dplyr. It's probably industry dependent
| though - I could see a dataset that's 99% text having
| absolutely no reason to even look at R at all.
| f6v wrote:
| Most people in bioinformatics.
| epgui wrote:
| Probably most people who do statistics.
|
| R sucks as a language but it excels at that specific
| application, just because of its tremendous ecosystem (putting
| even python to shame in some niche areas).
| wespiser_2018 wrote:
| R is fine, it's no more absurd than other non-typed languages
| like JavaScript. Most languages are very good at one or two
| things, then not so good or appropriate for other tasks. For
| R, that's statistics, modeling, and exploratory analysis,
| which it absolutely crushes at due to ecosystem effects.
| dleeftink wrote:
| Anyone looking to apply and compare frequentist and bayesian
| methods within a unified GUI (which is essentially an elegant
| wrapper to R and selected/custom statistical packages), should
| check out _JASP_, developed by the University of Amsterdam [0].
| It's free to use, and the graphs + captions generated during
| each step are publication quality right out of the box.
|
| Using it truly feels like a 'fresh way' to do statistics. Its
| main website provides ample use cases, guides and tutorials, and
| I often return to the blog for the well-documented deep dives
| into how traditional frequentist methods and their Bayesian
| counterparts compare (the animated explainers are especially
| helpful, and I appreciate the devs reflecting on each release and
| future directions).
|
| [0]: https://jasp-stats.org/
| NeutralForest wrote:
| There was an interview with one of the JASP people (the
| creator or a maintainer, can't remember) on the "Learn
| Bayesian Stats" podcast; it was very interesting.
| rdhyee wrote:
| I think the referenced episode is
| https://learnbayesstats.com/episode/61-why-we-still-use-
| non-... Thanks for pointing it out!
| dleeftink wrote:
| To me, it's academic software _done right_, both in terms of
| accessibility and maintenance. I'd love to hear more about
| their governance and funding structure and how this might be
| applied elsewhere, and learn about academic software of
| similar grade and utility.
| mindcrime wrote:
| Even better than just being "free to use" it's F/OSS (under the
| AGPL):
|
| https://github.com/jasp-stats/jasp-desktop
| 3abiton wrote:
| How does this compare to other stat libraries?
| begemotz wrote:
| I like the inclusion of randomization and bootstrapping. It's
| unfortunate that the hypothesis framework is still NHST -- I
| wouldn't consider that 'modern' by any means.
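|
| For anyone unfamiliar, a percentile bootstrap confidence
| interval for a mean is only a few lines of R (a sketch with
| simulated data):
|
|     set.seed(1)
|     x <- rexp(50)                            # any numeric sample
|     boots <- replicate(10000, mean(sample(x, replace = TRUE)))
|     quantile(boots, c(0.025, 0.975))         # 95% percentile interval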
| noelwelsh wrote:
| I don't see widespread agreement in the statistics community as
| to what should replace NHST. If you go Bayesian you need to
| completely rewrite the course. I've seen confidence intervals
| suggested as an alternative, but there are arguments against.
| I've also seen arguments that hypothesis tests shouldn't be
| used at all. Given that NHST is still widely used and there
| isn't a clear alternative I think it's a disservice to students
| to not introduce them.
| begemotz wrote:
| I probably should have been more clear. I didn't say
| hypothesis testing, I said NHST (the binary null/alt
| hypothesis approach) - which is an approach to hypothesis
| testing particularly prevalent in certain disciplines such as
| Psychology.
|
| And in that context, there is a lot of agreement that this
| approach is fundamentally flawed and outdated. If you are
| interested, I can provide references when I get to the
| office. But off the top of my head consider Gigerenzer and
| Cummings.
| noelwelsh wrote:
| For those following along at home Gigerenzer is, I think,
| "Mindless Statistics"[1] and Cummings is "The New
| Statistics"[2].
|
| [1]: https://pure.mpg.de/rest/items/item_2101336/component/
| file_2... [2]: Sample at
| https://tandfbis.s3.amazonaws.com/rt-
| media/pp/common/sample-...
| begemotz wrote:
| Yes, those are appropriate (although Gigerenzer and
| Cummings both have other relevant publications on the
| topic).
|
| As for an undergraduate text that 'teaches the
| difference', you can look at 'An Introduction to
| Statistics' by Carlson & Winquist.
| RedShift1 wrote:
| Can I download this as a PDF? I'd like to read it offline.
| noelwelsh wrote:
| Here: https://www.openintro.org/book/ims/
| RedShift1 wrote:
| This is the first edition, not the 2nd?
| noelwelsh wrote:
| Hmmm ... must be because the 2nd edition is still in
| progress. Best option might be to follow the immortal words
| of Obi-Wan Kenobi and "use the source":
| https://github.com/OpenIntroStat/ims
|
| Otherwise you can try building a PDF from the very similar
| Data 8 book[1] using [2]
|
| [1]: https://github.com/data-8/textbook
|
| [2]: https://jupyterbook.org/en/stable/advanced/pdf.html
| usgroup wrote:
| I think Ronald Fisher may not have used the bootstrap to
| calculate confidence intervals, but it looks to me like he
| invented most of the rest of the syllabus ... in the early
| 1900s :-)
| mjburgess wrote:
| What's often missing from these introductions is when statistics
| will not work; and what it even means when it "works". The amount
| of data needed to tell apart two normal distributions is about
| 30 data points -- between two power-law distributions, more than
| a trillion. (And this
| basically scuppers the central limit theorem, on which a lot of
| cargo-cult stats is justified).
|
| Stats, imv, should be taught simulation-first: code up your
| hypotheses and see if they're even testable. Many many projects
| would immediately fail at the research stage.
|
| Next, know that predictions are almost never a good goal. Almost
| everything is practically unpredictable -- with a near-infinite
| number of relevant causes, uncontrollable.
|
| At best, in ideal cases, you can use stats to model a
| distribution of predictions _and then_ determine a risk/value
| across that range. I.e., the goal isn't to predict anything but to
| prescribe some action (or inference) according to a risk
| tolerance (risk of error, or financial risk, etc.).
|
| It seems a generation of people have half-learned bits of stats,
| glued them together, and created widespread 'statistical cargo-
| cultism'.
|
| The lesson of stats isn't hypothesis testing, but how almost no
| hypotheses are testable -- _and then_ what do you do?
| Ensorceled wrote:
| It's ironic that this ... rant? ... is basically unreadable
| without knowledge of basic statistical methods.
|
| How do you teach any of this to someone who hasn't already
| taken introductory statistics? How do you learn anything if you
| first have to learn the myriad ways something you don't even
| have a basic working knowledge of can fail before you learn it?
| mjburgess wrote:
| The comment is addressed to the informed reader who is the
| only one with a hope of being persuaded on this point.
|
| To teach this, from scratch, I think is fairly easy -- but
| there's few with any incentive to do it. Many in academia
| wouldn't know how, and if they did, would discover that much
| of their research can be shown _a priori_ to not be
| worthwhile (rather than after a decade of 'debate').
|
| All you really need is to start with establishing an
| intuitive understanding of randomness, how apparently highly
| patterned it is, and so on. Then ask: how easy is it to
| reproduce an observed pattern with (simulated) randomness?
|
| That question alone, properly supported via basic programming
| simulations, will take you extremely far. Indeed, the answer
| to it is often obvious -- a trivial program.
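|
| For instance (a sketch; the threshold is arbitrary): how often
| does pure noise produce an apparently strong correlation in a
| small sample?
|
|     set.seed(1)
|     mean(replicate(10000, abs(cor(rnorm(10), rnorm(10))) > 0.5))
|     # ~0.14 -- "patterns" of that strength appear in noise all the time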
|
| That few ever write such programs shows how the whole edifice
| of stats education is geared towards confirmation bias.
|
| Before computers, stats was either an extremely mathematical
| discipline seeking (empirically useless) formulas for toy
| models, or one using heuristic empirical formulas that rarely
| applied.
|
| Computers basically obviate all of that. Stats is mostly
| about counting things and making comparisons -- perfect tasks
| for machines. With only a few high-school mathematical
| formulas, most people could derive most useful statistical
| techniques as simple computer programs.
| noelwelsh wrote:
| The modern approach, of which this textbook is an example,
| does start with simulation. In fact there is very little
| classical statistics (distributions, analytic tests) in the
| book. The Berkeley Data 8 book, which I link to in another
| comment, takes the same approach. I imagine there is still
| too much classical material for your tastes, but there is
| definitely change happening.
| 2devnull wrote:
| " that much of their research can be shown a priori to not
| be worthwhile"
|
| Bingo. Cargo cult stats all the way down. It's not just
| personal interest, it's the entire field, it's their
| colleagues, mentors, and students. Good luck getting
| somebody to see the light when not just their own income
| depends on not seeing it, their whole world depends on the
| "stat recipes" handed down from granny.
| brutusborn wrote:
| I think the egotistical aspect is the most powerful: many
| researchers have built an identity based on the fact that
| they "know" something, so to propose better alternatives
| to their pet theories is tantamount to proposing their
| life is a lie. To change their mind they need to admit
| they didn't "know".
|
| The better the alternatives, the more fierce the passion
| with which they will be rejected by the mainstream.
| 2devnull wrote:
| I now think it's best explained by simple economics.
| Academia and academics are the product of economic forces
| by and large. It's not quirky personalities or uniquely
| talented minds that make up academia today. It's droves
| of conscientious (in the Big Five sense) conformists, with
| either high iq or mere socio-economic privilege, who have
| been trained by our society to feel that financial
| security means college, and even more financial security
| means even more college. Credentials are like alpha .05,
| they solve a scale problem in a way that alters the
| quality/quantity ratio. If you want more
| researchers/research/science output, credentials and
| alpha .05 cargo cult stats are your levers to get more
| quantity at lower quality.
| Retric wrote:
| It seems like a reasonable critique. The suggestion is to
| include such ideas while people are taking introductory
| statistics, which isn't inappropriate. I wouldn't suggest
| forcing students to code up their own simulations from
| scratch, but creating a framework where students can plug in
| various formulas for each population, attach a statistical
| test, and then run various simulations could do quite a bit.
| However, what kinds of formulas students are told to plug in
| is important.
|
| If every formula produces bell curves then that's a failure to
| educate people. 50d6 vs 50d6 + 1 is easy enough that you can
| include 1d2 * 50 + 50d6 for a two-peaked distribution, but also
| significantly different distributions which then fail various
| tests, etc.
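|
| A sketch of what that plumbing could look like (dice rolls and a
| test plugged together; all numbers made up):
|
|     roll <- function(k) sum(sample(1:6, k, replace = TRUE))  # roll k d6
|     a <- replicate(1000, roll(50))                           # 50d6
|     b <- replicate(1000, roll(50)) + 1                       # 50d6 + 1
|     m <- sample(1:2, 1000, replace = TRUE) * 50 +
|          replicate(1000, roll(50))                           # 1d2*50 + 50d6
|     t.test(a, b)   # a +1 shift is hard to detect at this sample size
|     hist(m)        # clearly two-peaked, not a bell curve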
|
| I've seen people correctly remember the formula for
| statistical tests from memory and then wildly misapply them.
| That seems like focusing on the wrong things in an age when
| such information is at everyone's fingertips, but
| understanding of what that information means isn't.
| taeric wrote:
| Model building, at large, is the thing I regret being bad at.
| Model your problem and then throw inputs at it and see what you
| can see.
|
| Sucks, as we seem to have taught everyone that statistical
| models are somehow unique models that can only be made to get a
| prediction. To the point that we seem to have hard delineations
| between "predictive" models and other "models.".
|
| I suspect there are some decent ontologies there. But, at
| large, I regret that so many won't try to build a model.
| srean wrote:
| I work in applied ML and stats. Whenever a client gets pushy
| about getting a prediction and would not care about quantifying
| the uncertainty around it, I take it as a signal to disengage
| and look for better pastures. It is really not worth the time,
| more so if you value integrity.
|
| Competent stakeholders and decision makers use the uncertainty
| around predictions, the chances of an outcome that is different
| from the point-predicted outcome, to come to a decision and the
| plan includes what the course of action should be should the
| outcome differ from the prediction.
| 0xDEAFBEAD wrote:
| >The amount of data needed to tell between two normal is about
| 30 data points
|
| What are you trying to say here? If there are two normal
| distributions, both with variance one, one having mean 0 and
| the other having mean 100, and I get a single sample from one
| of the distributions, I can guess which distribution it came
| from with very high confidence. Where did the number 30 come
| from?
| sndean wrote:
| > Where did the number 30 come from?
|
| Yeah, I've also heard 30 for normal distributions over and
| over in ~7 stats courses that I've taken.
|
| This SE stats answer sounds reasonable enough:
| https://stats.stackexchange.com/a/2542
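|
| The rule of thumb in that answer is easy to check by simulation
| (a sketch using a skewed population):
|
|     set.seed(1)
|     means <- replicate(10000, mean(rexp(30)))  # means of samples of size 30
|     hist(means)                                # already looks roughly normal
|     qqnorm(means); qqline(means)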
| juunpp wrote:
| I am a noob and I've always got stuck on comparing two
| independent means. Assumption: normality. Yeah, data is never
| normal in my bakery.
| haberman wrote:
| This really resonates with me. I've attempted self-study about
| statistics many times, each time wanting to understand the
| fundamental assumptions that underlie popular statistical
| methods. When I read the result of a poll or a scientific
| study, how rigorous are the claimed results, and what false
| assumptions could undermine them?
|
| I want to build intuitions for how these statistical methods
| even work, at a high level, before getting drowned in math
| about all the details. And like you say, I want to understand
| the boundaries: "when statistics will not work; and what it
| even means when it 'works'".
|
| I imagine that different methodologies exist on a spectrum,
| where some give more reliable results, and others are more
| likely to be noise. I want to understand how to roughly tell
| the good from the bad, and how to spot common problems.
| wespiser_2018 wrote:
| "Simulation first" is how I did things when I worked in data
| science and bioinformatics. Define the simulation that
| represents "random", then see how far off the actual data is
| using either information theory or just a visual examination of
| the data and summary statistic checks. That's a fast and easy
| way to gut check any observation to see if there is an
| underlying effect, which you can then "prove" using a more
| sophisticated analysis.
|
| Raw hypothesis testing is just too easy to juke by overwhelming
| it with trials. Lots of research papers have "statistically
| significant" results, but give no mention of how many
| experiments it took to get them, or any indication of negative
| results. Given enough effort, there will always be an analysis
| where you incorrectly reject the null hypothesis.
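|
| That failure mode is itself easy to simulate (a sketch: 100
| "experiments" run on pure noise):
|
|     set.seed(1)
|     p <- replicate(100, t.test(rnorm(30), rnorm(30))$p.value)
|     sum(p < 0.05)   # typically around 5 "significant" results from nothing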
| RSMDZ wrote:
| >> between two power-law distributions, >trillion
|
| Do you have anywhere I can read more about this? I would have
| assumed that a trillion data points would be sufficient to
| compare any two real-world distributions
| bigbillheck wrote:
| > The amount of data needed to tell between ... two power-law
| distributions, >trillion.
|
| I don't agree with this as a statement of fact (except in the
| obvious case of two power-law distributions with extremely
| close parameters). Supposing it was true, that would mean that
| you would almost never have to actually worry about the
| parameter, because unless your dataset is that large one power
| law is about as good as any other for describing your data.
| elashri wrote:
| Thanks to the author for the book and making it open access. I
| always admire these efforts.
| growingkittens wrote:
| Is there a "pre-statistics" book that teaches the thinking skills
| and concepts needed to understand statistics?
| ndr wrote:
| This book seems to start where you need it to start.
|
| You don't need much beyond basic calculus. Most suffer from
| some mental block installed at a young age, akin to those who
| say "I'm bad at math" because their teacher sucked. Dive
| in and you won't regret it.
| obscurette wrote:
| I have been a math teacher and although I can't guarantee
| that I didn't suck, I can say that most kids don't develop
| this attitude because of teachers, but because of their
| parents. "My mum says that she sucked at math/music/whatever
| as well, so do I!" is far too common. As a teacher I just
| didn't have resources to influence this attitude either.
| ndr wrote:
| Yes, parents can be horrible too. Unfortunately it's
| somehow socially acceptable and even worthy of pride in
| some circles, to be "bad at math". It seems very rare for
| someone to openly say "I'm bad at [my native language]" or
| "writing".
|
| I feel stats has a somewhat similar effect even among
| those with math education. Several friends who have a
| degree in math recoil at the first mention of stats
| concepts.
| obscurette wrote:
| > It seems very rare for someone to openly say "I'm bad
| at [my native language]" or "writing".
|
| It is actually even fashionable in non-English countries.
| Declaring "I'm bad at [my native language], I only use
| English anyway" makes you a better person somehow. And
| it's not rare in other areas either - in a post-truth world
| it's trendy not to know things.
| Novosell wrote:
| In non-English countries? All of them? Source? I, as a
| person from one of said non-English countries, disagree.
| growingkittens wrote:
| My mental block is a brain injury that went undiagnosed until
| I was 30. I can't really hold more than two numbers in my
| head at a time. I struggled through math in school because it
| was lecture based, and the books were written to accompany a
| lecture.
|
| I can learn math fairly well if I have the right written
| material and the right direction. However, I do not retain
| math skills: without active practice, I revert back to "how
| do fractions work?"
|
| For example, I did extremely well in a college algebra course
| that was partially online (combined with Khan Academy to
| catch up). I could do my tests perfectly in pen, much to the
| amusement of the assistants. I could make connections and see
| the implications and applications of the math. Roughly three
| to six months later, I was back to forgetting fractions.
|
| I can't learn these things over time, but I can learn them
| all at once. I'm collecting resources for my next math
| adventure.
| armcat wrote:
| One of my favourite books on statistics and probability is
| "Regression and Other Stories", by Andrew Gelman, Jennifer Hill
| and Aki Vehtari. You can access the book for free here:
| https://users.aalto.fi/~ave/ROS.pdf
| epgui wrote:
| +1, this is a great textbook, and not just for social sciences
| as the second header would suggest.
| epgui wrote:
| As much as I appreciate and love all pedagogical endeavours in
| the field, especially in the form of open texts, I really,
| really, really dislike this overall approach to teaching
| introductory statistics.
|
| I'm hoping to see, over time, a shift away from ad-hoc null
| hypothesis testing in favour of linear models (yes, in
| introductory courses, from the start-- see link below) and
| Bayesian-by-default approaches.
|
| https://lindeloev.github.io/tests-as-linear/#:~:text=Most%20....
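|
| The basic idea from that link, as a sketch with simulated data:
| the classical two-sample t-test is just a linear model with a
| two-level predictor.
|
|     set.seed(1)
|     d <- data.frame(g = rep(c("a", "b"), each = 30),
|                     y = rnorm(60) + rep(c(0, 0.5), each = 30))
|     t.test(y ~ g, data = d, var.equal = TRUE)  # classical t-test
|     summary(lm(y ~ g, data = d))               # same t and p on the group term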
| bschne wrote:
| I am partway through McElreath's "Statistical Rethinking" and I
| fully agree with this.
| epgui wrote:
| That's a great textbook!
| TheAlchemist wrote:
| It's been recommended on this topic several times, so I'm
| looking at it. Quite expensive! I see there is a series of
| lectures, which seems identical to the book. Is it the same?
| Or still worth buying the book?
| noelwelsh wrote:
| The lectures are good, and I've been told the book can be
| found online by the intrepid. I guess that Anna's Archive
| or Library Genesis has it.
| TheAlchemist wrote:
| I've found the book indeed - although it seems to be the
| first edition.
|
| It's here:
| https://civil.colorado.edu/~balajir/CVEN6833/bayes-
| resources...
| begemotz wrote:
| I agree about teaching from a unified GLM basis. The 'bayesian-
| by-default' approach seems to be going out on a more tenuous limb,
| imo.
| JHonaker wrote:
| It only appears tenuous because the subjective choices you
| have to make when using frequentist methods are made for you
| by the developer of the method.
|
| It's less comfortable to use Bayesian methods because you
| have to be explicit about your assumptions _as the user_,
| which opens your assumptions up for easier inspection.
| There's also way less specific information implied by priors
| than most people think. Informative priors should try to make
| distinctions between something that's reasonable-ish and
| something that's essentially infinity (take pharmacokinetics
| for example, the diffusion velocity of a molecule in your
| blood stream shouldn't have a velocity near the speed of
| light in a vacuum should it?). They should not be forcing
| your model to achieve a particular result. Luckily, because
| of the need to explicitly state them in a Bayesian analysis,
| it's much easier to determine if they were properly set.
|
| Prior specification is essentially problem domain-informed
| regularization where you can actually hope to understand if
| the hyperparameter is going to work or not.
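|
| A sketch of that last point for a single regression slope with a
| normal(0, tau) prior and unit noise variance -- the MAP estimate
| is just a penalised (ridge-style) fit:
|
|     set.seed(1)
|     x <- rnorm(20); y <- 0.3 * x + rnorm(20)
|     negpost <- function(b, tau) sum((y - b * x)^2) / 2 + b^2 / (2 * tau^2)
|     optimize(negpost, c(-5, 5), tau = 0.1)$minimum  # tight prior: shrunk to ~0
|     optimize(negpost, c(-5, 5), tau = 100)$minimum  # weak prior: ~least squares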
| fallat wrote:
| > I'm hoping to see, over time, a shift away from ad-hoc null
| hypothesis testing in favour of linear models (yes, in
| introductory courses, from the start-- see link below) and
| Bayesian-by-default approaches.
|
| Is there anything where I can start today, as a guinea pig? My
| statistics education is basically zero.
| bschne wrote:
| See my sibling comment, can recommend this:
| https://xcelab.net/rm/statistical-rethinking/
| noelwelsh wrote:
| There are other comments here that suggest a number of books
| at varying levels. "Introduction to Modern Statistics" is
| very approachable in its presentation.
| willsmith72 wrote:
| The EPUB is apparently too big to send to a Kindle, but I can't
| see the option to download it, only the PDF. Any ideas?
| tea-coffee wrote:
| This looks to be the 2nd edition. Can anyone comment on how the
| 1st edition was?
| mavam wrote:
| For studying statistics, I put together a comprehensive cheat
| sheet: https://github.com/mavam/stat-cookbook
___________________________________________________________________
(page generated 2023-10-12 21:00 UTC)