[HN Gopher] Relating t-statistics and the relative width of conf...
___________________________________________________________________
Relating t-statistics and the relative width of confidence
intervals
Author : luu
Score : 46 points
Date : 2024-03-09 02:32 UTC (2 days ago)
(HTM) web link (statmodeling.stat.columbia.edu)
(TXT) w3m dump (statmodeling.stat.columbia.edu)
| nerdponx wrote:
| Great little demo.
|
| > It is only when the statistical evidence against the null is
| overwhelming -- "six sigma" overwhelming or more -- that you're
| also getting tight confidence intervals in relative terms. Among
| other things, this highlights that if you need to use your
| estimates quantitatively, rather than just to reject the null,
| default power analysis is going to be overoptimistic.
|
| This, I think, will be a real head-scratcher for a lot of
| students, who are often taught to construct confidence intervals
| by no method other than "inverting" a hypothesis test. It
| illustrates one of the many challenges (and dangers!) of teaching
| statistics.
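|
| A rough sketch of the quoted point (illustrative only, scipy
| assumed): the half-width of a 95% CI relative to the estimate is
| just the critical value divided by the t-statistic, so it only
| becomes small once the t-statistic is several times the critical
| value.
|
|     from scipy import stats
|
|     # Relative half-width of a 95% CI = critical value / t-stat.
|     # It only drops to about a third once the t-statistic is ~6.
|     df = 30                        # illustrative degrees of freedom
|     crit = stats.t.ppf(0.975, df)  # ~2.04
|     for t_stat in (2, 3, 4, 6):
|         print(t_stat, round(crit / t_stat, 2))
|     # 2 -> ~1.02, 3 -> ~0.68, 4 -> ~0.51, 6 -> ~0.34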
| FabHK wrote:
| I was a bit confused by the article initially:
|
| > Perhaps most simply, with a t-statistic of 2, your 95%
| confidence intervals will nearly touch 0.
|
| Your 95% CI _will_ include 0, unless you have more than 50 or so
| data points, in which case there's no point in using Student's
| t-distribution; you might as well use the Gaussian, which the
| author seems to assume, and which I thought gave rise to the
| z-score (in my mind, t-statistic = t-distribution, z-score =
| normal distribution).
|
| But then looking things up, it turns out that the difference is
| that the z-score is computed with population mean and sd, while
| t-statistic is computed with sample mean and sd. So, yeah,
| practically you'll use the t-statistic (and it will be
| t-distributed if the population is normally distributed), unless
| you already know population mean and sd, in which case you can
| compute the z-score (which will approach the normal distribution
| by CLT under certain conditions with large enough samples, but is
| otherwise not predicated on normality in any way).
|
| Then all the author was pointing out is that if we take a +/- 2
| standard error CI: if your t-statistic is 2, the CI (measured in
| standard errors) goes from 0 to 4, i.e. a relative "half-width"
| of 100%, while if your statistic is 4, say, the CI goes from 2
| to 6, i.e. a relative half-width of just 50%.
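|
| The worked example above, as a tiny illustrative script:
|
|     # Estimate = t standard errors, CI = estimate +/- 2 SE.
|     for t in (2, 4):
|         lo, hi = t - 2, t + 2
|         print(f"t={t}: CI=({lo}, {hi}), "
|               f"relative half-width={2 / t:.0%}")
|     # t=2: CI=(0, 4), relative half-width=100%
|     # t=4: CI=(2, 6), relative half-width=50%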
| nerdponx wrote:
| The T distribution arises as the ratio between a standard
| Gaussian r.v. and the square root of an independent Chi-square
| r.v. divided by its degrees of freedom:
|
|     Z ~ Gaussian(0, 1)
|     S ~ ChiSquare(n)
|     T = Z / sqrt(S / n)
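|
| A quick simulation check of that construction (numpy/scipy,
| illustrative only):
|
|     import numpy as np
|     from scipy import stats
|
|     rng = np.random.default_rng(0)
|     n = 5                             # degrees of freedom
|     z = rng.standard_normal(100_000)  # standard Gaussian
|     s = rng.chisquare(n, 100_000)     # independent Chi-square
|     t_draws = z / np.sqrt(s / n)
|
|     # Simulated quantiles line up with scipy's t distribution.
|     print(np.quantile(t_draws, [0.025, 0.975]))
|     print(stats.t.ppf([0.025, 0.975], df=n))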
|
| The sampling distribution of the sample mean is Gaussian
| whenever the data is Gaussian, or whenever the CLT applies.
|
| The sampling distribution of the sample variance is Chi-square
| whenever the data is Gaussian. But the CLT does _not_ have any
| effect here. In general there isn't much else that we can say
| about the sampling distribution of the sample variance.
|
| Thus if we want to compute a "sample Z statistic" using those
| estimated quantities, _and_ we know the data is Gaussian, then
| we know the sampling distribution of that "sample Z
| statistic". It's the T distribution.
|
| But the assumption of underlying Gaussian data is important
| here. The CLT doesn't help us derive a T distribution. But it
| _is_ true that our "sample Z statistic" is asymptotically
| Gaussian, in which case the T distribution itself is
| approximately Gaussian. [0]
|
| So a "T test" (meaning "test of a difference in means"), using
| the actual T distribution as the null distribution of the test
| statistic, is basically never valid on non-Gaussian data. But
| it's valid (asymptotically) using the Gaussian distribution as
| the null distribution of the test statistic.
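|
| For instance, a small simulation sketch (one-sample test on
| skewed exponential data; numpy/scipy assumed):
|
|     import numpy as np
|     from scipy import stats
|
|     rng = np.random.default_rng(1)
|
|     def rejection_rate(n, reps=20_000):
|         # H0: mean = 1, tested on Exponential(1) data whose true
|         # mean is exactly 1, so the nominal rate is 5%.
|         x = rng.exponential(1.0, size=(reps, n))
|         se = x.std(axis=1, ddof=1) / np.sqrt(n)
|         t = (x.mean(axis=1) - 1.0) / se
|         crit = stats.t.ppf(0.975, df=n - 1)
|         return np.mean(np.abs(t) > crit)
|
|     print(rejection_rate(10))    # noticeably off 5% at small n
|     print(rejection_rate(1000))  # close to 5% as the CLT kicks in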
|
| That's a lot of reasoning to say: `avg(y) / sqrt(var(y) / n)`
| could be either T or Gaussian, depending on the assumptions and
| context. But I would push back on conflating that with `avg(y)
| / sqrt(s^2 / n)`. Even if they have the same distribution, they
| are not the same thing.
|
| [0]: If you want a good writeup, see
| https://stats.stackexchange.com/a/253318/36229. Or look into a
| good statistics textbook.
| mjburgess wrote:
| One important caveat to all these methods is that the central
| limit theorem must hold for the sample means, and this is an
| _empirical_ condition, not something you can know statistically.
|
| Another important caveat: many things we want to measure are not
| distributed in a way that allows the CLT to hold. If it doesn't,
| the bulk of statistical methods don't work and the results are
| bunk.
|
| Many quantities follow power-law distributions, which would
| require trillions of data points (or more) for the CLT to do its
| magic; i.e., showing that the sample mean of set A is
| statistically significantly different from that of set B could
| require 10^BIG data points if the property measured in A/B is
| power-law distributed.
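|
| A rough simulation sketch of that (Pareto with tail index 1.5,
| i.e. finite mean but infinite variance; numpy assumed):
|
|     import numpy as np
|
|     rng = np.random.default_rng(2)
|
|     # Classical Pareto with x_m = 1 and alpha = 1.5: mean = 3,
|     # variance infinite, so sqrt(n) CLT scaling does not apply.
|     alpha = 1.5
|     for n in (10**3, 10**5, 10**7):
|         means = [(1 + rng.pareto(alpha, n)).mean()
|                  for _ in range(20)]
|         print(n, round(min(means), 2), round(max(means), 2))
|     # The spread of the 20 sample means shrinks far more slowly
|     # than the 1/sqrt(n) rate that standard errors assume.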
|
| Now, even worse: many areas of "science" study phenomena that are
| almost certainly power-law distributed, and use these methods to
| do so.
| ASpring wrote:
| I'm not sure I'm fully understanding your point. Is it that
| constructing confidence intervals using t-statistics is
| inappropriate for a lot of real data that isn't distributed
| somewhat normally?
| nerdponx wrote:
| It's their point, and it's a good one, but I think they're
| somewhat overstating how common power-law data is; it
| probably varies a lot by field of study. And at least the
| logarithm of a power-law variable can help bring it back
| closer to the world of sanity. Plus, there are plenty of
| fields where nonparametric tests of medians are accepted
| standard practice.
| mjburgess wrote:
| You can turn most issues into power laws by recursing a
| reasonable risk distribution over them.
|
| So suppose we ask: what is our confidence in X? (rather than X
| itself); and then, what is our confidence in the model by which
| we give confidences in X (i.e., the model risk); and so on...
|
| In practice, what we want to model is the appropriate
| confidence, not an actual prediction (bunk). So we are very
| often screwed.
|
| Statistics is an illusion.
| kgwgk wrote:
| > many things we want to measure are not well-distributed to
| allow the CLT to hold
|
| I guess that may be true for some values of "many" and "we" but
| most things we want to measure have finite variance.
| baq wrote:
| for some examples of "not-most" things -
| https://en.wikipedia.org/wiki/Cauchy_distribution#Occurrence...
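|
| A tiny sketch of the Cauchy case (numpy assumed): the sample
| mean never settles down, because the average of n iid Cauchy
| draws is itself standard Cauchy for every n.
|
|     import numpy as np
|
|     rng = np.random.default_rng(3)
|
|     # More data does not concentrate the sample mean at all.
|     for n in (10, 10_000, 10_000_000):
|         print(n, rng.standard_cauchy(n).mean())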
| pocketsand wrote:
| In fact, the CLT is remarkably robust to distributional
| assumptions. Examples where it breaks down (e.g., non-finite
| variance) are comparatively rare, even if there are "many" of
| them.
|
| As with all things statistical, judgment is required.
| 317070 wrote:
| > Examples where it breaks down (e.g., non-finite variance)
| are comparatively rare, even if there are "many" of them.
|
| I would beg to differ. They are absolutely not rare. [0] One
| of the most famous people in probability theory even claimed
| that it's the ones with finite moments that are rare in
| practice. (But I couldn't find the quote again; I thought it
| was Poisson or Laplace.)
|
| It's even worse: many distributions where the CLT does apply
| require so many samples for it to actually kick in that it does
| not really help in practice. Any skew in your data blows up the
| number of samples you need to pin down things like the empirical
| mean.
|
| [0] Chapter 3.4
| https://arxiv.org/ftp/arxiv/papers/2001/2001.10488.pdf
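|
| A sketch of the skew point (very skewed lognormal data;
| numpy/scipy assumed): the nominal 95% t-interval for the mean
| covers the true mean well below 95% of the time, even at
| n = 100.
|
|     import numpy as np
|     from scipy import stats
|
|     rng = np.random.default_rng(4)
|
|     sigma = 3.0                          # heavy right skew
|     true_mean = np.exp(sigma**2 / 2)     # lognormal mean
|     n, reps = 100, 5_000
|     x = rng.lognormal(0.0, sigma, size=(reps, n))
|     se = x.std(axis=1, ddof=1) / np.sqrt(n)
|     crit = stats.t.ppf(0.975, df=n - 1)
|     covered = np.abs(x.mean(axis=1) - true_mean) <= crit * se
|     print(covered.mean())                # well below 0.95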
___________________________________________________________________
(page generated 2024-03-11 23:01 UTC)