[HN Gopher] Relating t-statistics and the relative width of conf...
       ___________________________________________________________________
        
       Relating t-statistics and the relative width of confidence
       intervals
        
       Author : luu
       Score  : 46 points
       Date   : 2024-03-09 02:32 UTC (2 days ago)
        
 (HTM) web link (statmodeling.stat.columbia.edu)
 (TXT) w3m dump (statmodeling.stat.columbia.edu)
        
       | nerdponx wrote:
       | Great little demo.
       | 
       | > It is only when the statistical evidence against the null is
        | overwhelming -- "six sigma" overwhelming or more -- that you're
       | also getting tight confidence intervals in relative terms. Among
       | other things, this highlights that if you need to use your
       | estimates quantitatively, rather than just to reject the null,
       | default power analysis is going to be overoptimistic.
       | 
       | This I think will be a real head-scratcher for a lot of students,
       | who are often taught to construct confidence intervals by no
        | method apart from "inverting" a hypothesis test. It illustrates
       | one of the many challenges (and dangers!) of teaching statistics.
        
       | FabHK wrote:
       | I was a bit confused by the article initially:
       | 
       | > Perhaps most simply, with a t-statistic of 2, your 95%
       | confidence intervals will nearly touch 0.
       | 
        | Your 95% CI _will_ include 0, unless you have more than 50 or so
        | data points, in which case there's little point in using
        | Student's t-distribution; you might as well use the Gaussian,
        | which the author seems to assume, and which I thought gave rise
        | to the z-score (in my mind, t-statistic = t-distribution, z-score
        | = normal distribution).
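        | 
        | As a quick check of that "50 or so" figure, a minimal scipy
        | sketch (the exact crossover is around df = 60):
        | 
        |     import scipy.stats as st
        | 
        |     # The 95% CI includes 0 exactly when the critical
        |     # value t_{0.975}(df) exceeds the t-statistic of 2.
        |     for df in [5, 10, 30, 50, 60, 100]:
        |         print(df, st.t.ppf(0.975, df))
        | 
        |     # t_{0.975}(df) drops below 2 near df = 60, so a
        |     # t-statistic of 2 puts 0 inside the CI until then.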
       | 
        | But then, looking things up, it turns out the difference is that
        | the z-score is computed with the population mean and sd, while
        | the t-statistic is computed with the sample mean and sd. So, yeah,
       | practically you'll use the t-statistic (and it will be
       | t-distributed if the population is normally distributed), unless
       | you already know population mean and sd, in which case you can
       | compute the z-score (which will approach the normal distribution
       | by CLT under certain conditions with large enough samples, but is
       | otherwise not predicated on normality in any way).
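        | 
        | To make the distinction concrete, a small sketch (the data and
        | parameters are made up for illustration):
        | 
        |     import numpy as np
        | 
        |     rng = np.random.default_rng(0)
        |     y = rng.normal(0.5, 1.0, size=20)
        |     n, mu0, sigma = len(y), 0.0, 1.0
        | 
        |     # z-score: uses the known population sd
        |     z = (y.mean() - mu0) / (sigma / np.sqrt(n))
        |     # t-statistic: uses the sample sd instead
        |     t = (y.mean() - mu0) / (y.std(ddof=1) / np.sqrt(n))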
       | 
        | Then all the author is pointing out is that if we take a +/- 2
        | standard error CI and your statistic is 2, the CI goes from 0 to
        | 4, giving a 100% "half-width" relative to the estimate, while if
        | your statistic is 4, say, the CI goes from 2 to 6, giving a
        | half-width of just 50%.
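        | 
        | Or in code, a toy sketch working in standard-error units: the
        | relative half-width of a +/- 2 SE interval is just 2 / t:
        | 
        |     # estimate = t * SE; CI = estimate +/- 2 * SE
        |     for t in [2, 4, 6]:
        |         lo, hi = t - 2, t + 2
        |         print(t, (lo, hi), f"{2 / t:.0%} half-width")
        | 
        |     # t = 2 -> (0, 4), 100%; t = 4 -> (2, 6), 50%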
        
         | nerdponx wrote:
          | The T distribution arises as the ratio between a standard
          | Gaussian r.v. and the square root of a scaled Chi-square
          | r.v.:
          | 
          |     X ~ Gaussian(0, 1)
          |     S ~ ChiSquare(n)      (independent of X)
          |     T = X / sqrt(S / n)
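          | 
          | A minimal simulation of that construction (a sketch, not a
          | proof), checking it against scipy's t quantiles:
          | 
          |     import numpy as np
          |     from scipy import stats
          | 
          |     rng = np.random.default_rng(0)
          |     n, reps = 5, 100_000
          |     x = rng.standard_normal(reps)
          |     s = rng.chisquare(n, size=reps)
          |     t = x / np.sqrt(s / n)
          | 
          |     # Simulated vs. exact t(n) quantiles: close match.
          |     print(np.quantile(t, [0.025, 0.975]))
          |     print(stats.t.ppf([0.025, 0.975], df=n))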
         | 
          | The sampling distribution of the sample mean is exactly
          | Gaussian whenever the data is Gaussian, and approximately
          | Gaussian whenever the CLT applies.
         | 
         | The sampling distribution of the sample variance is Chi-square
         | whenever the data is Gaussian. But the CLT does _not_ have any
          | effect here. In general there isn't much else that we can say
         | about the sampling distribution of the sample variance.
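          | 
          | A sketch of that point: the scaled sample variance
          | (n-1)s^2/sigma^2 has the ChiSquare(n-1) variance of 2(n-1)
          | for Gaussian data, but not for (say) exponential data:
          | 
          |     import numpy as np
          | 
          |     rng = np.random.default_rng(0)
          |     n, reps = 20, 100_000
          |     for name, draw in [
          |         ("normal", rng.standard_normal),
          |         ("exponential", rng.standard_exponential),
          |     ]:
          |         y = draw((reps, n))  # both have variance 1
          |         q = (n - 1) * y.var(axis=1, ddof=1)
          |         # ChiSquare(19) variance is 38; the exponential
          |         # case comes out much larger.
          |         print(name, q.var())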
         | 
         | Thus if we want to compute a "sample Z statistic" using those
         | estimated quantities, _and_ we know the data is Gaussian, then
          | we know the sampling distribution of that "sample Z
         | statistic". It's the T distribution.
         | 
         | But the assumption of underlying Gaussian data is important
         | here. The CLT doesn't help us derive a T distribution. But it
          | _is_ true that our "sample Z statistic" is asymptotically
         | Gaussian, in which case the T distribution itself is
         | approximately Gaussian. [0]
         | 
         | So a "T test" (meaning "test of a difference in means"), using
         | the actual T distribution as the null distribution of the test
         | statistic, is basically never valid on non-Gaussian data. But
         | it's valid (asymptotically) using the Gaussian distribution as
         | the null distribution of the test statistic.
         | 
         | That's a lot of reasoning to say: `avg(y) / sqrt(var(y) / n)`
         | could be either T or Gaussian, depending on the assumptions and
         | context. But I would push back on conflating that with `avg(y)
         | / sqrt(s^2 / n)`. Even if they have the same distribution, they
         | are not the same thing.
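          | 
          | A sketch of that asymptotic claim on exponential (clearly
          | non-Gaussian) data: the lower-tail rate should approach the
          | Gaussian 2.5% as n grows:
          | 
          |     import numpy as np
          | 
          |     rng = np.random.default_rng(0)
          |     reps = 100_000
          |     for n in [5, 500]:
          |         y = rng.standard_exponential((reps, n))
          |         se = y.std(axis=1, ddof=1) / np.sqrt(n)
          |         t = (y.mean(axis=1) - 1.0) / se  # true mean 1
          |         # Far from 2.5% at n = 5, close at n = 500.
          |         print(n, (t < -1.96).mean())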
         | 
         | [0]: If you want a good writeup, see
         | https://stats.stackexchange.com/a/253318/36229. Or look into a
         | good statistics textbook.
        
       | mjburgess wrote:
        | One important caveat to all these methods is that the central
        | limit theorem must hold for the sample means, and this is an
        | _empirical_ condition, not something you can know statistically.
       | 
        | Another important caveat: many things we want to measure are not
        | well-distributed enough to allow the CLT to hold. If it doesn't
        | hold, the bulk of statistical methods don't work and the results
        | are bunk.
       | 
        | Many quantities follow power-law distributions, which can require
        | trillions+ data points for the CLT to do its magic, i.e., showing
        | that the sample mean of set A is statistically-significantly
        | different from that of set B would require 10^BIG points if the
        | property measured in A/B is power-law distributed.
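        | 
        | A sketch of how slowly that settles (Pareto with tail index
        | 1.1, so the mean exists but the variance is infinite):
        | 
        |     import numpy as np
        | 
        |     rng = np.random.default_rng(0)
        |     a = 1.1
        |     x = rng.pareto(a, size=10_000_000) + 1.0
        |     true_mean = a / (a - 1)  # = 11 for a = 1.1
        | 
        |     # The running mean is still far off after millions
        |     # of points.
        |     for k in [10**3, 10**5, 10**7]:
        |         print(k, x[:k].mean(), "vs", true_mean)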
       | 
        | Now, even worse: many areas of "science" study phenomena that
        | are almost certainly power-law distributed, and use these
        | methods to do so.
        
         | ASpring wrote:
         | I'm not sure I'm fully understanding your point. Is it that
         | constructing confidence intervals using t-statistics is
         | inappropriate for a lot of real data that isn't distributed
         | somewhat normally?
        
           | nerdponx wrote:
           | It's their point, and it's a good one, but I think they're
           | somewhat overstating how common power-law data is; it
           | probably varies a lot by field of study. And at least the
           | logarithm of a power-law variable can help bring it back
           | closer to the world of sanity. Plus, there are plenty of
           | fields where nonparametric tests of medians are accepted
           | standard practice.
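            | 
            | For instance (a sketch): the log of a Pareto Type I
            | variable is exactly exponential, about as tame as it gets:
            | 
            |     import numpy as np
            | 
            |     rng = np.random.default_rng(0)
            |     a, xm = 1.5, 1.0
            |     x = (rng.pareto(a, size=100_000) + 1.0) * xm
            |     logx = np.log(x / xm)  # ~ Exponential(rate a)
            | 
            |     print(logx.mean(), 1 / a)     # both ~ 0.667
            |     print(logx.var(), 1 / a**2)   # both ~ 0.444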
        
             | mjburgess wrote:
              | You can turn most issues into power laws by recursing
              | a reasonable risk distribution over them.
              | 
              | So suppose we ask: what is our confidence in X?
              | (rather than asking about X itself); and then, what is
              | our confidence in the model by which we give
              | confidences in X (i.e., the model risk); and so on...
             | 
             | In practice, what we want to model is the appropriate
             | confidence, not an actual prediction (bunk). So we are very
             | often screwed.
             | 
             | Statistics is an illusion.
        
         | kgwgk wrote:
          | > many things we want to measure are not well-distributed
          | enough to allow the CLT to hold
         | 
         | I guess that may be true for some values of "many" and "we" but
         | most things we want to measure have finite variance.
        
           | baq wrote:
            | for some examples of "not-most" things:
            | https://en.wikipedia.org/wiki/Cauchy_distribution#Occurrence...
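            | 
            | a quick sketch of why the Cauchy breaks the machinery: its
            | sample mean never settles down, however much data you add:
            | 
            |     import numpy as np
            | 
            |     rng = np.random.default_rng(0)
            |     x = rng.standard_cauchy(1_000_000)
            | 
            |     # Running means keep jumping instead of converging.
            |     for k in [10**2, 10**4, 10**6]:
            |         print(k, x[:k].mean())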
        
         | pocketsand wrote:
         | In fact, the CLT is remarkably robust to distributional
         | assumptions. Examples where it breaks down (e.g., non-finite
         | variance) are comparatively rare, even if there are "many" of
         | them.
         | 
         | As with all things statistical, judgment is required.
        
           | 317070 wrote:
           | > Examples where it breaks down (e.g., non-finite variance)
           | are comparatively rare, even if there are "many" of them.
           | 
            | I would beg to differ. They are absolutely not rare. [0] One
            | of the most famous people in probability theory even claimed
            | that it's the ones with finite moments that are rare in
            | practice. (But I couldn't find the quote again; I thought it
            | was Poisson or Laplace.)
           | 
            | It's even worse. Many distributions where the CLT does apply
            | require so many samples for it to actually kick in that it
            | does not really help in practice. Any skew in your data
            | blows up the number of samples you need to estimate things
            | like the mean reliably.
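            | 
            | A sketch of the skew point: nominal 95% t-intervals on
            | lognormal data cover the true mean noticeably less than
            | 95% of the time at moderate n:
            | 
            |     import numpy as np
            |     from scipy import stats
            | 
            |     rng = np.random.default_rng(0)
            |     n, reps = 20, 50_000
            |     y = rng.lognormal(0.0, 1.5, size=(reps, n))
            |     true_mean = np.exp(1.5**2 / 2)
            | 
            |     tcrit = stats.t.ppf(0.975, n - 1)
            |     half = tcrit * y.std(axis=1, ddof=1) / np.sqrt(n)
            |     hit = np.abs(y.mean(axis=1) - true_mean) <= half
            |     print(hit.mean())  # well below the nominal 0.95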
           | 
           | [0] Chapter 3.4
           | https://arxiv.org/ftp/arxiv/papers/2001/2001.10488.pdf
        
       ___________________________________________________________________
       (page generated 2024-03-11 23:01 UTC)