[HN Gopher] How large is that number in the Law of Large Numbers?
___________________________________________________________________
How large is that number in the Law of Large Numbers?
Author : sebg
Score : 110 points
Date : 2023-09-12 13:06 UTC (9 hours ago)
(HTM) web link (thepalindrome.org)
(TXT) w3m dump (thepalindrome.org)
| wodenokoto wrote:
| I really like how the plots and graphics look. Is it the library
| by 3blue1brown? (manim, I think it's called?)
| bannedbybros wrote:
| [dead]
| yafbum wrote:
| Stats class rule of thumb: if you need to calculate the relative
| probability of two outcomes, you can get to within about 10% once
| you get 100 samples of each outcome (so you need more samples
| overall if the distribution is skewed).
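A back-of-envelope version of this rule (a sketch, assuming each outcome's count is roughly Poisson, so it fluctuates by about the square root of its size):

```python
import math

# If outcome 1 was seen k1 times and outcome 2 was seen k2 times, each
# count fluctuates by roughly sqrt(k), and the relative errors add in
# quadrature when you take the ratio k1/k2.
def ratio_relative_error(k1, k2):
    return math.sqrt(1.0 / k1 + 1.0 / k2)

print(round(ratio_relative_error(100, 100), 2))  # 0.14 -> low-teens percent
print(round(ratio_relative_error(10, 10), 2))    # 0.45 -> far noisier
```

At 100 samples of each outcome the error lands in the low teens of percent, i.e. the ~10% ballpark the rule promises; at 10 each it is several times worse.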
| gear54rus wrote:
| It's interesting that even in this thread the 2 answers differ
| by an order of magnitude lol
| gipp wrote:
| Eh it's really the same rule, just applying a different
| threshold.
| marcosdumay wrote:
| The problem is that the sensitivity to the number growth is
| supposed to be exponential. So if you need 100 samples for
| "within 10% of the value", then 10 samples should give you
| almost completely random behavior.
|
| In reality, it depends on your actual distribution, but the
| OP from this thread here is unreasonably conservative for
| something described as a "rule of thumb". Almost always, if
| you have at least 10 of every category, you can already
| discover every interesting thing that a rule of thumb will
| allow. And you probably could go with less. But if you want
| precision, you can't get it with rules of thumb.
| CaptainNegative wrote:
| The dependence on sample size is not exponential, it's
| sublinear. The heuristic rate of convergence to keep in
| mind is the square root of the sample size, i.e. getting
| 10x more samples shrinks the margin of error (in a
| multiplicative sense) by sqrt(10) ≈ 3ish.
|
| The exponential bit applies to the probability densities
| as a function of the bounds themselves, i.e. how likely
| you are to fall x units away from the mean typically
| decreases exponentially with (some polynomial in) x.
|
| Of course, this is all assuming a whole bunch of standard
| conditions on the data you're looking at (independence,
| identically distributed, bounded variance, etc.) and may
| not hold if these are violated.
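The square-root rate above is easy to check by simulation; this sketch uses fair die rolls as a stand-in for i.i.d. data with finite variance:

```python
import random
import statistics

# Estimate the spread (standard error) of the sample mean of n fair
# die rolls by repeating the whole experiment many times.
def std_error_of_mean(n, trials=2000, seed=0):
    rng = random.Random(seed)
    means = [statistics.fmean(rng.randint(1, 6) for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)

# 10x more samples should shrink the margin of error by about sqrt(10).
ratio = std_error_of_mean(100) / std_error_of_mean(1000)
print(ratio)  # roughly sqrt(10) ~ 3.16, up to simulation noise
```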
| [deleted]
| jcranmer wrote:
| FWIW, the threshold I learned was 20 in each bucket, so now
| you have 3 answers.
| koolba wrote:
| Just apply it recursively. Let's get 100 samples of comments
| suggesting the number of samples to use. Then average those.
| dragontamer wrote:
| My statistics class at high school level taught the following:
|
| The number of samples you need is very difficult to calculate
| correctly, requiring deep analysis of standard deviations and
| variances.
|
| But surprisingly, you can simply know you've reached large number
| status when over 10 items exist in each category.
|
| ---------
|
| Ex: when doing a heads vs tails coin flip experiment, you likely
| have a large number once you have over 10 heads and over 10
| tails. No matter how biased the coin is.
|
| Or in this 'Lotto ticket' example, you have a large number of
| samples after gathering enough data to find over 10 Jackpot
| winners.
| jmount wrote:
| Very cool rule.
|
| I think you can justify it by approximating each category as an
| independent Poisson distribution. Then for each such process
| the variance equals the mean. So once you have 10 successes in
| a bin, you have evidence of a probably good estimate for the
| arrival rate of that category. The book "The Probabilistic
| Method" calls a related idea "the Poisson paradigm."
|
| (10 is a nice round number where the standard deviation is below
| the mean)
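The Poisson argument above, reduced to one line (an illustration, not a proof): with variance equal to the mean lam, the count's relative noise is 1/sqrt(lam).

```python
import math

# Coefficient of variation (sd / mean) of a Poisson count with mean lam.
def coeff_of_variation(lam):
    return math.sqrt(lam) / lam  # variance == mean, so sd == sqrt(lam)

print(round(coeff_of_variation(1), 2))   # 1.0 -- sd as large as the mean
print(round(coeff_of_variation(10), 2))  # 0.32 -- sd well below the mean
```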
| jmount wrote:
| Small proviso: this is only true for a reasonable number of
| categories (or you run into repeated experiment problems).
| NelsonMinar wrote:
| That's a neat rule of thumb; is there a simple statistical
| argument for why 10 is the (not very large) number?
| [deleted]
| tgv wrote:
| For heads or tails, that leaves a very large margin. In approx.
| 1 in 20 trials, you'll end up with a 10-20 split.
| dragontamer wrote:
| Yeah, a 95% confidence level (or approximately two standard
| deviations) is pretty standard with regard to statistical
| tests.
|
| You gotta draw the line somewhere. At high-school statistics
| level, it's basically universally drawn at the 95% confidence
| level. If you wanna draw new lines elsewhere, you gotta make
| new rules yourself and recalculate all the rules of thumb.
| User23 wrote:
| I remember my high school AP Psychology teacher mocking
| p=0.05 as practically meaningless. In retrospect it's funny
| for a psychologist to say that, but I guess it was because
| he was from the more empirically minded behaviorist
| cognitive school and from time to time they have done
| actual rigorous experiments[1] (in rodents).
|
| [1] For example as described by Feynman in Cargo Cult
| Science.
| tgv wrote:
| The observation above is simply true. If you toss a coin
| 30 times, there's about a 5% chance that you'll end up
| with a 10-20 ratio or something more extreme.
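The exact binomial tail, for anyone who wants to check the 5% figure (a sketch assuming a fair coin and 30 tosses; note the two-sided version, where either face can end up at 10 or below, is about twice as likely):

```python
from math import comb

n = 30
# Probability of at most 10 heads in 30 fair tosses (one-sided tail).
one_sided = sum(comb(n, k) for k in range(11)) / 2 ** n
# Probability that either face ends up at 10 or fewer (two-sided).
two_sided = 2 * one_sided

print(round(one_sided, 4))  # 0.0494 -- about 1 in 20
print(round(two_sided, 4))  # 0.0987 -- about 1 in 10
```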
|
| NHST testing inverts the probability logic, makes the 5%
| holy, and skims over the high probability of finding
| something that is not equal to a specific value. That
| procedure is then used for theory confirmation, while it
| was (in another form) meant for falsification. Everything
| is wrong about it, even if the experimental method is
| flawless. Hence the reproducibility crisis.
| lisper wrote:
| The problem is two-fold:
|
| 1. p=0.05 means that one result in 20 is going to be the
| result of chance.
|
| 2. It's generally pretty easy (especially in psychology)
| to do 20 experiments, cherry-pick -- and publish! -- the
| p=0.05 result, and throw away the others.
|
| The result is that _published_ p=0.05 results are much
| _more_ likely than 1 in 20 to be the result of chance.
| dragontamer wrote:
| So run a meta-study upon the results published by a set
| of authors and double-check to make sure that their
| results are normally distributed across the p-values
| associated with their studies.
|
| These problems are solved problems in the scientific
| community. Just announce that regular meta-studies will
| be done, publish the expectation that authors' results be
| normally distributed, and publicly show off the meta-
| study.
|
| -------------
|
| In any case, the discussion point you're making is well
| beyond the high-school level needed for a general
| education. If someone needs to run their own experiment
| (A/B testing upon their website) and cannot afford a
| proper set of tests/statistics, they should instead rely
| upon high-school level heuristics to design their
| personal studies.
|
| This isn't a level of study about analyzing other
| people's results and finding flaws in other people's
| (possibly maliciously seeded) results. This is a
| heuristic about how to run your own experiments and how
| to prove something to yourself at a 95% confidence level.
| If you want to get published in the scientific community,
| the level of rigor is much higher of course, but no one
| tries to publish a scientific paper on just a high school
| education (which is where I was aiming my original
| comment at).
| User23 wrote:
| There's a professor of Human Evolutionary Biology at
| Harvard who only has a high school diploma[1]. Needless
| to say he's been published and cited many times over.
|
| [1] https://theconversation.com/profiles/louis-
| liebenberg-122680...
| withinboredom wrote:
| I don't know whether you're mocking them or being
| supportive of them or just stating a fact. Either way,
| education level has no bearing on subject knowledge. I
| know more about how computers, compilers, and software
| algorithms work than most post-docs and professors that
| I've run into in those subjects.
|
| Am I smarter than them? Nope. Do I know as many fancy big
| words as them? Nope. Do I care about results and
| communicating complex topics to normal people? Yep. Do I
| care more about making the company money than chasing
| some bug-bear to go on my resume? Yep.
|
| I fucking hate school and have no desire to ever go back.
| I can't put up with the bullshit, so I dropped out; I
| just never stopped studying and I don't need a piece of
| paper to affirm that fact.
| lisper wrote:
| First, I was specifically responding to this:
|
| > I remember my high school AP Psychology teacher mocking
| p=0.05 as practically meaningless.
|
| and trying to explain why the OP's teacher was probably
| right.
|
| Second:
|
| > So run a meta-study upon the results published by a set
| of authors and double-check to make sure that their
| results are normally distributed across the p-values
| associated with their studies.
|
| That won't work, especially if you only run the meta-
| study on published results because it is all but
| impossible to get negative results published. Authors
| don't need to cherry-pick, the peer-review system does it
| for them.
|
| > These problems are solved problems in the scientific
| community.
|
| No, they aren't. These are social and political problems,
| not mathematical ones. And the scientific community is
| pretty bad at solving those.
|
| > the discussion point you're making is well beyond the
| high-school level needed for a general education
|
| I strongly disagree. I think everyone needs to understand
| this so they can approach scientific claims with an
| appropriate level of skepticism. Understanding how the
| sausage is made is essential to understanding science.
|
| And BTW, I am not some crazy anti-vaxxer climate-change
| denialist flat-earther. I was an academic researcher for
| 15 years -- in a STEM field, not psychology, and even
| _that_ was sufficiently screwed up to make me change my
| career. I have advocated for science and the scientific
| method for decades. It's not science that's broken, it's
| the academic peer-review system, which is essentially
| unchanged since it was invented in the 19th century.
| _That_ is what needs to change. And that has nothing to
| do with math and everything to do with politics and
| economics.
| Viliam1234 wrote:
| > p=0.05 means that one result in 20 is going to be the
| result of chance.
|
| You made the same mistake most people make here: you
| reversed the arrow of the implication. It is not
| "successful experiment implies chance (probability 5%)"
| but "chance implies successful experiment (probability
| 5%)".
|
| What does that mean in practice? Imagine a hypothetical
| scientist that is fundamentally confused about something
| important, so _all_ hypotheses they generate are false.
| Yet, using p=0.05, 5% of those hypotheses will be
| "confirmed experimentally". In that case, it is not 5% of
| the "experimentally confirmed" hypotheses that are wrong
| -- it is a full 100%. Even without any cherry-picking.
|
| The problem is not that p=0.05 is too high. The problem
| is, it doesn't actually mean what most people believe it
| means.
| lisper wrote:
| I think we're actually in violent agreement here, but I
| just wasn't precise enough. Let me try again:
| p=0.05 means that one POSITIVE result in 20 is going to
| be the result of chance and not causality
|
| In other words: if I have some kind of intervention or
| treatment, and that intervention or treatment produces
| some result in a test group relative to a control group
| with p=0.05, then the odds of getting that result simply
| by chance and not because the treatment or intervention
| actually had an effect are 5%.
|
| The practical effect of this is that there are two
| different ways of getting a p=0.05 result:
|
| 1. Find a treatment or intervention that actually works
| or
|
| 2. Test ~20 different (useless) interventions. Or test
| one useless intervention ~20 times.
|
| A single p=0.05 result in isolation is useless because
| there is no way to know which of the two methods produced
| it.
|
| This is why replication is so important. The odds of
| getting a p=0.05 result by chance is 5%. But the odds of
| getting TWO of them in sequential trials is 0.25%, and
| the odds of a positive result being the result of pure
| chance decrease exponentially with each subsequent
| replication.
| [deleted]
| jameshart wrote:
| Curious how people are 'applying' the Law of Large Numbers in a
| way that needs this advice to be tacked on?
|
| > Always keep the speed of convergence in mind when applying the
| law of large numbers.
|
| Any 'application' of the LLN basically amounts to replacing some
| probalistic number derived from a bunch of random samples with
| the _expected value_ of that number... and tacking on 'for
| sufficiently large _n_ ' as a caveat to your subsequent
| conclusions.
|
| Figuring out whether, in practical cases, you will have a
| sufficiently large _n_ that the conclusion is valid is a
| necessary step in the analysis.
| LudwigNagasena wrote:
| > Figuring out whether, in practical cases, you will have a
| sufficiently large n that the conclusion is valid is a
| necessary step in the analysis.
|
| The econometrics textbook I studied contains the word "asymptotic"
| more times than it has pages. Oftentimes it's impractical or even
| theoretically intractable to derive finite sample properties
| (and thus to answer when n is _really_ large enough).
| gloryless wrote:
| This kind of intuition is why a high school level statistics or
| probability class seems so so valuable. I know not everyone will
| use the math per se, but the concepts apply to everyday life and
| are really hard to just grasp without having been taught it at
| some point.
| zodmaner wrote:
| The sad thing is, having a mandatory high school level
| statistics & probability class alone is not enough, you'll also
| need a good curriculum and a competent teacher to go along with
| it. Otherwise, it wouldn't work: a bad curriculum taught badly
| by an unmotivated or unqualified teacher will almost always fail
| to teach the intuition or, even worse, alienate students from
| the material.
| Dylan16807 wrote:
| > This means that on average, we'll need a fifty million times
| larger sample for the sample average to be as close to the true
| average as in the case of dice rolls.
|
| This is "as close" in an absolute sense, right?
|
| If I take into account that the lottery value is 20x larger, and
| I'm targeting relative accuracy, then I need 2.5 million times as
| many samples?
| causality0 wrote:
| Am I the only one unreasonably annoyed that his graphs don't
| match the description of his rolls?
| alexb_ wrote:
| If you had a gambling game that was simply "heads or tails, even
| money", you would expect over a Large Number of trials that you
| would get 0. But once you observe exactly one trial, the expected
| value becomes +1 or -1 unit. We know this is always going to
| happen one way or the other. Why then, does the bell curve of
| "expected value" for this game not have two peaks, at 1 and -1?
| Why does it peak at 0 instead?
|
| What I'm asking about, I know I'm wrong about - I just want to
| know how I can derive that for myself.
| ineptech wrote:
| "The expected value of a random variable with a finite number
| of outcomes is a weighted average of all possible outcomes." --
| https://en.wikipedia.org/wiki/Expected_value
| alexb_ wrote:
| That makes sense, I was always thinking of it as "Given an
| infinite number of trials..."
| ineptech wrote:
| Whether/when it's better to think in terms of "X has a 37%
| chance of happening in a single trial" vs "If you ran a lot
| of trials, X would happen in 37% of them" is kind of a
| fraught topic that I can't say much about, but you might
| find this interesting:
| https://en.wikipedia.org/wiki/Probability_interpretations
| munchbunny wrote:
| The intuitive explanation is that the effect of a single sample
| on the average diminishes as you take more samples. So, hand-
| waving a bit, let's assume it's true that over a large number
| of trials you would expect the average to converge to 0. You
| just tossed a coin and got heads, so you're at +1. The average
| of (1 + 0*n)/(n+1) still goes to 0 as n grows bigger and
| bigger.
|
| That skips over the distinction between "average" and
| "probability distribution", but those nuances are probably
| better left for a proof of the central limit theorem.
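The dilution argument above, made concrete: fix every toss after the observed +1 at its expected value 0 and watch the running average shrink.

```python
# Running average after one +1 followed by n tosses that average to 0.
def running_average(n):
    return (1 + 0 * n) / (n + 1)

for n in (1, 9, 99, 999):
    print(n, running_average(n))  # 0.5, 0.1, 0.01, ... toward 0
```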
| crdrost wrote:
| If the blog author is reading, some notes for improvement:
|
| - Your odds calculation is likely wrong. You assumed from the
| word "odds" that "odds ratio" was meant, (Odds=3 meaning "odds
| 3:1 against" corresponding to p=25%) but the phrase is
| "approximate odds 1 in X" (Odds=3 meaning "odds of 1 in 3 to win"
| meaning 33%) and recalculating results in the remarkably exact
| expected value of $80 which seems intentional?
|
| - You phrase things in terms of variances, people will think more
| in terms of standard deviations. So 3.5 +- 1.7 vs $80 +- $12,526.
|
| - Note that you try to make a direct comparison between those two
| but the two are in fact incomparable. The most direct comparison
| might be to subtract 1 from the die roll and multiply by $32, so
| that you have a 1/6 chance of winning $0, 1/6 of winning $32, ...
| 1/6 of winning $160. So then we have $80 +- $55 vs $80 +-
| $12,526. Then instead of saying you'd need 50 _million_ more
| lottery tickets you'd actually say you need about 50 _thousand_
| more. This is closer to the "right ballpark" where you can tell
| that the whole lottery is expected to sell about 10,200,000
| tickets on a good day.
|
| - But where an article like this should really go is, "what are
| you using the numbers for?". In the case of the Texas lottery
| this is actually a strong constraint, they have to make sure that
| they make a "profit" (like, it's not a real profit, it probably
| goes to schools or something) on most lotteries, so you're
| actually trying to ensure that 5 sigma or so is less than the
| bias. So you've got a competition between $20 * _n_ and 5 *
| $12,526 * sqrt( _n_ ), or sqrt( _n_ ) = 12526/4, _n_ = 9.8 million.
| So that's what the Texas Lottery is targeting, right? So then we
| would calculate that the equivalent number of people that should
| play in the "roll a die linear lottery" we've constructed is 187,
| call it an even 200, if 200 people pay $100 for a lottery ticket
| on the linear lottery then we can pretty much always pay out even
| on a really bad day.
|
| - So the 50,000x number that is actually correct is basically
| just saying that we can run a much smaller lottery, 50,000 times
| smaller, with that payoff structure. And there's something nice
| about phrasing it this way.
|
| - To really get "law of large numbers" we should _actually_
| probably be looking at how much these distributions deviate from
| Gaussian, rather than complaining that the Gaussian is too wide?
| You can account for a wide Gaussian in a number of ways. But
| probably we want to take the cube root of the 3rd cumulant, for
| example, try to argue when it "vanishes"? Except given the
| symmetry the 3rd cumulant for the die is probably 0 so you might
| need to go out to the 4th cumulant for the die -- and this might
| give a better explanation for the die converging more rapidly in
| "shape" to the mean, it doesn't just come close faster, it also
| becomes a Gaussian significantly faster because the payoff
| structure is symmetric about the mean.
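A quick recomputation of the figures in this comment (a sketch: the $12,526 lottery standard deviation and the $20 margin are taken from the comment as given; everything else follows from them):

```python
import statistics

# The linearized die lottery: subtract 1 from the roll, pay $32 per pip.
payoffs = [32 * (k - 1) for k in range(1, 7)]  # $0, $32, ..., $160
mean = statistics.fmean(payoffs)   # $80, same as the lottery ticket
sd = statistics.pstdev(payoffs)    # ~$54.7, the "+- $55" above
print(round(mean), round(sd))

# How many times more tickets the real lottery needs for the same
# absolute accuracy: the ratio of variances.
print(round((12526 / sd) ** 2))  # ~52,500 -- the "50 thousand"

# Five-sigma break-even: 20 * n = 5 * 12526 * sqrt(n).
n = (5 * 12526 / 20) ** 2
print(round(n / 1e6, 1))  # ~9.8 million tickets
```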
| nathell wrote:
| Tangential: syntax-highlighting math! This is the first time I've
| seen it. Not yet sure what I think about it, but I can definitely
| see the allure.
| rendaw wrote:
| Pedant-man on the scene: this is just highlighting since the
| highlighting isn't derived from syntax.
| Tachyooon wrote:
| It's easy on the eyes and it can make reading lots of equations
| less awkward if done correctly. I remember finding out this was
| possible while I was working on an assignment in Latex - it
| looked amazing.
|
| It takes a little bit of work to colour in equations but I hope
| more people start doing it (including me, I'd forgotten about
| it for a while)
| tetha wrote:
| Yeah, I like it. I used this as a tutor in more finicky
| exercises when it becomes really important to keep 2-3 very
| similar, but different things apart. It takes a bit of
| dexterity, but you can switch fluently between 3 different
| whiteboard markers held in one hand while writing, haha.
|
| I am kind of wondering if a semantic highlighting makes sense
| as well. You often end up with some implicit assignment of
| lowercase latin, uppercase latin, lowercase greek letters and
| such for certain meanings. Kinematic - xyzt for position in
| time, T_i(I_i) for the quaternion or transformation
| representing a certain joint of a robot.
| derbOac wrote:
| The biggest problem for real processes is knowing whether in fact
| x ~ i.i.d., with regard to time as well as individual
| observations.
___________________________________________________________________
(page generated 2023-09-12 23:01 UTC)