[HN Gopher] How large is that number in the Law of Large Numbers?
___________________________________________________________________
How large is that number in the Law of Large Numbers?
Author : sebg
Score : 181 points
Date : 2023-09-12 13:06 UTC (1 day ago)
(HTM) web link (thepalindrome.org)
(TXT) w3m dump (thepalindrome.org)
| wodenokoto wrote:
| I really like how the plots and graphics look. Is it the library
| by 3blue1brown? (manim, I think it's called?)
| blt wrote:
| In learning theory there is focus on "non-asymptotic" results.
| Instead of only showing that our method converges on the right
| answer in the limit of infinite data, we must show _how fast_ it
| converges.
| yafbum wrote:
| Stats class rule of thumb: if you need to calculate the relative
| probability of two outcomes, you can get to within about 10% once
| you get 100 samples of each outcome (so, need more samples
| overall if the distribution is skewed).
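|
| As a rough check of that rule of thumb (a sketch, with an assumed
| biased coin at p(heads) = 0.2, sampling until the rarer outcome has
| 100 occurrences), the estimated ratio of the two outcomes typically
| lands within roughly 10% of the true ratio:
|
|   import random
|
|   random.seed(1)
|   p_heads, target = 0.2, 100
|   errors = []
|   for _ in range(1000):
|       heads = tails = 0
|       while heads < target:        # wait for 100 of the rarer outcome
|           if random.random() < p_heads:
|               heads += 1
|           else:
|               tails += 1
|       est_ratio = heads / tails              # estimated relative probability
|       true_ratio = p_heads / (1 - p_heads)
|       errors.append(abs(est_ratio - true_ratio) / true_ratio)
|   # typical relative error, usually in the ballpark of 10%
|   print(f"{sum(errors) / len(errors):.1%}")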
| gear54rus wrote:
| It's interesting that even in this thread the 2 answers differ
| by an order of magnitude lol
| gipp wrote:
| Eh it's really the same rule, just applying a different
| threshold.
| marcosdumay wrote:
| The problem is that the sensitivity to the number growth is
| supposed to be exponential. So if you need 100 samples for
| "within 10% of the value", then 10 samples should give you
| almost completely random behavior.
|
| In reality, it depends on your actual distribution, but the
| OP from this thread here is unreasonably conservative for
| something described as a "rule of thumb". Almost always, if
| you have at least 10 of every category, you can already
| discover every interesting thing that a rule of thumb will
| allow. And you probably could go with less. But if you want
| precision, you can't get it with rules of thumb.
| CaptainNegative wrote:
| The dependence on sample size is not exponential, it's
| sublinear. The heuristic rate of convergence to keep in
| mind is the square root of the sample size, i.e. getting
| 10x more samples shrinks the margin of error (in a
| multiplicative sense) by sqrt(10) ≈ 3ish.
|
| The exponential bit applies to the probability densities
| as a function of the bounds themselves, i.e. how likely
| you are to fall x units away from the mean typically
| decreases exponentially with (some polynomial in) x.
|
| Of course, this is all assuming a whole bunch of standard
| conditions on the data you're looking at (independence,
| identically distributed, bounded variance, etc.) and may
| not hold if these are violated.
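|
| A quick illustration of that square-root rate (a sketch assuming
| i.i.d. fair-die rolls): taking 10x more samples shrinks the spread
| of the sample mean by about sqrt(10) ≈ 3.2.
|
|   import random
|   import statistics
|
|   random.seed(0)
|
|   def spread_of_sample_mean(n, trials=2000):
|       # Standard deviation of the sample mean across many repeated runs.
|       means = [statistics.fmean(random.randint(1, 6) for _ in range(n))
|                for _ in range(trials)]
|       return statistics.pstdev(means)
|
|   s_small, s_large = spread_of_sample_mean(100), spread_of_sample_mean(1000)
|   print(s_small, s_large, s_small / s_large)   # ratio close to sqrt(10) ~ 3.16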
| jcranmer wrote:
| FWIW, the threshold I learned was 20 in each bucket, so now
| you have 3 answers.
| Keirmot wrote:
| I think both answers are referencing the Central Limit
| Theorem, that states [simplified] that once you get over 30
| samples for each independent variable, you will get a normal
| distribution.
| koolba wrote:
| Just apply it recursively. Let's get 100 samples of comments
| suggesting the number of samples to use. Then average those.
| dragontamer wrote:
| My statistics class at high school level taught the following:
|
| The number of samples you need is very difficult to calculate
| correctly, requiring deep analysis of standard deviations and
| variances.
|
| But surprisingly, you can simply know you've reached large number
| status when over 10 items exist in each category.
|
| ---------
|
| Ex: when doing a heads vs tails coin flip experiment, you likely
| have a large number once you have over 10 heads and over 10
| tails. No matter how biased the coin is.
|
| Or in this 'Lotto ticket' example, you have a large number of
| samples after gathering enough data to find over 10 Jackpot
| winners.
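|
| A rough sanity check of that heuristic (a sketch, assuming a coin
| biased to p(heads) = 0.3 and an arbitrary "ballpark" tolerance of
| 0.15): flipping until both categories have at least 10 hits usually
| puts the empirical frequency in the right ballpark.
|
|   import random
|
|   random.seed(2)
|   p_heads, in_ballpark, runs = 0.3, 0, 2000
|   for _ in range(runs):
|       heads = tails = 0
|       while heads < 10 or tails < 10:   # stop once every category has 10+
|           if random.random() < p_heads:
|               heads += 1
|           else:
|               tails += 1
|       estimate = heads / (heads + tails)
|       in_ballpark += abs(estimate - p_heads) < 0.15
|   print(f"estimate within 0.15 of the true bias: {in_ballpark / runs:.1%}")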
| jmount wrote:
| Very cool rule.
|
| I think you can justify it by approximating each category as an
| independent Poisson distribution. Then for each such processes
| the variance equals the mean. So once you have 10 successes in
| a bin, you have evidence of a probably good estimate for the
| arrival rate of that category. The book "The Probabilistic
| Method" calls a related idea "the Poisson paradigm."
|
| (10 is a nice round number where the standard deviation is below
| the mean)
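|
| The arithmetic behind that parenthetical: for a Poisson count the
| standard deviation is the square root of the mean, so the relative
| noise is 1/sqrt(mean), which drops to about a third of the mean
| around a count of 10.
|
|   import math
|
|   # Poisson(lam): standard deviation sqrt(lam), relative noise 1/sqrt(lam).
|   for count in (1, 5, 10, 100):
|       print(count, round(math.sqrt(count), 2), round(math.sqrt(count) / count, 2))
|   # 10 -> sd ~ 3.2, i.e. the count is known to roughly +-30%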
| jmount wrote:
| Small proviso: this is only true for a reasonable number of
| categories (or you run into repeated experiment problems).
| hgsgm wrote:
| What's a reasonable number of categories? 10?
| dredmorbius wrote:
| That's defined by the phenomenon you're investigating.
|
| In the case of six-sided dice, there are precisely six
| categories, ideally with even odds of occurrence. With
| the lottery jackpot given, there are eight categories,
| with highly asymmetric probabilities and values.
|
| In real-world cases, you might be trying to distinguish
| two cases (treatment and control in a medical
| experiment), between multiple particles or isotopes (say,
| with physics or chemistry), amongst different political
| divisions (countries, states or provinces, counties,
| cities, or other), between political parties or
| candidates (which raises interesting questions over which
| and/or how many to include in consideration, in turn
| dependent on voting procedures, overall popularity, and
| impacts of non-winning candidates or parties on others),
| on multiple products, or on different behavioural
| characteristics in some domain (e.g., highly-active,
| occasionally-active, and lurking participants in online
| fora).
|
| There are times when categories are well and
| unambiguously defined. Others in which where you choose
| to draw divisions (say, in generational groups, or wealth
| or income brackets) is highly arbitrary. Even where there
| are a large number of potential categories, choosing some
| limited number for specific analysis (2, 3, 5, 10, etc.)
| and lumping the remaining into "other" may provide
| clearer insights and fewer distractions than choosing a
| large number of divisions.[1] In other cases, a very
| small number of _individuals_ may account for an
| overwhelming majority of _activity_ or _outcome_. I'd
| strongly argue that in this case, the analysis might be
| somewhat poorly focused, and that activities and outcomes
| rather than individuals are of greater interest.[2]
|
| What's key is to _match your sampling and sample sizes to
| the phenomenon being studied_.
|
| ________________________________
|
| Notes:
|
| 1. Power law distribution / Zipf functions often mean
| that a very small number of participants has highly
| disproportionate impact or significance.
|
| 2. This is often the flip side of power law
| distributions. If we look at all book titles, there are a
| huge number of individual items to consider; there are
| roughly 300k annual English-language "traditional"
| publications, and over 1 million "nontraditional" (self-
| published, or publish-on-demand) titles. But if your
| focus is instead titles by percentage of revenue or
| number of sales, a top-n analysis (5, 10, 20, etc.) often
| captures much of the activity, frequently well over half.
| This is typical of any informational good: music, cinema,
| blogs, social media posts, etc.
| Aerroon wrote:
| My _intuition_ is telling me that a coin flip would be the
| worst case scenario, because I think it would be the
| easiest to hit 10 examples in both categories. Every other
| mix of probabilities I can come up with would average more
| rolls of RNG than a coinflip.
|
| Am I mistaken?
| fiddlerwoaroof wrote:
| Does this work the other way? I.e. "you have enough buckets if
| adding one more puts the number of samples in the smallest
| bucket below 10?"
| NelsonMinar wrote:
| That's a neat rule of thumb; is there a simple statistical
| argument for why 10 is the (not very large) number?
| tgv wrote:
| For heads or tails, that leaves a very large margin. In approx.
| 1 in 20 trials, you'll end up with a 10-20 split.
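|
| The exact binomial figure behind that (fair coin, 30 tosses): the
| chance of at most 10 heads, i.e. a split at least as lopsided as
| 10-20 in one direction, is about 1 in 20.
|
|   from math import comb
|
|   # P(at most 10 heads in 30 fair tosses)
|   n = 30
|   p = sum(comb(n, k) for k in range(11)) / 2 ** n
|   print(f"{p:.3f}")   # ~0.049, roughly 1 in 20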
| dragontamer wrote:
| Yeah, a 95% confidence level (or approximately two standard
| deviations) is pretty standard for statistical tests.
|
| You gotta draw the line somewhere. At high-school statistics
| level, it's basically universally drawn at the 95% confidence
| level. If you wanna draw new lines elsewhere, you gotta make
| new rules yourself and recalculate all the rules of thumb.
| User23 wrote:
| I remember my high school AP Psychology teacher mocking
| p=0.05 as practically meaningless. In retrospect it's funny
| for a psychologist to say that, but I guess it was because
| he was from the more empirically minded behaviorist
| cognitive school and from time to time they have done
| actual rigorous experiments[1] (in rodents).
|
| [1] For example as described by Feynman in Cargo Cult
| Science.
| tgv wrote:
| The observation above is simply true. If you toss a coin
| 30 times, there's about a 5% chance that you'll end up
| with a 10-20 ratio or one more extreme.
|
| NHST testing inverts the probability logic, makes the 5%
| holy, and skims over the high probability of finding
| something that is not equal to a specific value. That
| procedure is then used for theory confirmation, while it
| was (in another form) meant for falsification. Everything
| is wrong about it, even if the experimental method is
| flawless. Hence the reproducibility crisis.
| lisper wrote:
| The problem is two-fold:
|
| 1. p=0.05 means that one result in 20 is going to be the
| result of chance.
|
| 2. It's generally pretty easy (especially in psychology)
| to do 20 experiments, cherry-pick -- and publish! -- the
| p=0.05 result, and throw away the others.
|
| The result is that _published_ p=0.05 results are much
| _more_ likely than 1 in 20 to be the result of chance.
| dragontamer wrote:
| So run a meta-study upon the results published by a set
| of authors and double-check to make sure that their
| results are normally distributed across the p-values
| associated with their studies.
|
| These problems are solved problems in the scientific
| community. Just announce that regular meta-studies will
| be done, publish the expectation that authors' results be
| normally distributed, and publicly show off the meta-
| study.
|
| -------------
|
| In any case, the discussion point you're making is well
| beyond the high-school level needed for a general
| education. If someone needs to run their own experiment
| (A/B testing upon their website) and cannot afford a
| proper set of tests/statistics, they should instead rely
| upon high-school level heuristics to design their
| personal studies.
|
| This isn't a level of study about analyzing other
| people's results and finding flaws in other people's
| (possibly maliciously seeded) results. This is a
| heuristic about how to run your own experiments and how
| to prove something to yourself at a 95% confidence level.
| If you want to get published in the scientific community,
| the level of rigor is much higher of course, but no one
| tries to publish a scientific paper on just a high school
| education (which is where I was aiming my original
| comment at).
| User23 wrote:
| There's a professor of Human Evolutionary Biology at
| Harvard who only has a high school diploma[1]. Needless
| to say he's been published and cited many times over.
|
| [1] https://theconversation.com/profiles/louis-liebenberg-122680...
| withinboredom wrote:
| I don't know whether you're mocking them or being
| supportive of them or just stating a fact. Either way,
| education level has no bearing on subject knowledge. I
| know more about how computers, compilers, and software
| algorithms work than most post-docs and professors that
| I've run into in those subjects.
|
| Am I smarter than them? Nope. Do I know as many fancy big
| words as them? Nope. Do I care about results and
| communicating complex topics to normal people? Yep. Do I
| care more about making the company money than chasing
| some bug-bear to go on my resume? Yep.
|
| I fucking hate school and have no desire to ever go back.
| I can't put up with the bullshit, so I dropped out; I
| just never stopped studying and I don't need a piece of
| paper to affirm that fact.
| withinboredom wrote:
| To the people downvoting: at least offer a rebuttal.
| lisper wrote:
| First, I was specifically responding to this:
|
| > I remember my high school AP Psychology teacher mocking
| p=0.05 as practically meaningless.
|
| and trying to explain why the OP's teacher was probably
| right.
|
| Second:
|
| > So run a meta-study upon the results published by a set
| of authors and double-check to make sure that their
| results are normally distributed across the p-values
| associated with their studies.
|
| That won't work, especially if you only run the meta-
| study on published results because it is all but
| impossible to get negative results published. Authors
| don't need to cherry-pick, the peer-review system does it
| for them.
|
| > These problems are solved problems in the scientific
| community.
|
| No, they aren't. These are social and political problems,
| not mathematical ones. And the scientific community is
| pretty bad at solving those.
|
| > the discussion point you're making is well beyond the
| high-school level needed for a general education
|
| I strongly disagree. I think everyone needs to understand
| this so they can approach scientific claims with an
| appropriate level of skepticism. Understanding how the
| sausage is made is essential to understanding science.
|
| And BTW, I am not some crazy anti-vaxxer climate-change
| denialist flat-earther. I was an academic researcher for
| 15 years -- in a STEM field, not psychology, and even
| _that_ was sufficiently screwed up to make me change my
| career. I have advocated for science and the scientific
| method for decades. It's not science that's broken, it's
| the academic peer-review system, which is essentially
| unchanged since it was invented in the 19th century.
| _That_ is what needs to change. And that has nothing to
| do with math and everything to do with politics and
| economics.
| trashtester wrote:
| > It's not science that's broken, it's the academic peer-
| review system, which is essentially unchanged since it
| was invented in the 19th century.
|
| In my experience, it's not even this. Rather, it is that
| outside of STEM, very, very few people truly understand
| hypothesis testing.
|
| At least in my experience, even basic concepts such as
| "falsify the null hypothesis" are surprisingly hard, even
| for presumably intelligent people, such as MDs in PhD
| programmes.
|
| They will still tend to believe that a "significant"
| result is proof of an effect, and often even believe it
| proves that the effect is causal with the direction they
| prefer.
|
| At some point, stats just becomes a set of arcane
| conjurations for an entire field. At that point, the
| field as a whole tends to lose its ability to follow
| the scientific method and turns into something resembling
| a cult or clergy.
| lisper wrote:
| FWIW, I got through a Ph.D. program in CS without ever
| having to take a stats course. I took probability theory,
| which is related, but not the same thing. I had to figure
| out stats on my own. So yes, I think you're absolutely
| right, but it's not just "outside of STEM" -- sometimes
| it's inside of STEM too.
| thaumasiotes wrote:
| > and double-check to make sure that their results are
| normally distributed across the p-values associated with
| their studies
|
| What is the distribution of a set of results over a set
| of p-values?
|
| If you mean that you should check to make sure that the
| p-values themselves are normally distributed... wouldn't
| that be wrong? Assuming all hypotheses are false,
| p-values should be uniformly distributed. Assuming some
| hypotheses can sometimes be true, there's not a lot you
| can say about the appropriate distribution of p-values -
| it would depend on how often hypotheses are correct, and
| how strong the effects are.
| Viliam1234 wrote:
| > p=0.05 means that one result in 20 is going to be the
| result of chance.
|
| You made the same mistake most people make here: you
| reversed the arrow of the implication. It is not
| "successful experiment implies chance (probability 5%)"
| but "chance implies successful experiment (probability
| 5%)".
|
| What does that mean in practice? Imagine a hypothetical
| scientist that is fundamentally confused about something
| important, so _all_ hypotheses they generate are false.
| Yet, using p=0.05, 5% of those hypotheses will be
| "confirmed experimentally". In that case, it is not 5% of
| the "experimentally confirmed" hypotheses that are wrong
| -- it is full 100%. Even without any cherry-picking.
|
| The problem is not that p=0.05 is too high. The problem
| is, it doesn't actually mean what most people believe it
| means.
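|
| A throwaway simulation of that scenario (a sketch: treatment and
| control drawn from the same distribution, so every hypothesis is
| false, tested with a simple two-sample z-test at p < 0.05): about 5%
| of experiments still come out "significant", and every one of those
| is a false positive.
|
|   import random
|   import statistics
|   from math import sqrt
|
|   random.seed(3)
|   n, z_crit, significant, experiments = 50, 1.96, 0, 10_000
|   for _ in range(experiments):
|       control   = [random.gauss(0, 1) for _ in range(n)]
|       treatment = [random.gauss(0, 1) for _ in range(n)]   # no real effect
|       diff = statistics.fmean(treatment) - statistics.fmean(control)
|       z = diff / sqrt(2 / n)             # unit variance assumed known
|       significant += abs(z) > z_crit
|   print(f"'confirmed experimentally': {significant / experiments:.1%}")  # ~5%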
| majormajor wrote:
| > What does that mean in practice? Imagine a hypothetical
| scientist that is fundamentally confused about something
| important, so all hypotheses they generate are false.
| Yet, using p=0.05, 5% of those hypotheses will be
| "confirmed experimentally". In that case, it is not 5% of
| the "experimentally confirmed" hypotheses that are wrong
| -- it is full 100%. Even without any cherry-picking.
|
| Well, that example is also introducing dependence,
| which is a tricky thing of course whenever we talk about
| chance and stats.
|
| But there's also another issue - a statement like "5% of
| positive published results are by chance since we have a
| p<=0.05 standard" treats every set of results as if
| p=0.05, whereas some of them are considerably lower
| anyway. Though the point of bad actors cherry-picking to
| screw up the data also comes into play here.
|
| (And of course, fully independent things in life are much
| harder to find than one might think at first.)
| stkdump wrote:
| I agree that the point about the 'confused scientist' is
| important, even if that itself is not stated clearly
| enough. Here is my own reading:
|
| Imagine that a scientist is making experiments of the
| form: Does observable variable A correlate with
| observable variable B? Now imagine that there are
| billions of observable variables and almost all of them
| are not correlated. And imagine that there is no better
| way to come up with plausible correlations to test than
| randomly picking variables. Then it will take a very long
| time and a very large number of experiments to find a
| pair that is truly correlated. It will be inevitable that
| most positive results are bogus.
| lisper wrote:
| I think we're actually in violent agreement here, but I
| just wasn't precise enough. Let me try again:
| p=0.05 means that one POSITIVE result in 20 is going to
| be the result of chance and not causality
|
| In other words: if I have some kind of intervention or
| treatment, and that intervention or treatment produces
| some result in a test group relative to a control group
| with p=0.05, then the odds of getting that result simply
| by chance and not because the treatment or intervention
| actually had an effect are 5%.
|
| The practical effect of this is that there are two
| different ways of getting a p=0.05 result:
|
| 1. Find a treatment or intervention that actually works
| or
|
| 2. Test ~20 different (useless) interventions. Or test
| one useless intervention ~20 times.
|
| A single p=0.05 result in isolation is useless because
| there is no way to know which of the two methods produced
| it.
|
| This is why replication is so important. The odds of
| getting a p=0.05 result by chance is 5%. But the odds of
| getting TWO of them in sequential trials is 0.25%, and
| the odds of a positive result being the result of pure
| chance decrease exponentially with each subsequent
| replication.
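|
| The arithmetic behind that last paragraph, assuming the
| replications are independent and the effect really is pure chance:
|
|   # chance that k independent trials all give false positives at p = 0.05
|   for k in range(1, 5):
|       print(k, f"{0.05 ** k:.6f}")   # 0.05, 0.0025, 0.000125, ...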
| thaumasiotes wrote:
| > Let me try again:
|
| > p=0.05 means that one POSITIVE result in 20 is going to
| be the result of chance and not causality
|
| No, you still didn't get it. In the example above, a full
| 100% of positive results, 20 out of every 20, are the
| result of chance and not causality.
|
| Your followup discussion is better, but your statement at
| the top doesn't work.
|
| (Note also that there is an interaction between
| p-threshold and sample size which guarantees that, if
| you're investigating an effect that your sample size is
| not large enough to detect, any statistically significant
| result that you get will be several times stronger than
| the actual effect. They're also quite likely to have the
| wrong sign.)
| lisper wrote:
| > No, you still didn't get it. In the example above, a
| full 100% of positive results, 20 out of every 20, are
| the result of chance and not causality.
|
| Yep, you're right. I do think I understand this, but
| rendering it into words is turning out to be surprisingly
| challenging.
|
| Let me try this one more time: p=0.05 means that there is
| a 5% chance that any one particular positive result is
| due to chance. If you test a false hypothesis repeatedly,
| or test multiple false hypotheses, then 5% of the time
| you will get false positives (at p=0.05).
|
| However...
|
| > Imagine a hypothetical scientist that is fundamentally
| confused about something important, so all hypotheses
| they generate are false. Yet, using p=0.05, 5% of those
| hypotheses will be "confirmed experimentally". In that
| case, it is not 5% of the "experimentally confirmed"
| hypotheses that are wrong -- it is full 100%.
|
| This is not wrong, but it's a little misleading because
| you are _presuming_ that all of the hypotheses being
| tested are false. If we're testing a hypothesis it's
| generally because we don't know whether or not it's true;
| we're trying to find out. That's why it's important to
| think of a positive result not as "confirmed
| experimentally" but rather as "not ruled out by this
| particular experimental result". It is only after failing
| to rule something out by _multiple_ experiments that we
| can start to call it "confirmed". And nothing is ever
| 100% confirmed -- at best it is "not ruled out by the
| evidence so far".
| hgsgm wrote:
| You can't simply ignore the base rate, even if you don't
| know it.
|
| In a purely random world, 5% of experiments are false
| positives, at p=0.05. None are true positives.
|
| In a well ordered world with brilliant hypotheses, there
| are no false positives.
|
| If more than 5% of experiments show positive results at
| p=0.05, some of them are probably true, so you can try to
| replicate them with lower p.
|
| p=0.05 is a filter for "worth trying to replicate" (but
| even that is modulated by cost of replication vs value of
| result).
|
| The crisis in science is largely that people confuse
| "publishable" with "probably true". Anything "probably
| better than random guessing" is publishable to help other
| researchers, but that doesn't mean it's probably true.
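|
| A sketch of the base-rate point via Bayes' rule (the 80% power and
| the base rates below are made-up illustrative numbers, not anything
| from the thread):
|
|   # P(hypothesis true | positive at p < 0.05) for different base rates
|   alpha, power = 0.05, 0.80          # assumed false-positive rate and power
|   for base_rate in (0.5, 0.1, 0.01):
|       p_positive = base_rate * power + (1 - base_rate) * alpha
|       print(f"base rate {base_rate:>4.0%} -> "
|             f"P(true | positive) = {base_rate * power / p_positive:.0%}")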
| lisper wrote:
| > p=0.05 is a filter for "worth trying to replicate"
|
| Yes, I think that is an excellent way to put it.
|
| > The crisis in science is largely that people confuse
| "publishable" with "probably true".
|
| I would put it slightly differently: people conflate
| "published in a top-tier peer-reviewed journal" with
| "true beyond reasonable dispute". They also conflate "not
| published in a top-tier peer-reviewed journal" with
| "almost certainly false."
|
| But I think we're in substantial agreement here.
| thaumasiotes wrote:
| > I do think I understand this, but rendering it into
| words is turning out to be surprisingly challenging.
|
| A p-value of .05 means that, under the assumption that
| the null hypothesis you specified is true, you just
| observed a result which lies at the 5th percentile of the
| outcome space, sorted along some metric (usually
| "extremity of outcome"). That is to say, out of all
| possible outcomes, only 5% of them are as "extreme" as,
| or more "extreme" than, the outcome you observed.
|
| It doesn't tell you anything about the odds that any
| result is due to chance. It tells you how often the null
| hypothesis gives you a result that is "similar", by some
| definition, to the result you observed.
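|
| A small concrete version of that definition (a sketch: the null is
| a fair coin over 30 tosses and "extremity" is the distance of the
| head count from 15):
|
|   from math import comb
|
|   def p_value(heads_observed, n=30):
|       # Probability, under the fair-coin null, of a head count at least
|       # as far from n/2 as the observed one (two-sided).
|       null_prob = lambda k: comb(n, k) / 2 ** n
|       dist = abs(heads_observed - n / 2)
|       return sum(null_prob(k) for k in range(n + 1) if abs(k - n / 2) >= dist)
|
|   print(round(p_value(20), 3))   # 0.099: a 20-10 split or worse, either way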
| lisper wrote:
| What do you think that "due to chance" _means_?
| Viliam1234 wrote:
| Do you know the difference between "if A then B" and "if
| B then A"?
|
| This is the same thing, but with probabilities: "if A,
| then 5% chance of B" and "if B, then 5% chance of A".
| Those are two very different things.
|
| p=0.05 means "if hypothesis is wrong, then 5% chance of
| published research". It does not mean "if published
| research, then 5% chance of wrong hypothesis"; but most
| people believe it does, including probably most
| scientists.
| jameshart wrote:
| Curious how people are 'applying' the Law of Large Numbers in a
| way that needs this advice to be tacked on?
|
| > Always keep the speed of convergence in mind when applying the
| law of large numbers.
|
| Any 'application' of the LLN basically amounts to replacing some
| probabilistic number derived from a bunch of random samples with
| the _expected value_ of that number... and tacking on 'for
| sufficiently large _n_' as a caveat to your subsequent
| conclusions.
|
| Figuring out whether, in practical cases, you will have a
| sufficiently large _n_ that the conclusion is valid is a
| necessary step in the analysis.
| LudwigNagasena wrote:
| > Figuring out whether, in practical cases, you will have a
| sufficiently large n that the conclusion is valid is a
| necessary step in the analysis.
|
| The econometrics textbook I studied has the word "asymptotic" in
| it more times than there are pages. Oftentimes it's impractical or
| even
| theoretically intractable to derive finite sample properties
| (and thus to answer when n is _really_ large enough).
| otabdeveloper4 wrote:
| About three fifty.
|
| (No, just joking. Actually 42 plus or minus.)
| gloryless wrote:
| This kind of intuition is why a high school level statistics or
| probability class seems so, so valuable. I know not everyone will
| use the math per se, but the concepts apply to everyday life and
| are really hard to just grasp without having been taught it at
| some point.
| zodmaner wrote:
| The sad thing is, having a mandatory high school level
| statistics & probability class alone is not enough, you'll also
| need a good curriculum and a competent teacher to go along with
| it. Otherwise, it wouldn't work: a bad curriculum taught badly
| by an unmotivated or unqualified teacher will almost always fail
| to teach the intuition or, even worse, alienate students from
| the material.
| Dylan16807 wrote:
| > This means that on average, we'll need a fifty million times
| larger sample for the sample average to be as close to the true
| average as in the case of dice rolls.
|
| This is "as close" in an absolute sense, right?
|
| If I take into account that the lottery value is 20x larger, and
| I'm targeting relative accuracy, then I need 2.5 million times as
| many samples?
| causality0 wrote:
| Am I the only one unreasonably annoyed that his graphs don't
| match the description of his rolls?
| alexb_ wrote:
| If you had a gambling game that was simply "heads or tails, even
| money", you would expect over a Large Number of trials that you
| would get 0. But once you observe exactly one trial, the expected
| value becomes +1 or -1 unit. We know this is always going to
| happen one way or the other. Why then, does the bell curve of
| "expected value" for this game not have two peaks, at 1 and -1?
| Why does it peak at 0 instead?
|
| I know I'm wrong about what I'm asking - I just want to
| know how I can derive that for myself.
| ineptech wrote:
| "The expected value of a random variable with a finite number
| of outcomes is a weighted average of all possible outcomes." --
| https://en.wikipedia.org/wiki/Expected_value
| alexb_ wrote:
| That makes sense, I was always thinking of it as "Given an
| infinite number of trials..."
| tnecniv wrote:
| That would be the frequentist interpretation. A Bayesian
| would say that probability is to be interpreted in belief
| that an outcome will occur. Neither is really right or
| wrong, it depends on what you're modeling. If we're
| analyzing some kind of heavily repeated task (e.g., a
| sordid night of us glued to the blackjack table where we
| play a lot of hands or data transmission over a noisy
| cable), a frequentist interpretation might make more sense.
| However if you're talking about the probability of a
| candidate winning an election, you could take a Bayesian
| view where the probability asserts a confidence in an
| outcome. A radical frequentist would take umbrage with an
| event that only happens once. However, I suppose, depending
| on your election rules and model (e.g., a direct
| democracy), you could interpret the election winner in a
| frequentist manner: the probability of winning is the rate
| at which people vote for the candidate. For a more
| complicated system I'm not sure the frequentist view is as
| easily justified.
|
| However to answer your question more directly, the expected
| value is just another name for the average or mean of a
| random variable. In this case, the variable is your profit.
| Assume we're betting a dollar per toss on coin flips and I
| win if it's heads (everyone knows heads always wins,
| right?). The expected value is probability of heads * 1 -
| probability of tails * 1. If the coin is fair, the
| probabilities are the same so the expected value is zero.
|
| Aside: sequences of random variables that are "fair bets"
| are called martingales and are incredibly useful. It's a
| fair bet because, given all prior knowledge of the value of
| the variable thus far, the expected value of the next value
| you witness is the current value of the variable. You could
| imagine looking at a history of stock values. Given all
| that information, it's a martingale (and thus a fair bet)
| if given that information your expected profit from
| investing is 0.
| ineptech wrote:
| Whether/when its better to think in terms of "X has a 37%
| chance of happening in a single trial" vs "If you ran a lot
| of trials, X would happen in 37% of them" is kind of a
| fraught topic that I can't say much about, but you might
| find this interesting:
| https://en.wikipedia.org/wiki/Probability_interpretations
| hddqsb wrote:
| There are a couple of confusions/ambiguities here.
|
| The Law of Large Numbers is about the _average_, so it's not
| relevant here (an average of +1 would mean you got heads _every
| single time_, which is extremely unlikely for large n).
|
| If you are looking at the _sum_, then the value depends on
| whether the number of trials (n) is even or odd. If n is odd,
| you would indeed get two peaks at 1 and -1, and you would
| _never_ get exactly 0. If n is even, you would get a peak at 0
| and you would never get exactly 1 or -1.
|
| The expected value (aka average) is a _number_, not a
| distribution. The expected value for the sum is 0 even when n
| is odd and you can't get exactly 0 -- that's just how the
| expected value works (in the same way that the "expected value"
| for the number of children in a family can be 2.5 even though a
| family can't have half a child). If you look at the
| _probability density function_ for a single trial, then it
| _does_ have two peaks at 1 and -1 (and is zero everywhere
| else).
|
| The curve you refer to might be the normal approximation
| (https://en.wikipedia.org/wiki/Binomial_distribution#Normal_a...).
| It's true that the normal approximation for the distribution of
| the sum in your gambling game has a peak at 0 even when n is
| odd and the sum can't be exactly 0. That's because the normal
| approximation is a _continuous_ approximation and it doesn't
| capture the discrete nature of the underlying distribution.
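|
| A quick simulation of the odd-n case described above (a sketch
| assuming 31 fair +-1 bets): the sum is never exactly 0, its most
| common values are +1 and -1, yet its average across many runs sits
| near the expected value of 0.
|
|   import random
|   from collections import Counter
|
|   random.seed(4)
|   n, trials = 31, 50_000
|   sums = [sum(random.choice((-1, 1)) for _ in range(n)) for _ in range(trials)]
|   counts = Counter(sums)
|   print("mean of the sum:", sum(sums) / trials)          # close to 0
|   print("most common sums:", counts.most_common(2))      # +1 and -1
|   print("runs where the sum was exactly 0:", counts[0])  # 0, since n is odd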
| bonoboTP wrote:
| You're on the right track. The only thing you're missing is
| that adding (averaging) two bell curve distributions that are
| offset by a little does not necessarily give a bimodal
| distribution. It will only be bimodal if the two unimodal
| distributions that you are adding are placed far enough away from
| each other.
|
| See this https://stats.stackexchange.com/questions/416204/why-is-a-mi...
| munchbunny wrote:
| The intuitive explanation is that the effect of a single sample
| on the average diminishes as you take more samples. So, hand-
| waving a bit, let's assume it's true that over a large number
| of trials you would expect the average to converge to 0. You
| just tossed a coin and got heads, so you're at +1. The average
| of (1 + 0*n)/(n+1) still goes to 0 as n grows bigger and
| bigger.
|
| That skips over the distinction between "average" and
| "probability distribution", but those are nuances are probably
| better left for a proof of the central limit theorem.
| pid-1 wrote:
| One way to gain intuition on why the LLN might work faster,
| slower or not at all is to write the mean equation in recursive
| form (What's the next expected value estimation, given a new
| sample and the current expected value estimation?).
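|
| A minimal sketch of that recursive form (one common way to write
| it): each new sample pulls the running estimate toward itself by a
| step of size 1/n, which is why early samples move the estimate a
| lot and later ones barely move it.
|
|   def running_mean(samples):
|       # Online mean: m_n = m_{n-1} + (x_n - m_{n-1}) / n
|       mean = 0.0
|       for n, x in enumerate(samples, start=1):
|           mean += (x - mean) / n
|           yield mean
|
|   print(list(running_mean([6, 1, 3, 2, 5, 4])))   # settles on 3.5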
| crdrost wrote:
| If the blog author is reading, some notes for improvement:
|
| - Your odds calculation is likely wrong. You assumed from the
| word "odds" that an odds ratio was meant (Odds=3 meaning "odds
| 3:1 against", corresponding to p=25%), but the phrase is
| "approximate odds 1 in X" (Odds=3 meaning "odds of 1 in 3 to
| win", i.e. 33%), and recalculating gives the remarkably exact
| expected value of $80, which seems intentional?
|
| - You phrase things in terms of variances, people will think more
| in terms of standard deviations. So 3.5 +- 1.7 vs $80 +- $12,526.
|
| - Note that you try to make a direct comparison between those two
| but the two are in fact incomparable. The most direct comparison
| might be to subtract 1 from the die roll and multiply by $32, so
| that you have a 1/6 chance of winning $0, 1/6 of winning $32, ...
| 1/6 of winning $160. So then we have $80 +- $55 vs $80 +-
| $12,526. Then instead of saying you'd need 50 _million_ more
| lottery tickets you'd actually say you need about 50 _thousand_
| more. This is closer to the "right ballpark" where you can tell
| that the whole lottery is expected to sell about 10,200,000
| tickets on a good day.
|
| - But where an article like this should really go is, "what are
| you using the numbers for?". In the case of the Texas lottery
| this is actually a strong constraint, they have to make sure that
| they make a "profit" (like, it's not a real profit, it probably
| goes to schools or something) on most lotteries, so you're
| actually trying to ensure that 5 sigma or so is less than the
| bias. So you've got a competition between $20 * _n_ and 5 *
| $12,526 * √(_n_), or √(_n_) = 12526/4, _n_ = 9.8 million.
| So that's what the Texas Lottery is targeting, right? So then we
| would calculate that the equivalent number of people that should
| play in the "roll a die linear lottery" we've constructed is 187,
| call it an even 200, if 200 people pay $100 for a lottery ticket
| on the linear lottery then we can pretty much always pay out even
| on a really bad day.
|
| - So the 50,000x number that is actually correct is basically
| just saying that we can run a much smaller lottery, 50,000 times
| smaller, with that payoff structure. And there's something nice
| about phrasing it this way.
|
| - To really get "law of large numbers" we should _actually_
| probably be looking at how much these distributions deviate from
| Gaussian, rather than complaining that the Gaussian is too wide?
| You can account for a wide Gaussian in a number of ways. But
| probably we want to take the cube root of the 3rd cumulant, for
| example, try to argue when it "vanishes"? Except given the
| symmetry the 3rd cumulant for the die is probably 0 so you might
| need to go out to the 4th cumulant for the die -- and this might
| give a better explanation for the die converging more rapidly in
| "shape" to the mean, it doesn't just come close faster, it also
| becomes a Gaussian significantly faster because the payoff
| structure is symmetric about the mean.
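|
| A quick check of the 5-sigma arithmetic above, taking the parent's
| $20 margin per ticket and $12,526 standard deviation at face value:
|
|   # solve 20 * n = 5 * 12526 * sqrt(n)  =>  sqrt(n) = 5 * 12526 / 20
|   margin, sigma, k = 20, 12_526, 5
|   sqrt_n = k * sigma / margin
|   print(f"sqrt(n) = {sqrt_n:.1f}, tickets n = {sqrt_n ** 2 / 1e6:.1f} million")
|   # sqrt(n) = 3131.5, n ~ 9.8 million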
| nathell wrote:
| Tangential: syntax-highlighting math! This is the first time I've
| seen it. Not yet sure what I think about it, but I can definitely
| see the allure.
| rendaw wrote:
| Pedant-man on the scene: this is just highlighting since the
| highlighting isn't derived from syntax.
| crazygringo wrote:
| Which makes me wonder what _that_ would look like and would
| it be helpful?
|
| But there's already such complex and varied typography in
| math I wonder if it would be kind of redundant. E.g. you
| don't need matching parentheses to be colored when they
| already come in different sets of matching heights.
| andrewprock wrote:
| It's been around for a long time (centuries?). In most
| textbooks, you'll get different semantics for italic and bold
| faces. Modern textbooks with color printing often use color in
| semantic ways.
| Tachyooon wrote:
| It's easy on the eyes and it can make reading lots of equations
| less awkward if done correctly. I remember finding out this was
| possible while I was working on an assignment in LaTeX - it
| looked amazing.
|
| It takes a little bit of work to colour in equations but I hope
| more people start doing it (including me, I'd forgotten about
| it for a while)
| tetha wrote:
| Yeah, I like it. I used this as a tutor in more finicky
| exercises when it becomes really important to keep 2-3 very
| similar, but different things apart. It takes a bit of
| dexterity, but you can switch fluently between 3 different
| whiteboard markers held in one hand while writing, haha.
|
| I am kind of wondering if a semantic highlighting makes sense
| as well. You often end up with some implicit assignment of
| lowercase latin, uppercase latin, lowercase greek letters and
| such for certain meanings. Kinematic - xyzt for position in
| time, T_i(I_i) for the quaternion or transformation
| representing a certain joint of a robot.
| derbOac wrote:
| The biggest problem for real processes is knowing whether in fact
| x ~ i.i.d., with regard to time as well as individual
| observations.
___________________________________________________________________
(page generated 2023-09-13 23:02 UTC)