[HN Gopher] How large is that number in the Law of Large Numbers?
       ___________________________________________________________________
        
       How large is that number in the Law of Large Numbers?
        
       Author : sebg
       Score  : 181 points
       Date   : 2023-09-12 13:06 UTC (1 day ago)
        
 (HTM) web link (thepalindrome.org)
 (TXT) w3m dump (thepalindrome.org)
        
       | wodenokoto wrote:
       | I really like how the plots and graphics look. Is it the library
        | by 3blue1brown? (manim, I think it's called?)
        
       | bannedbybros wrote:
       | [dead]
        
       | blt wrote:
       | In learning theory there is focus on "non-asymptotic" results.
       | Instead of only showing that our method converges on the right
       | answer in the limit of infinite data, we must show _how fast_ it
       | converges.
        
       | yafbum wrote:
        | Stats class rule of thumb: if you need to calculate the relative
       | probability of two outcomes, you can get to within about 10% once
       | you get 100 samples of each outcome (so, need more samples
       | overall if the distribution is skewed).
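
        A rough justification plus a minimal sketch (Python; an added
        illustration, not part of the comment above): with about 100
        occurrences of an outcome, the Poisson/binomial noise on that count
        is about sqrt(100) = 10, i.e. roughly 10% relative error on its
        rate. The specific bias and seed below are just picked for
        illustration.

            import random

            # Estimate a probability from runs sized to contain ~100 of the
            # rarer outcome, then look at the typical relative error.
            random.seed(0)
            p_true = 0.3              # true probability of the rarer outcome
            n = int(100 / p_true)     # sample size giving ~100 such outcomes
            rel_errors = []
            for _ in range(2000):
                hits = sum(random.random() < p_true for _ in range(n))
                rel_errors.append(abs(hits / n - p_true) / p_true)
            rel_errors.sort()
            print("median relative error:", rel_errors[len(rel_errors) // 2])
            print("90th percentile:", rel_errors[int(0.9 * len(rel_errors))])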
        
         | gear54rus wrote:
          | It's interesting that even in this thread the 2 answers differ
         | by an order of magnitude lol
        
           | gipp wrote:
           | Eh it's really the same rule, just applying a different
           | threshold.
        
             | marcosdumay wrote:
             | The problem is that the sensitivity to the number growth is
             | supposed to be exponential. So if you need 100 samples for
             | "within 10% of the value", then 10 samples should give you
             | almost completely random behavior.
             | 
             | In reality, it depends on your actual distribution, but the
             | OP from this thread here is unreasonably conservative for
             | something described as a "rule of thumb". Almost always, if
             | you have at least 10 of every category, you can already
             | discover every interesting thing that a rule of thumb will
             | allow. And you probably could go with less. But if you want
             | precision, you can't get it with rules of thumb.
        
               | CaptainNegative wrote:
               | The dependence on sample size is not exponential, it's
               | sublinear. The heuristic rate of convergence to keep in
               | mind is the square root of the sample size, i.e. getting
               | 10x more samples shrinks the margin of error (in a
                | multiplicative sense) by sqrt(10) ≈ 3ish.
               | 
               | The exponential bit applies to the probability densities
               | as a function of the bounds themselves, i.e. how likely
               | you are to fall x units away from the mean typically
               | decreases exponentially with (some polynomial in) x.
               | 
               | Of course, this is all assuming a whole bunch of standard
               | conditions on the data you're looking at (independence,
               | identically distributed, bounded variance, etc.) and may
               | not hold if these are violated.
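
                A minimal numerical sketch of that square-root scaling
                (Python with NumPy; my own illustration, not from the
                comment above):

                    import numpy as np

                    # Spread of the sample mean of fair-die rolls at a few
                    # sample sizes. The standard error should fall roughly
                    # like 1/sqrt(n): 10x more samples, ~3.2x less spread.
                    rng = np.random.default_rng(0)
                    for n in (100, 1_000, 10_000):
                        rolls = rng.integers(1, 7, size=(1_000, n))
                        means = rolls.mean(axis=1)
                        theory = np.sqrt(35 / 12) / np.sqrt(n)
                        print(f"n={n:>6}: empirical {means.std():.4f}, "
                              f"theory {theory:.4f}")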
        
               | [deleted]
        
           | jcranmer wrote:
           | FWIW, the threshold I learned was 20 in each bucket, so now
           | you have 3 answers.
        
           | Keirmot wrote:
           | I think both answers are referencing the Central Limit
            | Theorem, which states [simplified] that once you get over 30
           | samples for each independent variable, you will get a normal
           | distribution.
        
           | koolba wrote:
           | Just apply it recursively. Let's get 100 samples of comments
           | suggesting the number of samples to use. Then average those.
        
       | dragontamer wrote:
       | My statistics class at high school level taught the following:
       | 
       | The number of samples you need is very difficult to calculate
       | correctly, requiring deep analysis of standard deviations and
       | variances.
       | 
       | But surprisingly, you can simply know you've reached large number
       | status when over 10 items exist in each category.
       | 
       | ---------
       | 
       | Ex: when doing a heads vs tails coin flip experiment, you likely
       | have a large number once you have over 10 heads and over 10
       | tails. No matter how biased the coin is.
       | 
       | Or in this 'Lotto ticket' example, you have a large number of
       | samples after gathering enough data to find over 10 Jackpot
       | winners.
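
        A quick simulation of that stopping rule (my own sketch, not from
        the comment above): flip a biased coin until both heads and tails
        have shown up at least 10 times, then see how far the observed
        frequency is from the truth. The bias and seed are arbitrary choices
        for illustration.

            import random

            # "Sample until every category has at least 10 hits", tried on a
            # biased coin. Ignores the small bias introduced by the
            # sequential stopping rule; this is only a rough check.
            random.seed(1)
            p_heads = 0.8
            rel_errors = []
            for _ in range(2000):
                heads = tails = 0
                while heads < 10 or tails < 10:
                    if random.random() < p_heads:
                        heads += 1
                    else:
                        tails += 1
                p_hat = heads / (heads + tails)
                rel_errors.append(abs(p_hat - p_heads) / p_heads)
            rel_errors.sort()
            print("median relative error:", rel_errors[len(rel_errors) // 2])
            print("90th percentile:", rel_errors[int(0.9 * len(rel_errors))])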
        
         | jmount wrote:
         | Very cool rule.
         | 
         | I think you can justify it by approximating each category as an
         | independent Poisson distribution. Then for each such processes
         | the variance equals the mean. So once you have 10 successes in
         | a bin, you have evidence of a probably good estimate for the
         | arrival rate of that category. The book "The Probabilistic
         | Method" calls a related idea "the Poisson paradigm."
         | 
          | (10 is a nice round number where the standard deviation is below
         | the mean)
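
          A tiny worked version of that Poisson argument (my own sketch):
          with k hits in a bin the standard deviation is about sqrt(k), so
          the relative uncertainty is about 1/sqrt(k); k = 10 is roughly
          where it drops to about a third of the mean.

              from math import sqrt

              # Poisson-style counts: variance ~ mean, so a bin with k hits
              # has a relative uncertainty of roughly 1/sqrt(k).
              for k in (1, 3, 10, 30, 100):
                  print(f"k={k:>3}: sd ~ {sqrt(k):5.2f}, "
                        f"relative uncertainty ~ {1 / sqrt(k):.0%}")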
        
           | jmount wrote:
           | Small proviso: this is only true for a reasonable number of
           | categories (or you run into repeated experiment problems).
        
             | hgsgm wrote:
             | What's a reasonable number of categories? 10?
        
               | dredmorbius wrote:
               | That's defined by the phenomenon you're investigating.
               | 
               | In the case of six-sided dice, there are precisely six
               | categories, ideally with even odds of occurrence. With
               | the lottery jackpot given, there are eight categories,
               | with highly asymmetric probabilities and values.
               | 
               | In real-world cases, you might be trying to distinguish
               | two cases (treatment and control in a medical
               | experiment), between multiple particles or isotopes (say,
               | with physics or chemistry), amongst different political
               | divisions (countries, states or provinces, counties,
               | cities, or other), between political parties or
               | candidates (which raises interesting questions over which
               | and/or how many to include in consideration, in turn
               | dependent on voting procedures, overall popularity, and
               | impacts of non-winning candidates or parties on others),
               | on multiple products, or on different behavioural
               | characteristics in some domain (e.g., highly-active,
               | occasionally-active, and lurking participants in online
               | fora).
               | 
               | There are times when categories are well and
               | unambiguously defined. Others in which where you choose
               | to draw divisions (say, in generational groups, or wealth
               | or income brackets) is highly arbitrary. Even where there
               | are a large number of potential categories, choosing some
               | limited number for specific analysis (2, 3, 5, 10, etc.)
               | and lumping the remaining into "other" may provide
               | clearer insights and fewer distractions than choosing a
               | large number of divisions.[1] In other cases, a very
               | small number of _individuals_ may account for an
                | overwhelming majority of _activity_ or _outcome_. I'd
               | strongly argue that in this case, the analysis might be
               | somewhat poorly focused, and that activities and outcomes
               | rather than individuals are of greater interest.[2]
               | 
               | What's key is to _match your sampling and sample sizes to
               | the phenomenon being studied_.
               | 
               | ________________________________
               | 
               | Notes:
               | 
               | 1. Power law distribution / Zipf functions often mean
               | that a very small number of participants has highly
               | disproportionate impact or significance.
               | 
               | 2. This is often the flip side of power law
               | distributions. If we look at all book titles, there are a
               | huge number of individual items to consider; there are
               | roughly 300k annual English-language "traditional"
               | publications, and over 1 million "nontraditional" (self-
               | published, or publish-on-demand) titles. But if your
               | focus is instead titles by percentage of revenue or
               | number of sales, a top-n analysis (5, 10, 20, etc.) often
               | captures much of the activity, frequently well over half.
               | This is typical of any informational good: music, cinema,
               | blogs, social media posts, etc.
        
             | Aerroon wrote:
             | My _intuition_ is telling me that a coin flip would be the
             | worst case scenario, because I think it would be the
             | easiest to hit 10 examples in both categories. Every other
             | mix of probabilities I can come up with would average more
             | rolls of RNG than a coinflip.
             | 
             | Am I mistaken?
        
         | fiddlerwoaroof wrote:
         | Does this work the other way? I.e. "you have enough buckets if
         | adding one more puts the number of samples in the smallest
         | bucket below 10?"
        
         | NelsonMinar wrote:
         | That's a neat rule of thumb; is there a simple statistical
         | argument for why 10 is the (not very large) number?
        
         | [deleted]
        
         | tgv wrote:
         | For heads or tails, that leaves a very large margin. In approx.
         | 1 in 20 trials, you'll end up with a 10-20 split.
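
          An exact check of that figure (my own sketch): in 30 fair flips,
          the chance that one given side comes up 10 times or fewer is about
          4.9%, i.e. roughly 1 in 20; counting either side being that
          lopsided roughly doubles it.

              from math import comb

              # Exact tail probability for Binomial(30, 1/2).
              n = 30
              one_sided = sum(comb(n, k) for k in range(11)) / 2**n
              print(f"P(X <= 10)            = {one_sided:.3f}")    # ~0.049
              print(f"P(X <= 10 or X >= 20) = {2 * one_sided:.3f}")  # ~0.099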
        
           | dragontamer wrote:
           | Yeah, 95% confidence ratio (or approximately two standard
           | deviations) is pretty standard with regards to statistical
           | tests.
           | 
           | You gotta draw the line somewhere. At high-school statistics
           | level, its basically universally drawn at the 95% confidence
           | level. If you wanna draw new lines elsewhere, you gotta make
           | new rules yourself and recalculate all the rules of thumb.
        
             | User23 wrote:
             | I remember my high school AP Psychology teacher mocking
             | p=0.05 as practically meaningless. In retrospect it's funny
             | for a psychologist to say that, but I guess it was because
             | he was from the more empirically minded behaviorist
             | cognitive school and from time to time they have done
             | actual rigorous experiments[1] (in rodents).
             | 
             | [1] For example as described by Feynman in Cargo Cult
             | Science.
        
               | tgv wrote:
               | The observation above is simply true. If you toss a coin
               | 30 times, there's about a 5% chance that you'll end up
                | with a 10-20 ratio or one more extreme.
               | 
               | NHST testing inverts the probability logic, makes the 5%
               | holy, and skims over the high probability of finding
               | something that is not equal to a specific value. That
               | procedure is then used for theory confirmation, while it
               | was (in another form) meant for falsification. Everything
               | is wrong about it, even if the experimental method is
               | flawless. Hence the reproducibility crisis.
        
               | lisper wrote:
               | The problem is two-fold:
               | 
               | 1. p=0.05 means that one result in 20 is going to be the
               | result of chance.
               | 
               | 2. It's generally pretty easy (especially in psychology)
               | to do 20 experiments, cherry-pick -- and publish! -- the
               | p=0.05 result, and throw away the others.
               | 
               | The result is that _published_ p=0.05 results are much
               | _more_ likely than 1 in 20 to be the result of chance.
        
               | dragontamer wrote:
               | So run a meta-study upon the results published by a set
               | of authors and double-check to make sure that their
               | results are normally distributed across the p-values
               | associated with their studies.
               | 
               | These problems are solved problems in the scientific
               | community. Just announce that regular meta-studies will
                | be done, the expectation that authors' results be normally
                | distributed is published, and publicly show off the meta-
               | study.
               | 
               | -------------
               | 
               | In any case, the discussion point you're making is well
               | beyond the high-school level needed for a general
               | education. If someone needs to run their own experiment
               | (A/B testing upon their website) and cannot afford a
               | proper set of tests/statistics, they should instead rely
               | upon high-school level heuristics to design their
               | personal studies.
               | 
               | This isn't a level of study about analyzing other
               | people's results and finding flaws in other people's
               | (possibly maliciously seeded) results. This is a
               | heuristic about how to run your own experiments and how
               | to prove something to yourself at a 95% confidence level.
               | If you want to get published in the scientific community,
               | the level of rigor is much higher of course, but no one
               | tries to publish a scientific paper on just a high school
               | education (which is where I was aiming my original
                | comment).
        
               | User23 wrote:
               | There's a professor of Human Evolutionary Biology at
               | Harvard who only has a high school diploma[1]. Needless
               | to say he's been published and cited many times over.
               | 
               | [1] https://theconversation.com/profiles/louis-
               | liebenberg-122680...
        
               | withinboredom wrote:
               | I don't know whether you're mocking them or being
               | supportive of them or just stating a fact. Either way,
               | education level has no bearing on subject knowledge. I
               | know more about how computers, compilers, and software
               | algorithms work than most post-docs and professors that
               | I've run into in those subjects.
               | 
               | Am I smarter than them? Nope. Do I know as many fancy big
               | words as them? Nope. Do I care about results and
               | communicating complex topics to normal people? Yep. Do I
               | care more about making the company money than chasing
               | some bug-bear to go on my resume? Yep.
               | 
               | I fucking hate school and have no desire to ever go back.
               | I can't put up with the bullshit, so I dropped out; I
               | just never stopped studying and I don't need a piece of
               | paper to affirm that fact.
        
               | withinboredom wrote:
                | To the people downvoting, at least offer a rebuttal.
        
               | lisper wrote:
               | First, I was specifically responding to this:
               | 
               | > I remember my high school AP Psychology teacher mocking
               | p=0.05 as practically meaningless.
               | 
               | and trying to explain why the OP's teacher was probably
               | right.
               | 
               | Second:
               | 
               | > So run a meta-study upon the results published by a set
               | of authors and double-check to make sure that their
               | results are normally distributed across the p-values
               | associated with their studies.
               | 
               | That won't work, especially if you only run the meta-
               | study on published results because it is all but
               | impossible to get negative results published. Authors
               | don't need to cherry-pick, the peer-review system does it
               | for them.
               | 
               | > These problems are solved problems in the scientific
               | community.
               | 
               | No, they aren't. These are social and political problems,
               | not mathematical ones. And the scientific community is
               | pretty bad at solving those.
               | 
               | > the discussion point you're making is well beyond the
               | high-school level needed for a general education
               | 
               | I strongly disagree. I think everyone needs to understand
               | this so they can approach scientific claims with an
               | appropriate level of skepticism. Understanding how the
               | sausage is made is essential to understanding science.
               | 
               | And BTW, I am not some crazy anti-vaxxer climate-change
               | denialist flat-earther. I was an academic researcher for
               | 15 years -- in a STEM field, not psychology, and even
               | _that_ was sufficiently screwed up to make me change my
               | career. I have advocated for science and the scientific
                | method for decades. It's not science that's broken, it's
               | the academic peer-review system, which is essentially
               | unchanged since it was invented in the 19th century.
               | _That_ is what needs to change. And that has nothing to
               | do with math and everything to do with politics and
               | economics.
        
               | trashtester wrote:
               | > It's not science that's broken, it's the academic peer-
               | review system, which is essentially unchanged since it
               | was invented in the 19th century.
               | 
               | In my experience, it's not even this. Rather, it is that
               | outside of STEM, very, very few people truly understand
               | hypothesis testing.
               | 
                | At least in my experience, even basic concepts such as
                | "falsify the null hypothesis" are surprisingly hard, even
                | for presumably intelligent people, such as MDs in PhD
                | programmes.
               | 
               | They will still tend to believe that a "significant"
               | result is proof of an effect, and often even believe it
               | proves that the effect is causal with the direction they
               | prefer.
               | 
               | At some point, stats just becomes a set of arcane
               | conjurations for an entire field. At that point, the
                | field as a whole tends to lose its ability to follow
               | the scientific method and turns into something resembling
               | a cult or clergy.
        
               | lisper wrote:
               | FWIW, I got through a Ph.D. program in CS without ever
               | having to take a stats course. I took probability theory,
               | which is related, but not the same thing. I had to figure
               | out stats on my own. So yes, I think you're absolutely
               | right, but it's not just "outside of STEM" -- sometimes
               | it's inside of STEM too.
        
               | thaumasiotes wrote:
               | > and double-check to make sure that their results are
               | normally distributed across the p-values associated with
               | their studies
               | 
               | What is the distribution of a set of results over a set
               | of p-values?
               | 
               | If you mean that you should check to make sure that the
               | p-values themselves are normally distributed... wouldn't
               | that be wrong? Assuming all hypotheses are false,
               | p-values should be uniformly distributed. Assuming some
               | hypotheses can sometimes be true, there's not a lot you
               | can say about the appropriate distribution of p-values -
               | it would depend on how often hypotheses are correct, and
               | how strong the effects are.
        
               | Viliam1234 wrote:
               | > p=0.05 means that one result in 20 is going to be the
               | result of chance.
               | 
               | You made the same mistake most people make here: you
               | turned the arrow of the implication. It is not
               | "successful experiment implies chance (probability 5%)"
               | but "chance implies successful experiment (probability
               | 5%)".
               | 
               | What does that mean in practice? Imagine a hypothetical
               | scientist that is fundamentally confused about something
               | important, so _all_ hypotheses they generate are false.
               | Yet, using p=0.05, 5% of those hypotheses will be
               | "confirmed experimentally". In that case, it is not 5% of
               | the "experimentally confirmed" hypotheses that are wrong
               | -- it is full 100%. Even without any cherry-picking.
               | 
               | The problem is not that p=0.05 is too high. The problem
               | is, it doesn't actually mean what most people believe it
               | means.
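
                A small simulation of that "confused scientist" (a sketch of
                mine with assumed numbers, not from the comment above): every
                hypothesis tested is false, yet a bit under 5% of experiments
                come out "significant", and every one of those positives is
                spurious.

                    import random

                    # "Confused scientist": every hypothesis tested is false
                    # (each one claims a fair coin is biased towards heads),
                    # yet ~5% of experiments come out "significant", and all
                    # of those positives are spurious. 59+ heads out of 100
                    # fair flips corresponds to roughly p < 0.05 (one-sided).
                    random.seed(2)
                    experiments, flips, positives = 10_000, 100, 0
                    for _ in range(experiments):
                        heads = sum(random.random() < 0.5
                                    for _ in range(flips))
                        if heads >= 59:
                            positives += 1
                    print(f"false hypotheses 'confirmed': "
                          f"{positives / experiments:.1%}")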
        
               | majormajor wrote:
               | > What does that mean in practice? Imagine a hypothetical
               | scientist that is fundamentally confused about something
               | important, so all hypotheses they generate are false.
               | Yet, using p=0.05, 5% of those hypotheses will be
               | "confirmed experimentally". In that case, it is not 5% of
               | the "experimentally confirmed" hypotheses that are wrong
               | -- it is full 100%. Even without any cherry-picking.
               | 
                | Well, that example is also introducing dependence,
               | which is a tricky thing of course whenever we talk about
               | chance and stats.
               | 
               | But there's also another issue - a statement like "5% of
               | positive published results are by chance since we have a
               | p<=0.05 standard" treats every set of results as if
               | p=0.05, whereas some of them are considerably lower
               | anyway. Though the point of bad actors cherry-picking to
               | screw up the data also comes into play here.
               | 
               | (And of course, fully independent things in life are much
               | harder to find than one might think at first.)
        
               | stkdump wrote:
               | I agree that the point about the 'confused scientist' is
               | important, even if that itself is not stated clearly
               | enough. Here is my own reading:
               | 
               | Imagine that a scientist is making experiments of the
               | form: Does observable variable A correlate with
               | observable variable B? Now imagine that there are
               | billions of observable variables and almost all of them
               | are not correlated. And imagine that there is no better
               | way to come up with plausible correlations to test than
               | randomly picking variables. Then it will take a very long
               | time and a very large number of experiments to find a
               | pair that is truly correlated. It will be inevitable that
               | most positive results are bogus.
        
               | lisper wrote:
               | I think we're actually in violent agreement here, but I
               | just wasn't precise enough. Let me try again:
               | p=0.05 means that one POSITIVE result in 20 is going to
               | be the result of chance and not causality
               | 
               | In other words: if I have some kind of intervention or
               | treatment, and that intervention or treatment produces
               | some result in a test group relative to a control group
               | with p=0.05, then the odds of getting that result simply
               | by chance and not because the treatment or intervention
               | actually had an effect are 5%.
               | 
               | The practical effect of this is that there are two
               | different ways of getting a p=0.05 result:
               | 
               | 1. Find a treatment or intervention that actually works
               | or
               | 
               | 2. Test ~20 different (useless) interventions. Or test
               | one useless intervention ~20 times.
               | 
               | A single p=0.05 result in isolation is useless because
               | there is no way to know which of the two methods produced
               | it.
               | 
               | This is why replication is so important. The odds of
               | getting a p=0.05 result by chance is 5%. But the odds of
               | getting TWO of them in sequential trials is 0.25%, and
               | the odds of a positive result being the result of pure
               | chance decrease exponentially with each subsequent
               | replication.
        
               | thaumasiotes wrote:
               | > Let me try again:
               | 
               | > p=0.05 means that one POSITIVE result in 20 is going to
               | be the result of chance and not causality
               | 
               | No, you still didn't get it. In the example above, a full
               | 100% of positive results, 20 out of every 20, are the
               | result of chance and not causality.
               | 
               | Your followup discussion is better, but your statement at
               | the top doesn't work.
               | 
               | (Note also that there is an interaction between
               | p-threshold and sample size which guarantees that, if
               | you're investigating an effect that your sample size is
               | not large enough to detect, any statistically significant
               | result that you get will be several times stronger than
               | the actual effect. They're also quite likely to have the
               | wrong sign.)
        
               | lisper wrote:
               | > No, you still didn't get it. In the example above, a
               | full 100% of positive results, 20 out of every 20, are
               | the result of chance and not causality.
               | 
               | Yep, you're right. I do think I understand this, but
               | rendering it into words is turning out to be surprisingly
               | challenging.
               | 
               | Let me try this one more time: p=0.05 means that there is
               | a 5% chance that any one particular positive result is
               | due to chance. If you test a false hypothesis repeatedly,
               | or test multiple false hypotheses, then 5% of the time
               | you will get false positives (at p=0.05).
               | 
               | However...
               | 
               | > Imagine a hypothetical scientist that is fundamentally
               | confused about something important, so all hypotheses
               | they generate are false. Yet, using p=0.05, 5% of those
               | hypotheses will be "confirmed experimentally". In that
               | case, it is not 5% of the "experimentally confirmed"
               | hypotheses that are wrong -- it is full 100%.
               | 
               | This is not wrong, but it's a little misleading because
               | you are _presuming_ that all of the hypotheses being
                | tested are false. If we're testing a hypothesis it's
               | generally because we don't know whether or not it's true;
               | we're trying to find out. That's why it's important to
               | think of a positive result not as "confirmed
               | experimentally" but rather as "not ruled out by this
               | particular experimental result". It is only after failing
               | to rule something out by _multiple_ experiments that we
                | can start to call it "confirmed". And nothing is ever
               | 100% confirmed -- at best it is "not ruled out by the
               | evidence so far".
        
               | hgsgm wrote:
               | You can't simply ignore the base rate, even if you don't
               | know it.
               | 
               | In a purely random world, 5% of experiments are false
               | positives, at p=0.05. None are true positives.
               | 
               | In a well ordered world with brilliant hypotheses, there
               | are no false positives.
               | 
               | If more than 5% of experiments show positive results at
               | p=0.05, some of them are probably true, so you can try to
               | replicate them with lower p.
               | 
               | p=0.05 is a filter for "worth trying to replicate" (but
               | even that is modulated by cost of replication vs value of
               | result).
               | 
               | The crisis in science is largely that people confuse
               | "publishable" with "probably true". Anything "probably
                | better than random guessing" is publishable to help other
               | researchers, but that doesn't mean it's probably true.
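
                A short base-rate sketch along these lines (my own
                illustration; the alpha and power figures are assumptions,
                not from the comment):

                    # Share of p < 0.05 positives that are real, vs the base
                    # rate of true hypotheses. Assumes alpha = 0.05 and an
                    # (optimistic) 80% power for real effects -- both numbers
                    # are assumptions for illustration.
                    alpha, power = 0.05, 0.8
                    for base_rate in (0.0, 0.01, 0.1, 0.5):
                        true_pos = base_rate * power
                        false_pos = (1 - base_rate) * alpha
                        ppv = true_pos / (true_pos + false_pos)
                        print(f"base rate {base_rate:4.0%}: "
                              f"real positives ~ {ppv:.0%}")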
        
               | lisper wrote:
               | > p=0.05 is a filter for "worth trying to replicate"
               | 
               | Yes, I think that is an excellent way to put it.
               | 
               | > The crisis in science is largely that people confuse
               | "publishable" with "probably true".
               | 
               | I would put it slightly differently: people conflate
               | "published in a top-tier peer-reviewed journal" with
               | "true beyond reasonable dispute". They also conflate "not
               | published in a top-tier peer-reviewed journal" with
               | "almost certainly false."
               | 
               | But I think we're in substantial agreement here.
        
               | thaumasiotes wrote:
               | > I do think I understand this, but rendering it into
               | words is turning out to be surprisingly challenging.
               | 
               | A p-value of .05 means that, under the assumption that
               | the null hypothesis you specified is true, you just
               | observed a result which lies at the 5th percentile of the
               | outcome space, sorted along some metric (usually
               | "extremity of outcome"). That is to say, out of all
               | possible outcomes, only 5% of them are as "extreme" as,
               | or more "extreme" than, the outcome you observed.
               | 
               | It doesn't tell you anything about the odds that any
               | result is due to chance. It tells you how often the null
               | hypothesis gives you a result that is "similar", by some
               | definition, to the result you observed.
        
               | lisper wrote:
               | What do you think that "due to chance" _means_?
        
               | Viliam1234 wrote:
               | Do you know the difference between "if A then B" and "if
               | B then A"?
               | 
               | This is the same thing, but with probabilities: "if A,
               | then 5% chance of B" and "if B, then 5% chance of A".
               | Those are two very different things.
               | 
               | p=0.05 means "if hypothesis is wrong, then 5% chance of
               | published research". It does not mean "if published
               | research, then 5% chance of wrong hypothesis"; but most
               | people believe it does, including probably most
               | scientists.
        
         | [deleted]
        
       | jameshart wrote:
       | Curious how people are 'applying' the Law of Large Numbers in a
       | way that needs this advice to be tacked on?
       | 
       | > Always keep the speed of convergence in mind when applying the
       | law of large numbers.
       | 
       | Any 'application' of the LLN basically amounts to replacing some
       | probalistic number derived from a bunch of random samples with
       | the _expected value_ of that number... and tacking on 'for
       | sufficiently large _n_ ' as a caveat to your subsequent
       | conclusions.
       | 
       | Figuring out whether, in practical cases, you will have a
       | sufficiently large _n_ that the conclusion is valid is a
       | necessary step in the analysis.
        
         | LudwigNagasena wrote:
         | > Figuring out whether, in practical cases, you will have a
         | sufficiently large n that the conclusion is valid is a
         | necessary step in the analysis.
         | 
         | The econometrics textbook I studied has more words "asymptotic"
         | in it than there are pages. Oftentimes it's impractical or even
         | theoretically intractable to derive finite sample properties
         | (and thus to answer when n is _really_ large enough).
        
       | otabdeveloper4 wrote:
       | About three fifty.
       | 
       | (No, just joking. Actually 42 plus or minus.)
        
       | gloryless wrote:
       | This kind of intuition is why a high school level statistics or
       | probability class seems so so valuable. I know not everyone will
       | use the math per se, but the concepts apply to everyday life and
       | are really hard to just grasp without having been taught it at
       | some point.
        
         | zodmaner wrote:
         | The sad thing is, having a mandatory high school level
         | statistics & probability class alone is not enough, you'll also
         | need a good curriculum and a competent teacher to go along with
         | it. Otherwise, it wouldn't work: a bad curriculum taught badly
          | by an unmotivated or unqualified teacher will almost always fail
          | to teach the intuition, or, even worse, alienate students from
          | the material.
        
       | Dylan16807 wrote:
       | > This means that on average, we'll need a fifty million times
       | larger sample for the sample average to be as close to the true
       | average as in the case of dice rolls.
       | 
       | This is "as close" in an absolute sense, right?
       | 
       | If I take into account that the lottery value is 20x larger, and
       | I'm targeting relative accuracy, then I need 2.5 million times as
       | many samples?
        
       | causality0 wrote:
       | Am I the only one unreasonably annoyed that his graphs don't
       | match the description of his rolls?
        
       | alexb_ wrote:
       | If you had a gambling game that was simply "heads or tails, even
       | money", you would expect over a Large Number of trials that you
       | would get 0. But once you observe exactly one trial, the expected
        | value becomes +1 or -1 unit. We know this is always going to
       | happen one way or the other. Why then, does the bell curve of
       | "expected value" for this game not have two peaks, at 1 and -1?
       | Why does it peak at 0 instead?
       | 
       | What I'm asking about, I know I'm wrong about - I just want to
       | know how I can derive that for myself.
        
         | ineptech wrote:
         | "The expected value of a random variable with a finite number
         | of outcomes is a weighted average of all possible outcomes." --
         | https://en.wikipedia.org/wiki/Expected_value
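
          A worked one-liner for the coin game above (an added illustration,
          not part of the quoted definition): the expected value is the
          probability-weighted average of the outcomes, 0.5 * (+1) + 0.5 *
          (-1) = 0, even though no single play can actually end at 0.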
        
           | alexb_ wrote:
           | That makes sense, I was always thinking of it as "Given an
           | infinite number of trials..."
        
             | tnecniv wrote:
             | That would be the frequentist interpretation. A Bayesian
             | would say that probability is to be interpreted in belief
             | that an outcome will occur. Neither is really right or
             | wrong, it depends on what you're modeling. If we're
             | analyzing some kind of heavily repeated task (e.g., a
             | sordid night of us glued to the blackjack table where we
             | play a lot of hands or data transmission over a noisy
              | cable), a frequentist interpretation might make more sense.
             | However if you're talking about the probability of a
             | candidate winning an election, you could take a Bayesian
             | view where the probability asserts a confidence in an
             | outcome. A radical frequentist would take umbrage with an
             | event that only happens once. However, I suppose, depending
             | on your election rules and model (e.g., a direct
             | democracy), you could interpret the election winner in a
             | frequentist manner: the probability of winning is the rate
             | at which people vote for the candidate. For a more
             | complicated system I'm not sure the frequentist view is as
             | easily justified.
             | 
             | However to answer your question more directly, the expected
             | value is just another name for the average or mean of a
             | random variable. In this case, the variable is your profit.
             | Assume we're betting a dollar per toss on coin flips and I
             | win if it's heads (everyone knows heads always wins,
             | right?). The expected value is probability of heads * 1 -
             | probability of tails * 1. If the coin is fair, the
             | probabilities are the same so the expected value is zero.
             | 
             | Aside: sequences of random variables that are "fair bets"
             | are called martingales and are incredibly useful. It's a
             | fair bet because, given all prior knowledge of the value of
             | the variable thus far, the expected value of the next value
             | you witness is the current value of the variable. You could
              | imagine looking at a history of stock values: the price is a
              | martingale (and thus a fair bet) if, given all that
              | information, your expected profit from investing is 0.
        
             | ineptech wrote:
             | Whether/when its better to think in terms of "X has a 37%
             | chance of happening in a single trial" vs "If you ran a lot
             | of trials, X would happen in 37% of them" is kind of a
             | fraught topic that I can't say much about, but you might
             | find this interesting:
             | https://en.wikipedia.org/wiki/Probability_interpretations
        
         | hddqsb wrote:
         | There are a couple of confusions/ambiguities here.
         | 
         | The Law of Large Numbers is about the _average_ , so it's not
         | relevant here (an average of +1 would mean you got heads _every
         | single time_ , which is extremely unlikely for large n).
         | 
         | If you are looking at the _sum_ , then the value depends on
         | whether the number of trials (n) is even or odd. If n is odd,
         | you would indeed get two peaks at 1 and -1, and you would
         | _never_ get exactly 0. If n is even, you would get a peak at 0
         | and you would never get exactly 1 or -1.
         | 
         | The expected value (aka average) is a _number_ , not a
         | distribution. The expected value for the sum is 0 even when n
         | is odd and you can't get exactly 0 -- that's just how the
         | expected value works (in the same way that the "expected value"
         | for the number of children in a family can be 2.5 even though a
         | family can't have half a child). If you look at the
         | _probability density function_ for a single trial, then it
         | _does_ have two peaks at 1 and -1 (and is zero everywhere
         | else).
         | 
          | The curve you refer to might be the normal approximation
          | (https://en.wikipedia.org/wiki/Binomial_distribution#Normal_a...).
         | It's true that the normal approximation for the distribution of
         | the sum in your gambling game has a peak at 0 even when n is
         | odd and the sum can't be exactly 0. That's because the normal
          | approximation is a _continuous_ approximation and it doesn't
         | capture the discrete nature of the underlying distribution.
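
          A small exact computation along the lines of the comment above (my
          own sketch): for an odd number of flips the sum's distribution
          peaks at +1 and -1 and puts zero probability on 0, while for an
          even number it has a single peak at 0.

              from math import comb

              # Exact distribution of the sum of n fair +/-1 flips:
              # k heads out of n gives a sum of 2*k - n.
              def sum_distribution(n):
                  return {2 * k - n: comb(n, k) / 2**n
                          for k in range(n + 1)}

              for n in (9, 10):
                  dist = sum_distribution(n)
                  top = sorted(dist.items(), key=lambda kv: -kv[1])[:3]
                  print(f"n={n}:", [(s, round(p, 3)) for s, p in top])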
        
         | bonoboTP wrote:
         | You're on the right track. The only thing you're missing is
         | that adding (averaging) two bell curve distributions that are
         | offset by a little does not necessarily give a bimodal
          | distribution. It will only be bimodal if the two unimodal
          | distributions you are adding are placed far enough away from
          | each other.
         | 
         | See this https://stats.stackexchange.com/questions/416204/why-
         | is-a-mi...
        
         | munchbunny wrote:
         | The intuitive explanation is that the effect of a single sample
         | on the average diminishes as you take more samples. So, hand-
         | waving a bit, let's assume it's true that over a large number
         | of trials you would expect the average to converge to 0. You
         | just tossed a coin and got heads, so you're at +1. The average
         | of (1 + 0*n)/(n+1) still goes to 0 as n grows bigger and
         | bigger.
         | 
         | That skips over the distinction between "average" and
         | "probability distribution", but those are nuances are probably
         | better left for a proof of the central limit theorem.
        
       | pid-1 wrote:
       | One way to gain intuition on why the LLN might work faster,
        | slower or not at all is to write the mean equation in recursive
       | form (What's the next expected value estimation, given a new
       | sample and the current expected value estimation?).
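
        A minimal version of that recursive form (my own sketch): each new
        sample pulls the running mean towards itself by a step of size 1/n,
        which also hints at why one rare, huge sample (a jackpot) disturbs
        the estimate for a long time.

            import random

            # Running mean in recursive form:
            #   new_mean = old_mean + (x - old_mean) / n
            def running_mean(samples):
                mean = 0.0
                for n, x in enumerate(samples, start=1):
                    mean += (x - mean) / n
                    yield mean

            random.seed(3)
            rolls = (random.randint(1, 6) for _ in range(1000))
            estimates = list(running_mean(rolls))
            print(estimates[9], estimates[99], estimates[999])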
        
       | crdrost wrote:
       | If the blog author is reading, some notes for improvement:
       | 
       | - Your odds calculation is likely wrong. You assumed from the
       | word "odds" that "odds ratio" was meant, (Odds=3 meaning "odds
       | 3:1 against" corresponding to p=25%) but the phrase is
       | "approximate odds 1 in X" (Odds=3 meaning "odds of 1 in 3 to win"
       | meaning 33%) and recalculating results in the remarkably exact
       | expected value of $80 which seems intentional?
       | 
       | - You phrase things in terms of variances, people will think more
       | in terms of standard deviations. So 3.5 +- 1.7 vs $80 +- $12,526.
       | 
       | - Note that you try to make a direct comparison between those two
       | but the two are in fact incomparable. The most direct comparison
       | might be to subtract 1 from the die roll and multiply by $32, so
       | that you have a 1/6 chance of winning $0, 1/6 of winning $32, ...
       | 1/6 of winning $160. So then we have $80 +- $55 vs $80 +-
       | $12,526. Then instead of saying you'd need 50 _million_ more
        | lottery tickets you'd actually say you need about 50 _thousand_
        | more. This is closer to the "right ballpark" where you can tell
        | that the whole lottery is expected to sell about 10,200,000
        | tickets on a good day. (A quick numeric check of these figures
        | appears in the sketch after this comment.)
       | 
       | - But where an article like this should really go is, "what are
       | you using the numbers for?". In the case of the Texas lottery
       | this is actually a strong constraint, they have to make sure that
       | they make a "profit" (like, it's not a real profit, it probably
       | goes to schools or something) on most lotteries, so you're
       | actually trying to ensure that 5 sigma or so is less than the
       | bias. So you've got a competition between $20 * _n_ and 5 *
        | $12,526 * √( _n_ ), or √( _n_ ) = 12526/4, _n_ = 9.8 million.
        | So that's what the Texas Lottery is targeting, right? So then we
       | would calculate that the equivalent number of people that should
       | play in the "roll a die linear lottery" we've constructed is 187,
       | call it an even 200, if 200 people pay $100 for a lottery ticket
       | on the linear lottery then we can pretty much always pay out even
       | on a really bad day.
       | 
       | - So the 50,000x number that is actually correct is basically
       | just saying that we can run a much smaller lottery, 50,000 times
       | smaller, with that payoff structure. And there's something nice
       | about phrasing it this way.
       | 
       | - To really get "law of large numbers" we should _actually_
       | probably be looking at how much these distributions deviate from
       | Gaussian, rather than complaining that the Gaussian is too wide?
       | You can account for a wide Gaussian in a number of ways. But
        | probably we want to take the cube root of the 3rd cumulant, for
        | example, try to argue when it "vanishes"? Except given the
       | symmetry the 3rd cumulant for the die is probably 0 so you might
       | need to go out to the 4th cumulant for the die -- and this might
       | give a better explanation for the die converging more rapidly in
       | "shape" to the mean, it doesn't just come close faster, it also
       | becomes a Gaussian significantly faster because the payoff
       | structure is symmetric about the mean.
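
        A quick numeric check of the figures in the third bullet above (my
        own sketch): the "subtract 1, times $32" die lottery has mean $80
        and a standard deviation of about $55, and matching the real
        lottery's ~$12,526 standard deviation at equal precision takes on
        the order of (12526/55)^2 ≈ 50,000 times as many samples.

            from math import sqrt

            # Die lottery: prizes $0, $32, ..., $160, each with prob. 1/6.
            prizes = [32 * (face - 1) for face in range(1, 7)]
            mean = sum(prizes) / 6
            sd = sqrt(sum((p - mean) ** 2 for p in prizes) / 6)
            print(f"die lottery: mean ${mean:.0f}, sd ${sd:.2f}")
            # Sample sizes for equal precision scale with the variance:
            print(f"sample-size ratio: {(12526 / sd) ** 2:,.0f}")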
        
       | nathell wrote:
       | Tangential: syntax-highlighting math! This is the first time I've
       | seen it. Not yet sure what I think about it, but I can definitely
       | see the allure.
        
         | rendaw wrote:
         | Pedant-man on the scene: this is just highlighting since the
         | highlighting isn't derived from syntax.
        
           | crazygringo wrote:
           | Which makes me wonder what _that_ would look like and would
           | it be helpful?
           | 
           | But there's already such complex and varied typography in
           | math I wonder if it would be kind of redundant. E.g. you
           | don't need matching parentheses to be colored when they
           | already come in different sets of matching heights.
        
         | andrewprock wrote:
         | It's been around for a long time (centuries?). In most
         | textbooks, you'll get different semantics for italic and bold
         | faces. Modern textbooks with color printing often use color in
         | semantic ways.
        
         | Tachyooon wrote:
         | It's easy on the eyes and it can make reading lots of equations
         | less awkward if done correctly. I remember finding out this was
          | possible while I was working on an assignment in LaTeX - it
         | looked amazing.
         | 
         | It takes a little bit of work to colour in equations but I hope
         | more people start doing it (including me, I'd forgotten about
         | it for a while)
        
         | tetha wrote:
         | Yeah, I like it. I used this as a tutor in more finicky
         | exercises when it becomes really important to keep 2-3 very
         | similar, but different things apart. It takes a bit of
         | dexterity, but you can switch fluently between 3 different
         | whiteboard markers held in one hand while writing, haha.
         | 
         | I am kind of wondering if a semantic highlighting makes sense
         | as well. You often end up with some implicit assignment of
         | lowercase latin, uppercase latin, lowercase greek letters and
         | such for certain meanings. Kinematic - xyzt for position in
         | time, T_i(I_i) for the quaternion or transformation
         | representing a certain joint of a robot.
        
       | derbOac wrote:
       | The biggest problem for real processes is knowing whether in fact
       | x ~ i.i.d., with regard to time as well as individual
       | observations.
        
       ___________________________________________________________________
       (page generated 2023-09-13 23:02 UTC)