[HN Gopher] Effect size is significantly more important than sta...
       ___________________________________________________________________
        
       Effect size is significantly more important than statistical
       significance
        
       Author : stochastician
       Score  : 189 points
       Date   : 2021-09-14 16:10 UTC (6 hours ago)
        
 (HTM) web link (www.argmin.net)
 (TXT) w3m dump (www.argmin.net)
        
       | ammon wrote:
       | But how much more important? :) Sorry, could not help myself.
        
       | [deleted]
        
       | kbrtalan wrote:
       | There's a whole book about this idea, Antifragile by Nassim
       | Taleb, highly recommended
        
       | abeppu wrote:
       | I think the weird thing is that a bunch of people in tech
       | understand this well _with respect to tech_, but often fall into
       | the same p-value trap when reading about science.
       | 
       | If you're working with very large datasets generated from e.g. a
       | huge number of interactions between users and your system,
       | whether as a correlation after the fact, or as an A/B experiment,
       | getting a statistically significant result is easy. Getting a
       | meaningful improvement is rarer, and gets harder after a system
       | has received a fair amount of work.
       | 
       | But then people who work in these big-data contexts can read
       | about a result outside their field (e.g. nutrition, psychology,
       | whatever), where n=200 undergrads or something, and p=0.03 (yay!)
       | and there's some pretty modest effect, and be taken in by
       | whatever claim is being made.
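        | 
        | (A toy illustration of that asymmetry, with entirely made-up
        | numbers and a plain two-proportion z-test: a 0.05-percentage-
        | point lift is practically nothing, but with a few million users
        | per arm it comes out wildly "significant".)
        | 
        |     import numpy as np
        |     from scipy.stats import norm
        | 
        |     n = 5_000_000                 # users per arm
        |     x_a, x_b = 200_000, 202_500   # conversions: 4.00% vs 4.05%
        |     p_a, p_b = x_a / n, x_b / n
        |     p_pool = (x_a + x_b) / (2 * n)
        |     se = np.sqrt(p_pool * (1 - p_pool) * 2 / n)
        |     z = (p_b - p_a) / se
        |     print(p_b - p_a, z, 2 * norm.sf(abs(z)))
        |     # lift ~ 0.0005, z ~ 4.0, p ~ 6e-5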
        
       | agnosticmantis wrote:
       | An investigator needs to rule out all conceivable ways their
       | modeling can go wrong, among them the possibility of a
       | statistical fluke, which statistical significance is supposed to
        | take care of. So statistical significance may best be thought of
        | as a necessary condition, but it is typically taken to be a
        | sufficient condition for publication. If I see a strange result
       | (p-value < 0.05), could it be because my functional form is
       | incorrect? or because I added/removed some data? Or I failed to
       | include an important variable? These are hard questions and not
       | amenable to algorithmic application and mass production.
       | Typically these questions are ignored, and only the possibility
       | of a statistical fluke is ruled out (which itself depends on the
       | other assumptions being valid).
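        | 
        | (A minimal simulated sketch of the "failed to include an
        | important variable" failure mode, nothing to do with the mask
        | study itself: x has no direct effect on y, both are driven by an
        | omitted confounder z, yet regressing y on x alone gives a tiny
        | p-value.)
        | 
        |     import numpy as np
        |     from scipy.stats import linregress
        | 
        |     rng = np.random.default_rng(1)
        |     z = rng.normal(size=2_000)        # omitted confounder
        |     x = z + rng.normal(size=2_000)    # no direct effect on y
        |     y = z + rng.normal(size=2_000)    # driven by z, not x
        |     fit = linregress(x, y)
        |     print(fit.slope, fit.pvalue)      # slope ~0.5, p << 0.05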
       | 
       | Dave Freedman's Statistical Models and Shoe Leather is a good
       | read on why such formulaic application of statistical modeling is
       | bound to fail.[0]
       | 
       | [0:https://psychology.okstate.edu/faculty/jgrice/psyc5314/Freed..
       | .]
        
       | jerf wrote:
        | Speaking not necessarily to this study in particular, I strongly
        | agree with the general point. Science has really been held back
        | by an over-focus on "significance". But I'm not really
       | interested in a pile of hundreds of thousands of studies that
       | establish a tiny effect with suspiciously-just-barely-significant
       | results. I'm interested in studies that reveal robust results
       | that are reliable enough to be built on to produce other results.
       | Results of 3% variations with p=0.046 aren't. They're dead ends,
       | because you can't put very many of those into the foundations of
       | future papers before the probability of one of your foundations
       | being incorrect is too large.
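        | 
        | (Back-of-the-envelope version of that compounding, assuming each
        | cited foundation is independently correct with probability 0.95:)
        | 
        |     for k in (1, 5, 10, 14, 20):
        |         print(k, round(0.95 ** k, 2))
        |     # 1 0.95 / 5 0.77 / 10 0.6 / 14 0.49 / 20 0.36
        |     # at ~14 foundations it's roughly a coin flip that
        |     # at least one of them is wrong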
       | 
       | To the extent that those are hard to come by... Yeah! They are!
       | Science is hard. Nobody promised this would be easy. Science
        | _shouldn't_ be something where labs are cranking out easy
       | 3%/p=0.046 papers all the time just to keep funding. It's just a
       | waste of money and time of our smartest people. It _should_ be
       | harder than it is now.
       | 
       | Too many proposals are obviously only going to be capable of
       | turning up that result (insufficient statistical power is often
       | obvious right in the proposal, if you take the time to work the
       | math). I'd rather see more wood behind fewer arrows, and see
       | fewer proposals chasing much more statistical power, than the
       | chaff of garbage we get now.
       | 
       | If I were King of Science, or at least, editor of a prestigious
       | journal, I'd want to put word out that I'm looking for papers
       | with at least one of some sort of _significant_ effect, or a p
        | value of something like p = 0.0001. Yeah. That's a high bar. I
       | know. That's the point.
       | 
       | "But jerf, isn't it still valuable to map out all the little
       | things like that?" No, it really isn't. We already have every
       | reason in the world to believe the world is _drenched_ in 1%
       | /p=0.05 effects. "Everything's correlated to everything", so
       | that's not some sort of amazing find, it's the totally expected
       | output of living in our reality. Really, this sort of stuff is
       | still just _below the noise floor_. Plus, the idea that we can
       | remove such small, noisy confounding factors is just silly. We
       | need to look for the things that stand out from that noise floor,
        | not spend billions of dollars doing the equivalent of
       | listening to our spirit guides communicate to us over white noise
       | from the radio.
        
         | naasking wrote:
         | > If I were King of Science, or at least, editor of a
         | prestigious journal, I'd want to put word out that I'm looking
         | for papers with at least one of some sort of significant
         | effect, or a p value of something like p = 0.0001. Yeah. That's
         | a high bar. I know. That's the point.
         | 
         | And study preregistration to avoid p-hacking and incentivize
         | publishing negative results. And full availability of data, aka
         | "open science".
        
           | DiabloD3 wrote:
            | Preregistration, a _requirement_ to publish negative _or_
            | null results, and full data are, arguably, the three legs of
            | modern science. If we collectively don't enforce this, nobody
            | is doing science, they're just fucking around and writing it
            | down.
        
             | romwell wrote:
             | Also replication studies for negative or null results _in
              | addition to_ positive ones (we don't have either).
        
           | analog31 wrote:
           | I've thought about the idea of allowing people to separately
           | publish data and analysis. Right now, data are only published
           | if the analysis shows something interesting.
           | 
           | Improving the quality of measurements and data could be a
           | rewarding pursuit, and could encourage the development of
           | better experimental technique. And a good data set, even if
           | it doesn't lead to an immediate result, might be useful in
           | the future when combined with data that looks at a problem
           | from another angle.
           | 
           | Granted, this is a little bit self serving: I opted out of an
           | academic career, partially because I had no good research
           | ideas. But I love creating experiments and generating data!
           | Fortunately I found a niche at a company that makes
           | measurement equipment. I deal with the quality of data, and
           | the problem of replication, all day every day.
        
         | Tycho wrote:
         | What do you (or anyone else) think about the statistical
         | conclusions in this paper? Particularly the adjusted r-squared
         | values reported.
         | 
         | https://www.cambridge.org/core/journals/american-political-s...
        
         | shakezula wrote:
          | I blame most of this on pop science. It has absolutely ruined
          | the general public's respect for the behind-the-scenes work
          | doing interesting stuff in every field. What's worse is the
          | attitude it breeds: anti-intellectualism runs rampant even
          | among well-educated members of my social circle. It's
          | frustrating, to say the least.
        
           | shawn-butler wrote:
            | Some say that realizing the emperor has no clothes is not
            | anti-intellectualism but enlightenment.
           | 
           | Either way it's dangerous.
        
         | hyperbovine wrote:
         | Come into Bayesian land, the water is fine. The whole NHST
         | edifice starts to seem really shaky once you stop and wonder if
         | "True" and "False" are really the only two possible states of a
         | scientific hypothesis. Andrew Gelman has written about this in
         | many places, e.g. http://www.stat.columbia.edu/~gelman/research
         | /published/aban....
        
           | Retric wrote:
           | Bayesian reasoning has even worse underpinnings. You don't
           | actually know any of the things the equations want. For
            | example, suppose a robot is counting Red and Blue balls from
            | a bin, the count is 400 Red and 637 Blue, and it just
            | classified a Red ball.
           | 
           | Now what's the count, wait what's the likelihood it
           | misclassified a ball? How accurate are those estimates, and
           | those estimates of those ...
           | 
           | For a real world example someone using Bayesian reasoning
           | when counting cards should consider the possibility that the
           | deck doesn't have the correct cards. And the possibility that
           | the decks cards have been changed over the course of the
           | game.
        
             | tux3 wrote:
              | Suppose the likelihood it misclassified a ball is
             | significantly different from zero, but not yet known
             | precisely.
             | 
              | If you use a model that doesn't ask you to think about this
              | likelihood at all, you will get the same result as if you
              | had used Bayes and consciously chosen to approximate the
              | likelihood of misclassification as zero.
             | 
             | You may get slightly better results if you have a
              | reasonable estimate of that probability, but you will get
             | no worse if you just tell Bayes zero.
             | 
             | It feels like you're criticizing the model for _asking hard
             | questions_.
             | 
              | I feel like explicitly not knowing an answer is always a
             | small step ahead of not considering the question.
        
               | Retric wrote:
               | The criticism is important because of how Bayes keeps
                | using the probability between experiments. Garbage in,
                | garbage out.
               | 
               | As much as people complain about frequentist approaches,
               | examining the experiment independently from the output of
               | the experiment effectively limits contamination.
        
             | Karrot_Kream wrote:
             | Huh? You can derive all of those from Bayesian models. If
             | you're counting balls from a bin with replacement, and your
             | bot has counted 400Red with 637Blue, you have a
              | Beta/Binomial model. That means your p_blue | data ~
              | Beta(638, 401) assuming a Uniform prior. The probability of
             | observing a red ball given the above p_blue | data is
             | P(red_obs | p_blue) = 1 - P(blue_obs | p_blue), which is
             | calculable from p_blue | data. In fact in this simple
             | example you can even analytically derive all of these
             | values, so you don't even need a simulation!
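              | 
              | (A minimal scipy sketch of that posterior, using the
              | standard Beta(successes + 1, failures + 1) form for
              | p_blue:)
              | 
              |     from scipy.stats import beta
              | 
              |     # 637 blue, 400 red, flat prior on p_blue
              |     post = beta(638, 401)
              |     print(post.mean())          # ~0.614
              |     print(post.interval(0.95))  # ~(0.585, 0.644)
              |     print(1 - post.mean())      # P(next ball is red)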
        
               | tfehring wrote:
               | And if misclassification is a concern (as the parent
               | mentioned) you can put a prior on that rate too!
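                | 
                | (Rough importance-sampling sketch of that idea, with a
                | made-up Beta(1, 19) prior on a symmetric error rate;
                | the priors double as the proposal distribution:)
                | 
                |     import numpy as np
                |     from scipy.stats import binom
                | 
                |     rng = np.random.default_rng(0)
                |     k = 200_000
                |     p = rng.uniform(size=k)        # prior on p_blue
                |     eps = rng.beta(1, 19, size=k)  # error-rate prior
                |     # chance a ball gets recorded as blue
                |     q = p * (1 - eps) + (1 - p) * eps
                |     w = binom.pmf(637, 1037, q)    # likelihood weights
                |     print((w * p).sum() / w.sum()) # posterior mean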
        
               | Retric wrote:
               | Which rate? The rate you failed to mix the balls? The
               | rate you failed to count a ball? The rate you
               | misclassified the ball? The rate you repeatedly counted
               | the same ball? The rate you started with an incorrect
               | count? The rate you did the math wrong? etc
               | 
                | "Here's the experiment and here's the data" is concrete;
                | it may be bogus, but it's information. Updating
                | probabilities based on recursive estimates of
                | probabilities is largely restating your assumptions.
                | Black swans can really throw a wrench into things.
               | 
               | Plenty of downvotes and comments, but nothing addressing
               | the point of the argument might suggest something.
        
               | Karrot_Kream wrote:
               | > Which rate? The rate you failed to mix the balls? The
               | rate you failed to count a ball? The rate you
               | misclassified the ball? The rate you repeatedly counted
               | the same ball? The rate you started with an incorrect
               | count? The rate you did the math wrong? etc
               | 
               | This is called modelling error. Both Bayesian and
               | frequentist approaches suffer from modelling error.
               | That's what TFA talks about when mentioning the normality
               | assumptions behind the paper's GLM. Moreover, if errors
                | are additive, certain distributions combine together
                | easily algebraically, meaning it's easy to "marginalize"
                | over them as a single error term. In most GLMs, there's a
                | normally distributed error term meant to marginalize over
                | multiple i.i.d. normally distributed error terms.
               | 
               | > Plenty of downvotes and comments, but nothing
               | addressing the point of the argument might suggest
               | something.
               | 
               | I don't understand the point of your argument. Please
               | clarify it.
               | 
                | > "Here's the experiment and here's the data" is
                | concrete; it may be bogus, but it's information. Updating
                | probabilities based on recursive estimates of
                | probabilities is largely restating your assumptions.
               | 
               | What does this mean, concretely? Run me through an
               | example of the problem you're bringing up. Are you saying
               | that posterior-predictive distributions are "bogus"
               | because they're based on prior distributions? Why?
               | They're just based on the application of Bayes Law.
               | 
               | > Black swans can really throw a wrench into things
               | 
               | A "black swan" as Taleb states is a tail event, and this
               | sort of analysis is definitely performed (see:
               | https://en.wikipedia.org/wiki/Extreme_value_theory). In
               | the case of Bayesian stats, you're specifically
               | calculating the entire posterior distribution of the
               | data. Tail events are visible in the tails of the
               | posterior predictive distribution (and thus calculable)
               | and should be able to tell you what the consequences are
               | for a misprediction.
        
             | kadoban wrote:
             | Can't you just add that to your equation? Seems like for
             | anything real, this will not go many levels deep at all
              | before it's irrelevant.
        
           | funklute wrote:
           | > The whole NHST edifice starts to seem really shaky once you
           | stop and wonder if "True" and "False" are really the only two
           | possible states of a scientific hypothesis.
           | 
           | The root problem here is that people tend to dichotomise what
           | are fundamentally continuous hypothesis spaces. The correct
           | question is not "is drug A better than drug B?", it's "how
            | much better or worse is drug A compared to drug B?". And this
            | is an error you can make in both Bayesian and frequentist
            | lands, though culturally the Bayesians have a tendency to
           | work directly with the underlying, continuous hypothesis
           | space.
           | 
           | That said, there are sometimes external reasons why you have
           | to dichotomise your hypothesis space. E.g. ethical reasons in
           | medicine, since otherwise you can easily end up concluding
           | that you should give half your patients drug A and the other
           | half drug B, to minimise volatility of outcomes (this
           | situation would occur when you're very uncertain which drug
           | is better).
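            | 
            | (A small sketch of working with the continuous hypothesis
            | directly, on made-up trial counts: report the posterior of
            | the difference in response rates instead of a yes/no
            | verdict.)
            | 
            |     import numpy as np
            |     from scipy.stats import beta
            | 
            |     rng = np.random.default_rng(0)
            |     # 41/100 respond on drug A, 33/100 on drug B, flat priors
            |     a = beta(42, 60).rvs(100_000, random_state=rng)
            |     b = beta(34, 68).rvs(100_000, random_state=rng)
            |     print(np.percentile(a - b, [2.5, 50, 97.5]))
            |     print((a - b > 0).mean())  # P(A better), not a verdict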
        
           | Karrot_Kream wrote:
           | Gelman et al's BDA3 has a fun exercise estimating heart-
           | disease rates in one of the early chapters that demonstrates
           | this issue with effect-sizes. BDA3 uses a simple frequentist
           | model to determine heart-disease rates and shows that areas
           | with small population sizes have heavily exaggerated heart-
           | disease rates because of the small base population. Building
           | a Bayesian model does not have the same issue as the prior
           | population prevalence incorporates the small base population
           | sizes.
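            | 
            | (Toy version of that shrinkage, with invented counts and a
            | shared Beta(2, 198) prior standing in for the population-
            | level rate of roughly 1%:)
            | 
            |     a0, b0 = 2, 198
            |     for cases, pop in [(2, 50), (200, 20_000)]:
            |         raw = cases / pop
            |         shrunk = (a0 + cases) / (a0 + b0 + pop)
            |         print(pop, round(raw, 3), round(shrunk, 3))
            |     # the 4% raw rate from the tiny population shrinks
            |     # toward 1%; the large population barely moves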
        
         | bluGill wrote:
         | > Plus, the idea that we can remove such small, noisy
         | confounding factors is just silly. We need to look for the
         | things that stand out from that noise floor
         | 
         | We have found most of them, and all the easy ones. Today the
         | interesting things are near the noise floor. 3000 years ago
          | atoms were well below the noise floor; now we know a lot about
          | them - most of it seems useless in daily life, yet a large part
          | of the things we use daily depends on our knowledge of the atom.
         | 
         | Science needs to keep separating things from the noise floor.
          | Some of them become important once we understand them.
        
           | exporectomy wrote:
           | Doesn't it make a difference if it's near the noise floor
           | because it's hard to measure (atoms) or if it's near the
           | noise floor because it's hardly there (masks)? Maybe if these
           | "hardly there" results led to further research that isolated
           | some underlying "very there" phenomena, they would be
           | important, but until that happens, who cares if thinking
           | about money makes you slightly less generous than thinking
           | about flowers? If they're not building on previous research
           | to discover more and more important things, then it doesn't
           | seem like useful progress.
        
           | kazinator wrote:
           | Individual atoms, or small numbers of them, may be beneath
           | some noise floor, but not combined atoms.
           | 
            | A salt crystal (a lattice of NaCl) is nothing like a pure
           | gold nugget (clump of Au atoms).
           | 
           | That difference is a massive effect.
           | 
           | So to begin with, we have this sort of massive effect which
           | requires an explanation, such as atoms.
           | 
           | Maybe the right language here is not that we need an effect
           | rather than statistical significance, but that we need a
           | clear, unmistakable _phenomenon_. There has to be a
           | phenomenon, which is then explained by research. Research
           | cannot be _inventing the phenomenon_ by whiffing at the faint
           | fumes of statistical significance.
        
           | jerf wrote:
           | I don't think we have found most of them. I think we make it
           | look like we've found most of them because we keep throwing
           | money at these crap studies.
           | 
           | Bear in mind that my criteria are two-dimensional, and I'll
           | accept either. By all means, go back and establish your 3%
           | effect to a p-value of 0.0001. Or 0.000000001. That makes
           | that 3% much more interesting and useful.
           | 
           | It'll especially be interesting and valuable when you fail to
           | do so.
           | 
           | But we do not, generally, do that. We just keep piling up
           | small effects with small p-values and thinking we're getting
           | somewhere.
           | 
           | Further, if there is a branch of some "science" that we've
            | exhausted so thoroughly that we can't find anything that isn't
           | a 3%/p=0.047 effect anymore... pack it in, we're done here.
           | Move on.
           | 
           | However, part of the reason I so blithely say that is that I
           | suspect if we did in fact raise the standards as I propose
           | here, it would realign incentives such that more sciences
           | would start finding more useful results. I suspect, for
           | instance, that a great deal of the soft sciences probably
           | could find some much more significant results if they studied
           | larger groups of people. Or spent more time creating theories
           | that aren't about whether priming people with some sensitive
           | word makes them 3% more racist for the next twelve minutes,
           | or some other thing that even if true really isn't that
           | interesting or useful as a building block for future work.
        
         | tommiegannert wrote:
         | A few years ago, HN comments complained about the censorship
         | that only leaves successful studies. We need to report on
         | everything we've tried, so we don't walk around on donuts.
         | 
         | What's missing in my mind is admitting that results were
         | negative. I'm reading up on financial literacy, and many
          | studies end with some metrics being "great" at p < 0.05, but
          | then some other metrics are also "great" at p < 0.10, without the
         | author ever explaining what they would have classified as bad.
         | They're just reported without explanation of what significance
         | they would expect (in their field).
        
           | nix0n wrote:
           | > ...so we don't walk around on donuts
           | 
           | I agree with what you're saying, but I don't understand this
           | phrase.
        
             | Imnimo wrote:
             | The phrase "walk around on donuts" has one Google result
             | and it's this thread.
        
             | rootusrootus wrote:
             | I don't know where that turn of phrase comes from, but I
             | imagine it's synonymous with 'walking around in circles'.
        
             | potatoman22 wrote:
             | You know how sometimes you'll accidentally step on a donut
             | and you'll have to call your dog over to lick all the jelly
             | off your toes? That.
        
         | phreeza wrote:
         | This is clearly a cost/benefit tradeoff, and the sweet spot
         | will depend entirely on the field. If you are studying the
         | behavior of heads of state, getting an additional N is
         | extremely costly, and having a p=0.05 study is maybe more
         | valuable than having no published study at all, because the
         | stakes are very high and even a 1% chance of (for example)
         | preventing nuclear war is worth a lot. On the other hand, if
         | you are studying fruit flies, an additional N may be much
         | cheaper, and the benefit of yet another low effect size study
         | may be small, so I could see a good argument being made for
         | more stringent standards. In fact I know that in particle
         | physics the bar for discovery is much higher than p=0.05.
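          | 
          | (For scale, the particle-physics "5 sigma" discovery
          | convention expressed as a one-sided p-value:)
          | 
          |     from scipy.stats import norm
          | 
          |     print(norm.sf(5))   # ~2.9e-7, versus the usual 0.05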
        
           | bongoman37 wrote:
           | What if it's the other way round and a p<0.05 study says that
           | the best way to make sure a rival country does not do a
           | nuclear strike on you first is to do a massive nuclear strike
           | on them first?
        
         | BenoitEssiambre wrote:
         | p = 0.0001 doesn't help much. You can get to an arbitrarily
         | small p by just using more data. The problem is trying to
          | reject a zero-width null hypothesis. Scientists should test
          | against a null that is wider than a single point, so that they
          | are not just picking up tiny systematic biases in their
          | experiments. There are always small biases.
         | 
         | Gwern's page "Everything Is Correlated" is worth reading:
         | https://www.gwern.net/Everything
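          | 
          | (Back-of-the-envelope version of the "more data beats any
          | p threshold" point: a two-sided z-test of a point null with a
          | fixed systematic bias of 0.01 standard deviations.)
          | 
          |     import numpy as np
          |     from scipy.stats import norm
          | 
          |     d = 0.01   # tiny fixed bias, in SD units
          |     for n in (1_000, 100_000, 1_000_000, 10_000_000):
          |         z = d * np.sqrt(n)
          |         print(n, 2 * norm.sf(z))
          |     # p goes 0.75 -> 0.0016 -> 1.5e-23 -> ~1e-219:
          |     # any nonzero bias eventually clears any threshold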
        
           | bpodgursky wrote:
           | It would at least filter out the social science experiments
            | where results on 30 college students are "significant" at
           | p=.04 (and it's too expensive to recruit 3000 of them to
           | force significance).
        
         | Robotbeat wrote:
         | The problem is that when you're on the cusp of a new thing,
         | unless you're super lucky, the result will necessarily be near
         | the noise floor. Real science is like that.
         | 
         | But I definitely agree it'd be nice to go back and show
         | something is true to p=.0001 or whatever. Overwhelmingly solid
         | evidence is truly a wonderful thing, and as you say, it's
         | really the only way to build a solid foundation.
         | 
         | When you engineer stuff, it needs to work 99.99-99.999% of the
          | time or more. Otherwise you're severely limited in how far your
         | machine can go (in terms of complexity, levels of abstraction
         | and organization) before it spends most of its time in a broken
         | state.
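          | 
          | (The compounding is brutal; a two-loop sketch of a chain of k
          | steps with per-step reliability r:)
          | 
          |     for r in (0.99, 0.9999):
          |         for k in (10, 100, 1000):
          |             print(r, k, r ** k)
          |     # 0.99**1000 ~ 4e-5, while 0.9999**1000 ~ 0.90:
          |     # adding nines is what lets complexity scale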
         | 
         | I've been thinking about this while playing Factorio: so much
         | of our discussion and mental modeling of automation works under
         | the assumption of perfect reliability. If you had SLIGHTLY
         | below 100% reliability in Factorio, the game would be a
         | terrible grind limited to small factories. Likewise with
         | mathematical proofs or computer transistors or self driving
         | cars or any other kind of automation. The reliability needs to
         | be insanely good. You need to add a bunch of nines to whatever
         | you're making.
         | 
         | A counterpoint to this is when you're in an emergency and
         | inaction means people die. In that case, you need to accept
         | some uncertainty early on.
        
           | MaulingMonkey wrote:
           | > If you had SLIGHTLY below 100% reliability in Factorio, the
           | game would be a terrible grind limited to small factories.
           | 
            | I'd argue you _do_ have <100% reliability in Factorio, and
           | much of the game is in increasing the 9s.
           | 
            | Biters can wreak havoc on your base. Miners contaminate your
           | belts with the wrong types of ore, if you weren't paying
           | enough attention near overlapping fields. Misplaced inserters
           | may mis-feed your assemblers, reducing efficiency or leaving
           | outright nonfunctional buildings. Misclicks can cripple large
           | swaths of your previously working factory, ruining plenty of
           | speedruns if they go uncaught. For later game megabase
           | situations, you must deal with limited lifetimes as mining
           | locations dry up, requiring you to overhaul existing systems
           | with new routes of resources into them. As inputs are split
           | and redirected, existing manufacturing can choke and sputter
           | when they end up starved of resources. Letting your power
           | plants starve of fuel can result in a small crisis! Electric
           | miners mining coal, refineries turning oil into solid fuel,
           | electric inserters fueling the boilers, water pumps providing
           | the water to said boilers - these things all take power, and
            | jump-starting these after a power outage takes time you might
            | not have if you're under active attack and your laser turrets
            | are all offline as well.
           | 
           | But you have means of remediating much of this unreliability.
           | Emergency fuel and water stockpiles, configuring priorities
           | such that fuel for power is prioritized ahead of your fancy
           | new iron smelting setup, programmable alerts for when input
           | stockpiles run low, ammo-turrets that work without power,
            | burner inserters on your power production's critical path
            | that will bootstrap themselves after an outage, roboports
            | that replace biter-attacked defenses.
           | 
           | Your first smelting setup in Factorio will likely be a hand-
           | fed burner miner and furnace, taking at most 50 coal. This
           | will run out of power in _minutes_. Then you might use
           | inserters to add a coal buffer. Then a belt of coal, so you
            | don't need to constantly refill the coal buffer. Then a rail
           | station, so you don't need to constantly hand-route entirely
           | new coal and ore mining patches. Then you'll use blueprints
           | and bots to automate much of constructing your new inputs. If
           | you're really crazy, you'll experiment with automating the
           | usage of those blueprints to build self-expanding bases...
        
             | reilly3000 wrote:
             | I really considered getting into Factorio but your comment
             | is exactly why I can't touch it. I have certain demands
             | upon my time that would inevitably go unmet as I fuss with
              | the factory.
        
           | twoslide wrote:
           | > when you're on the cusp of a new thing, unless you're super
           | lucky, the result will necessarily be near the noise floor.
           | Real science is like that.
           | 
           | That's not necessarily true in social sciences. When you're
           | working with large survey datasets, many variables are
           | significantly related. That doesn't mean these relationships
           | are meaningful or causal, they could be due to underlying
           | common causes, etc. (Maybe social sciences weren't included
           | in "real science" - but there's where a lot of stats
           | discussions focus)
        
           | mercurywells wrote:
           | > I've been thinking about this while playing Factorio: so
           | much of our discussion and mental modeling of automation
           | works under the assumption of perfect reliability. If you had
           | SLIGHTLY below 100% reliability in Factorio, the game would
           | be a terrible grind limited to small factories.
           | 
           | So I'm making a guess here that you play with few monsters or
           | non-aggressive monsters?
        
             | Robotbeat wrote:
             | Currently playing a game to minimize pollution to try to
             | totally avoid biter attention. Surrounded by trees, now
             | almost entirely solar with efficiency modules.
        
           | mumblemumble wrote:
           | Fine. Do it like the experimental physicists do: if you think
           | you're on to something, refine and repeat the experiment in
           | order to get a more robust, repeatable result.
           | 
           | The original sin of the medical and social sciences is
              | failing to recognize a distinction between exploratory and
              | confirmatory research, and to behave accordingly.
        
             | Robotbeat wrote:
             | The problem is that it's really hard to get good data,
             | ethically, in medical sciences. Something that improves
             | outcomes by 5-10% can be really important, but trying to
             | get a study big enough to prove it can be super expensive
             | already.
        
               | TameAntelope wrote:
               | Nobody likes being in the control group of the first
               | working anti-aging serum...
        
               | q-big wrote:
               | > Nobody likes being in the control group of the first
               | working anti-aging serum...
               | 
               | You only know whether it works when the study has been
               | completed. You also only know whether the drug has
               | (potentially) disastrous consequences when the study has
               | been completed. Thus, I am not completely sure whether
               | your claim holds.
        
               | bluGill wrote:
                | You missed the "working" part. Success was a prerequisite
                | to their after-the-fact feelings. At least some of the
                | control group will be in old age but still alive when we
                | know it works. They might not know if it means indefinite
                | life (and side effects may turn it into "die at 85", so
                | some of the control group may outlive the intervention
                | group after the study), but they will know that on
                | average they did worse.
        
             | [deleted]
        
         | modeless wrote:
         | Not only is it not valuable to publish tons of studies with
         | p=.04999 and small effect size, in fact it's harmful. With so
         | many questionable results published in supposedly reputable
         | places it becomes possible to "prove" all sorts of crackpot
         | theories by selectively citing real research. And if you try to
         | dispute the studies you can get accused of being anti-science.
        
           | exporectomy wrote:
           | Only a problem for people who are trying hard not to think.
           | You can just ignore those people. They're not doing any harm
           | believing their beliefs.
        
             | bee_rider wrote:
             | We are literally in the middle of a global crisis that is
             | founded on people misunderstanding science.
        
               | exporectomy wrote:
               | What on earth are you talking about? I guess climate
               | change but that's certainly not founded on people
               | misunderstanding science, it's caused by people
               | understanding science which led to industrialization. Or
               | maybe you mean covid-19? Neither that. You're just trying
               | to make it seem like it's somehow very serious and bad if
               | everyone doesn't agree with you. It's not.
        
               | [deleted]
        
             | robbedpeter wrote:
             | The USDA food pyramid and nutrition education would suggest
             | that there's an inherent danger in just letting people
             | believe irrational things after a correction is known. It
             | depends on the belief - flat earth people aren't likely to
             | cause any harm. Bad nutrition information can wreak havoc
             | at scale.
        
               | vkou wrote:
                | Flat earth beliefs don't cause harm, but flat earth
               | believers have largely upgraded to believing more
               | dangerous nonsense.
        
               | exporectomy wrote:
               | Data or it didn't happen. This really sounds like you're
               | inventing a caricature of your enemy and assigning them
               | "dangerous" qualities so you can hate them more.
        
               | vkou wrote:
               | Nobody needs to caricature the insane beliefs surrounding
               | COVID (or flat earth), people holding them are doing a
               | good enough job of that themselves.
               | 
               | I do have a few favorites. "COVID tests give you COVID,
               | so I won't go get tested" is certainly up there. I can't
               | say I give two figs about your opinion on the Earth's
               | topology, but this one is a public health problem, that's
               | crippling hospitals around the country.
        
         | sanxiyn wrote:
         | I agree we shouldn't listen to noise, but small effect size is
          | not necessarily noise. (I agree the two are highly correlated.)
          | I mean, QED's effect size on the electron g-factor is about
          | 1.001. QED was very much worth finding out.
        
       | RandomLensman wrote:
       | These discussions are fun but rather pointless: e.g., sometimes a
       | small effect is really interesting but it needs to be pretty
       | strongly supported (for instance, claiming a 1% higher electron
       | mass or a 2% survival rate in rabies).
       | 
       | Also, most published research is inconsequential so it really
       | does not matter other than money spent (and that is not only
       | related to findings but also keeping people employed etc.). If
        | confidence in results is truly an objective, we might need to
        | link it directly to personal income or loss of income, i.e.,
        | force bets on it.
        
       | robocat wrote:
       | From the article:
       | 
       | Ernest Rutherford is famously quoted proclaiming "If your
       | experiment needs statistics, you ought to have done a better
       | experiment."
       | 
       | "Of course, there is an existential problem arguing for large
       | effect sizes. If most effect sizes are small or zero, then most
       | interventions are useless. And this forces us scientists to
       | confront our cosmic impotence, which remains a humbling and
       | frustrating experience."
        
       | fmajid wrote:
       | The studies are in villages, but the real concern is dense urban
       | environments like New York (or Dhaka) where people are tightly
       | packed together and at risk of contagion. I'm pretty sure masks
        | make little difference in Wyoming either, where the population
        | density is about 5 people per square mile.
        
       | sanxiyn wrote:
        | Masks' effect size on seroprevalence is probably zero, so no
        | effect is the expected result.
        | 
        | That's because masks act on R0, not on seroprevalence. After
        | acting on R0: if R0 is >1, exponential growth; if <1, exponential
        | decay. So no effect, unless masking is the thing that pushes R0
        | from >1 to <1.
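        | 
        | (A toy branching-process sketch of that growth-vs-decay point;
        | it ignores saturation, which is what ultimately caps
        | seroprevalence:)
        | 
        |     for R in (1.1, 0.9):
        |         total = sum(R ** i for i in range(20))
        |         print(R, round(total, 1))
        |     # R=1.1 keeps compounding (~57.3 after 20 generations),
        |     # R=0.9 levels off (~8.8)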
        
         | [deleted]
        
         | lotu wrote:
          | Also, they aren't testing masking's effect on seroprevalence (or
         | R0), they are testing the effect of sending out free masks and
         | encouraging masking. That is only going to move the percent of
         | people masking up or down a few percent at best.
        
           | sampo wrote:
           | The study says:
           | 
           | > The intervention increased proper mask-wearing from 13.3%
           | in control villages (N=806,547 observations) to 42.3% in
           | treatment villages (N=797,715 observations)
           | 
           | https://www.poverty-
           | action.org/sites/default/files/publicati...
        
       | hammock wrote:
       | >Effect Size Is Significantly More Important Than Statistical
       | Significance
       | 
       | Ok, but by how much?
        
       | mrtranscendence wrote:
       | > If most effect sizes are small or zero, then most interventions
       | are useless.
       | 
       | But this doesn't necessarily follow, does it? If there really
       | were a 1.1-fold reduction in risk due to mask-wearing it could
       | still be beneficial to encourage it. The salient issue (taking up
       | most of the piece) seems to be not the size of the effect but
       | rather the statistical methodology the authors employed to
       | measure that size. The p-value isn't meaningful in the face of an
       | incorrect model -- why isn't the answer a better model rather
       | than just giving up?
       | 
       | Small effects are everywhere. Sure, it's harder to disentangle
       | them, but they're still often worth knowing.
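        | 
        | (To put a made-up number on why a "small" effect can still
        | matter at scale: a 1.1-fold risk reduction applied to a 10%
        | baseline attack rate across 10 million people.)
        | 
        |     baseline, pop = 0.10, 10_000_000
        |     reduced = baseline / 1.1
        |     print(round((baseline - reduced) * pop))  # ~90,909 averted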
        
         | ummonk wrote:
         | > If there really were a 1.1-fold reduction in risk due to
         | mask-wearing it could still be beneficial to encourage it.
         | 
         | That's understating it. The study doesn't measure the reduction
         | in risk due to mask-wearing, but rather the reduction simply
         | from encouraging mask-wearing (which only increases actual mask
         | wearing by a limited amount). If the study's results hold up
         | statistically, then they're really impressive. With the caveat
          | of course that they apply to older variants with lower viral
          | loads than Delta - it's likely Delta is more effective at
          | getting past masks simply due to its viral load.
         | 
         | > The salient issue (taking up most of the piece) seems to be
         | not the size of the effect but rather the statistical
         | methodology the authors employed to measure that size. The
         | p-value isn't meaningful in the face of an incorrect model --
         | why isn't the answer a better model rather than just giving up?
         | 
         | Exactly. The irony of this article is that this is an example
         | where effect size is actually not the issue - it's potential
         | issues with statistical significance due to imperfect modeling,
         | and an inability for other researchers to rerun an analysis on
         | statistical significance, due to not publishing the raw data.
        
         | sanxiyn wrote:
          | I agree the problem here is an incorrect model. Masks do not
          | act on seroprevalence. Measuring masks' effect on
          | seroprevalence is just the wrong study design, although it may be
         | easier to do.
        
         | whatshisface wrote:
         | Who cares if each effect is a factor of 2^(1/100) improvement,
         | just give me 100 interventions and I'll double the value being
         | measured.
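          | 
          | (The arithmetic behind that quip:)
          | 
          |     print((2 ** (1 / 100)) ** 100)  # 2.0 (up to float rounding)
          |     print(2 ** (1 / 100) - 1)       # each step is ~0.7%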
        
       | nabla9 wrote:
       | If you have one BALB/c lab mouse, you give it something, and it
        | glows in the dark a few months later, the effect size alone makes
        | it significant.
        
       | exporectomy wrote:
       | I wonder if we should separate the roles of scientist and
        | researcher. Universities would have generalist "scientists" whose
       | job would be to consult for domain-specialized researchers to
       | ensure they're doing the science and statistics correctly. That
       | way, we don't need every researcher in every field to have a deep
       | understanding of statistics, which they often don't.
       | 
       | Either that or stop rewarding such bad behavior. Science jobs are
       | highly competitive, so why not exclude people with weak
       | statistics? Maybe because weak statistics leads to more spurious
       | exciting publications which makes the researcher and institution
       | look better?
        
         | civilized wrote:
         | The scientific establishment will never be convinced to stop
         | doing bad statistics, so "the solution to bad speech is more
         | speech". Statisticians should be rewarded for effective review
         | and criticism of flawed studies, and critical statistical
         | reviews of any article should be easy to find when they exist.
         | 
         | This is sounding like a great startup idea for a new scientific
         | journal, actually.
        
           | robertlagrant wrote:
           | Just adding an Arxiv filter that allows me to set a minimum
           | p-value or variation % would do it!
        
           | vavooom wrote:
           | I do enjoy the idea of a journal focused entirely on the
           | review of statistical methods and underlying methodologies
           | applied in modern day research. Could act as a helpful signal
           | for relevant and applicable research.
        
         | Robotbeat wrote:
          | We exclude people who don't publish. Journals tend not to
          | publish stuff that isn't a positive result.
        
       | ummonk wrote:
       | Agree with the title, but not the contents. The study in question
       | is actually an example of a huge effect size (10% reduction in
       | cases just from instructing villages they should wear masks is
       | amazing) possibly hampered by poor statistical significance (as
       | the blog post outlines).
        
       | _Nat_ wrote:
        | The title's misinformation: effect-size _ISN'T_ more important
       | than statistical significance.
       | 
       | The article itself makes some better points, e.g.
       | 
       | > I worry that because of statistical ambiguity, there's not much
       | that can be deduced at all.
       | 
       | , which would seem like a reasonable interpretation of the study
       | that the article discusses.
       | 
       | However, the title alone seems to assert a general claim about
       | statistical interpretation that'd seem potentially harmful to the
       | community. Specifically, it'd seem pretty bad for someone to see
       | the title and internalize a notion of effect-size being more
       | important than statistical significance.
        
         | spywaregorilla wrote:
          | Not so fast. If you win your first jackpot on the first ticket,
          | you'll require 500,000 failures (at $1 per ticket) in order to
         | fail to reject the null hypothesis at p < 0.05. Assuming you're
         | just doing a t test (which isn't really appropriate tbh).
         | 
         | If you bought just ten tickets you would have a p value below
         | 0.0000001
         | 
          | And that makes sense, because a p value of 0.0000001 says the
          | probability of getting a sample this far from the null
          | hypothesis is less than 1 in a million by random chance...
         | which is what happened when you got the extremely unlikely but
         | highly profitable answer.
         | 
         | edit: post was edited making this seem out of context...
        
       ___________________________________________________________________
       (page generated 2021-09-14 23:00 UTC)