[HN Gopher] Does X cause Y? An in-depth evidence review (2021)
___________________________________________________________________
Does X cause Y? An in-depth evidence review (2021)
Author : l0b0
Score : 218 points
Date : 2025-02-14 06:14 UTC (16 hours ago)
(HTM) web link (www.cold-takes.com)
(TXT) w3m dump (www.cold-takes.com)
| uniqueuid wrote:
| Oh what fun to discover the horror of causality!
|
| For some areas of research, truly understanding causality is
| essentially impossible - if well-controlled experiments are
| impossible and the list of possible colliders and confounders is
| unknowable.
|
| The key problem is that _any_ causal relation can be an illusion
| caused by some other, unobserved relation!
|
| This means that in order to show fully valid causal effect
| estimates, we need to
|
| - measure precisely
|
| - measure all relevant variables
|
| - actively NOT measure all harmful (i.e. falsely correlated)
| variables
|
| I heartily recommend The Book of Why [1] by Pearl and Mackenzie
| for a deeper reading and the "haunted DAG" in McElreath's
| wonderful Statistical Rethinking.
|
| [1] https://en.wikipedia.org/wiki/The_Book_of_Why
| kqr wrote:
| Pearl's _Causality_ is very high on my "re-read while making
| flashcards" list. It is depressing how hard it is to establish
| causality, but also inspiring how causality can be teased out
| of observational statistics _provided one dares assume a model
| on which variables and correlations are meaningful_.
| uniqueuid wrote:
| "provided one dares assume ..." - that's a great quote which
| I'll steal in the future if you allow!
|
| Most things we learn about DAGs and causality are
| frustrating, but _simulating_ a DAG (e.g. with lavaan in R)
| is a technique that actually helps in understanding when and
| how those assumptions make sense. That's (to me) a key part
| of making causality productive.
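|
| A minimal sketch of that in Python rather than lavaan (numpy
| only, all numbers invented): simulate a chain DAG X -> M -> Y
| and see what a regression recovers with and without
| "controlling for" the mediator M.
|
|   import numpy as np
|
|   rng = np.random.default_rng(0)
|   n = 100_000
|
|   def ols(design, target):
|       # least-squares coefficients for target ~ design
|       return np.linalg.lstsq(design, target, rcond=None)[0]
|
|   # Chain DAG: X -> M -> Y, so the total effect of X on Y
|   # is 0.8 * 0.5 = 0.4.
|   x = rng.normal(size=n)
|   m = 0.8 * x + rng.normal(size=n)
|   y = 0.5 * m + rng.normal(size=n)
|
|   # Y ~ X recovers the total effect (~0.4) ...
|   print(ols(x[:, None], y)[0])
|   # ... while also adjusting for the mediator M wipes it out
|   # (~0.0): which variables you adjust for is a causal
|   # assumption, not a statistical free lunch.
|   print(ols(np.column_stack([x, m]), y)[0])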
| KempyKolibri wrote:
| I've heard Miguel Hernan's "What If" is also excellent, but not
| got round to reading it.
| uniqueuid wrote:
| Yes it's great!
|
| There is also this great book on causality in ML, but it's a
| much heavier read:
|
| Chernozhukov, V., Hansen, C., Kallus, N., Spindler, M., &
| Syrgkanis, V. (2025). Causal Inference with ML and AI.
| levocardia wrote:
| For a lighter introduction to Hernan's ideas check out:
|
| "The C-Word: Scientific Euphemisms Do Not Improve Causal
| Inference From Observational Data"
| (https://pmc.ncbi.nlm.nih.gov/articles/PMC5888052/)
|
| "Does water kill? A call for less casual causal inferences"
| (https://pmc.ncbi.nlm.nih.gov/articles/PMC5207342/)
| alexpetralia wrote:
| I have reflected on a good definition of causality and would be
| curious if anyone has thoughts or critiques of it. I am
| repasting part of my essay below.
| (https://alexpetralia.com/2023/02/25/statistics-only-gives-
| co...)
|
| --
|
| Can we nevertheless extract causality from correlation?
|
| I would argue that, theoretically, we cannot. Practically
| speaking, however, we frequently settle for "very, very
| convincing correlations" as indicative of causation. A
| correlation may be persuasively described as causation if three
| conditions are met:
|
| Completeness: The association itself (R2) is 100%. When we
| observe X, we always observe Y.
|
| No bias: The association between X and Y is not affected by a
| third, omitted variable, Z.
|
| Temporality: X temporally precedes Y.
| kqr wrote:
| I feel like you have this backwards. In the assignment Y:=2X,
| each unit of Y is caused by half a unit of X. In the game
| where we flip a coin at fair odds, if you have increased your
| wealth by 8x in 3 tosses, that was caused by you getting
| heads every toss. Theoretically establishing causality is
| trivial.
|
| The problem comes when we try to do so practically, because
| reality is full of surprising detail.
|
| > No bias: The association between X and Y is not affected by
| a third, omitted variable, Z.
|
| This is, practically speaking, the difficult condition. I'm
| not so convinced the others are necessary (practically
| speaking, anyway) but you should read Pearl if you're into
| this!
| uniqueuid wrote:
| You are missing one crucial additional condition:
|
| - No colliders have been included in the analysis, which
| would _introduce_ the appearance of causality that does not exist
| HPsquared wrote:
| Ruling out all Z is the almost-impossible part. It's hard to
| prove a negative, especially with incomplete information.
| stonemetal12 wrote:
| What of the double slit experiment, where observation changes
| the outcome? Do we call observation the cause of the outcome?
| uniqueuid wrote:
| In general you assume DAGs, i.e. non-cyclical causality.
| Cyclical relations must be resolved through distinct
| temporal steps, i.e. u_t0 causes v_t1 and v_t1 causes u_t2.
| When your measurement precision only captures simultaneous
| effects of both u on v _and_ v on u you have a problem.
| dan_mctree wrote:
| You probably also need at least:
|
| - Y does not appear when X does not
|
| - We need an overwhelming sample size containing examples of
| both X and not X
|
| - The experiment and data collection are trivially repeatable
| (so that we don't need to rely on trust)
|
| - The experiment, data collection and analysis must be easy to
| understand and sensible in every way, without leaving room for
| error
|
| And as another commenter already pointed out: You can't
| really eradicate the existence of an unknown Z
| currymj wrote:
| even if you hit all the assumptions you need to make
| Pearl/Rubin causality work, and there is no unobserved factor
| to cause problems, there is still a philosophical problem.
|
| it all assumes you can divide the world cleanly into variables
| that can be the nodes of your DAG. The philosopher Nancy
| Cartwright talks about this a lot, but it's also a practical
| problem.
| shadowgovt wrote:
| And this is even before we get into the philosophical /
| epistemological questions about "cause."
|
| You can make the argument, from correlative data, that bridges
| and train tracks cause truck accidents. And more importantly,
| if you _act like they do_ when designing roadways, you
| _actually will_ decrease truck accidents. But it's a common-
| sense-odd meaning of causality to claim a stationary object is
| acting upon a mobile object...
| QuantumGood wrote:
| Some don't realize that colliders and confounders have
| technical definitions:
|
| ------------------ Confounders ------------------
|
| A variable that affects both the exposure and the outcome. It
| is a common cause of both variables.
|
| Role: Confounders can create a spurious association between the
| exposure and outcome if not properly controlled for. They are
| typically addressed by controlling for them in statistical
| models, such as regression analysis, to reduce bias and
| estimate the true causal effect.
|
| Example: Age is a common confounder in many studies because it
| can affect both the exposure (e.g., smoking) and the outcome
| (e.g., lung cancer).
|
| ------------------ Colliders ------------------
|
| A variable that is causally influenced by two or more other
| variables. In graphical models, it is represented as a node
| where the arrowheads from these variables "collide."
|
| Role: Colliders do not inherently create an association between
| the variables that influence them. However, conditioning on a
| collider (e.g., through stratification or regression) can
| introduce a non-causal association between these variables,
| leading to collider bias.
|
| Example: If both smoking and lung cancer affect quality of
| life, quality of life is a collider. Conditioning on quality of
| life could create a biased association between smoking and lung
| cancer.
|
| ------------------ Differences ------------------
|
| Direction of Causality: Confounders cause both the exposure and
| the outcome, while colliders are caused by both the exposure
| and the outcome.
|
| Statistical Handling: Confounders should be controlled for to
| reduce bias, whereas controlling for colliders can introduce
| bias.
|
| Graphical Representation: In Directed Acyclic Graphs (DAGs),
| confounders have arrows pointing away from them to both the
| exposure and outcome, while colliders have arrows pointing
| towards them from both the exposure and outcome.
|
| ------------------ Managing ------------------
|
| Directed Acyclic Graphs (DAGs): These are useful tools for
| identifying and distinguishing between confounders and
| colliders. They help in understanding the causal structure of
| the variables involved.
|
| Statistical Methods: For confounders, methods like regression
| analysis are effective for controlling their effects. For
| colliders, avoiding conditioning on them is crucial to prevent
| collider bias.
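|
| A minimal Python sketch of the difference (numpy only, made-up
| numbers): adjusting for the confounder Z removes a spurious
| association, while adjusting for the collider C creates one.
|
|   import numpy as np
|
|   rng = np.random.default_rng(1)
|   n = 200_000
|
|   def ols(design, target):
|       return np.linalg.lstsq(design, target, rcond=None)[0]
|
|   # Confounder: Z -> X and Z -> Y, no true X -> Y effect.
|   z = rng.normal(size=n)
|   x = z + rng.normal(size=n)
|   y = z + rng.normal(size=n)
|   print(ols(x[:, None], y)[0])                # ~0.5, spurious
|   print(ols(np.column_stack([x, z]), y)[0])   # ~0.0, bias removed
|
|   # Collider: X -> C <- Y, with X and Y independent.
|   x2 = rng.normal(size=n)
|   y2 = rng.normal(size=n)
|   c = x2 + y2 + rng.normal(size=n)
|   print(ols(x2[:, None], y2)[0])              # ~0.0, no relation
|   print(ols(np.column_stack([x2, c]), y2)[0]) # ~-0.5, collider bias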
| lenzm wrote:
| If you have to start with apologies then you know, just stop
| and don't post.
| QuantumGood wrote:
| Sure, but someone else did this for me, using AI, I found
| it useful to scan in the moment. I appreciated it and
| upvoted it.
|
| Like that experience, this was meant as a scannable
| introduction to the topic, not an exact reference. Happy to
| hear alternative views, or downvote to give herding-style
| feedback.
|
| Had I done a short AI-generated summary, it would have been
| a bit less helpful, but there wouldn't have been downvotes.
|
| Had I linked instead of posted the same AI explanation,
| there would have been no or fewer downvotes, because many
| wouldn't click, and some of those that did would find it
| helpful.
|
| Had I linked to something else, many would not click and
| read without a summary, both of which could have been AI-
| created.
|
| I chose to move on and accept a few downvotes. The votes
| count less than the helpfulness to me. Votes don't mean it
| helps or doesn't. Many people accept confusion without
| seeking clarification, and appreciate a little help.
|
| Although I personally do tend to downvote content-free
| unhelpful Reddit-style comments, I'm not overly fond of
| trying to massage things to help people manage their
| feelings when posts are only information, with no framing
| or opinion content. I understand that there is value in
| downvotes as herding-style feedback (as PG has pointed
| out). Yes, I've read the HN guidelines.
|
| I think beyond herding-style feedback downvotes, AI info
| has become a bit socially unacceptable--okay to talk about
| it but not share it. But I find AI particularly useful as
| an initial look at information about a domain, though not
| trustworthy as a detailed source. I appreciate the
| footnotes that Perplexity provides for this kind of usage
| that let me begin checking for accurate details.
| dan_mctree wrote:
| And even if you do know there's causality (eg: the input
| variable X is part of software that provides some output Y),
| the exact nature of the causality can be too complex to analyze
| due to emergent and chaotic effects. It's seldom as simple as:
| an increase in X will result in an increase in Y
| aqueueaqueue wrote:
| So, Bayesian or Frequentist?
| uniqueuid wrote:
| Funnily enough that hardly matters here.
|
| Causality is a largely _orthogonal_ problem to
| frequentist/bayesian - it makes everything harder, not just one
| of those!
| procaryote wrote:
| Causality at least correlates with a lot of problems
| uniqueuid wrote:
| Yeah but in this case it's really a wrong way to think
| about it.
|
| If you have a DAG based on wrong assumptions, it doesn't
| matter whether you get a point estimate based on null
| hypothesis thinking or whether you get a posterior
| distribution based on some prior. The problem is that the
| way in which you combine variables is wrong, and bayesian
| analysis will just be more detailed and precise in being
| wrong.
| GuB-42 wrote:
| Does frequentist/bayesian matter to anything but quasi-
| religious beliefs?
|
| I mean, that's maths, either approach has to give the same
| results, as they come from the same theory. The Bayes theorem
| is just a theorem, use it explicitly or not, the numbers will
| be the same because the axioms are the same.
| uniqueuid wrote:
| No, they are linked to beliefs (like anything else), but
| the canonical forms do differ a lot. Most importantly:
|
| - bayesian methods give you posterior distributions rather
| than point estimates and SEs
|
| - bayesian methods natively offer prior and posterior
| predictive checks
|
| - with bayesian methods, it's evidently easier to combine
| knowledge from multiple sources, which null-hypothesis
| testing struggles with (best way is probably still meta-
| analyses)
| Temporary_31337 wrote:
| And don't even get me started on A leading to B
| aqueueaqueue wrote:
| The old headline: B happens as A happens.
|
| Baby boom as solar panel sales skyrocket.
| skirge wrote:
| The most important factor in the results of research is
| personal beliefs, especially in "economics".
| laurentlb wrote:
| On a similar note, I enjoyed watching the video:
| https://youtu.be/mQ56uOkjccg?si=1hpwGqv2dQqLQ-ME (by Nutrition
| Made Simple!)
|
| It takes a specific topic (here, health effects of red meat) and
| explains how each type of study can provide information, without
| proving anything. It helped me a lot to understand the science
| related to nutrition, where you never have perfect studies.
| KempyKolibri wrote:
| Dismissing all observational study designs out of hand because
| they can be difficult and easy to perform poorly seems like quite
| the take.
|
| I see this all the time in people's interpretation of nutrition
| research, and they do exactly as this article suggests and fall
| back to the "intuitive option", and go on to some woo diet that
| they eventually give up because they start feeling awful.
|
| I would disagree that observational study designs should be
| thrown out the window or that it makes sense to, as this article
| seems to do, lump cross-sectional ecological data in with
| prospective cohort studies.
|
| Things often "make intuitive sense" only because of these study
| designs. We used to get kids to smoke pipes to stave off chest
| infections because it made "intuitive sense" and it's only
| because of observational studies that we now believe smoking
| causes lung cancer.
|
| The direction of evidence from prospective cohort studies to RCTs
| in the field of nutrition science on intake vs intake shows a 92%
| agreement. If we take RCTs to be the "gold standard" of evidence
| that best tracks with reality, it seems a little odd that these
| deeply flawed observational studies that we should apparently
| disregard seem to do such a good job coming to the correct
| conclusions.
|
| https://bmcmedicine.biomedcentral.com/articles/10.1186/s1291...
| mistercow wrote:
| > We used to get kids to smoke pipes to stave off chest
| infections because it made "intuitive sense" and it's only
| because of observational studies that we now believe smoking
| causes lung cancer.
|
| This is an interesting example, because I don't know of any
| studies (although there probably are some, if only old ones)
| specifically about whether smoking pipes staves off lung
| infections, but the "intuitive sense" answer has changed
| because of adjacent evidence. And in this case, it's not the
| lung cancer evidence that makes it intuitively unlikely that
| pipe smoking would be helpful, but a broader understanding of
| what causes lung infections, and what tobacco smoke contains
| and doesn't contain.
| KempyKolibri wrote:
| I think that's a fair critique! Probably would have been
| better to say that the intuitive position was that smoking
| was unrelated to lung cancer.
| derbOac wrote:
| It's important to be thoughtful about research interpretation,
| but I'm kind of tired of kneejerk dismissal of observational
| studies for a couple of reasons.
|
| First, experiments have their own varieties of horrors. Many
| are small N, with selective data reporting, and lack external
| validity -- that is, the thing you really want to randomize is
| difficult or impossible to randomize, so researchers randomize
| something else as a proxy that's not at all the same. Other
| times there are complex effects that distort the interpretation
| of the causal pathway implied by the experiment.
|
| Second, sometimes it's important to show that any association
| exists. There are cases where it's pretty clear an association
| is non-existent, based on observational data and covariate
| analysis. You just don't hear about those because people stop
| talking about them because of the null effects. So there's a
| kind of survivorship bias in the way results are discussed,
| especially in the popular literature.
|
| It's easy to handwave about limitations of studies, it's much
| harder to create studies that provide evidence, for logical,
| practical, and ethical reasons. Why you'd want _less_
| information about an important phenomenon isn't clear to me.
| not_kurt_godel wrote:
| It is quite the take indeed; one that I posit resonates most
| strongly with people whose societal views tend to conflict with
| the available evidence.
| mnky9800n wrote:
| In my own research we are investigating how fluids cause changes
| in rocks that allow for mineralization of CO2 and have such
| problems of confounding variables (not terribly unique I
| suppose). One thing we note is that, well, fluid comes from the
| sky and goes into the ground. Thus, the deeper you go, the less
| fluid there is since the pathways from the sky to deep into the
| ground become more sparse as well as needing higher pressures to
| enter these regions to either overcome capillary pressures in
| existing fracture zones or to literally break the rock (which is
| highly unlikely using naturally occurring pressures from fluids
| from the sky). And so, literally everything in all the data sets
| correlates with depth in some way. But in what way? well this has
| many dependencies as well, did the rock that absorbed some of the
| fluids grow in volume because of a chemical change? are the fluid
| pathways currently connected? What kind of rock is absorbing the
| fluids? Are microbes in the fluid absorbing contents from the
| fluid that would otherwise be used for rock changes? and so you
| are left with this giant pile of data (tens of terabytes) without
| a clear connection between fluid and rock interactions except
| that there is less fluid from the sky the deeper you go into the
| rock. This is obvious, however it is also rather unhelpful when
| trying to understand the other processes that exist. Of course
| you might say, have you tried detrending your data? And the
| answer is yes and to no effect. The simple truth is that this
| depth dependency interacts in different ways with different
| systems and there is no easy way to figure out how it does for
| each sub-system such as the fluid rock chemistry interactions,
| the rock fracture mechanics, the subsequent methane and hydrogen
| that is produced and likely consumed by microbes, etc.
| whatshisface wrote:
| Have you tried checking to see if the depth dependency is
| different in different large-scale geological regions?
| epidemiology wrote:
| In introductory epidemiology courses you'll usually get the
| Bradford Hill criteria in the first week or two, which gives a
| good foundation of determining public health causality. After
| digging deeper, the entire field of causal inference is revealed.
|
| A healthy respect for the difficulties of determining causality
| is beneficial. Irrational skepticism ignoring the evidence of
| strong observational research simply replaces it with... what
| exactly? That's how we ended up with a 71-year-old anti-vaccine
| conspiracist as the health secretary.
| jtrn wrote:
| As a clinical psychologist, I find it increasingly frustrating to
| sift through research studies that fail to meet even the most
| basic standards of scientific rigor. The sheer volume of studies
| that claim "X is linked to Y" without properly addressing the
| correlation-versus-causation fallacy is staggering. It's not just
| an oversight--it's a fundamental flaw that undermines the
| credibility and utility of psychological research.
|
| If a study is publicly funded, there should be a minimum
| requirement: it must include at least two research arms--one with
| an experimentally manipulated variable and a proper control
| condition. Furthermore, no study should be considered conclusive
| until its findings have been successfully replicated,
| demonstrating a consistent predictive effect. This isn't an
| unreasonable demand; it's the foundation of real science. Yet, in
| clinical psychology, spineless researchers and overly cautious
| and/or power-crazed ethics committees have effectively neutered
| most studies into passive, observational, and ultimately useless
| exercises in statistical storytelling.
|
| And for the love of all that is scientific, we need to stop the
| obsession with p-values. Statistical significance is meaningless
| if it doesn't translate into real-world impact. Instead of
| reporting p-values as if they prove anything on their own,
| researchers should prioritize effect sizes that demonstrate
| meaningful clinical relevance. Otherwise, we're left with a field
| drowning in "statistically significant" noise--impressive on
| paper but useless in practice.
| gloomyday wrote:
| Obsessing with p-values while at the same time shunning
| replication studies and studies with negative results is a
| catastrophe. It causes everyone to be confidently wrong way
| more often than one would think at first.
|
| What worth is a result with p<0.01 when the 10 previous
| articles with negative results were never actually written?
| parpfish wrote:
| another contributing factor is that in psychology (and
| possibly other fields), it's very hard to make a career doing
| rigorous, incremental science that results in confident
| outcomes because each step along the way, people just say
| "yeah, sounds about right".
|
| to make a career, you need to discover quirky
| counterintuitive findings that can be turned into ted talks
| and 'one weird trick' clickbait. you become a big deal once
| you start providing fodder for the annoying "well,
| actually..." guy to drop on people at a dinner party/reddit
| comment section.
| zkmon wrote:
| There is no causality whatsoever. The perceived causality is
| built backwards, only to make something appear sensible. Every
| event in this universe contributes as a cause to every other
| event in the universe. It's like fluid flow. Every molecule of
| the fluid affects the movement of every other molecule. The world
| evolves in a fluid motion, not through isolated causal chains.
| adrian_b wrote:
| Tell that to one who gives you a punch in the face, that there
| is no causal relationship between his desire to punch you and
| your bloody nose :-)
| zkmon wrote:
| They just get a couple of harder punches back. But you missed
| the point in your rush to make a dramatic comment. It's not
| about how someone would interpret the causality or how they
| react. It is about how a set of events can't be considered as
| an isolated chain of causally related things, disconnected from
| other things. If you like to think
| about it in terms of punches, I think you would get lots of
| them.
| bowsamic wrote:
| The Scottish man still speaks it seems
| Matumio wrote:
| If you read the mathematical theory of causality (e.g. Pearl),
| you'll learn that you must have the ability to make
| interventions "from outside" (at least in theory) before you
| can talk about causality. You have to define what is "inside"
| the system you study.
|
| If you define everything to be "inside", then causality
| disappears because intervention disappears.
| Kaotique wrote:
| I think a lot of these kinds of studies are not really about
| objectively studying a phenomenon but trying to prove a
| predetermined point. The study is designed and adjusted until it
| proves what it should prove. Then it's wrapped in a nice news
| headline which does away with all the details and subtleties and
| is used for political or economic gain. Reproducing the results is
| not interesting and not funded. Other studies are then using
| these results as sources to stack the house of cards even higher.
| I think this does a lot of harm to science as a whole because a
| lot of people disregard all scientific results as a result.
| nkoren wrote:
| Yeah, sadly, I think it's worth having "ulterior motives" on
| the list.
|
| One of the first times I got interested in reading medical
| studies was when I saw a bunch of headlines announcing that a
| randomized controlled trial had proved that echinacea was
| ineffective for treating respiratory problems. This surprised
| me, because I'd always been a dogmatic drinker of echinacea tea
| whenever I had a cold, and had thought that it helped. But then
| again, I come from a culture of damn dirty hippies, so I was
| open to being wrong about it. Rather than rely on the
| headlines, I decided to dig up the study itself.
|
| Here's what the study actually found: that rubbing an
| echinacea-infused ointment on your wrists has no effect on
| respiratory health.
|
| Er... yeah, no shit, Sherlock. Literally nobody uses echinacea
| that way. You've just falsified a total straw-man of a
| hypothesis, and based on the number of headlines generated off
| the back of this, I think it's reasonable to presume there was
| some kind of funded apparatus for disseminating that bogus
| result.
|
| Ever since then, I've learned not to trust the headlines when
| it comes to trials, reserving judgment until I've looked at the
| methodology. When I do, a lot come up short.
| kridsdale1 wrote:
| I've gotten in the habit of sending study pdf files to
| Claude, having it write its own Abstract and headline from
| the rest of the content, then comparing those to the
| "organic" Abstract and headline.
| Xcelerate wrote:
| This is exactly what's going on in many situations. For any
| proposed study, you can ask the question "Is there a possible
| outcome of this study that would have a strong emotional effect
| on someone?" If the answer is "yes", then I'd say it's more
| likely than not that the study's results are already
| compromised in some subtle way.
| HPsquared wrote:
| Like a lot of other noble pursuits, scientific enquiry can be
| corrupted by money.
| daoboy wrote:
| It's layers of abstraction all the way down the light cone.
|
| The causality is always present, we just don't have the
| processing power to ensure with 100% certainty that all relevant
| factors are accounted for and all spurious factors dismissed.
| winternewt wrote:
| This is what I miss for important subjects: an actual
| ambitious, reductionist approach where in-depth cause-effect
| analysis is performed for each individual sample.
| Cappor wrote:
| The question of whether X can cause Y remains open and requires
| further research. The article highlights the importance of
| thoroughly checking sources and methodology to draw clear
| conclusions. This is an important step towards a deeper
| understanding of such relationships.
| gns24 wrote:
| "A study using a complex mathematical technique claiming to
| cleanly isolate the effect of X and Y. I can't really follow what
| it's doing..."
|
| This is a frustrating type of issue. Dismissing something with "I
| don't understand this, but I don't believe it" isn't the sort of
| thing I want to be doing. However, I don't have any desire to
| waste time trying to understand what someone has done (and did
| they really understand what they were doing themselves?) when
| it's clear that the effect isn't cleanly isolated in the data and
| no amount of mathematics is going to change that.
| sujumayas wrote:
| Am I the only one thinking, while reading this: "Wait a
| minute... isn't this article itself some kind of weak X-causes-Y?
| Observation of many cases, with a generalized causal conclusion
| that he just feels like X should cause Y?" hahaha. Love the
| article btw.
| spacebanana7 wrote:
| I disagree strongly with this mathematised notion of causality.
| Two things can be perfectly correlated at all observed points in
| history without necessarily being causal. There can always be
| some unknown variable driving change in both.
| ekianjo wrote:
| Or they can also not be related at all and just happen by pure
| coincidence.
| ibeff wrote:
| That's what the author deals with in the first part of the
| article on observational studies. Randomized studies don't have
| that problem.
| talkingtab wrote:
| I am slowly becoming convinced that studies are in fact cargo-
| cultism. And there are many, many studies that confirm this.
|
| But about causality. Long ago (old cars) I had a friend who told
| me that most mornings his car would not start until he opened the
| hood and wrapped some wires with tape (off with the old tape on
| with the new). Then the car would start. Every now and then it
| would take two wraps. Hmmm.
|
| After he demonstrated this, I decided to try to help. I followed
| the wires that were wrapped. Two of them. To my surprise they
| were not connected at either end. This was insane, and yet his
| study - and my own observation - demonstrated that wrapping these
| two wires which were completely disconnected caused his car to
| start. Now there is causality for you.
|
| Except that if you have a more complex model of cars, there is a
| sane explanation. Again this is an old car with a carburetor. In
| case you don't know, this is a little bowl of gas that provides
| a combustible mix of air and gas. If there is too much gas then
| your car won't work. The mix is controlled by a little float that
| controls the level of gas in the little bowl. Toilet bowls work
| on the same principle.
|
| If your float is bad (or other issues) your car engine would get
| too much gas - be "flooded" and you have to wait until much of it
| evaporates. So if you flood your car engine, go and wrap some
| wires, it may be that your car will start right up.
|
| So I rebuilt the carburetor and my friend never had that problem
| again.
|
| The moral of the story is that I had a better "model" of how cars
| work. But in the back of my mind I am aware that my model may be
| or have been just as deficient. Did you know that we are
| bombarded from space by an unknown type of neutrino that stops
| electricity from working unless there is a little pool of some
| liquid nearby or it is Thursday? I am going to do a study of
| this.
|
| There are very good reasons to understand how frail our ability
| to understand causality is. And we are talking simple things
| here. The scientific method is about EXPERIMENTS. Yes, I did that
| in bold. Doing things. We have deeply complex situations we need
| to understand and in my opinion, studies do not help.
| gwern wrote:
| > After he demonstrated this, I decided to try to help. I
| followed the wires that were wrapped. Two of them. To my
| surprise they were not connected at either end. This was
| insane, and yet his study - and my own observation -
| demonstrated that wrapping these two wires which were
| completely disconnected caused his car to start. Now there is
| causality for you.
|
| You didn't show causality, though. You never randomized
| anything. His study and your observation were purely
| observational. At no point did you open the hood, get ready to
| wrap the wires, and flip a coin to decide whether to wrap the
| wires or do a placebo wrapping somewhere else.
|
| Had you done that, you would have found, per your ultimate
| explanation, that the wrapping made no causal difference: you
| did the procedure, and either way, the car turned on. Hence,
| there is no causality for you.
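|
| A toy Python simulation of that point (every mechanic here is
| invented for illustration): the flooded carburetor clears after
| a couple of minutes no matter what you do to the wires, so the
| observational routine "wrap, then start" always seems to work,
| while a randomized wrap-vs-placebo comparison estimates an
| effect of roughly zero.
|
|   import numpy as np
|
|   rng = np.random.default_rng(2)
|
|   def car_starts(minutes_elapsed):
|       # invented mechanic: flooding clears after ~2 minutes,
|       # independent of the wires
|       return minutes_elapsed >= rng.normal(2.0, 0.3)
|
|   # Observational routine: always wrap (~3 min), then try.
|   wrapped = [car_starts(3.0) for _ in range(1000)]
|   print(np.mean(wrapped))                  # ~1.0: "it always works"
|
|   # Randomized: coin flip between wrapping and a placebo fiddle
|   # that takes the same 3 minutes.
|   wrap, placebo = [], []
|   for _ in range(1000):
|       group = wrap if rng.random() < 0.5 else placebo
|       group.append(car_starts(3.0))
|   print(np.mean(wrap) - np.mean(placebo))  # ~0.0: no causal effect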
| bwfan123 wrote:
| imo, The idea of a cause is a logical concept of containment
| when used in theories. A causes B means the phenomena
| represented by A implies the phenomena represented by B. So,
| causation is a device of our symbolization and understanding
| of the world rather than anything fundamentally out there.
| this is of course a controversial view.
|
| Causality eventually demands a "theory" for full explanatory
| power and understanding. Theories have premises, involve
| inference, and have predictions. Otherwise, we get ad-hoc
| models of phenomena via observations which is a great start,
| but ends up as an oversimplification. X causes Y, but what
| caused X, or why did X cause Y and not Z? models represent
| phenomena while theories explain them. we start with models,
| and then our curiosity eventually leads to a theory. refer to
| [1] for a great read from a physicist turned quant.
|
| [1] https://www.amazon.com/Models-Behaving-Badly-Confusing-
| Illus...
| yccs27 wrote:
| If I understand it correctly, they randomly decided to try
| starting the car immediately or to go wrap the wires first.
| This absolutely demonstrates a causality, they just didn't
| cleanly separate the different factors which changed.
|
| Your comparison to placebo is very apt: Giving medication to
| a patient (vs not giving anything) _causes_ them to get
| better, but it might be the "giving a pill" part instead of
| the "ingesting medication" part that matters.
| Retric wrote:
| > The scientific method is about EXPERIMENTS
|
| IMO the moral of that story is that the S at the end of
| experiments is more than just repeating the same thing. Fixing the
| carburetor was the second and vastly more informative
| experiment, but your friend could have tried various
| alternatives to doing exactly what he was doing which would
| then uncover the time component.
|
| Science digs into problems, so the most important part of meta
| analysis, which is often ignored, is asking if the question
| even makes sense in a generic context. Just as crucial is
| narrowing down the effect size which may be dwarfed by random
| noise in some experiments etc.
| ramon156 wrote:
| Doesn't this heavily apply to building software as well? e.g.
| instead of spray and pray development we should get a better
| understanding of the model we're working with.
|
| If my parser gets nulls when it should be non-null then I
| first need to find where they could potentially even come from,
| aka get a better understanding of the model I'm working with.
| schneems wrote:
| I liken this to the experience of playing an old school
| fighting game in the era before the internet. You would be
| mashing buttons when suddenly your character would do a power
| move. Then spend the rest of the day trying to figure out how
| to reproduce it.
|
| If you could reproduce it, it would usually be intermittent.
| Eventually you would learn "when I X then my character will Y,
| but only sometimes."
|
| This is due to the real command being a subset or being a
| slight variation of what you thought was correct that you
| accidentally do sometimes.
|
| Even when it's ephemeral and seemingly random I still find
| these things valuable. It's better to be able to reproduce it
| sometimes instead of never. Answering the question "is doing
| this better than random?" (P95) can help you throw away a bad
| hypothesis. Most people don't realize that when they are
| providing evidence for causality they are competing with
| random. If they had instead done jumping jacks or said a prayer
| to the engine gods X times, then the correlation between the
| wires and the engine might suddenly seem much weaker.
|
| Once you have one hypothesis you can test it against others and
| I believe that's powerful. Provided it's done systematically and
| with at least a mild understanding of probability and error.
| Also, a hypothesis without a theory isn't really scientific. Why
| did your friend wrap the wires to begin with?
|
| It's okay to act at random until we find some effect, but then
| we also need to take the time to roll back (as you did) to ask
| "WHY did this happen?" In which case you can begin the process
| with a fresh hypothesis.
|
| I feel when we are taught the scientific method in elementary
| school it doesn't stick for most of us, even engineers.
| Especially non-engineer folks. It seems at first blush like
| some truisms strung together, but that simplicity hides very
| powerful capabilities and subtle edge cases.
| shadowgovt wrote:
| I think the most fascinating thing about the practice of
| science (and this is one of those things I wish I'd realized
| sooner when learning physics) is that experimental evidence
| often outstrips theory.
|
| There are all manner of observable, reproducible behaviors in
| nature that we barely have an explanation of. Those things
| remain observable and reproducible whether we can tell a tidy
| story about why they happen.
|
| In a very meaningful sense, the local healer applying poultices
| formulated from generations of experimentation is using science
| much as the medical doctor is (assuming, of course, they're
| taking notes, passing on the discoveries, and the results are
| reproducible). The doctor having tied their results to the
| "germ theory of medicine" vs. the local healer having tied
| theirs to "the Earth Mother's energies impregnate the wound" is
| an irrelevant distinction until (and unless) a need comes along
| to unify the theory to some other observable outcomes.
| atombender wrote:
| That's true for simple things. You don't need to know what
| the pharmacological mechanism of COX inhibitors is in order
| to prescribe Advil for a headache. But if you're a scientist
| trying to make a better Advil you probably need to know how
| it works.
|
| Doctors routinely prescribe medications that have no
| randomized clinical trials supporting their use. In those
| cases, clinical experience replaces trial data; they "know"
| the drugs work because all the patients have effectively been
| trial subjects over a span of decades.
| daft_pink wrote:
| After reading this article, it would be really interesting to
| have research indicating when correlation == causation and when
| correlation != causation for any given study, what the relevant
| factors are, and a tool so we can have a simple risk assessment
| of whether there is a link or not.
| BugsJustFindMe wrote:
| > _Now, a lot of these studies try to "control for" the problem I
| just stated - they say things like "We examined the effect of X
| and Y, while controlling for Z [e.g., how wealthy or educated the
| people/countries/whatever are]." How do they do this? The short
| answer is, well, hm, jeez._
|
| You mean they don't cluster the data into sets of overlapping
| bins where the controlled attribute has approximately the same
| value and then look for the presence of an XY relationship within
| the bins instead of across them?
| Sniffnoy wrote:
| No. What they actually do is that they do a regression with
| both X and Z among the independent variables, and then look
| solely at the coefficients coming from X. (As mentioned in the
| article.) Including Z as an independent variable alongside X
| "controls for" it in that now the coefficients for X are
| supposed to not include any effect from Z (since any Z effect
| should go in the Z coefficients). How well this works is
| something I don't know enough to answer.
|
| I don't actually know how the method you suggest compares in
| the limit of finer bins. It's possible it might only achieve
| similar results?
| KempyKolibri wrote:
| The smaller bins approach is adjustment via stratification.
|
| Good primer on both here:
| https://www.mynutritionscience.com/p/statistical-adjustment
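|
| A minimal Python sketch of both approaches on simulated data
| (numpy only, invented numbers), with Z confounding X and Y and
| a true X -> Y effect of 1.0; the stratified estimate approaches
| the regression-adjusted one as the bins get narrower:
|
|   import numpy as np
|
|   rng = np.random.default_rng(3)
|   n = 200_000
|
|   z = rng.normal(size=n)                       # confounder
|   x = z + rng.normal(size=n)
|   y = 1.0 * x + 2.0 * z + rng.normal(size=n)   # true effect: 1.0
|
|   def ols(design, target):
|       return np.linalg.lstsq(design, target, rcond=None)[0]
|
|   # (a) Regression adjustment: include Z as a covariate.
|   print(ols(np.column_stack([x, z]), y)[0])    # ~1.0
|
|   # (b) Stratification: slope within narrow Z bins, averaged.
|   edges = np.quantile(z, np.linspace(0, 1, 51))
|   bin_id = np.digitize(z, edges[1:-1])
|   slopes = [np.polyfit(x[bin_id == b], y[bin_id == b], 1)[0]
|             for b in np.unique(bin_id)]
|   print(np.mean(slopes))                       # ~1.0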
| einpoklum wrote:
| > _I have to say, this all was simultaneously more fascinating
| and less informative than I expected it would be going in._
|
| Direct quote from the author of this post and I couldn't agree
| more, particularly about the post itself.
| thenoblesunfish wrote:
| As with many things, just understand what you are trying to do.
|
| If you want to _predict_ Y and you know X, you can use data that
| tell you when they happen together.
|
| If you are trying to _cause_ (or prevent) Y, it's harder. If you
| can't do experiments (e.g. macroeconomics), it's borderline
| impossible.
| m3kw9 wrote:
| so say we have a scenario with data points where, when the X
| ball moves, the white ball also moves, but we're missing direct
| evidence of whether they actually hit each other or not. But they
| correlate in the limited sample. I think this is what most
| correlations are like: we do not see the direct atoms causing the
| causation, only a probability
| Chance-Device wrote:
| Well, of course the conclusion is that you don't know, Mr.
| Author. Because the very thing that triggered your interest in
| the subject of X and Y was that there was no clear cut consensus
| on the subject. If there were, you wouldn't have needed to do
| research at any level of depth at all, because those findings
| would be well known, and you'd have found them easily through a
| simple web search.
|
| Instead you were drawn to a topic which seemed ambiguous, which
| had multiple possible interpretations, multiple plausible angles,
| and on which nobody could agree. You didn't explicitly know these
| things starting out, but they were embedded in the very
| circumstances which caused you to investigate the subject
| further.
|
| Yes, determining causation is sometimes hard, but it is also
| sometimes very easy. However, very easy answers are not
| interesting ones, and so we find ourselves here.
| HPsquared wrote:
| Nice hypothesis, but how do we prove it?
| levocardia wrote:
| Seems very dismissive and unaware of recent advances in causal
| inference (cf other comments on Pearl). Putting "throw the
| kitchen sink at it" regression a la early 2000s nutritional
| research (which is indeed garbage in garbage out) in the same
| category as mendelian randomization, DAGs, IP weighting, and
| G-methods is misleading. I do worry that some of these EA types
| dive head-first into a random smattering of google scholar
| searches with no subject matter expertise, find a mess of
| studies, then conclude "ah well, better just trust my super
| rational bayesian priors!" instead of talking with a current
| subject matter expert. Research -- even observational research --
| has changed a lot since the days of "one-week observational study
| on a few dozen preschoolers."
|
| A more general observation: If your conclusion after reading a
| bunch of studies is "wow I really don't understand the fancy math
| they're doing here" then _usually_ you should do the work to
| understand that math before you conclude that it's all a load of
| crap. Not always, of course, but usually.
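|
| For the curious, a minimal Python sketch of one of those methods,
| inverse-probability weighting, on made-up data (the propensity
| score is known here because we simulated it; in a real analysis
| it would have to be estimated, which is where the assumptions
| live):
|
|   import numpy as np
|
|   rng = np.random.default_rng(4)
|   n = 200_000
|
|   # Confounder Z drives both treatment assignment and outcome.
|   z = rng.normal(size=n)
|   p = 1 / (1 + np.exp(-1.5 * z))              # true propensity score
|   t = rng.random(n) < p                       # treatment indicator
|   y = 1.0 * t + 2.0 * z + rng.normal(size=n)  # true effect: 1.0
|
|   # Naive difference in means is confounded upward.
|   print(y[t].mean() - y[~t].mean())
|
|   # Weight each unit by 1 / P(observed treatment | Z) to
|   # re-balance Z across groups, then compare weighted means.
|   w = np.where(t, 1 / p, 1 / (1 - p))
|   ate = (np.average(y[t], weights=w[t])
|          - np.average(y[~t], weights=w[~t]))
|   print(ate)                                   # ~1.0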
| Recursing wrote:
| > I do worry that some of these EA types dive head-first into a
| random smattering of google scholar searches with no subject
| matter expertise, find a mess of studies, then conclude "ah
| well, better just trust my super rational bayesian priors!"
| instead of talking with a current subject matter expert.
| Research -- even observational research -- has changed a lot
| since the days of "one-week observational study on a few dozen
| preschoolers."
|
| EA types spend a lot of time talking with subject matter
| experts, see e.g.
| https://www.givewell.org/international/technical/programs/vi...
| t_mann wrote:
| We don't even need to go into the 2000's. The author openly
| dismisses Generalized Method of Moments (published in 1982 by
| Lars Hansen [0]) as a 'complex mathematical technique' that
| he's 'guessing there are a lot of weird assumptions baked into'
| it, the main evidence being that he 'can't really follow what
| it's doing'. He also admits that he has no idea what control
| variables are or how to explain linear regression. It's
| completely pointless trying to discuss the subtleties of how
| certain statistical techniques try to address some of his exact
| concerns, it's clear that he has no interest in listening,
| won't understand and just take that as further evidence that
| it's all just BS. This post is a rant best described as
| Dunning-Kruger on steroids, I have no idea how this got 200
| points on HN and can just advise anyone who reads here first to
| spare themselves the read.
|
| [0] edit: Hansen was awarded the Nobel Memorial Prize in
| Economics in 2013 for GMM, not that that means it can't fail,
| but clearly a lot of people have found it useful.
| MichaelDickens wrote:
| I think you are significantly misrepresenting what the author
| said. He didn't say he has no idea what control variables
| are. What he said is:
|
| > The "controlling for" thing relies on a lot of subtle
| assumptions and can break in all kinds of weird ways.
| Here's[1] a technical explanation of some of the pitfalls;
| here's[2] a set of deconstructions of regressions that break
| in weird ways.
|
| [1] https://journals.plos.org/plosone/article?id=10.1371/jour
| nal...
|
| [2] https://www.cold-takes.com/phil-birnbaums-regression-
| analysi...
|
| To me this seems to demonstrate a stronger understanding of
| regression analysis than 90+% of scientists who use the
| technique.
| groby_b wrote:
| > He didn't say he has no idea what control variables are
|
| He did say exactly that.
|
| > They use a technique called regression analysis that, as
| far as I can determine, cannot be explained in a simple,
| intuitive way (especially not in terms of how it "controls
| for" confounders).
|
| That's about as /noideadog as you can get.
| roenxi wrote:
| That is unfair, he says...
|
| > "generalized method of moments" approaches to cross-country
| analysis (of e.g. the effectiveness of aid)
|
| Which is an entirely reasonable criticism. GMM is a complex
| mathematical process, wiki suggests [0] that it assumes data
| generated by a weakly stationary ergodic stochastic process
| of multivariate normal variables. There are a lot of ways
| that real-world data for aid distribution might be
| nonergodic, non-stationary, generally distributed or even
| deterministic!
|
| Verifying that a paper has used a parameter estimation
| technique like that properly is not a trivial task even for
| someone who understands GMM quite well. A reader can't be
| expected to follow what the implications are from reading a
| study; there is a strong element of trust.
|
| [0]
| https://en.wikipedia.org/wiki/Generalized_method_of_moments
| hn_throwaway_99 wrote:
| Yeah, I found this article to be annoying AF, because it seemed
| to fall into the same traps that he's accusing these study
| authors of making in the first place. It seemed by the end of
| it he was just trying to yell "correlation is not causation!"
| but in an _even smarter_ "I am very smart" sort of way.
|
| E.g. I certainly found myself agreeing with his points about
| observational studies, and there are plenty of real-world
| examples you can point to where experts have been led astray
| by these kinds of studies (e.g. alcohol consumption
| recommendations, egg/cholesterol recommendations, etc.)
|
| But when he talked about his reservations re "the wheat"
| studies, they seemed really weak to me and semi-bizarre:
|
| 1. Regarding "The paper doesn't make it easy to replicate its
| analysis." I mean, no shit Sherlock? The whole point is that it
| would be prohibitively expensive or unethical to carry out
| these real experiments, so we rely on these "natural"
| experiments to reach better conclusions.
|
| 2. "There was other weird stuff going on (e.g., changes in
| census data collection methods), during the strange historical
| event, so it's a little hard to generalize." First, this seems
| kind of hand-wavy (not all natural experiments have this
| issue), but second and more importantly, of course it's hard to
| "generalize" these kinds of experiments because their value in
| the first place is that they're trying to tease out one
| specific variable at a specific point in time.
|
| 3. The third bullet point just seemed like it could be
| summarized as "news flash, academics like to endlessly argue
| about shit."
|
| I think the fundamental problem when looking for "does X cause
| Y", is that in the real world these are complex systems: _lots_
| of other things cause Y too (or can reduce its chances), so
| you're only ever able to make some statistical statement, e.g. X
| makes Y Z% more likely, on average. But even then, suppose
| there is some thing that could make Y Z% more likely among some
| specific sub-population, but make it some percent _less_ likely
| in another sub-population (not an exact analogy but my
| understanding is that most people don't really need to worry
| about cholesterol in eggs, but a sub-population of people is
| very reactive to dietary cholesterol).
|
| Basically, it feels like the author is looking for some
| definitive, unambiguous "does X cause Y", but that's not really
| how complex systems work.
| fritzo wrote:
| I like this writing style with unbound variables. Reminds me of
| Maya Binyam's novel "Hangman", or Kafka's novels.
| dang wrote:
| Related. Others?
|
| _Does X cause Y? An in-depth evidence review_ -
| https://news.ycombinator.com/item?id=30613882 - March 2022 (3
| comments)
| groby_b wrote:
| "a technique called regression analysis that, as far as I can
| determine, cannot be explained in a simple, intuitive way
| (especially not in terms of how it "controls for" confounders)"
|
| That sounds very much like a skills issue. Because it can. You
| call out what you consider might be confounders as independent
| variables (covariates). You can then use regression analysis to
| estimate the individual contributions from each confounder, and
| control for them by essentially filtering out that contribution.
|
| Is reality harder than that? Yes. Much. The world of science
| isn't 9th grade math, sorry. You are not entitled to understand
| everything deeply with 5 minutes of mediocre effort.
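|
| A small Python sketch of that "filtering out" intuition (the
| Frisch-Waugh-Lovell idea; numpy only, invented numbers): regress
| X and Y each on the confounder Z, keep the residuals, and the
| slope between the residuals equals the multiple-regression
| coefficient on X.
|
|   import numpy as np
|
|   rng = np.random.default_rng(5)
|   n = 100_000
|
|   z = rng.normal(size=n)                       # confounder
|   x = z + rng.normal(size=n)
|   y = 0.7 * x + 2.0 * z + rng.normal(size=n)   # true X effect: 0.7
|
|   def ols(design, target):
|       return np.linalg.lstsq(design, target, rcond=None)[0]
|
|   # Multiple regression Y ~ X + Z: coefficient on X.
|   print(ols(np.column_stack([x, z]), y)[0])    # ~0.7
|
|   # Filter Z's contribution out of X and Y, then regress the
|   # residuals on each other: same answer.
|   x_res = x - ols(z[:, None], x)[0] * z
|   y_res = y - ols(z[:, None], y)[0] * z
|   print(ols(x_res[:, None], y_res)[0])         # ~0.7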
| stickfigure wrote:
| I can't believe nobody has posted the obvious XKCD of relevance
| yet:
|
| https://xkcd.com/552/
| skyde wrote:
| is it only me, or does this completely miss all the recent
| research on causal inference using causal graphical models?
___________________________________________________________________
(page generated 2025-02-14 23:01 UTC)