[HN Gopher] Seven basic rules for causal inference
___________________________________________________________________
Seven basic rules for causal inference
Author : RafelMri
Score : 174 points
Date : 2024-08-16 07:14 UTC (3 days ago)
(HTM) web link (pedermisager.org)
(TXT) w3m dump (pedermisager.org)
| lordnacho wrote:
| This is brilliant. The whole causal inference thing is something
| I only came across after university. Either I missed it or it is
| a hole in the curriculum, which is surprising because it seems
| incredibly fundamental to our understanding of the world.
|
| The thing that made me read into it was a quite interesting
| sentence from lesswrong, saying that actually the common idea
| that correlation does not imply causation is wrong. Now it's not
| wrong in the face-value sense, it's wrong in the sense that
| actually you can use correlations to learn something about
| causation, and there turns out to be a whole field of study here.
| Vecr wrote:
| When did you go to university? The terminology here came from
| Pearl 2000, and it probably took years and years after that to
| diffuse out.
| lordnacho wrote:
| I thought Pearl was writing from 1984 onwards?
|
| I was at university around the millennium.
| janto wrote:
| Causality (2000) made the topic accessible (to students and
| lecturers) as a single book.
| jerf wrote:
| "correlation does not imply causation is wrong"
|
| That's a specific instance of a more general problem in the
| "logical fallacies", which is that most of them are written to
| be true in an absolutist, Aristotelian frame. It is true that
| if two things are correlated you can not therefore infer a
| rigidly 100% chance that there is a causative relationship
| there. And that's how Aristotelian logic works; everything is
| either True or False, and if there is anything else it is at
| most "Indeterminate"; there are absolutely, positively, no
| in-betweens or probabilities or anything else.
|
| However, consider the canonical "logical fallacy":
|
|     1. A -> B.
|     2. B.
|     3. Therefore, A.
|
| It is absolutely a logical fallacy in the Aristotelian sense.
| Just because B is there does not mean A is. However,
| _probabilistically_, if you are uncertain about A, the presence
| of B _can_ be used to _update_ your expected probability of A.
| After all, this is exactly what Bayes' rule is for!
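|
| A toy numerical sketch of that updating (the prior and the
| likelihoods here are made up purely for illustration):
|
|     # A -> B: B is more likely when A holds than when it doesn't
|     p_A       <- 0.10   # prior probability of A
|     p_B_if_A  <- 0.90   # P(B | A)
|     p_B_if_nA <- 0.30   # P(B | not A)
|
|     # Bayes' rule: observing B raises the probability of A,
|     # even though B does not logically entail A
|     p_B <- p_B_if_A * p_A + p_B_if_nA * (1 - p_A)
|     p_A_given_B <- p_B_if_A * p_A / p_B
|     p_A_given_B   # 0.25, up from the prior of 0.10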
|
| Many of the "fallacies" can be rewritten to be useful
| probabilistically, and aren't quite _as_ fallacious as their
| many internet devotees fancy.
|
| It is certainly reasonable to be "suspicious" about
| correlations. There often is a "there" there. Of course,
| whether you can ever figure out what the "there" is is quite a
| different question; https://gwern.net/everything really gets in
| your way. (I also recommend https://gwern.net/causality ).
|
| The upshot is basically 1. the glib dismissal that correlation
| != causation is, well, too glib and throws away too many things
| but 2. it is still true that you generally can't assume it
| either. The reality of the situation is exceedingly
| complicated.
| boxfire wrote:
| I liked the way Pearl phrased it originally. A calculus of
| anti-correlations implies causation. That makes the nature of
| the analysis clear and doesn't set off the classic mind's alarm
| bells.
| cubefox wrote:
| Unfortunately this calculus is exceedingly complicated and
| I haven't even seen a definition of "a causes b" in terms
| of this calculus. One problem is that Pearl and others make
| use of the notion of "d-separation". This allows for
| elegant proofs but is hard to understand. I once found a
| paper which replaced d-separation with equivalent but more
| intuitive assumptions about common causes, but I since
| forgot the source.
|
| By the way, there is also an alternative to causal graphs,
| namely "finite factored sets" by Scott Garrabrant. Probably
| more alternatives exist. Though I don't know more about
| (dis)advantages.
| geye1234 wrote:
| I don't disagree with the substance of your comment, but want
| to clarify something.
|
| Lesswrong promulgated a seriously misleading view of Aristotle
| as some fussy logician who never observed reality and was
| unaware of probability, chance, the unknown, and so on. It is
| entirely false. Aristotle repeats, again and again and again,
| that we can only seek the degree of certainty that is
| appropriate for a given subject matter. In the _Ethics_,
| perhaps his most-read work, he says this, or something like
| it, at least five times.
|
| I mention this because your association of the words
| "absolutist" and "Aristotelian" suggests your comment may
| have been influenced by this.
|
| ISTM that there are two entirely different discussions taking
| place here, not opposed to each other. "Aristotelian" logic
| tends to be more concerned with ontology -- measles causes
| spots, therefore if he has measles, then he will have spots.
| Whereas the question of probability is entirely
| epistemological -- we know he has spots, which may indicate
| he has measles, but given everything else we know about his
| history and situation this seems unlikely; let's investigate
| further. Both describe reality, and both are useful.
|
| So the fallacies are _entirely_ fallacious: I don't think
| your point gainsays this. But I agree that, _to us_, B may
| suggest A, and it is then that the question of probability
| comes into play.
|
| Aquinas, who was obviously greatly influenced by Aristotle,
| makes a similar point somewhere IIRC (I think in SCG when
| he's explaining why the ontological argument for God's
| existence fails), so it's not as if this is a new discovery.
| jerf wrote:
| I consider Aristotelian logic to be a category. It is the
| Newtonian physics of the logic world; if your fancier logic
| doesn't have some sort of correspondence principle to
| Aristotelian logic, something has probably gone wrong. (Or
| you're so far out in whacky logic land you've left
| correspondence to the real universe behind you. More power
| to you, as long as you are aware you've done that.) And
| like Newton, being the first to figure it out labels
| Aristotle as a certifiable genius.
|
| See also Euclid; the fact that his geometry turns out not
| to be The Geometry does not diminish what it means to have
| blazed that trail. And it took centuries for anyone to find
| an alternative; that's quite an accomplishment.
|
| If I have a backhanded criticism hiding in my comment, it
| actually isn't pointed at Aristotle, but at the school system
| that may teach some super basic logic at some point and
| accidentally teaches people that's all logic is, in much the
| same way stats class accidentally teaches people that
| everything is uniformly randomly distributed (because it makes
| the homework problems easier, which is legitimately true, but
| does reduce the education's value in the real world). That
| leaves people fairly vulnerable to the lists of fallacies they
| may find on the internet and unequipped to realize that they
| only apply in certain ways, in certain cases. I don't know that
| I've ever seen such a list where they point out that they have
| some validity in a probabilistic sense. There are also
| fallacies that are just plain fallacious even so, but I don't
| generally see them segmented off or anything.
| 082349872349872 wrote:
| > _...it took centuries for anyone to find an
| alternative..._
|
| Pedantry: s/centuries/millennia/ (roughly 21 of the
| former, 2 of the latter?)
|
| EDIT: does anyone remember the quote about problems
| patiently waiting for our understanding to improve?
| currymj wrote:
| Rigorous causal inference methods are just now starting to
| diffuse into the undergraduate curriculum, after gradually
| becoming part of the mainstream in a lot of social science
| fields. But this is just happening.
|
| Judea Pearl is in some respects a little grandiose, but I think
| he is right to express shock that it took almost a century
| to develop to this point, given how long the basic tools of
| probability and statistics have been fairly mature.
| currymj wrote:
| Rule 2 ("causation creates correlation") would be strongly
| disputed by a lot of people. It relies on the assumption of
| "faithfulness" which is not discussed until the bottom of the
| article.
|
| This is a very innocent sounding assumption but it's actually
| quite strong. In particular it may be violated when there are
| control systems or strategic agents as part of the system you
| want to study -- which is often the case for causal inference. In
| such scenarios (eg the famous thermostat example) you could have
| strong causal links which are invisible in the data.
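|
| A minimal R sketch of that kind of faithfulness violation
| (made-up "thermostat" dynamics: a controller h reacts to an
| outside disturbance o so that their effects on y cancel):
|
|     set.seed(1)
|     o <- rnorm(1000)                      # outside temperature
|     h <- -o + rnorm(1000, sd = 0.01)      # controller counteracts o
|     y <- o + h + rnorm(1000, sd = 0.01)   # indoor temperature
|
|     cor(o, y)   # ~0, even though o is a cause of y
|     cor(h, y)   # ~0, even though h is a cause of y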
| apwheele wrote:
| This was my thought as well.
|
| I don't like showing the scatterplots in these examples, as
| "correlation" _I think_ is more associated with the correlation
| coefficient than the more generic independence that the author
| means in this scenario. E.g. a U shape in the scatterplot may
| have a zero correlation coefficient but is not conditionally
| independent.
| bdjsiqoocwk wrote:
| > E.g. a U shape in the scatterplot may have a zero
| correlation coefficient but is not conditionally independent.
|
| Ok this is correct, but has nothing to do with causality.
| Whether or not two variables are correlated and whether or
| not they are independent, and when one does or doesn't imply
| the other, is a conversation that can be had without
| resorting to the concept of causality at all. And in fact
| that's how the subject is taught at an introductory level
| basically 100% of the time.
| cubefox wrote:
| > Ok this is correct, but has nothing to do with causality.
|
| It does. Dependence and independence have a lot to do with
| causation, as the article explains.
|
| > Whether or not two variables are correlated and whether
| or not they are independent, and when one does or doesn't
| imply the other, is a conversation that can be had without
| resorting to the concept of causality at all.
|
| Yes, but this is irrelevant. It's like saying "whether or
| not someone is married is a conversation that can be had
| without resorting to the concept of a bachelor at all".
|
| You can talk about (in)dependence without talking about
| causation, but you can't talk in detail about causation
| without talking about (in)dependence.
| ordu wrote:
| From the article:
|
| _> NB: Correlated does not mean linearly correlated_
|
| _> For simplicity, I have used linear correlations in all
| the example R code. In real life, however, the pattern of
| correlation/association/mutual information we should expect
| depends entirely on the functional form of the causal
| relationships involved._
| dash2 wrote:
| The standard mathematical definition of correlation means
| linear correlation. If you are talking about non-
| independence, it would be better to use that language. This
| early mistake made me think the author is not really an
| expert.
| cubefox wrote:
| What is an appropriate measure of (in)dependence though,
| if not Pearson correlation? Such that you feed a scatter
| plot into the formula for this measure, and if the
| measure returns 0 dependence, the variables are
| independent.
| currymj wrote:
| it's a tough problem.
|
| there are various schemes for estimating mutual
| information from samples. if you do that and mutual
| information is very close to zero, then I guess you can
| claim the two rvs are independent. But these estimators
| are pretty noisy and also often computationally
| frustrating (the ones I'm familiar with require doing a
| bunch of nearest-neighbor search between all the points).
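|
| A crude sketch of one such estimate (just binning the samples
| and using the empirical joint distribution; the estimators used
| in practice are more careful than this):
|
|     set.seed(1)
|     x <- runif(1e4, -1, 1)
|     y <- x^2 + rnorm(1e4, sd = 0.05)   # dependent, but cor(x, y) ~ 0
|
|     bx <- cut(x, 20); by <- cut(y, 20) # equal-width bins
|     p  <- table(bx, by) / length(x)    # empirical joint distribution
|     px <- rowSums(p); py <- colSums(p)
|     mi <- sum(p * log(p / outer(px, py)), na.rm = TRUE)
|     c(cor(x, y), mi)                   # correlation ~0, MI clearly > 0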
|
| I agree with the OP that it's better to say "non-
| independence" and avoid confusion; at the same time, I
| disagree that linear correlation is actually the standard
| definition. In many fields, especially those where nobody
| ever expects linear relationships, it is not and
| everybody uses "correlated" to mean "not independent".
| cubefox wrote:
| Yeah. It would be simpler to talk about causal graphs if
| the nodes represented only events instead of arbitrary
| variables, because independence between events is much
| simpler to determine: X and Y are independent iff P(X) *
| P(Y) = P(X and Y). For events there also exists a measure
| of dependence: The so-called odds ratio. It is not
| influenced by the marginal probabilities, unlike Pearson
| correlation (called "phi coefficient" for events) or
| pointwise mutual information. Of course in practice
| events are usually not a possible simplification.
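|
| A toy sketch with a made-up 2x2 table of counts:
|
|     #            Y     not Y
|     # X          40       10
|     # not X      20       30
|     tab <- matrix(c(40, 10, 20, 30), nrow = 2, byrow = TRUE)
|     p   <- tab / sum(tab)
|
|     # independence check: P(X)*P(Y) vs P(X and Y)
|     pX <- sum(p[1, ]); pY <- sum(p[, 1])
|     c(pX * pY, p[1, 1])      # 0.3 vs 0.4, so X and Y are dependent
|
|     # odds ratio as a measure of that dependence
|     (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])   # 6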
| rlpb wrote:
| That seems a bit harsh. People can independently become
| experts without being familiar with the terminology used
| by existing experts. Further, if intended for a non-
| expert audience, it may even be deliberate to loosen the
| definitions of terms used by experts and to be precise by
| leaving a note about that instead, which apparently is
| exactly what this author did.
| dash2 wrote:
| It's much better to use vocabulary consistently with what
| everyone else does in the field. Then you don't need to
| add footnotes correcting yourself. And if you are not
| familiar with what everyone else means by correlation,
| you're very unlikely to be an expert. This is not like
| that Indian mathematician who reinvented huge chunks of
| mathematics.
| rlpb wrote:
| > It's much better to use vocabulary consistently with
| what everyone else does in the field.
|
| Fine, but...
|
| > And if you are not familiar with what everyone else
| means by correlation, you're very unlikely to be an
| expert.
|
| Perhaps, but this is not relevant. If there's a problem
| with this work, then that problem can be criticized
| directly. There is no need, and it is not useful, to
| infer "expertise" by indirect means.
| currymj wrote:
| This is a separate issue and also a good point. Correlation
| sometimes means "Pearson's correlation coefficient" and
| sometimes means "anything but completely independent" and
| it's often unclear. In this context I mean the latter.
| BenoitP wrote:
| I'd argue you both could be right. Your comment could lead to a
| definition of intelligence. Organisms capable of causally
| influencing deterministic systems to their advantage can be
| marked as intelligent. The complexity of which would determine
| the degree of intelligence.
|
| Your point is great in that it also pinpoints the notion of
| agency scopes. In all the causal DAGs it feels like there are
| implicit regions: ones where we can influence or not, intervene
| or not, observe or not, where one is responsible or not.
|
| An intelligent agent is one capable of modelling a system,
| influencing it, and biasing it, such that it can reach and
| exploit an existing corner case of it. I talk about a corner
| case because of entropy and Murphy's law. For a given energy,
| there are many more unadvantageous states than advantageous
| ones. And the intelligence of a system is the complexity
| required to wield the entropy reduction of an energy source.
| joe_the_user wrote:
| Two problems with this. 1. There are many other ways that
| correlation doesn't imply causation. 2. The phenomenon the gp
| describes doesn't require broad intelligence but just
| reactiveness - a thermostat or a guided missile could have
| this.
| bubblyworld wrote:
| For anyone else who went down a rabbit hole - this paper
| describes the problem control systems present for these
| methodologies:
| https://www.sciencedirect.com/science/article/abs/pii/B97801...
|
| (paywalled link, but it's available on a well-known useful
| website)
| SpaceManNabs wrote:
| My fav way to intuit this is this example
|
| https://stats.stackexchange.com/questions/85363/simple-examp...
|
| Blew my mind the first time I saw it.
|
| Not the same definitions one to one (author specifically talks
| about correlation vs linear correlation) but same idea.
| kyllo wrote:
| Indeed, causally linked variables need not be correlated in
| observed data; bias in the opposite direction of the causal
| effect may approximately equal or exceed it in magnitude and
| "mask" the correlation. Chapter 1 of this popular causal
| inference book demonstrates this with a few examples:
| https://mixtape.scunning.com/01-introduction#do-not-confuse-...
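|
| A minimal R sketch of that masking (hypothetical variables: a
| confounder U pushes Y in the opposite direction of the true
| X -> Y effect):
|
|     set.seed(1)
|     n <- 1e5
|     U <- rnorm(n)
|     X <- U + rnorm(n)            # U -> X
|     Y <- X - 2 * U + rnorm(n)    # X -> Y (effect = 1), U -> Y (negative)
|
|     cor(X, Y)                    # ~0: the causal effect is masked
|     coef(lm(Y ~ X + U))["X"]     # ~1: recovered after adjusting for U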
| Vecr wrote:
| Are the assumptions "No spurious correlation", "Consistency", and
| "Exchangeability" ever actually true? If a dataset's big enough
| you should generally be able to find at least one weird
| correlation, and the others are limits of doing statistics in the
| real world.
| levocardia wrote:
| Some situations guarantee certain assumptions: Randomization,
| for example, guarantees exchangeability.
| shiandow wrote:
| This is missing my favourite rule.
|
| 0. The directions of all arrows not part of a collider are
| statistically meaningless.
| Vecr wrote:
| What's not part of a collider? Good luck with your memory in
| that case.
| 082349872349872 wrote:
| I'm guessing they mean that given a bunch of correlated nodes
| but no collider (in which case the causal graph must be a
| tree of some sort) you not only don't know if the tree be
| bushy or linear, you don't even know which node may be the
| root.
|
| (bushy trees, of which there are very many compared with
| linear ones, would be an instance of Gwern's model* of
| confounds being [much] more common than causality?)
|
| * https://news.ycombinator.com/item?id=41291636
| Vecr wrote:
| Right, but your memory functions as a collider; if there
| are literally no colliders anywhere, you by definition won't
| be able to remember anything.
| dkga wrote:
| I highly suggest this paper here for a more complete view of
| causality that nests do-calculus (at least in economics):
|
| Heckman, JJ and Pinto, R. (2024): "Econometric causality: The
| central role of thought experiments", Journal of Econometrics,
| v.243, n.1-2.
| fn-mote wrote:
| Why should you look this paper up? It argues that certain
| approaches from statistics and computer science are limited,
| and (essentially) that economists have a better approach. YMMV,
| but the criticisms are specific (whether or not you buy the
| "fix").
|
| From the paper:
|
| > Each of the recent approaches holds value for limited classes
| of problems. [...] The danger lies in the sole reliance on
| these tools, which eliminates serious consideration of
| important policy and interpretation questions. We highlight the
| flexibility and adaptability of the econometric approach to
| causality, contrasting it with the limitations of other causal
| frameworks.
| Rhapso wrote:
| I'm keeping this link, taking a backup, and handing it out
| whenever I can. It is succinct and effective.
|
| These are concepts I find myself constantly having to explain
| and teach, and they are critical to problem solving.
| 082349872349872 wrote:
| Can these seven be reduced to three basic rules?
|
| - controlling for a node increases correlation among pairs where
| both are ancestors
|
| - controlling for a node does not affect (the lack of)
| correlation among pairs where at least one is categorically
| unrelated (shares no ancestry with that node)
|
| - controlling for a node decreases correlation among pairs where
| both are related but at least one is not an ancestor
| raymondh wrote:
| Is there a simple R example for Rule 4?
| elsherbini wrote:
| It is sort of tautological:
|
|     # variable A has three causes: C1, C2, C3
|     C1 <- rnorm(100)
|     C2 <- rnorm(100)
|     C3 <- rnorm(100)
|     A <- ifelse(C1 + C2 + C3 > 1, 1, 0)
|     cor(A, C1)
|     cor(A, C2)
|     cor(A, C3)
|
|     # If we set the values of A ourselves...
|     A <- sample(c(1, 0), 100, replace = TRUE)
|
|     # then A no longer has correlation with its natural causes
|     cor(A, C1)
|     cor(A, C2)
|     cor(A, C3)
| abeppu wrote:
| At the bottom, the author mentions that by "correlation" they
| don't mean "linear correlation", but all their diagrams show the
| presence or absence of a clear linear correlation, and code
| examples use linear functions of random variables.
|
| They offhandedly say that "correlation" means "association" or
| "mutual information", so why not just do the whole post in terms
| of mutual information? I _think_ the main issue with that is just
| that some of these points become tautologies -- e.g. the first
| point, "independent variables have zero mutual information" ends
| up being just one implication of the definition of mutual
| information.
| jdhwosnhw wrote:
| This isn't a correction to your post, but a clarification for
| other readers: correlation implies dependence, but dependence
| does not imply correlation. By contrast, two variables share
| non-zero mutual information if and only if they are dependent.
| islewis wrote:
| Could you give some examples of dependence without
| correlation?
| xtacy wrote:
| You can check the example described here:
| https://stats.stackexchange.com/questions/644280/stable-
| viol...
|
| Judea Pearl's book also goes into the above in some detail,
| as to why faithfulness might be a reasonable assumption.
| abeppu wrote:
| A clear graphical set of illustrations is the bottom row in
| this famous set: https://en.wikipedia.org/wiki/Correlation#
| /media/File:Correl...
|
| They have clear dependence; if you imagine fixing
| ("conditioning") x at a particular value and looking at the
| distribution of y at that value, it's different from the
| overall distribution of y (and vice versa). But the
| familiar linear correlation coefficient wouldn't indicate
| anything about this relationship.
| kyllo wrote:
| > A sailor is sailing her boat across the lake on a windy
| day. As the wind blows, she counters by turning the rudder
| in such a way so as to exactly offset the force of the
| wind. Back and forth she moves the rudder, yet the boat
| follows a straight line across the lake. A kindhearted yet
| naive person with no knowledge of wind or boats might look
| at this woman and say, "Someone get this sailor a new
| rudder! Hers is broken!" He thinks this because he cannot
| see any relationship between the movement of the rudder and
| the direction of the boat.
|
| https://mixtape.scunning.com/01-introduction#do-not-
| confuse-...
| gweinberg wrote:
| Imagine your data points look like a U. There's no
| (linear) correlation between x and y, you are equally
| likely to have a high value of y when x is high or low. But
| low values of y are associated with medium values of x, and
| a high value of y means x will be very high or very low.
| crystal_revenge wrote:
| I mentioned it in another comment, but the most trivial
| example is:
|
| X ~ Unif(-1,1)
|
| Y = X^2
|
| In this case X and Y have a correlation of 0.
| westurner wrote:
| By that measure, all of these Spurious Correlations indicate
| _insignificant_ dependence, which isn't of utility:
| https://www.tylervigen.com/spurious-correlations
|
| Isn't it possible to contrive an example where a test of
| pairwise dependence causes the statistician to error by
| excluding relevant variables from tests of more complex
| relations?
|
| Trying to remember which of these factor both P(A|B) and
| P(B|A) into the test
| abeppu wrote:
| I think you're using the word "insignificant" in a possibly
| misleading or confusing way.
|
| I think in this context, the issue with the spurious
| correlations from that site is that they're all time series
| for overlapping periods. Of course, the people who
| collected these understood that time was an important
| causal factor in all these phenomena. In the graphical
| language of this post:
|
| T --> X_i
|
| T --> X_j
|
| Since T is a common cause to both, we should expect to see
| a mutual information between X_i, X_j. In the paradigm
| here, we could try to control for T and see if a
| relationship persists (i.e. perhaps in the same month,
| collect observations for X_i, X_j in each of a large number
| of locales), and get a signal on whether the shared
| dependence on time is the _only_ link.
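|
| A small R sketch of that idea (two made-up series that share
| nothing but a time trend):
|
|     set.seed(1)
|     t   <- 1:200
|     x_i <- 0.05 * t + rnorm(200)
|     x_j <- 0.03 * t + rnorm(200)
|
|     cor(x_i, x_j)                 # strong, but driven entirely by t
|     cor(resid(lm(x_i ~ t)),
|         resid(lm(x_j ~ t)))       # ~0 once t is controlled for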
| bjornsing wrote:
| I'd be more interested in those tautologies nonetheless. Much
| better than literally untrue statements that I have to somehow
| decipher.
| levocardia wrote:
| >Controlling for a collider leads to correlation
|
| This is a big one that most people are not aware of. Quite often,
| in economics, medicine, and epidemiology, you'll see researchers
| adjust for everything in their regression model: income, physical
| activity, education, alcohol consumption, BMI, ... without
| realizing that they could easily be inducing collider bias.
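|
| A minimal R sketch of the mechanism (made-up variables: two
| independent causes and the collider they both feed into):
|
|     set.seed(1)
|     n <- 1e5
|     x <- rnorm(n)                  # cause 1
|     y <- rnorm(n)                  # cause 2, independent of x
|     z <- x + y + rnorm(n)          # collider: caused by both
|
|     cor(x, y)                      # ~0: x and y really are unrelated
|     coef(lm(y ~ x + z))["x"]       # ~ -0.5: "adjusting" for the collider
|                                    # induces a spurious association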
|
| A much better, but rare, approach is to sit down with some
| subject matter experts and draft up a DAG - directed acyclic
| graph - that makes your assumptions about the causal structure of
| the problem explicit. Then determine what needs to be adjusted
| for in order to get a causal estimate of the effect. When you're
| explicit about your causal assumptions, it makes it easier for
| other researchers to propose different causal structures, and see
| if your results still hold up under alternative causal
| structures.
|
| The DAGitty tool [1] has some cool examples.
|
| [1] https://www.dagitty.net/dags.html
| kyllo wrote:
| Collider bias or "Berkson's Paradox" is a fun one; there are
| lots of examples of it in everyday life:
| https://en.wikipedia.org/wiki/Berkson%27s_paradox
| chrsig wrote:
| > Rule 8: Controlling for a causal descendant (partially)
| controls for the ancestor
|
| perhaps this is a quaint or wildly off base question, but an
| honest one, please forgive any ignorance:
|
| Isn't this essentially defining the partial derivative? Should one
| arrive at the calculus definition of a partial derivative by
| following this?
| bubblyworld wrote:
| You probably could if you interpret that sentence very
| creatively. But I think it's useful to remember that this is
| mathematics, and words like "control", "descendant" and
| "ancestor" have specific technical meanings (all defined in the
| article, I believe).
|
| The technical meaning of that sentence has to do with
| probability theory (probability distributions, correlation,
| conditionals), and not so much calculus (differentiable
| functions, limits, continuity).
| nomilk wrote:
| Humble reminder of how easy R is to use. Download and install R
| for your operating system: https://cran.r-project.org/bin/
|
| Start it in the terminal by typing: R
|
| Copy/paste the code from the article to see it run!
| curiousgal wrote:
| Can't use R without RStudio. It's so much better than the
| terminal.
| nomilk wrote:
| Agree RStudio makes R a dream, but isn't necessary for
| someone to run the code in the article =)
| throwway_278314 wrote:
| really??? I've developed in R for over a decade using two
| terminal windows. One runs vim, the other runs R. Keyboard
| shortcuts to send R code from vim to R.
|
| first google hit if you want to try this yourself:
| https://www.freecodecamp.org/news/turning-vim-into-an-r-
| ide-...
|
| Sooooooo much better than "notebooks". Hating on "notebooks"
| today.
| carlmr wrote:
| >Humble reminder of how easy R is to use.
|
| I had to learn R for a statistics course. This was a long time
| ago. But coming from a programming background I never found any
| other mainstream language as hard to grok as R.
|
| Has this become better? Is it just me that doesn't get it?
| incognito124 wrote:
| R is my least favorite language to use, thanks to the uni
| courses that force it
|
| https://github.com/ReeceGoding/Frustration-One-Year-With-R
| crystal_revenge wrote:
| > Independent variables are not correlated
|
| But it's important to remember that _dependent_ variables can
| also be _not correlated_. That is, _no correlation_ does _not_
| imply independence.
|
| Consider this trivial case:
|
| X ~ Uniform(-1,1)
|
| Y = X^2
|
| Cor(X,Y) = 0
|
| Despite the fact that Y's value is absolutely determined by the
| value of X.
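|
| Quick check in R:
|
|     set.seed(1)
|     X <- runif(1e5, -1, 1)
|     Y <- X^2
|     cor(X, Y)   # ~0 (exactly 0 in expectation), yet Y is a function of X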
| TheRealPomax wrote:
| This is also why it's important to _look at your plots_.
| Because simply looking at your scatter plot makes it really
| obvious what methods you _can't_ use, even if it doesn't
| really tell you anything about what you _should_ use.
| antognini wrote:
| The author is using "correlation" in a somewhat non-standard
| way. He isn't referring to linear correlation as you are, but
| any sort of nonzero mutual information between the two
| variables. So in his usage those two variables are "correlated"
| in your example.
___________________________________________________________________
(page generated 2024-08-19 23:00 UTC)