[HN Gopher] The Illusion of Causality in Charts
___________________________________________________________________
The Illusion of Causality in Charts
Author : skadamat
Score : 40 points
Date : 2025-05-28 17:49 UTC (3 days ago)
(HTM) web link (filwd.substack.com)
(TXT) w3m dump (filwd.substack.com)
| NoTranslationL wrote:
| This is a tough problem. I'm working on an app called Reflect [1]
| that lets you analyze your life's data and the temptation to draw
| conclusions from charts and correlations is strong. We added an
| experiments feature that lets you form hypotheses, and it will
| even flag confounding variables if you track other metrics during
| your experiments. We're still working on making it better at
| keeping people from drawing false conclusions.
|
| [1] https://apps.apple.com/us/app/reflect-track-anything/id64638...
| gcanyon wrote:
| The article seems more about the underlying causality, and less
| about the charts' specific role in misleading. To pick one
| example, the scatterplot chart isn't misleading: it's just a
| humble chart doing _exactly_ what it's supposed to do: present
| some data in a way that makes clear the relationship (not
| necessarily causality!) between saturated fat consumption and
| heart disease.
|
| The underlying issue (which the article discusses to some extent)
| is how confounding factors can make _the data_ misleading, or allow
| the data to be misinterpreted.
|
| To discuss "The Illusion of Causality in Charts" I'd want to
| consider whether one chart type is more susceptible to
| misinterpretation (more misleading) than another. I don't know if
| that's actually true -- I haven't worked up some examples to
| check -- but that's what I was hoping for here.
| melagonster wrote:
| a famous example is that a bar chart is always better than a pie
| chart (see the advice on the pie chart page of the ggplot website).
| hammock wrote:
| > the scatterplot chart isn't misleading
|
| Even leaving out the data (which you rightly point out), you are
| forced to choose what to plot on x and y, which by convention
| will communicate the independent variable (IV) and dependent
| variable (DV) respectively, whether you like it or not.
| justonceokay wrote:
| A pet issue I have that is in line with the "illusions" in the
| article is what I might call the "bound by statistics" fallacy.
|
| The shape of it is that there is a statistic about a population, and
| then that statistic is used to describe a member of that
| population. For example, a news story that starts with "70% of
| restaurants fail in their first year, so it's surprising that new
| restaurant Pete's Pizza is opening their third location!"
|
| But it's only surprising if you know absolutely nothing about
| Pete and his business. Pete's a smart guy. He's running without
| debt and has community and government ties. His aunt ran a pizza
| business and gave him her recipes.
|
| In a Bayesian way of thinking, the newscaster's statement only
| makes sense if the only prior they have is the average success
| rate of restaurants. But that is an admission that they know
| nothing about the actual specifics of the current situation, or
| the person they are talking about. Additionally, there is zero
| causal relationship from group statistics to individual
| outcomes; the causal relationship goes the other way. Pete's
| success will slightly change that 70% metric, but the 70% metric
| never bound Pete to be "likely to fail".
|
| Other places I see the "bound by statistics" problem are
| healthcare, criminal proceedings, racist rhetoric, and identity
| politics.
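The Bayesian point can be made concrete with Bayes' rule in odds form: posterior odds = prior odds × likelihood ratio. The sketch starts from the 30% survival base rate implied by the 70% figure above. The likelihood ratios attached to Pete's circumstances are invented for illustration, and multiplying them together assumes the pieces of evidence are conditionally independent given the outcome.

```python
def update(prior_p, likelihood_ratios):
    """Bayes' rule in odds form: posterior odds = prior odds * product of likelihood ratios.
    Multiplying ratios assumes the evidence items are conditionally independent given the outcome."""
    odds = prior_p / (1 - prior_p)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)

prior_survive = 0.30  # base rate implied by "70% of restaurants fail in their first year"

# Hypothetical likelihood ratios: how many times more common each fact is
# among restaurants that survive year one than among those that fail.
evidence = {"runs without debt": 2.0, "community ties": 1.5, "family recipes": 1.5}

posterior = update(prior_survive, evidence.values())
print(f"prior {prior_survive:.2f} -> posterior {posterior:.2f}")
```

With no evidence the function returns the base rate unchanged; every ratio above 1 moves the estimate toward survival, which is the sense in which the 70% figure is only a starting point.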
| zmgsabst wrote:
| It's not even surprising without knowing about Pete: the
| newspaper isn't going to publish the many that failed, so their
| own selection bias is the dominant effect. That holds even if we
| take the probability for the group to be the probability for each
| individual, e.g., rolling dice ("1 in 6 people rolls a 4!").
|
| Lots of them opened, of them 70% failed, and one who didn't
| happened to be named Pete.
|
| No more interesting than "Pete rolled a 4!" even though 83% of
| people don't.
| skybrian wrote:
| If a newspaper only publishes surprising results, but it's
| unsurprising when they appear in the newspaper, then you've
| set up a paradox: a set that only contains nonmembers.
|
| I don't think it's valid to define "surprising" in such a
| self-referential way. When something unusual appears in the
| news, that doesn't make it common. The probabilities are
| different before and after applying a filter.
| nemomarx wrote:
| It doesn't make it a common outcome overall, but it makes
| it a common outcome for it to appear in the newspaper,
| right? It's just different meanings of "common".
| skybrian wrote:
| Yes, that's what I meant.
| zmgsabst wrote:
| I didn't say that, nor use any self-reference.
|
| > When something unusual appears in the news, that doesn't
| make it common.
|
| This in particular isn't even close to what I said, which
| was: rare events can be unsurprising in large datasets --
| as is the case with both dice rolls and restaurants
| succeeding.
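The "rare events can be unsurprising in large datasets" point is the complement rule: if each of n independent trials succeeds with probability p, the chance that at least one succeeds is 1 - (1 - p)^n. The sketch applies it to the dice example and to restaurant survival. The numbers of openings (200 and 10,000) are hypothetical, and treating restaurants as independent trials is an assumption.

```python
def p_at_least_one(p, n):
    """Chance that at least one of n independent trials succeeds, each with probability p."""
    return 1 - (1 - p) ** n

# Dice: any one person rolls a 4 with probability 1/6 ...
print("one person rolls a 4:", round(p_at_least_one(1 / 6, 1), 3))
# ... but among 50 people, someone almost certainly does.
print("someone of 50 rolls a 4:", round(p_at_least_one(1 / 6, 50), 5))

# Restaurants: 30% survive year one. With a hypothetical 200 openings,
# about 60 survivors are expected, and at least one is essentially certain.
n_open = 200
print("expected survivors:", round(n_open * 0.30))
print("at least one survivor:", p_at_least_one(0.30, n_open))

# Even with 99.9% failure, a hypothetical 10,000 openings very likely yield a survivor.
print("at least one of 10,000:", round(p_at_least_one(0.001, 10_000), 5))
```

The same arithmetic separates "someone won the lottery" (large n) from "I won the lottery" (n = 1).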
| skybrian wrote:
| If rare events are unsurprising then I think you've
| defined surprising events out of existence? I mean, sure,
| someone will win the lottery, but it would be very
| surprising if it happened to you.
|
| I guess this is just point of view. "Someone won the
| lottery" and "I won the lottery" describe the same event
| from very different perspectives.
| yusina wrote:
| If 99.9% fail and you see one that didn't, then would it be
| surprising that it didn't? No! It's not 100%, so there
| _must_ be some example. For that one it's _not_
| surprising.
|
| More precisely, it's not surprising that one exists. It may
| be surprising that this particular one survived, just as it
| would be surprising that it's my neighbor who wins the
| lottery next week. But it's likely that _somebody_ will, so
| if somebody has won, it won't be a surprise that somebody
| did.
| skybrian wrote:
| Yes, it seems like surprise has to depend on your point
| of view. There's an enormous difference between "someone
| somewhere won the lottery" and "I won the lottery," even
| if it's describing the same event.
|
| It's a question of how many other possibilities are
| considered similar to the one that happened. From a
| zoomed-out perspective, one win is as good as any
| other.
| steveBK123 wrote:
| People are even worse with conditional
| probabilities.
| skybrian wrote:
| People sometimes talk about this as "taking the outside view"
| or "reference class forecasting." [1] It doesn't work when
| there are important differences between the case being
| considered and other members of the reference class. Nationwide
| statistics are especially zoomed-out and there are going to be
| a lot of people in the same country who are quite different
| from you. Worldwide statistics are even worse.
|
| It doesn't mean the statistics are wrong, though. If there is a
| 70% chance of failure, there's also a 30% chance of success.
| But it's subjective: use a different reference class and you'll
| get a different number.
|
| The opposite problem is also common: assuming that "this time
| it's different" without considering the reasons why others have
| failed.
|
| The general issue is overconfidence and failure to consider
| alternative scenarios. The future isn't known to us and it's
| easy to fool yourself into thinking it _is_ known.
|
| [1] https://en.m.wikipedia.org/wiki/Reference_class_forecasting
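To illustrate "use a different reference class and you'll get a different number," the sketch computes a first-year failure rate over different subsets of the same records. The records are made up for illustration (chosen so the overall rate happens to be 70%); they are not real restaurant data. Narrower classes also contain fewer records, so their rates are noisier, which is the trade-off behind choosing a reference class.

```python
from dataclasses import dataclass

@dataclass
class Opening:
    cuisine: str
    has_debt: bool
    failed_year_one: bool

# Made-up records for illustration only (not real data); overall failure rate is 7/10.
records = [
    Opening("pizza", True, True), Opening("pizza", True, True),
    Opening("pizza", False, False), Opening("pizza", False, True),
    Opening("sushi", True, True), Opening("sushi", True, False),
    Opening("sushi", False, False),
    Opening("tacos", True, True), Opening("tacos", False, True),
    Opening("tacos", True, True),
]

def failure_rate(rows, in_class=lambda r: True):
    """Return (failure rate, class size) for the reference class picked out by `in_class`."""
    members = [r for r in rows if in_class(r)]
    return sum(r.failed_year_one for r in members) / len(members), len(members)

for label, pred in [
    ("all openings", lambda r: True),
    ("pizza places", lambda r: r.cuisine == "pizza"),
    ("debt-free openings", lambda r: not r.has_debt),
    ("debt-free pizza places", lambda r: r.cuisine == "pizza" and not r.has_debt),
]:
    rate, n = failure_rate(records, pred)
    print(f"{label:24s} failure rate {rate:.0%} (n={n})")
```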
| yusina wrote:
| I agree with your description, but the pizza place case is even
| simpler: Statistics don't guarantee future single sample
| properties. 70% fail in the first year. So, 30% don't. Why
| would it be surprising to see one that didn't? It would be
| surprising to see _none_ that didn't fail. So, it's _expected_
| to see lots that don't fail, Pete's being one of them.
| nwlotz wrote:
| One of the best things I was forced to do in high school was read
| "How to Lie with Statistics" by Darrell Huff. The book's a bit
| dated and oversimplified in parts, but it gave me a healthy
| skepticism that served me well in college and beyond.
|
| I think the issues described in this piece, and by other
| comments, are going to get much worse with the (dis)information
| overload AI can provide. "Hey AI, plot thing I don't like A with
| bad outcome B, and scale the axes so they look heavily
| correlated". Then it's picked up on social media, a clout-chasing
| public official sees it, and now it's used to make policy.
| hammock wrote:
| It helps to internalize the concept that all statistics
| (visualizations, but also literally any statistic with an
| element of organization) are narrative, in a "the medium is the
| message" kind of way.
|
| Sometimes you are choosing the narrative consciously (I created
| this chart to tell a story), and sometimes you are choosing it
| unconsciously (I just want to scatter plot and see what it
| shows - but you chose the x and y to plot, and you chose the
| scatter plot vs some other framework), and sometimes it is
| chosen for you (chart defaults for example, or north is up on a
| map).
|
| And it's not just charts. Statistics on the whole exist to
| organize raw data. The very act of introducing organization
| means you have a scheme, framework, or lens with which to do so.
| You have to accept that and become conscious of that.
|
| You cannot do anything as simple as report an average without
| choosing which data to include and which type of average to
| use. Or a histogram without choosing the bin sizes, and again,
| the data to include.
|
| This is all to say nothing of the way the data was produced in
| the first place. (Separate topic)
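The choices of average and bin size are easy to demonstrate: the same numbers give noticeably different "averages" and differently shaped histograms depending on the method and bin width. The income figures below are hypothetical, and `histogram` is a small helper written for this sketch rather than a library function.

```python
from statistics import geometric_mean, mean, median

# Hypothetical incomes in thousands, with one large outlier.
incomes = [30, 35, 40, 42, 45, 50, 55, 60, 400]

print("arithmetic mean:", round(mean(incomes), 1))
print("median:", median(incomes))
print("geometric mean:", round(geometric_mean(incomes), 1))

def histogram(data, bin_width, start=0):
    """Hypothetical helper: count values into bins [start + k*w, start + (k+1)*w), keyed by left edge."""
    counts = {}
    for v in data:
        left = start + ((v - start) // bin_width) * bin_width
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

print("bins of 10:", histogram(incomes, 10))
print("bins of 50:", histogram(incomes, 50))
```

The outlier pulls the arithmetic mean far above the median, and widening the bins merges the bulk of the data into a single bar; neither choice is wrong, but each tells a different story.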
| djoldman wrote:
| This is not a problem with charts, it is a problem with the
| interpretation of charts.
|
| 1. In general, humans are not trained to be skeptical of data
| visualizations.
|
| 2. Humans are hard-wired to find and act on patterns, illusory or
| not, at great expense.
|
| Incidentally, I've found that avoiding the words "causes,"
| "causality," and "causation" is almost always the right path or
| at the least should be the rule as opposed to the exception. In
| my experience, they rarely clarify and are almost always
| overreach.
| ninetyninenine wrote:
| It's not a problem of interpretation or visualization or
| charts. People are talking about it as if it's deception or
| interpretation but the problem is deeper than this.
|
| It's a fundamental problem of reality.
|
| The nature of reality itself prevents us from determining
| causality from observation, this includes looking at a chart.
|
| If you observe two variables, whether or not those random
| variables correlate, there is NO way to determine if one
| variable causes the other through observation alone. Any
| causal conclusion drawn from observation alone is in
| actuality only assumed. Note the key phrase here is: "through
| observation alone."
|
| In order to determine if one thing "causes" another thing, you
| have to insert yourself into the experiment. It needs to go
| beyond observation.
|
| The experimenter needs to turn off the cause and turn on the
| cause in a random pattern and see whether that changes the
| correlation. Only through this can one determine causation. If
| you don't agree with this, think about it a bit.
|
| Also note that this is how medicines are approved and
| validated: they have to prove that the medicine/procedure
| "causes" a better outcome, and the only way to do this is to
| actually make giving and withholding the medicine part of the
| trial.
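A toy simulation of the intervention argument: a hidden "health" variable drives both who takes a treatment and how well they do. Comparing treated and untreated people in observational data mixes the treatment effect with that hidden variable, while assigning treatment at random recovers it. The effect size, coefficients, and selection probabilities are assumptions chosen for illustration, not a model of any real trial.

```python
import random
from statistics import mean

TRUE_EFFECT = 1.0  # assumed effect of the treatment on the outcome, for this simulation only

def outcome(treated, health, rng):
    # The outcome depends on treatment AND on a hidden health score the analyst never sees.
    return TRUE_EFFECT * treated + 3.0 * health + rng.gauss(0, 1)

def diff_in_means(rows):
    """Average outcome of treated minus average outcome of untreated."""
    treated = [y for t, y in rows if t]
    control = [y for t, y in rows if not t]
    return mean(treated) - mean(control)

rng = random.Random(123)
people = [rng.gauss(0, 1) for _ in range(20_000)]  # hidden health scores

# Observation only: healthier people are more likely to choose the treatment themselves.
observed = []
for h in people:
    t = rng.random() < (0.8 if h > 0 else 0.2)
    observed.append((t, outcome(t, h, rng)))

# Intervention: the experimenter turns the cause on or off at random.
randomized = []
for h in people:
    t = rng.random() < 0.5
    randomized.append((t, outcome(t, h, rng)))

print("observational estimate:", round(diff_in_means(observed), 2))
print("randomized estimate:", round(diff_in_means(randomized), 2))
```

Both datasets show a treated group doing better, but only the randomized comparison lands near the assumed effect; the observational one also credits the treatment for the healthier people who chose it.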
| qixv wrote:
| You know, everyone that confuses correlation with causation ends
| up dying.
| singularity2001 wrote:
| what's very fascinating in general is that causality is a
| difficult mathematical concept which only a tiny fraction of the
| population learns, yet everyone is talking about it and "using it"
|
| we do have a pretty good intuition for it, but if you look at the
| details and ask people what the difference between correlation
| and causality is and how you distinguish them, things get
| rabbit-holey pretty quickly
___________________________________________________________________
(page generated 2025-05-31 23:01 UTC)