[HN Gopher] The Illusion of Causality in Charts
       ___________________________________________________________________
        
       The Illusion of Causality in Charts
        
       Author : skadamat
       Score  : 40 points
       Date   : 2025-05-28 17:49 UTC (3 days ago)
        
 (HTM) web link (filwd.substack.com)
 (TXT) w3m dump (filwd.substack.com)
        
       | NoTranslationL wrote:
       | This is a tough problem. I'm working on an app called Reflect [1]
       | that lets you analyze your life's data and the temptation to draw
       | conclusions from charts and correlations is strong. We added an
       | experiments feature that will let you form hypotheses and it will
       | even flag confounding variables if you track other metrics during
       | your experiments. Still trying to make it even better to avoid
       | drawing false conclusions.
       | 
       | [1] https://apps.apple.com/us/app/reflect-track-
       | anything/id64638...
        
       | gcanyon wrote:
       | The article seems more about the underlying causality, and less
       | about the charts' specific role in misleading. To pick one
       | example, the scatterplot chart isn't misleading: it's just a
       | humble chart doing _exactly_ what it's supposed to do: present
       | some data in a way that makes clear the relationship (not
       | necessarily causality!) between saturated fat consumption and
       | heart disease.
       | 
       | The underlying issue (which the article discusses to some extent)
       | is how confounding factors can make _the data_ misleading, or
       | allow the data to be misinterpreted.
       | 
       | To discuss "The Illusion of Causality in Charts" I'd want to
       | consider how one chart type vs. another is more susceptible to
       | misinterpretation/more misleading than another. I don't know if
       | that's actually true -- I haven't worked up some examples to
       | check -- but that's what I was hoping for here.
        
         | melagonster wrote:
         | A famous example is that a bar chart is always better than a
         | pie chart (see the advice on the pie chart page of the ggplot
         | website).
        
         | hammock wrote:
         | > the scatterplot chart isn't misleading
         | 
         | Even leaving out the data (which you rightly point out) you are
         | forced to choose what to plot on x and y, which by convention
         | will communicate independent and dependent variable (IV and
         | DV) respectively, whether you like it or not.
        
       | justonceokay wrote:
       | A pet issue I have that is in line with the "illusions" in the
       | article is what I might call the "bound by statistics" fallacy.
       | 
       | The shape of it is that there is a statistic about a
       | population, and then that statistic is used to describe a
       | member of that population. For example, a news story that
       | starts with "70% of
       | restaurants fail in their first year, so it's surprising that new
       | restaurant Pete's Pizza is opening their third location!"
       | 
       | But it's only surprising if you know absolutely nothing about
       | Pete and his business. Pete's a smart guy. He's running without
       | debt and has community and government ties. His aunt ran a pizza
       | business and gave him her recipes.
       | 
       | In a Bayesian way of thinking, the newscaster's statement only
       | makes sense if the only prior they have is the average success
       | rate of restaurants. But that is an admission that they know
       | nothing about the actual specifics of the current situation, or
       | the person they are talking about. Additionally, there is zero
       | causal relationship from group statistics to individual
       | outcomes; the causal relationship goes the other way. Pete's
       | success will slightly change that 70% figure, but the 70%
       | figure never bound Pete to be "likely to fail".
       | 
       | Other places I see the "bound by statistics" problem is in
       | healthcare, criminal proceedings, racist rhetoric, and identity
       | politics.
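       | 
       | As a rough sketch of that Bayesian point (the likelihood
       | ratios below are made-up numbers, purely for illustration):
       | 
       |   # Prior: the population base rate. Each piece of
       |   # Pete-specific evidence carries a hypothetical
       |   # likelihood ratio P(e|fail)/P(e|survive); values < 1
       |   # favor survival and pull the posterior down.
       |   def update(p_fail, lr_fail_vs_survive):
       |       odds = p_fail / (1 - p_fail)
       |       odds *= lr_fail_vs_survive
       |       return odds / (1 + odds)
       |   p = 0.70  # population failure rate as the prior
       |   evidence = [("debt-free", 0.5),
       |               ("family recipes", 0.7),
       |               ("community ties", 0.6)]
       |   for name, lr in evidence:
       |       p = update(p, lr)
       |       print(f"after {name}: P(fail) = {p:.2f}")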
        
         | zmgsabst wrote:
         | It's not even surprising without knowing about Pete: the
         | newspaper isn't going to publish the many that failed, so
         | their own selection bias is the dominant effect. That holds
         | even if we take the probability of the group to be the
         | probability of each individual, as with rolling dice ("1 in 6
         | people rolls a 4!").
         | 
         | Lots of them opened, of them 70% failed, and one who didn't
         | happened to be named Pete.
         | 
         | No more interesting than "Pete rolled a 4!" even though 83% of
         | people don't.
        
           | skybrian wrote:
           | If a newspaper only publishes surprising results, but it's
           | unsurprising when they appear in the newspaper, then you've
           | set up a paradox: a set that only contains nonmembers.
           | 
           | I don't think it's valid to define "surprising" in such a
           | self-referential way. When something unusual appears in the
           | news, that doesn't make it common. The probabilities are
           | different before and after applying a filter.
        
             | nemomarx wrote:
             | It doesn't make it a common outcome overall, but it makes
             | it a common outcome for it to appear in the newspaper,
             | right? It's just different meanings of "common".
        
               | skybrian wrote:
               | Yes, that's what I meant.
        
             | zmgsabst wrote:
             | I didn't say that, nor use any self-reference.
             | 
             | > When something unusual appears in the news, that doesn't
             | make it common.
             | 
             | This in particular isn't even close to what I said, which
             | was: rare events can be unsurprising in large datasets --
             | as is the case with both dice rolls and restaurants
             | succeeding.
        
               | skybrian wrote:
               | If rare events are unsurprising then I think you've
               | defined surprising events out of existence? I mean, sure,
                | someone will win the lottery, but it would be very
                | surprising if it happened to you.
               | 
                | I guess this is just a matter of point of view.
                | "Someone won the
               | lottery" and "I won the lottery" describe the same event
               | from very different perspectives.
        
             | yusina wrote:
             | If 99.9% fail and you see one that didn't, then would it be
             | surprising that it didn't? No! It's not 100%, so there
              | _must_ be some example. For that one it's _not_
              | surprising.
             | 
              | More precisely, it's not surprising that one exists. It
              | may be surprising that this particular one survived,
              | just as it would be surprising that it's my neighbor who
              | wins the lottery next week. But it's likely that
              | _somebody_ will, so if somebody has won, it won't be a
              | surprise that somebody did.
        
               | skybrian wrote:
               | Yes, it seems like surprise has to depend on your point
               | of view. There's an enormous difference between "someone
               | somewhere won the lottery" and "I won the lottery," even
               | if it's describing the same event.
               | 
               | It's a question of how many other possibilities are
               | considered similar to the one that happened. From a
                | zoomed-out perspective, one win is as good as any
                | other.
        
           | steveBK123 wrote:
            | People are also even worse with conditional
            | probabilities.
        
         | skybrian wrote:
         | People sometimes talk about this as "taking the outside view"
         | or "reference class forecasting." [1] It doesn't work when
         | there are important differences between the case being
         | considered and other members of the reference class. Nationwide
         | statistics are especially zoomed-out and there are going to be
         | a lot of people in the same country who are quite different
         | from you. Worldwide statistics are even worse.
         | 
         | It doesn't mean the statistics are wrong, though. If there is a
         | 70% chance of failure, there's also a 30% chance of success.
         | But it's subjective: use a different reference class and you'll
         | get a different number.
         | 
         | The opposite problem is also common: assuming that "this time
         | it's different" without considering the reasons why others have
         | failed.
         | 
         | The general issue is overconfidence and failure to consider
         | alternative scenarios. The future isn't known to us and it's
         | easy to fool yourself into thinking it _is_ known.
         | 
         | [1] https://en.m.wikipedia.org/wiki/Reference_class_forecasting
        
         | yusina wrote:
         | I agree with your description, but the pizza place case is even
         | simpler: statistics about a population don't guarantee the
         | outcome of any single sample. 70% fail in the first year. So,
         | 30% don't. Why would it be surprising to see one that didn't?
         | It would be surprising to see _none_ that didn't fail. So,
         | it's _expected_ to see lots that don't fail, Pete's being one
         | of them.
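         | 
         | A tiny back-of-the-envelope version of this, with an assumed
         | (made-up) number of openings per year:
         | 
         |   n_openings = 10_000     # hypothetical new restaurants
         |   p_fail = 0.70           # the quoted failure rate
         |   survivors = (1 - p_fail) * n_openings
         |   p_none_survive = p_fail ** n_openings
         |   print(survivors)        # 3000 expected survivors
         |   print(p_none_survive)   # ~1e-1549, effectively zero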
        
       | nwlotz wrote:
       | One of the best things I was forced to do in high school was read
       | "How to Lie with Statistics" by Darrell Huff. The book's a bit
       | dated and oversimplified in parts, but it gave me a healthy
       | skepticism that served me well in college and beyond.
       | 
       | I think the issues described in this piece, and by other
       | comments, are going to get much worse with the (dis)information
       | overload AI can provide. "Hey AI, plot thing I don't like A with
       | bad outcome B, and scale the axes so they look heavily
       | correlated". Then it's picked up on social media, a clout-chasing
       | public official sees it, and now it's used to make policy.
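       | 
       | For what it's worth, the axis trick is only a few lines. A
       | sketch with two synthetic series (all numbers invented) that
       | merely share a time trend, squeezed onto twin axes so they
       | appear to track each other:
       | 
       |   import numpy as np
       |   import matplotlib.pyplot as plt
       |   rng = np.random.default_rng(0)
       |   years = np.arange(2000, 2021)
       |   thing_a = 100 + 3 * (years - 2000) + rng.normal(0, 2, 21)
       |   outcome_b = 5 + 0.4 * (years - 2000) + rng.normal(0, 0.3, 21)
       |   fig, ax1 = plt.subplots()
       |   ax1.plot(years, thing_a, color="tab:blue")
       |   ax2 = ax1.twinx()        # second, independent y-axis
       |   ax2.plot(years, outcome_b, color="tab:red")
       |   ax2.set_ylim(4, 14)      # scaled so the lines overlap
       |   plt.show()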
        
         | hammock wrote:
         | It helps to internalize the concept that all statistics
         | (visualizations, but also literally any statistic with an
         | element of organization) are narrative, in a "the medium is
         | the message" kind of way.
         | 
         | Sometimes you are choosing the narrative consciously (I created
         | this chart to tell a story), and sometimes you are choosing it
         | unconsciously (I just want to make a scatter plot and see
         | what it shows - but you chose the x and y to plot, and you
         | chose the scatter plot vs. some other framework), and
         | sometimes it is
         | chosen for you (chart defaults for example, or north is up on a
         | map).
         | 
         | And it's not just charts. Statistics on the whole exist to
         | organize raw data. The very act of introducing organization
         | means you have a scheme, framework, or lens with which to do
         | so. You have to accept that and become conscious of it.
         | 
         | You cannot do anything as simple as report an average without
         | choosing which data to include and which type of average to
         | use. Or a histogram without choosing the bin sizes, and again,
         | the data to include.
         | 
         | This is all to say nothing of the way the data was produced in
         | the first place. (Separate topic)
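         | 
         | A small illustration of the average/histogram point with
         | synthetic (made-up) skewed data, where the choice of average
         | and of bin count each tell a different story:
         | 
         |   import numpy as np
         |   rng = np.random.default_rng(0)
         |   incomes = rng.lognormal(10.5, 0.8, 1000)  # skewed data
         |   print(np.mean(incomes))    # pulled up by the long tail
         |   print(np.median(incomes))  # the "typical" value, lower
         |   for bins in (5, 50):       # same data, two histograms
         |       counts, edges = np.histogram(incomes, bins=bins)
         |       print(bins, counts.max(), edges[counts.argmax()])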
        
       | djoldman wrote:
       | This is not a problem with charts, it is a problem with the
       | interpretation of charts.
       | 
       | 1. In general, humans are not trained to be skeptical of data
       | visualizations.
       | 
       | 2. Humans are hard-wired to find and act on patterns, illusory
       | or not, even at great expense.
       | 
       | Incidentally, I've found that avoiding the words "causes,"
       | "causality," and "causation" is almost always the right path or
       | at the least should be the rule as opposed to the exception. In
       | my experience, they rarely clarify and are almost always
       | overreach.
        
         | ninetyninenine wrote:
         | It's not a problem of interpretation or visualization or
         | charts. People are talking about it as if it's deception or
         | interpretation but the problem is deeper than this.
         | 
         | It's a fundamental problem of reality.
         | 
         | The nature of reality itself prevents us from determining
         | causality from observation; this includes looking at a chart.
         | 
         | If you observe two random variables, whether they correlate
         | or not, there is NO way to determine if one variable causes
         | the other through observation alone. Any causal claim drawn
         | from observation alone is in actuality only assumed. Note the
         | key phrase here: "through observation alone."
         | 
         | In order to determine if one thing "causes" another thing, you
         | have to insert yourself into the experiment. It needs to go
         | beyond observation.
         | 
         | The experimenter needs to turn off the cause and turn on the
         | cause in a random pattern and see whether that changes the
         | correlation. Only through this can one determine causation. If
         | you don't agree with this, think about it a bit.
         | 
         | Also note that this is how medicine is approved and
         | validated: you have to prove that the medicine/procedure
         | "causes" a better outcome, and the only way to do this is to
         | make giving and withholding the medicine part of the trial,
         | assigned at random.
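         | 
         | A minimal simulation of that point (all numbers invented): a
         | hidden confounder makes X and Y correlate even though X has
         | no effect on Y, and randomizing X makes that correlation
         | vanish:
         | 
         |   import numpy as np
         |   rng = np.random.default_rng(0)
         |   n = 100_000
         |   z = rng.normal(size=n)          # hidden confounder
         |   x_obs = z + rng.normal(size=n)  # X driven by Z
         |   y = z + rng.normal(size=n)      # Y driven by Z, not X
         |   x_rnd = rng.normal(size=n)      # X assigned at random
         |   print(np.corrcoef(x_obs, y)[0, 1])  # ~0.5, observation
         |   print(np.corrcoef(x_rnd, y)[0, 1])  # ~0.0, intervention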
        
       | qixv wrote:
       | You know, everyone that confuses correlation with causation ends
       | up dying.
        
       | singularity2001 wrote:
       | What's very fascinating in general is that causality is a
       | difficult mathematical concept which only a tiny fraction of
       | the population learns, yet everyone talks about it and "uses"
       | it.
       | 
       | We do have a pretty good intuition for it, but if you look at
       | the details and ask people what the difference between
       | correlation and causality is and how to distinguish them,
       | things get rabbit-holey pretty quickly.
        
       ___________________________________________________________________
       (page generated 2025-05-31 23:01 UTC)