Causal Inference in Data Journalism

Laura Bronner
ETH Zürich
Two types of data journalism
  Descriptive: Here’s a thing that happens
  Causal: Here’s the effect of a thing
What is a causal claim?
Defining cause and effect
Here:
  Cause: museum visits (and other arts engagement)
  Effect: longer life
The counterfactual you’re interested in:
Here:
  Longevity of someone who goes to museums, compared to someone who doesn’t
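A minimal sketch of why the counterfactual matters, using two made-up people (the names, lifespans, and the `Person` class are all hypothetical, not taken from the study). Each person has two potential outcomes, but only one is ever observed, so a comparison between different people is not the same thing as the effect on any one person.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """One hypothetical person with both potential outcomes (years of life)."""
    lifespan_if_museum: float     # outcome if they go to museums
    lifespan_if_no_museum: float  # outcome if they do not
    goes_to_museums: bool         # what they actually do

    def individual_effect(self) -> float:
        # The causal effect: the same person compared under both conditions.
        return self.lifespan_if_museum - self.lifespan_if_no_museum

    def observed_lifespan(self) -> float:
        # Real data only ever reveals one of the two potential outcomes.
        if self.goes_to_museums:
            return self.lifespan_if_museum
        return self.lifespan_if_no_museum


# Invented numbers for illustration only.
anna = Person(lifespan_if_museum=84.0, lifespan_if_no_museum=83.0, goes_to_museums=True)
ben = Person(lifespan_if_museum=79.0, lifespan_if_no_museum=78.0, goes_to_museums=False)

# What can be computed from data: a difference between two different people.
observed_gap = anna.observed_lifespan() - ben.observed_lifespan()
# What the causal claim is about: a difference within the same person.
true_effect = anna.individual_effect()

print(f"observed gap: {observed_gap} years, true effect: {true_effect} year")
```

In this toy setup the observed gap is far larger than the effect on either person, because Anna and Ben differ in ways that have nothing to do with museums.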
Theory of selection
A qualitative argument about how/why someone ends up in one group (here: going to museums) as opposed to the other (here: not going to museums)
The benefit of randomized controlled trials:
- Controlled:
  - You control the selection process: how and why someone ends up in one group or the other
- Randomized:
  - The two groups are the same, on average – on dimensions you know about (age, health, etc.) but also on dimensions you might not be able to measure (curiosity, habits) or even know about (?, ?)
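The point about randomization can be checked with a small simulation. This is a sketch under invented assumptions: "curiosity" stands in for a trait nobody measured, and the rule that curious people self-select into museum-going is made up for illustration.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: one unmeasured trait per person.
curiosity = [rng.gauss(0.0, 1.0) for _ in range(20_000)]

# Self-selection (assumed): more curious people are more likely to go to museums.
self_selected = [c + rng.gauss(0.0, 1.0) > 0 for c in curiosity]

# Randomization: a coin flip decides, regardless of curiosity.
randomized = [rng.random() < 0.5 for _ in curiosity]


def gap(trait, in_group):
    """Mean trait among group members minus mean among non-members."""
    members = [t for t, g in zip(trait, in_group) if g]
    others = [t for t, g in zip(trait, in_group) if not g]
    return mean(members) - mean(others)


selection_gap = gap(curiosity, self_selected)
random_gap = gap(curiosity, randomized)

print(f"curiosity gap under self-selection: {selection_gap:.2f}")
print(f"curiosity gap under randomization:  {random_gap:.2f}")
```

Under self-selection the two groups differ clearly on the unmeasured trait; under the coin flip the gap is close to zero, even though the code never "controls for" curiosity.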
Randomized controlled trials: the “gold standard”
What this means: When you compare treatment and control, you know
(a) why they took the treatment, and
(b) that they are otherwise the same.
Result: Any difference between the two groups is because of the treatment
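A hedged sketch of that result: in a simulated randomized trial, a plain difference in group means lands close to the effect that was built into the data, up to sampling noise. The effect size, the lifespan formula, and the `curiosity` trait are all assumptions made for illustration.

```python
import random
from statistics import mean

rng = random.Random(1)
TRUE_EFFECT = 1.0  # assumed effect of the treatment in years (made up)

treated_outcomes, control_outcomes = [], []
for _ in range(20_000):
    curiosity = rng.gauss(0.0, 1.0)   # unmeasured trait that also affects lifespan
    treated = rng.random() < 0.5      # randomized assignment, unrelated to curiosity
    lifespan = 80.0 + 2.0 * curiosity + rng.gauss(0.0, 3.0)
    if treated:
        treated_outcomes.append(lifespan + TRUE_EFFECT)
    else:
        control_outcomes.append(lifespan)

# Difference in means between the treatment group and the control group.
estimate = mean(treated_outcomes) - mean(control_outcomes)

print(f"estimated effect: {estimate:.2f} years (built-in effect: {TRUE_EFFECT})")
```

The estimate is not exactly the built-in effect because any finite sample carries chance variation, but nothing systematic pushes it up or down.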
Causal-associational ambiguity
(or: the “This is just an association! ;)” cop-out)
From the study: “this study was observational, and although we took a number of additional steps to try and test the assumptions of models, causality cannot be assumed.”
Important: not just what researchers say, but:
  …how they write about it…
  …how media covers it…
  …how readers understand it.
What will people take from it?
  Are we interested in whether museum-goers live longer?
  Or whether going to the museum makes you live longer?
What’s the idealized RCT here?
Make some people go to museums for 14 years, and prevent others from doing so?
Is the study’s methodology approximating this by controlling for various socioeconomic/health variables?
These aren’t the same kinds of people.
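To see why controlling for measured variables may not close the gap to the idealized RCT, here is a simulation sketch. Everything in it is an assumption for illustration, not the study's data or method: museums are given zero true effect, `wealthy` plays the role of a measured socioeconomic variable, and `curiosity` is an unmeasured one.

```python
import random
from statistics import mean

rng = random.Random(2)
TRUE_EFFECT = 0.0  # assume museum visits do nothing at all to lifespan

rows = []
for _ in range(40_000):
    wealthy = rng.random() < 0.5      # measured: can be controlled for
    curiosity = rng.gauss(0.0, 1.0)   # unmeasured: cannot
    # Assumed selection: wealthier and more curious people visit museums more.
    visits = (1.0 if wealthy else 0.0) + curiosity + rng.gauss(0.0, 1.0) > 0.5
    lifespan = (78.0 + (3.0 if wealthy else 0.0) + 2.0 * curiosity
                + (TRUE_EFFECT if visits else 0.0) + rng.gauss(0.0, 3.0))
    rows.append((wealthy, visits, lifespan))


def diff(subset):
    """Mean lifespan of visitors minus mean lifespan of non-visitors."""
    visitors = mean(life for _, v, life in subset if v)
    non_visitors = mean(life for _, v, life in subset if not v)
    return visitors - non_visitors


naive = diff(rows)
# "Controlling for" wealth: compare within each wealth group, then average.
adjusted = mean(diff([r for r in rows if r[0] == w]) for w in (True, False))

print(f"naive difference:    {naive:.2f} years")
print(f"adjusted for wealth: {adjusted:.2f} years (built-in effect: {TRUE_EFFECT})")
```

Adjusting for wealth shrinks the difference, but a sizeable gap remains because visitors and non-visitors still differ on the trait that was never measured.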
Causal claims are hard
Need to
- identify cause and effect
- establish a counterfactual
- estimate the effect
My favorite example: NYT on Running Shoes
Do Vaporflys make runners faster?
Look at runners’ Strava times – those who wear Vaporflys and those who don’t
But go further: try to understand why runners might pick Vaporflys
Try to understand what might bias their estimates
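One way such a bias could arise, sketched with invented numbers (this is not the NYT's data or a description of its analysis): if faster runners are more likely to buy the shoes, comparing wearers with non-wearers mixes the shoe effect with ability. Comparing each buyer with their own earlier race is one possible check, assumed here purely for illustration.

```python
import random
from statistics import mean

rng = random.Random(3)
TRUE_SPEEDUP = 2.0  # assumed minutes the shoe takes off a marathon (made up)

times_with, times_without, switcher_changes = [], [], []
for _ in range(20_000):
    ability = rng.gauss(240.0, 25.0)  # runner's typical marathon time in minutes
    # Assumed selection: faster runners are more likely to buy the shoes.
    buys_shoes = ability + rng.gauss(0.0, 25.0) < 230.0
    race_1 = ability + rng.gauss(0.0, 5.0)  # everyone runs race 1 in ordinary shoes
    race_2 = ability + rng.gauss(0.0, 5.0) - (TRUE_SPEEDUP if buys_shoes else 0.0)
    if buys_shoes:
        times_with.append(race_2)
        switcher_changes.append(race_2 - race_1)  # same runner, before vs. after
    else:
        times_without.append(race_2)

# Compares different runners: shoe effect plus the ability gap between groups.
naive_gap = mean(times_without) - mean(times_with)
# Compares each buyer with themselves: ability cancels out in this toy setup.
within_runner = -mean(switcher_changes)

print(f"naive gap:     {naive_gap:.1f} minutes")
print(f"within-runner: {within_runner:.1f} minutes (built-in speedup: {TRUE_SPEEDUP})")
```

The within-runner comparison only works this cleanly because the simulation holds ability fixed between races. In real data, runners might buy new shoes just as they step up their training, which would bias this comparison too.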
Descriptive or causal?
Takeaway
It’s perfectly fine to be here, on the descriptive side!
If your evidence is descriptive, focus on that!
- Don’t make causal claims.
- And don’t use words like “association” or “link” that imply causality without explicitly claiming it.
If your claim is causal, ask yourself:
- What’s the counterfactual?
- What’s the process by which some people received the treatment and others didn’t?
- Is that plausible?
Uploaded on Feb 27, 2025



