Causal Inference in Data Journalism
This presentation uses a study linking museum visits to longevity to examine how data journalism handles causal claims. It covers the theory of selection, randomized controlled trials, the ambiguity between causal and associational claims, and the idealized RCT as a benchmark. It also looks at how the media covers causality and why rigorous methodology matters when drawing causal conclusions.
Presentation Transcript
Causal Inference in Data Journalism
Laura Bronner, ETH Zürich
Two types of data journalism
- Descriptive: "Here's a thing that happens"
- Causal: "Here's the effect of a thing"
What is a causal claim?
Defining cause and effect. Here:
- Cause: museum visits (and other arts engagement)
- Effect: longer life
The counterfactual you're interested in. Here: the longevity of someone who goes to museums, compared to someone who doesn't.
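The counterfactual can be written in potential-outcomes notation. This is a standard formalization rather than something the slides spell out, and the symbols are assumptions introduced here: $D_i = 1$ if person $i$ goes to museums, $Y_i(1)$ is their longevity if they go, and $Y_i(0)$ is their longevity if they don't. Only one of the two is ever observed for any person.

```latex
% Individual effect and average treatment effect
\tau_i = Y_i(1) - Y_i(0), \qquad
\mathrm{ATE} = \mathbb{E}\left[ Y_i(1) - Y_i(0) \right]

% What a raw comparison of museum-goers and non-goers measures
\mathbb{E}[Y_i \mid D_i = 1] - \mathbb{E}[Y_i \mid D_i = 0]
  = \underbrace{\mathbb{E}[Y_i(1) - Y_i(0) \mid D_i = 1]}_{\text{effect on museum-goers}}
  + \underbrace{\mathbb{E}[Y_i(0) \mid D_i = 1] - \mathbb{E}[Y_i(0) \mid D_i = 0]}_{\text{selection bias}}
```

The second term is nonzero whenever museum-goers would have lived longer (or shorter) than non-goers even without the museum visits.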
Theory of selection
A qualitative argument about how and why someone ends up in one group (here: going to museums) as opposed to the other (here: not going to museums).
The benefit of randomized controlled trials:
- Controlled: you control the selection process, that is, how and why someone ends up in one group or the other.
- Randomized: the two groups are the same, on average, on dimensions you know about (age, health, etc.) but also on dimensions you might not be able to measure (curiosity, habits) or even know about (?, ?).
Randomized controlled trials: the gold standard
What this means: when you compare treatment and control, you know (a) why they took the treatment, and (b) that they are otherwise the same, on average.
Result: any difference between the two groups, beyond chance variation, is because of the treatment.
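A small simulation can make the randomization argument concrete. The sketch below is illustrative only: the population, the unmeasured trait `curiosity`, the baseline lifespan of 80, and the true effect of 2.0 years are all made-up assumptions, not figures from the study or the talk. It checks that a coin-flip assignment balances a trait nobody measured, and that the simple difference in means lands near the true effect.

```python
import random
import statistics


def simulate_rct(n=20000, true_effect=2.0, seed=0):
    """Simulate a coin-flip trial in a toy population (all numbers are made up).

    Returns (balance_gap, estimate): the treated-minus-control difference in
    the unmeasured trait, and the treated-minus-control difference in outcome.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        # Hypothetical unmeasured trait that also affects the outcome.
        curiosity = rng.gauss(0, 1)
        # Randomized: assignment is a coin flip that ignores curiosity.
        assigned = rng.random() < 0.5
        lifespan = (
            80
            + 3 * curiosity
            + (true_effect if assigned else 0.0)
            + rng.gauss(0, 5)
        )
        (treated if assigned else control).append((curiosity, lifespan))

    def mean_of(rows, column):
        return statistics.fmean(row[column] for row in rows)

    balance_gap = mean_of(treated, 0) - mean_of(control, 0)
    estimate = mean_of(treated, 1) - mean_of(control, 1)
    return balance_gap, estimate


if __name__ == "__main__":
    gap, est = simulate_rct()
    print(f"difference in unmeasured curiosity: {gap:.3f}")
    print(f"estimated effect (true effect is 2.0): {est:.3f}")
```

Because assignment never looks at `curiosity`, the two groups end up with nearly the same average curiosity, even though the analyst never records it.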
Causal-associational ambiguity (or: the "This is just an association! ;)" cop-out)
From the study: "this study was observational, and although we took a number of additional steps to try and test the assumptions of models, causality cannot be assumed."
Important: not just what researchers say, but:
- how they write about it
- how media covers it
- how readers understand it
What will people take from it? Are we interested in whether museum-goers live longer? Or whether going to the museum makes you live longer?
What's the idealized RCT here?
Make some people go to museums for 14 years, and prevent others from doing so?
Is the study's methodology approximating this by controlling for various socioeconomic/health variables? These aren't the same kinds of people.
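To see why controlling for measured variables may not approximate the idealized RCT, here is a toy observational simulation. Everything in it is a hypothetical assumption for illustration: a measured variable `wealth`, an unmeasured variable `curiosity`, a selection rule in which both make museum visits more likely, and a true effect of museum visits set to zero. None of it describes the actual study's data or methods. The sketch compares the raw gap with a gap adjusted for wealth by stratification.

```python
import random
import statistics
from collections import defaultdict


def simulate_observational(n=20000, true_effect=0.0, seed=0):
    """Toy observational data where people choose whether to go to museums.

    Returns (naive, adjusted): the raw goer-minus-non-goer lifespan gap, and
    the gap after stratifying on the one measured confounder (wealth).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        wealth = rng.gauss(0, 1)     # measured, so it can be controlled for
        curiosity = rng.gauss(0, 1)  # unmeasured, so it cannot
        # Selection: wealthier and more curious people go more often.
        goes = wealth + curiosity + rng.gauss(0, 1) > 0
        lifespan = (
            80
            + 2 * wealth
            + 2 * curiosity
            + (true_effect if goes else 0.0)
            + rng.gauss(0, 5)
        )
        rows.append((goes, wealth, lifespan))

    def gap(subset):
        goers = [life for goes, _, life in subset if goes]
        others = [life for goes, _, life in subset if not goes]
        if not goers or not others:
            return None  # no comparison possible in this subset
        return statistics.fmean(goers) - statistics.fmean(others)

    naive = gap(rows)

    # Adjust for wealth: compare goers and non-goers within narrow wealth
    # bands (width 0.5), then average the gaps weighted by band size.
    strata = defaultdict(list)
    for row in rows:
        strata[round(row[1] / 0.5)].append(row)
    weighted_sum, total_weight = 0.0, 0
    for subset in strata.values():
        stratum_gap = gap(subset)
        if stratum_gap is not None:
            weighted_sum += stratum_gap * len(subset)
            total_weight += len(subset)
    adjusted = weighted_sum / total_weight
    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = simulate_observational()
    print(f"naive gap in years (true effect is 0.0): {naive:.2f}")
    print(f"gap after adjusting for wealth:          {adjusted:.2f}")
```

In this toy world the adjustment shrinks the gap but does not remove it, because the people who choose to go still differ on the trait that was never measured.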
Causal claims are hard
Need to:
- identify cause and effect
- establish a counterfactual
- estimate the effect
My favorite example: NYT on running shoes
Do Vaporflys make runners faster?
Look at runners' Strava times: those who wear Vaporflys and those who don't.
But go further: try to understand why runners might pick Vaporflys.
Descriptive or causal?
Takeaway
- Descriptive: "Here's a thing that happens" (It's perfectly fine to be here!)
- Causal: "Here's the effect of a thing"
If your evidence is descriptive, focus on that!
- Don't make causal claims.
- And don't use words like "association" or "link" that imply causality without explicitly claiming it.
If your claim is causal, ask yourself:
- What's the counterfactual?
- What's the process by which some people received the treatment and others didn't?
- Is that plausible?