Causal Inference in Data Journalism


Laura Bronner

ETH Zürich

Two types of data journalism

Descriptive: “Here’s a thing that happens”

Causal: “Here’s the effect of a thing”

What is a causal claim?

Defining cause and effect

Here:

Cause: museum visits (and other arts engagement)

Effect: longer life

The counterfactual you’re interested in:

Here:

Longevity of someone who goes to museums, compared to someone who doesn’t
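One common way to write this counterfactual down is potential-outcomes notation. This is a standard formalization rather than something the slides themselves use, and the symbols are assumptions introduced here: $Y_i(1)$ and $Y_i(0)$ are person $i$'s longevity with and without museum visits, $D_i$ indicates whether they actually go, and $Y_i$ is the longevity we observe.

```latex
\begin{align}
\tau_i &= Y_i(1) - Y_i(0) \\
\underbrace{E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]}_{\text{observed difference}}
  &= \underbrace{E[Y_i(1) - Y_i(0) \mid D_i = 1]}_{\text{effect on museum-goers}}
   + \underbrace{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]}_{\text{selection bias}}
\end{align}
```

Only one of $Y_i(1)$ and $Y_i(0)$ is ever observed for a given person, so $\tau_i$ cannot be measured directly. The observed difference equals the effect on museum-goers only when the last term is zero, that is, when museum-goers and non-goers would have lived equally long without museum visits.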

Theory of selection

A qualitative argument about how/why someone ends up in one group (here: going to museums) as opposed to the other (here: not going to museums)

The benefit of randomized controlled trials:

Controlled:

You control the selection process: how and why someone ends up in one group or the other

Randomized:

The two groups are the same, on average: on dimensions you know about (age, health, etc.) but also on dimensions you might not be able to measure (curiosity, habits) or even know about (?, ?)

Randomized controlled trials: the “gold standard”

What this means:

When you compare treatment and control, you know (a) why they took the treatment, and (b) that they are otherwise the same.

Result:

Any difference between the two groups is because of the treatment
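A small simulation can make the contrast concrete. The sketch below is illustrative only: every number is made up, `health` is a hypothetical trait that raises both museum-going and longevity, and the true effect of museum visits is set to zero. It compares the gap you would see when people select themselves into museum-going with the gap under coin-flip assignment.

```python
import random
import statistics

def simulate(n=20000, true_effect=0.0, seed=0):
    """Compare a self-selected comparison with a randomized one.

    All quantities are invented for illustration; 'health' is a
    hypothetical trait that drives both museum-going and lifespan.
    """
    rng = random.Random(seed)
    self_selected = {True: [], False: []}
    randomized = {True: [], False: []}
    for _ in range(n):
        health = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        # Observational world: healthier people are more likely to go.
        goes = health + rng.gauss(0, 1) > 0
        self_selected[goes].append(80 + 3 * health + true_effect * goes + noise)
        # RCT world: a coin flip decides, independent of health.
        assigned = rng.random() < 0.5
        randomized[assigned].append(80 + 3 * health + true_effect * assigned + noise)

    def gap(groups):
        # Mean lifespan of the treated minus mean lifespan of the untreated.
        return statistics.mean(groups[True]) - statistics.mean(groups[False])

    return gap(self_selected), gap(randomized)

naive_gap, rct_gap = simulate()
print(f"self-selected gap: {naive_gap:.2f} years")
print(f"randomized gap:    {rct_gap:.2f} years")
```

Under these assumptions the self-selected comparison shows a gap of several years even though museum visits do nothing, while the randomized comparison stays close to zero, apart from chance variation.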

Causal-associational ambiguity

(or: the “This is just an association! ;)” cop-out)

From the study: “this study was observational, and although we took a number of additional steps to try and test the assumptions of models, causality cannot be assumed.”

Important: not just what researchers say, but:

…how they write about it…

…how media covers it…

…how readers understand it.

What will people take from it?

Are we interested in whether museum-goers live longer?

Or whether going to the museum makes you live longer?

What’s the idealized RCT here?

Make some people go to museums for 14 years, and prevent others from doing so?

Is the study’s methodology approximating this by controlling for various socioeconomic/health variables?

These aren’t the same kinds of people.
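To see why controlling for measured variables may not close the gap, here is a hedged sketch with invented numbers. It assumes one measured trait (`wealthy`) and one unmeasured trait (`curiosity`), both hypothetical, and a true effect of museum visits of exactly zero. "Controlling" is done in the simplest possible way, by comparing goers and non-goers within each wealth group; this is not the study's actual model.

```python
import random
import statistics

def adjusted_gap(n=40000, seed=1):
    """Raw vs. within-stratum gap when one confounder is unmeasured.

    Hypothetical setup: 'wealthy' is measured, 'curiosity' is not, and
    museum visits have zero true effect on lifespan.
    """
    rng = random.Random(seed)
    # strata[wealthy][goes] -> list of lifespans
    strata = {w: {True: [], False: []} for w in (True, False)}
    for _ in range(n):
        wealthy = rng.random() < 0.5
        curiosity = rng.gauss(0, 1)  # never observed by the analyst
        goes = (1.0 * wealthy + curiosity + rng.gauss(0, 1)) > 0.5
        lifespan = 78 + 4 * wealthy + 2 * curiosity + rng.gauss(0, 1)
        strata[wealthy][goes].append(lifespan)

    def gap(groups):
        return statistics.mean(groups[True]) - statistics.mean(groups[False])

    # Raw comparison: pool everyone, ignore wealth.
    everyone = {g: strata[True][g] + strata[False][g] for g in (True, False)}
    raw = gap(everyone)
    # Adjusted comparison: average the within-wealth-group gaps,
    # weighting each group by its size.
    sizes = {w: len(strata[w][True]) + len(strata[w][False]) for w in strata}
    adjusted = sum(gap(strata[w]) * sizes[w] for w in strata) / n
    return raw, adjusted

raw, adjusted = adjusted_gap()
print(f"raw gap:               {raw:.2f} years")
print(f"gap adjusted for wealth: {adjusted:.2f} years")
```

In this setup, adjusting for wealth shrinks the gap but leaves a sizeable difference driven entirely by curiosity, which is the sense in which museum-goers and non-goers can remain different kinds of people even after adjustment.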

Causal claims are hard

Need to:

identify cause and effect

establish a counterfactual

estimate the effect

My favorite example: NYT on Running Shoes

Do Vaporflys make runners faster?

Look at runners’ Strava times: those who wear Vaporflys and those who don’t

But go further: try to understand why runners might pick Vaporflys

Try to understand what might bias their estimates

Descriptive or causal?

Takeaway

It’s perfectly fine to be on the descriptive side (“Here’s a thing that happens”)!

If your evidence is descriptive, focus on that!

Don’t make causal claims.

And don’t use words like “association” or “link” that imply causality without explicitly claiming it.

If your claim is causal, ask yourself:

What’s the counterfactual?

What’s the process by which some people received the treatment and others didn’t?

Is that plausible?


Uploaded on Feb 27, 2025


