A QUESTION FROM WEDNESDAY…

“…Are there any comprehensive resources listing longitudinal studies from around the world?”

Leo has answered: “Yes: https://www.landscaping-longitudinal-research.com”
1
 
CAUSALITY
Statistics is not enough
nick.shryane@manchester.ac.uk
2
 
A TYPICAL RESEARCH PROBLEM
 
Research question: Is low education a cause of type II diabetes?

Key Variables
Outcome (Y) = Whether respondent has type II diabetes
Predictor (X) = Whether respondent has low education level
 
Possible Confounders (Z):
Mother diagnosed with type II diabetes
Mother’s genetic risk for type II diabetes
Respondent’s genetic risk for type II diabetes
Income during mother’s childhood
Income during respondent’s childhood
 
How do we decide which confounders, if any, to control for?
 
Wouldn’t it be safest to simply control for them all? What if we don’t have data for some?
3
 
JOIN IN
 
Go to slido.com

Enter code #4204494

0. “What approaches can we use for model selection?”
(https://admin.sli.do/event/xnA37S7AchTABXDQ8RKcYy/polls)
4
 
1. CORRELATION HELPS US WITH MODEL SELECTION?

Correlation matrix:
          x       y      z1      z2
x         1
y      0.37       1
z1     0.13    0.27       1
z2    -0.08   -0.01    0.54       1

The correlation between the hypothesised cause (x) and outcome (y) is 0.37.
The correlation between the outcome (y) and a presumed confounder (z1) is quite high, 0.27.
Therefore, z1 should be included in our analysis to control for this potential confounder, right?
What do you say?
5
 
JOIN IN
 
Go to slido.com
 
Enter code #4204494
 
1. “The variable should be included because of the decent correlation?”
6
 
1. CORRELATION SHOWS US POTENTIAL CONFOUNDERS, RIGHT?

[Figure: confounder DAG, x <- z1 -> y]

Simple confounders are direct predictors of both the predictor and the outcome variables in our presumed causal relationship.

Confounders create a correlation between x and y, not because x causes y but because both are caused by z1.

If there is no true causal relationship between x and y, then the correlation between them will be zero if we control for z1.
7
 
1. NO, CORRELATION CANNOT BE RELIED UPON FOR MODEL SELECTION

What if z1 is not a confounder?

What if, say, a drug (x) lowers blood pressure (z1), reducing cardiovascular disease (y)?
z1 is a consequence of x, not a confounder.
z1 is a mediator.

[Figure: two DAGs. Confounder (false): x <- z1 -> y. Mediator (true): x -> z1 -> y. The two are statistically identical.]
8
 
1. THE STATISTICAL CONSEQUENCES OF MEDIATORS AND CONFOUNDERS ARE THE SAME

Chain (mediator): x -> c -> y
c is part of a chain, e.g. deprivation (x) causes stress response (c) causes health outcomes (y).

Confounding / Fork (common cause): x <- c -> y
e.g. Good weather (c) predicts ice cream sales (x) and drownings (y).

In both cases, x and y are uncorrelated once we control for c.

If we erroneously control for a mediator, the estimated b coefficient for x will be zero, even though x does cause y (through c).
9
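The claim that a fork and a chain leave the same statistical footprint can be checked with a small simulation. The sketch below is illustrative only: the linear effects, unit-variance noise, sample size and seed are all assumptions, and the helper names `corr` and `partial_corr` are made up for this example. In both structures x and y are clearly correlated, and in both the partial correlation given c is close to zero, so the data alone cannot say which DAG is the true one.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def partial_corr(x, y, c):
    """Correlation of x and y after linearly controlling for c."""
    rxy, rxc, ryc = corr(x, y), corr(x, c), corr(y, c)
    return (rxy - rxc * ryc) / math.sqrt((1 - rxc ** 2) * (1 - ryc ** 2))


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 20000

# Fork: c causes both x and y; x has no effect on y.
c_fork = [rng.gauss(0, 1) for _ in range(n)]
x_fork = [c + rng.gauss(0, 1) for c in c_fork]
y_fork = [c + rng.gauss(0, 1) for c in c_fork]

# Chain: x causes c, and c causes y; x has a real (indirect) effect on y.
x_chain = [rng.gauss(0, 1) for _ in range(n)]
c_chain = [x + rng.gauss(0, 1) for x in x_chain]
y_chain = [c + rng.gauss(0, 1) for c in c_chain]

fork_raw = corr(x_fork, y_fork)
fork_partial = partial_corr(x_fork, y_fork, c_fork)
chain_raw = corr(x_chain, y_chain)
chain_partial = partial_corr(x_chain, y_chain, c_chain)

print(f"fork : r(x,y) = {fork_raw:.2f}, r(x,y | c) = {fork_partial:.2f}")
print(f"chain: r(x,y) = {chain_raw:.2f}, r(x,y | c) = {chain_partial:.2f}")
```

Controlling for c is the right thing to do in the fork (it removes a spurious association) and the wrong thing to do in the chain (it removes a real effect), yet the numbers look alike in both cases.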
 
1. REASONS WHY WE MIGHT OBSERVE VARIABLES X AND Y TO BE CORRELATED

Chain (mediator): x -> c -> y
c is part of a chain, e.g. deprivation (x) causes stress response (c) causes health outcomes (y).

Confounding / Fork (common cause): x <- c -> y
e.g. Good weather (c) predicts ice cream sales (x) and drownings (y).

Collider / Inverted fork (common outcome): x -> c <- y
e.g. academic ability (x) and wealth (y) predict college admission (c).
10
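The collider case can be sketched with the college admission example. Everything numeric here is an assumption made for illustration: ability and wealth are simulated as independent standard normal scores, and admission is a made-up threshold rule on their sum plus noise. In the full sample the two causes are uncorrelated; among admitted students only (that is, after conditioning on the collider) they become negatively correlated.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


rng = random.Random(2)  # fixed seed so the run is repeatable
n = 20000

# Two independent causes of admission (hypothetical standardised scores).
ability = [rng.gauss(0, 1) for _ in range(n)]
wealth = [rng.gauss(0, 1) for _ in range(n)]

# Collider: admission depends on both causes (assumed threshold rule).
admitted = [a + w + rng.gauss(0, 0.5) > 1.0 for a, w in zip(ability, wealth)]

corr_all = corr(ability, wealth)

# Conditioning on the collider: keep admitted students only.
ability_adm = [a for a, adm in zip(ability, admitted) if adm]
wealth_adm = [w for w, adm in zip(wealth, admitted) if adm]
corr_admitted = corr(ability_adm, wealth_adm)

print(f"everyone      : r(ability, wealth) = {corr_all:.2f}")
print(f"admitted only : r(ability, wealth) = {corr_admitted:.2f}")
```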
 
11
 
2. DO YOU APPROVE THE DRUG?
You are the Chief Medical Statistician. On this evidence, should the new drug be approved?
(The drug was offered to random (sex-matched) patients with the same diagnosis.)
 
12
 
2. DO YOU APPROVE THE DRUG?
Your assistant statistician rushes in with a new breakdown of the results by sex.
Do you want to change your decision?
 
13
 
2. DO YOU APPROVE THE DRUG?
If we don’t know the sex of the patient, the drug looks worse than ineffective.
If we know the sex of the patient, the drug is effective for both women and men!!
 
14
 
2. DO YOU APPROVE THE DRUG?
What is going on here? What are the important factors? Do any of them cause one another? 
15
 
JOIN IN
 
Go to slido.com
 
Enter code #4204494
 
3. “Do you approve the drug?”
16
 
17
 
2. YES. YOU APPROVE THE DRUG.
18
 
2. YES. YOU APPROVE THE DRUG.
Women are less likely to take the drug and the drug is less effective for women, but the drug is efficacious in both groups.
19
more
 
2. YES. YOU APPROVE THE DRUG.
Sex is a confounder.
 
Confounding is a causal concept, not a statistical one.
20
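A reversal of this kind (Simpson's paradox) is easy to reproduce with a small table. The counts below are invented for illustration and are not the figures shown in the lecture; they were chosen so that women take the drug less often and benefit less from it, while the drug helps in both sexes. The function names are likewise hypothetical. Pooling over sex makes the drug look harmful; averaging the sex-specific recovery rates over the sex distribution of the whole sample (the backdoor adjustment for a confounder) restores the right sign.

```python
# (recovered, total) for each sex and treatment arm.
# Invented counts for illustration only; not the data from the lecture.
counts = {
    ("men", "drug"): (162, 270),
    ("men", "no drug"): (40, 80),
    ("women", "drug"): (74, 80),
    ("women", "no drug"): (243, 270),
}
SEXES = ("men", "women")
ARMS = ("drug", "no drug")


def rate(sex, arm):
    """Recovery rate within one sex and one treatment arm."""
    recovered, total = counts[(sex, arm)]
    return recovered / total


def pooled_rate(arm):
    """Recovery rate in one arm when sex is ignored."""
    recovered = sum(counts[(s, arm)][0] for s in SEXES)
    total = sum(counts[(s, arm)][1] for s in SEXES)
    return recovered / total


def adjusted_rate(arm):
    """Sex-specific rates averaged over the sex mix of the whole sample."""
    n_all = sum(total for _, total in counts.values())
    result = 0.0
    for s in SEXES:
        n_sex = sum(counts[(s, a)][1] for a in ARMS)
        result += rate(s, arm) * n_sex / n_all
    return result


for s in SEXES:
    print(f"{s:6s}: drug {rate(s, 'drug'):.3f} vs no drug {rate(s, 'no drug'):.3f}")
print(f"pooled  : drug {pooled_rate('drug'):.3f} vs no drug {pooled_rate('no drug'):.3f}")
print(f"adjusted: drug {adjusted_rate('drug'):.3f} vs no drug {adjusted_rate('no drug'):.3f}")
```

Whether adjusting for sex is correct depends on the causal story (sex affects both uptake and recovery), not on anything visible in the table itself.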
 
21
 
3. HOW TO DECIDE WHICH CONFOUNDERS TO
CONTROL FOR
 
Research question: Is low education a cause of type II diabetes?
 
Key Variables
Outcome (Y) = Whether respondent has type II diabetes
Predictor (X) = Whether respondent has low education level
 
Possible Confounders (Z):
Mother diagnosed with type II diabetes
Mother’s genetic risk for type II diabetes
Respondent’s genetic risk for type II diabetes
Income during mother’s childhood
Income during respondent’s childhood
 
Which confounders, if any, shall we control for?
 
If we had data on all of them, would it just be safest to control for them all?
22
 
WE NEED TO SPECIFY OUR CAUSAL ASSUMPTIONS
23
 
H0 : NO CAUSAL EFFECT OF CHILD’S LOW EDUCATION ON DIABETES RISK. DIABETES IS CAUSED BY GENES AND POVERTY

[Figure: DAG with nodes Mother’s childhood poverty, Mother’s genetic diabetes II risk, Mother’s diabetes II, Child’s childhood poverty, Child’s genetic diabetes II risk, Child’s low education (x), Child’s diabetes II (y)]

This Directed Acyclic Graph (DAG) shows the hypothesised causal relationships among the variables.

It shows the null hypothesis for the research question: low education does not directly cause diabetes (i.e. there is no arrow between them).
24
 
H0 : NO CAUSAL EFFECT OF CHILD’S LOW EDUCATION ON DIABETES RISK. DIABETES IS CAUSED BY GENES AND POVERTY

[Figure: the H0 DAG]

We need to know this Directed Acyclic Graph (DAG) to know which variables to control for…
25
 
BACKDOOR PATHS

[Figure: the H0 DAG]

Any two variables can be spuriously correlated if we can draw a “backdoor” path between them.
26
 
BACKDOOR PATHS

[Figure: the H0 DAG]

Backdoor path: Linked arrows point into both the variables at the start and end of the path.

There is a backdoor path between diabetes statuses:
Mo diab <- Mo gen -> Chi gen -> Chi diab
27
 
BACKDOOR PATHS

[Figure: the H0 DAG]

This backdoor path means that mother’s and child’s diabetes statuses will be correlated: not causally, because diabetes causes diabetes, but spuriously, because of shared genetic influence.
28
 
BLOCKING BACKDOOR PATHS

[Figure: the H0 DAG]

How can we remove this spurious correlation?

We can “control” for variables on the backdoor path, depending on whether they are a:
1. Fork (confounder)
2. Chain (mediator)
3. Collider
29
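The three roles can be written down as a mechanical rule on a single path. The sketch below is one possible encoding, not a standard library API: a path is given as a list of node names plus the arrows between neighbours, and the function names `classify` and `path_is_blocked` are hypothetical. It applies the usual rule that a path is blocked by a controlled fork or chain variable, or by an uncontrolled collider (it ignores the refinement that controlling a descendant of a collider also opens the path).

```python
def classify(left_arrow, right_arrow):
    """Role of an interior node, given the arrows on either side of it."""
    if left_arrow == "->" and right_arrow == "<-":
        return "collider"  # ... -> node <- ...
    if left_arrow == "<-" and right_arrow == "->":
        return "fork"      # ... <- node -> ...
    return "chain"         # arrows pass straight through the node


def path_is_blocked(nodes, arrows, controlled):
    """True if at least one interior node blocks the path."""
    for i in range(1, len(nodes) - 1):
        kind = classify(arrows[i - 1], arrows[i])
        is_controlled = nodes[i] in controlled
        if kind == "collider" and not is_controlled:
            return True  # an uncontrolled collider blocks
        if kind != "collider" and is_controlled:
            return True  # a controlled fork or chain variable blocks
    return False


# Backdoor path between the two diabetes statuses.
nodes = ["Mo diab", "Mo gen", "Chi gen", "Chi diab"]
arrows = ["<-", "->", "->"]

roles = {nodes[i]: classify(arrows[i - 1], arrows[i]) for i in range(1, len(nodes) - 1)}
print(roles)
print("nothing controlled:", path_is_blocked(nodes, arrows, set()))
print("control Mo gen    :", path_is_blocked(nodes, arrows, {"Mo gen"}))
print("control Chi gen   :", path_is_blocked(nodes, arrows, {"Chi gen"}))
```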
 
CONTROL FOR A FORK VARIABLE TO BLOCK THE BACKDOOR PATH

[Figure: the H0 DAG]

Mother’s genetic risk is on a fork on the backdoor path:
Mo diab <- Mo gen -> Chi gen -> Chi diab

Fork: Var1 <- Var2 -> Var3

If we control for a fork, we block the spurious relation along the backdoor path.

(A fork can also be called a confounder)
30
 
CONTROL FOR A CHAIN VARIABLE TO BLOCK THE BACKDOOR PATH

[Figure: the H0 DAG]

Child’s genetic risk is in a chain on the backdoor path:
Mo diab <- Mo gen -> Chi gen -> Chi diab

Chain: Var1 -> Var2 -> Var3

If we control for a chain, we block the spurious relation along the backdoor path.

(A chain is also known as a mediator)
31
 
THE PATH ONLY NEEDS TO BE BLOCKED AT ONE PLACE

[Figure: the H0 DAG]

The path only needs to be blocked in one place.

In theory, we can control for either Mother’s or Child’s genetic risk to block the backdoor path.
32
 
IF WE DON’T HAVE GENETIC DATA, DIABETES STATUSES OF PARENT AND CHILD WILL BE SPURIOUSLY CORRELATED

[Figure: the H0 DAG]

If we don’t have genetic data for the mother or child, we can’t block this backdoor path. It will remain open, producing a spurious (non-causal) correlation between the variables at the ends of the backdoor path.

In this case, if we don’t know the genetic status, mother’s and child’s diabetes statuses will be correlated.
33
 
IS THERE A BACKDOOR PATH BETWEEN X AND Y?

[Figure: the H0 DAG]

Yes. It’s a longer version of the previous path:
Child Ed <- Mo Pov -> Mo diab <- Mo gen -> Chi gen -> Chi diab

If we don’t block this path, our estimate of the causal effect of x on y will be affected by the spurious correlation.
34
 
IS THERE A BACKDOOR PATH BETWEEN X AND Y?

[Figure: the H0 DAG]

If we don’t have genetic information, can we still block the path?

We could perhaps control for Mother’s diabetes II?
Child Ed <- Mo Pov -> Mo diab <- Mo gen -> Chi gen -> Chi diab
35
 
MOTHER’S DIABETES II STATUS IS A COLLIDER

[Figure: the H0 DAG]

Mother’s diabetes II status is a collider on the backdoor path: it has two arrows pointing into it (colliding).
Child Ed <- Mo Pov -> Mo diab <- Mo gen -> Chi gen -> Chi diab

A collider blocks the path as long as we do not control for it.

If we control for a collider we OPEN the backdoor path.

Why?
36
 
CONTROLLING FOR A COLLIDER OPENS THE BACKDOOR PATH

[Figure: Mo Pov -> Mo diab <- Mo gen]

Let’s look at just mother’s diabetes status.

The DAG defines two causes of diabetes: poverty and genetics.

If we control/stratify by diabetes, then for those with diabetes, if one cause is absent, the other is more likely. E.g. if I have diabetes but no genetic risk factors, it becomes more likely that I had poverty in childhood.
37
 
CONTROLLING FOR A COLLIDER OPENS THE BACKDOOR PATH

[Figure: Mo Pov -> Mo diab <- Mo gen]

The variables at the ends of the backdoor path will become conditionally correlated, conditional on knowing the status of the collider variable.

So, the path is OPEN if we control for the collider.
The path is closed if we DON’T control for the collider.
38
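The "if one cause is absent, the other is more likely" argument can be made exact with a tiny probability table. All the probabilities below are hypothetical values picked for illustration, and the function names are made up. Poverty and genetic risk are independent by construction, so knowing the genetic status alone tells us nothing about poverty; once we also know the mother has diabetes, the absence of genetic risk makes poverty much more probable.

```python
# Hypothetical probabilities, chosen only to illustrate the mechanism.
P_POV = 0.3  # P(mother had childhood poverty)
P_GEN = 0.3  # P(mother has genetic risk), independent of poverty
P_DIAB = {   # P(mother has diabetes | poverty, genetic risk)
    (0, 0): 0.05,
    (1, 0): 0.30,
    (0, 1): 0.30,
    (1, 1): 0.60,
}


def joint(pov, gen, diab):
    """Joint probability of one combination of the three binary variables."""
    p = (P_POV if pov else 1 - P_POV) * (P_GEN if gen else 1 - P_GEN)
    p_d = P_DIAB[(pov, gen)]
    return p * (p_d if diab else 1 - p_d)


def prob_poverty(gen, diab=None):
    """P(poverty | gen), or P(poverty | gen, diab) when diab is given."""
    diab_values = (0, 1) if diab is None else (diab,)
    numerator = sum(joint(1, gen, d) for d in diab_values)
    denominator = sum(joint(p, gen, d) for p in (0, 1) for d in diab_values)
    return numerator / denominator


# Collider not controlled: genetic status carries no information about poverty.
print("P(pov | no gen risk)           =", round(prob_poverty(gen=0), 3))
print("P(pov | gen risk)              =", round(prob_poverty(gen=1), 3))

# Collider controlled (diabetes known): the two causes become dependent.
print("P(pov | diabetes, no gen risk) =", round(prob_poverty(gen=0, diab=1), 3))
print("P(pov | diabetes, gen risk)    =", round(prob_poverty(gen=1, diab=1), 3))
```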
 
H0 : DIABETES IS CAUSED BY GENES AND POVERTY. NO CAUSAL EFFECT OF CHILD’S LOW EDUCATION ON DIABETES RISK

[Figure: the H0 DAG]

So, even if we don’t have genetic data, according to this DAG we can block this backdoor path by NOT controlling for Mother’s diabetes II status.
Child Ed <- Mo Pov -> Mo diab <- Mo gen -> Chi gen -> Chi diab
39
 
WHICH CONFOUNDERS SHALL WE CONTROL FOR?
 
Research question: Is low education a cause of type II diabetes?
 
Key Variables
Outcome (Y) = Whether respondent has type II diabetes
Predictor (X) = Whether respondent has low education level

Possible Confounders (Z):
Mother diagnosed with type II diabetes
Mother’s genetic risk for type II diabetes
Respondent’s genetic risk for type II diabetes
Income during mother’s childhood
Income during respondent’s childhood

If the DAG is true, we need not control for anything to block this backdoor path.
BUT…
40
 
THERE WILL BE MORE THAN ONE BACKDOOR PATH…

[Figure: the H0 DAG]

There are other backdoor paths between predictor (x) and outcome (y), e.g.
Child Ed <- Chi pov -> Chi diab

Child’s childhood poverty is a fork and so, if controlled, will block this spurious path.
41
 
THERE WILL BE MORE THAN ONE BACKDOOR PATH…

[Figure: the H0 DAG]

There are other backdoor paths between predictor (x) and outcome (y), e.g.
Child Ed <- Mo pov -> Chi pov -> Chi diab

This path is also blocked by Child’s childhood poverty.
42
 
WHICH CONFOUNDERS SHALL WE CONTROL FOR?
 
Research question: Is low education a cause of type II diabetes?
 
Key Variables
Outcome (Y) = Whether respondent has type II diabetes
Predictor (X) = Whether respondent has low education level

Possible Confounders (Z):
Mother diagnosed with type II diabetes
Mother’s genetic risk for type II diabetes
Respondent’s genetic risk for type II diabetes
Income during mother’s childhood
Income during respondent’s childhood

If the DAG is correct, we only need to control for the child’s poverty in childhood, and NOT for mother’s diabetes status.
43
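The conclusion can be checked by enumerating every path between x and y and testing each one against a candidate control set. The sketch below is a hand-rolled check, not a library routine, and the edge list is an assumption: it is my reading of the arrows implied by the paths quoted in the slides (including an arrow from mother's to child's childhood poverty), so treat it as hypothetical. Under those edges, controlling for child's poverty alone leaves no open path, while adding mother's diabetes status re-opens one.

```python
# Assumed edges (cause, effect), read off the paths quoted in the slides.
EDGES = {
    ("Mo gen", "Mo diab"), ("Mo gen", "Chi gen"), ("Chi gen", "Chi diab"),
    ("Mo pov", "Mo diab"), ("Mo pov", "Child Ed"), ("Mo pov", "Chi pov"),
    ("Chi pov", "Child Ed"), ("Chi pov", "Chi diab"),
}
NODES = {node for edge in EDGES for node in edge}
NEIGHBOURS = {n: sorted({b for a, b in EDGES if a == n} | {a for a, b in EDGES if b == n})
              for n in NODES}


def descendants(node):
    """All nodes reachable from `node` by following arrows forwards."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for cause, effect in EDGES:
            if cause == current and effect not in seen:
                seen.add(effect)
                stack.append(effect)
    return seen


def all_paths(start, end, path=None):
    """Every path from start to end that visits no node twice, ignoring arrow direction."""
    path = [start] if path is None else path
    if path[-1] == end:
        yield path
        return
    for nxt in NEIGHBOURS[path[-1]]:
        if nxt not in path:
            yield from all_paths(start, end, path + [nxt])


def is_blocked(path, controlled):
    """Apply the fork / chain / collider rules to every interior node."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        is_collider = (prev, node) in EDGES and (nxt, node) in EDGES
        if is_collider:
            # A collider blocks unless it, or one of its descendants, is controlled.
            if node not in controlled and not (descendants(node) & controlled):
                return True
        elif node in controlled:
            return True  # a controlled fork or chain variable blocks
    return False


def open_paths(x, y, controlled):
    controlled = set(controlled)
    return [p for p in all_paths(x, y) if not is_blocked(p, controlled)]


for control_set in [set(), {"Chi pov"}, {"Chi pov", "Mo diab"}]:
    paths = open_paths("Child Ed", "Chi diab", control_set)
    print(sorted(control_set), "->", len(paths), "open path(s)")
    for p in paths:
        print("   ", " - ".join(p))
```

Because the DAG encodes H0 (no arrow from education to diabetes), every open path found here is a source of spurious association rather than a causal effect.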
 
SUMMARY
 
Statistics and data are not enough.
 
We need to make causal assumptions.
 
A DAG is just a set of causal assumptions.
 
The DAG can then guide model building and comparison.
44
 
SUMMARY
 
I’ve covered some main points but over-simplified others
 
There are many topics we didn’t cover, e.g.:
Mendelian Randomization
Longitudinal designs, e.g. longitudinal mediation
Latent variables (i.e. unmeasured causes)
Multilevel causation (e.g. pupils affecting teachers, affecting pupils)
45
 
LEARN MORE -> START HERE
46
 
LEARN MORE -> WORKED EXAMPLES
47
 
LEARN MORE -> DIG DEEP WITH JUDEA PEARL
48
 
2. RANDOMIZATION
 
Variables that have been properly randomized do not have any causes: there are no arrows going into them in the DAG.

Randomization of patients to Drug means that person-specific qualities (e.g. sex) cannot be causes of between-patient variance in Drug uptake.
49
 
2. MENDELIAN RANDOMIZATION

[Figure: DAG with HMGCR -> LDL -> CVD, plus unmeasured confounders (u) of LDL and CVD]

Research question: Does Low Density Lipoprotein (LDL) blood cholesterol cause CardioVascular Disease (CVD)?

Problem: unmeasured confounders (u) of LDL and CVD.

Solution: HMGCR, gene variants involved in LDL production.
A person’s HMGCR variants are randomized at conception.
HMGCR status is independent of the unmeasured confounders (u) of LDL and CVD; we can use it as an ‘instrument’ variable.
50
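How an instrument gets around an unmeasured confounder can be sketched with simulated data. Nothing here is real genetic or clinical data: the variable names, effect sizes, the continuous "CVD risk score" and the true effect of 0.5 are all assumptions, and the ratio (Wald) estimator is used as one simple instrumental-variable estimator rather than as the method of any particular study. The naive regression slope of the outcome on LDL is biased upwards by u, while the ratio of the variant-outcome covariance to the variant-LDL covariance recovers a value close to the true effect.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n


rng = random.Random(3)  # fixed seed so the run is repeatable
n = 20000
TRUE_EFFECT = 0.5       # assumed causal effect of LDL on the risk score

variant, ldl, cvd = [], [], []
for _ in range(n):
    g = 1 if rng.random() < 0.5 else 0  # variant assigned independently of u
    u = rng.gauss(0, 1)                 # unmeasured confounder
    x = 1.0 * g + u + rng.gauss(0, 1)   # LDL depends on the variant and on u
    y = TRUE_EFFECT * x + u + rng.gauss(0, 1)  # risk score depends on LDL and on u
    variant.append(g)
    ldl.append(x)
    cvd.append(y)

# Naive estimate: regression slope of the outcome on LDL, confounded by u.
naive_slope = cov(ldl, cvd) / cov(ldl, ldl)

# Instrumental-variable (Wald ratio) estimate using the variant.
iv_estimate = cov(variant, cvd) / cov(variant, ldl)

print(f"true effect : {TRUE_EFFECT}")
print(f"naive slope : {naive_slope:.2f}")
print(f"IV estimate : {iv_estimate:.2f}")
```

The estimate is only as good as the assumptions in the DAG: the variant must affect the outcome only through LDL and must share no causes with it.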

Uploaded on Feb 13, 2025



