Directed Acyclic Graphs (DAGs)

 
Gunn-Helen Moen
ARC DECRA Fellow
Institute for Molecular Bioscience, University of Queensland
Why DAGs?
A DAG gives an overview of your data and is a tool to understand your data
Condition on a variable or not?
How do selection bias and loss-to-follow-up bias impact your study?
What is the impact of missing data?
Today's lecture:
What is a DAG?
Key concepts
DAGs v path models
Some exercises
 
Key concepts
 
 
Graphs
A graph is a series of nodes (variables) connected by paths
A graph can be directed or undirected
A Directed Acyclic Graph (DAG) is a graph that is Directed (has arrows) and Acyclic (no feedback loops).
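The "acyclic" part of the definition can be checked mechanically. The following is a minimal sketch in Python, not part of the lecture: the function name `is_dag` and the two example graphs over X, Y and Z are assumptions made for illustration. It repeatedly removes nodes that have no incoming arrows (Kahn's algorithm); if some nodes can never be removed, the graph contains a feedback loop.

```python
from collections import deque

def is_dag(edges):
    """Return True if the directed edges contain no feedback loop (cycle)."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for cause, effect in edges:
        children[cause].append(effect)
        indegree[effect] += 1
    # Start from nodes with no incoming arrows and peel the graph layer by layer.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Nodes that sit on a cycle never reach indegree 0, so they are never removed.
    return removed == len(nodes)

# Hypothetical example graphs (not the lecture's figures).
chain = [("X", "Y"), ("Y", "Z")]             # X -> Y -> Z
loop = [("X", "Y"), ("Y", "Z"), ("Z", "X")]  # X -> Y -> Z -> X

print(is_dag(chain))  # True
print(is_dag(loop))   # False
```

An undirected edge cannot be represented in this edge list at all, so the sketch only covers the "acyclic" half of the definition.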
 
DAG or not DAG?
 
[Figure: six example graphs, panels (A) to (F)]
Node = variable
Arrow = cause
E = exposure, D = disease
Reading the DAG:
  Causality = arrow
  Association = path
  Independence = no path
Estimation:
E-D association has two parts:
  E → D: causal effect (keep open)
  E-C-U-D: bias (try to close)
Conditioning (adjusting): E-[C]-U-D
 
 
 
Association and Cause
Association: 3 possible causal structures
1. Cause (or reverse cause) between E and D
2. Confounder
3. Collider
Confounder idea
A common cause: Smoking → Yellow fingers (+), Smoking → Lung cancer (+)
A confounder induces an association between its effects
Conditioning on a confounder removes the association (adjust for smoking)
Condition = (restrict, stratify, adjust)
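A small simulation can make the confounder idea concrete. This is a sketch under stated assumptions, not data from the lecture: the probabilities, the seed and the helper `risk_difference` are all invented for illustration. Smoking raises the probability of both yellow fingers and lung cancer, and yellow fingers have no effect on lung cancer.

```python
import random

def risk_difference(rows, exposure, outcome):
    """P(outcome | exposure present) minus P(outcome | exposure absent)."""
    exposed = [r[outcome] for r in rows if r[exposure]]
    unexposed = [r[outcome] for r in rows if not r[exposure]]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)

rng = random.Random(1)
rows = []
for _ in range(200_000):
    smoking = rng.random() < 0.3
    # Hypothetical probabilities: smoking raises both; yellow fingers never affect cancer.
    yellow = rng.random() < (0.8 if smoking else 0.05)
    cancer = rng.random() < (0.2 if smoking else 0.02)
    rows.append({"smoking": smoking, "yellow": yellow, "cancer": cancer})

# Crude association between the two effects of smoking.
crude = risk_difference(rows, "yellow", "cancer")
# Conditioning by stratifying on the confounder.
smokers = risk_difference([r for r in rows if r["smoking"]], "yellow", "cancer")
non_smokers = risk_difference([r for r in rows if not r["smoking"]], "yellow", "cancer")

print(f"crude: {crude:.3f}, smokers: {smokers:.3f}, non-smokers: {non_smokers:.3f}")
```

The crude risk difference should be clearly positive, while both stratum-specific differences should be close to zero, which is what "conditioning on a confounder removes the association" predicts. Stratifying is used here because it is the simplest of the three ways to condition.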
 
 
Collider idea
Two causes for selection to study: Yellow fingers → Selected (+), Lung cancer → Selected (+)
Conditioning on a collider induces an association between its causes (selected subjects)
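The collider idea can be simulated in the same way. Again this is only a sketch: the probabilities, the seed and the helper `risk_difference` are assumptions for illustration. Yellow fingers and lung cancer are generated independently, and each one raises the probability of being selected into the study.

```python
import random

def risk_difference(rows, exposure, outcome):
    """P(outcome | exposure present) minus P(outcome | exposure absent)."""
    exposed = [r[outcome] for r in rows if r[exposure]]
    unexposed = [r[outcome] for r in rows if not r[exposure]]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)

rng = random.Random(2)
rows = []
for _ in range(200_000):
    yellow = rng.random() < 0.3
    cancer = rng.random() < 0.1  # independent of yellow fingers by construction
    # Hypothetical selection model: both variables make selection more likely.
    selected = rng.random() < 0.05 + 0.4 * yellow + 0.4 * cancer
    rows.append({"yellow": yellow, "cancer": cancer, "selected": selected})

# Whole population: no conditioning on the collider.
everyone = risk_difference(rows, "yellow", "cancer")
# Selected subjects only: conditioning on the collider by restriction.
selected_only = risk_difference([r for r in rows if r["selected"]], "yellow", "cancer")

print(f"everyone: {everyone:.3f}, selected only: {selected_only:.3f}")
```

In the whole population the risk difference should be close to zero. Among the selected subjects an association appears even though neither variable causes the other; with these particular numbers it is negative, because a selected subject without yellow fingers is more likely to have been selected through lung cancer.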
Mediator
E → M → D: indirect effect
E → D: direct effect
Conditioning on the mediator: [M]
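For a linear model the split into direct and indirect effects can be illustrated numerically. The sketch below is an assumption-laden example, not the lecture's method: the coefficients `a`, `b`, `c`, the seed and the helper names are invented, the model is linear with no interaction, and nothing confounds the mediator and the outcome. Not adjusting for M estimates the total effect; adjusting for M estimates the direct effect.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population covariance of two equally long lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(3)
a, b, c = 0.5, 0.8, 0.3  # hypothetical coefficients for E -> M, M -> D, E -> D
E = [rng.gauss(0, 1) for _ in range(100_000)]
M = [a * e + rng.gauss(0, 1) for e in E]
D = [c * e + b * m + rng.gauss(0, 1) for e, m in zip(E, M)]

# Total effect: slope of D on E alone (M not adjusted).
total = cov(E, D) / cov(E, E)

# Direct effect: coefficient of E when D is regressed on E and M (M adjusted),
# obtained from the two-predictor least-squares normal equations.
det = cov(E, E) * cov(M, M) - cov(E, M) ** 2
direct = (cov(E, D) * cov(M, M) - cov(M, D) * cov(E, M)) / det

indirect = total - direct
print(f"total: {total:.2f}, direct: {direct:.2f}, indirect: {indirect:.2f}")
```

With these settings the total effect should be near `c + a * b`, the direct effect near `c`, and the difference near `a * b`. Outside the linear, unconfounded setting assumed here, adjusting for a mediator does not in general give such a clean decomposition.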
Dependent or Independent?
Graphs allow us to determine whether two variables are independent or (likely) dependent
Two variables are independent if every path between them is blocked
If even one path between X and Z is unblocked, then X and Z are (likely) dependent
Colliders block paths between variables
The act of conditioning on a variable can block a path
However, conditioning on a collider opens paths
[Figure: graph over U, V, W, X, Y, Z]
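Whether every path between two variables is blocked can be checked by a program. The sketch below is one possible approach, not the lecture's: it uses the standard moralised ancestral graph test for d-separation instead of tracing paths one by one. The function names and the example DAG (C → E, C → D, E → M → D, E → K ← D, K → S) are assumptions for illustration and are not one of the lecture's figures.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen

def d_separated(parents, x, y, given=()):
    """True if every path between x and y is blocked given the conditioning set."""
    given = set(given)
    keep = ancestors(parents, {x, y} | given)
    # Moralise: link each kept node to its parents and link co-parents to each other.
    neighbours = {node: set() for node in keep}
    for child in keep:
        pars = list(parents.get(child, ()))
        for p in pars:
            neighbours[child].add(p)
            neighbours[p].add(child)
        for p, q in combinations(pars, 2):
            neighbours[p].add(q)
            neighbours[q].add(p)
    # Look for an undirected route from x to y that avoids the conditioned nodes.
    stack, seen = [x], {x}
    while stack:
        node = stack.pop()
        if node == y:
            return False
        for nxt in neighbours[node] - given - seen:
            seen.add(nxt)
            stack.append(nxt)
    return True

# Hypothetical DAG, written as child -> list of parents.
parents = {"E": ["C"], "M": ["E"], "D": ["C", "M"], "K": ["E", "D"], "S": ["K"]}

print(d_separated(parents, "E", "D"))                   # False: paths via C and M are open
print(d_separated(parents, "E", "D", {"C", "M"}))       # True: all paths blocked
print(d_separated(parents, "E", "D", {"C", "M", "K"}))  # False: collider K opened
```

Conditioning on S, a descendant of the collider K, has the same opening effect as conditioning on K itself.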
Statistical criteria for variable selection
- Want the effect of E on D (E precedes D)
- Observe the two associations C-E and C-D
[Figure: undirected graph over E, D and C]
The undirected graph above is compatible with three DAGs:
Confounder: 1. Adjust
Mediator: 2. Direct: adjust; 3. Total: do not adjust
Collider: 4. Do not adjust
Conclusion: Need information from outside the data to do a proper analysis
Confounding versus selection bias
Path: Any trail from E to D (without repeating itself)
Open non-causal path = biasing path
Confounding and selection bias are not always distinct
May use DAG to give distinct definitions:
Selection bias: Non-causal path open due to conditioning on a collider
Confounding: Non-causal path without colliders
Hernan et al, A structural approach to selection bias, Epidemiology 2004
Concepts: Summing up
Cause: E → D
Cause with Mediator: E → D and E → M → D
Cause with Confounder: E → D and E ← C → D
Cause with Collider: E → D and E → K ← D
DAG: no arrow means independence
Associations are visible in data.
Causal structure comes from outside the data.
 
Four rules
1. Causal path: E → D (all arrows in the same direction), otherwise non-causal
Before conditioning:
2. Closed path: K (closed at a collider, otherwise open)
Conditioning on:
3. a non-collider closes: [M] or [C]
4. a collider opens: [K] (or a descendant of a collider)
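The four rules can be applied path by path in code, which mirrors the "write down the paths, show type and status" style of the exercises. The following is a minimal sketch: the function names and the example DAG over E, D, M, C, K and S are assumptions for illustration, not one of the lecture's figures.

```python
def descendants(edges, node):
    """All nodes reachable from node by following arrows forward."""
    children = {}
    for cause, effect in edges:
        children.setdefault(cause, []).append(effect)
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def simple_paths(edges, start, end):
    """All trails from start to end that never revisit a node, ignoring arrow direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    def walk(path):
        if path[-1] == end:
            yield path
            return
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in path:
                yield from walk(path + [nxt])

    return list(walk([start]))

def classify(edges, path, given=()):
    """Return (type, status) of a path under the four rules."""
    arrows, given = set(edges), set(given)
    # Rule 1: causal only if every arrow points along the path.
    causal = all((a, b) in arrows for a, b in zip(path, path[1:]))
    is_open = True
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in arrows and (nxt, mid) in arrows
        if collider:
            # Rules 2 and 4: closed at a collider unless it or a descendant is conditioned on.
            if not (({mid} | descendants(edges, mid)) & given):
                is_open = False
        elif mid in given:
            # Rule 3: conditioning on a non-collider closes the path.
            is_open = False
    return ("causal" if causal else "non-causal", "open" if is_open else "closed")

# Hypothetical DAG as (cause, effect) pairs.
edges = [("E", "D"), ("E", "M"), ("M", "D"), ("C", "E"), ("C", "D"),
         ("E", "K"), ("D", "K"), ("K", "S")]

for path in simple_paths(edges, "E", "D"):
    print("-".join(path), classify(edges, path), classify(edges, path, given={"C"}))
```

In this example, conditioning on C closes the only open non-causal path and leaves the causal paths open, so adjusting for C alone would be enough for the total effect of E on D.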
DAGs vs SEMs / Path Models
DAGs and path models are related but not the same
 
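Some exercises
Are U and Z independent? [Figure: graph over U, V, W, X, Y, Z]
Are U and Z independent? [Figure: second graph over U, V, W, X, Y, Z]
Are U and A independent? [Figure: graph over U, V, W, X, Y, Z, A]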
 
 
What’s the minimum number of variables to condition on to make Z1 and Y conditionally independent?
Which variables?
 
 
In a Mendelian randomization analysis, why do we not condition on the exposure variable (i.e. check if it
blocks the path from the SNP to the outcome)?
Exercise: Physical activity and Coronary Heart Disease (CHD)
We want the total effect of Physical Activity on CHD.
1. Write down the paths.
2. Are they causal/non-causal, open/closed?
3. What should we adjust for?
5 minutes
Exercise: Tea and depression
1. Write down the paths. Show type and status.
2. You want the total effect of tea on depression. What would you adjust for?
3. You want the direct effect of tea on depression. What would you adjust for?
4. Is caffeine an intermediate variable or a confounder?
5 minutes
Exercise: Statin and CHD
1. Write down the paths. Show type and status.
2. You want the total effect of statin on CHD. What would you adjust for?
3. If lifestyle is unmeasured, can we estimate the direct effect of statin on CHD (not mediated through cholesterol)?
4. Is cholesterol an intermediate variable or a collider?
5 minutes
Diabetes and Fractures
We want the total effect of Diabetes on fractures (E → D)
Write the paths, their type and status
Which variables should we condition on?
[Figure: DAG with mediators and confounders]
More paths?
Convenience sample, homogeneous sample
Collider, selection bias
1. Convenience sample: Conduct the study among hospital patients?
2. Homogeneous sample: Population data, exclude hospital patients?
Adjusting for selection bias
[Figure: DAG over smoke, CHD, age, sex and S]
Adjusting for sex or age or both removes the selection bias
Hernan et al, A structural approach to selection bias, Epidemiology 2004
DAGs are simplified models of reality:
must be large enough to be realistic,
small enough to be useful
 

Explore the significance of Directed Acyclic Graphs (DAGs) in comprehending data structures, addressing issues like bias, loss to follow up, and missing data impacts in studies. Gain insights into key concepts, nodes, arrows, causality, associations, causal structures, and the role of confounders. Enhance your knowledge on causality, independence, estimations, and conditioning through illustrative examples.

  • DAGs
  • Data Analysis
  • Causality
  • Associations
  • Confounders

Uploaded on Oct 11, 2024 | 0 Views



Presentation Transcript


  1. Directed Acyclic Graphs (DAGs) Gunn-Helen Moen ARC DECRA Fellow Institute for Molecular Bioscience, University of Queensland

  2. Why DAGs? A DAG gives an overview of your data and is a tool to understand your data. Condition on a variable or not? How do selection bias and loss-to-follow-up bias impact your study? What is the impact of missing data? Today's lecture: What is a DAG? Key concepts. DAGs v path models. Some exercises.

  3. Key concepts

  4. Graphs. A graph is a series of nodes (variables) connected by paths. A graph can be directed or undirected. [Figures: example graphs over X, Y, Z] A Directed Acyclic Graph (DAG) is a graph that is Directed (has arrows) and Acyclic (no feedback loops).

  5. DAG or not DAG? [Figure: six example graphs, panels (A) to (F), over nodes V, W, X, Y, Z]

  6. Node = variable. Arrow = cause. E = exposure, D = disease. Reading the DAG: causality = arrow, association = path, independence = no path. Estimation: the E-D association has two parts: E → D, the causal effect (keep open), and E-C-U-D, bias (try to close). Conditioning (adjusting): E-[C]-U-D.

  7. Association and Cause. Association between yellow fingers and lung cancer: 3 possible causal structures. 1. Cause (or reverse cause) between E and D. 2. Confounder: Smoke. 3. Collider: U.

  8. Confounder idea. A common cause: Smoking → Yellow fingers (+), Smoking → Lung cancer (+). A confounder induces an association between its effects. Conditioning on a confounder removes the association (adjust for smoking). Condition = (restrict, stratify, adjust).

  9. Collider idea. Two causes for selection to study: Yellow fingers → Selected (+), Lung cancer → Selected (+). Conditioning on a collider induces an association between its causes (selected subjects).

  10. Mediator. E → M → D: indirect effect. E → D: direct effect. Total effect = indirect + direct. Direct effect = condition on mediator [M].

  11. Dependent or Independent? Graphs allow us to determine whether two variables are independent or (likely) dependent. Two variables are independent if every path between them is blocked. If even one path between X and Z is unblocked, then X and Z are (likely) dependent. Colliders block paths between variables. The act of conditioning on a variable can block a path. However, conditioning on a collider opens paths. [Figure: graph over U, V, W, X, Y, Z]

  12. Statistical criteria for variable selection. Want the effect of E on D (E precedes D). Observe the two associations C-E and C-D. [Figure: undirected graph over E, D and C] The undirected graph above is compatible with three DAGs. Confounder: 1. Adjust. Mediator: 2. Direct: adjust; 3. Total: do not adjust. Collider: 4. Do not adjust. Conclusion: Need information from outside the data to do a proper analysis.

  13. Confounding versus selection bias. Path: Any trail from E to D (without repeating itself). Open non-causal path = biasing path. Confounding and selection bias are not always distinct. May use DAG to give distinct definitions. [Figure: three DAGs over E, D, A, B and C] Causal: E → D. Selection bias: Non-causal path open due to conditioning on a collider. Confounding: Non-causal path without colliders. Hernan et al, A structural approach to selection bias, Epidemiology 2004

  14. Concepts: Summing up. Associations are visible in data. Causal structure comes from outside the data. DAG: no arrow means independence. Cause: E → D. Cause with Mediator: E → D and E → M → D. Cause with Confounder: E → D and E ← C → D. Cause with Collider: E → D and E → K ← D.

  15. Four rules. 1. Causal path: E → D (all arrows in the same direction), otherwise non-causal. Before conditioning: 2. Closed path: K (closed at a collider, otherwise open). Conditioning on: 3. a non-collider closes: [M] or [C]. 4. a collider opens: [K] (or a descendant of a collider).

  16. DAGs vs SEMs / Path Models. DAGs and path models are related but not the same.
      DAGs | Path Models
      Distribution free | Assumes linearity and normality
      Implies probabilistic dependencies in model | Implies (linear) covariances and variances in model
      One headed arrows only | One headed and two headed arrows
      Acyclic | Feedback loops allowed
      Boxes indicate conditioning | Boxes indicate observed variables

  17. Some exercises. Are U and Z independent? [Figure: graph over U, V, W, X, Y, Z] Are U and Z independent? [Figure: second graph over U, V, W, X, Y, Z] Are U and A independent? [Figure: graph over U, V, W, X, Y, Z, A]

  18. What's the minimum number of variables to condition on to make Z1 and Y conditionally independent? Which variables? [Figure: DAG over Z1, Z2, W1, W2, W3, X, Z, Y]

  19. In a Mendelian randomization analysis, why do we not condition on the exposure variable (i.e. check if it blocks the path from the SNP to the outcome)? [Figure: DAG over SNP, Exposure, Outcome]

  20. Exercise: Physical activity and Coronary Heart Disease (CHD). We want the total effect of Physical Activity on CHD. 1. Write down the paths. 2. Are they causal/non-causal, open/closed? 3. What should we adjust for? 5 minutes

  21. Exercise: Tea and depression. [Figure: DAG over tea (E), depression (D), caffeine and coffee (labelled C and O)] 1. Write down the paths. Show type and status. 2. You want the total effect of tea on depression. What would you adjust for? 3. You want the direct effect of tea on depression. What would you adjust for? 4. Is caffeine an intermediate variable or a confounder? 5 minutes

  22. Exercise: Statin and CHD. [Figure: DAG over statin (E), CHD (D), lifestyle and cholesterol (labelled C and U)] 1. Write down the paths. Show type and status. 2. You want the total effect of statin on CHD. What would you adjust for? 3. If lifestyle is unmeasured, can we estimate the direct effect of statin on CHD (not mediated through cholesterol)? 4. Is cholesterol an intermediate variable or a collider? 5 minutes

  23. Diabetes and Fractures. We want the total effect of Diabetes on fractures (E → D). Write the paths, their type and status. Which variables should we condition on? More paths? [Figure: DAG with mediators and confounders]
      Unconditional: Path | Type | Status
      1. E-D | Causal | Open
      2. E-F-D | Causal | Open
      3. E-B-D | Causal | Open
      4. E-V-B-D | Non-causal | Open
      5. E-P-B-D | Non-causal | Open
      Conditional: Path | Type | Status
      1. E-D | Causal | Open
      2. E-F-D | Causal | Open
      3. E-B-D | Causal | Open
      4. E-[V]-B-D | Non-causal | Closed
      5. E-[P]-B-D | Non-causal | Closed

  24. Convenience sample, homogeneous sample. Collider, selection bias. [Figure: DAG over diabetes (E), fractures (D) and hospital (H)] 1. Convenience sample: Conduct the study among hospital patients? 2. Homogeneous sample: Population data, exclude hospital patients?
      Unconditional: Path | Type | Status
      1. E-D | Causal | Open
      2. E-H-D | Non-causal | Closed
      Conditional: Path | Type | Status
      1. E-D | Causal | Open
      2. E-[H]-D | Non-causal | Open

  25. Adjusting for selection bias. [Figure: DAG over smoke, CHD, sex, age and S]
      Path | Type | Status
      smoke-CHD | Causal | Open
      smoke-sex-S-age-CHD | Non-causal | Open
      Adjusting for sex or age or both removes the selection bias. Hernan et al, A structural approach to selection bias, Epidemiology 2004

  26. DAGs are simplified models of reality: must be large enough to be realistic, small enough to be useful
