Directed Acyclic Graphs (DAGs)
Explore the significance of Directed Acyclic Graphs (DAGs) in comprehending data structures, addressing issues like bias, loss to follow up, and missing data impacts in studies. Gain insights into key concepts, nodes, arrows, causality, associations, causal structures, and the role of confounders. Enhance your knowledge on causality, independence, estimations, and conditioning through illustrative examples.
Presentation Transcript
Directed Acyclic Graphs (DAGs). Gunn-Helen Moen, ARC DECRA Fellow, Institute for Molecular Bioscience, University of Queensland.
Why DAGs? A DAG gives an overview of your data and is a tool for understanding it. Should you condition on a variable or not? How do selection bias and loss to follow-up bias affect your study? What is the impact of missing data?
Today's lecture: What is a DAG? Key concepts. DAGs vs path models. Some exercises.
Graphs. A graph is a series of nodes (variables) connected by paths. A graph can be directed or undirected. A Directed Acyclic Graph (DAG) is a graph that is directed (has arrows) and acyclic (has no feedback loops). [Figures: example graphs on the nodes X, Y, Z.]
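The "acyclic" part of the definition can be checked mechanically. The following is a minimal sketch, not part of the lecture, that uses Kahn's algorithm on a list of arrows; the function name `is_dag` and the two example graphs are assumptions made for illustration.

```python
from collections import deque

def is_dag(edges):
    """Return True if the directed graph given as (parent, child) pairs has no cycle."""
    nodes = {n for edge in edges for n in edge}
    children = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    # Kahn's algorithm: repeatedly remove nodes that have no incoming arrows.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Nodes on a cycle never reach indegree 0, so they are never removed.
    return removed == len(nodes)

chain = [("X", "Y"), ("Y", "Z")]              # hypothetical: X -> Y -> Z
loop = [("X", "Y"), ("Y", "Z"), ("Z", "X")]   # hypothetical feedback loop
print(is_dag(chain), is_dag(loop))
```

The chain is a DAG; the graph with the feedback loop is not.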
DAG or not DAG? [Figure: six example graphs, panels (A) to (F), on the nodes V, W, X, Y, Z.]
Node = variable. Arrow = cause. E = exposure, D = disease.
Reading the DAG: causality = arrow; association = path; independence = no path.
Estimation: the E-D association has two parts. The causal effect, E -> D: keep open. The bias, a path between E and D through C and U: try to close.
Conditioning (adjusting) on C is written [C] on that path.
Association and cause: three possible causal structures behind an association between yellow fingers and lung cancer.
1. Cause: yellow fingers cause lung cancer (E -> D), or the reverse cause.
2. Confounder: smoke is a common cause of yellow fingers and lung cancer.
3. Collider: yellow fingers and lung cancer both point into a third variable U.
Confounder idea: a common cause. Smoking causes both yellow fingers (+) and lung cancer (+). A confounder induces an association between its effects. Conditioning on a confounder (here, adjusting for smoking) removes the association. Conditioning = restricting, stratifying or adjusting.
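A small simulation can illustrate the confounder idea. This is only a sketch under made-up assumptions: the probabilities below are invented for illustration, and yellow fingers are given no effect on lung cancer at all.

```python
import random

random.seed(1)
n = 200_000
rows = []
for _ in range(n):
    smoking = random.random() < 0.30
    # Smoking raises the probability of both variables; neither affects the other.
    yellow = random.random() < (0.60 if smoking else 0.05)
    cancer = random.random() < (0.20 if smoking else 0.02)
    rows.append((smoking, yellow, cancer))

def risk_difference(data):
    """P(cancer | yellow fingers) - P(cancer | no yellow fingers)."""
    exposed = [c for _, y, c in data if y]
    unexposed = [c for _, y, c in data if not y]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)

crude = risk_difference(rows)                                      # not conditioned
among_smokers = risk_difference([r for r in rows if r[0]])         # stratum: smokers
among_nonsmokers = risk_difference([r for r in rows if not r[0]])  # stratum: non-smokers

print(f"crude: {crude:.3f}")
print(f"smokers: {among_smokers:.3f}, non-smokers: {among_nonsmokers:.3f}")
```

With these assumed numbers the crude risk difference should come out clearly positive, while the differences within each smoking stratum should be close to zero, apart from sampling noise.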
Oct-24. Collider idea: selected subjects. There are two causes for selection into the study: yellow fingers (+) and lung cancer (+) both affect being selected. Conditioning on a collider induces an association between its causes.
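The collider idea can be illustrated the same way. In this sketch the two variables are generated independently, and both raise the probability of being selected; all probabilities are invented for illustration.

```python
import random

random.seed(3)
n = 200_000
rows = []
for _ in range(n):
    yellow = random.random() < 0.20
    cancer = random.random() < 0.10   # drawn independently of yellow fingers
    # Both variables raise the probability of being selected into the study.
    selected = random.random() < 0.05 + 0.40 * yellow + 0.40 * cancer
    rows.append((yellow, cancer, selected))

def risk_difference(data):
    """P(cancer | yellow fingers) - P(cancer | no yellow fingers)."""
    exposed = [c for y, c, _ in data if y]
    unexposed = [c for y, c, _ in data if not y]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)

all_subjects = risk_difference(rows)                         # no conditioning
selected_only = risk_difference([r for r in rows if r[2]])   # conditioning on the collider

print(f"all subjects: {all_subjects:.3f}")
print(f"selected only: {selected_only:.3f}")
```

Among all subjects the risk difference should be near zero. Among the selected subjects an association appears even though none was built in; with this particular additive selection model it is negative.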
Mediator: E -> M -> D, alongside the direct effect E -> D.
Total effect = indirect effect + direct effect.
Direct effect = condition on the mediator, [M].
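In a linear model, the split of an effect into a direct part and a part that runs through the mediator can be checked numerically. The sketch below is an illustration under strong assumptions: linear effects, independent normal noise, no confounding of the M-D relation, and made-up coefficients `a`, `b` and `c`.

```python
import random

random.seed(2)
a, b, c = 0.5, 0.8, 0.3   # hypothetical effects: E -> M, M -> D, E -> D (direct)
n = 50_000
E = [random.gauss(0, 1) for _ in range(n)]
M = [a * e + random.gauss(0, 1) for e in E]
D = [c * e + b * m + random.gauss(0, 1) for e, m in zip(E, M)]

def cov(x, y):
    """Population covariance of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)

# Total effect: slope of D on E alone (M is not conditioned on).
total = cov(E, D) / cov(E, E)

# Direct effect: coefficient of E when M is also in the regression (conditioning on [M]).
direct = (cov(E, D) * cov(M, M) - cov(M, D) * cov(E, M)) / (
    cov(E, E) * cov(M, M) - cov(E, M) ** 2
)
indirect = total - direct

print(f"total {total:.2f}, direct {direct:.2f}, indirect {indirect:.2f}")
```

Under these assumptions the estimates should land near c + a*b = 0.7 for the total effect, c = 0.3 for the direct effect and a*b = 0.4 for the indirect effect.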
Dependent or independent? Graphs allow us to determine whether two variables are independent or (likely) dependent. Two variables are independent if every path between them is blocked. If even one path between X and Z is unblocked, then X and Z are (likely) dependent. Colliders block paths between variables. The act of conditioning on a variable can block a path. However, conditioning on a collider opens paths. [Figure: example graph on the nodes U, V, X, W, Y, Z.]
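Whether every path between two variables is blocked can also be tested without listing the paths. The sketch below uses one standard technique, the moralized ancestral graph, rather than anything shown in the lecture. The function name `d_separated` and the two small graphs are hypothetical, and the code assumes the two query nodes are not themselves in the conditioning set.

```python
def d_separated(edges, x, y, given=()):
    """True if x and y are d-separated by `given` in the DAG with arrows `edges`."""
    given = set(given)
    parents = {}
    for p, c in edges:
        parents.setdefault(c, set()).add(p)
        parents.setdefault(p, set())

    # 1. Keep only x, y, the conditioning set and all of their ancestors.
    keep, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))

    # 2. Moralize: link every node to its parents and link parents that share a child.
    neighbours = {n: set() for n in keep}
    for child in keep:
        ps = sorted(parents.get(child, ()))
        for p in ps:
            neighbours[child].add(p)
            neighbours[p].add(child)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                neighbours[p].add(q)
                neighbours[q].add(p)

    # 3. Drop the conditioned nodes and look for any remaining undirected route.
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        if node in seen or node in given:
            continue
        seen.add(node)
        stack.extend(neighbours[node])
    return True

collider = [("U", "W"), ("Z", "W"), ("W", "A")]   # hypothetical: U -> W <- Z, W -> A
chain = [("U", "V"), ("V", "Z")]                  # hypothetical: U -> V -> Z

print(d_separated(collider, "U", "Z"))          # collider W blocks the only path
print(d_separated(collider, "U", "Z", ["W"]))   # conditioning on the collider opens it
print(d_separated(collider, "U", "Z", ["A"]))   # so does conditioning on its descendant
print(d_separated(chain, "U", "Z", ["V"]))      # conditioning on a non-collider blocks
```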
Statistical criteria for variable selection. We want the effect of E on D (E precedes D). We observe the two associations C-E and C-D. The undirected graph on C, E and D is compatible with three DAGs:
C as confounder: 1. adjust.
C as mediator: 2. for the direct effect, adjust; 3. for the total effect, do not adjust.
C as collider: 4. do not adjust.
Conclusion: we need information from outside the data to do a proper analysis.
Confounding versus selection bias. Path: any trail from E to D (without repeating itself). Open non-causal path = biasing path. Confounding and selection bias are not always distinct, but a DAG may be used to give distinct definitions.
Confounding: a non-causal path without colliders.
Selection bias: a non-causal path that is open due to conditioning on a collider.
[Figure: example DAGs on the nodes A, B, C, E, D.]
Hernan et al, A structural approach to selection bias, Epidemiology 2004
Concepts: summing up. Associations are visible in the data. The causal structure comes from outside the data. In a DAG, no arrow means independence.
Cause: E -> D.
Cause with mediator M: E -> M -> D.
Cause with confounder C: C is a common cause of E and D.
Cause with collider K: E and D are both causes of K.
Four rules.
1. Causal path: E -> D (all arrows in the same direction); otherwise the path is non-causal.
Before conditioning:
2. Closed path: a path is closed at a collider K; otherwise it is open.
Conditioning on:
3. a non-collider closes the path: [M] or [C].
4. a collider (or a descendant of a collider) opens the path: [K].
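The four rules can be applied mechanically to every path between E and D. The sketch below is one possible way to do that, not the lecturer's method. The example graph is hypothetical: it combines a direct arrow E -> D with a confounder C, a mediator M and a collider K. The helper names `paths`, `descendants` and `classify` are invented for illustration.

```python
def descendants(edges, node):
    """All nodes reachable from `node` by following arrows forward."""
    out, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for parent, child in edges:
            if parent == cur and child not in out:
                out.add(child)
                stack.append(child)
    return out

def paths(edges, start, end):
    """All trails from start to end that never revisit a node, ignoring arrow direction."""
    adjacent = {}
    for parent, child in edges:
        adjacent.setdefault(parent, set()).add(child)
        adjacent.setdefault(child, set()).add(parent)
    found = []

    def walk(trail):
        if trail[-1] == end:
            found.append(trail)
            return
        for nxt in sorted(adjacent.get(trail[-1], ())):
            if nxt not in trail:
                walk(trail + [nxt])

    walk([start])
    return found

def classify(edges, path, conditioned=()):
    """Return (type, status) of a path, given the set of conditioned variables."""
    arrows, conditioned = set(edges), set(conditioned)
    # Rule 1: causal only if every arrow points from the start towards the end.
    causal = all((a, b) in arrows for a, b in zip(path, path[1:]))
    is_open = True
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        if (prev, mid) in arrows and (nxt, mid) in arrows:
            # Rules 2 and 4: a collider closes the path unless it, or one of
            # its descendants, is conditioned on.
            if not ({mid} | descendants(edges, mid)) & conditioned:
                is_open = False
        elif mid in conditioned:
            # Rule 3: conditioning on a non-collider closes the path.
            is_open = False
    return ("causal" if causal else "non-causal", "open" if is_open else "closed")

# Hypothetical DAG: E -> D, C -> E, C -> D, E -> M -> D, E -> K <- D
dag = [("E", "D"), ("C", "E"), ("C", "D"), ("E", "M"), ("M", "D"), ("E", "K"), ("D", "K")]
for conditioned in ([], ["C"], ["K"]):
    print("conditioning on", conditioned)
    for p in paths(dag, "E", "D"):
        print("  ", " - ".join(p), classify(dag, p, conditioned))
```

In this example, conditioning on C closes the only open non-causal path, while conditioning on K opens a path that was closed before.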
DAGs vs SEMs / path models. DAGs and path models are related but not the same.
DAGs: distribution free; imply probabilistic dependencies in the model; one-headed arrows only; acyclic; boxes indicate conditioning.
Path models: assume linearity and normality; imply (linear) covariances and variances in the model; one-headed and two-headed arrows; feedback loops allowed; boxes indicate observed variables.
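Some exercises.
Are U and Z independent? [Figure: graph on the nodes U, V, X, W, Y, Z.]
Are U and Z independent? [Figure: a second graph on the nodes U, V, X, W, Y, Z.]
Are U and A independent? [Figure: graph on the nodes U, V, X, Y, Z, W, A.]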
What's the minimum number of variables to condition on to make Z1 and Y conditionally independent? Which variables? [Figure: graph on the nodes Z1, Z2, W1, W2, W3, Z, X, Y.]
In a Mendelian randomization analysis, why do we not condition on the exposure variable (i.e. check if it blocks the path from the SNP to the outcome)? [Figure: graph on the nodes SNP, Exposure, Outcome.]
Exercise: Physical activity and Coronary Heart Disease (CHD). We want the total effect of physical activity on CHD.
1. Write down the paths.
2. Are they causal or non-causal, open or closed?
3. What should we adjust for?
5 minutes.
Exercise: Tea and depression. Nodes: E = tea, D = depression, C = caffeine, O = coffee.
1. Write down the paths. Show type and status.
2. You want the total effect of tea on depression. What would you adjust for?
3. You want the direct effect of tea on depression. What would you adjust for?
4. Is caffeine an intermediate variable or a confounder?
5 minutes.
Exercise: Statin and CHD. Nodes: E = statin, D = CHD, C = cholesterol, U = lifestyle.
1. Write down the paths. Show type and status.
2. You want the total effect of statin on CHD. What would you adjust for?
3. If lifestyle is unmeasured, can we estimate the direct effect of statin on CHD (not mediated through cholesterol)?
4. Is cholesterol an intermediate variable or a collider?
5 minutes.
Diabetes and fractures. We want the total effect of diabetes on fractures (E -> D). Write down the paths, their type and status. Which variables should we condition on?

Unconditional
Path  Nodes            Type        Status
1     E - D            Causal      Open
2     E - F - D        Causal      Open
3     E - B - D        Causal      Open
4     E - V - B - D    Non-causal  Open
5     E - P - B - D    Non-causal  Open

Conditional
Path  Nodes            Type        Status
1     E - D            Causal      Open
2     E - F - D        Causal      Open
3     E - B - D        Causal      Open
4     E - [V] - B - D  Non-causal  Closed
5     E - [P] - B - D  Non-causal  Closed

Mediators: F and B. Confounders: V and P. More paths?
Convenience sample, homogeneous sample. Nodes: E = diabetes, D = fractures, H = hospital.
1. Convenience sample: conduct the study among hospital patients?
2. Homogeneous sample: population data, exclude hospital patients?

Unconditional
Path  Nodes        Type        Status
1     E - D        Causal      Open
2     E - H - D    Non-causal  Closed

Conditional
Path  Nodes        Type        Status
1     E - D        Causal      Open
2     E - [H] - D  Non-causal  Open

H is a collider; conditioning on it gives selection bias.
Adjusting for selection bias. Paths:
1. smoke -> CHD: causal, open.
2. smoke to CHD through the selection node S, sex and age: non-causal, open.
Adjusting for sex or age or both removes the selection bias.
Hernan et al, A structural approach to selection bias, Epidemiology 2004
DAGs are simplified models of reality: they must be large enough to be realistic and small enough to be useful.