A QUESTION FROM WEDNESDAY…
A compilation of resources providing information on longitudinal studies worldwide, addressing model selection, correlation in causality analysis, and confounding variables for research clarity.
Presentation Transcript
A QUESTION FROM WEDNESDAY
Are there any comprehensive resources listing longitudinal studies from around the world?
Leo has answered: Yes: https://www.landscaping-longitudinal-research.com
CAUSALITY: statistics is not enough
nick.shryane@manchester.ac.uk
A TYPICAL RESEARCH PROBLEM
Research question: Is low education a cause of type II diabetes?
Key variables:
Outcome (Y) = Whether respondent has type II diabetes
Predictor (X) = Whether respondent has low education level
Possible confounders (Z):
- Mother diagnosed with type II diabetes
- Mother's genetic risk for type II diabetes
- Respondent's genetic risk for type II diabetes
- Income during mother's childhood
- Income during respondent's childhood
How do we decide which confounders, if any, to control for? Wouldn't it be safest to simply control for them all? What if we don't have data for some?
JOIN IN
Go to slido.com and enter code #4204494.
0. What approaches can we use for model selection? (https://admin.sli.do/event/xnA37S7AchTABXDQ8RKcYy/polls)
1. CORRELATION HELPS US WITH MODEL SELECTION?
Correlation matrix:
        x       y       z1      z2
x       1
y       0.37    1
z1      0.13    0.27    1
z2     -0.08   -0.01    0.54    1
The correlation between the hypothesised cause (x) and outcome (y) is 0.37. The correlation between the outcome (y) and a presumed confounder (z1) is quite high, 0.27. Therefore, z1 should be included in our analysis to control for this potential confounder, right? What do you say?
JOIN IN
Go to slido.com and enter code #4204494.
1. Should the variable be included because of the decent correlation?
1. CORRELATION SHOWS US POTENTIAL CONFOUNDERS, RIGHT?
Simple confounders are direct causes of both the predictor and the outcome variable in our presumed causal relationship.
Confounders create a correlation between x and y, not because x causes y but because both are caused by z1.
Confounder: x <- z1 -> y
If there is no true causal relationship between x and y, then the correlation between them will be zero once we control for z1: $\rho_{xy \mid z_1} = 0$
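As a worked illustration of what "controlling for z1" does to the numbers, the first-order partial correlation can be computed from the three pairwise correlations in the matrix on the earlier slide. The formula is the standard textbook one and is not stated on the slides themselves:

```latex
\rho_{xy \mid z_1}
  = \frac{\rho_{xy} - \rho_{xz_1}\,\rho_{yz_1}}
         {\sqrt{\left(1 - \rho_{xz_1}^{2}\right)\left(1 - \rho_{yz_1}^{2}\right)}}
  = \frac{0.37 - (0.13)(0.27)}
         {\sqrt{\left(1 - 0.13^{2}\right)\left(1 - 0.27^{2}\right)}}
  \approx 0.35
```

Here controlling for z1 moves the x-y association only from 0.37 to about 0.35, and the calculation by itself says nothing about whether z1 is actually a confounder.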
1. NO, CORRELATION CANNOT BE RELIED UPON FOR MODEL SELECTION
What if z1 is not a confounder? What if, say, a drug (x) lowers blood pressure (z1), reducing cardiovascular disease (y)?
z1 is a consequence of x, not a confounder. z1 is a mediator.
Confounder (false): x <- z1 -> y
is statistically identical with
Mediator (true): x -> z1 -> y
1. THE STATISTICAL CONSEQUENCES OF MEDIATORS AND CONFOUNDERS ARE THE SAME
Chain (mediator): x -> c -> y
c is part of a chain, e.g. deprivation (x) causes a stress response (c), which causes health outcomes (y). $\rho_{xy \mid c} = 0$
Confounding / Fork (common cause): x <- c -> y
e.g. good weather (c) predicts ice cream sales (x) and drownings (y). $\rho_{xy \mid c} = 0$
If we erroneously control for a mediator that carries the whole effect, the estimated b coefficient for x will be zero.
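The point that a fork and a chain leave the same statistical footprint can be checked with a small simulation. This is a minimal sketch, assuming linear effects with standard normal noise; the variable names, effect sizes and sample size are illustrative choices, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # illustrative sample size


def partial_corr(x, y, c):
    """Correlation between x and y after controlling for c."""
    r = np.corrcoef([x, y, c])
    r_xy, r_xc, r_yc = r[0, 1], r[0, 2], r[1, 2]
    return (r_xy - r_xc * r_yc) / np.sqrt((1 - r_xc**2) * (1 - r_yc**2))


# Fork (confounder): c causes both x and y; x has no effect on y.
c_fork = rng.normal(size=n)
x_fork = c_fork + rng.normal(size=n)
y_fork = c_fork + rng.normal(size=n)

# Chain (mediator): x causes c, and c causes y; x acts on y only through c.
x_chain = rng.normal(size=n)
c_chain = x_chain + rng.normal(size=n)
y_chain = c_chain + rng.normal(size=n)

fork_raw = np.corrcoef(x_fork, y_fork)[0, 1]
fork_partial = partial_corr(x_fork, y_fork, c_fork)
chain_raw = np.corrcoef(x_chain, y_chain)[0, 1]
chain_partial = partial_corr(x_chain, y_chain, c_chain)

print(f"fork : raw r = {fork_raw:.2f}, partial r given c = {fork_partial:.2f}")
print(f"chain: raw r = {chain_raw:.2f}, partial r given c = {chain_partial:.2f}")
```

In both structures x and y are clearly correlated and the partial correlation given c is close to zero, yet x has a real effect on y only in the chain. The data cannot tell the two apart; only the causal assumptions can.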
2. DO YOU APPROVE THE DRUG?
You are the Chief Medical Statistician. On this evidence, should the new drug be approved? (The drug was offered to random (sex-matched) patients with the same diagnosis.)
Your assistant statistician rushes in with a new breakdown of the results by sex. Do you want to change your decision?
If we don't know the sex of the patient, the drug looks worse than ineffective. If we know the sex of the patient, the drug is effective for both women and men!
What is going on here? What are the important factors? Do any of them cause one another?
JOIN IN
Go to slido.com and enter code #4204494.
3. Do you approve the drug?
2. YES. YOU APPROVE THE DRUG.
Women are less likely to take the drug and the drug is less effective for women, but the drug is efficacious in both groups.
Sex is a confounder. Confounding is a causal concept, not a statistical one.
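The reversal described above can be reproduced with a small table. The counts below are hypothetical, invented for illustration (the original slide tables are not in this transcript); they are chosen so that women take the drug less often and benefit less from it, as the slide describes. The sketch compares the crude recovery rates with rates standardised to the overall sex distribution.

```python
# Hypothetical counts, NOT the slide data: (sex, took_drug) -> (recovered, total)
counts = {
    ("men", True): (180, 300),
    ("men", False): (50, 100),
    ("women", True): (85, 100),
    ("women", False): (240, 300),
}


def stratum_rate(sex, took_drug):
    """Recovery rate within one sex-by-treatment cell."""
    recovered, total = counts[(sex, took_drug)]
    return recovered / total


def crude_rate(took_drug):
    """Recovery rate ignoring sex."""
    cells = [v for (_, drug), v in counts.items() if drug == took_drug]
    return sum(r for r, _ in cells) / sum(n for _, n in cells)


def standardised_rate(took_drug):
    """Recovery rate if each sex had its overall population share in this arm."""
    grand_total = sum(n for _, n in counts.values())
    rate = 0.0
    for sex in ("men", "women"):
        sex_total = sum(n for (s, _), (_, n) in counts.items() if s == sex)
        rate += (sex_total / grand_total) * stratum_rate(sex, took_drug)
    return rate


for sex in ("men", "women"):
    print(sex, stratum_rate(sex, True), "vs", stratum_rate(sex, False))
print("crude       ", crude_rate(True), "vs", crude_rate(False))
print("standardised", standardised_rate(True), "vs", standardised_rate(False))
```

With these made-up numbers the drug looks harmful in the crude comparison and beneficial within each sex and after standardisation. Which comparison to trust is decided by the causal story (sex influences both drug uptake and recovery), not by the table itself.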
3. HOW TO DECIDE WHICH CONFOUNDERS TO CONTROL FOR
Research question: Is low education a cause of type II diabetes?
Key variables:
Outcome (Y) = Whether respondent has type II diabetes
Predictor (X) = Whether respondent has low education level
Possible confounders (Z):
- Mother diagnosed with type II diabetes
- Mother's genetic risk for type II diabetes
- Respondent's genetic risk for type II diabetes
- Income during mother's childhood
- Income during respondent's childhood
Which confounders, if any, shall we control for? If we had data on all of them, would it just be safest to control for them all?
H0: NO CAUSAL EFFECT OF CHILD'S LOW EDUCATION ON DIABETES RISK. DIABETES IS CAUSED BY GENES AND POVERTY
[DAG nodes: Mother's genetic diabetes II risk; Mother's childhood poverty; Mother's diabetes II; Child's low education (x); Child's diabetes II (y); Child's childhood poverty; Child's genetic diabetes II risk]
This Directed Acyclic Graph (DAG) shows the hypothesised causal relationships among the variables.
It shows the null hypothesis for the research question: low education does not directly cause diabetes (i.e. there is no arrow between them).
H0: NO CAUSAL EFFECT OF CHILD'S LOW EDUCATION ON DIABETES RISK. DIABETES IS CAUSED BY GENES AND POVERTY
We need to know this Directed Acyclic Graph (DAG) to know which variables to control for.
BACKDOOR PATHS
Any two variables can be spuriously correlated if we can draw a backdoor path between them.
Backdoor path: a sequence of linked arrows that points into both the variable at the start and the variable at the end of the path.
There is a backdoor path between the diabetes statuses:
Mo diab <- Mo gen -> Chi gen -> Chi diab
This backdoor path means that mother's and child's diabetes statuses will be correlated, not causally (because diabetes causes diabetes) but spuriously, because of shared genetic influence.
BLOCKING BACKDOOR PATHS
How can we remove this spurious correlation?
We can control for variables on the backdoor path, depending on whether they are a:
1. Fork (confounder)
2. Chain (mediator)
3. Collider
CONTROL FOR A FORK VARIABLE TO BLOCK THE BACKDOOR PATH
Mother's genetic risk is on a fork on the backdoor path:
Mo diab <- Mo gen -> Chi gen -> Chi diab
Fork: Var1 <- Var2 -> Var3
If we control for a fork, we block the spurious relation along the backdoor path. (A fork can also be called a confounder.)
CONTROL FOR A CHAIN VARIABLE TO BLOCK THE BACKDOOR PATH
Child's genetic risk is in a chain on the backdoor path:
Mo diab <- Mo gen -> Chi gen -> Chi diab
Chain: Var1 -> Var2 -> Var3
If we control for a chain, we block the spurious relation along the backdoor path. (A chain is also known as a mediator.)
THE PATH ONLY NEEDS TO BE BLOCKED AT ONE PLACE
In theory, we can control for either Mother's or Child's genetic risk to block the backdoor path.
IF WE DON'T HAVE GENETIC DATA, DIABETES STATUSES OF PARENT AND CHILD WILL BE SPURIOUSLY CORRELATED
If we don't have genetic data for the mother or child, we can't block this backdoor path.
It will remain open, provoking a spurious (non-causal) correlation between the variables at the ends of the backdoor path.
In this case, if we don't know the genetic status, mother's and child's diabetes statuses will be correlated.
IS THERE A BACKDOOR PATH BETWEEN X AND Y?
Yes. It's a longer version of the previous path:
Child Ed <- Mo Pov -> Mo diab <- Mo gen -> Chi gen -> Chi diab
If we don't block this path, our estimate of the causal effect of x on y will be affected by the spurious correlation.
If we don't have genetic information, can we still block the path? We could perhaps control for Mother's diabetes II?
MOTHER'S DIABETES II STATUS IS A COLLIDER
Mother's diabetes II status is a collider on the backdoor path: it has two arrows pointing into it (colliding).
Child Ed <- Mo Pov -> Mo diab <- Mo gen -> Chi gen -> Chi diab
A collider blocks the path as long as we do NOT control for it.
If we control for a collider we OPEN the backdoor path. Why?
CONTROLLING FOR A COLLIDER OPENS THE BACKDOOR PATH
Let's look at just mother's diabetes status:
Mo Pov -> Mo diab <- Mo gen
The DAG defines two causes of diabetes: poverty and genetics. If we control/stratify by diabetes, then for those with diabetes, if one cause is absent, the other is more likely. E.g. if I have diabetes but no genetic risk factors, it becomes likely that I had poverty in childhood.
The variables at the ends of the backdoor path will become conditionally correlated, conditional on knowing the status of the collider variable.
So, the path is OPEN if we control for the collider. The path is closed if we DON'T control for the collider.
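The "if one cause is absent, the other is more likely" argument can be seen in simulated data. This is a minimal sketch under assumptions that are not in the slides: poverty and genetic risk are independent standard normal scores, and diabetes occurs when their sum plus noise exceeds an arbitrary threshold.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000  # illustrative sample size

# Two independent causes of the mother's diabetes (assumed continuous scores).
mo_pov = rng.normal(size=n)
mo_gen = rng.normal(size=n)

# Collider: diabetes depends on both causes plus noise (threshold is arbitrary).
liability = mo_pov + mo_gen + rng.normal(size=n)
mo_diab = liability > 1.0

# Not conditioning on the collider: the two causes are uncorrelated.
corr_all = np.corrcoef(mo_pov, mo_gen)[0, 1]

# Conditioning (stratifying) on the collider: a correlation appears.
corr_diabetic = np.corrcoef(mo_pov[mo_diab], mo_gen[mo_diab])[0, 1]

print(f"whole sample  : r = {corr_all:.2f}")
print(f"diabetics only: r = {corr_diabetic:.2f}")
```

In the whole sample the correlation is near zero; among diabetics it is clearly negative, because a diabetic mother with low genetic risk is more likely to have had high poverty, and vice versa.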
H0: DIABETES IS CAUSED BY GENES AND POVERTY. NO CAUSAL EFFECT OF CHILD'S LOW EDUCATION ON DIABETES RISK
So, even if we don't have genetic data, according to this DAG we can block this backdoor path by NOT controlling for Mother's diabetes II status.
Child Ed <- Mo Pov -> Mo diab <- Mo gen -> Chi gen -> Chi diab
WHICH CONFOUNDERS SHALL WE CONTROL FOR?
Research question: Is low education a cause of type II diabetes?
Key variables:
Outcome (Y) = Whether respondent has type II diabetes
Predictor (X) = Whether respondent has low education level
Possible confounders (Z):
- Mother diagnosed with type II diabetes
- Mother's genetic risk for type II diabetes
- Respondent's genetic risk for type II diabetes
- Income during mother's childhood
- Income during respondent's childhood
If the DAG is true, we need not control for anything to block this backdoor path. BUT...
THERE WILL BE MORE THAN ONE BACKDOOR PATH...
There are other backdoor paths between predictor (x) and outcome (y), e.g.
Child Ed <- Chi pov -> Chi diab
Child's childhood poverty is a fork and so, if controlled, will block this spurious path.
Another example:
Child Ed <- Mo Pov -> Chi pov -> Chi diab
This path is also blocked by Child's childhood poverty.
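Listing backdoor paths by eye gets error-prone as the DAG grows, so the fork/chain/collider rules can be applied mechanically. The following is a minimal sketch, not a full d-separation library. The edge list is an assumption reconstructed from the paths written out in this transcript (including an arrow from mother's poverty to child's poverty); the original diagram is not available here, and all function names are illustrative.

```python
# Assumed edges (cause, effect), reconstructed from the paths in the transcript.
EDGES = [
    ("mo_gen", "mo_diab"), ("mo_pov", "mo_diab"),
    ("mo_gen", "chi_gen"), ("chi_gen", "chi_diab"),
    ("mo_pov", "chi_ed"), ("mo_pov", "chi_pov"),
    ("chi_pov", "chi_ed"), ("chi_pov", "chi_diab"),
]


def neighbours(node):
    """Nodes joined to `node` by an arrow in either direction."""
    return {b for a, b in EDGES if a == node} | {a for a, b in EDGES if b == node}


def descendants(node):
    """All nodes reachable from `node` by following arrows forwards."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for a, b in EDGES:
            if a == current and b not in found:
                found.add(b)
                stack.append(b)
    return found


def backdoor_paths(x, y):
    """Simple paths from x to y whose first step is an arrow INTO x."""
    paths = []

    def walk(path):
        if path[-1] == y:
            paths.append(path)
            return
        for nxt in sorted(neighbours(path[-1])):
            if nxt not in path:
                walk(path + [nxt])

    for parent in sorted(a for a, b in EDGES if b == x):
        walk([x, parent])
    return paths


def is_blocked(path, controlled):
    """Apply the fork/chain/collider rules to every interior node of a path."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in EDGES and (nxt, mid) in EDGES
        if collider:
            # A collider blocks unless it (or a descendant) is controlled for.
            if mid not in controlled and not (descendants(mid) & controlled):
                return True
        elif mid in controlled:
            # A controlled fork or chain variable blocks the path.
            return True
    return False


for controlled in [set(), {"chi_pov"}, {"chi_pov", "mo_diab"}]:
    print("controlling for", sorted(controlled) or "nothing")
    for p in backdoor_paths("chi_ed", "chi_diab"):
        state = "blocked" if is_blocked(p, controlled) else "OPEN"
        print("   ", " - ".join(p), "->", state)
```

Under the assumed edges, controlling for child's poverty alone blocks every backdoor path, while additionally controlling for mother's diabetes reopens the path through the collider.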
WHICH CONFOUNDERS SHALL WE CONTROL FOR?
Research question: Is low education a cause of type II diabetes?
Key variables:
Outcome (Y) = Whether respondent has type II diabetes
Predictor (X) = Whether respondent has low education level
Possible confounders (Z):
- Mother diagnosed with type II diabetes
- Mother's genetic risk for type II diabetes
- Respondent's genetic risk for type II diabetes
- Income during mother's childhood
- Income during respondent's childhood
If the DAG is correct, we only need to control for the child's poverty in childhood, and NOT for mother's diabetes status.
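This conclusion can be checked by simulating data in which the null hypothesis is true and comparing regression models. It is a sketch under strong simplifying assumptions that are not in the slides: every variable is a continuous score, all effects are linear with coefficient 1, the noise is standard normal, and mother's poverty also affects child's poverty. The variable and function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000  # illustrative sample size


def noise():
    return rng.normal(size=n)


# Simulate from the assumed DAG. H0 is true: chi_ed has NO effect on chi_diab.
mo_gen = noise()
mo_pov = noise()
mo_diab = mo_gen + mo_pov + noise()    # collider: two causes
chi_gen = mo_gen + noise()
chi_pov = mo_pov + noise()
chi_ed = mo_pov + chi_pov + noise()    # low-education score (x)
chi_diab = chi_gen + chi_pov + noise() # diabetes score (y); no chi_ed term


def education_coefficient(*controls):
    """OLS coefficient of chi_ed on chi_diab, adjusting for `controls`."""
    design = np.column_stack([np.ones(n), chi_ed, *controls])
    beta, *_ = np.linalg.lstsq(design, chi_diab, rcond=None)
    return beta[1]


b_crude = education_coefficient()
b_adjusted = education_coefficient(chi_pov)
b_collider = education_coefficient(chi_pov, mo_diab)

print(f"no controls                   : b = {b_crude:.3f}")
print(f"control child poverty         : b = {b_adjusted:.3f}")
print(f"... plus mother's diabetes    : b = {b_collider:.3f}")
```

With no controls the education coefficient is strongly positive even though the true effect is zero. Controlling for child's poverty brings it to about zero. Adding mother's diabetes as a further control pushes it away from zero again, because conditioning on the collider opens the path through the unmeasured genetic risk.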
SUMMARY
Statistics and data are not enough. We need to make causal assumptions. A DAG is just a set of causal assumptions. The DAG can then guide model building and comparison.
I've covered some main points but over-simplified others. There are many topics we didn't cover, e.g.:
- Mendelian Randomization
- Longitudinal designs, e.g. longitudinal mediation
- Latent variables (i.e. unmeasured causes)
- Multilevel causation (e.g. pupils affecting teachers, affecting pupils)
2. RANDOMIZATION
Variables that have been properly randomized do not have any causes: there are no arrows going into them in the DAG.
Randomization of patients to Drug means that person-specific qualities (e.g. sex) cannot be causes of between-patient variance in Drug uptake.
2. MENDELIAN RANDOMIZATION
Research question: Does Low Density Lipoprotein (LDL) blood cholesterol cause CardioVascular Disease (CVD)?
Problem: unmeasured confounders (U) of LDL and CVD.
Diagram: HMGCR -> LDL -> CVD, with U -> LDL and U -> CVD
Solution: HMGCR, i.e. gene variants involved in LDL production. A person's HMGCR variants are randomized at conception. HMGCR status is independent of the unmeasured confounders (U) of LDL and CVD, so we can use it as an instrumental variable.
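The instrumental-variable logic can be illustrated with simulated data and the simple ratio (Wald) estimator. This is a sketch, not the analysis behind the slide: the allele frequency, effect sizes and the treatment of CVD as a continuous risk score are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000          # illustrative sample size
true_effect = 0.3    # assumed causal effect of LDL on the CVD risk score

# Instrument: number of LDL-raising alleles (0, 1 or 2), assigned at conception.
hmgcr = rng.binomial(2, 0.3, size=n)

# Unmeasured confounder of LDL and CVD, independent of the gene variant.
u = rng.normal(size=n)

ldl = 0.5 * hmgcr + u + rng.normal(size=n)
cvd = true_effect * ldl + u + rng.normal(size=n)  # continuous risk score

# Naive regression slope of CVD on LDL: biased because u is not controlled.
naive = np.cov(ldl, cvd)[0, 1] / np.var(ldl, ddof=1)

# Ratio (Wald) IV estimate: gene-CVD association over gene-LDL association.
iv = np.cov(hmgcr, cvd)[0, 1] / np.cov(hmgcr, ldl)[0, 1]

print(f"true effect   : {true_effect:.2f}")
print(f"naive estimate: {naive:.2f}")
print(f"IV estimate   : {iv:.2f}")
```

The naive slope overstates the effect because the unmeasured confounder raises both LDL and CVD, while the ratio estimate recovers a value near the assumed true effect. The estimate is only valid if the gene variant affects CVD solely through LDL, which is itself a causal assumption.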