Directed Acyclic Graphs (DAGs) for Causal Inference

DAGs & EMM

Prep

notes | HW1-4 | scratchpad

Install packages:

(1) dagitty, (2) ggdag, (3) contrast

learnr.web.unc.edu

2020.02.27 – L15 – Mike

’

•

DAGs

•

EMM

•

More on models

•

Homework 3

: Looked good!

•

Homework 4

: Basic GLMs

DAGs

Models

head

births_sm

births_sm

mage_sq

 births_sm

mage

2 # or poly

m1

glm

formula

 preterm_f

 pnc5_f,

family

binomial

"identity"

data

 births_sm

m2

glm

formula

 preterm_f

 pnc5_f

smoker_f

family

binomial

"identity"

data

 births_sm

m3

glm

formula

 preterm_f

 pnc5_f

 smoker_f

mage

family

binomial

"identity"

data

 births_sm

m4

glm

formula

 preterm_f

 pnc5_f

 smoker_f

poly

mage,

raw

family

binomial

"identity"

data

 births_sm

Our (toy) DAG

Directed Acyclic Graphs (DAGs)

 inform our variable selection and treatment in

models (based on their status as mediators, confounders, effect measure

modifiers, etc. We will not elaborate in this class!

Take the Epi sequence for more.

DAG from EPID 716 / Christy Avery

What is a DAG?

DAGs

Directed Acyclic Graphs

) document causal

assumptions / knowledge from our head or

literature. They can be used to guide us to better

answer questions like:

•

Does a change in A prompt a change in B?

•

Or the other way around?

•

Or not in either direction, but their association is

caused by some third thing?

DAG Requirements

Directed



, not --

Acyclic

: A



B, not A



B and B



Graph

: Connected, not dangling*.

Other disciplines, from engineering to other forms of

statistical models, relax these requirements.

Mediation

Confounders

Colliders

Inducing a biased causal association through

controlling a collider is:

collider stratification bias

Effect Measure Modifiers

EMM: Two notes

•

Note that an

EMM may or may not be a confounder

(influencing the values of A and B directly, vs. the effect

of A on B), so may not be in the DAG node network.

EMMs are good for a DAG notes though!

•

Also note that it may be rare that an exposure /

intervention switches direction entirely. Effect measure

modification worth acknowledging may be a matter of

degree

… or important to report because of

context -

regardless of statistical interaction, p=whatever.

In Sum:

Create a model, throw things in, reduce by p-value /

backwards selection, or any of a number of techniques.

May be good at predicting outcome from exposure and

other variables. There is nuance here, but…

In Sum:

…if we want the causal effect, we have to be intentional

about what parts of this flow we block. We leave direct

and indirect causal paths, and leave blocked paths with

colliders (do not control!)

How do we do this?

•

Encode the nodes and directed edges of the DAG

from the literature / content knowledge

•

By eye, hand, or software

1.

document all paths between Exposure and Outcome

2.

Identify whether they are already blocked (collider),

backdoor (confounded), or causal (direct or indirect

through a mediator)

3.

Select nodes to statistically control, often ideally as

few as possible, to block the open backdoor paths

without blocking

A note on functional form!

Remember

maternal age

? In order to improve precision

of estimates (and acknowledge the linear assumptions of

GLMs), it behooves us to model covariates as well as

possible… balancing parsimony and interpretation.

Hence mage2… or splines of some kind.

This does not apply as directly to our exposures, which

we want to interpret in actionable, communicable ways!

….and a note on interpretation!

Relatedly, the Table 2 fallacy suggests

Reminder: Table 2 fallacy! (Westrich & Greenland 2013).

If those “estimates” are largely uninterpretable in causal

inference context, might as well let them go and model

them more precisely (albeit obfuscated).

Westreich, D., and S. Greenland. “The Table 2 Fallacy: Presenting and Interpreting Confounder and

Modifier Coefficients.”

American Journal of Epidemiology

 177, no. 4 (February 15, 2013): 292–98.

https://doi.org/10.1093/aje/kws412

DAG critiques

•

Reality isn’t DAGGY

•

DAGs (even if large) are a model, and so a simplified version of reality,

in this particular case requiring unidirectionality and acyclic

assumptions.

•

Reality is often a system with feedback loops and inter-relationships

that may not be modeled well with unidirectional models, perhaps

especially with social / network processes. There are other methods!

•

Not all nodes / edges are alike

•

Race-ethnicity in particular is a heavily overloaded construct for causal

inference (Vanderweele & Robinson 2014), reaching back to represent

historical and current systemic oppression and racism, physical

phenotype, experiences of cultural and ascribed identity. Parts of this

construct may have different causal relationships. Break apart if you

can, and regardless, be mindful / nuanced in your interpretation.

•

Not always interested in causal effects

•

Prediction, classification, association, other techniques have a place in

a public health toolbox.

VanderWeele, Tyler J., and Whitney R. Robinson. “On the Causal Interpretation of Race in Regressions Adjusting for Confounding and

Mediating Variables:”

Epidemiology

 25, no. 4 (July 2014): 473–84. doi:10.1097/EDE.0000000000000105.

Our (toy) DAG

Directed Acyclic Graphs (DAGs)

 inform our variable selection and treatment in

models (based on their status as mediators, confounders, effect measure

modifiers, etc. We will not elaborate in this class!

Take the Epi sequence for more.

DAG from EPID 716 / Christy Avery

Let’s Try: Dagitty

•

Check it out here:

www.dagitty.net

•

Premade DAG for you:

dagitty.net/moAh6a6

DAGs in R

The online

dagitty

tool

http://www.dagitty.net/dags.html#

) exports R code

for the dagitty package. Can be directly downloaded

into R!

The new

ggdag

package is a beefed-up version using

tidy data structures we can recognize.

’

e.g.:

http://www.dagitty.net/dags.html?id=oAh6a6

’

library

dagitty

birth_daggity

 downloadGraph

"dagitty.net/moAh6a6"

# ^ From the internet!

birth_daggity

plot

birth_daggity

’

# Imagine! run a model driven by a DAG! Or tweak the dag and

rerun the model!  :p

 dagitty

::

adjustmentSets

birth_daggity

 Maternal_Age, Maternal_Smoking, Race_Ethnicity

 minimal_adjustment_set

dagitty

::

adjustmentSets

birth_daggity, type

"minimal"

)[[

]]

 minimal_adjustment_set

"Maternal_Age"

"Maternal_Smoking"

"Race_Ethnicity"

str

birth_daggity

# < DAGs are messy though. :/

'dagitty'

 Named chr

"dag {\nChild_Gender [pos=\"0.111,-

0.146\"]\nEarlyPNC [exposure,pos=\"-0.366,-

0.090\"]\nMaternal_Age [pos=\"-0.5"

 __truncated__

’

•

 tidy_dagitty()

•

 ggdag(dag_df) + theme_dag() gets you a nice

ggplot2

•

 geometries, edit plot as usual

•

See:

https://cran.r-

project.org/web/packages/ggdag/vignettes/intro-

to-ggdag.html

•

And:

https://github.com/malcolmbarrett/ggdag

’

library

ggdag

 dag_df

 tidy_dagitty

birth_daggity

str

dag_df

List of

data

Classes ‘tbl_df’, ‘tbl’ and

'data.frame'

 obs. of

 variables

..

 name

chr

"Child_Gender"

"EarlyPNC"

"Maternal_Age"

"Maternal_Age"

...

..

num

0.111

0.366

0.576

0.576

0.576

0.48

0.48

0.619

0.619

0.619

...

..

num

0.146

0.09

0.091

0.091

0.091

0.022

0.022

0.051

0.051

0.051

...

..

 direction

 Factor w

levels

"<-"

"->"

"<->"

...

..

to

chr

"Preterm_Birth"

"Preterm_Birth"

"EarlyPNC"

"Maternal_Smoking"

...

..

 xend

num

0.147

0.147

0.366

0.48

0.147

0.366

0.147

0.366

0.576

0.48

...

..

 yend

num

0.051

0.051

0.09

0.022

0.051

0.09

0.051

0.09

0.091

0.022

...

..

 circular

 logi

FALSE

FALSE

FALSE

FALSE

FALSE

FALSE

...

dag

'dagitty'

 Named chr

"dag {\nChild_Gender [pos=\"0.111,-0.146\"]\nEarlyPNC

[exposure,pos=\"-0.366,-0.090\"]\nMaternal_Age [pos=\"-0.5"

 __truncated__

attr

(*

"class"

)=

chr

"tidy_dagitty"

’

plot

dag_df

# nope!

ggdag

dag_df

%>% mutate(name = str_trunc(name, 5))

  theme_dag_blank

()

’

ggdag_adjustment_set

dag_df

  theme_dag_blank

()

# Hey that's nice!

  geom_dag_text

color

"black"

# gives us geometries as usual

Practical Uses

R may be helpful for quickly changing and rerunning

for minimal sets a few different DAG scenarios.

Probably better than hand. Nice pair with dagitty

website.

Maybe useful for SEM? I dunno. But you can stick

your DAGs in papers / R Markdown, make ggplots of

them, etc.

Other Packages

If you do this a lot, find your favorite! R skills let you translate

data structures across packages as you need to.

dagitty  * ggdag

: Today!

DiagrammeR

: Prettier, but no adjustment sets?

https://donlelek.github.io/2015-03-31-dags-with-r/

dagR

: Prettier and adjustment sets, but funny syntax?

http://rstudio-pubs-

static.s3.amazonaws.com/2609_e3d86d0748c04eb18d5f56d6

a99feb3f.html

Maybe new packages?

EMM

Effect Measure Modifiers

Crude, but like…

Another view…

In other words

A continuation of modeling:

How do we get

group-specific effect

estimates

and

confidence intervals

from models when we have EMM?

Also:

a recursion primer

Review

1.

We’ve looked at a

Directed Acyclic Graph

 to identify

model covariates to control to assess the causal

relationship of early prenatal care on preterm birth.

We know EMM won’t show up on the DAG.

2.

We’ve constructed

Generalized Linear Models

for

risk difference (also looked at RRs and ORs) without

EMM.

3.

We’ve looked at

crude EMM

using dplyr:: and table().

Moderate modification?

4.

Now we want modeled, stratum-specific effect

estimates and CIs controlling for our other covariates

(smoking, maternal age). We want a

full EMM

model

Model Specification 1: Crude

Model Specification 2

Model Specification 3

Model Specification 3

Full Model Specification w/ EMM

Coding schemes

R’s default for expanding unordered factors in a

model is

indicator coding

R breaks up factors in models to all be referent to a

baseline level (set by factor() or fct_* functions!).

This means you need

(n-1) indicator levels

to

describe

n levels of a factor

Let’s Look: Contrast Matrix

Full Model Specification w/ EMM

Full Model Specification w/ EMM

map(births[, map_lgl(births, is.factor)], levels)

# ^ level check – purrr

mfull_emm_rd

glm

data

births,

                   preterm_f

pnc5_f

 raceeth_f

                     smoker_f

 mage

 mage_sq,

family

binomial

link

"identity"

))

Detail

 mfull_emm_rd

glm

data

births,

                    preterm_f

 pnc5_f

 raceeth_f

                      smoker_f

 mage

 mage_sq,

family

binomial

link

"identity"

))

 broom

::

tidy

mfull_emm_rd

# A tibble: 13 x 5

   term                                                       estimate std.error statistic   p.value

chr

dbl

dbl

dbl

dbl

Intercept

0.725

0.0249

29.1

9.21

 pnc5_fEarly PNC

0.0221

0.00683

3.23

1.22

 raceeth_fOther

0.0130

0.0112

1.17

2.44

 raceeth_fWhite Hispanic

0.0104

0.0246

0.422

6.73

 raceeth_fAfrican American

0.0589

0.0109

5.41

6.15

 raceeth_fAmerican Indian or Alaska Native

0.0673

0.0416

1.62

1.06

 smoker_fSmoker

0.0331

0.00427

7.77

7.96

 mage

0.0135

0.00174

7.78

7.15

 mage_sq

0.000246

0.0000306

8.02

1.09

 pnc5_fEarly PNC

raceeth_fOther

0.00570

0.0117

0.489

6.25

 pnc5_fEarly PNC

raceeth_fWhite Hispanic

0.0113

0.0256

0.440

6.60

 pnc5_fEarly PNC

raceeth_fAfrican American

0.00353

0.0114

0.310

7.56

 pnc5_fEarly PNC

raceeth_fAmerican Indian or Alaska Native

0.0213

0.0433

0.492

6.23

table

interaction

births

pnc5_f, births

raceeth_f

))

# : = interaction()

              No Early PNC.White non

Hispanic                  Early PNC.White non

Hispanic

                           No Early PNC.Other                               Early PNC.Other

                  No Early PNC.White Hispanic                      Early PNC.White Hispanic

                No Early PNC.African American                    Early PNC.African American

No Early PNC.American Indian or Alaska Native    Early PNC.American Indian or Alaska Native

Full Model Specification w/ EMM

Contrasts

We can get these point-estimates

by hand a few ways…

# Point estimates by hand, a few ways

mfull_emm_rd

coefficients

#WnH

RD_WnH

 mfull_emm_rd

coefficients

"pnc5_fPNC starts in first 5 mo"

#WnH

RD_WnH

 mfull_emm_rd

coefficients

"pnc5_fPNC starts in first 5 mo:raceeth_fAA"

#AA

sum

mfull_emm_rd

coefficients

)])

# AA

sum

mfull_emm_rd

coefficients

)])

# WH

sum

coef

mfull_emm_rd

)[

)])

# AI/AN

sum

coef

mfull_emm_rd

)[

)])

# Other

…But what about the CIs?

Can't quite make it from the individual coefficients.

Lets you define arbitrary contrasts across your model

that you’re interested in by setting two coefficient

specifications as lists.

contrast(<MODEL OBJECT>,

         a=list(<COEFA>), b=list(<COEFB>)

Handles more than GLMs – also multi-level/LME, gee,

etc. If you need exponentiated estimates, pass in

fun=exp. See help page

For our full model…

Gets us the RD effect estimate and 95% CI for AA moms getting prenatal care vs.

those who aren’t.

Note that unchanging model levels are required to have something, but “drop out”

of the estimation.

This is similar to how predict() works.

# Contrasts using contrast package

library

contrast

contrast

mfull_emm_rd,

list

pnc5_f

"PNC starts in first 5 mo"

raceeth_f

"AA"

, smoker_f

"Smoker"

mage

, mage_sq

list

pnc5_f

"No PNC before 5 mo"

raceeth_f

"AA"

, smoker_f

"Smoker"

mage

, mage_sq

))

For all levels of EMM:

Gets us the RD effect estimate and 95% CI for all levels of race-ethnicity including

referent.

Can use data.frame() to piece together a results table for graphing.

raceeth_diff

 contrast

mfull_emm_rd,

list

pnc5_f

"PNC starts in first 5 mo"

raceeth_f

levels

births

raceeth_f

smoker_f

"Smoker"

, mage

, mage_sq

list

pnc5_f

"No PNC before 5 mo"

raceeth_f

levels

births

raceeth_f

smoker_f

"Smoker"

, mage

, mage_sq

))

print

raceeth_diff, X

EMM_df

 data.frame

model

paste0

"M5: "

, raceeth_diff

raceeth_f

                    estimate

raceeth_diff

Contrast, std.error

 raceeth_diff

SE,

                   X2.5..

raceeth_diff

Lower, X97.5..

raceeth_diff

Upper

ggplot

bind_rows

model_results,EMM_df

aes

model, estimate,color

std.error, fill

std.error

))+

  geom_linerange

aes

ymin

X2.5.., ymax

X97.5..

, size

)+

  geom_point

shape

, size

, color

"white"

)+

 geom_point

shape

)+

  scale_y_continuous

limits

NA

NA

, breaks

seq

(-

0.15

0.05

0.01

))+

  geom_hline

yintercept

, lty

, color

"grey"

)+

  geom_vline

xintercept

4.5

, lty

, color

"blue"

)+

annotate

geom

"text"

, x

4.6

, y

=-

0.05

, angle

label

"M5: Race-Ethnicity EMM: Stratum-Specific RDs"

)+

  labs

title

"Model results"

, subtitle

"Mirroring tufte boxplot aesthetics,

see ggthemes::geom_tufteboxplot()"

)+

  ggthemes

::

theme_tufte

()

#............................................

# EMM / final model

#............................................

glm

data

births, preterm_f

 pnc5_f

 smoker_f,

family

binomial

link

"identity"

))

#Discussion of model spec

head

births

table

births

race_f, births

methnic

# note my "ethnicity" coding may not be ideal.

mfull_emm_rd

glm

data

births,

                   preterm_f

 pnc5_f

 raceeth_f

                     smoker_f

 mage

 mage_sq,

family

binomial

link

"identity"

))

summary

mfull_emm_rd

broom

::

tidy

mfull_emm_rd

# Point estimates by hand, a few ways

mfull_emm_rd

coefficients

#WnH

RD_WnH

 mfull_emm_rd

coefficients

"pnc5_fPNC starts in first 5 mo"

#WnH

RD_WnH

 mfull_emm_rd

coefficients

"pnc5_fPNC starts in first 5 mo:raceeth_fAA"

#AA

sum

mfull_emm_rd

coefficients

)])

# AA

sum

mfull_emm_rd

coefficients

)])

# WH

sum

coef

mfull_emm_rd

)[

)])

# AI/AN

sum

coef

mfull_emm_rd

)[

)])

# Other

births

raceeth_f

# get or set contrasts

contrasts

births

raceeth_f

# Let's look: contrast matrix

# See contr.treatment or contr.sum

# Contrasts using contrast package

library

contrast

contrast

mfull_emm_rd,

list

pnc5_f

"PNC starts in first 5 mo"

, raceeth_f

"AA"

, smoker_f

"Smoker"

, mage

, mage_sq

list

pnc5_f

"No PNC before 5 mo"

, raceeth_f

"AA"

, smoker_f

"Smoker"

, mage

, mage_sq

))

raceeth_diff

 contrast

mfull_emm_rd,

list

pnc5_f

"PNC starts in first 5 mo"

, raceeth_f

levels

births

raceeth_f

, smoker_f

"Smoker"

, mage

, mage_sq

list

pnc5_f

"No PNC before 5 mo"

, raceeth_f

levels

births

raceeth_f

, smoker_f

"Smoker"

, mage

, mage_sq

))

print

raceeth_diff, X

EMM_df

 data.frame

model

paste0

"M5: "

, raceeth_diff

raceeth_f

                    estimate

raceeth_diff

Contrast, std.error

 raceeth_diff

SE,

                   X2.5..

raceeth_diff

Lower, X97.5..

raceeth_diff

Upper

ggplot

bind_rows

model_results,EMM_df

, aes

model, estimate,color

std.error, fill

std.error

))+

  geom_linerange

aes

ymin

X2.5.., ymax

X97.5..

, size

)+

  geom_point

shape

, size

, color

"white"

)+

 geom_point

shape

)+

  scale_y_continuous

limits

NA

NA

, breaks

seq

(-

0.15

0.05

0.01

))+

  geom_hline

yintercept

, lty

, color

"grey"

)+

  geom_vline

xintercept

4.5

, lty

, color

"blue"

)+

  annotate

geom

"text"

, x

4.6

, y

=-

0.05

, angle

, label

"M5: Race-Ethnicity EMM: Stratum-Specific RDs"

)+

  labs

title

"Model results"

, subtitle

"Mirroring tufte boxplot aesthetics, see ggthemes::geom_tufteboxplot()"

)+

  ggthemes

::

theme_tufte

()

Code

chunk

Other ways

Specifying the contrast type using functions like:

C()

: Sets the "contrasts" attribute for the factor.

 contrasts():

Set and view the contrasts associated with a factor.

 model.matrix()

: creates a design (or model) matrix, e.g., by

expanding factors to a set of dummy variables (depending on the

contrasts) and expanding interactions similarly.

Create a custom contrast matrix so that individual

coefficients are more meaningful under EMM. Possible

but beyond scope of this class. Good luck.



References

A (sort of) complete guide to contrasts in R

http://rstudio-pubs-static.s3.amazonaws.com/65059_586f394d8eb84f84b1baaf56ffb6b47f.html

contrast

package

https://cran.r-project.org/web/packages/contrast/vignettes/contrast.pdf

Coming up next week

Outputs / Reports /

Markdown

Slide

Template

Bank

…

1)

Thing 1

2)

Thing 2

3)

Head to the google doc and

type “Done” next to your

name when done!

…

After

doing a thing

, enter

something

A.

B.

C.

Example: Type order (e.g. BCA)

into google doc.

Small Group Activity

Template

•

•

•

Slide Note

Welcome to the R class! Today I will talk about course logistics and give you some background on R. I’ll also demo how to install R and your homework for today will be to install R for next week. I’ll also have you fill out a short survey online so that I and the other teachers can get to know you and the level of R experience you are at.

Embed Share

Download

Directed Acyclic Graphs (DAGs) play a crucial role in documenting causal assumptions and guiding variable selection in epidemiological models. They inform us about causal relationships between variables and help answer complex questions related to causality. DAGs must meet specific requirements like being directed, acyclic, and connected to be valid for causal inference. Concepts like mediation and confounders are essential in interpreting DAGs for understanding causal relationships in epidemiological research.

veer Follow

Uploaded on Aug 05, 2024 | 1 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

Download Presentation

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript

EPID 701 Spring 2020 EPID 701 Spring 2020 R for Epidemiologists DAGs & EMM Prep: notes | HW1-4 | scratchpad Install packages: (1) dagitty, (2) ggdag, (3) contrast 2020.02.27 L15 Mike learnr.web.unc.edu

Todays Overview Today s Overview DAGs EMM More on models Homework 3: Looked good! Homework 4: Basic GLMs

DAGs

Models head(births_sm) births_sm$mage_sq = births_sm$mage ^ 2 # or poly m1 = glm(formula = preterm_f ~ pnc5_f, family = binomial("identity"), data = births_sm) m2 = glm(formula = preterm_f ~ pnc5_f + smoker_f , family = binomial("identity"), data = births_sm) m3 = glm(formula = preterm_f ~ pnc5_f + smoker_f + mage, family = binomial("identity"), data = births_sm) m4 = glm(formula = preterm_f ~ pnc5_f + smoker_f + poly(mage, 2, raw = T), family = binomial("identity"), data = births_sm)

Our (toy) DAG Directed Acyclic Graphs (DAGs) inform our variable selection and treatment in models (based on their status as mediators, confounders, effect measure modifiers, etc. We will not elaborate in this class! Take the Epi sequence for more. DAG from EPID 716 / Christy Avery

What is a DAG? DAGs (Directed Acyclic Graphs) document causal assumptions / knowledge from our head or literature. They can be used to guide us to better answer questions like: Does a change in A prompt a change in B? Or the other way around? Or not in either direction, but their association is caused by some third thing?

DAG Requirements Directed: , not -- Acyclic: A B, not A B and B A Graph: Connected, not dangling*. Other disciplines, from engineering to other forms of statistical models, relax these requirements.

Mediation

Confounders

Colliders Inducing a biased causal association through controlling a collider is: collider stratification bias

Effect Measure Modifiers

EMM: Two notes Note that an EMM may or may not be a confounder (influencing the values of A and B directly, vs. the effect of A on B), so may not be in the DAG node network. EMMs are good for a DAG notes though! Also note that it may be rare that an exposure / intervention switches direction entirely. Effect measure modification worth acknowledging may be a matter of degree or important to report because of context - regardless of statistical interaction, p=whatever.

In Sum: Create a model, throw things in, reduce by p-value / backwards selection, or any of a number of techniques. May be good at predicting outcome from exposure and other variables. There is nuance here, but

In Sum: if we want the causal effect, we have to be intentional about what parts of this flow we block. We leave direct and indirect causal paths, and leave blocked paths with colliders (do not control!)

How do we do this? Encode the nodes and directed edges of the DAG from the literature / content knowledge By eye, hand, or software 1. document all paths between Exposure and Outcome 2. Identify whether they are already blocked (collider), backdoor (confounded), or causal (direct or indirect through a mediator) 3. Select nodes to statistically control, often ideally as few as possible, to block the open backdoor paths without blocking

A note on functional form! Remember maternal age? In order to improve precision of estimates (and acknowledge the linear assumptions of GLMs), it behooves us to model covariates as well as possible balancing parsimony and interpretation. Hence mage2 or splines of some kind. This does not apply as directly to our exposures, which we want to interpret in actionable, communicable ways!

.and a note on interpretation! Relatedly, the Table 2 fallacy suggests Reminder: Table 2 fallacy! (Westrich & Greenland 2013). If those estimates are largely uninterpretable in causal inference context, might as well let them go and model them more precisely (albeit obfuscated). Westreich, D., and S. Greenland. The Table 2 Fallacy: Presenting and Interpreting Confounder and Modifier Coefficients. American Journal of Epidemiology 177, no. 4 (February 15, 2013): 292 98. https://doi.org/10.1093/aje/kws412.

DAG critiques Reality isn t DAGGY DAGs (even if large) are a model, and so a simplified version of reality, in this particular case requiring unidirectionality and acyclic assumptions. Reality is often a system with feedback loops and inter-relationships that may not be modeled well with unidirectional models, perhaps especially with social / network processes. There are other methods! Not all nodes / edges are alike Race-ethnicity in particular is a heavily overloaded construct for causal inference (Vanderweele & Robinson 2014), reaching back to represent historical and current systemic oppression and racism, physical phenotype, experiences of cultural and ascribed identity. Parts of this construct may have different causal relationships. Break apart if you can, and regardless, be mindful / nuanced in your interpretation. Not always interested in causal effects Prediction, classification, association, other techniques have a place in a public health toolbox. VanderWeele, Tyler J., and Whitney R. Robinson. On the Causal Interpretation of Race in Regressions Adjusting for Confounding and Mediating Variables: Epidemiology 25, no. 4 (July 2014): 473 84. doi:10.1097/EDE.0000000000000105.

Our (toy) DAG Directed Acyclic Graphs (DAGs) inform our variable selection and treatment in models (based on their status as mediators, confounders, effect measure modifiers, etc. We will not elaborate in this class! Take the Epi sequence for more. DAG from EPID 716 / Christy Avery

Lets Try: Dagitty Check it out here: www.dagitty.net Premade DAG for you: dagitty.net/moAh6a6

DAGs in R The online dagitty tool (http://www.dagitty.net/dags.html#) exports R code for the dagitty package. Can be directly downloaded into R! The new ggdag package is a beefed-up version using tidy data structures we can recognize.

Lets Try: DAGs in R with dagitty dagitty e.g.: http://www.dagitty.net/dags.html?id=oAh6a6

Lets Try: DAGs in R with dagitty dagitty library(dagitty) birth_daggity = downloadGraph("dagitty.net/moAh6a6") # ^ From the internet! birth_daggity plot(birth_daggity)

Lets Try: DAGs in R with dagitty dagitty # Imagine! run a model driven by a DAG! Or tweak the dag and rerun the model! :p > dagitty::adjustmentSets(birth_daggity) { Maternal_Age, Maternal_Smoking, Race_Ethnicity } > minimal_adjustment_set = dagitty::adjustmentSets(birth_daggity, type="minimal")[[1]] > minimal_adjustment_set [1] "Maternal_Age" "Maternal_Smoking" "Race_Ethnicity" > str(birth_daggity) # < DAGs are messy though. :/ 'dagitty' Named chr "dag {\nChild_Gender [pos=\"0.111,- 0.146\"]\nEarlyPNC [exposure,pos=\"-0.366,- 0.090\"]\nMaternal_Age [pos=\"-0.5"| __truncated__ >

Lets Try: DAGs in R with ggdag ggdag tidy_dagitty() ggdag(dag_df) + theme_dag() gets you a nice ggplot2 geometries, edit plot as usual See: https://cran.r- project.org/web/packages/ggdag/vignettes/intro- to-ggdag.html And: https://github.com/malcolmbarrett/ggdag

Lets Try: DAGs in R with ggdag ggdag > library(ggdag) > > dag_df = tidy_dagitty(birth_daggity) > str(dag_df) List of 2 $ data:Classes tbl_df , tbl and 'data.frame': ..$ name : chr [1:15] "Child_Gender" "EarlyPNC" "Maternal_Age" "Maternal_Age" ... ..$ x : num [1:15] 0.111 -0.366 -0.576 -0.576 -0.576 -0.48 -0.48 -0.619 -0.619 -0.619 ... ..$ y : num [1:15] -0.146 -0.09 -0.091 -0.091 -0.091 0.022 0.022 -0.051 -0.051 -0.051 ... ..$ direction: Factor w/ 3 levels "<-","->","<->": 2 2 2 2 2 2 2 2 2 2 ... ..$ to : chr [1:15] "Preterm_Birth" "Preterm_Birth" "EarlyPNC" "Maternal_Smoking" ... ..$ xend : num [1:15] -0.147 -0.147 -0.366 -0.48 -0.147 -0.366 -0.147 -0.366 -0.576 -0.48 ... ..$ yend : num [1:15] -0.051 -0.051 -0.09 0.022 -0.051 -0.09 -0.051 -0.09 -0.091 0.022 ... ..$ circular : logi [1:15] FALSE FALSE FALSE FALSE FALSE FALSE ... $ dag : 'dagitty' Named chr "dag {\nChild_Gender [pos=\"0.111,-0.146\"]\nEarlyPNC [exposure,pos=\"-0.366,-0.090\"]\nMaternal_Age [pos=\"-0.5"| __truncated__ - attr(*, "class")= chr "tidy_dagitty" > 15 obs. of 8 variables:

Lets Try: DAGs in R with ggdag ggdag plot(dag_df) # nope! ggdag(dag_df %>% mutate(name = str_trunc(name, 5))) + theme_dag_blank()

Lets Try: DAGs in R with ggdag ggdag ggdag_adjustment_set(dag_df) + theme_dag_blank() + # Hey that's nice! geom_dag_text(color="black") # gives us geometries as usual

Practical Uses R may be helpful for quickly changing and rerunning for minimal sets a few different DAG scenarios. Probably better than hand. Nice pair with dagitty website. Maybe useful for SEM? I dunno. But you can stick your DAGs in papers / R Markdown, make ggplots of them, etc.

Other Packages If you do this a lot, find your favorite! R skills let you translate data structures across packages as you need to. dagitty * ggdag: Today! DiagrammeR: Prettier, but no adjustment sets? https://donlelek.github.io/2015-03-31-dags-with-r/ dagR: Prettier and adjustment sets, but funny syntax? http://rstudio-pubs- static.s3.amazonaws.com/2609_e3d86d0748c04eb18d5f56d6 a99feb3f.html Maybe new packages?

EMM

Effect Measure Modifiers

Crude, but like

Another view

In other words A continuation of modeling: How do we get group-specific effect estimates and confidence intervals from models when we have EMM? Also: a recursion primer

Review 1. We ve looked at a Directed Acyclic Graph to identify model covariates to control to assess the causal relationship of early prenatal care on preterm birth. We know EMM won t show up on the DAG. 2. We ve constructed Generalized Linear Models for risk difference (also looked at RRs and ORs) without EMM. 3. We ve looked at crude EMM using dplyr:: and table(). Moderate modification? 4. Now we want modeled, stratum-specific effect estimates and CIs controlling for our other covariates (smoking, maternal age). We want a full EMM model.

Model Specification 1: Crude Risk Preterm = ?0+ ?1 ???5 Variable coefficients like ?1 (slopes) represent a linear change in risks (a risk difference) for a 1 unit increase in their associated variable. ?0: Risk of preterm with no early prenatal care ?1: Additional risk (risk difference) with early prenatal care

Model Specification 2 Risk Preterm = ?0+ ?1 ???5 + ?2 ?????? Coefficient slopes represent a linear change in risks (a risk difference) for a 1 unit increase in their associated variable. ?0: Risk of preterm with no early prenatal care ?1: Additional risk (risk difference) with early prenatal care (expect negative RD since protective) ?2: Additional risk (RD) with smoking status (note: no interaction term, so this additional risk is regardless of PNC5 status)

Model Specification 3 Risk Preterm = ?0+ ?1 ???5 + ?? ?????? + ?3?????? + ?4???? + ?5????_?? No interaction term but since race-ethnicity has multiple levels, so that 2 is really a vector of coefficients. Now ?0 is the baseline risk for White non-Hispanic mothers, each 2? for non-referent levels of race-ethnicity (AA, WH, AI/AN, other) is the increased risk of preterm birth by race-ethnicity status of the mother .*

Model Specification 3 Risk Preterm = ?0+ ?1 ???5 + ?? ?????? + ?4?????? + ?5???? + ?6????_?? * SORT OF. Remember the Table 2 Fallacy when not looking at EMM: we didn t draw our DAG to properly estimate these other coefficients. But may still useful to think about their statistical (non-causal) interpretation.

Full Model Specification w/ EMM Risk Preterm = ?0+ ?1 ???5 + ?? ?????? + ?????5 ?????? + ?4 ?????? + ?5 ???? + ?6 ????_?? Coefficient slopes represent a linear change in risks (a risk difference) for a 1 unit increase in their associated variable.

Coding schemes R s default for expanding unordered factors in a model is indicator coding: R breaks up factors in models to all be referent to a baseline level (set by factor() or fct_* functions!). This means you need (n-1) indicator levels to describe n levels of a factor.

Lets Look: Contrast Matrix

Full Model Specification w/ EMM Risk Preterm = ?0+ ?1 ???5??? + ?2 ?????? ?? + ?6???5 ?????? ?? + ?3 ?????? ?? + ?7???5 ?????? ?? + ?4 ?????? ???? + ?8???5 ?????? ???? + ?5 ?????? ????? + ?9???5 ?????? ????? + ?10????????? + ?11???? + ?12????_??

Full Model Specification w/ EMM Risk Preterm = ?0+ ?1 ???5??? + ?2 ?????? ?? + ?6???5 ?????? ?? + ?3 ?????? ?? + ?7???5 ?????? ?? + ?4 ?????? ???? + ?8???5 ?????? ???? + ?5 ?????? ????? + ?9???5 ?????? ????? + ?10????????? + ?11???? + ?12????_?? map(births[, map_lgl(births, is.factor)], levels) # ^ level check purrr mfull_emm_rd = glm(data=births, preterm_f ~ pnc5_f * raceeth_f + smoker_f + mage + mage_sq, family=binomial(link="identity"))

> mfull_emm_rd = glm(data=births, + + + Detail preterm_f ~ pnc5_f * raceeth_f + smoker_f + mage + mage_sq, family=binomial(link="identity")) > broom::tidy(mfull_emm_rd) # A tibble: 13 x 5 term estimate std.error statistic p.value <chr> 1 (Intercept) 2 pnc5_fEarly PNC 0.0221 3 raceeth_fOther 4 raceeth_fWhite Hispanic -0.0104 5 raceeth_fAfrican American -0.0589 6 raceeth_fAmerican Indian or Alaska Native -0.0673 7 smoker_fSmoker -0.0331 8 mage 0.0135 9 mage_sq 10 pnc5_fEarly PNC:raceeth_fOther 11 pnc5_fEarly PNC:raceeth_fWhite Hispanic 0.0113 12 pnc5_fEarly PNC:raceeth_fAfrican American 0.00353 13 pnc5_fEarly PNC:raceeth_fAmerican Indian or Alaska Native 0.0213 > table(interaction(births$pnc5_f, births$raceeth_f)) # : = interaction() <dbl> <dbl> <dbl> 29.1 3.23 -1.17 -0.422 6.73e- -5.41 -1.62 -7.77 7.78 -8.02 -0.489 6.25e- 0.440 6.60e- 0.310 7.56e- 0.492 6.23e- <dbl> 0.725 0.0249 0.00683 0.0112 0.0246 0.0109 0.0416 0.00427 0.00174 9.21e-187 1.22e- 2.44e- 3 1 1 8 1 -0.0130 6.15e- 1.06e- 7.96e- 15 7.15e- 15 1.09e- 15 -0.000246 0.0000306 -0.00570 0.0117 0.0256 0.0114 0.0433 1 1 1 1 No Early PNC.White non-Hispanic Early PNC.White non-Hispanic 2105 No Early PNC.Other 1240 No Early PNC.White Hispanic Early PNC.White Hispanic 175 No Early PNC.African American Early PNC.African American 1836 No Early PNC.American Indian or Alaska Native Early PNC.American Indian or Alaska Native 86 32523 Early PNC.Other 8626 1387 12768 783

Full Model Specification w/ EMM Under that coding schema ?0 is no longer the baseline risk (no PNC) for all groups it s the baseline risk for White non-Hispanic mothers*. ?2,3,4,5 are the increment in risk for each race-ethnicity status (compared to WnH) without prenatal care*. ?6,7,8,9 are the increment in risk by each race-ethnicity status (compared to WnH) with prenatal care*. * adjusting for other covariates like smoking and mage as specified in the model

Contrasts So if we want the group-specific RDs by race, ethnicity, we could imagine subtracting one equation from another. Many terms drop out, leaving: ?1: ?1+ ?6: ?1+ ?7: ?1+ ?8: ?1+ ?9: Effect (RD) of PNC5 for WnH moms Effect (RD) of PNC5 for AA moms Effect (RD) of PNC5 for WH moms Effect (RD) of PNC5 for AI/AN moms Effect (RD) of PNC5 for RE:Other moms

We can get these point-estimates by hand a few ways # Point estimates by hand, a few ways mfull_emm_rd$coefficients[2] #WnH RD_WnH = mfull_emm_rd$coefficients["pnc5_fPNC starts in first 5 mo"] #WnH RD_WnH + mfull_emm_rd$coefficients["pnc5_fPNC starts in first 5 mo:raceeth_fAA"] #AA sum(mfull_emm_rd$coefficients[c(2,10)]) # AA sum(mfull_emm_rd$coefficients[c(2,11)]) # WH sum(coef(mfull_emm_rd)[c(2,12)]) # AI/AN sum(coef(mfull_emm_rd)[c(2,13)]) # Other