Epidemiology Concepts in Research and Analysis

As you arrive… the usual!
Usual:
1) Lecture
2) Scratchpad, births loaded
3) Homework 1 open
4) Notes doc
5) Warmup question using your scratchpad/homework script:
How many variables of each type are in the births2012_small.csv dataset when read with read_csv()? How did you find out?
Hint: class()
"Not sure" is always ok!
EPID 701 Spring 2020
R for Epidemiologists
R Coding III: Last of our tools: control & functions (and HW deep dive)
learnr.web.unc.edu
2020.01.23 – L5 – Mike
Today
1. Homework Intro!
2. Control …and Functionals
3. Functions
Homework Project
Motivating Question
Does early prenatal care (PNC) reduce preterm birth?
Prototypical Epi Analysis
Literature review (Nope! We’re picking it.)
Hypothesis / question generation
Prepare, Explore & Recode Data (HW1*)
Select / Subset Covariates (HW2)
Functional Form & Crude/Basic Models (HW3)
Confounding & Effect Measure Modification (HW4)
Graphics & Outputs (HW5)
Maps (short HW6*)
…Bonus stuff
*We’ll revisit some early steps in future assignments, since in practice you’d generally apply some of the fancier tools right away! But this is the gist.
Important Epidemiology Concepts from the EPID Methods Sequence
…that we will be using but abbreviating hard:
Exposure
Outcome
Risk
Risk Difference/Ratio
Rate
Confounder
Mediator
Effect Measure Modifier
Directed Acyclic Graphs (DAGs) for Causal Inference
DAGs
Directed Acyclic Graphs (DAGs) inform our variable selection and treatment in models, based on each variable's status as a mediator, confounder, effect measure modifier, etc. We will not elaborate in this class! Take the Epi sequence for more.
DAG from EPID 716 / Christy Avery
Important Epidemiology Concepts from the EPID Methods* Sequence
…that we will not be using or only minimally use.
Hand calculations
Confidence intervals (minimal)
Odds ratios
DAGs
…and many more!
*Covered in depth in EPID 715/716 (SAS-based) and the core EPID sequence.
Important Epidemiology Concepts NOT IN the EPID Core Sequence
…that we will be using, because they come up a lot in public health practice, paper writing, teamwork, etc.
Maps!
Table & report generation
Organizing a large analysis
More critical analysis on disparities
Etc.
Motivating Question… a bit more specifically
Does early prenatal care (= PNC during or before the 5th month) reduce preterm birth (= less than 37 weeks)…when controlling for obvious confounders?
Literature note: …uh, only maybe sorta? It’s more complex than we’ll be treating it for this class. But let’s mostly drop that for now! Feel free to explore the literature here.
Think: PNC seems to be good. Let’s figure out how good!
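A minimal sketch of how these two definitions could be coded, using a small made-up data frame rather than the real births file. The column names mdif and wksgest come from this deck's variable list; treating 99 as an "unknown" code for mdif is an assumption (check the data dictionary), and the names toy, pnc5 and preterm are hypothetical.

```r
# Toy data standing in for births (all values are made up)
toy <- data.frame(
  mdif    = c(1, 3, 5, 6, 9, 99),     # month prenatal care began
  wksgest = c(39, 36, 40, 37, 30, 38) # estimated weeks of gestation
)

# Assumption: 99 means "unknown", so turn it into NA before comparing
toy$mdif[toy$mdif == 99] <- NA

# Exposure: PNC began during or before the 5th month
toy$pnc5 <- toy$mdif <= 5
# Outcome: preterm = less than 37 weeks of gestation
toy$preterm <- toy$wksgest < 37

# Crude 2x2 table, keeping missing exposure visible
table(pnc5 = toy$pnc5, preterm = toy$preterm, useNA = "ifany")
```

Recoding the missing code first matters: otherwise 99 would be compared as a real month and silently counted as "late care".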
Relevant Variables*
Exposure/Outcome
Mdif: Month Prenatal Care Began
Wksgest: Calculated Estimate of Gestation
Covariates
Mage: Maternal Age
Mrace: Maternal Race
Methnic: Hispanic Origin of Mother
Cigdur: Cigarette Smoking During Pregnancy
Cores: Residence of Mother -- County
*Look ahead: actually, we’ll be creating some modified versions of these, but these are our base elements. And a sidenote on case / style….
Relevant Variables
Selection Criteria
Plur: Plurality of birth (twins, triplets, etc.)
Wksgest: Calculated Estimate of Gestation
DOB: Date of birth of baby
Congenital Anomalies: multiple variables with congenital anomaly status (future: purrr/apply)
Sex: Infant sex
Visits: Total Number of Prenatal Care Visits
Control
…Quickly!
Just because you can doesn't mean you should
Control
To do…or not to do…or do repeatedly.
R has “traditional” control structures…
Base R: if () {}, else if () {}, else {}, ifelse(), for () {}
Tidyverse: if_else(), case_when()
…but also benefits from vectorization!
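To make the vectorization point concrete, here is a small base R sketch that does the same test twice: once with a for loop and an if, once as a single vectorized comparison. The mage values are made up and the names teen_loop and teen_vec are hypothetical.

```r
mage <- c(17, 22, 35, 19, 41)   # made-up maternal ages

# Loop version: test one element at a time
teen_loop <- logical(length(mage))   # starts as all FALSE
for (i in seq_along(mage)) {
  if (mage[i] < 20) teen_loop[i] <- TRUE
}

# Vectorized version: the comparison runs over the whole vector at once
teen_vec <- mage < 20

identical(teen_loop, teen_vec)
```

Both give the same logical vector; the vectorized line simply has less bookkeeping to get wrong.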
Conditional Functions
if () {}
# if([BOOLEAN]){[DO STUFF]}
if (length(names(births)) > 10) {
  print("lots of variables!")
}
# {} treats many statements as 1 block*
# For only very short ifs (and functions)…
if (length(names(births)) > 10) "Bigger" else "Smaller!"
Conditional Functions
Longer if else
# if([BOOLEAN]){[DO STUFF]}
if (file.exists("births2012_small.csv")) {
  print("CSV file is here!")
} else if (file.exists("births_sm.rdata")) { # Optional!
  print("Can't find csv, but found rdata!")
} else { # Optional! Note all on same line!
  print("Where's the file?")
}
Conditional Functions
dplyr::if_else() function
# vectorized ifelse()
# dplyr - T/F must be same type
if_else(births$mage[1:10] < 20, "Teenager", ">20")
# base - not as strict!
ifelse(births$mage[1:10] < 20, "Teenager", 20)
# Nice for recoding!
# (older dplyr versions want a typed NA here, e.g. NA_real_ for a double column)
visit_tail = tail(births$visits, 20)
if_else(visit_tail %in% c(88, 99), NA, visit_tail)
births$visits_fixed = if_else(births$visits %in% c(88, 99),
                              NA, births$visits)
# ^ often / next week this will be in a mutate() statement
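A short base R sketch of why "not as strict" can bite: ifelse() will quietly change the type of its result when the two branches disagree. The vectors here are made up, and the names mixed and visits_fixed are illustrative only. The typed missing value NA_real_ is what stricter functions tend to ask for on a numeric column.

```r
mage <- c(18, 25, 31)           # made-up ages

# Mixed branch types: base ifelse() converts the whole result to character
mixed <- ifelse(mage < 20, "Teenager", 20)
class(mixed)

# Recoding with a typed missing value keeps the result numeric
visits <- c(4, 12, 88, 99, 9)   # made-up visit counts
visits_fixed <- ifelse(visits %in% c(88, 99), NA_real_, visits)
class(visits_fixed)
```

The number 20 comes back as the string "20" in the first call, which is easy to miss until a later calculation fails.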
Conditional Functions
dplyr::case_when()
# case_when
head(births)
births$smoker_mage = case_when( # Cascading if...else ifs...else
  births$mage < 21 & births$cigdur == "Y" ~ "Younger Smoker",
  births$mage < 21 & births$cigdur == "N" ~ "Younger Non-Smoker",
  births$mage >= 21 & births$cigdur == "Y" ~ "Older Smoker",
  births$mage >= 21 & births$cigdur == "N" ~ "Older Non-Smoker",
  TRUE ~ "Something else!" # TRUE ~ to catch what's left
)
# (Who knows what young and old is! Referencing new law @ 21)
# Note the use of the Formula operator: ~ (tilde). Will see again.
head(births[, c("mage", "cigdur", "smoker_mage")])
# ^ Haven't seen dplyr verbs *quite* yet!
Iteration
for - loop by number
# One approach to the warm-up - we'll have more
for (i in seq_along(names(births))) {
  print(class(pull(births[, i])))
}
# ^ won't print w/o print() in for loop or function!
# v Why seq_along is better than i in 1:n
seq_along(integer(0))
1:0
Vectorization is almost always better!
But how…?
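The seq_along() comparison on this slide can be checked with a self-contained example. This sketch counts how many times each loop body runs over an empty vector; the counter names n_1n and n_seq are hypothetical.

```r
x <- integer(0)   # an empty vector, e.g. a subset that matched nothing

# 1:length(x) is 1:0, which counts DOWN (1, then 0), so the body runs twice
n_1n <- 0
for (i in 1:length(x)) n_1n <- n_1n + 1

# seq_along(x) is empty, so the body never runs
n_seq <- 0
for (i in seq_along(x)) n_seq <- n_seq + 1

c(one_to_n = n_1n, seq_along = n_seq)
```

With real data the first loop would index x[1] and x[0] on an empty vector, which tends to produce NA or zero-length results rather than a clean error.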
Iteration
for - loop on each string
is.numeric(births[["mage"]])
for (var_name in names(births)) {
  if (is.numeric(births[[var_name]])) {
    print(paste(var_name, "is numeric"))
    print(summary(births[[var_name]]))
  } else {
    print(paste(var_name, "isn't!"))
  }
}
# See while() statement- not covering
Iteration
while() - unknown seq length
for (i in seq_along(x)) {
  # body
}
# Equivalent to
i <- 1
while (i <= length(x)) {
  # body
  i <- i + 1
}
# Careful! Want an infinite loop?
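A small sketch of the "unknown sequence length" case, where the number of iterations is not known up front. The task (halving a number until it falls below 1) is made up purely for illustration, as are the names remaining and steps.

```r
# Hypothetical task: keep halving a value until it drops below 1
remaining <- 100
steps <- 0
while (remaining >= 1) {
  remaining <- remaining / 2   # this update is what eventually ends the loop
  steps <- steps + 1
}
steps
```

If the update line were left out, the condition would stay TRUE forever, which is the infinite loop the slide warns about.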
When to (traditionally) loop
Want to act quite differently on each iteration, perhaps testing as you go.
Want to manage environment / memory smartly (delete duplicates as you read and combine large files, for instance)
A few other cases…
When NOT to loop
Most of the time!
When you care more about the output than the "bookkeeping"
When you want to do the same thing many times.
What do we need?
Need some way to save complicated steps (FUNCTIONS), then vectorize these more complicated actions, AKA…
ITERATE over the elements of / length of a THING and do STUFF.
We'll get there (BRIEF intro to purrr!) after functions.
Functionals
Functions that take functions as params.
A powerful, compact way to iterate.
We'll come back to this to practice, but want you to hear the concept early. Will take repetition!
Functionals: lapply() (*apply family) in Base R
lapply(births, summary)
lapply(births, class)
sapply(births, class)
https://r4ds.had.co.nz/iteration.html#the-map-functions
http://adv-r.had.co.nz/Functional-programming.html
Walk some list, do some thing.
Would be nice to control what we're returning, have consistent syntax, be faster, and have some other conveniences… purrr!
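One concrete reason to want control over the return type: sapply() changes what it hands back depending on the data. The sketch below uses a made-up two-column data frame (not the births file) in which a date-time column carries two classes. Base R's vapply() is shown as one way to state the expected result type explicitly, in the same spirit as purrr's typed map_*() functions.

```r
# Made-up data; the date-time column has two classes (POSIXct and POSIXt)
toy <- data.frame(mage = c(24, 31))
toy$dob <- as.POSIXct(c("2012-01-05", "2012-03-17"), tz = "UTC")

one_col <- sapply(toy["mage"], class)   # comes back as a character vector
all_col <- sapply(toy, class)           # comes back as a list instead

# vapply() makes the promise explicit: exactly one character value per column
first_class <- vapply(toy, function(col) class(col)[1],
                      FUN.VALUE = character(1))
first_class
```

Code that worked on the first result can break on the second without any change to the call itself, which is the inconsistency the typed versions avoid.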
Functionals: map functions - purrr in tidyverse
# Intuition check
sapply(c("mike", "hillary", "your_name"), str_to_title)
map_chr(c("mike", "hillary", "your_name"), str_to_title)
map_chr(list("mike", "hillary", "your_name"), str_to_title)
# Remember a vector is a kind of (homogeneous) list!
# More practically: give map* a named list, get a...
map(births, class) # ... named list
map_chr(births, class) # ... named char vector
map_lgl(births, is.numeric) # ... named lgl vector
map_dbl(births[, map_lgl(births, is.numeric)], max) # etc
# ...see cheat sheet for more!
Functionals: map functions - purrr in tidyverse
The … in functions represents a list of other parameters, often passed to another function inside. So:
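A minimal base R sketch of the dots idea, assuming nothing beyond base functions. The wrapper mean_of() is a hypothetical name; it takes whatever arrives in ... and hands it to mean() untouched. The last line shows a functional (sapply) forwarding na.rm the same way.

```r
# Hypothetical wrapper: anything captured by ... is passed on to mean()
mean_of <- function(x, ...) {
  mean(x, ...)
}

ages <- c(19, 25, 31, NA)   # made-up values
mean_of(ages)                 # NA, because the missing value propagates
mean_of(ages, na.rm = TRUE)   # na.rm travels through ... into mean()

# Functionals forward their extra arguments the same way
col_max <- sapply(list(a = c(1, NA, 3), b = c(4, 5, NA)), max, na.rm = TRUE)
col_max
```

This is why an argument such as na.rm can be written after the function name in a map or apply call even though the functional itself has no such parameter.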
Functionals: map functions
map2*(), pmap*()
# More advanced:
map_int(c(1:5, NA), sum, na.rm = TRUE)
# map2_* and pmap_* versions
map2_int(1:5, 6:10, sum)
pmap(list(n = rep(3, 3), mean = rep(0, 3), sd = 1:3), rnorm)
# 23 total variations...
Functionals: map functions
purrr is a deep well!
# Just 'walk' and do, don't return
walk(c("Mike", "Hillary"), cat, "is here\n")
# e.g. walk2() (A) a list of filenames and
# (B) a list of tables: write_csv them all.
# 'Adverbs' - Whoa.
map_dbl(list(1, 10, "a"), log) # Nope
map_dbl(list(1, 10, "a"), possibly(log, NA_real_))
# ^ Works now!
# use the future_map* functions to distribute the
# processing over all your processors, or network nodes.
# Brain strain yet?! We'll work a single use case in HW
# and have a follow-up lecture on advanced purrr
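To demystify the 'adverb' idea, here is a rough base R stand-in for what possibly() does: it takes a function and returns a new function that swaps in a fallback value when the original throws an error. This is a sketch of the concept, not purrr's actual implementation, and the names possibly_sketch and safe_log are hypothetical.

```r
# Hypothetical stand-in for purrr::possibly(), built on base tryCatch()
possibly_sketch <- function(f, otherwise) {
  function(...) tryCatch(f(...), error = function(e) otherwise)
}

safe_log <- possibly_sketch(log, NA_real_)

# log("a") would normally stop everything; here it becomes NA instead
res <- vapply(list(1, 10, "a"), safe_log, FUN.VALUE = numeric(1))
res
```

The pattern is a function that takes a function and returns a modified function, which is why the slides call these adverbs: they modify the verb.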
Functionals: map functions
like, really deep. LATER
#................
# We're not ready to dig into this yet,
# Consider running this piece by piece a challenge problem.
# We'll need dplyr, modeling, and more purrr
# Note: confint_tidy() is deprecated in newer broom releases;
# tidy(model, conf.int = TRUE) returns the intervals directly.
births %>%
  filter(sex %in% 1:2) %>%
  group_by(sex) %>% nest() %>%
  mutate(model = map(data, ~ lm(data = .x, wksgest ~ mage))) %>%
  mutate(coefs = map(model, broom::tidy)) %>%
  mutate(confints = map(model, broom::confint_tidy)) %>%
  unnest(c(coefs, confints)) %>%
  filter(term == "mage")
#................
Functions
Functions: The Pipe!
# …Continuing tidyverse transition
# The Pipe
summary(births)
births %>% summary
births %>% summary() # same
c(1:10, NA) %>% max
c(1:10, NA) %>% max(na.rm = TRUE)
# Mike's answer to warmup!
births %>% map_chr(class) %>% table
Functions: Create your own
# Grammar suggestions:
# (1) vectors are nouns, functions are verbs.
# (2) data first (pipe friendly).
make_99_missing = function(x){
  x[x == 99] = NA
  return(x) # explicit
}
make_99_missing = function(x){x[x == 99] = NA; x}
# ^ also valid. last operation is returned.
make_99_missing(c(1:4, 88, 99))
c(1:4, 88, 99) %>% make_99_missing()
Functions: Create your own
# hard coding '99' everywhere is bad news...
make_nums_missing = function(x, nums = 99){
  x[x %in% nums] = NA; x
}
make_nums_missing(c(1:4, 88, 99))
make_nums_missing(c(1:4, 88, 99), c(88, 99))
# can use names or param order
c(1:4, 88, 99) %>% make_nums_missing(c(88, 99))
make_nums_missing(nums = c(88, 99), x = c(1:4, 88, 99))
Functions: Anonymous Functions
# More powerful purrr - with user defined functions.
numeric_births = births[, births %>% map_lgl(is.numeric)]
numeric_births %>% map_dfc(make_nums_missing)
# map_dfc = get list... return Data.Frame Columns (dfc)
# Functionals can use 'anonymous' functions
numeric_births %>% map_dfc(function(x) {x[x == 99] = NA; x})
numeric_births %>%
  map_dfc(~ {.x[.x %in% c(88, 99)] = NA; .x}) %>%
  tail(20)
Functions: Under the Hood
#.............
# How 1 function understands many classes?!
summary(births); summary(births$mage)
# summary() is a generic / default
# that calls a class-specific version
# ...simplifying a little bit, but the gist
summary.data.frame(births)
?summary.data.frame # Or F1
?summary.lm
# Useful if looking for help on a function
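The generic / class-specific idea can be tried with a made-up class. In this sketch, "pnc_record" is a hypothetical class invented for illustration; defining a function named summary.pnc_record is enough for summary() to find it when given an object of that class.

```r
# A made-up object with a made-up class; not part of any package
rec <- structure(list(mdif = 3, wksgest = 38), class = "pnc_record")

# Naming the function summary.<class> is what makes it the method for that class
summary.pnc_record <- function(object, ...) {
  paste0("PNC began month ", object$mdif,
         "; gestation ", object$wksgest, " weeks")
}

summary(rec)          # dispatches to summary.pnc_record()
summary(c(1, 2, 3))   # no matching class method, so the default is used
```

This is the same mechanism that sends a data frame to summary.data.frame() and a fitted model to summary.lm().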
Next Class….
Starting the homework (recoding)
together in earnest!
(Recommend you start
if you haven't already, though)
Done!

Exploring important epidemiology concepts such as exposure, outcome, risk, confounders, effect measures, and more, this content delves into variable selection using Directed Acyclic Graphs (DAGs) for causal inference in research and analysis. Understanding these concepts is crucial for conducting robust epidemiological studies.


Uploaded on Oct 09, 2024 | 0 Views

