Data Reshaping Techniques in R

 
Reshaping data using R
 
James Gwinnutt
 
Outline
 
What is reshaping
How to do it in R
 
Reshaping
 
There are two ways to store clustered data (e.g. longitudinal data, hierarchical data):
Long
Wide
 
 
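Example

One patient (ID 001) has their temperature recorded at baseline, 6 hours and 12 hours: 38.3°, 38.1° and 37.6°.

Long

ID     Time (hours)    Temperature (degrees)
001    0               38.3
001    6               38.1
001    12              37.6

Wide

ID     Temperature_0    Temperature_6    Temperature_12
001    38.3             38.1             37.6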
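The two layouts in the temperature example can be written out directly as data frames. This is only a sketch in base R; the object and column names (temp_long, temp_wide, time_hours and so on) are made up for illustration and do not come from the slides.

```r
# Long format: one row per measurement occasion
temp_long <- data.frame(
  id          = c("001", "001", "001"),
  time_hours  = c(0, 6, 12),
  temperature = c(38.3, 38.1, 37.6)
)

# Wide format: one row per patient, one column per occasion
temp_wide <- data.frame(
  id             = "001",
  temperature_0  = 38.3,
  temperature_6  = 38.1,
  temperature_12 = 37.6
)

dim(temp_long)  # 3 rows, 3 columns
dim(temp_wide)  # 1 row, 4 columns
```

Both objects hold the same three readings; only the arrangement differs.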
 
Reshape in R
 
Two packages to do this
 
tidyr – pivot_longer & pivot_wider (formerly gather and spread)
reshape2 – melt and cast (the casting function for dataframes is dcast)
 
Example dataset
 
Blood pressure data
120 people
Blood pressure data, “before” and “after” an intervention
 
 
 
bplong <- foreign::read.dta('https://www.stata-press.com/data/r8/bplong.dta')
 
tidyr

pivot_wider to convert the data from long to wide format

library(tidyr)  # also makes the %>% pipe available
bpwide <- bplong %>% pivot_wider(names_from = 'when', values_from = 'bp')

bpwide: new (wide) dataframe
bplong: original (long) dataframe
pivot_wider: function
names_from: follow-up variable name
values_from: measured variable(s)
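To see what pivot_wider does without downloading anything, here is a small sketch on a made-up stand-in for bplong. The object toy_long and its values are invented for illustration; only the column names mirror the real dataset.

```r
library(tidyr)

# Made-up stand-in for bplong: same column names, invented values
toy_long <- data.frame(
  patient = c(1, 1, 2, 2),
  sex     = c("Male", "Male", "Female", "Female"),
  agegrp  = c("30-45", "30-45", "46-59", "46-59"),
  when    = c("Before", "After", "Before", "After"),
  bp      = c(143, 153, 163, 170)
)

# 'when' supplies the new column names, 'bp' fills them in
toy_wide <- toy_long %>%
  pivot_wider(names_from = "when", values_from = "bp")

names(toy_wide)  # "patient" "sex" "agegrp" "Before" "After"
nrow(toy_wide)   # 2: one row per patient
```

Columns not named in names_from or values_from (here patient, sex and agegrp) are treated as identifiers, so they must be constant within each patient for the result to have one row per patient.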
 
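2 measured variables?

bplong$test <- bplong$bp + 1000
bpwide2 <- bplong %>% pivot_wider(names_from = 'when', values_from = c(bp, test))

test: new variable
values_from: use c() to give several measured variables

The wide columns are named bp_Before, bp_After, test_Before and test_After.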
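The column naming with two measured variables can be checked on a toy example. This is a sketch with invented data (toy_long is not part of the slides); it relies on pivot_wider joining the measured variable name and the timepoint with its default separator "_".

```r
library(tidyr)

# Invented data with the same layout as bplong (fewer columns for brevity)
toy_long <- data.frame(
  patient = c(1, 1, 2, 2),
  when    = c("Before", "After", "Before", "After"),
  bp      = c(143, 153, 163, 170)
)
toy_long$test <- toy_long$bp + 1000  # second measured variable

# Each measured variable gets one column per value of 'when'
toy_wide2 <- toy_long %>%
  pivot_wider(names_from = "when", values_from = c(bp, test))

names(toy_wide2)
# "patient" "bp_Before" "bp_After" "test_Before" "test_After"
```

These underscore-separated names are what the later wide-to-long step has to take apart again.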
 
Back to long
 
pivot_longer to convert data from wide to long
 
 
bplong2 <- bpwide %>% pivot_longer(c(Before, After), names_to = 'timepoint', values_to = 'bp')

bplong2: new (long) dataframe
bpwide: original (wide) dataframe
pivot_longer: function
c(Before, After): variables to convert from wide to long
names_to: new variable denoting the timepoint
values_to: new name for measured variable
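A quick way to convince yourself of what pivot_longer returns is to run it on a tiny wide table. The sketch below uses invented values (toy_wide is hypothetical) but the same call as on the slide.

```r
library(tidyr)

# Invented wide data: one row per patient, one column per timepoint
toy_wide <- data.frame(
  patient = c(1, 2),
  Before  = c(143, 163),
  After   = c(153, 170)
)

# Stack the Before and After columns into timepoint/bp pairs
toy_long2 <- toy_wide %>%
  pivot_longer(c(Before, After), names_to = "timepoint", values_to = "bp")

names(toy_long2)  # "patient" "timepoint" "bp"
nrow(toy_long2)   # 4: two timepoints for each of two patients
```

Note that timepoint comes back as a character column, whereas the original when variable in bplong is a factor; convert it with factor() if the ordering of Before and After matters.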
 
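Multiple variables

bplong3 <- bpwide2 %>%
  pivot_longer(cols = -c(patient, sex, agegrp),
               names_to = c(".value", "timepoint"),
               names_pattern = "([A-Za-z]+)_([A-Za-z]+)")

cols: pivot all variables except these
names_to: new variable names
names_pattern: variable naming pattern

The special name ".value" means the first part of each column name (bp or test) becomes the name of an output column, while the second part (Before or After) goes into timepoint.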
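The ".value" mechanism is easier to follow on a small table. The sketch below uses invented data (toy_wide2 is hypothetical) and also tries names_sep = "_" as an alternative to the regular expression; that alternative is an assumption that only holds when each column name contains exactly one underscore.

```r
library(tidyr)

# Invented wide data with two measured variables at two timepoints
toy_wide2 <- data.frame(
  patient     = c(1, 2),
  bp_Before   = c(143, 163),
  bp_After    = c(153, 170),
  test_Before = c(1143, 1163),
  test_After  = c(1153, 1170)
)

# As on the slide: a regular expression splits names into measure and timepoint
by_pattern <- toy_wide2 %>%
  pivot_longer(cols = -patient,
               names_to = c(".value", "timepoint"),
               names_pattern = "([A-Za-z]+)_([A-Za-z]+)")

# Alternative: split each name at the underscore
by_sep <- toy_wide2 %>%
  pivot_longer(cols = -patient,
               names_to = c(".value", "timepoint"),
               names_sep = "_")

names(by_pattern)  # patient, timepoint, plus one column each for bp and test
nrow(by_pattern)   # 4
```

On names like these the two calls should give the same table; names_pattern is the more flexible choice when the separator also appears inside the variable names.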
 
reshape2

First ‘melt’ the dataframe

library(reshape2)
molten <- melt(bplong, measure.vars = 'bp')
 
Then ‘cast’ it to desired shape

bpwide <- dcast(molten, patient + sex + agegrp ~ when)
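The melt and cast steps can be tried together on a toy table. This is a sketch with invented values (toy_long is hypothetical); when is stored as a factor with levels Before and After, as an assumption mirroring the real dataset.

```r
library(reshape2)

# Invented long data; 'when' is a factor so the level order is Before, After
toy_long <- data.frame(
  patient = c(1, 1, 2, 2),
  sex     = c("Male", "Male", "Female", "Female"),
  when    = factor(c("Before", "After", "Before", "After"),
                   levels = c("Before", "After")),
  bp      = c(143, 153, 163, 170)
)

# Step 1: melt. Every non-measured column is kept as an identifier;
# the measured variable's name goes to 'variable' and its values to 'value'
molten <- melt(toy_long, measure.vars = "bp")
names(molten)  # "patient" "sex" "when" "variable" "value"

# Step 2: cast. Left of ~ defines the rows, right of ~ defines the new columns
toy_wide <- dcast(molten, patient + sex ~ when)
nrow(toy_wide)  # 2: one row per patient
```

The formula in dcast plays the role that names_from plays in pivot_wider: whatever appears on the right-hand side is spread across columns.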
 
Multiple variables
 
bplong$test <- bplong$bp + 1000
molten2 <- melt(bplong, measure.vars = c('bp', 'test'))
 
bpwide2 <- dcast(molten2, patient + sex + agegrp ~ variable + when)
 
reshape2 – wide to long
 
bplong2 <- melt(bpwide, id.vars = c("patient", "sex", "agegrp"))
 
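Multiple variables

bplong3 <- melt(bpwide2, id.vars = c("patient", "sex", "agegrp"))

Not very satisfying: melt stacks bp and test into a single value column, so they no longer have a column each. The splitstackshape package keeps them as separate columns:

library(splitstackshape)
bplong3 <- merged.stack(bpwide2, id.vars = c("patient", "sex", "agegrp"),
                        var.stubs = c("bp", "test"), sep = "_")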
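If you would rather stay within reshape2, one possible workaround is to melt everything, split the stacked column names with colsplit, and cast the measures back out. This is not shown on the slides; it is a sketch on invented data (toy_wide2 is hypothetical) and assumes the names contain a single underscore.

```r
library(reshape2)

# Invented wide data with two measured variables at two timepoints
toy_wide2 <- data.frame(
  patient     = c(1, 2),
  bp_Before   = c(143, 163),
  bp_After    = c(153, 170),
  test_Before = c(1143, 1163),
  test_After  = c(1153, 1170)
)

# Step 1: melt everything; 'variable' now holds names such as "bp_Before"
molten <- melt(toy_wide2, id.vars = "patient")

# Step 2: split those names into the measure and the timepoint
parts <- colsplit(as.character(molten$variable), pattern = "_",
                  names = c("measure", "timepoint"))
molten_split <- cbind(molten[, c("patient", "value")], parts)

# Step 3: cast so that each measure gets its own column again
toy_long3 <- dcast(molten_split, patient + timepoint ~ measure,
                   value.var = "value")

nrow(toy_long3)  # 4: one row per patient and timepoint
```

The result has one column each for bp and test, which is the layout the plain melt call does not give you.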
 
Learn more here
 
David’s page:
https://personalpages.manchester.ac.uk/staff/david.selby/stata/2021-02-02-refinements/#reshaping-data
Uploaded on Jul 16, 2024




