Introduction to R: Data Wrangling Course Overview
This course introduces R and RStudio and covers the basics of data processing and wrangling. Learn why R suits complex tasks, code sharing, and reproducibility. Get to know the RStudio environment, data input options, and data types such as data frames. Hands-on practice and troubleshooting help build your R skills.
Presentation Transcript
Introduction to R: Data Wrangling. Lisa Federer, MLIS, MA, AHIP, Research Data Informationist
Course Overview: Introduction to R and RStudio; Why R?; What is RStudio?; Terminology and syntax; Hands-on practice; Deciphering error messages and troubleshooting; Basics of data processing/wrangling; Getting more help
Why R? One command to carry out complex tasks
Why R? Easy to share code, which enhances reproducibility
Why R? Everything in one place: data processing, analysis, and visualization
stat.desc(batting_figures) #gives us a table of descriptive stats about each variable
##                      runs         hits      doubles          X3B
## nbr.val      1.435000e+03  1435.000000 1435.0000000 1.435000e+03
## nbr.null     6.680000e+02   609.000000  774.0000000 1.107000e+03
## nbr.na       0.000000e+00     0.000000    0.0000000 0.000000e+00
## min          0.000000e+00     0.000000    0.0000000 0.000000e+00
## max          1.150000e+02   225.000000   53.0000000 1.200000e+01
## range        1.150000e+02   225.000000   53.0000000 1.200000e+01
## sum          1.976100e+04 41595.000000 8137.0000000 8.490000e+02
## median       1.000000e+00     2.000000    0.0000000 0.000000e+00
## mean         1.377073e+01    28.986063    5.6703833 5.916376e-01
## SE.mean      6.159246e-01     1.261722    0.2574403 3.911696e-02
## CI.mean.0.95 1.208210e+00     2.475020    0.5050000 7.673260e-02
## var          5.443860e+02  2284.439136   95.1053635 2.195746e+00
## std.dev      2.333208e+01    47.795807    9.7521979 1.481805e+00
## coef.var     1.694324e+00     1.648924    1.7198481 2.504582e+00
new_batting <- batting[1:100, -3] #I can also specify info about both rows and columns.
new_batting <- batting[c(1:100, 400:425), c(1, 2, 5)] #I can use the : to take all the rows/columns in a range, but I can also use c() to refer to some specific rows/columns, such as here, where I'm taking rows 1-100 and 400-425 of columns 1, 2, and 5.
random.batting <- batting[sample(1:nrow(batting), 50), ] #I can even have R generate a random sample for me. Here, I've requested it to look through all the rows in my batting dataset and choose 50 random observations.
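The indexing patterns on this slide can be tried without the batting file. The sketch below is an illustration only: it swaps in the built-in mtcars data set (an assumption, not the course data) and uses smaller row ranges so they fit its 32 rows. The names cars, first_ten, picked, and random_cars are made up for the example.

```r
# mtcars ships with base R (32 rows, 11 columns), so nothing needs downloading.
cars <- mtcars

# Rows 1-10, every column except the third (a negative index drops a column).
first_ten <- cars[1:10, -3]

# Rows 1-5 and 20-25, columns 1, 2, and 5 only.
picked <- cars[c(1:5, 20:25), c(1, 2, 5)]

# Eight rows chosen at random; set.seed() makes the draw repeatable.
set.seed(42)
random_cars <- cars[sample(1:nrow(cars), 8), ]

dim(first_ten)    # 10 rows, 10 columns
dim(picked)       # 11 rows, 3 columns
dim(random_cars)  # 8 rows, 11 columns
```

The part before the comma always selects rows and the part after it selects columns; leaving either side empty keeps everything on that side.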
What is RStudio? An IDE (integrated development environment) for R. Panes: R Script pane, Console, Environment pane, Navigation pane
Data input options: input data directly into R by typing it in; import data from many, many common file types (.txt, .csv, Excel files, XML, SAS, SPSS, Stata, etc.); connect to web resources and other remote servers
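A minimal sketch of these input routes in base R follows. The data values, the names typed, csv_path, and imported, and the URL are all invented for illustration; the CSV is written to a temporary file first so the example runs on its own.

```r
# 1. Type the data in directly.
typed <- data.frame(
  name  = c("Ann", "Ben", "Cal"),
  score = c(90, 85, 77)
)

# 2. Import from a file. A temporary CSV is created here only so that
#    read.csv() has something to read.
csv_path <- tempfile(fileext = ".csv")
write.csv(typed, csv_path, row.names = FALSE)
imported <- read.csv(file = csv_path, header = TRUE)

# 3. Read from a web resource. read.csv() also accepts a URL in place of a
#    file path; the address below is hypothetical, so the line stays commented.
# remote <- read.csv("https://example.org/data.csv")

imported
```

Formats such as Excel, SAS, SPSS, and Stata generally need an add-on package rather than base R alone.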
Data types: data frame. A list of variables, each with the same number of rows, with unique row names (like a spreadsheet)
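The definition above can be checked on a small example. This is a sketch with invented values; pets and bad are illustrative names only.

```r
# Two variables (columns), three rows, and a unique name for each row.
pets <- data.frame(
  species   = c("cat", "dog", "pig"),
  weight_kg = c(4.2, 18.5, 110),
  row.names = c("Tom", "Rex", "Babe")
)

is.list(pets)    # TRUE: a data frame is a list of variables
length(pets)     # 2: the number of variables
nrow(pets)       # 3: every variable has the same number of rows
rownames(pets)   # "Tom" "Rex" "Babe"

# Variables of mismatched length (3 values vs 2) are rejected with an error;
# try() captures the error so the script keeps running.
bad <- try(data.frame(a = 1:3, b = 1:2), silent = TRUE)
inherits(bad, "try-error")  # TRUE
```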
Data types within data frames: numeric variable = numeric or integer (ex: 1, 1.5, 200000, 3.14159); text variable = character (ex: a, b, hello, 3b); nominal variable = factor (ex: cat, dog, pig, rhino, horse); ordinal variable = ordered factor (ex: xsmall, small, medium, large, xlarge); true/false = logical (ex: TRUE, FALSE)
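To see how R reports each of these types, the sketch below builds one column of each kind and asks for its class. The survey data frame and its column names are invented for the example.

```r
survey <- data.frame(
  age       = c(34, 51.5, 28),                      # numeric
  visits    = c(2L, 7L, 1L),                        # integer (note the L suffix)
  comment   = c("a", "hello", "3b"),                # character
  pet       = factor(c("cat", "dog", "rhino")),     # factor (nominal)
  shirt     = factor(c("small", "xlarge", "medium"),
                     levels  = c("xsmall", "small", "medium", "large", "xlarge"),
                     ordered = TRUE),               # ordered factor (ordinal)
  returning = c(TRUE, FALSE, TRUE),                 # logical
  stringsAsFactors = FALSE  # keep text as character on older R versions too
)

# First class label of every column.
column_types <- sapply(survey, function(col) class(col)[1])
column_types

# Ordered factors know their ranking, so comparisons work.
survey$shirt[1] < survey$shirt[2]  # TRUE: small comes before xlarge
```

A plain factor has no ranking, so the same comparison on the pet column would return NA with a warning.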
Talking to R
getwd()
dat <- read.csv(file = "allFound.csv", header = TRUE)
Function: how you tell R what to do; always followed by ()
Argument: an instruction that specifies how a function should be run. Arguments are not always required, there may be more than one, they are separated by commas, and they all go inside the ()
Assignment operator (<-): assigns the output of the function to a name that you can then refer back to
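The three terms can be seen together in a short example that needs no data file. It uses base R's mean() as a stand-in for read.csv(); the names scores and avg are made up for illustration.

```r
# Assignment: store a vector of values under the name "scores".
scores <- c(12, 15, NA, 20)

# Function with one argument. The missing value makes the result NA.
mean(scores)

# Same function with a second argument, separated by a comma, that changes
# how it runs: na.rm = TRUE drops missing values before averaging.
avg <- mean(scores, na.rm = TRUE)

# The name now refers back to the stored result (47 / 3, about 15.67).
avg

# A function with no arguments still needs its parentheses.
getwd()
```

Typing a function name without the () prints the function's definition instead of running it, which is a common early source of confusion.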