Introduction to R Workshop - STAT CLUB November 19, 2014
Hosted by STAT CLUB on November 19, 2014, the Introduction to R Workshop covered the basics of R, including getting started, importing data, common commands, and the features of R as a free, open-source software. It outlined the installation process and interactions with R, emphasizing the script editor for writing, editing, and saving code.
Introduction to R A workshop hosted by STAT CLUB November 19, 2014
Outline
I. About R
II. Getting started
III. The very basics
IV. Importing data
V. Common commands
An Introduction to R 9/30/2024 2
I. About R
What is R?
- Free, open-source software with its own language
- Works on all major operating systems
- Extensive: LOTS of built-in functions and downloadable packages available
- Flexible: can define your own functions, modify existing commands, customize graphics, etc.
- Powerful: can do all sorts of analyses and can handle large data sets
- Integrates with other environments such as Excel, LaTeX, Hadoop, etc.
II. Getting started
Installing R
- Download from http://lib.stat.cmu.edu/R/CRAN/
- Select the correct platform
- Download the base package
- Run the set-up
- It's quick, easy, and free!
Interacting with R
- The command prompt > indicates that we can begin typing a command
- Basic rule: type a command and hit Enter to execute it
- For example: x = 1:100 creates a vector of the values 1, 2, ..., 100
- Hit the Esc key to exit out of a line of code
- Hit the Stop button at the top to stop running a line of code
R Script/Editor
- A file where you can write, edit, and save code
- Go to File > New script
- When you have typed in the code you want to run, highlight the chunk you want to run and either hit Ctrl+R or right-click and select "Run line or selection"
- You can save this script for later use by hitting Ctrl+S or going to File > Save while the Script window is active
- This will NOT save any results of running the commands; it saves the text of the script only
Workspaces
- An R workspace includes all the functions and variables (called "objects") defined in a session
- The output of a command you've run is stored in the workspace when you assign it to a name
- Can be saved by going to File > Save Workspace
- Load a workspace by going to File > Load Workspace
- Use ls() to see what has been saved
- If you want to clear some part of the workspace, use rm()
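A minimal sketch of how ls() and rm() behave, assuming a fresh session; the object names x and avg are made up for illustration.

```r
# Hypothetical objects, created only for this illustration
x <- 1:100
avg <- mean(x)    # stored in the workspace because it is assigned to a name
mean(x)           # printed to the console, but not stored

before <- ls()    # names of the objects defined so far
rm(avg)           # remove one object from the workspace
after <- ls()     # avg is no longer listed
```

Note that rm() takes the object itself (or its name), while ls() returns the names as a character vector.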
Working directory
- Where everything will be saved to or loaded from by default
- It is, by default, usually in My Documents
- Can change this under File > Change dir
- If you want to save/load from a different place, you can usually just type the full file path in place of the file name and R will find it
- It is easier to always work from your working directory
R packages
- Collections of R functions and datasets created by others
- Many standard packages are included; others have to be downloaded
- If you know the name of the package, you can install it by going to Packages > Install package(s) or by using install.packages("packagename") in the command line
- Even if a package is installed on your computer, R will not automatically load it, so if you need to use a function from a package, first run library("packagename") in the command line
Use as a calculator
- Basic arithmetic can be done intuitively: 12+5, 3/8, 17+(6*5), 5^2, etc.
- Don't use brackets! They mean something else! Use parentheses
R commands and language
- Mostly in the form of functions: mean(x), plot(x,y), etc.
- CaSe SenSiTiVe!
- Spaces don't usually mean anything
- Can use periods (.) and underscores (_) in object names
Getting help in R
- Go to the Help menu
- If you know the exact command that you need help with, type ? before the command name in the console (for example, ?mean) to bring up the documentation for that command
- If you do not know the exact command but have an idea of what it might look like or what words may be used in the description, type ?? before the search term
- Google!
III. The very basics
Basics to know first
- Creating your own objects (variables, vectors, matrices, lists, functions, etc.)
- Assigning names to these objects
- Learning to access objects
- Performing simple calculations and transformations on these objects
Types of objects
The functions that you can perform on or with objects depend on their class or type:
- Numeric (double-precision numbers)
- Double (same as numeric)
- Integer (integer-valued; rarely used)
- Character (strings, non-numerical values)
- Matrix (matrix of numerical values)
- Logical (Boolean true/false)
- Factor ("groups" or levels)
- List (list of other types of objects)
- Data frame (table or other collection of data that is numerical or non-numerical)
- Function (functions that take inputs)
To find out which class a variable belongs to, use class()
To determine the dimensions of an object, use dim()
Verify a class by using is.numeric(), is.character(), is.logical(), is.data.frame(), etc.
Change a class by using as.numeric(), as.character(), as.logical(), as.data.frame(), etc.
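As a rough illustration of checking and converting classes, the following sketch uses made-up values; the variable names are assumptions, not from the workshop.

```r
x <- 36      # numeric (double)
n <- 36L     # integer (the L suffix requests an integer)
s <- "36"    # character

class(x)
class(n)
class(s)

is.numeric(s)              # FALSE: s is a character string
s_num <- as.numeric(s)     # convert the string to a number
is.numeric(s_num)          # TRUE

f <- factor(c("low", "high", "low"))   # factor with two levels
levels(f)                              # levels are sorted alphabetically

m <- matrix(1:6, nrow = 2)
dim(m)                                 # rows, then columns
```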
Single value
- Use = or <- to assign a name to a value
- Use quotation marks if the value is not numerical
- Example: x = 36
- Example: y <- "age"
Vectors
- Vector: c()
- Use = or <- to assign a name to a vector
- If the vector contains non-numerical values, use quotation marks
- Example: mileage = c(1200,200,6700,1000,1200)
- Example: type <- c("Compact", "Minivan", "SUV", "Roadster", "Truck")
Matrix
- Matrix: matrix(data=c(2,3,4,5), nrow=2, ncol=2)
- data = vector of values you want entered (fills in by COLUMN!)
- nrow = number of rows
- ncol = number of columns
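To make the column-wise filling concrete, here is a small sketch using the slide's own values. The byrow argument is part of base R's matrix() but is not mentioned in the slides.

```r
# Values are entered down the first column, then down the second
M <- matrix(data = c(2, 3, 4, 5), nrow = 2, ncol = 2)
M
#      [,1] [,2]
# [1,]    2    4
# [2,]    3    5

# byrow = TRUE fills across rows instead
M_byrow <- matrix(data = c(2, 3, 4, 5), nrow = 2, ncol = 2, byrow = TRUE)
M_byrow
#      [,1] [,2]
# [1,]    2    3
# [2,]    4    5
```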
Data frame
- Like a table
- Can contain both numerical and string variables
- Use data.frame(vars)
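A possible example, reusing the mileage and type vectors from the Vectors slide; the data frame name vehicles is an assumption made for illustration.

```r
mileage <- c(1200, 200, 6700, 1000, 1200)
type <- c("Compact", "Minivan", "SUV", "Roadster", "Truck")

# Each vector becomes a column; the column names come from the variable names
vehicles <- data.frame(type, mileage)

names(vehicles)   # column names
dim(vehicles)     # number of rows, number of columns
```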
Lists
- Each element in a list can be ANY object: vector, matrix, data frame, even another list!
- Use list(vars)
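A brief sketch of a list holding objects of different kinds; all names and values here are hypothetical.

```r
# A list whose elements are a string, a numeric vector, and a matrix
info <- list(name = "workshop",
             scores = c(90, 85),
             grid = matrix(1:4, nrow = 2))

length(info)   # number of elements in the list
names(info)    # element names

# A list can also contain another list
nested <- list(info, "another element")
class(nested[[1]])
```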
Functions
- Creating functions is more complex
- Of the form: g <- function(var1,var2) {var1 + var2}
- g is the function name
- var1, var2 are the input variables
- The body of the function goes in the curly brackets
- To use the function: g(input1, input2)
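The slide's function g can be tried out as follows. The second function, scale_by, is a hypothetical helper added only to show a default argument value.

```r
# The function from the slide: returns the sum of its two inputs
g <- function(var1, var2) {
  var1 + var2
}

g(2, 3)                    # 5
g(c(1, 2), c(10, 20))      # works element-wise on vectors: 11 22

# Hypothetical helper with a default value for its second argument
scale_by <- function(x, factor = 2) {
  x * factor
}

scale_by(5)                # uses the default factor: 10
scale_by(5, factor = 3)    # 15
```

The value of the last expression in the curly brackets is what the function returns.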
IV. Importing data
Importing data
- Can import from many formats (.txt, .csv, .xls, .xlsx, .sav, .dta, .ssd, ...)
- Recommend .txt or .csv; others need packages
- If in working directory: data1 = read.table("mydata.txt", header=TRUE, sep=",")
- header=TRUE indicates that a row of column headings/titles is included in the file; set to FALSE if not
- sep="," indicates that a comma separates the values, like in a .csv; can omit this argument if the values are separated by spaces or tabs (default), or modify it if they are separated by something else
- If not in working directory, use the file path: data1 = read.table("C:/Users/xyz/Desktop/folder/mydata.txt", header=TRUE, sep=",")
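To try read.table() without needing an existing file, the sketch below first writes a tiny comma-separated file to a temporary location. The file contents and the use of tempfile() are assumptions made for illustration.

```r
# Create a small comma-separated file in a temporary location
path <- tempfile(fileext = ".csv")
writeLines(c("type,mileage",
             "Compact,1200",
             "SUV,6700"),
           path)

# header = TRUE: the first row holds the column names
# sep = ",": values are separated by commas
data1 <- read.table(path, header = TRUE, sep = ",")
data1

# read.csv() is a base R shortcut with header = TRUE and sep = "," as defaults
data2 <- read.csv(path)
```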
Working with data sets
- Attach a dataset to the current space so its variables can be used by name: attach(dataset)
- Use a variable from a dataset: dataset$varname
- Retrieve the names of the variables: names(dataset)
- Take a subset of your data according to some criterion: subset(dataset,criterion)
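A small sketch of these commands on a made-up data frame. The with() call at the end is a base R alternative to attach() that the slides do not mention.

```r
# Hypothetical dataset built from the earlier vector examples
dataset <- data.frame(
  type = c("Compact", "Minivan", "SUV", "Roadster", "Truck"),
  mileage = c(1200, 200, 6700, 1000, 1200)
)

names(dataset)        # variable names
dataset$mileage       # one variable, returned as a vector

# Keep only the rows where mileage exceeds 1000
high <- subset(dataset, mileage > 1000)
nrow(high)

# Evaluate an expression using the dataset's variables without attaching it
mean_mileage <- with(dataset, mean(mileage))
```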
V. Common commands
Arithmetic/calculator
- Add: +
- Subtract: -
- Multiply: *
- Divide: /
- Raise to a power: ^
- Natural logarithm: log()
- Exponentiation: exp()
- Square root: sqrt()
Vector commands
- Create a vector of numbers: c(num1,num2)
- Combine vectors together to create one: c(vec1,vec2)
- Create a vector of numbers from a to b in increments of 1: a:b
- Create a vector of numbers from a to b in increments of d: seq(a,b,d)
- Create a vector of numbers from a to b in equal increments such that there are k total numbers: seq(a,b,length=k)
- Return the number of elements in a vector x: length(x)
- Sort entries in a vector x: sort(x, decreasing=FALSE)
- Element-wise arithmetic: 3*x, 4+x, log(x), sqrt(x), etc.
- Arithmetic of two vectors x and y will be element-wise: x*y, x+y, etc.
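The vector commands above might be exercised like this; the values are arbitrary. The sketch spells out the full argument name length.out, which the slide's length= abbreviates.

```r
a <- 1:5                          # 1 2 3 4 5
b <- seq(0, 10, 2)                # 0 2 4 6 8 10
k <- seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00

both <- c(a, b)                   # combine two vectors into one
length(both)                      # 11

sorted <- sort(c(3, 1, 2), decreasing = TRUE)   # 3 2 1

3 * a                             # element-wise: 3 6 9 12 15
a + a                             # element-wise: 2 4 6 8 10
```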
Matrix commands
- Create a matrix: matrix(vals,nrow,ncol)
- Create a diagonal matrix: diag(vals)
- Multiply matrices M1 and M2: M1 %*% M2
- Note that M1*M2 will be element-wise
- Find the determinant of matrix M: det(M)
- Find the inverse of matrix M: solve(M)
- Find the transpose of matrix M: t(M)
- Combine matrices by column: cbind(M1,M2)
- Combine matrices by row: rbind(M1,M2)
- Find the dimensions of a matrix M: dim(M)
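A short sketch contrasting matrix multiplication with element-wise multiplication, using small made-up matrices.

```r
M1 <- matrix(c(2, 3, 4, 5), nrow = 2, ncol = 2)   # rows: (2, 4) and (3, 5)
M2 <- diag(c(1, 2))                               # diagonal matrix with 1 and 2

prod_mat <- M1 %*% M2    # matrix product: rows (2, 8) and (3, 10)
prod_elt <- M1 * M2      # element-wise: rows (2, 0) and (0, 10)

det(M1)                  # 2*5 - 4*3 = -2
M1_inv <- solve(M1)      # inverse of M1
M1 %*% M1_inv            # identity matrix, up to rounding error

t(M1)                    # transpose
dim(cbind(M1, M2))       # 2 rows, 4 columns
dim(rbind(M1, M2))       # 4 rows, 2 columns
```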
Retrieving parts of objects
- Return the kth element of a vector x: x[k]
- Return the (i,j)th element of a matrix x: x[i,j]
- Return the kth object of a list x: x[[k]]
- Return the ith element of the kth object of a list x: x[[k]][i]
- Return the element or object called "name": x$name
- Can retrieve more than one element at a time
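The indexing rules above, sketched on small hypothetical objects:

```r
x <- c(10, 20, 30, 40)
x[2]              # second element: 20
x[c(1, 3)]        # more than one element at a time: 10 30

M <- matrix(1:6, nrow = 2)   # rows: (1, 3, 5) and (2, 4, 6)
M[2, 3]           # row 2, column 3: 6
M[1, ]            # the whole first row: 1 3 5

L <- list(scores = c(90, 85, 70), label = "quiz")
L[[1]]            # first object in the list (the scores vector)
L[[1]][2]         # second element of that object: 85
L$label           # object called "label": "quiz"
```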
Summaries and statistics
- Mean: mean(x)
- Standard deviation: sd(x)
- Median: median(x)
- Minimum: min(x)
- Maximum: max(x)
- Range (min and max): range(x)
- Sum: sum(x)
- Which index contains the minimum value: which.min(x)
- Which index contains the maximum value: which.max(x)
Logical
- Operators: > greater than, >= greater than or equal to, < less than, <= less than or equal to, == equal to, != not equal to, & and, | or
- Entering an expression built from these operators returns a Boolean (TRUE or FALSE) vector
- R will often treat TRUE as 1 and FALSE as 0 so that you can perform mathematical operations on them
- Return the indices of the elements of a vector that satisfy a criterion: which(x > 45)
- To get the actual values: x[x > 45]
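For instance, with an arbitrary vector x, a comparison and the related indexing commands might look like this:

```r
x <- c(12, 50, 47, 30, 90)

x > 45                 # FALSE TRUE TRUE FALSE TRUE
sum(x > 45)            # TRUE counts as 1, so this counts matches: 3

which(x > 45)          # positions of the matching elements: 2 3 5
x[x > 45]              # the matching values themselves: 50 47 90
x[x > 45 & x < 60]     # two criteria combined with &: 50 47
```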
Logical
- If-then statements: if (criterion) {command} else {command}
- For-loops: for (i in x) {commands}
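A small sketch combining a for-loop with an if-then statement; the values and the name total are arbitrary choices.

```r
x <- c(3, 8, 5)
total <- 0

# Visit each value in x: add it if it is greater than 4, otherwise subtract it
for (i in x) {
  if (i > 4) {
    total <- total + i
  } else {
    total <- total - i
  }
}

total   # -3 + 8 + 5 = 10
```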
Apply functions
- Apply a function to rows or columns of a matrix:
- apply(M, 1, mean) will take the average across rows
- apply(M, 2, sum) will sum columns
- Apply a function to each element of a vector, list or data.frame: sapply(L, length)
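The same calls on a small made-up matrix and list:

```r
M <- matrix(1:6, nrow = 2)        # rows: (1, 3, 5) and (2, 4, 6)

row_means <- apply(M, 1, mean)    # 1 = rows: 3 4
col_sums <- apply(M, 2, sum)      # 2 = columns: 3 7 11

L <- list(a = 1:3, b = letters[1:5])
lens <- sapply(L, length)         # length of each element: a = 3, b = 5
```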
Plots
- Scatterplot of a vector x and vector y: plot(x,y)
- Add points to an already-existing scatterplot: points(xvals,yvals)
- Add a line to an already-existing scatterplot: lines(xvals,yvals)
- Histogram of a vector of values x: hist(x)
[Figure: example histogram titled "Histogram of err"; x-axis: err, y-axis: Frequency]
Tables
- Create a table of frequencies from a vector of values x: table(x)
- Create a two-way table between vectors x and y of the same length: table(x,y)
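A quick sketch with two short made-up vectors:

```r
x <- c("a", "b", "a", "c", "a")
y <- c("u", "v", "u", "u", "v")

tx <- table(x)        # frequency of each value in x
tx[["a"]]             # "a" appears 3 times

txy <- table(x, y)    # two-way table: rows are values of x, columns values of y
txy["a", "u"]         # x is "a" and y is "u" in 2 positions
```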
Linear regression
- Linear regression of y on x: lm(y~x)
- Store the fit, e.g. model = lm(y~x), and get more info using summary(model)
Working with datasets: linear regression
- Can find all objects in the model: names(model)
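As an illustration, a regression on a few made-up points; the data are invented for this sketch and are not from the workshop.

```r
# Invented data that roughly follow y = 2x
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

model <- lm(y ~ x)     # fit the regression of y on x
summary(model)         # coefficients, standard errors, R-squared, etc.

names(model)           # objects stored in the fitted model
model$coefficients     # intercept and slope, accessed with $
coef(model)            # the same values via the extractor function
```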
Hypothesis tests
- One-sample t-test for a vector of values x: t.test(x, alternative="two.sided", mu=0)
- Two-sample t-test between vectors x and y: t.test(x,y)
- Chi-square test of independence in a two-way table "tab": chisq.test(tab)
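A rough sketch of the three tests on invented numbers. By default, t.test(x, y) does not assume equal variances (the Welch test), and chisq.test() applies a continuity correction to 2-by-2 tables.

```r
# Invented samples, for illustration only
x <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.2)
y <- c(4.2, 4.8, 4.4, 5.0, 4.1, 4.6)

one <- t.test(x, alternative = "two.sided", mu = 5)   # is the mean of x equal to 5?
two <- t.test(x, y)                                   # do x and y have the same mean?

# Invented 2-by-2 table of counts
tab <- matrix(c(20, 30, 25, 25), nrow = 2)
chi <- chisq.test(tab)

one$p.value    # results are lists; parts are accessed with $
two$p.value
chi$p.value
```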
Final warnings!
- Floating point arithmetic is not exact!
- Missing values are not excluded by default; must use the na.rm = TRUE option
- Combining different classes in one vector will force all entries to be the same class
- Some things, such as quotation marks, cannot be easily copied and pasted into R from other applications such as Word
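The first three warnings can be seen in a few lines; the values are arbitrary, and all.equal() is a base R function not mentioned in the slides.

```r
# Floating point arithmetic is not exact
0.1 + 0.2 == 0.3                     # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE: compares within a small tolerance

# Missing values are not excluded by default
v <- c(1, NA, 3)
mean(v)                   # NA
mean(v, na.rm = TRUE)     # 2

# Combining different classes forces everything into one class
mixed <- c(1, "a", TRUE)
class(mixed)              # "character"
mixed                     # "1" "a" "TRUE"
```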