Introduction to Programming in R: Coding, Debugging, and Optimizing

Explore programming in R with a focus on coding, debugging, and optimizing techniques. These slides by Katia Oleinik (Scientific Computing and Visualization, Boston University) cover if statements, comparison operators, loops, functions, the apply family, code sourcing, debugging, and timing.


Uploaded on Oct 08, 2024 | 0 Views


Presentation Transcript


  1. Programming in R: coding, debugging and optimizing. Katia Oleinik (koleinik@bu.edu), Scientific Computing and Visualization, Boston University. http://www.bu.edu/tech/research/training/tutorials/list/

  2. if

      if (condition) {
        command(s)
      } else {
        command(s)
      }

      Comparison operators:
      ==        equal
      !=        not equal
      > (<)     greater (less)
      >= (<=)   greater (less) or equal

      Logical operators:
      &   and
      |   or
      !   not

  3. if

      > # define x
      > x <- 7
      > # simple if statement
      > if ( x < 0 ) print("Negative")
      > # simple if-else statement
      > if ( x < 0 ) print("Negative") else print("Non-negative")
      [1] "Non-negative"
      > # if statement may be used inside other constructions
      > y <- if ( x < 0 ) -1 else 0
      > y
      [1] 0

  4. if

      > # multiline if-else statement
      > if ( x < 0 ) {
      +   x <- x + 10
      +   print("x is negative: add 10")
      + } else if ( x == 0 ) {
      +   print("x is equal to zero")
      + } else {
      +   print("x is positive")
      + }
      [1] "x is positive"

      Note: For multiline if statements, braces are necessary even for single-statement bodies. In an interactive session, the closing brace of one branch and the opening brace of the next must be on the same line as the else keyword.

  5. ifelse

      ifelse (test_condition, true_value, false_value)

      > # ifelse statement
      > y <- ifelse( x < 0, -1, 0 )
      > # nested ifelse statement
      > y <- ifelse( x < 0, -1, ifelse( x > 0, 1, 0 ) )

  6. ifelse

      Best of all, ifelse() operates on vectors!

      > # ifelse statement on a vector
      > digits <- 0 : 9
      > (odd <- ifelse( digits %% 2 > 0, TRUE, FALSE ))
      [1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
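The vectorised behaviour described above can be sketched with a slightly larger example. The vector `temps` and the labels are made-up illustration values, not part of the original slides.

```r
# Classify every element of a vector in one call, using nested ifelse().
# `temps` and the label strings are hypothetical illustration values.
temps <- c(-5, 0, 12, 30)
labels <- ifelse(temps < 0, "freezing",
                 ifelse(temps > 25, "hot", "mild"))
print(labels)

# A missing value in the test yields NA in the result instead of an error.
with_na <- ifelse(c(1, NA, -1) < 0, "neg", "non-neg")
print(with_na)
```

A plain `if` only accepts a single condition, so `ifelse()` is the natural choice when the test is itself a vector.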

  7. ifelse

      Exercise:
      - define a random vector ranging from -10 to 10:
          x <- as.integer( runif( 10, -10, 10 ) )
      - create vector y, such that its elements are equal to the absolute values of x

      Note: normally, you would use the abs() function to achieve this result.

  8. switch

      switch (statement, list)

      > # simple switch statement
      > x <- 3
      > switch( x, 2, 4, 6, 8 )
      [1] 6
      > switch( x, 2, 4 )   # returns NULL since there are only 2 elements in the list

  9. switch

      switch (statement, name1 = str1, name2 = str2, ...)

      > # switch statement with named list
      > day <- "Tue"
      > switch( day, Sun = 0, Mon = 1, Tue = 2, Wed = 3 )
      [1] 2
      > # switch statement with a default value
      > food <- "meat"
      > switch( food, banana = "fruit", carrot = "veggie", "neither" )
      [1] "neither"
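As a small sketch of the named switch with a default value, the call can be wrapped in a function. The helper name `food_type` and the extra `apple` entry are hypothetical additions for illustration.

```r
# Hypothetical helper (not from the slides): classify a food name.
food_type <- function(food) {
  switch(food,
         apple  = ,           # empty value: falls through to the next entry
         banana = "fruit",
         carrot = "veggie",
         "neither")           # unnamed last argument acts as the default
}

print(food_type("apple"))
print(food_type("carrot"))
print(food_type("meat"))
```

Leaving a named alternative empty makes it share the value of the next non-empty alternative, which avoids repeating the same result for several names.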

  10. loops

      There are 3 statements that provide explicit looping:
      - repeat
      - for
      - while

      Built-in constructs to control the looping:
      - next
      - break

      Note: Use explicit loops only when necessary. R has functions for implicit looping, which are more concise and often faster: apply(), sapply(), tapply(), and lapply().
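To make the note concrete, here is a minimal sketch that computes the same squares three ways: with an explicit loop, with an implicit loop, and with plain vector arithmetic. The variable names are illustrative only.

```r
n <- 1:5

# Explicit loop: preallocate the result, then fill it element by element.
squares_loop <- numeric(length(n))
for (i in seq_along(n)) {
  squares_loop[i] <- n[i]^2
}

# Implicit loop: sapply() calls the function once per element.
squares_sapply <- sapply(n, function(v) v^2)

# Vectorised arithmetic: no loop written at all.
squares_vec <- n^2

print(squares_loop)
print(squares_sapply)
print(squares_vec)
```

All three produce the same values; which is fastest depends on the task, so it is worth timing the alternatives on real data.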

  11. repeat

      The repeat { } statement causes repeated evaluation of the body until break is requested. Be careful: an infinite loop may occur!

      > # find the greatest odd divisor of an integer
      > x <- 84
      > repeat{
      +   print(x)
      +   if( x%%2 != 0) break
      +   x <- x/2
      + }
      [1] 84
      [1] 42
      [1] 21

  12. for

      for (object in sequence) { command(s) }

      > # print all words in a vector
      > names <- c("Sam", "Paul", "Michael")
      > for( j in names ){
      +   print(paste("My name is" , j))
      + }
      [1] "My name is Sam"
      [1] "My name is Paul"
      [1] "My name is Michael"

  13. for

      for (object in sequence) {
        command(s)
        if ( ) next    # return to the start of the loop
        if ( ) break   # exit from (innermost) loop
      }
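The template above can be filled in as follows. This is only an illustrative sketch; the task (summing odd numbers until the total exceeds 30) and the variable names are invented for the example.

```r
# Sum the odd numbers in 1:20, stopping once the running total exceeds 30.
total <- 0
for (i in 1:20) {
  if (i %% 2 == 0) next   # even number: skip to the next iteration
  total <- total + i
  if (total > 30) break   # threshold passed: leave the loop early
}
print(total)
print(i)   # the loop variable keeps the value it had when the loop stopped
```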

  14. while

      while (test_statement) { command(s) }

      > # find the largest odd divisor of a given number
      > x <- 84
      > while (x %% 2 == 0){
      +   x <- x/2
      + }
      > x
      [1] 21

  15. loops

      Exercise: Using any of the loop statements, print all the numbers from 0 to 30 that are divisible by 7. Use %%, the modular arithmetic operator, to check divisibility.

  16. function

      myFun <- function (ARG, OPT_ARGs ){
        statement(s)
      }

      ARG: vector, matrix, list or a data frame
      OPT_ARGs: optional arguments

      Functions are a powerful element of R. They allow you to expand on existing functions by writing your own custom functions.

  17. function

      myFun <- function (ARG, OPT_ARGs ){
        statement(s)
      }

      Naming: Variable naming rules apply. Avoid using the names of existing (built-in) functions.
      Arguments: The argument list can be empty. Some (or all) of the arguments can have a default value ( arg1 = TRUE ). The ... argument can be used to allow one function to pass on argument settings to another function.
      Return value: The value returned by the function is the last value computed, but you can also use the return() statement.
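The three points above (default values, the `...` argument, and the two ways of returning a value) can be seen together in one small sketch. The function name `center` and its `robust` argument are hypothetical, chosen only for this illustration.

```r
# Hypothetical example: return the mean or the median of a vector.
center <- function(x, robust = FALSE, ...) {
  if (length(x) == 0) return(NA_real_)   # explicit early return
  # Extra arguments in ... are passed on unchanged to mean() or median().
  # The value of this last expression is what the function returns.
  if (robust) median(x, ...) else mean(x, ...)
}

print(center(c(1, 2, 3, 100)))                 # default: mean
print(center(c(1, 2, 3, 100), robust = TRUE))  # median instead
print(center(c(1, NA, 3), na.rm = TRUE))       # na.rm travels through ...
```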

  18. function

      > # simple function: calculate (x+1)^2
      > myFun <- function (x) {
      +   x^2 + 2*x + 1
      + }
      > myFun(3)
      [1] 16

  19. function

      > # function with optional arguments: calculate (x+a)^2
      > myFun <- function (x, a=1) {
      +   x^2 + 2*x*a + a^2
      + }
      > myFun(3)
      [1] 16
      > myFun(3,2)
      [1] 25
      > # arguments can be passed by name (and out of order!)
      > myFun( a = 2, x = 1)
      [1] 9

  20. function

      > # optional arguments can be collected in ... and passed on to another function
      > myFun <- function (x, ...) {
      +   plot(x, ...)
      + }
      >
      > # print all the words together in one sentence
      > myFun <- function (...) {
      +   print(paste (...) )
      + }
      > myFun("Hello", " R! ")
      [1] "Hello  R! "

  21. function

      Local and global variables: Variables assigned inside a function are local. If a variable is read before it has been assigned locally, R looks it up outside the function, so its initial value is that of the global variable (if such a variable exists).

      > # define a function
      > myFun <- function (x) {
      +   cat ("u=", u, "\n")   # u is not local yet: the global value is read
      +   u <- u+1              # creates a local u; the variable outside myFun() is not affected
      +   cat ("u=", u, "\n")   # prints the local u
      + }
      >
      > u <- 2      # define a variable
      > myFun(5)    # execute the function
      u= 2
      u= 3
      >
      > cat ("u=", u, "\n")   # print the value of the variable
      u= 2

  22. function

      Local and global variables: If you want to modify the global variable, you can use the super-assignment operator <<-. You should avoid doing this!!!

      > # define a function
      > myFun <- function (x) {
      +   cat ("u=", u, "\n")   # reads the global u
      +   u <<- u+1             # this WILL affect the value of the variable outside myFun()
      +   cat ("u=", u, "\n")   # no local u exists, so the updated global u is printed
      + }
      >
      > u <- 2      # define a variable
      > myFun(u)    # execute the function
      u= 2
      u= 3
      >
      > cat ("u=", u, "\n")   # print the value of the variable
      u= 3
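One common way to follow the advice above and avoid `<<-` is to return the new value and let the caller decide whether to store it. A minimal sketch, with the hypothetical function name `increment`:

```r
# Hypothetical alternative to super-assignment: return the updated value.
increment <- function(u) {
  u + 1   # works on a local copy; nothing outside the function is touched
}

u <- 2
unchanged <- increment(u)   # u itself is still 2 here
print(u)

u <- increment(u)           # the caller explicitly updates u
print(u)
```

The update is now visible at the call site, which makes the code easier to follow and to test.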

  23. function

      Call vector variables: Functions do not change their arguments.

      > # define a function
      > myFun <- function (x) {
      +   x <- 2
      +   print (x)
      + }
      >
      > x <- 3            # assign value to x
      > y <- myFun(x)     # call the function
      [1] 2
      >
      > print(x)          # print value of x
      [1] 3

  24. function

      Call vector variables: If you want to change the value of the function's argument, reassign the return value to the argument.

      > # define a function
      > myFun <- function (x) {
      +   x <- 2
      +   print (x)
      + }
      >
      > x <- 3            # assign value to x
      > x <- myFun(x)     # call the function
      [1] 2
      >
      > print(x)          # print value of x
      [1] 2

  25. function

      Finding the source code: You can find the source code for most R functions by printing the function's name without parentheses.

      > # get the source code of the lm() function
      > lm
      function (formula, data, subset, weights, na.action, method = "qr",
          model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
          contrasts = NULL, offset, ...)
      {
          ret.x <- x
          ret.y <- y
          cl <- match.call()
          . . .
          z
      }
      <environment: namespace:stats>

  26. function

      Finding the source code: For generic functions there are many methods, depending on the type of the argument.

      > # get the source code of the mean() function
      > mean
      function (x, ...)
      UseMethod("mean")
      <environment: namespace:base>

  27. function

      Finding the source code: You can first explore the different methods and then choose the one you need.

      > # list the methods of the mean() function
      > methods("mean")
      [1] mean.Date       mean.POSIXct    mean.POSIXlt    mean.data.frame
      [5] mean.default    mean.difftime
      >
      > # get source code
      > mean.default
      function (x, trim = 0, na.rm = FALSE, ...)
      {
          if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
          . . .
      }
      <environment: namespace:base>

  28. apply

      apply (OBJECT, MARGIN, FUNCTION, ARGs )

      object:   vector, matrix or a data frame
      margin:   1 - rows, 2 - columns, c(1,2) - both
      function: function to apply
      args:     possible arguments

      Description: Returns a vector, array or list of values obtained by applying a function to the margins of an array or matrix.

  29. apply

      Example: Create a matrix and apply different functions to its rows and columns.

      > # create 3x4 matrix
      > x <- matrix( 1:12, nrow = 3, ncol = 4)
      > x
           [,1] [,2] [,3] [,4]
      [1,]    1    4    7   10
      [2,]    2    5    8   11
      [3,]    3    6    9   12

  30. apply

      Example: Create a matrix and apply different functions to its rows and columns.

      > # create 3x4 matrix
      > x <- matrix( 1:12, nrow = 3, ncol = 4)
      > x
           [,1] [,2] [,3] [,4]
      [1,]    1    4    7   10
      [2,]    2    5    8   11
      [3,]    3    6    9   12
      > # find median of each row
      > apply (x, 1, median)
      [1] 5.5 6.5 7.5

  31. apply

      Example: Create a matrix and apply different functions to its rows and columns.

      > # create 3x4 matrix
      > x <- matrix( 1:12, nrow = 3, ncol = 4)
      > x
           [,1] [,2] [,3] [,4]
      [1,]    1    4    7   10
      [2,]    2    5    8   11
      [3,]    3    6    9   12
      > # find mean of each column
      > apply (x, 2, mean)
      [1] 2 5 8 11

  32. apply

      Example: Create a matrix and apply different functions to its rows and columns.

      > # create 3x4 matrix
      > x <- matrix( 1:12, nrow = 3, ncol = 4)
      > x
           [,1] [,2] [,3] [,4]
      [1,]    1    4    7   10
      [2,]    2    5    8   11
      [3,]    3    6    9   12
      > # create a new matrix with values 0 or 1 for even and odd elements of x
      > apply (x, c(1,2), function (x) x%%2)
           [,1] [,2] [,3] [,4]
      [1,]    1    0    1    0
      [2,]    0    1    0    1
      [3,]    1    0    1    0
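The ARGs slot of apply() from the syntax slide is not exercised in the examples above. The following sketch, which assumes one matrix entry is deliberately set to NA for illustration, passes `na.rm = TRUE` through to mean() and compares the result with the built-in rowMeans().

```r
x <- matrix(1:12, nrow = 3, ncol = 4)
x[2, 3] <- NA   # introduce a missing value (illustration only)

# Without extra arguments the affected row mean is NA.
row_means_plain <- apply(x, 1, mean)

# Arguments after FUN are handed on to FUN, here mean(..., na.rm = TRUE).
row_means_apply <- apply(x, 1, mean, na.rm = TRUE)

# rowMeans() is a dedicated base R function for the same computation.
row_means_builtin <- rowMeans(x, na.rm = TRUE)

print(row_means_plain)
print(row_means_apply)
print(row_means_builtin)
```

For simple sums and means, rowSums(), colSums(), rowMeans() and colMeans() are usually the more direct choice.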

  33. lapply

      The lapply() function returns a list: lapply(X, FUN, ...)

      > # create a list
      > x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE))
      > # compute the mean of each list element
      > lapply (x, mean)
      $a
      [1] 5.5

      $beta
      [1] 4.535125

      $logic
      [1] 0.3333333

  34. sapply

      The sapply() function returns a vector or a matrix: sapply(X, FUN, ... , simplify = TRUE, USE.NAMES = TRUE)

      > # create a list
      > x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE))
      > # compute the mean of each list element
      > sapply (x, mean)
              a      beta     logic
      5.5000000 4.5351252 0.3333333
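Two related behaviours are worth a short sketch: sapply() returns a matrix when every result has the same length greater than one, and base R's vapply() does the same job as sapply() while checking the type and length of each result. Neither point is in the original slides; the variable names are illustrative.

```r
x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE, FALSE, FALSE))

# range() returns two values per element, so sapply() builds a 2 x 3 matrix.
ranges <- sapply(x, range)
print(ranges)

# vapply() requires a template for each result: here, one numeric value.
means <- vapply(x, mean, FUN.VALUE = numeric(1))
print(means)
```

Because vapply() stops with an error if a result does not match the template, it is a safer choice inside functions than sapply().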

  35. code sourcing

      source ("file", ...)

      file: file with source code to load (usually with the extension .r)
      echo: if TRUE, each expression is printed after parsing, before evaluation.

  36. code sourcing

      Linux prompt:
      katana:~ % emacs foo_source.r &

      Text editor (foo_source.r):
      # dummy function
      foo <- function(x){
        x+1
      }

      R session:
      > # load foo_source.r source file
      > source ("foo_source.r")
      > # create a vector
      > x <- c(3,5,7)
      > # call function
      > foo(x)
      [1] 4 6 8

  37. code sourcing

      > # load foo_source.r source file
      > source ("foo_source.r", echo = TRUE)

      > # dummy function
      > foo <- function(x){
      +   x+1;
      + }
      > # create a vector
      > x <- c(3,5,7)
      > # call function
      > foo(x)
      [1] 4 6 8

  38. code sourcing

      Exercise:
      - write a function that computes the logarithm of the inverse of a number, log(1/x)
      - save it in a file with the .r extension
      - load it into your workspace
      - execute it
      - try executing it with the input vector ( 2, 1, 0, -1 )

  39. debugging

      R includes debugging tools:

      cat () & print ()   print out the values
      browser ()          pause the code execution and browse the code
      debug (FUN)         execute function line by line
      undebug (FUN)       stop debugging the function

  40. debugging

      inv_log.r:
      # dummy function
      inv_log <- function(x){
        y <- 1/x
        browser()
        y <- log(y)
      }

      > # load inv_log.r source file
      > source ("inv_log.r", echo = TRUE)

      > # dummy function
      > inv_log <- function(x){
      +   y<-1/x;
      +   browser();
      +   y<-log(y);
      + }
      > x <- c(3, 2, 1, 0, -1)   # input vector (inferred from the output below)
      > inv_log (x)              # call function
      Called from: inv_log(x)
      Browse[1]> y               # check the values of local variables
      [1] 0.3333333 0.5000000 1.0000000 Inf -1.0000000

  41. debugging

      <RET>       Go to the next statement if the function is being debugged. Continue execution if the browser was invoked.
      c or cont   Continue execution without single stepping.
      n           Execute the next statement in the function. This works from the browser as well.
      where       Show the call stack.
      Q           Halt execution and jump to the top-level immediately.

      To view the value of a variable whose name matches one of these commands, use the print() function, e.g. print(n).

  42. debugging

      inv_log.r:
      # dummy function
      inv_log <- function(x){
        y <- 1/x
        browser()
        y <- log(y)
      }

      > # load inv_log.r source file
      > source ("inv_log.r", echo = TRUE)

      > # dummy function
      > inv_log <- function(x){
      +   y<-1/x;
      +   browser();
      +   y<-log(y);
      + }
      > inv_log (x)   # call function
      Called from: inv_log(x)
      Browse[1]> y
      [1] 0.3333333 0.5000000 1.0000000 Inf -1.0000000
      Browse[1]> n
      debug: y <- log(y)
      Browse[2]>
      Warning message:
      In log(y) : NaNs produced

  43. debugging

      inv_log.r:
      # dummy function
      inv_log <- function(x){
        y <- 1/x
        y <- log(y)
      }

      > # load inv_log.r source file
      > source ("inv_log.r", echo = TRUE)

      > # dummy function
      > inv_log <- function(x){
      +   y<-1/x;
      +   y<-log(y);
      + }
      > debug(inv_log)     # debug mode
      > inv_log (x)        # call function
      Called from: inv_log(x)
      debugging in: inv_log(x)
      debug: {
          y <- 1/x
          y <- log(y)
      }
      Browse[2]>
      . . .
      > undebug(inv_log)   # exit debugging mode

  44. timing

      Use the system.time() function to measure the time of execution.

      > # make a function
      > g <- function(n) {
      +   y = vector(length=n)
      +   for (i in 1:n) y[i]=i/(i+1)
      +   y
      + }

  45. timing

      Use the system.time() function to measure the time of execution.

      > # make a function
      > myFun <- function(x) {
      +   y = vector(length=x)
      +   for (i in 1:x) y[i]=i/(i+1)
      +   y
      + }
      > # execute the function, measuring the time of the execution
      > system.time( myFun(100000) )
         user  system elapsed
        0.107   0.002   0.109
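The value returned by system.time() can also be stored and inspected, which is handy when comparing several variants in a script. A minimal sketch follows; the measured times depend on the machine, so no expected output is shown.

```r
# Same loop-based function as on the slide, written with <- and numeric().
myFun <- function(x) {
  y <- numeric(x)
  for (i in seq_len(x)) y[i] <- i / (i + 1)
  y
}

# Assign inside the call to keep the function result as well as the timing.
timing <- system.time(result <- myFun(100000))

print(class(timing))        # an object of class "proc_time"
print(timing["elapsed"])    # wall-clock seconds for this run
```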

  46. optimization

      How to speed up the code?

  47. optimization

      How to speed up the code? Use vectors!

  48. optimization

      How to speed up the code? Use vectors!

      > # using loops
      > g1 <- function(x) {
      +   y = vector(length=x)
      +   for (i in 1:x) y[i]=i/(i+1)
      +   y
      + }
      > # using vectors
      > x <- (1:100000)
      > g2 <- function(x) {
      +   x/(x+1)
      + }

  49. optimization

      How to speed up the code? Use vectors!

      > # using loops
      > g1 <- function(x) {
      +   y = vector(length=x)
      +   for (i in 1:x) y[i]=i/(i+1)
      +   y
      + }
      > # using vectors
      > x <- (1:100000)
      > g2 <- function(x) {
      +   x/(x+1)
      + }
      > # execute the vectorized function
      > system.time( g2(x) )
         user  system elapsed
        0.002   0.000   0.003
      > # execute the loop-based function
      > system.time( g1(100000) )
         user  system elapsed
        0.107   0.002   0.109

  50. optimization

      How to speed up the code? Avoid dynamically expanding arrays.
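What "dynamically expanding" means can be sketched by comparing a loop that grows its result with c() against one that fills a preallocated vector. The function names `grow` and `prealloc` are hypothetical, and the timings vary by machine, so none are quoted here.

```r
# Grows the result by one element per iteration: each c() call copies y.
grow <- function(n) {
  y <- c()
  for (i in seq_len(n)) y <- c(y, i / (i + 1))
  y
}

# Allocates the full vector once, then fills it in place.
prealloc <- function(n) {
  y <- numeric(n)
  for (i in seq_len(n)) y[i] <- i / (i + 1)
  y
}

n <- 10000
t_grow <- system.time(a <- grow(n))
t_pre  <- system.time(b <- prealloc(n))

print(t_grow["elapsed"])
print(t_pre["elapsed"])
print(isTRUE(all.equal(a, b)))   # both approaches give the same values
```

Where the whole computation can be written as vector arithmetic, as with g2() on the previous slide, that is typically simpler still.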
