

Advanced Programming with R
R course, Master 2 Statistics and Econometrics, TSE
Formation OMP

Thibault LAURENT
[email protected]
Toulouse School of Economics, CNRS

October 23, 2019


1 General information

2 Improving computational time

3 Make your code clean and elegant

4 Short introduction to S3 methods


Packages and software versions

> install.packages(c("Matrix", "microbenchmark", "pryr", "Rcpp",
                     "reticulate"))

> require("Matrix")

> require("pryr")

> require("Rcpp")

> require("reticulate")

> sessionInfo()

R version 3.6.1 (2019-07-05)

Platform: x86_64-pc-linux-gnu (64-bit)

Running under: Ubuntu 16.04.6 LTS

Matrix products: default

BLAS: /usr/lib/openblas-base/libblas.so.3

LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:

[1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C

[3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8

[5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8

[7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C

[9] LC_ADDRESS=C LC_TELEPHONE=C

[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

attached base packages:

[1] stats graphics grDevices utils datasets methods

[7] base

other attached packages:

[1] reticulate_1.13 Rcpp_1.0.2 pryr_0.1.4 Matrix_1.2-17

loaded via a namespace (and not attached):

[1] compiler_3.6.1 magrittr_1.5 tools_3.6.1

[4] stringi_1.4.3 codetools_0.2-16 grid_3.6.1

[7] jsonlite_1.6 stringr_1.4.0 lattice_0.20-38


Style guide for R code (1)

R has no official style guide comparable to Python's PEP 8 (https://www.python.org/dev/peps/pep-0008/)

Hadley Wickham's recommendations: http://adv-r.had.co.nz/Style.html (from the book Advanced R)

Google's recommendations: https://google.github.io/styleguide/Rguide.xml

Appropriate file naming:
> source("lagrangian_computation.R")
> load("final_data_basis.RData")

Appropriate naming of R objects:
> reg_tree
> rf_reg


Style guide for R code (2)

Use spaces around operators:
> 1 + 2 + 3 + 4 == 4 * (4 + 1) / 2
> a <- c(5, NA, 4, 3)
> mean(a, na.rm = TRUE)

With exceptions for the operators : and ::
> 1:10
> stats::lm

Maximum line length: 80 characters
> ozone.rf <- randomForest(Ozone ~ ., data = airquality,
                           mtry = 3, importance = TRUE,
                           na.action = na.omit)

Assignment operator: use <- instead of =
> a <- 5


Style guide for R code (3)

Loop:
> x <- rnorm(10)
> mean_sq <- 0
> for (i in seq_along(x)) {
    mean_sq <- mean_sq + x[i] ^ 2
  }

Condition:
> y <- 1
> if (y == 0) {
    log(x)
  } else {
    y ^ x
  }


How can I get the source code of an R function? (1)

Solution 1: print the name of the function in your console

> sapply
function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
{
    FUN <- match.fun(FUN)
    answer <- lapply(X = X, FUN = FUN, ...)
    if (USE.NAMES && is.character(X) && is.null(names(answer)))
        names(answer) <- X
    if (!isFALSE(simplify) && length(answer))
        simplify2array(answer, higher = (simplify == "array"))
    else answer
}

<bytecode: 0x3042068>

<environment: namespace:base>


How can I get the source code of an R function? (2)

Solution 2: the function is an S3 generic

> summary

standardGeneric for "summary" defined from package "base"

function (object, ...)

standardGeneric("summary")

<environment: 0xad092b0>

Methods may be defined for arguments: object

Use showMethods("summary") for currently available ones.

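To list all the methods available for summary():
> methods("summary")

To see the code of a method, there are two cases:

if the method is not marked with an asterisk (*) in the output of methods(), it is exported and can be printed directly:
> summary.lm

if the method is marked with an asterisk (*), it is not exported; use ::: or getAnywhere():
> stats:::summary.ecdf
> getAnywhere("summary.ecdf")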
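
As a complement to the two cases above, getS3method() from the utils package returns a method whether or not it is exported. The following is a minimal sketch reusing the summary.lm and summary.ecdf examples of this slide; the object names m_lm and m_ecdf are only illustrative.

```r
# getS3method() looks the method up in the S3 registry, so it works for
# exported and non-exported methods alike.
m_lm   <- getS3method("summary", "lm")
m_ecdf <- getS3method("summary", "ecdf")

# Both results are ordinary R functions; printing them shows their code.
is.function(m_lm)
is.function(m_ecdf)
```

Printing m_ecdf in the console then displays the body of the method, as getAnywhere() does.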


How can I get the source code of an R function? (3)

Solution 3: the function calls internal code (.Primitive() or .Internal())

> sum
function (..., na.rm = FALSE)  .Primitive("sum")

Use the function show_c_source() from the pryr package to get the source code from GitHub (an account is needed):

> pryr::show_c_source(.Internal(mean(x)))

Solution 4 : the function calls C code

> qnorm

function (p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)

.Call(C_qnorm, p, mean, sd, lower.tail, log.p)

<bytecode: 0x9761398>

<environment: namespace:stats>

Search the R sources, for example with the web search query: site:https://svn.r-project.org/R/trunk/src qnorm


How can I get the source code of an R function? (4)

Solution 5: the function has S4 methods

> require("Matrix")

> methods("dim")

[1] dim.data.frame dim.layout*

[3] dim,MatrixFactorization-method dim,Matrix-method

[5] dim.pandas.core.frame.DataFrame* dim.pandas.core.series.Series*

[7] dim.scipy.sparse.csc.csc_matrix* dim.trellis*

see '?methods' for accessing help and source code

> getMethod("dim", "MatrixFactorization")

> showMethods(class = "MatrixFactorization")


1 General information

2 Improving computational time
    Fix the size of the vectors
    Use vectorized function
    Use C++ code

3 Make your code clean and elegant

4 Short introduction to S3 methods


Fix the size of the vectors

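Objective: create a function trunc_rnorm() which takes as input arguments an integer n and a positive scalar a. It returns a vector of size n whose elements are drawn from a Gaussian N(0, 1) distribution and lie between -a and a.

Version 1: the vector grows at each accepted draw.
> trunc_rnorm.1 <- function(n, a) {
    x <- numeric(0)            # empty vector, extended inside the loop
    i <- 1
    while (i <= n) {
      r <- rnorm(1)
      if (r > -abs(a) && r < abs(a)) {
        x <- c(x, r)           # x is reallocated and copied each time
        i <- i + 1
      }
    }
    x
  }

Version 2: the size of the vector is fixed before the loop.
> trunc_rnorm.2 <- function(n, a) {
    x <- numeric(n)            # allocated once, at its final size
    i <- 1
    while (i <= n) {
      r <- rnorm(1)
      if (r > -abs(a) && r < abs(a)) {
        x[i] <- r              # fills an existing slot
        i <- i + 1
      }
    }
    x
  }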
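
Going one step further, the candidates themselves can be drawn in batches instead of one at a time. The function below is a hypothetical variant that is not part of the original slides (the name trunc_rnorm_batch is an assumption); it is only a sketch of how pre-allocation and vectorized draws can be combined.

```r
# Hypothetical helper (not from the slides): draw candidates in batches and
# keep those falling strictly between -a and a.
trunc_rnorm_batch <- function(n, a) {
  stopifnot(n >= 0, a > 0)
  x <- numeric(n)               # result allocated once, at its final size
  filled <- 0
  while (filled < n) {
    r <- rnorm(n - filled)      # one vectorized draw per pass
    r <- r[abs(r) < a]          # vectorized acceptance test
    k <- length(r)
    if (k > 0) {
      x[filled + seq_len(k)] <- r
      filled <- filled + k
    }
  }
  x
}

set.seed(1)
x <- trunc_rnorm_batch(1000, a = 2)
```

Each pass draws exactly as many candidates as there are slots left, so the accepted values can never overflow the pre-allocated vector.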


Comparing results

By using system.time():
> system.time(trunc_rnorm.1(n = 10000, a = 2))
   user  system elapsed
  0.252   0.008   0.260
> system.time(trunc_rnorm.2(n = 10000, a = 2))
   user  system elapsed
  0.032   0.000   0.031

By using microbenchmark():
> microbenchmark::microbenchmark(
    trunc_rnorm.1(n = 10000, a = 2),
    trunc_rnorm.2(n = 10000, a = 2),
    times = 10L)
Unit: milliseconds
                            expr      min        lq      mean    median        uq       max neval
 trunc_rnorm.1(n = 10000, a = 2) 223.1846 226.32124 253.52285 232.27947 241.64728 420.39724    10
 trunc_rnorm.2(n = 10000, a = 2)  22.9708  25.36922  26.52099  27.06867  28.58687  29.51709    10




Use vectorized function

Objective: compute the sum of the elements of a simulated vector.

By using a for loop:
> vec <- rnorm(10000000)
> my_sum <- function(x) {
    res <- 0
    for (k in seq_along(x)) {
      res <- res + x[k]
    }
    return(res)
  }
> microbenchmark::microbenchmark(
    my_sum(vec), times = 10L
  )
Unit: milliseconds
        expr      min      lq     mean   median       uq      max neval
 my_sum(vec) 456.7001 457.486 462.5226 460.7802 462.6858 476.7904    10

By using a vectorized function:
> microbenchmark::microbenchmark(sum(vec), times = 10L)
Unit: milliseconds
     expr      min       lq     mean   median       uq      max neval
 sum(vec) 11.25038 11.26833 11.27601 11.27898 11.28572 11.29278    10


Use C++ code

1 Write your C++ function in a separate file

2 Download the file, then compile and load it in R
> download.file(url = "http://www.thibault.laurent.free.fr/cours/R_avance/sumcplusplus.cpp",
                destfile = "sumcplusplus.cpp")
> sourceCpp("sumcplusplus.cpp")

3 Call the C++ function from R
> microbenchmark::microbenchmark(sum_rcpp(vec), times = 10L)
Unit: milliseconds
          expr      min       lq    mean   median       uq      max neval
 sum_rcpp(vec) 39.09352 39.11007 39.2842 39.12512 39.18452 40.48588    10

4 More information: http://adv-r.had.co.nz/Rcpp.html


Overview of a C++ file

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double sum_rcpp(NumericVector x) {
  double res = 0;
  int n = x.size();
  for (int i = 0; i < n; i++) {
    res = res + x(i);
  }
  return res;
}


1 General information

2 Improving computational time

3 Make your code clean and elegant
    The functions l-s-t-m-apply()
    Use switch() instead of if/else
    Other recommendations

4 Short introduction to S3 methods


Function apply()

apply() can be used on an array or a matrix. It can also be used on a data.frame, provided that the argument FUN is compatible with the types of the observations or variables.

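[Figure: a three-dimensional array with the individuals (LUC, PIERRE, PEDRO) along DIM 1, the variables (TAILLE, POIDS) along DIM 2 and one slice per year (2000, 2010). Call syntax: apply(name_matrix, DIM, FUN)]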

> tab <- array(c(100, 90, 110, 25, 24, 28, 175, 180,

190, 68, 74, 85), dim = c(3, 2, 2))

> dimnames(tab)[[1]] <- c("Luc", "Pierre", "Pedro")

> dimnames(tab)[[2]] <- c("taille", "poids")

> dimnames(tab)[[3]] <- c("2000", "2010")


Function apply(): example 1

What are the means of taille (height) and poids (weight) in 2000 and in 2010?

> res <- matrix(0, dim(tab)[2], dim(tab)[3])
> for (i in seq_len(dim(tab)[2])) {
    for (j in seq_len(dim(tab)[3])) {
      res[i, j] <- mean(tab[ , i, j])
    }
  }
> res
          [,1]      [,2]
[1,] 100.00000 181.66667
[2,]  25.66667  75.66667

which is equivalent to:

> apply(tab, c(2, 3), mean)

2000 2010

taille 100.00000 181.66667

poids 25.66667 75.66667
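
The second argument of apply() lists the dimensions that are kept in the result; the function is applied over all the others. As an illustrative sketch on the same data (the object names by_year and by_ind_var are assumptions introduced here):

```r
tab <- array(c(100, 90, 110, 25, 24, 28, 175, 180, 190, 68, 74, 85),
             dim = c(3, 2, 2),
             dimnames = list(c("Luc", "Pierre", "Pedro"),
                             c("taille", "poids"),
                             c("2000", "2010")))

# Keep dimension 3 only: one mean per year, over individuals and variables
by_year <- apply(tab, 3, mean)

# Keep dimensions 1 and 2: one mean per individual and variable, over years
by_ind_var <- apply(tab, c(1, 2), mean)
```

by_year is a named vector of length 2 and by_ind_var is a 3 x 2 matrix whose dimnames are inherited from tab.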


Function apply(): example 2

Compared with a for loop, apply() does not reduce the computation time (in the benchmark below it is even slower). However, the code is easier to read.

> x <- matrix(runif(10e6), ncol = 5)
> microbenchmark::microbenchmark(apply(x, 2, sum), times = 10L)
Unit: milliseconds
             expr      min       lq     mean   median       uq      max neval
 apply(x, 2, sum) 114.8715 130.0866 192.6882 149.5321 281.9222 286.4282    10

> res <- numeric(5)
> microbenchmark::microbenchmark(
    for (k in 1:5) {res[k] <- sum(x[, k])}, times = 10L)
Unit: milliseconds
                                     expr      min      lq     mean   median      uq      max neval
 for (k in 1:5) { res[k] <- sum(x[, k]) } 37.67169 38.1126 50.68672 40.20141 64.3248 88.38813    10


colSums(), rowSums(), colMeans(), rowMeans()

Finally, when calling apply() with FUN = sum or FUN = mean, it is recommended to use one of the functions colSums(), rowSums(), colMeans(), rowMeans(), which are vectorized:

> x <- matrix(runif(10e6), ncol = 5)
> microbenchmark::microbenchmark(apply(x, 2, mean), times = 10L)
Unit: milliseconds
              expr      min      lq     mean   median       uq      max neval
 apply(x, 2, mean) 88.41795 104.304 139.8478 120.4499 140.6812 249.5629    10

> microbenchmark::microbenchmark(colMeans(x), times = 10L)
Unit: milliseconds
        expr      min     lq     mean   median      uq      max neval
 colMeans(x) 11.20471 11.264 11.49133 11.51508 11.6262 11.78949    10
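
The vectorized functions return the same values as the corresponding apply() calls, which can be checked on a small matrix. This is only a sanity-check sketch; the object names are illustrative.

```r
set.seed(42)
x_small <- matrix(runif(20), ncol = 5)

# Column means: apply() versus the dedicated vectorized function
same_cols <- all.equal(apply(x_small, 2, mean), colMeans(x_small))

# Row sums: same comparison with MARGIN = 1
same_rows <- all.equal(apply(x_small, 1, sum), rowSums(x_small))
```

all.equal() is used rather than identical() because the two computations may differ by floating-point rounding.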


Function lapply()

The function lapply() is used on a list object.

lapply(my_list, FUN) applies the function FUN to each element of the list:

> my_list <- list(a = 1:3, b = "a string")

> lapply(my_list, nchar)

$a

[1] 1 1 1

$b

[1] 8


Function lapply() on a data.frame

Recall that a data.frame is a particular list in which each element is a variable.

Objective: compute the mean of each variable

Option 1: without lapply()

> res <- numeric(length(mtcars))
> for (k in seq_along(res))
    res[k] <- mean(mtcars[[k]])

> res

[1] 20.090625 6.187500 230.721875 146.687500 3.596563 3.217250

[7] 17.848750 0.437500 0.406250 3.687500 2.812500

Option 2 : with lapply()

> lapply(mtcars, mean)

Disadvantage: the result of lapply() is a list.


Function sapply()

Similar to lapply(), but it returns the result in a more convenient format (here a named vector):
> sapply(mtcars, mean)
       mpg        cyl       disp         hp       drat         wt
 20.090625   6.187500 230.721875 146.687500   3.596563   3.217250
      qsec         vs         am       gear       carb
 17.848750   0.437500   0.406250   3.687500   2.812500

The functions l-s-t-m-apply() are especially useful when the argument FUN is a user-defined function:
> f <- function(x)
    c(min = min(x), max = max(x), mean = mean(x), med = median(x), sd = sd(x))
> t(sapply(mtcars, f))

The results of FUN must have the same type and length for every element of the list; otherwise sapply() returns a list, as lapply() does:
> sapply(iris, function(x) {
    if (is.numeric(x))
      f(x)
    else
      table(x)
  })
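
When the shape of the result matters, base R also provides vapply(), which takes the expected type and length of each result as a third argument and fails instead of silently returning a list. A minimal sketch on the same data sets (the helper name f_stats and the other object names are assumptions, not part of the slides):

```r
# Each call must return exactly one numeric value
col_means <- vapply(mtcars, mean, FUN.VALUE = numeric(1))

# Each call must return exactly five numeric values
f_stats <- function(x)
  c(min = min(x), max = max(x), mean = mean(x), med = median(x), sd = sd(x))
stats_mat <- vapply(mtcars, f_stats, FUN.VALUE = numeric(5))

# The factor column of iris breaks the contract, so vapply() raises an error
bad <- tryCatch(
  vapply(iris, function(x) if (is.numeric(x)) mean(x) else levels(x),
         FUN.VALUE = numeric(1)),
  error = function(e) "error")
```

stats_mat has one column per variable of mtcars, so t(stats_mat) gives the same layout as t(sapply(mtcars, f)).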


Function replicate()

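Objective: we want to simulate 5 samples from the uniform distribution U[0, 1], each of size 10.

Solution 1:
> res <- vector("list", 5)
> for (k in 1:5)
    res[[k]] <- runif(10)

Solution 2:
> res <- lapply(1:5, function(x) runif(10))

Solution 3 (by default replicate() simplifies the result to a 10 x 5 matrix; add simplify = FALSE to obtain a list as in solutions 1 and 2):
> res <- replicate(5, runif(10))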
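
The shape of the object returned by replicate() depends on its simplify argument, as the following short sketch illustrates (the object names res_mat and res_list are illustrative):

```r
set.seed(123)

# Default (simplify = "array"): the 5 samples become the columns of a matrix
res_mat <- replicate(5, runif(10))

# simplify = FALSE: the 5 samples are kept as the elements of a list
res_list <- replicate(5, runif(10), simplify = FALSE)

dim(res_mat)
length(res_list)
```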


Function mapply()

Objective: we have a list of 5 samples drawn from U[0, 1]. We want to compute the weighted mean of each sample. The weights are given in a second list. We create a list of 5 samples and a list with the weights:

> xs <- replicate(5, runif(10), simplify = FALSE)
> ws <- replicate(5, rpois(10, 5) + 1, simplify = FALSE)

Solution 1: by using a loop
> res <- numeric(length(xs))
> for (k in seq_along(res))
    res[k] <- sum(xs[[k]] * ws[[k]]) / sum(ws[[k]])

Solution 2: by using the mapply() function
> mapply(function(x, y) sum(x * y) / sum(y), xs, ws)
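
A weighted mean divides the weighted sum by the sum of the weights, so for samples taking values in [0, 1] it must itself lie in [0, 1]. The sketch below checks a manual mapply() computation against stats::weighted.mean(); the object names wm_manual and wm_builtin are assumptions introduced for the illustration.

```r
set.seed(1)
xs <- replicate(5, runif(10), simplify = FALSE)
ws <- replicate(5, rpois(10, 5) + 1, simplify = FALSE)

# Manual computation: weighted sum divided by the sum of the weights
wm_manual <- mapply(function(x, w) sum(x * w) / sum(w), xs, ws)

# Same computation with the function provided by the stats package;
# mapply() passes xs[[k]] as x and ws[[k]] as w
wm_builtin <- mapply(weighted.mean, xs, ws)
```

mapply() walks through both lists in parallel, so the k-th sample is always paired with the k-th vector of weights.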


Function tapply()

Objective: we have two vectors of size n, one numeric x and one categorical f. We want to compute the mean of x for each level of f. We create the two vectors:

> x <- c(rnorm(5, 165, 10), rnorm(5, 175, 10))
> f <- rep(c("M", "F"), each = 5)

Solution 1: split() creates a list separating the data with respect to the levels of f. Then sapply() computes the mean of each element of the list:
> res.split <- split(x = x, f = f)
> sapply(X = res.split, FUN = mean)
       F        M
179.2725 163.6803

Solution 2: by using tapply() directly
> tapply(X = x, INDEX = f, mean)
       F        M
179.2725 163.6803
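
The equivalence between the two solutions can be checked on a small deterministic example. The values below are made up for the illustration and are not taken from the slides.

```r
# Made-up heights for three men and three women
x_demo <- c(160, 170, 180, 150, 165, 175)
f_demo <- c("M", "M", "M", "F", "F", "F")

# Group means computed in one step
means_tapply <- tapply(x_demo, f_demo, mean)

# Same result in two steps: split the data, then summarise each piece
means_split <- sapply(split(x_demo, f_demo), mean)
```

Both objects are indexed by the sorted levels of the grouping variable, here "F" then "M".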




Classical if/else syntax

Objective: evaluate a nonparametric kernel K, chosen among three options:

1 biweight: $K(x) = \frac{15}{16}\left(1 - (x/h)^2\right)^2 \mathbf{1}_{\{(x/h)^2 \le 1\}}$

2 triweight: $K(x) = \frac{35}{32}\left(1 - (x/h)^2\right)^3 \mathbf{1}_{\{(x/h)^2 \le 1\}}$

3 Gaussian: $K(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-0.5\,(x/h)^2\right)$

> f_noyau <- function(x, h, type = "bi") {
    if (type == "bi") {
      15/16 * (1 - (x/h) ^ 2) ^ 2 * ifelse((x/h) ^ 2 <= 1, 1, 0)
    } else {
      if (type == "tri") {
        35/32 * (1 - (x/h) ^ 2) ^ 3 * ifelse((x/h) ^ 2 <= 1, 1, 0)
      } else {
        1 / sqrt(2 * pi) * exp(-0.5 * (x/h) ^ 2)
      }
    }
  }


Use switch() function

Syntax:

> f_noyau.2 <- function(x, h, type = "bi") {
    switch(type,
      bi = 15/16 * (1 - (x/h)^2)^2 * ifelse((x/h)^2 <= 1, 1, 0),
      tri = 35/32 * (1 - (x/h)^2)^3 * ifelse((x/h)^2 <= 1, 1, 0),
      gauss = 1 / sqrt(2 * pi) * exp(-0.5 * (x/h)^2),
      "Specify another value for type")
  }

Application:

> x <- seq(-1, 1, 0.01)
> plot(x, f_noyau.2(x, 0.3, type = "bi"),
       type = "l", ylab = "", ylim = c(0, 1.2))
> lines(x, f_noyau.2(x, 0.3, type = "tri"),
        lty = 2)
> lines(x, f_noyau.2(x, 0.3, type = "gauss"),
        lty = 3)
> f_noyau.2(x, 0.3, type = "unif")
[1] "Specify another value for type"

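[Figure: the biweight (solid line), triweight (dashed line) and Gaussian (dotted line) kernels with h = 0.3, plotted for x between -1 and 1; vertical axis from 0 to 1.2]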
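
Returning a character string for an unknown type can go unnoticed by the calling code. One possible alternative, shown here only as a sketch, is to validate the argument with match.arg() so that an unknown type raises an error. The function name kernel_value is hypothetical and is not part of the slides.

```r
# Hypothetical variant of the kernel function: unknown types raise an error
kernel_value <- function(x, h, type = c("bi", "tri", "gauss")) {
  type <- match.arg(type)        # stops if type is not one of the choices
  u <- x / h
  switch(type,
         bi    = 15 / 16 * (1 - u^2)^2 * (u^2 <= 1),
         tri   = 35 / 32 * (1 - u^2)^3 * (u^2 <= 1),
         gauss = 1 / sqrt(2 * pi) * exp(-0.5 * u^2))
}

k0  <- kernel_value(0, 1, "bi")   # biweight kernel at 0, equal to 15/16
bad <- tryCatch(kernel_value(0, 1, "unif"), error = function(e) "error")
```

When type is omitted, match.arg() selects the first choice, so "bi" remains the default as in the original function.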


Function stopifnot()

Stop as soon as a condition is not satisfied (here the call fails at the third condition, pi < 2):

> stopifnot(1 < 2, length(1:2) == 2, pi < 2, cos(pi) > 3)

Usually used at the top of a function:

> try_stopifnot <- function(x, y, n) {
    stopifnot(length(x) == length(y),
              is.integer(n))
    (x + y) ^ n
  }

> try_stopifnot(2, 5, 2L)

[1] 49
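
When a condition fails, stopifnot() raises an error whose message names the failing expression. The sketch below captures that message with tryCatch(); the function name checked_power is an assumption used for the illustration.

```r
checked_power <- function(x, y, n) {
  stopifnot(length(x) == length(y), is.integer(n))
  (x + y)^n
}

ok <- checked_power(2, 5, 2L)      # all conditions hold

# 2 is a double, not an integer: the second condition fails
msg <- tryCatch(checked_power(2, 5, 2),
                error = function(e) conditionMessage(e))
```

The captured message mentions is.integer(n), which makes it easy to see which argument violated the requirements.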


Some recommendations (1)

Use ... to pass extra arguments through to other functions:

> plot_reg <- function(x, y, np = TRUE, ...) {
    plot(y ~ x, ...)                   # extra arguments are forwarded to plot()
    abline(lm(y ~ x), col = "blue")    # linear fit
    if (np) {
      np.reg <- loess(y ~ x)           # nonparametric fit
      x.seq <- seq(min(x), max(x), length.out = 25)
      lines(x.seq, predict(np.reg, x.seq), col = "red")
    }
  }
> x <- rnorm(100)
> y <- x * runif(100)
> plot_reg(x, y, np = FALSE)
> plot_reg(x, y, pch = 16, col = "pink",
    xlab = "explanatory variable",
    ylab = "response variable")
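The same forwarding mechanism works outside of graphics. The sketch below uses a hypothetical helper, `centre_spread()`, which is not part of the course material, to show that anything supplied through `...` (here `na.rm`) reaches `mean()` and `sd()` unchanged.

```r
# Hypothetical helper: extra arguments in ... are passed on as-is
# to both mean() and sd().
centre_spread <- function(x, ...) {
  c(mean = mean(x, ...), sd = sd(x, ...))
}

x <- c(1, 2, NA, 3)
centre_spread(x)                # both values are NA: na.rm defaults to FALSE
centre_spread(x, na.rm = TRUE)  # mean = 2, sd = 1
```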


Some recommendations (2)

Split the work into several small functions:

> bi <- function(x, h)
    15/16 * (1 - (x/h)^2)^2 * ifelse((x/h)^2 <= 1, 1, 0)
> tri <- function(x, h)
    35/32 * (1 - (x/h)^2)^3 * ifelse((x/h)^2 <= 1, 1, 0)
> gauss <- function(x, h)
    1 / sqrt(2 * pi) * exp(-0.5 * (x/h)^2)
> f_noyau.2 <- function(x, h, type = "bi") {
    switch(type,
           bi = bi(x, h),
           tri = tri(x, h),
           gauss = gauss(x, h),
           "Preciser une autre valeur de type")  # default: "Specify another value for type"
  }
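One possible variation, shown here only as a sketch, is to validate `type` with `match.arg()` so that an unknown value raises an error instead of returning a character string. The name `f_noyau.3` is hypothetical and does not appear in the slides; the kernel functions are repeated so that the example is self-contained.

```r
bi <- function(x, h)
  15/16 * (1 - (x/h)^2)^2 * ifelse((x/h)^2 <= 1, 1, 0)
tri <- function(x, h)
  35/32 * (1 - (x/h)^2)^3 * ifelse((x/h)^2 <= 1, 1, 0)
gauss <- function(x, h)
  1 / sqrt(2 * pi) * exp(-0.5 * (x/h)^2)

# Hypothetical variant: the allowed values are listed in the signature
# and match.arg() stops with an error for anything else.
f_noyau.3 <- function(x, h, type = c("bi", "tri", "gauss")) {
  type <- match.arg(type)  # the first value, "bi", is the default
  switch(type,
         bi = bi(x, h),
         tri = tri(x, h),
         gauss = gauss(x, h))
}

f_noyau.3(0, 1)          # 15/16
f_noyau.3(0, 1, "tri")   # 35/32
```

With this design the caller cannot silently receive a string where a numeric vector was expected.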


Some recommendations (3)

Do not keep unused arguments (here d and e are never used):

> f <- function(a = 5, b = 4, d = 3, e = 1)
    (a + b)^2

A function can be passed as an argument:

> randomise <- function(f) f(runif(1e3))
> randomise(mean)
> randomise(sum)

A function can also be returned as a value:

> f_power <- function(exponent)
    function(x) x^exponent
> f_power(2)(1:5)
> f_power(3)(1:5)
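A short sketch of how a returned function can be stored and reused: each call to `f_power()` produces a function that remembers its own `exponent`. The call to `force()` is an addition to the version in the slides; it evaluates the argument immediately, which is a common precaution when building functions in a loop. The names `powers` and `square` are illustrative.

```r
f_power <- function(exponent) {
  force(exponent)          # evaluate the argument now, not at first use
  function(x) x^exponent
}

# Build a list of three functions: x^1, x^2 and x^3.
powers <- lapply(1:3, f_power)
sapply(powers, function(f) f(2))  # 2 4 8

# A returned function can be given a name and reused.
square <- f_power(2)
square(1:5)                       # 1 4 9 16 25
```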


Debug your function

Distinguish between an error message (execution stops) and a warning message (execution continues).

Most of the time, the error message makes it obvious what is wrong:

> sum(c("a", "b"))

Use traceback() to see the sequence of calls that led to the last error:

> f_noyau.2(seq(-2, 2, by = 0.1), h = "n")
> traceback()

Use debugonce() to step through the next call of a function line by line:

> ex_bug.2 <- function(x) {
    x <- x + 1
    x <- x^3
    x <- log(x)
    x
  }
> debugonce(ex_bug.2)
> ex_bug.2(-5)
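To make the difference between errors and warnings concrete, the sketch below uses `tryCatch()` to report which kind of condition an expression signals. The helper `classify()` is hypothetical and only serves the illustration.

```r
# Hypothetical helper: evaluates an expression and reports whether it
# signalled a warning, raised an error, or completed normally.
classify <- function(expr) {
  tryCatch({
    expr    # the argument is evaluated here, inside tryCatch()
    "ok"
  },
  warning = function(w) "warning",
  error = function(e) "error")
}

classify(log(1))            # "ok"
classify(log(-1))           # "warning": log() returns NaN but continues
classify(sum(c("a", "b")))  # "error": sum() cannot add character values
```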


Function try()

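If you know that part of your code may raise an error and you do not want the function to stop, wrap that part in try():

> f_error.1 <- function(x) {
    try(x <- log(x))
    x
  }
> f_error.1("10")   # log() fails: the error is printed and x is returned unchanged
> f_error.1(-1)     # only a warning: returns NaN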
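The value returned by `try()` can also be inspected: when the wrapped expression fails, the result inherits from the class "try-error". The sketch below relies on that to return `NA` instead of the unchanged input; the function name `safe_log()` is hypothetical, and `silent = TRUE` simply suppresses the printed error message.

```r
# Hypothetical helper: returns NA when log() raises an error.
safe_log <- function(x) {
  res <- try(log(x), silent = TRUE)
  if (inherits(res, "try-error")) {
    return(NA_real_)   # the call failed: fall back to a missing value
  }
  res
}

safe_log("10")  # NA
safe_log(1)     # 0
```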




S3 method (1)

Objective: use the same function name for objects of different classes.

Define new classes:

> squ <- 3
> class(squ) <- "carre"        # square
> rec <- c(5, 6)
> class(rec) <- "rectangle"
> cir <- sqrt(10)
> class(cir) <- "cercle"       # circle

Define a new generic function and its default method:

> getArea <- function(obj)
    UseMethod("getArea", obj)
> getArea.default <- function(obj) {
    # message: "getArea method not defined for this type of object"
    stop("Methode getArea non definie pour ce type d'objet")
  }
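As a small sketch of how the default method acts as a fallback, the example below calls the generic on an object for which no specific method exists. The English error message and the variable `res` are choices made for this illustration, not the text used in the slides.

```r
getArea <- function(obj)
  UseMethod("getArea", obj)

# Fallback used when no method matches the class of obj.
getArea.default <- function(obj) {
  stop("getArea is not defined for objects of class ", class(obj)[1])
}

# A character string has no getArea method, so the default is called.
res <- tryCatch(getArea("hello"), error = function(e) conditionMessage(e))
print(res)
```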


S3 method (2)

Define a getArea method for each class:

> getArea.cercle <- function(obj) {
    pi * obj[1] ^ 2
  }
> getArea.rectangle <- function(obj) {
    obj[1] * obj[2]
  }
> getArea.carre <- function(obj) {
    obj[1]^2
  }

Examples:

> getArea(cir)
[1] 31.41593
> getArea(rec)
[1] 30
> getArea(squ)
[1] 9
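Because dispatch depends only on the class of the object, the same call can be applied to a mixed collection of shapes. The sketch below repeats the definitions so that it runs on its own, builds the objects with `structure()` (a shortcut for assigning the class afterwards), and uses `vapply()` to compute all areas; the names `shapes` and `areas` are illustrative.

```r
getArea <- function(obj)
  UseMethod("getArea", obj)
getArea.cercle <- function(obj) pi * obj[1]^2
getArea.rectangle <- function(obj) obj[1] * obj[2]
getArea.carre <- function(obj) obj[1]^2

# A list mixing the three classes.
shapes <- list(
  structure(3, class = "carre"),
  structure(c(5, 6), class = "rectangle"),
  structure(sqrt(10), class = "cercle")
)

# The generic picks the right method for each element.
areas <- vapply(shapes, getArea, numeric(1))
areas  # 9, 30 and 10 * pi
```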