advanced programming with r - free€¦ · programming with r thibault laurent general informations...
TRANSCRIPT
![Page 1: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/1.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Advanced Programming with RR course,
Master 2 Statistics and Econometrics, TSEFormation OMP
Thibault [email protected]
Toulouse School of Economics, CNRS
October 23, 2019
![Page 2: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/2.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
1 General informations
2 Improving computational time
3 Make your code clean and elegant
4 Short introduction to S3 method
![Page 3: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/3.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Packages and software versions
> install.packages(c("Matrix", "microbenchmark", "pryr", "Rcpp",
"reticulate"))
> require("Matrix")
> require("pryr")
> require("Rcpp")
> require("reticulate")
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so
locale:
[1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C
[3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8
[5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8
[7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] reticulate_1.13 Rcpp_1.0.2 pryr_0.1.4 Matrix_1.2-17
loaded via a namespace (and not attached):
[1] compiler_3.6.1 magrittr_1.5 tools_3.6.1
[4] stringi_1.4.3 codetools_0.2-16 grid_3.6.1
[7] jsonlite_1.6 stringr_1.4.0 lattice_0.20-38
![Page 4: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/4.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Style guide for R code (1)
No PEP 8 as Python(https://www.python.org/dev/peps/pep-0008/)
Hadley Wickham recommandations :http://adv-r.had.co.nz/Style.html (R advanced book)
Google recommandations(https://google.github.io/styleguide/Rguide.xml)
Appropriate file naming:> source("lagrangian_computation.R")
> load("final_data_basis.RData")
Appropriate R objects naming:> reg_tree
> rf_reg
![Page 5: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/5.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Style guide for R code (2)
Use spaces between operators:> 1 + 2 + 3 + 4 == 4 * (4 + 1) / 2
> a <- c(5, NA, 4, 3)
> mean(a, na.rm = TRUE)
With exceptions for:> 1:10
> stats::lm
Number of characters per line : 80 maximum> ozone.rf <- randomForest(Ozone ~ ., data = airquality,
mtry = 3, importance = TRUE,
na.action = na.omit)
Affectation operator: use <- instead of =
> a <- 5
![Page 6: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/6.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Style guide for R code (3)
Loop:> x <- rnorm(10)
> mean_sq <- 0
> for (i in seq_along(x)) {
mean_sq <- mean_sq + x[i] ^ 2
}
Condition:> y <- 1
> if (y == 0) {
log(x)
} else {
y ^ x
}
![Page 7: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/7.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
How can I get the code source of a R function? (1)
Solution 1 : try to print the name of the function in yourconsole
> sapply
function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
{
FUN <- match.fun(FUN)
answer <- lapply(X = X, FUN = FUN, ...)
if (USE.NAMES && is.character(X) && is.null(names(answer)))
names(answer) <- X
if (!isFALSE(simplify) && length(answer))
simplify2array(answer, higher = (simplify == "array"))
else answer
}
<bytecode: 0x3042068>
<environment: namespace:base>
![Page 8: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/8.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
How can I get the code source of a R function? (2)
Solution 2 : the function belongs to the class S3
> summary
standardGeneric for "summary" defined from package "base"
function (object, ...)
standardGeneric("summary")
<environment: 0xad092b0>
Methods may be defined for arguments: object
Use showMethods("summary") for currently available ones.
To get all the functions which can be called with summary :
> methods("summary")
To get the codes, there are two options:
if there is no *> summary.lm
if there is a *> stats:::summary.ecdf
> getAnywhere("summary.ecdf")
![Page 9: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/9.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
How can I get the code source of a R function? (3)
Solution 3 : the function calls internal program (.Primitive()ou .Internal())
> sum
function (..., na.rm = FALSE) .Primitive("sum")
Use the function show c source() to get the source code fromGitHub (need an account):
> pryr::show_c_source(.Internal(mean(x)))
Solution 4 : the function calls C code
> qnorm
function (p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
.Call(C_qnorm, p, mean, sd, lower.tail, log.p)
<bytecode: 0x9761398>
<environment: namespace:stats>
Explorer: site:https://svn.r-project.org/R/trunk/src
qnorm
![Page 10: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/10.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
How can I get the code source of a R function? (4)
Solution 5: the function belongs to the class S4
> require("Matrix")
> methods("dim")
[1] dim.data.frame dim.layout*
[3] dim,MatrixFactorization-method dim,Matrix-method
[5] dim.pandas.core.frame.DataFrame* dim.pandas.core.series.Series*
[7] dim.scipy.sparse.csc.csc_matrix* dim.trellis*
see '?methods' for accessing help and source code
> getMethod("dim", "MatrixFactorization")
> showMethods(class = "MatrixFactorization")
![Page 11: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/11.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
1 General informations
2 Improving computational timeFix the size of the vectorsUse vectorized functionUse C++ code
3 Make your code clean and elegant
4 Short introduction to S3 method
![Page 12: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/12.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Fix the size of the vectors
Objective: create a function trunc rnorm() which takes asinput argument an integer n and a positive scalar a. It returnsa vector of size n where each elements follows a gaussian (0, 1)and is between [-a, a];
> trunc_rnorm.1 <- function(n, a) {
x <- numeric(0)
i <- 1
while (i <= n) {
r <- rnorm(1)
if (r > - abs(a) & r < abs(a)) {
x <- c(x, r)
i <- i + 1
}
}
x
}
> trunc_rnorm.2 <- function(n, a) {
x <- numeric(n)
i <- 1
while (i <= n) {
r <- rnorm(1)
if (r > - abs(a) & r < abs(a)) {
x[i] <- r
i <- i + 1
}
}
x
}
![Page 13: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/13.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Comparing results
By using system.time():> system.time(trunc_rnorm.1(n = 10000, a = 2))
user system elapsed
0.252 0.008 0.260
> system.time(trunc_rnorm.2(n = 10000, a = 2))
user system elapsed
0.032 0.000 0.031
By using microbenchmark():> microbenchmark::microbenchmark(
trunc_rnorm.1(n = 10000, a = 2),
trunc_rnorm.2(n = 10000, a = 2),
times = 10L)
Unit: milliseconds
expr min lq mean
trunc_rnorm.1(n = 10000, a = 2) 223.1846 226.32124 253.52285
trunc_rnorm.2(n = 10000, a = 2) 22.9708 25.36922 26.52099
median uq max neval
232.27947 241.64728 420.39724 10
27.06867 28.58687 29.51709 10
![Page 14: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/14.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
1 General informations
2 Improving computational timeFix the size of the vectorsUse vectorized functionUse C++ code
3 Make your code clean and elegant
4 Short introduction to S3 method
![Page 15: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/15.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Use vectorized function
Objective: compute the sum of the elements of a simulatedvector
By using loop for> vec <- rnorm(10000000)
> my_sum <- function(x) {
res <- 0
for (k in seq_along(x))
res <- res + x[k]
return(res)
}
> microbenchmark::microbenchmark(
my_sum(vec), times = 10L
)
Unit: milliseconds
expr min lq mean median uq max
my_sum(vec) 456.7001 457.486 462.5226 460.7802 462.6858 476.7904
neval
10
By using vectorized function> microbenchmark::microbenchmark(sum(vec), times = 10L)
Unit: milliseconds
expr min lq mean median uq max neval
sum(vec) 11.25038 11.26833 11.27601 11.27898 11.28572 11.29278 10
![Page 16: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/16.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Use C++ code
1 Program your C++ function in a separated file
2 Lauch your file in R> download.file(url = "http://www.thibault.laurent.free.fr/cours/R_avance/sumcplusplus.cpp",
destfile = getwd())
> sourceCpp("sumcplusplus.cpp")
3 Call the C++ function from R> microbenchmark::microbenchmark(sum_rcpp(vec), times = 10L)
Unit: milliseconds
expr min lq mean median uq max
sum_rcpp(vec) 39.09352 39.11007 39.2842 39.12512 39.18452 40.48588
neval
10
4 More informations here :http://adv-r.had.co.nz/Rcpp.html
![Page 17: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/17.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Overview of a C++ file
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double sum_rcpp(NumericVector x) {
double res = 0;
int n = x.size();
for(int i = 0; i < n; i++) {
res = res + x(i);
}
return res;
}
![Page 18: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/18.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
1 General informations
2 Improving computational time
3 Make your code clean and elegantThe functions l-s-t-m-apply()Use switch() instead of if/elseOther recommandations
4 Short introduction to S3 method
![Page 19: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/19.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Function apply()
apply() can be used on a array or a matrix. It can be alsoused on a data.frame if argument FUN is adapted to the typesof the observations or variables.
LUC
PIERRE
PEDRO
TAILLE POIDSDIM 2
apply(name_matrix, DIM, FUN)
DIM 12000
2010
LUC
PIERRE
PEDRO
TAILLE POIDS
> tab <- array(c(100, 90, 110, 25, 24, 28, 175, 180,
190, 68, 74, 85), dim = c(3, 2, 2))
> dimnames(tab)[[1]] <- c("Luc", "Pierre", "Pedro")
> dimnames(tab)[[2]] <- c("taille", "poids")
> dimnames(tab)[[3]] <- c("2000", "2010")
![Page 20: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/20.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Function apply(): example 1
What is the mean of taille and poids in 2000 and in 2010 ?
> res <- matrix(0, dim(tab)[2], dim(tab)[3])
> for (i in 1:dim(tab)[2]) {
for (j in 1:dim(tab)[3]) {
res[i, j] <- mean(tab[ , i, j])
}
}
> res
[,1] [,2]
[1,] 100.00000 181.66667
[2,] 25.66667 75.66667
which is equivalent to :
> apply(tab, c(2, 3), mean)
2000 2010
taille 100.00000 181.66667
poids 25.66667 75.66667
![Page 21: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/21.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Function apply(): example 2
Compared to loop for, computationnal time is not better whenusing apply(). However the code is easier to read.
> x <- matrix(runif(10e6), nc = 5)
> microbenchmark::microbenchmark(apply(x, 2, sum), times = 10L)
Unit: milliseconds
expr min lq mean median uq
apply(x, 2, sum) 114.8715 130.0866 192.6882 149.5321 281.9222
max neval
286.4282 10
> res <- numeric(5)
> microbenchmark::microbenchmark(
for (k in 1:5) {res[k] <- sum(x[, k])}, times = 10L)
Unit: milliseconds
expr min lq
for (k in 1:5) { res[k] <- sum(x[, k]) } 37.67169 38.1126
mean median uq max neval
50.68672 40.20141 64.3248 88.38813 10
![Page 22: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/22.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
colSums(), rowSums(), colMeans(), rowMeans()
Finally, when applying function apply() with FUN = sum or FUN
= mean, it is recommended to use one the function colSums(),rowSums(), colMeans(), rowMeans() which are vectorized:
> x <- matrix(runif(10e6), nc = 5)
> microbenchmark::microbenchmark(apply(x, 2, mean), times = 10L)
Unit: milliseconds
expr min lq mean median uq
apply(x, 2, mean) 88.41795 104.304 139.8478 120.4499 140.6812
max neval
249.5629 10
> microbenchmark::microbenchmark(colMeans(x), times = 10L)
Unit: milliseconds
expr min lq mean median uq max neval
colMeans(x) 11.20471 11.264 11.49133 11.51508 11.6262 11.78949 10
![Page 23: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/23.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Function lapply()
Function lapply() is used on a list object.
lapply(my list, FUN) applies the function FUN to eachelement of the list :
> my_list <- list(a = 1:3, b = "a string")
> lapply(my_list, nchar)
$a
[1] 1 1 1
$b
[1] 8
![Page 24: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/24.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Function lapply() on a data.frame
We remind that a data.frame is a particular list where eachelement of the list is a variable).Objective: compute the mean of each variable
Option 1 : without lapply()
> res <- numeric(length(mtcars))
> for (k in seq_along(res))
res[k] <- mean(mtcars[[k]])
> res
[1] 20.090625 6.187500 230.721875 146.687500 3.596563 3.217250
[7] 17.848750 0.437500 0.406250 3.687500 2.812500
Option 2 : with lapply()
> lapply(mtcars, mean)
Disadvantage: the result of lapply() is a list.
![Page 25: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/25.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Function sapply()
Similar to lapply() but it returns a more convenient format> sapply(mtcars, mean)
mpg cyl disp hp drat wt
20.090625 6.187500 230.721875 146.687500 3.596563 3.217250
qsec vs am gear carb
17.848750 0.437500 0.406250 3.687500 2.812500
Functions l-s-t-m-apply() are useful when argument FUN is anown created function:> f <- function(x)
c(min = min(x), max = max(x), mean = mean(x), med = median(x), sd = sd(x))
> t(sapply(mtcars, f))
The result of FUN applied on each element of the list must beof the same type (otherwise, it is like lapply()):> sapply(iris, function(x){
if (is.numeric(x))
f(x)
else
table(x)
})
![Page 26: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/26.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Function replicate()
Objective: we want to simulate 5 samples distributed under aU[0,1], each of size 10.
Solution 1:> res <- vector("list", 5)
> for (k in 1:5)
res[[k]] <- runif(10)
Solution 2:> res <- lapply(1:5, function(x) runif(10))
Solution 3:> res <- replicate(5, runif(10))
![Page 27: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/27.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Function mapply()
Objective: we have a list of 5 samples distributed under aU[0,1]. We want to compute the weighted mean for eachsample. The weighted are given in a second list.We create a list of 5 samples and a list with the weights:
> xs <- replicate(5, runif(10), simplify = FALSE)
> ws <- replicate(5, rpois(10, 5) + 1, simplify = FALSE)
Solution 1: by using loop> res <- numeric(length(xs))
> for (k in seq_along(res))
res[k] <- sum(xs[[k]] * ws[[k]]) / mean(ws[[k]])
Solution 2 : by using mapply() function> mapply(function(x, y) sum(x * y) / mean(y), xs, ws)
[1] 4.571867 5.626009 5.364398 5.439418 4.746239
![Page 28: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/28.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Function tapply()
Objective: we have two vectors of size n, one numeric x andone categorical f . We want to compute the mean with respectto the levels of x . We create two vectors:
> x <- c(rnorm(5, 165, 10), rnorm(5, 175, 10))
> f <- rep(c("M", "F"), each = 5)
Solution 1 : split() creates a list separating the data withrespect to the levels of x. Then, function sapply() permitsto compute the mean for each element of the list:> res.split <- split(x = x, f = f)
> sapply(X = res.split, FUN = mean)
F M
179.2725 163.6803
Solution 2 : by using directly function tapply()
> tapply(X = x, INDEX = f, mean)
F M
179.2725 163.6803
![Page 29: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/29.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
1 General informations
2 Improving computational time
3 Make your code clean and elegantThe functions l-s-t-m-apply()Use switch() instead of if/elseOther recommandations
4 Short introduction to S3 method
![Page 30: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/30.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Classical if/else syntax
Objective: compute a non parametric kernel K where K is oneof the three options.
1 biweight K (x) = 1516(1 − ( xh )2)21( x
h)2≤1
2 triweight K (x) = 3532(1 − ( xh )2)31( x
h)2≤1
3 gaussien K (x) = 1√2π
exp(−0.5( xh )2))
> f_noyau <- function(x, h, type = "bi") {
if (type == "bi") {
15/16 * (1 - (x/h) ^ 2) ^ 2 * ifelse((x/h) ^ 2 <= 1, 1, 0)
} else {
if (type == "tri") {
35/32 * (1 - (x/h) ^ 2) ^ 3 * ifelse((x/h) ^ 2 <= 1, 1, 0)
} else {
1 / sqrt(2 * pi) * exp(-0.5 * (x/h) ^ 2)
}
}
}
![Page 31: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/31.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Use switch() function
Syntax:
> f_noyau.2 <- function(x, h, type = "bi") {
switch(type,
bi = 15/16 * (1 - (x/h)^2)^2 * ifelse((x/h)^2 <= 1, 1, 0),
tri = 35/32 * (1 - (x/h)^2)^3 * ifelse((x/h)^2 <= 1, 1, 0),
gauss = 1 / sqrt(2 * pi) * exp(-0.5 * (x/h)^2),
"Preciser une autre valeur de type")
}
Application:
> x <- seq(-1, 1, 0.01)
> plot(x, f_noyau.2(x, 0.3, type = "bi"),
type = "l", ylab = "", ylim = c(0, 1.2))
> lines(x, f_noyau.2(x, 0.3, type = "tri"),
lty = 2)
> lines(x, f_noyau.2(x, 0.3, type = "gauss"),
lty = 3)
> f_noyau.2(x, 0.3, type = "unif")
[1] "Preciser une autre valeur de type"
−1.0 −0.5 0.0 0.5 1.0
0.0
0.2
0.4
0.6
0.8
1.0
1.2
x
![Page 32: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/32.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Function stopifnot()
Stop as soon as a condition is not verified:
> stopifnot(1 < 2, length(1:2) == 2, pi < 2, cos(pi) > 3)
Usually used at the top of a function:
> try_stopifnot <- function (x, y, n) {
stopifnot(length(x) == length(y),
is.integer(n))
(x + y) ^ n
}
> try_stopifnot(2, 5, 2L)
[1] 49
![Page 33: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/33.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Some recommandations (1)
Use ... as input arguments> plot_reg <- function(x, y, np = TRUE, ...) {
plot(y ~ x, ...)
abline(lm(y ~ x), col = "blue")
if (np) {
np.reg <- loess(y ~ x)
x.seq <- seq(min(x), max(x), length.out = 25)
lines(x.seq, predict(np.reg, x.seq), col = "red")
}
}
> x <- rnorm(100)
> y <- x * runif(100)
> plot_reg(x, y, np = FALSE)
> plot_reg(x, y, pch = 16, col = "pink",
xlab = "variable explicative",
ylab = "variable a expliquer")
![Page 34: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/34.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Some recommandations (2)
Create several functions:> bi <- function(x, h)
15/16 * (1 - (x/h)^2)^2 * ifelse((x/h)^2 <= 1, 1, 0)
> tri <- function(x, h)
35/32 * (1 - (x/h)^2)^3 * ifelse((x/h)^2 <= 1, 1, 0)
> gauss <- function(x, h)
1 / sqrt(2 * pi) * exp(-0.5 * (x/h)^2)
> f_noyau.2 <- function(x, h, type = "bi") {
switch(type,
bi = bi(x, h),
tri = tri(x, h),
gauss = gauss(x, h),
"Preciser une autre valeur de type")
}
![Page 35: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/35.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Some recommandations (3)
Do not keep un-used arguments> f <- function(a = 5, b = 4, d = 3, e = 1)
(a + b)^2
It is possible to use function as input argument:> randomise <- function(f) f(runif(1e3))
> randomise(mean)
> randomise(sum)
It is possible to use function as output argument:> f_power <- function(exponent)
function(x) x^exponent
> f_power(2)(1:5)
> f_power(3)(1:5)
![Page 36: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/36.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Debugg your function
Distinguish "error message" and "warning message"
Most of the time, the error message is obvious tounderstand what is wrong> sum(c("a", "b"))
Use traceback() function> f_noyau.2(seq(-2, 2, by = 0.1), h = "n")
> traceback()
Use debugonce() function> ex_bug.2 <- function(x) {
x <- x + 1
x <- x^3
x <- log(x)
x
}
> debugonce(ex_bug.2)
> ex_bug.2(-5)
![Page 37: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/37.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
Function try()
If you are conscious that your code contains error and you donot want the function stops, use function try():
> f_error.1 <- function(x) {
try(x <- log(x))
x
}
> f_error.1("10")
> f_error.1(-1)
![Page 38: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/38.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
1 General informations
2 Improving computational time
3 Make your code clean and elegant
4 Short introduction to S3 method
![Page 39: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/39.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
S3 method (1)
Objective: use the same function for different classes of object.
Define new classes:> squ <- 3
> class(squ) <- "carre"
> rec <- c(5, 6)
> class(rec) <- "rectangle"
> cir <- sqrt(10)
> class(cir) <- "cercle"
Define a new method and default method:> getArea <- function(obj)
UseMethod("getArea", obj)
> getArea.default <- function(obj) {
stop("Methode getArea non definie pour ce type d'objet")}
![Page 40: Advanced Programming with R - Free€¦ · Programming with R Thibault LAURENT General informations Improving computational time Fix the size of the vectors Use vectorized function](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf1588ad6a402d666a6fac/html5/thumbnails/40.jpg)
AdvancedProgramming
with R
ThibaultLAURENT
Generalinformations
Improvingcomputationaltime
Fix the size ofthe vectors
Use vectorizedfunction
Use C++ code
Make yourcode cleanand elegant
The functionsl-s-t-m-apply()
Use switch()instead of if/else
Other recom-mandations
Shortintroductionto S3 method
S3 method (2)
Create a function getArea associated to each class> getArea.cercle <- function(obj) {
pi * obj[1] ^ 2
}
> getArea.rectangle <- function(obj) {
obj[1] * obj[2]
}
> getArea.carre <- function(obj) {
obj[1]^2
}
Examples:> getArea(cir)
[1] 31.41593
> getArea(rec)
[1] 30
> getArea(squ)
[1] 9