intro to parallel computing in r - wmich.edu · intro to parallel computing in r kevinlee...

16
Intro to Parallel Computing in R Kevin Lee Department of Statistics Western Michigan University February 22, 2019 Kevin Lee Statistics Colloquium WMU February 22, 2019 1 / 16

Upload: others

Post on 06-Aug-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Intro to Parallel Computing in R - wmich.edu · Intro to Parallel Computing in R KevinLee Department of Statistics Western Michigan University February22,2019 Kevin Lee Statistics

Intro to Parallel Computing in R

Kevin Lee

Department of StatisticsWestern Michigan University

February 22, 2019

Kevin Lee Statistics Colloquium WMU February 22, 2019 1 / 16

Page 2: Intro to Parallel Computing in R - wmich.edu · Intro to Parallel Computing in R KevinLee Department of Statistics Western Michigan University February22,2019 Kevin Lee Statistics

Outline

1 List of Useful R Packages

2 Introduction to Parallel Computing in R

Kevin Lee Statistics Colloquium WMU February 22, 2019 2 / 16

Page 3: Intro to Parallel Computing in R - wmich.edu · Intro to Parallel Computing in R KevinLee Department of Statistics Western Michigan University February22,2019 Kevin Lee Statistics

List of Useful R Packages

Some of the top most downloaded R packages:

Check https://support.rstudio.com/hc/en-us/articles/201057987-Quick-list-of-useful-R-packages.

Kevin Lee Statistics Colloquium WMU February 22, 2019 3 / 16

Page 4: Intro to Parallel Computing in R - wmich.edu · Intro to Parallel Computing in R KevinLee Department of Statistics Western Michigan University February22,2019 Kevin Lee Statistics

Outline

1 List of Useful R Packages

2 Introduction to Parallel Computing in R

Kevin Lee Statistics Colloquium WMU February 22, 2019 4 / 16

Page 5: Intro to Parallel Computing in R - wmich.edu · Intro to Parallel Computing in R KevinLee Department of Statistics Western Michigan University February22,2019 Kevin Lee Statistics

Why Parallel Computing?

Many statistical analysis tasks are computationally intensive.

But at the same time, many problems are “embarrassingly parallel”.

And often we have multiple cores in our computer!

However, R only uses a single core.

Kevin Lee Statistics Colloquium WMU February 22, 2019 5 / 16

Page 6: Intro to Parallel Computing in R - wmich.edu · Intro to Parallel Computing in R KevinLee Department of Statistics Western Michigan University February22,2019 Kevin Lee Statistics

Embarrassingly Parallel Problems

Easy to speed things up when:

Calculating similar things many times (e.g. iterations in a loop).

Calculations are independent of each other.

Each calculation takes a decent amount of time.

Kevin Lee Statistics Colloquium WMU February 22, 2019 6 / 16

Page 7: Intro to Parallel Computing in R - wmich.edu · Intro to Parallel Computing in R KevinLee Department of Statistics Western Michigan University February22,2019 Kevin Lee Statistics

How Does Parallel Computing Works?

Kevin Lee Statistics Colloquium WMU February 22, 2019 7 / 16

Page 8: Intro to Parallel Computing in R - wmich.edu · Intro to Parallel Computing in R KevinLee Department of Statistics Western Michigan University February22,2019 Kevin Lee Statistics

Motivation Example: Birthday Problem

What is the probability that in a group of n people at least two have thesame birthday?

Kevin Lee Statistics Colloquium WMU February 22, 2019 8 / 16

Page 9: Intro to Parallel Computing in R - wmich.edu · Intro to Parallel Computing in R KevinLee Department of Statistics Western Michigan University February22,2019 Kevin Lee Statistics

Key Motivation: Speeding up R

Kevin Lee Statistics Colloquium WMU February 22, 2019 9 / 16

Page 10: Intro to Parallel Computing in R - wmich.edu · Intro to Parallel Computing in R KevinLee Department of Statistics Western Michigan University February22,2019 Kevin Lee Statistics

Key Motivation: Speeding up R

Kevin Lee Statistics Colloquium WMU February 22, 2019 10 / 16

Page 11: Intro to Parallel Computing in R - wmich.edu · Intro to Parallel Computing in R KevinLee Department of Statistics Western Michigan University February22,2019 Kevin Lee Statistics

foreach package

Repeated executions can be done manually, but it becomes quitetedious to execute repeated operations.

R comes with various looping constructs that solve this problem. Thefor loop is one of the more common looping constructs, but therepeat and while statements are also quite useful.

In addition, there is the family of “apply” functions, which includesapply, lapply, sapply, tapply, and others.

Kevin Lee Statistics Colloquium WMU February 22, 2019 11 / 16

Page 12: Intro to Parallel Computing in R - wmich.edu · Intro to Parallel Computing in R KevinLee Department of Statistics Western Michigan University February22,2019 Kevin Lee Statistics

foreach package

The foreach package provides a new looping construct for executingR code repeatedly.

The main reason for using the foreach package is that it supportsparallel execution.

It can execute those repeated operations on multiple processors/coreson your computer, or on multiple nodes of a cluster.

Kevin Lee Statistics Colloquium WMU February 22, 2019 12 / 16

Page 13: Intro to Parallel Computing in R - wmich.edu · Intro to Parallel Computing in R KevinLee Department of Statistics Western Michigan University February22,2019 Kevin Lee Statistics

doParallel package

The doParallel package is a “parallel backend” for the foreachpackage.

It provides a mechanism needed to execute foreach loops in parallel.

The foreach package must be used in conjunction with a packagesuch as doParallel in order to execute code in parallel.

Kevin Lee Statistics Colloquium WMU February 22, 2019 13 / 16

Page 14: Intro to Parallel Computing in R - wmich.edu · Intro to Parallel Computing in R KevinLee Department of Statistics Western Michigan University February22,2019 Kevin Lee Statistics

Example: Simulation Study

We will perform a simulation study to check the performance ofbootstrap estimate of standard error of sample mean on differentsample sizes.

We first generate random samples of size 10, 50, and 100 from normaldistribution with mean 0 and standard deviation 2. Next we find thebootstrap estimate of standard error of sample mean for each samplesize with 2000 bootstrap samples.

We repeat this 100 times and calculate the root-mean-squared error(RMSE) to compare the performance of bootstrap estimate ofstandard error of sample mean on different sample sizes.

RMSE =

√√√√ 1R

R∑r=1

(SE(x) − SEr (x∗)

)2

Kevin Lee Statistics Colloquium WMU February 22, 2019 14 / 16

Page 15: Intro to Parallel Computing in R - wmich.edu · Intro to Parallel Computing in R KevinLee Department of Statistics Western Michigan University February22,2019 Kevin Lee Statistics

Conclusion

A rule of thumb: If you can wrap your task in an apply function orone of its variants then you can also use parallel computing!

Check how many cores your laptop or desktop has and start usingparallel computing in R!

Kevin Lee Statistics Colloquium WMU February 22, 2019 15 / 16

Page 16: Intro to Parallel Computing in R - wmich.edu · Intro to Parallel Computing in R KevinLee Department of Statistics Western Michigan University February22,2019 Kevin Lee Statistics

References

L. Collado-Torres’s websitehttp://lcolladotor.github.io/

Steve Weston’s documents.Using the foreach PackageGetting Started with doParallel and foreach

Kevin Lee Statistics Colloquium WMU February 22, 2019 16 / 16