Parallelizing R with the Snow Library
Advanced Research Computing
September 22, 2015
Outline
• Introduction
• Snow Basics
• Examples
• Conclusions
Introduction
R
• Programming language and environment for statistical computing
• Free
• Intrinsic support for a wide array of statistical functionality
• Huge number of user-created packages to add or improve functionality
An Aside: Optimizing R
• Pre-allocate variables
• Vectorize (or perhaps apply functions)
    – Yes:
        z = x * y
    – No:
        for (i in 1:length(x)) { z[i] = x[i] * y[i] }
• Reference: The R Inferno, http://www.burns-stat.com/documents/books/the-r-inferno/
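A minimal sketch of the two tips above. The vector length and the variable names (n, z.loop, z.vec) are assumptions chosen for illustration; timings depend on the machine, so none are quoted here.

```r
# Illustrative only: n and the variable names are assumptions, not from the slides.
set.seed(1)
n <- 100000
x <- runif(n)
y <- runif(n)

# Loop version, with the result vector pre-allocated before the loop
z.loop <- numeric(n)
t.loop <- system.time(
  for (i in seq_along(x)) {
    z.loop[i] <- x[i] * y[i]
  }
)

# Vectorized version
t.vec <- system.time(z.vec <- x * y)

# Both approaches give the same answer
stopifnot(isTRUE(all.equal(z.loop, z.vec)))
print(rbind(loop = t.loop, vectorized = t.vec)[, 1:3])
```

The loop is kept only for comparison; where an operation is available in vectorized form, that form is usually both shorter and faster.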
An Aside: Optimizing R (continued)
• Many R operations use Basic Linear Algebra Subprograms (BLAS)
• Build R with optimized BLAS → optimized R
[Figure: Run Time for R 2.5 Benchmark by Build Type. Bar chart of run time (s), axis from 0 to 300, for gcc and Intel builds, each shown as Standard vs. Optimized BLAS.]
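One informal way to see the BLAS at work is to time a dense matrix product, which R hands to whichever BLAS it is linked against. This is only a sketch: the matrix size is an arbitrary assumption, and this is not the benchmark used for the figure.

```r
# Illustrative only: the matrix size (500) is an arbitrary assumption.
set.seed(1)
n <- 500
a <- matrix(rnorm(n * n), n, n)

# Dense matrix products are delegated to the BLAS library R was built with
elapsed <- system.time(b <- a %*% a)[["elapsed"]]
cat("Elapsed seconds for a", n, "x", n, "matrix product:", elapsed, "\n")

# Report the LAPACK version in use by this R build
cat("LAPACK version:", La_version(), "\n")
```

Running the same script under two different R builds gives a rough comparison of their linear algebra performance.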
The Need for Parallelism
Snow
Snow Basics
• Simple Network of Workstations (SNOW)
• For embarrassingly parallel tasks
• Master/Slave model:
    $ ps -u jkrometi -o cmd | grep R
    R -f time_mh.r --restore --no-save
    R --slave <etc>
    R --slave <etc>
Snow: Start/Stop Cluster
• Load libraries:
    library(snow)
    library(Rmpi)
• Start a cluster with ncores cores:
    cl <- makeCluster(ncores, type = 'MPI')
• Initialize random number generator:
    clusterSetupRNG(cl, type = 'RNGstream')
• Stop the cluster (important):
    stopCluster(cl)
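The start/stop lifecycle above can be tried on a single machine without MPI. The sketch below makes several substitutions that are assumptions, not the slides' setup: it uses R's bundled parallel package (whose cluster functions derive from snow) with a local socket cluster, two workers, and clusterSetRNGStream() in place of snow's clusterSetupRNG(). The seed value is arbitrary.

```r
library(parallel)

ncores <- 2                # assumption: two local workers
cl <- makeCluster(ncores)  # socket cluster by default, no MPI needed

# tryCatch(..., finally = ) ensures the cluster is stopped even if a call fails
worker.pids <- tryCatch({
  clusterSetRNGStream(cl, iseed = 123)  # independent RNG streams per worker
  unlist(clusterCall(cl, Sys.getpid))   # ask each worker for its process ID
}, finally = stopCluster(cl))

print(worker.pids)
```

Wrapping the work in tryCatch with a finally clause is one way to honor the "important" note: workers left running after an error keep consuming cores.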
Snow: Computing
• Call same function across cluster (ncores times):
    clusterCall(cl, fun, ...)
• Parallel versions of apply:
    clusterApply(cl, x, fun, ...)
    parApply(cl, X, MARGIN, FUN, ...)
    parLapply(cl, x, fun, ...)
    parRapply(cl, x, fun, ...)
    parCapply(cl, x, fun, ...)
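A small sketch of how some of these calls behave. As an assumption for portability it uses the bundled parallel package, which provides functions with the same names, on a two-worker socket cluster; the toy matrix and anonymous functions are made up for illustration.

```r
library(parallel)

cl <- makeCluster(2)  # assumption: two local socket workers
results <- tryCatch({
  m <- matrix(1:6, nrow = 2)  # rows are (1, 3, 5) and (2, 4, 6)
  list(
    call   = clusterCall(cl, function() 1 + 1),    # same function once per worker
    lapply = parLapply(cl, 1:4, function(v) v^2),  # parallel lapply
    rows   = parRapply(cl, m, sum),                # apply over rows
    cols   = parCapply(cl, m, sum)                 # apply over columns
  )
}, finally = stopCluster(cl))

str(results)
```

clusterCall returns one result per worker, while the par*apply family splits the input across workers and reassembles the results in the original order.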
Examples
Monte Carlo: Calculating π
• The ratio of the area of the quarter unit circle (the part lying in the unit square) to the area of the unit square is $\pi / 4$
• So:
    – Randomly pick S points in the unit square
    – Count the number in the unit circle (C)
    – Then $\pi \approx 4C/S$
MC π: Code
mcpi <- function(n.pts) {
    # generate n.pts (x, y) points in the unit square
    m = matrix(runif(2*n.pts), n.pts, 2)
    # determine if they are in the unit circle
    in.ucir = function(x) { as.integer((x[1]^2 + x[2]^2) <= 1) }
    cir = apply(m, 1, in.ucir)
    # return the proportion of points in the unit circle * 4
    return(4*mean(cir))
}
MC π: Parallelize
# start up and initialize the cluster
cl <- makeCluster(ncores, type = 'MPI')
clusterSetupRNG(cl, type = 'RNGstream')
# determine if points are in the unit circle
cir = parRapply(cl, m, in.ucir)
# calculate pi
pi.approx = 4*mean(cir)
# stop the cluster
stopCluster(cl)
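The parallel estimate above can be run end to end on one machine with the sketch below. It is not the slides' MPI setup: as an assumption it swaps in the bundled parallel package with a two-worker socket cluster, and the point count and seed are arbitrary. The points are generated on the master, so set.seed() there is enough for reproducibility.

```r
library(parallel)

set.seed(2015)  # seeds the master process, where the points are generated
n.pts <- 20000  # assumption: arbitrary sample size
m <- matrix(runif(2 * n.pts), n.pts, 2)

# 1 if the point (x[1], x[2]) lies inside the unit circle, else 0
in.ucir <- function(x) as.integer((x[1]^2 + x[2]^2) <= 1)

cl <- makeCluster(2)  # assumption: two local socket workers
cir <- tryCatch(parRapply(cl, m, in.ucir), finally = stopCluster(cl))

pi.approx <- 4 * mean(cir)
print(pi.approx)
```

parRapply splits the rows of m across the workers, applies in.ucir to each row, and concatenates the results in row order.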
MC π: An Optimization Example
> n.pts <- 500000
> m = matrix(runif(2*n.pts),n.pts,2)
> in.ucir <- function(x) { as.integer((x[1]^2 + x[2]^2) <= 1) }
> system.time( apply(m, 1, in.ucir ) )
   user  system elapsed
  5.037   0.025   5.069
> system.time( as.integer(m[,1]^2 + m[,2]^2 <= 1) )
   user  system elapsed
   0.02    0.00    0.02
MCMC: Metropolis-Hastings
• Goal: Draw random samples with probability density approximating a given distribution
• Used to model stochastic inputs
• Do not need to know normalizing factor
    – Functions in high dimensions
MCMC: Metropolis-Hastings
• Given a:
    – Target distribution
    – Jumping distribution
    – Initial sample
• Choose candidate sample from jumping distribution centered at initial sample
• Accept candidate as new sample:
    – Always if candidate is better fit (per target dist)
    – With probability <1 if candidate is worse fit
• Repeat with new sample as initial sample
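The acceptance rule above can be written compactly. This sketch assumes a symmetric jumping distribution (the Metropolis special case). The symbols are notation introduced here rather than taken from the slides: $p$ is the target density, possibly unnormalized, $\theta$ is the current sample, and $\theta^{*}$ is the candidate.

```latex
\alpha(\theta, \theta^{*}) = \min\!\left(1, \frac{p(\theta^{*})}{p(\theta)}\right)
```

A candidate with higher target density gives $\alpha = 1$ and is always accepted; otherwise it is accepted with probability $p(\theta^{*})/p(\theta) < 1$. Because only the ratio appears, any normalizing constant of the target cancels, which is why it does not need to be known.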
M-H: Code (Markov Chain Part)
# function to calculate next sample
# (samp and jump are not arguments here, so this function must be defined
# where they are visible, for example inside mh())
theta.update <- function(theta.cur) {
    # candidate sample
    theta.can <- jump(theta.cur)
    # acceptance probability
    accept.prob <- samp(theta.can)/samp(theta.cur)
    # compare with sample from uniform dist (0 to 1)
    if (runif(1) <= accept.prob) theta.can
    else theta.cur
}
Reference: Lam, Patrick. "MCMC Methods: Gibbs Sampling and the Metropolis-Hastings Algorithm."
Metropolis-Hastings: Code
# function to generate (n.sims - burnin) samples
mh <- function(n.sims, start, burnin, samp, jump) {
    theta.cur <- start
    draws <- c()
    # call theta.update() n.sims times
    for (i in 1:n.sims) {
        draws[i] <- theta.cur <- theta.update(theta.cur)
    }
    # return the samples after the burn in
    return( draws[(burnin + 1):n.sims] )
}
Reference: Lam, Patrick. "MCMC Methods: Gibbs Sampling and the Metropolis-Hastings Algorithm."
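A self-contained serial sketch that puts the two pieces together. Everything specific in it is an assumption for illustration: the target (a standard normal density without its normalizing constant), the symmetric normal jumping distribution, the run length, the burn-in, and the seed. Here theta.update is nested inside mh() so that samp and jump are in scope, and the draws vector is pre-allocated.

```r
# Sketch only: the target, jump width, and run lengths below are assumptions.
mh <- function(n.sims, start, burnin, samp, jump) {
  # defined inside mh() so that samp and jump are visible
  theta.update <- function(theta.cur) {
    theta.can <- jump(theta.cur)                      # candidate sample
    accept.prob <- samp(theta.can) / samp(theta.cur)  # acceptance probability
    if (runif(1) <= accept.prob) theta.can else theta.cur
  }
  theta.cur <- start
  draws <- numeric(n.sims)  # pre-allocated
  for (i in seq_len(n.sims)) {
    draws[i] <- theta.cur <- theta.update(theta.cur)
  }
  draws[(burnin + 1):n.sims]  # drop the burn-in samples
}

# Hypothetical target: standard normal density without its normalizing constant
samp.fcn <- function(theta) exp(-theta^2 / 2)
# Hypothetical symmetric jumping distribution, centered at the current sample
jump.fcn <- function(theta) rnorm(1, mean = theta, sd = 1)

set.seed(42)
draws <- mh(n.sims = 20000, start = 1, burnin = 2000,
            samp = samp.fcn, jump = jump.fcn)
cat("mean:", mean(draws), " sd:", sd(draws), "\n")
```

For this target the sample mean and standard deviation should come out close to 0 and 1, which gives a quick sanity check on the sampler.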
Metropolis-Hastings: Parallelize
# start up and initialize the cluster
cl <- makeCluster(ncores, type = 'MPI')
clusterSetupRNG(cl, type = 'RNGstream')
# samples per core
mh.n.sims.cl <- ceiling(mh.n.sims / ncores)
# call mh on each core
mh.draws.cl <- clusterCall(cl, mh, mh.n.sims.cl, start = 1, burnin = mh.burnin, samp = samp.fcn, jump = jump.fcn)
# reduce list to 1-D
mh.draws <- unlist(mh.draws.cl)
# stop the cluster
stopCluster(cl)
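The parallel pattern above can be exercised locally with the sketch below. It is a variant under stated assumptions, not the slides' MPI configuration: it uses the bundled parallel package with a two-worker socket cluster and clusterSetRNGStream() in place of clusterSetupRNG(). The target, jumping distribution, run lengths, and seed are hypothetical.

```r
library(parallel)

# Compact Metropolis sampler; theta.update is nested so samp and jump are in scope
mh <- function(n.sims, start, burnin, samp, jump) {
  theta.update <- function(theta.cur) {
    theta.can <- jump(theta.cur)
    if (runif(1) <= samp(theta.can) / samp(theta.cur)) theta.can else theta.cur
  }
  theta.cur <- start
  draws <- numeric(n.sims)
  for (i in seq_len(n.sims)) draws[i] <- theta.cur <- theta.update(theta.cur)
  draws[(burnin + 1):n.sims]
}

# Hypothetical target (unnormalized standard normal) and symmetric jump
samp.fcn <- function(theta) exp(-theta^2 / 2)
jump.fcn <- function(theta) stats::rnorm(1, mean = theta, sd = 1)

ncores <- 2         # assumption: two local workers
mh.n.sims <- 20000  # assumption: total simulations across all workers
mh.burnin <- 1000   # assumption: burn-in discarded from each chain
mh.n.sims.cl <- ceiling(mh.n.sims / ncores)  # samples per core

cl <- makeCluster(ncores)
mh.draws.cl <- tryCatch({
  clusterSetRNGStream(cl, iseed = 42)  # independent RNG stream per worker
  clusterCall(cl, mh, mh.n.sims.cl, start = 1, burnin = mh.burnin,
              samp = samp.fcn, jump = jump.fcn)
}, finally = stopCluster(cl))

mh.draws <- unlist(mh.draws.cl)  # reduce list to 1-D
cat("chains:", length(mh.draws.cl), " total draws:", length(mh.draws), "\n")
```

Each worker runs its own independent chain, so the burn-in is paid once per chain: the combined result holds ncores * (mh.n.sims.cl - mh.burnin) draws rather than mh.n.sims - mh.burnin. Independent RNG streams matter here, since workers sharing a seed would produce identical chains.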
Conclusions
R on ARC’s Systems
• R 3.0.3, 2.14.1:
    – Each R build comes with Rmpi and Snow
        module load intel R openmpi
• R 3.2.0:
    – Built with rlecuyer and ggplot2
    – Plotting via cairo (offline), X11 (interactive)
    – Parallel packages (Rmpi, snow, snowfall, pbdR) built into R-parallel module
        module load intel R/3.2.0 openmpi hdf5 netcdf R-parallel/3.2.0
Getting Started on ARC Systems
• Request an account (anyone with a VT PID): http://www.arc.vt.edu/account
    – Can also request for external collaborators
• Request a system unit allocation: http://www.arc.vt.edu/allocations
References
• Snow Manual: http://cran.r-project.org/web/packages/snow/snow.pdf
• Snow Functions: http://www.sfu.ca/~sblay/R/snow.html
• ARC’s R page: http://www.arc.vt.edu/r
• Course Slides: http://www.arc.vt.edu/?class_note=parallel-r-i-snow
Questions?