Parallelizing R with the Snow Library Advanced Research Computing September 22, 2015

Outline  

• Introduction
• Snow Basics
• Examples
• Conclusions

Introduction

R  

• Programming language and environment for statistical computing
• Free
• Intrinsic support for a wide array of statistical functionality
• Huge number of user-created packages to add or improve functionality

An Aside: Optimizing R

• Pre-allocate variables
• Vectorize (or perhaps use the apply functions)
  – Yes:
      z = x * y
  – No:
      for (i in 1:length(x)) { z[i] = x[i] * y[i] }
• Reference: The R Inferno http://www.burns-stat.com/documents/books/the-r-inferno/
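
The two tips above can be checked directly. The following is a minimal sketch, not part of the original slides; the vector length n and the three function names are illustrative assumptions. It compares growing a result vector inside a loop, filling a pre-allocated vector, and the vectorized form.

```r
# Illustrative sketch: the size n and the function names are assumptions.
n <- 1e5
x <- runif(n)
y <- runif(n)

# No pre-allocation: z grows by one element per iteration
grow.loop <- function(x, y) {
  z <- c()
  for (i in seq_along(x)) z[i] <- x[i] * y[i]
  z
}

# Pre-allocated: z has its final length before the loop starts
prealloc.loop <- function(x, y) {
  z <- numeric(length(x))
  for (i in seq_along(x)) z[i] <- x[i] * y[i]
  z
}

# Vectorized: one call, no explicit loop
vectorized <- function(x, y) x * y

z1 <- grow.loop(x, y)
z2 <- prealloc.loop(x, y)
z3 <- vectorized(x, y)

# All three compute the same values; compare the elapsed times
print(system.time(grow.loop(x, y)))
print(system.time(prealloc.loop(x, y)))
print(system.time(vectorized(x, y)))
```

The exact timings depend on the R version and the machine, so treat the printed numbers as a local measurement rather than a general result.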

An Aside: Optimizing R (continued)

• Many R operations use Basic Linear Algebra Subroutines (BLAS)
• Build R with optimized BLAS → optimized R

[Figure: Run Time for R 2.5 Benchmark by Build Type. Bar chart of run time (s), axis 0 to 300, for gcc and Intel builds; series: Standard, Optimized BLAS.]

The  Need  for  Parallelism  

Snow  

Snow Basics

• Simple Network of Workstations (SNOW)
• For embarrassingly parallel tasks
• Master/Slave model:
    $ ps -u jkrometi -o cmd | grep R
    R -f time_mh.r --restore --no-save
    R --slave <etc>
    R --slave <etc>

Snow: Start/Stop Cluster

• Load libraries:
    library(snow)
    library(Rmpi)
• Start a cluster with ncores cores:
    cl <- makeCluster(ncores, type = 'MPI')
• Initialize random number generator:
    clusterSetupRNG(cl, type = 'RNGstream')
• Stop the cluster (important):
    stopCluster(cl)
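
A sketch of the same start/stop pattern that runs on a single machine without MPI. It is not from the slides: it assumes base R's parallel package (which provides snow-style functions) and a local socket cluster in place of type = 'MPI', and it uses clusterSetRNGStream() where the slides use snow's clusterSetupRNG(). The worker count of 2 and the seed are arbitrary. Wrapping the work in tryCatch(..., finally = ...) is one way to make sure the cluster is stopped even if a worker call fails.

```r
library(parallel)

ncores <- 2                      # assumption: two local workers
cl <- makeCluster(ncores)        # socket (PSOCK) cluster, not MPI

worker.draws <- tryCatch({
  # give each worker its own reproducible random number stream
  clusterSetRNGStream(cl, iseed = 2015)
  # draw one uniform random number on every worker
  clusterCall(cl, function() runif(1))
}, finally = {
  # stop the cluster even if the calls above fail
  stopCluster(cl)
})

print(unlist(worker.draws))
```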

Snow: Computing

• Call same function across cluster (ncores times):
    clusterCall(cl, fun, ...)
• Parallel versions of apply:
    clusterApply(cl, x, fun, ...)
    parApply(cl, X, MARGIN, FUN, ...)
    parLapply(cl, x, fun, ...)
    parRapply(cl, x, fun, ...)
    parCapply(cl, x, fun, ...)
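
The difference between these calls is easiest to see on a small input. A minimal sketch, assuming base R's parallel package (which exports functions with these names) and a two-worker socket cluster rather than the snow/Rmpi setup in the slides; the matrix and the functions passed in are made up for illustration.

```r
library(parallel)

cl <- makeCluster(2)             # assumption: two local socket workers

res <- tryCatch({
  m <- matrix(1:6, nrow = 3)     # 3 x 2 example matrix
  list(
    # same function with the same arguments, once per worker
    call   = clusterCall(cl, function(a, b) a + b, 1, 2),
    # one task per list element; returns a list
    lapply = parLapply(cl, 1:4, function(i) i^2),
    # apply over rows (MARGIN = 1)
    apply  = parApply(cl, m, 1, sum),
    # row-wise and column-wise shortcuts
    rows   = parRapply(cl, m, sum),
    cols   = parCapply(cl, m, sum)
  )
}, finally = stopCluster(cl))

str(res)
```

clusterCall() returns one result per worker, while the par*apply functions return one result per element, row, or column of the input regardless of how many workers share the work.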

Examples  

Monte Carlo: Calculating π

• The ratio of the area of the quarter of the unit circle that lies in the unit square to the area of the unit square is π/4
• So:
  – Randomly pick S points in the unit square
  – Count the number in the unit circle (C)
  – Then π ≈ 4C/S

MC π: Code

mcpi <- function(n.pts) {
  #generate n.pts (x,y) points in the unit square
  m = matrix(runif(2*n.pts),n.pts,2)
  #determine if they are in the unit circle
  in.ucir = function(x) {as.integer((x[1]^2 + x[2]^2)<=1)}
  cir = apply(m, 1, in.ucir )
  #return the proportion of points in the unit circle * 4
  return (4*mean(cir))
}
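
A usage sketch for mcpi() as defined on the slide. The seed and the sample size are arbitrary choices made here for repeatability, not values from the talk.

```r
# mcpi() as on the slide above
mcpi <- function(n.pts) {
  m <- matrix(runif(2 * n.pts), n.pts, 2)
  in.ucir <- function(x) as.integer((x[1]^2 + x[2]^2) <= 1)
  cir <- apply(m, 1, in.ucir)
  4 * mean(cir)
}

set.seed(42)                 # arbitrary seed, for repeatability only
pi.approx <- mcpi(100000)    # arbitrary sample size
print(pi.approx)
print(abs(pi.approx - pi))   # absolute error against R's built-in pi
```

The error of a Monte Carlo estimate like this one shrinks roughly as 1/sqrt(S), so each additional correct digit costs about 100 times more points.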

MC π: Parallelize

#start up and initialize the cluster
cl <- makeCluster(ncores, type = 'MPI')
clusterSetupRNG(cl, type = 'RNGstream')
#determine if points are in the unit circle
cir = parRapply(cl, m, in.ucir )
#calculate pi
pi.approx = 4*mean(cir)
#stop the cluster
stopCluster(cl)
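
The fragment above assumes that ncores, m and in.ucir already exist. A self-contained sketch follows, assuming base R's parallel package with a local socket cluster instead of snow with MPI; ncores = 2, n.pts and the seed are arbitrary. Because the points are generated on the master, the workers draw no random numbers in this version, so the sketch skips the RNG setup.

```r
library(parallel)

ncores <- 2                                  # assumption: two local workers
n.pts  <- 100000                             # arbitrary sample size

set.seed(42)                                 # arbitrary seed
m <- matrix(runif(2 * n.pts), n.pts, 2)      # points generated on the master
in.ucir <- function(x) as.integer((x[1]^2 + x[2]^2) <= 1)

cl <- makeCluster(ncores)                    # socket cluster, not MPI
cir <- tryCatch(
  parRapply(cl, m, in.ucir),                 # rows of m are split across workers
  finally = stopCluster(cl)
)

pi.approx <- 4 * mean(cir)
print(pi.approx)
```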

MC π: An Optimization Example

> n.pts <- 500000
> m = matrix(runif(2*n.pts),n.pts,2)
> in.ucir <- function(x) { as.integer((x[1]^2 + x[2]^2) <= 1) }
> system.time( apply(m, 1, in.ucir ) )
   user  system elapsed
  5.037   0.025   5.069
> system.time( as.integer(m[,1]^2 + m[,2]^2 <= 1) )
   user  system elapsed
   0.02    0.00    0.02
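
Folding that vectorized test back into the estimator gives a version with no apply() call. This is a sketch, not code from the slides; the name mcpi.vec is made up here, and the seed and sizes are arbitrary.

```r
# Vectorized variant of mcpi(); the name mcpi.vec is hypothetical
mcpi.vec <- function(n.pts) {
  m <- matrix(runif(2 * n.pts), n.pts, 2)
  # the comparison yields a logical vector; mean() gives the fraction TRUE
  4 * mean(m[, 1]^2 + m[, 2]^2 <= 1)
}

# Check that the row-wise and vectorized tests agree on the same points
set.seed(1)                                   # arbitrary seed
m <- matrix(runif(2 * 1000), 1000, 2)
in.ucir <- function(x) as.integer((x[1]^2 + x[2]^2) <= 1)
same <- identical(apply(m, 1, in.ucir),
                  as.integer(m[, 1]^2 + m[, 2]^2 <= 1))
print(same)

pi.vec <- mcpi.vec(100000)
print(pi.vec)
```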

MCMC: Metropolis-Hastings

• Goal: Draw random samples whose probability density approximates a given distribution
• Used to model stochastic inputs
• Do not need to know the normalizing factor
  – Functions in high dimensions

MCMC: Metropolis-Hastings

• Given a:
  – Target distribution
  – Jumping distribution
  – Initial sample
• Choose candidate sample from jumping distribution centered at initial sample
• Accept candidate as new sample:
  – Always if candidate is better fit (per target dist)
  – With probability <1 if candidate is worse fit
• Repeat with new sample as initial sample
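
Written out, the acceptance rule in the steps above looks like the following. This is the standard Metropolis form, which assumes a symmetric jumping distribution; the symbols are notation chosen here, not taken from the slides: p is the target density, \tilde{p} its unnormalized version, Z the normalizing factor, \theta the current sample and \theta^{*} the candidate.

```latex
\alpha(\theta \to \theta^{*})
  = \min\!\left(1, \frac{p(\theta^{*})}{p(\theta)}\right)
  = \min\!\left(1, \frac{\tilde{p}(\theta^{*})/Z}{\tilde{p}(\theta)/Z}\right)
  = \min\!\left(1, \frac{\tilde{p}(\theta^{*})}{\tilde{p}(\theta)}\right)
```

Z cancels in the ratio, which is why the normalizing factor is not needed. A candidate with higher target density has a ratio of at least 1 and is always accepted; otherwise it is accepted with probability equal to the ratio.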

M-H: Code (Markov Chain Part)

#function to calculate next sample
theta.update <- function(theta.cur) {
  #candidate sample
  theta.can <- jump(theta.cur)
  #acceptance probability
  accept.prob <- samp(theta.can)/samp(theta.cur)
  #compare with sample from uniform dist (0 to 1)
  if (runif(1) <= accept.prob) theta.can else theta.cur
}

Reference: Lam, Patrick. "MCMC Methods: Gibbs Sampling and the Metropolis-Hastings Algorithm."

Metropolis-Hastings: Code

#function to generate (n.sims-burnin) samples
mh <- function(n.sims, start, burnin, samp, jump) {
  #define theta.update() (the Markov chain part) here, inside mh(),
  #so that it can see the samp and jump arguments
  theta.cur <- start
  #pre-allocate the vector of samples
  draws <- numeric(n.sims)
  #call theta.update() n.sims times
  for (i in 1:n.sims) {
    draws[i] <- theta.cur <- theta.update(theta.cur)
  }
  #return the samples after the burn in
  return( draws[(burnin + 1):n.sims] )
}

Reference: Lam, Patrick. "MCMC Methods: Gibbs Sampling and the Metropolis-Hastings Algorithm."
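
Putting the two pieces together gives a runnable sketch. Assumptions made here, none confirmed by the slides: theta.update() is nested inside mh() so it can see samp and jump; the target is an unnormalized standard normal density; the jumping distribution is a normal centered at the current sample. The names samp.fcn and jump.fcn appear in the slides' parallel example, but their bodies are never shown, so the ones below are invented for illustration.

```r
mh <- function(n.sims, start, burnin, samp, jump) {
  # one Markov chain step; nested so it can see samp and jump
  theta.update <- function(theta.cur) {
    theta.can <- jump(theta.cur)
    accept.prob <- samp(theta.can) / samp(theta.cur)
    if (runif(1) <= accept.prob) theta.can else theta.cur
  }
  theta.cur <- start
  draws <- numeric(n.sims)       # pre-allocated
  for (i in seq_len(n.sims)) {
    draws[i] <- theta.cur <- theta.update(theta.cur)
  }
  draws[(burnin + 1):n.sims]     # drop the burn-in samples
}

# Hypothetical target: standard normal without its 1/sqrt(2*pi) factor
samp.fcn <- function(x) exp(-x^2 / 2)
# Hypothetical symmetric jump: normal centered at the current sample
jump.fcn <- function(x) rnorm(1, mean = x, sd = 1)

set.seed(2015)                   # arbitrary seed
mh.draws <- mh(20000, start = 1, burnin = 2000,
               samp = samp.fcn, jump = jump.fcn)
print(c(mean = mean(mh.draws), sd = sd(mh.draws)))
```

If the chain is behaving, the printed mean and standard deviation should land near 0 and 1, the values for the standard normal target, even though the sampler never sees the normalizing factor.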

Metropolis-Hastings: Parallelize

#start up and initialize the cluster
cl <- makeCluster(ncores, type = 'MPI')
clusterSetupRNG(cl, type = 'RNGstream')
#samples per core
mh.n.sims.cl <- ceiling(mh.n.sims / ncores)
#call mh on each core
mh.draws.cl <- clusterCall(cl, mh, mh.n.sims.cl, start = 1,
                           burnin = mh.burnin, samp = samp.fcn, jump = jump.fcn)
#reduce list to 1-D
mh.draws <- unlist(mh.draws.cl)
#stop the cluster
stopCluster(cl)
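
A self-contained sketch of the same pattern, assuming base R's parallel package and a local socket cluster in place of snow with MPI. The target and jumping functions, the sizes, and the seed are illustrative assumptions, not values from the talk. Every worker runs its own chain and discards its own burn-in, so the number of samples returned is ncores * (mh.n.sims.cl - mh.burnin).

```r
library(parallel)

# mh() with theta.update() nested, so workers receive everything they need
mh <- function(n.sims, start, burnin, samp, jump) {
  theta.update <- function(theta.cur) {
    theta.can <- jump(theta.cur)
    accept.prob <- samp(theta.can) / samp(theta.cur)
    if (stats::runif(1) <= accept.prob) theta.can else theta.cur
  }
  theta.cur <- start
  draws <- numeric(n.sims)
  for (i in seq_len(n.sims)) {
    draws[i] <- theta.cur <- theta.update(theta.cur)
  }
  draws[(burnin + 1):n.sims]
}

# Hypothetical target (unnormalized standard normal) and symmetric jump
samp.fcn <- function(x) exp(-x^2 / 2)
jump.fcn <- function(x) stats::rnorm(1, mean = x, sd = 1)

ncores    <- 2                              # assumption: two local workers
mh.n.sims <- 20000                          # arbitrary total chain length
mh.burnin <- 1000                           # arbitrary burn-in per chain
mh.n.sims.cl <- ceiling(mh.n.sims / ncores) # samples per core

cl <- makeCluster(ncores)                   # socket cluster, not MPI
mh.draws.cl <- tryCatch({
  clusterSetRNGStream(cl, iseed = 2015)     # separate RNG stream per worker
  clusterCall(cl, mh, mh.n.sims.cl, start = 1, burnin = mh.burnin,
              samp = samp.fcn, jump = jump.fcn)
}, finally = stopCluster(cl))

mh.draws <- unlist(mh.draws.cl)             # reduce list to 1-D
print(length(mh.draws))
print(c(mean = mean(mh.draws), sd = sd(mh.draws)))
```

Setting up separate random number streams matters here: without it, workers could start from correlated or identical generator states and the pooled chains would not be independent.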

Conclusions  

R on ARC's Systems

• R 3.0.3, 2.14.1:
  – Each R build comes with Rmpi and Snow
      module load intel R openmpi
• R 3.2.0:
  – Built with rlecuyer and ggplot2
  – Plotting via cairo (offline), X11 (interactive)
  – Parallel packages (Rmpi, snow, snowfall, pbdR) built into R-parallel module
      module load intel R/3.2.0 openmpi hdf5 netcdf R-parallel/3.2.0

Getting Started on ARC Systems

• Request an account (anyone with a VT PID): http://www.arc.vt.edu/account
  – Can also request for external collaborators
• Request a system unit allocation: http://www.arc.vt.edu/allocations

References

• Snow Manual: http://cran.r-project.org/web/packages/snow/snow.pdf
• Snow Functions: http://www.sfu.ca/~sblay/R/snow.html
• ARC's R page: http://www.arc.vt.edu/r
• Course Slides: http://www.arc.vt.edu/?class_note=parallel-r-i-snow

Questions?