Parallel R - Virginia Tech
Bob Settlage
Feb 14, 2018
Parallel R
Today's Agenda
· Introduction
  - R and parallel R on ARC's systems
· Brief aside: optimizing R
· Snow
· Rmpi
· pbdR (more brief)
· Conclusions
2/48
R
· Programming language and environment for statistical computing
· Free
· Intrinsic support for a wide array of statistical functionality
· Huge number of user-created packages to add or improve functionality
3/48
Introduction
If you need your code to go faster, you have a few options:
· Be a more efficient programmer
  - use vector/matrix operations
  - remove redundant operations
  - avoid memory copy / preallocate
· Port your code to C/C++/Fortran
  - full Monte (.C or .Call)
  - Rcpp
· Parallelize, i.e. use more cores
  - parallel packages
  - MPI
  - GPU…
4/48
An aside: Optimizing R
· Pre-allocate variables
· Vectorize (or perhaps use apply functions)
  - YES: z <- x * y
  - NO: for (i in 1:length(x)) { z[i] <- x[i] * y[i] }
· Reference: The R Inferno http://www.burns-stat.com/documents/books/the-r-inferno/
5/48
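Both tips can be seen in a quick timing sketch (pure base R; the exact timings vary by machine, but the vectorized form is fastest and all three give the same result):

```r
n <- 1e5
x <- runif(n)
y <- runif(n)

# NO: grow z inside a loop -- R must repeatedly resize the vector
z1 <- c()
t.loop <- system.time(for (i in 1:n) z1[i] <- x[i] * y[i])

# Better: pre-allocate z to its full length before the loop
z2 <- numeric(n)
t.prealloc <- system.time(for (i in 1:n) z2[i] <- x[i] * y[i])

# YES: vectorize -- one call, no explicit loop at the R level
t.vec <- system.time(z3 <- x * y)

stopifnot(all.equal(z1, z2), all.equal(z2, z3))
```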
The Need for Parallelism
6/48
The Need for Parallelism: transistor count (plot)
7/48
Parallelism in R
· Serial by default
  - Exception: matrix operations using BLAS (ARC systems)
· Embarrassing parallelism (e.g. Monte Carlo):
  - snow
  - snowfall
· More advanced:
  - Rmpi
  - pbdR
8/48
Today's Agenda
· Introduction
  - R and parallel R on ARC's systems
· Brief aside: optimizing R
· Snow
· Rmpi
· pbdR (more brief)
· Conclusions
9/48
R on ARC's Systems
http://www.arc.vt.edu/
· Includes ggplot2, rlecuyer, plyr, and several other packages
· Built with OpenBLAS for GCC and MKL
  - Use MKL_NUM_THREADS or OPENBLAS_NUM_THREADS to control threading
· Plotting via Cairo (offline) and X11 (interactive)
· Parallel packages provided as part of a separate R-parallel module built against OpenMPI
10/48
R libs installed on ARC
11/48
Getting started on ARC Systems
· Request an account: http://www.arc.vt.edu/account
· Request a system unit allocation: http://www.arc.vt.edu/allocations
· R documentation: http://www.arc.vt.edu/r
· These examples and more: https://secure.hosting.vt.edu/www.arc.vt.edu/userguide/r/#examples
12/48
Snow
NOTE: this is being replaced by the parallel package
· Simple Network of Workstations (SNOW)
· For embarrassingly parallel tasks
· Master/slave model
13/48
Snow: Start/Stop cluster
library(snow)
library(parallel)
library(Rmpi)  # for reference later, not part of SNOW

## start a cluster with ncores
ncores = 5
cl <- makeCluster(ncores, type = "MPI")
# Initialize RNG
clusterSetupRNG(cl, type = "RNGstream")
# VERY IMPORTANT, STOP the cluster when finished
stopCluster(cl)
14/48
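Since snow is being folded into the parallel package, the same start/stop lifecycle works without MPI on a single machine using a socket (PSOCK) cluster. A sketch using only the parallel package (the slides use type = "MPI" on ARC's systems):

```r
library(parallel)

# makeCluster() defaults to a PSOCK (socket) cluster -- no MPI required
ncores <- 2
cl <- makeCluster(ncores)
# parallel's reproducible RNG setup (analogous to snow's clusterSetupRNG)
clusterSetRNGStream(cl, iseed = 42)
# run the same call on every worker
ans <- clusterCall(cl, function() 1 + 1)
# VERY IMPORTANT, STOP the cluster when finished
stopCluster(cl)
```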
Snow: Computing
· Calls same function across cluster
  - clusterCall(cl, fun, …)
· Parallel versions of apply:
  - clusterApply(cl, x, fun, …)
  - parApply(cl, X, MARGIN, FUN, …)
  - parLapply(cl, x, fun, …)
  - parRapply(cl, x, fun, …)
  - parCapply(cl, x, fun, …)
15/48
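A minimal demonstration of these apply variants, using a socket cluster from the parallel package so it runs without an MPI installation (the slides use makeCluster(ncores, type = "MPI") on ARC):

```r
library(parallel)
cl <- makeCluster(2)

# parLapply: list/vector in, list out -- square each element
sq <- parLapply(cl, 1:4, function(x) x^2)

# parApply: apply over rows (MARGIN = 1) of a matrix -- row sums
m <- matrix(1:6, nrow = 2)
row.sums <- parApply(cl, m, 1, sum)

# clusterCall: run the same call on every worker
ids <- clusterCall(cl, Sys.getpid)

stopCluster(cl)
```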
SNOW: simple example
library(snow)
library(parallel)
library(Rmpi)  # for reference later, not part of SNOW

## start a cluster with ncores
ncores = 5
cl <- makeCluster(ncores, type = "MPI")
# Initialize RNG
clusterSetupRNG(cl, type = "RNGstream")
clusterApply(cl, 1:2, get("+"), 3)
xx <- 1
clusterExport(cl, "xx")
clusterCall(cl, function(y) xx + y, 2)
# VERY IMPORTANT, STOP the cluster when finished
stopCluster(cl)
16/48
Example: Monte Carlo pi
· The ratio of the area of the unit circle to the area of the unit square is π/4
· SO:
  - Randomly pick S points in the unit square
  - Count the number in the unit circle (C)
  - Then π ≈ 4 C/S
17/48
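The idea in plain serial R before parallelizing (a sketch; S = 1e5 points gives roughly two correct digits of π):

```r
set.seed(1)
S <- 1e5
x <- runif(S)
y <- runif(S)
# C = number of points falling inside the unit circle
C <- sum(x^2 + y^2 <= 1)
pi.approx <- 4 * C / S  # approximately pi
```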
MC π: code
# generate n.pts (x,y) points in the unit square, determine if they are in
# the unit circle, return the proportion of points in the unit circle * 4
mcpi <- function(n.pts) {
    m = matrix(runif(2 * n.pts), n.pts, 2)
    in.ucir = function(x) {
        as.integer((x[1]^2 + x[2]^2) <= 1)
    }
    cir = apply(m, 1, in.ucir)
    return(4 * mean(cir))
}
18/48
MC π: parallelize
# start up and initialize the cluster
cl <- makeCluster(ncores, type = "MPI")
clusterSetupRNG(cl, type = "RNGstream")
# determine if points are in the unit circle
system.time({
    cir = parSapply(cl, seq(from = 1000, to = 20000, by = 1000), mcpi)  # calculate pi
})
pi.approx = mean(cir)
print(pi.approx)
# stop the cluster
stopCluster(cl)
19/48
MC compute time via SNOW
20/48
MC compute error via SNOW
21/48
MC π: An Optimization example
n.pts <- 5e+05
m <- matrix(runif(2 * n.pts), n.pts, 2)
in.ucir <- function(x) {
    as.integer((x[1]^2 + x[2]^2) <= 1)
}
system.time(apply(m, 1, in.ucir))
system.time(as.integer(m[, 1]^2 + m[, 2]^2 <= 1))
system.time(cir <- parSapply(cl, rep(10000, 500), mcpi))
22/48
MCMC: Metropolis-Hastings
· Goal: draw random samples whose density approximates a given distribution
· Used to model stochastic inputs
  - Functions in high dimensions
· Do not need to know the normalizing factor
23/48
MCMC: Metropolis-Hastings (cont)
· Given a:
  - target distribution
  - jumping distribution
  - initial sample
· Choose a candidate sample from the jumping distribution centered at the initial sample
· Accept the candidate as the new sample:
  - always, if the candidate is a better fit (per the target dist)
  - with prob < 1, if the candidate is a worse fit
· Repeat
24/48
M-H: Code (MC part)
Reference: Lam, Patrick. "MCMC Methods: Gibbs Sampling andthe Metropolis-Hastings Algorithm."
# function to calculate the next candidate sample
theta.update <- function(theta.cur) {
    theta.can <- jump(theta.cur)
    # acceptance probability
    accept.prob <- samp(theta.can)/samp(theta.cur)
    # compare with sample from uniform dist (0 to 1)
    if (runif(1) <= accept.prob) theta.can else theta.cur
}
25/48
M-H: code
# function to generate (n.sims - burnin) samples
mh <- function(n.sims, start, burnin, samp, jump) {
    theta.cur <- start
    draws <- c()
    # call theta.update() n.sims times
    for (i in 1:n.sims) {
        draws[i] <- theta.cur <- theta.update(theta.cur)
    }
    # return the samples after the burn-in
    return(draws[(burnin + 1):n.sims])
}
26/48
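A self-contained usage sketch combining the two M-H slides. The samp and jump functions here are illustrative choices only (an unnormalized standard normal target and a normal jumping distribution, not from the slides), and the mh() arguments are trimmed since theta.update() reads samp and jump from the enclosing environment:

```r
# target distribution, unnormalized: standard normal with constant dropped
samp <- function(x) exp(-x^2 / 2)
# jumping distribution: normal centered at the current sample
jump <- function(x) rnorm(1, mean = x, sd = 1)

theta.update <- function(theta.cur) {
    theta.can <- jump(theta.cur)
    accept.prob <- samp(theta.can) / samp(theta.cur)
    if (runif(1) <= accept.prob) theta.can else theta.cur
}

mh <- function(n.sims, start, burnin) {
    theta.cur <- start
    draws <- numeric(n.sims)
    for (i in 1:n.sims) {
        draws[i] <- theta.cur <- theta.update(theta.cur)
    }
    draws[(burnin + 1):n.sims]
}

set.seed(42)
draws <- mh(5000, start = 0, burnin = 1000)
mean(draws)  # near 0 for a standard normal target
```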
M-H
27/48
M-H
28/48
M-H: parallelize
# start up and initialize the cluster
cl <- makeCluster(ncores, type = "MPI")
clusterSetupRNG(cl, type = "RNGstream")
# samples per core
mh.n.sims.cl <- ceiling(mh.n.sims/ncores)
# call mh on each core
mh.draws.cl <- clusterCall(cl, mh, mh.n.sims.cl, start = 1, burnin = mh.burnin,
    samp = samp.fcn, jump = jump.fcn)
# reduce list to 1-D
mh.draws <- unlist(mh.draws.cl)
# stop the cluster
stopCluster(cl)
29/48
M-H
30/48
M-H
31/48
SNOW References
· Snow Manual: http://cran.r-project.org/web/packages/snow/snow.pdf
· Snow Functions: http://www.sfu.ca/~sblay/R/snow.html
· ARC: http://www.arc.vt.edu/r
32/48
MPI
33/48
MPI: Program Models
· "Brute Force": Decompose problem
· "Task Push": Master creates a list of tasks and sends them to slaves in round-robin fashion
· "Task Pull": Slaves report to master when finished, receive new tasks
Examples: http://cran.r-project.org/web/packages/pbdMPI/vignettes/pbdMPI-guide.pdf
34/48
Rmpi
· User-developed package
· Interface to MPI for R
  - Master/slave paradigm
· Allows parallelism beyond embarrassingly parallel (e.g. SNOW)
· Provided as part of ARC R module
35/48
Rmpi: Starting and Stopping
· Load library: library(Rmpi)
· Spawn nsl slaves: mpi.spawn.Rslaves(nslaves = nsl)
· Shut down slaves (IMPORTANT): mpi.close.Rslaves()
· Clean up and quit R: mpi.quit()
36/48
Rmpi basics
· Run an Rmpi script like any other R script: Rscript mcpi_rmpi.r
· Get the number of processes (the number of slaves + 1): mpi.comm.size()
· Get the rank of a process: mpi.comm.rank()
  - Master: 0
  - Slave: 1+
37/48
Rmpi: Executing Remotely
# Execute on the master:
paste("I am", mpi.comm.rank(), "of", mpi.comm.size())
[1] "I am 0 of 3"
# Execute Rcommand on the slaves:
mpi.bcast.cmd(Rcommand)
# Execute on the slaves and return to master (returns nslaves-length list):
result <- mpi.remote.exec(Rcommand)
38/48
Rmpi: Hello World
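The hello-world code appeared as an image in the original slides; a plausible sketch along the lines of the surrounding slides (requires an MPI installation and mpirun, so no output is shown here):

```r
library(Rmpi)

# Spawn two slaves
mpi.spawn.Rslaves(nslaves = 2)

# Each slave reports its rank and the communicator size
mpi.remote.exec(paste("Hello from rank", mpi.comm.rank(), "of", mpi.comm.size()))

# VERY IMPORTANT: shut down slaves and quit cleanly
mpi.close.Rslaves()
mpi.quit()
```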
39/48
Rmpi communications
· Broadcast a function or variable from master to slaves: mpi.bcast.Robj2slave(object)
· Send an object to a destination: mpi.send.Robj(object, destination, tag)
· Receive a sent message: recv <- mpi.recv.Robj(mpi.any.source(), mpi.any.tag())
· Get source and tag from the received message: recv.info <- mpi.get.sourcetag()
40/48
Rmpi Example: Pass messages
# Function to pass message to next slave
message.pass <- function() {
    # Get each slave's rank
    myrank <- mpi.comm.rank()
    # Get partner slave's rank (some hackery to avoid master)
    otherrank <- (myrank + 1)%%mpi.comm.size()
    otherrank <- otherrank + (otherrank == 0)
    # Send a message to the partner
    mpi.send.Robj(paste("I am rank", myrank), dest = otherrank, tag = myrank)
    # Receive the message & tag (includes source)
    recv.msg <- mpi.recv.Robj(mpi.any.source(), mpi.any.tag())
    recv.tag <- mpi.get.sourcetag()
    paste("Received message '", recv.msg, "' from process ", recv.tag[1], "\n", sep = "")
}
41/48
Rmpi: other communication functions
· Low-level:
  - Send: mpi.send()
  - Receive: mpi.recv()
· Advanced:
  - Scatter: mpi.scatter()
  - Gather: mpi.gather()
  - Reduce: mpi.reduce()
42/48
Rmpi example: MC π part 1
# Function to calculate whether a point is in the unit circle
in.ucir <- function(x) {
    as.integer((x[, 1]^2 + x[, 2]^2) <= 1)
}
# Function to generate n.pts random points in the unit square and count the
# number in the unit circle
count.in.cir <- function(n.pts) {
    # Create a list of n.pts random (x,y) pairs
    m <- matrix(runif(n.pts * 2), n.pts, 2)
    # Determine whether each point is in unit circle
    in.cir <- in.ucir(m)
    # Count the points in the unit circle
    return(sum(in.cir))
}
# Send variables and functions to slaves
mpi.bcast.Robj2slave(n.pts)
mpi.bcast.Robj2slave(in.ucir)
mpi.bcast.Robj2slave(count.in.cir)
43/48
Rmpi example: MC π part 2
# Call count.in.cir() on slaves
mpi.bcast.cmd(n.in.cir <- count.in.cir(n.pts))
# Call count.in.cir() on master
n.in.cir <- count.in.cir(n.pts)
# Use mpi.reduce() to total across all processes. Have to do two steps
# (slaves, master) to avoid hang
mpi.bcast.cmd(mpi.reduce(n.in.cir, type = 1, op = "sum"))
n.in.cir <- mpi.reduce(n.in.cir, type = 1, op = "sum")
# pi is roughly 4 * proportion of points in the circle
pi.approx <- 4 * n.in.cir/(mpi.comm.size() * n.pts)
44/48
Rmpi: MC π
Notes:
· Generate and analyze data in each process
  - minimize size of messages
  - minimize frequency of message passing
· Use mpi.reduce() to sum up results
45/48
pbdR
· "Programming with Big Data in R"
· Designed for HPC
46/48
pbdR: Components
· MPI: pbdMPI, an MPI SPMD-style interface
· Distributed linear algebra and statistics:
  - pbdSLAP
  - pbdBASE
  - pbdDMAT
· pbdNCDF4: interface to NetCDF4 file formats
· pbdML: machine learning
· Profiling: pbdPROF, pbdPAPI, hpcvis
47/48
pbdMPI example:
Looks like a normal mpi call externally:

mpirun -np 8 Rscript mcpi_pbdr.r

library(pbdMPI, quiet = TRUE)
init()
n.pts <- 1e+06
in.ucir <- function(x) {
    as.integer((x[, 1]^2 + x[, 2]^2) <= 1)
}
count.in.cir <- function(n.pts) {
    m <- matrix(runif(n.pts * 2), n.pts, 2)
    in.cir <- in.ucir(m)
    return(sum(in.cir))
}
# Call count.in.cir on each process
n.in.cir <- count.in.cir(n.pts)
# Use reduce() to total across processes
n.in.cir <- reduce(n.in.cir, op = "sum")
pi.approx <- 4 * n.in.cir/(comm.size() * n.pts)
finalize()
48/48