the state of hpc in the open source r ecosystem · a closer look at hpc and r \olcf researchers...

57
The State of HPC in the Open Source R Ecosystem Drew Schmidt November 12, 2016 Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem

Upload: others

Post on 27-Apr-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

The State of HPC in the Open Source R Ecosystem

Drew Schmidt

November 12, 2016

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem

Page 2: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Support and Disclaimer

This material is based upon work supported by the National Science Foundation Division ofMathematical Sciences under Grant No. 1418195.

The findings and conclusions in this presentation have not been formally disseminated by theU.S. Department of Health & Human Services nor by the U.S. Department of Energy, andshould not be construed to represent any determination or policy of University, Agency,Administration and National Laboratory.

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem

Page 3: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Speaker Bio

M.S. in mathematics.

Former statistics consultant.

Former full-time university researcher.

Now a miserable grad student.

Prolific complainer on twitter.

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem

Page 4: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Goals of This Talk

Convince you that R has a legitimate place in HPC.

Give a broad overview of the R package landscape.

Make some very safe predictions.

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem

Page 5: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Contents

1 Background and Motivation

2 A Little History

3 Packages

4 A Closer Look at HPC and R

5 Concluding Remarks

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem

Page 6: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Background and Motivation

1 Background and MotivationR Is WeirdR Is Popular

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem

Page 7: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Background and Motivation R Is Weird

1 Background and MotivationR Is WeirdR Is Popular

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem

Page 8: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Background and Motivation R Is Weird

Types

logical (“boolean”)

integer (32-bit int)

numeric (double)

complex (double complex)

character (string)

Also raw and external pointer

Data Structures

Vectors (matrices, n-dim arrays)

Lists (arrays of pointers)

Dataframes (lists with constraints)

Environments (hash tables?!)

That’s it.

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 1/39

Page 9: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Background and Motivation R Is Weird

Happy Opposite Day!

1 T

2 ## [1] TRUE

3 F

4 ## [1] FALSE

5

6 T <- FALSE

7 F <- TRUE

8

9 T

10 ## [1] FALSE

11 F

12 ## [1] TRUE

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 2/39

Page 10: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Background and Motivation R Is Weird

Odd Conventions

. has no semantic meaning (except when it does

t.test()

t.data.frame()

A package is installed in a library.

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 3/39

Page 11: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Background and Motivation R Is Weird

Package or Library?

I wrote a library.

I put that library into a package.

I installed the package . . . into a library.

I load the package with library() ???

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 4/39

Page 12: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Background and Motivation R Is Weird

*BOOM*

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 5/39

Page 13: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Background and Motivation R Is Popular

1 Background and MotivationR Is WeirdR Is Popular

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem

Page 14: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Background and Motivation R Is Popular

Part Programming Language, Part Data Analysis Package

“R is a shockingly dreadful language for an exceptionally useful data analysis environment.”— Tim Smith, from aRrgh: a newcomer’s (angry) guide to R.

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 6/39

Page 15: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Background and Motivation R Is Popular

IEEE Spectrum’s 2014 Ranking of Programming Languages

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 7/39

Page 16: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Background and Motivation R Is Popular

IEEE Spectrum’s 2016 Ranking of Programming Languages

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 8/39

Page 17: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Background and Motivation R Is Popular

Rexer 2015 data scientist survey

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 9/39

Page 18: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Background and Motivation R Is Popular

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 10/39

Page 19: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Background and Motivation R Is Popular

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 11/39

Page 20: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Background and Motivation R Is Popular

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 12/39

Page 21: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Background and Motivation R Is Popular

Why use R at all?

Most diverse set of statistical methods available.

Rapid prototyping.

CRAN (and increasingly GitHub) packages.

Awesome community.

Syntax is designed for analysis of data.

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 13/39

Page 22: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

A Little History

2 A Little HistoryStatistics, Data Science, Big Data, and So OnEnter R

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem

Page 23: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

A Little History Statistics, Data Science, Big Data, and So On

2 A Little HistoryStatistics, Data Science, Big Data, and So OnEnter R

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem

Page 24: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

A Little History Statistics, Data Science, Big Data, and So On

HPC: Not Just for PDE’S Anymore!

R’s use in HPC.

No traditional HPC. . .

Lots of interesting work

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 14/39

Page 25: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

A Little History Statistics, Data Science, Big Data, and So On

About Traditional HPC. . .

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 15/39

Page 26: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

A Little History Statistics, Data Science, Big Data, and So On

Changing Landscape of HPC

“non-traditional” HPC: everybody but physics.

What kind of software do they need?

Can we leverage any existing HPC stuff?

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 16/39

Page 27: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

A Little History Statistics, Data Science, Big Data, and So On

Problems with ”Big Data“ Software

Many frameworks; what do they all do?

Don’t always play nice with HPC systems.

Often not as ”high level“ as advertised.

Almost exclusively batch!

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 17/39

Page 28: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

A Little History Statistics, Data Science, Big Data, and So On

Data Analysis Is An Interactive Activity

Data analysis is an interactive activitya

aData analysis is an interactive activity

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 18/39

Page 29: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

A Little History Statistics, Data Science, Big Data, and So On

Data science in action

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 19/39

Page 30: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

A Little History Enter R

2 A Little HistoryStatistics, Data Science, Big Data, and So OnEnter R

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem

Page 31: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

A Little History Enter R

http://datascience.la/john-chambers-user-2014-keynote/

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 20/39

Page 32: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Packages

3 PackagesAdvanced Compute PackagesHPC PackagesHadoop and ApplicationsOk, So What?

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem

Page 33: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Packages

Where to Begin?

Many packages of varying scope and quality.

1 core package (parallel)

Over 100 contributed packageshttps://cran.r-project.org/web/views/HighPerformanceComputing.html

Even more on GitHub.

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 21/39

Page 34: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Packages

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 22/39

Page 35: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Packages Advanced Compute Packages

3 PackagesAdvanced Compute PackagesHPC PackagesHadoop and ApplicationsOk, So What?

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem

Page 36: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Packages Advanced Compute Packages

Out of Core Packages

ff, bigmemory and friends

R is very “copy happy”

Many statisticians don’t know about things like XSEDE.

Others hear “Linux” and run away screaming.

Bizarrely, cloud computing is changing this.

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 23/39

Page 37: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Packages Advanced Compute Packages

Rcpp

Rcpp

RcppArmadillo, RcppEigen

RcppParallel

. . .

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 24/39

Page 38: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Packages HPC Packages

3 PackagesAdvanced Compute PackagesHPC PackagesHadoop and ApplicationsOk, So What?

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem

Page 39: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Packages HPC Packages

Accelerator Packages

gputools, Magma, HiPLARM, a few others.

Accessibility mostly from things like nvblas and Intel MKL.

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 25/39

Page 40: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Packages HPC Packages

Distributed Packages

Rmpi

snow

pbdMPI and friends

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 26/39

Page 41: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Packages HPC Packages

Remote Evaluation Packages

rzmq, pbdZMQ

remoter, future

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 27/39

Page 42: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Packages Hadoop and Applications

3 PackagesAdvanced Compute PackagesHPC PackagesHadoop and ApplicationsOk, So What?

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem

Page 43: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Packages Hadoop and Applications

Hadoop et al Packages

RHadoop, RHIPE

SparkR

sparklyr

h2o

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 28/39

Page 44: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Packages Hadoop and Applications

“Applications”

dplyr and data.table

caret

randomForest

xgboost

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 29/39

Page 45: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Packages Ok, So What?

3 PackagesAdvanced Compute PackagesHPC PackagesHadoop and ApplicationsOk, So What?

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem

Page 46: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Packages Ok, So What?

Is the R community using this stuff?

Short answer: yes.

Long answer: mostly single-node parallelism.

Hard truth: in addition to hype and buzzwords — fear and distrust

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 30/39

Page 47: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Packages Ok, So What?

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 31/39

Page 48: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Packages Ok, So What?

Source https://twitter.com/eddelbuettel/status/787740983433854977

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 32/39

Page 49: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

A Closer Look at HPC and R

4 A Closer Look at HPC and R

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem

Page 50: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

A Closer Look at HPC and R

HPC may be dying, but we’re behind the times

0

50

100

150

2014 2015 2016 2017Date

Pac

kage

Dow

nloa

d M

arke

tsha

repackage

pbdMPI

Rmpi

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 33/39

Page 51: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

A Closer Look at HPC and R

“OLCF Researchers Scale R to Tackle Big Science Data Sets”

A problem that takes several hours on Apache Spark[was analyzed] in less than a minute using R on OLCFhigh-performance hardware.

“. . . for situations where one needs interactivenear-real-time analysis, the pbdR approach is muchbetter.”

https://www.hpcwire.com/2016/07/06/

olcf-researchers-scale-r-tackle-big-science-data-sets/

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 34/39

Page 52: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

A Closer Look at HPC and R

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 35/39

Page 53: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

A Closer Look at HPC and R

Interconnection Network

PROC + cache

PROC + cache

PROC + cache

PROC + cache

Mem Mem Mem Mem

Distributed Memory

Memory

CORE + cache

CORE + cache

CORE + cache

CORE + cache

Network

Shared Memory Local Memory

Co-Processor

GPU: Graphical Processing Unit

MIC: Many Integrated Core

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 36/39

Page 54: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

A Closer Look at HPC and R

Local Memory

Co-Processor

GPU: Graphical Processing Unit

MIC: Many Integrated Core

Interconnection Network

PROC + cache

PROC + cache

PROC + cache

PROC + cache

Mem Mem Mem Mem

Distributed Memory

Memory

CORE + cache

CORE + cache

CORE + cache

CORE + cache

Network

Shared Memory

Trilinos

PETSc

PLASMA

DPLASMALibSci (Cray) MKL (Intel)

ScaLAPACK PBLAS BLACS

cuBLAS (NVIDIA)

MAGMA

PAPI

Tau

MPImpiP

fpmpi

NetCDF4

ADIOS

pbdMPI

pbdPAPI

pbdNCDF4

pbdADIOS

pbdPROF pbdPROF pbdPROF

ACML (AMD)

pbdDEMO

CombBLAS

cuSPARSE (NVIDIA)

pbdDMATpbdDMATpbdDMATpbdDMAT

pbdBASE pbdSLAP

HiPLARHiPLARM

magma

ZeroMQ pbdCS

Profiling

I/O

Learning

Released Under DevelopmentSlides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 37/39

Page 55: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Concluding Remarks

5 Concluding Remarks

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem

Page 56: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Concluding Remarks

The Future?

Better dplyr backends.

More threading + accelerator usage in packages (Rcpp + RcppParallel).

Astronomical amounts of buzz in the Haddop/Spark-and-friends space — will ultimatelyhurt us in the MPI space.

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 38/39

Page 57: The State of HPC in the Open Source R Ecosystem · A Closer Look at HPC and R \OLCF Researchers Scale R to Tackle Big Science Data Sets" A problem that takes several hours on Apache

Concluding Remarks

∼Thanks!∼

Questions?

Email: [email protected]

GitHub: https://github.com/wrathematics

Web: http://wrathematics.info

Twitter: @wrathematics

Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem 39/39