Introducing Revolution R Open: Enhanced, Open Source R Distribution from Revolution Analytics

Introducing Revolution R Open The Enhanced R Distribution November 12, 2014

Upload: revolution-analytics

Post on 02-Jul-2015

Page 1: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

Introducing

Revolution R Open The Enhanced R Distribution

November 12, 2014

Page 2: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

In today’s webinar:

R Update

Revolution R Open

The Reproducible R Toolkit

MRAN

Other open-source projects

• DeployR Open

• ParallelR

• RHadoop

Revolution R Plus

Q&A

David Smith Chief Community Officer

Revolution Analytics @revodavid

[email protected]

Editor, blog.revolutionanalytics.com

Co-author, “Introduction to R”

Page 3: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

OUR COMPANY

The leading provider of advanced analytics software and services based on open source R, since 2007

OUR PRODUCT

REVOLUTION R: The enterprise-grade predictive analytics application platform based on the R language

SOME KUDOS

Visionary, Gartner Magic Quadrant for Advanced Analytics Platforms, 2014

Page 4: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

What is R?

Most widely used data analysis software

• Used by 2M+ data scientists, statisticians and analysts

Most powerful statistical programming language

• Flexible, extensible and comprehensive for productivity

Create beautiful and unique data visualizations

• As seen in New York Times, Twitter and Flowing Data

Thriving open-source community

• Leading edge of analytics research

Fills the Data Science talent gap

• New graduates prefer R

www.revolutionanalytics.com/what-is-r

Page 5: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

Poll #1

What software do you use for statistical analysis? (Select all that apply.)

R

SAS

SPSS

Python

Other

Page 6: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

R’s popularity is growing rapidly

More at blog.revolutionanalytics.com/popularity

[Chart: R Usage Growth, Rexer Data Miner Survey, 2007-2013]

[Chart: Language Popularity, IEEE Spectrum Top Programming Languages, July 2014. R is #9]

Page 7: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

Revolution R Open is:

Enhanced Open Source R distribution

Compatible with all R-related software

Multi-threaded for performance

Focus on reproducibility

Open source (GPLv2 license)

Available for Windows, Mac OS X, Ubuntu, Red Hat and OpenSUSE

Download from mran.revolutionanalytics.com

Page 8: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

Multi-threaded performance

Intel MKL replaces standard BLAS/LAPACK algorithms

Pipelined operations

– Optimized for Intel, works for all archs

High-performance algorithms

Sequential Parallel

– Uses as many threads as there are available cores

– Control with: setMKLthreads(<value>)

No need to change any R code

Included in RRO binary distribution

More at Revolutions blog
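
A rough way to see the effect described on this slide is to time a BLAS-bound operation. The sketch below uses only base R. The call to setMKLthreads() is guarded because that function is only expected to exist in a Revolution R Open session with MKL; the helper name time_crossprod and the matrix size are illustrative assumptions, not part of RRO.

```r
# Time a BLAS-bound operation (matrix cross-product).
# time_crossprod is a hypothetical helper name, not part of RRO.
time_crossprod <- function(n = 300, seed = 1) {
  set.seed(seed)
  x <- matrix(rnorm(n * n), nrow = n)
  elapsed <- system.time(xtx <- crossprod(x))[["elapsed"]]
  list(elapsed = elapsed, dim = dim(xtx), symmetric = isSymmetric(xtx))
}

# setMKLthreads() is assumed to be defined only when running RRO with MKL,
# so check for it before calling; plain R takes the else branch.
if (exists("setMKLthreads")) {
  setMKLthreads(1)                  # force single-threaded MKL
  one_thread <- time_crossprod()
  setMKLthreads(2)                  # allow two threads
  two_threads <- time_crossprod()
  print(c(one = one_thread$elapsed, two = two_threads$elapsed))
} else {
  res <- time_crossprod()
  print(res$elapsed)
}
```

The R code being timed is identical in both branches, which is the point of the slide: only the thread setting changes, not the script.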

Page 9: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

100% Compatibility

Built on latest R engine

– Currently R 3.1.1, R 3.1.2 in testing

100% compatible with

– R scripts

– R packages

– Applications with R connections

Designed to work with RStudio

– No configuration required

Replaces existing R application

– Side-by-side installations

Page 10: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

Reproducibility – why do we care?

Academic / Research

Verify results

Advance Research

Business

Production code

Reliability

Reusability

Collaboration

Regulation

www.nytimes.com/2011/07/08/health/research/08genes.html

http://arxiv.org/pdf/1010.1092.pdf

Page 11: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

An R Reproducibility Problem

Adapted from http://xkcd.com/234/ CC BY-NC 2.5

Page 12: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

Reproducible R Toolkit in RRO

Static CRAN mirror

– CRAN packages fixed with each Revolution R Open update

Daily CRAN snapshots

– Storing every package version since September 2014

– Binaries and sources

– At mran.revolutionanalytics.com/snapshot

Easily write and share scripts synced to a specific snapshot

– “checkpoint” package installed with RRO

[Diagram: CRAN; CRAN mirror (http://cran.revolutionanalytics.com/); checkpoint server taking daily snapshots at midnight UTC (http://mran.revolutionanalytics.com/snapshot/); checkpoint package used from R]

library(checkpoint)

checkpoint("2014-09-17")

Page 13: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

Using checkpoint

Easy to use: add 2 lines to the top of each script

library(checkpoint)

checkpoint("2014-09-17")

For the script author:

– Use package versions available on the chosen date

– Installs packages local to this project

• Allows different package versions to be used simultaneously

For a script collaborator:

– Automatically installs required packages

• Detects required packages (no need to manually install!)

– Uses same package versions as script author to ensure reproducibility
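
Conceptually, checkpoint points R at a dated snapshot repository instead of live CRAN. The sketch below shows only that idea in base R, assuming a snapshot is addressed as the base URL from the earlier slide followed by the date. snapshot_url is a hypothetical helper, not a function from the checkpoint package, and the real package additionally detects the packages a script needs and installs them into a project-local library.

```r
# Build the URL of a dated snapshot repository.
# Assumption: snapshots live at <base>/<YYYY-MM-DD>.
snapshot_url <- function(date,
                         base = "http://mran.revolutionanalytics.com/snapshot") {
  d <- as.Date(date, format = "%Y-%m-%d")
  if (is.na(d)) stop("date must be in YYYY-MM-DD form")
  paste(base, format(d, "%Y-%m-%d"), sep = "/")
}

repo <- snapshot_url("2014-09-17")
print(repo)

# Point install.packages() at the snapshot for this session.
# Nothing is downloaded until a package is actually installed.
old <- options(repos = c(CRAN = repo))
print(getOption("repos"))
options(old)  # restore the previous repository setting
```

Fixing the repository to a date is what makes two people running the same script resolve the same package versions.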

Page 14: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

MRAN: The Managed R Archive Network

Download Revolution R Open

Learn about R and RRO

Daily CRAN snapshots

Explore Packages

– and dependencies

Explore Task Views

Page 15: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

Revolution Analytics Open Source Projects

More at projects.revolutionanalytics.com

Page 16: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

DeployR Open

Goal: embed results from R scripts into existing applications, in real time

Problem:

– Exposing arbitrary R functions is unwise

– Need to handle concurrent R sessions

Solution: DeployR Open

– R, on a server, behind a firewall

– Repository Manager defines entry points

• Expose only authorized R functions

– Automatically creates Web Services APIs

– Manages and monitors pool of R sessions

– Separates roles for R and app developer

DeployR Open: for prototyping integrations

– Revolution R Enterprise adds grid-scaling and enterprise authentication

More at deployr.revolutionanalytics.com

Page 17: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

DeployR: Integration

DeployR does not provide any application UI.

3 integration modes embed real-time R results into existing interfaces

Web app, mobile app, desktop app, BI tool, Excel, …

RBroker Framework (tutorial):

Simple, high-performance API for Java, .NET and JavaScript apps

Supports transactional, on-demand analytics on a stateless R session

Client Libraries (tutorial):

Flexible control of R services from Java, .NET and JavaScript apps

Also supports stateful R integrations (e.g. complex GUIs)

DeployR Web Services API:

Integrate R using almost any client language

Page 18: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

Only available in Revolution R Enterprise DeployR

DeployR: Security / Scalability Layers

1. Anonymous execution

– Only authorized, user-defined R functions accessible

– No state preserved

2. Basic username / password authentication

– Managed in DeployR Administration Console

3. Enterprise Authentication

– Verifies identity with SSO / LDAP / Active Directory / PAM

4. Adaptive load-balancing grid

– Ensures service availability

Page 19: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

DeployR Open demo

Fraud detection

Page 20: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

RHadoop and ParallelR

Toolkits for data scientists and numerical analysts to create custom parallel and distributed algorithms

ParallelR: parallel programming for multi-CPU servers and grids

RHadoop: map-reduce programming in the R language

Mainly useful for “embarrassingly parallel” problems, where parallel components work with small amounts of data

Big Data Predictive Analytics is mostly not embarrassingly parallel

80+ pre-built “parallel external memory algorithms” included with Revolution R Enterprise
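
The contrast above can be made concrete. An external memory algorithm processes data one chunk at a time and combines small per-chunk summaries, so the full dataset never has to sit in memory at once. The sketch below illustrates the idea for a mean in base R, with an in-memory vector standing in for chunks read from disk. chunked_mean is a hypothetical name, and this is an illustration of the concept only, not how the Revolution R Enterprise algorithms are implemented.

```r
# Compute a mean from per-chunk (sum, count) summaries.
# Assumes x is non-empty; x stands in for data read chunk by chunk.
chunked_mean <- function(x, chunk_size = 1000) {
  starts <- seq(1, length(x), by = chunk_size)
  # One small summary per chunk; chunks could be processed in parallel
  partial <- lapply(starts, function(s) {
    chunk <- x[s:min(s + chunk_size - 1, length(x))]
    c(sum = sum(chunk), n = length(chunk))
  })
  totals <- Reduce(`+`, partial)   # combine the summaries
  unname(totals["sum"] / totals["n"])
}

x <- as.numeric(1:10000)
print(chunked_mean(x, chunk_size = 333))
print(mean(x))
```

A mean combines trivially; many predictive models need repeated passes and more elaborate combination steps, which is why they are not embarrassingly parallel.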

Page 21: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

RHadoop

Collection of packages for interfacing R and Hadoop

Client (desktop) R interface to Hadoop:

– rhdfs: Browse, read, write and modify files stored in HDFS

– rhbase: Browse, read, write and modify tables stored in HBase

– ravro: Read, write and run map-reduce on Apache Avro files in HDFS

R computations in Hadoop:

– rmr2: write map-reduce tasks in R to run in Hadoop

– plyrmr: R-based data manipulation computations on data in Hadoop

RHadoop Wiki: github.com/RevolutionAnalytics/RHadoop/wiki

Page 22: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

Word count in RHadoop

Map:

– Input: lines of text

– Output: each word as a key, with value 1

Reduce:

– Input: a word (key) with all of its values

– Output: words with counts

Map-Reduce:

– Apply map to lines of text

– Gather like words together and count
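
The map and reduce steps above can be sketched in plain R to show the data flow. This is a single-machine illustration in base R, not rmr2 code and not a Hadoop job; the function names map_line, reduce_word and word_count are hypothetical.

```r
# Map: one line of text -> a (word, 1) pair for every word
map_line <- function(line) {
  words <- strsplit(tolower(line), "[^a-z']+")[[1]]
  words <- words[nzchar(words)]
  data.frame(key = words, value = rep(1, length(words)),
             stringsAsFactors = FALSE)
}

# Reduce: one word plus all of its values -> the word's count
reduce_word <- function(key, values) {
  data.frame(key = key, count = sum(values), stringsAsFactors = FALSE)
}

word_count <- function(lines) {
  pairs  <- do.call(rbind, lapply(lines, map_line))   # apply map to lines
  groups <- split(pairs$value, pairs$key)             # gather like words
  out    <- Map(reduce_word, names(groups), groups)   # count each group
  do.call(rbind, unname(out))
}

print(word_count(c("the quick brown fox", "the lazy dog", "The fox")))
```

In Hadoop the map calls run on different nodes and the gathering step is done by the framework; here split() plays that role.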

Page 23: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

Word count: execution

More: Video replay of “Using R with Hadoop” by Jeffrey Breen

http://bit.ly/W35PLR

Page 24: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

ParallelR

foreach replaces for loops

– Minimal code change required

Choice of parallel backends

– doParallel (base “parallel”)

– doMC (multi-core servers)

– doSNOW (grids)

Iterations run in parallel

– Speedups depend on backend, “granularity”

All iterations run in-memory

birthday <- function(n) {
  m <- 10000                                # number of simulated groups
  x <- numeric(m)
  for (i in 1:m) {
    b <- sample(1:365, n, replace = TRUE)   # n random birthdays
    x[i] <- ifelse(length(unique(b)) == n, 0, 1)
  }
  mean(x)  # est prob of at least 1 match
}

# Sequential version (2-core MacBook Air: 21.9s)
for (j in 1:100) birthday(j)

# Parallel version with foreach (2-core MacBook Air: 12.0s)
library("doMC")   # also attaches foreach
registerDoMC(2)   # use 2 cores
x <- foreach(j = 1:100) %dopar% birthday(j)
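
The same pattern can be tried with only the base “parallel” package, the package the slide lists behind the doParallel backend. The sketch below is an assumption-laden variant rather than the presenter's code: the worker count of 2, the smaller simulation size m, and the 1:50 range are illustrative choices, and any speedup will vary by machine.

```r
library(parallel)

# Estimate the probability that at least two of n people share a birthday
birthday <- function(n, m = 2000) {
  hits <- logical(m)
  for (i in seq_len(m)) {
    b <- sample(1:365, n, replace = TRUE)
    hits[i] <- length(unique(b)) < n   # TRUE when there is a match
  }
  mean(hits)
}

cl <- makeCluster(2)                   # two local worker processes
clusterSetRNGStream(cl, iseed = 42)    # reproducible parallel random numbers
probs <- unlist(parLapply(cl, 1:50, birthday))
stopCluster(cl)

print(round(probs[23], 2))             # classic case: 23 people
```

Each call to birthday() is independent of the others, which is what makes this loop a good fit for running iterations in parallel.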

Page 25: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

Introducing

Revolution R Plus

Page 26: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

Revolution R Plus includes:

AdviseR™ Technical Support for:

– Revolution R Open

• Including R, base and recommended packages

– Reproducible R Toolkit

– ParallelR: Parallel programming with R

– RHadoop: R integration with Hadoop

– DeployR Open: Secure deployment of R to applications

Open Source Assurance for all supported components

– Provides legal indemnity for subscribers

Workstation subscriptions: $1,800 per year

– Server and Hadoop subscriptions also available

Page 27: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

AdviseR™ Technical Support

Technical support for R, from the R experts.

10x5 email and phone support (in your local time zone)

Full support for R, validated packages, and third-party software connections

Notifications of updates and bug fixes

On-line case management and knowledgebase

Access to technical resources, documentation and user forums

Defined service-level agreements for rapid responses

Included with Revolution R Plus and Revolution R Enterprise.

Page 28: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

Open Source Assurance

Revolution Analytics will defend Revolution R Plus subscribers should a third party make an intellectual property claim against covered open source software with respect to:

– copyrights, patents, trademarks, trade secrets

Covered software includes:

– Revolution R Open (incl. R base and recommended packages), Reproducible R Toolkit, DeployR Open, ParallelR, RHadoop

Revolution Analytics will defend open source software in court

– If necessary, Revolution Analytics will obtain rights, modify, or replace software found to be infringing

– If a resolution can’t be found, fees paid in past 12 months will be refunded.

Page 29: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

The Revolution R Product Suite

Revolution R Open

• Free and open source R distribution

• Enhanced and distributed by Revolution Analytics

Revolution R Plus

• Open-source distribution of R, packages, and other components

• Enhanced, supported and indemnified by Revolution Analytics

Revolution R Enterprise

• Secure, Scalable and Supported Distribution of R

• With proprietary components created by Revolution Analytics

Page 30: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

Revolution R Enterprise (RRE)

The All-Inclusive Big Data Big Analytics Platform

Components: DistributedR, DeployR, DevelopR, ScaleR, ConnectR

High-performance open source R plus:

Data source connectivity to big-data objects

Big-data advanced analytics

Multi-platform environment support

In-Hadoop and in-Teradata predictive modeling

Visual Studio IDE option

Secure, Scalable R Deployment

Technical support, training and services

– 24x7 support option

Contact Revolution Analytics for more info: www.revolutionanalytics.com/contact-us

Page 31: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

Poll #2

Which Revolution Analytics projects do you plan to use (or already use)?

Select all that apply:

1. Revolution R Open (free distribution)

2. Revolution R Plus (paid subscription for support and indemnification)

3. Reproducible R Toolkit (checkpoint package)

4. DeployR Open

5. RHadoop / ParallelR

Page 32: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

Wrapping up…

Revolution R Open is available now from mran.revolutionanalytics.com/download

Explore Revolution Analytics open-source projects at projects.revolutionanalytics.com

Technical support and open-source assurance with Revolution R Plus: www.revolutionanalytics.com/plus

David Smith Chief Community Officer

Revolution Analytics @revodavid

[email protected]

Page 33: Introducing Revolution R Open: Enhanced, Open Source R distribution from Revolution Analytics

Thank you.

Next up:

Batter Up! Advanced Sports Analytics with R and Storm

December 11, 2014

revolutionanalytics.com/webinars

www.revolutionanalytics.com

1.855.GET.REVO

Twitter: @RevolutionR