Introducing Revolution R Open: Enhanced, Open Source R Distribution from Revolution Analytics
Introducing
Revolution R Open The Enhanced R Distribution
November 12, 2014
In today’s webinar:
R Update
Revolution R Open
The Reproducible R Toolkit
MRAN
Other open-source projects
• DeployR Open
• ParallelR
• RHadoop
Revolution R Plus
Q&A
David Smith Chief Community Officer
Revolution Analytics @revodavid
Editor, blog.revolutionanalytics.com
Co-author, “Introduction to R”
OUR COMPANY
The leading provider
of advanced analytics
software and services
based on open source R,
since 2007
OUR PRODUCT
REVOLUTION R: The
enterprise-grade predictive
analytics application platform
based on the R language
SOME KUDOS
Visionary
Gartner Magic Quadrant
for Advanced Analytics
Platforms, 2014
What is R?
Most widely used data analysis software
• Used by 2M+ data scientists, statisticians and analysts
Most powerful statistical programming language
• Flexible, extensible and comprehensive for productivity
Create beautiful and unique data visualizations
• As seen in New York Times, Twitter and Flowing Data
Thriving open-source community
• Leading edge of analytics research
Fills the Data Science talent gap
• New graduates prefer R
www.revolutionanalytics.com/what-is-r
Poll #1
What software do you use for statistical analysis? (Select all that apply.)
R
SAS
SPSS
Python
Other
R’s popularity is growing rapidly
More at blog.revolutionanalytics.com/popularity
• R Usage Growth: Rexer Data Miner Survey, 2007-2013
• Language Popularity: IEEE Spectrum Top Programming Languages, July 2014 (R ranked #9)
Revolution R Open is:
Enhanced Open Source R distribution
Compatible with all R-related software
Multi-threaded for performance
Focus on reproducibility
Open source (GPLv2 license)
Available for Windows, Mac OS X, Ubuntu,
Red Hat and OpenSUSE
Download from mran.revolutionanalytics.com
Multi-threaded performance
Intel MKL replaces standard BLAS/LAPACK algorithms
– Optimized for Intel, works for all archs
[Figure: pipelined operations, sequential vs. parallel]
High-performance algorithms
– Uses as many threads as there are available cores
– Control with: setMKLthreads(<value>)
No need to change any R code
Included in RRO binary distribution
More at Revolutions blog
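A minimal sketch of how the thread control might be exercised on a BLAS-heavy operation. The matrix size and variable names are illustrative assumptions, and the setMKLthreads() call is guarded because the function is only expected to exist in an MKL-enabled build such as RRO:

```r
set.seed(42)
n <- 300
a <- matrix(rnorm(n * n), n, n)

# Default threading (in RRO: as many threads as available cores)
t_default <- system.time(x_default <- crossprod(a))

# setMKLthreads() is only expected in an MKL-enabled build such as RRO,
# so guard the call to keep this sketch runnable in plain R as well
has_mkl <- exists("setMKLthreads", mode = "function")
if (has_mkl) setMKLthreads(1)
t_single <- system.time(x_single <- crossprod(a))

# Elapsed seconds for each run; the R code itself is unchanged
print(c(default = t_default[["elapsed"]], single = t_single[["elapsed"]]))
```

In plain R both runs use the same BLAS, so the timings should be similar; the results should agree in either case.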
100% Compatibility
Built on latest R engine
– Currently R 3.1.1, R 3.1.2 in testing
100% compatible with
– R scripts
– R packages
– Applications with R connections
Designed to work with RStudio
– No configuration required
Replaces existing R application
– Side-by-side installations
Reproducibility – why do we care?
Academic / Research
Verify results
Advance Research
Business
Production code
Reliability
Reusability
Collaboration
Regulation
www.nytimes.com/2011/07/08/health/research/08genes.html
http://arxiv.org/pdf/1010.1092.pdf
An R Reproducibility Problem
Adapted from http://xkcd.com/234/ CC BY-NC 2.5
Reproducible R Toolkit in RRO
Static CRAN mirror
– CRAN packages fixed with each Revolution R Open update
Daily CRAN snapshots
– Storing every package version since September 2014
– Binaries and sources
– At mran.revolutionanalytics.com/snapshot
Easily write and share scripts synced to a specific snapshot
– “checkpoint” package installed with RRO
[Diagram: at midnight UTC the checkpoint server takes a daily snapshot of the CRAN mirror (http://cran.revolutionanalytics.com/) and stores it at http://mran.revolutionanalytics.com/snapshot/. In R, the checkpoint package reads from those daily snapshots via library(checkpoint) and checkpoint("2014-09-17").]
Using checkpoint
Easy to use: add 2 lines to the top of each script
library(checkpoint)
checkpoint("2014-09-17")
For the package author:
– Use package versions available on the chosen date
– Installs packages local to this project
• Allows different package versions to be used simultaneously
For a script collaborator:
– Automatically installs required packages
• Detects required packages (no need to manually install!)
– Uses same package versions as script author to ensure reproducibility
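A rough sketch of what pinning to a snapshot amounts to, without the checkpoint package: pointing R's "repos" option at a dated snapshot URL. The helper name snapshot_url and the URL layout (snapshot base followed by the date) are assumptions for illustration, not taken from the slides. Unlike checkpoint, this does not detect required packages or install them into a project-local library.

```r
# Hypothetical helper: build a dated snapshot URL.
# Assumes snapshots are served at base/YYYY-MM-DD.
snapshot_url <- function(date,
                         base = "http://mran.revolutionanalytics.com/snapshot") {
  d <- as.Date(date, format = "%Y-%m-%d")
  if (is.na(d)) stop("date must be in YYYY-MM-DD form")
  paste0(base, "/", format(d, "%Y-%m-%d"))
}

# Subsequent install.packages() calls in this session would then
# resolve packages against the chosen snapshot
options(repos = c(CRAN = snapshot_url("2014-09-17")))
getOption("repos")[["CRAN"]]
```

Setting the option alone performs no downloads; it only changes where later installs look.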
MRAN: The Managed R Archive Network
Download Revolution R Open
Learn about R and RRO
Daily CRAN snapshots
Explore Packages
– and dependencies
Explore Task Views
Revolution Analytics Open Source Projects
More at projects.revolutionanalytics.com
DeployR Open
Goal: embed results from R scripts into existing applications, in real time
Problem:
– Exposing arbitrary R functions is unwise
– Need to handle concurrent R sessions
Solution: DeployR Open
– R, on a server, behind a firewall
– Repository Manager defines entry points
• Expose only authorized R functions
– Automatically creates Web Services APIs
– Manages and monitors pool of R sessions
– Separates roles for R and app developer
DeployR Open: for prototyping integrations
– Revolution R Enterprise adds grid-scaling and
enterprise authentication
More at deployr.revolutionanalytics.com
DeployR: Integration
DeployR does not provide any application UI.
3 integration modes embed real-time R results into existing interfaces
Web app, mobile app, desktop app, BI tool, Excel, …
RBroker Framework (tutorial):
Simple, high-performance API for Java, .NET and Javascript apps
Supports transactional, on-demand analytics on a stateless R session
Client Libraries (tutorial):
Flexible control of R services from Java, .NET and Javascript apps
Also supports stateful R integrations (e.g. complex GUIs)
DeployR Web Services API:
Integrate R using almost any client language
Only available in Revolution R Enterprise DeployR
DeployR: Security / Scalability Layers
1. Anonymous execution
– Only authorized, user-defined R functions accessible
– No state preserved
2. Basic username / password authentication
– Managed in DeployR Administration Console
3. Enterprise Authentication
– Verifies identity with SSO / LDAP / Active Directory / PAM
4. Adaptive load-balancing grid
– Ensures service availability
DeployR Open demo
Fraud detection
RHadoop and ParallelR
Toolkits for data scientists and numerical analysts to create custom parallel and distributed algorithms
ParallelR: parallel programming for multi-CPU servers and grids
RHadoop: map-reduce programming in R language
Mainly useful for “embarrassingly parallel” problems, where parallel
components work with small amounts of data
Big Data predictive analytics is mostly not embarrassingly parallel
80+ pre-built “parallel external memory algorithms” included with
Revolution R Enterprise
RHadoop
Collection of packages for interfacing R and Hadoop
Client (desktop) R interface to Hadoop:
– rhdfs: Browse, read, write and modify files stored in HDFS
– rhbase: Browse, read, write and modify tables stored in HBase
– ravro: Read, write and run map-reduce on Apache Avro files in HDFS
R computations in Hadoop:
– rmr2: write map-reduce tasks in R to run in Hadoop
– plyrmr: R-based data manipulation computations on data in Hadoop
RHadoop Wiki: github.com/RevolutionAnalytics/RHadoop/wiki
Word count in RHadoop
Map:
– Input: lines of text
– Output: each word as a key, with value 1
Reduce:
– Input: each word with its collected values
– Output: words with counts
Map-Reduce:
– Apply map to lines of text
– Gather like words together and count
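The map, gather and reduce steps above can be sketched in plain base R. This is an in-memory illustration only: it does not use rmr2 or Hadoop, and the sample text and the names map_fn and reduce_fn are made up for the example.

```r
# Sample input: lines of text
lines <- c("the quick brown fox", "the lazy dog", "the fox")

# Map: one line in, a list of (word, 1) pairs out
map_fn <- function(line) {
  words <- strsplit(tolower(line), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]
  lapply(words, function(w) list(key = w, value = 1))
}

# Apply map to every line and flatten to a single list of pairs
pairs <- unlist(lapply(lines, map_fn), recursive = FALSE)

# Gather like words together (the "shuffle" step)
keys <- vapply(pairs, function(p) p$key, character(1))
values <- vapply(pairs, function(p) p$value, numeric(1))
grouped <- split(values, keys)

# Reduce: one word and its values in, the count out
reduce_fn <- function(key, values) sum(values)
counts <- mapply(reduce_fn, names(grouped), grouped)
print(counts)
```

In a real map-reduce job the map and reduce functions run on separate workers and the framework handles the grouping; here split() plays that role.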
Word count: execution
More: Video replay of “Using R with
Hadoop” by Jeffrey Breen
http://bit.ly/W35PLR
ParallelR
foreach replaces for loops
– Minimal code change required
Choice of parallel backends
– doParallel (base “parallel”)
– doMC (multi-core servers)
– doSNOW (grids)
Iterations run in parallel
– Speedups depend on backend,
“granularity”
All iterations run in-memory
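A minimal sketch of the same idea using only base R's "parallel" package (the one doParallel builds on), rather than foreach itself. The function slow_square, the iteration count and the choice of 2 workers are illustrative assumptions:

```r
library(parallel)

# Hypothetical stand-in for one independent loop iteration
slow_square <- function(i) {
  Sys.sleep(0.01)  # simulate some work
  i^2
}

# Sequential baseline
seq_res <- sapply(1:20, slow_square)

# Same iterations spread over 2 worker processes
cl <- makeCluster(2)
par_res <- parSapply(cl, 1:20, slow_square)
stopCluster(cl)

# Both approaches should produce the same values
identical(seq_res, par_res)
```

With iterations this small, the cost of starting workers can outweigh the gain, which is the "granularity" point above.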
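# Sequential version
birthday <- function(n) {
  m <- 10000                                     # number of simulated groups
  x <- numeric(m)
  for (i in 1:m) {
    b <- sample(1:365, n, replace = TRUE)        # n random birthdays
    x[i] <- ifelse(length(unique(b)) == n, 0, 1) # 1 if any birthday repeats
  }
  mean(x) # est prob of at least 1 match
}
for (j in 1:100) birthday(j)
# 2-core MacBook Air: 21.9s

# Parallel version with foreach and the doMC backend
library("doMC")
registerDoMC(2)  # use 2 cores
x <- foreach(j = 1:100) %dopar% birthday(j)
# 2-core MacBook Air: 12.0s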
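As a sanity check on the simulated estimates, the probability of at least one shared birthday among n people has a closed form, 1 - (365/365)(364/365)...((365-n+1)/365). The function name birthday_exact is an assumption for this sketch and is not part of the original slides:

```r
# Exact probability that at least two of n people share a birthday,
# assuming 365 equally likely birthdays
birthday_exact <- function(n) {
  if (n > 365) return(1)  # pigeonhole: a match is certain
  1 - prod((365 - seq_len(n) + 1) / 365)
}

# The simulation's estimate for a given n should land close to this value
birthday_exact(23)
```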
Introducing
Revolution R Plus
Revolution R Plus includes:
AdviseR™ Technical Support for:
– Revolution R Open
• Including R, base and recommended packages
– Reproducible R Toolkit
– ParallelR: Parallel programming with R
– RHadoop: R integration with Hadoop
– DeployR Open: Secure deployment of R to applications
Open Source Assurance for all supported components
– Provides legal indemnity for subscribers
Workstation subscriptions: $1,800 per year
– Server and Hadoop subscriptions also available
AdviseR™ Technical Support
Technical support for R, from the R experts.
10x5 email and phone support (in your local time zone)
Full support for R, validated packages, and third-party software
connections
Notifications of updates and bug fixes
On-line case management and knowledgebase
Access to technical resources, documentation and user forums
Defined service-level agreements for rapid responses
Included with Revolution R Plus and Revolution R Enterprise.
Open Source Assurance
Revolution Analytics will defend Revolution R Plus subscribers should a third party make an intellectual property claim against covered open source software with respect to:
– copyrights, patents, trademarks, trade secrets
Covered software includes:
– Revolution R Open (incl. R base and recommended packages), Reproducible R
Toolkit, DeployR Open, ParallelR, RHadoop
Revolution Analytics will defend open source software in court
– If necessary, Revolution Analytics will obtain rights, modify, or replace software
found to be infringing
– If a resolution can’t be found, fees paid in past 12 months will be refunded.
The Revolution R Product Suite
Revolution R Open
• Free and open source R distribution
• Enhanced and distributed by Revolution Analytics
Revolution R Plus
• Open-source distribution of R, packages, and other components
• Enhanced, supported and indemnified by Revolution Analytics
Revolution R Enterprise
• Secure, Scalable and Supported Distribution of R
• With proprietary components created by Revolution Analytics
Revolution R Enterprise (RRE)
The All-Inclusive Big Data Big Analytics Platform
Components: DistributedR, DeployR, DevelopR, ScaleR, ConnectR
High-performance open source R plus:
Data source connectivity to big-data objects
Big-data advanced analytics
Multi-platform environment support
In-Hadoop and in-Teradata predictive modeling
Visual Studio IDE option
Secure, Scalable R Deployment
Technical support, training and services
– 24x7 support option
Contact Revolution Analytics for more info: www.revolutionanalytics.com/contact-us
Poll #2
Which Revolution Analytics projects do you plan to use (or already use)?
Select all that apply:
1. Revolution R Open (free distribution)
2. Revolution R Plus (paid subscription for support and indemnification)
3. Reproducible R Toolkit (checkpoint package)
4. DeployR Open
5. RHadoop / ParallelR
Wrapping up…
Revolution R Open is available now from mran.revolutionanalytics.com/download
Explore Revolution Analytics open-source projects at
projects.revolutionanalytics.com
Technical support and open-source assurance with
Revolution R Plus
www.revolutionanalytics.com/plus
David Smith Chief Community Officer
Revolution Analytics @revodavid
Thank you.
Next up:
Batter Up! Advanced Sports Analytics with R and Storm
December 11, 2014
revolutionanalytics.com/webinars
www.revolutionanalytics.com
1.855.GET.REVO
Twitter: @RevolutionR