Revolution Confidential Revolution Analytics Overview of Revolution R Enterprise Joseph B. Rickert, Marketing Manager For the Dallas R User’s Group

Post on 22-Feb-2016


Page 1: Revolution Analytics

Revolution Confidential

Revolution Analytics

Overview of Revolution R Enterprise

Joseph B. Rickert, Marketing Manager

For the Dallas R User’s Group

Page 2: Revolution Analytics

Agenda

• Revolution Analytics Today
• Revolution R Enterprise
• Revolution Analytics in the Enterprise
• Big Data with RevoScaleR
• Deploying R Throughout the Enterprise with RevoDeployR

Page 3: Revolution Analytics

Corporate Overview & Quick Facts

Founded: 2008 (as REvolution Computing)
Office locations: Palo Alto (HQ), Seattle (Eng)
CEO: David Rich
Number of employees: 40+
Number of customers: 100+
Investors: North Bridge Venture Partners, Intel Capital, Presidio Ventures

“Revolution Analytics is the leading commercial provider of software and support for the open-source R statistical computing language.”

Page 4: Revolution Analytics

OPEN SOURCE ANALYTICS FOR THE ENTERPRISE

The professor who invented analytic software for the experts now wants to take it to the masses

Most advanced statistical analysis software available

Half the cost of commercial alternatives

2M+ Users

2,500+ Applications

Statistics

Predictive Analytics

Data Mining

Visualization

Finance

Life Sciences

Manufacturing

Retail

Telecom

Social Media

Government

Power

Productivity

Enterprise Readiness

Page 5: Revolution Analytics

Revolution R Enterprise

Productivity

Page 6: Revolution Analytics

Revolution R Enterprise has Open-Source R Engine at the core

2,500 community packages and growing exponentially

R Engine Language Libraries

Community Packages

Technical Support

Web Services API

Big Data Analysis

Developer IDE

Build Assurance

Parallel Tools

Multi-Threaded Math Libraries

Page 7: Revolution Analytics

A network of partners for integrated, large-scale data analysis

Advanced Analytics

Deployment / Consumption

Data Infrastructure

Page 8: Revolution Analytics

Revolution R Enterprise

Performance

Page 9: Revolution Analytics

Performance: Intel MKL Math Libraries

Computation (4-core laptop)         Open Source R 2.13.2   Revolution R Enterprise 5.0   Speedup (4-core laptop)
Linear Algebra [1]
  Matrix Multiply                   174.6 sec              10.4 sec                      15.8x
  Cholesky Factorization            25.7 sec               1.4 sec                       17.6x
  Linear Discriminant Analysis      224.4 sec              20.1 sec                      7.6x
General R Benchmarks [2]
  R Benchmarks (Matrix Functions)   24.9 sec               3.8 sec                       5.5x
  R Benchmarks (Program Control)    4.7 sec                4.6 sec                       Not appreciable

1. http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
2. http://r.research.att.com/benchmarks/

Page 10: Revolution Analytics

Revolution R Enterprise

Big Data Analysis

Page 11: Revolution Analytics

A common analytic platform across big data architectures

• Hadoop
• File-based
• In-database

Page 12: Revolution Analytics

Two Big Data problems: capacity and speed

• Capacity: problems handling the size of data sets or models
  • Data too big to fit into memory
  • Even if it can fit, there are limits on what can be done
  • Even simple data management can be extremely challenging
• Speed: even without a capacity limit, computation may be too slow to be useful

Page 13: Revolution Analytics

RevoScaleR: Big Data Analysis for Revolution R Enterprise

• Distributed Statistical Algorithms: addresses performance by distributing computations between cores and computers
• External Memory Programming Framework: addresses capacity through a collection of functions for chunking through massive data files
• XDF File Format: a novel high-speed file format designed specifically to support statistical analyses
• R Language Interface: a familiar, high-productivity programming paradigm for R users

Page 14: Revolution Analytics

The basis for a solution for capacity, speed, distributed and streaming data: PEMAs

• Parallel external memory algorithms (PEMAs) allow solution of both capacity and speed problems, and can deal with distributed and streaming data
• External memory algorithms are those that allow computations to be split into pieces so that not all data has to be in memory at one time
• It is possible to “automatically” parallelize and distribute such algorithms
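The external-memory idea can be sketched in a few lines. The example below is an illustration in Python, not RevoScaleR code: the names `Moments`, `read_chunks`, and `chunked_moments` are hypothetical, and an in-memory list stands in for a file on disk. Each chunk is reduced to a small intermediate result, and intermediate results are merged, so only one chunk needs to be in memory at a time.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    """Intermediate result for some data: count, sum, and sum of squares."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def update(self, chunk):
        # Fold one chunk of values into this intermediate result.
        for x in chunk:
            self.n += 1
            self.total += x
            self.total_sq += x * x
        return self

    def merge(self, other):
        # Combining two intermediate results needs no access to the raw data.
        return Moments(self.n + other.n,
                       self.total + other.total,
                       self.total_sq + other.total_sq)

    def mean(self):
        return self.total / self.n

    def variance(self):
        # Population variance via sum of squares. Simple, but it can lose
        # precision on data with a large mean; kept here for clarity only.
        m = self.mean()
        return self.total_sq / self.n - m * m


def read_chunks(values, chunk_size):
    # Stand-in for reading a large file one block at a time.
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def chunked_moments(values, chunk_size):
    result = Moments()
    for chunk in read_chunks(values, chunk_size):
        result = result.merge(Moments().update(chunk))
    return result
```

Because `merge` is associative, the per-chunk results could equally be computed on different cores or machines and combined afterwards, which is what makes this class of algorithm straightforward to parallelize.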

Page 15: Revolution Analytics

RevoScaleR on a Multicore Server

[Diagram: RevoScaleR on a multicore processor (4, 8, 16+ cores). Data blocks come from disk; Core 0 (Thread 0) through Core n (Thread n) work in shared memory.]

• A RevoScaleR algorithm is provided a data source as input
• The algorithm loops over data, reading a block at a time. Blocks of data are read by a separate worker thread (Thread 0).
• Other worker threads (Threads 1..n) process the data block from the previous iteration of the data loop and update intermediate results objects in memory
• When all of the data is processed a master results object is created from the intermediate results objects
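The reader-plus-workers arrangement described for the multicore case can be sketched as a small producer/consumer pipeline. This is a structural illustration in Python with a hypothetical function name (`run_pipeline`), not RevoScaleR's implementation; in CPython the threads show the shape of the design rather than a real multicore speedup, and summing values stands in for a statistical computation.

```python
import queue
import threading


def run_pipeline(blocks, n_workers=3):
    """Sum all values: one reader thread feeds blocks to worker threads."""
    # Bounded queue, so the reader stays only a few blocks ahead of the workers.
    work = queue.Queue(maxsize=2 * n_workers)
    # One intermediate result per worker; each worker writes only its own slot.
    partials = [0] * n_workers

    def reader():
        for block in blocks:          # stands in for reading a block from disk
            work.put(block)
        for _ in range(n_workers):    # one stop signal per worker
            work.put(None)

    def worker(slot):
        while True:
            block = work.get()
            if block is None:
                break
            partials[slot] += sum(block)

    threads = [threading.Thread(target=reader)]
    threads += [threading.Thread(target=worker, args=(i,))
                for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # The "master results object": combine the intermediate results.
    return sum(partials)
```

While the workers process one block, the reader is already fetching the next, so disk reads and computation overlap.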

Page 16: Revolution Analytics

RevoScaleR for Distributed Computing Clusters

[Diagram: a master node and four compute nodes, each running RevoScaleR; each compute node has its own data partition.]

• Portions of the data source are made available to each compute node
• RevoScaleR on the master node assigns a task to each compute node
• Each compute node independently processes its data and returns its intermediate results to the master node
• The master node aggregates all of the intermediate results from each compute node and produces the final result
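As a worked illustration of the master/compute-node pattern, the sketch below fits a simple linear regression from per-partition summaries. It is written in Python with hypothetical names (`node_task`, `master_fit`), and the "nodes" are ordinary function calls in one process; on a real cluster each call would run remotely. The point is that each node returns only five numbers, and the master never sees the raw rows.

```python
def node_task(partition):
    """Work done on one compute node: summarize its (x, y) pairs."""
    n = len(partition)
    sx = sum(x for x, _ in partition)
    sy = sum(y for _, y in partition)
    sxx = sum(x * x for x, _ in partition)
    sxy = sum(x * y for x, y in partition)
    return (n, sx, sy, sxx, sxy)


def master_fit(partitions):
    """Master node: hand out tasks, then aggregate the intermediate results."""
    # Assumption: run sequentially here; a cluster would dispatch these calls.
    results = [node_task(p) for p in partitions]

    # Element-wise sum of the per-node summaries.
    n, sx, sy, sxx, sxy = (sum(column) for column in zip(*results))

    # Ordinary least squares for y = intercept + slope * x.
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept
```

For example, three partitions holding points on the line y = 2x + 1 give back a slope of 2 and an intercept of 1, the same answer as fitting all the rows in one place.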

Page 17: Revolution Analytics

Platform-agnostic Big Data Analytics

• Set “compute context” to define hardware (one line of code)
• Native job scheduler handles distribution, monitoring, failover, etc.
• Same code runs on other supported architectures: just change the compute context
• Supported architectures:
  • Windows: Microsoft HPC Server
  • Linux: Platform Computing LSF (coming 2012)

42 seconds instead of 6 minutes
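The "compute context" idea, where the analysis code stays the same and one line selects where it runs, can be sketched as a swappable execution backend. Everything in this Python example is hypothetical (`LocalSequential`, `LocalThreads`, `set_compute_context`, `total_row_count`); it does not reproduce RevoScaleR's actual functions, and a thread pool stands in for a cluster scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


class LocalSequential:
    """Run every task in the calling thread."""

    def map(self, fn, items):
        return [fn(item) for item in items]


class LocalThreads:
    """Run tasks on a thread pool (a stand-in for a cluster scheduler)."""

    def __init__(self, workers=4):
        self.workers = workers

    def map(self, fn, items):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(fn, items))


# The active context; analysis code looks it up instead of hard-coding one.
_context = LocalSequential()


def set_compute_context(context):
    """The single line a user changes to move to different hardware."""
    global _context
    _context = context


def total_row_count(chunks):
    """Analysis code: it never mentions where it runs."""
    return sum(_context.map(len, chunks))
```

Calling `set_compute_context(LocalThreads(2))` before `total_row_count(chunks)` changes how the work is executed without touching the analysis function itself.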

Page 18: Revolution Analytics

R and Hadoop

• Hadoop offers a scalable infrastructure for processing massive amounts of data
  • Storage: HDFS, HBase
  • Distributed computing: MapReduce
• R is a statistical programming language for developing advanced analytic applications
• Currently, writing analytics for Hadoop requires a combination of Java, Pig, Python, …
• The RHadoop project makes it possible to write PEMAs for Hadoop using the R language alone.

Page 19: Revolution Analytics

Massively parallel/distributed analytics: RevoConnectR for Hadoop

[Diagram labels: Revolution R Client, R, Map or Reduce, Job Tracker, Task Node, HDFS, HBase, Thrift, rmr, rhdfs, rhbase]

Write Map-Reduce analytics using only R code with these R packages:

• rhdfs: R and HDFS
• rhbase: R and HBase
• rmr: R and MapReduce

More information at: bit.ly/r-hadoop
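For readers new to the model, a MapReduce analytic comes down to two user-supplied functions: a map step that emits key/value pairs and a reduce step that summarizes all values sharing a key. The sketch below shows that shape in Python on a single machine; the names `map_reduce`, `mapper`, and `reducer` are hypothetical and this is not the rmr API, whose function names the slides do not show.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Single-process stand-in for a MapReduce job."""
    groups = defaultdict(list)
    # Map phase: each record may emit any number of (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: one call per key, over all values emitted for that key.
    # (Hadoop would perform the grouping, the "shuffle", across machines.)
    return {key: reducer(key, values) for key, values in groups.items()}


def mapper(record):
    # Assumed record layout for this example: (group, amount).
    group, amount = record
    yield group, amount


def reducer(key, values):
    # Mean of the amounts seen for this group.
    return sum(values) / len(values)
```

With the input `[("a", 1.0), ("b", 4.0), ("a", 3.0)]`, the job groups the two `"a"` records together and returns the mean amount per group.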

Page 20: Revolution Analytics

In-Database Execution with IBM Netezza

Page 21: Revolution Analytics

Revolution R Enterprise

Enterprise Deployment

Page 22: Revolution Analytics

Revolution R Web Services: RevoDeployR

• Data Sources & Creation of Analytics: R / Statistical Modeling Expert
• Revolution “RevoDeployR”
• Consumption of Analytics & Results: Deployment Expert
  • Data Analysis
  • Business Intelligence
  • Interactive Web Apps
  • Cloud / SaaS

Page 23: Revolution Analytics

Thank you.

www.revolutionanalytics.com 650.646.9545 Twitter: @RevolutionR

The leading commercial provider of software and support for the popular open source R statistics language.