Real-Time Big Data Analytics: From Deployment to Production

Uploaded by revolution-analytics on 26-Jan-2015

Page 1: Real-time Big Data Analytics: From Deployment to Production

Real-Time Big Data Analytics

From Deployment to Production

David Smith, Revolution Analytics

@revodavid

Page 2: Real-time Big Data Analytics: From Deployment to Production

WHAT’S UP WITH THAT?

Page 3: Real-time Big Data Analytics: From Deployment to Production

REAL TIME

BIG DATA

PREDICTIVE ANALYTICS

Buzzword Bingo!

Page 4: Real-time Big Data Analytics: From Deployment to Production

Photo: Sarah&Boston (flickr: pocheco), Creative Commons BY-SA 2.0

Page 5: Real-time Big Data Analytics: From Deployment to Production

Predictive Analytics Model

Factors: Any known information
• User ID
• Browser
• Time/Date / Location
• Previous purchases
• Friend data

Predictive Model
• Decision Tree
• Logistic Regression
• Neural Network
• K-means clustering
• Ensemble Model

Scores: Prediction or Selection
• Product of most interest
• Offer of most likely sale
• Most relevant link
• Forecast sale value
• Optimal Bid

Scoring Rules

“IO VAPOURA” by Jaya Prime, flickr.com/photos/sanjayaprime/4924462993, CC-BY 2.0
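
The flow on this slide (known factors go in, a score comes out) can be sketched with a logistic regression, one of the model types listed. This is a minimal illustration only: the factor names, weights, and intercept below are hypothetical and are not taken from the talk.

```python
import math

# Hypothetical coefficients for illustration; a real model is estimated from data.
WEIGHTS = {
    "previous_purchases": 0.8,
    "is_mobile_browser": -0.3,
    "friends_who_bought": 0.5,
}
INTERCEPT = -2.0


def score(factors: dict) -> float:
    """Prediction: probability of a sale given the known factors.

    Factors missing from the input are treated as 0.
    """
    linear = INTERCEPT + sum(
        weight * factors.get(name, 0.0) for name, weight in WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


def most_likely_sale(candidates: dict) -> str:
    """Selection: return the candidate offer whose factors score highest."""
    return max(candidates, key=lambda name: score(candidates[name]))
```

Calling `score({"previous_purchases": 2.5})` gives a probability, while `most_likely_sale` turns several scores into a single choice, which corresponds to the "Prediction or Selection" distinction on the slide.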

Page 6: Real-time Big Data Analytics: From Deployment to Production

"CLOCK" by Heiko Klingele flickr.com/photos/divdax/3458668053/ CC-BY 2.0

Real-time Deployment

1. Data distillation
2. Model development and validation
3. Model deployment
4. Real-time model scoring
5. Model refresh

Page 7: Real-time Big Data Analytics: From Deployment to Production

1. Data Distillation in Hadoop

Unstructured Data
• Log Files
• Sensor Streams
• Language Text

HDFS Load

Map-Reduce (rmr)

Structured Data

Analytics Data Mart
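
As a rough picture of what distilling unstructured logs into structured data means, here is a map-reduce sketch in plain Python. It does not use Hadoop or rmr; the log format (`user_id ACTION url`) and the function names are assumptions made for the example.

```python
from collections import defaultdict


def map_log_line(line: str):
    """Map step: emit (user_id, 1) for each well-formed page-view line.

    Assumed format: "<user_id> <ACTION> <url>"; anything else is skipped.
    """
    parts = line.split()
    if len(parts) >= 3 and parts[1] == "VIEW":
        yield parts[0], 1


def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def distill(lines):
    """Run map then reduce, producing one structured row per user."""
    pairs = (pair for line in lines for pair in map_log_line(line))
    return reduce_counts(pairs)
```

For instance, `distill(["u1 VIEW /a", "u1 VIEW /c", "u2 BUY /b"])` returns a per-user view count that could be loaded into a structured table. On a cluster the same two functions would run in parallel over file blocks.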

Page 8: Real-time Big Data Analytics: From Deployment to Production

2. The Model Development Cycle

Structured Data
• Feature Selection, Sampling, Aggregation
• Variable Transformation
• Model Estimation
• Model Refinement
• Model Comparison / Benchmarking
Predictive Model

R White Paper: bit.ly/r-is-hot
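
The estimation and benchmarking steps of this cycle can be illustrated with a small holdout comparison. The sketch below is written in plain Python rather than R, and the two candidate models (a mean-only baseline and a one-variable least-squares line) are chosen for brevity, not because the talk names them.

```python
import random


def fit_mean(train):
    """Baseline model: always predict the training mean of y."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def holdout_mse(model, holdout):
    """Mean squared error of a fitted model on held-out rows."""
    return sum((model(x) - y) ** 2 for x, y in holdout) / len(holdout)


def compare(data, seed=0, train_fraction=0.7):
    """Sample a train/holdout split, fit both models, benchmark on holdout."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    train, holdout = rows[:cut], rows[cut:]
    return {
        "mean_baseline": holdout_mse(fit_mean(train), holdout),
        "linear": holdout_mse(fit_line(train), holdout),
    }
```

The model with the lower holdout error would move on to refinement or deployment; in practice the cycle repeats with new features and transformations.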

Page 9: Real-time Big Data Analytics: From Deployment to Production

3. Deployment Options

Unknown factors
• SQL / Rules Engine
• Code (C++, Java, R, Hadoop)
• PMML Engine

Factors known in advance
• Batch Lookup Tables

Factors

Scores
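
The split between factors known in advance and unknown factors can be sketched as follows: scores for every known factor combination are computed in a batch step and stored in a lookup table, and anything outside the table is scored by code at request time. The `model_score` function and its factor values are hypothetical stand-ins for a fitted model.

```python
from itertools import product


def model_score(browser: str, segment: str) -> float:
    """Hypothetical stand-in for a fitted model's scoring rule."""
    base = {"chrome": 0.10, "firefox": 0.08}.get(browser, 0.05)
    lift = {"new": 1.0, "returning": 2.0}.get(segment, 1.0)
    return base * lift


KNOWN_BROWSERS = ("chrome", "firefox")
KNOWN_SEGMENTS = ("new", "returning")

# Batch step: precompute a score for every combination known in advance.
LOOKUP = {
    (browser, segment): model_score(browser, segment)
    for browser, segment in product(KNOWN_BROWSERS, KNOWN_SEGMENTS)
}


def real_time_score(browser: str, segment: str) -> float:
    """Serve from the lookup table when possible, else score on demand."""
    key = (browser, segment)
    if key in LOOKUP:
        return LOOKUP[key]
    return model_score(browser, segment)
```

A lookup is cheap at request time but only covers combinations enumerated beforehand, which is why the slide lists rules engines, code, and PMML engines for factors that are not known until the request arrives.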

Page 10: Real-time Big Data Analytics: From Deployment to Production

Why did I buy that blender?

• Just browsing in the mall
• TV ad / magazine ad
• Coupon in the mail
• “Just moved” promo email
• Webstore recommendation
• Browsing catalog

Page 11: Real-time Big Data Analytics: From Deployment to Production

UpStream: Attribution Modeling

Page 12: Real-time Big Data Analytics: From Deployment to Production

4. Model Scoring

• ETL
• Marketing channel data
• Behavioral variables
• Promotional data
• Overlay data

• Exploratory data analysis
• Time-to-event models
• GAM survival models

• Scoring for inference
• Scoring for prediction
• 5 billion scores per day per retailer

UPSTREAM DATA FORMAT

CUSTOM VARIABLES (PMML)
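
To put the figure of 5 billion scores per day per retailer in real-time terms, the arithmetic below converts it to an average rate and a per-score time budget. The worker count is a hypothetical parameter, and the calculation assumes load is spread evenly over the day, which real traffic is not.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_scores_per_second(scores_per_day: float) -> float:
    """Average scoring rate if load were spread evenly over a day."""
    return scores_per_day / SECONDS_PER_DAY


def time_budget_microseconds(scores_per_day: float, parallel_workers: int) -> float:
    """Average time available per score on each worker, in microseconds."""
    per_worker_rate = average_scores_per_second(scores_per_day) / parallel_workers
    return 1_000_000 / per_worker_rate
```

With these definitions, `average_scores_per_second(5e9)` is about 57,870 scores per second, and with an assumed 100 parallel workers each score has roughly 1,728 microseconds on average.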

Page 13: Real-time Big Data Analytics: From Deployment to Production

5. Model Refresh

Factors

Scores

Actual Outcomes
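
One way to read this slide is that scores are compared with actual outcomes once those are observed, and the model is refreshed when the two drift apart. The sketch below uses the Brier score as the comparison; both that metric and the tolerance value are assumptions for illustration, not something the talk specifies.

```python
def brier_score(scores, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(scores) != len(outcomes) or not scores:
        raise ValueError("scores and outcomes must be non-empty and the same length")
    return sum((s - y) ** 2 for s, y in zip(scores, outcomes)) / len(scores)


def needs_refresh(scores, outcomes, baseline, tolerance=0.05):
    """Flag a refresh when recent error exceeds the validation baseline.

    baseline is the Brier score measured when the model was validated;
    tolerance is a hypothetical allowance for normal variation.
    """
    return brier_score(scores, outcomes) > baseline + tolerance
```

When `needs_refresh` returns `True`, the newly observed factors and outcomes would feed back into the model development step.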

Page 14: Real-time Big Data Analytics: From Deployment to Production

Big Data
Kilobytes/Sec, Megabytes/Sec
Gigabytes, Terabytes, Petabytes, Exabytes

Real Time
Milliseconds, Seconds, Minutes, Hours

Page 15: Real-time Big Data Analytics: From Deployment to Production

PREDICTIVE ANALYTICS

BIG DATA

REAL TIME

WHAT’S UP WITH THAT?

Page 16: Real-time Big Data Analytics: From Deployment to Production

www.revolutionanalytics.com +1 650 646 9545 Twitter: @RevolutionR

The leading enterprise provider of software and services for Open Source R

Real-Time Big Data Predictive Analytics: From Deployment to Production

Booth 618 / Office Hours Weds 1:30PM

David Smith, @revodavid