real-time big data analytics: from deployment to production
1
Real-Time Big Data Analytics
From Deployment to Production
David Smith
Revolution Analytics
@revodavid
2
WHAT’S UP
WITH THAT?
3
REAL TIME
BIG DATA
PREDICTIVE ANALYTICS
Buzzword Bingo!
4
Photo: Sarah&Boston (flickr: pocheco) Creative Commons BY-SA 2.0
5
Predictive Analytics Model
Factors
Scores
"IO VAPOURA" by Jaya Prime flickr.com/photos/sanjayaprime/4924462993 CC-BY 2.0
• Decision Tree
• Logistic Regression
• Neural Network
• K-means clustering
• Ensemble Model
Predictive Model
• User ID
• Browser
• Time/Date / Location
• Previous purchases
• Friend data
Any known information
• Product of most interest
• Offer of most likely sale
• Most relevant link
• Forecast sale value
• Optimal Bid
Prediction or Selection
Scoring Rules
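To make the factors-to-scores picture concrete, here is a minimal sketch of scoring with a logistic regression, one of the model types listed on this slide. The factor names, the coefficient values, and the functions `score` and `best_offer` are all hypothetical, chosen for illustration only; they are not taken from the talk.

```python
import math

# Hypothetical coefficients from an already-fitted logistic regression.
# Neither the factor names nor the values come from the talk.
INTERCEPT = -2.0
COEFFICIENTS = {
    "previous_purchases": 0.8,
    "is_mobile_browser": -0.3,
    "friends_who_bought": 0.5,
}

def score(factors):
    """Return the predicted probability of a sale for one set of factors."""
    # Missing factors default to 0 so the rule still works on partial data.
    linear = INTERCEPT + sum(
        weight * factors.get(name, 0.0) for name, weight in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))

def best_offer(offers):
    """Selection: pick the offer whose factors give the highest score."""
    return max(offers, key=lambda name: score(offers[name]))
```

Once the coefficients are fixed, scoring is a handful of multiplications and one exponential, which is why a fitted model can be applied quickly even when fitting it was slow.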
"CLOCK" by Heiko Klingele flickr.com/photos/divdax/3458668053/ CC-BY 2.0
6
Real-time Deployment
1. Data distillation
2. Model development and validation
3. Model deployment
4. Real-time model scoring
5. Model refresh
7
1. Data Distillation in Hadoop
Unstructured
Data
Analytics Data Mart
Structured Data
Log Files
Sensor Streams
Language Text
HDFS Load
Map-Reduce
rmr
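The distillation step can be sketched as a map step followed by a reduce step. The slide names Hadoop and rmr; the sketch below instead runs both steps in-process in plain Python, only to show the shape of the computation. The log format, the sample lines, and every function name are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical log format: "<timestamp> <user_id> <action>". Sample data is made up.
RAW_LOG = [
    "2012-10-01T10:00:00 u1 view",
    "2012-10-01T10:00:05 u2 view",
    "2012-10-01T10:01:00 u1 purchase",
    "malformed line",
    "2012-10-01T10:02:00 u1 view",
]

def map_line(line):
    """Map step: emit ((user_id, action), 1) pairs, skipping bad lines."""
    parts = line.split()
    if len(parts) != 3:
        return []
    _, user_id, action = parts
    return [((user_id, action), 1)]

def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return totals

def distill(lines):
    """Turn raw log lines into one structured row per user."""
    pairs = [pair for line in lines for pair in map_line(line)]
    rows = defaultdict(dict)
    for (user_id, action), count in reduce_counts(pairs).items():
        rows[user_id][action] = count
    return dict(rows)
```

Calling `distill(RAW_LOG)` yields one row of action counts per user, the kind of structured record an analytics data mart would hold.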
8
2. The Model Development Cycle
• Feature Selection
• Sampling
• Aggregation
• Variable Transformation
• Model Estimation
• Model Refinement
• Model Comparison / Benchmarking
Structured Data Predictive Model
R White Paper: bit.ly/r-is-hot
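A few stages of the development cycle (sampling, variable transformation, model estimation, and benchmarking) can be sketched end to end on toy data. The talk points to R for this work; the Python below is only an illustration. The synthetic data set, the holdout scheme, and the use of plain gradient descent are assumptions made for this sketch, not the method described in the talk.

```python
import math

# Synthetic data: (monthly_visits, purchased). Entirely made up.
ROWS = [(x, 1 if x >= 10 else 0) for x in range(20)]

def rescale(x):
    """Variable transformation: center and scale visits to roughly [-1, 1]."""
    return (x - 10) / 10

def split(rows, every=4):
    """Sampling: hold out every `every`-th row for validation."""
    train = [r for i, r in enumerate(rows) if i % every != 0]
    holdout = [r for i, r in enumerate(rows) if i % every == 0]
    return train, holdout

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, steps=2000, rate=0.1):
    """Model estimation: one-variable logistic regression by gradient descent."""
    a = b = 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in rows:
            err = sigmoid(a + b * rescale(x)) - y
            grad_a += err
            grad_b += err * rescale(x)
        a -= rate * grad_a / len(rows)
        b -= rate * grad_b / len(rows)
    return lambda x: sigmoid(a + b * rescale(x))

def log_loss(model, rows):
    """Average negative log-likelihood on `rows`; lower is better."""
    total = 0.0
    for x, y in rows:
        p = min(max(model(x), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)

def benchmark(rows=ROWS):
    """Model comparison: fitted model versus a constant-rate baseline."""
    train, holdout = split(rows)
    base_rate = sum(y for _, y in train) / len(train)
    candidates = {
        "baseline": lambda x: base_rate,
        "logistic": fit_logistic(train),
    }
    return {name: log_loss(m, holdout) for name, m in candidates.items()}
```

`benchmark()` returns the holdout loss for each candidate. Comparing against a trivial baseline is a cheap check that a model has learned something before it moves on to deployment.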
9
3. Deployment Options
Unknown factors
• SQL / Rules Engine
• Code (C++, Java, R, Hadoop)
• PMML Engine
Factors known in advance
• Batch Lookup Tables
Factors
Scores
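The two deployment paths can be combined in one scoring routine: scores for factors known in advance are precomputed in batch into a lookup table, and anything else is scored by code at request time. This is a minimal sketch under that reading of the slide. The scoring rule in `model_score`, the customer IDs, and the function names are hypothetical.

```python
def model_score(previous_purchases):
    """Stand-in for a deployed model; the rule itself is hypothetical."""
    return min(1.0, 0.1 + 0.2 * previous_purchases)

def build_lookup_table(known_customers):
    """Batch step: score every customer whose factors are known in advance."""
    return {
        customer_id: model_score(purchases)
        for customer_id, purchases in known_customers.items()
    }

def score_request(customer_id, purchases, table):
    """Real-time step: use the precomputed score if present, else compute it."""
    if customer_id in table:
        return table[customer_id], "lookup"
    return model_score(purchases), "computed"
```

The trade-off is that a lookup is fast but only as fresh as the last batch run, while computing on demand handles unseen factors at the cost of running the model inside the request.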
10
Why did I buy that blender?
• Just browsing in the mall
• TV ad / magazine ad
• Coupon in the mail
• “Just moved” promo email
• Webstore recommendation
• Browsing catalog
11
UpStream: Attribution Modeling
• ETL
• Marketing channel data
• Behavioral variables
• Promotional data
• Overlay data
• Exploratory data analysis
• Time-to-event models
• GAM survival models
• Scoring for inference
• Scoring for prediction
• 5 billion scores per day per retailer
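The figure of 5 billion scores per day is easier to reason about as a rate. The short calculation below converts it to an average per second and to a time budget per score. The `workers` parameter is an assumption added for illustration, and an even spread of load across the day is assumed; real traffic would peak well above the average.

```python
SCORES_PER_DAY = 5_000_000_000   # figure quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

def average_scores_per_second(scores_per_day=SCORES_PER_DAY):
    """Average scoring rate, assuming load is spread evenly over the day."""
    return scores_per_day / SECONDS_PER_DAY

def time_budget_microseconds(workers):
    """Average time one worker may spend per score (hypothetical worker count)."""
    return 1_000_000 * workers / average_scores_per_second()
```

That works out to roughly 57,870 scores per second on average, or about 17.28 microseconds per score for a single worker.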
UPSTREAM DATA FORMAT
CUSTOM VARIABLES (PMML)
4. Model Scoring
13
5. Model refresh
Factors
Scores
Actual Outcomes
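One way to decide when a refresh is due is to compare the scores the model produced with the outcomes that were later observed. The sketch below uses the Brier score (mean squared error of predicted probabilities) for that comparison. The choice of metric, the tolerance value, and the function names are assumptions for this example rather than anything stated in the talk.

```python
def brier_score(scores, outcomes):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((s - o) ** 2 for s, o in zip(scores, outcomes)) / len(scores)

def needs_refresh(scores, outcomes, baseline_error, tolerance=0.05):
    """Flag the model for refresh when recent error exceeds the error seen at
    validation time by more than `tolerance` (a hypothetical threshold)."""
    return brier_score(scores, outcomes) > baseline_error + tolerance
```

For example, `needs_refresh([0.9, 0.1], [0, 1], baseline_error=0.02)` returns `True`, because the model was confidently wrong on both cases.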
14
Big Data
Real Time
Kilobytes/Sec, Megabytes/Sec, Gigabytes, Terabytes, Petabytes, Exabytes
Milliseconds, Seconds, Minutes, Hours
15
PREDICTIVE ANALYTICS
BIG DATA
REAL TIME
WHAT’S UP WITH THAT?
16
www.revolutionanalytics.com +1 650 646 9545 Twitter: @RevolutionR
The leading enterprise provider of software and services for Open Source R
Real-Time Big Data Predictive Analytics: From Deployment to Production
Booth 618 / Office Hours Weds 1:30PM
David Smith
@revodavid