agile deployment predictive analytics on hadoop

Agile Deployment of Predictive Analytics on Hadoop: Faster Insights through Open Standards. Hadoop Summit 2012. © 2012 Datameer, Inc. All rights reserved.

Upload: hadoop-summit

Post on 15-May-2015


TRANSCRIPT

Page 1: Agile deployment predictive analytics on hadoop


Agile Deployment of Predictive Analytics on Hadoop

Faster Insights through Open Standards

Hadoop Summit 2012

Page 2: Agile deployment predictive analytics on hadoop


Today’s Session

Ulrich Rueckert, Data Scientist, Datameer
Michael Zeller, CEO, Zementis

After this session, you will be able to…

1. Effectively deliver predictive solutions combining:
   a. R, KNIME & Others [Model Development]
   b. Zementis Universal PMML Plug-in [Model Deployment & Execution]
   c. Datameer [Scalable Hadoop Infrastructure]

2. Identify PMML as a vendor-neutral & open standard to:
   a. Incorporate predictive models from virtually any commercial vendor or open source tool
   b. Apply such models on Big Data

3. Leverage a lightweight, agile deployment process for predictive analytics to:
   a. Accelerate time-to-market
   b. Lower cost and complexity
   c. Reuse existing predictive assets

Page 3: Agile deployment predictive analytics on hadoop


Who is Datameer?

§ “Business Intelligence on top of Hadoop”
§ Established 2009 by Hadoop and enterprise software veterans
§ Offices in Silicon Valley, New York and Germany
§ Some customers: [customer logos on the original slide are not reproduced in this transcript]

Page 4: Agile deployment predictive analytics on hadoop


Who is Zementis?

§ Focus on “Operational Predictive Analytics”
§ Offices in San Diego and Hong Kong
§ Predictive Analytics Software Technology:
  • ADAPA® Decision Engine (Predictive Models and Rules)
  • ADAPA Add-in for Excel
  • PMML Converter
  • Universal PMML Plug-in (UPPI)
§ Global Partner Network

Page 5: Agile deployment predictive analytics on hadoop


Big Data and Analytics

§ People and Sensor Data
  • Transaction records
  • Social media
  • Climate information
  • Mobile GPS signals
  • Healthcare
  • Smart Grid

§ Benefits from Analytics
  • Descriptive Analytics answers “What happened?”
  • Predictive Analytics answers “What will happen next?”

90% of today's data was created in the last 2 years

Page 6: Agile deployment predictive analytics on hadoop


Operational Predictive Analytics

[Figure: Score Distribution, 1st Lien Stand-Alone Loans. x-axis: Score (50 to 1000); y-axis: % Within Class (0% to 14%); series: Goods, Bads, Poly. (Goods), Poly. (Bads).]

[Figure: % of Delinquent Loans per Month. x-axis: Months (Jan to Nov); y-axis: % of Delinquent Loans (0 to 90); legend entries: 700, 750, 800, 850, 900, 950.]

Page 7: Agile deployment predictive analytics on hadoop


From Model Building to Deployment

[Diagram labels: Model Building; PMML (models); UPPI; Datameer Server; Model Deployment, Integration / Execution]

Simple Deployment & Execution
1. Upload PMML file(s) in DAS
2. PMML turns into custom function
3. Seamlessly score data in Datameer
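Step 2 above, "PMML turns into custom function", can be illustrated with a small sketch. It does not show how UPPI or Datameer implement this (the slides do not describe their internals). It only conveys the idea using the Python standard library. The model below is hypothetical: the field names `visits` and `spend`, the coefficients, and the helper `load_scoring_function` are all made up for illustration, and only a single `RegressionTable` with `NumericPredictor` terms is handled.

```python
import xml.etree.ElementTree as ET

NS = {"pmml": "http://www.dmg.org/PMML-4_1"}

# Hypothetical PMML 4.1 file content (invented fields and coefficients).
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header/>
  <DataDictionary>
    <DataField name="visits" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
    <DataField name="response" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="visits"/>
      <MiningField name="spend"/>
      <MiningField name="response" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.0">
      <NumericPredictor name="visits" coefficient="0.5"/>
      <NumericPredictor name="spend" coefficient="0.25"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def load_scoring_function(pmml_text):
    """Turn a simple PMML linear regression into a plain Python function."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    if table is None:
        raise ValueError("only simple RegressionModel documents are supported")
    intercept = float(table.get("intercept"))
    # Each term is (field name, coefficient, exponent); exponent defaults to 1.
    terms = [
        (p.get("name"), float(p.get("coefficient")), int(p.get("exponent", "1")))
        for p in table.findall("pmml:NumericPredictor", NS)
    ]

    def score(record):
        # intercept + sum of coefficient * value ** exponent
        return intercept + sum(c * record[name] ** e for name, c, e in terms)

    return score


score = load_scoring_function(PMML_DOC)
rows = [{"visits": 4, "spend": 8.0}, {"visits": 0, "spend": 2.0}]
# Append the prediction to each row, as a spreadsheet function would.
scored = [dict(row, response=score(row)) for row in rows]
for row in scored:
    print(row)
```

The model file is the only input: replacing it with a different PMML document changes the function without touching the scoring code, which is the separation of model development from deployment that the talk argues for.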

Page 8: Agile deployment predictive analytics on hadoop


PMML: Predictive Model Markup Language

[Diagram label: Transformations]

• PMML is an XML-based language used to define statistical and data mining models and to share these between compliant applications.
• Mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models.
• Supported by all leading data mining tools, commercial and open-source.
• Allows for the clear separation of tasks: model development vs. model deployment.
• Eliminates the need for custom code and proprietary model deployment solutions.
• Uniform deployment platform ensures scalability and reliability of model execution.
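To make the "XML-based language" point concrete, here is a minimal sketch of what a PMML file can look like and how a consuming application might inspect it. The model itself (a two-input linear regression named `campaign_response`, with the fields `visits` and `spend` and its coefficients) is invented for illustration, and `describe` is a hypothetical helper. The element and attribute names are those of the DMG PMML 4.1 schema, and only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML 4.1 document: made-up field names and coefficients.
PMML_DOC = """<?xml version="1.0"?>
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Illustrative example"/>
  <DataDictionary numberOfFields="3">
    <DataField name="visits" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
    <DataField name="response" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="campaign_response" functionName="regression">
    <MiningSchema>
      <MiningField name="visits"/>
      <MiningField name="spend"/>
      <MiningField name="response" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.0">
      <NumericPredictor name="visits" coefficient="0.5"/>
      <NumericPredictor name="spend" coefficient="0.25"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def describe(pmml_text):
    """Return the PMML version, the declared fields, and the models found."""
    root = ET.fromstring(pmml_text)
    fields = [
        (f.get("name"), f.get("dataType"))
        for f in root.findall("pmml:DataDictionary/pmml:DataField", NS)
    ]
    models = [
        (m.get("modelName"), m.get("functionName"))
        for m in root.findall("pmml:RegressionModel", NS)
    ]
    return {"version": root.get("version"), "fields": fields, "models": models}


summary = describe(PMML_DOC)
print(summary["version"])
print(summary["fields"])
print(summary["models"])
```

Because the inputs, the target, and the model parameters are all declared in the file, any compliant application can read the same document without knowing which tool produced it.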

PMML book available on Amazon.com

Page 9: Agile deployment predictive analytics on hadoop


PMML: Predictive Model Management
Integrating across all systems and processes

[Diagram labels: Applications (CRM, ERP, EXCEL, etc.); Business Process; PMML; IBM SmartCloud; Amazon EC2]

Page 10: Agile deployment predictive analytics on hadoop


PMML: One Standard, One Process

[Diagram labels: Service Providers; Divisions; External Vendors; Applications; PMML]

Page 11: Agile deployment predictive analytics on hadoop


Demo Setup

§ End-to-end “Model Development Lifecycle”
§ PMML Standard as the “Glue”

[Diagram labels: Universal PMML Plug-In; Model Design Development and Test; Model Deployment Data Analysis; Demonstrate Model Performance; Real-time Process Improvement and ROI; Understand Client’s Data; Build Model(s) to Unlock Hidden Value]

Page 12: Agile deployment predictive analytics on hadoop


Demo: Annual Marketing Campaign

§ Which customers should we target?
§ Split 2011 results into training and test sets
§ Learn model on training set
§ Apply model on test set
§ Fine-tune model until evaluation shows success
§ Apply final model on 2012 customer list

[Diagram labels: 2011 Campaign Results; Subset for Training; Subset for Testing; Prediction Model; Model Evaluation; Fine-Tuned Prediction Model; 2012 Customer List; Campaign Candidates]
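The demo workflow can be sketched in a few lines of Python. Everything here is a stand-in: the records are randomly generated rather than real campaign data, the single feature `spend` is hypothetical, and the "model" is a one-threshold rule instead of the models built in the demo with tools such as R or KNIME. The point is only the sequence: split, learn, evaluate, then apply.

```python
import random

random.seed(0)  # reproducible toy data


def make_customers(n):
    """Synthetic customers: higher spend makes a response more likely."""
    customers = []
    for _ in range(n):
        spend = random.uniform(0, 100)
        responded = random.random() < spend / 100
        customers.append({"spend": spend, "responded": responded})
    return customers


def accuracy(threshold, rows):
    """Share of rows where 'spend >= threshold' matches the real outcome."""
    hits = sum((r["spend"] >= threshold) == r["responded"] for r in rows)
    return hits / len(rows)


def learn_threshold(rows):
    """Pick the spend threshold with the best accuracy on the given rows."""
    candidates = range(0, 101, 5)
    return max(candidates, key=lambda t: accuracy(t, rows))


# Split the 2011 results into a training and a test set.
results_2011 = make_customers(1000)
random.shuffle(results_2011)
train, test = results_2011[:700], results_2011[700:]

# Learn on the training set, then evaluate on the held-out test set.
threshold = learn_threshold(train)
print("test accuracy:", round(accuracy(threshold, test), 3))

# Apply the final model to the 2012 list, where outcomes are not yet known.
customers_2012 = [{"spend": c["spend"]} for c in make_customers(200)]
candidates_2012 = [c for c in customers_2012 if c["spend"] >= threshold]
print("campaign candidates:", len(candidates_2012))
```

The fine-tuning loop from the slide is omitted. In practice the learn and evaluate steps would be repeated with different model settings until the test-set result is acceptable.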

Page 13: Agile deployment predictive analytics on hadoop


Summary

Avoid Vendor Lock-in
• Open Standards vs. Proprietary Code
• Best-of-Breed Tool Set

Hadoop-based Scoring Paradigm
• Minimize Data Movement
• Massively Parallel Execution
• Scale with Business Demand

Ease of Use, Fast ROI
• Leverage Datameer UI
• Deploy in Minutes vs. Months
• No Coding Skills Required
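The "massively parallel execution" and "minimize data movement" points rest on scoring being independent per record, so each partition of the data can be scored where it is stored. The following is a minimal sketch of that property only: `score` is a hypothetical stand-in for a deployed model, and in-memory lists stand in for Hadoop input splits. No Hadoop or Datameer API is used.

```python
def score(record):
    """Hypothetical stand-in for a deployed PMML model."""
    return 1.0 + 0.5 * record["visits"]


def score_partition(partition):
    """What a single map task would do: score only its own records."""
    return [dict(r, score=score(r)) for r in partition]


records = [{"id": i, "visits": i % 7} for i in range(10)]

# Split the data into partitions, as a cluster would split a large file.
partitions = [records[i:i + 4] for i in range(0, len(records), 4)]

# Each partition is scored on its own; no partition needs another's data.
scored = [row for part in partitions for row in score_partition(part)]

# Scoring partition by partition gives the same result as one big pass.
print(scored == score_partition(records))
```

Since no record depends on any other, adding partitions (and machines to process them) scales the work without changing the model or the scoring logic.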

Page 14: Agile deployment predictive analytics on hadoop


Online Resources

§ Learn More About PMML
  • Data Mining Group website: http://www.dmg.org
  • Join LinkedIn PMML Discussion Group: http://www.linkedin.com/groupRegistration?gid=2328634
  • Articles, on-line videos, blogs: http://www.zementis.com/community.htm

§ Product Info
  • On Demand Webinar: http://data.datameer.com/power-of-big-data-insights-of-predictive-analytics/
  • UPPI for Datameer: http://www.zementis.com/DAS-plugin.htm