Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

ACM SIGKDD Explorations, Volume 11, Issue 1, July 2009

Upload: albany

Post on 13-Jan-2016


Page 1: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

ACM SIGKDD Explorations, Volume 11, Issue 1, July 2009

Presenter: 黃啟智 (Student ID: 69821503)

Page 2: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Outline

• Introduction
• Interoperability and Open Standards
• Putting Models to Work
• Performance
• Conclusion

Page 3: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Introduction

• Deployment and practical application of predictive models:
– Limited choice of options
– Models often take months to be integrated and deployed (time-consuming)
– Custom coding or proprietary processes are required (expensive)

• Open standards and Internet-based technologies are available to provide a more effective end-to-end solution for deployment.

Page 4: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Introduction

• SOA: Service-Oriented Architecture
– For the design of loosely coupled IT systems (e.g., based on Web Services)
• SaaS: Software-as-a-Service
– A license model
– Vendors deliver software solutions as a cost-effective service
• PMML: Predictive Model Markup Language
– An open standard that allows users to exchange predictive models among various software tools

Page 5: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Interoperability and Open Standards

• Cloud Computing

[Diagram. Labels: Cloud Computing (a computing architecture); SaaS, IaaS, PaaS; Web Services; SOA; SOAP, WSDL, UDDI, RPC, REST; annotations "(access)" and "(SOA-related standards)".]

Page 6: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Interoperability and Open Standards

• Cloud Computing
– Reduces cost and management overhead for IT
– A shift in the geography of computation
– The Internet as a platform
– A set of services that provide computing resources
– A variety of services: storage capacity, processing power, business applications…
– Cloud infrastructures: Amazon Web Services (AWS), Sector/Sphere, Hadoop…
– The OCC, Open Cloud Consortium (www.opencloudconsortium.org)

Page 7: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Interoperability and Open Standards

• Web Service
– W3C definition
– Provides the foundation of SOA
– Uses XML to encode and decode data
– Uses the SOAP (Simple Object Access Protocol) standard to transport data
– Data can be easily exchanged between different applications and platforms
– Can be described by a WSDL (Web Services Description Language) file
– UDDI (Universal Description, Discovery, and Integration): a platform-independent, XML-based registry for businesses to list themselves on the Internet

http://zh.wikipedia.org

Page 8: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Interoperability and Open Standards

• A SOAP request for a PMML file

(The file/model was previously uploaded to the service provider.)

[Figure: SOAP request listing, annotated as a JDM (Java Data Mining) call]
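A SOAP request of the kind this slide refers to can be sketched as follows. This is only an illustration, not the actual JDM or ADAPA message format: the service namespace `http://example.com/scoring`, the `scoreRecord` operation, and the `modelName` and `field` elements are hypothetical names chosen for the example. Only the SOAP 1.1 envelope namespace is standard; a real service publishes its operation names in its WSDL file.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # standard SOAP 1.1 envelope namespace
SVC_NS = "http://example.com/scoring"  # hypothetical service namespace (assumption)


def build_score_request(model_name: str, record: dict) -> bytes:
    """Build a SOAP 1.1 envelope asking a hypothetical service to score one record."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # Hypothetical operation and element names; a real service defines its own.
    request = ET.SubElement(body, f"{{{SVC_NS}}}scoreRecord")
    ET.SubElement(request, f"{{{SVC_NS}}}modelName").text = model_name
    for name, value in record.items():
        field = ET.SubElement(request, f"{{{SVC_NS}}}field", {"name": name})
        field.text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # The model name refers to a model assumed to be uploaded beforehand.
    payload = build_score_request("churn_model", {"age": 42, "income": 55000})
    print(payload.decode("utf-8"))
```

The request names the model instead of carrying it, which matches the note that the PMML file was uploaded to the service provider beforehand.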

Page 9: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Interoperability and Open Standards

• SaaS: Software as a Service
– A license model: users access software via the Internet (rather than buying and installing it)
– Users pay only for the right to use the software for a certain time period (e.g., NT$100 for an hour)
– No upfront costs for setting up servers or software
– Minimizes the risk of purchasing costly software that may not provide an adequate return on investment
– E.g., Salesforce.com, Google Apps

Page 10: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Interoperability and Open Standards

• PMML: Predictive Model Markup Language
– Developed by the Data Mining Group (www.dmg.org)
– An open standard for representing data mining models
– An XML-based language
– Can describe data preprocessing and predictive algorithms
– Can represent input data and data transformations

Page 11: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Interoperability and Open Standards

PMML Structure examples (a test data file)

[Figure: PMML listing, with annotations marking the required (active) data fields and the predicted data field]
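The distinction between active and predicted fields can be illustrated with a small, hand-written PMML fragment. This is a sketch, not the file shown on the slide: the field names and coefficients are invented, while the element and attribute names (`DataDictionary`, `DataField`, `MiningSchema`, `MiningField`, `usageType`) follow the PMML 4.0 schema as best understood here.

```python
import xml.etree.ElementTree as ET

# Illustrative PMML 4.0 document; field names and coefficients are made up.
PMML_DOC = """
<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header description="illustrative example"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age" usageType="active"/>
      <MiningField name="income" usageType="active"/>
      <MiningField name="spend" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="age" coefficient="0.5"/>
      <NumericPredictor name="income" coefficient="0.001"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

NS = {"pmml": "http://www.dmg.org/PMML-4_0"}


def fields_by_usage(pmml_text: str) -> dict:
    """Group the MiningField names of a PMML document by their usageType."""
    root = ET.fromstring(pmml_text)
    usage = {}
    for field in root.iterfind(".//pmml:MiningSchema/pmml:MiningField", NS):
        # usageType is treated as "active" when the attribute is omitted.
        usage.setdefault(field.get("usageType", "active"), []).append(field.get("name"))
    return usage


if __name__ == "__main__":
    print(fields_by_usage(PMML_DOC))
```

A scoring engine can use the same grouping to check that every active field is present in an incoming record before computing the predicted field.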

Page 12: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Interoperability and Open Standards

PMML Structure examples


Page 13: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Interoperability and Open Standards

PMML Structure examples

Array of counts of different field values under different class labels
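Counts of field values per class label are the information a Naïve Bayes classifier needs for scoring. The sketch below shows how such counts could be turned into class probabilities. It uses plain Python dictionaries instead of parsing PMML, the counts are illustrative (the familiar weather toy data), and the `threshold` parameter is an assumption modelled on the small default probability PMML allows in place of a zero count.

```python
# Illustrative counts, laid out as pair_counts[field][value][class_label].
PAIR_COUNTS = {
    "outlook": {
        "sunny": {"yes": 2, "no": 3},
        "overcast": {"yes": 4, "no": 0},
        "rainy": {"yes": 3, "no": 2},
    },
    "windy": {
        "true": {"yes": 3, "no": 3},
        "false": {"yes": 6, "no": 2},
    },
}
TARGET_COUNTS = {"yes": 9, "no": 5}  # number of training records per class label


def posterior(record, pair_counts, target_counts, threshold=1e-3):
    """Return normalized class probabilities for one record from raw counts."""
    total = sum(target_counts.values())
    scores = {}
    for label, label_count in target_counts.items():
        score = label_count / total  # prior P(class)
        for field, value in record.items():
            count = pair_counts[field][value].get(label, 0)
            likelihood = count / label_count  # P(value | class)
            # Substitute a small probability so a zero count does not wipe out the product.
            score *= likelihood if likelihood > 0 else threshold
        scores[label] = score
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}


if __name__ == "__main__":
    print(posterior({"outlook": "sunny", "windy": "true"}, PAIR_COUNTS, TARGET_COUNTS))
```

Because only counts are stored, the model file stays small and the scoring engine needs no access to the original training data.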


Page 14: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Interoperability and Open Standards

• PMML model specifics (parameters, architecture) are defined under different model elements, including:
– Neural Networks
– Support Vector Machines
– Regression Models
– Decision Trees
– Association Rules
– Clustering
– Sequences
– Naïve Bayes
– Text Models
– Rules

Page 15: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Interoperability and Open Standards

• PMML On-The-Go
– PMML 4.0: time series, boolean data types, model segmentation, lift/gain charts, expanded range of built-in functions…
– More applications support export and import functionality in PMML
– Open-source environments: KNIME (www.knime.org), the R project (www.R-project.org)

Page 16: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Putting Models to Work

• Amazon EC2
– Elastic Compute Cloud
– Powered by Amazon Web Services
• ADAPA scoring engine
– Uses JDM (Java Data Mining) Web Service calls, and therefore allows automatic decisions to be virtually embedded into enterprise systems and applications
– Available as a service to minimize total cost

Page 17: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Putting Models to Work

• Model Verification and Execution

Typical tasks in the life cycle of a data mining project:
– Building, deploying, testing and using data mining models

(A cross-platform and multi-vendor environment)

Page 18: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Putting Models to Work

• Model Verification and Execution
– Model testing/verification
  • Ensures that the scoring engine and the model development environment produce exactly the same results
  • A test file containing any number of records, each with all the necessary input variables and the expected result, can be uploaded for score matching
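Score matching against a test file can be sketched as follows. This is an illustration under stated assumptions, not ADAPA's actual procedure: `linear_model` is a stand-in for the deployed scoring engine, the `expected` column name is invented, and the small tolerance `tol` is a design choice added here to absorb floating-point rounding (the slide asks for exactly the same result).

```python
import csv
import io
import math


def linear_model(record):
    """Stand-in for a deployed scoring engine (hypothetical coefficients)."""
    return 10.0 + 0.5 * record["age"] + 0.001 * record["income"]


def verify(test_csv, score_fn, expected_column="expected", tol=1e-6):
    """Return (row_number, expected, actual) for every record whose score does not match."""
    mismatches = []
    for row_number, row in enumerate(csv.DictReader(test_csv), start=1):
        expected = float(row.pop(expected_column))
        inputs = {name: float(value) for name, value in row.items()}
        actual = score_fn(inputs)
        if not math.isclose(expected, actual, rel_tol=0.0, abs_tol=tol):
            mismatches.append((row_number, expected, actual))
    return mismatches


# Illustrative test file: input variables plus the expected result per record.
TEST_FILE = """age,income,expected
30,40000,65.0
50,20000,55.0
40,30000,61.0
"""

if __name__ == "__main__":
    for row_number, expected, actual in verify(io.StringIO(TEST_FILE), linear_model):
        print(f"record {row_number}: expected {expected}, got {actual}")
```

In this made-up file the third record is deliberately wrong, so it is the only one reported; an empty result would mean the engine reproduces every expected score.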

Page 19: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Putting Models to Work

• Model Verification and Execution
– Model execution
  • Batch mode: via the web console, by uploading a data file containing records (in CSV format or zipped)
  • Real-time mode: via web services, using embedded calls (SOAP requests)

[Diagram label: instance]
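Batch-mode handling of an uploaded data file, either plain CSV or zipped, can be sketched as follows. The function names and the rule of reading the first member of the archive are assumptions made for this example, and `score_fn` stands in for whatever model the engine has loaded.

```python
import csv
import io
import zipfile


def iter_records(payload: bytes):
    """Yield CSV rows as dicts from a plain CSV payload or a zip archive holding one CSV."""
    buffer = io.BytesIO(payload)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            first = archive.namelist()[0]  # assumption: the archive holds a single CSV file
            with archive.open(first) as member:
                text = io.TextIOWrapper(member, encoding="utf-8", newline="")
                yield from csv.DictReader(text)
    else:
        yield from csv.DictReader(io.StringIO(payload.decode("utf-8"), newline=""))


def score_batch(payload: bytes, score_fn):
    """Score every record in the uploaded file and return the scores in file order."""
    return [
        score_fn({name: float(value) for name, value in row.items()})
        for row in iter_records(payload)
    ]


if __name__ == "__main__":
    csv_text = "age,income\n30,40000\n50,20000\n"
    # Hypothetical model: a simple linear score.
    print(score_batch(csv_text.encode("utf-8"), lambda r: 10.0 + 0.5 * r["age"] + 0.001 * r["income"]))
```

Real-time mode would apply the same scoring function to a single record carried in a web-service request instead of iterating over a file.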

Page 20: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Putting Models to Work

• Demo: Excel add-in

Page 21: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Putting Models to Work

• Demo: Excel add-in

Page 22: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Putting Models to Work

• Security on the Cloud
– Uploading proprietary information to a 3rd-party service raises security and control questions
– The engine should not store any data
– An instance shares nothing with other instances
– An instance is private (via authentication)
– Access to an instance is only via HTTPS
– Models and data are deleted after an instance is terminated

Page 23: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Performance

Instance type reference: http://aws.amazon.com/ec2/

Page 24: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Performance


Page 25: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Conclusion

• Cloud computing
– It offers a powerful and revolutionary way to put data mining models to work.
• Open standard (PMML)
– It makes predictive models easily accessible from anywhere in the enterprise (via web-service calls or uploaded data files).
• The combination of both accelerates the deployment of predictive models and makes it more affordable.

Page 26: Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing

Questions

• Security (transmission via the Internet to a 3rd-party vendor) and privacy
• High dimensionality / large databases: transmission time + processing time
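The second question can be framed as a rough cost model. The decomposition into transmission and processing time comes from the slide; the approximation of transmission time is an assumption added here, with $n$ records, $d$ fields per record, $b$ bytes per field on average, and $B$ the available bandwidth in bytes per second.

```latex
T_{\text{total}} = T_{\text{transmit}} + T_{\text{process}},
\qquad
T_{\text{transmit}} \approx \frac{n \cdot d \cdot b}{B}
```

Under this assumption, transmission time grows linearly with both the number of records and the dimensionality, so for large, high-dimensional data sets it may dominate the time spent scoring in the cloud.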