Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing
ACM SIGKDD Explorations, Volume 11, Issue 1, July 2009
Presenter: 黃啟智, Student ID: 69821503
Outline
• Introduction
• Interoperability and Open Standards
• Putting Models to Work
• Performance
• Conclusion
Introduction
• Deployment and practical application of predictive models:
– Limited choice of options
– Often takes months for models to be integrated and deployed (time-consuming)
– Custom coding or proprietary processes (costly)
• Open standards and Internet-based technologies are available to provide a more effective end-to-end solution for deployment.
Introduction
• SOA: Service-Oriented Architecture
– For the design of loosely coupled IT systems (e.g., based on Web Services)
• SaaS: Software-as-a-Service
– A license model
– Vendors deliver software solutions as a cost-effective service
• PMML: Predictive Model Markup Language
– An open standard that allows users to exchange predictive models among various software tools
Interoperability and Open Standards
• Cloud Computing
[Diagram: Cloud Computing (a computing architecture) with SaaS, IaaS, and PaaS; Web Services; SOA. Labels: SOAP, WSDL, UDDI, RPC, REST, "(access)", "(SOA-related standards)"]
Interoperability and Open Standards
• Cloud Computing
– Reduces cost and management overhead for IT
– A shift in the geography of computation
– The Internet as a platform
– A set of services that provide computing resources
– A variety of services: storage capacity, processing power, business applications…
– Cloud infrastructures: Amazon Web Services (AWS), Sector/Sphere, Hadoop…
– The OCC, Open Cloud Consortium (www.opencloudconsortium.org)
Interoperability and Open Standards
• Web Service
– W3C definition
– Provides the foundation of SOA
– Uses XML to encode and decode data
– Uses the SOAP (Simple Object Access Protocol) standard to transport data
– Data can be easily exchanged between different applications and platforms
– Can be described by a WSDL (Web Services Description Language) file
– UDDI (Universal Description, Discovery, and Integration): a platform-independent, XML-based registry for businesses to list themselves on the Internet
Source: http://zh.wikipedia.org
Interoperability and Open Standards
• A SOAP request for a PMML file
(The file/model was previously uploaded to the service provider.)
[Figure: SOAP request, annotated "A JDM (Java Data Mining) call"]
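The request shown on this slide did not survive extraction. As a rough illustration only, the sketch below builds a SOAP 1.1 envelope that asks a scoring service to apply a previously uploaded model to one record. The SOAP envelope namespace is the standard one; the service namespace `urn:example:scoring`, the element names `score`, `modelName`, `record`, and `field`, and the model name are all hypothetical and are not taken from the paper, JDM, or any vendor API.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # standard SOAP 1.1 namespace
SVC_NS = "urn:example:scoring"  # hypothetical service namespace (assumption)


def build_scoring_request(model_name, record):
    """Return a SOAP envelope (as text) asking to score one record.

    The body layout is an illustrative assumption, not a real service schema.
    """
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}score")
    # The model is referenced by name because it was uploaded beforehand.
    ET.SubElement(call, f"{{{SVC_NS}}}modelName").text = model_name
    rec = ET.SubElement(call, f"{{{SVC_NS}}}record")
    for name, value in record.items():
        field = ET.SubElement(rec, f"{{{SVC_NS}}}field", name=name)
        field.text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_scoring_request("iris_tree", {"petal_length": 1.4}))
```

In a real deployment the body schema would come from the service's WSDL file rather than being written by hand.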
Interoperability and Open Standards
• SaaS: Software as a Service
– A license model: users access software via the Internet (rather than "buying and installing" it)
– Users pay only for the right to use it for a certain time period (e.g., NT$100 for an hour)
– No upfront costs for setting up servers or software
– Minimizes the risk of purchasing costly software that may not provide an adequate return on investment
– E.g., Salesforce.com, Google Apps
Interoperability and Open Standards
• PMML: Predictive Model Markup Language
– Developed by the Data Mining Group (www.dmg.org)
– An open standard for representing data mining models
– An XML-based language
– Can describe data preprocessing and predictive algorithms
– Can represent input data and data transformations
Interoperability and Open Standards
PMML structure examples (a test data file)
[Figure labels: required (active) data fields; predicted data field]
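The slide's PMML listing is not preserved in this transcript. As a stand-in, here is a minimal sketch of how the active (required input) and predicted fields can be read from a PMML 4.0 mining schema with Python's standard library. The sample document, including the field names `age`, `income`, and `churn`, is invented for illustration. The sketch relies on the PMML convention that a `MiningField` without a `usageType` attribute is treated as active.

```python
import xml.etree.ElementTree as ET

PMML_NS = {"p": "http://www.dmg.org/PMML-4_0"}  # PMML 4.0 namespace

# Hypothetical, minimal PMML document used only for this illustration.
SAMPLE_PMML = """<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header description="illustrative example"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="churn" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TreeModel functionName="classification">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income" usageType="active"/>
      <MiningField name="churn" usageType="predicted"/>
    </MiningSchema>
    <Node score="no"><True/></Node>
  </TreeModel>
</PMML>"""


def split_fields(pmml_text):
    """Return (active_fields, predicted_fields) from the mining schema."""
    root = ET.fromstring(pmml_text)
    active, predicted = [], []
    for field in root.iterfind(".//p:MiningSchema/p:MiningField", PMML_NS):
        usage = field.get("usageType", "active")  # missing attribute means active
        if usage == "active":
            active.append(field.get("name"))
        elif usage == "predicted":
            predicted.append(field.get("name"))
    return active, predicted


if __name__ == "__main__":
    print(split_fields(SAMPLE_PMML))
```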
Interoperability and Open Standards
PMML Structure examples
Interoperability and Open Standards
PMML Structure examples
Array of counts of different field values under different class labels
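To show why storing counts is enough for a Naïve Bayes model, the sketch below turns per-class counts of field values into class probabilities. It is an illustration under stated assumptions, not the slide's model: the counts and the field names `outlook` and `windy` are made up, and replacing a zero probability with a small `threshold` value is a simplification chosen here.

```python
def naive_bayes_posterior(class_counts, pair_counts, record, threshold=1e-3):
    """Compute P(class | record) from count tables.

    class_counts: {class_label: number of training records}
    pair_counts:  {field: {value: {class_label: count}}}
    threshold:    stand-in probability when a count is zero (assumption)
    """
    scores = {}
    for label, n_label in class_counts.items():
        score = float(n_label)  # proportional to the class prior
        for field, value in record.items():
            count = pair_counts[field].get(value, {}).get(label, 0)
            prob = count / n_label  # P(field = value | class)
            score *= prob if prob > 0 else threshold
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


# Hypothetical counts for illustration only.
CLASS_COUNTS = {"yes": 6, "no": 4}
PAIR_COUNTS = {
    "outlook": {"sunny": {"yes": 2, "no": 3}, "rainy": {"yes": 4, "no": 1}},
    "windy": {"true": {"yes": 3, "no": 3}, "false": {"yes": 3, "no": 1}},
}

if __name__ == "__main__":
    print(naive_bayes_posterior(CLASS_COUNTS, PAIR_COUNTS,
                                {"outlook": "sunny", "windy": "true"}))
```

For the record above, the unnormalized scores are 6 × (2/6) × (3/6) = 1.0 for "yes" and 4 × (3/4) × (3/4) = 2.25 for "no", so "no" receives probability 2.25 / 3.25.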
Interoperability and Open Standards
• PMML model specifics (parameters, architecture) are defined under different model elements, including:
– Neural Networks
– Support Vector Machines
– Regression Models
– Decision Trees
– Association Rules
– Clustering
– Sequences
– Naïve Bayes
– Text Models
– Rules
Interoperability and Open Standards
• PMML On-The-Go
– PMML 4.0: time series, boolean data types, model segmentation, lift/gain charts, expanded range of built-in functions…
– More applications support export and import functionality in PMML
– Open-source environments: KNIME (www.knime.org), the R project (www.R-project.org)
Putting Models to Work
• Amazon EC2
– Elastic Compute Cloud
– Powered by Amazon Web Services
• ADAPA scoring engine
– Uses JDM (Java Data Mining) Web Service calls and therefore
– allows automatic decisions to be virtually embedded into enterprise systems and applications
– Available as a service to minimize total cost
Putting Models to Work
• Model Verification and Execution
Typical tasks in the life cycle of a data mining project:
– Building, deploying, testing, and using data mining models
(A cross-platform and multi-vendor environment)
Putting Models to Work
• Model Verification and Execution
– Model testing/verification
• Ensures that the scoring engine and the model development environment produce exactly the same results
• A test file containing any number of records, each with all the necessary input variables and the expected result, can be uploaded for score matching
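Score matching as described above can be sketched in a few lines. This is a minimal illustration, not the ADAPA implementation: `score_fn` stands in for whatever scoring engine is under test, the `expected` column name is an assumption, and the small numeric `tolerance` is added here only to absorb floating-point formatting differences (the slide asks for exact agreement).

```python
import csv
import io


def verify_scores(test_csv_text, score_fn, expected_column="expected",
                  tolerance=1e-9):
    """Return a list of (row_number, expected, actual) for mismatching rows.

    Every column other than `expected_column` is treated as a numeric input.
    """
    mismatches = []
    reader = csv.DictReader(io.StringIO(test_csv_text))
    for row_number, row in enumerate(reader, start=1):
        expected = float(row.pop(expected_column))
        inputs = {name: float(value) for name, value in row.items()}
        actual = score_fn(inputs)
        if abs(actual - expected) > tolerance:
            mismatches.append((row_number, expected, actual))
    return mismatches


def toy_model(inputs):
    """Hypothetical scoring function used only for this illustration."""
    return 2 * inputs["x"] + inputs["y"]


if __name__ == "__main__":
    test_file = "x,y,expected\n1,2,4\n3,1,7\n0,0,1\n"
    print(verify_scores(test_file, toy_model))
```

An empty result means the engine reproduced every expected score; any entry pinpoints the record to investigate.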
Putting Models to Work
• Model Verification and Execution
– Model execution
• Batch mode: via the web console, uploading a data file containing records (in CSV format or zipped)
• Real-time mode: via web services, embedded calls (SOAP requests)
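Batch mode accepts either a plain CSV file or a zipped one. The sketch below shows one way a service could handle both kinds of upload before scoring each record. It is an assumption-laden illustration: the function names are hypothetical, the archive is assumed to hold a single UTF-8 CSV file, and `score_fn` stands in for the real scoring engine.

```python
import csv
import io
import zipfile


def read_records(payload):
    """Parse uploaded bytes (plain CSV or a zip holding one CSV) into dicts."""
    buffer = io.BytesIO(payload)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            first_name = archive.namelist()[0]  # assume a single CSV inside
            text = archive.read(first_name).decode("utf-8")
    else:
        text = payload.decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def score_batch(payload, score_fn):
    """Apply score_fn to every record in the uploaded file, in order."""
    return [score_fn(record) for record in read_records(payload)]


def add_fields(record):
    """Hypothetical scoring function used only for this illustration."""
    return float(record["x"]) + float(record["y"])


if __name__ == "__main__":
    print(score_batch(b"x,y\n1,2\n3,4\n", add_fields))
```

Real-time mode would instead wrap a single record in a web-service call and return one score per request.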
Putting Models to Work
• Demo: Excel add-in
Putting Models to Work
• Security on the Cloud
– Uploading proprietary information to a third-party service → security and control questions
– The engine should not store any data
– An instance shares nothing with other instances
– An instance is private (via authentication)
– Access to an instance only via HTTPS
– Models and data are deleted after an instance is terminated
Performance
Instance type reference: http://aws.amazon.com/ec2/
Performance
Conclusion
• Cloud computing
It offers a powerful and revolutionary way of putting data mining models to work.
• Open standard (PMML)
It allows predictive models to be easily accessed from anywhere in the enterprise (via web-service calls or uploaded data files).
• The combination of both accelerates the deployment of predictive models and makes it more affordable.
Questions
• Security (transmission via the Internet to a third-party vendor) and privacy
• High dimensionality / large databases: transmission time + processing time