29/10/2015
1
Data Platform Airlift
21 October \\ Microsoft Lisbon Experience
Machine Learning with R as a Service
Manuel Dias
Business Analytics Lead, Microsoft
Predictive Analytics
Sales and marketing
Finance and risk
Customer and channel
Operations and workforce
Utilities, Oil & Gas
Agent Allocation
Warehouse Efficiency
Smart buildings
Predictive Maintenance
Supply chain optimization
User Segmentation
Personalized Offers
Product Recommendation
Fraud Detection
Credit risk management
Sales Forecasting
Demand Forecasting
Sales Lead Scoring
Marketing mix optimization
Energy Forecasting
Grid Optimization
Theft Prevention
Predictive maintenance
Demand Response
Customer Profiling
Credit Scoring
Revenue Forecasting
Harvard Business Review, Thomas H. Davenport, October 2012
What is Machine Learning?
Predictive computing
systems become smarter
with experience
We want to learn a mapping from the input to the output; the correct
values are provided by a supervisor:
• Fraud Detection
• Image Recognition
• SPAM Filter
• Sales Forecast
We want to find regularities in the data. The class labels of the training
data are unknown.
• Customer Segmentation
• Movie Recommendation engine
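The contrast between the two settings can be sketched in a few lines of plain Python. This is only an illustration: the function names, the transaction amounts and the spend figures are made up for the example, and a one-dimensional nearest-neighbour rule and a two-cluster k-means stand in for whatever algorithm would be used in practice.

```python
# Supervised: every training example comes with a label from the "supervisor".
def nearest_neighbor_predict(train, query):
    """Return the label of the training example closest to `query`."""
    closest = min(train, key=lambda pair: abs(pair[0] - query))
    return closest[1]


# Unsupervised: no labels; look for regularities (here, two groups) in the values.
def two_means(values, iterations=10):
    """Minimal 1-D k-means with k=2; returns the two cluster centers."""
    centers = [min(values), max(values)]
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


# Hypothetical transaction amounts, labelled by a human reviewer.
labelled = [(12.0, "ok"), (15.0, "ok"), (950.0, "fraud"), (1020.0, "fraud")]
print(nearest_neighbor_predict(labelled, 980.0))  # -> fraud

# Hypothetical monthly spend per customer, with no labels at all.
spend = [10.0, 12.0, 14.0, 100.0, 104.0, 108.0]
print(two_means(spend))  # -> [12.0, 104.0]
```

In the first call the answer depends on labels someone supplied; in the second the two customer segments emerge from the data alone.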
Azure ML
Classification: Predict what class a case belongs to
Regression: Predict numerical outcomes
Clustering: Discover natural groupings of cases
Anomaly Detection: Identify outliers on the running data
Recommenders: Explore associations between cases
Supervised Learning
Unsupervised Learning
R and Python
Mathematical Programming
Online analytical processing
Graph analytics
Text analytics
Support Vector Machines
Boosted Decision Trees
Time series processing
In the future: Support for extensibility by enabling users to add their own algorithms as modules
Associative rule mining
Neural networks
Regression analysis
Clustering
Nearest-neighbor
Azure Machine Learning
DATA
HDInsight
SQL Server VM
SQL Database
Blobs & Tables
Desktop files
Excel spreadsheet
Other data files on PC
Azure Machine Learning
ML Studio
Azure Machine Learning
ML Marketplace
Devices & Applications
Publish API
Get Historical Data
Feature Engineering
Evaluate Model
Define Model
Score Model
Train/Test Split
Train Model
Iterate until the test metrics are satisfactory
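The steps listed above can be sketched end to end in plain Python. This is a minimal illustration, not the Azure ML Studio workflow itself: the data is synthetic, the "model" is a single cut-off on one feature, the order of the steps is assumed, and all function names are hypothetical. Feature engineering is skipped, and the iterate step would repeat the last three lines with other features or models until the test metric is acceptable.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into train and test parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def score(threshold, x):
    """Score one case: predict 1 when the feature reaches the threshold."""
    return 1 if x >= threshold else 0


def accuracy(rows, threshold):
    """Evaluate: fraction of rows whose predicted label matches the true one."""
    hits = sum(1 for x, y in rows if score(threshold, x) == y)
    return hits / len(rows)


def train_threshold(train):
    """'Train' the model: pick the cut-off with the best training accuracy."""
    candidates = sorted({x for x, _ in train})
    return max(candidates, key=lambda t: accuracy(train, t))


# Get historical data (synthetic here): hours worked per week -> high-income flag.
rng = random.Random(42)
history = []
for _ in range(200):
    hours = rng.uniform(10, 70)
    label = 1 if hours + rng.gauss(0, 5) > 40 else 0
    history.append((hours, label))

train, test = train_test_split(history)   # Train/Test Split
model = train_threshold(train)            # Define and Train Model
print(f"threshold={model:.1f} test accuracy={accuracy(test, model):.2f}")  # Score and Evaluate
```

The test rows are never seen during training, which is what makes the reported accuracy a fair estimate of how the model would behave on new data.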
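Confusion matrix (rows: actual class; columns: predicted class)
            Predicted 1                Predicted 0
Actual 1    506 TRUE POSITIVE (TP)     112 FALSE NEGATIVE (FN)
Actual 0    169 FALSE POSITIVE (FP)    420 TRUE NEGATIVE (TN)
$\text{Accuracy} = \frac{TP + TN}{P + N}$, where $P = TP + FN$ and $N = FP + TN$
Sometimes a better model may have lower Accuracy!
$\text{Precision} = \frac{TP}{TP + FP}$
How many of the returned documents are correct
$\text{Recall} = \frac{TP}{TP + FN} = \frac{TP}{P}$
How many of the actual positives are correctly identified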
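As a worked example, the three formulas can be applied to the counts in the confusion matrix above. The second half of the sketch uses a hypothetical, heavily imbalanced dataset (10 positives in 1000 cases, numbers invented for illustration) to show why a more useful model may have lower accuracy. The `metrics` helper is an assumption of this example, not part of any library.

```python
def metrics(tp, fn, fp, tn):
    """Accuracy, precision and recall from the four confusion-matrix counts."""
    p = tp + fn  # all actual positives
    n = fp + tn  # all actual negatives
    return {
        "accuracy": (tp + tn) / (p + n),
        # Precision is undefined when nothing is predicted positive; report 0.0.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / p if p else 0.0,
    }


# Counts from the slide.
slide = metrics(tp=506, fn=112, fp=169, tn=420)
print({k: round(v, 3) for k, v in slide.items()})
# -> {'accuracy': 0.767, 'precision': 0.75, 'recall': 0.819}

# Hypothetical imbalanced case: 10 positives among 1000 cases.
always_negative = metrics(tp=0, fn=10, fp=0, tn=990)   # never flags anything
useful_model = metrics(tp=8, fn=2, fp=30, tn=960)      # finds 8 of the 10 positives
print(always_negative["accuracy"], always_negative["recall"])  # -> 0.99 0.0
print(useful_model["accuracy"], useful_model["recall"])        # -> 0.968 0.8
```

The model that never predicts the positive class wins on accuracy but has zero recall, which is why precision and recall are reported alongside accuracy.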
Demo 1: Building a model to predict Population Income
A Language Platform…
A Community…
Tools & Resources
RStudio: http://www.rstudio.com/
Core R: http://cran.r-project.org/
R was ranked no. 1 in the KDnuggets 2014 poll on Top Languages for analytics, data mining, data science
Classification
Decision trees: rpart, party
Random forest: randomForest, party
SVM: e1071, kernlab
Neural networks: nnet, neuralnet, RSNNS
Performance evaluation: ROCR
Clustering
k-means: kmeans(), kmeansruns()
k-medoids: pam(), pamk()
Hierarchical clustering: hclust(), agnes(), diana()
DBSCAN: fpc
BIRCH: birch
Cluster validation: packages clv, clValid, NbClust
Association Rules
Association rules: apriori(), eclat() in package arules
Sequential patterns: arulesSequences
Visualisation of associations: arulesViz
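Association rule miners such as apriori rank candidate rules by support and confidence. The sketch below computes those two measures directly in plain Python on a handful of hypothetical shopping baskets. It is not the arules API, and the function names are chosen for this example only.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Estimate of P(rhs | lhs): support(lhs and rhs) / support(lhs)."""
    both = set(lhs) | set(rhs)
    return support(transactions, both) / support(transactions, lhs)


# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

# Rule {bread} => {butter}
print(support(baskets, {"bread", "butter"}))       # 2 of the 4 baskets
print(confidence(baskets, {"bread"}, {"butter"}))  # 2 of the 3 baskets with bread
```

A miner keeps only the rules whose support and confidence exceed user-chosen minimums, which is what keeps the search over item combinations tractable.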
Text Mining
Text mining: tm
Topic modelling: topicmodels, lda
Word cloud: wordcloud
Twitter data access: twitteR
Execute R Script
Create R Model
Demo 2: Building a model to predict Population Income
in R and deploying it in Azure ML
Predictive Analysis
Usage files
Data Sources
Ingest & Pre-processing
Data Preparation
(normalize, clean, etc)
Analyze & Score
(build Predictive Model)
Publish for
Consumption
Consume
Cloud Storage
batch
Processing
Engine
Machine Learning
Data Cleanup
Relational
DW/DM
BI Tools
- Volume per move is
typically ~100GB or
- Data is most commonly
collected nightly or
hourly
- Common Pre-processing
steps: scrub for
compliance purposes &
partition for long term
storage
- HDI & Customer code used in
this step as a
transformation/cleaning tool
- E.g. enrich, normalize
ADF: Move Data, Orchestrate, Schedule & Monitor
- Generate BI-Ready results (e.g.
dims or facts, aggregated big
data, etc)
- Create result set to drive app
or business process (e.g. list of
customers likely to churn next
month)
- In this scenario, the Relational DW/DM is used
as a queryable storage
system for information
workers and analysts to
connect their BI tools
to.
- BI Tools: Power BI,
Tableau, etc.
- Apps here means any
programmatic
consumption
Business Apps
Customer Info
Real time