Azure Machine Learning using R

Upload: herman-wu

Post on 21-Apr-2017

Category:

Data & Analytics


TRANSCRIPT

Page 1: Azure Machine Learning using R
Page 2: Azure Machine Learning using R

Introduction to Azure ML Studio

How to Build an ML App in 15 mins

Data Science Process: Data Ingress and Egress

Data Visualization, Transformation and Feature Selection

Model Building, Evaluating, and Tuning

Operationalizing a Predictive Model

Using R in ML Studio

Page 3: Azure Machine Learning using R

Which statement best matches your relationship to R?

- I’m completely new to R, but want to learn

- I’m learning R

- I’m an experienced R user for personal interests

- I’m an experienced R user for work

- I won’t be using R (but I’m interested in what it can do)

Page 4: Azure Machine Learning using R

Microsoft R Products

Microsoft R Open

• Free and open source R distribution

• Enhanced and distributed by Revolution Analytics

SQL Server R Services

• Built in Advanced Analytics and Stand Alone Server Capability

• Leverages the Benefits of SQL 2016 Enterprise Edition

Page 6: Azure Machine Learning using R

Microsoft R Server: Big-data analytics and distributed computing on Linux, Hadoop and Teradata

SQL Server 2016: Big-data analytics integrated with SQL Server database (coming soon)

Power BI: Computations and charts from R scripts in dashboards

Azure ML Studio: R scripts in cloud-based Experiment workflows

Visual Studio: R Tools for Visual Studio, an integrated development environment for R (coming soon)

HDInsight: R integrated with cloud-based Hadoop clusters

Cortana Analytics: Cloud-based R APIs and Virtual Machines

Page 8: Azure Machine Learning using R

CRAN, MRO, MRS Comparison

Feature | CRAN | Microsoft R Open | Microsoft R Server

Data size | In-memory | In-memory | In-memory or disk based

Speed of analysis | Single threaded | Multi-threaded | Multi-threaded, parallel processing 1:N servers

Support | Community | Community | Community + Commercial

Analytic breadth & depth | 7500+ innovative analytic packages | 7500+ innovative analytic packages | 7500+ innovative packages + commercial parallel high-speed functions

License | Open Source | Open Source | Commercial license. Supported release with indemnity

Page 9: Azure Machine Learning using R
Page 10: Azure Machine Learning using R

Microsoft Azure

Enterprise proven

Hybrid

Hyper-scale

Page 11: Azure Machine Learning using R

Azure regions

Page 12: Azure Machine Learning using R

Enterprise proven

Hybrid

Hyper-scale

Open & flexible

Applications

Infrastructure

Management

Databases & Middleware

App Frameworks

Linux

Page 13: Azure Machine Learning using R

Enterprise proven

Hybrid

Hyper-scale

Open & flexible

Trustworthy: More compliance certifications than any other cloud

Developer & IT productivity

Platform for SaaS extensibility

Page 14: Azure Machine Learning using R
Page 15: Azure Machine Learning using R

Vision Analytics

Recommendation engines

Advertising analysis

Weather forecasting for business planning

Social network analysis

Legal discovery and document archiving

Pricing analysis

Fraud detection

Churn analysis

Equipment monitoring

Location-based tracking and services

Personalized Insurance

Machine learning & predictive analytics are core capabilities that are needed throughout your business

Page 16: Azure Machine Learning using R
Page 17: Azure Machine Learning using R
Page 18: Azure Machine Learning using R

Provision Workspace: Get Azure Subscription, Create Workspace

Build ML Model: Get/Prepare Data, Build/Edit Experiment, Create/Update Model, Evaluate Model Results

Deploy as Web Service: Publish Web Service

Publish an App: Azure Data Marketplace

Page 19: Azure Machine Learning using R

• State-of-the-art ML algorithms: Classification, Regression, Recommenders, Clustering

Page 20: Azure Machine Learning using R

• Support of open source tools: R (and Python)

• Build a model visually with drag and drop

• Deploy as a web service with a few clicks

• Use the service by calling the web API

• Publish an app in Azure Data Marketplace

Page 21: Azure Machine Learning using R
Page 22: Azure Machine Learning using R

• Goal

• Data sources

Census Income Dataset, UCI Machine Learning Repository

• Dataset Description

http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.names

Page 23: Azure Machine Learning using R

age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country income

39 State-gov 77516 Bachelors 13 Never-married Adm-clerical Not-in-family White Male 2174 0 40 United-States <=50K

50 Self-emp-not-inc 83311 Bachelors 13 Married-civ-spouse Exec-managerial Husband White Male 0 0 13 United-States <=50K

38 Private 215646 HS-grad 9 Divorced Handlers-cleaners Not-in-family White Male 0 0 40 United-States <=50K

53 Private 234721 11th 7 Married-civ-spouse Handlers-cleaners Husband Black Male 0 0 40 United-States <=50K

28 Private 338409 Bachelors 13 Married-civ-spouse Prof-specialty Wife Black Female 0 0 40 Cuba <=50K

37 Private 284582 Masters 14 Married-civ-spouse Exec-managerial Wife White Female 0 0 40 United-States <=50K

49 Private 160187 9th 5 Married-spouse-absent Other-service Not-in-family Black Female 0 0 16 Jamaica <=50K

52 Self-emp-not-inc 209642 HS-grad 9 Married-civ-spouse Exec-managerial Husband White Male 0 0 45 United-States >50K

31 Private 45781 Masters 14 Never-married Prof-specialty Not-in-family White Female 14084 0 50 United-States >50K

42 Private 159449 Bachelors 13 Married-civ-spouse Exec-managerial Husband White Male 5178 0 40 United-States >50K

37 Private 280464 Some-college 10 Married-civ-spouse Exec-managerial Husband Black Male 0 0 80 United-States >50K

30 State-gov 141297 Bachelors 13 Married-civ-spouse Prof-specialty Husband Asian-Pac-Islander Male 0 0 40 India >50K

23 Private 122272 Bachelors 13 Never-married Adm-clerical Own-child White Female 0 0 30 United-States <=50K

32 Private 205019 Assoc-acdm 12 Never-married Sales Not-in-family Black Male 0 0 50 United-States <=50K

40 Private 121772 Assoc-voc 11 Married-civ-spouse Craft-repair Husband Asian-Pac-Islander Male 0 0 40 ? >50K

34 Private 245487 7th-8th 4 Married-civ-spouse Transport-moving Husband Amer-Indian-Eskimo Male 0 0 45 Mexico <=50K

25 Self-emp-not-inc 176756 HS-grad 9 Never-married Farming-fishing Own-child White Male 0 0 35 United-States <=50K

32 Private 186824 HS-grad 9 Never-married Machine-op-inspct Unmarried White Male 0 0 40 United-States <=50K

38 Private 28887 11th 7 Married-civ-spouse Sales Husband White Male 0 0 50 United-States <=50K

43 Self-emp-not-inc 292175 Masters 14 Divorced Exec-managerial Unmarried White Female 0 0 45 United-States >50K

40 Private 193524 Doctorate 16 Married-civ-spouse Prof-specialty Husband White Male 0 0 60 United-States >50K

54 Private 302146 HS-grad 9 Separated Other-service Unmarried Black Female 0 0 20 United-States <=50K

35 Federal-gov 76845 9th 5 Married-civ-spouse Farming-fishing Husband Black Male 0 0 40 United-States <=50K

43 Private 117037 11th 7 Married-civ-spouse Transport-moving Husband White Male 0 2042 40 United-States <=50K

59 Private 109015 HS-grad 9 Divorced Tech-support Unmarried White Female 0 0 40 United-States <=50K

56 Local-gov 216851 Bachelors 13 Married-civ-spouse Tech-support Husband White Male 0 0 40 United-States >50K

19 Private 168294 HS-grad 9 Never-married Craft-repair Own-child White Male 0 0 40 United-States <=50K

54 ? 180211 Some-college 10 Married-civ-spouse ? Husband Asian-Pac-Islander Male 0 0 60 South >50K

39 Private 367260 HS-grad 9 Divorced Exec-managerial Not-in-family White Male 0 0 80 United-States <=50K

Page 24: Azure Machine Learning using R
Page 25: Azure Machine Learning using R
Page 26: Azure Machine Learning using R

• Read from various data sources including:

• Write out the results:

Page 27: Azure Machine Learning using R

• Data sources

• Data sinks

Page 28: Azure Machine Learning using R

• Data storage

• Element Types

• Categorical aka “factors”

• Dense and sparse
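The "categorical aka factors" bullet can be illustrated in R itself. The snippet below is only a sketch: the `workclass` values are copied from the sample Adult rows earlier in the deck, and nothing here is specific to how ML Studio stores its datasets internally.

```r
# A categorical column is represented in R as a factor:
# a vector of integer codes plus a set of named levels.
workclass <- factor(c("State-gov", "Self-emp-not-inc", "Private", "Private"))

# The distinct categories (levels) and the integer codes behind them
print(levels(workclass))
print(as.integer(workclass))

# Counts per category
print(table(workclass))
```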

Page 29: Azure Machine Learning using R

Overview of Azure ML

Data Ingress and Egress

Page 30: Azure Machine Learning using R
Page 31: Azure Machine Learning using R

• Data Insights

• Data Scrubbing

• Data Transformation

• Feature Selection

Page 32: Azure Machine Learning using R

• Problem

• Goal

Predicting Flight Delays

• Data sources

1. Bureau of Transportation Statistics (BTS)

2. National Oceanic and Atmospheric Administration (NOAA)

FTP

Page 33: Azure Machine Learning using R

Year Month DayofMonth DayOfWeek Carrier OriginAirportID DestAirportID CRSDepTime DepDelay DepDel15 CRSArrTime ArrDelay ArrDel15 Cancelled

2013 7 16 2 DL 13487 14747 2155 -1 0 2336 5 0 0

2013 7 16 2 DL 12889 13487 1555 -6 0 2057 -7 0 0

2013 7 16 2 DL 11278 10397 1600 -5 0 1752 -19 0 0

2013 7 16 2 DL 13851 10397 600 -3 0 904 4 0 0

2013 7 16 2 DL 14747 12478 2330 49 1 736 40 1 0

2013 7 16 2 DL 12478 14747 1735 -4 0 2108 -41 0 0

2013 7 16 2 DL 15016 13487 1656 33 1 1831 17 1 0

2013 7 16 2 DL 11278 11433 659 -7 0 837 -28 0 0

2013 7 16 2 DL 10397 14107 805 -2 0 859 -25 0 0

2013 7 16 2 DL 14107 10397 1005 -6 0 1650 -8 0 0

2013 7 16 2 DL 12953 10397 700 112 1 930 108 1 0

2013 7 16 2 DL 11433 12953 1725 1914 1 1

2013 7 16 2 DL 13495 12892 720 917 1 1

2013 7 16 2 DL 12889 13487 720 0 0 1222 -7 0 0

2013 7 16 2 DL 13487 12889 1750 -9 0 1911 -26 0 0

2013 7 16 2 DL 12889 12892 620 -1 0 730 -5 0 0

2013 7 16 2 DL 12889 10397 715 -7 0 1415 -1 0 0

2013 7 16 2 DL 10397 12892 940 -2 0 1110 -11 0 0

2013 7 16 2 DL 15304 10397 1445 -5 0 1615 -5 0 0

2013 7 16 2 DL 14869 14747 2155 -5 0 2300 31 1 0

2013 7 16 2 DL 15304 12892 1930 -8 0 2125 -17 0 0

2013 7 16 2 DL 13487 12892 1135 2 0 1321 -17 0 0

2013 7 16 2 DL 12892 12173 1442 4 0 1723 2 0 0

2013 7 16 2 DL 11433 10529 720 -4 0 900 -9 0 0

2013 7 16 2 DL 10529 11433 941 -4 0 1129 -12 0 0

2013 7 16 2 DL 10397 14100 1117 -4 0 1320 -5 0 0

2013 7 16 2 DL 14100 10397 1415 -5 0 1616 -12 0 0

2013 7 16 2 DL 14869 13487 2016 6 0 2346 0 0 0

Sample flight data
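In the sample rows, DepDel15 is 1 exactly when DepDelay is 15 minutes or more, which makes it a natural binary label for the delay prediction problem. A minimal sketch of that relationship, using a handful of DepDelay/DepDel15 pairs copied from the table above (the data frame name `flights` is just for illustration):

```r
# A few (DepDelay, DepDel15) pairs taken from the sample flight rows
flights <- data.frame(
  Carrier  = "DL",
  DepDelay = c(-1, -6, 49, 33, 112, 6),
  DepDel15 = c(0, 0, 1, 1, 1, 0)
)

# Derive the binary label: delayed by 15 minutes or more
flights$Derived <- as.integer(flights$DepDelay >= 15)

# The derived label should agree with the column shipped in the data
print(all(flights$Derived == flights$DepDel15))
```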

Page 34: Azure Machine Learning using R

Year Month Day AirportID Time TimeZone SkyCondition Visibility WeatherType DryBulbFarenheit DryBulbCelsius WetBulbFarenheit WetBulbCelsius DewPointFarenheit DewPointCelsius RelativeHumidity

2013 7 1 14843 56 -4 FEW020 SCT035 10 78 25.6 75 23.6 73 22.8 85

2013 7 1 14843 156 -4 FEW035 10 78 25.6 74 23.2 72 22.2 82

2013 7 1 14843 256 -4 FEW050 10 78 25.6 75 23.6 73 22.8 85

2013 7 1 14843 356 -4 FEW055 SCT070 10 78 25.6 75 23.6 73 22.8 85

2013 7 1 14843 456 -4 FEW050 10 77 25 74 23.4 73 22.8 88

2013 7 1 14843 556 -4 FEW025 SCT065 10 78 25.6 75 23.6 73 22.8 85

2013 7 1 14843 656 -4 FEW025 SCT042 SCT065 10 78 25.6 75 24 74 23.3 88

2013 7 1 14843 756 -4 FEW034 SCT044 SCT065 10 81 27.2 77 24.8 75 23.9 82

2013 7 1 14843 856 -4 FEW034 SCT045 SCT070 9 85 29.4 77 25.1 74 23.3 70

2013 7 1 14843 956 -4 FEW022 SCT032 9 86 30 78 25.6 75 23.9 70

2013 7 1 14843 1056 -4 FEW025 SCT032 9 87 30.6 78 25.8 75 23.9 68

2013 7 1 14843 1156 -4 SCT025 SCT032CB BKN050 9 TS 84 28.9 77 25 74 23.3 72

2013 7 1 14843 1215 -4 SCT025 SCT038 BKN055 9 84 29 76 24.6 73 23 70

2013 7 1 14843 1256 -4 FEW025 SCT035 BKN075 7 =-RA 81 27.2 77 24.8 75 23.9 82

2013 7 1 14843 1356 -4 SCT028 SCT040 BKN080 9 83 28.3 76 24.4 73 22.8 72

2013 7 1 14843 1456 -4 FEW038 SCT060 BKN110 9 84 28.9 77 24.9 74 23.3 72

2013 7 1 14843 1556 -4 FEW036 SCT070 BKN110 9 83 28.3 76 24.4 73 22.8 72

2013 7 1 14843 1656 -4 FEW042 BKN100 9 83 28.3 77 24.8 74 23.3 74

2013 7 1 14843 1756 -4 FEW055 BKN100 9 83 28.3 77 24.8 74 23.3 74

2013 7 1 14843 1856 -4 FEW038 SCT075 BKN100 9 82 27.8 76 24.3 73 22.8 74

2013 7 1 14843 1956 -4 FEW055 10 81 27.2 75 24.1 73 22.8 77

2013 7 1 14843 2056 -4 FEW045 SCT065 10 81 27.2 75 24.1 73 22.8 77

2013 7 1 14843 2156 -4 FEW025 SCT055 10 80 26.7 75 23.9 73 22.8 79

2013 7 1 14843 2256 -4 FEW025 SCT055 10 80 26.7 76 24.3 74 23.3 82

2013 7 1 14843 2356 -4 FEW025 SCT060 10 79 26.1 76 24.1 74 23.3 85

2013 7 2 14843 56 -4 FEW022 BKN070 10 79 26.1 76 24.1 74 23.3 85

2013 7 2 14843 156 -4 FEW040 10 79 26.1 75 23.8 73 22.8 82

2013 7 2 14843 256 -4 FEW040 10 =-RA 78 25.6 75 23.6 73 22.8 85

2013 7 2 14843 356 -4 FEW022 SCT040 10 78 25.6 75 24 74 23.3 88

Sample weather data

Page 35: Azure Machine Learning using R

Data Visualization and Transformation

Page 36: Azure Machine Learning using R
Page 37: Azure Machine Learning using R

• Filter Based Feature Selection
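As a rough illustration of the idea behind filter-based feature selection, the sketch below scores each feature independently against the label and keeps the top-scoring ones. It uses absolute Pearson correlation as the score and synthetic data; the column names (`signal`, `weak`, `noise`) and the cutoff `top_k` are assumptions for the example, not settings from the demo experiment.

```r
set.seed(42)
n <- 200

# Synthetic features: one informative, one weakly informative, one pure noise
features <- data.frame(
  signal = rnorm(n),
  weak   = rnorm(n),
  noise  = rnorm(n)
)
label <- as.integer(features$signal + 0.3 * features$weak + rnorm(n, sd = 0.5) > 0)

# Score every feature on its own (the "filter" step)
scores <- sapply(features, function(x) abs(cor(x, label)))

# Keep the top_k highest-scoring features
top_k <- 2
selected <- names(sort(scores, decreasing = TRUE))[seq_len(top_k)]

print(round(scores, 3))
print(selected)
```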

Page 38: Azure Machine Learning using R
Page 39: Azure Machine Learning using R

• Data Partitioning

• Training a Binary Classifier

• Model Comparison

• Parameter Sweeping
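The four steps above can be sketched in plain R. This is not the ML Studio experiment from the demo: it uses synthetic data, logistic regression via `glm`, and a sweep over the decision threshold as a simple stand-in for a hyperparameter sweep. All variable names are hypothetical.

```r
set.seed(1)
n <- 400
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- as.integer(d$x1 + 0.5 * d$x2 + rnorm(n) > 0)

# Data partitioning: 70% training, 30% test
train_idx <- sample(n, size = round(0.7 * n))
train <- d[train_idx, ]
test  <- d[-train_idx, ]

# Accuracy of a fitted logistic model at a given decision threshold
accuracy <- function(model, data, threshold = 0.5) {
  p <- predict(model, newdata = data, type = "response")
  mean(as.integer(p > threshold) == data$y)
}

# Training binary classifiers and comparing them on the held-out test set
m_small <- glm(y ~ x1, data = train, family = binomial)
m_full  <- glm(y ~ x1 + x2, data = train, family = binomial)
acc <- c(small = accuracy(m_small, test), full = accuracy(m_full, test))

# Parameter sweeping: try several thresholds on the training data, keep the best
thresholds <- seq(0.1, 0.9, by = 0.1)
sweep_acc <- sapply(thresholds, function(t) accuracy(m_full, train, t))
best_threshold <- thresholds[which.max(sweep_acc)]

print(acc)
print(best_threshold)
```

In practice the sweep would be scored on a separate validation split or with cross-validation rather than on the training data.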

Page 40: Azure Machine Learning using R

Building a Classification Model

Page 41: Azure Machine Learning using R
Page 42: Azure Machine Learning using R

Model Evaluation and Parameter Sweeping

Page 43: Azure Machine Learning using R
Page 44: Azure Machine Learning using R
Page 45: Azure Machine Learning using R

Provision Workspace: Get Azure Subscription, Create Workspace

Build ML Model: Get/Prepare Data, Build/Edit Experiment, Create/Update Model, Evaluate Model Results

Publish as Web Service: Create Scoring Graph, Add Input/Output Ports, Publish Web Service, Deploy to Production

Page 46: Azure Machine Learning using R
Page 47: Azure Machine Learning using R

Operationalizing a Predictive Model

Page 48: Azure Machine Learning using R
Page 49: Azure Machine Learning using R
Page 50: Azure Machine Learning using R
Page 51: Azure Machine Learning using R
Page 52: Azure Machine Learning using R
Page 53: Azure Machine Learning using R

# Connect Adult dataset - visible as data.frame

# Clean up unfortunate variable names (R does not like ‘-’ in variable names)

# Plot some variable distributions by income
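The code under these three comments did not survive in the transcript. A minimal sketch of what such a script might look like follows. Assumptions: a few rows copied from the sample Adult table stand in for the connected dataset (inside an Execute R Script module the data frame would typically come from `maml.mapInputPort(1)`), and the choice of `age` for the plot is arbitrary.

```r
# Connect Adult dataset - visible as data.frame
# (stand-in rows; in ML Studio: adult <- maml.mapInputPort(1))
adult <- data.frame(
  age              = c(39, 50, 38, 52, 31),
  `education-num`  = c(13, 13, 9, 9, 14),
  `hours-per-week` = c(40, 13, 40, 45, 50),
  income           = c("<=50K", "<=50K", "<=50K", ">50K", ">50K"),
  check.names      = FALSE,
  stringsAsFactors = TRUE
)

# Clean up unfortunate variable names: replace '-' with '.'
names(adult) <- gsub("-", ".", names(adult), fixed = TRUE)

# Plot a variable distribution by income
boxplot(age ~ income, data = adult,
        main = "Age by income", xlab = "income", ylab = "age")

print(names(adult))
```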

Page 54: Azure Machine Learning using R
Page 55: Azure Machine Learning using R

Using R in Azure ML Studio

Page 56: Azure Machine Learning using R
Page 57: Azure Machine Learning using R