Azure Machine Learning Using R


Upload: herman-wu

Post on 16-Apr-2017


Introduction to Azure ML Studio

How to Build an ML App in 15 mins

Data Science Process: Data Ingress and Egress

Data Visualization, Transformation and Feature Selection

Model Building, Evaluating, and Tuning

Operationalizing a Predictive Model

Using R in ML Studio

Which statement best matches your relationship to R?

- I’m completely new to R, but want to learn
- I’m learning R
- I’m an experienced R user for personal interests
- I’m an experienced R user for work
- I won’t be using R (but I’m interested in what it can do)


Microsoft R Open

• Free and open source R distribution

• Enhanced and distributed by Revolution Analytics

SQL Server R Services

• Built-in advanced analytics and standalone server capability

• Leverages the benefits of SQL Server 2016 Enterprise Edition

Microsoft R Products

• Microsoft R Server: Big-data analytics and distributed computing on Linux, Hadoop and Teradata

• SQL Server 2016: Big-data analytics integrated with SQL Server database (coming soon)

• Power BI: Computations and charts from R scripts in dashboards

• Azure ML Studio: R scripts in cloud-based Experiment workflows

• Visual Studio: R Tools for Visual Studio, an integrated development environment for R (coming soon)

• HDInsight: R integrated with cloud-based Hadoop clusters

• Cortana Analytics: Cloud-based R APIs and Virtual Machines

CRAN, MRO, MRS Comparison

Data size
• CRAN: In-memory
• Microsoft R Open: In-memory
• Microsoft R Server: In-memory or disk based

Speed of analysis
• CRAN: Single threaded
• Microsoft R Open: Multi-threaded
• Microsoft R Server: Multi-threaded, parallel processing 1:N servers

Support
• CRAN: Community
• Microsoft R Open: Community
• Microsoft R Server: Community + Commercial

Analytic breadth & depth
• CRAN: 7500+ innovative analytic packages
• Microsoft R Open: 7500+ innovative analytic packages
• Microsoft R Server: 7500+ innovative packages + commercial parallel high-speed functions

License
• CRAN: Open Source
• Microsoft R Open: Open Source
• Microsoft R Server: Commercial license. Supported release with indemnity

Microsoft Azure: hyper-scale, enterprise proven, hybrid, open & flexible, trustworthy

• Hyper-scale: Azure regions

• Open & flexible: Applications, Infrastructure, Management, Databases & Middleware, App Frameworks, Linux

• Trustworthy: More compliance certifications than any other cloud

• Developer & IT productivity

• Platform for SaaS extensibility

Machine learning & predictive analytics are core capabilities that are needed throughout your business

• Vision analytics
• Recommendation engines
• Advertising analysis
• Weather forecasting for business planning
• Social network analysis
• Legal discovery and document archiving
• Pricing analysis
• Fraud detection
• Churn analysis
• Equipment monitoring
• Location-based tracking and services
• Personalized insurance

Provision Workspace
• Get Azure Subscription
• Create Workspace

Build ML Model
• Get/Prepare Data
• Build/Edit Experiment
• Create/Update Model
• Evaluate Model Results

Deploy as Web Service
• Publish Web Service
• Publish an App (Azure Data Marketplace)

Classification Regression Recommenders Clustering

• State-of-the-art ML algorithms

• Support for open source tools: R (and Python)

• Build a model visually with drag and drop

• Deploy as a web service with a few clicks

• Use the service by calling the web API

• Publish an app in Azure Data Marketplace

• Goal

• Data sources

Census Income Dataset, UCI Machine Learning Repository

• Dataset Description

http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.names

age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country income

39 State-gov 77516 Bachelors 13 Never-married Adm-clerical Not-in-family White Male 2174 0 40 United-States <=50K

50 Self-emp-not-inc 83311 Bachelors 13 Married-civ-spouse Exec-managerial Husband White Male 0 0 13 United-States <=50K

38 Private 215646 HS-grad 9 Divorced Handlers-cleaners Not-in-family White Male 0 0 40 United-States <=50K

53 Private 234721 11th 7 Married-civ-spouse Handlers-cleaners Husband Black Male 0 0 40 United-States <=50K

28 Private 338409 Bachelors 13 Married-civ-spouse Prof-specialty Wife Black Female 0 0 40 Cuba <=50K

37 Private 284582 Masters 14 Married-civ-spouse Exec-managerial Wife White Female 0 0 40 United-States <=50K

49 Private 160187 9th 5 Married-spouse-absent Other-service Not-in-family Black Female 0 0 16 Jamaica <=50K

52 Self-emp-not-inc 209642 HS-grad 9 Married-civ-spouse Exec-managerial Husband White Male 0 0 45 United-States >50K

31 Private 45781 Masters 14 Never-married Prof-specialty Not-in-family White Female 14084 0 50 United-States >50K

42 Private 159449 Bachelors 13 Married-civ-spouse Exec-managerial Husband White Male 5178 0 40 United-States >50K

37 Private 280464 Some-college 10 Married-civ-spouse Exec-managerial Husband Black Male 0 0 80 United-States >50K

30 State-gov 141297 Bachelors 13 Married-civ-spouse Prof-specialty Husband Asian-Pac-Islander Male 0 0 40 India >50K

23 Private 122272 Bachelors 13 Never-married Adm-clerical Own-child White Female 0 0 30 United-States <=50K

32 Private 205019 Assoc-acdm 12 Never-married Sales Not-in-family Black Male 0 0 50 United-States <=50K

40 Private 121772 Assoc-voc 11 Married-civ-spouse Craft-repair Husband Asian-Pac-Islander Male 0 0 40 ? >50K

34 Private 245487 7th-8th 4 Married-civ-spouse Transport-moving Husband Amer-Indian-Eskimo Male 0 0 45 Mexico <=50K

25 Self-emp-not-inc 176756 HS-grad 9 Never-married Farming-fishing Own-child White Male 0 0 35 United-States <=50K

32 Private 186824 HS-grad 9 Never-married Machine-op-inspct Unmarried White Male 0 0 40 United-States <=50K

38 Private 28887 11th 7 Married-civ-spouse Sales Husband White Male 0 0 50 United-States <=50K

43 Self-emp-not-inc 292175 Masters 14 Divorced Exec-managerial Unmarried White Female 0 0 45 United-States >50K

40 Private 193524 Doctorate 16 Married-civ-spouse Prof-specialty Husband White Male 0 0 60 United-States >50K

54 Private 302146 HS-grad 9 Separated Other-service Unmarried Black Female 0 0 20 United-States <=50K

35 Federal-gov 76845 9th 5 Married-civ-spouse Farming-fishing Husband Black Male 0 0 40 United-States <=50K

43 Private 117037 11th 7 Married-civ-spouse Transport-moving Husband White Male 0 2042 40 United-States <=50K

59 Private 109015 HS-grad 9 Divorced Tech-support Unmarried White Female 0 0 40 United-States <=50K

56 Local-gov 216851 Bachelors 13 Married-civ-spouse Tech-support Husband White Male 0 0 40 United-States >50K

19 Private 168294 HS-grad 9 Never-married Craft-repair Own-child White Male 0 0 40 United-States <=50K

54 ? 180211 Some-college 10 Married-civ-spouse ? Husband Asian-Pac-Islander Male 0 0 60 South >50K

39 Private 367260 HS-grad 9 Divorced Exec-managerial Not-in-family White Male 0 0 80 United-States <=50K
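The sample rows above can be loaded into an R data.frame for a first look. The following is only a minimal sketch: it parses four rows copied from the sample as whitespace-separated text, and it assumes that "?" marks a missing value. The variable names `adult_cols`, `sample_rows` and `adult` are illustrative, not part of the original deck.

```r
# Column names as listed in the header row of the sample.
adult_cols <- c("age", "workclass", "fnlwgt", "education", "education-num",
                "marital-status", "occupation", "relationship", "race", "sex",
                "capital-gain", "capital-loss", "hours-per-week",
                "native-country", "income")

# Four rows copied from the sample (whitespace-separated).
sample_rows <- "
39 State-gov 77516 Bachelors 13 Never-married Adm-clerical Not-in-family White Male 2174 0 40 United-States <=50K
52 Self-emp-not-inc 209642 HS-grad 9 Married-civ-spouse Exec-managerial Husband White Male 0 0 45 United-States >50K
40 Private 121772 Assoc-voc 11 Married-civ-spouse Craft-repair Husband Asian-Pac-Islander Male 0 0 40 ? >50K
54 ? 180211 Some-college 10 Married-civ-spouse ? Husband Asian-Pac-Islander Male 0 0 60 South >50K
"

# check.names = FALSE keeps the hyphenated names exactly as given;
# na.strings = "?" is an assumption about how missing values are coded.
adult <- read.table(text = sample_rows, col.names = adult_cols,
                    check.names = FALSE, na.strings = "?",
                    stringsAsFactors = TRUE)

str(adult)               # column types: integers and factors
colSums(is.na(adult))    # missing values per column
table(adult$income)      # class balance of the label
```

Keeping the hyphenated names at this stage mirrors the raw data; they have to be quoted with backticks (for example `adult$"hours-per-week"`) until they are cleaned up.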

• Read from various data sources including:

• Write out the results:

• Data sources

• Data sinks

• Data storage

• Element Types

• Categorical aka “factors”

• Dense and sparse
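As a small illustration of the "categorical aka factors" point, the sketch below turns a few workclass values from the census sample into an R factor. The variable name `workclass` is chosen here for illustration only.

```r
# A categorical column is represented in R as a factor:
# a set of distinct levels plus an integer code per row.
workclass <- factor(c("State-gov", "Self-emp-not-inc",
                      "Private", "Private", "Private"))

levels(workclass)       # the distinct categories
as.integer(workclass)   # the integer code stored for each row
table(workclass)        # counts per category
```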

Overview of Azure ML

Data Ingress and Egress

• Data Insights

• Data Scrubbing

• Data Transformation

• Feature Selection

• Problem

• Goal

Predicting Flight Delays

• Data sources

1. Bureau of Transportation Statistics (BTS)

2. National Oceanic and Atmospheric Administration (NOAA)

FTP

Year Month DayofMonth DayOfWeek Carrier OriginAirportID DestAirportID CRSDepTime DepDelay DepDel15 CRSArrTime ArrDelay ArrDel15 Cancelled

2013 7 16 2 DL 13487 14747 2155 -1 0 2336 5 0 0

2013 7 16 2 DL 12889 13487 1555 -6 0 2057 -7 0 0

2013 7 16 2 DL 11278 10397 1600 -5 0 1752 -19 0 0

2013 7 16 2 DL 13851 10397 600 -3 0 904 4 0 0

2013 7 16 2 DL 14747 12478 2330 49 1 736 40 1 0

2013 7 16 2 DL 12478 14747 1735 -4 0 2108 -41 0 0

2013 7 16 2 DL 15016 13487 1656 33 1 1831 17 1 0

2013 7 16 2 DL 11278 11433 659 -7 0 837 -28 0 0

2013 7 16 2 DL 10397 14107 805 -2 0 859 -25 0 0

2013 7 16 2 DL 14107 10397 1005 -6 0 1650 -8 0 0

2013 7 16 2 DL 12953 10397 700 112 1 930 108 1 0

2013 7 16 2 DL 11433 12953 1725 1914 1 1

2013 7 16 2 DL 13495 12892 720 917 1 1

2013 7 16 2 DL 12889 13487 720 0 0 1222 -7 0 0

2013 7 16 2 DL 13487 12889 1750 -9 0 1911 -26 0 0

2013 7 16 2 DL 12889 12892 620 -1 0 730 -5 0 0

2013 7 16 2 DL 12889 10397 715 -7 0 1415 -1 0 0

2013 7 16 2 DL 10397 12892 940 -2 0 1110 -11 0 0

2013 7 16 2 DL 15304 10397 1445 -5 0 1615 -5 0 0

2013 7 16 2 DL 14869 14747 2155 -5 0 2300 31 1 0

2013 7 16 2 DL 15304 12892 1930 -8 0 2125 -17 0 0

2013 7 16 2 DL 13487 12892 1135 2 0 1321 -17 0 0

2013 7 16 2 DL 12892 12173 1442 4 0 1723 2 0 0

2013 7 16 2 DL 11433 10529 720 -4 0 900 -9 0 0

2013 7 16 2 DL 10529 11433 941 -4 0 1129 -12 0 0

2013 7 16 2 DL 10397 14100 1117 -4 0 1320 -5 0 0

2013 7 16 2 DL 14100 10397 1415 -5 0 1616 -12 0 0

2013 7 16 2 DL 14869 13487 2016 6 0 2346 0 0 0

Sample flight data
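In the rows shown, DepDel15 and ArrDel15 behave like indicators for a delay of 15 minutes or more. The sketch below checks that reading on four rows copied from the sample and derives a scheduled departure hour, which is one plausible key for lining flights up with hourly weather records. The names `flights`, `ArrDel15_check` and `CRSDepHour` are illustrative assumptions, not columns of the original data.

```r
flight_cols <- c("Year", "Month", "DayofMonth", "DayOfWeek", "Carrier",
                 "OriginAirportID", "DestAirportID", "CRSDepTime", "DepDelay",
                 "DepDel15", "CRSArrTime", "ArrDelay", "ArrDel15", "Cancelled")

# Four complete rows copied from the sample flight data.
flight_rows <- "
2013 7 16 2 DL 13487 14747 2155 -1 0 2336 5 0 0
2013 7 16 2 DL 14747 12478 2330 49 1 736 40 1 0
2013 7 16 2 DL 15016 13487 1656 33 1 1831 17 1 0
2013 7 16 2 DL 14869 14747 2155 -5 0 2300 31 1 0
"
flights <- read.table(text = flight_rows, col.names = flight_cols)

# Recompute the arrival-delay label from the raw delay in minutes.
flights$ArrDel15_check <- as.integer(flights$ArrDelay >= 15)
all(flights$ArrDel15_check == flights$ArrDel15)

# Scheduled departure hour from the hhmm-style CRSDepTime value.
flights$CRSDepHour <- flights$CRSDepTime %/% 100
flights[, c("CRSDepTime", "CRSDepHour", "ArrDelay", "ArrDel15")]
```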

Year Month Day AirportID Time TimeZone SkyCondition Visibility WeatherType DryBulbFarenheit DryBulbCelsius WetBulbFarenheit WetBulbCelsius DewPointFarenheit DewPointCelsius RelativeHumidity

2013 7 1 14843 56 -4 FEW020 SCT035 10 78 25.6 75 23.6 73 22.8 85

2013 7 1 14843 156 -4 FEW035 10 78 25.6 74 23.2 72 22.2 82

2013 7 1 14843 256 -4 FEW050 10 78 25.6 75 23.6 73 22.8 85

2013 7 1 14843 356 -4 FEW055 SCT070 10 78 25.6 75 23.6 73 22.8 85

2013 7 1 14843 456 -4 FEW050 10 77 25 74 23.4 73 22.8 88

2013 7 1 14843 556 -4 FEW025 SCT065 10 78 25.6 75 23.6 73 22.8 85

2013 7 1 14843 656 -4 FEW025 SCT042 SCT065 10 78 25.6 75 24 74 23.3 88

2013 7 1 14843 756 -4 FEW034 SCT044 SCT065 10 81 27.2 77 24.8 75 23.9 82

2013 7 1 14843 856 -4 FEW034 SCT045 SCT070 9 85 29.4 77 25.1 74 23.3 70

2013 7 1 14843 956 -4 FEW022 SCT032 9 86 30 78 25.6 75 23.9 70

2013 7 1 14843 1056 -4 FEW025 SCT032 9 87 30.6 78 25.8 75 23.9 68

2013 7 1 14843 1156 -4 SCT025 SCT032CB BKN050 9 TS 84 28.9 77 25 74 23.3 72

2013 7 1 14843 1215 -4 SCT025 SCT038 BKN055 9 84 29 76 24.6 73 23 70

2013 7 1 14843 1256 -4 FEW025 SCT035 BKN075 7 =-RA 81 27.2 77 24.8 75 23.9 82

2013 7 1 14843 1356 -4 SCT028 SCT040 BKN080 9 83 28.3 76 24.4 73 22.8 72

2013 7 1 14843 1456 -4 FEW038 SCT060 BKN110 9 84 28.9 77 24.9 74 23.3 72

2013 7 1 14843 1556 -4 FEW036 SCT070 BKN110 9 83 28.3 76 24.4 73 22.8 72

2013 7 1 14843 1656 -4 FEW042 BKN100 9 83 28.3 77 24.8 74 23.3 74

2013 7 1 14843 1756 -4 FEW055 BKN100 9 83 28.3 77 24.8 74 23.3 74

2013 7 1 14843 1856 -4 FEW038 SCT075 BKN100 9 82 27.8 76 24.3 73 22.8 74

2013 7 1 14843 1956 -4 FEW055 10 81 27.2 75 24.1 73 22.8 77

2013 7 1 14843 2056 -4 FEW045 SCT065 10 81 27.2 75 24.1 73 22.8 77

2013 7 1 14843 2156 -4 FEW025 SCT055 10 80 26.7 75 23.9 73 22.8 79

2013 7 1 14843 2256 -4 FEW025 SCT055 10 80 26.7 76 24.3 74 23.3 82

2013 7 1 14843 2356 -4 FEW025 SCT060 10 79 26.1 76 24.1 74 23.3 85

2013 7 2 14843 56 -4 FEW022 BKN070 10 79 26.1 76 24.1 74 23.3 85

2013 7 2 14843 156 -4 FEW040 10 79 26.1 75 23.8 73 22.8 82

2013 7 2 14843 256 -4 FEW040 10 =-RA 78 25.6 75 23.6 73 22.8 85

2013 7 2 14843 356 -4 FEW022 SCT040 10 78 25.6 75 24 74 23.3 88

Sample weather data

Data Visualization and Transformation

• Filter Based Feature Selection

• Data Partitioning

• Training a Binary Classifier

• Model Comparison

• Parameter Sweeping

Building a Classification Model

Model Evaluation and Parameter Sweeping
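The partitioning, training and comparison steps listed above are done with visual modules in ML Studio. For readers who think in R, a rough base-R analogue might look like the sketch below. It is an assumption-laden stand-in: it uses R's built-in mtcars data with transmission type (am) as the binary label rather than the census or flight data, and plain logistic regression rather than any particular Studio algorithm.

```r
# Stand-in data: built-in mtcars, binary label am (0 = automatic, 1 = manual).
data(mtcars)
set.seed(42)

# Data partitioning: 70% of rows for training, the rest for testing.
train_idx <- sample(nrow(mtcars), size = round(0.7 * nrow(mtcars)))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Train two candidate binary classifiers (logistic regression).
# glm may warn about fitted probabilities of 0 or 1 on data this small.
model_a <- glm(am ~ wt, data = train, family = binomial)
model_b <- glm(am ~ wt + hp, data = train, family = binomial)

# Score held-out rows and compute accuracy at a 0.5 threshold.
accuracy <- function(model, data) {
  p <- predict(model, newdata = data, type = "response")
  mean((p > 0.5) == (data$am == 1))
}

# Model comparison on the test partition.
results <- c(wt_only = accuracy(model_a, test),
             wt_and_hp = accuracy(model_b, test))
print(results)
```

A parameter sweep follows the same pattern: fit one model per candidate setting and keep the setting with the best held-out score.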

Provision Workspace
• Get Azure Subscription
• Create Workspace

Build ML Model
• Get/Prepare Data
• Build/Edit Experiment
• Create/Update Model
• Evaluate Model Results

Publish as Web Service
• Create Scoring Graph
• Add Input/Output Ports
• Publish Web Service
• Deploy to Production

Operationalizing a Predictive Model

# Connect Adult dataset - visible as data.frame

# Clean up unfortunate variable names (R does not like ‘-’ in variable names)

# Plot some variable distributions by income

Using R in Azure ML Studio
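The three comment steps above could be filled in roughly as follows. This is a hedged sketch, not the presenter's script: inside an Execute R Script module the connected dataset would arrive through `maml.mapInputPort(1)`, but so that the example also runs outside Studio it builds a small stand-in data.frame from six rows of the census sample. The choice of boxplots is an assumption about what "plot some variable distributions" meant.

```r
# In ML Studio the connected Adult dataset is visible as a data.frame:
#   adult <- maml.mapInputPort(1)
# Stand-in for running outside Studio: six rows taken from the census sample.
adult <- data.frame(
  age = c(39, 50, 38, 52, 31, 42),
  `hours-per-week` = c(40, 13, 40, 45, 50, 40),
  `education-num` = c(13, 13, 9, 9, 14, 13),
  income = factor(c("<=50K", "<=50K", "<=50K", ">50K", ">50K", ">50K")),
  check.names = FALSE
)

# Clean up variable names: replace '-' with '.', which R accepts in names.
names(adult) <- gsub("-", ".", names(adult), fixed = TRUE)

# Plot some variable distributions by income.
boxplot(age ~ income, data = adult, main = "Age by income")
boxplot(hours.per.week ~ income, data = adult,
        main = "Hours per week by income")

# In ML Studio, hand the cleaned data.frame to the output port:
#   maml.mapOutputPort("adult")
```

Plots drawn in an Execute R Script module appear on the module's R Device output rather than in an interactive window.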