Azure Machine Learning Using R
Introduction to Azure ML Studio
How to Build an ML App in 15 mins
Data Science Process: Data Ingress and Egress
Data Visualization, Transformation and Feature Selection
Model Building, Evaluating, and Tuning
Operationalizing a Predictive Model
Using R in ML Studio
Which statement best matches your relationship to R?
- I’m completely new to R, but want to learn
- I’m learning R
- I’m an experienced R user for personal interests
- I’m an experienced R user for work
- I won’t be using R (but I’m interested in what it can do)
Microsoft R Open
• Free and open source R distribution
• Enhanced and distributed by Revolution Analytics
SQL Server R Services
• Built-in advanced analytics and standalone server capability
• Leverages the benefits of SQL Server 2016 Enterprise Edition
Microsoft R Products
• Microsoft R Server: Big-data analytics and distributed computing on Linux, Hadoop and Teradata
• SQL Server 2016: Big-data analytics integrated with SQL Server database (coming soon)
• Power BI: Computations and charts from R scripts in dashboards
• Azure ML Studio: R scripts in cloud-based Experiment workflows
• Visual Studio: R Tools for Visual Studio, an integrated development environment for R (coming soon)
• HDInsight: R integrated with cloud-based Hadoop clusters
• Cortana Analytics: Cloud-based R APIs and Virtual Machines
CRAN, MRO, MRS Comparison
Feature | CRAN | Microsoft R Open (MRO) | Microsoft R Server (MRS)
Data size | In-memory | In-memory | In-memory or disk-based
Speed of analysis | Single-threaded | Multi-threaded | Multi-threaded, parallel processing across 1:N servers
Support | Community | Community | Community + commercial
Analytic breadth & depth | 7500+ innovative analytic packages | 7500+ innovative analytic packages | 7500+ innovative packages + commercial parallel high-speed functions
License | Open source | Open source | Commercial license; supported release with indemnity
• Enterprise proven
• Hybrid
• Hyper-scale
• Open & flexible: Applications, Infrastructure, Management, Databases & Middleware, App Frameworks, Linux
• Trustworthy: More compliance certifications than any other cloud
• Developer & IT productivity
• Platform for SaaS extensibility
Machine learning & predictive analytics are core capabilities that are needed throughout your business
• Vision analytics
• Recommendation engines
• Advertising analysis
• Weather forecasting for business planning
• Social network analysis
• Legal discovery and document archiving
• Pricing analysis
• Fraud detection
• Churn analysis
• Equipment monitoring
• Location-based tracking and services
• Personalized insurance
1. Provision Workspace: Get Azure Subscription, Create Workspace
2. Build ML Model: Get/Prepare Data, Build/Edit Experiment, Create/Update Model, Evaluate Model Results
3. Deploy as Web Service: Publish Web Service
4. Publish an App: Azure Data Marketplace
• Supports open source tools: R (and Python)
• Build a model visually with drag and drop
• Deploy it as a web service with a few clicks
• Use the service by calling the web API
• Publish an app in Azure Data Marketplace
• Goal
• Data sources
Census Income Dataset, UCI Machine Learning Repository
• Dataset Description
http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.names
age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country income
39 State-gov 77516 Bachelors 13 Never-married Adm-clerical Not-in-family White Male 2174 0 40 United-States <=50K
50 Self-emp-not-inc 83311 Bachelors 13 Married-civ-spouse Exec-managerial Husband White Male 0 0 13 United-States <=50K
38 Private 215646 HS-grad 9 Divorced Handlers-cleaners Not-in-family White Male 0 0 40 United-States <=50K
53 Private 234721 11th 7 Married-civ-spouse Handlers-cleaners Husband Black Male 0 0 40 United-States <=50K
28 Private 338409 Bachelors 13 Married-civ-spouse Prof-specialty Wife Black Female 0 0 40 Cuba <=50K
37 Private 284582 Masters 14 Married-civ-spouse Exec-managerial Wife White Female 0 0 40 United-States <=50K
49 Private 160187 9th 5 Married-spouse-absent Other-service Not-in-family Black Female 0 0 16 Jamaica <=50K
52 Self-emp-not-inc 209642 HS-grad 9 Married-civ-spouse Exec-managerial Husband White Male 0 0 45 United-States >50K
31 Private 45781 Masters 14 Never-married Prof-specialty Not-in-family White Female 14084 0 50 United-States >50K
42 Private 159449 Bachelors 13 Married-civ-spouse Exec-managerial Husband White Male 5178 0 40 United-States >50K
37 Private 280464 Some-college 10 Married-civ-spouse Exec-managerial Husband Black Male 0 0 80 United-States >50K
30 State-gov 141297 Bachelors 13 Married-civ-spouse Prof-specialty Husband Asian-Pac-Islander Male 0 0 40 India >50K
23 Private 122272 Bachelors 13 Never-married Adm-clerical Own-child White Female 0 0 30 United-States <=50K
32 Private 205019 Assoc-acdm 12 Never-married Sales Not-in-family Black Male 0 0 50 United-States <=50K
40 Private 121772 Assoc-voc 11 Married-civ-spouse Craft-repair Husband Asian-Pac-Islander Male 0 0 40 ? >50K
34 Private 245487 7th-8th 4 Married-civ-spouse Transport-moving Husband Amer-Indian-Eskimo Male 0 0 45 Mexico <=50K
25 Self-emp-not-inc 176756 HS-grad 9 Never-married Farming-fishing Own-child White Male 0 0 35 United-States <=50K
32 Private 186824 HS-grad 9 Never-married Machine-op-inspct Unmarried White Male 0 0 40 United-States <=50K
38 Private 28887 11th 7 Married-civ-spouse Sales Husband White Male 0 0 50 United-States <=50K
43 Self-emp-not-inc 292175 Masters 14 Divorced Exec-managerial Unmarried White Female 0 0 45 United-States >50K
40 Private 193524 Doctorate 16 Married-civ-spouse Prof-specialty Husband White Male 0 0 60 United-States >50K
54 Private 302146 HS-grad 9 Separated Other-service Unmarried Black Female 0 0 20 United-States <=50K
35 Federal-gov 76845 9th 5 Married-civ-spouse Farming-fishing Husband Black Male 0 0 40 United-States <=50K
43 Private 117037 11th 7 Married-civ-spouse Transport-moving Husband White Male 0 2042 40 United-States <=50K
59 Private 109015 HS-grad 9 Divorced Tech-support Unmarried White Female 0 0 40 United-States <=50K
56 Local-gov 216851 Bachelors 13 Married-civ-spouse Tech-support Husband White Male 0 0 40 United-States >50K
19 Private 168294 HS-grad 9 Never-married Craft-repair Own-child White Male 0 0 40 United-States <=50K
54 ? 180211 Some-college 10 Married-civ-spouse ? Husband Asian-Pac-Islander Male 0 0 60 South >50K
39 Private 367260 HS-grad 9 Divorced Exec-managerial Not-in-family White Male 0 0 80 United-States <=50K
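The rows above can be loaded into an R data frame along these lines. This is a minimal sketch, not the deck's own code: it assumes the raw UCI file is comma-separated with no header row and uses "?" for missing values, and it pastes two of the sample rows inline so it runs without a download. The column names are taken from the header shown above.

```r
# Column names as listed in the dataset header above.
cols <- c("age", "workclass", "fnlwgt", "education", "education-num",
          "marital-status", "occupation", "relationship", "race", "sex",
          "capital-gain", "capital-loss", "hours-per-week",
          "native-country", "income")

# Two sample rows in the assumed raw layout (comma-separated, no header).
raw <- "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K
40, Private, 121772, Assoc-voc, 11, Married-civ-spouse, Craft-repair, Husband, Asian-Pac-Islander, Male, 0, 0, 40, ?, >50K"

adult <- read.csv(
  text = raw,
  header = FALSE,
  col.names = cols,
  check.names = FALSE,      # keep the hyphenated names exactly as given
  strip.white = TRUE,       # drop the space that follows each comma
  na.strings = "?",         # treat "?" as a missing value
  stringsAsFactors = FALSE
)

str(adult)
```

With check.names = FALSE the hyphenated names survive, so a column such as native-country has to be accessed as adult[["native-country"]] until the names are cleaned up.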
• Problem
• Goal
Predicting Flight Delays
• Data sources
1. Bureau of Transportation Statistics (BTS)
2. National Oceanic and Atmospheric Administration (NOAA)
FTP
Year Month DayofMonth DayOfWeek Carrier OriginAirportID DestAirportID CRSDepTime DepDelay DepDel15 CRSArrTime ArrDelay ArrDel15 Cancelled
2013 7 16 2 DL 13487 14747 2155 -1 0 2336 5 0 0
2013 7 16 2 DL 12889 13487 1555 -6 0 2057 -7 0 0
2013 7 16 2 DL 11278 10397 1600 -5 0 1752 -19 0 0
2013 7 16 2 DL 13851 10397 600 -3 0 904 4 0 0
2013 7 16 2 DL 14747 12478 2330 49 1 736 40 1 0
2013 7 16 2 DL 12478 14747 1735 -4 0 2108 -41 0 0
2013 7 16 2 DL 15016 13487 1656 33 1 1831 17 1 0
2013 7 16 2 DL 11278 11433 659 -7 0 837 -28 0 0
2013 7 16 2 DL 10397 14107 805 -2 0 859 -25 0 0
2013 7 16 2 DL 14107 10397 1005 -6 0 1650 -8 0 0
2013 7 16 2 DL 12953 10397 700 112 1 930 108 1 0
2013 7 16 2 DL 11433 12953 1725 1914 1 1
2013 7 16 2 DL 13495 12892 720 917 1 1
2013 7 16 2 DL 12889 13487 720 0 0 1222 -7 0 0
2013 7 16 2 DL 13487 12889 1750 -9 0 1911 -26 0 0
2013 7 16 2 DL 12889 12892 620 -1 0 730 -5 0 0
2013 7 16 2 DL 12889 10397 715 -7 0 1415 -1 0 0
2013 7 16 2 DL 10397 12892 940 -2 0 1110 -11 0 0
2013 7 16 2 DL 15304 10397 1445 -5 0 1615 -5 0 0
2013 7 16 2 DL 14869 14747 2155 -5 0 2300 31 1 0
2013 7 16 2 DL 15304 12892 1930 -8 0 2125 -17 0 0
2013 7 16 2 DL 13487 12892 1135 2 0 1321 -17 0 0
2013 7 16 2 DL 12892 12173 1442 4 0 1723 2 0 0
2013 7 16 2 DL 11433 10529 720 -4 0 900 -9 0 0
2013 7 16 2 DL 10529 11433 941 -4 0 1129 -12 0 0
2013 7 16 2 DL 10397 14100 1117 -4 0 1320 -5 0 0
2013 7 16 2 DL 14100 10397 1415 -5 0 1616 -12 0 0
2013 7 16 2 DL 14869 13487 2016 6 0 2346 0 0 0
Sample flight data
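In the sample flight data, DepDel15 and ArrDel15 appear to flag delays of 15 minutes or more, which makes ArrDel15 a natural label for a delay classifier. A minimal R sketch of that derivation, using four rows copied from the sample; the threshold is inferred from the rows shown, not stated in the slides:

```r
# Four rows from the sample flight data (delay columns only).
flights <- data.frame(
  CRSDepTime = c(2155, 2330, 1656, 2016),
  DepDelay   = c(-1, 49, 33, 6),
  ArrDelay   = c(5, 40, 17, 0)
)

# Assumed rule: the indicator is 1 when the delay is at least 15 minutes.
flights$DepDel15 <- as.integer(flights$DepDelay >= 15)
flights$ArrDel15 <- as.integer(flights$ArrDelay >= 15)

print(flights)
```

The derived indicators match the DepDel15 and ArrDel15 values listed for these four rows in the sample.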
Year Month Day AirportID Time TimeZone SkyCondition Visibility WeatherType DryBulbFarenheit DryBulbCelsius WetBulbFarenheit WetBulbCelsius DewPointFarenheit DewPointCelsius RelativeHumidity
2013 7 1 14843 56 -4 FEW020 SCT035 10 78 25.6 75 23.6 73 22.8 85
2013 7 1 14843 156 -4 FEW035 10 78 25.6 74 23.2 72 22.2 82
2013 7 1 14843 256 -4 FEW050 10 78 25.6 75 23.6 73 22.8 85
2013 7 1 14843 356 -4 FEW055 SCT070 10 78 25.6 75 23.6 73 22.8 85
2013 7 1 14843 456 -4 FEW050 10 77 25 74 23.4 73 22.8 88
2013 7 1 14843 556 -4 FEW025 SCT065 10 78 25.6 75 23.6 73 22.8 85
2013 7 1 14843 656 -4 FEW025 SCT042 SCT065 10 78 25.6 75 24 74 23.3 88
2013 7 1 14843 756 -4 FEW034 SCT044 SCT065 10 81 27.2 77 24.8 75 23.9 82
2013 7 1 14843 856 -4 FEW034 SCT045 SCT070 9 85 29.4 77 25.1 74 23.3 70
2013 7 1 14843 956 -4 FEW022 SCT032 9 86 30 78 25.6 75 23.9 70
2013 7 1 14843 1056 -4 FEW025 SCT032 9 87 30.6 78 25.8 75 23.9 68
2013 7 1 14843 1156 -4 SCT025 SCT032CB BKN050 9 TS 84 28.9 77 25 74 23.3 72
2013 7 1 14843 1215 -4 SCT025 SCT038 BKN055 9 84 29 76 24.6 73 23 70
2013 7 1 14843 1256 -4 FEW025 SCT035 BKN075 7 =-RA 81 27.2 77 24.8 75 23.9 82
2013 7 1 14843 1356 -4 SCT028 SCT040 BKN080 9 83 28.3 76 24.4 73 22.8 72
2013 7 1 14843 1456 -4 FEW038 SCT060 BKN110 9 84 28.9 77 24.9 74 23.3 72
2013 7 1 14843 1556 -4 FEW036 SCT070 BKN110 9 83 28.3 76 24.4 73 22.8 72
2013 7 1 14843 1656 -4 FEW042 BKN100 9 83 28.3 77 24.8 74 23.3 74
2013 7 1 14843 1756 -4 FEW055 BKN100 9 83 28.3 77 24.8 74 23.3 74
2013 7 1 14843 1856 -4 FEW038 SCT075 BKN100 9 82 27.8 76 24.3 73 22.8 74
2013 7 1 14843 1956 -4 FEW055 10 81 27.2 75 24.1 73 22.8 77
2013 7 1 14843 2056 -4 FEW045 SCT065 10 81 27.2 75 24.1 73 22.8 77
2013 7 1 14843 2156 -4 FEW025 SCT055 10 80 26.7 75 23.9 73 22.8 79
2013 7 1 14843 2256 -4 FEW025 SCT055 10 80 26.7 76 24.3 74 23.3 82
2013 7 1 14843 2356 -4 FEW025 SCT060 10 79 26.1 76 24.1 74 23.3 85
2013 7 2 14843 56 -4 FEW022 BKN070 10 79 26.1 76 24.1 74 23.3 85
2013 7 2 14843 156 -4 FEW040 10 79 26.1 75 23.8 73 22.8 82
2013 7 2 14843 256 -4 FEW040 10 =-RA 78 25.6 75 23.6 73 22.8 85
2013 7 2 14843 356 -4 FEW022 SCT040 10 78 25.6 75 24 74 23.3 88
Sample weather data
1. Provision Workspace: Get Azure Subscription, Create Workspace
2. Build ML Model: Get/Prepare Data, Build/Edit Experiment, Create/Update Model, Evaluate Model Results
3. Publish as Web Service: Create Scoring Graph, Add Input/Output Ports, Publish Web Service, Deploy to Production
# Connect Adult dataset - visible as data.frame
# Clean up unfortunate variable names (R does not like ‘-’ in variable names)
# Plot some variable distributions by income
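The three comment lines above outline an R script whose code did not survive in the transcript. What follows is a minimal sketch of those steps, not the original script. It assumes a small inline sample in place of the connected dataset (inside an Execute R Script module the data frame would normally come from the module's input port, for example via maml.mapInputPort(1)), and the choice of variables to plot is illustrative.

```r
# Connect Adult dataset - here a six-row inline sample stands in for the
# module input (assumption: columns reduced to a few numeric ones + income).
adult <- read.table(
  text = "age education-num hours-per-week income
39 13 40 <=50K
50 13 13 <=50K
38 9 40 <=50K
52 9 45 >50K
31 14 50 >50K
42 13 40 >50K",
  header = TRUE, check.names = FALSE, stringsAsFactors = FALSE
)

# Clean up variable names: make.names() replaces '-' with '.'
names(adult) <- make.names(names(adult))

# Plot some variable distributions by income
adult$income <- factor(adult$income)
boxplot(age ~ income, data = adult,
        main = "Age by income", xlab = "income", ylab = "age")
boxplot(hours.per.week ~ income, data = adult,
        main = "Hours per week by income",
        xlab = "income", ylab = "hours per week")
```

After the cleanup step the columns can be used directly in formulas, as in hours.per.week ~ income, which is why the renaming comes before any plotting.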