demystifying machine learning

101
@louisdorard #papisconnect

Upload: louis-dorard

Post on 06-Aug-2015


TRANSCRIPT

Page 1: Demystifying Machine Learning

@louisdorard

#papisconnect

Page 2: Demystifying Machine Learning
Page 3: Demystifying Machine Learning

PredictiveAPIs

Page 4: Demystifying Machine Learning
Page 5: Demystifying Machine Learning

Student Researcher Data Scientist Developer Non-technical

GREAT TALKS ON “REAL-WORLD” MACHINE LEARNING IMPLEMENTATIONS FROM ALL OVER THE WORLD

ANDRÉS GONZALEZ, CLEVERTASK

32.5% 33.8%

Familiarity with Predictive

Page 6: Demystifying Machine Learning
Page 7: Demystifying Machine Learning
Page 8: Demystifying Machine Learning
Page 9: Demystifying Machine Learning

• Machine Learning

• Use cases

• Limitations

• Predictive APIs

• Does it work?

• Case study

• ML Canvas

Page 10: Demystifying Machine Learning

–Mike Gualtieri, Principal Analyst at Forrester

“Predictive apps are the next big thing in app development.”

Page 11: Demystifying Machine Learning

–Waqar Hasan, VISA

“Predictive is the ‘killer app’ for big data.”

Page 12: Demystifying Machine Learning

1. Machine Learning

2. Data

Page 13: Demystifying Machine Learning

BUT

Page 14: Demystifying Machine Learning

–McKinsey & Co. (2011)

“A significant constraint on realizing value from big data will be a shortage of talent, particularly of people with deep expertise in statistics and machine learning.”

Page 15: Demystifying Machine Learning

Demystifying Machine Learning

Page 16: Demystifying Machine Learning
Page 17: Demystifying Machine Learning

“Which type of email is this?

— Spam/Ham”

Page 18: Demystifying Machine Learning

“Which type of email is this?

— Spam/Ham”

⇒ Classification

Page 19: Demystifying Machine Learning

I

O

“Which type of email is this?

— Spam/Ham”

Page 20: Demystifying Machine Learning

??

Page 21: Demystifying Machine Learning
Page 22: Demystifying Machine Learning

“How much is this house worth?

— X $”

⇒ Regression

Page 23: Demystifying Machine Learning

Bedrooms  Bathrooms  Surface (foot²)  Year built  Type       Price ($)
3         1          860              1950        house      565,000
3         1          1012             1951        house
2         1.5        968              1976        townhouse  447,000
4                    1315             1950        house      648,000
3         2          1599             1964        house
3         2          987              1951        townhouse  790,000
1         1          530              2007        condo      122,000
4         2          1574             1964        house      835,000
4                                     2001        house      855,000
3         2.5        1472             2005        house
4         3.5        1714             2005        townhouse
2         2          1113             1999        condo
1                    769              1999        condo      315,000

Page 24: Demystifying Machine Learning


Page 25: Demystifying Machine Learning
Page 26: Demystifying Machine Learning

ML is a set of AI techniques where “intelligence” is built by referring to examples
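
One way to picture "intelligence built from examples" is a nearest-neighbour lookup: to price a new house, find the most similar priced house among the examples and reuse its price. A minimal sketch, assuming only bedrooms and surface are compared; the `distance` weighting is an arbitrary choice made for illustration, not something from the talk.

```python
# Priced examples taken from the house table: (bedrooms, surface in foot², price in $).
EXAMPLES = [
    (3, 860, 565_000),
    (2, 968, 447_000),
    (3, 987, 790_000),
    (1, 530, 122_000),
    (4, 1574, 835_000),
]

def distance(a, b):
    """Hypothetical similarity measure: one bedroom counts as much as 100 foot²."""
    return abs(a[0] - b[0]) * 100 + abs(a[1] - b[1])

def predict_price(bedrooms, surface):
    """Return the price of the most similar known example (1-nearest neighbour)."""
    nearest = min(EXAMPLES, key=lambda ex: distance((bedrooms, surface), ex))
    return nearest[2]

# Price one of the houses that has no price in the table (3 bedrooms, 1012 foot²).
print(predict_price(3, 1012))
```

With more examples and more columns the same idea gives better answers, which is why the amount and quality of examples matter.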

Page 27: Demystifying Machine Learning
Page 28: Demystifying Machine Learning

Use cases

Page 29: Demystifying Machine Learning

• Real-estate: property → price (Zillow)

• Spam: email → spam indicator (Gmail)

• Priority inbox: email → importance indicator (Gmail)

• Crowd prediction: location & context → #people (Tranquilien)

Page 30: Demystifying Machine Learning

I. Get more customers

• Reduce churn: customer → churn indicator

• Score leads: customer → revenue

• Optimize campaigns: customer & campaign → interest indicator

Page 31: Demystifying Machine Learning

II. Serve customers better

• Cross-sell: customer & product → purchase indicator

• Increase engagement: user & item → interest indicator

• Optimize pricing: product & price → #sales

Page 32: Demystifying Machine Learning

III. Serve customers more efficiently

• Predict demand: context → demand

• Automate tasks: credit application → repayment indicator

• Use predictive enterprise apps

Page 33: Demystifying Machine Learning

Predictive enterprise apps

• Priority filtering: message → priority indicator

• Message routing: request → employee

• Auto-configuration: user & actions → settings

RULES

Page 34: Demystifying Machine Learning

–Katherine Barr, Partner at VC-firm MDV

“Pairing human workers with machine learning and automation will transform knowledge work and unleash new levels of human productivity and creativity.”

Page 35: Demystifying Machine Learning

Limitations

Page 36: Demystifying Machine Learning
Page 37: Demystifying Machine Learning
Page 38: Demystifying Machine Learning
Page 39: Demystifying Machine Learning
Page 40: Demystifying Machine Learning

Need examples of inputs AND outputs

Page 41: Demystifying Machine Learning
Page 42: Demystifying Machine Learning

What if not enough data points?

Page 43: Demystifying Machine Learning
Page 44: Demystifying Machine Learning

What if similar inputs have dissimilar outputs?

Page 45: Demystifying Machine Learning
Page 46: Demystifying Machine Learning

Bedrooms  Bathrooms  Price ($)
3         2          500,000
3         2          800,000
1         1          300,000
1         1          800,000

Page 47: Demystifying Machine Learning

Bedrooms  Bathrooms  Surface (foot²)  Year built  Price ($)
3         2          800              1950        500,000
3         2          1000             1950        800,000
1         1          500              1950        300,000
1         1          500              2014        800,000
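
The two small tables can be compared programmatically: with only bedrooms and bathrooms, identical inputs map to different prices, while adding surface and year built makes every input distinct. A small sketch; the helper name `conflicting_inputs` is made up for illustration.

```python
from collections import defaultdict

def conflicting_inputs(rows):
    """Return inputs that appear with more than one distinct output.

    Each row is a tuple whose last element is the output (price).
    """
    outputs = defaultdict(set)
    for *features, price in rows:
        outputs[tuple(features)].add(price)
    return {inp: sorted(prices) for inp, prices in outputs.items() if len(prices) > 1}

two_features = [(3, 2, 500_000), (3, 2, 800_000), (1, 1, 300_000), (1, 1, 800_000)]
four_features = [
    (3, 2, 800, 1950, 500_000),
    (3, 2, 1000, 1950, 800_000),
    (1, 1, 500, 1950, 300_000),
    (1, 1, 500, 2014, 800_000),
]

print(conflicting_inputs(two_features))   # both inputs are ambiguous
print(conflicting_inputs(four_features))  # empty: every input is now unique
```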

Page 48: Demystifying Machine Learning

–@louisdorard

“A model can only be as good as the data it was given to train on”

Page 49: Demystifying Machine Learning

Predictive APIs: ML for all

Page 50: Demystifying Machine Learning
Page 51: Demystifying Machine Learning

HTML / CSS / JavaScript

Page 52: Demystifying Machine Learning

HTML / CSS / JavaScript

Page 53: Demystifying Machine Learning

squarespace.com

Page 54: Demystifying Machine Learning
Page 55: Demystifying Machine Learning
Page 56: Demystifying Machine Learning

The two phases of machine learning:

• TRAIN a model

• PREDICT with a model

Page 57: Demystifying Machine Learning

The two methods of predictive APIs:

• TRAIN a model

• PREDICT with a model

Page 58: Demystifying Machine Learning

The two methods of predictive APIs:

• model = create_model(dataset)

• predicted_output = create_prediction(model, new_input)

Page 59: Demystifying Machine Learning

The two methods of predictive APIs:

• model = create_model(‘training.csv’)

• predicted_output = create_prediction(model, new_input)
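
To make the two calls concrete, here is a toy, in-memory stand-in for a predictive API. The names `create_model` and `create_prediction` come from the slides; everything else (the CSV layout, taking the last column as the output, and the nearest-neighbour "model") is an assumption for illustration and not how any real service works.

```python
import csv
import io

def create_model(dataset):
    """TRAIN: read CSV rows (numeric inputs, output in the last column)."""
    rows = list(csv.reader(dataset))
    header, examples = rows[0], rows[1:]
    return {
        "inputs": header[:-1],
        # Outputs stay as strings so the same sketch works for class labels.
        "examples": [([float(v) for v in r[:-1]], r[-1]) for r in examples],
    }

def create_prediction(model, new_input):
    """PREDICT: return the output of the closest training example."""
    def squared_distance(example):
        features, _ = example
        return sum((a - b) ** 2 for a, b in zip(features, new_input))
    _, output = min(model["examples"], key=squared_distance)
    return output

# In-memory stand-in for 'training.csv' (values from the house table).
training_csv = io.StringIO(
    "bedrooms,surface,price\n"
    "3,860,565000\n"
    "1,530,122000\n"
    "4,1574,835000\n"
)
model = create_model(training_csv)
predicted_output = create_prediction(model, [1, 600])
print(predicted_output)
```

The point of the two-call shape is that the caller never sees the learning algorithm: swapping the nearest-neighbour lookup for something better would not change the calling code.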

Page 60: Demystifying Machine Learning
Page 61: Demystifying Machine Learning
Page 62: Demystifying Machine Learning

“Is this email important?

— Yes/No”

Page 63: Demystifying Machine Learning

“Is this customer going to leave next month?

— Yes/No”

Page 64: Demystifying Machine Learning

“What is the sentiment of this tweet?

— Positive/Neutral/Negative”

Page 65: Demystifying Machine Learning

The two phases of machine learning:

• TRAIN a model

• PREDICT with a model

Page 66: Demystifying Machine Learning

The two phases of machine learning:

• TRAIN a model

• PREDICT with an already existing model

Page 67: Demystifying Machine Learning

“Is this email spam?

— Yes/No”

Page 68: Demystifying Machine Learning

Does it work? How well

Page 69: Demystifying Machine Learning

Bedrooms  Bathrooms  Surface (foot²)  Year built  Type       Price ($)
3         1          860              1950        house      565,000
3         1          1012             1951        house
2         1.5        968              1976        townhouse  447,000
4                    1315             1950        house      648,000
3         2          1599             1964        house
3         2          987              1951        townhouse  790,000
1         1          530              2007        condo      122,000
4         2          1574             1964        house      835,000
4                                     2001        house      855,000
3         2.5        1472             2005        house
4         3.5        1714             2005        townhouse
2         2          1113             1999        condo
1                    769              1999        condo      315,000

Page 70: Demystifying Machine Learning

Bedrooms  Bathrooms  Surface (foot²)  Year built  Type       Price ($)
3         1          860              1950        house      565,000
2         1.5        968              1976        townhouse  447,000
4                    1315             1950        house      648,000
3         2          987              1951        townhouse  790,000
1         1          530              2007        condo      122,000
4         2          1574             1964        house      835,000
4                                     2001        house      855,000
1                    769              1999        condo      315,000
3         1          1012             1951        house
3         2          1599             1964        house
3         2.5        1472             2005        house
4         3.5        1714             2005        townhouse
2         2          1113             1999        condo

Page 71: Demystifying Machine Learning

Bedrooms  Bathrooms  Surface (foot²)  Year built  Type       Price ($)
3         1          860              1950        house      565,000
2         1.5        968              1976        townhouse  447,000
4                    1315             1950        house      648,000
3         2          987              1951        townhouse  790,000
1         1          530              2007        condo      122,000
4         2          1574             1964        house      835,000
4                                     2001        house      855,000
1                    769              1999        condo      315,000

Page 72: Demystifying Machine Learning

Bedrooms  Bathrooms  Surface (foot²)  Year built  Type       Price ($)
2         1.5        968              1976        townhouse  447,000
3         1          860              1950        house      565,000
1                    769              1999        condo      315,000
4                    1315             1950        house      648,000
4         2          1574             1964        house      835,000
3         2          987              1951        townhouse  790,000
4                                     2001        house      855,000
1         1          530              2007        condo      122,000

Page 73: Demystifying Machine Learning

Bedrooms  Bathrooms  Surface (foot²)  Year built  Type       Price ($)
4         2          1574             1964        house      835,000
3         2          987              1951        townhouse  790,000
4                                     2001        house      855,000
1         1          530              2007        condo      122,000

Bedrooms  Bathrooms  Surface (foot²)  Year built  Type       Price ($)
2         1.5        968              1976        townhouse  447,000
3         1          860              1950        house      565,000
1                    769              1999        condo      315,000
4                    1315             1950        house      648,000
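
The shuffle-and-split shown on these slides can be sketched with the standard library. The seed and the 50/50 ratio are assumptions chosen to mirror the two four-row tables, so the resulting order will differ from the slides. `None` marks a cell assumed to be empty in the table.

```python
import random

# The eight priced examples: (bedrooms, bathrooms, surface, year built, type, price).
priced = [
    (3, 1, 860, 1950, "house", 565_000),
    (2, 1.5, 968, 1976, "townhouse", 447_000),
    (4, None, 1315, 1950, "house", 648_000),
    (3, 2, 987, 1951, "townhouse", 790_000),
    (1, 1, 530, 2007, "condo", 122_000),
    (4, 2, 1574, 1964, "house", 835_000),
    (4, None, None, 2001, "house", 855_000),
    (1, None, 769, 1999, "condo", 315_000),
]

def train_test_split(rows, test_fraction=0.5, seed=0):
    """Shuffle a copy of the rows, then cut it into a training and a test part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - int(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(priced)
print(len(train), "training rows,", len(test), "test rows")
```

The model is trained on `train` only; the prices in `test` are held back and used afterwards to check its predictions.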

Page 74: Demystifying Machine Learning

Bedrooms  Bathrooms  Surface (foot²)  Year built  Type       Price ($)
2         1.5        968              1976        townhouse  447,000
3         1          860              1950        house      565,000
1                    769              1999        condo      315,000
4                    1315             1950        house      648,000

Bedrooms  Bathrooms  Surface (foot²)  Year built  Type       Price ($)  Price ($)
4         2          1574             1964        house      835,000    835,000
3         2          987              1951        townhouse  790,000    790,000
4                                     2001        house      855,000    855,000
1         1          530              2007        condo      122,000    122,000

Page 75: Demystifying Machine Learning

Bedrooms  Bathrooms  Surface (foot²)  Year built  Type       Price ($)
2         1.5        968              1976        townhouse  447,000
3         1          860              1950        house      565,000
1                    769              1999        condo      315,000
4                    1315             1950        house      648,000

Bedrooms  Bathrooms  Surface (foot²)  Year built  Type       Price ($)  Price ($)
4         2          1574             1964        house                 835,000
3         2          987              1951        townhouse             790,000
4                                     2001        house                 855,000
1         1          530              2007        condo                 122,000

Page 76: Demystifying Machine Learning


Page 77: Demystifying Machine Learning

Bedrooms  Bathrooms  Surface (foot²)  Year built  Type       Price ($)
2         1.5        968              1976        townhouse  447,000
3         1          860              1950        house      565,000
1                    769              1999        condo      315,000
4                    1315             1950        house      648,000

Bedrooms  Bathrooms  Surface (foot²)  Year built  Type       Price ($)  Price ($)
4         2          1574             1964        house      818,000    835,000
3         2          987              1951        townhouse  800,000    790,000
4                                     2001        house      915,000    855,000
1         1          530              2007        condo      100,000    122,000

Page 78: Demystifying Machine Learning

Price ($)  Price ($)
818,000    835,000
800,000    790,000
915,000    855,000
100,000    122,000
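
Comparing the two price columns numerically gives a single measure of how well the model does. The sketch below assumes the first column holds the model's predictions and the second the actual prices (the second column matches the prices in the original table), and uses mean absolute error and root mean squared error as example metrics; the slides do not name a metric at this point.

```python
import math

predicted = [818_000, 800_000, 915_000, 100_000]
actual = [835_000, 790_000, 855_000, 122_000]

def mean_absolute_error(pred, true):
    """Average size of the error, in dollars."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def root_mean_squared_error(pred, true):
    """Like MAE, but large errors weigh more because they are squared first."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

mae = mean_absolute_error(predicted, actual)
rmse = root_mean_squared_error(predicted, actual)
print(f"MAE:  {mae:,.0f} $")
print(f"RMSE: {rmse:,.0f} $")
```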

Page 79: Demystifying Machine Learning

Need real-time machine learning?

Page 80: Demystifying Machine Learning

The two phases of machine learning:

• TRAIN a model

• PREDICT with a model

Page 81: Demystifying Machine Learning

• Training time

• Prediction time

• Accuracy
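
Training time and prediction time can be measured separately, since they belong to different phases. A minimal sketch with the standard library; `train` and `predict` are hypothetical stand-ins (the "model" is just the mean price), not a real learning algorithm.

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Hypothetical stand-ins for a real TRAIN and PREDICT step.
def train(examples):
    return sum(price for _, price in examples) / len(examples)  # "model" = mean price

def predict(model, new_input):
    return model  # always answers with the mean price

examples = [(860, 565_000), (968, 447_000), (1315, 648_000)]
model, training_time = timed(train, examples)
prediction, prediction_time = timed(predict, model, 1012)
print(f"training: {training_time:.6f}s, prediction: {prediction_time:.6f}s")
```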

Page 82: Demystifying Machine Learning

Case study: churn analysis

Page 83: Demystifying Machine Learning

• Who: SaaS company selling monthly subscription

• Question asked: “Is this customer going to leave within 1 month?”

• Input: customer

• Output: no-churn (negative) or churn (positive)

• Data collection: history up until 1 month ago

• Baseline: if no usage for more than 15 days then churn
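
The baseline is simple enough to write as a one-line rule. A sketch, where the customer names and day counts are invented for illustration:

```python
def baseline_predicts_churn(days_since_last_use):
    """Baseline from the case study: no usage for more than 15 days means churn."""
    return days_since_last_use > 15

# Hypothetical customers: name -> days since they last used the service.
customers = {"alice": 2, "bob": 15, "carol": 40}
for name, days in customers.items():
    label = "churn" if baseline_predicts_churn(days) else "no-churn"
    print(name, label)
```

Any learned model has to beat this rule to be worth deploying.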

Page 84: Demystifying Machine Learning

Learning: OK

but

• How to represent customers?

• What to do after predicting churn?

Page 85: Demystifying Machine Learning

Customer representation:

• basic info (age, income, etc.)

• usage of service (# times used app, avg time spent, features used, etc.)

• interactions with customer support (how many, topics of questions, satisfaction ratings)

Page 86: Demystifying Machine Learning

Taking action to prevent churn:

• contact customers (in which order?)

• switch to different plan

• give special offer

• no action?

Page 87: Demystifying Machine Learning

Measuring accuracy:

• #TP (we predict customer churns and he does)

• #FP (we predict customer churns but he doesn’t)

• #FN (we predict customer doesn’t churn but he does)

• Compare to baseline
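
The three counts can be computed from a list of predictions and a list of what actually happened, once for the model and once for the baseline. A sketch with made-up values for six customers:

```python
def confusion_counts(predicted, actual):
    """Count TP, FP and FN, with True meaning "churn" (the positive class)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(not p and a for p, a in zip(predicted, actual))
    return tp, fp, fn

# Made-up predictions and outcomes for six customers.
model_pred = [True, True, False, False, True, False]
baseline_pred = [True, False, False, False, False, True]
actual = [True, False, False, True, True, False]

print("model   ", confusion_counts(model_pred, actual))
print("baseline", confusion_counts(baseline_pred, actual))
```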

Page 88: Demystifying Machine Learning

Estimating Return On Investment:

• Taking action for #TP and #FP customers has a cost

• We earn #TP * success rate * revenue /cust. /month

• Compare to baseline
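
The estimate above can be turned into a small calculation. This sketch assumes a flat cost per contacted customer, and all figures are invented for illustration:

```python
def churn_roi(tp, fp, success_rate, revenue_per_customer, cost_per_action):
    """Monthly gain minus the cost of acting on every customer predicted to churn."""
    gain = tp * success_rate * revenue_per_customer
    cost = (tp + fp) * cost_per_action  # assumption: same cost for each action taken
    return gain - cost

# Invented figures: counts for the model and for the baseline, same business numbers.
model_roi = churn_roi(tp=40, fp=10, success_rate=0.25,
                      revenue_per_customer=100, cost_per_action=5)
baseline_roi = churn_roi(tp=20, fp=30, success_rate=0.25,
                         revenue_per_customer=100, cost_per_action=5)
print(model_roi, baseline_roi)
```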

Page 89: Demystifying Machine Learning

Machine Learning Canvas

Page 90: Demystifying Machine Learning
Page 91: Demystifying Machine Learning

Machine Learning Canvas

PREDICTIONS OBJECTIVES DATA

Context

Who will use the predictive system / who will be affected by it? Provide some background.

Value Proposition

What are we trying to do? E.g. spend less time on X, increase Y...

Data Sources

Where do/can we get data from? (internal database, 3rd party API, etc.)

Problem

Question to predict answers to (in plain English)

Input (i.e. question "parameter")

Possible outputs (i.e. "answers")

Type of problem (e.g. classification, regression, recommendation...)

Baseline

What is an alternative way of making predictions (e.g. manual rules based on feature values)?

Performance evaluation

Domain-specific / bottom-line metrics for monitoring performance in production

Prediction accuracy metrics (e.g. MSE if regression; % accuracy, #FP for classification)

Offline performance evaluation method (e.g. cross-validation or simple training/test split)

Dataset

How do we collect data (inputs and outputs)? How many data points?

Features

Used to represent inputs and extracted from data sources above. Group by types and mention key features if too many to list all.

Using predictions

When do we make predictions and how many?

What is the time constraint for making those predictions?

How do we use predictions and confidence values?

Learning predictive models

When do we create/update models? With which data / how much?

What is the time constraint for creating a model?

Criteria for deploying model (e.g. minimum performance value: absolute, relative to baseline or to previous model)

IDEA

SPECS

DEPLOYMENT

Page 92: Demystifying Machine Learning

BACKGROUND

ENGINE SPECS

INTEGRATION

Page 93: Demystifying Machine Learning

PREDICTIONS OBJECTIVES DATA

BACKGROUND

ENGINE SPECS

INTEGRATION

Page 94: Demystifying Machine Learning

PREDICTIONS OBJECTIVES DATA

BACKGROUND End-user Value prop Sources

ENGINE SPECS ML problem Perf eval Preparation

INTEGRATION Using pred Learning model

Page 95: Demystifying Machine Learning

Why fill in ML canvas?

• Target the right problem for your company

• Choose right algorithm, infrastructure, or ML solution

• Guide project management

• Improve team communication

Page 96: Demystifying Machine Learning

machinelearningcanvas.com

Page 97: Demystifying Machine Learning

Recap

Page 98: Demystifying Machine Learning

• Need examples of inputs AND outputs

• Need enough examples

Page 99: Demystifying Machine Learning

• ML to create value from data

• 2 phases: TRAIN and PREDICT

• Predictive APIs make it more accessible

• Good data is essential

• What do we do with predictions?

• Measure performance with accuracy, time and bottom-line

• Also: deploy, maintain, improve…

Page 100: Demystifying Machine Learning
Page 101: Demystifying Machine Learning

louisdorard.com