VSSML16 L1. Introduction, Models, and Evaluations


Page 1: VSSML16 L1. Introduction, Models, and Evaluations

A Gentle Introduction to Machine Learning

#VSSML16

September 2016


Page 2: VSSML16 L1. Introduction, Models, and Evaluations

Outline

1 A Machine Learning Fable

2 Machine Learning: Some Intuition

3 School Overview

4 The Basic Idea: Create a Program From Data

5 Modeling

6 Do More With Your Model

7 Evaluation


Page 3: VSSML16 L1. Introduction, Models, and Evaluations

Outline

1 A Machine Learning Fable

2 Machine Learning: Some Intuition

3 School Overview

4 The Basic Idea: Create a Program From Data

5 Modeling

6 Do More With Your Model

7 Evaluation


Page 4: VSSML16 L1. Introduction, Models, and Evaluations

A Churn Problem

• You’re the CEO of a mobile phone service (congratulations!)

• Some percent of your customers leave every month (churn)

• You want to reach out to the ones you think are going to leave next month (but you can’t call everyone)

• And who might they be?


Page 5: VSSML16 L1. Introduction, Models, and Evaluations

Begin with the End In Mind

• Currently, you use a simple targeting strategy designed by hand, which identifies the 10,000 customers most likely to churn

• For every five people you call, two are considering leaving (false positive rate of 60%)

• On these customers, your operators can convince half to stay

• Each customer saved is worth $500

• What if you could reduce that false positive rate to 40%?
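To make that concrete, here is a minimal arithmetic sketch using the slide’s own figures (10,000 calls, a 50% save rate, $500 per saved customer; “false positive rate” here means the fraction of calls that go to non-churners):

    # Rough value of calling 10,000 customers under two false positive rates.
    calls = 10_000
    save_rate = 0.5        # operators convince half of the true churners to stay
    value_per_save = 500   # dollars per saved customer

    def campaign_value(false_positive_rate):
        true_churners_reached = calls * (1 - false_positive_rate)
        return true_churners_reached * save_rate * value_per_save

    print(campaign_value(0.6))   # current strategy:  2,000 saves -> $1,000,000
    print(campaign_value(0.4))   # improved strategy: 3,000 saves -> $1,500,000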


Page 6: VSSML16 L1. Introduction, Models, and Evaluations

You Have The Data!

Luckily, you’ve been tracking customer outcomes for months, along with a number of relevant customer attributes.

Minutes Used | Last Month Bill | Support Calls | Website Visits | Churn?
104          | $103.60         | 0             | 0              | No
124          | $56.33          | 1             | 0              | No
56           | $214.60         | 2             | 0              | Yes
2410         | $305.60         | 0             | 5              | No
536          | $145.70         | 0             | 0              | No
234          | $122.09         | 0             | 1              | No
201          | $185.76         | 1             | 7              | Yes
111          | $83.60          | 3             | 2              | No


Page 7: VSSML16 L1. Introduction, Models, and Evaluations

Now . . . magic!

• Can we use the data to create a better targeting strategy, with virtually no other work? (Spoiler: YES)

• Can we use the data to evaluate that strategy, confirming a better false positive rate? (Spoiler: YES)

• How? (Spoiler: MACHINE LEARNING)


Page 8: VSSML16 L1. Introduction, Models, and Evaluations

Outline

1 A Machine Learning Fable

2 Machine Learning: Some Intuition

3 School Overview

4 The Basic Idea: Create a Program From Data

5 Modeling

6 Do More With Your Model

7 Evaluation


Page 9: VSSML16 L1. Introduction, Models, and Evaluations

In a Nutshell

• Collect row-column data about past instances of your prediction problem (e.g., previous months of customer data)

• Feed this data to a machine learning algorithm

• The algorithm creates a program (alternatively, a “predictive model” or “classifier”) that makes predictions on future data


Page 10: VSSML16 L1. Introduction, Models, and Evaluations

Traditional Expert System: Expert and Programmer

• Machine learning breaks the traditional paradigm of expert systems

• Experts know how the prediction is made

• Programmers know how to translate the expert’s knowledge into a computer program that makes the prediction

• Constructing a system requires both (though they may be the same person)


Page 11: VSSML16 L1. Introduction, Models, and Evaluations

Using Machine Learning: Data and Algorithm

• With machine learning, data replaces the expert
  – Data is often more accurate than “expertise”
    (Article: “Specialist Knowledge is Useless and Unhelpful” [1])
  – Requires no human expertise except how to collect the data
  – The data may already be there

• Algorithms replace programmers
  – Algorithms are faster than programmers, enabling iteration
  – Algorithms are more modular than programmers
  – Data offers the opportunity for performance measurement (which you would be doing in any case, right?)

[1] https://www.newscientist.com/article/mg21628930-400-specialist-knowledge-is-useless-and-unhelpful/


Page 12: VSSML16 L1. Introduction, Models, and Evaluations

Outline

1 A Machine Learning Fable

2 Machine Learning: Some Intuition

3 School Overview

4 The Basic Idea: Create a Program From Data

5 Modeling

6 Do More With Your Model

7 Evaluation


Page 13: VSSML16 L1. Introduction, Models, and Evaluations

Today

• Morning
  – Introduction, Models, and Evaluations
  – Ensembles and Logistic Regressions

• Afternoon
  – Clusters and Anomaly Detection
  – Association Discovery and Latent Dirichlet Allocation


Page 14: VSSML16 L1. Introduction, Models, and Evaluations

Tomorrow

• Morning
  – Basic Transformations
  – Feature Engineering

• Afternoon
  – REST API, Bindings, and Basic Workflows
  – Advanced Workflows: Algorithms as Workflows


Page 15: VSSML16 L1. Introduction, Models, and Evaluations

Outline

1 A Machine Learning Fable

2 Machine Learning: Some Intuition

3 School Overview

4 The Basic Idea: Create a Program From Data

5 Modeling

6 Do More With Your Model

7 Evaluation


Page 16: VSSML16 L1. Introduction, Models, and Evaluations

First: The Data

In the simplest and most numerous cases, machine learning begins with row-column data.

Minutes Used | Last Month Bill | Support Calls | Website Visits | Churn?
104          | $103.60         | 0             | 0              | No
124          | $56.33          | 1             | 0              | No
56           | $214.60         | 2             | 0              | Yes
2410         | $305.60         | 0             | 5              | No
536          | $145.70         | 0             | 0              | No
234          | $122.09         | 0             | 1              | No
201          | $185.76         | 1             | 7              | Yes
111          | $83.60          | 3             | 2              | No


Page 17: VSSML16 L1. Introduction, Models, and Evaluations

Rows: What is the Prediction About?

• Each row (or instance) in the data represents the thing about which you are making a prediction

• Churn prediction: Each row is a customer

• Medical diagnosis: Each row is a patient

• Credit card fraud detection: Each row is a transaction

• Market closing price forecast: Each row is a day


Page 18: VSSML16 L1. Introduction, Models, and Evaluations

Columns: What Information Will Help Us Predict?

• Each column (or field, or feature) represents a piece of information about the thing in each row
  – Churn prediction: minutes used, calls to support
  – Medical diagnosis: BMI, temperature, test results
  – Credit card fraud detection: amount, transaction type, geographical information (maybe difference from previous transactions)
  – Market closing price forecast: opening price, previous day’s open/close, five-day trend

• How much and what type of information should you collect?
  – There’s no general answer here; it’s “domain-specific”
  – Question one: How much does it matter (as a function of model accuracy)?
  – Question two: How much does it cost to collect?


Page 19: VSSML16 L1. Introduction, Models, and Evaluations

Objective: What are we Trying to Predict?

• One of the columns is special: The “objective” is the column in the data that contains the information we want to predict

• Churn prediction: Customer did or did not churn

• Medical diagnosis: Positive or negative for some disease

• Credit card fraud detection: Transaction is or is not fraudulent

• Market closing price forecast: The closing price on the given day


Page 20: VSSML16 L1. Introduction, Models, and Evaluations

The Goal: Creating a Prediction Machine

• Feed this data to a machine learning algorithm

• The algorithm constructs a program (or model, or classifier):
  – Inputs: A single row of data, excluding the objective
  – Outputs: A prediction of the objective for that row

• Most importantly, the input row need not come from the training data


Page 21: VSSML16 L1. Introduction, Models, and Evaluations

When is this a Good Idea?

• Remember: Machine learning replaces expert and programmer with data and algorithm

• Is it hard to find an expert?

• Can the programmer encode the knowledge (quickly enough, accurately enough)?


Page 22: VSSML16 L1. Introduction, Models, and Evaluations

People Can’t Tell You How They Do It

• Optical character recognition (label a written character given its image)

• Object detection (image does or does not contain an object)

• Speech recognition (label a sentence given a spoken sound file)


Page 23: VSSML16 L1. Introduction, Models, and Evaluations

Human Experts are Rare or Expensive

• Medical diagnosis, MRI reading, etc. (predict bio-abnormalities from tests)

• Autonomous helicopter operation

• Game playing (at extremely high levels)


Page 24: VSSML16 L1. Introduction, Models, and Evaluations

Everyone Needs Their Own Algorithm

• Spam detection (predict spam/not spam from e-mail text)

• Activity recognition (predict user activity from mobile sensors)

• Location prediction (predict future user location given current state)

• Anything that’s “personalized”, and generates enough data (mobiles!)

• Aside: IoT has the potential to be huge for machine learning


Page 25: VSSML16 L1. Introduction, Models, and Evaluations

Every Little Bit Counts

• Market activity modeling (buy/sell at a given time)

• Recognition tasks that are “easy” (handwriting detection)

• Churn prediction?

• Any time there’s a performance-critical hand-built system


Page 26: VSSML16 L1. Introduction, Models, and Evaluations

When is this a Bad idea?

• Human experts are relatively cheap and easy to come by
  – Natural language processing for stock traders
  – Automatic defect detection in printer operation

• A program is easily written by hand
  – Some simple if-then rules often perform well
  – The best program is the one that’s already there

• The data is difficult or expensive to acquire
  – Medical domains
  – Automatic processing of images


Page 27: VSSML16 L1. Introduction, Models, and Evaluations

Outline

1 A Machine Learning Fable

2 Machine Learning: Some Intuition

3 School Overview

4 The Basic Idea: Create a Program From Data

5 Modeling

6 Do More With Your Model

7 Evaluation


Page 28: VSSML16 L1. Introduction, Models, and Evaluations

Searching in Hypothesis Space

• Machine learning algorithms don’t exactly create a program

• More accurately, they search for a program consistent with the training data
  – Search must be very clever
  – Usually there is some sort of “Occam’s razor” involved

• The space of possible solutions in which the algorithm searches is called its hypothesis space

• The algorithm is trying to do “reverse science”!


Page 29: VSSML16 L1. Introduction, Models, and Evaluations

Some Hypothesis Spaces

• Logistic regression: Sets of all possible feature coefficients and a bias term

• Neural network: Weights for all nodes in the network

• Support vector machines: Coefficients on each training point

• Note: Parametric vs. non-parametric


Page 30: VSSML16 L1. Introduction, Models, and Evaluations

A Simple Hypothesis Space

Consider thresholds on a single variable, and predict the “majority class” of the resulting subsets
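A rough sketch of this hypothesis space on toy data shaped like the table above (not BigML’s implementation; just the idea of one threshold plus majority-class predictions):

    from collections import Counter

    # Toy rows: (minutes, last_bill, support_calls, website_visits, churn)
    rows = [
        (104, 103.60, 0, 0, "No"),  (124, 56.33, 1, 0, "No"),
        (56, 214.60, 2, 0, "Yes"),  (2410, 305.60, 0, 5, "No"),
        (536, 145.70, 0, 0, "No"),  (234, 122.09, 0, 1, "No"),
        (201, 185.76, 1, 7, "Yes"), (111, 83.60, 3, 2, "No"),
    ]

    def majority(labels):
        """Most common class in a (non-empty) list of labels."""
        return Counter(labels).most_common(1)[0][0]

    def threshold_model(rows, field, threshold):
        """Predict the majority class on each side of a single threshold."""
        above = [r[-1] for r in rows if r[field] > threshold]
        below = [r[-1] for r in rows if r[field] <= threshold]
        return majority(above), majority(below)

    # "Last Bill > $180" (field 1): churners mostly fall on the "above" side
    print(threshold_model(rows, field=1, threshold=180))   # ('Yes', 'No')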


Page 31: VSSML16 L1. Introduction, Models, and Evaluations

A Simple Hypothesis Space

Visits to Website > 0


Page 32: VSSML16 L1. Introduction, Models, and Evaluations

A Simple Hypothesis Space

Minutes > 200


Page 33: VSSML16 L1. Introduction, Models, and Evaluations

A Simple Hypothesis Space

Last Bill > $180


Page 34: VSSML16 L1. Introduction, Models, and Evaluations

Good Question

• Subset purity

• Large difference between the prior and posterior objective distributions

• How to find the best question? Try them all!
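A minimal sketch of “try them all”, scoring each candidate split by how pure the resulting subsets are (Gini impurity is used here as one common purity measure; the slides don’t commit to a particular one):

    from collections import Counter

    def gini(labels):
        """Gini impurity: 0.0 for a perfectly pure subset."""
        n = len(labels)
        if n == 0:
            return 0.0
        return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

    def best_split(rows, n_fields):
        """Try every field and every observed value as a threshold,
        keeping the split with the lowest size-weighted impurity."""
        labels = [r[-1] for r in rows]
        best = (None, None, gini(labels))      # (field, threshold, score)
        for field in range(n_fields):
            for threshold in sorted({r[field] for r in rows}):
                left = [r[-1] for r in rows if r[field] <= threshold]
                right = [r[-1] for r in rows if r[field] > threshold]
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
                if score < best[2]:
                    best = (field, threshold, score)
        return best

On the toy rows from the earlier sketch, best_split(rows, 4) picks a threshold on the bill column that separates the two churners from most of the rest, essentially the “Last Bill > $180” split the slides illustrate.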


Page 35: VSSML16 L1. Introduction, Models, and Evaluations

Expanding the Space

There’s no reason to stop at just one question:

Last Bill > $180


Page 36: VSSML16 L1. Introduction, Models, and Evaluations

Expanding the Space

There’s no reason to stop at just one question:

Last Bill > $180 AND Support Calls > 0


Page 37: VSSML16 L1. Introduction, Models, and Evaluations

The Decision Tree

• Why stop at two?

• Suggests a recursive algorithm:
  – Choose a threshold
  – Partition the data into subsets
  – Choose a threshold for the subsets, and so on

• This is a decision tree
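A sketch of that recursion, reusing the majority() and best_split() helpers from the sketches above (a real implementation would also apply the stopping criteria discussed two slides later):

    def build_tree(rows, n_fields, min_size=2):
        """Recursively split until a subset is pure, too small, or unsplittable."""
        labels = [r[-1] for r in rows]
        if len(set(labels)) == 1 or len(rows) <= min_size:
            return {"predict": majority(labels)}            # leaf
        field, threshold, _ = best_split(rows, n_fields)
        if field is None:                                    # no split improved purity
            return {"predict": majority(labels)}
        left = [r for r in rows if r[field] <= threshold]
        right = [r for r in rows if r[field] > threshold]
        return {"field": field, "threshold": threshold,
                "left": build_tree(left, n_fields, min_size),
                "right": build_tree(right, n_fields, min_size)}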


Page 38: VSSML16 L1. Introduction, Models, and Evaluations

Visualizing a Decision Tree

• Each node in the tree represents a threshold: a question asked of the data

• Branches leading from the node indicate the two or more paths data could take

• Thickness of the branches indicates the amount of training data that took that path

• At the leaf or terminal node, we decide on a prediction; usually the majority class for the subset at that leaf


Page 39: VSSML16 L1. Introduction, Models, and Evaluations

When to Stop?

• How do we know when to stop splitting the data?

• Several possible criteria:
  – When the subset is totally pure (Note: subsets of size 1 are trivially pure)
  – When the size reaches a predetermined minimum
  – When the number of nodes or tree depth is too large
  – When you can’t get any statistically significant improvement

• Nodes that don’t meet the latter criteria can be removed after tree construction via pruning


Page 40: VSSML16 L1. Introduction, Models, and Evaluations

Predicting Using a Decision Tree

• The process of building the tree from the training data is called learning, induction, or training.

• The payoff of using the tree for prediction is inference, prediction, or scoring.

• To do prediction with a tree, given an instance:
  – Start at the root node, subjecting the instance to the threshold
  – “Send” the instance down the appropriate path, to the next threshold
  – Continue until you reach a terminal node, then make a prediction
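With the nested-dict tree from the earlier sketch, that walk is a few lines:

    def predict(tree, row):
        """Send a row down the tree, threshold by threshold, until a leaf."""
        node = tree
        while "predict" not in node:
            node = node["right"] if row[node["field"]] > node["threshold"] else node["left"]
        return node["predict"]

    # tree = build_tree(rows, n_fields=4)
    # predict(tree, (180, 210.00, 1, 3))   # a new customer, not in the training data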


Page 41: VSSML16 L1. Introduction, Models, and Evaluations

Outline

1 A Machine Learning Fable

2 Machine Learning: Some Intuition

3 School Overview

4 The Basic Idea: Create a Program From Data

5 Modeling

6 Do More With Your Model

7 Evaluation


Page 42: VSSML16 L1. Introduction, Models, and Evaluations

Prediction Confidence - Intuition

• Each terminal node returns a prediction . . .

• . . . but are some terminal nodes “better” than others?

• Given a terminal node, how “confident” are we (perhaps for ranking)?

• Depends on two things:
  – The number of instances that reach that terminal node
  – The purity of that subset

• Think of each terminal node as a “mixture of experts”


Page 43: VSSML16 L1. Introduction, Models, and Evaluations

Prediction Confidence - Calculation

• Compute a 95% confidence bound about the probability implied by the terminal node subset

• The Wilson score interval:

  (p̂ + z²/(2n) ± z·√(p̂(1−p̂)/n + z²/(4n²))) / (1 + z²/n)

  where p̂ is the observed proportion of the predicted class among the n instances at the leaf, and z ≈ 1.96 for 95% confidence

• Use the lower bound as the confidence

• Similarish for regression models
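A direct transcription of that formula, using z ≈ 1.96 for a 95% interval (BigML’s exact confidence calculation may differ in detail):

    import math

    def wilson_lower_bound(successes, n, z=1.96):
        """Lower bound of the Wilson score interval for a leaf where
        `successes` of its `n` instances match the predicted class."""
        if n == 0:
            return 0.0
        p_hat = successes / n
        center = p_hat + z * z / (2 * n)
        spread = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
        return (center - spread) / (1 + z * z / n)

    # A pure leaf with 2 instances deserves far less confidence than one with 200:
    print(round(wilson_lower_bound(2, 2), 2))       # ~0.34
    print(round(wilson_lower_bound(200, 200), 2))   # ~0.98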


Page 44: VSSML16 L1. Introduction, Models, and Evaluations

Field Importance - Intuition

• Which fields drive the model predictions the most?

• Why might we want to know?
  – Improve data collection practices
  – Reduce the number of features in your model
  – Model validation (watch out for data leakage: objective in the training data)

• Depends on two things:
  – Number of times the nodes use that feature for the threshold
  – How much it “matters” when it’s used


Page 45: VSSML16 L1. Introduction, Models, and Evaluations

Field Importance - Calculation

• For each node in the tree:
  – Compute the error of the prediction in the subset at that node (before)
  – Compute the error of the two child subsets (after)
  – Compute the improvement (the difference between the two)

• Sum up the improvements for all nodes corresponding to a given feature
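A sketch of that bookkeeping, treating the size-weighted Gini impurity from the earlier sketches as the “error” at each node (any error measure would follow the same pattern):

    from collections import defaultdict

    def field_importance(tree, rows):
        """Sum each split's impurity improvement, keyed by the field it uses
        (reuses gini() and the nested-dict tree from the sketches above)."""
        totals = defaultdict(float)

        def visit(node, subset):
            if "predict" in node:              # leaf: nothing to credit
                return
            f, t = node["field"], node["threshold"]
            left = [r for r in subset if r[f] <= t]
            right = [r for r in subset if r[f] > t]
            before = len(subset) * gini([r[-1] for r in subset])
            after = (len(left) * gini([r[-1] for r in left])
                     + len(right) * gini([r[-1] for r in right]))
            totals[f] += before - after        # improvement at this node
            visit(node["left"], left)
            visit(node["right"], right)

        visit(tree, rows)
        return dict(totals)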


Page 46: VSSML16 L1. Introduction, Models, and Evaluations

Outline

1 A Machine Learning Fable

2 Machine Learning: Some Intuition

3 School Overview

4 The Basic Idea: Create a Program From Data

5 Modeling

6 Do More With Your Model

7 Evaluation


Page 47: VSSML16 L1. Introduction, Models, and Evaluations

Evaluating Your Model

• You want to know if yourmodel is good before putting itin production

• In your data, you know theright answer!

• So use the data to test howoften your model is right


Page 48: VSSML16 L1. Introduction, Models, and Evaluations

Don’t Evaluate on the Training Data!

• Never train and test on the same data

• Many non-parametric models are able to memorize the training data

• This results in evaluations that are sometimes wildly optimistic (which is the worst)

• Solution: Split your data into training and testing subsets (80% and 20% is customary)
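The split itself is one shuffle and one cut; a minimal sketch (BigML offers dataset splits as a built-in step, so the helper below is purely illustrative):

    import random

    def train_test_split(rows, train_fraction=0.8, seed=42):
        """Shuffle the rows and hold out the tail for testing."""
        shuffled = rows[:]                         # don't mutate the caller's list
        random.Random(seed).shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        return shuffled[:cut], shuffled[cut:]

    # train_rows, test_rows = train_test_split(rows)
    # Build the model on train_rows only; measure it on test_rows only.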


Page 49: VSSML16 L1. Introduction, Models, and Evaluations

Watch out For Luck!

• Even splitting your data is sometimes not enough

• You may draw an “easy” subset (especially possible with small data)

• Can be disastrous if the “edge” you’re expecting is small

• Solution: Do multiple runs and average (or cross-validate)


Page 50: VSSML16 L1. Introduction, Models, and Evaluations

Accuracy Can Be Misleading

• The top-line evaluation number is accuracy: the number of correct predictions divided by the total number of predictions

• Accuracy can be misleading!
  – Suppose only 5% of my customers churn every month
  – Suppose I just predict “no churn” for everybody
  – Ta-da! 95% accuracy!
  – But I get the same 95% accuracy if I predict all “churn” instances correctly with a 50% false positive rate!

• Lesson: Accuracy is broken when classes are unbalanced


Page 51: VSSML16 L1. Introduction, Models, and Evaluations

The Confusion Matrix

• What to do if accuracy isn’t what you want?

• Look at the specific types of mistakes the algorithm makes

• This is the confusion matrix
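Building it is just counting (actual, predicted) pairs; a two-class sketch with made-up labels:

    from collections import Counter

    def confusion_matrix(actual, predicted):
        """Count every (actual, predicted) combination."""
        return Counter(zip(actual, predicted))

    actual    = ["Yes", "No", "No", "Yes", "No", "No"]
    predicted = ["Yes", "No", "Yes", "No", "No", "No"]
    for (a, p), count in sorted(confusion_matrix(actual, predicted).items()):
        print(f"actual={a:3} predicted={p:3} count={count}")
    # actual=No  predicted=No  count=3   (true negatives)
    # actual=No  predicted=Yes count=1   (false positives)
    # actual=Yes predicted=No  count=1   (false negatives)
    # actual=Yes predicted=Yes count=1   (true positives)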


Page 52: VSSML16 L1. Introduction, Models, and Evaluations

Aside: What If You Don’t Like it?

• You see your confusion matrix and it’s making “the wrong mistakes” (maybe you can even confirm by looking at the model)

• Often this is caused by unbalanced data; data where one class dominates the dataset

• You suspect that a trade-off could be made

• Using weights is a good way to do this

• A good place to start is often balancing the objective


Page 53: VSSML16 L1. Introduction, Models, and Evaluations

But What if You Want Just One Number?

• For model selection, you usually want a single number for a direct comparison

• There are other functions of the confusion matrix besides accuracy

• BigML provides a few of these:
  – Φ-coefficient (common in soft sciences)
  – F1-score (information retrieval, where there’s a rare positive class)
  – Balanced accuracy (accuracy, averaged class-wise by inverse frequency)
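Given the four counts of a two-class confusion matrix, each of these is a short formula (standard textbook definitions; BigML’s multi-class averaging may differ):

    import math

    def summary_metrics(tp, fp, fn, tn):
        """Single-number summaries of a two-class confusion matrix."""
        accuracy = (tp + tn) / (tp + fp + fn + tn)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        phi = (tp * tn - fp * fn) / denom if denom else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        balanced_accuracy = (recall + specificity) / 2
        return {"accuracy": accuracy, "f1": f1, "phi": phi,
                "balanced accuracy": balanced_accuracy}

    print(summary_metrics(tp=1, fp=1, fn=1, tn=3))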


Page 54: VSSML16 L1. Introduction, Models, and Evaluations

The Right Thing is Always The Right Thing

• None of these measures is better in general than the others

• In your domain, it might be best to come up with your own function

• A common form is the cost matrix: How much does a mistake of each type cost?

  Total cost = c_FP · n_FP + c_FN · n_FN

  where c_x is the cost of a mistake of type x (false positive or false negative), and n_x is the number of mistakes of that type.

• This is an important part of what you, the domain expert, bring to the table! Don’t ignore it!
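A tiny worked example with the churn fable’s numbers (the $500 value of a saved customer comes from the earlier slide; the $10 cost of a wasted call is an assumption for illustration):

    # Illustrative per-mistake costs:
    cost_fp = 10    # a wasted call to someone who wasn't going to leave
    cost_fn = 250   # a churner we never called: half would have been saved
                    # at $500 each, so roughly $250 of expected loss

    def total_cost(n_fp, n_fn):
        return cost_fp * n_fp + cost_fn * n_fn

    # Two models can make the same *number* of mistakes at very different cost:
    print(total_cost(n_fp=4000, n_fn=100))   # $65,000
    print(total_cost(n_fp=100, n_fn=4000))   # $1,001,000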


Page 55: VSSML16 L1. Introduction, Models, and Evaluations

Regression Evaluation: Generally Less Fraught

• To evaluate regression models, we first measure how much error we would have if we just predicted the mean objective for the whole dataset

• This functions as a baseline

• How much (as a fraction) of the baseline error does the model eliminate?
  – 1.0 is best
  – 0.0 is not good at all
  – And of course, we can go negative

• This is R²

• What is “good”? Again, this is domain-specific, but at least improvement with R² is usually fairly monotonic
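In other words, R² compares the model’s squared error to the squared error of always predicting the mean; a minimal sketch:

    def r_squared(actual, predicted):
        """1.0 = perfect, 0.0 = no better than predicting the mean,
        negative = worse than the mean baseline."""
        mean = sum(actual) / len(actual)
        baseline_error = sum((a - mean) ** 2 for a in actual)
        model_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
        return 1.0 - model_error / baseline_error

    print(r_squared([3.0, 5.0, 7.0], [2.8, 5.1, 7.3]))   # close to 1.0
    print(r_squared([3.0, 5.0, 7.0], [5.0, 5.0, 5.0]))   # exactly 0.0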


Page 56: VSSML16 L1. Introduction, Models, and Evaluations

Fin!

Questions?
