Introduction to Machine Learning
David Kauchak, CS 451 – Fall 2013


Page 1: Introduction to Machine Learning

Introduction to Machine Learning
David Kauchak, CS 451 – Fall 2013

Page 2: Introduction to Machine Learning

Admin

Assignment 1… how’d it go?

Assignment 2:
- out soon
- building decision trees
- Java with some starter code
- competition
- extra credit

Page 3: Introduction to Machine Learning

Building decision trees

Base case: If all data belong to the same class, create a leaf node with that label.

Otherwise:
- calculate the "score" for each feature if we used it to split the data
- pick the feature with the highest score, partition the data based on that feature's values, and call recursively

Page 4: Introduction to Machine Learning

Partitioning the data

Terrain | Unicycle-type | Weather | Go-For-Ride?
Trail | Normal | Rainy | NO
Road | Normal | Sunny | YES
Trail | Mountain | Sunny | YES
Road | Mountain | Rainy | YES
Trail | Normal | Snowy | NO
Road | Normal | Rainy | YES
Road | Mountain | Snowy | YES
Trail | Normal | Sunny | NO
Road | Normal | Snowy | NO
Trail | Mountain | Snowy | YES

Split on Terrain:
- Road: YES: 4, NO: 1
- Trail: YES: 2, NO: 3

Split on Unicycle-type:
- Mountain: YES: 4, NO: 0
- Normal: YES: 2, NO: 4

Split on Weather:
- Rainy: YES: 2, NO: 1
- Sunny: YES: 2, NO: 1
- Snowy: YES: 2, NO: 2
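The per-feature counts on this slide can be reproduced with a short script (a sketch; the tuple encoding of the dataset is my own):

```python
from collections import Counter

# The ten training examples from the slide:
# (terrain, unicycle_type, weather, go_for_ride)
data = [
    ("Trail", "Normal",   "Rainy", "NO"),
    ("Road",  "Normal",   "Sunny", "YES"),
    ("Trail", "Mountain", "Sunny", "YES"),
    ("Road",  "Mountain", "Rainy", "YES"),
    ("Trail", "Normal",   "Snowy", "NO"),
    ("Road",  "Normal",   "Rainy", "YES"),
    ("Road",  "Mountain", "Snowy", "YES"),
    ("Trail", "Normal",   "Sunny", "NO"),
    ("Road",  "Normal",   "Snowy", "NO"),
    ("Trail", "Mountain", "Snowy", "YES"),
]
features = ["Terrain", "Unicycle-type", "Weather"]

def split_counts(data, feature_index):
    """Count YES/NO labels for each value of the given feature."""
    counts = {}
    for row in data:
        value, label = row[feature_index], row[-1]
        counts.setdefault(value, Counter())[label] += 1
    return counts

for i, name in enumerate(features):
    print(name, dict(split_counts(data, i)))
```

Running this prints the same counts as the slide, e.g. Road: 4 YES / 1 NO and Trail: 2 YES / 3 NO for Terrain.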

Page 5: Introduction to Machine Learning

Decision trees

Split on Terrain:
- Road: YES: 4, NO: 1
- Trail: YES: 2, NO: 3
Training error: 3/10

Split on Unicycle-type:
- Mountain: YES: 4, NO: 0
- Normal: YES: 2, NO: 4
Training error: 2/10

Split on Weather:
- Rainy: YES: 2, NO: 1
- Sunny: YES: 2, NO: 1
- Snowy: YES: 2, NO: 2
Training error: 4/10

Training error: the average error over the training set.

Page 6: Introduction to Machine Learning

Training error vs. accuracy

Split on Terrain:
- Road: YES: 4, NO: 1
- Trail: YES: 2, NO: 3
Training error: 3/10; training accuracy: 7/10

Split on Unicycle-type:
- Mountain: YES: 4, NO: 0
- Normal: YES: 2, NO: 4
Training error: 2/10; training accuracy: 8/10

Split on Weather:
- Rainy: YES: 2, NO: 1
- Sunny: YES: 2, NO: 1
- Snowy: YES: 2, NO: 2
Training error: 4/10; training accuracy: 6/10

Training error: the average error over the training set.
Training accuracy: the average percent correct over the training set.

training error = 1 - accuracy (and vice versa)
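The error figures above come from letting each branch of a split predict its majority label; a sketch (the `Counter`-based encoding of the counts is my own):

```python
from collections import Counter

def split_training_error(counts):
    """Training error of a one-level split: each branch predicts its
    majority label, so it misclassifies its minority examples."""
    total = sum(sum(c.values()) for c in counts.values())
    mistakes = sum(sum(c.values()) - max(c.values()) for c in counts.values())
    return mistakes / total

# The Terrain split from the slide: Road = 4 YES / 1 NO, Trail = 2 YES / 3 NO
terrain = {"Road": Counter(YES=4, NO=1), "Trail": Counter(YES=2, NO=3)}
print(split_training_error(terrain))  # 0.3, i.e. the slide's 3/10
# and training accuracy is simply 1 - training error
```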

Page 7: Introduction to Machine Learning

Recurse

Split on Unicycle-type:
- Mountain: YES: 4, NO: 0
- Normal: YES: 2, NO: 4

Normal examples:
Terrain | Unicycle-type | Weather | Go-For-Ride?
Trail | Normal | Rainy | NO
Road | Normal | Sunny | YES
Trail | Normal | Snowy | NO
Road | Normal | Rainy | YES
Trail | Normal | Sunny | NO
Road | Normal | Snowy | NO

Mountain examples:
Terrain | Unicycle-type | Weather | Go-For-Ride?
Trail | Mountain | Sunny | YES
Road | Mountain | Rainy | YES
Road | Mountain | Snowy | YES
Trail | Mountain | Snowy | YES

Page 8: Introduction to Machine Learning

Recurse

Split on Unicycle-type:
- Mountain: YES: 4, NO: 0
- Normal: YES: 2, NO: 4

Normal examples:
Terrain | Unicycle-type | Weather | Go-For-Ride?
Trail | Normal | Rainy | NO
Road | Normal | Sunny | YES
Trail | Normal | Snowy | NO
Road | Normal | Rainy | YES
Trail | Normal | Sunny | NO
Road | Normal | Snowy | NO

Within the Normal branch:

Split on Terrain:
- Road: YES: 2, NO: 1
- Trail: YES: 0, NO: 3

Split on Weather:
- Rainy: YES: 1, NO: 1
- Sunny: YES: 1, NO: 1
- Snowy: YES: 0, NO: 2

Page 9: Introduction to Machine Learning

Recurse

Split on Unicycle-type:
- Mountain: YES: 4, NO: 0
- Normal: YES: 2, NO: 4

Normal examples:
Terrain | Unicycle-type | Weather | Go-For-Ride?
Trail | Normal | Rainy | NO
Road | Normal | Sunny | YES
Trail | Normal | Snowy | NO
Road | Normal | Rainy | YES
Trail | Normal | Sunny | NO
Road | Normal | Snowy | NO

Within the Normal branch:

Split on Terrain:
- Road: YES: 2, NO: 1
- Trail: YES: 0, NO: 3
Training error: 1/6

Split on Weather:
- Rainy: YES: 1, NO: 1
- Sunny: YES: 1, NO: 1
- Snowy: YES: 0, NO: 2
Training error: 2/6

Page 10: Introduction to Machine Learning

Recurse

The tree so far:
Unicycle-type:
- Mountain: YES (YES: 4, NO: 0)
- Normal → Terrain:
  - Road: YES: 2, NO: 1
  - Trail: YES: 0, NO: 3

Remaining examples (Normal, Road):
Terrain | Unicycle-type | Weather | Go-For-Ride?
Road | Normal | Sunny | YES
Road | Normal | Rainy | YES
Road | Normal | Snowy | NO

Page 11: Introduction to Machine Learning

Recurse

The tree so far:
Unicycle-type:
- Mountain: YES (YES: 4, NO: 0)
- Normal → Terrain:
  - Trail: NO (YES: 0, NO: 3)
  - Road → Weather:
    - Rainy: YES: 1, NO: 0
    - Sunny: YES: 1, NO: 0
    - Snowy: YES: 0, NO: 1

Page 12: Introduction to Machine Learning

Recurse

The final tree:
Unicycle-type:
- Mountain: YES
- Normal → Terrain:
  - Trail: NO
  - Road → Weather:
    - Rainy: YES
    - Sunny: YES
    - Snowy: NO

Terrain | Unicycle-type | Weather | Go-For-Ride?
Trail | Normal | Rainy | NO
Road | Normal | Sunny | YES
Trail | Mountain | Sunny | YES
Road | Mountain | Rainy | YES
Trail | Normal | Snowy | NO
Road | Normal | Rainy | YES
Road | Mountain | Snowy | YES
Trail | Normal | Sunny | NO
Road | Normal | Snowy | NO
Trail | Mountain | Snowy | YES

Training error? Are we always guaranteed to get a training error of 0?

Page 13: Introduction to Machine Learning

Problematic data

Terrain | Unicycle-type | Weather | Go-For-Ride?
Trail | Normal | Rainy | NO
Road | Normal | Sunny | YES
Trail | Mountain | Sunny | YES
Road | Mountain | Snowy | NO
Trail | Normal | Snowy | NO
Road | Normal | Rainy | YES
Road | Mountain | Snowy | YES
Trail | Normal | Sunny | NO
Road | Normal | Snowy | NO
Trail | Mountain | Snowy | YES

When can this happen?
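A training error of 0 is impossible when two examples share identical feature values but different labels, as in the "problematic data" above. A quick check for such conflicts (a sketch; the tuple encoding of the slide's rows is my own):

```python
from collections import defaultdict

# The "problematic data" rows from the slide:
# (terrain, unicycle_type, weather, go_for_ride)
data = [
    ("Trail", "Normal",   "Rainy", "NO"),
    ("Road",  "Normal",   "Sunny", "YES"),
    ("Trail", "Mountain", "Sunny", "YES"),
    ("Road",  "Mountain", "Snowy", "NO"),
    ("Trail", "Normal",   "Snowy", "NO"),
    ("Road",  "Normal",   "Rainy", "YES"),
    ("Road",  "Mountain", "Snowy", "YES"),
    ("Trail", "Normal",   "Sunny", "NO"),
    ("Road",  "Normal",   "Snowy", "NO"),
    ("Trail", "Mountain", "Snowy", "YES"),
]

def conflicts(data):
    """Return feature tuples that appear with more than one label."""
    labels = defaultdict(set)
    for *features, label in data:
        labels[tuple(features)].add(label)
    return [f for f, ls in labels.items() if len(ls) > 1]

print(conflicts(data))  # [('Road', 'Mountain', 'Snowy')]
```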

Page 14: Introduction to Machine Learning

Recursive approach

Base case: If all data belong to the same class, create a leaf node with that label, OR if all the data have the same feature values.

Do we always want to go all the way to the bottom?

Page 15: Introduction to Machine Learning

What would the tree look like for…

Terrain | Unicycle-type | Weather | Go-For-Ride?
Trail | Mountain | Rainy | YES
Trail | Mountain | Sunny | YES
Road | Mountain | Snowy | YES
Road | Mountain | Sunny | YES
Trail | Normal | Snowy | NO
Trail | Normal | Rainy | NO
Road | Normal | Snowy | YES
Road | Normal | Sunny | NO
Trail | Normal | Sunny | NO

Page 16: Introduction to Machine Learning

What would the tree look like for…

Terrain | Unicycle-type | Weather | Go-For-Ride?
Trail | Mountain | Rainy | YES
Trail | Mountain | Sunny | YES
Road | Mountain | Snowy | YES
Road | Mountain | Sunny | YES
Trail | Normal | Snowy | NO
Trail | Normal | Rainy | NO
Road | Normal | Snowy | YES
Road | Normal | Sunny | NO
Trail | Normal | Sunny | NO

Unicycle-type:
- Mountain: YES
- Normal → Terrain:
  - Trail: NO
  - Road → Weather:
    - Rainy: NO
    - Sunny: NO
    - Snowy: YES

Is that what you would do?

Page 17: Introduction to Machine Learning

What would the tree look like for…

Terrain | Unicycle-type | Weather | Go-For-Ride?
Trail | Mountain | Rainy | YES
Trail | Mountain | Sunny | YES
Road | Mountain | Snowy | YES
Road | Mountain | Sunny | YES
Trail | Normal | Snowy | NO
Trail | Normal | Rainy | NO
Road | Normal | Snowy | YES
Road | Normal | Sunny | NO
Trail | Normal | Sunny | NO

A simpler tree:
Unicycle-type:
- Mountain: YES
- Normal: NO

Or the full tree:
Unicycle-type:
- Mountain: YES
- Normal → Terrain:
  - Trail: NO
  - Road → Weather:
    - Rainy: NO
    - Sunny: NO
    - Snowy: YES

Maybe…

Page 18: Introduction to Machine Learning

What would the tree look like for…

Terrain | Unicycle-type | Weather | Jacket | ML grade | Go-For-Ride?
Trail | Mountain | Rainy | Heavy | D | YES
Trail | Mountain | Sunny | Light | C- | YES
Road | Mountain | Snowy | Light | B | YES
Road | Mountain | Sunny | Heavy | A | YES
… | Mountain | … | … | … | YES
Trail | Normal | Snowy | Light | D+ | NO
Trail | Normal | Rainy | Heavy | B- | NO
Road | Normal | Snowy | Heavy | C+ | YES
Road | Normal | Sunny | Light | A- | NO
Trail | Normal | Sunny | Heavy | B+ | NO
Trail | Normal | Snowy | Light | F | NO
… | Normal | … | … | … | NO
Trail | Normal | Rainy | Light | C | YES

Page 19: Introduction to Machine Learning

Overfitting

Terrain | Unicycle-type | Weather | Go-For-Ride?
Trail | Mountain | Rainy | YES
Trail | Mountain | Sunny | YES
Road | Mountain | Snowy | YES
Road | Mountain | Sunny | YES
Trail | Normal | Snowy | NO
Trail | Normal | Rainy | NO
Road | Normal | Snowy | YES
Road | Normal | Sunny | NO
Trail | Normal | Sunny | NO

Unicycle-type:
- Mountain: YES
- Normal: NO

Overfitting occurs when we bias our model too much towards the training data.

Our goal is to learn a general model that will work on the training data as well as on other data (i.e., test data).

Page 20: Introduction to Machine Learning

Overfitting

Our decision tree learning procedure always decreases training error. Is that what we want?

Page 21: Introduction to Machine Learning

Test set error!

"Machine learning is about predicting the future based on the past."
-- Hal Daume III

[Diagram: past = Training Data → learn → model/predictor; future = model/predictor → predict → Testing Data]

Page 22: Introduction to Machine Learning

Overfitting

Even though the training error is decreasing, the testing error can go up!

Page 23: Introduction to Machine Learning

Overfitting

Terrain | Unicycle-type | Weather | Go-For-Ride?
Trail | Mountain | Rainy | YES
Trail | Mountain | Sunny | YES
Road | Mountain | Snowy | YES
Road | Mountain | Sunny | YES
Trail | Normal | Snowy | NO
Trail | Normal | Rainy | NO
Road | Normal | Snowy | YES
Road | Normal | Sunny | NO
Trail | Normal | Sunny | NO

Unicycle-type:
- Mountain: YES
- Normal → Terrain:
  - Trail: NO
  - Road → Weather:
    - Rainy: NO
    - Sunny: NO
    - Snowy: YES

How do we prevent overfitting?

Page 24: Introduction to Machine Learning

Preventing overfitting

Base case: If
- all data belong to the same class, create a leaf node with that label, OR
- all the data have the same feature values, OR
- we've reached a particular depth in the tree, OR
- ?

One idea: stop building the tree early.

Page 25: Introduction to Machine Learning

Preventing overfitting

Base case: If
- all data belong to the same class, create a leaf node with that label, OR
- all the data have the same feature values, OR
- we've reached a particular depth in the tree, OR
- we only have a certain number/fraction of examples remaining, OR
- we've reached a particular training error, OR
- use development data (more on this later)
- …

Page 26: Introduction to Machine Learning

Preventing overfitting: pruning

Unicycle-type:
- Mountain: YES
- Normal → Terrain:
  - Trail: NO
  - Road → Weather:
    - Rainy: NO
    - Sunny: NO
    - Snowy: YES

Pruning: after the tree is built, go back and "prune" the tree, i.e., remove some lower parts of the tree.

Similar to stopping early, but done after the entire tree is built.
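One common pruning criterion (not necessarily the one used in this course) is reduced-error pruning: collapse a subtree into a leaf whenever doing so does not hurt accuracy on held-out data. A minimal sketch, assuming a small tuple encoding of trees that is my own, not from the slides:

```python
def predict(tree, example):
    """A tree is ('leaf', label) or ('node', feature, majority_label, branches)."""
    if tree[0] == 'leaf':
        return tree[1]
    _, feature, majority, branches = tree
    subtree = branches.get(example[feature])
    return predict(subtree, example) if subtree is not None else majority

def error(tree, data):
    """Fraction of examples (dicts with a 'label' key) misclassified."""
    return sum(predict(tree, x) != x['label'] for x in data) / len(data)

def prune(tree, data):
    """Bottom-up reduced-error pruning: collapse a node to its majority
    label whenever that does not increase error on the pruning data."""
    if tree[0] == 'leaf':
        return tree
    _, feature, majority, branches = tree
    pruned = ('node', feature, majority,
              {v: prune(t, [x for x in data if x[feature] == v])
               for v, t in branches.items()})
    leaf = ('leaf', majority)
    if not data or error(leaf, data) <= error(pruned, data):
        return leaf
    return pruned
```

With pruning data on which the deep Weather split does not help, the full unicycle tree collapses to the two-leaf tree shown on the next slides.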

Page 27: Introduction to Machine Learning

Preventing overfitting: pruning

Build the full tree:
Unicycle-type:
- Mountain: YES
- Normal → Terrain:
  - Trail: NO
  - Road → Weather:
    - Rainy: NO
    - Sunny: NO
    - Snowy: YES

Page 28: Introduction to Machine Learning

Preventing overfitting: pruning

Build the full tree:
Unicycle-type:
- Mountain: YES
- Normal → Terrain:
  - Trail: NO
  - Road → Weather:
    - Rainy: NO
    - Sunny: NO
    - Snowy: YES

Prune back leaves that are too specific:
Unicycle-type:
- Mountain: YES
- Normal: NO

Page 29: Introduction to Machine Learning

Preventing overfitting: pruning

Full tree:
Unicycle-type:
- Mountain: YES
- Normal → Terrain:
  - Trail: NO
  - Road → Weather:
    - Rainy: NO
    - Sunny: NO
    - Snowy: YES

Pruned tree:
Unicycle-type:
- Mountain: YES
- Normal: NO

Pruning criterion?

Page 30: Introduction to Machine Learning

Handling non-binary attributes

What do we do with features that have multiple values? Real-valued features?

Page 31: Introduction to Machine Learning

Features with multiple values

Weather:
- Rainy: NO
- Sunny: NO
- Snowy: YES

Treat as an n-ary split, or treat as multiple binary splits:

Rainy?
- Rainy: NO
- not Rainy → Snowy?
  - Snowy: YES
  - Sunny: NO
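Turning one n-ary feature into several binary ones can be sketched with one-vs-rest indicators (a toy helper of my own, not from the slides):

```python
def one_vs_rest(value, feature_values):
    """Rewrite one n-ary feature value as one-vs-rest binary indicators,
    so a single 3-way Weather split becomes a chain of binary splits."""
    return {f"{v}?": value == v for v in feature_values}

print(one_vs_rest("Snowy", ["Rainy", "Snowy", "Sunny"]))
# {'Rainy?': False, 'Snowy?': True, 'Sunny?': False}
```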

Page 32: Introduction to Machine Learning

Real-valued features

Use any comparison test (>, <, ≤, ≥) to split the data into two parts, e.g.:

Fare < $20?
- Yes
- No

Or select a range filter, i.e., min < value < max, e.g. Fare: 0-10, 10-20, 20-50, >50.
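A common way to pick the comparison threshold is to try the midpoints between consecutive sorted values and keep the one with the lowest training error. A sketch (the fare numbers are made up for illustration):

```python
def best_threshold(pairs):
    """Score every candidate threshold (midpoints between sorted distinct
    values) by the misclassification error of a two-way '< t' split.
    `pairs` is a list of (value, label)."""
    values = sorted({v for v, _ in pairs})
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        mistakes = 0
        for side in (True, False):  # side True = examples with value < t
            labels = [y for v, y in pairs if (v < t) == side]
            if labels:
                majority = max(set(labels), key=labels.count)
                mistakes += sum(y != majority for y in labels)
        err = mistakes / len(pairs)
        if best is None or err < best[1]:
            best = (t, err)
    return best

fares = [(5, "NO"), (9, "NO"), (15, "YES"), (30, "YES"), (60, "YES")]
print(best_threshold(fares))  # (12.0, 0.0): "Fare < 12" separates perfectly
```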

Page 33: Introduction to Machine Learning

Other splitting criteria

Otherwise:
- calculate the "score" for each feature if we used it to split the data
- pick the feature with the highest score, partition the data based on that feature's values, and call recursively

We used training error for the score. Any other ideas?

Page 34: Introduction to Machine Learning

Other splitting criteria

- Entropy: how much uncertainty there is in the distribution over labels after the split
- Gini: sum of the squares of the label proportions after the split
- Training error = misclassification error
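The three criteria can be written directly as functions of the label counts in a branch. A sketch (note: the slide phrases Gini as the sum of squared proportions itself; the `1 - sum` form below is the common impurity version):

```python
import math

def proportions(counts):
    total = sum(counts)
    return [c / total for c in counts]

def entropy(counts):
    """Uncertainty in the label distribution, in bits."""
    return -sum(p * math.log2(p) for p in proportions(counts) if p > 0)

def gini(counts):
    """Gini impurity: 1 minus the sum of squared label proportions."""
    return 1 - sum(p * p for p in proportions(counts))

def misclassification(counts):
    """Training (misclassification) error: 1 - majority proportion."""
    return 1 - max(proportions(counts))

# A pure branch (4 YES, 0 NO) scores 0 on all three;
# a 50/50 branch maximizes all three.
print(entropy([2, 2]), gini([2, 2]), misclassification([2, 2]))
```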

Page 35: Introduction to Machine Learning

Decision trees

Good? Bad?

Page 36: Introduction to Machine Learning

Decision trees: the good

Very intuitive and easy to interpret.

Fast to run and fairly easy to implement (Assignment 2).

Historically, perform fairly well (especially with a few more tricks we'll see later on).

No prior assumptions about the data.

Page 37: Introduction to Machine Learning

Decision trees: the bad

Be careful with features with lots of values.

ID | Terrain | Unicycle-type | Weather | Go-For-Ride?
1 | Trail | Normal | Rainy | NO
2 | Road | Normal | Sunny | YES
3 | Trail | Mountain | Sunny | YES
4 | Road | Mountain | Rainy | YES
5 | Trail | Normal | Snowy | NO
6 | Road | Normal | Rainy | YES
7 | Road | Mountain | Snowy | YES
8 | Trail | Normal | Sunny | NO
9 | Road | Normal | Snowy | NO
10 | Trail | Mountain | Snowy | YES

Which feature would be at the top here?

Page 38: Introduction to Machine Learning

Decision trees: the bad

Can be problematic (slow, bad performance) with large numbers of features.

Can't learn some very simple data sets (e.g., some types of linearly separable data).

Pruning/tuning can be tricky to get right.

Page 39: Introduction to Machine Learning

Final DT algorithm

Base cases:
1. If all data belong to the same class, pick that label.
2. If all the data have the same feature values, pick the majority label.
3. If we're out of features to examine, pick the majority label.
4. If we don't have any data left, pick the majority label of the parent.
5. If some other stopping criterion applies (to avoid overfitting), pick the majority label.

Otherwise:
- calculate the "score" for each feature if we used it to split the data
- pick the feature with the highest score, partition the data based on that feature's values, and call recursively
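The full algorithm above can be sketched as a short recursive function. This is a sketch, not the course's starter code: the `(feature_dict, label)` data encoding and tuple tree encoding are my own, and the score shown is the training-accuracy criterion from the earlier slides.

```python
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def split_accuracy(data, feature):
    """Training accuracy of a one-level split, where each branch
    predicts its majority label (the score used on earlier slides)."""
    groups = {}
    for x, y in data:
        groups.setdefault(x[feature], []).append(y)
    correct = sum(max(Counter(ys).values()) for ys in groups.values())
    return correct / len(data)

def build_tree(data, features, score=split_accuracy,
               parent_majority=None, depth=0, max_depth=None):
    """`data` is a list of (feature_dict, label) pairs; returns
    ('leaf', label) or ('node', feature, {value: subtree})."""
    labels = [y for _, y in data]
    if not data:                                        # base case 4
        return ("leaf", parent_majority)
    if len(set(labels)) == 1:                           # base case 1
        return ("leaf", labels[0])
    if not features or all(x == data[0][0] for x, _ in data):  # cases 2 & 3
        return ("leaf", majority(labels))
    if max_depth is not None and depth >= max_depth:    # case 5, one example
        return ("leaf", majority(labels))
    best = max(features, key=lambda f: score(data, f))  # otherwise: recurse
    rest = [f for f in features if f != best]
    branches = {v: build_tree([(x, y) for x, y in data if x[best] == v],
                              rest, score, majority(labels),
                              depth + 1, max_depth)
                for v in {x[best] for x, _ in data}}
    return ("node", best, branches)
```

On the ten-example unicycle dataset this puts Unicycle-type at the root (training accuracy 8/10, the best single split), matching the tree built step by step on the earlier slides.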