CS 461: Machine Learning, Lecture 2 - Kiri Wagstaff
Introductions
Share with us:
Your name
Grad or undergrad?
What inspired you to sign up for this class?
Anything specifically that you want to learn from it?
Today’s Topics
Review
Homework 1
Supervised Learning Issues
Decision Trees
Evaluation
Weka
Review
Machine Learning: computers learn from their past experience
Inductive Learning: generalize to new data
Supervised Learning: training data are <x, g(x)> pairs, with a known label or output value for each training example; classification and regression
Instance-Based Learning: 1-Nearest Neighbor, k-Nearest Neighbors (sketched below)
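As a quick concrete reminder of instance-based learning, here is a minimal k-nearest-neighbor sketch in plain Python (Euclidean distance; the toy data and the name knn_predict are illustrative, not from the lecture):

```python
from collections import Counter
import math

def knn_predict(train, query, k=1):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of (feature_vector, label) pairs."""
    by_distance = sorted(train, key=lambda xy: math.dist(xy[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy (height, weight) data with made-up class labels
train = [((1.6, 55), 'A'), ((1.7, 70), 'B'), ((1.8, 80), 'B')]
print(knn_predict(train, (1.65, 60), k=1))   # 1-NN -> 'A'
print(knn_predict(train, (1.65, 60), k=3))   # 3-NN -> 'B' (majority of the 3 neighbors)
```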
[Diagram: the learning loop - observations lead, by induction/generalization, to a hypothesis; the hypothesis drives actions/guesses; feedback and more observations drive refinement of the hypothesis.]
Homework 1
Solution/Discussion
Issues in Supervised Learning
1. Representation: which features to use?
2. Model Selection: complexity, noise, bias
When don’t you want zero error?
[Alpaydin 2004 © The MIT Press]
Model Selection and Complexity
Rule of thumb: prefer the simplest model that has "good enough" performance
[Alpaydin 2004 © The MIT Press]
"Make everything as simple as possible, but not simpler." -- Albert Einstein
Another reason to prefer simple models…
[Tom Dietterich]
Decision Trees
Chapter 9
Decision Trees
Example: Diagnosis of Psychotic Disorders
(Hyper-)Rectangles == Decision Tree
[Alpaydin 2004 © The MIT Press]
[Figure: a decision tree and the corresponding axis-parallel partition of feature space; each split threshold acts as a discriminant, and each leaf corresponds to a (hyper-)rectangle.]
Measuring Impurity
1. Calculate the error of the majority label. For node m, N_m instances reach m, and N_m^i of them belong to class C_i, so

$\hat{P}(C_i \mid x, m) \equiv p_m^i = \frac{N_m^i}{N_m}$

The impurity is the error made by predicting the majority label:

$I_m = 1 - \max_i \left( p_m^i \right)$

After a split into branches j = 1, ..., n, where N_{mj} instances take branch j:

$I'_m = \sum_{j=1}^{n} \frac{N_{mj}}{N_m} \left( 1 - \max_i \left( p_{mj}^i \right) \right)$

2. More sensitive: use entropy. Node m is pure if each p_m^i is 0 or 1. Entropy:

$I_m = -\sum_{i=1}^{K} p_m^i \log_2 p_m^i$

After a split:

$I'_m = -\sum_{j=1}^{n} \frac{N_{mj}}{N_m} \sum_{i=1}^{K} p_{mj}^i \log_2 p_{mj}^i$

[Alpaydin 2004 © The MIT Press]
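To make these measures concrete, here is a small sketch (my own illustration, not from the slides; the function names are made up) that computes the majority-label error and the entropy of a node from its per-class counts, plus the weighted entropy after a split:

```python
import math

def node_impurity(counts):
    """Return (majority_error, entropy) for a node, given its per-class counts N_m^i."""
    total = sum(counts)
    probs = [c / total for c in counts]                    # p_m^i = N_m^i / N_m
    majority_error = 1 - max(probs)                        # I_m = 1 - max_i p_m^i
    entropy = -sum(p * math.log2(p) for p in probs if p)   # I_m = -sum_i p_m^i log2 p_m^i
    return majority_error, entropy

def split_entropy(children_counts):
    """Weighted entropy after a split: sum_j (N_mj / N_m) * entropy(child j)."""
    total = sum(sum(c) for c in children_counts)
    return sum(sum(c) / total * node_impurity(c)[1] for c in children_counts)

print(node_impurity([10, 0]))            # pure node: both measures are 0
print(node_impurity([5, 5]))             # maximally impure for 2 classes: (0.5, 1.0)
print(split_entropy([[9, 1], [1, 9]]))   # about 0.47, well below the unsplit entropy of 1.0
```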
Should we play tennis?
[Tom Mitchell]
How well does it generalize?
[Tom Dietterich, Tom Mitchell]
Decision Tree Construction Algorithm
[Alpaydin 2004 © The MIT Press]
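The slide reproduces Alpaydin's tree-construction pseudocode. For a rough feel of the same greedy idea, here is an illustrative Python sketch for discrete features (entropy-based splits, no numeric thresholds, no pruning; the function names are mine, not Alpaydin's):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels: -sum_i p_i log2 p_i."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def split_entropy(examples, feature):
    """Weighted entropy of the children produced by splitting on `feature`."""
    total = len(examples)
    result = 0.0
    for value in {x[feature] for x, _ in examples}:
        child_labels = [y for x, y in examples if x[feature] == value]
        result += len(child_labels) / total * entropy(child_labels)
    return result

def grow_tree(examples, features):
    """Greedy top-down construction. `examples` is a list of (feature_dict, label)."""
    labels = [y for _, y in examples]
    if len(set(labels)) == 1 or not features:          # pure node, or no feature left to test
        return Counter(labels).most_common(1)[0][0]    # leaf: predict the majority label
    best = min(features, key=lambda f: split_entropy(examples, f))   # greedy choice
    tree = {'feature': best, 'children': {}}
    for value in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == value]
        tree['children'][value] = grow_tree(subset, [f for f in features if f != best])
    return tree
```

With discrete features this follows the usual ID3-style behavior: the feature whose split yields the lowest weighted child entropy (equivalently, the highest information gain) is tested first.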
Decision Trees = Profit!
PredictionWorks: "Increasing customer loyalty through targeted marketing"
Decision Trees in Weka
Use the Explorer and run J48 on the height/weight data. Who does it misclassify?
Building a Regression Tree
Same algorithm, different criterion: instead of impurity, use the Mean Squared Error (in the local region).
Predict the mean output for the node; compute the training error (same as computing the variance of the node's outputs).
Keep splitting until the node error is acceptable; then the node becomes a leaf. Acceptable: error < threshold. (See the sketch below.)
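A minimal sketch of the criterion just described (illustrative only): the node predicts the mean of its training outputs, and the node error is the mean squared error around that mean, which equals the variance of the outputs.

```python
def node_error(outputs):
    """MSE of predicting the node's mean output (equals the variance of the outputs)."""
    mean = sum(outputs) / len(outputs)
    return sum((y - mean) ** 2 for y in outputs) / len(outputs)

print(node_error([3.0, 3.5, 4.0, 10.0]))   # large error -> keep splitting this node
print(node_error([3.0, 3.5, 4.0]))         # small error -> acceptable, make it a leaf
print(node_error([10.0]))                  # 0.0
```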
Turning Trees into Rules
[Alpaydin 2004 © The MIT Press]
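Each root-to-leaf path becomes one if-then rule: conjoin the feature tests along the path and let the leaf supply the conclusion. A small sketch (illustrative; it assumes the nested-dict tree representation from the construction sketch above, and uses the play-tennis tree from the Mitchell example):

```python
def tree_to_rules(tree, conditions=()):
    """Yield one (conditions, label) pair per root-to-leaf path in a nested-dict tree."""
    if not isinstance(tree, dict):               # leaf: emit the accumulated rule
        yield conditions, tree
        return
    for value, child in tree['children'].items():
        yield from tree_to_rules(child, conditions + ((tree['feature'], value),))

# The classic play-tennis tree, written in the nested-dict form used above
tree = {'feature': 'Outlook', 'children': {
    'Sunny': {'feature': 'Humidity', 'children': {'High': 'No', 'Normal': 'Yes'}},
    'Overcast': 'Yes',
    'Rain': {'feature': 'Wind', 'children': {'Strong': 'No', 'Weak': 'Yes'}}}}

for conds, label in tree_to_rules(tree):
    print('IF ' + ' AND '.join(f'{f}={v}' for f, v in conds) + f' THEN Play={label}')
# e.g. IF Outlook=Sunny AND Humidity=Normal THEN Play=Yes
```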
Weka Machine Learning Library
Weka Explorer’s Guide
Summary: Key Points for Today
Supervised Learning: representation (which features are available) and model selection (which hypothesis to choose)
Decision Trees: hierarchical, non-parametric, greedy
Nodes test a feature value; leaves classify items (or predict values)
Minimize impurity (% error or entropy); trees can be turned into rules
Evaluation: (10-fold) cross-validation, confusion matrix (a minimal sketch follows)
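Since the summary points ahead to (10-fold) cross-validation and confusion matrices, here is a minimal sketch of both in plain Python (illustrative; `train_and_predict` is any user-supplied function that trains on the given data and predicts a label for one query, and the fold construction is not stratified):

```python
import random
from collections import Counter

def cross_validate(data, train_and_predict, k=10, seed=0):
    """k-fold cross-validation; returns a confusion matrix keyed by (true, predicted)."""
    data = data[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]        # k roughly equal-sized folds
    confusion = Counter()
    for i, test_fold in enumerate(folds):
        train = [xy for j, fold in enumerate(folds) if j != i for xy in fold]
        for x, true_label in test_fold:
            confusion[(true_label, train_and_predict(train, x))] += 1
    return confusion

# Example pairing with the knn_predict sketch from earlier in these notes:
# cm = cross_validate(data, lambda tr, x: knn_predict(tr, x, k=3))
# accuracy = sum(n for (t, p), n in cm.items() if t == p) / sum(cm.values())
```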
Next Time
Support Vector Machines (read Ch. 10.1-10.4, 10.6, 10.9)
Evaluation (read Ch. 14.4, 14.5, 14.7, 14.9)
Questions to answer from the reading: Posted on the website (calendar)