CS 461: Machine Learning, Lecture 2 - Kiri Wagstaff
Introductions
Share with us:
Your name
Grad or undergrad?
What inspired you to sign up for this class?
Anything specifically that you want to learn from it?
Today’s Topics
Review
Homework 1
Supervised Learning Issues
Decision Trees
Evaluation
Weka
Review
Machine Learning: computers learn from their past experience
Inductive Learning: generalize to new data
Supervised Learning: training data are <x, g(x)> pairs, with a known label or output value for each training example; classification and regression
Instance-Based Learning: 1-Nearest Neighbor, k-Nearest Neighbors (sketched below)
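As a quick concrete reminder of instance-based learning, here is a minimal k-nearest-neighbor sketch in plain Python (Euclidean distance; the toy data and the name knn_predict are illustrative, not from the lecture):

```python
from collections import Counter
import math

def knn_predict(train, query, k=1):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of (feature_vector, label) pairs."""
    by_distance = sorted(train, key=lambda xy: math.dist(xy[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy (height, weight) data with made-up class labels
train = [((1.6, 55), 'A'), ((1.7, 70), 'B'), ((1.8, 80), 'B')]
print(knn_predict(train, (1.65, 60), k=1))   # 1-NN -> 'A'
print(knn_predict(train, (1.65, 60), k=3))   # 3-NN -> 'B' (majority of the 3 neighbors)
```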
[Diagram: the learning loop - observations lead, by induction/generalization, to a hypothesis; the hypothesis drives actions/guesses; feedback and more observations drive refinement of the hypothesis.]
Homework 1
Solution/Discussion
Issues in Supervised Learning
1. Representation: which features to use?
2. Model Selection: complexity, noise, bias
When don’t you want zero error?
[Alpaydin 2004 © The MIT Press]
Model Selection and Complexity
Rule of thumb: prefer the simplest model that has "good enough" performance
[Alpaydin 2004 © The MIT Press]
"Make everything as simple as possible, but not simpler." -- Albert Einstein
Another reason to prefer simple models…
[Tom Dietterich]
Decision Trees
Chapter 9
Decision Trees
Example: Diagnosis of Psychotic Disorders
(Hyper-)Rectangles == Decision Tree
[Alpaydin 2004 © The MIT Press]
[Figure: a decision tree and the corresponding axis-parallel partition of feature space; each split threshold acts as a discriminant, and each leaf corresponds to a (hyper-)rectangle.]
Measuring Impurity
1. Calculate the error of the majority label. For node m, N_m instances reach m, and N_m^i of them belong to class C_i, so

$\hat{P}(C_i \mid x, m) \equiv p_m^i = \frac{N_m^i}{N_m}$

The impurity is the error made by predicting the majority label:

$I_m = 1 - \max_i \left( p_m^i \right)$

After a split into branches j = 1, ..., n, where N_{mj} instances take branch j:

$I'_m = \sum_{j=1}^{n} \frac{N_{mj}}{N_m} \left( 1 - \max_i \left( p_{mj}^i \right) \right)$

2. More sensitive: use entropy. Node m is pure if each p_m^i is 0 or 1. Entropy:

$I_m = -\sum_{i=1}^{K} p_m^i \log_2 p_m^i$

After a split:

$I'_m = -\sum_{j=1}^{n} \frac{N_{mj}}{N_m} \sum_{i=1}^{K} p_{mj}^i \log_2 p_{mj}^i$

[Alpaydin 2004 © The MIT Press]
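To make these measures concrete, here is a small sketch (my own illustration, not from the slides; the function names are made up) that computes the majority-label error and the entropy of a node from its per-class counts, plus the weighted entropy after a split:

```python
import math

def node_impurity(counts):
    """Return (majority_error, entropy) for a node, given its per-class counts N_m^i."""
    total = sum(counts)
    probs = [c / total for c in counts]                    # p_m^i = N_m^i / N_m
    majority_error = 1 - max(probs)                        # I_m = 1 - max_i p_m^i
    entropy = -sum(p * math.log2(p) for p in probs if p)   # I_m = -sum_i p_m^i log2 p_m^i
    return majority_error, entropy

def split_entropy(children_counts):
    """Weighted entropy after a split: sum_j (N_mj / N_m) * entropy(child j)."""
    total = sum(sum(c) for c in children_counts)
    return sum(sum(c) / total * node_impurity(c)[1] for c in children_counts)

print(node_impurity([10, 0]))            # pure node: both measures are 0
print(node_impurity([5, 5]))             # maximally impure for 2 classes: (0.5, 1.0)
print(split_entropy([[9, 1], [1, 9]]))   # about 0.47, well below the unsplit entropy of 1.0
```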
Should we play tennis?
[Tom Mitchell]
How well does it generalize?
[Tom Dietterich, Tom Mitchell]
Decision Tree Construction Algorithm
[Alpaydin 2004 © The MIT Press]
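The slide reproduces Alpaydin's tree-construction pseudocode. For a rough feel of the same greedy idea, here is an illustrative Python sketch for discrete features (entropy-based splits, no numeric thresholds, no pruning; the function names are mine, not Alpaydin's):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels: -sum_i p_i log2 p_i."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def split_entropy(examples, feature):
    """Weighted entropy of the children produced by splitting on `feature`."""
    total = len(examples)
    result = 0.0
    for value in {x[feature] for x, _ in examples}:
        child_labels = [y for x, y in examples if x[feature] == value]
        result += len(child_labels) / total * entropy(child_labels)
    return result

def grow_tree(examples, features):
    """Greedy top-down construction. `examples` is a list of (feature_dict, label)."""
    labels = [y for _, y in examples]
    if len(set(labels)) == 1 or not features:          # pure node, or no feature left to test
        return Counter(labels).most_common(1)[0][0]    # leaf: predict the majority label
    best = min(features, key=lambda f: split_entropy(examples, f))   # greedy choice
    tree = {'feature': best, 'children': {}}
    for value in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == value]
        tree['children'][value] = grow_tree(subset, [f for f in features if f != best])
    return tree
```

With discrete features this follows the usual ID3-style behavior: the feature whose split yields the lowest weighted child entropy (equivalently, the highest information gain) is tested first.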
Decision Trees = Profit!
PredictionWorks: "Increasing customer loyalty through targeted marketing"
Decision Trees in Weka
Use the Explorer and run J48 on the height/weight data. Who does it misclassify?
Building a Regression Tree
Same algorithm, different criterion: instead of impurity, use the Mean Squared Error (in the local region).
Predict the mean output for the node; compute the training error (same as computing the variance of the node's outputs).
Keep splitting until the node error is acceptable; then the node becomes a leaf. Acceptable: error < threshold. (See the sketch below.)
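A minimal sketch of the criterion just described (illustrative only): the node predicts the mean of its training outputs, and the node error is the mean squared error around that mean, which equals the variance of the outputs.

```python
def node_error(outputs):
    """MSE of predicting the node's mean output (equals the variance of the outputs)."""
    mean = sum(outputs) / len(outputs)
    return sum((y - mean) ** 2 for y in outputs) / len(outputs)

print(node_error([3.0, 3.5, 4.0, 10.0]))   # large error -> keep splitting this node
print(node_error([3.0, 3.5, 4.0]))         # small error -> acceptable, make it a leaf
print(node_error([10.0]))                  # 0.0
```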
Turning Trees into Rules
[Alpaydin 2004 © The MIT Press]
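Each root-to-leaf path becomes one if-then rule: conjoin the feature tests along the path and let the leaf supply the conclusion. A small sketch (illustrative; it assumes the nested-dict tree representation from the construction sketch above, and uses the play-tennis tree from the Mitchell example):

```python
def tree_to_rules(tree, conditions=()):
    """Yield one (conditions, label) pair per root-to-leaf path in a nested-dict tree."""
    if not isinstance(tree, dict):               # leaf: emit the accumulated rule
        yield conditions, tree
        return
    for value, child in tree['children'].items():
        yield from tree_to_rules(child, conditions + ((tree['feature'], value),))

# The classic play-tennis tree, written in the nested-dict form used above
tree = {'feature': 'Outlook', 'children': {
    'Sunny': {'feature': 'Humidity', 'children': {'High': 'No', 'Normal': 'Yes'}},
    'Overcast': 'Yes',
    'Rain': {'feature': 'Wind', 'children': {'Strong': 'No', 'Weak': 'Yes'}}}}

for conds, label in tree_to_rules(tree):
    print('IF ' + ' AND '.join(f'{f}={v}' for f, v in conds) + f' THEN Play={label}')
# e.g. IF Outlook=Sunny AND Humidity=Normal THEN Play=Yes
```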
Weka Machine Learning Library
Weka Explorer’s Guide
Summary: Key Points for Today
Supervised Learning: representation (which features are available) and model selection (which hypothesis to choose)
Decision Trees: hierarchical, non-parametric, greedy
Nodes test a feature value; leaves classify items (or predict values)
Minimize impurity (% error or entropy); trees can be turned into rules
Evaluation: (10-fold) cross-validation, confusion matrix (a minimal sketch follows)
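Since the summary points ahead to (10-fold) cross-validation and confusion matrices, here is a minimal sketch of both in plain Python (illustrative; `train_and_predict` is any user-supplied function that trains on the given data and predicts a label for one query, and the fold construction is not stratified):

```python
import random
from collections import Counter

def cross_validate(data, train_and_predict, k=10, seed=0):
    """k-fold cross-validation; returns a confusion matrix keyed by (true, predicted)."""
    data = data[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]        # k roughly equal-sized folds
    confusion = Counter()
    for i, test_fold in enumerate(folds):
        train = [xy for j, fold in enumerate(folds) if j != i for xy in fold]
        for x, true_label in test_fold:
            confusion[(true_label, train_and_predict(train, x))] += 1
    return confusion

# Example pairing with the knn_predict sketch from earlier in these notes:
# cm = cross_validate(data, lambda tr, x: knn_predict(tr, x, k=3))
# accuracy = sum(n for (t, p), n in cm.items() if t == p) / sum(cm.values())
```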
Next Time
Support Vector Machines (read Ch. 10.1-10.4, 10.6, 10.9)
Evaluation (read Ch. 14.4, 14.5, 14.7, 14.9)
Questions to answer from the reading: Posted on the website (calendar)