Source: sheldon/teaching/mhc/cs341/2012fa/files/lec8.pdf
Decision Trees
CS 341 – Lectures 8/9
Dan Sheldon
Review: Linear Methods
! So far, we’ve looked at linear methods
! Linear regression: fit a line/plane/hyperplane
! Logistic regression: decision boundary is a line/plane/hyperplane
[Figure 3.1 from Elements of Statistical Learning (2nd Ed.), Hastie, Tibshirani & Friedman 2009, Chap. 3: linear least squares fitting with X ∈ ℝ². We seek the linear function of X that minimizes the sum of squared residuals from Y. Axes: X1, X2, Y.]

[Plot: scatter of data points in the (x1, x2) plane.]
Review: Feature Expansions
! Non-linearity by trickery
! Drawback: not automatic (hard work for us!)

Polynomial Regression

x ↦ (1, x, x², x³, …)

[Plot: polynomial regression fit, y vs. x.]

[Plot: nonlinear decision boundary in the (x1, x2) plane.]

Nonlinear Decision Boundaries by Feature Expansion

Example (Ng):

(x1, x2) ↦ (1, x1, x2, x1², x2², x1·x2),  w = [−1, 0, 0, 1, 1, 0]^T

Exercise: what does the decision boundary look like in the (x1, x2) plane?
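One way to build intuition for the exercise is to probe the classifier numerically. A minimal sketch (plain Python, names chosen here for illustration) that classifies by the sign of w·φ(x):

```python
def phi(x1, x2):
    """The example's feature map: (x1, x2) -> (1, x1, x2, x1^2, x2^2, x1*x2)."""
    return [1.0, x1, x2, x1 ** 2, x2 ** 2, x1 * x2]

w = [-1.0, 0.0, 0.0, 1.0, 1.0, 0.0]

def predict(x1, x2):
    """Classify by the sign of w . phi(x); the boundary is where w . phi(x) = 0."""
    score = sum(wi * fi for wi, fi in zip(w, phi(x1, x2)))
    return 1 if score >= 0 else 0

# Probe points at increasing distance from the origin to trace the boundary.
for r in [0.5, 0.9, 1.1, 2.0]:
    print(r, predict(r, 0.0), predict(0.0, r))
```

Watching where the predicted label flips as the points move away from the origin reveals the shape of the boundary in the (x1, x2) plane.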
Next Up: Non-Linear Methods
! Hypothesis class is intrinsically non-linear
! Next few topics: decision trees, neural networks, kernels
! Today: decision trees
Decision Tree Topics
! What is a decision tree?
! Learn a tree from data
! Good vs. bad trees
! Greedy top-down learning
! Entropy
! Pruning
! Geometric interpretation
What Is a Decision Tree?
! Example from R&N: wait for a table at a restaurant?

[Figure: Russell & Norvig. Restaurant decision tree: the root tests Patrons? (None / Some / Full); internal nodes test WaitEstimate? (>60 / 30–60 / 10–30 / 0–10), Alternate?, Hungry?, Reservation?, Bar?, Fri/Sat?, and Raining?, with branches labeled No/Yes; leaves predict T (wait) or F (don't wait).]
What Is a Decision Tree?
! Internal nodes: test an attribute (feature)
! Assume discrete attributes for now
! Branches: different values of the attribute
! Leaf nodes: predict a class label

[Figure: Russell & Norvig. Restaurant decision tree, repeated from the previous slide.]
What Is a Decision Tree?
Restaurant Table — one example

| Example | Alt | Bar | Fri | Hun | Pat | Price | Rain | Res | Type | Est | Wait? |
|---|---|---|---|---|---|---|---|---|---|---|---|
| X4 | T | F | T | T | Full | $ | F | F | Thai | 10–30 | T |
! Predict by following path down tree
[Figure: Russell & Norvig. Restaurant decision tree, repeated from the previous slides.]
Decision Trees in the Wild
! Decision trees are common decision-making tools
! You have all seen them!
! R&N: car service manuals
! Viburnum ID: http://www.hort.cornell.edu/vlb/key/index.html
[Image: Clemson cooperative extension]
Decision Trees in the Wild
! C-section risk: a decision tree for predicting C-section risk, learned from data
[Example: X. Fern, Oregon State]
Decision Trees in the Wild

[Figure: another real-world decision tree.]
Why Decision Trees?
! Easy to understand
! Match human decision-making processes
! Excellent performance
! One of the best out-of-the-box methods when used carefully
! Very flexible hypothesis space
! Can learn complex decision boundaries
! Handle messy data well
! Missing features
! Continuous and discrete attributes
! Etc.
![Page 13: Decision Treessheldon/teaching/mhc/cs341/2012fa/files/lec8.pdf · CS 341 – Lectures 8/9 Dan Sheldon . Review: Linear Methods ! So far, we’ve looked at linear methods ! Linear](https://reader030.vdocuments.net/reader030/viewer/2022040800/5e35fcf002aa42063a0366b1/html5/thumbnails/13.jpg)
How to Learn a Tree From Data
! Training data for R&N example

Restaurant Table

| Example | Alt | Bar | Fri | Hun | Pat | Price | Rain | Res | Type | Est | Wait? |
|---|---|---|---|---|---|---|---|---|---|---|---|
| X1 | Yes | No | No | Yes | Some | $$$ | No | Yes | French | 0–10 | T |
| X2 | Yes | No | No | Yes | Full | $ | No | No | Thai | 30–60 | F |
| X3 | No | Yes | No | No | Some | $ | No | No | Burger | 0–10 | T |
| X4 | Yes | No | Yes | Yes | Full | $ | Yes | No | Thai | 10–30 | T |
| X5 | Yes | No | Yes | No | Full | $$$ | No | Yes | French | >60 | F |
| X6 | No | Yes | No | Yes | Some | $$ | Yes | Yes | Italian | 0–10 | T |
| X7 | No | Yes | No | No | None | $ | Yes | No | Burger | 0–10 | F |
| X8 | No | No | No | Yes | Some | $$ | Yes | Yes | Thai | 0–10 | T |
| X9 | No | Yes | Yes | No | Full | $ | Yes | No | Burger | >60 | F |
| X10 | Yes | Yes | Yes | Yes | Full | $$$ | No | Yes | Italian | 10–30 | F |
| X11 | No | No | No | No | None | $ | No | No | Thai | 0–10 | F |
| X12 | Yes | Yes | Yes | Yes | Full | $ | No | No | Burger | 30–60 | T |
Exercise
! Design an algorithm
! Input: a single training example
! Output: a decision tree that matches that example

Single Example

| Example | Alt | Bar | Fri | Hun | Pat | Price | Rain | Res | Type | Est | Wait? |
|---|---|---|---|---|---|---|---|---|---|---|---|
| X4 | Yes | No | Yes | Yes | Full | $ | Yes | No | Thai | 10–30 | T |
Exercise (cont’d)

Single Example

| Example | Alt | Bar | Fri | Hun | Pat | Price | Rain | Res | Type | Est | Wait? |
|---|---|---|---|---|---|---|---|---|---|---|---|
| X4 | Yes | No | Yes | Yes | Full | $ | Yes | No | Thai | 10–30 | T |
Exercise
! Design a tree to match all examples
! Hint: use previous exercise

Restaurant Table

| Example | Alt | Bar | Fri | Hun | Pat | Price | Rain | Res | Type | Est | Wait? |
|---|---|---|---|---|---|---|---|---|---|---|---|
| X1 | Yes | No | No | Yes | Some | $$$ | No | Yes | French | 0–10 | T |
| X2 | Yes | No | No | Yes | Full | $ | No | No | Thai | 30–60 | F |
| X3 | No | Yes | No | No | Some | $ | No | No | Burger | 0–10 | T |
| X4 | Yes | No | Yes | Yes | Full | $ | Yes | No | Thai | 10–30 | T |
| X5 | Yes | No | Yes | No | Full | $$$ | No | Yes | French | >60 | F |
| X6 | No | Yes | No | Yes | Some | $$ | Yes | Yes | Italian | 0–10 | T |
| X7 | No | Yes | No | No | None | $ | Yes | No | Burger | 0–10 | F |
| X8 | No | No | No | Yes | Some | $$ | Yes | Yes | Thai | 0–10 | T |
| X9 | No | Yes | Yes | No | Full | $ | Yes | No | Burger | >60 | F |
| X10 | Yes | Yes | Yes | Yes | Full | $$$ | No | Yes | Italian | 10–30 | F |
| X11 | No | No | No | No | None | $ | No | No | Thai | 0–10 | F |
| X12 | Yes | Yes | Yes | Yes | Full | $ | No | No | Burger | 30–60 | T |
A Correct Answer
! For each training example, create a path from root to leaf
! What is wrong with this?
What Is a Good Tree?
! Find the smallest tree that fits the training data
! A very hard problem: the number of trees on d binary attributes is at least 2^(2^d)
! E.g., with 6 attributes: 2^(2^6) = 2^64 = 18,446,744,073,709,551,616
! Effectively impossible to find the best tree
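The double-exponential count is easy to sanity-check: d binary attributes admit 2^d distinct inputs, and each of the 2^(2^d) Boolean labelings of those inputs is realized by some tree. A quick check for d = 6:

```python
d = 6
num_inputs = 2 ** d                # 64 distinct assignments to 6 binary attributes
num_functions = 2 ** num_inputs    # each labeling of the inputs is a different function
print(num_functions)               # 18446744073709551616
```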
Top-Down Greedy Heuristic
! Idea:
! Find “best” attribute to test at root of tree
! Recursively apply algorithm to each branch
! “Best” attribute?
! Ideally, splits the examples into subsets that are “all positive” or “all negative”

[Diagram: two candidate root splits, Patrons? (None / Some / Full) and Type? (French / Italian / Thai / Burger).]
Base Cases
! Keep splitting until…
! All examples have the same label
! Run out of examples
! Run out of attributes
Base Case I
! All examples have the same classification (e.g., true)
! Return that classification (e.g., true)

[Diagram: Patrons? (None / Some / Full) and Type? (French / Italian / Thai / Burger) splits.]
Base Case II
! No more examples
! Return majority class of parent (e.g., false)

[Diagram: Patrons? (None / Some / Full), with the Full branch split on Price? ($ / $$ / $$$).]
Base Case III
! No more attributes
! Return majority class (e.g., false)

[Diagram: a path Patrons? → Price? → … → Type? = Thai → Hungry? = yes, with no attributes left to test.]
The Algorithm
! Put it (almost) all together

Decision tree learning [Russell & Norvig, Chapter 18, Sections 1–3]

Aim: find a small tree consistent with the training examples
Idea: (recursively) choose “most significant” attribute as root of (sub)tree

function DTL(examples, attributes, default) returns a decision tree
    if examples is empty then return default
    else if all examples have the same classification then return the classification
    else if attributes is empty then return Mode(examples)
    else
        best ← Choose-Attribute(attributes, examples)
        tree ← a new decision tree with root test best
        for each value v_i of best do
            examples_i ← {elements of examples with best = v_i}
            subtree ← DTL(examples_i, attributes − best, Mode(examples))
            add a branch to tree with label v_i and subtree subtree
        return tree
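The DTL pseudocode translates almost line-for-line into Python. A minimal sketch, with examples as dicts, Mode as a majority vote, and Choose-Attribute filled in with the entropy-based criterion from later in the lecture; the toy data at the bottom is made up for illustration, not the restaurant table:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H = -sum p * log2(p) over class frequencies."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def choose_attribute(attributes, examples):
    """Pick the attribute whose split leaves the lowest average child entropy
    (equivalently, the highest information gain)."""
    def remainder(a):
        n = len(examples)
        rem = 0.0
        for v in set(ex[a] for ex in examples):
            subset = [ex for ex in examples if ex[a] == v]
            rem += len(subset) / n * entropy([ex["label"] for ex in subset])
        return rem
    return min(attributes, key=remainder)

def mode(examples):
    return Counter(ex["label"] for ex in examples).most_common(1)[0][0]

def dtl(examples, attributes, default):
    """Decision-tree learning, following the DTL pseudocode.
    A tree is either a class label (leaf) or (attribute, {value: subtree})."""
    if not examples:
        return default
    if len(set(ex["label"] for ex in examples)) == 1:
        return examples[0]["label"]
    if not attributes:
        return mode(examples)
    best = choose_attribute(attributes, examples)
    branches = {}
    for v in set(ex[best] for ex in examples):
        subset = [ex for ex in examples if ex[best] == v]
        branches[v] = dtl(subset, [a for a in attributes if a != best], mode(examples))
    return (best, branches)

def predict(tree, x, default=None):
    """Predict by following a path from root to leaf."""
    while isinstance(tree, tuple):
        attr, branches = tree
        if x[attr] not in branches:
            return default
        tree = branches[x[attr]]
    return tree

# Toy data (hypothetical): wait exactly when Patrons == "Some".
data = [
    {"Patrons": "Some", "Hungry": "Yes", "label": "T"},
    {"Patrons": "Some", "Hungry": "No",  "label": "T"},
    {"Patrons": "Full", "Hungry": "Yes", "label": "F"},
    {"Patrons": "None", "Hungry": "No",  "label": "F"},
]
tree = dtl(data, ["Patrons", "Hungry"], default="F")
print(predict(tree, {"Patrons": "Some", "Hungry": "No"}))  # T
```

On this toy set the algorithm picks Patrons? at the root (its children are pure), matching the “all positive or all negative” intuition above.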
Splitting Criteria
! An obvious idea is error rate
! Total # of errors we would make if we stopped training now

[Diagram: parent with (15+, 5−). Attribute 1 splits into (13+, 1−) and (2+, 4−); Attribute 2 splits into (7+, 3−) and (8+, 2−). Errors = 5.]
Splitting Criteria
! Problem: error rate fails to differentiate these two splits

[Diagram: parent with (15+, 5−). Attribute 3 splits into (10+, 0−) and (5+, 5−); Attribute 2 splits into (7+, 3−) and (8+, 2−). Errors = 5 for both.]

! HW problem: show that error rate does not decrease unless children have different majority classes
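The failure is easy to verify: a split's error count is the sum of minority-class counts in the children, and both splits above come out to 5. A small sketch:

```python
def split_errors(children):
    """Errors if each child predicts its majority class: sum of minority counts."""
    return sum(min(pos, neg) for pos, neg in children)

# Parent node: 15 positive, 5 negative examples.
attr3 = [(10, 0), (5, 5)]   # (positive, negative) counts per child
attr2 = [(7, 3), (8, 2)]
print(split_errors(attr3), split_errors(attr2))  # 5 5
```

Both splits score 5, the same as the unsplit parent (min(15, 5) = 5), even though the (10+, 0−) / (5+, 5−) split clearly makes more progress.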
Entropy
! Measure of uncertainty from the field of information theory
! Claude Shannon (1948)
! Consider a biased random coin
! Pr(X = heads) = q, Pr(X = tails) = 1 − q
! Define the surprise of an outcome as
! S(X = x) = −log2 Pr(X = x)

|              | q = 0    | q = 1/4 | q = 1/2 |
|---|---|---|---|
| S(X = heads) | infinite | 2       | 1       |
| S(X = tails) | 0        | 0.415   | 1       |
Entropy
! Entropy = average surprise

H(X) = − Σ_x Pr(X = x) log2 Pr(X = x)

! For the biased coin

B(q) := H(X) = −q log2 q − (1 − q) log2(1 − q)
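The surprise and biased-coin entropy definitions in a few lines of Python, reproducing the values in the biased-coin table:

```python
from math import log2

def surprise(p):
    """S(X = x) = -log2 Pr(X = x)."""
    return -log2(p)

def B(q):
    """Entropy of a biased coin with Pr(heads) = q."""
    if q in (0.0, 1.0):
        return 0.0  # 0 * log 0 is taken to be 0
    return -q * log2(q) - (1 - q) * log2(1 - q)

print(surprise(0.25))            # 2.0   (heads when q = 1/4)
print(round(surprise(0.75), 3))  # 0.415 (tails when q = 1/4)
print(B(0.5))                    # 1.0   (a fair coin is maximally uncertain)
```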
Entropy
! Entropy of a biased coin

[Plot: B(q) vs. q on [0, 1]; zero at q = 0 and q = 1, maximum of 1 at q = 1/2.]
Entropy as Attribute Test
! Look for the split with the biggest reduction in entropy
! Information gain = entropy − (average entropy of children)

[Diagram: parent with (15+, 5−) split into children (10+, 0−) and (5+, 5−).]
Entropy as Attribute Test
! Back to our previous example

[Diagram: parent with (15+, 5−). Attribute 3 splits into (10+, 0−) and (5+, 5−); Attribute 2 splits into (7+, 3−) and (8+, 2−).]
A More General Example

[Diagram: parent with (15+, 5−) split three ways into children (10+, 0−), (2+, 3−), and (4+, 1−); child weights 10/20 = 1/2, 1/4, and 1/4.]

B(1)   = 0
B(3/4) = −(3/4) log2(3/4) − (1/4) log2(1/4) ≈ 0.81
B(2/5) = −(2/5) log2(2/5) − (3/5) log2(3/5) ≈ 0.97
B(4/5) = −(4/5) log2(4/5) − (1/5) log2(1/5) ≈ 0.72

Average entropy of children: (1/2)·0 + (1/4)·0.72 + (1/4)·0.97 ≈ 0.42

Reduction: 0.81 − 0.42 ≈ 0.39
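The arithmetic can be double-checked in code, using the child counts (10+, 0−), (2+, 3−), (4+, 1−) and parent entropy B(3/4) ≈ 0.81:

```python
from math import log2

def B(q):
    """Entropy of a biased coin with Pr(heads) = q."""
    if q in (0, 1):
        return 0.0
    return -q * log2(q) - (1 - q) * log2(1 - q)

children = [(10, 0), (2, 3), (4, 1)]   # (positive, negative) counts per child
n = sum(p + m for p, m in children)    # 20 examples total

# Average entropy of children, weighted by child size.
avg = sum((p + m) / n * B(p / (p + m)) for p, m in children)
gain = B(15 / 20) - avg                # parent is (15+, 5-)

print(round(avg, 2))   # 0.42
print(round(gain, 2))  # 0.39
```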
Pruning
! It is still possible to grow very large trees (overfitting)
! Pruning
! First grow a full tree
! Then remove some nodes
! (Why not just stop growing the tree earlier?)
! Different pruning criteria
! R&N: chi-squared. How “different” are the children from the parent?
! An easier possibility: use a test set to see if removing the node hurts generalization
Continuous Attributes
! Split on any threshold value x
! Select the attribute/threshold combination with the best reduction in entropy

[Diagram: parent with (15+, 5−) split by “Wait estimate ≤ x” into (13+, 1−) and “Wait estimate > x” into (2+, 4−).]
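One common way to search thresholds (a sketch of the idea, not necessarily how any particular library does it) is to try a cut midway between each pair of adjacent sorted values and keep the one with the biggest entropy reduction. The wait-time data below is made up for illustration:

```python
from math import log2

def B(q):
    """Entropy of a biased coin with Pr(heads) = q."""
    if q in (0, 1):
        return 0.0
    return -q * log2(q) - (1 - q) * log2(1 - q)

def entropy_of(labels):
    """Entropy of a list of 0/1 labels."""
    return B(sum(labels) / len(labels)) if labels else 0.0

def best_threshold(values, labels):
    """Return (threshold, entropy reduction) for the best split value <= t vs > t."""
    pairs = sorted(zip(values, labels))
    parent = entropy_of(labels)
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        avg = (len(left) * entropy_of(left) + len(right) * entropy_of(right)) / len(pairs)
        if parent - avg > best[1]:
            best = (t, parent - avg)
    return best

# Hypothetical wait estimates: label 1 (wait) below 30 minutes, 0 above.
waits  = [5, 10, 20, 40, 50, 70]
labels = [1, 1, 1, 0, 0, 0]
t, gain = best_threshold(waits, labels)
print(t, gain)  # 30.0 1.0
```

Here the best cut lands at 30.0 (midway between 20 and 40) and drives the children's entropy to zero, so the reduction equals the parent entropy of 1.0.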
Miscellaneous
! Issues to think/read about on your own
! The entropy criterion tends to favor multi-valued attributes over binary attributes
! Why? Various fixes have been proposed in the literature. Can you think of one?
! Missing attributes
! E.g., medical testing
! How to handle these during prediction? During training?
Geometric Interpretation
! Axis-parallel splits
Geometric Interpretation
! Can learn simple and complex boundaries

[Plots: two datasets in the (x1, x2) plane, one with a simple decision boundary and one with a complex one.]
What You Should Know
! Top-down greedy learning for decision trees
! Entropy splitting criterion
! Continuous attributes