Classification with Decision Trees and Rules (Evgueni Smirnov)
TRANSCRIPT
Classification with Decision Trees and Rules
Evgueni Smirnov
Overview
• Classification Problem
• Decision Trees for Classification
• Decision Rules for Classification
Classification Task
Given:
• X is an instance space defined as {Xi}, i ∈ 1..N, where each Xi is a discrete/continuous variable.
• Y is a finite class set.
• Training data D ⊆ X × Y.
Find:
• The class y ∈ Y of an instance x ∈ X.
Instances, Classes, Instance Spaces
friendly robots
A class is a set of objects in a world that are unified by a reason. A reason may be a similar appearance, structure or function.
Example. The set {children, photos, cat, diplomas} can be viewed as the class “Most important things to take out of your apartment when it catches fire”.
head = square, body = round, smiling = yes, holding = flag, color = yellow
X
Instances, Classes, Instance Spaces
friendly robots
head = square, body = round, smiling = yes, holding = flag, color = yellow
X
friendly robots
H
smiling = yes → friendly robots
M
Instances, Classes, Instance Spaces
X
H
M
Classification Problem
Decision Trees for Classification
• Classification Problem
• Definition of Decision Trees
• Variable Selection: Impurity Reduction, Entropy, and Information Gain
• Learning Decision Trees
• Overfitting and Pruning
• Handling Variables with Many Values
• Handling Missing Values
• Handling Large Data: Windowing
Decision Trees for Classification
• A decision tree is a tree where:
– Each interior node tests a variable
– Each branch corresponds to a variable value
– Each leaf node is labelled with a class (class node)
[Figure: an example decision tree with root A1 (branches a11, a12, a13), interior nodes A2 (branches a21, a22) and A3 (branches a31, a32), and leaves labelled c1 and c2.]
A simple database: playtennis

| Day | Outlook | Temperature | Humidity | Wind | PlayTennis |
|-----|----------|------|--------|--------|-----|
| D1  | Sunny    | Hot  | High   | Weak   | No  |
| D2  | Sunny    | Hot  | High   | Strong | No  |
| D3  | Overcast | Hot  | High   | Weak   | Yes |
| D4  | Rain     | Mild | High   | Weak   | Yes |
| D5  | Rain     | Cool | Normal | Weak   | Yes |
| D6  | Rain     | Cool | Normal | Strong | No  |
| D7  | Overcast | Cool | Normal | Strong | Yes |
| D8  | Sunny    | Mild | High   | Weak   | No  |
| D9  | Sunny    | Cool | Normal | Weak   | Yes |
| D10 | Rain     | Mild | Normal | Weak   | Yes |
| D11 | Sunny    | Mild | Normal | Strong | Yes |
| D12 | Overcast | Mild | High   | Strong | Yes |
| D13 | Overcast | Hot  | Normal | Weak   | Yes |
| D14 | Rain     | Mild | High   | Strong | No  |
Decision Tree For Playing Tennis
[Figure: Outlook is the root; sunny leads to Humidity (high: no, normal: yes), overcast leads to yes, rainy leads to Windy (false: yes, true: no).]
Classification with Decision Trees
Classify(x: instance, node: variable containing a node of DT)
• if node is a classification node then
– return the class of node;
• else
– determine the child of node that matches x;
– return Classify(x, child).
[Figure: the example decision tree with root A1, interior nodes A2 and A3, and leaves labelled c1 and c2.]
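The Classify procedure can be sketched in Python. This is an illustrative sketch: the tuple/dict tree encoding and the concrete tree below are our assumptions, following one plausible reading of the slide's figure.

```python
def classify(x, node):
    """Classify instance x (a dict mapping variable -> value) with a
    decision tree. A leaf ("classification node") is just a class label;
    an interior node is a pair (variable, {value: child, ...})."""
    if isinstance(node, str):        # classification node: return its class
        return node
    variable, children = node
    child = children[x[variable]]    # the child of node that matches x
    return classify(x, child)

# Hypothetical tree mirroring the figure: root A1, children A2, c1, A3.
tree = ("A1", {
    "a11": ("A2", {"a21": "c1", "a22": "c2"}),
    "a12": "c1",
    "a13": ("A3", {"a31": "c2", "a32": "c1"}),
})
```

Classification follows exactly one root-to-leaf path, so its cost is the depth of the tree, not its size.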
Decision Tree Learning
Basic Algorithm:
1. Xi ← the “best” decision variable for a node N.
2. Assign Xi as decision variable for the node N.
3. For each value of Xi, create a new descendant of N.
4. Sort training examples to leaf nodes.
5. IF training examples perfectly classified, THEN Stop; ELSE iterate over the new leaf nodes.
Variable Quality Measures
Splitting on Outlook partitions the training data:
Sunny: Hot High Weak No; Hot High Strong No; Mild High Weak No; Cool Normal Weak Yes; Mild Normal Strong Yes
Overcast: Hot High Weak Yes; Cool Normal Strong Yes
Rain: Mild High Weak Yes; Cool Normal Weak Yes; Cool Normal Strong No; Mild Normal Weak Yes; Mild High Strong No
Variable Quality Measures
• Let S be a sample of training instances and pj be the proportion of instances of class j (j = 1, …, J) in S.
• Define an impurity measure I(S) that satisfies:
– I(S) is minimum only when pi = 1 and pj = 0 for j ≠ i (all objects are of the same class);
– I(S) is maximum only when pj = 1/J for all j (there is exactly the same number of objects of all classes);
– I(S) is symmetric with respect to p1, …, pJ.
Reduction of Impurity: Discrete Variables
• The “best” variable is the variable Xi that determines a split maximizing the expected reduction of impurity:

ΔI(S, Xi) = I(S) − Σj ( |Sxij| / |S| ) I(Sxij)

where Sxij is the subset of instances from S such that Xi = xij.
[Figure: node Xi splitting S into subsets Sxi1, Sxi2, …, Sxij.]
Information Gain: Entropy
Let S be a sample of training examples, where
p+ is the proportion of positive examples in S and
p− is the proportion of negative examples in S.
Then entropy measures the impurity of S:

E(S) = − p+ log2 p+ − p− log2 p−
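The two-class entropy can be computed directly; a minimal sketch (the function name is ours), using the usual convention 0 · log2 0 = 0:

```python
import math

def entropy(pos, neg):
    """Two-class entropy E(S) = -p+ log2 p+ - p- log2 p-,
    with the convention 0 * log2(0) = 0."""
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        if count:                      # skip empty classes (0 log 0 = 0)
            p = count / total
            e -= p * math.log2(p)
    return e
```

Entropy is 0 for a pure sample and 1 (its maximum for two classes) for a perfectly balanced one.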
Entropy Example
In the Play Tennis dataset we have two target classes: yes and no.
Out of 14 instances, 9 are classified yes, the rest no.

p_yes = 9/14, p_no = 5/14
−(9/14) log2 (9/14) = 0.41
−(5/14) log2 (5/14) = 0.53
E(S) = − p_yes log2 p_yes − p_no log2 p_no = 0.94
| Outlook | Temp. | Humidity | Windy | Play |
|---------|-------|----------|-------|------|
| Sunny | Hot | High | False | No |
| Sunny | Hot | High | True | No |
| Overcast | Hot | High | False | Yes |
| Rainy | Mild | High | False | Yes |
| Rainy | Cool | Normal | False | Yes |
| Rainy | Cool | Normal | True | No |
| Overcast | Cool | Normal | True | Yes |
| Sunny | Mild | High | False | No |
| Sunny | Cool | Normal | False | Yes |
| Rainy | Mild | Normal | False | Yes |
| Sunny | Mild | Normal | True | Yes |
| Overcast | Mild | High | True | Yes |
| Overcast | Hot | Normal | False | Yes |
| Rainy | Mild | High | True | No |
Information Gain
Information Gain is the expected reduction in entropy caused by partitioning the instances from S according to a given discrete variable Xi:

Gain(S, Xi) = E(S) − Σj ( |Sxij| / |S| ) E(Sxij)

where Sxij is the subset of instances from S such that Xi = xij.
[Figure: node Xi splitting S into subsets Sxi1, Sxi2, …, Sxij.]
Example
Splitting on Outlook partitions the training data:
Sunny: Hot High Weak No; Hot High Strong No; Mild High Weak No; Cool Normal Weak Yes; Mild Normal Strong Yes
Overcast: Hot High Weak Yes; Cool Normal Strong Yes
Rain: Mild High Weak Yes; Cool Normal Weak Yes; Cool Normal Strong No; Mild Normal Weak Yes; Mild High Strong No
Which attribute should be tested here?
Gain(Ssunny, Humidity) = .970 − (3/5) 0.0 − (2/5) 0.0 = .970
Gain (Ssunny , Temperature) = .970 - (2/5) 0.0 - (2/5) 1.0 - (1/5) 0.0 = .570
Gain (Ssunny , Wind) = .970 - (2/5) 1.0 - (3/5) .918 = .019
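These three gains can be reproduced with a short script. A sketch under our own naming (`entropy`, `gain` are helper names, not from the slides); the rows are the five Sunny instances, and the results match the slide's numbers up to rounding (E(Ssunny) ≈ 0.971 appears as .970 on the slide):

```python
import math

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    e = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        e -= p * math.log2(p)
    return e

def gain(rows, attr, target="Play"):
    """Information gain of splitting `rows` (a list of dicts) on `attr`."""
    base = entropy([r[target] for r in rows])
    for v in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == v]
        base -= len(subset) / len(rows) * entropy(subset)
    return base

# The five Sunny instances from the slide.
sunny = [
    {"Temp": "Hot",  "Hum": "High",   "Wind": "Weak",   "Play": "No"},
    {"Temp": "Hot",  "Hum": "High",   "Wind": "Strong", "Play": "No"},
    {"Temp": "Mild", "Hum": "High",   "Wind": "Weak",   "Play": "No"},
    {"Temp": "Cool", "Hum": "Normal", "Wind": "Weak",   "Play": "Yes"},
    {"Temp": "Mild", "Hum": "Normal", "Wind": "Strong", "Play": "Yes"},
]
```

Humidity splits the Sunny subset into two pure groups, so it wins here.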
Continuous Variables

Temp.: 80 85 83 75 68 65 64 72 75 70 69 72 81 71
Play:  No No Yes Yes Yes No Yes No Yes Yes Yes Yes Yes No

Sort by Temp.:

Temp.: 64 65 68 69 70 71 72 72 75 75 80 81 83 85
Play:  Yes No Yes Yes Yes No No Yes Yes Yes No Yes Yes No

Candidate thresholds and their impurity reduction I:
Temp. < 64.5, I = 0.048
Temp. < 66.5, I = 0.010
Temp. < 70.5, I = 0.045
Temp. < 73.5, I = 0.001
Temp. < 77.5, I = 0.025
Temp. < 80.5, I = 0.000
Temp. < 84, I = 0.113
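The sort-and-scan procedure for a numeric variable can be sketched as follows (function names are ours; candidate thresholds are taken as midpoints between consecutive distinct sorted values, and I is measured as information gain):

```python
import math

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def best_threshold(pairs):
    """Best binary split `value < t` for a numeric variable.
    `pairs` is a list of (value, label); candidates are midpoints
    between consecutive distinct sorted values.
    Returns (threshold, information gain)."""
    pairs = sorted(pairs)
    labels = [label for _, label in pairs]
    base = entropy(labels)
    best_t, best_gain = None, -1.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                 # no threshold inside a run of equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left, right = labels[:i], labels[i:]
        g = base - len(left) / len(pairs) * entropy(left) \
                 - len(right) / len(pairs) * entropy(right)
        if g > best_gain:
            best_t, best_gain = t, g
    return best_t, best_gain

# The Temp./Play column from the slide.
temps = [(80, "No"), (85, "No"), (83, "Yes"), (75, "Yes"), (68, "Yes"),
         (65, "No"), (64, "Yes"), (72, "No"), (75, "Yes"), (70, "Yes"),
         (69, "Yes"), (72, "Yes"), (81, "Yes"), (71, "No")]
```

On this data the best split is Temp. < 84, in agreement with the slide's I = 0.113.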
ID3 Algorithm
Informally:
– Determine the variable with the highest information gain on the training set.
– Use this variable as the root; create a branch for each of the values the attribute can have.
– For each branch, repeat the process with the subset of the training set that is classified by that branch.
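The informal procedure can be sketched end to end in Python. This is an illustrative sketch, not ID3's original formulation: the tuple/dict tree encoding and helper names are our assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, attr, target):
    """Information gain of splitting `rows` (list of dicts) on `attr`."""
    base = entropy([r[target] for r in rows])
    for v in {r[attr] for r in rows}:
        sub = [r[target] for r in rows if r[attr] == v]
        base -= len(sub) / len(rows) * entropy(sub)
    return base

def id3(rows, attrs, target):
    """Return a leaf (class label) or an interior node (attr, {value: subtree})."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:              # perfectly classified: stop
        return labels[0]
    if not attrs:                          # no variables left: majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a, target))
    children = {}
    for v in {r[best] for r in rows}:      # one descendant per observed value
        subset = [r for r in rows if r[best] == v]
        children[v] = id3(subset, [a for a in attrs if a != best], target)
    return (best, children)

# The playtennis data from the slides.
FIELDS = ["Outlook", "Temp", "Hum", "Wind", "Play"]
data = [dict(zip(FIELDS, row)) for row in [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]]
```

Running `id3` on this data reproduces the tree from the earlier slide: Outlook at the root, Humidity under sunny, Windy under rainy, and a yes leaf under overcast.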
Hypothesis Space Search in ID3
• The hypothesis space is the set of all decision trees defined over the given set of variables.
• ID3's hypothesis space is a complete space; i.e., the target tree is there!
• ID3 performs a simple-to-complex, hill-climbing search through this space.
Hypothesis Space Search in ID3
• The evaluation function is the information gain.
• ID3 maintains only a single current decision tree.
• ID3 performs no backtracking in its search.
• ID3 uses all training instances at each step of the search.
Decision Trees are Non-linear Classifiers
[Figure: the unit square over A1 and A2 partitioned into axis-parallel good/bad regions, together with the corresponding tree of threshold tests (A2 < 0.33, A1 < 0.91, A1 < 0.23, A2 < 0.91, A2 < 0.49, A2 < 0.75, A2 < 0.65) with yes/no branches and good/bad leaves.]
Posterior Class Probabilities
[Figure: the pruned PlayTennis tree annotated with class counts:
Sunny → leaf "no": 2 pos and 3 neg, Ppos = 0.4, Pneg = 0.6
Overcast → leaf "yes": 3 pos and 0 neg, Ppos = 1.0, Pneg = 0.0
Rainy → Windy, False → leaf "yes": 2 pos and 0 neg, Ppos = 1.0, Pneg = 0.0
Rainy → Windy, True → leaf "no": 0 pos and 2 neg, Ppos = 0.0, Pneg = 1.0]
Overfitting
Definition: Given a hypothesis space H, a hypothesis h ∈ H is said to overfit the training data if there exists some hypothesis h' ∈ H such that h has smaller error than h' over the training instances, but h' has a smaller error than h over the entire distribution of instances.
Reasons for Overfitting
• Noisy training instances. Consider a noisy training example:
Outlook = Sunny; Temp = Hot; Humidity = Normal; Wind = True; PlayTennis = No
This instance conflicts with the training instances:
Outlook = Sunny; Temp = Cool; Humidity = Normal; Wind = False; PlayTennis = Yes
Outlook = Sunny; Temp = Mild; Humidity = Normal; Wind = True; PlayTennis = Yes
[Figure: the original PlayTennis tree: sunny → Humidity (high: no, normal: yes), overcast → yes, rainy → Windy (false: yes, true: no).]
Reasons for Overfitting
[Figure: the PlayTennis tree grown further to fit the noisy instance: the sunny/normal-humidity leaf is replaced by additional Windy and Temp tests.]
Outlook = Sunny; Temp = Hot; Humidity = Normal; Wind = True; PlayTennis = No
Outlook = Sunny; Temp = Cool; Humidity = Normal; Wind = False; PlayTennis = Yes
Outlook = Sunny; Temp = Mild; Humidity = Normal; Wind = True; PlayTennis = Yes
[Figure: + and − instances where an isolated + deep inside the − region produces an area with probably wrong predictions.]
Reasons for Overfitting
• A small number of instances is associated with leaf nodes. In this case it is possible for coincidental regularities to occur that are unrelated to the actual borders.
Approaches to Avoiding Overfitting
• Pre-pruning: stop growing the tree earlier, before it reaches the point where it perfectly classifies the training data
• Post-pruning: Allow the tree to overfit the data, and then post-prune the tree.
Pre-pruning
[Figure: the full PlayTennis tree next to a pre-pruned tree in which the sunny and rainy subtrees are replaced early by leaves.]
• It is difficult to decide when to stop growing the tree.
• A possible scenario is to stop when a leaf node gets fewer than m training instances. Here is an example for m = 5.
Validation Set
• A validation set is a set of instances used to evaluate the utility of nodes in decision trees. The validation set has to be chosen so that it is unlikely to suffer from the same errors or fluctuations as the set used for decision-tree training.
• Usually, before pruning, the training data is split randomly into a growing set and a validation set.
Reduced-Error Pruning (Sub-tree replacement)
Split data into growing and validation sets.
Pruning a decision node d consists of:
1. removing the subtree rooted at d;
2. making d a leaf node;
3. assigning d the most common classification of the training instances associated with d.
[Figure: the PlayTennis tree, with 3 instances and 2 instances sorted to the leaves of one subtree.]
Accuracy of the tree on the validation set is 90%.
Reduced-Error Pruning (Sub-tree replacement)
Split data into growing and validation sets.
Pruning a decision node d consists of:
1. removing the subtree rooted at d;
2. making d a leaf node;
3. assigning d the most common classification of the training instances associated with d.
[Figure: the pruned tree: sunny → no, overcast → yes, rainy → Windy (false: yes, true: no).]
Accuracy of the tree on the validation set is 92.4%.
Reduced-Error Pruning (Sub-tree replacement)
Split data into growing and validation sets.
Pruning a decision node d consists of:
1. removing the subtree rooted at d;
2. making d a leaf node;
3. assigning d the most common classification of the training instances associated with d.
Do until further pruning is harmful:
1. Evaluate the impact on the validation set of pruning each possible node (plus those below it).
2. Greedily remove the one that most improves validation set accuracy.
[Figure: the pruned tree: sunny → no, overcast → yes, rainy → Windy (false: yes, true: no).]
Accuracy of the tree on the validation set is 92.4%.
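The greedy pruning loop can be sketched on a toy tree. All names and the tuple/dict tree encoding are our assumptions; the point is only to show "evaluate every possible replacement, keep the best, repeat":

```python
from collections import Counter

def classify(x, node):
    """A leaf is a class label; an interior node is (variable, {value: child})."""
    if not isinstance(node, tuple):
        return node
    attr, children = node
    return classify(x, children[x[attr]])

def accuracy(tree, data):
    return sum(classify(x, tree) == x["class"] for x in data) / len(data)

def interior_paths(node, path=()):
    """Branch-value paths to every interior node (replacement candidates)."""
    if not isinstance(node, tuple):
        return []
    attr, children = node
    paths = [path]
    for v, child in children.items():
        paths += interior_paths(child, path + (v,))
    return paths

def replace(node, path, leaf):
    """Copy of the tree with the node at `path` replaced by `leaf`."""
    if not path:
        return leaf
    attr, children = node
    children = dict(children)
    children[path[0]] = replace(children[path[0]], path[1:], leaf)
    return (attr, children)

def instances_at(rows, tree, path):
    """Growing-set instances sorted to the node at `path`."""
    node = tree
    for v in path:
        attr, children = node
        rows = [x for x in rows if x[attr] == v]
        node = children[v]
    return rows

def reduced_error_prune(tree, grow, validation):
    """Greedily replace the subtree whose replacement by a majority-class
    leaf most improves validation accuracy; stop when nothing helps."""
    improved = True
    while improved:
        improved = False
        best, best_acc = tree, accuracy(tree, validation)
        for path in interior_paths(tree):
            rows = instances_at(grow, tree, path)
            if not rows:
                continue
            majority = Counter(x["class"] for x in rows).most_common(1)[0][0]
            candidate = replace(tree, path, majority)
            acc = accuracy(candidate, validation)
            if acc > best_acc:
                best, best_acc, improved = candidate, acc, True
        tree = best
    return tree
```

The majority label comes from the growing set, but the decision to prune is made purely on validation-set accuracy.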
Reduced-Error Pruning (Sub-tree replacement)
[Figure: a sequence of trees T1 to T5 produced by successive sub-tree replacements of the PlayTennis tree, with errors on the growing set (GS) and validation set (VS):
T1: ErrorGS = 0%, ErrorVS = 10%
T2: ErrorGS = 6%, ErrorVS = 8%
T3: ErrorGS = 13%, ErrorVS = 15%
T4: ErrorGS = 27%, ErrorVS = 25%
T5: ErrorGS = 33%, ErrorVS = 35%]
Reduced Error Pruning Example
Reduced-Error Pruning (Sub-tree raising)
Split data into growing and validation sets.
Raising a sub-tree with root d consists of:
1. removing the sub-tree rooted at the parent of d;
2. placing d at the place of its parent;
3. sorting the training instances associated with the parent of d using the sub-tree with root d.
[Figure: the PlayTennis tree, with 3 instances and 2 instances sorted to the leaves of one subtree.]
Accuracy of the tree on the validation set is 90%.
Reduced-Error Pruning (Sub-tree raising)
Split data into growing and validation sets.
Raising a sub-tree with root d consists of:
1. removing the sub-tree rooted at the parent of d;
2. placing d at the place of its parent;
3. sorting the training instances associated with the parent of d using the sub-tree with root d.
[Figure: the raised sub-tree: Humidity (high: no, normal: yes).]
Accuracy of the tree on the validation set is 73%. So, no!
Rule Post-Pruning
IF (Outlook = Sunny) & (Humidity = High) THEN PlayTennis = No
IF (Outlook = Sunny) & (Humidity = Normal) THEN PlayTennis = Yes
…
1. Convert the tree to an equivalent set of rules.
2. Prune each rule independently of the others.
3. Sort the final rules by their estimated accuracy, and consider them in this sequence when classifying subsequent instances.
[Figure: the PlayTennis decision tree from which the rules are read off.]
Decision Trees are non-linear. Can we make them linear?
[Figure: the same axis-parallel partition of the A1/A2 unit square and its decision tree as shown earlier.]
Oblique Decision Trees
[Figure: an oblique split x + y < 1 separating Class = + from Class = −.]
• Test condition may involve multiple attributes
• More expressive representation
• Finding optimal test condition is computationally expensive!
Variables with Many Values
• Problem:
– They do not give good splits: they fragment the data too quickly, leaving insufficient data at the next level.
– The reduction of impurity of such a test is often high (example: split on the object id).
• Two solutions:
– Change the splitting criterion to penalize variables with many values.
– Consider only binary splits.
[Figure: a Letter variable splitting into one branch per value a, b, c, …, y, z.]
Variables with Many Values
• Example: outlook in the playtennis data
– InfoGain(outlook) = 0.246
– SplitInformation(outlook) = 1.577
– GainRatio(outlook) = 0.246/1.577 = 0.156 < 0.246
• Problem: the gain ratio favours unbalanced tests

SplitInfo(S, A) = − Σ(i=1..c) ( |Si| / |S| ) log2 ( |Si| / |S| )

GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A)

where S1, …, Sc are the subsets produced by splitting S on the c values of A.
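These numbers can be checked directly; a minimal sketch (function names are ours), using the subset sizes produced by outlook (5 sunny, 4 overcast, 5 rain):

```python
import math

def split_info(sizes):
    """SplitInfo(S, A) for a split of S into subsets of the given sizes."""
    n = sum(sizes)
    return -sum((s / n) * math.log2(s / n) for s in sizes if s)

def gain_ratio(gain, sizes):
    """GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A)."""
    return gain / split_info(sizes)
```

SplitInfo grows with the number (and balance) of subsets, which is exactly how it penalizes many-valued variables such as an object id.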
Missing Values
1. If node n tests variable Xi, assign the most common value of Xi among the other instances sorted to node n.
2. If node n tests variable Xi, assign a probability to each of the possible values of Xi, estimated from the observed frequencies of the values of Xi. These probabilities are then used in the impurity-reduction (information gain) measure:

ΔI(S, Xi) = I(S) − Σj ( |Sxij| / |S| ) I(Sxij)
Windowing
If the data does not fit in main memory, use windowing:
1. Select randomly n instances from the training data D and put them in the window set W.
2. Train a decision tree DT on W.
3. Determine the set M of instances from D misclassified by DT.
4. W = W ∪ M.
5. IF Not(StopCondition) THEN GoTo 2.
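The loop can be sketched generically; `train_fn` and `classify_fn` are placeholders for a real tree learner and classifier (our assumption, not part of the slides), and each instance is a dict with a "class" key:

```python
import random

def windowing(data, train_fn, classify_fn, n, max_rounds=10):
    """Windowing sketch: learn on a window W, grow W with the instances
    of D that the current model misclassifies, and stop when none remain."""
    window = random.sample(data, min(n, len(data)))
    model = train_fn(window)                             # step 2
    for _ in range(max_rounds):
        misclassified = [x for x in data                 # step 3: the set M
                         if classify_fn(model, x) != x["class"]
                         and x not in window]
        if not misclassified:                            # StopCondition
            break
        window += misclassified                          # step 4: W = W ∪ M
        model = train_fn(window)
    return model
```

Because only misclassified instances are added, the window stays as small as the model's errors allow.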
Summary Points
1. Decision tree learning provides a practical method for concept learning.
2. ID3-like algorithms search a complete hypothesis space.
3. The inductive bias of decision trees is a preference (search) bias.
4. Overfitting the training data is an important issue in decision tree learning.
5. A large number of extensions of the ID3 algorithm have been proposed for overfitting avoidance, handling missing attributes, handling numerical attributes, etc.
Learning Decision Rules
• Decision Rules
• Basic Sequential Covering Algorithm
• Learn-One-Rule Procedure
• Pruning
Definition of Decision Rules
Example: If you run the Prism algorithm from Weka on the weather data, you will get the following set of decision rules:
if outlook = overcast then PlayTennis = yes
if humidity = normal and windy = FALSE then PlayTennis = yes
if temperature = mild and humidity = normal then PlayTennis = yes
if outlook = rainy and windy = FALSE then PlayTennis = yes
if outlook = sunny and humidity = high then PlayTennis = no
if outlook = rainy and windy = TRUE then PlayTennis = no
Definition: Decision rules are rules with the following form:
if <conditions> then concept C.
Why Decision Rules?
• Decision rules are more compact.
• Decision rules are more understandable.
Example: Let X ∈ {0,1}, Y ∈ {0,1}, Z ∈ {0,1}, W ∈ {0,1}. The rules are:
if X = 1 and Y = 1 then 1
if Z = 1 and W = 1 then 1
otherwise 0
[Figure: the equivalent decision tree, which must test X, Y, Z and W and duplicates the Z/W subtree, making it far larger than the three rules.]
Why Decision Rules?
[Figure: the same + and − instances shown with decision boundaries of decision trees and with decision boundaries of decision rules.]
How to Learn Decision Rules?
1. We can convert trees to rules.
2. We can use specific rule-learning methods.
Sequential Covering Algorithms
function LearnRuleSet(Target, Attrs, Examples, Threshold):
  LearnedRules := ∅
  Rule := LearnOneRule(Target, Attrs, Examples)
  while performance(Rule, Examples) > Threshold, do
    LearnedRules := LearnedRules ∪ {Rule}
    Examples := Examples \ {examples covered by Rule}
    Rule := LearnOneRule(Target, Attrs, Examples)
  sort LearnedRules according to performance
  return LearnedRules
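One concrete instantiation in Python; the encoding is our assumption (a rule is a dict of attribute = value conditions, LearnOneRule is top-down, `performance` is rule accuracy, and the threshold is purity):

```python
def covers(rule, x):
    """A rule is a dict {attr: value} of conditions; {} is 'IF true'."""
    return all(x.get(a) == v for a, v in rule.items())

def accuracy(rule, examples, target):
    """Fraction of instances covered by the rule that have the target class."""
    covered = [x for x in examples if covers(rule, x)]
    if not covered:
        return 0.0
    return sum(x["class"] == target for x in covered) / len(covered)

def learn_one_rule(examples, attrs, target):
    """Top-down: start maximally general, greedily add the literal that
    most improves accuracy, until the rule is pure or no literal is left."""
    rule = {}
    while accuracy(rule, examples, target) < 1.0:
        candidates = {(a, x[a]) for x in examples if covers(rule, x)
                      for a in attrs if a not in rule}
        if not candidates:
            break
        a, v = max(candidates,
                   key=lambda av: accuracy({**rule, av[0]: av[1]},
                                           examples, target))
        rule[a] = v
    return rule

def learn_rule_set(examples, attrs, target):
    """Sequential covering: learn a rule, remove covered examples, repeat."""
    rules = []
    while any(x["class"] == target for x in examples):
        rule = learn_one_rule(examples, attrs, target)
        if accuracy(rule, examples, target) < 1.0:
            break                      # performance fell below the threshold
        rules.append(rule)
        examples = [x for x in examples if not covers(rule, x)]
    return rules
```

Each learned rule "covers" part of the positive instances; removing the covered examples forces the next rule to explain what remains.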
Illustration
IF true THEN pos
[Figure: a set of + and − instances; the maximally general rule covers all of them.]
Illustration
IF true THEN pos
IF A THEN pos
[Figure: adding literal A restricts the covered region.]
Illustration
IF true THEN pos
IF A THEN pos
IF A & B THEN pos
[Figure: the fully specialized rule covers only + instances.]
Illustration
IF true THEN pos
IF A & B THEN pos
[Figure: the instances covered by IF A & B THEN pos are set aside and a new maximally general rule is started.]
Illustration
IF true THEN pos
IF C THEN pos
IF A & B THEN pos
[Figure: the second rule is specialized with literal C.]
Illustration
IF true THEN pos
IF C THEN pos
IF C & D THEN pos
IF A & B THEN pos
[Figure: the second rule is fully specialized to IF C & D THEN pos.]
Learning One Rule
To learn one rule we use one of the strategies below:
• Top-down:
– Start with the maximally general rule
– Add literals one by one
• Bottom-up:
– Start with the maximally specific rule
– Remove literals one by one
Bottom-up vs. Top-down

[Figure: scatter plot of + and − instances with the rule regions found by the two strategies.]

Top-down: typically yields more general rules.
Bottom-up: typically yields more specific rules.
![Page 66: Classification with Decision Trees and Rules Evgueni Smirnov](https://reader035.vdocuments.net/reader035/viewer/2022062407/56649e7a5503460f94b79e1c/html5/thumbnails/66.jpg)
Learning One Rule

• Bottom-up: example-driven (the AQ family).
• Top-down: generate-then-test (CN2).
![Page 67: Classification with Decision Trees and Rules Evgueni Smirnov](https://reader035.vdocuments.net/reader035/viewer/2022062407/56649e7a5503460f94b79e1c/html5/thumbnails/67.jpg)
Example of Learning One Rule
![Page 68: Classification with Decision Trees and Rules Evgueni Smirnov](https://reader035.vdocuments.net/reader035/viewer/2022062407/56649e7a5503460f94b79e1c/html5/thumbnails/68.jpg)
Heuristics for Learning One Rule

When is a rule “good”?
• High accuracy;
• Less important: high coverage.

Possible evaluation functions:
• Relative frequency: nc/n, where nc is the number of correctly classified instances and n is the number of instances covered by the rule;
• m-estimate of accuracy: (nc + mp)/(n + m), where nc is the number of correctly classified instances, n is the number of instances covered by the rule, p is the prior probability of the class predicted by the rule, and m is the weight of p;
• Entropy.
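The two frequency-based evaluation functions above are one-liners; a small sketch makes the role of m explicit:

```python
def relative_frequency(nc, n):
    """nc / n: raw accuracy of a rule on the instances it covers."""
    return nc / n

def m_estimate(nc, n, p, m):
    """(nc + m*p) / (n + m): shrinks the raw frequency towards the class
    prior p; m controls how strongly the prior is weighted. For m = 0 it
    reduces to the relative frequency."""
    return (nc + m * p) / (n + m)
```

With few covered instances the m-estimate is pulled towards the prior; as coverage grows, the two estimates converge.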
![Page 69: Classification with Decision Trees and Rules Evgueni Smirnov](https://reader035.vdocuments.net/reader035/viewer/2022062407/56649e7a5503460f94b79e1c/html5/thumbnails/69.jpg)
How to Arrange the Rules

1. The rules are ordered in the order in which they were learned. This order is used for instance classification.

2. The rules are ordered by their accuracy. This order is used for instance classification.

3. The rules are not ordered, but there is a strategy for applying them (e.g., an instance covered by conflicting rules gets the classification of the rule that correctly classifies more training instances; if an instance is not covered by any rule, it gets the classification of the majority class in the training data).
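Strategy 1 (an ordered rule list, also called a decision list) can be sketched in a few lines; the function name `classify` and the rule representation as (conditions, label) pairs are choices made here for illustration:

```python
def classify(instance, rules, default):
    """Apply an ordered rule list: the first rule whose conditions all hold
    predicts the class; if no rule fires, fall back to the default
    (e.g., the majority class of the training data)."""
    for conds, label in rules:
        if all(instance.get(a) == v for a, v in conds.items()):
            return label
    return default

# Example rule list in the spirit of the illustration slides.
rules = [({"A": 1, "B": 1}, "pos"), ({"C": 1}, "pos")]
```

An instance matched by no rule receives the default class, which implements the majority-class fallback of strategy 3.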
![Page 70: Classification with Decision Trees and Rules Evgueni Smirnov](https://reader035.vdocuments.net/reader035/viewer/2022062407/56649e7a5503460f94b79e1c/html5/thumbnails/70.jpg)
Approaches to Avoiding Overfitting

• Pre-pruning: stop learning the decision rules before they reach the point where they perfectly classify the training data.
• Post-pruning: allow the decision rules to overfit the training data, and then post-prune the rules.
![Page 71: Classification with Decision Trees and Rules Evgueni Smirnov](https://reader035.vdocuments.net/reader035/viewer/2022062407/56649e7a5503460f94b79e1c/html5/thumbnails/71.jpg)
Post-Pruning

1. Split instances into a Growing Set and a Pruning Set;
2. Learn a set SR of rules using the Growing Set;
3. Find the best simplification BSR of SR;
4. while (Accuracy(BSR, Pruning Set) >
         Accuracy(SR, Pruning Set)) do
   4.1 SR = BSR;
   4.2 Find the best simplification BSR of SR;
5. return BSR;
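Steps 3–5 of the procedure can be sketched as follows. This is a minimal illustration, assuming rules are (conditions, label) pairs and that a “simplification” means deleting one rule or one condition; the helper names `accuracy`, `best_simplification`, and `post_prune` are chosen here, not taken from the slides:

```python
def accuracy(rules, data, default="neg"):
    """Fraction of instances the ordered rule list classifies correctly."""
    correct = 0
    for x, y in data:
        pred = default
        for conds, label in rules:
            if all(x.get(a) == v for a, v in conds.items()):
                pred = label
                break
        correct += pred == y
    return correct / len(data)

def best_simplification(rules, pruning_set):
    """Best rule set obtainable by deleting one rule or one condition."""
    candidates = [rules[:i] + rules[i + 1:] for i in range(len(rules))]
    for i, (conds, label) in enumerate(rules):
        for a in conds:
            smaller = {k: v for k, v in conds.items() if k != a}
            candidates.append(rules[:i] + [(smaller, label)] + rules[i + 1:])
    return max(candidates, key=lambda rs: accuracy(rs, pruning_set), default=rules)

def post_prune(rules, pruning_set):
    """Keep simplifying while accuracy on the pruning set strictly improves."""
    while True:
        simpler = best_simplification(rules, pruning_set)
        if accuracy(simpler, pruning_set) > accuracy(rules, pruning_set):
            rules = simpler
        else:
            return rules
```

In the test below, condition B is noise: dropping it raises pruning-set accuracy, so the loop removes it and then stops.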
![Page 72: Classification with Decision Trees and Rules Evgueni Smirnov](https://reader035.vdocuments.net/reader035/viewer/2022062407/56649e7a5503460f94b79e1c/html5/thumbnails/72.jpg)
Incremental Reduced Error Pruning

[Figure: data subsets D1, D2, D3, with D2 further split into D21 and D22; post-pruning is applied on successive subsets.]
![Page 73: Classification with Decision Trees and Rules Evgueni Smirnov](https://reader035.vdocuments.net/reader035/viewer/2022062407/56649e7a5503460f94b79e1c/html5/thumbnails/73.jpg)
Incremental Reduced Error Pruning

1. Split the Training Set into a Growing Set and a Validation Set;
2. Learn rule R using the Growing Set;
3. Prune rule R using the Validation Set;
4. if performance(R, Training Set) > Threshold
   4.1 Add R to the Set of Learned Rules;
   4.2 Remove the instances covered by R from the Training Set;
   4.3 go to 1;
5. else return the Set of Learned Rules
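The sequential-covering skeleton of this loop can be sketched as below. This is illustration only: the per-rule pruning step (line 3 of the slide) is elided, the rule learner is a deliberately trivial stand-in that picks a single attribute-value condition, and the names `covers`, `learn_one_rule`, and `irep` are chosen here:

```python
def covers(conds, x):
    return all(x.get(a) == v for a, v in conds.items())

def learn_one_rule(growing):
    """Trivial stand-in for a rule learner: pick the single condition with
    the highest accuracy on the growing set (illustration only)."""
    best, best_acc = {}, 0.0
    for x, _ in growing:
        for a, v in x.items():
            conds = {a: v}
            covered = [yi for xi, yi in growing if covers(conds, xi)]
            acc = sum(yi == "pos" for yi in covered) / len(covered)
            if acc > best_acc:
                best, best_acc = conds, acc
    return best

def irep(training, threshold=0.5):
    """Learn a rule, keep it if it performs above the threshold, remove the
    instances it covers, and repeat; otherwise return the learned rules."""
    rules = []
    while training:
        r = learn_one_rule(training)
        covered = [y for x, y in training if covers(r, x)]
        perf = sum(y == "pos" for y in covered) / len(covered)
        if not r or perf <= threshold:
            return rules
        rules.append((r, "pos"))
        training = [(x, y) for x, y in training if not covers(r, x)]
    return rules
```

Each accepted rule shrinks the training set, so the loop terminates once no good rule remains.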
![Page 74: Classification with Decision Trees and Rules Evgueni Smirnov](https://reader035.vdocuments.net/reader035/viewer/2022062407/56649e7a5503460f94b79e1c/html5/thumbnails/74.jpg)
Summary Points

1. Decision rules are easier for humans to comprehend than decision trees.
2. Decision rules have simpler decision boundaries than decision trees.
3. Decision rules are learned by sequentially covering the training instances.
![Page 75: Classification with Decision Trees and Rules Evgueni Smirnov](https://reader035.vdocuments.net/reader035/viewer/2022062407/56649e7a5503460f94b79e1c/html5/thumbnails/75.jpg)
Lab 1: Some Details
![Page 76: Classification with Decision Trees and Rules Evgueni Smirnov](https://reader035.vdocuments.net/reader035/viewer/2022062407/56649e7a5503460f94b79e1c/html5/thumbnails/76.jpg)
Model Evaluation Techniques

• Evaluation on the training set: too optimistic.

[Diagram: the Training Set is used both to build the Classifier and to evaluate it.]
![Page 77: Classification with Decision Trees and Rules Evgueni Smirnov](https://reader035.vdocuments.net/reader035/viewer/2022062407/56649e7a5503460f94b79e1c/html5/thumbnails/77.jpg)
Model Evaluation Techniques

• Hold-out method: the result depends on the make-up of the test set.

[Diagram: the Data is split into a Training Set, used to build the Classifier, and a Test Set, used to evaluate it.]

• To improve the precision of the hold-out method, it is repeated many times.
![Page 78: Classification with Decision Trees and Rules Evgueni Smirnov](https://reader035.vdocuments.net/reader035/viewer/2022062407/56649e7a5503460f94b79e1c/html5/thumbnails/78.jpg)
Model Evaluation Techniques

• k-fold Cross-Validation

[Diagram: the Data is split into k folds; each fold serves as the test set once while the remaining folds form the training set (for k = 3: train/train/test, train/test/train, test/train/train).]
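The k-fold split can be sketched in a few lines; the generator name `k_fold_splits` and the round-robin fold assignment are choices made here for illustration (in practice the data is usually shuffled, and often stratified by class, first):

```python
def k_fold_splits(data, k):
    """Yield (train, test) pairs: each of the k folds serves as the test
    set exactly once, with the remaining folds as training data."""
    folds = [data[i::k] for i in range(k)]  # round-robin assignment
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]
```

Every instance appears in exactly one test fold, so the k test errors can be averaged into a single estimate.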
![Page 79: Classification with Decision Trees and Rules Evgueni Smirnov](https://reader035.vdocuments.net/reader035/viewer/2022062407/56649e7a5503460f94b79e1c/html5/thumbnails/79.jpg)
Intro to Weka

@relation weather.symbolic

@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {TRUE, FALSE}

@data
sunny,hot,high,FALSE,FALSE
sunny,hot,high,TRUE,FALSE
overcast,hot,high,FALSE,TRUE
rainy,mild,high,FALSE,TRUE
rainy,cool,normal,FALSE,TRUE
rainy,cool,normal,TRUE,FALSE
overcast,cool,normal,TRUE,TRUE
…
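The structure of an ARFF file like the one above can be illustrated with a minimal reader. This sketch handles only the nominal-attribute case shown on the slide (real ARFF also allows numeric, string, and sparse data, and Weka ships its own loaders); the name `parse_arff` is chosen here:

```python
def parse_arff(text):
    """Minimal ARFF reader for nominal data: returns the attribute names
    and the data rows as attribute->value dicts."""
    names, rows, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        if line.lower().startswith("@attribute"):
            names.append(line.split()[1])
        elif line.lower().startswith("@data"):
            in_data = True
        elif in_data:
            rows.append(dict(zip(names, line.split(","))))
    return names, rows
```

Each @attribute declaration contributes a column name, and each line after @data becomes one instance.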
![Page 80: Classification with Decision Trees and Rules Evgueni Smirnov](https://reader035.vdocuments.net/reader035/viewer/2022062407/56649e7a5503460f94b79e1c/html5/thumbnails/80.jpg)
References

• Mitchell, Tom M. 1997. Machine Learning. New York: McGraw-Hill.
• Quinlan, J. R. 1986. Induction of Decision Trees. Machine Learning.
• Russell, Stuart, and Peter Norvig. 2010. Artificial Intelligence: A Modern Approach. New Jersey: Prentice Hall.