Chapter 6. Decision Tree (Chapter 5 of PKKSC)
2017-03-28
Yongdai Kim
Seoul National University
1. Introduction
Seoul National University. 1
Decision Tree
• A supervised learning (classification and prediction) method.
• Generates rules of the form "if-then".
• Rules are easily expressed in a database language such as SQL.
• Highly interpretable.
* One of the first practical methods for high-dimensional nonlinear function estimation.
Prediction and Interpretation
• Sometimes prediction accuracy is the primary concern:
– Example: A company wants to select the small subset of its customers who will, with high probability, respond to a direct mail sent by the company.
• In general, however, not only prediction accuracy but also explaining the reasons for a decision is important.
– Example: A bank must explain the reasons for rejection to a loan applicant.
• Decision trees have good interpretive power.
Example
Elements of Decision Tree
2. Construction of Decision Tree
Questions for tree construction
• We ask the following questions to construct a tree.
– Why does the root node ask about income?
– Why is node 6 an internal node?
– What value does the tree assign to node 7?
Four ingredients in decision tree construction
• Splitting rule.
• Stopping rule.
• Pruning rule.
• Assigning a prediction value for each terminal node.
Four steps for construction of a decision tree
• Growing : Find an optimal splitting rule for each node and grow the tree. Stop growing when the stopping rule is satisfied.
• Pruning : Remove nodes which increase the prediction error or which have inappropriate inference rules, and also remove unnecessary (redundant) nodes.
• Validation : Validate the tree using a gain chart, risk chart, test-sample error, cross-validation, etc., to decide how much to prune.
• Interpretation and prediction : Interpret the constructed tree and use it for prediction.
Splitting Rule
• For each node, determine a splitting variable and splitting
criterion.
• For a continuous splitting variable X, the splitting criterion c is a number. In general, the tree assigns cases with X < c to the left child node and cases with X ≥ c to the right child node.
• For a categorical variable, the splitting criterion divides the range of the splitting variable into two parts. For example, {1, 2, 4} and {3} is a splitting criterion for a splitting variable X whose range is {1, 2, 3, 4}. If a case has X ∈ {1, 2, 4}, the tree assigns it to the left child; otherwise, the tree assigns it to the right child.
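These two assignment rules can be sketched in a few lines of Python. The function name `assign_child` and the example records are hypothetical illustrations, not part of any particular tree package:

```python
def assign_child(x, split_var, criterion):
    """Route a case to 'left' or 'right' by a node's splitting rule.

    criterion is a number c for a continuous variable (left iff X < c),
    or a set of categories for a categorical variable (left iff X in the set).
    """
    value = x[split_var]
    if isinstance(criterion, set):                 # categorical split
        return "left" if value in criterion else "right"
    return "left" if value < criterion else "right"   # continuous split

# Continuous: X < 50000 goes left, X >= 50000 goes right.
print(assign_child({"income": 42000}, "income", 50000))   # left
# Categorical with range {1, 2, 3, 4}, split {1, 2, 4} vs {3}.
print(assign_child({"grade": 3}, "grade", {1, 2, 4}))     # right
```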
Purity
• Purity (or impurity) measures the homogeneity of the target variable within a given node.
• For example, a node in which the ratio of group 0 to group 1 is 9:11 has lower purity than a node in which the ratio of group 0 to group 1 is 1:9.
• For each node, we select the splitting variable and the splitting criterion that maximize the sum of the purities of the two child nodes.
Purity (impurity) measure
• A decision tree grows by splitting each node.
• After a node is split into two child nodes, the sum of the purities of the child nodes is greater than the purity of the parent node.
• This means the child nodes are purer than the parent node.
• The splitting variable and criterion are chosen to maximize the reduction of impurity achieved by the split.
• A natural candidate for the impurity measure is the error rate. That is, the tree selects the splitting variable and the splitting criterion which minimize the error rate of the child nodes.
Problem of error rate as an impurity measure
• Split 2 has the same error rate as Split 1, but Split 2 is better in
the sense that the left child node can be split further.
Conditions for impurity functions
• We have seen that the error rate is not a good impurity measure.
• The impurity measure should be small when one of the child nodes has an extremely small error rate.
• The impurity function ϕ : [0, 1] → [0, ∞) should satisfy
– ϕ(0) = ϕ(1) = 0
– ϕ attains its maximum at p = 1/2
– ϕ(p) = ϕ(1 − p)
• Also, to give higher impurity for p around 1/2, we require that the impurity function be concave.
• Proposition. For a given node t, let ∆i(t) = ϕ(pt) − pLϕ(ptL) − pRϕ(ptR), where pL and pR are the proportions of cases in t sent to the left and right child nodes. If ϕ is concave, then ∆i(t) ≥ 0 (see Proposition 4.4 in Breiman et al. (1984)).
Impurity functions
• Classification model
– χ2 statistic.
– Gini index: ϕ(p) = p(1− p)
– Entropy index: ϕ(p) = −p log p − (1 − p) log(1 − p)
• Regression model
– F statistic of ANOVA.
– Reduction in variance.
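The two classification impurity functions can be checked directly against the conditions on the previous slide. This is a minimal sketch; natural logarithms are assumed for the entropy, since the slides do not fix the base:

```python
import math

def gini(p):
    # Gini impurity: phi(p) = p(1 - p); zero at p in {0, 1}, maximal at 1/2.
    return p * (1 - p)

def entropy(p):
    # Entropy impurity: phi(p) = -p log p - (1 - p) log(1 - p), natural log.
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

# Both satisfy phi(0) = phi(1) = 0, symmetry phi(p) = phi(1 - p),
# and a maximum at p = 1/2.
for phi in (gini, entropy):
    assert phi(0.0) == phi(1.0) == 0.0
    assert abs(phi(0.3) - phi(0.7)) < 1e-12
    assert phi(0.5) > phi(0.1)
```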
χ2 statistics
• For a given splitting variable and splitting criterion, we form the following table.
• This table is called the observed-frequency table O.
χ2 statistics
• We can compute the expected frequency E of previous table.
χ2 statistics
• χ2 statistic.
χ² = ∑ij (Oij − Eij)²/Eij
• Applying this to the previous table, we compute the χ² statistic:
χ² = (56 − 32)²/56 + (24 − 48)²/24
+ (154 − 178)²/154 + (66 − 42)²/66
= 46.75
• Find the splitting variable and the splitting criterion that maximize the χ² statistic.
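As a sanity check, the expected frequencies and the χ² statistic above can be recomputed from the observed table. The 2×2 counts are taken from the slide's computation (left child: 32/48, right child: 178/42):

```python
# Observed 2x2 frequencies: rows = child nodes (left, right),
# columns = classes (Good, Bad).
O = [[32, 48],
     [178, 42]]

row = [sum(r) for r in O]         # row totals: 80, 220
col = [sum(c) for c in zip(*O)]   # column totals: 210, 90
n = sum(row)                      # grand total: 300

# Expected frequency under independence: E_ij = row_i * col_j / n.
E = [[row[i] * col[j] / n for j in range(2)] for i in range(2)]

chi2 = sum((O[i][j] - E[i][j]) ** 2 / E[i][j]
           for i in range(2) for j in range(2))
print(round(chi2, 2))   # 46.75, matching the slide
```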
Gini index
• Gini index
Gini index = Probability of Good at left child
× Probability of Bad at left child
+ Probability of Good at right child
× Probability of Bad at right child
• Applying this to the previous table, we compute the Gini index:
Gini index = (32/80) × (48/80) + (178/220) × (42/220) = 0.3944
• Find the splitting variable and the splitting criterion minimizing the Gini index.
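The same number can be checked in code. Note that this slide's Gini index is the unweighted sum of the child-node impurities p(1 − p):

```python
def node_gini(good, bad):
    # Gini impurity of one child: P(Good) * P(Bad) within the node.
    n = good + bad
    return (good / n) * (bad / n)

# Left child: 32 Good / 48 Bad; right child: 178 Good / 42 Bad.
gini_split = node_gini(32, 48) + node_gini(178, 42)
print(round(gini_split, 4))   # matches the slide's 0.3944 up to rounding
```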
Entropy index
• Entropy index
Entropy index = −[ Probability of Good at left child
× log(Probability of Good at left child)
+ Probability of Bad at left child
× log(Probability of Bad at left child)
+ Probability of Good at right child
× log(Probability of Good at right child)
+ Probability of Bad at right child
× log(Probability of Bad at right child) ]
• Applying this to the previous table (natural logarithms), we compute the entropy index:
entropy = −[(32/80) log(32/80) + (48/80) log(48/80)]
− [(178/220) log(178/220) + (42/220) log(42/220)]
≈ 1.1605
• Find the splitting variable and the splitting criterion minimizing the entropy.
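The computation above can be reproduced in code. Natural logarithms are an assumption here; the counts are the same 32/48 and 178/42 used for the χ² and Gini examples:

```python
import math

def child_entropy(good, bad):
    # Entropy of one child node: -(p log p + (1 - p) log(1 - p)),
    # with p = P(Good) inside the node, natural log.
    n = good + bad
    p = good / n
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

# Left child: 32 Good / 48 Bad; right child: 178 Good / 42 Bad.
split_entropy = child_entropy(32, 48) + child_entropy(178, 42)
print(round(split_entropy, 4))
```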
Example : Splitting method
• Using the Gini index, find the optimal split for the following table.
Example : Splitting method
1. Split by temperature.
1-1. left node={hot}, right node={mild,cold}.
• Gini index = 3/4 ∗ 1/4 + 3/10 ∗ 7/10 = 0.3975
Example : Splitting method
1. Split by temperature.
1-2. left node={mild}, right node={hot,cold}.
• Gini index = 1/6 ∗ 5/6 + 5/8 ∗ 3/8 = 0.373
Example : Splitting method
1. Split by temperature.
1-3. left node={cold}, right node={hot,mild}.
• Gini index = 2/4 ∗ 2/4 + 4/10 ∗ 6/10 = 0.49
Example : Splitting method
2. Split by humidity.
2-1. left node={high}, right node={normal}.
• Gini index = 3/7 ∗ 4/7 + 3/7 ∗ 4/7 = 0.489
Example : Splitting method
3. Split by windy.
3-1. left node={false}, right node={true}.
• Gini index = 4/8 ∗ 4/8 + 2/6 ∗ 4/6 = 0.472
Example : Splitting method
• Select the split with the smallest impurity.
• Thus, select split 1-2 (left node = {mild}, right node = {hot, cold}).
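The whole search in steps 1 through 3 can be automated. The (Good, Bad) counts per category below are reconstructed from the Gini values on the preceding slides (e.g. hot = 3 Good / 1 Bad), so treat them as an illustration of the enumeration rather than the original data table:

```python
from itertools import combinations

def node_gini(good, bad):
    n = good + bad
    return (good / n) * (bad / n)

# (Good, Bad) counts per category, reconstructed from the slides.
counts = {
    "temperature": {"hot": (3, 1), "mild": (1, 5), "cold": (2, 2)},
    "humidity":    {"high": (3, 4), "normal": (3, 4)},
    "windy":       {"false": (4, 4), "true": (2, 4)},
}

best = None
for var, cats in counts.items():
    names = sorted(cats)
    # Every way of sending a nonempty proper subset of categories left.
    for k in range(1, len(names)):
        for left in combinations(names, k):
            lg = sum(cats[c][0] for c in left)
            lb = sum(cats[c][1] for c in left)
            rg = sum(cats[c][0] for c in names if c not in left)
            rb = sum(cats[c][1] for c in names if c not in left)
            g = node_gini(lg, lb) + node_gini(rg, rb)
            if best is None or g < best[0]:
                best = (g, var, left)

print(best)   # temperature, left = {mild}: split 1-2, Gini ≈ 0.373
```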
Measurement of impurity in regression model
• Use the split with the smallest p-value of the t statistic that tests the difference between the means of the two child nodes.
• Use the split with the smallest sum of the variances of the two child nodes.
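The variance-based criterion can be sketched as follows; the response values and the assumption that candidate splits are cut points along one sorted input are hypothetical illustrations:

```python
def split_variance(left_y, right_y):
    # Sum of within-child variances; smaller means purer children.
    def var(ys):
        m = sum(ys) / len(ys)
        return sum((y - m) ** 2 for y in ys) / len(ys)
    return var(left_y) + var(right_y)

# Hypothetical responses, already sorted by some input variable x.
y = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]

# Score every cut point and keep the one with the smallest total variance.
scores = [(i, split_variance(y[:i], y[i:])) for i in range(1, len(y))]
best_i = min(scores, key=lambda s: s[1])[0]
print(best_i)   # 3: the cut separating the low and high responses
```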
Remark on impurity
• Impurity is defined for each node.
• For a given node, the split rule is selected to minimize the sum of the impurities of the child nodes. This maximizes the difference between the impurity of the parent node and the sum of the impurities of the child nodes.
• When choosing the optimal split among several nodes, find the split that maximizes the difference in impurity between the parent node and its child nodes, not the one that merely minimizes the sum of the impurities of the child nodes.
Remark on multi-way split
• So far we have considered binary splits.
• That is, each node can have at most two children.
• We can also consider multi-way splits, where a node can have more than two children.
• SAS E-Miner offers such an option.
• However, multi-way splits are known to be inferior to binary splits.
• This is partly because a multi-way split is too greedy.
• Also, note that any multi-way split can be represented by several binary splits.
Stopping rules
• A stopping rule terminates further splitting.
• For example:
– All observations in a node belong to a single group.
– The number of observations in a node is small.
– The decrease in impurity is small.
– The depth of a node exceeds a given limit.
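The four example conditions can be combined into one check. The function name and the threshold values below are hypothetical defaults, not standard settings:

```python
def should_stop(node_labels, impurity_decrease, depth,
                min_samples=5, min_decrease=1e-3, max_depth=10):
    """Stop splitting if any of the four slide conditions holds."""
    return (len(set(node_labels)) == 1           # node is pure
            or len(node_labels) < min_samples    # too few observations
            or impurity_decrease < min_decrease  # split barely helps
            or depth >= max_depth)               # node is too deep

print(should_stop([1, 1, 1, 1, 1, 1], 0.05, 3))   # True: node is pure
print(should_stop([0, 1, 0, 1, 0, 1], 0.05, 3))   # False: keep splitting
```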
Pruning
• A tree with too many nodes will have a large prediction error rate for new observations.
• It is therefore appropriate to prune away some branches of the tree to achieve a good prediction error rate.
• To determine the size of the tree, we estimate the prediction error using a validation set or cross-validation.
Pruning Process
• For a given tree T and a positive number a, the cost-complexity criterion is defined by
cost-complexity(a) = error rate of T + a|T|
where |T| is the number of nodes of T.
• In general, the larger the tree (the larger |T|), the smaller the error rate. But the cost-complexity does not necessarily decrease as |T| increases.
• For the grown tree Tm, T(a) is the subtree that minimizes cost-complexity(a).
• In general, the larger a is, the smaller |T(a)| is.
Pruning Process
• One important property of T(a) is that T(a) is a subtree of T(b) whenever a > b.
• This significantly reduces the computation needed to search for T(a).
• See Breiman et al. (1984) for the proof.
• For a given a, we estimate the generalization error of T(a) by cross-validation.
• Choose a∗ (and the corresponding T(a∗)) which minimizes the estimated generalization error.
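The selection of T(a) can be illustrated on a toy table of nested subtrees. The (|T|, error rate) pairs below are made up for illustration; the point is only that a larger penalty a selects a smaller subtree:

```python
# Hypothetical nested subtrees: (size |T|, training error rate).
subtrees = [(1, 0.40), (3, 0.25), (5, 0.18), (9, 0.15), (15, 0.14)]

def best_subtree(a):
    # T(a) minimizes cost-complexity(a) = error + a * |T|.
    return min(subtrees, key=lambda t: t[1] + a * t[0])

# Larger a penalizes size more, so |T(a)| shrinks as a grows.
for a in (0.0, 0.01, 0.05):
    print(a, best_subtree(a))
```

In practice a∗ is then chosen by estimating the generalization error of each T(a) on a validation set or by cross-validation, as described above.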
3. Some Algorithms for Decision Tree
CART
• Classification And Regression Trees.
• Breiman et al. (1984).
• A product of machine learning research.
• One of the most popular decision tree algorithms.
• Uses the Gini index or entropy as the impurity measure.
• Cost-complexity pruning is an important and distinctive feature.
• Can consider split rules based on linear combinations of variables.
• Missing data can be handled using surrogate variables.
C4.5
• Developed by J. Ross Quinlan.
• The early version: ID3 (Iterative Dichotomizer 3), 1986.
• Multi-way splits are available.
• A categorical input variable splits a node into as many children as the variable has categories.
• C4.5 employs entropy as the impurity measure.
• C4.5 uses a test data set for the pruning process.
C4.5
• There is a program that generates rules from the tree.
• Example
– Watch a game, home team wins , drink beer.
– Watch a game, home team wins, drink soda.
– Watch a game, home team loses, drink beer.
– Watch a game, home team loses, drink milk.
• There is no relation between the home team winning or losing and drinking beer → Watch a game, drink beer.
CHAID
• Chi-squared Automatic Interaction Detection
• J. A. Hartigan, 1975.
• Successor of AID, described by J. N. Morgan and J. A. Sonquist (1963).
• CHAID has no pruning process; it stops growing at a certain size.
• Categorical input variables only.
• CHAID employs the χ2 statistic as the impurity measure.
4. Advantages and Disadvantages of Decision Tree
Advantages
• The tree generates simple, easy-to-understand rules.
• Classification is easy.
• Handles both categorical and continuous variables.
• Identifies the most significant variables.
• Robust to outliers in the input variables.
• A nonparametric model.
Disadvantages
• Poor prediction accuracy when the data follow a linear regression model with a continuous target variable.
• When the depth is large, not only accuracy but also interpretability suffers.
• Heavy computational cost.
• Unstable.
• Absence of linearity and main effects (all nodes represent high-order interactions).
5. Decision Tree as High-Dimensional Nonlinear Function Estimation
• We can write the decision tree as
f̂(x) = ∑t∈T̃ αtI(x ∈ Rt)
where T̃ is the set of terminal nodes, αt is the predicted value at node t, and Rt is the region of the input space corresponding to node t.
• Rt is given as Rt = {x : x1 ∈ A1, . . . , xp ∈ Ap}, where Ai is a subset of the domain of xi.
• In this view, a decision tree can be considered a local constant model.
• The art of the decision tree is to find Rt, t ∈ T̃.
• A global search for the optimal Rt is NP-complete.
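The local-constant view can be made concrete: a fitted tree is just a list of disjoint boxes Rt with constants αt, and prediction sums αt over the one box containing x. The regions and constants below are hypothetical:

```python
# Terminal-node regions R_t as axis-aligned boxes with constants alpha_t.
regions = [
    ({"x1": (0.0, 0.5), "x2": (0.0, 1.0)}, 1.0),    # (R_t, alpha_t)
    ({"x1": (0.5, 1.0), "x2": (0.0, 0.3)}, 2.5),
    ({"x1": (0.5, 1.0), "x2": (0.3, 1.0)}, -1.0),
]

def f_hat(x):
    # f_hat(x) = sum over terminal nodes t of alpha_t * I(x in R_t);
    # the regions partition the space, so exactly one indicator fires.
    return sum(a for box, a in regions
               if all(lo <= x[v] < hi for v, (lo, hi) in box.items()))

print(f_hat({"x1": 0.25, "x2": 0.9}))   # 1.0
print(f_hat({"x1": 0.7, "x2": 0.1}))    # 2.5
```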
• The decision tree finds Rt greedily, similarly to a forward selection procedure.
• At each node, we find a split rule that uses only one variable. Hence, we can say that the decision tree applies a univariate function estimation procedure (i.e., a local constant fit) repeatedly and greedily.
• Most high-dimensional nonlinear function estimation methods use this idea to overcome the curse of dimensionality.
• An important disadvantage of this approach (repeated univariate function estimation) is that the final model may be sub-optimal and can be unstable.