Business Intelligence Technologies – Data Mining: Lecture 4 – Classification, Decision Tree
TRANSCRIPT
Business Intelligence Technologies – Data Mining
Lecture 4 Classification, Decision Tree
1
Classification

Mapping instances onto a predefined set of classes. Examples:
- Classify each customer as “Responder” versus “Non-Responder”
- Classify cellular calls as “legitimate” versus “fraudulent”

| Age | Income | City     | Gender | Response |
|-----|--------|----------|--------|----------|
| 30  | 50K    | New York | M      | No       |
| 50  | 125K   | Tampa    | F      | Yes      |
| …   | …      | …        | …      | Yes      |
| …   | …      | …        | …      | No       |
| 28  | 35K    | Orlando  | M      | ???      |
| …   | …      | …        | …      | ???      |
| …   | …      | …        | …      | ???      |
2
Prediction

- Broad definition: build a model to estimate any type of value (predictive data mining)
- Narrow definition: estimate continuous values

Examples:
- Predict how much money a customer will spend
- Predict the value of a stock

| Age | Income | City     | Gender | Dollar Spent |
|-----|--------|----------|--------|--------------|
| 30  | 50K    | New York | M      | $150         |
| 50  | 125K   | Tampa    | F      | $400         |
| …   | …      | …        | …      | $0           |
| …   | …      | …        | …      | $230         |
| 28  | 35K    | Orlando  | M      | ???          |
| …   | …      | …        | …      | ???          |
| …   | …      | …        | …      | ???          |

3
Classification: Terminology

- Inputs = Predictors = Independent Variables
- Outputs = Responses = Dependent Variables
- Models = Classifiers
- Data points: examples, instances

With classification, we want to build a (classification) model to predict the outputs given the inputs.
4
Steps in Classification

- Step 1: Building the model, using Training Data (input & output). This produces a Model.
- Step 2: Validating the model, using Test Data (input & output). This produces a Validated Model.
- Step 3: Applying/using the validated model on New Data (input only) to produce the Output.
5
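The three steps above can be sketched end to end in plain Python. Everything here is an invented toy under our own assumptions (the rows and the deliberately simplistic one-threshold "model" are ours, not the lecture's):

```python
# Step-by-step classification workflow on toy data (invented example).
# Each row: (age, income_k, response); response is the class label.
training_data = [(30, 50, "No"), (50, 125, "Yes"), (45, 90, "Yes"), (25, 40, "No")]
test_data     = [(52, 110, "Yes"), (28, 35, "No")]
new_data      = [(28, 35), (60, 140)]          # input only: labels unknown

# Step 1: build the model from training data (input & output).
# Here the "model" is a single income threshold, chosen purely for illustration.
def build_model(rows):
    threshold = sum(income for _, income, _ in rows) / len(rows)
    return lambda age, income: "Yes" if income >= threshold else "No"

model = build_model(training_data)

# Step 2: validate the model on held-out test data (input & output).
correct = sum(model(a, i) == label for a, i, label in test_data)
accuracy = correct / len(test_data)

# Step 3: apply the validated model to new data (input only).
predictions = [model(a, i) for a, i in new_data]
```

The key discipline is that the test rows never influence Step 1; they are used only to check the model before it is trusted on new data.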
Common Classification Techniques

- Decision tree
- Logistic regression
- K-nearest neighbors
- Neural network
- Naïve Bayes
6
Decision Tree --- An Example

| Name | Balance | Age | Emp. | Default |
|------|---------|-----|------|---------|
| Mike | 23,000  | 30  | yes  | no      |
| Mary | 51,100  | 40  | yes  | no      |
| Bill | 48,000  | 40  | no   | no      |
| Jim  | 53,000  | 45  | no   | yes     |
| Dave | 65,000  | 60  | no   | no      |
| Anne | 30,000  | 35  | no   | no      |

The tree (root node: Balance; internal nodes test attributes; leaves assign a class):
- Balance < 50K: leaf, Class = Not Default
- Balance >= 50K: node on Age
  - Age > 45: leaf, Class = Not Default
  - Age <= 45: node on Employed
    - Employed = Yes: leaf, Class = Not Default
    - Employed = No: leaf, Class = Default
7
Decision Tree Representation

A series of nested tests:
- Each node represents a test on one attribute
  - Nominal attributes: each branch could represent one or more values
  - Numeric attributes are split into ranges, normally a binary split
- Leaves: a class assignment (e.g., Default / Not Default)

The same tree, with the class written as Yes/No for Default:
- Balance < 50K: Class = No
- Balance >= 50K, Age > 45: Class = No
- Balance >= 50K, Age <= 45, Employed = Yes: Class = No
- Balance >= 50K, Age <= 45, Employed = No: Class = Yes

8
The Use of a Decision Tree: Classifying New Instances

To determine the class of a new instance, e.g., Mark: age 40, unemployed, balance 88K:
- The instance is routed down the tree according to the values of its attributes.
- At each node a test is applied to one attribute.
- When a leaf is reached, the instance is assigned to a class.

Mark: Balance >= 50K, Age <= 45, Employed = No, so Class = Yes (Default).

9
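The routing described above is just a few nested tests. This sketch hard-codes the slide's example tree (the function name and the code structure are ours):

```python
# Route an instance down the example tree from the slide:
# Balance, then Age, then Employed; leaves return the class.
def classify(balance, age, employed):
    if balance < 50_000:
        return "Not Default"          # Balance < 50K leaf
    if age > 45:
        return "Not Default"          # Balance >= 50K, Age > 45 leaf
    return "Not Default" if employed else "Default"   # Employed test

# Mark: age 40, unemployed, balance 88K is routed to the Default leaf.
print(classify(88_000, 40, employed=False))   # -> Default
```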
Goal of Decision Tree Construction

Partition the training instances into purer sub-groups.
- Pure: the instances in a sub-group mostly belong to the same class

(Figure: the entire population is partitioned by Balance < 50K vs. Balance >= 50K, and then by Age < 45 vs. Age >= 45, separating the “Default” instances from the “Not default” instances.)

How to build a tree: how to split instances into purer sub-groups.
10
Why do we want to identify pure sub-groups?

To classify a new instance, we can determine the leaf that the instance belongs to based on its attributes.

If the leaf is very pure (e.g. all have defaulted), we can determine with greater confidence that the new instance belongs to this class (i.e., the “Default” class).

If the leaf is not very pure (e.g. a 50%/50% mixture of the two classes, Default and Not Default), our prediction for the new instance is more like a random guess.
11
Decision Tree Construction

A tree is constructed by recursively partitioning the examples.

With each partition the examples are split into increasingly purer sub-groups.

The key in building a tree: how to split.
12
Building a Tree - Choosing a Split

| ApplicantID | City   | Children | Income | Status   |
|-------------|--------|----------|--------|----------|
| 1           | Philly | Many     | Medium | DEFAULTS |
| 2           | Philly | Many     | Low    | DEFAULTS |
| 3           | Philly | Few      | Medium | PAYS     |
| 4           | Philly | Few      | High   | PAYS     |

Try a split on the Children attribute: Many gives {1, 2} (both DEFAULTS); Few gives {3, 4} (both PAYS).

Try a split on the Income attribute: Low gives {2} (DEFAULTS); Medium gives {1, 3} (mixed); High gives {4} (PAYS).

Notice how the split on the Children attribute gives purer partitions. It is therefore chosen as the first split (and in this case the only split, because the two sub-groups are 100% pure).
13
Recursive Steps in Building a Tree: Example

STEP 1: Split Option A. Not good, as the sub-nodes are still very heterogeneous!

STEP 1: Split Option B. Better, as the purity of the sub-nodes is improving.

STEP 2: Choose Split Option B, as it is the better split.

STEP 3: Try out splits on each of the sub-nodes of Split Option B. Eventually, we arrive at the final tree.

Notice how examples in a parent node are split between sub-nodes, i.e. notice how the training examples are partitioned into smaller and smaller subsets. Also, notice that sub-nodes are purer than parent nodes.

14
Recursive Steps in Building a Tree

STEP 1: Try using different attributes to split the training examples into different subsets.

STEP 2: Rank the splits. Choose the best split.

STEP 3: For each node obtained by splitting, repeat from STEP 1, until no more good splits are possible.

Note: Usually it is not possible to create leaves that are completely pure, i.e. contain one class only, as that would result in a very bushy tree which is not sufficiently general. However, it is possible to create leaves that are purer, i.e. contain predominantly one class, and we can settle for that.
15
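The three recursive steps can be sketched as a minimal ID3-style builder. This is our own illustrative code, assuming categorical attributes and the entropy-based information gain discussed in this lecture; the sample rows reuse the Philly applicants example:

```python
# Minimal recursive tree builder (ID3-style sketch, our own helper names).
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, attr):
    labels = [r["class"] for r in rows]
    kids = 0.0
    for v in set(r[attr] for r in rows):
        sub = [r["class"] for r in rows if r[attr] == v]
        kids += len(sub) / len(rows) * entropy(sub)      # weighted child impurity
    return entropy(labels) - kids

def build(rows, attrs):
    labels = [r["class"] for r in rows]
    if len(set(labels)) == 1 or not attrs:               # stop: pure node / no attributes
        return Counter(labels).most_common(1)[0][0]      # leaf: majority class
    best = max(attrs, key=lambda a: gain(rows, a))       # STEPs 1-2: rank splits, pick best
    if gain(rows, best) <= 0:                            # stop: no split improves purity
        return Counter(labels).most_common(1)[0][0]
    rest = [a for a in attrs if a != best]
    return {best: {v: build([r for r in rows if r[best] == v], rest)  # STEP 3: recurse
                   for v in set(r[best] for r in rows)}}

rows = [{"children": "Many", "income": "Medium", "class": "DEFAULTS"},
        {"children": "Many", "income": "Low",    "class": "DEFAULTS"},
        {"children": "Few",  "income": "Medium", "class": "PAYS"},
        {"children": "Few",  "income": "High",   "class": "PAYS"}]
tree = build(rows, ["children", "income"])
```

On these four rows the builder picks Children as the first (and only) split, matching the worked example on the earlier slide.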
Purity Measures

Many purity measures are available:
- Gini (population diversity)
- Entropy (information gain)
- Information gain ratio
- Chi-square test

The most common one (from information theory) is information gain. Informally: how informative is the attribute in distinguishing among instances (e.g., credit applicants) from different classes (Yes/No default)?
16
Information Gain

Consider the two following splits. Which one is more informative?
- A split over whether Balance exceeds 50K (branches: less or equal 50K vs. over 50K)
- A split over whether the applicant is employed (branches: Employed vs. Unemployed)
17
Information Gain

Impurity/entropy measures the level of impurity/chaos in a group of examples. Information gain is defined as the decrease in impurity achieved by a split, with the split generating purer sub-groups.
18
Impurity

(Figure: three groups of examples, from a very impure group, to a less impure group, to a group with minimum impurity.)

When examples can belong to one of two classes: what is the worst case of impurity?
19
Calculating Impurity

Impurity (entropy) = -Σ_i p_i log2(p_i)

where p_i is the proportion of examples in class i.

Example: a group of 30 examples, 14 in one class and 16 in the other:

Impurity = -(14/30)log2(14/30) - (16/30)log2(16/30) = 0.997
20
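The impurity formula can be checked with a short helper. A sketch using Python's `math.log2`; the 14/16 example reproduces the slide's 0.997:

```python
from math import log2

def impurity(counts):
    """Entropy: -sum_i p_i * log2(p_i), with 0 * log2(0) taken as 0."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

print(round(impurity([14, 16]), 3))   # -> 0.997
```

The `if c > 0` guard implements the convention that 0·log2(0) = 0.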
2-class Cases

What is the impurity of a group in which all examples belong to the same class?
Impurity = -1 log2(1) - 0 log2(0) = 0 (lowest possible value: minimum impurity)

What is the impurity of a group with 50% in either class?
Impurity = -0.5 log2(0.5) - 0.5 log2(0.5) = 1 (highest possible value: maximum impurity)

Note: 0 log2(0) is defined as 0.

21
Calculating Information Gain

Information Gain = Impurity (parent) – Impurity (children)

Entire population (30 instances):
impurity = -(14/30)log2(14/30) - (16/30)log2(16/30) = 0.997

Split on Balance:
- Balance < 50K (17 instances): impurity = -(13/17)log2(13/17) - (4/17)log2(4/17) = 0.787
- Balance >= 50K (13 instances): impurity = -(12/13)log2(12/13) - (1/13)log2(1/13) = 0.391

(Weighted) Average Impurity of Children = (17/30) * 0.787 + (13/30) * 0.391 = 0.615

Information Gain = 0.997 - 0.615 = 0.382
22
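The calculation above can be verified numerically. A sketch (note the slide rounds its intermediate values; computed exactly, the weighted child impurity is about 0.6156 and the gain about 0.3812):

```python
from math import log2

def impurity(counts):
    """Entropy of a group given per-class counts."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

parent = impurity([14, 16])                 # entire population: ~0.997
low    = impurity([13, 4])                  # Balance < 50K, 17 instances: ~0.787
high   = impurity([12, 1])                  # Balance >= 50K, 13 instances: ~0.391
children = 17 / 30 * low + 13 / 30 * high   # weighted average: ~0.615
gain = parent - children                    # information gain: ~0.382
```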
Information Gain of a Split

The weighted average of the impurity of the children nodes after a split is compared with the impurity of the parent node to see how much information is gained through the split.

A child node with fewer examples in it has a lower weight.
23
Information Gain

Information Gain = Impurity (parent) – Impurity (children)

Entire population (node A): Impurity(A) = 0.997

Split A on Balance into B (Balance < 50K) and C (Balance >= 50K):
- Impurity(B) = 0.787, Impurity(C) = 0.391
- Weighted Impurity(B, C) = 0.615, so Gain = 0.997 - 0.615 = 0.382

Split B on Age into D (Age >= 45) and E (Age < 45):
- Impurity(D) = -0 log2(0) - 1 log2(1) = 0
- Impurity(E) = -(3/7)log2(3/7) - (4/7)log2(4/7) = 0.985
- Weighted Impurity(D, E) = 0.406, so Gain = 0.787 - 0.406 = 0.381
24
Which attribute to split over?

At each node, examine splits over each of the attributes.

Select the attribute for which the maximum information gain is obtained.
- For a continuous attribute, we also need to consider different ways of splitting (>50 or <=50; >60 or <=60).
- For a categorical attribute with lots of possible values, sometimes we also need to consider how to group these values (e.g. branch 1 corresponds to {A,B,E} and branch 2 corresponds to {C,D,F,G}).
25
Calculating the Information Gain of a Split

1. For each sub-group produced by the split, calculate the impurity/entropy of that sub-group.
2. Calculate the weighted impurity of the split by weighting each sub-group’s impurity by the proportion of training examples (out of the training examples in the parent node) that are in that sub-group.
3. Calculate the impurity of the parent node, and subtract the weighted impurity of the child nodes to obtain the information gain for the split.

Note: If impurity increases with the split, then the tree is getting worse, so we wouldn’t want to make the split! The information gain of a split therefore needs to be positive in order to choose to split.

26
| Person | Hair Length | Weight | Age | Class |
|--------|-------------|--------|-----|-------|
| Homer  | 0”          | 250    | 36  | M     |
| Marge  | 10”         | 150    | 34  | F     |
| Bart   | 2”          | 90     | 10  | M     |
| Lisa   | 6”          | 78     | 8   | F     |
| Maggie | 4”          | 20     | 1   | F     |
| Abe    | 1”          | 170    | 70  | M     |
| Selma  | 8”          | 160    | 41  | F     |
| Otto   | 10”         | 180    | 38  | M     |
| Krusty | 6”          | 200    | 45  | M     |
| Comic  | 8”          | 290    | 38  | ?     |

27
Let us try splitting on Hair Length

Split: Hair Length <= 5? (yes / no)

Parent: Entropy(4F, 5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911

yes branch (4 examples): Entropy(1F, 3M) = -(1/4)log2(1/4) - (3/4)log2(3/4) = 0.8113
no branch (5 examples): Entropy(3F, 2M) = -(3/5)log2(3/5) - (2/5)log2(2/5) = 0.9710

Gain = Entropy of parent – weighted average of the entropies of the children
Gain(Hair Length <= 5) = 0.9911 – (4/9 * 0.8113 + 5/9 * 0.9710) = 0.0911
28
Let us try splitting on Weight

Split: Weight <= 160? (yes / no)

Parent: Entropy(4F, 5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911

yes branch (5 examples): Entropy(4F, 1M) = -(4/5)log2(4/5) - (1/5)log2(1/5) = 0.7219
no branch (4 examples): Entropy(0F, 4M) = -(0/4)log2(0/4) - (4/4)log2(4/4) = 0

Gain(Weight <= 160) = 0.9911 – (5/9 * 0.7219 + 4/9 * 0) = 0.5900
29
Let us try splitting on Age

Split: Age <= 40? (yes / no)

Parent: Entropy(4F, 5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911

yes branch (6 examples): Entropy(3F, 3M) = -(3/6)log2(3/6) - (3/6)log2(3/6) = 1
no branch (3 examples): Entropy(1F, 2M) = -(1/3)log2(1/3) - (2/3)log2(2/3) = 0.9183

Gain(Age <= 40) = 0.9911 – (6/9 * 1 + 3/9 * 0.9183) = 0.0183
30
Aha, I got a tree!

Weight <= 160? no: Male. yes: Hair Length <= 2? yes: Male; no: Female.

Of the 3 features we had, Weight was the best. But while people who weigh over 160 are perfectly classified (as males), the under-160 people are not perfectly classified... So we simply continue splitting! This time we find that we can split on Hair Length, and then we are done.

Note: the splitting decision is not only choosing the attribute, but also choosing the value to split on for a continuous attribute (e.g. Hair <= 5 or Hair <= 2).

31
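The three candidate splits can be scored with one small gain helper, reproducing the slides' numbers and confirming that Weight gives the best first split. A sketch using the same (female, male) counts read off the slides:

```python
from math import log2

def entropy(counts):
    """Entropy of a node given per-class counts, with empty classes skipped."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

def gain(parent_counts, child_counts_list):
    """Parent entropy minus the weighted average entropy of the children."""
    n = sum(parent_counts)
    weighted = sum(sum(c) / n * entropy(c) for c in child_counts_list)
    return entropy(parent_counts) - weighted

# (female, male) counts per branch, from the three candidate splits:
hair   = gain((4, 5), [(1, 3), (3, 2)])   # Hair Length <= 5 -> ~0.0911
weight = gain((4, 5), [(4, 1), (0, 4)])   # Weight <= 160    -> ~0.5900
age    = gain((4, 5), [(3, 3), (1, 2)])   # Age <= 40        -> ~0.0183

best = max([("Hair", hair), ("Weight", weight), ("Age", age)], key=lambda t: t[1])
```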
Building a Tree - Stopping Criteria

You can stop building the tree when:
- The impurity of all nodes is zero. Problem: this tends to lead to bushy, highly-branching trees, often with one example at each node.
- No split achieves a significant gain in purity (information gain not high enough).
- Node size is too small: that is, there are fewer than a certain number of examples, or proportion of the training set, at each node.
32
Reading Rules off the Decision Tree

For each leaf in the tree, read the rule from the root to that leaf. You will arrive at a set of rules.

(Tree: Income at the root. Income = Low leads to a Debts node; Income = High leads to a Gender node; Gender = Male leads to a Children node.)

- IF Income=Low AND Debts=Low THEN Non-Responder
- IF Income=Low AND Debts=High THEN Responder
- IF Income=High AND Gender=Female THEN Non-Responder
- IF Income=High AND Gender=Male AND Children=Many THEN Responder
- IF Income=High AND Gender=Male AND Children=Few THEN Non-Responder
33
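Reading rules off a tree mechanizes nicely as a root-to-leaf walk. The nested-dict encoding below is a hypothetical representation we chose for illustration; the rules it emits match the slide's five:

```python
# Tree as nested dicts: {attribute: {branch_value: subtree_or_leaf}}.
# Leaves are class-label strings.
tree = {"Income": {"Low":  {"Debts": {"Low": "Non-Responder",
                                      "High": "Responder"}},
                   "High": {"Gender": {"Female": "Non-Responder",
                                       "Male": {"Children": {"Many": "Responder",
                                                             "Few": "Non-Responder"}}}}}}

def rules(node, path=()):
    if isinstance(node, str):                          # leaf: emit one root-to-leaf rule
        cond = " AND ".join(f"{a}={v}" for a, v in path)
        return [f"IF {cond} THEN {node}"]
    (attr, branches), = node.items()                   # exactly one test per node
    out = []
    for value, child in branches.items():
        out.extend(rules(child, path + ((attr, value),)))
    return out

for r in rules(tree):
    print(r)
```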
Estimate Accuracy of the Model

- The known label of each test sample is compared with the classification produced by the model.
- The accuracy rate is the percentage of test set samples that are correctly classified by the model.
- If the accuracy on testing data is acceptable, we can use the model to classify instances whose class labels are not known.
34
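The accuracy estimate described above is a one-line ratio. A sketch with an invented toy model and test set (ours, not the lecture's):

```python
# Accuracy on a labelled test set: fraction of correctly classified samples.
def accuracy(model, test_rows):
    correct = sum(model(x) == label for x, label in test_rows)
    return correct / len(test_rows)

# Toy check with a trivial threshold model (invented data):
model = lambda balance: "Default" if balance >= 50_000 else "Not Default"
test_rows = [(88_000, "Default"), (23_000, "Not Default"), (65_000, "Not Default")]
print(accuracy(model, test_rows))   # 2 of 3 correct
```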
There are many possible splitting rules that perfectly classify the training data, but will not generalize to future datasets.

Training data:

| Name | Hair  | Glasses | Class  |
|------|-------|---------|--------|
| Mary | Long  | No      | Female |
| Mike | Short | No      | Male   |
| Bill | Short | No      | Male   |
| Jane | Long  | No      | Female |
| Ann  | Short | Yes     | Female |

Tree 1: split on Glasses (Yes: Female; No: split on Hair, Short: Male, Long: Female).
Tree 2: split on Hair only (Short: Male; Long: Female).

Testing data and predictions:

| Hair  | Glasses | Tree 1 | Tree 2 | TRUE   |
|-------|---------|--------|--------|--------|
| Short | Yes     | Female | Male   | Male   |
| Short | No      | Male   | Male   | Female |
| Long  | No      | Female | Female | Female |
| Short | Yes     | Female | Male   | Male   |

Test error: Tree 1: 75%, Tree 2: 25%.
35
Overfitting & Underfitting

(Figure: error rate vs. tree size; the testing-error curve rises again for overly large trees, with underfitting on the left and overfitting on the right.)

- Notice how the error rate on the testing data increases for overly large trees.
- Overfitting: the model performs poorly on new examples (e.g. testing examples) because it is too highly tuned to the specific training examples (it picks up noise along with the patterns).
- Underfitting: the model performs poorly on new examples because it is too simplistic to distinguish between them (i.e. it has not picked up the important patterns from the training examples).
36
Pruning

A decision tree is typically more accurate on its training data than on its test data. Removing branches from a tree can often improve its accuracy on a test set, so-called ‘reduced error pruning’. The intention of this pruning is to cut off branches from the tree when doing so improves performance on test data; this reduces overfitting and makes the tree more general.

Small is beautiful.
37
Decision Tree Classification in a Nutshell

Decision tree: a tree structure
- An internal node denotes a test on an attribute
- A branch represents an outcome of the test
- Leaf nodes represent class labels or a class distribution

Decision tree generation consists of two phases:
- Tree construction: at the start, all the training examples are at the root; partition the examples recursively based on selected attributes.
- Tree pruning: identify and remove branches that reflect noise or outliers, to avoid overfitting.

Use of a decision tree: classify an unknown sample by testing the attribute values of the sample against the decision tree.
38
Strengths & Weaknesses

In practice, one of the most popular methods. Why?
- Very comprehensible: the tree structure specifies the entire decision structure
- Easy for decision makers to understand the model’s rationale
- Maps nicely to a set of business rules
- Relatively easy to implement
- Very fast to run (to classify examples) with large data sets
- Good at handling missing values: just treat “missing” as a value; it can become a good predictor

Weaknesses:
- Bad at handling continuous data; good at categorical input and output
- Continuous output: high error rate
- Continuous input: ranges may introduce bias
39
Decision Tree Variations: Regression Trees

The leaves of a regression tree predict the average value for examples that reach that node.

(Example tree: Age at the root, with branches Age < 18, Age 18-65, and Age > 65, then splits on Trust Fund (Yes/No), Professional (Yes/No), and Retirement Fund (Yes/No). Each leaf reports an average income and a standard deviation: $10,000 p.a. / $1,000; $100 p.a. / $10; $100,000 p.a. / $5,000; $30,000 p.a. / $2,000; $20,000 p.a. / $800; $1,000 p.a. / $150.)
40
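A regression-tree leaf can be summarized by the mean (and standard deviation) of the target values of the training examples reaching it. A sketch using Python's statistics module; the income values are invented:

```python
# A regression-tree leaf predicts the average target value of the training
# examples that reach it; the standard deviation is reported alongside.
from statistics import mean, pstdev

def leaf_summary(incomes):
    return {"average": mean(incomes), "std": pstdev(incomes)}

# Toy leaf: three training examples landed here (invented incomes).
leaf = leaf_summary([9_000, 10_000, 11_000])
print(leaf)
```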
Decision Tree Variations: Model Trees

The leaves of a model tree specify a function that can be used to predict the value of examples that reach that node.

(Example tree with the same structure as the regression tree: Age at the root, then Trust Fund, Professional, and Retirement Fund. The leaf functions include: Income Per Annum = Number Of Trust Funds * $5,000; Income Per Annum = Age * $50; Income Per Annum = $80,000 + (Age * $2,000); Income Per Annum = $20,000 + (Age * $1,000); Income Per Annum = FundValue * 10%; Income Per Annum = $100 * Number of Children.)
41
Classification Tree vs. Regression Tree

Classification tree:
- Categorical output
- The leaves of the tree assign a class label or a probability of being in a class
- Goal: achieve nodes such that one class is predominant at each node

Regression tree:
- Numeric output
- The leaves of the tree assign an average value (regression trees) or specify a function that can be used to compute a value (model trees) for examples that reach that node
- Goal: achieve nodes such that the means between nodes vary as much as possible and the standard deviation or variance within each node is as low as possible
42
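The regression-tree goal stated above (means between nodes vary; variance within each node is low) is commonly scored as variance reduction, by analogy with information gain. A sketch with invented numbers:

```python
# Score a regression-tree split by variance reduction: parent variance minus
# the weighted average variance of the children (lower child variance = better).
from statistics import pvariance

def variance_reduction(parent_vals, children_vals):
    n = len(parent_vals)
    weighted = sum(len(c) / n * pvariance(c) for c in children_vals)
    return pvariance(parent_vals) - weighted

parent = [10, 12, 50, 52]          # invented target values at the parent node
good   = [[10, 12], [50, 52]]      # child means differ a lot; tight groups
bad    = [[10, 50], [12, 52]]      # mixed groups; high within-node variance
print(variance_reduction(parent, good), variance_reduction(parent, bad))
```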
Different Decision Tree Algorithms

- ID3, ID4, ID5, C4.0, C4.5, C5.0, ACLS, and ASSISTANT: use information gain as the splitting criterion
- CART (Classification And Regression Trees): uses the Gini diversity index as the measure of impurity when deciding splits
- CHAID: a statistical approach that uses the Chi-squared test when deciding on the best split
- Hunt’s Concept Learning System (CLS), and MINIMAX: minimizes the cost of classifying examples correctly or incorrectly

43
10-fold Cross Validation

Break the data into 10 sets of size n/10. Train on 9 of the sets and test on the remaining 1. Repeat 10 times and take the mean accuracy.
44
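The 10-fold procedure can be sketched as index bookkeeping; the round-robin fold assignment below is our choice for illustration, not prescribed by the slide:

```python
# 10-fold cross-validation splits: partition n example indices into 10 folds,
# then for each fold use it as the test set and the other 9 as training data.
def k_fold_indices(n, k=10):
    folds = [list(range(i, n, k)) for i in range(k)]   # round-robin assignment
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, test))
    return splits

splits = k_fold_indices(100)
# Train a model on each `train`, measure accuracy on each `test`,
# then average the 10 accuracies.
```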
SAS: % Captured Response

45
SAS: Lift Value

In SAS, %Response = the percentage of responses in the top n% of ranked individuals. It should be relatively high in the top deciles, and a decreasing plotted curve indicates a good model. The lift chart captures the same information on a different scale.

46
Case Discussion

Fleet:
1. How many input variables are used to build the tree? How many show up in the tree built? Why?
2. How can the tree built be used for segmentation?
3. How can the new campaign results help enhance the tree?

HIV:
1. How does the pruning process work?
2. How does the decision tree pick up the interactions among variables?
47
Exercise – Decision Tree

Which attribute to split on first?

| Customer ID | Student | Credit Rating | Class: Buy PDA |
|-------------|---------|---------------|----------------|
| 1           | No      | Fair          | No             |
| 2           | No      | Excellent     | No             |
| 3           | No      | Fair          | Yes            |
| 4           | No      | Fair          | Yes            |
| 5           | Yes     | Fair          | Yes            |
| 6           | Yes     | Excellent     | No             |
| 7           | Yes     | Excellent     | Yes            |
| 8           | No      | Excellent     | No             |

Useful values: log2(2/3) = -0.585, log2(1/3) = -1.585, log2(1/2) = -1, log2(3/5) = -0.737, log2(2/5) = -1.322, log2(1/4) = -2, log2(3/4) = -0.415
48