Data Mining Models and Evaluation Techniques
DESCRIPTION
Seminar presentation on Data Mining Models and Evaluation Techniques
TRANSCRIPT
![Page 1: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/1.jpg)
Data Mining Models & Evaluation Techniques
Created By – Shubham Pachori
Mentored By – Prof. K. P. Agarwal
![Page 2: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/2.jpg)
- Motivation
- Metrics for Classifier Evaluation
- Methods for Classifier Evaluation
- Comparing the Performance of Two Classifiers
- Costs in Classification
- Ensemble Methods to Improve Accuracy

Overview
![Page 3: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/3.jpg)
It is important to evaluate a classifier's generalization performance in order to:
- Determine whether to employ the classifier (for example, when learning the effectiveness of medical treatments from limited-size data, it is important to estimate the accuracy of the classifiers);
- Optimize the classifier (for example, when post-pruning decision trees we must evaluate the accuracy of the decision tree at each pruning step).
Motivation
![Page 4: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/4.jpg)
[Figure: the KDD process pipeline: Selection → Preprocessing & cleaning → Transformation & feature selection → Data Mining → Interpretation/Evaluation, taking raw data through target data, processed data, transformed data, and patterns to knowledge.]
Model’s Evaluation in the KDD Process
![Page 5: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/5.jpg)
Metrics for Classifier’s Evaluation
| Actual \ Predicted | Pos | Neg | Total |
| --- | --- | --- | --- |
| Pos | TP | FN | P |
| Neg | FP | TN | N |

- Accuracy = (TP + TN) / (P + N)
- Error = (FP + FN) / (P + N)
- Precision = TP / (TP + FP)
- Recall / TP rate = TP / P
- FP rate = FP / N
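As a sketch, the five metrics above can be computed directly from the four confusion-matrix counts (the counts used here are illustrative, not from the slides):

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute the slide's five metrics from confusion-matrix counts."""
    p = tp + fn                      # actual positives
    n = fp + tn                      # actual negatives
    return {
        "accuracy":  (tp + tn) / (p + n),
        "error":     (fp + fn) / (p + n),
        "precision": tp / (tp + fp),
        "recall":    tp / p,         # TP rate
        "fp_rate":   fp / n,
    }

# Illustrative counts.
m = classification_metrics(tp=60, fn=40, fp=20, tn=80)
print(m["accuracy"], m["precision"])  # 0.7 0.75
```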
![Page 6: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/6.jpg)
We can use:
- Training data;
- Independent test data;
- Hold-out method;
- k-fold cross-validation method;
- Leave-one-out method;
- Bootstrap method;
- Ensemble methods.

How to Estimate the Metrics?
![Page 7: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/7.jpg)
The accuracy/error estimates on the training data are not good indicators of performance on future data.
Q: Why? A: Because new data will probably not be exactly the same as the training data!
The accuracy/error estimates on the training data measure the degree of the classifier's overfitting.
Estimation with Training Data
[Diagram: the classifier is built on the training set and evaluated on that same training set.]
![Page 8: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/8.jpg)
Estimation with independent test data is used when we have plenty of data and there is a natural way of forming training and test data.
For example: Quinlan in 1987 reported experiments in a medical domain for which the classifiers were trained on data from 1985 and tested on data from 1986.
Estimation with Independent Test Data
[Diagram: the classifier is built on the training set and evaluated on an independent test set.]
![Page 9: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/9.jpg)
The hold-out method splits the data into training data and test data (usually 2/3 for training, 1/3 for testing). We then build a classifier with the training data and evaluate it on the test data.
The hold-out method is usually used when we have thousands of instances, including several hundred instances from each class.
Hold-out Method
[Diagram: the data is split into a training set used to build the classifier and a test set used to evaluate it.]
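A minimal hold-out sketch; the 2/3–1/3 ratio matches the slide, while the seed is an arbitrary choice:

```python
import random

def holdout_split(data, test_fraction=1/3, seed=42):
    """Shuffle a copy of the data, then split it into train and test parts."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

train, test = holdout_split(range(30))
print(len(train), len(test))  # 20 10
```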
![Page 10: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/10.jpg)
Classification: Train, Validation, Test Split
[Diagram: data with known results is split into a training set and a validation set; a model builder produces classifiers from the training set, these are evaluated on the validation set, and the selected classifier is scored once on a held-out final test set for the final evaluation.]
The test data can’t be used for parameter tuning!
![Page 11: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/11.jpg)
Once evaluation is complete, all the data can be used to build the final classifier.
Generally, the larger the training set, the better the classifier (though returns diminish).
The larger the test set, the more accurate the error estimate.
Making the Most of the Data
![Page 12: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/12.jpg)
The holdout method reserves a certain amount of data for testing and uses the remainder for training. Usually: one third for testing, the rest for training.
For "unbalanced" datasets, random samples might not be representative: there may be few or no instances of some classes, as in fraudulent-transaction detection or medical diagnostic tests.
Stratified sample: a more advanced way of balancing the data. Make sure that each class is represented with approximately equal proportions in both subsets.
Stratification
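A stratified split can be sketched by splitting each class separately and then pooling the pieces (the class labels and fractions below are made up for illustration):

```python
import random
from collections import defaultdict

def stratified_holdout(labels, test_fraction=1/3, seed=0):
    """Return (train_idx, test_idx) keeping class proportions in both subsets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)                        # split each class on its own
        n_test = round(len(idxs) * test_fraction)
        test_idx += idxs[:n_test]
        train_idx += idxs[n_test:]
    return train_idx, test_idx

# An unbalanced toy dataset: 6 positives, 12 negatives.
labels = ["pos"] * 6 + ["neg"] * 12
train_idx, test_idx = stratified_holdout(labels)
print(len(train_idx), len(test_idx))  # 12 6
```

Both subsets keep the overall 1:2 class ratio, which a plain random split would only hit on average.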
![Page 13: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/13.jpg)
The holdout estimate can be made more reliable by repeating the process with different subsamples: in each iteration, a certain proportion is randomly selected for training (possibly with stratification).
The error rates from the different iterations are averaged to yield an overall error rate.
This is called the repeated holdout method.
Repeated Holdout Method
![Page 14: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/14.jpg)
Still not optimal: the different test sets overlap, but we would like every instance in the data to be tested at least once.
Can we prevent overlapping?
Repeated Holdout Method, 2
Witten & Eibe
![Page 15: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/15.jpg)
k-fold cross-validation avoids overlapping test sets:
- First step: the data is split into k subsets of equal size;
- Second step: each subset in turn is used for testing and the remainder for training.
The subsets are stratified before the cross-validation. The estimates are averaged to yield an overall estimate.
k-Fold Cross-Validation
[Diagram: the data is split into three folds; each row is one cross-validation iteration:
train | train | test
train | test | train
test | train | train]
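The fold bookkeeping can be sketched as an index generator (stratification is omitted here for brevity):

```python
def kfold_indices(n, k):
    """Yield (train, test) index lists; each instance is tested exactly once."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size

for train, test in kfold_indices(9, 3):
    print(test)  # [0, 1, 2] then [3, 4, 5] then [6, 7, 8]
```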
![Page 16: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/16.jpg)
Standard method for evaluation: stratified 10-fold cross-validation.
Why 10? Extensive experiments have shown that this is the best choice to get an accurate estimate.
Stratification reduces the estimate’s variance. Even better: repeated stratified cross-validation:
E.g. ten-fold cross-validation is repeated ten times and results are averaged (reduces the variance).
More on Cross-Validation
![Page 17: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/17.jpg)
Leave-One-Out is a particular form of cross-validation:
- Set the number of folds to the number of training instances, i.e., for n training instances, build the classifier n times.
- Makes best use of the data.
- Involves no random sub-sampling.
- Very computationally expensive.
Leave-One-Out Cross-Validation
![Page 18: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/18.jpg)
A disadvantage of Leave-One-Out CV is that stratification is not possible: it guarantees a non-stratified sample, because there is only one instance in the test set!
Extreme example: a random dataset split equally into two classes. The best inducer predicts the majority class, giving 50% accuracy on fresh data, yet the Leave-One-Out CV estimate is 100% error!
Leave-One-Out Cross-Validation and Stratification
![Page 19: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/19.jpg)
Cross-validation uses sampling without replacement: the same instance, once selected, cannot be selected again for a particular training/test set.
The bootstrap uses sampling with replacement to form the training set:
- Sample a dataset of n instances n times with replacement to form a new dataset of n instances;
- Use this data as the training set;
- Use the instances from the original dataset that don't occur in the new training set for testing.
Bootstrap Method
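One bootstrap iteration can be sketched as follows (the seed is arbitrary):

```python
import random

def bootstrap_split(data, seed=7):
    """Sample n instances with replacement for training; the rest form the test set."""
    rng = random.Random(seed)
    data = list(data)
    train = [rng.choice(data) for _ in data]      # n draws with replacement
    chosen = set(train)
    test = [x for x in data if x not in chosen]   # out-of-bag instances
    return train, test

train, test = bootstrap_split(range(100))
print(len(train))  # 100
print(len(test))
```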
![Page 20: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/20.jpg)
The bootstrap method is also called the 0.632 bootstrap: in each of the n draws, a particular instance has a probability of 1 − 1/n of not being picked. Thus its probability of ending up in the test data is:

(1 − 1/n)^n ≈ e⁻¹ ≈ 0.368

This means the training data will contain approximately 63.2% of the instances and the test data approximately 36.8% of the instances.

Bootstrap Method
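The 36.8% figure is just the limit of the probability of never being picked in n draws; a quick numeric check:

```python
import math

for n in (10, 100, 10_000):
    print(n, round((1 - 1 / n) ** n, 4))   # approaches e^-1 as n grows

print(round(math.exp(-1), 4))              # 0.3679
```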
![Page 21: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/21.jpg)
The error estimate on the test data will be very pessimistic because the classifier is trained on just ~63% of the instances.
Therefore, combine it with the training error:
The training error gets less weight than the error on the test data.
Repeat process several times with different replacement samples; average the results.
Estimating Error with the Bootstrap Method
err = 0.632 · err_test_instances + 0.368 · err_training_instances
![Page 22: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/22.jpg)
• Assume that we have two classifiers, M1 and M2, and we would like to know which one is better for a classification problem.
• We test the classifiers on n test data sets D1, D2, …, Dn, and we receive error rate estimates e11, e12, …, e1n for classifier M1 and error rate estimates e21, e22, …, e2n for classifier M2.
• Using these error rate estimates we can compute the mean error rate e1 for classifier M1 and the mean error rate e2 for classifier M2.
• These mean error rates are just estimates of the error on the true population of future data cases. What if the difference between the two error rates is just attributed to chance?
Comparing Two Classifier Models
![Page 23: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/23.jpg)
• We note that the error rate estimates e11, e12, …, e1n for classifier M1 and e21, e22, …, e2n for classifier M2 are paired. Thus, we consider the differences d1, d2, …, dn, where dj = e1j − e2j.
• The differences d1, d2, …, dn are instantiations of n random variables D1, D2, …, Dn with mean µD and standard deviation σD.
• We need to establish confidence intervals for µD in order to decide whether the difference in the generalization performance of the classifiers M1 and M2 is statistically significant or not.
Comparing Two Classifier Models
![Page 24: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/24.jpg)
• Since the standard deviation σD is unknown, we approximate it using the sample standard deviation s_d:

s_d² = (1/n) Σ_{i=1..n} [(e1_i − e2_i) − (ē1 − ē2)]²

• Since we approximate the true standard deviation σD, we introduce the T statistic:

T = (D̄ − µD) / (s_d / √n)
Comparing Two Classifier Models
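The statistic can be sketched as below; note this uses the usual n − 1 sample variance (the slides divide by n) and tests H0: µD = 0. The error lists are hypothetical:

```python
import math

def paired_t_statistic(errors_m1, errors_m2):
    """T = mean(d) / (s_d / sqrt(n)) for paired error estimates, under H0: µD = 0."""
    n = len(errors_m1)
    d = [e1 - e2 for e1, e2 in zip(errors_m1, errors_m2)]   # signed differences
    d_bar = sum(d) / n
    s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))
    return d_bar / (s_d / math.sqrt(n))

# Hypothetical error rates of M1 and M2 on four shared test sets.
t_stat = paired_t_statistic([0.25, 0.30, 0.28, 0.33], [0.20, 0.22, 0.26, 0.25])
print(round(t_stat, 2))  # 4.0
```

The resulting value is compared against the t-distribution with n − 1 degrees of freedom.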
![Page 25: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/25.jpg)
• The T statistic is governed by a t-distribution with n − 1 degrees of freedom. [Figure: the t-distribution; the area between −t_{α/2} and t_{α/2} is 1 − α.]
Comparing Two Classifier Models
![Page 26: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/26.jpg)
• If d̄ and s_d are the mean and standard deviation of the normally distributed differences of n random pairs of errors, a (1 − α)100% confidence interval for µD = µ1 − µ2 is:

d̄ − t_{α/2} · s_d/√n < µD < d̄ + t_{α/2} · s_d/√n

where t_{α/2} is the t-value with v = n − 1 degrees of freedom, leaving an area of α/2 to the right.
• Thus, if the interval contains 0, the difference between the two classifiers is not statistically significant at level α.
Comparing Two Classifier Models
![Page 27: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/27.jpg)
In practice, different types of classification errors often incur different costs.
Examples:
- Terrorist profiling ("not a terrorist" is correct 99.99% of the time)
- Loan decisions
- Fault diagnosis
- Promotional mailing
Counting the Costs
![Page 28: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/28.jpg)
Cost Matrices

| True \ Hypothesized | Pos | Neg |
| --- | --- | --- |
| Pos | TP Cost | FN Cost |
| Neg | FP Cost | TN Cost |

Usually, TP Cost and TN Cost are set equal to 0!
![Page 29: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/29.jpg)
Cost-Sensitive Classification
If the classifier outputs a probability for each class, it can be adjusted to minimize the expected cost of its predictions. The expected cost is computed as the dot product of the vector of class probabilities and the appropriate column of the cost matrix.

| True \ Hypothesized | Pos | Neg |
| --- | --- | --- |
| Pos | TP Cost | FN Cost |
| Neg | FP Cost | TN Cost |
![Page 30: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/30.jpg)
Assume that the classifier returns for an instance ppos = 0.6 and pneg = 0.4. Then, the expected cost if the instance is classified as positive is 0.6 * 0 + 0.4 * 10 = 4. The expected cost if the instance is classified as negative is 0.6 * 5 + 0.4 * 0 = 3. To minimize the costs the instance is classified as negative.
Cost Sensitive Classification
| True \ Hypothesized | Pos | Neg |
| --- | --- | --- |
| Pos | 0 | 5 |
| Neg | 10 | 0 |
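The slide's arithmetic as a sketch, with `cost_matrix[true][hypothesized]` and classes ordered (pos, neg):

```python
def expected_costs(class_probs, cost_matrix):
    """Expected cost of each prediction: class probabilities · cost-matrix column."""
    n = len(class_probs)
    return [sum(class_probs[i] * cost_matrix[i][j] for i in range(n))
            for j in range(n)]

probs = [0.6, 0.4]                      # p_pos, p_neg from the classifier
costs = expected_costs(probs, [[0, 5], [10, 0]])
print(costs)                            # [4.0, 3.0]
labels = ["pos", "neg"]
print(labels[costs.index(min(costs))])  # neg, the cheaper prediction
```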
![Page 31: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/31.jpg)
Cost-Sensitive Learning
Simple methods for cost-sensitive learning:
- Resampling of instances according to costs;
- Weighting of instances according to costs.
In Weka, cost-sensitive classification and learning can be applied with any classifier using the meta scheme CostSensitiveClassifier.

| True \ Hypothesized | Pos | Neg |
| --- | --- | --- |
| Pos | 0 | 5 |
| Neg | 10 | 0 |
![Page 32: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/32.jpg)
ROC Curves and Analysis
Classifier 1 (TPr = 0.4, FPr = 0.3):

| True \ Predicted | pos | neg |
| --- | --- | --- |
| pos | 40 | 60 |
| neg | 30 | 70 |

Classifier 2 (TPr = 0.7, FPr = 0.5):

| True \ Predicted | pos | neg |
| --- | --- | --- |
| pos | 70 | 30 |
| neg | 50 | 50 |

Classifier 3 (TPr = 0.6, FPr = 0.2):

| True \ Predicted | pos | neg |
| --- | --- | --- |
| pos | 60 | 40 |
| neg | 20 | 80 |
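Each confusion matrix maps to one point in ROC space; a quick sketch of the mapping:

```python
def roc_point(tp, fn, fp, tn):
    """Return (FP rate, TP rate), a classifier's coordinates in ROC space."""
    return fp / (fp + tn), tp / (tp + fn)

# The three classifiers above.
print(roc_point(40, 60, 30, 70))  # (0.3, 0.4)  Classifier 1
print(roc_point(70, 30, 50, 50))  # (0.5, 0.7)  Classifier 2
print(roc_point(60, 40, 20, 80))  # (0.2, 0.6)  Classifier 3
```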
![Page 33: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/33.jpg)
ROC Space
[Figure: ROC space with FP rate on the x-axis and TP rate on the y-axis; the ideal classifier sits in the top-left corner, the diagonal corresponds to chance, and the bottom-left and top-right corners correspond to the "always negative" and "always positive" classifiers.]
![Page 34: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/34.jpg)
Dominance in the ROC Space
Classifier A dominates classifier B if and only if TPrA > TPrB and FPrA < FPrB.
![Page 35: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/35.jpg)
Ensemble Methods
[Diagram: the data is fed to model 1, model 2, …, model k, which are combined into an ensemble model.]
Applications: classification, clustering, collaborative filtering, anomaly detection, …
Combine multiple models into one!
![Page 36: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/36.jpg)
Motivations
- An ensemble model improves accuracy and robustness over single-model methods.
- Applications: distributed computing; privacy-preserving applications; large-scale data with reusable models; multiple sources of data.
- Efficiency: a complex problem can be decomposed into multiple sub-problems that are easier to understand and solve (a divide-and-conquer approach).
![Page 37: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/37.jpg)
Why Ensemble Works? (1)
- Intuition: combining diverse, independent opinions in human decision-making acts as a protective mechanism (e.g. a stock portfolio).
- Uncorrelated error reduction: suppose we have 5 completely independent classifiers for majority voting, each with 70% accuracy. Then the majority vote is correct with probability

C(5,3)(0.7³)(0.3²) + C(5,4)(0.7⁴)(0.3) + (0.7⁵) ≈ 83.7%

With 101 such classifiers, the majority vote accuracy rises to 99.9%.
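The 83.7% and 99.9% figures can be checked with the binomial sum (assuming an odd number of classifiers, so ties cannot occur):

```python
from math import comb

def majority_vote_accuracy(k, p):
    """P(majority of k independent classifiers, each correct w.p. p, is right)."""
    need = k // 2 + 1                                  # votes needed to win
    return sum(comb(k, m) * p**m * (1 - p)**(k - m) for m in range(need, k + 1))

print(round(majority_vote_accuracy(5, 0.7), 3))   # 0.837
print(majority_vote_accuracy(101, 0.7) > 0.999)   # True
```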
![Page 38: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/38.jpg)
Why Ensemble Works? (2)
[Figure: six models, each fitting part of some unknown distribution; the ensemble gives the global picture!]
![Page 39: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/39.jpg)
Why Ensemble Works? (3)
- Overcome limitations of a single hypothesis: the target function may not be implementable with individual classifiers, but may be approximated by model averaging.
[Figure: a single decision tree's boundary vs. the smoother boundary obtained by model averaging.]
![Page 40: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/40.jpg)
Ensemble of Classifiers—Learn to Combine
[Diagram: k classifiers are trained on labeled data; their outputs on test data are combined by a combiner that is itself learned from the labeled data, producing the final prediction.]
Algorithms: boosting, stacked generalization, rule ensembles, Bayesian model averaging, …
![Page 41: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/41.jpg)
Ensemble of Classifiers—Consensus
[Diagram: k classifiers are trained on labeled data; their predictions on unlabeled test data are combined by majority voting to give the final predictions.]
Algorithms: bagging, random forest, random decision tree, model averaging of probabilities, …
![Page 42: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/42.jpg)
Pros and Cons
|  | Combine by learning | Combine by consensus |
| --- | --- | --- |
| Pros | Gets useful feedback from labeled data; can potentially improve accuracy | Does not need labeled data; can improve the generalization performance |
| Cons | Needs the labeled data to train the ensemble; may overfit the labeled data; cannot work when no labels are available | No feedback from the labeled data; requires the assumption that consensus is better |
![Page 43: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/43.jpg)
Bagging
• Also known as bootstrap aggregation
• Sampling uniformly with replacement
• Build a classifier on each bootstrap sample
• 0.632 bootstrap: each bootstrap sample Di contains approx. 63.2% of the original training data; the remaining 36.8% are used as the test set

| Original Data | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bagging (Round 1) | 7 | 8 | 10 | 8 | 2 | 5 | 10 | 10 | 5 | 9 |
| Bagging (Round 2) | 1 | 4 | 9 | 1 | 2 | 3 | 2 | 7 | 3 | 2 |
| Bagging (Round 3) | 1 | 8 | 5 | 10 | 5 | 5 | 9 | 6 | 3 | 7 |
![Page 44: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/44.jpg)
Bagging
Accuracy of bagging:

Acc(M) = (1/k) Σ_{i=1..k} [0.632 · Acc(Mi)_test_set + 0.368 · Acc(Mi)_train_set]

Works well for small data sets. Example:

| x | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| y | 1 | 1 | 1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 |
![Page 45: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/45.jpg)
Bagging: Decision Stump
• A single-level binary decision tree
• Entropy-based split: x ≤ 0.35 or x ≤ 0.75
• Accuracy at most 70%
![Page 46: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/46.jpg)
Bagging
Accuracy of ensemble classifier: 100%
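A sketch of bagging decision stumps on the 1-D dataset above. A single best stump tops out at 70% here; the bagged majority vote can combine stumps that split at different thresholds (whether it reaches the slide's 100% depends on the bootstrap draws, so the ensemble accuracy is only printed, not claimed):

```python
import random

# The toy dataset from the bagging slides.
X = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
Y = [1, 1, 1, -1, -1, -1, -1, 1, 1, 1]

def best_stump(xs, ys):
    """Exhaustively pick the threshold/sign stump with the fewest errors."""
    best = None
    for t in [0.05 + 0.1 * i for i in range(10)]:
        for sign in (1, -1):                  # predict `sign` when x <= t
            err = sum((sign if x <= t else -sign) != y for x, y in zip(xs, ys))
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best[1], best[2]

def predict(t, sign, x):
    return sign if x <= t else -sign

# A single stump cannot do better than 70% on this data.
t0, s0 = best_stump(X, Y)
single_acc = sum(predict(t0, s0, x) == y for x, y in zip(X, Y)) / len(X)
print(single_acc)  # 0.7

# Bagging: train one stump per bootstrap sample, combine by majority vote.
rng = random.Random(3)
stumps = []
for _ in range(25):
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    stumps.append(best_stump([X[i] for i in idx], [Y[i] for i in idx]))

votes = [sum(predict(t, s, x) for t, s in stumps) for x in X]
ensemble_acc = sum((1 if v > 0 else -1) == y for v, y in zip(votes, Y)) / len(X)
print(ensemble_acc)
```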
![Page 47: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/47.jpg)
Bagging- Final Points
• Works well if the base classifiers are unstable
• Increases accuracy because it reduces the variance of the individual classifiers
• Does not focus on any particular instance of the training data
• Therefore, less susceptible to model overfitting when applied to noisy data
• What if we want to focus on particular instances of the training data?
![Page 48: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/48.jpg)
Boosting Principles
- Boost a set of weak learners to a strong learner.
- An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records.
- Initially, all N records are assigned equal weights.
- Unlike in bagging, weights may change at the end of each boosting round.
![Page 49: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/49.jpg)
Boosting
- Records that are wrongly classified will have their weights increased.
- Records that are classified correctly will have their weights decreased.

| Original Data | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Boosting (Round 1) | 7 | 3 | 2 | 8 | 7 | 9 | 4 | 10 | 6 | 3 |
| Boosting (Round 2) | 5 | 4 | 9 | 4 | 2 | 5 | 1 | 7 | 4 | 2 |
| Boosting (Round 3) | 4 | 4 | 8 | 10 | 4 | 5 | 4 | 6 | 3 | 4 |

• Example 4 is hard to classify.
• Its weight is increased, so it is more likely to be chosen again in subsequent rounds.
![Page 50: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/50.jpg)
Boosting
• Equal weights are assigned to each training tuple (1/d for round 1)
• After a classifier Mi is learned, the weights are adjusted to allow the subsequent classifier Mi+1 to “pay more attention” to tuples that were misclassified by Mi.
• Final boosted classifier M* combines the votes of each individual classifier
• Weight of each classifier’s vote is a function of its accuracy
• AdaBoost is a popular boosting algorithm
![Page 51: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/51.jpg)
Adaboost
Input:
- Training set D containing d tuples
- k rounds
- A classification learning scheme
Output: a composite model
![Page 52: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/52.jpg)
Adaboost
- Data set D contains d class-labeled tuples (X1,y1), (X2,y2), (X3,y3), …, (Xd,yd).
- Initially assign equal weight 1/d to each tuple.
- To generate k base classifiers, we need k rounds or iterations.
- In round i, tuples from D are sampled with replacement to form Di (of size d).
- Each tuple's chance of being selected depends on its weight.
![Page 53: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/53.jpg)
Adaboost
- The base classifier Mi is derived from the training tuples of Di.
- The error of Mi is tested using Di.
- Weights of training tuples are adjusted depending on how they were classified: correctly classified, decrease weight; incorrectly classified, increase weight.
- A tuple's weight indicates how hard it is to classify (directly proportional).
![Page 54: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/54.jpg)
Adaboost
- Some classifiers may be better at classifying some "hard" tuples than others.
- We finally have a series of classifiers that complement each other!
- Error rate of model Mi:

error(Mi) = Σ_{j=1..d} w_j · err(Xj)

where err(Xj) is the misclassification indicator for Xj (1 if misclassified, else 0).
- If the classifier's error exceeds 0.5, we abandon it and try again with a new Di and a new Mi derived from it.
![Page 55: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/55.jpg)
Adaboost
- error(Mi) affects how the weights of the training tuples are updated.
- If a tuple is correctly classified in round i, its weight is multiplied by

error(Mi) / (1 − error(Mi))

- After the weights of all correctly classified tuples are adjusted, the weights of all tuples (including the misclassified ones) are normalized:

normalization factor = (sum of old weights) / (sum of new weights)

- The weight of classifier Mi's vote is

log[(1 − error(Mi)) / error(Mi)]
![Page 56: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/56.jpg)
Adaboost
- The lower a classifier's error rate, the more accurate it is, and therefore the higher its weight for voting should be.
- The weight of classifier Mi's vote is

log[(1 − error(Mi)) / error(Mi)]

- For each class c, sum the weights of each classifier that assigned class c to X (an unseen tuple).
- The class with the highest sum is the WINNER!
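One round's reweighting, following the slides' formulas, on a toy example of four equally weighted tuples with one misclassified:

```python
import math

def adaboost_round_update(weights, correct):
    """Reweight tuples after one round; return new weights and the vote weight."""
    # error(Mi) = sum of weights of misclassified tuples (weights sum to 1).
    error = sum(w for w, c in zip(weights, correct) if not c)
    # Correctly classified tuples are scaled down by error / (1 - error).
    scaled = [w * error / (1 - error) if c else w for w, c in zip(weights, correct)]
    total = sum(scaled)
    new_weights = [w / total for w in scaled]        # renormalize to sum to 1
    vote_weight = math.log((1 - error) / error)      # log((1 - err) / err)
    return new_weights, vote_weight

w, alpha = adaboost_round_update([0.25] * 4, [True, True, True, False])
print([round(x, 3) for x in w])  # [0.167, 0.167, 0.167, 0.5]
print(round(alpha, 3))           # 1.099
```

The misclassified tuple's relative weight doubles, so it is more likely to be sampled in the next round.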
![Page 57: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/57.jpg)
Use test sets and the hold-out method for “large” data;
Use the cross-validation method for “middle-sized” data;
Use the leave-one-out and bootstrap methods for small data;
Don’t use test data for parameter tuning - use separate validation data.
Metric Evaluation Summary:
![Page 58: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/58.jpg)
In this seminar report we have considered:
- Metrics for Classifier Evaluation
- Methods for Classifier Evaluation
- Comparing Data Mining Schemes
- Costs in Data Mining
- Ensemble Methods
Summary
![Page 59: Data Mining Models and Evaluation Techniques](https://reader035.vdocuments.net/reader035/viewer/2022081520/577cc2e31a28aba71194a598/html5/thumbnails/59.jpg)
Thank You