Dynamic Classifier Selection for Effective Mining from Noisy Data Streams
Xingquan Zhu, Xindong Wu, and Ying Yang, Proc. of KDD 2003. Presented on 2005/3/25 by 董原賓.
![Page 1: Dynamic Classifier Selection for Effective Mining from Noisy Data Streams](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813b87550346895da4b3d8/html5/thumbnails/1.jpg)
Dynamic Classifier Selection for Effective Mining from Noisy Data Streams
Xingquan Zhu, Xindong Wu, and Ying Yang, Proc. of KDD 2003
2005/3/25, Presenter: 董原賓
![Page 2: Dynamic Classifier Selection for Effective Mining from Noisy Data Streams](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813b87550346895da4b3d8/html5/thumbnails/2.jpg)
Problem
Problem: Many existing data stream mining efforts are based on Classifier Combination techniques
Difficult cases: dramatic concept drift, significant amount of noise
Solution: Choose the most reliable classifier
![Page 3: Dynamic Classifier Selection for Effective Mining from Noisy Data Streams](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813b87550346895da4b3d8/html5/thumbnails/3.jpg)
Multiple Classifier System (MCS)
MCS assumption: each base classifier has a particular sub-domain in which it is most reliable
Two categories of MCS integration techniques:
1. Classifier Combination (CC) techniques: all base classifiers are combined to work out the final decision. EX: SAM (Select All Majority)
2. Classifier Selection (CS) techniques: the single best classifier is selected from the base classifiers to make the final decision
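The contrast between the two categories can be sketched in a few lines of Python. This is only an illustration: a plain majority vote stands in for a combination scheme such as SAM, and the toy classifiers and the `accuracies` list are assumptions that do not come from the slides.

```python
from collections import Counter

def combine_by_majority(classifiers, instance):
    """Classifier Combination: every base classifier votes, the most common label wins."""
    votes = [clf(instance) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

def select_single_best(classifiers, accuracies, instance):
    """Classifier Selection: only the classifier judged most reliable decides."""
    best = max(range(len(classifiers)), key=lambda i: accuracies[i])
    return classifiers[best](instance)

# Hypothetical base classifiers: each maps a number to a label.
classifiers = [
    lambda x: "pos" if x > 0 else "neg",
    lambda x: "pos" if x > 5 else "neg",
    lambda x: "pos" if x > 10 else "neg",
]
accuracies = [0.6, 0.7, 0.9]  # assumed reliability estimates, not from the slides

cc_label = combine_by_majority(classifiers, 7)             # two of three vote "pos"
cs_label = select_single_best(classifiers, accuracies, 7)  # third classifier decides alone
```

With these toy inputs the two strategies disagree, which is the situation where the choice of integration technique matters.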
![Page 4: Dynamic Classifier Selection for Effective Mining from Noisy Data Streams](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813b87550346895da4b3d8/html5/thumbnails/4.jpg)
Classifier Selection techniques
Two types of CS techniques:
1. Static Classifier Selection: the classifier is selected during the training phase. EX: CVM (Cross Validation Majority)
2. Dynamic Classifier Selection: the classifier is selected during the classification phase. It is called “dynamic” because the classifier used critically depends on the test instance itself. EX: DCS_LA (Dynamic Classifier Selection by Local Accuracy)
![Page 5: Dynamic Classifier Selection for Effective Mining from Noisy Data Streams](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813b87550346895da4b3d8/html5/thumbnails/5.jpg)
Definition
Dataset D, training set X, test set Y and evaluation set Z
$N_x$, $N_y$ and $N_z$ represent the numbers of instances in X, Y and Z respectively
$C_1, C_2, \ldots, C_L$ are the L base classifiers learned from X
$C^*$ is the selected best classifier, used to classify each instance $I_x$ in Y
![Page 6: Dynamic Classifier Selection for Effective Mining from Noisy Data Streams](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813b87550346895da4b3d8/html5/thumbnails/6.jpg)
Definition
The instances in D have M attributes $A_1, A_2, \ldots, A_M$, and each attribute $A_i$ contains $n_i$ values $V_1^{A_i}, \ldots, V_{n_i}^{A_i}$
For an attribute $A_i$, use its values to partition Z into $n_i$ subsets $S_1^{A_i}, \ldots, S_{n_i}^{A_i}$, where $S_1^{A_i} \cup \cdots \cup S_{n_i}^{A_i} = Z$
$I_k^{A_i}$ denotes instance $I_k$'s value on attribute $A_i$
![Page 7: Dynamic Classifier Selection for Effective Mining from Noisy Data Streams](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813b87550346895da4b3d8/html5/thumbnails/7.jpg)
Attribute-Oriented Dynamic Classifier Selection (AO-DCS)
Three steps of AO-DCS:
1. Partition the evaluation set into subsets by using the attribute values of the instances
2. Evaluate the classification accuracy of each base classifier on all subsets
3. For a test instance, use its attribute values to select the corresponding subsets, and select the base classifier that has the highest classification accuracy on them
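The three steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes categorical attributes (numeric ones would first need to be discretized), averages the per-subset accuracies to rank classifiers, and uses hypothetical names such as `evaluate_on_subsets` and `select_classifier`.

```python
from collections import defaultdict

def evaluate_on_subsets(classifiers, eval_set, attributes):
    """Steps 1-2: partition Z by each attribute value and score every classifier per subset.

    eval_set is a list of (instance_dict, label) pairs.
    Returns a dict mapping (attribute, value) to a list of per-classifier accuracies.
    """
    hits = defaultdict(lambda: [0] * len(classifiers))
    sizes = defaultdict(int)
    for instance, label in eval_set:
        for attr in attributes:
            key = (attr, instance[attr])
            sizes[key] += 1
            row = hits[key]
            for j, clf in enumerate(classifiers):
                if clf(instance) == label:
                    row[j] += 1
    return {key: [h / sizes[key] for h in hits[key]] for key in sizes}

def select_classifier(acc, n_classifiers, instance, attributes):
    """Step 3: average each classifier's accuracy over the subsets the instance falls in."""
    keys = [(a, instance[a]) for a in attributes if (a, instance[a]) in acc]
    if not keys:
        return 0  # assumption: fall back to the first classifier if no subset matches
    averages = [sum(acc[k][j] for k in keys) / len(keys) for j in range(n_classifiers)]
    return max(range(n_classifiers), key=lambda j: averages[j])

# Toy evaluation set and classifiers (hypothetical, for illustration only).
eval_set = [
    ({"color": "red", "size": "big"}, "yes"),
    ({"color": "red", "size": "small"}, "yes"),
    ({"color": "blue", "size": "big"}, "no"),
    ({"color": "blue", "size": "small"}, "yes"),
]
classifiers = [
    lambda inst: "yes" if inst["color"] == "red" else "no",
    lambda inst: "yes",
]
attributes = ["color", "size"]

acc = evaluate_on_subsets(classifiers, eval_set, attributes)
chosen_small = select_classifier(acc, 2, {"color": "blue", "size": "small"}, attributes)
chosen_big = select_classifier(acc, 2, {"color": "blue", "size": "big"}, attributes)
```

Here the two test instances differ only in `size`, yet they are routed to different base classifiers, which is what makes the selection depend on the test instance itself.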
![Page 8: Dynamic Classifier Selection for Effective Mining from Noisy Data Streams](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813b87550346895da4b3d8/html5/thumbnails/8.jpg)
Partition by attributes
![Page 9: Dynamic Classifier Selection for Effective Mining from Noisy Data Streams](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813b87550346895da4b3d8/html5/thumbnails/9.jpg)
Partition By Attributes
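
| Name | Gender | Age | Height |
| --- | --- | --- | --- |
| Mary | Female | 29 | 163 |
| Dave | Male | 51 | 170 |
| Martha | Female | 63 | 149 |
| Nancy | Female | 35 | 157 |
| John | Male | 18 | 182 |

Gender: Male ($S_1^G$), Female ($S_2^G$)
Age: < 30 ($S_1^A$), ≥ 30 ($S_2^A$)
Height: ≤ 160 ($S_1^H$), 161 ~ 180 ($S_2^H$), ≥ 181 ($S_3^H$)
Base classifiers: $C_1$, $C_2$, $C_3$
Instance notation: $I_{Mary}$
$S_1^G$: $I_{Dave}$, $I_{John}$
$S_2^G$: $I_{Mary}$, $I_{Martha}$, $I_{Nancy}$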
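The partitioning on this slide can be reproduced with a short Python sketch. The five instances and the value ranges are taken from the slide; the helper names (`gender_subset`, `age_subset`, `height_subset`, `partition`) and the plain string labels such as `"S1G"` are assumptions made for the illustration.

```python
from collections import defaultdict

# The five evaluation instances from the table above.
Z = [
    {"Name": "Mary", "Gender": "Female", "Age": 29, "Height": 163},
    {"Name": "Dave", "Gender": "Male", "Age": 51, "Height": 170},
    {"Name": "Martha", "Gender": "Female", "Age": 63, "Height": 149},
    {"Name": "Nancy", "Gender": "Female", "Age": 35, "Height": 157},
    {"Name": "John", "Gender": "Male", "Age": 18, "Height": 182},
]

def gender_subset(inst):
    return "S1G" if inst["Gender"] == "Male" else "S2G"

def age_subset(inst):
    return "S1A" if inst["Age"] < 30 else "S2A"

def height_subset(inst):
    if inst["Height"] <= 160:
        return "S1H"
    return "S2H" if inst["Height"] <= 180 else "S3H"

def partition(instances, subset_of):
    """Group instance names by the subset label that subset_of assigns to each instance."""
    subsets = defaultdict(list)
    for inst in instances:
        subsets[subset_of(inst)].append(inst["Name"])
    return dict(subsets)

by_gender = partition(Z, gender_subset)
by_age = partition(Z, age_subset)
by_height = partition(Z, height_subset)
```

Every attribute yields its own partition of the same evaluation set, so each instance belongs to exactly one subset per attribute.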
![Page 10: Dynamic Classifier Selection for Effective Mining from Noisy Data Streams](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813b87550346895da4b3d8/html5/thumbnails/10.jpg)
Evaluate the classification accuracy
[Figure labels: Partition by attributes; L base classifiers; Subsets from attribute $A_i$]
![Page 11: Dynamic Classifier Selection for Effective Mining from Noisy Data Streams](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813b87550346895da4b3d8/html5/thumbnails/11.jpg)
The classification accuracy
![Page 12: Dynamic Classifier Selection for Effective Mining from Noisy Data Streams](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813b87550346895da4b3d8/html5/thumbnails/12.jpg)
Dynamic Classifier Selection
![Page 13: Dynamic Classifier Selection for Effective Mining from Noisy Data Streams](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813b87550346895da4b3d8/html5/thumbnails/13.jpg)
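
| Classifier | $S_1^G$ | $S_2^G$ | $S_1^A$ | $S_2^A$ | $S_1^H$ | $S_2^H$ | $S_3^H$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| $C_1$ | 0.8 | 0.5 | 0.6 | 0.4 | 0.2 | 0.4 | 0.6 |
| $C_2$ | 0.4 | 0.7 | 0.6 | 0.3 | 0.5 | 0.9 | 0.8 |
| $C_3$ | 0.6 | 0.9 | 0.3 | 0.5 | 0.7 | 0.8 | 0.4 |

Test instance:

| Name | Gender | Age | Height |
| --- | --- | --- | --- |
| Alex | Male | 24 | 177 |

Alex falls into $S_1^G$ (Male), $S_1^A$ (Age < 30) and $S_2^H$ (Height 161 ~ 180), so each classifier's accuracy is averaged over these three subsets:
The accuracy of $C_1$: AverageAcy[1] = (0.8 + 0.6 + 0.4) / 3 = 0.6
The accuracy of $C_2$: AverageAcy[2] = (0.4 + 0.6 + 0.9) / 3 ≈ 0.63
The accuracy of $C_3$: AverageAcy[3] = (0.6 + 0.3 + 0.8) / 3 ≈ 0.57
$C_2$ has the highest average accuracy, so it is selected to classify Alex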
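The arithmetic for the test instance Alex can be checked with a short Python sketch. The accuracy values are copied from the table above; the dictionary layout and the names `accuracy`, `alex_subsets` and `average_acy` are assumptions made for the illustration.

```python
# Per-subset accuracies copied from the table above.
accuracy = {
    "C1": {"S1G": 0.8, "S2G": 0.5, "S1A": 0.6, "S2A": 0.4, "S1H": 0.2, "S2H": 0.4, "S3H": 0.6},
    "C2": {"S1G": 0.4, "S2G": 0.7, "S1A": 0.6, "S2A": 0.3, "S1H": 0.5, "S2H": 0.9, "S3H": 0.8},
    "C3": {"S1G": 0.6, "S2G": 0.9, "S1A": 0.3, "S2A": 0.5, "S1H": 0.7, "S2H": 0.8, "S3H": 0.4},
}

# Alex (Male, 24, 177) falls into S1G, S1A and S2H.
alex_subsets = ["S1G", "S1A", "S2H"]

# Average each classifier's accuracy over the subsets that match Alex.
average_acy = {
    name: sum(row[s] for s in alex_subsets) / len(alex_subsets)
    for name, row in accuracy.items()
}

# The classifier with the highest average accuracy is the one selected for Alex.
best = max(average_acy, key=average_acy.get)
```

The averages come out to roughly 0.60, 0.63 and 0.57, so `best` is `"C2"`.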
![Page 14: Dynamic Classifier Selection for Effective Mining from Noisy Data Streams](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813b87550346895da4b3d8/html5/thumbnails/14.jpg)
Applying AO-DCS in Data Stream Mining
Steps:
Partition the streaming data into a series of chunks $S_1, S_2, \ldots, S_i, \ldots$, each of which is small enough to be processed by the algorithm at one time
Then learn a base classifier $C_i$ from each chunk $S_i$
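A rough sketch of the chunking step in Python follows. The slides do not say which learning algorithm produces each base classifier, so `learn_majority_classifier` is a deliberately trivial placeholder, and the chunk size and toy stream are likewise assumptions.

```python
from collections import Counter
from itertools import islice

def chunks(stream, chunk_size):
    """Yield successive lists of at most chunk_size items from an iterable stream."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

def learn_majority_classifier(chunk):
    """Placeholder learner (an assumption): always predicts the chunk's most common label."""
    majority = Counter(label for _, label in chunk).most_common(1)[0][0]
    return lambda instance: majority

# Toy labeled stream: the label changes from "a" to "b" part-way through.
stream = [({"x": i}, "a" if i < 5 else "b") for i in range(10)]

# One base classifier per chunk, in arrival order.
base_classifiers = [learn_majority_classifier(s) for s in chunks(stream, 4)]
predictions = [clf({"x": 0}) for clf in base_classifiers]
```

Because `chunks` consumes the stream lazily, only one chunk needs to be held in memory at a time.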
![Page 15: Dynamic Classifier Selection for Effective Mining from Noisy Data Streams](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813b87550346895da4b3d8/html5/thumbnails/15.jpg)
Applying AO-DCS in Data Stream Mining (cont.)
Evaluate all base classifiers and determine the “best” one for each test instance. If the number of base classifiers is too large, keep only the most recent K classifiers
Note: the evaluation set Z is constructed dynamically from the most recent instances, because they are likely to be consistent with the current test instances
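One way to keep only the most recent K classifiers and a rolling evaluation set is a pair of bounded queues, sketched below. The class name `RecentWindow`, its methods, and the window sizes are hypothetical; the slides do not specify how Z is sized or maintained.

```python
from collections import deque

class RecentWindow:
    """Keeps the K most recent base classifiers and the newest labeled instances as Z."""

    def __init__(self, max_classifiers, eval_size):
        # A deque with maxlen silently drops the oldest item when a new one is appended.
        self.classifiers = deque(maxlen=max_classifiers)
        self.eval_set = deque(maxlen=eval_size)

    def add_classifier(self, classifier):
        self.classifiers.append(classifier)

    def add_labeled_instances(self, labeled_instances):
        self.eval_set.extend(labeled_instances)

window = RecentWindow(max_classifiers=2, eval_size=3)
for i in range(4):
    window.add_classifier(f"C{i + 1}")  # strings stand in for trained classifiers
    window.add_labeled_instances([(i, "label")])

kept_classifiers = list(window.classifiers)
kept_instances = [x for x, _ in window.eval_set]
```

After four rounds only the last two classifiers and the last three instances remain, so both the candidate pool and Z track the newest part of the stream.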
![Page 16: Dynamic Classifier Selection for Effective Mining from Noisy Data Streams](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813b87550346895da4b3d8/html5/thumbnails/16.jpg)
Experiment
![Page 17: Dynamic Classifier Selection for Effective Mining from Noisy Data Streams](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813b87550346895da4b3d8/html5/thumbnails/17.jpg)
Experiment
![Page 18: Dynamic Classifier Selection for Effective Mining from Noisy Data Streams](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813b87550346895da4b3d8/html5/thumbnails/18.jpg)
Experiment
![Page 19: Dynamic Classifier Selection for Effective Mining from Noisy Data Streams](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813b87550346895da4b3d8/html5/thumbnails/19.jpg)
Experiment
![Page 20: Dynamic Classifier Selection for Effective Mining from Noisy Data Streams](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813b87550346895da4b3d8/html5/thumbnails/20.jpg)
Experiment
![Page 21: Dynamic Classifier Selection for Effective Mining from Noisy Data Streams](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813b87550346895da4b3d8/html5/thumbnails/21.jpg)
Experiment
![Page 22: Dynamic Classifier Selection for Effective Mining from Noisy Data Streams](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813b87550346895da4b3d8/html5/thumbnails/22.jpg)
Experiment