Mining Several Databases with an Ensemble of Classifiers
Seppo Puuronen
Vagan Terziyan
Alexander Logvinovsky
10th International Conference and Workshop on Database and Expert Systems Applications
August 30 - September 3, 1999
Florence, Italy
DEXA-99
Authors

Seppo Puuronen, Vagan Terziyan
Department of Computer Science and Information Systems, University of Jyvaskyla, FINLAND
Department of Artificial Intelligence, Kharkov State Technical University of Radioelectronics, UKRAINE
[email protected], [email protected]

Alexander Logvinovsky
Department of Artificial Intelligence, Kharkov State Technical University of Radioelectronics, UKRAINE
Contents

The problem of "multiclassifiers" - "multidatabase" mining;
Case "One Database - Many Classifiers": dynamic integration of classifiers;
Case "One Classifier - Many Databases": weighting databases;
Case "Many Databases - Many Classifiers": context-based trend within the classifiers' predictions, and decontextualization;
Conclusion
Introduction

[Figure: a data set of instances (x1, y1), (x2, y2), (x3, y3); a classifier f(x) is applied to a new sample x to produce the result y = ?]
Dynamic Integration of Classifiers

Final classification is made by weighted voting of the classifiers in the ensemble;
Weights of the classifiers are recalculated for every new instance;
Weighting is based on the predicted errors of the classifiers in the neighborhood of the instance.
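The scheme above can be sketched as follows. This is a minimal illustration, not the authors' exact algorithm: the function name, the k-nearest-neighbor error estimate, and the `1 - error` weighting are assumptions of this sketch. `train_errors[j][i]` holds classifier j's recorded error on training instance i (obtained beforehand, e.g. by the sliding exam described later).

```python
import numpy as np

def dynamic_weighted_prediction(x, X_train, train_errors, predictions, k=3):
    """Weighted vote in which each classifier's weight is recomputed
    for the new instance x from its errors in x's neighborhood."""
    dists = np.linalg.norm(X_train - x, axis=1)          # distance to every training instance
    neighbors = np.argsort(dists)[:k]                    # k nearest neighbors of x
    local_err = train_errors[:, neighbors].mean(axis=1)  # predicted local error per classifier
    weights = np.clip(1.0 - local_err, 0.0, None)        # smaller local error -> larger weight
    return float(np.dot(weights, predictions) / weights.sum())
```

A classifier that was accurate near x dominates the vote for x, even if its average accuracy over the whole data set is mediocre.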
Sliding Exam of a Classifier (Predictor, Interpolator)

Remove an instance y(xi) from the training set;
Use the classifier to derive the prediction y'(xi);
Evaluate the difference as the distance between the real and predicted values;
Continue for every instance.

[Figure: three plots of y(x) around the points xi-1, xi, xi+1, illustrating the successive removal and re-prediction of each instance.]
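The steps above are leave-one-out estimation of the per-instance error. A compact sketch, where `fit_predict` is a hypothetical stand-in for any classifier that trains on the reduced set and predicts the removed point:

```python
import numpy as np

def sliding_exam(X, y, fit_predict):
    """Return per-instance errors |y_i - y'_i|, where y'_i is predicted
    after removing instance i from the training set."""
    errors = np.empty(len(y))
    for i in range(len(y)):
        mask = np.arange(len(y)) != i                 # remove instance i
        y_pred = fit_predict(X[mask], y[mask], X[i])  # predict it back
        errors[i] = abs(y[i] - y_pred)                # real vs. predicted distance
    return errors
```

The resulting error vector is exactly the per-instance error record that the dynamic weighting needs.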
Brief Review of Distance Functions According to D. Wilson and T. Martinez (1997)
PEBLS Distance Evaluation for Nominal Values (According to Cost S. and Salzberg S., 1993)

The distance d between two values v1 and v2 is:

d(v1, v2) = Σ_{i=1}^{k} (C1i / C1 - C2i / C2)^2

where C1 and C2 are the numbers of instances in the training set with the selected values v1 and v2, C1i and C2i are the numbers of instances from the i-th class in which the values v1 and v2 were selected, and k is the number of classes of instances.
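The definition above can be sketched directly from class counts. The function name is hypothetical, and the squared exponent follows the formula as reconstructed here:

```python
from collections import Counter

def pebls_distance(values, classes, v1, v2):
    """d(v1, v2) = sum over classes i of (C1i/C1 - C2i/C2)^2,
    where values/classes are parallel lists over the training set."""
    c1 = [c for v, c in zip(values, classes) if v == v1]  # classes where v1 occurs
    c2 = [c for v, c in zip(values, classes) if v == v2]  # classes where v2 occurs
    n1, n2 = len(c1), len(c2)
    cnt1, cnt2 = Counter(c1), Counter(c2)
    return sum((cnt1[k] / n1 - cnt2[k] / n2) ** 2 for k in set(classes))
```

Two nominal values are close when they induce similar class distributions, regardless of their symbolic labels.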
Interpolation of Error Function Based on Hypothesis of Compactness

|x - xi| < δ (δ → 0)  =>  |ε(x) - ε(xi)| → 0

[Figure: the error function ε(x) sampled at the points x0, x1, x2, x3, x4, ..., xi, with ε(x) interpolated between them.]
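Under the compactness hypothesis, the error at an unseen x can be interpolated from the errors recorded at nearby training points. A sketch using inverse-distance weighting (the specific interpolator is an assumption, not the slide's exact choice):

```python
import numpy as np

def interpolate_error(x, xs, errs, eps=1e-12):
    """Estimate the error function at x from known errors errs at points xs."""
    d = np.abs(xs - x)
    if d.min() < eps:              # x coincides with a known point
        return float(errs[d.argmin()])
    w = 1.0 / d                    # nearer points dominate the estimate
    return float(np.dot(w, errs) / w.sum())
```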
Integration of Databases

Final classification of an instance is obtained by weighted voting of the predictions made by the classifier for every database separately;
Weighting is based on taking the integral of the error function of the classifier across every database.
Integral Weight of Classifier

[Figure: the classifier's error function ε(x) over each database DB1, ..., DBn on an interval [a, b].]

w_j = 1 - (1 / (b - a)) · ∫_a^b ε_j(x) dx
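The integral weight can be computed numerically from sampled error values. A sketch using the trapezoidal rule; the `1 - mean error` normalization follows the formula as reconstructed above and should be treated as an assumption:

```python
import numpy as np

def integral_weight(xs, errs):
    """w = 1 - (1/(b-a)) * integral of the error function over [a, b],
    with the integral approximated by the trapezoidal rule."""
    a, b = xs[0], xs[-1]
    area = np.sum((errs[:-1] + errs[1:]) / 2.0 * np.diff(xs))  # trapezoid rule
    return float(1.0 - area / (b - a))                         # 1 - mean error
```

A database on which the classifier's error is uniformly low thus receives a weight close to 1.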
Weighting Classifiers and Databases

        Classifier 1     ...  Classifier m    |
DB1     y11, w11(ε11)    ...  y1m, w1m(ε1m)   |  y1, w1
DB2     y21, w21(ε21)    ...  y2m, w2m(ε2m)   |  y2, w2
...     ...              ...  ...             |  ...
DBn     yn1, wn1(εn1)    ...  ynm, wnm(εnm)   |  yn, wn
        y1, w1           ...  ym, wm          |  y, w

Prediction and weight of a database (i = 1, ..., n):

y_i = Σ_{j=1}^{m} w_ij · y_ij / Σ_{j=1}^{m} w_ij,   w_i = Σ_{j=1}^{m} w_ij · w_ij / Σ_{j=1}^{m} w_ij

Prediction and weight of a classifier (j = 1, ..., m):

y_j = Σ_{i=1}^{n} w_ij · y_ij / Σ_{i=1}^{n} w_ij,   w_j = Σ_{i=1}^{n} w_ij · w_ij / Σ_{i=1}^{n} w_ij
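The two-way weighting over the prediction matrix can be sketched numerically. `Y[i, j]` is classifier j's prediction on database i and `W[i, j]` the corresponding weight; the function name and the weight-normalized sums are a reconstruction, not the authors' exact code:

```python
import numpy as np

def combine(Y, W):
    """Aggregate a database x classifier prediction matrix Y with weights W."""
    db_pred = (W * Y).sum(axis=1) / W.sum(axis=1)   # per-database predictions y_i
    clf_pred = (W * Y).sum(axis=0) / W.sum(axis=0)  # per-classifier predictions y_j
    overall = (W * Y).sum() / W.sum()               # overall prediction y
    return db_pred, clf_pred, overall
```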
Solutions for MANY:MANY

ONE:ONE - one database DB and one classifier;
ONE:MANY - one database DB and an ensemble of classifiers (Classifier 1, ..., Classifier m);
MANY:ONE - many databases (DB 1, ..., DB n) and one classifier;
MANY:MANY - many databases (DB 1, ..., DB n) and many classifiers (Classifier 1, ..., Classifier m).

[Diagram: the MANY:MANY case decomposed into combinations of the ONE:MANY and MANY:ONE cases.]
The MANY:MANY case can be solved directly over all cells, or through the intermediate cases:

Directly:  y = Σ_{i,j} w_ij · y_ij / Σ_{i,j} w_ij

Through databases (MANY:ONE):  y = Σ_{i=1}^{n} w_i · y_i / Σ_{i=1}^{n} w_i

Through classifiers (ONE:MANY):  y = Σ_{j=1}^{m} w_j · y_j / Σ_{j=1}^{m} w_j
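The three routes can be compared numerically. In this sketch a database's (or classifier's) weight is taken as the sum of its cell weights, an assumption under which all three routes coincide:

```python
import numpy as np

def many_many_routes(Y, W):
    """Compute the MANY:MANY prediction three ways: directly over all cells,
    through per-database aggregates, and through per-classifier aggregates."""
    direct = (W * Y).sum() / W.sum()
    w_db = W.sum(axis=1)                           # database weights
    y_db = (W * Y).sum(axis=1) / w_db              # database predictions
    via_db = (w_db * y_db).sum() / w_db.sum()
    w_clf = W.sum(axis=0)                          # classifier weights
    y_clf = (W * Y).sum(axis=0) / w_clf            # classifier predictions
    via_clf = (w_clf * y_clf).sum() / w_clf.sum()
    return direct, via_db, via_clf
```

With other choices of intermediate weights (e.g. the integral weights above), the successive and parallel routes generally give different, complementary solutions.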
Decontextualization of Predictions

Sometimes the actual value cannot be predicted as a weighted mean of the individual predictions of the classifiers in the ensemble;
This means that the actual value lies outside the range of the predictions;
It happens when the classifiers are affected by the same type of context with different power;
This results in a trend among the predictions, from the less powerful context to the most powerful one;
In this case the actual value can be obtained as the result of "decontextualization" of the individual predictions.
Neighbor Context Trend

[Figure: at a point xi, the prediction y-(xi) in the (1,2) neighbor context is the "worse context", the prediction y+(xi) in the (1,2,3) neighbor context is the "better context", and the actual value y(xi) corresponds to the "ideal context".]
Main Decontextualization Formula

y- - prediction in the worse context
y+ - prediction in the better context
y' - decontextualized prediction
y - actual value

With the deviations ε- = y- - y, ε+ = y+ - y, ε' = y' - y:

ε' = (ε- · ε+) / (ε- + ε+),   so that ε' < ε+ < ε-
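A quick numerical check of the formula as reconstructed above (the function name is hypothetical): for same-sign deviations, the decontextualized deviation is smaller than either individual one.

```python
def decontextualized_deviation(e_minus, e_plus):
    """e' = e- * e+ / (e- + e+): the decontextualized deviation from the
    actual value, given the worse- and better-context deviations."""
    return e_minus * e_plus / (e_minus + e_plus)

# Worse-context deviation 0.4, better-context deviation 0.2:
e_prime = decontextualized_deviation(0.4, 0.2)
assert e_prime < 0.2 < 0.4   # e' < e+ < e-
```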
Decontextualization

One-level decontextualization: y' is derived from the neighbor-context predictions y- and y+ by the main formula;
All-subcontexts decontextualization: the same step is applied over every chain of subcontexts;
Decontextualized difference: δ = y - y';
New sample classification: y(x) = y'(x) + δ(x).
Physical Interpretation of Decontextualization

Uncertainty is like a "resistance" to precise prediction: the predicted values act as resistances R1 and R2 connected in parallel, and the decontextualized value corresponds to the resulting resistance

R_res = (R1 · R2) / (R1 + R2),

which is smaller than either R1 or R2, just as the decontextualized prediction y' is closer to the actual value y than either y- (the prediction in the worse context) or y+ (the prediction in the better context).
Conclusion

Dynamic integration of classifiers, based on locally adaptive weights of the classifiers, handles the case «One Dataset - Many Classifiers»;
Integration of databases, based on their integral weights with respect to classification accuracy, handles the case «One Classifier - Many Datasets»;
Successive or parallel application of the two above algorithms yields a variety of solutions for the case «Many Classifiers - Many Datasets»;
Decontextualization, as the opposite of weighted voting for integrating classifiers, handles the context of classification in the case of a trend.