
Mining Several Databases with an Ensemble of Classifiers

Seppo Puuronen

Vagan Terziyan

Alexander Logvinovsky

10th International Conference and Workshop on Database and Expert Systems Applications

August 30 - September 3, 1999

Florence, Italy

DEXA-99

Authors

Department of Computer Science and Information Systems

University of Jyvaskyla, FINLAND

Seppo Puuronen, Vagan Terziyan

Department of Artificial Intelligence

Kharkov State Technical University of Radioelectronics,

UKRAINE

[email protected], [email protected]

Alexander Logvinovsky

Department of Artificial Intelligence

Kharkov State Technical University of Radioelectronics, UKRAINE

[email protected]

Contents

The problem of "multiclassifiers" - "multidatabase" mining;
Case "One Database - Many Classifiers";
Dynamic integration of classifiers;
Case "One Classifier - Many Databases";
Weighting databases;
Case "Many Databases - Many Classifiers";
Context-based trend within the classifiers' predictions and decontextualization;
Conclusion

Introduction

(Diagram: a data set of labelled pairs (x1, y1), (x2, y2), (x3, y3) is used to train a classifier; given a new sample x, the classifier must produce the result y = f(x), one of the possible results 1, 2, 3, which is unknown in advance: y = ?.)

Problem

(Diagram: there are several data sets DB 1, ..., DB n and several classifiers Classifier 1, ..., Classifier m; for a new sample x it is not obvious which classifiers and which data sets should be combined to produce the result.)

Case ONE:ONE

(Diagram: one database DB and one classifier; the classifier trained on DB predicts the unknown result "?" for a sample x.)

Case ONE:MANY

(Diagram: one database DB and an ensemble of classifiers, Classifier 1 ... Classifier m, each of which predicts the result for a sample x.)

Dynamic Integration of Classifiers

Final classification is made by weighted voting of the classifiers from the ensemble;

Weights of the classifiers are recalculated for every new instance;

Weighting is based on the predicted errors of the classifiers in the neighborhood of the instance (a minimal code sketch follows this list)
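A minimal sketch of this dynamic weighting, assuming each classifier's per-instance errors have already been estimated (for example by the sliding exam on the next slide); the neighborhood size, the inverse-error weighting, and the callable-classifier interface are illustrative choices, not the authors' exact procedure:

```python
import numpy as np

def local_classifier_weights(x, exam_points, exam_errors, k=5, eps=1e-9):
    """Weight of each classifier at instance x: high where its exam error
    in the neighborhood of x is low.

    exam_points: (N, d) array of instances used in the sliding exam
    exam_errors: (N, m) array, error of each of the m classifiers on them
    """
    dist = np.linalg.norm(exam_points - x, axis=1)   # distance to every examined instance
    nearest = np.argsort(dist)[:k]                   # neighborhood of x
    local_error = exam_errors[nearest].mean(axis=0)  # predicted error of each classifier near x
    return 1.0 / (local_error + eps)                 # low local error -> high weight

def dynamic_weighted_vote(x, classifiers, exam_points, exam_errors):
    """Final prediction for x as a weighted vote of the ensemble,
    with the weights recalculated for this particular instance."""
    weights = local_classifier_weights(x, exam_points, exam_errors)
    predictions = np.array([clf(x) for clf in classifiers])
    return np.sum(weights * predictions) / np.sum(weights)
```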

Sliding Exam of a Classifier (Predictor, Interpolator)

Remove an instance y(xi) from the training set;

Use the classifier to derive the prediction y'(xi);

Evaluate the difference as the distance between the real and the predicted values;

Continue for every instance.

(Diagram: the curve y(x) around the points xi-1, xi, xi+1, showing the removed instance at xi and its predicted value.)

A code sketch of this procedure follows.
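A sketch of the sliding exam, assuming NumPy arrays and a scikit-learn-style estimator (fit/predict) produced by a user-supplied factory; the absolute difference serves as the distance between the real and predicted values:

```python
import numpy as np

def sliding_exam(make_classifier, X, y):
    """Leave-one-out ("sliding") exam of a classifier.

    make_classifier: factory returning a fresh model with fit(X, y) / predict(X)
    Returns the per-instance distances |y(x_i) - y'(x_i)|.
    """
    n = len(X)
    errors = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i                  # remove instance i from the training set
        model = make_classifier()
        model.fit(X[mask], y[mask])               # learn without the removed instance
        y_pred = model.predict(X[i:i + 1])[0]     # predict the removed instance
        errors[i] = abs(y[i] - y_pred)            # real vs. predicted value
    return errors
```

The resulting per-instance errors are exactly what the local weighting sketch above consumes, one column per classifier.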

Brief Review of Distance Functions (According to D. Wilson and T. Martinez, 1997)

PEBLS Distance Evaluation for Nominal Values (According to Cost S. and Salzberg S., 1993)

The distance d between two values v1 and v2 is:

$$ d(v_1, v_2) = \sum_{i=1}^{k} \left| \frac{C_{1i}}{C_1} - \frac{C_{2i}}{C_2} \right|^2 $$

where C1 and C2 are the numbers of instances in the training set with the selected values v1 and v2, C1i and C2i are the numbers of instances from the i-th class where the values v1 and v2 were selected, and k is the number of classes of instances. A computational sketch follows.
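A direct computational reading of this formula (the exponent 2 follows the reconstruction above; the data in the example is made up):

```python
from collections import Counter, defaultdict

def pebls_distance(v1, v2, values, classes):
    """PEBLS distance between two nominal values v1 and v2.

    values, classes: parallel lists with the attribute value and the class of
    every training instance (so C1 = count of v1, C1i = count of v1 in class i).
    """
    counts = Counter(values)                      # C1, C2
    per_class = defaultdict(Counter)              # C1i, C2i
    for v, c in zip(values, classes):
        per_class[c][v] += 1
    return sum(
        abs(per_class[c][v1] / counts[v1] - per_class[c][v2] / counts[v2]) ** 2
        for c in set(classes)                     # sum over the k classes
    )

# toy example: one nominal attribute observed with two classes (hypothetical data)
vals = ["red", "red", "blue", "blue", "green", "red"]
cls = ["spam", "spam", "ham", "ham", "ham", "ham"]
print(pebls_distance("red", "blue", vals, cls))   # ~0.89
```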

Interpolation of Error Function Based on Hypothesis of Compactness

(Diagram: the error values measured at the examined points x0, x1, x2, ..., xi are interpolated over the whole x axis.)

Hypothesis of compactness:

$$ |x - x_i| < \delta \ (\delta \to 0) \;\Rightarrow\; |\varepsilon(x) - \varepsilon(x_i)| \to 0 $$
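One way to realize such an interpolation is inverse-distance weighting over the nearest examined points; the kernel and neighborhood size here are illustrative assumptions, and any interpolator consistent with the compactness hypothesis would do:

```python
import numpy as np

def interpolate_error(x, exam_points, exam_errors, k=3, eps=1e-9):
    """Interpolate the error function at an arbitrary point x from the errors
    measured at the examined points (sliding-exam results), giving the closest
    points the largest influence, as the compactness hypothesis suggests."""
    exam_points = np.asarray(exam_points, dtype=float)
    exam_errors = np.asarray(exam_errors, dtype=float)
    dist = np.abs(exam_points - x)
    nearest = np.argsort(dist)[:k]                # k closest examined points
    w = 1.0 / (dist[nearest] + eps)               # inverse-distance weights
    return float(np.sum(w * exam_errors[nearest]) / np.sum(w))
```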

Competence map

(Diagram: a competence map built from the absolute difference weight function; the error curves of classifiers 1, 2, 3 divide the x axis into regions A, B, C, D, in each of which a different classifier is the most competent.)
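Under the same assumptions, a competence map can be read off by comparing the interpolated error curves of the classifiers along x (regions where a given classifier has the lowest interpolated error are its competence regions); this sketch reuses interpolate_error() from above:

```python
import numpy as np

def competence_map(grid, exam_points, errors_per_classifier, k=3):
    """For every point of `grid`, return the index of the most competent
    classifier, i.e. the one whose interpolated error is lowest there."""
    best = []
    for x in grid:
        errs = [interpolate_error(x, exam_points, e, k)   # from the sketch above
                for e in errors_per_classifier]
        best.append(int(np.argmin(errs)))
    return np.array(best)
```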

Solution for ONE:MANY

(Diagram: one database DB, classifiers Classifier 1 ... Classifier m.)

$$ y = \frac{\sum_{i=1}^{m} W_i\, y_i}{\sum_{i=1}^{m} W_i} $$

where y_i is the prediction of classifier i and W_i is its weight for the classified instance.

Case MANY:ONE

(Diagram: several databases DB 1 ... DB n and one classifier; the classifier is applied with every database to predict the result "?" for a sample x.)

Integration of Databases

Final classification of an instance is obtained by weighted voting of predictions made by the classifier for every database separately;

Weighting is based on taking the integral of the error function of the classifier across every database

Integral Weight of Classifier

(Diagram: the error function ε(x) of the classifier is plotted over each database DB1 ... DBn on an interval [a, b].)

The weight of the classifier on database j is derived from the integral of its error function over that database: the smaller the average error

$$ \frac{1}{b-a}\int_{a}^{b} \varepsilon_j(x)\, dx $$

the larger the weight W_j. A numerical sketch follows.
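A numerical sketch of the integral weight, assuming the error function is known at the database instances and approximating the integral by the trapezoidal rule; turning the average error into a weight by inversion is an illustrative choice consistent with the weighted voting used elsewhere:

```python
import numpy as np

def integral_weight(xs, errors, eps=1e-9):
    """Integral weight of the classifier on one database.

    xs:     sorted instance positions spanning [a, b]
    errors: sliding-exam errors of the classifier at those positions
    """
    xs = np.asarray(xs, dtype=float)
    errors = np.asarray(errors, dtype=float)
    a, b = xs[0], xs[-1]
    # trapezoidal approximation of (1 / (b - a)) * integral of eps(x) dx
    mean_error = np.sum(0.5 * (errors[1:] + errors[:-1]) * np.diff(xs)) / (b - a)
    return 1.0 / (mean_error + eps)               # smaller integral error -> larger weight
```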

Solution for MANY:ONE

(Diagram: databases DB 1 ... DB n, one classifier.)

$$ y = \frac{\sum_{j=1}^{n} W_j\, y_j}{\sum_{j=1}^{n} W_j} $$

where y_j is the prediction obtained with database DB j and W_j is the integral weight of the classifier on that database.

Case MANY:MANY

(Diagram: several databases DB 1 ... DB n and several classifiers Classifier 1 ... Classifier m; every classifier can be combined with every database to predict the result "?" for a sample x.)

Weighting Classifiers and Databases

Each classifier i applied to each database j gives a prediction y_j^i with a weight W_j^i computed from its error ε_j^i:

                   Classifier 1            ...   Classifier m            Database total
  DB 1             y_1^1, W_1^1 (ε_1^1)    ...   y_1^m, W_1^m (ε_1^m)    y_1, W_1
  DB 2             y_2^1, W_2^1 (ε_2^1)    ...   y_2^m, W_2^m (ε_2^m)    y_2, W_2
  ...              ...                     ...   ...                     ...
  DB n             y_n^1, W_n^1 (ε_n^1)    ...   y_n^m, W_n^m (ε_n^m)    y_n, W_n
  Classifier total y^1, W^1                ...   y^m, W^m                y, W

Prediction of a database (integration over the classifiers):

$$ y_j = \frac{\sum_{i=1}^{m} W_j^i\, y_j^i}{\sum_{i=1}^{m} W_j^i} $$

Prediction of a classifier (integration over the databases):

$$ y^i = \frac{\sum_{j=1}^{n} W_j^i\, y_j^i}{\sum_{j=1}^{n} W_j^i} $$

The weights W_j of a database and W^i of a classifier are obtained by applying the same weighted averaging to the weights W_j^i themselves. A code sketch follows.
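A sketch of this table-based integration for the MANY:MANY case, given a matrix of predictions and a matching matrix of weights; the way the database weight W_j is aggregated here (a weight-weighted mean of the row's weights) is one plausible reading of the slide, not a confirmed formula:

```python
import numpy as np

def integrate_many_many(Y, W):
    """Two-stage integration of the n-databases x m-classifiers table.

    Y[j, i]: prediction of classifier i trained on database j
    W[j, i]: weight of that prediction (derived from its error)
    """
    Y = np.asarray(Y, dtype=float)
    W = np.asarray(W, dtype=float)
    y_db = np.sum(W * Y, axis=1) / np.sum(W, axis=1)   # prediction y_j of each database
    w_db = np.sum(W * W, axis=1) / np.sum(W, axis=1)   # aggregated weight W_j of each database
    return np.sum(w_db * y_db) / np.sum(w_db)          # final prediction y

# Direct integration over all classifier-database pairs is the one-line variant
#   np.sum(W * Y) / np.sum(W)
# and coincides with the two-stage result when W_j is taken as the plain row sum of W.
```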

Solutions for MANY:MANY

(Diagram: the MANY:MANY case is reduced to the simpler cases ONE:MANY, MANY:ONE and finally ONE:ONE, which gives three ways of obtaining the final prediction.)

Direct integration over all classifier-database pairs:

$$ y = \frac{\sum_{i,j} W_j^i\, y_j^i}{\sum_{i,j} W_j^i} $$

Integration over the databases (after solving ONE:MANY for each database):

$$ y = \frac{\sum_{j=1}^{n} W_j\, y_j}{\sum_{j=1}^{n} W_j} $$

Integration over the classifiers (after solving MANY:ONE for each classifier):

$$ y = \frac{\sum_{i=1}^{m} W^i\, y^i}{\sum_{i=1}^{m} W^i} $$

Decontextualization of Predictions

Sometimes the actual value cannot be predicted as a weighted mean of the individual predictions of the classifiers from the ensemble;

This means that the actual value lies outside the area of the predictions;

It happens when the classifiers are affected by the same type of context with different power;

This results in a trend among the predictions from the less powerful context towards the most powerful one;

In this case the actual value can be obtained as the result of "decontextualization" of the individual predictions

Neighbor Context Trend

(Diagram: for a point xi, the prediction y^-(xi) made in the (1, 2) neighbor context corresponds to the "worse context", the prediction y^+(xi) made in the (1, 2, 3) neighbor context corresponds to the "better context", and the actual value y(xi) corresponds to the "ideal context"; the predictions form a trend towards the actual value.)

Main Decontextualization Formula

y^- is the prediction in the worse context;
y^+ is the prediction in the better context;
y' is the decontextualized prediction;
y is the actual value.

With ε^- = |y - y^-|, ε^+ = |y - y^+| and ε' = |y - y'|:

$$ \varepsilon' = \frac{\varepsilon^- \cdot \varepsilon^+}{\varepsilon^- + \varepsilon^+}, \qquad \varepsilon' < \varepsilon^+ < \varepsilon^- $$
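A worked instance of the formula with illustrative numbers: if the error in the worse context is ε^- = 0.6 and in the better context ε^+ = 0.3, then

$$ \varepsilon' = \frac{0.6 \cdot 0.3}{0.6 + 0.3} = \frac{0.18}{0.9} = 0.2 $$

so the decontextualized prediction is expected to lie closer to the actual value than either individual prediction (0.2 < 0.3 < 0.6).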

Decontextualization

One level decontextualization;

All subcontexts decontextualization;

Decontextualized difference: δ(x) = y(x) - y'(x);

New sample classification: y(x) = y'(x) + δ(x)

Physical Interpretation of Decontextualization

(Diagram: two resistors R1 and R2 connected in parallel, labelled with the predicted values; the resulting resistance Rres corresponds to the decontextualized value, which lies closer to the actual value.)

Uncertainty is like a "resistance" to precise prediction:

$$ R_{res} = \frac{R_1 \cdot R_2}{R_1 + R_2} $$

(Diagram: on the y axis the decontextualized prediction y' lies closer to the actual value y than either y^- or y^+.)

y^- is the prediction in the worse context; y^+ is the prediction in the better context; y' is the decontextualized prediction; y is the actual value.

Conclusion

Dynamic integration of classifiers, based on locally adaptive weights of the classifiers, makes it possible to handle the case «One Dataset - Many Classifiers»;

Integration of databases, based on their integral weights relative to the classification accuracy, makes it possible to handle the case «One Classifier - Many Datasets»;

Successive or parallel application of the two above algorithms gives a variety of solutions for the case «Many Classifiers - Many Datasets»;

Decontextualization, as an alternative to weighted voting for integrating classifiers, makes it possible to handle the context of classification when there is a trend among the predictions