TRANSCRIPT

Public examination of PhD thesis:
Mykola Pechenizkiy, "Feature Extraction for Supervised Learning in Knowledge Discovery Systems"

JYU, Agora Building, Auditorium 2
December 20, 2005, 12:00

Supervisors: Prof. Seppo Puuronen (JYU), Dr. Alexey Tsymbal (TCD), Prof. Tommi Kärkkäinen (JYU)
Reviewers: Prof. Ryszard Michalski (GMU), Prof. Peter Kokol (UM)
Opponent: Dr. Kari Torkkola (Motorola Labs)
Outline
– DM and KDD background
  – KDD as a process
  – DM strategy
– Classification
  – Curse of dimensionality and indirectly relevant features
  – Feature extraction (FE) as dimensionality reduction
– Feature Extraction for Classification
  – Conventional Principal Component Analysis
  – Class-conditional FE: parametric and non-parametric
– Research Questions
– Research Methods
– Contributions
Knowledge discovery as a process
Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R., Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1997.
The task of classification

Given J classes, n training observations and p features: a training set of n instances (xi, yi), where xi are the attribute values and yi is the class label.

Goal: given a new instance x0 to be classified, predict its class y0 (the class membership of the new instance).

Examples:
– diagnosis of thyroid diseases;
– heart attack prediction, etc.
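The classification task described above can be sketched as follows. This is a minimal 1-nearest-neighbour predictor on toy data, purely illustrative; the data, seed and function name are not from the thesis.

```python
# Sketch: predict class y0 for a new instance x0 from n training
# pairs (xi, yi) by 1-nearest-neighbour lookup. Toy data only.
import numpy as np

rng = np.random.default_rng(6)
X_train = np.vstack([rng.normal(0, 1, (30, 3)), rng.normal(3, 1, (30, 3))])
y_train = np.array([0] * 30 + [1] * 30)

def predict_1nn(x0, X, y):
    """Assign x0 the class of its nearest training instance."""
    dists = np.linalg.norm(X - x0, axis=1)
    return y[np.argmin(dists)]

x0 = np.array([3.0, 3.0, 3.0])   # new instance to be classified
print(predict_1nn(x0, X_train, y_train))
```

Any learner that maps a new x0 to a predicted y0 fits this template; the thesis studies how transforming the feature space affects such learners.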
Improvement of Representation Space

– Curse of dimensionality: a drastic increase in computational complexity and classification error with data having a large number of dimensions.
– Indirectly relevant features.
FE example: "Heart Disease"

Extracted features (loadings on the original features):
  0.1·Age − 0.6·Sex − 0.73·RestBP − 0.33·MaxHeartRate
  −0.01·Age + 0.78·Sex − 0.42·RestBP − 0.47·MaxHeartRate
  −0.7·Age + 0.1·Sex − 0.43·RestBP + 0.57·MaxHeartRate

Variance covered: 100% (original) → 87% (extracted); 3NN accuracy: 60% (original) → 67% (extracted).
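A transformation like the one above can be obtained with conventional PCA. The sketch below, on made-up data (the 4-feature toy set and seed are illustrative, not the Heart Disease data), shows how components and the variance they cover are computed.

```python
# Illustrative PCA sketch: extract components as eigenvectors of the
# covariance matrix and report the variance covered by the kept ones.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))            # 100 instances, 4 features
Xc = X - X.mean(axis=0)                  # centre the data first

cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]        # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Z = Xc @ eigvecs[:, :3]                  # keep the 3 leading components
covered = eigvals[:3].sum() / eigvals.sum()
print(f"variance covered by 3 PCs: {covered:.0%}")
```

Each column of `eigvecs` holds the loadings of one extracted feature on the original features, exactly the kind of linear combination shown on the slide.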
How to construct a good representation space (RS) for SL?

[Figure: original features vs. extracted features; representations of instances of classes y1…yk; selecting the most relevant features vs. selecting the most representative instances]

Research questions:
RQ1 – How important is it to use class information in the FE process?
RQ2 – Is FE data-oriented, SL-oriented, or both?
RQ3 – Is FE for dynamic integration of base-level classifiers useful in a similar way as for a single base-level classifier?
RQ4 – Which features – original, extracted or both – are useful for SL?
RQ5 – How many extracted features are useful for SL?
RQ6 – How to cope with the presence of contextual features in data, and data heterogeneity?
RQ7 – What is the effect of sample reduction on the performance of FE for SL?
Research Problem
Studying both the theoretical background and the practical aspects of FE for SL in KDSs.

Main Contribution
A many-sided analysis of the research problem; an ensemble of relatively small contributions.

Research Method
A multimethodological approach to the construction of an artefact for DM (following Nunamaker et al., 1990–91), iterating over DM artifact development, experimentation, theory building and observation.
Further Research

[Figure: architecture of a KDD system – a KDD-Manager coordinating a meta-model (ES, KB), meta-data and meta-learning, data pre-processors, feature manipulators, instance manipulators, ML algorithms/classifiers, evaluators, post-processors/visualisers, a data generator, data sets, and a GUI]

Open questions:
– How to help in decision making on the selection of the appropriate DM strategy for the problem under consideration?
– When is FE useful for SL?
– What is the effect of FE on the interpretability of results and the transparency of SL?
Additional Slides …
Further Slides for Step-by-Step Analysis of Research Questions and Corresponding Contributions
Research Questions:
RQ1 – How important is it to use class information in the FE process?
RQ2 – Is FE a data- or hypothesis-driven constructive induction?
RQ3 – Is FE for dynamic integration of base-level classifiers useful in a similar way as for a single base-level classifier?
RQ4 – Which features – original, extracted or both – are useful for SL?
RQ5 – How many extracted features are useful for SL?
Research Questions (cont.):
RQ6 – How to cope with the presence of contextual features in data, and data heterogeneity?
RQ7 – What is the effect of sample reduction on the performance of FE for SL?
RQ8 – When is FE useful for SL?
RQ9 – What is the effect of FE on interpretability of results and transparency of SL?
RQ10 – How to make a decision about the selection of the appropriate DM strategy for the problem under consideration?
RQ1: Use of class information in FE
Tsymbal A., Puuronen S., Pechenizkiy M., Baumgarten M., Patterson D. 2002. Eigenvector-based Feature Extraction for Classification (Article I, FLAIRS’02)
Use of class information in the FE process is crucial for many datasets: class-conditional FE can result in better classification accuracy, while solely variance-based FE has no effect on the accuracy or deteriorates it.

[Figure: two scatter plots (a) and (b) of x1 vs. x2, showing the principal directions PC(1) and PC(2)]

There is no single superior technique, but nonparametric approaches are more stable across various dataset characteristics.
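The parametric class-conditional idea can be sketched in a few lines: instead of directions of maximum total variance (plain PCA), choose directions that maximise between-class scatter Sb relative to within-class scatter Sw, i.e. eigenvectors of inv(Sw)·Sb. Toy data and a minimal Fisher-style computation, not the thesis implementation:

```python
# Parametric class-conditional FE sketch: eigen-directions of
# inv(Sw) @ Sb separate the classes, unlike variance-only PCA.
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(2, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)

mean_all = X.mean(axis=0)
Sw = np.zeros((4, 4))                    # within-class scatter
Sb = np.zeros((4, 4))                    # between-class scatter
for c in np.unique(y):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    Sw += (Xc - mc).T @ (Xc - mc)
    d = (mc - mean_all).reshape(-1, 1)
    Sb += len(Xc) * (d @ d.T)

eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
order = np.argsort(eigvals.real)[::-1]
W = eigvecs[:, order].real[:, :1]        # one direction for 2 classes
Z = X @ W                                # class-conditionally extracted feature
```

With two classes there is at most one such discriminant direction; the nonparametric variants studied in Article I relax the single-Gaussian-per-class assumption behind Sw and Sb.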
RQ2: Is FE a data- or hypothesis-driven CI?
Pechenizkiy M. 2005. Impact of the Feature Extraction on the Performance of a Classifier: kNN, Naïve Bayes and C4.5 (Article III, AI’05)
[Figure: FE-for-SL pipeline – a search for the most appropriate FE technique produces an FE model that transforms the train set; a search for the most appropriate SL technique then runs the SL process on the transformed train set to build an SL model, which yields predictions on the correspondingly transformed test set]

The ranking of different FE techniques according to the corresponding accuracy results of an SL technique can vary a lot across datasets. Different FE techniques also behave differently when integrated with different SL techniques. Hence, the selection of an FE method is not independent of the selection of the classifier.
RQ3: FE for Dynamic Integration of Classifiers
[Figure: training and application phases of dynamic integration of classifiers. Training phase: the data set is divided into a training set, a validation set and a test set; the random subspace method RSM(S, N), combined with feature extraction (PCA, parametric, non-parametric) and feature-subset refinement, produces transformed training subsets TS1…TSS; base classifiers BC1…BCS are trained on them, and local accuracy estimates form the meta-data. Application phase: for each new instance the nearest neighbours are searched in the transformed space, WNN predicts the local errors of every BC for each neighbour, and the ensemble is combined by Dynamic Selection, Dynamic Voting, or Dynamic Voting with Selection. Legend: S – size of the ensemble, N – number of features, TS – training subset, BC – base classifier, NN – nearest neighbourhood.]
(Article VIII, Pechenizkiy et al., 2005)
RQ4: How to construct good RS for SL?
Pechenizkiy M., Tsymbal A., Puuronen S. 2004. PCA-based feature transformation for classification: issues in medical diagnostics, (Article II, CBMS’2004)
[Figure: original vs. extracted features; representations of instances of classes y1…yk; selecting the most relevant features vs. selecting the most representative instances]

Which features – original, extracted or both – are useful for SL? A combination of the original features with extracted features can be beneficial for SL on many datasets, especially when tree-based inducers like C4.5 are used for classification.
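The "both" option above is simply a concatenation of the original attributes with the extracted components, so the learner can exploit whichever helps. A toy numpy sketch with PCA as the extractor (data and sizes are illustrative only):

```python
# Sketch: build a representation space from original + extracted
# features by appending the leading PCs to the original attributes.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 5))
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pcs = Xc @ eigvecs[:, np.argsort(eigvals)[::-1][:2]]   # 2 leading PCs

X_combined = np.hstack([X, pcs])   # original + extracted features
print(X_combined.shape)            # (60, 7)
```

Any SL technique is then trained on `X_combined` instead of `X` or `pcs` alone.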
RQ4: How to construct good RS for SL? (cont.)
Pechenizkiy M., Tsymbal A., Puuronen S. 2005. On Combining Principal Components with Parametric LDA-based Feature Extraction for Supervised Learning. (Article III, FCDS)
[Figure: pipeline – PCA produces PCs and parametric LDA produces LDs from the training data; SL is trained on PCs + LDs; the test data are transformed accordingly and the classifier's accuracy is measured]

[Figure: bar charts comparing LDA, PCA and LDA+PCA for 3NN, NB and C4.5 – accuracies in the 0.70–0.80 range, and the numbers of features used (0–12)]
RQ5: How many extracted features are useful?
Criteria for selecting the most useful transformed features are often based on the variance accounted for by the features to be selected:
– keep all the components whose corresponding eigenvalues are significantly greater than one;
– a ranking procedure: select the principal components that have the highest correlations with the class attribute.

[Figure: eigenvalues plotted against the numbers of features and instances]
Pechenizkiy M., Tsymbal A., Puuronen S. 2004. PCA-based feature transformation for classification: issues in medical diagnostics, (Article II, CBMS’2004)
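The two selection criteria listed above can be sketched as follows, on made-up standardised data (data, seed and the "top 3" cut-off are illustrative, not results from the thesis):

```python
# Sketch of two criteria for choosing extracted components:
# (a) eigenvalue > 1 (Kaiser-style rule on standardised data);
# (b) ranking components by |correlation| with the class attribute.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 6))
y = (X[:, 0] + 0.1 * rng.normal(size=80) > 0).astype(float)

Xs = (X - X.mean(axis=0)) / X.std(axis=0)        # standardise first
eigvals, eigvecs = np.linalg.eigh(np.cov(Xs, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
Z = Xs @ eigvecs                                  # all components

keep_var = np.where(eigvals > 1.0)[0]             # criterion (a)

corr = np.array([abs(np.corrcoef(Z[:, j], y)[0, 1])
                 for j in range(Z.shape[1])])
keep_cls = np.argsort(corr)[::-1][:3]             # criterion (b): top 3
```

Criterion (a) ignores the class entirely; criterion (b) is supervised, which is why the two can keep quite different component sets.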
RQ6: How to cope with data heterogeneity?
Pechenizkiy M., Tsymbal A., Puuronen S. 2005. Supervised Learning and Local Dimensionality Reduction within Natural Clusters: Biomedical Data Analysis. (T-ITB, Special Issue "Mining Biomedical Data")
[Figure: three strategies compared on training and test data – (1) plain SL producing a single classifier; (2) SL after global dimensionality reduction (DR); (3) natural clustering into Cluster1…Clustern, then SL (with or without local DR) within each cluster, producing local classifiers C1…Cn; accuracy is measured for each strategy]
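The local-DR idea in strategy (3) can be sketched as: split the data into natural clusters, then reduce dimensionality within each cluster separately instead of globally. Toy data with two known centroids standing in for a clustering step (everything here is illustrative, not the T-ITB experiment):

```python
# Sketch: assign instances to natural clusters, then apply PCA
# locally within each cluster rather than once over all the data.
import numpy as np

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, (40, 4)), rng.normal(5, 1, (40, 4))])
centroids = np.array([[0.0] * 4, [5.0] * 4])

# nearest-centroid assignment stands in for a clustering algorithm
labels = np.argmin(
    np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2), axis=1)

def local_pca(Xc, k=2):
    """Project one cluster onto its own k leading principal components."""
    Xm = Xc - Xc.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Xm, rowvar=False))
    return Xm @ vecs[:, np.argsort(vals)[::-1][:k]]

reduced = {c: local_pca(X[labels == c]) for c in np.unique(labels)}
```

A separate classifier Ci is then trained on each locally reduced cluster, as in the figure.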
RQ7: What is the effect of sample reduction?
Pechenizkiy M., Puuronen S., Tsymbal A. 2005. The Impact of Sample Reduction on PCA-based Feature Extraction for Naïve Bayes Classification. (Article V, ACM SAC’06: DM Track)
[Figure: sample-reduction schemes before FE + SL. From a data set of N instances with classes 1…c (N1…Nc instances per class), a sample of size S is drawn either by random sampling over the whole data, by stratified random sampling taking p% of each class (so that Σ Si = S and the class distribution is preserved), or by kd-tree based selection, where a kd-tree is built over the data (per class) and representatives are taken from its leaves; k nearest neighbours per region feed the reduced sample]
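One of the reduction schemes in the figure, stratified random sampling keeping p% of every class, can be sketched as below (toy data; the function name and sizes are illustrative):

```python
# Sketch: stratified random sampling that keeps roughly p% of the
# instances of each class, preserving the class distribution.
import numpy as np

def stratified_sample(X, y, p, rng):
    """Keep about p% of the instances of every class."""
    keep = []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        k = max(1, int(round(len(idx) * p / 100)))
        keep.append(rng.choice(idx, size=k, replace=False))
    keep = np.concatenate(keep)
    return X[keep], y[keep]

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))
y = np.array([0] * 150 + [1] * 50)
Xs, ys = stratified_sample(X, y, p=20, rng=rng)
print(len(ys), (ys == 0).sum(), (ys == 1).sum())   # 40 30 10
```

FE (here PCA) is then fit on the reduced sample `Xs` before SL, which is exactly the setting whose effect RQ7 measures.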
RQ8: When FE is useful for SL?
Kaiser-Meyer-Olkin (KMO) criterion: accounts for total and partial correlation:

  KMO = Σ_{i≠j} r_ij² / ( Σ_{i≠j} r_ij² + Σ_{i≠j} a_ij² ),

where r_ij are the pairwise correlations between features X_i and X_j, and the partial correlations are a_ij = −R_ij / √(R_ii · R_jj), with R the inverse of the correlation matrix.

General recommendation: IF KMO > 0.5 THEN apply PCA.

This recommendation rarely works in the context of SL.
RQ9: What is the effect of FE on interpretability?
Pechenizkiy M., Tsymbal A., Puuronen S. 2004. PCA-based feature transformation for classification: issues in medical diagnostics, (Article II, CBMS’2004)
Interpretability refers to whether a classifier is easy to understand:
– rule-based classifiers like decision trees and association rules are very easy to interpret;
– neural networks and other connectionist, "black-box" classifiers have low interpretability.

FE enables:
– new concepts – new understanding;
– summarising the information from a large number of features into a limited number of components;
– transformation formulae that provide information about the importance of the original features;
– a better RS – a better neighbourhood – better interpretability by analogy with similar medical cases;
– visual analysis by projecting the data onto 2D or 3D plots.
RQ9: Feature Extraction & Interpretability (cont.)
Objectivity of interpretability:
– The assessment of interpretability relies on the user's perception of the classifier.
– The assessment of an algorithm's practicality depends much on a user's background, preferences and priorities.
– Most of the characteristics related to practicality can be described only by reporting users' subjective evaluations.

Thus, the interpretability issues are disputable and difficult to evaluate, and many conclusions on interpretability are relative and subjective. Collaboration between DM researchers and domain experts is needed for further analysis of interpretability issues.
Pechenizkiy M., Tsymbal A., Puuronen S. 2004. PCA-based feature transformation for classification: issues in medical diagnostics, (Article II, CBMS’2004)
RQ10: Framework for DM Strategy Selection
Pechenizkiy M. 2005. DM strategy selection via empirical and constructive induction. (Article IX, DBA’05)
[Figure: architecture for DM strategy selection – a KDD-Manager coordinating a meta-model (ES, KB), meta-data and meta-learning, data pre-processors, feature manipulators, instance manipulators, ML algorithms/classifiers, evaluators, post-processors/visualisers, a data generator, data sets, and a GUI]
Additional Slides …
Meta-Learning
[Figure: meta-learning scheme – a collection of data sets and a collection of techniques are evaluated against performance criteria; the results populate a meta-learning space and a knowledge repository, from which a meta-model is built that suggests a technique for a new data set]
New Research Framework for DM Research
[Figure: Environment (people, organizations, technology) states business needs, ensuring relevance; DM Research develops/builds and justifies/evaluates artefacts in an assess–refine cycle; the Knowledge Base (foundations, design knowledge) supplies applicable knowledge, ensuring rigor, and receives contributions; (un-)successful applications return to the appropriate environment]
[Figure, detailed version: Environment – People (roles, capabilities, characteristics), Organizations (strategy, structure & culture, processes), Technology (infrastructure, applications, communications architecture, development capabilities). Knowledge Base – Foundations (base-level theories, frameworks, models, instantiations, validation criteria) and Design knowledge (methodologies, validation criteria; not instantiations of models but KDD processes, services, systems). DM Research – Develop/Build (theories, artifacts) and Justify/Evaluate (analytical, case study, experimental, field study, simulation) in an assess–refine cycle. Business needs ensure relevance; applicable knowledge ensures rigor; (un-)successful applications go to the appropriate environment and contributions to the knowledge base]
New Research Framework for DM Research
… following Hevner et al. framework
Some Multidisciplinary Research
– Pechenizkiy M., Puuronen S., Tsymbal A. 2005. Why Data Mining Does Not Contribute to Business? In: C. Soares et al. (Eds.), Proc. of Data Mining for Business Workshop, DMBiz (ECML/PKDD'05), Porto, Portugal, pp. 67-71.
– Pechenizkiy M., Puuronen S., Tsymbal A. 2005. Competitive Advantage from Data Mining: Lessons Learnt in the Information Systems Field. In: IEEE Workshop Proc. of DEXA'05, 1st Int. Workshop on Philosophies and Methodologies for Knowledge Discovery, PMKD'05, IEEE CS Press, pp. 733-737 (invited paper).
– Pechenizkiy M., Puuronen S., Tsymbal A. 2005. Does the Relevance of Data Mining Research Matter? (resubmitted as a book chapter to) Foundations of Data Mining, Springer.
– Pechenizkiy M., Tsymbal A., Puuronen S. 2005. Knowledge Management Challenges in Knowledge Discovery Systems. In: IEEE Workshop Proc. of DEXA'05, 6th Int. Workshop on Theory and Applications of KM, TAKMA'05, IEEE CS Press, pp. 433-437.
Some Applications
– Pechenizkiy M., Tsymbal A., Puuronen S., Shifrin M., Alexandrova I. 2005. Knowledge Discovery from Microbiology Data: Many-sided Analysis of Antibiotic Resistance in Nosocomial Infections. In: K.D. Althoff et al. (Eds.), Post-Conference Proc. of 3rd Conf. on Professional Knowledge Management: Experiences and Visions, LNAI 3782, Springer Verlag, pp. 360-372.
– Pechenizkiy M., Tsymbal A., Puuronen S. 2005. Supervised Learning and Local Dimensionality Reduction within Natural Clusters: Biomedical Data Analysis. (T-ITB, Special Issue "Mining Biomedical Data")
– Tsymbal A., Pechenizkiy M., Cunningham P., Puuronen S. 2005. Dynamic Integration of Classifiers for Handling Concept Drift. (submitted to Special Issue on Application of Ensembles, Information Fusion, Elsevier)
Contact Info

Mykola Pechenizkiy
Department of Computer Science and Information Systems,
University of Jyväskylä, FINLAND
E-mail: [email protected]
Tel. +358 14 2602472, Mobile: +358 44 3851845
Fax: +358 14 2603011
www.cs.jyu.fi/~mpechen

THANK YOU!

MS PowerPoint slides of recent talks and full texts of selected publications are available online at: www.cs.jyu.fi/~mpechen