fuzzy-rough instance selection
Richard Jensen and Chris Cornelis
Chris Cornelis, Ghent University, Belgium
Richard Jensen, Aberystwyth University, UK
Fuzzy-Rough Instance Selection
Outline
• The importance of instance selection
• Rough set theory
• Fuzzy-rough sets
• Fuzzy-rough instance selection
• Experimentation
• Conclusion
Instance selection
• Knowledge discovery
• The problem of too much data
  • Requires storage
  • Intractable for data mining algorithms
• Removing data that is noisy or irrelevant
Rough set theory
Rx is the set of all points that are indiscernible with point x
[Figure: a set A with its lower approximation and upper approximation, drawn over equivalence classes Rx]
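The lower and upper approximations can be illustrated with a small sketch, using the standard crisp rough set definitions: the lower approximation collects every equivalence class Rx that lies entirely inside A, and the upper approximation collects every class that overlaps A. The function names and the toy data below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def equivalence_classes(objects, features):
    """Group object ids whose values agree on every listed feature."""
    classes = defaultdict(set)
    for obj_id, values in objects.items():
        key = tuple(values[f] for f in features)
        classes[key].add(obj_id)
    return list(classes.values())


def approximations(objects, features, target):
    """Return the (lower, upper) approximations of the set `target`."""
    lower, upper = set(), set()
    for rx in equivalence_classes(objects, features):
        if rx <= target:   # Rx lies entirely inside A
            lower |= rx
        if rx & target:    # Rx overlaps A
            upper |= rx
    return lower, upper


# Hypothetical toy data: four objects described by two discrete features.
objects = {
    "x1": {"a": 0, "b": 1},
    "x2": {"a": 0, "b": 1},
    "x3": {"a": 1, "b": 0},
    "x4": {"a": 1, "b": 1},
}
A = {"x1", "x3"}
lower, upper = approximations(objects, ["a", "b"], A)
```

Here x1 and x2 are indiscernible, so x1 cannot be placed in the lower approximation of A even though it belongs to A; it appears only in the upper approximation, together with x2.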
Fuzzy-rough sets
• Approximate equality
• Handle real-valued features via fuzzy tolerance relations instead of crisp equivalence
• Better noise and uncertainty handling
• Focus has been on feature selection, not instance selection
Fuzzy-rough sets
• Parameterized relation
• Fuzzy-rough definitions:
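As a rough illustration of how such definitions can be computed, the sketch below uses one common formulation, which is an assumption here and not necessarily the one on the slide: a per-feature tolerance relation max(0, 1 - alpha * |a(x) - a(y)| / range(a)) combined across features with the minimum, a lower approximation taken as the infimum of an implicator, and an upper approximation taken as the supremum of a t-norm. The Łukasiewicz connectives, the parameter name `alpha`, and all function names are illustrative choices.

```python
def similarity(x, y, ranges, alpha=1.0):
    """Fuzzy tolerance relation between two feature vectors.

    `alpha` is an assumed granularity parameter: larger values make
    the relation stricter.
    """
    per_feature = [
        max(0.0, 1.0 - alpha * abs(xi - yi) / r)
        for xi, yi, r in zip(x, y, ranges)
    ]
    return min(per_feature)


def implicator(a, b):
    """Lukasiewicz implicator (an assumed choice)."""
    return min(1.0, 1.0 - a + b)


def tnorm(a, b):
    """Lukasiewicz t-norm (an assumed choice)."""
    return max(0.0, a + b - 1.0)


def lower_upper(data, membership, alpha=1.0):
    """Fuzzy-rough lower and upper approximation of a fuzzy set.

    `membership[i]` is the degree to which object i belongs to the set.
    """
    n_features = len(data[0])
    # Feature ranges, with 1.0 as a fallback for constant features.
    ranges = [
        (max(row[k] for row in data) - min(row[k] for row in data)) or 1.0
        for k in range(n_features)
    ]
    lower, upper = [], []
    for x in data:
        sims = [similarity(x, y, ranges, alpha) for y in data]
        lower.append(min(implicator(s, m) for s, m in zip(sims, membership)))
        upper.append(max(tnorm(s, m) for s, m in zip(sims, membership)))
    return lower, upper


# Hypothetical toy data: three objects with one real-valued feature.
data = [[0.0], [0.1], [1.0]]
membership = [1.0, 0.0, 0.0]   # only the first object belongs to the set
lower, upper = lower_upper(data, membership)
```

In this toy example the first object belongs to the set, but its close neighbour at 0.1 does not, so its lower approximation membership drops to about 0.1 while its upper approximation membership stays at 1.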
Instance selection: basic idea
Remove objects to keep the underlying approximations unchanged
[Figure: some objects are labelled "Not needed"]
FRIS-I
FRIS-II
FRIS-III
Experimentation: setup
Results: FRIS-I (heart)
• (270 objects, 13 features)
Results: FRIS-II (heart)
Results: FRIS-III (heart)
Conclusion
• Proposed new techniques for instance selection based on fuzzy-rough sets
• Managed to reduce the number of instances significantly, retaining classification accuracy
• Future work
  • Many possibilities for novel fuzzy-rough instance selection methods
  • Comparisons with non-rough techniques
  • Improving the complexity of FRIS-III
  • Combined instance/feature selection
• WEKA implementations of all fuzzy-rough methods can be downloaded from:
http://users.aber.ac.uk/rkj/book/weka.zip