TRANSCRIPT
Ensemble Approaches to Feature Selection – Rough Set Methods (Zespołowe Podejścia do Selekcji Cech – Metody Zbiorów Przybliżonych)
Dominik Ślęzak
January 8, 2013
Poznań, Poland
Introduction
• The theory of rough sets, founded in 1981 by Prof. Zdzisław Pawlak, provides the means for handling incompleteness and uncertainty in large data sets
• In the process of knowledge discovery, one can search for decision reducts, which are irreducible subsets of attributes determining decision values
• Dependencies in data can be expressed in terms of, e.g., discernibility or rough set approximations
• There are also rough-set-inspired computational models, such as rough clustering, rough SQL, etc.
Outlook Temp. Humid. Wind Sport?
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cold Normal Weak Yes
6 Rain Cold Normal Strong No
7 Overcast Cold Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cold Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No
IF (H=Normal) AND (T=Mild) THEN (S=Yes)
It corresponds to a data block included in the positive region of the decision class "Yes"
Decision Tables & Rules
(The decision table is repeated here, with the classes of objects sharing the same values of Temp. and Humid. marked against the decision class Sport? = Yes.)
Rules & Indiscernibility Classes
Rules & Rough Set Approximations
Lower approximation: objects certainly in X (the exact rules for X)
Upper approximation: objects that may be in X (the rules which don't exclude X)
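To make the two approximations concrete, here is a minimal Python sketch (not part of the original slides; all identifiers are illustrative) that computes indiscernibility classes and the lower/upper approximations of the decision class Sport? = Yes for the table above, using only Temp. and Humid.:

```python
from collections import defaultdict

# Decision table from the Sport? example: (Outlook, Temp, Humid, Wind, Sport?).
data = {
    1: ("Sunny", "Hot", "High", "Weak", "No"),     2: ("Sunny", "Hot", "High", "Strong", "No"),
    3: ("Overcast", "Hot", "High", "Weak", "Yes"), 4: ("Rain", "Mild", "High", "Weak", "Yes"),
    5: ("Rain", "Cold", "Normal", "Weak", "Yes"),  6: ("Rain", "Cold", "Normal", "Strong", "No"),
    7: ("Overcast", "Cold", "Normal", "Strong", "Yes"), 8: ("Sunny", "Mild", "High", "Weak", "No"),
    9: ("Sunny", "Cold", "Normal", "Weak", "Yes"), 10: ("Rain", "Mild", "Normal", "Weak", "Yes"),
    11: ("Sunny", "Mild", "Normal", "Strong", "Yes"), 12: ("Overcast", "Mild", "High", "Strong", "Yes"),
    13: ("Overcast", "Hot", "Normal", "Weak", "Yes"), 14: ("Rain", "Mild", "High", "Strong", "No"),
}
ATTRS = {"Outlook": 0, "Temp": 1, "Humid": 2, "Wind": 3}

def ind_classes(attrs):
    """Indiscernibility classes: objects sharing the same values on the chosen attributes."""
    classes = defaultdict(set)
    for obj, row in data.items():
        classes[tuple(row[ATTRS[a]] for a in attrs)].add(obj)
    return list(classes.values())

def approximations(attrs, decision_value):
    """Lower and upper approximation of the decision class Sport? = decision_value."""
    target = {obj for obj, row in data.items() if row[4] == decision_value}
    lower, upper = set(), set()
    for cls in ind_classes(attrs):
        if cls <= target:   # the whole class is inside the concept -> certain objects
            lower |= cls
        if cls & target:    # the class overlaps the concept -> possible objects
            upper |= cls
    return lower, upper

lower, upper = approximations(["Temp", "Humid"], "Yes")
print(sorted(lower))   # [10, 11, 13] -- includes the block of the rule IF H=Normal AND T=Mild
print(sorted(upper))   # all 14 objects: no Temp./Humid. class excludes Sport? = Yes
```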
Reducts Preserving Positive Region
• Indiscernibility classes gather objects with (almost) the same values on a subset of attributes
• If some attributes are removed, indiscernibility classes may be merged
• So, lower approximations of some decision classes may (significantly or just slightly) decrease
Reducts are Everywhere!!!
• In rough sets, reducts are irreducible subsets of attributes that provide specified information
• In databases, we have keys, multivalued dependencies, soft dependencies, etc.
• In probabilistic modeling, we have Markov boundaries (probabilistic decision reducts)
• In bioinformatics, we have signatures: irreducible subsets of genes providing enough information about cancer
An Ensemble of Decision Reducts
• A family of reducts that together cover many attributes, but with only a small number of attributes repeated across different reducts
• Ensembles of classifiers – diversity improves predictive performance
• Knowledge bases – more complete knowledge about data dependencies
• Domain experts – lower risk of removing important information from the decision model
Different Approaches to Attribute Reduction
• Algorithms & Structures:
  – Exhaustive search, greedy heuristics, genetic algorithms, attribute clustering
  – Discernibility matrices, data sorting, hash tables, SQL-based algorithms
• Reduction Constraints:
  – Keep (almost) the same approximations of decision classes
  – Discern between (almost) all pairs of objects with different decision values
  – Keep the value of some quality function at (almost) the same level
• Optimization Goals:
  – Search for minimal reducts
  – Search for reducts inducing a minimum number of rules
  – Search for ensembles of different reducts
• POS(d/B): the number of objects belonging to lower approximations of decision classes
• Ind(d/B) = Disc(B∪{d}) – Disc(B), where Disc(C) = |{(x,y): a(x)≠a(y) for some a∈C}|
• Relative Gain R(d/B) = …
Decision measures to be preserved at (almost!) the same level during the process of attribute reduction
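For illustration (not on the slides), a short Python sketch of the Disc and Ind measures defined above, computed on the small four-object numeric table used later in the gene-expression example; unordered pairs are counted here, which differs from the ordered-pair definition only by a constant factor of two:

```python
from itertools import combinations

# Toy numeric table: conditions a, b, c and decision d (the four-object example below).
rows = [
    {"a": 3, "b": 7, "c": 3, "d": 0},
    {"a": 2, "b": 1, "c": 0, "d": 1},
    {"a": 4, "b": 0, "c": 6, "d": 1},
    {"a": 0, "b": 5, "c": 1, "d": 2},
]

def disc(cols):
    """Disc(C): number of object pairs discerned by at least one attribute in C."""
    return sum(1 for x, y in combinations(rows, 2) if any(x[a] != y[a] for a in cols))

def ind(decision, cols):
    """Ind(d/B) = Disc(B ∪ {d}) - Disc(B): pairs discerned only thanks to the decision."""
    return disc(set(cols) | {decision}) - disc(cols)

print(ind("d", set()))    # 5: with no attributes, five pairs differ solely on the decision
print(ind("d", {"a"}))    # 0: attribute a alone already discerns every such pair
```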
Attribute Reduction based on o-GA
• Genetic level, where each chromosome encodes a permutation τ of attributes
• Heuristic level, where the permutations τ are fed into the following algorithm:
  1. For τ: {1,...,|A|} → {1,...,|A|}, let Bτ = A;
  2. For i = 1 to |A|, repeat steps 3 and 4;
  3. Let Bτ ← Bτ \ {aτ(i)};
  4. If R(d/Bτ) < (1–ε) R(d/A), undo step 3;
Here, a failure of an arbitrary constraint for preserving information can be plugged into step 4.
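A minimal Python sketch of the heuristic level above (an illustration, not the original implementation): attributes are dropped in the order given by a permutation, and a removal is undone whenever the plugged-in quality measure falls below (1–ε)·R(d/A). Here R(d/B) is taken to be |POS(d/B)| on the Sport? table; all names are illustrative.

```python
import random
from collections import defaultdict

# The Sport? table: indices 0..3 are condition attributes, the last entry is the decision.
rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),     ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cold", "Normal", "Weak", "Yes"),  ("Rain", "Cold", "Normal", "Strong", "No"),
    ("Overcast", "Cold", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cold", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]
NAMES = ["Outlook", "Temp", "Humid", "Wind"]

def pos_size(B):
    """R(d/B) taken as |POS(d/B)|: objects lying in decision-pure indiscernibility classes."""
    classes = defaultdict(list)
    for r in rows:
        classes[tuple(r[i] for i in B)].append(r[-1])
    return sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)

def reduce_by_permutation(tau, quality, eps=0.0):
    """Steps 1-4 from the slide: try removing attributes in the order tau, undo on failure."""
    A = list(range(len(NAMES)))
    threshold = (1 - eps) * quality(A)
    B = list(A)
    for i in tau:
        candidate = [a for a in B if a != i]
        if quality(candidate) >= threshold:   # the constraint still holds, keep the removal
            B = candidate
    return B

for tau in ([0, 1, 2, 3], [3, 2, 1, 0], random.sample(range(4), 4)):
    reduct = reduce_by_permutation(tau, pos_size)
    print(tau, "->", [NAMES[i] for i in sorted(reduct)])
```

Different permutations map to different reducts (the two fixed permutations above yield {Outlook, Humid, Wind} and {Outlook, Temp, Wind}), which is exactly what the genetic level exploits when searching for minimal or mutually diverse reducts.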
Reducts mapped by most permutations:
• Those with the smallest cardinality
• Those with the smallest intersections with other reducts
• A good basis for constructing an ensemble
[Diagram: attribute space and reducts]
The Case Study of Gene Expression Data
• Thousands of genes (attributes) to analyze
• The number of experiments (objects) is quite low
• Simple knowledge representation is needed
• Black-box approaches unacceptable
• Standard discretization unacceptable
• Rules too detailed for this level
a b c d
u1 3 7 3 0
u2 2 1 0 1
u3 4 0 6 1
u4 0 5 1 2
a* b* c* d*
(u1,u1) 1+ 1+ 1+ 0
(u1,u2) 1– 1– 1– 1
(u1,u3) 1+ 1– 1+ 1
(u1,u4) 1– 1– 1– 2
(u2,u1) 2+ 2+ 2+ 0
(u2,u2) 2+ 2+ 2+ 1
(u2,u3) 2+ 2– 2+ 1
(u2,u4) 2– 2+ 2+ 2
(u3,u1) 3– 3+ 3– 0
(u3,u2) 3– 3+ 3– 1
(u3,u3) 3+ 3+ 3+ 1
(u3,u4) 3– 3+ 3– 2
(u4,u1) 4+ 4+ 4+ 0
(u4,u2) 4+ 4– 4– 1
(u4,u3) 4+ 4– 4+ 1
(u4,u4) 4+ 4+ 4+ 2
IF a≥3 AND b≥7 THEN d=0
IF a≥3 AND b<7 THEN d=1
IF a≥2 AND b<1 THEN d=1
IF a<2 AND b≥1 THEN d=2
IF a≥4 AND b≥0 THEN d=1
IF a≥0 AND b<5 THEN d=1
POS(a*,b*) = POS(a*,b*,c*)
POS(a*) ⊂ POS(a*,b*,c*)
POS(b*) ⊂ POS(a*,b*,c*)
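One consistent reading of the transformed table above (a sketch, not necessarily the authors' exact implementation): each pair (ui, uj) receives, for every condition attribute, the label "i+" if the value at uj is at least the value at ui and "i-" otherwise, while the new decision is simply d(uj):

```python
# The original four-object table: conditions a, b, c and decision d.
table = {
    "u1": {"a": 3, "b": 7, "c": 3, "d": 0},
    "u2": {"a": 2, "b": 1, "c": 0, "d": 1},
    "u3": {"a": 4, "b": 0, "c": 6, "d": 1},
    "u4": {"a": 0, "b": 5, "c": 1, "d": 2},
}

def roughfy(table, conditions=("a", "b", "c"), decision="d"):
    """Build the pairwise 'roughfied' table with symbolic values such as '1+' and '1-'."""
    pairs = {}
    for i, (ui, vi) in enumerate(table.items(), start=1):
        for uj, vj in table.items():
            row = {f"{a}*": (f"{i}+" if vj[a] >= vi[a] else f"{i}-") for a in conditions}
            row[f"{decision}*"] = vj[decision]   # the decision is copied from the second object
            pairs[(ui, uj)] = row
    return pairs

for pair, row in roughfy(table).items():
    print(pair, row)   # e.g. ('u1', 'u3') -> {'a*': '1+', 'b*': '1-', 'c*': '1+', 'd*': 1}
```

Rules over the roughfied table, such as IF a*=1+ AND b*=1+ THEN d*=0, translate back into cut-based rules against the values of the reference object, here IF a≥3 AND b≥7 THEN d=0, which matches the rules listed above.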
Fuzzy Discernibility & Discretization
• Consider the following modification of Ind(d/B) for numeric attributes B: …
• The above measure equals: … where …
• An analogous relationship can also be obtained for a numeric decision attribute d
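The formulas from this slide are not reproduced in the transcript, so the sketch below only illustrates the general idea of non-crisp discernibility for numeric attributes: the 0/1 indicator a(x)≠a(y) is replaced by a graded degree of difference. The particular degree used here (a range-normalized absolute difference) is an assumption of this sketch, not the measure from the slide.

```python
from itertools import combinations

# Values of a single numeric attribute over a handful of objects (toy data).
values = {"u1": 3.0, "u2": 2.0, "u3": 4.0, "u4": 0.0}

def crisp_disc(vals):
    """Crisp discernibility: a pair counts in full whenever the two values differ at all."""
    return sum(1.0 for x, y in combinations(vals, 2) if vals[x] != vals[y])

def fuzzy_disc(vals, span=None):
    """Graded discernibility: each pair contributes proportionally to how far apart it is.

    The degree min(1, |v(x) - v(y)| / span) is only an illustrative choice of membership.
    """
    if span is None:
        span = (max(vals.values()) - min(vals.values())) or 1.0
    return sum(min(1.0, abs(vals[x] - vals[y]) / span) for x, y in combinations(vals, 2))

print(crisp_disc(values), fuzzy_disc(values))   # 6.0 vs. 3.25: close values discern only partially
```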
[Diagram: clusters of attributes → reducts with cluster representatives → feedback]
Inspirations for Attribute Clustering
• Grużdź, Ihnatowicz, Ślęzak: Interactive Gene Clustering – A Case Study of Breast Cancer Microarray Data. Information Systems Frontiers 8 (2006)
• Abeel et al.: Robust Biomarker Identification for Cancer Diagnosis with Ensemble Feature Selection Methods. Bioinformatics 26(3) (2010)
• Janusz, Ślęzak: Utilization of Attribute Clustering Methods for Scalable Computation of Reducts from High-Dimensional Data. FedCSIS 2012
Reduct Computation Scalability
• With respect to the number of objects:
  – MapReduce methodology
  – SQL-based algorithms, etc.
• With respect to the number of attributes:
  – Linear complexity of heuristic algorithms with respect to the number of attributes
  – Attribute clustering – grouping together attributes that are replaceable / interchangeable in reducts
• With respect to the types of reduction criteria:
  – A need for meta-algorithms that work in a similar way with different types of constraints, different parameter settings, and different types of data
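As a toy illustration of the attribute-clustering idea mentioned above (not the algorithm from the FedCSIS 2012 paper), one can group attributes that discern roughly the same pairs of objects and then search for reducts among cluster representatives only; the similarity measure and the threshold below are assumptions of this sketch.

```python
from itertools import combinations

# Toy data: each attribute given as the column of its values over five objects.
columns = {
    "a1": (1, 1, 0, 0, 1),
    "a2": (2, 2, 0, 0, 2),   # induces the same partition as a1 -> interchangeable with it
    "a3": (0, 1, 0, 1, 1),
    "a4": (5, 6, 5, 6, 6),   # induces the same partition as a3
}

def discerned_pairs(col):
    """Pairs of objects discerned by a single attribute."""
    return {(i, j) for i, j in combinations(range(len(col)), 2) if col[i] != col[j]}

def similarity(c1, c2):
    """Jaccard similarity of the discerned-pair sets (one possible replaceability proxy)."""
    p1, p2 = discerned_pairs(c1), discerned_pairs(c2)
    return len(p1 & p2) / len(p1 | p2) if p1 | p2 else 1.0

# Greedy single-pass grouping: an attribute joins the first cluster it is similar enough to.
clusters, threshold = [], 0.9
for name, col in columns.items():
    for cluster in clusters:
        if similarity(col, columns[cluster[0]]) >= threshold:
            cluster.append(name)
            break
    else:
        clusters.append([name])

print(clusters)                    # [['a1', 'a2'], ['a3', 'a4']]
print([c[0] for c in clusters])    # reduct computation can then use these representatives only
```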
Bireducts – subsets of attributes paired with subsets of objects, for which the corresponding classifiers work well
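In the simplest (decision) case, this can be read as follows: (B, X) is acceptable when the attributes B determine the decision without contradictions on the objects in X. A minimal Python check of this condition (an illustration only; a full bireduct additionally requires B to be irreducible and X to be non-extendable):

```python
# A few rows of the Sport? table: object id -> (Outlook, Temp, Humid, Wind, Sport?).
rows = {
    1: ("Sunny", "Hot", "High", "Weak", "No"),
    2: ("Sunny", "Hot", "High", "Strong", "No"),
    3: ("Overcast", "Hot", "High", "Weak", "Yes"),
    4: ("Rain", "Mild", "High", "Weak", "Yes"),
    8: ("Sunny", "Mild", "High", "Weak", "No"),
    14: ("Rain", "Mild", "High", "Strong", "No"),
}

def consistent(B, X):
    """True iff attributes B discern every pair of objects in X having different decisions."""
    return not any(
        all(rows[x][a] == rows[y][a] for a in B) and rows[x][-1] != rows[y][-1]
        for x in X for y in X
    )

# Outlook alone (attribute index 0) determines Sport? on objects {1, 2, 3, 8} ...
print(consistent(B=[0], X=[1, 2, 3, 8]))           # True
# ... but not when objects 4 (Rain -> Yes) and 14 (Rain -> No) are both included.
print(consistent(B=[0], X=[1, 2, 3, 8, 4, 14]))    # False
```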
An Illustrative Example
• (B,X) is a proximity bireduct if X is convex (it has no "holes") with respect to a given metric space
• Temporal bireducts: given a timestamp-based ordering of objects, X needs to be an interval
• Consider (B,X), where X is a buffer of the objects that occurred most recently in a data stream
• If the next x is contradictory with X subject to B, we can remove the oldest contradictory objects from X and/or add some attributes to B to be able to add x
• If the next x can be added to X subject to B, we can decrease B in order to avoid too rapid growth of X
Toward Feature Selection on Streams
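A rough Python sketch of one possible reading of the stream scenario above (illustrative assumptions throughout: how contradictions are resolved and when B is shrunk are design choices of this sketch, not a prescribed algorithm):

```python
from collections import deque
from itertools import combinations

def consistent(buffer, B):
    """No two buffered objects share all values on B while having different decisions."""
    return not any(
        all(x[a] == y[a] for a in B) and x[-1] != y[-1]
        for x, y in combinations(buffer, 2)
    )

def process(stream, all_attrs):
    B, X = list(all_attrs), deque()
    for x in stream:
        if consistent(list(X) + [x], B):
            X.append(x)
            # x fits: try to decrease B, one attribute at a time, while X stays consistent.
            for a in list(B):
                smaller = [b for b in B if b != a]
                if smaller and consistent(X, smaller):
                    B = smaller
        else:
            # x clashes: drop the oldest objects contradicting x under B, then admit x.
            while not consistent(list(X) + [x], B):
                oldest_clash = next(old for old in X
                                    if all(old[a] == x[a] for a in B) and old[-1] != x[-1])
                X.remove(oldest_clash)
            # (The slide also allows extending B instead, so that x stops being contradictory.)
            X.append(x)
    return B, list(X)

# Tiny stream: two condition attributes (positions 0 and 1) and a decision in the last position.
stream = [(0, 0, "No"), (0, 1, "Yes"), (0, 1, "No"), (1, 1, "Yes")]
print(process(stream, all_attrs=[0, 1]))
```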
Ensembles of Decision Bireducts
• Better control of the deficiencies of local classifiers based on particular bireducts in the ensemble
Permutations for Bireducts and Concepts
Selected Papers about Attribute Reduction
• S. Widz, D. Ślęzak: Rough Set Based Decision Support – Models Easy to Interpret. In: Rough Sets: Selected Methods and Applications in Management & Engineering. Springer, 95-112 (2012)
• A. Janusz, D. Ślęzak: Utilization of Attribute Clustering Methods for Scalable Computation of Reducts from High-Dimensional Data. FedCSIS 2012: 295-302
• D. Ślęzak, P. Betliński: A Role of (Not) Crisp Discernibility in Rough Set Approach to Numeric Feature Selection. AMLTA 2012
• D. Ślęzak, A. Janusz: Ensembles of Bireducts: Towards Robust Classification and Simple Representation. FGIT 2011: 64-77
• D. Ślęzak: Rough Sets and Functional Dependencies in Data: Foundations of Association Reducts. Trans. Comput. Sci. 5: 182-205 (2009)
• D. Ślęzak: Rough Sets and Few-Objects-Many-Attributes Problem: The Case Study of Analysis of Gene Expression Data Sets. FBIT 2007: 437-442
• D. Ślęzak, J. Wróblewski: Roughfication of Numeric Decision Tables: The Case Study of Gene Expression Data. RSKT 2007: 316-323
• S. Widz, D. Ślęzak: Approximation Degrees in Decision Reduct-based MRI Segmentation. FBIT 2007: 431-436
• D. Ślęzak, W. Ziarko: The Investigation of the Bayesian Rough Set Model. Int. J. Approx. Reasoning 40(1-2): 81-91 (2005)
• J.G. Bazan, A. Skowron, D. Ślęzak, J. Wróblewski: Searching for the Complex Decision Reducts: The Case Study of the Survival Analysis. ISMIS 2003: 160-168