extending propositional satisfiability to determine minimal fuzzy-rough reducts
DESCRIPTION
Extending Propositional Satisfiability to Determine Minimal Fuzzy-Rough Reducts. Outline. The importance of feature selection Rough set theory Fuzzy-rough feature selection (FRFS) FRFS-SAT Experimentation Conclusion. Feature selection. Why dimensionality reduction/feature selection? - PowerPoint PPT PresentationTRANSCRIPT
Richard Jensen, Andrew Tuson and Qiang Shen
Qiang ShenAberystwyth University, UK
Richard JensenAberystwyth University, UK
Andrew TusonCity University, UK
Extending Propositional Satisfiability to Determine Minimal Fuzzy-Rough
Reducts
Richard Jensen, Andrew Tuson and Qiang Shen
Outline• The importance of feature selection
• Rough set theory
• Fuzzy-rough feature selection (FRFS)
• FRFS-SAT
• Experimentation
• Conclusion
Richard Jensen, Andrew Tuson and Qiang Shen
• Why dimensionality reduction/feature selection?
• Growth of information - need to manage this effectively• Curse of dimensionality - a problem for machine learning
High dimensionaldata
DimensionalityReduction
Low dimensionaldata
Processing System
Intractable
Feature selection
Richard Jensen, Andrew Tuson and Qiang Shen
Rough set theory
Rx is the set of all points that are indiscerniblewith point x in terms of feature subset B
UpperApproximation
Set A
LowerApproximation
Equivalence class Rx
Richard Jensen, Andrew Tuson and Qiang Shen
Discernibility approach• Decision-relative discernibility matrix
• Compare objects• Examine attribute values• For attributes that differ:
• If decision values differ, include attributes in matrix• Else leave slot blank
• Construct discernibility function:
)}()( ),()(|{ jijiij xdxdxaxaCac
}|,|1:)({),...,( 1 ijijmC cUijcaaf
Richard Jensen, Andrew Tuson and Qiang Shen
Example
• Remove duplicatesfC(a,b,c,d) = {a b c d} {a c d} ⋁ ⋁ ⋁ ⋀ ⋁ ⋁ ⋀
{b c} {d} {a b c} ⋁ ⋀ ⋀ ⋁ ⋁ ⋀{a b d} {b c d} {a d}⋁ ⋁ ⋀ ⋁ ⋁ ⋀ ⋁
• Remove supersetsfC(a,b,c,d) = {b c} {d}⋁ ⋀
Richard Jensen, Andrew Tuson and Qiang Shen
Finding reducts• Usually too expensive to search exhaustively for
reducts with minimal cardinality
• Reducts found through:• Converting from CNF to DNF (expensive)• Hill-climbing search using clauses (non-optimal)• Other search methods - GAs etc (non-optimal)
• RSAR-SAT• Solve directly in SAT formulation.• DPLL approach is both fast and ensures optimal
reducts
Richard Jensen, Andrew Tuson and Qiang Shen
Fuzzy discernibility matrices• Extension of crisp approach
• Previously, attributes had {0,1} membership to clauses• Now have membership in [0,1]• Allows real-coded data as well as nominal.
• Fuzzy DMs can be used to find fuzzy-rough reducts
Richard Jensen, Andrew Tuson and Qiang Shen
Formulation• Fuzzy satisfiability
• In crisp SAT, a clause is fully satisfied if at least one variable in the clause has been set to true
• For the fuzzy case, clauses may be satisfied to a certain degree depending on which variables have been assigned the value true
Richard Jensen, Andrew Tuson and Qiang Shen
Experimentation: setup• 9 benchmark datasets
• Features – 10 to 39• Objects – 120 to 690
• Methods used: • FRFS-SAT• Greedy hill-climbing: fuzzy dependency, fuzzy boundary region and
fuzzy discernibility.• Evolutionary algorithms: genetic algorithms (GA) and particle swarm
optimization (PSO) using fuzzy dependency
• 10x10-fold cross validation• FS performed on the training folds, test folds reduced using
discovered reducts
Richard Jensen, Andrew Tuson and Qiang Shen
Experimentation: results
Richard Jensen, Andrew Tuson and Qiang Shen
Conclusion• Extended propositional satisfiability to enable
search for fuzzy-rough reducts• New framework for fuzzy satisfiability• New DPLL algorithm• Fuzzy clause simplification
• Future work:• Non-chronological backtracking• Better heuristics• Unsupervised FS• Other extensions in propositional satisfiability
Richard Jensen, Andrew Tuson and Qiang Shen
• WEKA implementations of all fuzzy-rough feature selectors and classifiers can be downloaded from:
http://users.aber.ac.uk/rkj/book/weka.zip
Richard Jensen, Andrew Tuson and Qiang Shen
Feature selection• Feature selection (FS) is a DR technique that
preserves data semantics (meaning of data)
• Subset generation: forwards, backwards, random…• Evaluation function: determines ‘goodness’ of subsets• Stopping criterion: decide when to stop subset search
Generation Evaluation
StoppingCriterion Validation
Feature set Subset
Subsetsuitability
Continue Stop
Richard Jensen, Andrew Tuson and Qiang Shen
Algorithm
Richard Jensen, Andrew Tuson and Qiang Shen
Example