TRANSCRIPT
- Slide 1
- Slide 2
- [email protected] Winter 2014
- Slide 3
- Presentation Outline: Feature Selection; Categorize and Describe Various Algorithms for Feature Selection; A Short View on Dimension Reduction; My Paper
- Slide 4
- Slide 5
- Dimension (Feature or Variable)
- Slide 6
- Dimension (Feature or Variable). Two features of a person: weight, height
- Slide 7
- The Curse of Dimensionality. Observe that the data become more and more sparse in higher dimensions: (a) 12 samples fall inside the unit-sized box, (b) 7 samples in the box, (c) 2 samples in the box. An effective solution to the problem of the curse of dimensionality is: dimensionality reduction
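The sparsity claim on this slide can be checked numerically: the fraction of uniform samples from the unit cube that land in a fixed sub-box of side 0.5 shrinks like 0.5^d as the dimension d grows. A minimal sketch (the sample counts and sub-box side are illustrative choices, not the slide's actual figure):

```python
# Illustrates data sparsity in higher dimensions: the share of uniform
# samples from the unit cube that fall inside a sub-box of side 0.5
# shrinks as 0.5**d with the dimension d.
import random

random.seed(0)

def fraction_in_subbox(dim, n_samples=10000, side=0.5):
    """Fraction of points from the unit cube that fall in [0, side]^dim."""
    hits = 0
    for _ in range(n_samples):
        point = [random.random() for _ in range(dim)]
        if all(x < side for x in point):
            hits += 1
    return hits / n_samples

for d in (1, 2, 3, 10):
    print(d, round(fraction_in_subbox(d), 3))
```

In 10 dimensions almost no samples land in the sub-box, which is exactly the sparsity the slide's three boxes (12, 7, 2 samples) depict.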
- Slide 8
- Dimension Reduction. General objectives of dimensionality reduction: I. Improve the quality of data for efficient data-intensive processing tasks. II. Reduce the computational cost and avoid data over-fitting.
- Slide 9
- Dimension Reduction. Dimensionality reduction approaches include: Feature Selection and Feature Extraction.
- Slide 10
- Dimension Reduction. Feature Extraction: create new features based on transformations or combinations of the original feature set. N: number of original features; M: number of extracted features; M < N
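A common instance of feature extraction as defined above is PCA: the M new features are linear combinations of the N original ones. A hedged sketch using PCA via SVD (the data and the choice of M are invented for illustration; the slide does not name a specific extraction method):

```python
# Sketch of feature extraction: express N original features as M < N new
# features built from linear combinations of the originals (PCA via SVD).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 samples, N = 5 original features

def pca_extract(X, m):
    """Return the data expressed in its top-m principal components."""
    Xc = X - X.mean(axis=0)            # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:m].T               # shape (n_samples, m), with m < N

Z = pca_extract(X, m=2)
print(Z.shape)                          # M = 2 extracted features per sample
```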
- Focus. Feature Selection Methods. Seeks compatibility (consistency) with the least number of features; searches the tree of subsets with BFS (breadth-first search).
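The Focus idea on this slide can be sketched as a breadth-first search over feature subsets, smallest first, that stops at the first subset consistent with the data (no two samples agree on the subset's features but disagree on the label). The toy data set below is invented for illustration:

```python
# Hedged sketch of Focus: BFS over feature subsets ordered by size,
# returning the first (hence smallest) subset consistent with the labels.
from itertools import combinations

def consistent(samples, labels, subset):
    """True if no two samples share values on `subset` with different labels."""
    seen = {}
    for row, y in zip(samples, labels):
        key = tuple(row[i] for i in subset)
        if seen.setdefault(key, y) != y:
            return False
    return True

def focus(samples, labels):
    n = len(samples[0])
    for size in range(n + 1):                 # BFS: try smaller subsets first
        for subset in combinations(range(n), size):
            if consistent(samples, labels, subset):
                return subset
    return tuple(range(n))

# Feature 0 alone determines the label in this toy data:
X = [(0, 0, 1), (1, 0, 0), (0, 1, 1), (1, 1, 0)]
y = [1, 0, 1, 0]
print(focus(X, y))                            # prints (0,)
```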
- Slide 35
- LVF (Las Vegas Filter). Feature Selection Methods. Searches for a minimal subset of features. N: number of features (attributes); M: number of samples (examples). Evaluation criterion: inconsistency. t_max: a predetermined number of iterations.
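The LVF loop described here can be sketched as: draw random feature subsets for t_max iterations, score each with the inconsistency criterion, and keep the smallest subset whose inconsistency stays within a threshold. The data set and threshold below are invented for illustration:

```python
# Hedged sketch of the Las Vegas Filter: random subset generation with an
# inconsistency-rate evaluation criterion and a fixed iteration budget t_max.
import random
from collections import Counter

def inconsistency_rate(samples, labels, subset):
    """Sum over value patterns of (matching samples - majority count), over M."""
    groups = {}
    for row, y in zip(samples, labels):
        groups.setdefault(tuple(row[i] for i in subset), []).append(y)
    bad = sum(len(ys) - Counter(ys).most_common(1)[0][1]
              for ys in groups.values())
    return bad / len(samples)

def lvf(samples, labels, t_max=200, threshold=0.0, seed=0):
    rng = random.Random(seed)
    n = len(samples[0])
    best = list(range(n))                       # start from the full feature set
    for _ in range(t_max):                      # t_max: predetermined iterations
        subset = [i for i in range(n) if rng.random() < 0.5]
        if subset and len(subset) <= len(best) and \
           inconsistency_rate(samples, labels, subset) <= threshold:
            best = subset
    return best

X = [(0, 0, 1), (1, 0, 0), (0, 1, 1), (1, 1, 0)]
y = [1, 0, 1, 0]
print(lvf(X, y))
```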
- Slide 36
- SFS (Sequential Forward Selection), SBS (Sequential Backward Selection). Feature Selection Methods. Both suffer from the nesting effect; remedies: plus-l-take-away-r, SFFS (Sequential Forward Floating Search), SBFS (Sequential Backward Floating Search).
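SFS can be sketched as a greedy loop that repeatedly adds the single feature that most improves a subset score J. The score J and the data below are illustrative assumptions, not from the slides; note how an added feature is never removed, which is exactly the nesting effect the floating variants were designed to avoid:

```python
# Minimal sketch of Sequential Forward Selection (SFS): greedily add, one at
# a time, the feature that maximizes the subset score J.
def sfs(n_features, J, k):
    """Pick k features greedily by the subset score J(subset)."""
    selected = []
    while len(selected) < k:
        candidates = [f for f in range(n_features) if f not in selected]
        best = max(candidates, key=lambda f: J(selected + [f]))
        selected.append(best)       # once added, never removed: nesting effect
    return selected

# Toy score: reward subsets containing features 1 and 3.
J = lambda s: len(set(s) & {1, 3})
print(sfs(5, J, k=2))               # greedy run selects features 1 and 3
```

SBS is the mirror image (start from all features, greedily remove), and plus-l-take-away-r alternates l forward steps with r backward steps.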
- Slide 37
- GA (Genetic Algorithm): crossover, mutation. SA (Simulated Annealing). RMHC-PF1 (Random Mutation Hill Climbing with Prototype and Feature selection): finds sets of prototypes for nearest-neighbor classification; is a Monte Carlo method; can be converted to a Las Vegas algorithm by running it many times.
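The random-mutation hill-climbing idea behind RMHC can be sketched as: flip one random bit of a feature mask, keep the change only if a score does not get worse. The score below is an invented stand-in for nearest-neighbor accuracy; restarting the whole loop many times from random starts is what turns this Monte Carlo search into a Las Vegas-style one:

```python
# Hedged sketch of random mutation hill climbing (RMHC) over a feature mask.
import random

def rmhc(n_features, score, iters=500, seed=0):
    rng = random.Random(seed)
    mask = [rng.randint(0, 1) for _ in range(n_features)]
    best = score(mask)
    for _ in range(iters):
        i = rng.randrange(n_features)
        mask[i] ^= 1                       # mutate one randomly chosen bit
        new = score(mask)
        if new >= best:                    # keep improvements (and ties)
            best = new
        else:
            mask[i] ^= 1                   # otherwise undo the mutation
    return mask, best

# Toy score: want exactly features 0 and 2 selected (stands in for accuracy).
target = [1, 0, 1, 0]
score = lambda m: -sum(a != b for a, b in zip(m, target))
print(rmhc(4, score))
```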
- Slide 38
- Slide 39
- Three models commonly used in feature selection: the filter model → does not consider the interrelationships between the features; the wrapper model → high complexity; embedded methods. Shortcomings: feature redundancy and failure to select the appropriate number of features. Hence: defining the problem as a game.
- Slide 40
- Problem as a One-Player Game. Defining the problem as a Markov Decision Process; the environment is explored with reinforcement learning methods. A feature selection method that considers the interrelationships between the features: the Upper Confidence Graph method.
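The one-player-game formulation described above can be sketched as an MDP whose states are feature subsets and whose actions add one feature from the whole set. The reward function below is an invented stand-in for classifier performance; this only illustrates the shape of the MDP, not the paper's actual reward or policy:

```python
# Hedged sketch of feature selection as a one-player game / MDP:
# state = subset of features, action = add an allowed feature, reward = score.
FEATURES = {"f0", "f1", "f2"}            # the whole set of features

def actions(state):
    """Each allowed action adds one feature not yet in the subset."""
    return FEATURES - state

def step(state, feature, reward_fn):
    """Apply an action and collect the reward of the new subset."""
    next_state = state | {feature}
    return next_state, reward_fn(next_state)

reward_fn = lambda s: len(s) / len(FEATURES)   # toy reward, not the real one
state = frozenset()
state, r = step(state, "f1", reward_fn)
print(sorted(state), r)
```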
- Slide 41
- The main algorithms: Dynamic Programming, the Monte Carlo method, Temporal Difference Learning.
- Slide 42
- (Diagram labels: the best policy possible in the situation; the reward already achieved; the whole set of features; a subset of features; each allowed action.)
- Slide 43
- The average score collected by this feature; the number of times that this feature is selected.
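The two quantities on this slide (a feature's average collected score and the number of times it has been selected) are exactly the ingredients of an upper-confidence-bound score. A sketch assuming the standard UCB1 form; the exploration constant c and the numbers are illustrative, and the paper's exact formula may differ:

```python
# Hedged sketch of a UCB1-style score from the two slide quantities:
# average collected score plus an exploration bonus that shrinks with use.
import math

def ucb_score(avg_score, times_selected, total_selections, c=2.0):
    """Exploitation term plus an exploration bonus."""
    return avg_score + math.sqrt(c * math.log(total_selections) / times_selected)

# A rarely tried feature gets a larger bonus than an often tried one:
print(ucb_score(0.5, times_selected=2, total_selections=100))
print(ucb_score(0.5, times_selected=50, total_selections=100))
```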
- Slide 44
- Benchmarks: Information Gain, the Chi-squared statistic, Feature Assessment by Sliding Threshold (FAST); WEKA software.
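Since Information Gain is listed as a benchmark, a minimal sketch of computing it for a single feature may help: IG = H(labels) - H(labels | feature). The toy data are invented:

```python
# Hedged sketch of the Information Gain benchmark for one discrete feature.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """H(labels) minus the value-weighted entropy of each feature group."""
    n = len(labels)
    remainder = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# A feature that perfectly splits the labels gains the full 1 bit:
print(information_gain([0, 0, 1, 1], [1, 1, 0, 0]))   # prints 1.0
```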
- Slide 45
- Slide 46
- Slide 47
- Any Questions? May 2013. Thanks for your attention.