

Feature Selection: Algorithms and Challenges

Joint Work with Yanglan Gang, Hao Wang & Xuegang Hu

Xindong Wu

University of Vermont, USA; Hefei University of Technology, China (Changjiang Scholar Chair Professor in Computer Applications, Hefei University of Technology)


Deduction Induction: My Research Background

[Timeline figure: research milestones from Expert Systems (1988) through 1990 and 1995, to 2004 and beyond]


Outline

1. Why feature selection

2. What is feature selection

3. Components of feature selection

4. Some research efforts by myself

5. Challenges in feature selection


1. Why Feature Selection?

High-dimensional data often contain irrelevant or redundant features, which:

- reduce the accuracy of data mining algorithms
- slow down the mining process
- pose a problem in storage and retrieval
- are hard to interpret


2. What Is Feature Selection?

Select the most “relevant” subset of attributes according to some selection criteria.


Outline

1. Why feature selection

2. What is feature selection

3. Components of feature selection

4. Some research efforts by myself

5. Challenges in feature selection


Traditional Taxonomy

Wrapper approach: features are selected as part of the mining algorithm.

Filter approach: features are selected before the mining algorithm runs, using heuristics based on general characteristics of the data, rather than a learning algorithm, to evaluate the merit of feature subsets.

The wrapper approach is generally more accurate but also more computationally expensive.
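The contrast can be sketched in a few lines of Python. This is an illustrative toy, not from the slides: the filter scores each feature directly from the data (here, correlation with the labels), while the wrapper scores each feature by the accuracy of an actual, if trivial, classifier.

```python
import random

# Toy data (hypothetical, for illustration): 100 samples, 4 features;
# feature 0 determines the label, features 1-3 are pure noise.
random.seed(0)
X = [[random.random() for _ in range(4)] for _ in range(100)]
y = [1 if row[0] > 0.5 else 0 for row in X]

def correlation(xs, ys):
    """Pearson correlation between a feature column and the labels."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs) ** 0.5
    vy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

# Filter: rank features by a data characteristic alone -- no learner involved.
filter_scores = [abs(correlation([row[j] for row in X], y)) for j in range(4)]
best_by_filter = max(range(4), key=lambda j: filter_scores[j])

# Wrapper: rank features by the accuracy of an actual (here trivial) classifier.
def accuracy_using(j):
    threshold = sum(row[j] for row in X) / len(X)
    preds = [1 if row[j] > threshold else 0 for row in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

best_by_wrapper = max(range(4), key=accuracy_using)
print(best_by_filter, best_by_wrapper)  # feature 0 should win under both
```

In practice a wrapper would cross-validate a real learner over candidate subsets, not single features, which is exactly why it is the more expensive of the two approaches.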


Components of Feature Selection

Feature selection is actually a search problem, including four basic components:

1. an initial subset
2. one or more selection criteria ( * )
3. a search strategy ( * )
4. some given stopping conditions


Feature Selection Criteria

Selection criteria generally use “relevance” to estimate the goodness of a selected feature subset in one way or another:

- Distance measure
- Information measure
- Inconsistency measure
- Relevance estimation
- Selection criteria related to learning algorithms (wrapper approach)

Some unified frameworks for relevance have been proposed recently.
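As a concrete instance of the information measure, the gain of a discrete feature can be computed as the entropy reduction it produces in the class labels. A minimal sketch (the toy dataset and names are illustrative, not from the slides):

```python
import math

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def information_gain(feature_values, labels):
    """Entropy reduction in the labels after splitting on a discrete feature."""
    n = len(labels)
    splits = {}
    for v, c in zip(feature_values, labels):
        splits.setdefault(v, []).append(c)
    remainder = sum(len(part) / n * entropy(part) for part in splits.values())
    return entropy(labels) - remainder

# Toy dataset: 'outlook' fully determines the label, 'coin' is noise.
labels  = ['yes', 'yes', 'no', 'no', 'yes', 'no']
outlook = ['sun', 'sun', 'rain', 'rain', 'sun', 'rain']
coin    = ['h', 't', 'h', 't', 't', 'h']

print(information_gain(outlook, labels))  # 1.0 bit: outlook determines the label
print(information_gain(coin, labels))     # near zero: coin is uninformative
```

Ranking features by such a score, with no learner in the loop, is the filter approach in its simplest form.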


Search Strategy

Exhaustive search:

- Every possible subset is evaluated and the best one is chosen
- Guarantees the optimal solution
- Low efficiency

A modified approach: branch and bound (B&B)
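A minimal sketch of exhaustive search, assuming a hypothetical scoring function; the nested loops make the exponential cost explicit:

```python
from itertools import combinations

def exhaustive_select(features, score):
    """Evaluate every non-empty subset and return the best: optimal but O(2^n)."""
    best_subset, best_score = None, float('-inf')
    for r in range(1, len(features) + 1):
        for subset in combinations(features, r):
            s = score(subset)
            if s > best_score:
                best_subset, best_score = subset, s
    return best_subset, best_score

# Hypothetical scoring function: rewards informative features, penalizes size.
def toy_score(subset):
    reward = {'f1': 3.0, 'f2': 2.0, 'f3': 0.1, 'f4': 0.1}
    return sum(reward[f] for f in subset) - 0.5 * len(subset)

subset, s = exhaustive_select(['f1', 'f2', 'f3', 'f4'], toy_score)
print(subset, s)  # ('f1', 'f2') with score 4.0
```

With n features this evaluates 2^n - 1 subsets, which is exactly why branch and bound, pruning subsets that cannot beat the current best, is attractive when the criterion is monotonic.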


Search Strategy (2)

Heuristic search: sequential search, including SFS, SFFS, SBS, and SBFS.

SFS (sequential forward selection):
- Start with the empty attribute set
- Add the “best” of the attributes
- Add the “best” of the remaining attributes
- Repeat until the maximum performance is reached

SBS (sequential backward selection):
- Start with the entire attribute set
- Remove the “worst” of the attributes
- Repeat until the maximum performance has been reached
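The SFS steps above can be sketched as a greedy loop; the scoring function here is a hypothetical stand-in for a real evaluation criterion, and SBS is the mirror image (start full, remove the "worst"):

```python
def sfs(features, score):
    """Sequential forward selection: greedily add the feature that most
    improves the score; stop when no single addition helps."""
    selected, current = [], score([])
    remaining = list(features)
    while remaining:
        best_f, best_s = None, current
        for f in remaining:
            s = score(selected + [f])
            if s > best_s:
                best_f, best_s = f, s
        if best_f is None:  # no addition improves performance: stop
            break
        selected.append(best_f)
        remaining.remove(best_f)
        current = best_s
    return selected

# Hypothetical additive score with a size penalty, for illustration only.
def toy_score(subset):
    value = {'f1': 3.0, 'f2': 2.0, 'f3': 0.1}
    return sum(value[f] for f in subset) - 0.5 * len(subset)

print(sfs(['f1', 'f2', 'f3'], toy_score))  # ['f1', 'f2']
```

Being greedy, SFS can miss interacting features that only help jointly; the floating variants (SFFS, SBFS) revisit earlier choices to reduce that risk.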


Search Strategy (3)

Random search proceeds in two different ways:

- Inject randomness into classical sequential approaches (simulated annealing, beam search, genetic algorithms, and random-start hill-climbing)
- Generate the next subset randomly

The use of randomness can help escape local optima in the search space, and the optimality of the selected subset depends on the available resources.
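The second way, generating the next subset randomly, can be sketched as follows, with a fixed evaluation budget standing in for the "available resources" (the scoring function is again a toy):

```python
import random

def random_search(features, score, budget, seed=0):
    """Randomly generate candidate subsets and keep the best one seen;
    result quality depends on the evaluation budget."""
    rng = random.Random(seed)
    best_subset, best_score = None, float('-inf')
    for _ in range(budget):
        # Each feature is included with probability 0.5.
        subset = tuple(f for f in features if rng.random() < 0.5)
        if not subset:
            continue
        s = score(subset)
        if s > best_score:
            best_subset, best_score = subset, s
    return best_subset, best_score

# Hypothetical scoring function, for illustration only.
def toy_score(subset):
    value = {'f1': 3.0, 'f2': 2.0, 'f3': 0.1, 'f4': 0.1}
    return sum(value[f] for f in subset) - 0.5 * len(subset)

best, s = random_search(['f1', 'f2', 'f3', 'f4'], toy_score, budget=200)
print(best, s)
```

Because candidates are drawn independently, the search cannot get trapped in a local optimum, but the expected quality of the result grows only with the number of evaluations spent.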


Outline

1. Why feature selection

2. What is feature selection

3. Components of feature selection

4. Some research efforts by myself

5. Challenges in feature selection


RITIO: Rule Induction Two In One

- Feature selection using the information gain in a reverse order
- Deletes the features that are least informative
- Results are significant compared to forward selection [Wu et al., 1999, TKDE].


Induction as Pre-processing

- Use one induction algorithm to select attributes for another induction algorithm
- Can be a decision-tree method for rule induction, or vice versa
- Accuracy results are not as good as expected
- Reason: feature selection normally causes information loss
- Details: [Wu, 1999, PAKDD].


Subspacing with Asymmetric Bagging

- When the number of examples is less than the number of attributes
- When the number of positive examples is smaller than the number of negative examples
- An example: content-based information retrieval
- Details: [Tao et al., 2006, TPAMI].


Outline

1. Why feature selection

2. What is feature selection

3. Components of feature selection

4. Some research efforts by myself

5. Challenges in feature selection


Challenges in Feature Selection (1)

Dealing with ultra-high-dimensional data and feature interactions.

Traditional feature selection encounters two major problems when the dimensionality runs into tens or hundreds of thousands:

1. the curse of dimensionality
2. the relative shortage of instances.


Challenges in Feature Selection (2)

Dealing with active instances (Liu et al., 2005). When the dataset is huge, feature selection performed on the whole dataset is inefficient, so instance selection is necessary:

- Random sampling (pure random sampling without exploiting any data characteristics)
- Active feature selection (selective sampling using data characteristics achieves better or equally good results with a significantly smaller number of instances).


Challenges in Feature Selection (3)

Dealing with new data types (Liu et al., 2005). The traditional data type is an N*M data matrix. Due to the growth of computer and Internet/Web techniques, new data types are emerging:

- text-based data (e.g., e-mails, online news, newsgroups)
- semi-structured data (e.g., HTML, XML)
- data streams.


Challenges in Feature Selection (4)

Unsupervised feature selection:

- Feature selection vs. classification: almost every classification algorithm
- Subspace methods with the curse of dimensionality in classification
- Subspace clustering.


Challenges in Feature Selection (5)

Dealing with predictive-but-unpredictable attributes in noisy data:

- Attribute noise is difficult to process, and removing noisy instances is dangerous
- Predictive attributes: essential to classification
- Unpredictable attributes: cannot be predicted by the class and other attributes
- Noise identification, cleansing, and measurement need special attention [Yang et al., 2004]


Challenges in Feature Selection (6)

Dealing with inconsistent and redundant features:

- Redundancy can indicate reliability
- Inconsistency can also indicate a problem for handling

Questions from researchers in Rough Set Theory:

- What is the purpose of feature selection? Can you really demonstrate the usefulness of reduction, in data mining accuracy, or what?
- Removing attributes can well result in information loss
- When the data is very noisy, removals can cause a very different data distribution
- Discretization can possibly bring new issues.


Concluding Remarks

Feature selection is and will remain an important issue in data mining, machine learning, and related disciplines

Feature selection trades some accuracy for efficiency

Researchers need to keep the bigger picture in mind, not just do feature selection for its own sake.